
A forecasting method for intraday electricity spot prices.

Rob Jagt (s1910744)

Supervisor: dr. D. Ronchetti, University of Groningen, June 2016

Abstract


Contents

I Introduction
 I.1 The (German) electricity market
 I.2 Data characteristics
 I.3 Problem definition, research question and motivation
II Theory
 II.1 Model derivation
 II.2 Literature overview short term electricity price forecasting
III Empirical analysis
 III.1 Model validation
 III.2 The periodic autoregressive model
 III.3 Extreme upward and downward prices
 III.4 Probability of spike occurrences
IV Estimation of the armax and mean reverting jump diffusion models
V Point forecasts of the electricity spot price
VI Discussion
 VI.1 Model comparison
 VI.2 Further remarks
VII Conclusion


I. Introduction

At the climate conference in Paris in 2015, 195 countries adopted the first-ever universal, legally binding global climate deal. The agreement consists of a comprehensive plan to limit the global temperature increase to well below 2 degrees centigrade. Concerning electricity production, the goal is to reduce CO2 emissions from 411 g/kWh to as little as 15 g/kWh. For this to be realised, things need to change drastically.

In the last decade there has already been a large increase in investments in renewable energy, and after the conference in Paris this will increase even further. This revolution will have a great impact on our energy system, specifically for electricity, and this impact will only get bigger. The old 'top down approach', in which a few big power plants supply all the energy and transport it to the end consumers, is going to be replaced by a 'bottom up approach', in which energy is produced locally and distributed in a more local electricity grid.

One of the main characteristics of renewable energy sources, like wind and solar energy, is that their energy supply depends on uncontrollable processes like the amount of wind and hours of sun. Currently our energy demand is almost perfectly price inelastic. For proper integration of renewables into the energy system, specifically for electricity, the energy system must change drastically.

Currently, it is not feasible to store electricity in an economical way. The dynamics of the energy supply and the price inelasticity of the demand lead to electricity prices exhibiting exceptional behaviour in comparison to other financial data. In the light of the liberalisation of the electricity market, forecasting electricity spot prices has gained increased interest [1]. The ultimate goal is to make electricity demand more price elastic, such that the grid can incorporate fluctuations in energy supply instead of the supply having to match the demand. For this to be the case, being able to make accurate forecasts of the electricity spot price is essential. In this MSc thesis a new modelling approach is derived and compared to current models present in the literature. The structure of the report is as follows. First, a comprehensive overview will be given of how the (German) electricity market functions, and the data set used will be discussed in detail. After this, a concise description of the problem definition will be given, together with the research questions. Secondly, the theoretical framework will be derived, together with an overview of the relevant literature. Thirdly, all the results will be presented and compared. Here it will be shown that the developed model improves on the current models presented in the literature by properly taking into account the stylized facts of electricity spot prices. The report closes with the conclusions, recommendations and future outlook.

I.1. The (German) electricity market

Over the counter and exchange traded markets


Within the exchange, three different markets can be distinguished: the Futures Market, the Day-Ahead Market and the Spot Market (see also Table 1). Here, standardized contracts on physical delivery of electricity are traded. Most of the trades (approximately 75-80% in terms of MWh traded) take place within the futures market, in which all contracts are long term contracts with a time to delivery of more than one day. The second largest exchange market is the Day-Ahead Market, in which approximately 20% of the trades take place. Here, parties trade on an hourly basis for delivery during the same hour the next day, such that they can sell a foreseeable surplus or supplement a shortage. Lastly, a small portion of the trades are deals concerning delivery within the same day. At the spot market, traders can buy and sell surpluses or shortages of electricity at short notice. Especially the spot market will increase in relevance due to the growth of (partially unpredictable) renewable energy sources.

                               Futures Market   Day-Ahead Market   Spot Market
Approximate amount traded (%)  75-80%           20%                0-5%
Time until delivery            ≥ 1 day          1 day              1 hour

Table 1: The exchange traded market for electricity can be subdivided into three markets: the futures market, the day-ahead market and the spot market. The markets differ both in size (in terms of MWh traded) and in the time until delivery.

Next to the OTC and exchange traded markets there are also markets regulated by the transmission system operators (TSOs), who keep control power available to maintain stable and reliable supply. Further discussion on these markets is not relevant to the topics treated in this paper and will be omitted.

In the OTC market, the electricity price is agreed upon in a bilateral agreement. In the case of the exchange traded markets, the market clearing price is determined by the aggregated demand and supply curves; see also Figure 1. The exchange market acts as an intermediary, making sure that the total demand equals the supply.

The fact that demand and supply have to be matched at every moment in time is crucial for electricity markets, because electricity is non-storable. The transmission system operators have to keep the net frequency within a very narrow bandwidth around 50 Hz and maintain a specific potential for different parts of the electricity grid. These specific characteristics give rise to extraordinary properties of the price data, especially within the electricity spot market. Before these properties can be discussed, the Merit Order Model has to be introduced.

Merit Order Model

The Merit Order Model is a theoretical framework that explains what the market clearing price is and which energy sources will provide the electricity. It is a widely accepted fundamental framework for determining the electricity spot price; see also [3]. The concept is based on the fact that the energy sources with the lowest marginal cost of producing energy are the first to meet the demand. The energy sources with the highest marginal cost of operation only provide electricity when demand is high. This model has been very successful in explaining the daily seasonality in electricity prices.


Figure 1: Example of determination of the market clearing price in an auction. The market clearing price is the price for which the supply and demand curves cross. The demand and supply curves are created from the aggregated bids and demand offers. Figure from [1].

Within the merit order framework, the energy sources can be divided into three categories with different characteristics: renewables (wind, solar, hydro), gas/oil and coal/nuclear/biomass. Renewable energy sources have the lowest CO2 emissions and the lowest marginal production cost (although their total production cost is the highest), and they are inflexible and uncontrollable. The gas- and oil-fired power plants have the second highest CO2 emissions and the highest marginal production cost. These power plants are usually only brought online during the day when demand is high. As opposed to the renewable energy sources, these power plants are controllable and flexible: they can be brought online within a time frame of 15 minutes. The last category includes the coal and nuclear power plants. They are considered the most environmentally unfriendly energy sources and have the second lowest marginal cost of production. As for the gas- and oil-fired power plants, these energy sources are controllable, but generally not very flexible; their electricity output has to be very constant. Within the German electricity market these plants supply the base load of energy (the lowest amount of energy demanded throughout the day). Biomass falls within this category since it is burned in coal-fired power plants. An example of how this works out for the Dutch and German case is depicted in Figure 2. In the German case the price setting energy source is most of the time a coal-fired power plant (hard coal and lignite), whereas in the Dutch case the gas-fired power plants are dominant in the price setting. Since the renewable energy sources have a negligible marginal cost, higher renewable production effectively shifts the supply curve to the right. If the total demand remains the same, the residual load (total demand minus the total production of renewables) will decrease. This lowers the market clearing price. Thus the effect of renewables is that the wholesale electricity price decreases.


Figure 2: The market clearing electricity price (coloured line) and its occurrence (histogram) are shown on the vertical axis. On the horizontal axis the residual load (total electricity demand minus the total renewable production) is depicted. Both figures are for the year 2014. Left figure: example of the German merit order. Right figure: example of the Dutch merit order. Taken from [2].

The non-storability of electricity also gives rise to exceptional price characteristics. The efficient market hypothesis states that the current price fully reflects all available information and that past prices cannot be used to make an excess profit (more than the market overall). However, because electricity cannot be stored in an economically feasible way, this arbitrage argument is no longer valid and the historic price data is rich in information that can be used for forecasting purposes. Lastly, the fact that transmission system operators have to maintain a stable electricity grid imposes the stringent requirement that supply and demand be matched continuously. The demand for electricity is however largely price-inelastic and is only partially able to compensate for changes in supply. If the supply is not able to match the demand, the only thing that changes is the price. This gives rise to sharp upward and downward spikes on top of a smooth curve, which cannot be explained by the merit order model. The upward/downward spikes occur when there is a shortage/excess of electricity. On average, the upward spikes are much larger in absolute size than the downward spikes. This inverse leverage effect, in which positive shocks impact the price more than negative shocks, is also quite unusual in financial data.

I.2. Data characteristics

In the previous section the general features of the German electricity market were discussed. In this section a closer look will be taken at the specifics of the data set used for the analysis throughout this paper. All the data are obtained from publicly accessible sources and are also investigated in the paper by D. Liebl [4]. As discussed, the report will focus on the German electricity spot market, which is traded at the European Energy Exchange (EEX). On the website of the exchange (www.eex.com) data on the electricity spot price can be found. On the website (www.entsoe.eu) of the European Network of Transmission System Operators for Electricity, data on the electricity demand is available. The information on the total power produced by renewable energy sources is provided by the EEX Transparency Platform (www.transparency.eex.com). Additionally, the total data set can be downloaded from [5].


Figure 3: The time series of the German electricity spot prices (Euro/MWh) between January 2006 and September 2008. In total, there are L = 24096 data points. Price spikes as high as 2000 Euro/MWh are observed. For clarity, however, the vertical axis is restricted to the range from 0 Euro/MWh to 300 Euro/MWh.

In Figure 3, the total time series of the electricity spot price is depicted. For estimation purposes, taking a larger time span leads to a lower variance of the estimates. For a too large data set, however, the dynamics of the system will change, for example through a change in the power generation composition. This would make the time series non-stationary, which is an unwanted property when models are to be estimated on these data. Taking this trade-off into account, this size of the data set was chosen.

Figure 4: Fourier transform of the time series of the German electricity spot prices. Firstly, the low-frequency base signal indicates the slowly varying (monthly) component. Secondly, the peaks at 24 Hz and 48 Hz indicate the strong daily pattern. Finally, the 3.4 Hz peak indicates a weekly varying signal in the prices. The other peaks are overtones/harmonics of the previously mentioned signals.


A peak is also present at approximately 3.4 Hz, indicating a weekly varying signal in the prices, which is also plausible. The other peaks are due to higher overtones/harmonics.
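The spectral analysis behind Figure 4 can be reproduced along the following lines. This is a minimal numpy sketch, assuming the hourly prices are available as a flat array; the synthetic input and the cycles-per-day frequency axis are illustrative assumptions, not the exact setup of the thesis.

```python
import numpy as np

# Synthetic stand-in for the L = 24096 hourly spot prices; in practice this
# array would be loaded from the data set described above.
rng = np.random.default_rng(0)
t = np.arange(24096)
hourly_prices = 40 + 10 * np.sin(2 * np.pi * t / 24) + rng.normal(size=t.size)

# Demean so the zero-frequency component does not dominate the spectrum.
demeaned = hourly_prices - hourly_prices.mean()

# One-sided amplitude spectrum; with hourly sampling, multiplying the
# frequency axis by 24 expresses it in cycles per day.
spectrum = np.abs(np.fft.rfft(demeaned))
freq_cpd = np.fft.rfftfreq(demeaned.size, d=1.0) * 24

# A daily pattern appears at 1 cycle/day (with harmonics at 2, 3, ...);
# a weekly pattern appears at 1/7 of a cycle per day.
print(freq_cpd[spectrum.argmax()])
```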

I.3. Problem definition, research question and motivation

The problem definition, research questions and motivation will now be discussed, combining the material from the previous sections. As discussed in the general introduction, integrating unpredictable renewable energy sources requires the demand for electricity to become more responsive to price changes. An essential part of this transition towards a smart grid is being able to forecast the electricity spot price several hours into the future.

Next to the social issues, modelling and forecasting the electricity spot price is also interesting from a fundamental econometric point of view. The fact that past electricity prices contain large amounts of information, that there are many explanatory variables which can be used to model this price, and that there are many stylized facts (such as price spikes and inverse leverage effects) unseen in regular financial data makes this an interesting field of research. Thus, given this problem definition and motivation, the research question of this report is as follows:

• How to make accurate short term (several hours) forecasts of the electricity spot price?

Given the data characteristics, this research question can be subdivided into several more econometric research questions:

• How to model the slowly varying baseline?


II. Theory

Given the research questions of the previous section, in this section the theoretical framework will be derived. In the next section, the models will be estimated and the results will be discussed. This section consists of two parts. The first part will focus on deriving the developed model and the associated theory. Specifically, this will contain the relevant theory from extreme value theory and a discussion on periodic autoregressive (par) and probit models. The second part will give a general overview of the electricity price forecasting literature and will go into detail on the autoregressive moving average with exogenous variables (armax) and mean reverting jump diffusion (mrjd) models with which the developed model is compared.

II.1. Model derivation

Periodic autoregressive model

As discussed in the introduction of the theory section, first the developed model will be derived. The model validation will be discussed in the results section. The discussion on incorporating the price spikes is postponed, and the focus for now will be on modelling the seasonality of the model. As discussed in the data characteristics section, this seasonality gives rise to a slowly varying (monthly) signal and a fast varying daily signal.

In order to incorporate the merit order model properly into our model, the daily varying pattern should be taken care of. Simply using the residual load as an explanatory variable for the entire time series would not work, since this does not take into account that the price setting energy producer differs with demand. This leads to a different price to be paid per MWh and hence different coefficients. This is solved by modelling the time series not as a univariate time series, but by considering the spot price as a multivariate series: a vector of size 24×1 containing the spot price for every hour.
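Concretely, this amounts to folding the univariate hourly series into one 24-dimensional observation per day. A minimal numpy sketch, with a synthetic series standing in for the real data:

```python
import numpy as np

# Stand-in for the univariate hourly spot price series, in chronological order.
hourly_prices = np.arange(24096, dtype=float)

# One row per day, one column per delivery hour: column h is the univariate
# series y_{h,t} that gets its own equation and its own coefficients.
T = len(hourly_prices) // 24
daily_panel = hourly_prices[: T * 24].reshape(T, 24)

y_hour_12 = daily_panel[:, 12]  # the spot price at 12:00 for every day t
```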

The slowly varying base signal can be modelled in several ways. Possible ways to incorporate the baseline into the model are either to take daily or weekly averages and model these using autoregressive moving average models, or to simply use dummy variables for every month. In this report the latter is chosen; however, since the emphasis in this research is on short term forecasts (of a couple of hours), this choice will not influence the results significantly.

Lastly, as described in the introduction section, the residuals are likely to contain plenty of relevant information for modelling and forecasting. Therefore the model should also contain autoregressive variables as explanatory variables. However, since the circumstances are different for different hours of the day (for instance caused by the price setting energy generator), this dependence is also likely to vary with the time of day.

Taking the above mentioned arguments together, the base signal of the electricity spot price is going to be modelled using a periodic autoregressive (PAR) model; for more details see also [6]. Here, the base signal of the electricity spot price is modelled as follows:

y_{h,t} = x'_{h,t} β_h + Σ_{j=1}^{24} φ_{h,j} y_{h-j,t} + u_{h,t},    (1)

where x_{h,t} is a d×1 vector of explanatory variables, β_h is a d×1 parameter vector and the φ_{h,j} are parameters determining the lagged dependencies. In the case that h−j is negative, y_{h-j,t} = y_{h-j+24,t-1}. The vector x_{h,t} contains dummies for the day and month. Additionally, x_{h,t} contains data on the temperature, the total demand minus the wind infeed, and the wind infeed itself at (h, t). In order to rewrite this PAR model into a first order vector autoregressive (VAR) model, the following equation for the residuals is composed:

Φ^A e_t = Φ^B e_{t-1} + z_t,

where e_{h,t} = y_{h,t} − x'_{h,t} β_h, e_t = [e_{1,t}, . . . , e_{24,t}]' and the elements of the matrices Φ^A and Φ^B are given by

Φ^A_{i,j} = 1 if i = j,   Φ^A_{i,j} = 0 if j > i,   Φ^A_{i,j} = −φ_{i,i-j} if j < i,
Φ^B_{i,j} = φ_{i,i-j+m} if i − j ≤ 0,   Φ^B_{i,j} = 0 otherwise,

where m = 24, t now stands for the day and i, j indicate the row and column number respectively. Pre-multiplying both sides with the inverse of Φ^A gives

e_t = Φ e_{t-1} + ε_t,   with Φ = (Φ^A)^{-1} Φ^B and ε_t = (Φ^A)^{-1} z_t.

Now let y_t be defined as y_t = [y_{1,t}, . . . , y_{24,t}]', let x_t be block diagonal with the vectors [x'_{1,t}, . . . , x'_{24,t}] on the diagonal, and let β = [β_1, . . . , β_{24}]. We then obtain the following:

y_t = x_t β + e_t,   e_t = Φ e_{t-1} + ε_t   ⇒   e_{t-1} = y_{t-1} − x_{t-1} β,
y_t − x_t β = Φ (y_{t-1} − x_{t-1} β) + ε_t,
y_t = x_t β + Φ (y_{t-1} − x_{t-1} β) + ε_t.

In order to rewrite this into one equation the following is defined: Y = vec(Y_r), where Y_r is a T×m matrix with rows [y_{1,t}, . . . , y_{m,t}] and T equal to the number of analysed days. Z is a block diagonal matrix with diagonal matrices Z_i which have rows [x_{i,t}, x̄_{i,t}, ȳ_{i,t}], where ȳ_{i,t} is an m×1 vector containing the 1 until m hours lagged variables of y_{i,t} and x̄_{i,t} contains the one hour lagged explanatory variables, excluding the day and month dummies. Now, let k be the number of columns of [x_{i,t}, x̄_{i,t}, ȳ_{i,t}]. Then let B be a k×m matrix with its nth column equal to the k(n−1)+1 until kn rows of β, stacked on top of a vector γ which comes from x̄_{i,t}. Then, finally, the PAR model as expressed in Equation (1) can be rewritten into one single equation:

Y = Z A + E,    (2)

where A = vec([B; Φ]), i.e. B stacked on top of Φ.

Here it is assumed that the errors are uncorrelated between observations and that the standard error can differ between hours. These assumptions will be discussed in the results section. Estimates of A can simply be obtained using ordinary least squares (OLS). The OLS estimate of A, Â, is given by

Â = (Z'Z)^{-1} Z'Y.    (3)
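Because Z is block diagonal, the OLS estimate in Equation (3) decomposes into 24 separate hourly regressions. A minimal numpy sketch, assuming the regressor matrices Z_i have already been built from the dummies, exogenous variables and lagged prices:

```python
import numpy as np

def estimate_par_ols(Y, Z_blocks):
    """Equation-by-equation OLS for the PAR model.

    Y        : (T, 24) matrix of spot prices, one column per delivery hour.
    Z_blocks : list of 24 regressor matrices Z_i, each of shape (T, k).
    Returns the list of hourly coefficient vectors.
    """
    estimates = []
    for i in range(24):
        # A_hat_i = (Z_i' Z_i)^{-1} Z_i' Y_i; lstsq avoids forming the
        # inverse explicitly and is numerically more stable.
        A_hat_i, *_ = np.linalg.lstsq(Z_blocks[i], Y[:, i], rcond=None)
        estimates.append(A_hat_i)
    return estimates
```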

To obtain consistent estimates and standard errors, some general assumptions underlying OLS need to be discussed first. The data come from a time series and the explanatory variables (since they contain lagged variables) will not be strictly exogenous with respect to the error terms. However, when the time series is stationary and ergodic, weak exogeneity also suffices to obtain consistent estimates. These assumptions boil down to stating that the joint probability distribution does not change when the stochastic process is shifted in time and that the time average of the sequence of events is the same as the ensemble average.

As will be shown in the results section, the univariate time series of the electricity spot price is not stationary. For the PAR model, however, the time series is effectively split up into 24 separate equations, one for every hour, and assuming stationarity for every separate time series is more realistic. A possible way in which the assumption of ergodicity can be violated is if there are variables with long memory, such that observations far apart in time are still strongly correlated. Intuitively, this means that even though the number of observations is large, the information these observations contain is limited by the long memory; the sample size does not grow fast enough for the parameter estimates to converge asymptotically to the true parameter values. This long memory effect has for instance been observed in the Scandinavian power market [7] [8], where the large contribution of hydro power gives rise to the long memory. However, the contribution of hydro power in Germany is limited, and it is therefore safe to assume ergodicity. Assuming stationarity and ergodicity of the time series, together with the usual assumptions of correct model specification, weak exogeneity, no perfect collinearity, homoskedasticity and no serial correlation, allows us to use central limit theorems which show that the OLS estimator of the coefficients for every separate hour converges to a normal distribution in this large sample. To make this more formal, let A^{(i)} be defined as

A^{(i)} = [B_i; Φ_i],

where B_i and Φ_i are the i-th columns of B and Φ. Then the OLS estimate of A^{(i)}, Â^{(i)}, is given by

Â^{(i)} = (Z'_i Z_i)^{-1} Z'_i Y_i,    (4)

where Y_i is the i-th column of Y_r. The least squares estimator Â^{(i)} is consistent (that is, Â^{(i)} converges in probability to A^{(i)}) and asymptotically normal:

Â^{(i)} →_d N(A^{(i)}, σ_i (Z'_i Z_i)^{-1}).    (5)

An unbiased estimator for σ_i is given by ŝ_i², defined as

ŝ_i² = (Y_i − Z_i Â^{(i)})' (Y_i − Z_i Â^{(i)}) / (T − k).


The Generalized Extreme Value Distributions

In the previous section, the daily varying baseline has been modelled using a PAR model. This properly takes into account the merit order model and the explanatory power present in the historical and present price data. However, as discussed in the data characteristics section, there are also strong upward and downward peaks superimposed on this baseline when there is a shortage or an oversupply of electricity. These cannot be explained by the merit order model. In this report, these peaks will be characterized using extreme value theory (EVT). Here, the necessary theory will be discussed and derived. The next section will go into more detail on forecasting these peaks using a probit model.

Extreme value theory focusses on the statistical behaviour of

M_n = max{X_1, . . . , X_n},    (6)

where X_1, . . . , X_n are independently and identically distributed random variables with a certain distribution function. Here, the X_i represent the OLS residuals of Equation (1) when the lagged variables are omitted. More formally, let the following linear model be defined for y_{h,t}:

y_{h,t} = x'_{h,t} β_h + u_{h,t}.    (7)

Then the residuals û_{h,t} are defined as

û_{h,t} = y_{h,t} − x'_{h,t} β̂_h,   where   β̂_h = (Σ_{t=1}^{T} x_{h,t} x'_{h,t})^{-1} Σ_{t=1}^{T} x_{h,t} y_{h,t}.

Given the distribution function of the residuals, F(.), one can obtain the distribution of M_n. It is easily seen that this function is given by:

Pr{M_n ≤ z} = Pr{X_1 ≤ z, . . . , X_n ≤ z}
            = Pr{X_1 ≤ z} × · · · × Pr{X_n ≤ z}
            = {F(z)}^n.

The only problem, however, is that the function F is unknown. Extreme value theory focusses on the approximate family of distributions for these extreme points. This class of distributions is called the Generalized Extreme Value distribution. It plays the same role in limit theorems for extreme values as the normal distribution plays in central limit theorems for sums of random variables.

To make this argument more formal, consider the process {X_1, . . . , X_n}, where all the X_i are i.i.d. Additionally let S_n = Σ_{i=1}^{n} X_i. If and only if E[|X_i|] < ∞, Kolmogorov's Strong Law of Large Numbers gives that

S_n / n → E[X_i]  almost surely, as n → ∞.

Then, from the Lindeberg-Lévy central limit theorem one obtains that

lim_{n→∞} P[ (S_n − n E[X_i]) / √(n V[X_i]) ≤ x ] = Φ(x),   for any x ∈ R,

where Φ(x) is the CDF of the standard normal distribution and V[X_i] is the variance of X_i.

The same line of reasoning can be applied to the variable M_n. To avoid problems concerning the support of F, a renormalized variable M*_n is defined as

M*_n = (M_n − b_n) / a_n,

where a_n, b_n ∈ R and a_n > 0.

The entire range of possible limiting distributions of M*_n is given by the following theorem (see also [9]):

Theorem 1. If there exist sequences of constants {a_n > 0} and {b_n} such that

Pr{ (M_n − b_n)/a_n ≤ z } → G(z)   as n → ∞,

where G is a non-degenerate distribution function, then G belongs to a single family of models having distribution functions of the form

G(z) = exp{ −[1 + ξ (z − µ)/σ]^{−1/ξ} },    (8)

defined on the set {z : 1 + ξ(z − µ)/σ > 0}, where µ, σ, ξ ∈ R satisfy −∞ < µ < ∞, σ > 0 and −∞ < ξ < ∞. The limiting distribution G(z) is called the generalized extreme value (GEV) family of distributions.

Within the GEV family, three different types of distributions are distinguished, commonly referred to as the Gumbel, Fréchet and Weibull families; see also Table 2. Given their support, the Fréchet distribution is appropriate for modelling the upward spikes and the Weibull distribution for the downward spikes. These distributions also allow for fat tails, an important feature as will be seen in the results section.

                 Weibull          Gumbel     Fréchet
Shape parameter  ξ < 0            ξ = 0      ξ > 0
Support          (−∞, µ − σ/ξ]    (−∞, ∞)    [µ − σ/ξ, ∞)

Table 2: The Weibull, Gumbel and Fréchet distributions all belong to the GEV family. They can be distinguished by their shape parameter ξ and their support.

For the electricity spot prices, there are both upward and downward spikes. Let the maxima of the upward spikes be denoted by Z^{up}_1, . . . , Z^{up}_{m1} and the minima of the downward spikes by Z^{down}_1, . . . , Z^{down}_{m2}. Within the results section the precise method for obtaining the spike data is described. The derivation above assumes the spikes to be i.i.d.; this assumption is likely to be violated, since it is known from the electricity spike forecasting literature that spikes exhibit clustering [10] [11]. For now this assumption is maintained; at the end of this section the problem will be tackled.

There are several ways in which parameter estimates of the GEV distribution can be obtained. Here, the parameters will be estimated using maximum likelihood. As can be seen in Table 2, the support of the GEV distribution is not the same for all values of the parameters µ, σ and ξ. This is potentially problematic, since it violates the regularity conditions required for the asymptotic properties of the maximum likelihood estimator to be valid. This has been investigated by R. Smith [12], with the following conclusions:

• when ξ > −0.5, maximum likelihood estimators are regular, in the sense of having the usual asymptotic properties;
• when −1 < ξ < −0.5, maximum likelihood estimators are generally obtainable, but do not have the standard asymptotic properties;
• when ξ < −1, maximum likelihood estimators are unlikely to be obtainable.

The probability density function of the Generalized Extreme Value (GEV) distribution, h_{ξ,µ,σ}(x), is given by

h_{ξ,µ,σ}(x) = (1/σ) t_{ξ,µ,σ}(x)^{ξ+1} exp[−t_{ξ,µ,σ}(x)],    (9)

where t_{ξ,µ,σ}(x) is defined as

t_{ξ,µ,σ}(x) = [1 + ξ (x − µ)/σ]^{−1/ξ}   for ξ ≠ 0,
t_{ξ,µ,σ}(x) = exp[−(x − µ)/σ]            for ξ = 0,

and the same conditions apply as in Theorem 1.

Given the pdf of the GEV distribution, the log likelihood function l(µ, σ, ξ) is easily obtained. For ξ ≠ 0, l(µ, σ, ξ) becomes

l(µ, σ, ξ) = −m log σ − (1 + 1/ξ) Σ_{i=1}^{m} log[1 + ξ (z_i − µ)/σ] − Σ_{i=1}^{m} [1 + ξ (z_i − µ)/σ]^{−1/ξ},    (10)

provided that

1 + ξ (z_i − µ)/σ > 0,   for i = 1, . . . , m.    (11)

Maximizing the likelihood function as defined in Equation (10) yields the maximum likelihood estimates of the parameters µ, σ and ξ. In the case that the maximum likelihood estimate of ξ is close to zero, the log likelihood function has to be replaced by that of the Gumbel distribution, since the (1 + 1/ξ) term diverges as ξ → 0.
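In practice the GEV parameters can be fitted with an off-the-shelf optimizer. The sketch below uses scipy with synthetic spike data as a stand-in; note that scipy's genextreme parametrizes the shape as c = −ξ, so the sign has to be flipped to match the convention of Theorem 1.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the spike observations z_1, ..., z_m.
spikes = stats.genextreme.rvs(c=-0.2, loc=50.0, scale=10.0, size=500,
                              random_state=np.random.default_rng(1))

# Maximum likelihood fit; this maximizes the same likelihood as Equation (10).
c_hat, mu_hat, sigma_hat = stats.genextreme.fit(spikes)
xi_hat = -c_hat  # back to the xi convention used in the text

# Quantile plot of Equation (13): model quantiles against the ordered spikes.
m = len(spikes)
probs = np.arange(1, m + 1) / (m + 1)
model_q = stats.genextreme.ppf(probs, c_hat, loc=mu_hat, scale=sigma_hat)
empirical_q = np.sort(spikes)
```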


Let Θ be defined as the vector of parameters, i.e. Θ = (µ, σ, ξ). Then the maximum likelihood estimator of Θ, Θ̂_MLE, is asymptotically normally distributed:

√n (Θ̂_MLE − Θ) →_d N(0, I^{−1}),    (12)

where I is the Fisher information matrix with elements I_{i,j} defined as

I_{i,j} = E_X[ −∂² ln h_{ξ,µ,σ}(X) / ∂Θ_i ∂Θ_j ].

In order to assess the validity of using the GEV distribution to model the upward and downward spikes, the model fit can be compared to the data. The most straightforward way of doing this is to plot the histogram of the data and superimpose the fitted distribution on top of it. Inspection of these graphs is difficult, however, especially when it comes to assessing the quality of fit in the tails. Therefore it may be beneficial to also make a quantile plot. Given the independent observations of the upward and downward spikes from the generalized extreme value distribution, a quantile plot consists of the points

{ (Ĝ^{−1}(i/(m+1)), z_{(i)}),  i = 1, . . . , m },    (13)

where

Ĝ^{−1}(i/(m+1)) = µ̂ − (σ̂/ξ̂) [1 − (−log(i/(m+1)))^{−ξ̂}]

and z_{(i)} represents the ordered spike data such that z_{(1)} ≤ z_{(2)} ≤ · · · ≤ z_{(m)}.

The generalized Pareto distribution

Next to the family of GEV distributions, there exists another distribution which can be used to characterize the distribution of extreme events. For the derivation of the GEV distribution, the concept of block maxima was introduced in Equation (6). In this section the concept of threshold exceedance is used to derive the generalized Pareto distribution.

Let X_1, X_2, . . . , X_n be i.i.d. random variables. A realization x_i of this series is considered to be an extreme event when a certain threshold u has been exceeded. Let the distribution function of X_i be given by F. Then the conditional probability that X_i exceeds the threshold u by a value of y, given that it has exceeded u, is given by

Pr{X > u + y | X > u} = [1 − F(u + y)] / [1 − F(u)],   y > 0.

If for this same series M_n = max{X_1, . . . , X_n} converges to a GEV distribution, then for large enough u, the distribution of (X_i − u), conditional on X > u, is approximately

H(y) = 1 − (1 + ξ y / σ̄)^{−1/ξ},    (14)

defined on {y : y > 0 and 1 + ξ y / σ̄ > 0}, where

σ̄ = σ + ξ (u − µ).

In the limit as u goes to infinity, this approximation becomes exact. Further details and proofs are outside the scope of this report; for more details see [13]. The distribution H(y), as defined in Equation (14), is called the generalized Pareto distribution. There is a strong duality between the two distributions: when the block maxima of the series X_1, X_2, . . . , X_n have a GEV distribution, the threshold exceedances will be approximately generalized Pareto distributed, provided that the threshold is large enough. In particular, ξ is the same for both distributions. This duality indicates that the shape parameter ξ is dominant in determining the qualitative behaviour of the distributions.

To determine the correct threshold size, a trade-off has to be made between bias and variance. This is true both for the data on block maxima and for threshold exceedances. A larger block size or higher threshold leads to less bias, but it also generates fewer data points to estimate the distributions and therefore increases the variance. The goal is to make the threshold as low as possible while maintaining the validity of the approximations.

A possible technique to select an appropriate threshold is to look at a so-called mean residual life plot. Let Y follow a generalized Pareto distribution with distribution function (14). The expected value of Y is then given by

E(Y) = σ / (1 − ξ),   for ξ < 1.    (15)

Taking again i.i.d. random variables X_1, X_2, . . . , X_n and a threshold value u_0 such that the exceedances over this threshold are approximately GP distributed, one obtains that for u > u_0 the following relation holds:

E(X − u | X > u) = (σ_{u_0} + ξ u) / (1 − ξ),

where σ_{u_0} is the scale parameter corresponding to excesses of the threshold u_0. Thus the mean excess over the threshold u is expected to increase linearly with u, given that u > u_0. More specifically, define the following set of points:

{ (u, (1/n_u) Σ_{i=1}^{n_u} (x_{(i)} − u)) : u < x_max },    (16)

where x_{(1)}, . . . , x_{(n_u)} are the observations above the threshold and x_max is the largest observation present in the data set. The value of u above which the sample analogue of the expected mean excess, plotted against u, becomes linear is an appropriate choice for u_0. This plot is called a mean residual life plot.
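The set of points in Equation (16) is straightforward to compute. A minimal sketch, in which the exponential sample is a placeholder for the actual residual series:

```python
import numpy as np

def mean_residual_life(x, thresholds):
    """Sample mean excess over each threshold u: the points of Equation (16)."""
    x = np.asarray(x)
    return np.array([(x[x > u] - u).mean() for u in thresholds])

# Placeholder data; in the application x would be the spike/residual series.
x = np.random.default_rng(2).exponential(scale=5.0, size=10_000)
us = np.linspace(0.0, np.quantile(x, 0.99), 100)
mrl = mean_residual_life(x, us)
# Plotted against us, the curve should be approximately linear above a valid
# threshold u0 (for exponential data it is flat for every threshold).
```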

Analogous to the GEV case, the log likelihood of the generalized Pareto distribution is easily obtained and is given by

l(σ, ξ) = −k log σ − (1 + 1/ξ) Σ_{i=1}^{k} log(1 + ξ y_i / σ),    (17)

subject to the constraint that (1 + ξ y_i / σ) > 0 for i = 1, . . . , k. The maximum likelihood estimators are asymptotically normally distributed as in Equation (12), but now the probability density function of the GP distribution is used to determine the Fisher information matrix. For ξ ≠ 0, the quantile plot is obtained by plotting the pairs

{ (Ĥ^{−1}(i/(k+1)), z_{(i)}),  i = 1, . . . , k },    (18)

where

Ĥ^{−1}(y) = µ̂ − (σ̂/ξ̂) [1 − y^{−ξ̂}].

Extreme values clustering

In the previous sections it was assumed that the underlying stochastic process giving rise to the price spikes was i.i.d. It is however known from the literature that spikes in electricity spot prices exhibit clustering, and hence the independence assumption is likely to be violated. Vast amounts of literature have been written on this topic; see for instance [10] [11] [14] [15]. For a theoretical analysis of extremes in dependent series see also [13].

A more realistic assumption about the stochastic random variables X_1, . . . , X_n, instead of assuming independence, is to assume that the time series is stationary. This implies that the joint distribution of two elements of the series, X_i and X_j, only depends on the difference between i and j, not on the values of i or j themselves. The different elements of the series, however, no longer have to be independent of each other.

To see how the change from an i.i.d. time series to a stationary time series influences the above developed methods to estimate the parameters of the GEV and GP distributions, the following theorem is considered. For the derivation and details see [13].

Theorem 2. Define the series X_1, . . . , X_n and X*_1, . . . , X*_n as a stationary series and an i.i.d. process, respectively, where the marginal distributions of both processes are the same. Additionally let M_n = max{X_1, . . . , X_n} and M*_n = max{X*_1, . . . , X*_n}. Under suitable regularity conditions,

Pr{ (M*_n − b_n)/a_n ≤ z } → G_1(z)

as n → ∞ for normalizing sequences {a_n > 0} and {b_n}, where G_1 is a non-degenerate distribution function, if and only if

Pr{ (M_n − b_n)/a_n ≤ z } → G_2(z),

where G_2(z) = (G_1(z))^θ for some constant θ ∈ (0, 1].


This theorem states that the limiting distribution of a stationary and ergodic series that converges to a specific distribution is related to the limiting distribution of the corresponding independent series. The effect of the dependence structure between the different elements of the time series is fully captured by the parameter θ. This is a very interesting property, since if it is given that G_1(z) follows a GEV distribution with parameters (µ, σ, ξ), it follows from Equation (8) that G_2(z) = (G_1(z))^θ also follows a GEV distribution, with parameters (µ*, σ*, ξ*) related to (µ, σ, ξ) as follows:

µ* = µ − (σ/ξ)(1 − θ^ξ),
σ* = σ θ^ξ,
ξ* = ξ.

This is a very convenient result, since it states that the only property altered by the dependence structure in the process is the parametrization of the limiting distribution. The family of the limiting distribution remains the same for all degrees of clustering. Since the parameters of the GEV distribution have to be estimated anyway, this change in parametrization is not relevant for this application: what is needed are consistent estimates of the parameters of the distributions, and the theoretical parametrization behind them is not relevant. The only way in which the clustering of spikes negatively influences the results is that the approximations used to derive the limiting GEV and GP distributions become poorer.

The parameter θ is called the extremal index. An independent sequence has an extremal index of 1, while low values of θ indicate dependence within the time series. The reverse is however not true: an extremal index close to 1 does not necessarily imply the absence of a dependence structure. This is analogous to dependence versus correlation: there is a perfect relation between X and Y = |X|, but the correlation is nevertheless zero.

An estimate of θ within the time series of the spikes can be obtained from the return levels. To define the return levels, let X follow a GP distribution. The probability that X exceeds x, given that X exceeds u, is given by

Pr{X > x | X > u} = [1 + ξ (x − u)/σ]^{−1/ξ}.

This can be rewritten as

Pr{X > x} = ζ_u θ [1 + ξ (x − u)/σ]^{−1/ξ},

where ζ_u = Pr{X > u}. The threshold value of x, x_m, which on average is exceeded once every m observations, is obtained by solving

ζ_u θ [1 + ξ (x_m − u)/σ]^{−1/ξ} = 1/m.

Solving this equation gives

x_m = u + (σ/ξ) [ (m ζ_u θ)^ξ − 1 ],

where x_m is called the m-observation return level. The estimates of the parameters of the GP distribution can be obtained in the regular way using maximum likelihood as described above. In order to obtain estimates of ζ_u and θ, the number of clusters needs to be defined. Let the number of exceedances of the threshold u be n_u, let the total number of data points be n, and let the total number of clusters be n_c. Two threshold exceedances are considered to be in the same cluster if they are located within some minimum gap r of each other. Then parameter estimates for ζ_u and θ, ζ̂_u and θ̂, are obtained from

ζ̂_u = n_u / n   and   θ̂ = n_c / n_u.    (20)

If the number of clusters is almost equal to (respectively far less than) the number of threshold exceedances, θ̂ approaches one (respectively zero). This is consistent with the view that a θ of 0 indicates strong dependence and that a θ of 1 can indicate the absence of strong dependence (although this need not be the case).
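The estimators in Equation (20) reduce to counting exceedances and clusters. A minimal sketch under the gap-based cluster definition given above:

```python
import numpy as np

def extremal_index(x, u, r):
    """Estimate zeta_u = n_u/n and theta = n_c/n_u as in Equation (20).

    x : 1-D series, u : threshold, r : minimum gap (in observations)
        separating two clusters of threshold exceedances.
    """
    x = np.asarray(x)
    exceed_idx = np.flatnonzero(x > u)
    n_u = exceed_idx.size
    if n_u == 0:
        return 0.0, np.nan
    # A new cluster starts whenever the gap since the previous exceedance
    # is larger than r observations.
    n_c = 1 + int((np.diff(exceed_idx) > r).sum())
    return n_u / x.size, n_c / n_u

# Illustrative call: threshold 2.5, clusters separated by at least 24 hours.
zeta_hat, theta_hat = extremal_index(
    np.random.default_rng(3).normal(size=5000), u=2.5, r=24)
```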

Parametric reduction of the number of parameters

As will be shown in the empirical analysis, the electricity spot price as a 24×1 multivariate time series is likely to be stationary. The univariate series however is not, since the temporal dependence structure changes throughout the day: upward spikes are generally observed during the day and downward spikes more often during the night. To account for this asymmetry, two approaches are investigated further. The first, discussed in this subsection, parametrizes the coefficients of the GEV distribution to make them dependent on the time of day. The second approach uses a probit model to model the probability of spike occurrences.

To account for the non-stationarity of the arrival of an extreme value, let Z_i follow a GEV distribution with parameters µ(h), σ(h), ξ(h). To make the parameters of the distribution of Z_i dependent on the time of the day, they are parametrized as follows:

σ(h) = [1, cos(2π(h − 12)/24)] [β_0; β_1],    (21)
µ(h) = [1, cos(2π(h − 12)/24)] [β_2; β_3],    (22)
ξ(h) = [1, cos(2π(h − 12)/24)] [β_4; β_5].    (23)

This parametrization is chosen because σ(h), µ(h) and ξ(h) are then all maximal during the day at 12:00 and minimal at 00:00 or 24:00. For the upward spikes the values of ξ and µ are expected to be positive, while they should be negative for the downward spikes. Additionally, the value of σ is expected to be higher for the upward spikes than for the downward spikes, since the upward spikes are much larger in an absolute sense (inverse leverage). Denoting by β = [β_0, . . . , β_5] the complete vector of parameters, the likelihood is simply given by

L(β) = Π_{i=1}^{T} h_{ξ(h_i),µ(h_i),σ(h_i)}(z_i),    (24)

where h_i is the hour at which spike z_i occurred. For non-zero ξ one obtains that the log-likelihood is given by

l(β) = − Σ_{i=1}^{T} [ log σ(h_i) + (1 + 1/ξ(h_i)) log(1 + ξ(h_i) (z_i − µ(h_i))/σ(h_i)) + (1 + ξ(h_i) (z_i − µ(h_i))/σ(h_i))^{−1/ξ(h_i)} ].

Substituting the equations for σ(h), µ(h) and ξ(h) into l(µ(h), σ(h), ξ(h)), one can maximize this equation, yielding the maximum likelihood estimate of the vector β. Approximate standard errors and confidence intervals can be obtained from the asymptotic distribution, similar to how it was derived in Equation (12).
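Numerically, this comes down to minimizing the negative log-likelihood. A sketch with scipy, in which the 2π inside the cosine follows Equations (21)-(23), the data and starting values are illustrative, and the ξ → 0 case (which would require the Gumbel limit) is not handled:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(beta, z, h):
    """Negative log likelihood of the hour-parametrized GEV model."""
    c = np.cos(2 * np.pi * (h - 12) / 24)
    sigma = beta[0] + beta[1] * c   # sigma(h), Equation (21)
    mu    = beta[2] + beta[3] * c   # mu(h),    Equation (22)
    xi    = beta[4] + beta[5] * c   # xi(h),    Equation (23)
    t = 1 + xi * (z - mu) / sigma
    if np.any(sigma <= 0) or np.any(t <= 0):
        return np.inf               # parameter vector outside the GEV support
    return np.sum(np.log(sigma) + (1 + 1 / xi) * np.log(t) + t ** (-1 / xi))

# Illustrative usage on spike sizes z observed at hours h (0-23):
rng = np.random.default_rng(4)
h = rng.integers(0, 24, size=300)
z = rng.gumbel(loc=30.0, scale=8.0, size=300)
beta0 = np.array([8.0, 1.0, 30.0, 5.0, 0.1, 0.01])
res = minimize(neg_log_lik, beta0, args=(z, h), method="Nelder-Mead")
```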

To check whether the parametrized model is a good representation of the data, again a QQ plot can be made. When the data are modelled using the non-stationary approach, however, some modification is needed. Let Z_i follow a GEV distribution with the maximum likelihood estimates as parameters, i.e.

Z_i ~ GEV(µ̂(h), σ̂(h), ξ̂(h)).

Now let the standardized variables z̃_i be defined as

z̃_i = (1/ξ̂(h)) log[ 1 + ξ̂(h) (Z_i − µ̂(h))/σ̂(h) ].

If the chosen parametrization is correct, these standardized variables should follow a standard Gumbel distribution, with distribution function given by

Pr(Z̃_i ≤ z) = exp(−e^{−z}),   z ∈ R.

Let the ordered realisations be given by z̃_{(1)}, . . . , z̃_{(m)}. Then the quantile plot of the standardized data against the theoretical quantiles of the standard Gumbel distribution consists of the pairs

{ (z̃_{(i)}, −log(−log(i/(m+1)))),  i = 1, . . . , m }.    (25)

If the parametrization is correct these points should lie on a straight line.

Exogenous variables determining the probability of extreme values

As will become apparent in the results section, similar conditions do not always lead to the arrival of extreme price events. A natural way to incorporate this phenomenon is to model the arrival of a price spike as a stochastic binary variable S. Let the vector of regressors assumed to influence the outcome S be given by X_spike. The following probit model can be written down:

Pr(S = 1 | X_spike) = Φ(X'_spike β_spike),

where Φ is the CDF of the standard normal distribution.

Logical candidates for the explanatory variables for the arrival of a price spike are the residual load and the temperature. To account for the clustering of spikes, several approaches exist in the literature; see for instance [10] [11] [15]. Here the clustering is taken into account by adding interaction terms between the residual load, the temperature and the total number of spike occurrences in the last two weeks. Further discussion of this is postponed until the results section.

A consistent estimate of the vector β_spike is obtained by maximum likelihood. Let the data set {s_i, x_{spike,i}}_{i=1}^{n} contain the data on the spike occurrences and the explanatory variables. The log-likelihood is then given by

l(β_spike) = Σ_{i=1}^{n} [ s_i ln Φ(x'_{spike,i} β_spike) + (1 − s_i) ln(1 − Φ(x'_{spike,i} β_spike)) ].    (26)

Maximizing the likelihood function with respect to β_spike gives the maximum likelihood estimator β̂_spike, which is consistent and asymptotically normally distributed:

√n (β̂_spike − β_spike) →_d N(0, Σ^{−1}),

where

Σ = E[ φ²(X'_spike β_spike) X_spike X'_spike / ( Φ(X'_spike β_spike) (1 − Φ(X'_spike β_spike)) ) ],

and φ(.) is the probability density function of the standard normal distribution. A consistent estimator of Σ, Σ̂, is given by

Σ̂ = (1/n) Σ_{i=1}^{n} φ²(x'_{spike,i} β̂_spike) x_{spike,i} x'_{spike,i} / ( Φ(x'_{spike,i} β̂_spike) (1 − Φ(x'_{spike,i} β̂_spike)) ).
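A sketch of how this probit specification could be estimated with statsmodels; all series below are synthetic placeholders, and the interaction design follows the description above rather than the exact regressor set used later in the results:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 2000
resid_load    = rng.normal(size=n)        # placeholder residual load
temperature   = rng.normal(size=n)        # placeholder temperature
recent_spikes = rng.poisson(2.0, size=n)  # placeholder two-week spike count
s = (rng.random(n) < 0.05).astype(int)    # placeholder spike indicator

# Main effects plus their interactions with the recent spike count.
X = sm.add_constant(np.column_stack([
    resid_load, temperature, recent_spikes,
    resid_load * recent_spikes, temperature * recent_spikes]))

res = sm.Probit(s, X).fit(disp=False)
p_spike = res.predict(X)  # fitted Phi(x' beta_hat): the spike probabilities
```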

II.2. Literature overview short term electricity price forecasting

Accurate forecasts of the electricity spot price are of vital importance from a risk management perspective. Although the motivation in this report is totally different from the motivation in current research, a comparison with this literature is still considered to be useful. For a comprehensive overview, see [1] and chapter 4 of [16].

There are different classifications for all the modelling approaches present in literature. The first class consists of game theoretic and more fundamental production-cost type of models. The second class uses artificial intelligence, non-linear and neural type of techniques to model electricity prices. Since these types of models fall outside the scope of this report they will be omitted. Lastly a popular class of models consists of so called Markov regime-switching models. Here the idea is to model the detrended electricity spot price by different states with different stochastic properties. These models however are used for derivative pricing and generally do not forecast hourly prices accurately. Therefore also these models are omitted here. Instead, this subsection will focus on the more econometric type of models.

Jump diffusion models

The first class of models generally used in the electricity spot price forecasting literature are so-called jump-diffusion models. Generally, the electricity spot price is split up into two components, a deterministic part and a stochastic part:

S_t = µ_t + X_t,    (27)

where S_t represents the electricity spot price, µ_t the deterministic part and X_t the stochastic process. The stochastic part is then modelled using a differential equation, which generally has the following form:

dX_t = m(X_t, t) dt + σ(X_t, t) dW_t + dq(X_t, t),    (28)

where W_t is a standard Brownian motion and q(X_t, t) represents a pure jump process. The geometric Brownian motion (gbm) is not considered able to capture the stylized facts of electricity spot prices, since it would consider the price after a price increase to be the new price level and would not return to the normal level. Instead, for the drift term m(X_t, t) generally the following parametrization is taken: m(X_t, t) = (α − β X_t). In that case Equation (28) is called a mean-reverting jump diffusion (mrjd) model. Usually the volatility term σ(X_t, t) is taken to be constant. To account for heteroskedasticity, other parametrizations are also used, such as σ(X_t, t) = σ |X_t|^γ.

The process q(X_t, t) is a pure jump process which can be written as J(µ_j, σ_j) dΠ(λ). Here the jump size J(µ_j, σ_j) is normally distributed with mean µ_j and standard deviation σ_j. The term dΠ(λ) represents a homogeneous Poisson process with a constant intensity λ. Assuming λ to be constant is suggested to be a bad assumption, and a (deterministic) periodic intensity function λ(t) would be more reasonable [17]. However, the scarcity of jumps on the daily scale can make the identification of an adequate periodic function problematic [1].

The first paper to incorporate the mean-reverting jump diffusion model was a paper by Cartea et al. in 2005 [18]. There, they modelled the residuals of the logarithm of the electricity spot price using the mean-reverting jump diffusion model as defined in Equation (28).

In the model of Roncoroni et al., a deterministic seasonal component is subtracted from the logarithm of the electricity spot price, and the detrended logarithmic residuals, ln X(t), are modelled as

d ln X(t) = −θ_1 ln X(t) dt + σ dW(t) + h(ln X(t^−)) dJ(t),

where again W(t) is a standard Brownian motion, θ_1 and σ are positive constants, J(t) is now a time-inhomogeneous compound Poisson process and h(.) is an indicator function which can take two values, ±1, indicating the direction of the jump:

h(ln X(t)) = +1 if ln X(t) < I(t),   h(ln X(t)) = −1 if ln X(t) ≥ I(t),

where I(t) is a threshold on the current spot price.

For the spike intensity i(t) they propose a parametric model and assume that i(t) is represented by the following deterministic function:

i(t) = θ_2 × s(t),    (29)

where θ_2 is the expected intensity of price spikes and the function s(t) is given by

s(t) = [ 2 / (1 + |sin[π(t − τ)/k]|) − 1 ]^d,

where d, k and τ are constants. For the spike size p(x; θ_3, φ) they propose a truncated exponential distribution with density

p(x; θ_3, φ) = θ_3 exp(−θ_3 x) / (1 − exp(−θ_3 φ)),   0 ≤ x ≤ φ,

where θ_3 governs the average jump size and φ is the maximum possible jump size.

In both models, of Roncoroni et al. and of Cartea et al., there is a single parameter θ_1 that accounts for the mean reversion. The rate of mean reversion for the base signal and for the spikes, however, is totally different: a spike requires a much faster rate of mean reversion than the base signal. To account for this feature, Benth et al. developed a so-called factor model [21] [22], which was inspired by a more general paper by Barndorff-Nielsen et al. on non-Gaussian Ornstein-Uhlenbeck based models [23]. This methodology has also been used by Klüppelberg et al. [14]. Again the same approach is taken to divide the model into a deterministic and a stochastic part. However, the stochastic part X(t) is now represented by a weighted sum of multiple independent non-Gaussian Ornstein-Uhlenbeck processes Y_i(t). So X(t) is defined as

X(t) = Σ_{i=1}^{n} w_i Y_i(t),    (30)

where the w_i are weights and Y_i(t) can be written as

dY_i(t) = −λ_i Y_i(t) dt + σ_i(t) dL_i(t),   Y_i(0) = y_i,   i = 1, . . . , n,    (31)

where the λ_i are non-negative constants, the σ_i(t) are positive bounded functions and the L_i(t) are assumed to be independent increasing càdlàg pure jump processes. They can be represented in terms of their Poisson random measures N_i(dt, dz):

L_i(t) = ∫_0^t ∫_0^∞ z N_i(ds, dz).    (32)

They assume that L_i(t) is integrable and that N_i(dt, dz) has deterministic predictable compensators v_i(dt, dz). These processes are called Sato processes; more details on these can be found in [25]. Taking into account the deterministic predictable compensator, the compensated Poisson random measure Ñ_i(ds, dz) is defined as

Ñ_i(ds, dz) = N_i(ds, dz) − v_i(ds, dz),

where v_i(ds, dz) = ρ_i(t) dt v_i(dz) and ρ_i(t) is a deterministic function controlling the time varying jump intensity. The compensated jump process L̃_i then becomes

L̃_i(t) = ∫_0^t ∫_0^∞ z Ñ_i(ds, dz).

The great advantage of the methodology proposed by Benth et al. over the models developed by Roncoroni et al. and Cartea et al. is that, due to the additive structure defined in Equation (30), the model is much more flexible in accounting for the different rates of mean reversion. Specifically, the autocorrelation function at the kth lag of X(t) is now given by

ρ(k) = w_1 e^{−kλ_1} + w_2 e^{−kλ_2} + · · · + w_n e^{−kλ_n}.    (33)

Discrete time jump diffusion models

The mean reverting jump diffusion models are represented by Equation (28). In order to estimate the parameters of this equation, it needs to be taken into account that the electricity spot prices form a discrete time series. The discretized version of Equation (28) becomes

X_t = α + φ X_{t-1} + σ dW_t                        with probability 1 − λ,
X_t = α + φ X_{t-1} + σ dW_t + µ_J + σ_J J_t        with probability λ,    (34)

where dW_t and J_t are both standard normally distributed. The conditional density function of X_t given X_{t-1} can be written as

f(X_t | X_{t-1}) = λ N_1(X_t | X_{t-1}) + (1 − λ) N_2(X_t | X_{t-1}),

where N_1(X_t | X_{t-1}) and N_2(X_t | X_{t-1}) are given by

N_1(X_t | X_{t-1}) = (2π(σ² + σ_J²))^{-1/2} exp( −(X_t − α − φ X_{t-1} − µ_J)² / (2(σ² + σ_J²)) ),
N_2(X_t | X_{t-1}) = (2πσ²)^{-1/2} exp( −(X_t − α − φ X_{t-1})² / (2σ²) ).

Consistent estimates of the parameters of the distribution f(X_t | X_{t-1}) can be obtained by maximum likelihood. Let θ = [α, φ, µ_J, σ², σ_J², λ]; then the maximum likelihood estimator of θ, θ̂, is the vector which maximizes the log likelihood function of f(X_t | X_{t-1}), which can be written as

l(θ) = Σ_{t=1}^{T} log( f(X_t | X_{t-1}) ).
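The mixture likelihood above can be maximized numerically by minimizing the negative log likelihood. A sketch with scipy, where the simulated data and starting values are illustrative only:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_log_lik(theta, x):
    """Negative log likelihood of the discretized MRJD model, Equation (34)."""
    alpha, phi, mu_J, sigma2, sigma2_J, lam = theta
    if sigma2 <= 0 or sigma2_J <= 0 or not 0 < lam < 1:
        return np.inf
    mean_base = alpha + phi * x[:-1]
    # Mixture of the jump (N_1) and no-jump (N_2) normal densities.
    f = (lam * norm.pdf(x[1:], mean_base + mu_J, np.sqrt(sigma2 + sigma2_J))
         + (1 - lam) * norm.pdf(x[1:], mean_base, np.sqrt(sigma2)))
    return -np.sum(np.log(f + 1e-300))  # guard against log(0)

# Illustrative data: an AR(1) process with occasional positive jumps.
rng = np.random.default_rng(6)
x = np.zeros(3000)
for t in range(1, x.size):
    jump = rng.normal(8.0, 4.0) if rng.random() < 0.03 else 0.0
    x[t] = 0.9 * x[t - 1] + rng.normal(0.0, 1.0) + jump

theta0 = np.array([0.0, 0.8, 5.0, 1.0, 10.0, 0.05])
res = minimize(neg_log_lik, theta0, args=(x,), method="Nelder-Mead")
```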


Armax models

Next to the jump diffusion models, the more standard econometric regression models are also used extensively in electricity price forecasting. Employed models used in literature comprise time varying regression, threshold autoregressive, GARCH, ARMAX and different combinations of these models, just to mention a few.

Since these models tend to perform worse than the more advanced models, most of the discussion on them is omitted; for a more in depth discussion see also [1]. In this report a univariate ARMAX model will be estimated to see whether modelling the electricity spot price as a multivariate time series indeed improves the forecasting power.

Let the electricity spot price be given by y_t and the exogenous variables by the vector x_t. Then the general autoregressive moving average model with exogenous variables and lag orders p and q can be written as

φ(B) y_t = θ(B) e_t + x'_t β,    (35)

where B is the backward shift operator (B^h y_t = y_{t-h}), φ(B) = 1 − φ_1 B − · · · − φ_p B^p and θ(B) = 1 + θ_1 B + · · · + θ_q B^q.
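An ARMAX(p, q) benchmark of the form of Equation (35) can be estimated with statsmodels. A sketch on synthetic data; the orders and regressors are illustrative, not the specification estimated later in the report:

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(7)
n = 500
exog = rng.normal(size=(n, 2))  # placeholder exogenous variables x_t
y = 40 + exog @ np.array([1.5, -0.5]) + rng.normal(size=n)  # placeholder prices

# ARMAX(2, 1) = ARIMA(2, 0, 1) with exogenous regressors and a constant.
res = SARIMAX(y, exog=exog, order=(2, 0, 1), trend="c").fit(disp=False)

# Short-horizon forecast; future exogenous values must be supplied (here the
# last observed rows are simply reused for illustration).
forecast = res.forecast(steps=4, exog=exog[-4:])
```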

III. Empirical analysis

Now that all the theoretical pieces are in place, the results will be discussed. The structure of this section is as follows. First a validation of the chosen model will be given. Here it will become apparent that the electricity spot price cannot be seen as a univariate process but has to be modelled as a 24×1 vector containing the electricity spot price for every hour. Secondly, the modelling results for the price spikes will be discussed. For the estimation, 23016 out of 24096 data points are used; the last 1080 data points are used for forecasting. Values of the electricity spot price above 200 Euro/MWh have been replaced by 200 Euro/MWh, because these values are due to upward price spikes. These spikes will be modelled separately using the probit model. After the entire model is estimated, an armax and a mean reverting jump diffusion model are also estimated.

III.1. Model validation

As described in the theory section, the electricity spot price (without its price peaks) will be modelled using a periodic autoregressive model as given by Equation (1). There it was postulated that the electricity spot price cannot be seen as a univariate time series, but rather as a vector process of all 24 hourly electricity spot prices together. The theoretical justification for this was that, in order to properly take into account the merit order model, different coefficients for the explanatory variables are needed for different hours. Further proof of this will be given when the forecasts of the univariate model are compared to those of the multivariate model.

First the stationarity of the data is discussed. This is done by looking at the residuals of Equation (7), which is repeated here for convenience:

û_{h,t} = y_{h,t} − x'_{h,t} β̂_h.

To test for a unit root, an augmented Dickey-Fuller (ADF) test is executed. The procedure for the ADF test is to estimate the following regression model:

Δû_t = α + β t + γ û_{t-1} + δ_1 Δû_{t-1} + · · · + δ_{p-1} Δû_{t-p+1} + e_t,

where û_t is the univariate time series of the residuals. The number of augmented terms (number of lags) is based on the first augmented term which is not significant at the 5% level. The coefficient γ is statistically significantly different from zero, and hence the null hypothesis of a unit root is rejected.

                      α          β        γ         δ_1        δ_2
Parameter estimate    0.0071119  -0.2508  -0.18689  -0.056664  -0.053327
Uncorrected t-ratios  0.062585   -35.41   -22.2419  -6.8232    -6.5764
Significance level    -          <0.01    <0.01     <0.01      <0.01

Table 3: Estimation results for the augmented Dickey-Fuller test, together with the (uncorrected) t-ratios of the coefficients and the levels at which the coefficients are statistically significant, using Dickey-Fuller critical values.
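The same test is available in statsmodels. In the sketch below the residual series is a placeholder; regression="ct" reproduces the constant-plus-trend specification above, and autolag="t-stat" mirrors the lag selection rule based on the significance of the last augmented term:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# Placeholder for the univariate residual series u_hat of Equation (7).
u_hat = np.random.default_rng(8).normal(size=5000)

adf_stat, p_value, used_lags, nobs, crit_values, _ = adfuller(
    u_hat, regression="ct", autolag="t-stat")
# Reject the unit root null when adf_stat is below the critical value.
print(adf_stat, crit_values["5%"])
```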


For a stationary time series, properties such as the mean, variance and autocorrelation should all be constant over time. Clear evidence that this is not the case here can be obtained from the cross correlation structure of the residuals, which is depicted in Figure 5.

Figure 5: Correlation structure between the residual prices of Equation (7) at different times of the day. The colour indicates how strong the correlation is, as shown by the colour bar on the right side.

As can be seen from this figure the correlation structures still contain a lot of structure, which is very exceptional for spot price data. Normally in financial data the intra-period correlation patterns are such that the correlation only depends on the length between two periods of time, not on the time itself. This would be visible in the correlation matrix as that it is symmetric on both diagonals. This feature is a prerequisite for stationarity.

(28)

If the residual in the electricity price was high at 04:00, then it will typically still be high at 05:00, but not necessarily at 09:00. This is nicely illustrated by a principal component analysis (PCA), shown in Figure 6. The first 3 or 5 principal components already account for 74.2% or 84.6% of the variation in the residual prices, respectively. These principal components are very useful when making forecasts more than one day ahead; a possible application would be the scheduling of hydro power storage. They allow for a parsimonious model in which most of the variance can still be explained. However, PCA does not fully exploit the rich intra-day correlation structure present in the data for making forecasts a few hours ahead. Therefore PCA is not useful for the short-range forecasting done in this report, and further discussion is omitted.

Figure 6: Left figure: the principal components with the highest (blue), second highest (orange) and third highest (yellow) eigenvalue as a function of the hour of the day. Right figure: the percentage of variance explained by the different principal components.
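For illustration, the explained-variance figures above could be reproduced along the following lines; `U` is a simulated stand-in for the (days × 24) matrix of hourly residuals, so the printed percentages will not match the 74.2% and 84.6% reported for the real data.

```python
import numpy as np
from sklearn.decomposition import PCA

# Simulated stand-in: one shared daily factor plus hour-specific noise.
rng = np.random.default_rng(0)
common = rng.standard_normal((959, 1))             # 23016 / 24 = 959 days
U = common + 0.5 * rng.standard_normal((959, 24))  # days x hours

pca = PCA()
pca.fit(U)

cumulative = np.cumsum(pca.explained_variance_ratio_)
print(f"first 3 components: {cumulative[2]:.1%} of the variance")
print(f"first 5 components: {cumulative[4]:.1%} of the variance")
# pca.components_[0] holds the loading pattern of the first principal
# component over the 24 hours (cf. the left panel of Figure 6).
```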

Next to the rich and interesting autocorrelation structure, it is also apparent that the residuals are far from normal. This is clearly indicated by the sample excess kurtosis of the residuals as a function of the hour of the day, depicted in Figure 7. Note that the vertical axis has a logarithmic scale and that an excess kurtosis of more than 500 has been observed. This is almost entirely due to the large price spikes and will be discussed in the section on upward and downward spikes.

III.2. The periodic autoregressive model

Now that the multivariate version of the OLS has been discussed and the need for a periodic autoregressive model has been established, it is time to add the lagged prices to the explanatory variables and to estimate the full PAR model.


Figure 7: Sample excess kurtosis of the residuals of the regression of Equation (7) as a function of the time of day (hours).

To select the appropriate lag length, a model selection criterion is preferred. Adding more parameters to the PAR model increases the likelihood, but doing so may result in overfitting. The Schwarz criterion solves this problem by introducing a penalty term for the number of parameters in the model. A similar criterion is the Akaike criterion. The Akaike criterion (AIC) and the Schwarz criterion (SC) are given by

\[ SC(l) = -2\ln(\hat{l}(\hat{A})) + k\ln(L), \qquad AIC(l) = -2\ln(\hat{l}(\hat{A})) + 2k, \]

where $l$ is the lag length, $k$ is the number of parameters in the model, $L = 23016$ is the number of fitted data points and $\hat{l}(\hat{A})$ is the maximized value of the log likelihood, evaluated at the maximum likelihood estimates. Assuming that the errors of our PAR model are i.i.d. normally distributed, this can be rewritten as

\[ SC(l) = L\ln(RSS/L) + k\ln(L), \qquad AIC(l) = L\ln(RSS/L) + 2k, \]

where $RSS$ stands for the residual sum of squares: $RSS = \sum_{i=1}^{n}(Y_i - X_i\hat{A})^2$.
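Under the i.i.d. normal assumption the two criteria thus reduce to simple functions of the residual sum of squares, as the following self-contained sketch shows; the RSS value and parameter count are made up for illustration.

```python
import numpy as np

def information_criteria(rss: float, k: int, L: int) -> tuple[float, float]:
    """Schwarz and Akaike criteria under i.i.d. normal errors (see above)."""
    sc = L * np.log(rss / L) + k * np.log(L)
    aic = L * np.log(rss / L) + 2 * k
    return sc, aic

# Hypothetical values: L = 23016 fitted points, an RSS of 2.0e6 and an
# invented parameter count; in practice both would come from the PAR fit
# at each candidate lag length, and the lag minimizing SC would be kept.
sc, aic = information_criteria(rss=2.0e6, k=500, L=23016)
print(f"SC = {sc:.1f}, AIC = {aic:.1f}")
```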


Figure 8: The value of the Schwarz criterion (blue line) and the Akaike criterion (orange line) as a function of the lag length.

The PAR model can be estimated both for the electricity spot price and for the logarithm of the electricity spot price. Taking logarithms of the electricity spot price cannot be done without modification, because during a downward price spike the price usually drops to 0 (due to regulations, prices below zero were not allowed; this has, however, changed after 2009). Since most of the traditional power generators in Germany are inflexible, negative electricity prices make sense for short periods of time if the cost of selling at negative prices is lower than the start-up costs of shutting down and restarting. In more recent data sets, negative prices have indeed been observed. This is a very unusual characteristic for a commodity and can be explained by the fact that electricity is not storable. This is a clear reason why electricity spot prices should be modelled using absolute prices rather than the logarithm of the electricity spot price. Since taking the logarithm of zero is not possible, all zero values have been replaced by a value of 1 Euro/MWh. The histograms of the residuals of the PAR model, with both the absolute value and the return as dependent variables, are depicted in Figure 9. A plot of the histogram of the residuals at 19:00, together with a fitted normal distribution and a fitted t-distribution, is shown in Figure 10. Here it becomes evident that the residuals of the PAR model are not normally distributed and have fatter tails. As can be seen in the figure, a t-distribution provides a much better fit to the data.


Figure 9: Histogram of the residuals of the PAR model with both the returns (left histogram) and the absolute values (right histogram) as the dependent variables. Both histograms display the residuals of the 24 hours separately, where dark blue represents the first hour and light yellow represents the 24th hour. The bin sizes are 0.1 (left histogram) and 5 (right histogram), respectively.

Figure 10: Histogram of the residuals at 19:00 together with a fitted normal distribution (red curve) and a fitted t-distribution (green curve).

This results in an overestimate of the coefficient of the impact of the residual load in the low-demand regions and an underestimate of this impact in the high-demand regions.


Figure 11: Least squares estimates of the coefficients (blue line) for the residual load as a function of the time of the day (hours). The 95% confidence interval is indicated by the red dotted lines.

Figure 12: Kendall's tau (left figure) and Spearman's rho (right figure) correlation matrices between the residuals at different times of the day. The colour indicates the strength of the correlation, as shown by the colour bar on the right side.
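Such rank-correlation matrices could be computed directly with pandas, as sketched below on simulated stand-in residuals.

```python
import numpy as np
import pandas as pd

# Simulated stand-in for the (days x 24) matrix of hourly residuals.
rng = np.random.default_rng(0)
U = rng.standard_normal((959, 24))

df = pd.DataFrame(U, columns=[f"{h:02d}:00" for h in range(24)])
kendall_tau = df.corr(method="kendall")    # cf. the left panel of Figure 12
spearman_rho = df.corr(method="spearman")  # cf. the right panel of Figure 12
```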

In Figure 13, the residuals of the OLS model (i.e. without the lagged explanatory variables) are depicted in the left plots and the residuals of the PAR model in the right plots. As was also deduced from the correlation matrix, lagged residuals contain a lot of valuable information as long as the lag length is not too long. The residuals at 13:00 contain explanatory power for what the price at 14:00 is going to be, but the residuals at 03:00 contain no forecasting power for the electricity spot price at 10:00.


Figure 13: Top two figures: scatter plots between the residuals at 13:00 and 14:00, for the residuals of the regular OLS model without the lagged variables as explanatory variables (top left) and with the lagged variables as explanatory variables (top right). Bottom two figures: scatter plots between the residuals at 03:00 and 10:00, again without (bottom left) and with (bottom right) the lagged variables as explanatory variables.

The autocorrelation functions of the squared residuals (Figure 14) display only some volatility clustering in the long term. This is a somewhat surprising result, since volatility clustering has been observed elsewhere in the literature [24]. The focus of this report is on short-term electricity price forecasting, and hence volatility clustering will not be modelled.

III.3. Extreme upward and downward prices

Now that the baseline and the daily seasonality have been covered, it is time to discuss the strong outliers (price spikes) present in the data. In this section, the characteristics of the upward and downward spikes are discussed, such as the time of occurrence, the clustering of spikes and, of course, their distribution.

Upward spikes


Figure 14: The autocorrelation function as a function of lag length for the squared residuals of the entire time series and of the individual time series at 04:00, 09:00, 14:00, 19:00 and 23:00. The blue lines indicate the 95% confidence interval.

The choice of the threshold is a trade-off: a threshold which is too low introduces bias, while a threshold which is too high leaves few exceedances and increases the variance. Therefore, in this case a threshold level of 75 Euro/MWh is chosen.

Figure 15: Residuals of the OLS model without the lagged variables as explanatory variables, as a function of time. The coloured lines represent one (light blue), two (green) and three (red) times the sample standard deviation.


Figure 16: Left figure: mean residual life plot for the upward spikes (blue curve), together with two dotted curves lying one standard deviation above (orange dotted) and below (yellow dotted). Right figure: zoomed-in section of the left figure.
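A mean residual life plot is based on the empirical mean excess function e(u) = E[X − u | X > u]. A self-contained sketch on simulated heavy-tailed data is given below; the one-standard-deviation band is one plausible reading of the dotted curves in Figure 16, and all data here is a stand-in.

```python
import numpy as np

def mean_residual_life(x: np.ndarray, thresholds: np.ndarray):
    """Empirical mean excess e(u) and the standard deviation of the excesses."""
    means, stds = [], []
    for u in thresholds:
        excess = x[x > u] - u
        means.append(excess.mean())
        stds.append(excess.std(ddof=1))
    return np.array(means), np.array(stds)

# Simulated heavy-tailed stand-in for the spike residuals.
rng = np.random.default_rng(0)
residuals = rng.pareto(2.0, 20000) * 30.0

u_grid = np.linspace(0.0, 150.0, 60)
mrl, band = mean_residual_life(residuals, u_grid)
# Approximate linearity of e(u) above some threshold u supports modelling
# the excesses beyond u with a generalized Pareto distribution.
```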

The maximum likelihood estimates of the parameters of the generalized extreme value distribution can be found in Table 4. As expected, the estimates indicate the strongly skewed distribution of the upward spikes, as reflected by the large value of $\hat{\xi}$. The histogram of the peak-over-threshold residuals, together with the estimated GP distribution, is depicted in Figure 18.

Figure 17: Histogram of the upward spikes (blue bars) together with a fit of the generalized extreme value distribution (red curve).

                          ξ                   σ                    µ
Parameter estimate        0.8315              41.0686              105.7254
95% confidence interval   (0.5468, 1.1161)    (31.2872, 53.9080)   (96.0560, 115.3948)

Table 4: Maximum likelihood estimates of the parameters of the generalized extreme value distribution of the upward spikes.


Figure 18: Histogram of the upward spikes (peaks over threshold) together with a fit of the generalized Pareto distribution (red curve). Here the parameter estimates $\hat{\xi} = 0.58$ and $\hat{\sigma} = 62.4$ of the GP distribution are used.
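The peaks-over-threshold fit itself could be carried out with scipy, as in the sketch below on simulated excesses. Note that scipy's genpareto shape parameter c follows the same sign convention as ξ here, and the location is fixed at zero because excesses over the threshold are fitted.

```python
import numpy as np
from scipy import stats

U_THRESHOLD = 75.0  # Euro/MWh, the threshold chosen above

# Simulated stand-in: draw exceedances from a GP distribution with the
# reported parameters, so the fit should roughly recover xi = 0.58.
rng = np.random.default_rng(0)
excesses = stats.genpareto.rvs(0.58, scale=62.4, size=5000, random_state=rng)

c_hat, loc_hat, scale_hat = stats.genpareto.fit(excesses, floc=0)
print(f"xi-hat = {c_hat:.2f}, sigma-hat = {scale_hat:.1f}")
```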

Figure 19: Maximum likelihood estimates (blue curves) of ξ (left figure) and σ (right figure) of the generalized Pareto distribution of the upward spikes, as a function of the threshold size. The confidence intervals are indicated by the yellow and orange curves.


Figure 20: QQ plot of the standardized variables of the upward spikes against the corresponding theoretical Gumbel distribution.

Figure 21: QQ plot of the upward spikes against the corresponding theoretical generalized Pareto distribution.

Downward spikes

From the same data set used for the upward spikes, the downward spikes are obtained. It should be noted that the downward spikes are special due to the market regulations: during the period between January 1st 2006 and September 30th 2009, negative spot prices were not possible. Thus, large downward spikes almost always resulted in an electricity spot price which was zero or close to zero. To differentiate between the regular residuals and these downward spikes, only prices between 0 Euro/MWh and 3 Euro/MWh are considered.
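As a minimal sketch, this selection rule amounts to a simple mask on the price series; the data here is again a simulated stand-in.

```python
import numpy as np

# Simulated stand-in for the hourly spot-price series.
rng = np.random.default_rng(0)
prices = np.abs(rng.normal(40.0, 30.0, 23016))

# Downward spikes: prices that collapsed to (close to) zero.
downward_spikes = prices[(prices >= 0.0) & (prices <= 3.0)]
print(f"{downward_spikes.size} observations classified as downward spikes")
```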
