
Master Thesis Econometrics
Specialization: Financial Econometrics

Forecasting the yield curve using resampled residuals in the Nelson-Siegel framework

Author: Joppe J. Kiers (10680020), University of Amsterdam
Supervisor: Prof. dr. Peter H. Boswijk
Second marker: Dr. Simon A. Broda

Contents

1 Introduction

2 Literature Overview
2.1 The Nelson-Siegel model
2.1.1 Non-parametric models
2.1.2 Macroeconomic factors

3 Theoretical framework
3.1 The yield curve
3.2 The Nelson-Siegel class
3.3 Modifications
3.4 Time-variation: Diebold-Li

4 Models and Techniques
4.1 Forecast roadmap
4.1.1 Step 1: estimate λ's and β's
4.1.2 Step 2a: Fit AR(1)-processes on the β parameters
4.1.3 Step 2b: AR(1)-residuals
4.1.4 Step 3: Simulations on future β parameters
4.1.5 Step 4: Fit yield curve and make inference
4.2 Motivation of chosen procedures
4.3 Evaluation criteria

5 Data
5.1 Jungbacker dataset

6 In-sample fit
6.1 Fit of yields
6.2 Nelson-Siegel parameters
6.3 Autoregressive model fit
6.3.1 Non-normality
6.3.2 Correlation

7 Forecast
7.1 Models under consideration
7.2 RMSFE and Standard deviation
7.3 Forecast procedure
7.4 Forecast horizon
7.5 Window selection
7.6 Maturities
7.7 Model
7.8 Significance
7.9 Rejection Frequency

8 Conclusion

List of Figures

3.1 Factor loadings
3.2 Optimal lambda
3.3 Sensitivity on all four parameters
5.1 Jungbacker dataset
5.2 End-month yields for three different maturities
5.3 Differently shaped yield curves
6.1 Evolution of the β-parameters over time

List of Tables

6.1 Statistics of error terms in AR(1)-processes
7.1 Forecast options
7.2 Total ranking based on DS-scoring
7.3 Ranking sensitive on forecast horizon
7.4 DS scoring for different forecast windows
7.5 Ranking for different maturities
7.6 Total ranking based on DS-scoring
1 Summary Statistics for Jungbacker dataset
2 Goodness of fit of both models
3 Descriptive statistics for β-parameters
4 Statistics for AR(1)-processes
5 Statistics for Diebold-Li model and Adjusted Nelson Siegel Svensson model
6 Average standard deviations for 1-month ahead forecast
7 Average standard deviations for 3-month ahead forecast
8 Average standard deviations for 6-month ahead forecast
9 Average standard deviations for 12-month ahead forecast

Chapter 1

Introduction

Within the financial world, interest rates play a central role. Interest rates are used in loan structures, but more importantly in discounting future cash flows and determining the time value of money. Especially the value of portfolios with a large duration is very sensitive to changes in interest rates. The yield curve is a representation of interest rates, or yields, with respect to their maturity. The yield curve can be constructed using prices and values observed in the market.

In 2006, Diebold and Li published their paper Forecasting the term structure of government bond yields, in which they proposed a model for out-of-sample forecasts that outperform those of competing models. This model uses the yield-curve fitting and parameterization proposed by Nelson and Siegel in 1987, and models the time-dependent parameters over time.

In continuation of the model by Diebold and Li, we propose two resampling procedures within the model that should improve the density forecast of the yield curve. These two procedures are proposed because the residuals in the Diebold-Li model are not normally distributed, which will be shown. Two other procedures will be used as a comparison of the forecasting abilities.

In total we compare four forecasting procedures: one with normally distributed error terms and one with correlated normally distributed error terms; in the other two we resample the residuals of the autoregressive processes individually and jointly, respectively. The latter two procedures hold on to the elegance of the parametrization of the yield curve but also preserve the underlying distribution of the error terms. We do so in both the model of Diebold and Li (DL) and the Dynamic Adjusted Nelson Siegel Svensson (DAS) model.


This thesis will zoom in on the forecasting abilities and evaluate these using known criteria. Within the drawn framework the central research question is: does the forecast procedure improve when using resampled residuals?

We also aim to answer the following sub-questions:

• Which resampling procedure performs best?

• Is there a difference between the performance of residual resampling in the model proposed by Diebold and Li and in the Dynamic Adjusted Nelson Siegel Svensson model?

• Does the forecast horizon change the performance of the proposed procedures?

• Do the other choices in our forecast model influence the performance?

• Do the results hold for all maturities?

In Chapter 2 we give a short overview of what has already been investigated and concluded in the literature concerning yield curve models. From there we continue with the model that we will use, which is explained in the third chapter. In Chapter 4 we thoroughly describe the models and techniques we use for the simulations and forecasts of the yield curve. The data we use are described in Chapter 5.

In Chapter 6 we consider the in-sample fit of both the Diebold-Li model and the Dynamic Adjusted Nelson Siegel Svensson model. The out-of-sample forecasting abilities are discussed in Chapter 7. This chapter explains the forecast results in general and discusses the possible calibrations of the models and procedures. Per sub-topic we discuss the results and try to answer the sub-questions. In the final chapter, Chapter 8, we summarize our findings and give some recommendations.


Chapter 2

Literature Overview

In the past decades, various models using different approaches have been developed to model the term structure of interest rates, better known as the yield curve. The yield curve represents interest rates for different maturities at a given time. In this chapter we discuss some of the work that has already been done.

In general, yields for low maturities are lower than for high maturities. The yield curve is usually upward sloping, but can be decreasing, hump-shaped or even inverted as well. We also know that standard deviations for yields with a short maturity are usually higher than for yields with a longer maturity; yields with a shorter maturity fluctuate more over time.

The models that have been developed can roughly be divided into two classes: arbitrage-free and equilibrium models. The difference between these models lies in the purpose we use them for. In an arbitrage-free model, we use a measure Q, the equivalent martingale measure. This measure is defined such that no arbitrage opportunities exist and is determined by the yields observed in the market. These kinds of models are used, for example, to derive (market-consistent) prices of financial products. In an equilibrium model, possible outcomes are linked to a probability measure P, for example from a risk-management point of view.

Existing models

Equilibrium models use assumptions on economic variables to derive a process for the short rate, which is the (annualized) interest on borrowing for a very short time, also referred to as the risk-free rate. The short rate can be modelled as a geometric Brownian motion (Rendleman and Bartter, 1980) or as a mean-reverting process (Vasicek, 1977), possibly with a built-in non-negativity restriction (Cox, Ingersoll and Ross, 1985). Multi-factor models have been proposed as well, such as Brennan and Schwartz (1982) and Longstaff and Schwartz (1992). From the short rate, the whole yield curve can be obtained. The aim of equilibrium models is to form expectations on future yields and to model their dynamics. This can be of interest from a risk-management point of view, as we can use the results to help policymakers. Investment decisions can be made to achieve a certain goal, for example a sufficient coverage ratio.

In contrast to equilibrium models, the class of arbitrage-free models has been developed. In arbitrage-free models, the aim is to derive the cross-sectional relations within the yield curve such that no arbitrage opportunities exist. The difference between the two classes is described in Hull (2012) in the following way: "In an equilibrium model, today's term structure of interest rates is an output. In a no-arbitrage model, today's term structure of interest rates is an input."

A no-arbitrage model uses today's yields to form a yield curve that is consistent and arbitrage-free. Examples of no-arbitrage models are Ho and Lee (1986) and Hull and White (1990), the latter being an arbitrage-free version of the Vasicek model. Black and Karasinski (1991) designed a model that ensures non-negativity of the interest rate as well as absence of arbitrage. Most of these models are developed within a binomial-tree structure and are used, for example, to derive the price of a financial derivative today under the risk-neutral measure.

Within the class of no-arbitrage models, Brace, Gatarek and Musiela (1997) developed a model that uses forward rates and forward volatilities to estimate the whole yield curve, their so-called Libor Market Model. This model is widely used to price (exotic) options and other derivatives. For derivatives with a path-dependent value, such as Asian options, Monte Carlo simulations are used to derive the price.

2.1 The Nelson-Siegel model

Nelson and Siegel (1987) introduced a class of yield-curves that represents the variety of shapes a yield-curve can have. This type of model fits a curve from a set of maturity-dependent interest rates and enables us to find yields for any maturity, interpolated as well as extrapolated.

In fact, the Nelson-Siegel model is a parametrization of maturity-dependent yields into a whole term structure of interest rates. The model uses one loading parameter, which accounts for the 'hump' through the factor loadings, and three β's regarding the short-term, mid-term and long-term factors. Diebold and Li (2006) rearranged the model proposed by Nelson and Siegel slightly into a model with a clearer intuition behind the parameters. The Diebold-Li representation of the Nelson-Siegel class of yield curves is defined as follows:

y_t(\tau) = \beta_{0t} + \beta_{1t}\left(\frac{1-e^{-\lambda_t \tau}}{\lambda_t \tau}\right) + \beta_{2t}\left(\frac{1-e^{-\lambda_t \tau}}{\lambda_t \tau} - e^{-\lambda_t \tau}\right)    (2.1)

in which yt(τ) is the yield at time t and maturity τ. The two factors, (1 − e^{−λtτ})/(λtτ) and (1 − e^{−λtτ})/(λtτ) − e^{−λtτ}, make up the factor loadings, which are influenced by the factor-loading parameter λt. When the loading parameter λt is set constant in advance, OLS is used to estimate the three β's. In that case the model is easy to estimate and we end up with three economically relevant parameters. From these parameters we can derive yields for any maturity; therefore we do not need to keep track of a large amount of data. The Nelson-Siegel model is explained in more detail in Chapter 3.

The class proposed by Nelson and Siegel represents the curve as a whole using four parameters, but does not say anything about an underlying model for the development of the parameters over time. Diebold and Li were the first to model the evolution of the parameters over time and were able to produce accurate forecasts for short and long horizons. They modelled the parameters as separate AR(1) processes while leaving the loading parameter constant, and compared the forecasting properties to alternative models. They concluded that the AR(1) representation of the parameters resulted in superior 6-month and 12-month ahead forecasts for different maturities compared to, for example, the random walk model.

In continuation of Diebold and Li, De Pooter (2007) compared different Nelson-Siegel extensions on their forecasting abilities. It was concluded that the four-factor model proposed by Björk and Christensen (1999) performed similarly to the (Adjusted) Nelson-Siegel-Svensson model. As the latter is harder to estimate and the unadjusted model has a potential multicollinearity problem, the four-factor model is his preferred choice.

Koopman et al. (2010) introduced time-variation of the λ-parameter in the Nelson-Siegel model of Diebold and Li. First they introduced a time-varying loading parameter by treating the factor loading as a fourth VAR parameter and used the Extended Kalman Filter to model the yield curve over time. Next they introduced a GARCH model to specify the overall volatility of the loading parameter. Each extension on its own improved on the Diebold-Li model, and combined the fit is even better. The combination of the two results in the Dynamic Nelson-Siegel model with a time-varying factor-loading parameter and a common GARCH component. Koopman et al. did not investigate the forecasting abilities of their proposed model, only the fit.

As explained in Bolder (2001), in affine models "future dynamics of the term structure of interest rates depend on the evolution of some observed, or unobserved, factor". In the class of Nelson-Siegel models we do have such factors, but as shown in Diebold and Li, the Nelson-Siegel class of models is not affine. This also agrees with the forecasting abilities: affine models have been shown to produce poor forecasts, whereas, as shown by i.a. De Pooter, the Nelson-Siegel class produces accurate forecasts.

The Nelson-Siegel model does not belong to the arbitrage-free class by construction, but Coroneo et al. (2008) show that yields produced by this model are not significantly different from the implied arbitrage-free Nelson-Siegel yields. Additionally, Christensen et al. (2010) designed an affine arbitrage-free class of Nelson-Siegel models to exclude possible arbitrage problems.

2.1.1 Non-parametric models

Caldeira and Torrent (2013) consider a nonparametric approach to yield-curve forecasting. They compared three different nonparametric functional data analysis methods with the random walk, the (V)AR and the model of Diebold and Li, and concluded that their nonparametric approach, especially with the kernel estimator, outperforms both the random walk and the alternative models. However, for a short forecast horizon, these results do not hold for all maturities. With an increasing horizon, their forecasts improved considerably relative to the alternative techniques.

2.1.2 Macroeconomic factors

In the literature, various extensions that include macroeconomic variables, such as inflation and exchange rates, have been proposed. In our view, adding such terms would intuitively improve our forecast model, as in the financial world all factors are connected. Diebold et al. (2006) extended the previous model to include these macroeconomic factors, which resulted in better forecasts than the original model by Diebold and Li.


Chapter 3

Theoretical framework

In this chapter we elaborate on the Nelson-Siegel class of yield curves. First we illustrate how the model is built up and how it can be interpreted. We go into detail about the sensitivity to the parameters, and finally we introduce some modifications and extensions of the model and explain their characteristics.

3.1 The yield curve

The forward rate is the interest rate for a future period of time, say from t1 to t2, and is denoted by f(t1, t2). The forward rate can be seen as the future yield at t = t1 with maturity τ = t2 − t1. The yield or spot rate is a combination of forward rates for different periods. For example, the (annualized) yield at t = 0 for a two-year bond is:

y_0(2) = \left((1 + f(0,1))(1 + f(1,2))\right)^{1/2} - 1

If we make the steps very small, hence Δt → 0, we get the instantaneous forward rate f(t). Integrating over the instantaneous forward rates gives the total interest over the period; the annualized interest rate is known as the yield:

y_0(\tau) = \frac{1}{\tau}\int_0^{\tau} f(x)\,dx

The yield at t = 0, y0(τ), depends on the maturity τ; therefore the yields for different maturities together make up the yield curve.


3.2 The Nelson-Siegel class

Nelson and Siegel parameterized the forward rate as a Laguerre function. A Laguerre function is a combination of a polynomial multiplied by an exponential decay term. Their proposed representation for the instantaneous forward rate is:¹

f(\tau) = \beta_0 + \beta_1 e^{-\lambda \tau} + \beta_2 (\lambda \tau e^{-\lambda \tau})

This results in the following formula for the Nelson-Siegel class of yield curves at time t:

y_t(\tau) = \beta_{0t} + \beta_{1t}\left(\frac{1-e^{-\lambda_t \tau}}{\lambda_t \tau}\right) + \beta_{2t}\left(\frac{1-e^{-\lambda_t \tau}}{\lambda_t \tau} - e^{-\lambda_t \tau}\right)    (3.1)

We interpret the parameters as Diebold and Li do: the factor-loading parameter λt is concerned with the exponential decay rate; for low λt the decay is slower than for higher λt. Also, for higher λt the 'hump' in the yield curve appears earlier than for lower λt. The β's, β0t, β1t and β2t, can be interpreted as level, slope and curvature respectively, which will be illustrated later. Time is denoted by t and τ is the maturity, so yt(τ) is the yield with maturity τ at time t.

We can rewrite Equation 3.1 in matrix form, to obtain a view of the whole yield curve: Y_t = X(\lambda_t)\beta_t = [1 \; X_1(\lambda_t) \; X_2(\lambda_t)]\,\beta_t, so for maturities (\tau_1, \ldots, \tau_M):

\begin{pmatrix} y_t(\tau_1) \\ \vdots \\ y_t(\tau_M) \end{pmatrix}
=
\begin{pmatrix}
1 & \frac{1-e^{-\lambda_t \tau_1}}{\lambda_t \tau_1} & \frac{1-e^{-\lambda_t \tau_1}}{\lambda_t \tau_1} - e^{-\lambda_t \tau_1} \\
\vdots & \vdots & \vdots \\
1 & \frac{1-e^{-\lambda_t \tau_M}}{\lambda_t \tau_M} & \frac{1-e^{-\lambda_t \tau_M}}{\lambda_t \tau_M} - e^{-\lambda_t \tau_M}
\end{pmatrix}
\begin{pmatrix} \beta_{0t} \\ \beta_{1t} \\ \beta_{2t} \end{pmatrix}    (3.2)

In the literature, 17 different maturities are used to estimate the parameters, namely {τ1, ..., τM} = {3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60, 72, 84, 96, 108, 120}, measured in months. In this case Yt ∈ R^{17×1}, Xt ∈ R^{17×3} and βt ∈ R^{3×1}. The curvature of the yield curve is determined by both X1(λt) and X2(λt). Figure 3.1 shows both factor loadings as a function of the maturity.

¹ Nelson-Siegel used a slightly different notation. We followed the notation proposed by Diebold and Li (2006).
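To make the estimation step concrete, the sketch below (in Python, an illustration rather than the code used in this thesis) builds the loading matrix X(λ) of Equation 3.2 for the 17 maturities and estimates the β's by OLS for one cross-section; the yield vector shown is made up purely for illustration.

```python
import numpy as np

def ns_loadings(maturities, lam):
    """Nelson-Siegel loading matrix X(lambda) as in Equation 3.2 (maturities in months)."""
    tau = np.asarray(maturities, dtype=float)
    x1 = (1.0 - np.exp(-lam * tau)) / (lam * tau)
    x2 = x1 - np.exp(-lam * tau)
    return np.column_stack([np.ones_like(tau), x1, x2])

maturities = [3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60, 72, 84, 96, 108, 120]
lam = 0.0609                       # the Diebold-Li value: X2 peaks near 30 months
y_t = np.linspace(4.5, 6.0, 17)    # hypothetical end-month yield curve (in %), for illustration only

X = ns_loadings(maturities, lam)
beta_hat, *_ = np.linalg.lstsq(X, y_t, rcond=None)   # OLS estimate of (beta0, beta1, beta2)
print(beta_hat)
```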


Figure 3.1: Factor loadings X1 (red) and X2 (green), for λ = 0.0598

As the maturity decreases to zero (τ → 0), the first factor loading X1 tends towards 1, for all λ. The limit as the maturity goes to infinity, τ → ∞, is zero. Mathematically:

\lim_{\tau \to 0} \frac{1 - e^{-\lambda_t \tau}}{\lambda_t \tau} = 1 \quad \text{and} \quad \lim_{\tau \to \infty} \frac{1 - e^{-\lambda_t \tau}}{\lambda_t \tau} = 0.

For the second factor loading, both limits are zero:

\lim_{\tau \to 0} \left( \frac{1 - e^{-\lambda_t \tau}}{\lambda_t \tau} - e^{-\lambda_t \tau} \right) = \lim_{\tau \to \infty} \left( \frac{1 - e^{-\lambda_t \tau}}{\lambda_t \tau} - e^{-\lambda_t \tau} \right) = 0.

Therefore, as the maturity goes to zero, the yield tends to y(τ) → β0 + β1, and as the maturity goes to infinity, the yield goes to β0.

We can see β0 as the level parameter in this model; this parameter influences yields at all maturities equally. The average slope between the short-term and long-term yields is β0 − (β0 + β1) = −β1, hence β1 is interpreted as the slope parameter. With β0 and β1 we can derive the start and end point of the yield curve. The β2 parameter determines what happens at the intermediate maturities; therefore β2 is referred to as the curvature parameter, as it determines the curvature of the yield curve.

If we know the factor-loading parameter λ, we can find the β's by performing OLS on Equation 3.2. We can derive λ in two ways. First, we can choose λ such that the second factor loading attains its maximum at a certain maturity, say τ*. Figure 3.2 shows the relationship between τ* and the corresponding λ. We see that for lower maturities the graph is steeper than for higher maturities: a small change in λ results in a large change in τ* at higher maturities. In Diebold-Li, the value of λ is set such that the second factor loading attains its maximum at τ* = 30 months.²

² In Diebold and Li, the λ for which the second factor loading attains its maximum is derived to be 0.0609. From our analysis we end up with a slightly different value, namely 0.0598.
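The first method can be illustrated numerically. The grid search below (an illustrative sketch, not the thesis code) looks for the λ at which the second factor loading peaks at τ* = 30 months, and reproduces a value close to the 0.0598-0.0609 mentioned above.

```python
import numpy as np

def x2(lam, tau):
    """Second Nelson-Siegel factor loading (tau in months)."""
    return (1.0 - np.exp(-lam * tau)) / (lam * tau) - np.exp(-lam * tau)

tau_grid = np.linspace(1.0, 120.0, 20000)     # fine maturity grid
lam_grid = np.linspace(0.01, 0.20, 2000)      # candidate decay parameters

# for each lambda, find the maturity at which X2 peaks, keep the lambda whose peak is closest to 30
peak_tau = np.array([tau_grid[np.argmax(x2(l, tau_grid))] for l in lam_grid])
best_lam = lam_grid[np.argmin(np.abs(peak_tau - 30.0))]
print(round(best_lam, 4))   # roughly 0.06, in line with the values above
```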


Figure 3.2: Optimal λ for different maturities τ

Another way to derive λ is to choose the value that minimizes the sum of squared residuals (SSR). The model with the lowest SSR fits the data best. Over time the shape of the curve can vary; this is reflected in the optimal λ, which changes over time as well. Throughout this thesis we use the latter of the two methods and let λ be such that the yield curve is fitted best from a sum-of-squared-residuals point of view.

In Figure 3.3 we show how changes in each of the four parameters individually influence the yield curve. We consider a base case in which λ = 0.09 and β = [5, −2, −0.5]′. In each graph we vary one of the four parameters and keep the others as in the base case: ceteris paribus. First of all, when we vary the decay parameter, we see that the whole curve changes, both the curvature and the value at τ = 120, but the limit remains the same. We also see that the value for maturities close to zero does not change, which is in line with what we expect: as shown, the limit of the yield as τ → 0 is β0 + β1, which remains unchanged here. The loading factors X1 and X2 are not zero at τ = 120 months, so there are differences at this maturity.

Figure 3.3: Sensitivity of the yield curve to the parameters in the Nelson-Siegel model. In the base case (black) the β-parameters are (5, −2, −0.5) and λ = 0.09. In the upper-left graph the λ-parameter varies from 0.03 to 0.15. In the upper-right graph β0 varies from 3 to 7. The lower-left graph shows the sensitivity to β1, from −5 to 1. In the lower-right graph β2 varies from −5.5 to 4.5. As each parameter is varied individually, all other parameters are equal to the base case.

In the top-right graph we vary β0 and we see that the shape is unchanged; the curve is only shifted vertically. Varying β0 only influences the level of the yield curve. In the third graph (bottom left) we vary β1; now the values for maturities close to zero obviously change, but we see that all yield curves proceed to the same limiting value β0. By varying β1 we vary the 'start point' of the graph; the limit for τ → ∞ remains unchanged. For all β1, the graphs move similarly, monotonically increasing or decreasing, towards the limiting value. As we change β2, we see that the whole shape changes, but that the limits for τ → 0 and τ → ∞ remain unchanged.

From these graphs we can also see why β0, β1 and β2 are referred to as the level, slope and curvature parameters respectively. It is clear that combinations of different values for the β-parameters can result in any of the shapes associated with the yield curve, such as increasing, decreasing, hump-shaped, S-shaped or inverted.


3.3 Modifications

After the publication of the original paper of Nelson and Siegel in 1987, many adjustments to their model were proposed, to fit the yield curve better or to emphasize certain aspects. In the following section we introduce a selection of these modifications.

Nelson Siegel Svensson

Svensson (1994) introduced an additional parameter and factor loading to increase the flexibility of the model. The β2 parameter and X2 are duplicated using another λ, namely λ2. This allows for more than one local maximum or minimum. The forward rate in this model is almost the same as for the Nelson-Siegel model, but with the addition of the last term:

f_t(\tau) = \beta_0 + \beta_1 e^{-\lambda_1 \tau} + \beta_2(\lambda_1 \tau e^{-\lambda_1 \tau}) + \beta_3(\lambda_2 \tau e^{-\lambda_2 \tau})

We also see the additional β and factor loading back in the function for the yields:

y_t(\tau) = \beta_{0t} + \beta_{1t}\left(\frac{1-e^{-\lambda_{1t}\tau}}{\lambda_{1t}\tau}\right) + \beta_{2t}\left(\frac{1-e^{-\lambda_{1t}\tau}}{\lambda_{1t}\tau} - e^{-\lambda_{1t}\tau}\right) + \beta_{3t}\left(\frac{1-e^{-\lambda_{2t}\tau}}{\lambda_{2t}\tau} - e^{-\lambda_{2t}\tau}\right)

This model is widely used by financial institutions such as the European Central Bank. The ECB uses the Nelson Siegel Svensson model to communicate the daily yield curve on its website; both the maturity-dependent yields and the parameters that serve as input for the model are published. A downside of the Svensson extension is that we have a multicollinearity problem if λ1 and λ2 are very close to each other: the factor loadings of β2 and β3 become equal and we cannot derive the β2 and β3 parameters separately, but only as their sum. This is not a problem if we pre-set the λ-parameters at a desired value, but if we derive λ1 and λ2 to be optimal, this problem can occur. De Pooter (2007) proposed the Adjusted Nelson Siegel Svensson (AS) model, in which this multicollinearity problem cannot occur. The third factor loading is slightly changed, which leads to the following formula for the yield curve:

y_t(\tau) = \beta_{0t} + \beta_{1t}\left(\frac{1-e^{-\lambda_{1t}\tau}}{\lambda_{1t}\tau}\right) + \beta_{2t}\left(\frac{1-e^{-\lambda_{1t}\tau}}{\lambda_{1t}\tau} - e^{-\lambda_{1t}\tau}\right) + \beta_{3t}\left(\frac{1-e^{-\lambda_{2t}\tau}}{\lambda_{2t}\tau} - e^{-2\lambda_{2t}\tau}\right)    (3.3)

Other modifications

Besides the model proposed by Svensson, other modifications of the original Nelson-Siegel model have been made. The first modification is more of a simplification: research has shown that the first two β-parameters explain about 95% of the variance. Therefore it has been proposed, by Bomfim (2003) but also by others, to exclude β2 and its corresponding factor loading. Due to the decay parameter we still end up with a non-linear yield curve, but the variety of shapes is limited.

Bliss (1997) proposed to let the two factor loadings depend on different decay parameters. This model only differs from the original model if the two decay parameters are different. Another modification is due to Björk and Christensen (1999), who added one extra β-parameter and factor loading. This factor loading is the same as the first factor loading, but with 2λ instead of λ in the exponential.

3.4 Time-variation: Diebold-Li

The Nelson-Siegel class was introduced to fit the whole yield curve through a set of maturity-dependent yields. The parametrization of the yield curve can be done at every point in time to form a set of time-dependent parameters. As mentioned in the previous chapter, Diebold and Li were the first to model the evolution of the parameters over time. They did so in a two-step approach: first they fixed λt such that the second factor loading attains its maximum at τ* = 30 months and derived the set of β-parameters for every end-month yield curve. Next, an AR(1)-process was fitted to these time-dependent parameters to model the dynamics over time, which resulted in the Dynamic Nelson-Siegel model. We will use the abbreviation DL for this model, referring to Diebold and Li. If the β-parameters in the Adjusted Nelson Siegel Svensson (AS) model follow AR(1)-processes, the Dynamic Adjusted Nelson Siegel Svensson (DAS) model is obtained.


Chapter 4

Models and Techniques

In the previous chapter the theoretical framework was explained; in this chapter we describe in more detail how we use this framework to make forecasts. Each step in the procedure is discussed in detail. The techniques that we use and the choices that can be made are explained. Finally, we introduce the criteria on which we evaluate the forecasting abilities.

4.1 Forecast roadmap

Before we can go through the forecasting procedure stepwise, we need to define some symbols and sets. The forecast of the set of yields at time t, made h steps ahead, will be referred to as Ŷt,h, and a forecasted yield with maturity τ will be referred to as ŷt,h(τ). The set of maturities we consider is {τ1, ..., τM} = M.

The data we use depend on the length of the (rolling) window, w. We use the most recent w observations available to fit our model. This subset of our data is referred to as Wt = {Yt−w+1, ..., Yt}. To evaluate the fit in a consistent way, we forecast the same yields Ŷt,h for different forecast horizons h, for t ∈ {1, ..., T} = T. Hence Wt−h → Ŷt,h for all forecast horizons. So, for example, the 3-month ahead forecast for time t uses data up to time t − 3, such that Wt−3 → Ŷt,3.

4.1.1 Step 1: estimate λ’s and β’s

As explained in the previous chapter, we use the λ parameter(s) that fit the observed yields best. For a given λ, or pair of λ's, we know the factor loadings for all maturities, and the β-parameters are subsequently estimated using OLS. The best fit is the model with the lowest SSR. The fitted values are the values that the NS or AS formula gives at the maturities under consideration. For every yield yt(τ) we have a fitted value ỹt(τ, λ), depending on the maturity and λ; the residual is obtained as ut(τ, λ) = yt(τ) − ỹt(τ, λ). Therefore the optimal λ, λ*, with respect to all the maturities and the data used (the window length), is:

\lambda^* = \underset{\lambda}{\arg\min} \sum_{t \in \mathcal{W}} \sum_{\tau \in \mathcal{M}} u_t(\tau, \lambda)^2

For the Adjusted Nelson Siegel Svensson model we derive the pair of λ's in the same way. To prevent the occurrence of extreme values for λ, and subsequently for the β-parameters, we impose some restrictions on the decay parameters. In the Diebold-Li model we assume τ* to lie between 12 and 60 months, hence λ ∈ (0.03, 0.15). For the DAS model, λ1 is assumed to be smaller than λ2 (λ1 < λ2). From the optimal decay parameters, we estimate the β-parameters by OLS.
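A sketch of Step 1, under the assumption of a simple grid search over λ (the thesis does not prescribe a particular optimizer): for each candidate λ in (0.03, 0.15) the β's are fitted by OLS on every curve in the window, and the λ with the lowest total SSR is kept. The window of yields used here is a random placeholder.

```python
import numpy as np

maturities = np.array([3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60, 72, 84, 96, 108, 120], float)

def ns_loadings(lam, tau):
    """Nelson-Siegel loading matrix for decay parameter lam (tau in months)."""
    x1 = (1.0 - np.exp(-lam * tau)) / (lam * tau)
    return np.column_stack([np.ones_like(tau), x1, x1 - np.exp(-lam * tau)])

def optimal_lambda(Y, tau, lam_grid):
    """Y: (w, M) window of yield curves. Returns (lambda*, betas) minimizing the total SSR."""
    best = None
    for lam in lam_grid:
        X = ns_loadings(lam, tau)
        betas, *_ = np.linalg.lstsq(X, Y.T, rcond=None)   # OLS for all dates in the window at once
        ssr = np.sum((Y.T - X @ betas) ** 2)
        if best is None or ssr < best[0]:
            best = (ssr, lam, betas.T)                    # betas.T has shape (w, 3)
    return best[1], best[2]

rng = np.random.default_rng(0)
Y_window = 5.0 + rng.normal(scale=0.5, size=(120, len(maturities)))   # placeholder yield data
lam_star, betas = optimal_lambda(Y_window, maturities, np.linspace(0.03, 0.15, 121))
print(lam_star, betas.shape)
```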

4.1.2 Step 2a: Fit AR(1)-processes on the β parameters

The next step is to model the β-parameters over time. We fit an AR(1) process on each β-parameter individually.¹ So for each β-parameter we have:

\beta_{i,t} = c_i + \psi_i \beta_{i,t-1} + \varepsilon_{i,t}

where ci is a constant, ψi is the first-order autoregressive parameter and εi,t is the error term. These AR(1)-processes are fitted on βi,t−w, ..., βi,t−1 and form the basis of our forecast procedure.

¹ Alternatively we could fit a VAR(1) on all parameters combined, but it is shown that the off-diagonal entries are close to zero. As this would also lead to more variables to be estimated, we choose to fit the AR(1) processes individually.

4.1.3 Step 2b: AR(1)-residuals

We are also interested in the residuals of the AR(1)-processes. These are the differences between fitted and realized values and are referred to as ε̃i,t, with standard deviation si. The covariance matrix of the residuals of all AR(1) processes within the model is denoted Σ. In two of the four forecast procedures under consideration, we re-use the residuals, which we describe in the next step.
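Steps 2a and 2b can be summarized in a few lines. The sketch below assumes the β-series are already available (random placeholders here); each AR(1) is estimated by OLS on a constant and the first lag, and the residuals, their standard deviations si and the covariance matrix Σ are collected.

```python
import numpy as np

def fit_ar1(series):
    """OLS fit of beta_t = c + psi * beta_{t-1} + eps_t; returns (c, psi, residuals)."""
    y, x = series[1:], series[:-1]
    X = np.column_stack([np.ones_like(x), x])
    (c, psi), *_ = np.linalg.lstsq(X, y, rcond=None)
    return c, psi, y - (c + psi * x)

rng = np.random.default_rng(1)
w = 120
betas = rng.normal(size=(w, 3)).cumsum(axis=0) * 0.1 + [5.0, -2.0, 0.5]   # placeholder beta paths

fits = [fit_ar1(betas[:, i]) for i in range(betas.shape[1])]
residuals = np.column_stack([f[2] for f in fits])     # (w - 1, 3) matrix of AR(1) residuals
s = residuals.std(axis=0, ddof=1)                     # standard deviation per process
Sigma = np.cov(residuals, rowvar=False)               # covariance matrix of the residuals
print([(round(c, 3), round(p, 3)) for c, p, _ in fits], s, Sigma, sep="\n")
```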

4.1.4 Step 3: Simulations on future β parameters

In this step we draw N = 1000 simulations of the future β-parameters. Via this Monte Carlo simulation we want to say sensible things about future yields. We consider four methods to draw these simulations, which we will explain in detail. The one-step-ahead forecast in an AR(1) model is β̂i,t+1 = ci + ψi βi,t. If we simulate the AR(1)-process, we also include the error term; for simulation j ∈ {1, ..., N} we get:

\hat{\beta}^{[j]}_{i,t+1} = c_i + \psi_i \beta_{i,t} + e^{[j]}_{i,t}

Below, four alternatives for the treatment of the error term, e[j]i,t, are introduced.

I: Base-case (BC)

The base case is the model as proposed by Diebold and Li. We draw normally distributed error terms with mean zero and standard deviation equal to the standard deviation of the residuals, si. We use these error terms in our simulations of β̂[j]i,t.²

II: Residual resampling (RS)

For a window of length w, we have w βi-parameters, which we fit into an AR(1)-process; this results in w − 1 residuals. From these residuals we draw one and use it as our error term, hence e[j]i,t = ε̃i,r with r drawn uniformly from {1, ..., w − 1}. We do so for all AR(1)-processes separately.

III: Correlated error terms (CD)

In this procedure we use error terms that are correlated. In Step 2b we derived the covariance matrix of the residuals; we want the randomly drawn error terms to have the same variance and covariance structure as the residuals. We start by drawing a vector of standard normal error terms for the AR(1)-processes. Next we pre-multiply the vector with the Cholesky decomposition of the covariance matrix. This multiplication results in a vector of correlated error terms, which we use in our forecast simulations.

IV: Joint residual resampling (JRS)

The joint-resampling procedure works in a similar way to the (individual) resampling procedure, but now the residuals are drawn jointly. This means that we draw one r ∈ {1, ..., w − 1} for all AR(1)-processes combined, hence e[j]i,t = ε̃i,r with the same r for every i.

If we make forecasts with horizon h > 1, the forecast becomes path-dependent. This means that we forecast the AR(1)-process for one period and re-use the simulated values β̂[j]i,t+1 in the AR(1)-model. We repeat this process for h steps and are interested in the terminal simulated β's.

² For this procedure, it would be possible to determine the point forecast as well as the standard deviation in closed form. As this is not the case for the other procedures, we prefer to do Monte Carlo simulation on this procedure as well, to evaluate all procedures in the same way.
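The four treatments differ only in how e[j]i,t is drawn. The sketch below assumes the AR(1) coefficients, residuals and their covariance from Step 2 are given (placeholders here) and simulates N paths h steps ahead for each procedure; it illustrates the logic rather than reproducing the exact implementation behind the results in this thesis.

```python
import numpy as np

rng = np.random.default_rng(2)
N, h, k = 1000, 3, 3                    # simulations, horizon in months, number of beta's
c = np.array([0.05, -0.02, 0.01])       # placeholder AR(1) constants
psi = np.array([0.98, 0.95, 0.90])      # placeholder AR(1) coefficients
resid = rng.standard_t(df=4, size=(119, k)) * 0.2   # placeholder AR(1) residuals (fat-tailed)
s = resid.std(axis=0, ddof=1)
Sigma = np.cov(resid, rowvar=False)
L = np.linalg.cholesky(Sigma)
beta_last = np.array([5.0, -2.0, 0.5])  # last observed beta vector

def draw_errors(method):
    """One (N, k) draw of error terms according to the chosen procedure."""
    if method == "BC":    # independent normals with the residual standard deviations
        return rng.normal(size=(N, k)) * s
    if method == "RS":    # resample residuals per process, independently
        return np.column_stack([rng.choice(resid[:, i], size=N) for i in range(k)])
    if method == "CD":    # correlated normals via the Cholesky factor of Sigma
        return rng.normal(size=(N, k)) @ L.T
    if method == "JRS":   # resample whole residual rows jointly
        return resid[rng.integers(0, resid.shape[0], size=N)]
    raise ValueError(method)

for method in ["BC", "RS", "CD", "JRS"]:
    beta = np.tile(beta_last, (N, 1))
    for _ in range(h):                  # path-dependent simulation for h > 1
        beta = c + psi * beta + draw_errors(method)
    print(method, beta.std(axis=0))     # spread of the terminal simulated beta's
```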


4.1.5 Step 4: Fit yield curve and make inference

From our simulations, we have N sets of three or four forecasted (terminal) β-parameters. The next step is to use these parameters as well as the λ-parameter(s) we derived in step 1, as input in the Nelson-Siegel or Adjusted Nelson Siegel Svensson formula to derive the associated term-structure. This gives us N forecasted yield curves for every procedure, which we can compare with the realized yields.

4.2 Motivation of chosen procedures

In the previous section, we went through the forecast procedure and the steps involved. But why do we propose to compare these specific procedures? The first procedure (BC) serves as a benchmark for the three other procedures. This is the model Diebold and Li proposed, and it has been shown to perform very well compared to other (non-Nelson-Siegel) models. In Chapter 6, we will show the non-normality of the residuals in the AR(1)-processes. The second model, the individual resampling procedure, is chosen because it should capture the distribution of the error terms better than a normal distribution: we re-use the old residuals and attempt to replicate their distribution in our forecast, without considering any correlation between the AR(1)-processes of the β-parameters. The BC and RS procedures are considered because we want to investigate whether the use of normally distributed error terms suffices while the historical residuals are non-normal.

As a third procedure we have the correlated model. With this model we introduce correlation between the error terms of the AR(1)-processes, but no dynamic interaction between the processes, as we still model the parameters individually. The fourth model is proposed because this procedure should capture the irregularity of the correlation of the error terms as well as their (non-normal) distribution. We draw the residuals jointly and expect this to capture the simultaneous movements of the error terms in the AR(1) processes. Via these two procedures, we investigate whether it is better to use a general overall correlation, while the tail correlation shows this might be too much of a simplification. In the fourth procedure, JRS, we include both the historical correlation and the non-normality. We will compare all proposed procedures to the base case but also to each other.

As we saw in Step 3, the forecasts only differ in the simulated error term. For all the procedures the error term has mean zero, therefore the expected values of the forecasts are equal. For many financial purposes, the variance of maturity-dependent yields is of interest too. Think of any financial product that is non-linear in the yield; the correct specification of the spread of the yield is then of importance.

In risk-management models, Value-at-Risk (VaR) and Expected Shortfall (ES) are used as risk measures. The VaR is a chosen percentile of the loss distribution of a model, and the ES is the expected loss conditional on the loss being larger than the VaR. Both measures make use of the loss distribution and are therefore very dependent on its specification. The four forecast procedures we compare should differ in the distribution of the outcomes: the density forecast.

4.3 Evaluation criteria

Evaluation criteria are applied to measure the quality of the forecasts. Two such criteria are the deviation from the realized value (the forecast error) and the variance of the forecast. As mentioned before, we do not expect differences in the point estimates between the procedures apart from simulation error, hence we expect the forecast errors to be equal. Besides looking at the standard deviation, we will also look at another criterion, the Dawid-Sebastiani scoring rule.

Dawid-Sebastiani scoring rule

In the Dawid-Sebastiani scoring rule, as proposed in Gneiting and Raftery (2007), there is a trade-off between goodness of fit and variance:

\mathrm{DSS}(\tau_i) = \frac{\left(\bar{\hat{y}}_{t|t-h}(\tau_i) - y_t(\tau_i)\right)^2}{\sigma^2\left(\hat{y}_{t|t-h}(\tau_i)\right)} + \log\left(\sigma^2\left(\hat{y}_{t|t-h}(\tau_i)\right)\right)

Based on this scoring rule, we prefer the method with the minimum value. We use the forecasted value, the average of the N simulated values, as well as the variance of these simulated values. The score becomes lower if the forecast is close to the realized value, hence if the forecast error is low. If we differentiate the DS scoring rule with respect to the variance σ², we get:

\frac{\partial}{\partial \sigma^2} \mathrm{DSS} = \frac{\partial}{\partial \sigma^2}\left[\frac{(\bar{\hat{y}}_t - y_t)^2}{\sigma^2} + \log(\sigma^2)\right] = -\frac{(\bar{\hat{y}}_t - y_t)^2}{(\sigma^2)^2} + \frac{1}{\sigma^2}

The derivative is zero if (\bar{\hat{y}}_t - y_t)^2 = \sigma^2; it is negative for smaller and positive for larger values of σ², so the DS scoring rule attains a minimum there. This might feel intuitively wrong, as we usually prefer a model in which the variance of our forecast is lowest. The DS scoring rule, however, ties the forecast error to the variance: a large forecast error should be predicted as well, which is reflected in a larger variance. In that case we would have preferred the point forecast to be made with less certainty, that is, with more expected deviation from this value beforehand.

Note that the DS scoring formula is not injective: given some forecast error, two different variances can result in the same scoring value. In this way, two totally different variances can be equally desirable based on this score.
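For a single maturity, the score is computed directly from the N simulated yields and the realized value. A minimal sketch with made-up numbers:

```python
import numpy as np

def dawid_sebastiani(simulated, realized):
    """DSS = squared forecast error / forecast variance + log(forecast variance)."""
    mean = simulated.mean()
    var = simulated.var(ddof=1)
    return (mean - realized) ** 2 / var + np.log(var)

rng = np.random.default_rng(3)
sims = 4.0 + 0.3 * rng.standard_normal(1000)   # hypothetical simulated 24-month yields (in %)
print(dawid_sebastiani(sims, realized=4.5))    # lower is better
```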


Chapter 5

Data

In this chapter we discuss the data we have used. We consider the data itself and discuss its characteristics.

Figure 5.1: Yields for the Jungbacker subset, January 1979 up to December 1999.

5.1 Jungbacker dataset

The dataset we use in this thesis is a subset of the data used in Jungbacker et al. (2014). This dataset consists of zero yields for US Treasuries based on unsmoothed Fama-Bliss forward rates. The subset we use runs from January 1979 up to December 2009. This dataset is obtained similarly to the set used in Diebold and Li (2006) but covers a larger period. For more details about the methodology and construction of the dataset, we refer to Diebold and Li. We divide this dataset into two parts: the first 21 years, January 1979 up to December 1999, serve as our in-sample set, and the last 10 years are used for out-of-sample testing.

The dataset consists of end-month yields for 17 different maturities, varying from 3 months to 10 years. To be more specific, maturities under consideration are: 3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60, 72, 84, 96, 108, 120 months. We have for each maturity 252 data-points, which results in a total of 4284 maturity-dependent yields.

A three-dimensional representation of the dataset over the period January 1979 up to December 1999 is given in Figure 5.1. It shows that yields for short maturities are usually lower than yields for higher maturities; therefore most of the yield curves are upward sloping. We also see that, for all maturities, yields decreased over time. In Figure 5.2 the evolution of yields for some selected maturities over time is drawn, which confirms the same trend. Yields of lower maturities fluctuate more than yields of higher maturities; their standard deviation is higher. This, as well as other descriptive statistics, can be found in Table 1 in the Appendix.

Figure 5.2: End-month yields for three different maturities: 3-month (blue), 2-year (red) and 10-year (green), in the period from January 1979 to December 1999.

We also notice that there are periods with flatter yield curves and periods with a steeper yield curve. This is illustrated in Figure 5.3, in which we show differently shaped yield curves. Over time we have seen periods with large fluctuations and periods in which the yields behaved quite stably. Yields of all maturities were subject to these higher and lower fluctuations.


Figure 5.3: Yield curves at different dates: January 1981 (green), an inverted shape; September 1989 (blue), a flat yield curve; May 1992 (red), a normal yield curve.


Chapter 6

In-sample fit

6.1 Fit of yields

We fit the Nelson-Siegel model and the Adjusted Nelson Siegel Svensson model on the data described in Chapter 5. The optimal λ for the NS model is λ = 0.0963; for the AS model, λ1 = 0.0584 and λ2 = 0.0722. This means that the hump in the Nelson-Siegel model is at τ* ≈ 17 months. For the Adjusted Nelson Siegel Svensson model, X2 attains its maximum at τ1* ≈ 32 months and the third factor loading is maximal at τ2* ≈ 12 months. Using these λ's we derive the β-parameters. The goodness-of-fit statistics at all maturities are given in Table 2 in the Appendix.

The AS model overall fits the data better. The standard deviation of the residuals as well as the minimal and maximal errors are smaller compared to the Nelson-Siegel model. We also see that at almost all maturities the fit of the Adjusted Nelson Siegel Svensson model is better than that of the Nelson-Siegel model. This difference is especially distinct at the 3-month maturity: in the Nelson-Siegel model the average error is 0.0379, whereas in the AS model the average error is -0.0047, which is also reflected in the standard deviation of the error.

In the fit of the Nelson-Siegel model, we see that short and long maturities are in general overestimated; the average error is negative. The mid-term maturities are underestimated, as the error term is positive. In the AS model we only see such a discrepancy to a lesser extent. This difference between the two models means that the AS model fits the curvature better than the NS model. This is not a very unexpected result, as the AS model has two parameters for the curvature, whilst the NS model has just one.


6.2 Nelson-Siegel parameters

We have shown in Chapter 3 that the β-parameters in the NS model can be interpreted as level, slope and curvature. For the Adjusted Nelson Siegel Svensson model, both β2 and β3 are curvature parameters. In both models β0 and β1 are interpreted as the level and slope parameters. We observe similar statistics for these parameters in both models in Table 3 in the Appendix.

The correlation between β0 and β1 is similar for the Nelson-Siegel model and the Adjusted Nelson Siegel Svensson model. These two parameters have a correlation of almost zero and are practically uncorrelated. The β2-parameter in the NS model is partly correlated with the two other parameters. In the AS model, the β2 and β3 parameters are negatively correlated.

Figure 6.1: Evolution of the β-parameters over time for fixed, optimal λ. In the NS model (upper graph) λ = 0.0963; in the AS model (lower graph) λ1 = 0.0584 and λ2 = 0.0722. The line colors are defined as follows: β0: blue, β1: red, β2: green and β3: black (AS model only).

In both models we see high autocorrelation of the level parameter, even after 24 months. The autocorrelation of the β1 term declines faster in the NS model than in the AS model. The curvature parameters show larger period-on-period fluctuations. For the slope and curvature parameters, the 24-month autocorrelation is almost zero.

In Figure 6.1 we see the course of the β-parameters over time. This figure is in agreement with our findings from the statistics: the level parameters in both models behave similarly, as do (to a lesser extent) the slope parameters.

6.3 Autoregressive model fit

In Figure 6.1 we see the evolution of the β-parameters over time. We want to capture these movements in the autoregressive processes, to model the behavior over time. Table 4 in the Appendix shows the statistics of the AR(1)-processes on the parameters. The standard deviation of the error terms in the processes is in line with the (small) movements in the graph.

The development of the slope parameter shows more movement, which is reflected in the standard deviation of its error terms. The standard deviation of the curvature parameters is highest, which we see in the figure as well. For the DAS model the standard deviation of the curvature parameters is lower than in the DL model. The AR-parameters agree with the autocorrelations we derived in the previous section.

6.3.1 Non-normality

In simulations of an AR(1)-process, we assume the simulated error terms are normally distributed. As can be seen in Table 6.1, the skewness and kurtosis of the residuals do not agree with a normal distribution. The Jarque-Bera test statistic for normality of the error terms can be found in this table as well. Under the null hypothesis, the test statistic follows a χ²-distribution with two degrees of freedom, and we reject the normality assumption of the residuals in all of the AR(1)-processes.

           Diebold-Li                                Dynamic Adjusted Nelson Siegel Svensson
     Skewness  Kurtosis  JB statistic           Skewness  Kurtosis  JB statistic
β0   -0.1134   3.8093    7.005*            β0    0.1095    4.2085   14.959***
β1   -0.7361   9.6652    462.040****       β1   -0.2078    7.2940   184.561****
β2    0.7293   12.4403   904.864****       β2   -0.0214    4.6255   26.220****
                                           β3    0.3928    7.7342   228.379****

Table 6.1: Statistics of the error terms in the AR(1)-processes on the β-parameters individually. The JB statistic is the Jarque-Bera test statistic for normality. (* indicates rejection at the 95% significance level, *** at the 99.9% level and **** at the 99.99% level.)
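The Jarque-Bera statistic reported in Table 6.1 combines skewness and excess kurtosis. The following sketch shows the computation on placeholder residuals rather than the thesis data; the function name jarque_bera is ours.

```python
import numpy as np

def jarque_bera(x):
    """Returns (skewness, kurtosis, JB statistic); under normality JB ~ chi-squared(2)."""
    x = np.asarray(x, float)
    n = x.size
    z = (x - x.mean()) / x.std()
    skew = np.mean(z ** 3)
    kurt = np.mean(z ** 4)
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return skew, kurt, jb

rng = np.random.default_rng(4)
resid = rng.standard_t(df=5, size=251)      # fat-tailed placeholder residuals
print(jarque_bera(resid))                   # large JB values lead to rejection of normality
```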

6.3.2 Correlation

In addition to the non-normality, we did some further investigation of the residuals of the AR(1) processes, more precisely of the tail correlation. We sorted on one of the residual series and derived the correlation of the cross terms for the first p pairs. If we have sorted on the first residual and p = 15, we only consider the 15 lowest residuals in the first AR(1)-process and derive the correlation between these and the corresponding residuals in the other AR(1)-processes. This gives the correlation in the left tail of the residuals; if we instead take the highest p terms, we get the correlation in the right tail. The correlation sorted on the first residual in the Diebold-Li model is shown in Figure 6.2.
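The tail-correlation check can be written compactly: sort on one residual series, keep the p most extreme observations, and correlate the corresponding residuals of the other processes. The sketch below uses placeholder residuals; the function name tail_correlations is ours.

```python
import numpy as np

def tail_correlations(resid, sort_on=0, p=15, lower=True):
    """Correlation matrix of the p observations with the lowest (or highest) values
    in column `sort_on`, taken across all AR(1) residual series."""
    order = np.argsort(resid[:, sort_on])
    idx = order[:p] if lower else order[-p:]
    return np.corrcoef(resid[idx], rowvar=False)

rng = np.random.default_rng(5)
resid = rng.standard_t(df=4, size=(242, 3))          # placeholder residuals (242 as in Figure 6.2)
print(tail_correlations(resid, p=15, lower=True))    # left tail
print(tail_correlations(resid, p=15, lower=False))   # right tail
```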

We see that especially ρ1,2, and to a lesser extent ρ1,3, behave differently in the left and right tail. For p = 242, all terms are taken into account and the values are the same in the left and right graphs. The different behavior means that a large positive shock influences the other processes differently than a large negative shock does. We see a similar picture if we sort on the other residuals, as well as in the Dynamic Adjusted Nelson Siegel Svensson model.

In the previous chapters we have shown that the three β-parameters can be interpreted as level, slope and curvature. A large shock in one of the parameters, which happens in the tails, changes the yield curve (as a whole) in a different way than smaller shocks do.


Figure 6.2: Correlation between the residuals, sorted on the first AR(1) process of the Diebold-Li model. The correlation between the first and the second process, ρ1,2, is represented by the green line; ρ1,3 is shown in blue and ρ2,3 is the red line. The graph on the left is sorted low-high, the graph on the right is sorted high-low; on the horizontal axis we have p, the number of pairs taken into account.


Chapter 7

Forecast

In the previous chapter, we discussed the in-sample fit of the Diebold-Li and Dynamic Adjusted Nelson Siegel Svensson models. From the AR(1) processes on the β-parameters, we can make forecasts of future parameter values. We use these to calculate forecasted yields for all maturities. In this chapter we make such forecasts for the period between January 2000 and December 2009 (120 months) and we evaluate them.

7.1 Models under consideration

In our forecast procedure, we have several options to consider. We can choose which underlying model we use: the DL or the DAS model. We can vary the forecast horizon between 1 month and 1 year ahead. Another choice is the window length, the number of previous periods we take into consideration for the estimation of our model: how far do we want to go back when we fit our model and forecast the yield curve? Table 7.1 gives an overview of all possible options we can choose and the characteristics our forecasts can have.

7.2 RMSFE and Standard deviation

When comparing the forecasting abilities of different models, one usually first looks at the Root Mean Squared Forecast Error (RMSFE). As the name already gives away, this statistic looks at the squared forecast errors and averages these out. This gives an overall view of the point estimation of the model compared to the realized values. The four forecast procedures we compare each follow the same AR(1) process, but differ in the simulated error terms. The error terms in all procedures have an expected value of zero, therefore the forecasts of the yields themselves are the same for all procedures.¹

¹ When doing Monte Carlo simulations, we notice small differences between the point forecasts, but this is only due to simulation error.
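The RMSFE is not displayed explicitly in this thesis; for reference, a common way to write it per maturity and horizon, in the notation of Chapter 4 (with T the number of forecast dates), is:

\mathrm{RMSFE}(\tau, h) = \sqrt{\frac{1}{T} \sum_{t \in \mathcal{T}} \left( \bar{\hat{y}}_{t|t-h}(\tau) - y_t(\tau) \right)^2 }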


Topic                 Options
Model                 Diebold-Li and Dynamic Adjusted Nelson Siegel Svensson
Forecast procedure    BC, RS, CD and JRS
Forecast horizon      h = 1, 3, 6, 12 months
Window length         w = 60, 80, 100, 120, 160, 200, 240 months and the full (recursive) window
Forecast period       T = January 2000 - December 2009
Maturities τ          M = 3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60, 72, 84, 96, 108, 120 months
Simulations N         1000

Table 7.1: Forecast options

Secondly, we are interested in the standard deviation of the forecast: how certain are we that the realized value is close to the forecasted value? We prefer a standard deviation as close as possible to the true standard deviation. In Tables 6 to 9 in the Appendix we show the standard deviations of the forecasted yields, consolidated over the different window lengths. If we purely look at the standard deviation of the forecasts, we see that among the four procedures the JRS method has the lowest values on all maturities and horizons and in both models. We also see that the standard deviations of the BC and RS procedures are very close to each other. The CD procedure has the largest standard deviation on all maturities and forecast horizons.

We prefer a lower standard deviation only to a certain extent. The confidence interval should be as small as possible, but the realized value should lie within this interval with a frequency equal to the level determined beforehand. In Chapter 4 we introduced the Dawid-Sebastiani scoring rule, which trades off the forecast error against the standard deviation. We have shown that, given some forecast error, a variance equal to the squared forecast error gives the lowest and thus most desirable Dawid-Sebastiani score. The JRS procedure has the lowest variance. The forecast error fluctuates a lot over time; in most cases the variance of the JRS procedure is closest, but there are also times when this is not the case at all.

Therefore we base the comparison of the procedures on the outcome of the Dawid-Sebastiani scoring statistic. As we have a lot of data and test values, we consider the percentages with which a certain method is ranked 1st, 2nd, 3rd and 4th. This ranking is done over all forecast horizons, window lengths, forecast periods and maturities, and is only broken down by the option under consideration. We first look at the difference between the forecast procedures only and consolidate over all other options: forecast horizon, forecast window and all maturities. This is our overall view, from which we single out one factor at a time. Per topic we explain the differences associated with the possible options.

7.3 Forecast procedure

The forecast procedures we use are explained in detail in Chapter 4. We use the base-case model (BC) as a benchmark to evaluate our other models. Secondly, we use the residual resample model RS in which we separately resample the residuals in the AR(1)-processes for the β-parameters. The third model we consider is the model with randomly drawn correlated error terms (CD). Finally we propose the joint residual resample model (JRS) in which the underlying distribution and correlation of the error terms are preserved.

Rank BC RS CD JRS

1 12.2% 14.1% 18.3% 55.4%

2 42.8% 47.6% 3.1% 6.5%

3 43.5% 37.3% 7.1% 12.1%

4 1.5% 1.0% 71.5% 25.9%

Table 7.2: Ranking based on Dawid-Sebastiani Scoring rule consolidated over all models, horizons and windows.

First of all, based on the Dawid-Sebastiani scoring rule, the JRS procedure is the most preferred procedure in more than half of the cases. The BC and RS procedures have almost the same standard deviation, as shown in the previous section; this results in similar rankings based on the DS scoring. Between the two, RS performs just a little better than the BC procedure. Overall we see that CD performs the worst of the four procedures, ranking last in almost three out of four simulated forecasts. The JRS procedure either performs most favorably or least favorably, ranking last in one out of four cases. As mentioned, these rankings are consolidated over all possible choices in our forecast procedure; we can say that these are the average rankings of the four procedures.


7.4 Forecast horizon

We want to forecast the yield curve for a fixed period: January 2000 up to December 2009. This means that the data used to forecast the yield curves differ per forecast horizon and window length. The 1-month ahead forecast for January 2000 uses data up to December 1999. The 12-month ahead forecast uses data up to January 1999 and w periods back. Similarly, for the first forecasted period, the 3-month ahead forecast uses data up to October 1999 and the 6-month ahead forecast uses data up to July 1999. In this way the period for which we make forecasts is the same for all horizons.

For a 1-month forecast horizon, the JRS method has only w sets of residuals to resample. As we do N = 1000 simulations per forecast moment, we end up with some identical forecasts within these simulations. For h = 3 months the number of possible outcomes is w³, and in general there are w^h possible outcomes. Hence, for all but the 1-month ahead forecast, this problem does not arise.

1-month ahead
Rank   BC      RS      CD      JRS
1      10.3%   10.5%   8.3%    70.9%
2      42.2%   49.3%   2.6%    5.9%
3      45.8%   39.0%   4.2%    11.0%
4      1.8%    1.2%    84.9%   12.2%

3-months ahead
Rank   BC      RS      CD      JRS
1      11.6%   14.1%   12.8%   61.5%
2      43.5%   46.8%   3.1%    6.6%
3      43.2%   37.9%   5.9%    13.1%
4      1.8%    1.2%    78.2%   18.8%

6-months ahead
Rank   BC      RS      CD      JRS
1      12.5%   14.4%   19.8%   53.4%
2      42.9%   47.6%   2.9%    6.6%
3      43.3%   37.0%   8.4%    11.4%
4      1.4%    1.0%    69.0%   28.7%

12-months ahead
Rank   BC      RS      CD      JRS
1      14.5%   17.4%   32.2%   35.9%
2      42.5%   46.7%   3.8%    7.0%
3      41.8%   35.2%   9.9%    13.1%
4      1.2%    0.7%    54.1%   44.0%

Table 7.3: Rankings based on the Dawid-Sebastiani scoring rule, broken down by forecast horizon.

As we increase the forecast horizon, we do see some changes in the ranking based on the DS scoring. For a short forecast horizon, the JRS procedure has the lowest DS score in more than two out of three forecasts. As we increase the horizon, this decreases to one out of three for the 12-month forecast. Where the JRS procedure becomes less favorable, the other three procedures become more favorable, the CD procedure in particular. For the 1-month forecast the CD procedure has the lowest DS value in just 8.3% of the cases and the highest in 84.9%, but for the 12-month forecast these percentages are 32.3% and 54.1% respectively. Still, relative to the other procedures, the CD procedure performs least favorably, but not as badly as for the short horizon. The BC and RS procedures steadily improve compared to the others. As the horizon increases, the difference between the two becomes larger, in favor of the RS procedure.

For the 1-month forecast, we can clearly see that the JRS procedure performs best, but for larger forecast horizons it becomes harder to see a clear winner among the procedures. Based on the percentage ranked first, we would still go for the JRS procedure, but this would mean accepting the worst procedure in 44.0% of the cases. Our choice would be the JRS procedure for the first three forecast horizons and the RS procedure for the 12-month forecast horizon.

7.5 Window selection

We have 7 different rolling windows on which we fit our model and produce forecasts, varying from 60 months (5 years) up to 240 months (20 years). Additionally we consider the full or recursive window, in which we use all data available: the data on which we fit the models then always starts in January 1979 and runs up to the most recent moment before the forecast, so the size of this window grows over time. To compare the outcomes across window lengths we divide them into two groups: we treat the window lengths of 60, 80, 100 and 120 months as short windows, and the other four, 160, 200 and 240 months as well as the full window, as long windows. A small sketch of the window construction is given below; results are shown in Table 7.4.
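The sketch below (a simplified illustration with hypothetical helper names, assuming monthly observations starting in January 1979) shows the difference between a rolling window of fixed length w and the recursive window whose starting point never moves.

```python
import pandas as pd

# Hypothetical monthly index from January 1979 up to December 2009
dates = pd.period_range("1979-01", "2009-12", freq="M")

def estimation_window(forecast_date, h, w=None):
    """Return the periods used to fit the model for a forecast made h months
    before forecast_date; w is the rolling window length in months,
    or None for the recursive (full) window."""
    cutoff = forecast_date - h                 # last observation available at forecast time
    if w is None:                              # recursive window: start never moves
        start = dates[0]
    else:                                      # rolling window of fixed length w
        start = cutoff - (w - 1)
    return dates[(dates >= start) & (dates <= cutoff)]

# Example: 1-month-ahead forecast of January 2000 with a 120-month rolling window
print(estimation_window(pd.Period("2000-01", freq="M"), h=1, w=120))
```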

Short Windows                          Longer Windows
Rank   BC      RS      CD      JRS     Rank   BC      RS      CD      JRS
1      10.9%   11.5%   22.6%   55.0%   1      13.6%   16.6%   13.9%   55.9%
2      42.5%   48.6%    4.3%    4.5%   2      43.0%   46.6%    1.8%    8.5%
3      44.6%   38.6%    8.1%    8.7%   3      42.4%   35.9%    6.1%   15.5%
4       2.0%    1.2%   65.0%   31.8%   4       1.0%    0.8%   78.1%   20.1%

Table 7.4: Dawid-Sebastiani scoring broken down on short and long windows.

As with the choice of the forecast horizon, the biggest difference between the window lengths is found in the CD procedure. For longer windows, CD performs even worse than for shorter windows, mainly in favor of the RS and BC procedures. The JRS procedure benefits mostly through a lower percentage of fourth-place rankings. Apart from these differences we do not see any other major changes for longer windows.

7.6 Maturities

From our forecasted β-parameters we can derive forecasted yields for any maturity. We consider only the 17 maturities that we started with. To break the results down by maturity we divide the 17 maturities into four groups: 3-12 months, 15-24 months, 30-72 months and 84-120 months. Rankings based on the DS scoring can be found in Table 7.5.

3 - 12 Months                          15 - 24 Months
Rank   BC      RS      CD      JRS     Rank   BC      RS      CD      JRS
1      12.5%   13.1%   20.0%   54.3%   1      11.8%   12.2%   21.8%   54.3%
2      44.2%   48.1%    4.6%    3.1%   2      44.4%   48.3%    3.5%    3.8%
3      41.6%   37.5%   12.5%    8.4%   3      42.4%   38.6%    8.9%   10.1%
4       1.6%    1.3%   62.9%   34.2%   4       1.4%    1.0%   65.8%   31.8%

30 - 72 Months                         84 - 120 Months
Rank   BC      RS      CD      JRS     Rank   BC      RS      CD      JRS
1      11.1%   13.0%   19.4%   56.5%   1      13.8%   18.2%   11.7%   56.4%
2      42.7%   48.3%    2.6%    6.4%   2      39.7%   45.7%    1.8%   12.8%
3      44.8%   37.7%    4.7%   12.8%   3      45.0%   35.1%    2.9%   17.0%
4       1.4%    1.0%   73.3%   24.3%   4       1.6%    0.9%   83.7%   13.8%

Table 7.5: Rankings for different maturity groups, based on the Dawid-Sebastiani scoring rule.

For the first three groups, the relative performance of the four procedures is quite constant, apart from a small shift of the CD procedure towards fourth place. In the fourth maturity group we see the same shift on a larger scale, mostly in favor of the RS procedure. For longer maturities, the CD procedure gives the least desired results relative to the others. We know that for longer maturities the β0 parameter has the largest influence, and since the CD procedure only takes the correlation into account, it produces the worst forecasts for β0 relative to the other procedures.

7.7 Model

In this section we look at the difference in performance of the four procedures in the DL model and the DAS model. The underlying models are slightly different and therefore produce different forecasted values, which we can compare as well. First we look at the DS scoring rankings for the two models separately, to see whether the performance of the procedures differs between the models; this results in Table 7.6.

Diebold-Li                             Dynamic Adjusted Nelson Siegel Svensson
Rank   BC      RS      CD      JRS     Rank   BC      RS      CD      JRS
1      13.6%   15.9%   18.2%   52.3%   1      10.8%   12.2%   18.4%   58.6%
2      42.3%   46.3%    2.1%    9.3%   2      43.2%   49.0%    4.1%    3.7%
3      42.6%   36.8%    5.0%   15.6%   3      44.4%   37.8%    9.2%    8.7%
4       1.5%    1.0%   74.7%   22.8%   4       1.6%    1.0%   68.4%   29.0%

Table 7.6: Ranking based on the Dawid-Sebastiani scoring rule for both models, over all horizons and windows.

In both models, JRS gives the lowest DS scoring value more than half of the time, but for the DAS model the percentage is higher, so JRS is more favorable there than in the DL model. The percentage of fourth places for JRS is also higher in the DAS model: even more than in the DL model, JRS is ranked either first or last. Again BC and RS have similar results, with RS slightly more favorable in both models. Overall we do not see much difference in the ranking and relative performance between the two models.

Although it is not the topic of this thesis, we are also interested in the overall performance of the DAS and DL models individually. As we have seen in Chapter 6, the fit of the Diebold-Li model differs from that of the DAS model, so we can compare both models on forecasted values as well. Based on the RMSFE, the Diebold-Li model has on average slightly better predictive ability than the Dynamic Adjusted Nelson Siegel Svensson model, but the differences are very small. Diebold and Mariano (1995) propose a method to test for equal predictive accuracy, which we additionally use to compare the two models under the Joint Residual Resampling procedure. In this method we use a loss function L(εt) that has, roughly speaking, the following properties:

• it takes the value zero when the error term is zero;
• it increases in the error term;
• it is always non-negative.

In the Dawid-Sebastiani scoring we consider the maturities separately; here we therefore propose a loss function that covers the full yield curve. The loss function we use is

$$L(\varepsilon_t) = \frac{1}{M} \sum_{i=1}^{M} \varepsilon_{t,\tau_i}^{2},$$

where M is the number of maturities and $\varepsilon_{t,\tau_i}$ is the forecast error at time t for maturity $\tau_i$.


For both models we derive the values of the loss function and take the difference between the two. The test statistic is defined as the average of the differences divided by its standard deviation.
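As a concrete illustration, the sketch below (a simplified implementation with hypothetical array names, ignoring the autocorrelation correction that a full Diebold-Mariano test would apply to multi-step forecasts) computes the maturity-averaged squared-error loss, the loss differential and the resulting test statistic.

```python
import numpy as np
from scipy import stats

def diebold_mariano(errors_a, errors_b):
    """errors_a, errors_b: (T, M) arrays of forecast errors (time x maturity)
    for two competing models. Returns the DM statistic and a two-sided p-value."""
    loss_a = (errors_a ** 2).mean(axis=1)       # average squared error over the curve
    loss_b = (errors_b ** 2).mean(axis=1)
    d = loss_a - loss_b                         # loss differential per forecast moment
    dm = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))   # mean over its standard error
    p_value = 2 * (1 - stats.norm.cdf(abs(dm)))
    return dm, p_value

# Hypothetical example: 120 forecast months, 17 maturities, simulated errors
rng = np.random.default_rng(1)
e_dl = rng.normal(size=(120, 17))
e_das = rng.normal(size=(120, 17))
print(diebold_mariano(e_dl, e_das))
```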

Based on the Diebold-Mariano test statistic, we cannot reject the null hypothesis of equal predictive accuracy. This result holds for all forecast horizons and window lengths, which is not that surprising as the two models are very similar. In Chapter 6 we have seen that the Adjusted Nelson Siegel Svensson model fits the data better, but as the DAS model has more parameters, we have more uncertainty in our forecasts. Since both models have equal predictive accuracy, we prefer the DL model with JRS over the DAS model on grounds of parsimony.

7.8 Significance

We see that the percentages in the rankings differ a lot between the procedures. In the case where we consolidate over all options, such as in Table 7.2, we have calculated over 130,000 DS values per procedure. If all procedures performed equally, the probabilities in the ranking would be 25% in all fields, with a corresponding standard deviation (of this Bernoulli proportion) of about 0.12%.
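For reference, this figure follows from the standard error of a sample proportion, under the (idealized) assumption of roughly n = 130,000 independent draws:

$$\sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.25 \cdot 0.75}{130{,}000}} \approx 0.0012 = 0.12\%.$$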

This would be a very encouraging result, but in our case it is not that simple: the scoring values are not independent, for two important reasons. First, for forecast horizons larger than 1 month our forecasts become path dependent, which makes the standard deviation hard to determine. Second, for each maturity we only have 120 realized values to use in the calculation of the scoring value.

Taking the dependence issue into account, we still see large differences between the percentages, especially for the shorter forecast horizons. Although, as described above, it is hard to determine the significance level of the ranking, we are confident that, apart from BC versus RS, the percentages are significantly different.

7.9 Rejection Frequency

As we have seen in the previous section, it is difficult to determine the significance of the outcome. We know that in well-specified models the rejection frequency equals the significance level α. Table 10 shows, for different significance levels, the rejection frequency: the frequency with which the true value lies outside the empirical confidence interval. The confidence interval is determined by taking the α/2 and (1 − α/2) percentiles of the simulations.
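The sketch below (a minimal illustration with hypothetical array names; the actual simulated yields come from the Monte Carlo procedure described earlier) shows how such an empirical interval and the resulting rejection frequency can be computed.

```python
import numpy as np

def rejection_frequency(simulated, realized, alpha=0.05):
    """simulated: (T, N) array with N simulated yields per forecast moment;
    realized: (T,) array of realized yields. Returns the share of moments
    at which the realized yield falls outside the empirical (1 - alpha) interval."""
    lower = np.percentile(simulated, 100 * alpha / 2, axis=1)
    upper = np.percentile(simulated, 100 * (1 - alpha / 2), axis=1)
    rejected = (realized < lower) | (realized > upper)
    return rejected.mean()

# Hypothetical example: 120 forecast months, 1000 simulations each
rng = np.random.default_rng(0)
sims = rng.normal(loc=5.0, scale=0.5, size=(120, 1000))
real = rng.normal(loc=5.0, scale=0.5, size=120)
print(rejection_frequency(sims, real, alpha=0.10))   # ideally close to 0.10
```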


In Section 7.2 we have seen that the JRS method has the smallest standard deviation of the four procedures and that CD gives the largest. Similar results can be seen in Table 10. As JRS produces a narrower confidence interval, its rejection frequency is higher than for the other procedures, and especially for longer forecast horizons the rejection frequency becomes quite high.

Based on our research, we can say that the CD procedure under-rejects and the JRS procedure over-rejects at all forecast horizons. For the two other procedures, both the forecast horizon and the significance level play a role: if we increase the forecast horizon, the rejection frequency increases for both procedures. At the 90% level and the 1-month horizon, both BC and RS under-reject, but at the 99% level both over-reject. For forecast horizons beyond 1 month, both procedures over-reject at (almost) all significance levels.

Comparing the two models, DL and DAS, we see that for the BC and RS procedures the rejection frequency of the DL model is higher at all horizons and significance levels. For the CD and JRS procedures it is the other way around: DAS produces higher rejection frequencies.

We can conclude that, based on the rejection frequencies, it is hard to see a clear pattern, except that JRS has the highest rejection frequencies and lies furthest from the expected rejection frequency, while CD under-rejects in both models at all horizons and significance levels. The latter is due to the high variance of the simulated yields: the interval becomes so wide that the true value lies inside the confidence interval almost all the time.


Chapter 8

Conclusion

In this thesis we have introduced an alternative procedure to forecast the yield curve. This procedure is an adjustment of the existing yield-curve forecasting model of Diebold and Li, which uses the curve fitting of Nelson and Siegel. We first explained the underlying model and elaborated on its implications, then examined the in-sample properties, and finally investigated the forecasting abilities of our proposed procedures.

The dataset we used consists of maturity-dependent end-month yields over a period of 31 years. The last 10 years are used to study the out-of-sample forecasting abilities of the proposed models. The proposed procedures should capture the distribution of the residuals better than the normal distribution. The JRS procedure additionally takes into account the irregularity of the correlation in the tails between the residuals of the individual models.

To get back to the research question of this thesis: does the forecast procedure improve when using resampled residuals? Consolidated over all options, the Joint Residual Resampling forecast procedure results in forecasts that are, based on the Dawid-Sebastiani scoring rule, better than those of the three other procedures we investigated. The normal resampling procedure performs only marginally better than the base case, so we do not regard that as a real improvement. The Dawid-Sebastiani scoring rule ties the forecast error to the (forecasted) standard deviation from our Monte Carlo simulations.

It turns out that including only the correlation in the error terms results in less accurate forecasts: the CD procedure has the least desired out-of-sample forecasting abilities. If we zoom in on the possible choices for the forecasts, the forecast horizon shows the most distinct differences in the performance of the JRS procedure. Up to 6 months ahead, JRS clearly performs best, but for the 12-month forecast the JRS procedure is not our preference. For all window lengths the advantage of the proposed procedure is similar, and it applies to all maturities. These results hold in both the Diebold-Li model and the Dynamic Adjusted Nelson Siegel Svensson model.

The difference in performance across forecast horizons is also reflected in the rejection frequency. If we increase the forecast horizon, the rejection frequencies of all procedures increase and are, apart from the CD procedure, above the desired values. This suggests that when forecasting the yield curve at a longer horizon, the DL model or DAS model might not be the best choice.

As possible follow-up research, it could be investigated whether the advantages we have seen when using resampled residuals also apply to other forecast methods within the Nelson-Siegel framework. The use of the Kalman filter in the estimation procedure is one suggestion. Models in which the decay parameter is treated as time-varying could also be of interest.

To conclude, we think that especially the JRS procedure produces forecasts that reflect the underlying distribution of the yields better than the normal distribution. This results in equal point forecasts, but more certainty and a lower standard deviation. These results hold across all calibrations, and for shorter forecast horizons in particular.


References

Bjork, T. and B. Christensen (1999), ”Interest rate dynamics and consistent forward rate curves”, Mathematical Finance, 9 (4), 323-348.

Black, F. and P. Karasinski, (1991), "Bond and Option Pricing when Short-rates are Lognormal", Financial Analyst Journal, 47 (4), 52-59.

Bliss, R. R., (1997), ”Testing Term Structure Estimation Methods”, Advances in Futures and Options Research, 9, 197-231.

Bolder, D., (2001), ”Affine Term-Structure Models: Theory and Implementation”, Bank of Canada, Working Paper, 2001-15.

Bomfim, A. N., (2003), ”Monetary Policy and the Yield Curve”, FEDS Working Paper, 2003-15.

Brace, A., D. Gatarek and M. Musiela, (1997), "The Market Model of Interest Rate Dynamics", Mathematical Finance, 7 (2), 127-147.

Brennan, M. and E. S. Schwarz, (1982), ”An Equilibrium model of Bond Pricing and a Test of Market Efficiency”, Journal of Financial and Quantitative Analysis, 17(3), 301-329.

Caldeira, J. and H. Torrent, (2013), ”Forecasting the U.S. Term Structure of Interest Rates using Nonparametric Functional Data Analysis”, Working paper series.

Christensen, J. H. E., F. X. Diebold and G. D. Rudebusch, (2010), "The Affine Arbitrage-Free Class of Nelson-Siegel Term Structure Models", Journal of Econometrics, 164(1), 4-20.

Coroneo, L., K. Nyholm and R. Vidova-Koleva, (2008), ”How Arbitrage-Free is the Nelson-Siegel Model?”, European Central Bank, Working paper series, 874.
