• No results found

Interrelations in the climate system : Comparison of a vector autoregressive model, a generalized additive model and multivariate adaptive regression splines

N/A
N/A
Protected

Academic year: 2021

Share "Interrelations in the climate system : Comparison of a vector autoregressive model, a generalized additive model and multivariate adaptive regression splines"

Copied!
36
0
0

Bezig met laden.... (Bekijk nu de volledige tekst)

Hele tekst

(1)

Bachelor’s Thesis Econometrics

Interrelations in the climate system

Comparison of a vector autoregressive model, a generalized

additive model and multivariate adaptive regression splines

Petra van Meel

Student number: 10640916 Date of final version: June 28, 2016 Bachelor’s programme: Econometrics

Supervisors: M. Hennequin and H. Li

Faculty of Economics and Business University of Amsterdam

(2)

i Hierbij verklaar ik, Petra van Meel, dat ik deze scriptie zelf geschreven heb en dat ik de volledige verantwoordelijkheid op me neem voor de inhoud ervan. Ik bevestig dat de tekst en het werk dat in deze scriptie gepresenteerd wordt origineel is en dat ik geen gebruik heb gemaakt van andere bronnen dan die welke in de tekst en in de referenties worden genoemd. De Faculteit Economie en Bedrijfskunde is alleen verantwoordelijk voor de begeleiding tot het inleveren van de scriptie, niet voor de inhoud.

(3)

Contents

1 Introduction 1

2 Climatological variables and their couplings 3 2.1 Climatological variables . . . 3 2.2 Couplings between climatological variables . . . 4 3 Theory on GAM and VAR models 6 3.1 Vector autoregressive models . . . 6 3.2 Generalized additive models . . . 7 4 ODP dataset and methodology 9 4.1 ODP site 1264 . . . 9 4.2 Data analysis . . . 10 4.3 Methodology . . . 13 5 VAR and GAM results and analysis 16 5.1 VAR results and analysis . . . 16 5.2 GAM results and analysis . . . 18 6 Performance comparison of VAR, GAM and MARS 22 6.1 Model fit of VAR and GAM . . . 22 6.2 Predictive performance of VAR, GAM and MARS . . . 23

7 Conclusion 25

Bibliography 27

Appendices i

(4)

Chapter 1

Introduction

At the December 2015 United Nations climate conference in Paris harsh measurements to limit the rise in global temperatures are agreed upon. This indicates that global climate change is taken seriously. In the current debate on global warming, many scientists believe that global warming is mostly due to human influences. They mention greenhouse gasses and other anthro-pogenic forces as key causes for the warming of the planet. Wigley and Raper (2001) state that the temperature is going to increase with 1.7◦C to 4.9◦C from 1990 to 2100 (90% confidence interval) if the emission of greenhouse gasses is not reduced. This implies that the temperature would rise with 0.3◦C on average per decade if no action is taken. Droughts, extinction of species and sea level rise caused by a retreat of glaciers are just a few of the consequences of this increase in temperature. According to Karl and Trenberth (2003) however, natural forcings such as insolation changes and volcanic eruptions play a huge role in the rise of temperature as well. Hence, disagreement about the causes and effects of global warming is present. Karl and Trenberth (2003) emphasise for example that the exact effects of anthropogenic global warming on the climate system are still unknown.

Since it is important to reduce the uncertainty about climate changes in the future, it is of interest to investigate the long-term dynamical relations between climatological variables. The aim of this study is to construct a suitable model for the dependencies between the climatological time series variables insolation, carbon isotope and oxygen isotope by comparing performances of linear and nonlinear models. Diks and Mudelsee (2000) find that variables of the climate system are interrelated and that couplings between them are nonlinear. On the other hand, Attanasio, Pasini and Triacca (2012) argue that, due to the central limit theorem, averaging can result in near-linear relations between climatological variables. Aforementioned findings motivate the use of both linear and nonlinear models to investigate the dependencies in the climate system. The model fit and predictive performance of these models are compared in order to find the best suitable model for the couplings between climatological variables.

In order to construct a suitable model for the dependencies between climatological vari-ables, a literature review on climatological variables is conducted first. The couplings between them and the nature of these couplings are investigated, resulting in a number of hypotheses. Thereafter, the data obtained from Ocean Drilling Program (ODP) site 1264 is introduced.

ODP is a program which obtains the climatological time series variables sedimentary

(5)

CHAPTER 1. INTRODUCTION 2 gen isotope and sedimentary carbon isotope by drilling in the bottom of the ocean. These variables are reliable recorders of global ice volume and deep ocean circulation, respectively. Moreover, since solar radiation is also one of the main factors that influence the temperature (Diks & Mudelsee, 2000), insolation is included as climatological variable as well. The time series variables contain 2032 observations which cover the past five million years. Due to the fact that sedimentation is a nonconstant process, the time series are unevenly spaced. This problem is mitigated by dividing the dataset into four distinct time periods. Since Diks and Mudelsee (2000), Hennequin (2012) and Vogel (2012) use roughly the same four time periods in their analysis of climatological variables, splitting up the dataset has the additional advantage that the results obtained in this study can be compared to their results. After splitting up the dataset, the characteristics of the data in these periods are investigated and models for each of the four time periods are constructed.

To investigate the linear couplings between climatological variables, a vector autoregressive (VAR) model is estimated. This multivariate model takes the possible endogeneity of the climatological variables into account. To allow for nonlinearities, a generalized additive model (GAM) is estimated. This semiparametric model uses information from the data to construct the functional form, instead of parametric relations being specified by the user. After the models are computed, both their predictive performance and goodness of fit are compared in order to determine the best climate model. Both models are also compared with multivariate adaptive regressions splines constructed on the same dataset.

The remainder of this study is organised as follows. Chapter 2 contains a literature review on climatological variables and dynamical relations between them in the long run. Next, theory on vector autoregressive and generalized additive models is described in Chapter 3. There-after, Chapter 4 describes and analyses the ODP site 1264 dataset and gives the empirical methodology. The results of the linear and nonlinear models are then presented in Chapter 5. Subsequently, comparisons of the model fit and predictive performance of the models are presented in Chapter 6. Lastly, Chapter 7 draws a conclusion from the results.

(6)

Chapter 2

Climatological variables and their

couplings

In order to gain insights into factors which influence the temperature and their interrelation-ships, this chapter presents results from earlier research on the climate system. First, Section 2.1 describes climatological variables and their relationships with temperature. Subsequently, Section 2.2 elaborates on the mutual relationships between climatological variables and the na-ture of these relationships. The findings in this chapter eventually lead to hypotheses on the dependencies in the climate system.

2.1

Climatological variables

This section provides an overview of climatological variables. One of these variables is insolation, which is the incoming solar radiation on a certain surface area of the Earth. Insolation depends on orbital changes of the Earth (Berger & Loutre, 1991) and the angle of incidence of the sun’s rays on Earth (Khavrus & Shelevytsky, 2010). Berger and Loutre (1991) explain that variations in the orbital plane of the Earth influence the mean Sun-Earth distance. They reason that since insolation is obviously affected by this distance, insolation depends on fluctuations in the orbital plane of the Earth. Moreover, the angle between the Earth’s surface and the Sun determines the surface over which the solar energy is spread and therefore the solar radiation at a certain latitude on Earth (Khavrus & Shelevytsky, 2010). Many studies find a link between insolation and temperature. Imbrie and Imbrie (1980) for example, state that variability in insolation due to fluctuations in the orbit of the Earth influences the temperature. Khavrus and Shelevytsky (2010) support this statement, they argue that the larger the angle between the Sun and the surface of the Earth, the higher the temperature on Earth. Shindell, Rind, Balanchandran, Lean and Lonergan (1999) also state that solar radiation is of large influence on the global climate system. Insolation is therefore regarded as an important climatological variable.

Secondly, the variable oxygen isotope is considered. The oxygen isotope δ18O is a reliable recorder of global ice volume and water temperature. Mudelsee and Stattegger (1994) argue that an increase of δ18O implies a decrease in water temperature or increase of glaciation. They therefore state that, under the assumption of an approximately constant oceanic deep water temperature, the oxygen isotope is an important climatological variable. Although Diks and Mudelsee (2000) do not specifically mention the assumption stated above, they support the

(7)

CHAPTER 2. CLIMATOLOGICAL VARIABLES AND THEIR COUPLINGS 4 statement that the oxygen isotope is an important climatological variable. Thompson et al. (1995) and Fricke and O’Neil (1999) also find the existence of a globally significant relationship between the oxygen isotope and temperature. The latter two authors find differences in the strength of these relationships and argue that this is due to fluctuations of climate mode-specific factors on the oxygen isotope. Based on the findings above, δ18O can be seen as a proxy for the temperature and is thus a significant climatological variable.

The third variable, the carbon isotope δ13C, is a reflection of the strength of formation of North Atlantic Deep Water (NADW). According to Raymo, Hodell and Jansen (1992), it is used to map patterns of deep ocean circulation. They find that a decrease of global temperature leads to an increase of suppression of NADW formation. Grossman and Ku (1986) obtain the same result: they evidence that a decrease in temperature by 1◦C leads to an increase of δ13C by 0.11%. Since Diks and Mudelsee (2000) also consider the carbon isotope as an important climate influencer, δ13C is included as the third climatological variable.

The last variable considered is dust flux. Dust flux indicates the speed of deposition of sedi-ment and is therefore a measure of fluctuations in aridity (Tiedemann, Sarnthein & Shackleton, 1994). A significant positive correlation between dust flux and global temperature during glacial periods is discovered by Lambert et al. (2008). Their results point out that this correlation becomes stronger for lower temperatures. On the other hand, they find that this correlation is lacking during interglacial periods. Although it seems that the relationship between dust flux and global temperature is lacking in certain climates, the dust flux variable is still a relevant climatological variable.

In short, previous research suggests four important climatological variables: insolation, dust flux, δ18O and δ13C. The remainder of this study concentrates on these variables. The next sec-tion further investigates the existence of interrelasec-tions between the four climatological variables and their nature.

2.2

Couplings between climatological variables

Research on the dynamical relationships between climatological variables based on nonlinear correlation measures by Diks and Mudelsee (2000) indicates that the coupling between insola-tion, δ18O and δ13C has been increasing over the past five million years. They conclude that

this development is predominantly caused by the increased coupling between δ18O and δ13C. Furthermore, they demonstrate that δ18O is a Granger cause of δ13C in some time periods. They also conclude that (in some time periods) insolation is a Granger cause of the two clima-tological isotopes. This last finding is supported by Tiedemann et al. (1994). They state that changes in insolation due to fluctuations in the orbit of the Earth are a driving force of the δ18O record. In accordance with the results of Diks and Mudelsee (2000), Edwards et al. (2000) find a coupling between δ18O and δ13C. They therefore suggest to use the two variables together in

order to investigate climate changes. Lastly, a relationship between dust flux and δ18O is found by Kolla, Biscaye and Hanley (1979). They argue that deposition of sediment, and thus dust

(8)

CHAPTER 2. CLIMATOLOGICAL VARIABLES AND THEIR COUPLINGS 5 flux, is influenced by fluctuations in ice volume. The relationship found between dust flux and δ18O is particularly strong in the last glacial period.

So far, it has been found that climate variables are often related to each other. The remainder of this section further investigates the nature of these interrelationships. From the literature it appears that the climate system is most often modelled by global climate models. These models take the physics, chemistry and biology of the planet and atmosphere into account and allow for interrelations with for example the Sun and volcanic eruptions (Karl & Trenberth, 2003).

Another frequently used method, for example by Pasini, Lor`e and Ameli (2006), is the application of nonlinear models. Since these models lead in general to the same results as the global climate models, the relationships between climatological variables are predominantly assumed to be nonlinear. The statement of nonlinearity of the climate system is confirmed by Diks and Mudelsee (2000). By using the conditional marginal redundancy as test statistic for Granger causality, they find nonlinear couplings between climatological variables.

Despite the widely assumed nonlinearity of the climate system, Attanasio et al. (2012) state that (due to the central limit theorem) averaging of the data can cause the relations between variables of the climate system to be nearly linear. They show that a linear predictive method is able to make accurate temperature predictions. Based on these findings, they argue that a linear model can provide reliable estimates of causal links in the climate system.

From the literature it appears that still some uncertainty about the climate system exists. Although many studies find evidence that the variables insolation, dust flux, δ18O and δ13C influence the temperature, the exact interrelationships of these climatological variables and the nature of these relationships are still unclear. This can be further investigated by comparing re-sults of linear vector autoregressive (VAR) and nonlinear generalized additive models (GAMs). Since most research shows that the interrelationships between climatological variables are non-linear, a hypothesis in this study is that the predictive performance and model fit of GAMs are better than those of VAR models. Furthermore, earlier research shows that δ18O is a Granger cause of δ13C. Another hypothesis is therefore that δ18O is able to explain fluctuations in δ13C.

Moreover, changes in the orbital plane of the Earth and the angle of incidence of the sun’s rays on Earth are assumed to be the main influences on insolation. Consequently, it is expected that δ18O and δ13C can hardly explain changes in insolation. Before testing these hypotheses, the next chapter describes theory on both a linear vector autoregressive model and a nonlinear generalized additive model.

(9)

Chapter 3

Theory on GAM and VAR models

It appears from the previous chapter that the couplings between climatological variables are in general considered to be nonlinear. However, since Attanasio et al. (2012) argue that linear methods also provide good results, it may be of interest to investigate linear couplings as well. It is thus convenient to use both a linear and nonlinear model on the ODP data in this study. Before applying the models, theory hereof is presented. Section 3.1 focusses on linear vector autoregressive models and mentions the possibility of using autoregressive moving average (ARMA) models for the squared innovations 2t in case of time varying variance. Hereafter, Section 3.2 describes theory on generalized additive models.

3.1

Vector autoregressive models

This section discusses a linear model for the time series of (possibly endogenous) variables. Heij, De Boer, Franses, Kloek and Van Dijk (2002, p. 656) state that the use of univariate linear models leads to inconsistent estimates of the parameters in case variables are endogenous. The authors therefore suggest to use multivariate models rather than univariate ones in case of possible endogeneity. These multivariate models are called vector autoregressive models and can be efficiently estimated by maximum likelihood. The maximum likelihood estimators which are hereby obtained have the usual asymptotic statistical properties in case of a stationary VAR process (Heij et al., 2004, p. 661-662). The VAR model with p lags for m variables is given by Yt= α + θ1Yt−1+ θ2Yt−2+ ... + θpYt−p+ t, t∼ IID(0, Ω), (3.1)

where Ytis the m × 1 vector of variables, α is the m × 1 vector of constants and tis the m × 1

vector of innovations. θj (j = 1, ..., p) and Ω are squared matrices of order m of autoregression

coefficients and covariances of the innovations.

Normally, it is assumed that all innovations t have the same variance σ2. However, it is

possible that the conditional variance σ2t = var(yt|yt−1, yt−2, ...) contains lagged effects and thus

depends on the past (Heij et al., 2004). If a time series has time periods with large volatility as well as time periods with small volatility, the time series shows clustered volatility. In this case, it is appropriate to use generalized autoregressive conditional heteroskedasticity (GARCH) models. These models try to imitate clustered volatility by modelling the series 2

t as ARMA

models. Since the use of a suitable model for the variance has the advantage that it induces

(10)

CHAPTER 3. THEORY ON GAM AND VAR MODELS 7 more accurate forecast intervals, it is important to use VAR-GARCH instead of VAR models if the time series shows clustered volatility.

3.2

Generalized additive models

In the previous section a parametric technique in which the regression effects need to be linear has been discussed. However, since most research shows that the couplings between climatolog-ical variables are nonlinear, a semiparametric method which could incorporate these nonlinear couplings is now discussed. An advantage of semiparametric techniques is that there is no need to specify detailed parametric relations: instead of making many assumptions upon the func-tional form of the explanatory variables as in the parametric case, they mainly use information from the data to construct the functional form. Semiparametric methods are thus especially useful when the way in which variables are coupled is unknown. This section focusses in par-ticular on semiparametric generalized additive models. Later in this study, GAMs are used to identify nonlinear effects between climatological time series variables.

According to Wood (2006), a GAM is a generalized linear model in which the linear predictor is replaced by an additive predictor. This implies that in a GAM, the mean of a dependent variable is via a link function connected to the sum of nonparametric smooth functions of explanatory variables (partial regression functions). The only underlying assumptions on the functional form of GAMs are thus that the partial regression functions should be additive and smooth. If yi represents the dependent variable, xj (j = 1, 2, ..., m) represents the jth

explanatory variable and fj(·) represents a nonlinear smooth function of the jth explanatory

variable, and if g(·) is defined as the link function which relates the dependent variable to an additive function of m explanatory variables, a GAM can be defined as

g(E[yi]) = α + f1(x1i) + f2(x2i) + ... + fm(xmi). (3.2)

Note that (3.2) gives a very simple specification of a GAM. Trends or interaction terms of ex-planatory variables for example could be easily incorporated in the model as well. Furthermore, note that not all partial regression functions fj(·) need to be nonlinear. Moreover, emphasis is

put on the fact that yi in (3.2) may follow any distribution from the exponential family.

The purpose of GAMs is to fit the partial regression functions fj(·) to the underlying true

function to the best extent possible. In order to achieve this, the partial regression functions consist of linear combinations of several basis functions (Wood, 2006). If bi(·) represents the ith

basis function and αi is an unknown parameter, a partial regression function f (·) made up of

m basis functions can be specified as

f (x) =

m

X

i=1

αibi(x). (3.3)

Since the partial regression functions are required to be smooth, the first two derivatives of the resulting function should be continuous. The smooth partial regression functions can be constructed by using a backfitting algorithm, which is mainly based on a scatterplot smoother

(11)

CHAPTER 3. THEORY ON GAM AND VAR MODELS 8 (Hastie & Tibshirani, 1986). For additive models, this backfitting algorithm always converges to functions of the form (3.2) by minimising

X i  yi−Pjfj(xji) 2 +X j λj Z fj00(tj)2dtj. (3.4)

This criterion thus incorporates both the goodness of fitP

i



yi−Pjfj(xji)

2

to the data and the amount of nonlinearity R fj00(tj)2dtj of the partial regression functions. It appears from the

formula that improvement of the smoothness of the partial regression functions can only be achieved by sacrificing the fidelity to the data (Wood, 2006). The smoothing parameter λj

de-termines the degree of smoothing: the higher the value of λj, the smoother the partial regression

function fj(·). The penalty to ensure a smooth model fit is imposed to avoid overfitting.

Instead of (3.4), the negative penalised log likelihood should be minimised for other gener-alized additive models (Wood & Augustin, 2002). Minimisation of the negative penalised log likelihood can be achieved by backfitting in combination with a maximum likelihood algorithm.

In this chapter a linear as well as a nonlinear regression method have been explored. These methods provide, in combination with the theory discussed in Chapter 2, the basis to estimate explicit models for the dependencies between climatological time series variables. Before doing so however, the ODP data and the empirical methodology are presented in Chapter 4.

(12)

Chapter 4

ODP dataset and methodology

This chapter introduces the dataset and presents the empirical methodology. Section 4.1 ex-plains how the data is obtained and operationalises the climatological variables. Section 4.2 discusses thereafter the problem of unevenly spaced time series and two possible methods to mitigate this problem. Furthermore, the data is analysed to gain insights into their character-istics. Finally, Section 4.3 contains the empirical methodology.

4.1

ODP site 1264

The data used in this study is obtained from the Ocean Drilling Program (ODP) site 1264 (Bell, Jung, Kroon, Lourens & Hodell, 2014) and covers the past 5000 ka (1 ka is defined as 1000 years). The program, which is located on the northern flank of Walvis Ridge, drills in sedimentation layers on the bottom of the eastern part of the South Atlantic. The sediment fractions which are hereby obtained, contain data on the important climatological variables δ18O and δ13C. The observed time series consists of 2032 observations in total. Furthermore, the time series of insolation at 15◦S is incorporated in the dataset (Berger & Loutre, 1991). Since the ages of the insolation data differ from the ones of δ18O and δ13C, linear interpolation of the insolation time series is used in order to match the insolation ages to the isotope ages. This section continues with the operationalisation of the time series variables.

Firstly, insolation is defined as the incoming solar radiation on 15◦ south of the Earth’s equatorial plane. Secondly, the oxygen isotope (δ18O ) is a reliable recorder of global ice volume and water temperature and is defined as the ratio of18O/16O with respect to the Vienna Pee Dee Belemnite (VPDB) reference standard. δ18O is thus defined as

δ18O =

18O

16O sample− 18O16O VPDB 18O

16O VPDB

!

x 1000. (4.1)

Thirdly, the carbon isotope (δ13C ) is a reflection of the strength of North Atlantic Deep Water.

The constructions of δ13C and δ18O are alike, with the difference that the ratio for δ13C is

13C/12C.

Note that data of the variable dust flux is not available. This should be taken into account when interpreting the results in Chapter 5 since omitted variables can lead to biased estimates.

(13)

CHAPTER 4. ODP DATASET AND METHODOLOGY 10

4.2

Data analysis

This section provides an overview of a number of properties of the data. First, two possible methods to address the problem of unevenly spaced time series are mentioned and their benefits and drawbacks are discussed. Thereafter, the dataset is split into four distinctive time periods and the descriptive statistics of insolation, δ18O and δ13C in these four time periods are reviewed. Then the Augmented Dickey-Fuller test is performed to check for stationarity of the time series. In order to collect the data, measurements in the sediment layers are made with intervals of 2.5 cm. The problem herewith is that the deposition of sediment differs over time, whereby the spacing of observation times is not constant. The time series are thus unevenly spaced, which could lead to biased estimates and incorrect conclusions. It is therefore of great importance to resolve the issue of unequally spaced time series. The next paragraphs discuss two methods which can mitigate this problem.

The first method uses interpolation to transform the data into equally spaced time series. Tiedemann et al. (1994) performs for example a Gaussian weighting interpolation. A drawback of performing interpolation methods is that non-existing observations are generated. Obviously, it is better to work with real observations (if possible) rather than with generated ones, since generation of observations can lead to an increase in correlation between observations. This may cause the results to be different than results that would have been obtained when using only real observations.

The second possible method splits the dataset in non-overlapping climatic periods. Diks and Mudelsee (2000) apply this method. They split their ODP 659 dataset in four periods, each of which with a different climatic regime (Period IV is in the early Pliocene, period III is in the late Pliocene, period II is in the early Pleistocene and period I is in the Pleistocene ice ages). They argue that since the coefficient of variation of the time spacing is lower than 50% in each period, a large part of the problem of uneven time spacing is solved by dividing the dataset in these four time periods. Hennequin (2012) critically notes that one should keep in mind that the estimated models for the four periods are not easy to compare. A reason for this is for instance that the number of observations differ in each of the four periods.

Because of the drawback of the interpolation method and the fact that the dynamical rela-tions differ in the four periods as evidenced by Diks and Mudelsee (2000), it is chosen to follow their approach and use the dataset splitting method. The chosen time periods are as follows: period IV runs from 5000.6 to 4050.0 ka ago, periods III and II consist of the observations between 3599.1 to 2590.3 and 2587.2 to 892.6 ka ago and period I started 888.2 ka ago and runs up to 1950. Periods IV, III, II and I contain 821, 400, 475 and 174 observations, respectively. The coefficient of variation of the time spacing in the whole dataset is 73%. The coefficients of variation in the different time periods are, in order from the oldest to the youngest time period, 61%, 56%, 36% and 36%. Thus, splitting the dataset in aforementioned time intervals mitigates the problem of unequal time spacing. This is especially the case for the last two time periods. As mentioned above, care needs to be taken when comparing the models in the different time

(14)

CHAPTER 4. ODP DATASET AND METHODOLOGY 11 periods because of the large differences in observations per time period.

Although the age boundaries differ a bit from the ones used by Diks and Mudelsee (2000) (due to large variations in time spacing from 4050 to 3600 ka ago and the fact that the data in this study consists of about twice the number of observations of Diks and Mudelsee (2000)), the ODP 1264 dataset is split into roughly the same four time periods. By doing so, the results in this study can be compared with the findings of Diks and Mudelsee (2000), Hennequin (2012) and Vogel (2012). For period IV, the comparison is slightly more difficult than for the other periods. Period IV runs from 5000 to 4050 ka ago in this study, while the time boundaries in respect of this period in the studies mentioned above are 5000 and 3583 ka ago.

After splitting up the dataset into different time periods, the properties of the time series in the four periods can be analysed. Table 4.1 shows the descriptive statistics and correlation matrices of periods IV, III, II and I. It appears that the means of insolation and δ13C are relatively constant in the four periods, while the mean of δ18O increases over time. Furthermore, the variance of all variables increases over time (with a decrease in variance of insolation in period I as the only exception). This supports the statement of Diks and Mudelsee (2000) in respect of an increase in climate variability over time. Moreover, the skewness and kurtosis of insolation and δ18O do not show a clear pattern, while those of δ13C do. The skewness of δ13C increases from -1.03 towards 0, which indicates that the distribution of δ13C becomes more symmetrical over time. In addition, the kurtosis of δ13C decreases over time but remains positive in all four periods.

Table 4.1: Descriptive statistics and correlation matrices of insolation, δ18O and δ13C

Variable Mean Variance Skewness Kurtosis Correlations Period IV Insolation δ18O δ13C Insolation 455.99 262.12 0.10 -0.39 1.0000 δ18O 3.21 0.01 0.18 2.25 -0.0987 * 1.0000 δ13C 0.70 0.03 -1.06 2.52 0.0872 * 0.0171 1.0000 Period III Insolation δ18O δ13C Insolation 455.33 358.34 0.19 -0.62 1.0000 δ18O 3.39 0.05 -0.19 1.60 -0.0081 1.0000 δ13C 0.64 0.04 -0.68 0.92 -0.1029 * -0.0615 1.0000 Period II Insolation δ18O δ13C Insolation 456.25 455.40 0.14 -0.56 1.0000 δ18O 3.97 0.08 0.10 -0.35 0.0627 1.0000 δ13C 0.76 0.05 -0.52 0.75 0.0088 -0.0510 1.0000 Period I Insolation δ18O δ13C Insolation 455.84 334.16 0.08 -0.68 1.0000 δ18O 4.28 0.12 -0.38 0.67 0.0556 1.0000 δ13C 0.57 0.07 -0.11 0.14 -0.1829 * -0.4441 * 1.0000

This table shows the mean, variance, skewness and kurtosis of insolation, δ18O and δ13C in periods IV, III,

II and I. It also presents the correlation matrices of insolation, δ18O and δ13C in all time periods. ’*’ indicates that the correlation is significant at the 5% level.

(15)

CHAPTER 4. ODP DATASET AND METHODOLOGY 12 The correlation matrices show that the correlations between insolation and δ18O and inso-lation and δ13C are significant at a 5% level in period IV, while in period III only a significant correlation between insolation and δ13C exists. Remarkably, no significant correlation is present in period II. In period I, correlations between insolation and δ13C and δ18O and δ13C are signif-icant. Only the correlation between insolation and δ13C is significant in all time periods (except for period II). Strikingly, the correlation between these two variables is negative in period IV, while it is positive in periods III and I. However, when interpreting these correlations between the variables, it should be kept in mind that correlation is only a linear measure. It is possible that the correlation between two variables is insignificant, while a strong nonlinear relationship is present. Still, correlation provides a first impression of couplings between variables.

Since time series data is used, it is important to determine whether or not the time series are stationary. A deterministic term should be included in the model (in case of a deterministic trend) or the data should be differenced (in case of a stochastic trend) if the time series is nonstationary (Heij et al., 2004, p. 581). In order to test for stationarity, the Augmented Dickey-Fuller (ADF) test is performed. In case a deterministic trend term is excluded from the test equation, it tests the null hypothesis of a random walk (unit root) with drift against the alternative hypothesis of a stationary time series. However, if the deterministic trend term is included, the alternative hypothesis is that the time series is trend stationary.

Figure 4.1: Time series of insolation, δ18O and δ13C

Time series of the climatological variables. The horizontal axis indicates the age, and insolation, δ18O (d18O) and δ13C

(d13C) are presented on the vertical axis.

Figure 4.1 shows that there is only a clear upward pattern in the time series of δ18O in periods III and II, a deterministic trend term is therefore only included in the ADF test equations for these time series. Furthermore, since the time series do not fluctuate around zero, a constant is included for all variables in all periods. The lag lengths are based on the Schwartz information criterion (SIC). The null hypothesis of the ADF tests (Appendix I) is rejected at a significance level of 1% in all cases, which means that there exists enough statistical evidence to conclude that all time series, except for the trend stationary time series of δ18O in periods III and II, are stationary. To verify the results of the ADF test, a Phillips Perron test (which applies a Newey-West correction for serial correlation (Heij et al., 2004, p. 598)) is performed. The conclusions based on this test match the conclusions based on the ADF tests. The next section

(16)

CHAPTER 4. ODP DATASET AND METHODOLOGY 13 discusses how to cope with the deterministic trends in δ18O in periods III and II and describes how the dependencies in the climate system are modelled.

4.3

Methodology

This section presents the empirical methodology. The way in which the study is conducted is motivated and the steps which are taken in order to define the final GAMs and VAR models are described. Finally, this section provides an explanation on the comparison of the models.

Prior to estimating the models, one should determine how to take into account the deter-ministic trends in δ18O in periods III and II. These trends are probably caused by external effects (Bell et al., 2014). One possibility to account for the deterministic trends is to linearly detrend δ18O in both periods (Diks & Mudelsee, 2000) by running a linear regression of δ18O on age and subtract the trend (Heij et al., 2004, p. 579). Another possibility is to include age as explanatory variable in the models with dependent variable δ18O. Both methods are intuitive, but have the limitation that the deterministic trend is assumed to be linear. As the first method can be applied more consistently than the second in the two models explained later in this section, I choose to apply the first method. The linearly detrended time series of δ18O (det δ18O) in period II is computed as

δ18Ot= α + βage 1t+ εt, (4.2)

det δ18Ot= δ18Ot− ˆβage 1t, (4.3)

where age 1 = −(age - 2587.2). The minus ensures that age 1 increases with the relevant age steps and 2587.2 is subtracted from age such that age 1 equals zero for the first observation in period II. The time series of δ18O in period III is detrended analogously. ADF tests on det δ18O in periods III and II lead, as expected, to the conclusion that the detrended time series are stationary and thus that the deterministic trends are properly removed by the linear detrending method. Once the variables are linearly detrended in both periods, models for the dependencies in the climate system can be constructed.

First, the linear relations between the climatological variables δ18O, δ13C and insolation are investigated. Since earlier research shows that the variables are likely to be interrelated, I choose to use the linear multivariate VAR model discussed in Section 3.1. For each of the four time periods, a VAR model is estimated by using the vars package (Pfaff, 2008) in R (R Core Team, 2014). All three climatological variables are used as dependent variable and lags of insolation, δ18O and δ13C serve as explanatory variables. Due to the trend stationarity of δ18O in periods III and II, det δ18O is used in these periods.

The SIC is used to preselect an appropriate lag order. Based on various grounds, the start lag order chosen for all models is three. Firstly, for the ease of comparison of the models in the four periods, it is convenient to start with the same number of lags in each time period. Secondly, too high lag orders, and thus too many parameters, can cause the data to be overfitted. Although overfitting improves the model fit over the estimation sample, it reduces the out-of-sample

(17)

CHAPTER 4. ODP DATASET AND METHODOLOGY 14 forecast performance and this should be avoided (Heij et al., 2004, p. 576). Lastly, considering the use of GAMs later on, it is important to choose the lag order not too large, since good smooth functions can only be estimated for low lag orders (Buja, Hastie and Tibshirani, 1989). If Insol represents insolation and j in the subscripts t−j (j = 1, 2, 3) indicate lag numbers 1, 2

and 3, the VAR model with three lags can be specified as     Insolt δ18Ot δ13Ct     =     α1 α2 α3     + Θ1     Insolt−1 δ18Ot−1 δ13Ct−1     + Θ2     Insolt−2 δ18Ot−2 δ13Ct−2     + Θ3     Insolt−3 δ18Ot−3 δ13Ct−3     +     ε1t ε2t ε3t     , (4.4) where Θj =     θj11 θj12 θj13 θj21 θj22 θj23 θj31 θj32 θj33     , j = 1, 2, 3 and     ε1t ε2t ε3t     ∼ IID         0 0 0     ,     σ11 σ12 σ13 σ21 σ22 σ23 σ31 σ32 σ33         .

After all models (with three lags) are estimated, insignificant variables are deleted for the purpose of estimator efficiency; the standard errors of the parameter estimates are higher in a model that includes redundant variables in comparison with those of a model that does not contain redundant variables (Heij et al., 2004, p. 143-144). Backward elimination (Heij et al., 2004, p. 281) is used in order to exclude the insignificant variables. This method has, in contrast to forward selection, the advantage that the models are only misspecified if the full model, i.e. the model with three lags, is already misspecified. Backward elimination implies that the most insignificant variable (based on the t-statistic) is deleted from the model. The resulting VAR model is then estimated and the most insignificant variable is again deleted. This process is continued until a model with solely significant variables (5% significance level) remains. Once the final VAR models are computed, ARCH-LM tests are performed in order to determine whether it should be appropriate to model the squared innovations 2t with ARMA models. To test for the presence of serial correlation in the residuals, the Breusch-Godfrey LM test is performed.

Subsequent to estimating the linear VAR models for periods IV, III, II and I, a nonlinear estimation method is applied to the data. Semiparametric GAMs are chosen for this purpose for two reasons. As mentioned already in Section 3.2, the only underlying assumptions of GAMs are that the functions fj(·) should be additive and smooth. This provides an increase in efficiency

compared to parametric methods (Wood & Augustin, 2002). Another benefit of GAMs in particular, is that they are very useful for investigating marginal effects of individual variables on the dependent variable (Cameron & Trivedi, 2005). Marginal effects of explanatory variables on E[yi] can be visualised by plots of the estimated partial regression functions. Distinguishing

between effects of explanatory variables is much more difficult for some other nonparametric regression techniques, as for example neural networks (Gorr, 1994).

The GAMs are estimated in R by the gam package (Hastie, 2015). The parametric part of the model is fit by iteratively reweighted least squares (glm.fit ). Again, all three climatological variables are used as dependent variable and lags of insolation, δ18O and δ13C serve as

(18)

inde-CHAPTER 4. ODP DATASET AND METHODOLOGY 15 pendent variables. For the ease of comparison with the VAR models, three lags are included in the GAMs. If Insol represents insolation and j in the subscripts t−j (j = 1, 2, 3) indicate

lag numbers 1, 2 and 3, and fi(·) (i = 1, 2, ..., 9) represents a smooth function, the GAM with

dependent variable insolation can be specified as

g(E[Insolt]) = α1+ f1(Insolt−1) + f2(δ13Ct−1) + f3(δ18Ot−1) + f4(Insolt−2) + (4.3)

f5(δ13Ct−2) + f6(δ18Ot−2) + f7(Insolt−3) + f8(δ13Ct−3) + f9(δ18Ot−3).

The models with dependent variables δ13C and δ18O can be specified in the same manner. Again, backward elimination is used to exclude insignificant variables from the model. In-stead of t-tests, F -tests are now used for this purpose. The gam package splits the estimated effects in a parametric and a nonparametric part, a variable is therefore excluded from the model if both parts are insignificant at the 5% level. Backward elimination is continued until the final GAM model, which is a model with solely significant (5% level) variables, remains.

Once the models are computed and analysed, the GAMs and VAR models in the four periods are compared. First, the Akaike information criterion (AIC) is used to compare the goodness of fit of both models. Since it is important to be able to make accurate predictions on the climate system, the root mean squared error (RMSE) of both models is compared as well. The first 80% of the observations in each period are used as estimation sample and the last 20% of observations serve as hold-out sample. Estimated models on the estimation sample are then used to predict the dependent variable in the hold-out sample. Furthermore, the predictive performance of GAMs and VAR models is compared with the predictive performance of multivariate adaptive regression splines (MARS) based on the same ODP 1264 data. The next chapter presents the results of the models described in this section.

(19)

Chapter 5

VAR and GAM results and analysis

This chapter presents and analyses the estimation results of both a linear and nonlinear model. Section 5.1 and 5.2 show and analyse the estimated VAR models and GAMs for the four time periods described in the previous chapter.

5.1

VAR results and analysis

This section presents and analyses the VAR models for all time periods with dependent variables insolation, δ13C and δ18O.

Firstly, the estimated VAR models with dependent variable insolation (Table 5.1) are dis-cussed. The results indicate that insolation almost entirely depends on itself, which supports the hypothesis that insolation is mainly influenced by changes in the orbit of the Earth and the angle of incidence of the sun’s rays on Earth and not by climatological isotopes. The only ex-ception is the significant effect of the third lag of δ13C on insolation in period III. However, the 5% significance level of this lag is low in comparison with the 0.1% significance level of the lags of insolation. Furthermore, lags of δ13C are not significant in the other periods. Consequently, the significant effect in period III might be caused by chance. Another possible explanation for this counterintuitive significant effect is that another variable, which is not included in the model, influences both δ13C and insolation. It may therefore seem as if δ13C affects insolation, while in fact the other variable influences insolation. The findings on insolation are (over and above the aforementioned exception) in line with results of Diks and Mudelsee (2000), since they find no evidence of δ18O and δ13C being Granger causes of insolation in the four periods. Secondly, the estimated VAR models with dependent variable δ13C (Table 5.2) are consid-ered. δ13C depends mainly on itself and insolation. In each period, two or three lags of δ13C positively affect δ13C. Furthermore, δ13C is affected by insolation in periods III and I. In addi-tion, δ13C is influenced by δ18O in the two middlemost periods. The results are in accordance with results of Diks and Mudelsee (2000), since they find that insolation is a Granger cause of δ13C in period I and that δ18O is a Granger cause of δ13C in the last three periods. However, as the estimated coefficients of δ18O differ in sign and one of the effects is only significant at the 5% level, the significance might be caused by chance.

(20)

CHAPTER 5. VAR AND GAM RESULTS AND ANALYSIS 17 Table 5.1: Estimated VAR models with dependent variable insolation

Period IV Period III Period II Period I Constant 65.15264 160.1289 323.8913 1010.1776 [2.7520] [16.1612] [22.4130] [70.9124] Insol(-1) 1.7532 *** 1.4496 *** 1.0129 *** -0.2851 *** [0.0156] [0.0479] [0.0456] [0.0749] Insol(-2) -0.8961 *** -1.1189 *** -0.8888 *** -0.6804 *** [0.0156] [0.0671] [0.0508] [0.0569] Insol(-3) - 0.3116 *** 0.1659 *** -0.2501 *** [0.0483] [0.0455] [0.0743] δ13C(-1) - - - -δ13C(-2) - - - -δ13C(-3) - 4.4083 * - -[2.1757] δ18O(-1) - - - -δ18O(-2) - - - -δ18O(-3) - - -

-Table 5.2: Estimated VAR models with dependent variable δ13C

Period IV Period III Period II Period I

Constant 0.31043 0.9365 -0.1527 -1.1156 [0.0313] [0.3031] [0.1439] [0.4463] Insol(-1) - -0.0023 ** - -[0.0007] Insol(-2) - 0.0017 * - 0.0029 *** [0.0007] [0.0010] Insol(-3) - - - -δ13C(-1) 0.1917 *** 0.1801 *** 0.2306 *** 0.3842 *** [0.0345] [0.0490] [0.0455] [0.0740] δ13C(-2) 0.1900 *** 0.2170 *** 0.2933 *** 0.2641 ** [0.0345] [0.0491] [0.0457] [0.0766] δ13C(-3) 0.1772 *** - 0.1457 ** -[0.0345] [0.0456] δ18O(-1) - - - -δ18O(-2) - -0.0899 * 0.1091 ** -[0.0452] [0.0372] δ18O(-3) - - -

-Table 5.3: Estimated VAR models with dependent variable δ18O

Period IV Period III Period II Period I

Constant 0.8420 1.1109 1.4968 3.2784 [0.1566] [0.3119] [0.3157] [0.3739] Insol(-1) -0.0012 * - 0.0011 * -[0.0006] [0.0005] Insol(-2) 0.0013 * 0.0018 ** - -[0.0006] [0.0007] Insol(-3) - -0.0014 * - -[0.0007] δ13C(-1) - - - -0.2550 * [0.1041] δ13C(-2) - - - -δ13C(-3) - - - -δ18O(-1) 0.3021 *** 0.2296 *** 0.1944 *** 0.2678 ** [0.0345] [0.0498] [0.0459] [0.0802] δ18O(-2) 0.2341 *** 0.2251 *** 0.1513 ** -[0.0351] [0.0500] [0.0462] δ18O(-3) 0.1905 *** 0.1472 ** 0.1148 * -[0.0345] [0.0495] [0.0458]

Estimated VAR models in periods IV, III, II and I. ’***’, ’**’ and ’*’ indicate that the variable is significant at the 0.1%, 1% and 5% level. Standard errors are shown between brackets.

(21)

CHAPTER 5. VAR AND GAM RESULTS AND ANALYSIS 18 Thirdly, the estimated VAR models with dependent variable δ18O (Table 5.3) are discussed. δ18O is predominantly influenced by lags of itself (in all periods) and insolation (in the oldest three periods). The latter is in agreement with the claim that insolation is a Granger cause of δ18O (Diks & Mudelsee, 2000, Tiedemann et al., 1994). Furthermore, the first lag of δ13C affects δ18O, but this effect is only significant in period I. The influence of δ13C on δ18O in the most recent period might point to an increased coupling between these variables. However, since this effect is only significant in one period and the significance level is 5%, the significance might be caused by chance. As Diks and Mudelsee (2000) find that δ13C is no Granger cause of δ18O but that the reverse is true, it is possible that δ13C depends on δ18O instead of vice versa.

In short, the VAR models show the following. Firstly, insolation is mainly driven by itself, which confirms the hypothesis based on earlier research. Secondly, δ13C depends predominantly on itself and insolation, but δ18O also significantly affects δ13C. These results are also in line with results of earlier research. Lastly, δ18O depends mostly on itself and insolation. Moreover, it depends on δ13C. This might, however, be caused by chance as the effect is only significant in one period and in contrast with results of earlier research.

However, the following needs to be noted. First, it appears from the ARCH-LM test (Ap-pendix II) that there exists enough statistical evidence to conclude that the error terms should be modelled with GARCH in all four periods. Estimation of VAR-GARCH models could there-fore be a subject for further research. Second, the Breusch-Godfrey LM test (Appendix III) points out that enough statistical evidence is present to reject the null hypothesis of the absence of serial autocorrelation in the residuals at the 1% significance level in periods IV to II. The models in these periods are thus not well specified, which may be caused by omitted variables and the ignorance of nonlinearities in the climate system. As a result, care needs to be taken when interpreting the results above. In the next section, GAM results are analysed to determine to what extent nonlinearities are present in the climate system.

5.2

GAM results and analysis

This section presents and analyses the GAMs with dependent variables insolation, δ13C and δ18O for all time periods. Furthermore, the GAM results are compared to results of the cor-responding VAR models. As mentioned earlier, GAMs are based on the assumption that the partial regression functions are additive. Violation of this assumption results in misspecification of the model, which should be taken into consideration when interpreting the results below.

Firstly, the estimated GAMs with dependent variable insolation (Table 5.4) are discussed. The estimated GAMs for insolation show just as the VAR models that insolation depends mainly on itself. However, the GAMs indicate that the relations are not completely linear as assumed by the VAR models. They find that half of the relations are significantly nonlinear, which supports the theory on the existence of nonlinearities in the climate system. Just as in the VAR model, the third lag of δ13C affects insolation in period III.

(22)

CHAPTER 5. VAR AND GAM RESULTS AND ANALYSIS 19 Table 5.4: Estimated GAMs with dependent variable insolation

Period IV Period III Period II Period I

Constant 64.7284 159.4247 324.3108 1033.0647 f(Insol(-1)) 1.7774 *** 1.4538 *** 1.0124 *** -0.3007 + ++ f(Insol(-2)) -0.9412 *** -1.1270 *** -0.8880 *** -0.7225 *** +++ ++ f(Insol(-3)) 0.0236 0.3170 *** 0.1646 *** -0.2588 *** +++ f(δ13C(-1)) - - - -f(δ13C(-2)) - - - -f(δ13C(-3)) - 4.4900 * - -f(δ18O(-1)) 1.4936 - - 1.7304 +++ + f(δ18O(-2)) - - - -f(δ18O(-3)) -1.7396 * - -

-Table 5.5: Estimated GAMs with dependent variable δ13C

Period IV Period III Period II Period I

Constant 0.1600 0.9858 -0.1589 -0.9265 f(Insol(-1)) 0.0004 * -0.0020 * - -f(Insol(-2)) - 0.0013 - 0.0032 * ++ f(Insol(-3)) - 0.0001 - -+ f(δ13C(-1)) 0.1751 *** 0.1504 *** 0.2337 *** 0.3771 *** + f(δ13C(-2)) 0.1786 *** 0.1930 *** 0.2846 *** 0.2396 ** +++ + f(δ13C(-3)) 0.1494 *** - 0.1394 ** -f(δ18O(-1)) - - - -0.0675 *** + f(δ18O(-2)) - -0.0910 * 0.1132 * -0.0086 * f(δ18O(-3)) - - -

-Table 5.6: Estimated GAMs with dependent variable δ18O

Period IV Period III Period II Period I

Constant 0.9615 1.0155 1.502 3.5296 f(Insol(-1)) 0.7119 ** - 0.0011 * -0.0002 ++ f(Insol(-2)) - 0.0008 - -+ f(Insol(-3)) - - - -0.0011 ++ f(δ13C(-1)) - - - -0.2645 * f(δ13C(-2)) - - - -f(δ13C(-3)) 0.0436 * f(δ18O(-1)) 0.3062 *** 0.2179 *** 0.1972 *** 0.2289 *** ++ + f(δ18O(-2)) 0.2249 *** 0.2217 *** 0.1486 *** 0.1098 * f(δ18O(-3)) 0.1701 *** 0.1399 ** 0.1134 *

-Estimated coefficients of the parametric part of the GAMs in periods IV, III, II and I. ’***’, ’**’ and ’*’ indicate that the parametric part is significant at the 0.1%, 1% and 5% level. ’+++’, ’++’ and ’+’ indicate the same significance levels for the nonparametric part.

(23)

CHAPTER 5. VAR AND GAM RESULTS AND ANALYSIS 20 Moreover, as it is widely assumed that insolation is not influenced by other climatological variables, it is remarkable that δ18O nonlinearly influences insolation in two periods. A possible explanation for this significant effect is that δ18O influences the obliquity of the Earth and therefore insolation. However, this effect is presumably not of great significance (Mitrovica, Forte & Pan, 1997). As the significant effects do not exist in all periods and the significance levels are in three of four cases only 5%, the significant relations might also be caused by chance. Secondly, the estimated GAMs with dependent variable δ13C (Table 5.5) are considered. The results on lags of δ13C itself and insolation are comparable to the results of the VAR model − except for the fact that insolation now also influences δ13C in three instead of two periods.

Moreover, δ13C depends on δ18O in the youngest three periods, while it only depends on δ18O in periods III and II in the VAR model. GAMs thus find stronger evidence for the dependence of δ13C on insolation and δ18O than VAR models. Significant evidence for the presence of nonlinear relations exists for all explanatory variables, which, just as the results on insolation, supports the statement that dependencies in the climate system are nonlinear.

Thirdly, the estimated GAMs with dependent variable δ18O (Table 5.6) are discussed. In accordance with the results of the VAR models, δ18O depends on itself in all time periods.

Nonlinearities in these relations predominantly exist in periods IV and I. Insolation affects δ18O in all time periods in comparison with significant effects in merely the oldest three periods for the VAR models. This is due to the fact that the estimated relation in period I is nonlinear for the first as well as the third lag of insolation. δ13C affects δ18O in two periods instead of in only one period as in the VAR model. Although the effects of δ18O on δ13C might be caused by chance, the significance is still in contrast with results of Diks and Mudelsee (2000). It is possible that δ18O influences δ13C instead of the other way around. Therefore, it might be interesting to perform Diks-Panchenko tests on the data in order to investigate nonlinear Granger causalities (Diks & Panchenko, 2006).

Figure 5.1: Partial regression functions GAMs

The solid lines are the estimated partial regression functions and the dotted lines represent the 95% confidence intervals. The rug plots show the observations of the explanatory variables.

The relations between dependent and explanatory variables can be visualised by plots of the partial regression functions obtained by estimating the GAMs. Figure 5.1 shows the partial

(24)

CHAPTER 5. VAR AND GAM RESULTS AND ANALYSIS 21 regression functions of the third lag of insolation with dependent variable δ18O (left) and of the second lag of insolation with dependent variable insolation (right) in period I. The right plot indicates that the second lag of insolation linearly affects insolation itself. However, the relation in the left plot is like most of the other relations (Appendix IV−VII) clearly nonlinear, which supports the general assumption that dependencies in the climate system are mainly nonlinear. Lastly, the GAM results are compared to GAM results obtained by Hennequin (2012) and Vogel (2012), using ODP site 659 data. They additionally include the variable dust flux in their models, which may cause the results to differ. Their findings on relations between the oxygen isotope and the carbon isotope are in accordance with the findings in this study. Furthermore, this study evidences that insolation is an important influencer of δ18O and δ13C. Remarkably, Hennequin (2012) and Vogel (2012) find that insolation does not affect the isotopes. A possible explanation for this finding is that their insolation data is not obtained at the same location as their isotope data. Moreover, their results show that dust flux significantly affects insolation and both isotopes. Therefore, the models in this study may be misspecified due to the exclusion of dust flux.

In summary, the GAMs and VAR models indicate roughly the same variables as significant. The GAMs show, however, that many of the relations the VAR models indicate as linear are in fact nonlinear. In addition, the GAMs find more significant relations than the VAR models. This is probably due to the fact that GAMs permit nonlinearities since these relations are almost all nonlinear. At first glance, the GAMs appear to be more suitable than the VAR models to model the dependencies in the climate system. This statement is investigated in the next chapter by a performance evaluation of both models.

(25)

Chapter 6

Performance comparison of VAR,

GAM and MARS

The previous chapter presented the results of the final GAMs and VAR models. On the one hand, the GAMs indicate that many relations are nonlinear. On the other hand, Attanasio et al. (2012) state that linear models can be useful for investigating the dependencies in the (generally assumed) nonlinear climate system. Therefore, it is of interest to compare the linear and nonlinear models on their model fit (AIC) and predictive performance (RMSE). Section 6.1 and Section 6.2 describe these comparisons. In addition, the predictive performances of both models are compared with the predictive performance of MARS models.

6.1

Model fit of VAR and GAM

Table 6.1 shows the AICs of the VAR models and GAMs in all time periods. When interpreting the values in this table, keep in mind that AIC is not a quality measure in absolute sense: even the model with the lowest AIC might fit the data poorly. Yet, AIC is a useful measure to determine which of the candidate models has the best goodness of fit.

Table 6.1: Comparison of GAMs and VAR models: AIC

Period IV Period III Period II Period I

Dep. variable VAR GAM VAR GAM VAR GAM VAR GAM

Insolation 3989.96 3947.96 2867.80 2846.80 3726.44 3714.77 1403.93 1376.54 δ13C -640.70 -652.78 -186.74 -197.86 -243.45 -233.55 -20.75 -17.94

δ18O -1475.31 -1468.40 -192.67 -186.90 -83.02 -78.65 108.17 107.98

This table shows the AICs of the GAMs, VAR and MARS models with dependent variables (Dep. variable) insolation, δ13C and δ18O in periods IV, III, II and I.

The AICs of the GAMs are in seven out of twelve cases lower than the AICs of the VAR models, which implies that the model fit of the GAMs is in more than half of the cases better than the model fit of the VAR models. The AICs of the GAMs with dependent variable insolation are in all cases lower than those of the VAR models, which is probably due to the large number of significant nonlinear relations between insolation and the explanatory variables (Table 5.4). Furthermore, the AICs of the GAMs with dependent variable δ13C are lower than those of the VAR models in the oldest two periods. In addition, the AIC of the GAM for δ18O is lower than that of the VAR model in the youngest period. Again, this could be explained by the number

(26)

CHAPTER 6. PERFORMANCE COMPARISON OF VAR, GAM AND MARS 23 of significant nonlinear relations as there are more significant nonlinear relations in periods IV and III than in periods II and I for δ13C. Moreover, three nonlinear relations are significant for dependent variable δ18O in period I, while there is at most one significant nonlinear relation in the other periods. The fact that some GAMs have higher AICs than VAR models is probably due to the penalty AIC imposes for overfitting. The fact that the smooth functions in the GAMs are able to follow the underlying true functions more precisely than the linear functions estimated by the VAR models does not outweigh the penalty term in these cases.

In short, the model fit of the GAMs is in most cases better than the model fit of the VAR models. This is probably due to the fact that GAMs are able to capture nonlinearities in the climate system while VAR models are only able to capture linear relations. In some cases (the cases in which the number of significant nonlinear relations is low), VAR models have lower AICs than GAMs, which can presumably be explained by the penalty AIC imposes for overfitting.

Not only the model fit of both the linear and nonlinear models is of interest, it is also important to investigate how well the models predict relative to each other. Therefore, the next section compares their RMSEs.

6.2

Predictive performance of VAR, GAM and MARS

In order to compare GAMs, VAR and MARS models on their predictive performance, RMSEs are computed. Before the RMSEs are evaluated, MARS is briefly explained.

MARS (Friedman, 1991) is a nonparametric regression technique which uses piecewise and local parametric fitting methods. If y represents the dependent variable, xi (i = 1, ..., n) is an

explanatory variable and f(·) is a function to be constructed, the model is given by

y = f (x1, ..., xn) + ε. (6.1)

MARS uses recursive partitioning to ensure that the coefficients best match the data and to approximate the underlying function (which captures the joint relationship of the dependent variable on the explanatory variables) to the best extent possible by several linear functions. Just like GAMs, MARS models allow the relations between dependent and explanatory variables to be nonlinear. GAM and MARS differ from each other in the sense that smooth partial regression functions are estimated in GAMS, while MARS fits piecewise linear segments and automatically allows for variable interactions. See Friedman (1991) for more details on the MARS method. The MARS models in this study are based on the same ODP 1264 dataset as the GAMs and VAR models. See Van der Vlist (2016) for details on the steps taken in order to define the final MARS models and analysis of the MARS results.

Table 6.2 shows the RMSEs of GAMs, VAR and MARS models. It is noteworthy that the RMSEs of insolation are higher than those of both isotopes in all models, which is probably a consequence of the higher variance of insolation. Over and above three exceptions, the RMSEs of the nonlinear models (i.e. GAMs and MARS models) are always lower than the ones of the VAR models. Thus, the nonlinear models yield more accurate predictions than linear models. This is probably due to the fact that the semi- and nonparametric models use the data to

(27)

CHAPTER 6. PERFORMANCE COMPARISON OF VAR, GAM AND MARS 24 construct the functional form of the model, which ensures that the regression functions are fit to the data to the best extent possible and enables these models to make precise predictions. The fact that the VAR models with dependent variable δ13C provide better forecasts than both nonlinear models in two cases might by caused by overfitting of the data by the GAMs and MARS models. Although overfitting improves the model fit over the estimation sample, it yields worse out-of-sample forecasts. The next paragraphs further investigate the similarities and differences between the predictive performance of both nonlinear models.

Table 6.2: Comparison of GAMs, MARS and VAR models: RMSE

Period IV Period III

Dep. variable VAR GAM MARS VAR GAM MARS

Insolation 20.8524 5.4675 5.4908 19.1492 7.1486 6.2483 δ13C 0.1351 0.1404 0.1428 0.1627 0.1747 0.1903

δ18O 0.1098 0.0948 0.0954 0.2269 0.2043 0.2279

Period II Period I

Dep. variable VAR GAM MARS VAR GAM MARS

Insolation 22.4420 16.8501 14.1567 19.8524 17.5980 25.1591 δ13C 0.1842 0.1715 0.1644 0.2823 0.1768 0.1997

δ18O 0.2626 0.2614 0.2568 0.4183 0.3420 0.3785

This table shows the RMSEs of the GAMs, VAR and MARS models with dependent variables (Dep. variable) insolation, δ13C and δ18O in periods IV, III, II and I.

GAMs yield more accurate predictions than MARS models for the climatological isotopes in three periods, which could be explained by a difference in flexibility between both models. GAMs estimate smooth partial regression functions, whereas MARS models produce piecewise linear functions with different slopes in each segment. Therefore, GAMs are better able to follow the underlying true functions than MARS models. However, the differences between the RMSEs of both models for δ18O and δ13C are minimal and might not even be significant.

On the other hand, there is no clear preference for either one of the nonlinear models when the aim is to predict insolation; GAMs outperform MARS models in periods IV and I, whereas MARS models outperform GAMs in the other two periods. MARS models with dependent variable insolation allow for five lags, while the highest allowed lag order for the corresponding GAMs is three. The MARS models (Van der Vlist, 2016) indicate that the fourth and fifth lag of insolation significantly influence insolation itself. This may be the reason that, despite the higher flexibility of GAMs, the RMSEs of MARS models with dependent variable insolation are lower than those of the GAMs in half of the cases. To verify this, further research on the predictive performance of GAMs that allow for five lags of insolation is needed.

In summary, nonlinear models produce in general better forecasts of climatological variables than linear models. Moreover, the forecast performance of GAMs on climatological isotopes is slightly better than that of MARS models, but differences in their RMSEs are small. Mean-while, none of both nonlinear models yield clearly better insolation predictions than the other. This might be due to the fact that the fourth and fifth lags of insolation are also included as explanatory variables in the MARS models.

(28)

Chapter 7

Conclusion

Improved insight into interrelations in the climate system can reduce the uncertainty about climate changes in the future. The aim of this study was therefore to construct a suitable model for the dependencies between the climatological time series variables insolation, carbon isotope and oxygen isotope by comparing performances of linear and nonlinear models. ODP site 1264 data over the past 5000 ka was used for this purpose. The time series in this dataset are unevenly spaced and to mitigate this problem, the dataset was divided into four time periods with different climatic regimes. To investigate the linear couplings between climatological variables, VAR models were estimated. GAMs, which are able to model nonlinear relations, were thereafter estimated. Both models were compared on their model fit and predictive performance. The predictive performances of GAMs and VAR models were also compared to those of MARS models, which are just like GAMs able to capture nonlinearities in the climate system.

The GAMs and VAR models evidenced that insolation is predominantly driven by itself, which supports the hypothesis that insolation is mainly influenced by changes in the orbit of the Earth and the angle of incidence of the sun’s rays on Earth and not by climatological isotopes. Furthermore, insolation appeared to be an important explanatory variable for both δ13C and δ18O. Moreover, it has been found that δ13C depends on δ18O in the majority of the time periods. These findings are all in line with results of earlier research. The significant effect of δ13C on δ18O however, contradicts the theory. It seems possible that δ18O actually influences δ13C instead of the other way around. Therefore, it might be of interest to perform Diks-Panchenko tests on the data in order to investigate nonlinear Granger causalities.

The GAMs showed that many of the relations indicated as linear by the VAR models are in fact nonlinear. This supports the statement that the climate system is highly nonlinear and consequently, favours the use of GAMs rather than VAR models.

The model fit of GAMs was in most cases better than the model fit of the VAR models. This was probably due to the fact that GAMs are able to capture nonlinearities in the climate system while VAR models are only able to capture linear relations. In cases whereby the number of significant nonlinear relations was low, GAMs often had higher AICs than VAR models. This can supposedly be explained by the penalty AIC imposes for overfitting. The advantages of smooth functions over linear functions did not outweigh the penalty term in these cases.

Nonlinear models did in general yield better forecasts of climatological variables than linear

(29)

CHAPTER 7. CONCLUSION 26 VAR models, which again supports the use of nonlinear models to model the dependencies in the climate system. Furthermore, GAMs produced better predictions of climatological isotopes than MARS models, but differences in their RMSEs were small. For insolation on the other hand, there is no clear preference for either model. This might be due to the fact that the fourth and fifth lag of insolation were included as explanatory variables in the MARS models but not in the GAMs.

To conclude, the results of this study confirmed the hypothesis that the climate system is nonlinear. Moreover, comparisons of both the model fit and predictive performance lead to the conclusion that nonlinear models are more suitable than linear models to model the dependencies in the climate system. Furthermore, GAMs are slightly better than MARS models if the aim is to predict climatological isotopes. Further research on GAMs that include five lags of insolation is needed to determine whether GAMs actually yield more accurate predictions of insolation than MARS models as well.

Finally, shortcomings of this study and additional recommendations for further research are discussed. A limitation of this study is that only three climatological variables were investi-gated, while at least five variables are claimed to be active (Diks & Mudelsee, 2000). As omitted variables lead to misspecification, further research that incorporates more climatological vari-ables (as for example dust flux) is recommended. Moreover, it appeared that the error terms of the VAR models should be modelled with GARCH. Estimation of VAR-GARCH models could therefore be a subject for further research. Lastly, nonlinear GAMs were compared to linear VAR and nonlinear MARS models in this study. It might be of interest to compare the GAMs to even more models, as for example artificial neural networks.

Referenties

GERELATEERDE DOCUMENTEN

The most practical approaches rely either on linearisation, making techniques from linear model order reduction applicable, or on proper orthogonal decomposition (POD), preserving

Similar to other models that have flexibility in the implementation of the model like quasi-outsourcing being able to be international or domestic, the 24-hour knowledge factory

Het is mogelijk, dat uit een analyse volgt dat er in het geheel genomen geen significante verschillen zijn in de BAG-verdeling naar een bepaald kenmerk (bijvoorbeeld

Tise’s experience at senior management levels in the profession includes being the first President of the Library and Information Association of South Africa from 1998 –

Whereas it can be safely assumed that the right and left upper lung lobes drain via their respective right and left upper pul- monary veins, and similarly for the lower lung lobes

Solutions for the scalar MIMO case, within scaling and permutation ambiguities, have been proposed in the past, based on the canonical decomposition of tensors constructed

For the clustering on pure data, 18 clusters are obtained of which five show a periodic profile and an enrichment of relevant motifs (see Figure 10): Cluster 16 is characterized by

Harshman, R A , & Lundy, M E , 1984a, The PARAFAC model for three-way factor analysis and multidimensional scaling In H.G Law, C W Snyder Jr., J A Hattie, & R P McDonald