## Trends in environmental data: when uncertainty matters

### Krista Jaarsma

### Master Thesis in Mathematics

### March 2013


### Summary

A great deal of research is devoted to the behavior of environmental processes. Questions such as how these processes behaved in the past and how they will behave in the future are of interest. Within this context, trend analysis is an important tool in the study of observed environmental data.

Trend values are estimates of the observations, or predictions of future observations. Both the degree of uncertainty of these values and the detection of a significant increase or decrease of the trend are of great importance, because they contribute to better predictions and estimates.

There is extensive literature on trend estimation, and several papers discuss uncertainty in trend estimates in the form of pointwise confidence intervals for the trend. Uncertainty in trend differences has also been addressed in some articles. However, full uncertainty information is missing for statistics such as return periods or the chance of crossing thresholds.

This thesis contributes a way to obtain this uncertainty information. I will present two new confidence intervals. The first is a simultaneous confidence band for the estimated trend and trend differences. The second (both pointwise and simultaneous) is a confidence interval for the probability of exceeding an arbitrary threshold level. Besides the procedure itself, a new R package called TrendSpotteR 1.0 will be presented, which generates a trend, trend differences, threshold exceedance probabilities and differences in those probabilities, all with maximum information on uncertainty. In the last chapter I discuss a relevant extension of TrendSpotteR 1.0 for estimating trends in data sets that contain large outliers and are distributed according to the Gumbel distribution.

Master Thesis in Mathematics
Author: Krista Jaarsma
Supervisor: E.C. Wit
Second supervisor: A.E.P. Veldman
External supervisors: A.C. Petersen, H. Visser
Date: March 2013

Institute of Mathematics and Computing Science
P.O. Box 407
9700 AK Groningen
The Netherlands

### Contents

1 Introduction

2 IRW trend estimation
  2.1 Trend estimation
  2.2 Relevance of uncertainty information
  2.3 Kalman filter
    2.3.1 Filtering
    2.3.2 Smoothing
    2.3.3 Prediction
  2.4 IRW trend model

3 Method and software
  3.1 Trend ensembles and uncertainties
    3.1.1 Error in exceedance probability
    3.1.2 Simultaneous confidence bands
  3.2 TrendSpotteR 1.0 software
    3.2.1 Creating non-standard statistics
  3.3 Examples
    3.3.1 Spring temperatures 1906-2012
    3.3.2 Global economic losses from weather-related disasters

4 Gumbel distribution
  4.1 GEV distributions
  4.2 Gumbel distribution in TrendSpotteR 1.0

5 Discussion and conclusion

Bibliography

Appendix

### Introduction

All over the world, people try to visualize environmental processes. Governments have to adjust their policies to the (expected) changes in these processes. Is global warming a fact, and how quickly is the earth's temperature increasing? How big are the consequences? Is the sea level rising significantly, and how fast? Besides governments, individuals and businesses are also interested in trends in environmental data. The tourism and recreation sector, for example, is interested in changing temperatures: is it true that summers will become hotter? This last question also interests healthcare institutes, since it is well known that higher summer temperatures go along with an increase in the death rate. In addition to these few examples, one can think of many more reasons why studying the behavior of environmental processes is essential.

Trend estimation is an important tool in the visualization of environmental processes.

Scanning the literature, one finds a great deal of information about trend estimation. One important aspect of trend estimation is the presence of uncertainty information about the estimated trend. More information about trend uncertainty results in a better estimate, which matters because the estimate will mostly be used for important decision making.

In this report, which is an extended version of the paper of Jaarsma et al. (2013) [[14]] (see Appendix), a method is described that generates uncertainty information for trend estimation that has not been available before: simultaneous confidence intervals, and confidence intervals for the chance of crossing a threshold. Beyond these specific intervals, the most powerful feature of the method is that full uncertainty information can be obtained for arbitrary functions of the trend and for probabilities of exceeding some level.

This thesis not only presents the new method; it also describes its implementation in a new R package called TrendSpotteR 1.0. The package is an expansion of the TrendSpotter software developed by Visser, see Visser (2004) [[29]]. TrendSpotteR 1.0 produces trends and exceedance probabilities, together with differences in trends and in exceedance probabilities, along with full uncertainty information. In the report, the functioning of TrendSpotteR 1.0 is clarified with two examples.

Trend estimation in TrendSpotteR 1.0 is based on some normality assumptions. Not all datasets will satisfy this requirement. It is likely that series of extreme events do not satisfy the normality assumptions. Since it is the extreme events that often have the greatest impact, the last chapter of this thesis will focus on extreme environmental processes. One can think of floods, maximum temperatures reached each month, extreme rainfall, etc. It will be shown

1

how TrendSpotteR 1.0 can be extended for ’extreme’ datasets distributed according to a so-called Gumbel distribution, which is one of the extreme value distributions.

### IRW trend estimation

### 2.1 Trend estimation

There is a vast literature on the behavior of environmental processes. Questions such as how these processes behaved in the past and how they will behave in the future are of great interest. Within this context, trend analysis is an important tool in the study of environmental data.

As stated by Harvey (1989) [[12]], there are two reasons for wishing to estimate trends in (environmental) time series data. The first is to give a representation of the data that changes relatively slowly (smoothly) over time. The second is to find a series that, if extrapolated, gives the clearest indication of future long-term movements in the series.

Scanning the environmental literature on trends shows that a wide variety of trend models has been applied. We mention some examples of trend models, without being complete: moving averages with a pre-defined window, ordinary least squares (OLS) linear trend models, polynomial trend fits, splines, exponential smoothing, trends based on Bayesian principles, trends in rare events based on logistic regression, smooth transition models, LOESS estimators, binomial filters, GARCH trends, Friedman super smoothers, wavelets, structural time-series trend models, robust regression trends (MM or LTS regression) and Box-Jenkins ARIMA models and variations (SARIMA, GARMA, ARFIMA). Next to these trend models, a number of statistical tests for trends exist: Student's t-test for sub-periods of time, Mann-Kendall tests for monotonic trends (with or without correction for serial correlation), and trend tests against long-memory time series.

It is important to realize that there is no single best trend model. The choice of a particular model depends on one's goals. To clarify, we give three examples:

• Suppose one wants to estimate a trend using simple mathematics and generally available software. For this type of problem, trend estimation by a moving average (MA) is a good choice. However, MA has some drawbacks: (i) the choice of the averaging window is subjective, (ii) due to the window chosen, there is no trend estimate at the beginning and end of the time series, and (iii) no uncertainty information is available.

• Another case could be that one is interested in good prediction (extrapolation) performance. For this goal the group of ARIMA models is a good choice. However, these models have one important drawback: the actual trend in the sample period is not given (trend tests such as Mann-Kendall have the same disadvantage).

• If one wants a trend estimate along with uncertainty information, using generally available software, the OLS linear trend model is a good choice. This model has an obvious drawback: only linear trends are allowed. However, linear trends are not realistic for most environmental time series.

Please see Mills (2010) [[19]], Chandler and Scott (2011) [[3]], and Visser and Petersen (2012)[[31]] for more discussions on trend choices.

### 2.2 Relevance of uncertainty information

Confidence bands around a trend contain information about the certainty of the estimate, and this is a really important aspect of trend estimation: the more is known about the (un)certainty of the trend estimate, the more can be said about it. When a prediction of the amount of rainfall for the next 10 years is made from measured data, it is not very likely that the true value for the next year will equal the estimated trend value in that specific year. But if the estimate also comes with 95% pointwise confidence intervals, we know that with a chance of 95% the amount of rainfall will lie between the upper and lower limits of this confidence interval. In other words, our prediction is much stronger once we know the uncertainty of the estimate.

There are only a few flexible trend methods that give confidence bands for the trend estimates; we name LOESS estimators, DBM models and structural time series models (STMs). The model used throughout this thesis is the Integrated Random Walk (IRW) model. This IRW trend model, sometimes denoted as the smooth trend model, is a sub-model of the class of STMs. It can be viewed as a special case of the so-called local linear trend model in which the noise component of the level equation is set to zero (e.g., Commandeur and Koopman, 2007, Ch. 3 [[6]]). If uncertainty is presented, it is by showing pointwise confidence limits for the trend µ_t. In paragraph 2.4 we show the formulae for the IRW trend model and go into a little more detail.

Visser (2004) [[29]] showed that the application of the IRW model, in combination with the discrete Kalman filter, allows one to estimate uncertainties in any trend difference µ_t − µ_s with s ≤ t. These differences and their uncertainties are important since they allow one to detect significant decreases or increases. In Visser (2004) [[29]] and Visser and Petersen (2012) [[31]], the following three functions of time are proposed:

• µ_t with corresponding 2σ_t pointwise confidence bands, t = 1, …, N.

• µ_t − µ_{t−1}, the first difference, with corresponding 2σ_t pointwise confidence bands, t = 2, …, N.

• µ_N − µ_t, the lagged differences, with corresponding 2σ_t pointwise confidence bands, t = 1, …, N.

Another advantage of having uncertainty information lies in the possibility of calculating chances of crossing pre-defined thresholds. E.g., if one estimates a trend in annually averaged temperatures, it may be of interest to calculate the time-varying probabilities of crossing a pre-defined temperature threshold; please see Visser and Petersen (2012) [[31]] for an example (their figures 1, 4 and 6). The inverses of these chances are denoted as return periods.

In section 3.1 we introduce the method of estimating trend ensembles. A trend ensemble is a set of, say, 1000 IRW trend realizations that might occur with equal chance, based on the same IRW trend model. We will show how these trend realizations can be derived from the Kalman filter equations (section 2.3) and how the ensembles can be used to extend the possibilities for estimating uncertainty bands. One such extension, discussed in paragraph 3.1.2, is the estimation of simultaneous confidence bands in addition to pointwise confidence bands. The advantage of simultaneous confidence bands is that they match people's intuition. If we ask colleagues what the 95% uncertainty bands around a trend mean, they generally believe that the true trend will lie between the upper and lower limits with a chance of 95%. However, that interpretation is only true for simultaneous confidence bands, not for pointwise confidence bands (cf. http://en.wikipedia.org/wiki/Confidence_band). Thus, presenting simultaneous confidence bands rather than pointwise confidence limits has communicational advantages.

Finally, we will show how trend ensembles allow one to construct confidence bands for the probability of crossing thresholds. Uncertainties for these chances, or for their inverses, the average return periods, will be derived. These uncertainties cannot be obtained from standard theory (cf. Visser and Petersen, 2012, Section 7 [[31]]).

### 2.3 Kalman filter

Structural time series models can be estimated with the Kalman filter, which requires the model to be in so-called state-space form. A state-space model consists of a measurement equation

$$y_t = c_t' \alpha_t + \xi_t \qquad (2.1)$$

and a transition equation

$$\alpha_t = T \alpha_{t-1} + \eta_t \qquad (2.2)$$

Here the prime denotes transpose. $y_t$ is the observation at time $t$, a scalar. $\alpha_t$ is the $k \times 1$ state vector, $T$ is the known $k \times k$ system matrix and $c_t$ is a $k \times 1$ vector. $\eta_t$ is a $k \times 1$ vector of disturbances and $\xi_t$ is a scalar disturbance term; $\eta_t$ and $\xi_t$ are distributed independently of each other. Assuming that the model is Gaussian, both $\eta_t$ and $\xi_t$ are normally and independently distributed: $\eta_t \sim NID(0, \sigma^2 Q)$ and $\xi_t \sim NID(0, \sigma^2 h_t)$. Here $\sigma^2$ and $h_t$ are scalars and $Q$ is a $k \times k$ diagonal matrix. In the R implementation the following assumptions are made:

• The matrices $T$ and $Q$ are a priori known and time-independent. When $Q$ is not known a priori, it can be determined by maximum likelihood optimization.

• $Q$ is a diagonal matrix with the variances of the noise vector $\eta_t$ on its main diagonal.

• $c_t$ and $h_t$ are known; $h_t = 1$ when the observation $y_t$ is not missing.

The Kalman filter returns (in the case of a Gaussian model) the Minimum Mean Square Linear Estimator (MMSLE) of $\alpha_t$; for more details, see Harvey (1989, Ch. 3) [[12]]. Now let the vector $a_{t|s}$ denote the Minimum Mean Square Estimator (MMSE, which is an optimal estimator) of $\alpha_t$ conditional upon all information up to and including time $s$. More details can be found in Anderson and Moore (1979, Ch. 5) [[22]].

$$a_{t|s} = E[\alpha_t \mid y_1, \dots, y_s] \qquad (2.3)$$

The covariance matrix of $a_{t|s} - \alpha_t$, also called the Mean Square Error (MSE) matrix, is denoted by

$$\sigma^2 P_{t|s} = E\left[(a_{t|s} - \alpha_t)(a_{t|s} - \alpha_t)' \mid y_1, \dots, y_s\right] \qquad (2.4)$$

The Kalman filter distinguishes three situations for which it generates an optimal estimate of $\alpha_t$:

• t = s which is called filtering

• t < s which is called smoothing

• t > s which is referred to as prediction

We will clarify and present a short overview with the corresponding equations for these situations in the subparagraphs below.

2.3.1 Filtering

The Kalman filter is recursive: at time $t-1$ we predict $\alpha_t$ and $P_t$ for time $t$. Once $y_t$ becomes available, the so-called one-step-ahead prediction error is computed: the difference between the realized value of $y_t$ and its prediction made at time $t-1$. Since $y_t$ is now known, the estimates of $a$ and $P$ at time $t$ can be updated using this additional information. In formulae:

Prediction equations:

$$a_{t|t-1} = T a_{t-1|t-1} \qquad (2.5)$$

$$P_{t|t-1} = T P_{t-1|t-1} T' + Q \qquad (2.6)$$

One-step-ahead prediction error or innovation:

$$n_{t|t-1} = y_t - c_t' a_{t|t-1} \qquad (2.7)$$

The one-step-ahead prediction error is normally and independently distributed, i.e. $n_{t|t-1} \sim NID(0, \sigma^2 f_{t|t-1})$ with

$$f_{t|t-1} = c_t' P_{t|t-1} c_t + h_t \qquad (2.8)$$

Updating equations:

$$a_{t|t} = a_{t|t-1} + P_{t|t-1} c_t \, n_{t|t-1} f_{t|t-1}^{-1} \qquad (2.9)$$

$$P_{t|t} = P_{t|t-1} - P_{t|t-1} c_t c_t' P_{t|t-1} f_{t|t-1}^{-1} \qquad (2.10)$$

The initial value $a_{0|0}$ used to start the filter can be chosen arbitrarily; take, for example, $a_{0|0}$ equal to the first observation. The initial value is $P_{0|0} = \kappa I$, where $\kappa$ is a large but finite number and $I$ is the $k \times k$ identity matrix. Taking $\kappa$ equal to, say, 100 times the first observation ensures that it is finite but large relative to the observations. The filter then generates its own starting values in, say, $N_s$ iterations; $N_s$ is also referred to as the adjusting time.
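The filtering recursion (2.5)-(2.10) is compact enough to sketch directly. The following is a minimal illustration in Python/NumPy rather than the actual TrendSpotteR R code; the function name and the initialization constant are illustrative choices, following the initialization described above:

```python
import numpy as np

def kalman_filter(y, T, c, Q, h=1.0, kappa_scale=100.0):
    """One pass of the filtering recursion (2.5)-(2.10).

    y : (N,) observations; T : (k,k) system matrix; c : (k,) observation
    vector; Q : (k,k) state-noise covariance.  Initial values follow the
    text: a_{0|0} has the first observation in its first component, and
    P_{0|0} = kappa*I with kappa large but finite (here 100*|y_1|, an
    illustrative choice that assumes y_1 is not close to zero).
    """
    N, k = len(y), T.shape[0]
    a = np.zeros(k)
    a[0] = y[0]                                # a_{0|0}
    P = kappa_scale * abs(y[0]) * np.eye(k)    # P_{0|0} = kappa I
    a_filt, P_filt, innov, f = [], [], [], []
    for t in range(N):
        # prediction (2.5)-(2.6)
        a_pred = T @ a
        P_pred = T @ P @ T.T + Q
        # innovation (2.7) and its variance factor (2.8)
        n_t = y[t] - c @ a_pred
        f_t = c @ P_pred @ c + h
        # updating (2.9)-(2.10)
        a = a_pred + P_pred @ c * n_t / f_t
        P = P_pred - np.outer(P_pred @ c, c @ P_pred) / f_t
        a_filt.append(a); P_filt.append(P); innov.append(n_t); f.append(f_t)
    return np.array(a_filt), np.array(P_filt), np.array(innov), np.array(f)
```

The returned innovations and factors $f_{t|t-1}$ are exactly the quantities needed later for the likelihood of paragraph 2.4.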

2.3.2 Smoothing

Once the values $a_{t|t}$ are all available, it is possible to smooth these estimates. This is done by using not only the information of $y_s$ with $s \leq t$ but also the additional information of $y_s$ with $s > t$.

Three types of smoothing can be used; for more information about these smoothing types, see Anderson and Moore (1979, Ch. 7) [[22]]. The type of interest for our kind of trend estimation, and the one used in the R implementation, is fixed-interval smoothing. The fixed-interval smoother works by backwards recursion and is denoted by $a_{t|N}$, with $N$ the sample size and $t \leq N$:

$$a_{t|N} = a_{t|t} + P_t^{*} (a_{t+1|N} - T a_{t|t}) \qquad (2.11)$$

$$P_{t|N} = P_{t|t} + P_t^{*} (P_{t+1|N} - P_{t+1|t}) P_t^{*\prime} \qquad (2.12)$$

$$P_t^{*} = P_{t|t} T' P_{t+1|t}^{-1} \qquad (2.13)$$

with $t = N-1, \dots, 1$. The smoothed innovations or residuals are defined by

$$n_{t|N} = y_t - c_t' a_{t|N} \qquad (2.14)$$

These residuals are normally distributed, i.e. $n_{t|N} \sim N(0, \sigma^2 f_{t|N})$ with

$$f_{t|N} = c_t' P_{t|N} c_t + h_t \qquad (2.15)$$
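The backwards recursion (2.11)-(2.13) can likewise be sketched in a few lines. This is a NumPy illustration with hypothetical names, not the TrendSpotteR code; the one-step-ahead quantities $a_{t+1|t} = T a_{t|t}$ and $P_{t+1|t} = T P_{t|t} T' + Q$ are recomputed from the filtered output:

```python
import numpy as np

def smooth(a_filt, P_filt, T, Q):
    """Fixed-interval smoother (2.11)-(2.13), backwards recursion.

    a_filt[t] = a_{t|t} (shape (N,k)) and P_filt[t] = P_{t|t}
    (shape (N,k,k)) come from the filtering pass.  Returns a_{t|N}
    and P_{t|N} for t = 0..N-1.
    """
    N = len(a_filt)
    a_s = a_filt.copy()
    P_s = P_filt.copy()
    for t in range(N - 2, -1, -1):
        # one-step-ahead quantities re-derived from the filtered values
        a_pred = T @ a_filt[t]                 # a_{t+1|t}
        P_pred = T @ P_filt[t] @ T.T + Q       # P_{t+1|t}
        P_star = P_filt[t] @ T.T @ np.linalg.inv(P_pred)     # (2.13)
        a_s[t] = a_filt[t] + P_star @ (a_s[t + 1] - a_pred)              # (2.11)
        P_s[t] = P_filt[t] + P_star @ (P_s[t + 1] - P_pred) @ P_star.T   # (2.12)
    return a_s, P_s
```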
2.3.3 Prediction

The Kalman filter can also predict trend estimates. After the $N$ observations have been processed by the Kalman filter, the $i$-th prediction $a_{N+i|N}$ and $P_{N+i|N}$ can be generated with the following recursive formulae:

$$a_{N+i|N} = T a_{N+i-1|N} \qquad (2.16)$$

$$P_{N+i|N} = T P_{N+i-1|N} T' + Q \qquad (2.17)$$

Predictions of the observations $y_{N+i}$, based on $y_1, \dots, y_N$, are then obtained as

$$y_{N+i|N} = c_{N+i}' a_{N+i|N} \qquad (2.18)$$
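The prediction recursion (2.16)-(2.18) can be sketched as follows (a NumPy illustration with hypothetical names, for time-invariant matrices $T$, $c$ and $Q$ as assumed above):

```python
import numpy as np

def predict(a_N, P_N, T, c, Q, steps):
    """i-step-ahead predictions (2.16)-(2.18) from the final state
    estimate a_{N|N} and its MSE matrix P_{N|N}."""
    a, P = a_N.copy(), P_N.copy()
    y_pred, P_pred = [], []
    for _ in range(steps):
        a = T @ a              # (2.16)
        P = T @ P @ T.T + Q    # (2.17)
        y_pred.append(c @ a)   # (2.18)
        P_pred.append(P)
    return np.array(y_pred), np.array(P_pred)
```

With the IRW matrices of section 2.4 and $Q = 0$ this reduces to exact linear extrapolation of the trend.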

### 2.4 IRW trend model

As mentioned before, the trend model chosen for this thesis is the integrated random walk (IRW) trend model. This model can be seen as a natural extension of the OLS linear trend: both linear and more flexible trends can be estimated, along with uncertainty information. Furthermore, the trend is given over the full sample period, and predictions can be generated along with uncertainty bands. Finally, the flexibility of the trend can be chosen by maximum likelihood optimization (based on minimization of one-step-ahead prediction errors). The presence of uncertainty information is an important feature of this approach, as shown in paragraph 2.2. In this paragraph we elaborate on the IRW trend model. Other references for IRW trends are Kitagawa (1981) [[16]], Young et al. (1991) [[33]], Harvey (1989) [[12]], Mills (2010) [[19]], Visser (2004) [[29]], and Visser and Petersen (2009, 2012) [[30], [31]].

The IRW trend model has the following form:

Measurement equation:

$$y_t = \begin{pmatrix} 1 & 0 \end{pmatrix} \begin{pmatrix} \mu_{1,t} \\ \mu_{2,t} \end{pmatrix} + \xi_t \qquad (2.19)$$

Transition equation:

$$\begin{pmatrix} \mu_{1,t} \\ \mu_{2,t} \end{pmatrix} = \begin{pmatrix} 2 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} \mu_{1,t-1} \\ \mu_{2,t-1} \end{pmatrix} + \begin{pmatrix} \lambda_t \\ 0 \end{pmatrix} \qquad (2.20)$$

From these equations it is easy to see that the matrices $T$, $c_t$, $\alpha_t$ and $Q$ for the IRW trend model are:

$$T = \begin{pmatrix} 2 & -1 \\ 1 & 0 \end{pmatrix} \qquad (2.21)$$

$$c_t = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \qquad (2.22)$$

$$\alpha_t = \begin{pmatrix} \mu_{1,t} \\ \mu_{2,t} \end{pmatrix} \qquad (2.23)$$

$$Q = \begin{pmatrix} \sigma_\lambda^2 & 0 \\ 0 & 0 \end{pmatrix} \qquad (2.24)$$
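The matrices (2.21)-(2.24) are easy to write down; the small NumPy helper below (illustrative, not part of TrendSpotteR) also makes the smoothness property visible: with $\lambda_t = 0$ the transition gives $\mu_{1,t} = 2\mu_{1,t-1} - \mu_{1,t-2}$, i.e. exact linear extrapolation of the trend.

```python
import numpy as np

def irw_matrices(sigma2_lambda):
    """State-space matrices of the IRW trend model (2.21)-(2.24).

    The state is alpha_t = (mu_{1,t}, mu_{2,t})' with mu_{1,t} the
    trend level at time t and mu_{2,t} the level at t-1;
    sigma2_lambda controls the flexibility of the trend.
    """
    T = np.array([[2.0, -1.0],
                  [1.0,  0.0]])        # transition matrix (2.21)
    c = np.array([1.0, 0.0])           # observation vector (2.22)
    Q = np.diag([sigma2_lambda, 0.0])  # state-noise covariance (2.24)
    return T, c, Q
```

Fitting the model then amounts to running the Kalman recursions of section 2.3 with these matrices.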
There are still some unknowns in the model, namely $\sigma_\lambda^2$ in the matrix $Q$, and $\sigma^2$. These values can be estimated by maximum likelihood. The log-likelihood of the filtering procedure is based on the normality of the noise processes $\xi_t$ and $\eta_t$; it is derived from the prediction-error decomposition:

$$\log L = -\frac{1}{2} \left[ (N - N_s) \ln(2\pi\sigma^2) + \sum_{t=N_s+1}^{N} \left( \ln(f_{t|t-1}) + \frac{1}{\sigma^2} \frac{n_{t|t-1}^2}{f_{t|t-1}} \right) \right] \qquad (2.25)$$

with $N_s$ the adjusting time. Maximizing this function with respect to $\sigma^2$ gives the estimate

$$\hat{\sigma}^2 = \frac{1}{N - N_s} \sum_{t=N_s+1}^{N} \frac{n_{t|t-1}^2}{f_{t|t-1}} \qquad (2.26)$$

Substituting $\hat{\sigma}^2$ into $\log L$ and multiplying by $-2$ gives the concentrated log-likelihood function

$$\log L_c = \sum_{t=N_s+1}^{N} \log(\hat{\sigma}_{ML}^2 f_{t|t-1}) \qquad (2.27)$$

The remaining unknown, $\sigma_\lambda^2$, can now be found by minimizing $\log L_c$.
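Given the innovations $n_{t|t-1}$ and factors $f_{t|t-1}$ from a filter pass, (2.26) and (2.27) are one-liners. A NumPy sketch (the function names are illustrative); in practice $\log L_c$ would then be minimized numerically over $\sigma_\lambda^2$:

```python
import numpy as np

def sigma2_hat(innov, f, N_s):
    """ML estimate of sigma^2 from the innovations, eq. (2.26).
    The first N_s values (the adjusting time) are discarded."""
    n2f = innov[N_s:] ** 2 / f[N_s:]
    return n2f.sum() / (len(innov) - N_s)

def concentrated_loglik(innov, f, N_s):
    """Concentrated log-likelihood log L_c, eq. (2.27)."""
    s2 = sigma2_hat(innov, f, N_s)
    return np.sum(np.log(s2 * f[N_s:]))
```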

### Method and software

### 3.1 Trend ensembles and uncertainties

Using the IRW trend model and the Kalman filter on a suitable time series, optimal estimates of the trend $\mu_t$ can be found given the measurements $y_1, \dots, y_N$. The Kalman filter generates estimates of the trend values $\mu = (\mu_1, \dots, \mu_N)$ that follow a multivariate normal distribution if $\eta_t \sim N(0, \sigma_\eta^2)$ and $\xi_t \sim N(0, \sigma_\xi^2)$ for all times $t$. In chapter 2 we showed the workings and the formulae of the Kalman filter. From these formulae we can deduce the mean $\hat{\mu} = (\hat{\mu}_1, \dots, \hat{\mu}_N)$ and the $N \times N$ covariance matrix $\Sigma$ of this multivariate normal distribution. They have the following form:

$$\hat{\mu} = (\hat{\mu}_1, \dots, \hat{\mu}_N) \quad \text{with} \quad \hat{\mu}_t = \begin{pmatrix} 1 & 0 \end{pmatrix} a_{t|N} \qquad (3.1)$$

$$\Sigma = \mathrm{Var}(\mu) = \sigma_{ML}^2 \begin{pmatrix} P_{1,1|N}^{1,1} & \cdots & P_{1,N|N}^{1,1} \\ \vdots & \ddots & \vdots \\ P_{N,1|N}^{1,1} & \cdots & P_{N,N|N}^{1,1} \end{pmatrix} \qquad (3.2)$$

where the equation for $P_{s,t|N}$ is given by De Jong and MacKinnon (1988) [[7]]:

$$P_{s,t|N} = P_s^{*} P_{s+1,t|N} \qquad (3.3)$$

The matrix $P_s^{*}$ is given in 2.13, and $P_{t,t|N} = P_{t|N}$ holds.

Once $\hat{\mu}$ and $\Sigma$ are known, we can generate a series of $M$ trends from this multivariate normal distribution. There are two general approaches to sampling $\mu^{(k)}$ from the $N$-dimensional multivariate normal distribution $N(\hat{\mu}, \Sigma)$. The first approach is based on conditional distributions (Ripley, 1987 [[23]]). The second is the approach followed by the R function mvrnorm() from the library MASS; this function is used in the TrendSpotteR software. Library MASS supports the functions and datasets described in Venables and Ripley (2002) [[26]]. Function mvrnorm() uses the fact that if $V_1, \dots, V_N$ are iid $N(0,1)$ random variables and $A$ is an $N \times N$ matrix, then $\mu^{(k)} = \hat{\mu} + AV$ is multivariate normal with mean $\hat{\mu}$ and covariances $E[(\mu_i^{(k)} - \hat{\mu}_i)(\mu_j^{(k)} - \hat{\mu}_j)]$ given by the $(i,j)$ element of the covariance matrix. Because of the linearity of the expectation, this covariance matrix equals

$$E[(\mu^{(k)} - \hat{\mu})(\mu^{(k)} - \hat{\mu})^T] = E[(AV)(AV)^T] = E[A V V^T A^T] = A\,E[V V^T]\,A^T = A I A^T = A A^T \qquad (3.4)$$

Thus, to generate random vectors $\mu^{(k)}$, we have to find $A$, the square root of $\Sigma$, i.e. $\Sigma = AA^T$. To find the matrix $A$, the R function mvrnorm() uses the method of eigenvalue decomposition; for details please see Kaas et al. (2008) [[15]]. The set of $M$ trends constructed in the way outlined above will be denoted here as a trend ensemble.
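The sampling step can be imitated in a few lines of NumPy. This is a sketch of the same idea used by mvrnorm(): factor $\Sigma = AA^T$ by an eigenvalue decomposition and set $\mu^{(k)} = \hat{\mu} + AV$. The function name and the clipping of tiny negative eigenvalues are additions of this sketch, not part of the thesis code:

```python
import numpy as np

def sample_trend_ensemble(mu_hat, Sigma, M, rng=None):
    """Draw M trend realizations from N(mu_hat, Sigma) via the
    eigenvalue decomposition Sigma = A A^T.  Returns an (M, N)
    array; row k is the k-th ensemble member mu^(k)."""
    rng = rng or np.random.default_rng()
    N = len(mu_hat)
    vals, vecs = np.linalg.eigh(Sigma)       # Sigma is symmetric PSD
    vals = np.clip(vals, 0.0, None)          # guard tiny negative rounding
    A = vecs @ np.diag(np.sqrt(vals))        # A A^T = Sigma
    V = rng.standard_normal((N, M))          # iid N(0,1) draws
    return (mu_hat[:, None] + A @ V).T       # mu^(k) = mu_hat + A V
```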

The advantage of having a trend ensemble is that the computation of confidence bands becomes easy; only simple statistical concepts are needed. E.g., the trend estimate $\hat{\mu}_t$ and its corresponding variance $\sigma_{\mu,t}^2$ follow directly from the $M$ estimates $\hat{\mu}_t^1, \dots, \hat{\mu}_t^M$: just take the mean and variance of these $M$ simulated trend estimates at time $t$. For large $M$ this mean and variance converge to the mean and variance that follow directly from the Kalman filter formulae. In the same way, trend differences and uncertainty bands are calculated for any trend difference $\mu_t - \mu_s$: since we have $M$ estimates $\hat{\mu}_t^1, \dots, \hat{\mu}_t^M$ and $M$ estimates $\hat{\mu}_s^1, \dots, \hat{\mu}_s^M$, we also have $M$ estimates of $\hat{\mu}_t - \hat{\mu}_s$. Thus, we can calculate means, variances or any percentile over these $M$ values. E.g., 95% pointwise confidence bands follow from the 2.5th and 97.5th percentiles.

In fact, any trend function with a corresponding confidence interval can be generated. One such function is the trend acceleration. In climate change research it is of interest whether trends in global mean sea level rise show accelerations, in other words, whether the second derivative of $\mu_t$ is positive. Analogous to the trend differences $\mu_t - \mu_s$, we can compute $M$ differences $\mu_t - 2\mu_{t-1} + \mu_{t-2}$ along with any confidence band of interest.

Confidence limits for the chance of crossing a threshold $L$ (denoted by $P_t^L$) can also be derived from trend ensembles. By calculating the variance $\sigma_k^2$ of the residuals $y_1 - \hat{\mu}_1^k, \dots, y_N - \hat{\mu}_N^k$, we can compute chances $P_t^{(L,k)}$, $t = 1, \dots, N$, assuming $\tilde{y}_t^k$ to be normally distributed: $\tilde{y}_t^k \approx N(\hat{\mu}_t^k, \sigma_k^2)$. In this way we generate $M$ chances $P_t^{(L,1)}, \dots, P_t^{(L,M)}$, and uncertainty bands for $P_t^L$ follow by computing variances or percentile values of interest over these $M$ values. Uncertainties for any chance difference $P_t^L - P_s^L$ can be computed as well. To check the uncertainty of the exceedance of a threshold, we derive in the subsection below a theoretical expression for the variance of this exceedance.
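A sketch of this recipe (illustrative names; TrendSpotteR's own implementation may differ): each ensemble member $k$ yields a residual variance $\sigma_k^2$ and a curve $P_t^{(L,k)} = P(N(\hat{\mu}_t^{(k)}, \sigma_k^2) > L)$, and the bands are percentiles over the members.

```python
import numpy as np
from scipy.stats import norm

def exceedance_bands(y, ensemble, L, probs=(2.5, 97.5)):
    """Pointwise confidence bands for P_t^L from a trend ensemble.

    y : (N,) observations; ensemble : (M, N) trend realizations.
    Returns the (M, N) matrix of chances and the requested
    percentile curves.
    """
    resid = y[None, :] - ensemble                # (M, N) residuals
    sigma_k = resid.std(axis=1, ddof=1)          # sigma_k per member
    # P_t^(L,k): upper tail of N(mu_t^(k), sigma_k^2) above L
    P = norm.sf(L, loc=ensemble, scale=sigma_k[:, None])
    lower, upper = np.percentile(P, probs, axis=0)
    return P, lower, upper
```

Differences $P_t^L - P_s^L$ follow by subtracting columns of `P` before taking percentiles.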

3.1.1 Error in exceedance probability

Let us denote the chance of exceeding a threshold of level $L$ by $P_t^L$. TrendSpotteR has an option to compute this exceedance chance along with its 95% confidence intervals. As mentioned earlier, this is done by using the generated trend ensemble. Because there is no literature about the uncertainty of the estimate of $P_t^L$, we need to verify this uncertainty. We will compare the standard deviation of $P_t^L$ found from the trend ensemble with a theoretical expression for the variance of this chance.

The standard deviation of $P_t^L$ at every time $t$, calculated from a trend ensemble with 1000 $P_t^L$-curves (stored in the matrix PEnsemble), can be found with a single command in TrendSpotteR:

SDPtl <- apply(PEnsemble, 1, sd)

In the remainder of this paragraph, we derive a theoretical expression for this standard deviation. Again we note that $P_t^L$ equals the chance of exceeding level $L$ at time $t$. By definition:

$$P_t^L = \int_L^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma_t} \exp\left( -\frac{1}{2} \left( \frac{y - \mu_t}{\sigma_t} \right)^2 \right) dy \qquad (3.5)$$

We can write $\hat{\sigma}_t^2$ as follows:

$$\hat{\sigma}_t^2 = \hat{\sigma}_\xi^2 + \hat{\sigma}_\eta^2 = \hat{\sigma}_{ML}^2 h_t + \hat{\sigma}_{ML}^2 P_{t|N}^{1,1} \qquad (3.6)$$

A Taylor expansion for the second moment generates an approximation of the variance of the probability of exceeding a pre-defined threshold.

$$V(P_t^L) = V(g(\hat{\mu}_t, \hat{\sigma}_t^2)) \approx \dot{g}(\mu_t, \sigma_t^2)' \; V\!\begin{pmatrix} \hat{\mu}_t \\ \hat{\sigma}_t^2 \end{pmatrix} \; \dot{g}(\mu_t, \sigma_t^2) \qquad (3.7)$$

with $g$ equal to $P_t^L$, viewed as a function of $\mu_t$ and $\sigma_t^2$. The next thing to do is to find expressions for the components of these matrices.

Derivation of $\dot{g}(\mu_t, \sigma_t^2)$.

The vector $\dot{g}(\mu_t, \sigma_t^2)$ equals

$$\dot{g}(\mu_t, \sigma_t^2) = \begin{pmatrix} \frac{\partial}{\partial \mu_t} g(\mu_t, \sigma_t^2) \\ \frac{\partial}{\partial \sigma_t^2} g(\mu_t, \sigma_t^2) \end{pmatrix}$$

The elements of this vector are derived below.

$$\frac{\partial}{\partial \mu_t} g(\mu_t, \sigma_t^2) = \frac{\partial}{\partial \mu_t} P_t^L = \frac{\partial}{\partial \mu_t} \int_L^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma_t} \exp\left( -\frac{1}{2} \left( \frac{y - \mu_t}{\sigma_t} \right)^2 \right) dy \qquad (3.8)$$

$$= \int_L^{\infty} \frac{\partial}{\partial \mu_t}\, \frac{1}{\sqrt{2\pi}\,\sigma_t} \exp\left( -\frac{1}{2} \left( \frac{y - \mu_t}{\sigma_t} \right)^2 \right) dy = \int_L^{\infty} \frac{y - \mu_t}{\sigma_t^2}\, \frac{1}{\sqrt{2\pi}\,\sigma_t} \exp\left( -\frac{1}{2} \left( \frac{y - \mu_t}{\sigma_t} \right)^2 \right) dy$$

$$= \frac{1}{\sigma_t^2} \left[ \int_L^{\infty} y\, \frac{1}{\sqrt{2\pi}\,\sigma_t} \exp\left( -\frac{1}{2} \left( \frac{y - \mu_t}{\sigma_t} \right)^2 \right) dy \;-\; \mu_t \int_L^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma_t} \exp\left( -\frac{1}{2} \left( \frac{y - \mu_t}{\sigma_t} \right)^2 \right) dy \right]$$

$$= \frac{1}{\sigma_t^2} \left[ E\,Y_{t,L} - \mu_t \right] P_t^L = \frac{1}{\sigma_t^2} \left[ \mu_t + \sigma_t (P_t^L)^{-1} \phi\!\left( \frac{L - \mu_t}{\sigma_t} \right) - \mu_t \right] P_t^L = \frac{1}{\sigma_t}\, \phi\!\left( \frac{L - \mu_t}{\sigma_t} \right)$$

Here $\phi$ denotes the standard normal probability density function and $E\,Y_{t,L}$ the mean of the normal distribution truncated from below at $L$. Truncated distributions are conditional distributions that result from restricting the domain of some other probability distribution; they arise frequently in practical statistics, where one is interested in values above or below a given threshold or within a specified range. The probability density function of the truncated normal distribution on $a \leq x \leq b$ has the form

$$f(x) = \frac{\frac{1}{\sigma}\,\phi\!\left( \frac{x - \mu}{\sigma} \right)}{\Phi\!\left( \frac{b - \mu}{\sigma} \right) - \Phi\!\left( \frac{a - \mu}{\sigma} \right)}$$

with $\Phi$ the standard normal cumulative distribution function. In our case $a = L$ and $b = \infty$, so $\Phi((b - \mu)/\sigma) = 1$ and the normalizing denominator equals $1 - \Phi((L - \mu_t)/\sigma_t) = P_t^L$. This is why the first integral in the third step of 3.8 can be written as $E\,Y_{t,L} \cdot P_t^L$. The following step then holds because the expected value of this one-sided truncated normal distribution equals

$$E\,Y_{t,L} = \mu_t + \sigma_t\, \frac{\phi\!\left( \frac{L - \mu_t}{\sigma_t} \right)}{1 - \Phi\!\left( \frac{L - \mu_t}{\sigma_t} \right)} = \mu_t + \sigma_t (P_t^L)^{-1} \phi\!\left( \frac{L - \mu_t}{\sigma_t} \right)$$
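The closed form $\partial P_t^L / \partial \mu_t = \phi((L - \mu_t)/\sigma_t)/\sigma_t$ and the truncated-normal mean $E\,Y_{t,L}$ can be checked numerically; a small Python sketch with arbitrary example values:

```python
import numpy as np
from scipy.stats import norm

# Arbitrary example values for mu_t, sigma_t and the threshold L.
mu, sigma, L = 1.3, 0.8, 2.0

# P_t^L as a function of mu: the chance that N(mu, sigma^2) exceeds L.
P = lambda m: norm.sf((L - m) / sigma)

# Central finite difference versus the closed form phi(alpha)/sigma.
eps = 1e-6
numeric = (P(mu + eps) - P(mu - eps)) / (2 * eps)
closed = norm.pdf((L - mu) / sigma) / sigma

# Mean of the normal distribution truncated from below at L.
alpha = (L - mu) / sigma
trunc_mean = mu + sigma * norm.pdf(alpha) / norm.sf(alpha)
```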

We do something similar for $\frac{\partial}{\partial \sigma_t^2} g(\mu_t, \sigma_t^2)$:

$$\frac{\partial}{\partial \sigma_t^2} g(\mu_t, \sigma_t^2) = \frac{\partial}{\partial \sigma_t^2} P_t^L = \frac{\partial}{\partial \sigma_t^2} \int_L^{\infty} \frac{1}{\sqrt{2\pi \sigma_t^2}} \exp\left( -\frac{1}{2} \left( \frac{y - \mu_t}{\sigma_t} \right)^2 \right) dy \qquad (3.9)$$

$$= \int_L^{\infty} \left[ \frac{1}{\sqrt{2\pi \sigma_t^2}} \exp\left( -\frac{1}{2} \left( \frac{y - \mu_t}{\sigma_t} \right)^2 \right) \frac{(y - \mu_t)^2}{2(\sigma_t^2)^2} \;-\; \exp\left( -\frac{1}{2} \left( \frac{y - \mu_t}{\sigma_t} \right)^2 \right) \frac{1}{2\sqrt{2\pi}\,(\sigma_t^2)^{3/2}} \right] dy$$

$$= \int_L^{\infty} \frac{1}{2\sqrt{2\pi \sigma_t^2}} \exp\left( -\frac{1}{2} \left( \frac{y - \mu_t}{\sigma_t} \right)^2 \right) \left[ \frac{(y - \mu_t)^2}{(\sigma_t^2)^2} - \frac{1}{\sigma_t^2} \right] dy$$

$$= \int_L^{\infty} \frac{1}{2(\sigma_t^2)^2 \sqrt{2\pi \sigma_t^2}} \exp\left( -\frac{1}{2} \left( \frac{y - \mu_t}{\sigma_t} \right)^2 \right) \left[ (y - \mu_t)^2 - \sigma_t^2 \right] dy$$

Derivation of $V\!\begin{pmatrix} \hat{\mu}_t \\ \hat{\sigma}_t^2 \end{pmatrix}$.

This matrix can be computed as the inverse of the Fisher information matrix, which is the negative of the expected Hessian matrix. The Hessian matrix of the log-likelihood has the form

$$H = \begin{pmatrix} \frac{\partial^2 \ln L}{\partial \mu_t^2} & \frac{\partial^2 \ln L}{\partial \mu_t\, \partial \sigma_t^2} \\ \frac{\partial^2 \ln L}{\partial \mu_t\, \partial \sigma_t^2} & \frac{\partial^2 \ln L}{\partial (\sigma_t^2)^2} \end{pmatrix} \qquad (3.10)$$

$H$ is a $2 \times 2$ matrix. The element $\frac{\partial^2 \ln L}{\partial \mu_t^2}$ is known from the output of the Kalman filter: the corresponding variance of $\hat{\mu}_t$ is $\sigma_{ML}^2 P_{t|N}^{1,1}$. The elements $\frac{\partial^2 \ln L}{\partial \mu_t\, \partial \sigma_t^2}$ and $\frac{\partial^2 \ln L}{\partial (\sigma_t^2)^2}$ are still unknown.

Derivation of $\frac{\partial^2 \ln L}{\partial (\sigma_t^2)^2}$.

We know that

$$\ln L = -\frac{1}{2} \left[ (N - N_s) \ln(2\pi\sigma_t^2) + \sum_{t=N_s+1}^{N} \left( \ln(f_{t|t-1}) + \frac{1}{\sigma_t^2} \frac{n_{t|t-1}^2}{f_{t|t-1}} \right) \right] \qquad (3.11)$$

The first and second order derivatives of the log-likelihood with respect to $\sigma_t^2$ are:

$$\frac{\partial \ln L(\sigma_t^2)}{\partial \sigma_t^2} = -\frac{N - N_s}{2\sigma_t^2} + \frac{1}{2(\sigma_t^2)^2} \sum_{t=N_s+1}^{N} \frac{n_{t|t-1}^2}{f_{t|t-1}} \qquad (3.12)$$

$$\frac{\partial^2 \ln L(\sigma_t^2)}{\partial (\sigma_t^2)^2} = \frac{N - N_s}{2(\sigma_t^2)^2} - \frac{1}{(\sigma_t^2)^3} \sum_{t=N_s+1}^{N} \frac{n_{t|t-1}^2}{f_{t|t-1}} \qquad (3.13)$$

The Hessian element for $\sigma_t^2$ is thus

$$H(\sigma_t^2) = \frac{\partial^2 \ln L(\sigma_t^2)}{\partial (\sigma_t^2)^2} = \frac{N - N_s}{2(\sigma_t^2)^2} - \frac{1}{(\sigma_t^2)^3} \sum_{t=N_s+1}^{N} \frac{n_{t|t-1}^2}{f_{t|t-1}} \qquad (3.14)$$

Taking the expectation,

$$E[H(\sigma_t^2)] = E\left[ \frac{N - N_s}{2(\sigma_t^2)^2} - \frac{1}{(\sigma_t^2)^3} \sum_{t=N_s+1}^{N} \frac{n_{t|t-1}^2}{f_{t|t-1}} \right] \qquad (3.15)$$

$$= \frac{N - N_s}{2(\sigma_t^2)^2} - \frac{1}{(\sigma_t^2)^3}\, E\left[ \sum_{t=N_s+1}^{N} \frac{n_{t|t-1}^2}{f_{t|t-1}} \right] = \frac{N - N_s}{2(\sigma_t^2)^2} - \frac{1}{(\sigma_t^2)^3} (N - N_s)\,\sigma_t^2$$

$$= \frac{N - N_s}{2(\sigma_t^2)^2} - \frac{N - N_s}{(\sigma_t^2)^2} = -\frac{N - N_s}{2(\sigma_t^2)^2}$$

This holds because $\frac{N - N_s}{2(\sigma_t^2)^2}$ and $\frac{1}{(\sigma_t^2)^3}$ are constants, and $E\left[ \sum_{t=N_s+1}^{N} n_{t|t-1}^2 / f_{t|t-1} \right] = (N - N_s)\,\sigma_t^2$ by definition of our maximum likelihood estimator (2.26). The variance of $\sigma_t^2$ now follows:

$$V(\sigma_t^2) = -(E[H(\sigma_t^2)])^{-1} = \left( \frac{N - N_s}{2(\sigma_t^2)^2} \right)^{-1} = \frac{2(\sigma_t^2)^2}{N - N_s} \qquad (3.16)$$

The standard deviation of $\sigma_t^2$ is simply the square root of this variance.
Derivation of ∂²ln L/(∂µ_{t} ∂σ_{t}²).

Since the Kalman filter gives µ_{t} = c′_{t} a_{t|t−1} and η_{t|t−1} = y_{t} − c′_{t} a_{t|t−1}, we have:

$$
\begin{aligned}
\ln L &= -\frac{1}{2}\left[(N-N_{s})\ln(2\pi\sigma^{2}_{t})
+ \sum_{t=N_{s}+1}^{N}\left(\ln(f_{t|t-1})
+ \frac{1}{\sigma^{2}_{t}}\,\frac{\eta^{2}_{t|t-1}}{f_{t|t-1}}\right)\right] \\
&= -\frac{1}{2}\left[(N-N_{s})\ln(2\pi\sigma^{2}_{t})
+ \sum_{t=N_{s}+1}^{N}\left(\ln(f_{t|t-1})
+ \frac{1}{\sigma^{2}_{t}}\,\frac{(y_{t}-\mu_{t})^{2}}{f_{t|t-1}}\right)\right]
\end{aligned}
\tag{3.17}
$$

The derivative of ln L with respect to µ_{t} equals:

$$
\frac{\partial\ln L}{\partial\mu_{t}}
= \frac{1}{\sigma^{2}_{t}}\sum_{t=N_{s}+1}^{N}\frac{y_{t}-\mu_{t}}{f_{t|t-1}}
\tag{3.18}
$$

Now, it is easy to see that:

$$
\frac{\partial^{2}\ln L}{\partial\mu_{t}\,\partial\sigma^{2}_{t}}
= -\frac{1}{\sigma^{4}_{t}}\sum_{t=N_{s}+1}^{N}\frac{y_{t}-\mu_{t}}{f_{t|t-1}}
\tag{3.19}
$$

Taking the expectation, we get:

$$
\begin{aligned}
E\left[-\frac{1}{\sigma^{4}_{t}}\sum_{t=N_{s}+1}^{N}\frac{y_{t}-\mu_{t}}{f_{t|t-1}}\right]
&= -\frac{1}{\sigma^{4}_{t}}\sum_{t=N_{s}+1}^{N} E\left[\frac{y_{t}-\mu_{t}}{f_{t|t-1}}\right] \\
&= -\frac{1}{\sigma^{4}_{t}}\sum_{t=N_{s}+1}^{N} E\left[\frac{(\mu_{t}+\xi_{t})-\mu_{t}}{f_{t|t-1}}\right] \\
&= -\frac{1}{\sigma^{4}_{t}}\sum_{t=N_{s}+1}^{N} \frac{E[\xi_{t}]}{E[f_{t|t-1}]} \\
&= 0
\end{aligned}
\tag{3.20}
$$

Every element of the covariance matrix of $(\hat{\mu}_{t}, \hat{\sigma}^{2}_{t})$ is derived now. The matrix has the following form:

$$
V\begin{pmatrix}\hat{\mu}_{t}\\ \hat{\sigma}^{2}_{t}\end{pmatrix}
= \begin{pmatrix}
\sigma^{2}_{ML}\,P_{t|N} & 0 \\[4pt]
0 & \dfrac{2(\sigma^{2}_{t})^{2}}{N-N_{s}}
\end{pmatrix}
\tag{3.21}
$$

Check for the confidence intervals of P_{t}^{L}.

We found a theoretical expression for the variance of P_{t}^{L}, and hence for its standard deviation (the square root of this variance). As a simplification of 3.7, we can make the assumption that V[σ_{t}²] = 0, which makes the computations a little easier. The expression for the variance of P_{t}^{L} then becomes

$$
V[P_{t}^{L}] = \left(\frac{\partial}{\partial\mu_{t}}\,g(\mu_{t},\sigma^{2}_{t})\right)^{2} V[\mu_{t}]
\tag{3.22}
$$

To check whether both methods, i.e. the theoretical derivation (3.22) and the derivation based on the trend ensemble, result in corresponding values for the standard deviation of P_{t}^{L}, we want the ratio of the two standard deviations to be about 1 ± 0.2. It appears that the ratio satisfies this requirement when the dataset is not too small (roughly, number of observations ≥ 50). To illustrate this, we have drawn 60 random variables from the standard normal distribution and computed the ratio for level 2; see column 5 of figure 3.1. As one can see, both methods find similar values for the standard deviation of the exceedance of level 2, since the ratio lies between 1 and 1.2. For a smaller number of observations, the requirement is not satisfied; see figure 3.2. One possible explanation is the simplifying approximation 3.22 of 3.7, i.e. that we need to take the variance of σ_{t}² into account instead of using the simplifying assumption V[σ_{t}²] ≈ 0. However, when we compare the ratios obtained with 3.7 (column 6 of figures 3.1 and 3.2) to those obtained with 3.22 (column 5 of figures 3.1 and 3.2), the difference turns out to be negligible and the results are quite similar. The deviation between both methods for small datasets can probably be explained by the omission of the second and higher order terms in the Taylor expansion 3.7; these terms are probably not negligible due to the high variance in a small dataset.
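This comparison can be mimicked with a toy computation. In the sketch below (Python, with illustrative numbers; the Gaussian exceedance function g(µ_{t}, σ_{t}²) = 1 − Φ((L − µ_{t})/σ_{t}) and all parameter values are assumptions for illustration, not TrendSpotteR output), the standard deviation of P_{t}^{L} according to the delta method 3.22 is compared with the standard deviation of an ensemble of exceedance probabilities:

```python
import math
import random

random.seed(1)

L = 2.0                 # threshold level (illustrative)
mu, sigma = 0.3, 1.0    # trend value and residual standard deviation (assumed)
sd_mu = 0.15            # standard deviation of the trend estimate (assumed)

def Phi(z):
    # standard normal cumulative distribution function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def g(m):
    # exceedance probability P(y > L) for trend value m
    return 1.0 - Phi((L - m) / sigma)

# Delta method (3.22): V[P] = (dg/dmu)^2 * V[mu]; derivative via central difference
h = 1e-6
dg = (g(mu + h) - g(mu - h)) / (2.0 * h)
sd_delta = abs(dg) * sd_mu

# Ensemble method: simulate trend values and take the sd of the probabilities
ens = [g(random.gauss(mu, sd_mu)) for _ in range(100000)]
mean_p = sum(ens) / len(ens)
sd_ens = math.sqrt(sum((p - mean_p) ** 2 for p in ens) / (len(ens) - 1))

ratio = sd_ens / sd_delta
print(sd_delta, sd_ens, ratio)
```

For these (moderate) uncertainty levels the ratio of the two standard deviations stays well inside the 1 ± 0.2 band used in the text.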

3.1.2 Simultaneous confidence bands

Besides the creation of confidence intervals for P_{t}^{L}, another application of trend ensembles is the calculation of simultaneous confidence bands, in addition to pointwise confidence bands.

Simultaneous confidence bands give upper and lower limits for the trend as a whole. In the context of trend ensembles, simultaneous 95% confidence limits can be estimated by finding a tight envelope around the estimates $\hat{\mu}_{1}, \ldots, \hat{\mu}_{N}$ such that exactly 0.95 · M simulated trends fall within this envelope, or equivalently, by removing the 0.05 · M most extreme trends, which fall outside the envelope. This choice is not unique since the term "extreme" is not unique. The procedure we have chosen to implement in the TrendSpotteR software is as follows (to illustrate, we choose an ensemble of 1000 trends, so M = 1000):

• Call the ensemble of 1000 trends T_{t,i}, with i = 1, ..., 1000.

• Let R_{t,i} be the ranked matrix in which, for each time t, the 1000 values of T_{t,i} are ranked from smallest to largest.

• Compute the relative rank R′_{t,i} = |R_{t,i} − (1000 + 1)/2| for each time t.

• Let M_{i} be the vector that contains, for every trend i, the sum of its 5 highest relative ranks R′_{t,i}.

• Compute q = the 95^{th} percentile of the values M_{i}.

• Call T*_{t,i} the matrix in which the trends with M_{i} ≥ q are removed from T_{t,i}.

(Note: if more than 50 trends have M_{i} ≥ q, then remove the, say, n trends with M_{i} > q and randomly remove 50 − n of the trends with M_{i} = q.)

• Now, the upper band of the simultaneous confidence interval is UB_{t} = max_{i}[T*_{t,i}] and the lower band is LB_{t} = min_{i}[T*_{t,i}].
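To make the procedure explicit, the steps above can be sketched in a few lines of Python (the ensemble here is synthetic and the tie-breaking at the threshold is simplified to deterministic removal; in TrendSpotteR the corresponding code is written in R):

```python
import random

random.seed(0)
M_ens, N, K = 1000, 40, 5   # ensemble size, series length, number of ranks summed

# Synthetic ensemble: row i is one simulated trend over N time points
T = [[random.gauss(0.02 * t, 1.0) for t in range(N)] for _ in range(M_ens)]

# Rank the ensemble values at each time t (rank 1 = smallest)
R = [[0] * N for _ in range(M_ens)]
for t in range(N):
    order = sorted(range(M_ens), key=lambda i: T[i][t])
    for rank, i in enumerate(order, start=1):
        R[i][t] = rank

# Relative rank: distance from the middle rank (M + 1) / 2
R_rel = [[abs(R[i][t] - (M_ens + 1) / 2) for t in range(N)] for i in range(M_ens)]

# Extremeness score M_i: sum of the K highest relative ranks of trend i
score = [sum(sorted(R_rel[i], reverse=True)[:K]) for i in range(M_ens)]

# Remove the 5% most extreme trends (ties removed deterministically here)
q = sorted(score)[int(0.95 * M_ens)]
keep = [i for i in range(M_ens) if score[i] < q]

# Simultaneous band: pointwise max/min over the retained trends
UB = [max(T[i][t] for i in keep) for t in range(N)]
LB = [min(T[i][t] for i in keep) for t in range(N)]
print(len(keep), all(LB[t] <= UB[t] for t in range(N)))
```

Scoring trends by their most extreme relative ranks, rather than by their raw values, makes the removal symmetric in both very high and very low trends.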

We note that simultaneous confidence limits are rarely found in the literature (Liu et al., 2008).

Figure 3.1: 60 random draws from the standard normal distribution. Column 2 shows the standard deviation of the exceedance of level 2 computed by the TrendSpotteR 1.0 method, column 3 shows the standard deviation computed analytically with the use of 3.22, column 4 shows the extended analytical version 3.7 of the standard deviation, column 5 represents the ratio of column 2 to column 3, and the last column is the ratio of column 2 to column 4.

Figure 3.2: 20 random draws from the standard normal distribution. Column 2 shows the standard deviation of the exceedance of level 2 computed by the TrendSpotteR 1.0 method, column 3 shows the standard deviation computed analytically with the use of 3.22, column 4 shows the extended analytical version 3.7 of the standard deviation, column 5 represents the ratio of column 2 to column 3, and the last column is the ratio of column 2 to column 4.

### 3.2 TrendSpotteR 1.0 software

The theory set out in chapter 2 and section 3.1 has been programmed in R. The software estimates IRW trends, along with 95% pointwise and simultaneous confidence bands; idem for trend differences µ_{t} − µ_{t−1} and µ_{N} − µ_{t}, and idem for probabilities of exceeding thresholds (P_{t}^{L}) or differences thereof (P_{N}^{L} − P_{t}^{L}). The software includes a main program (script), along with four functions. Furthermore, a concise user manual is supplied along with the two datasets described in section 3.3. The files are as follows:

• ReadMe.txt. This file is the manual for running the software and addresses error messages.

• TrendSpotteR.R. This file contains the starting script of the software.

• OptionScreen.R. This function creates the opening screen for filling in the trend options. See figure 3.3 for an example.

• Run.R. This function contains the algorithms for the Kalman filter and smoothing equations. The routine for maximum likelihood optimization is also implemented here. The optimization method chosen is a simple brute-force shooting method, which avoids convergence to local minima in the one-dimensional parameter space.

• RunLn.R. Idem for log-transformed data.

The two data files added to the software are called CNT Season.dat (cf. section 3.3.1) and GlobalDamage.dat (cf. section 3.3.2). These input files are provided in ASCII format (other formats are not supported by TrendSpotteR 1.0).

The TrendSpotteR 1.0 software output is organized in three folders that will be created in the R working directory when running the program. The first folder is called Figures. This folder includes all graphs TrendSpotteR 1.0 generates (cf. examples in section 3.3, and figures 3.4, 3.5, 3.6 and 3.7). The second folder, called OutputDir, provides the data underlying the graphs in Figures. These files can be used to design more advanced graphical presentations.

OutputDir also includes a file called Options, from which one can retrieve the input option information. The third output folder is called Ensemble, and consists of the ensemble files corresponding to trends, trend differences, threshold probabilities and differences in threshold probabilities (cf. figures 3.4b and 3.6b). Each of these files contains a first row with the time variable, followed by 1000 rows with simulated trends or threshold probabilities.
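Because the ensemble files use the plain ASCII layout just described (first a row with the time variable, then rows of simulated trends), they are easy to post-process outside R as well. A minimal Python sketch, using a miniature hand-made example rather than a real TrendSpotteR output file:

```python
# Miniature example of the Ensemble file layout: the first row is the time
# variable, the following rows are simulated trends (only 2 here; the real
# files contain 1000 of them).
raw = """1906 1907 1908
8.10 8.20 8.30
8.00 8.25 8.35
"""

rows = [[float(v) for v in line.split()] for line in raw.strip().splitlines()]
time, trends = rows[0], rows[1:]

# Pointwise median of the simulated trends per time point
medians = [sorted(tr[t] for tr in trends)[len(trends) // 2]
           for t in range(len(time))]
print(time, medians)
```

Any other pointwise statistic (percentiles, means, variances) can be computed from the parsed rows in the same way.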

If TrendSpotteR 1.0 is loaded for the first time it will automatically install the R packages mvtnorm, tseries, tcltk, prettyR and MASS. Once TrendSpotteR 1.0 is running, an option screen will be shown (figure 3.3). Options are:

• a log transformation of the data. If this option is chosen, the IRW model is estimated on log-transformed data and the estimation results are transformed back to the original scale (except for the innovations).

• one can choose maximum likelihood optimization, or set the smoothing parameter by hand.

• if the data file contains missing data, the missing data code can be given here.

• if threshold exceedances are of interest, a threshold can be given.

• if predictions of future trend values are of interest, the number of predictions can be given here.

Now, the software will estimate trends and corresponding uncertainty bands. However, one has to check the validity of a number of assumptions underlying IRW trends. A first check is the whiteness of the residuals (the standardized innovation series, in Kalman filter terms). The homoscedasticity of the residuals should also be checked, as should their normality. Normality is a prerequisite for using trend ensembles (and thus for estimating simultaneous confidence bands). We note that the pointwise confidence bands for trends and trend differences in TrendSpotteR do not depend on the normality of residuals. However, the 95% confidence limits should be read as 2σ confidence limits if normality is violated.

Standard graphs in TrendSpotteR for checking these assumptions are the QQ-plot, the ACF plot and the actual plot of residuals. For more detailed information on checking residuals we refer to Harvey (1989, Ch. 5) [[12]] and Chandler and Scott (2011, Section 3.3) [[3]].

3.2.1 Creating non-standard statistics

In this section we give two examples of simple extensions of the TrendSpotteR software, based on basic knowledge of R.

Assume one wants to present 80% pointwise confidence bands of the trend estimate µ_{t} instead of 95%. These 80% confidence bands can easily be deduced from the trend ensemble matrix T, containing 1000 simulated trends. The upper limit is simply generated by taking the 90th percentile of the matrix T, the lower limit by the 10th percentile. In R one can simply use the following commands to create these two limits:

upperLimit <- apply(T, 2, quantile, probs = 0.90)
lowerLimit <- apply(T, 2, quantile, probs = 0.10)

A second example. Suppose one wants to apply a Box-Cox transformation to the data X_{t} (the measurements). This transformation can be useful if residuals are not normally distributed (Chandler and Scott, 2011 - Section 3.3.2) [[3]]. Box-Cox transformations have the following form:

$$
Y_{t} = \begin{cases}
\dfrac{X_{t}^{\lambda}-1}{\lambda}, & \lambda \neq 0 \\[6pt]
\log(X_{t}), & \lambda = 0
\end{cases}
\tag{3.23}
$$

where λ denotes the power of the transformation. The case λ = 0 (log transformation) is included in the TrendSpotteR software. However, if λ ≠ 0, one may want to transform the trend results back to the original scale of X_{t}, i.e. X_{t} = (λY_{t} + 1)^{1/λ}. The following statement will place the back-transformed trend data, present in matrix T, in a new matrix TT:

TT <- (lambda * T + 1)^(1/lambda)

Now, any means, variances and percentiles can be computed based on this TT matrix. E.g., the following statements yield the 50th, 97.5th and 2.5th percentiles (the median trend and 95% pointwise uncertainty bands):

Trend <- apply(TT, 2, quantile, probs = 0.5)
upperLimit <- apply(TT, 2, quantile, probs = 0.975)
lowerLimit <- apply(TT, 2, quantile, probs = 0.025)

Analogously, trend differences can be computed on the basis of matrix TT, along with pointwise uncertainty bands.
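As a quick sanity check of the back transformation (a Python sketch with hypothetical numbers; the actual TrendSpotteR workflow operates on R matrices as shown above), transforming forward with 3.23 and back with X_{t} = (λY_{t} + 1)^{1/λ} recovers the original data:

```python
import math

lam = 0.5                    # illustrative Box-Cox power (lambda != 0)
X = [1.0, 2.5, 4.0, 9.0]     # hypothetical positive measurements

# Forward Box-Cox transform, eq. 3.23
Y = [(x ** lam - 1.0) / lam if lam != 0 else math.log(x) for x in X]

# Back transform to the original scale: X_t = (lam * Y_t + 1)^(1 / lam)
X_back = [(lam * y + 1.0) ** (1.0 / lam) for y in Y]

print(X_back)  # recovers the original measurements up to rounding
```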

In chapter 4 we will show another example of creating non-standard statistics with the use of TrendSpotteR 1.0. It will be explained how trends can be estimated with TrendSpotteR 1.0 for data which are distributed according to a Gumbel distribution.

### 3.3 Examples

Trend estimation with TrendSpotteR 1.0 is illustrated here with two examples. The first example concerns a series of century-long spring temperatures in the Netherlands. The second example concerns global economic damage caused by weather-related disasters.

3.3.1 Spring temperatures 1906-2012

Seasonal temperatures in the Netherlands are given in the ASCII file CNT Season.dat. This file is part of the TrendSpotteR 1.0 software. The data file consists of six columns: year, winter temperatures, spring temperatures, summer temperatures, autumn temperatures and annual averages, all expressed in °C. Here, we estimate an IRW trend model for spring temperatures, along with probabilities of crossing the threshold of 11.0 °C.

The corresponding input screen is given in figure 3.3. Background information on these temperature data has been given by Schrier et al. (2011) [[24]].

The TrendSpotteR 1.0 run generates six graphs with trends, trend differences, probabilities of crossing the threshold of 11.0 °C and differences thereof; see figures 3.4a through f. Figure 3.4a shows the estimated trend $\hat{\mu}_{t}$, along with its corresponding confidence intervals (both simultaneous and pointwise). Figure 3.4b shows all 1000 simulated trends which have been used to compute these simultaneous confidence bands. Figure 3.4c shows the first differences µ_{t} − µ_{t−1}, along with confidence bands. The graph shows that these differences are statistically significant (α = 0.05) over a short time interval: the lower limit of the simultaneous confidence bands lies above zero over the period 1984-2007. In the same way, figure 3.4d shows that the trend differences µ_{2012} − µ_{t} are statistically significantly positive for all years in the period 1906-2009.

Figure 3.3: Input options screen for the analysis of spring temperatures in the Netherlands, 1906-2012

Figure 3.4e shows the probability of crossing the threshold of 11.0 °C. The graph shows that the probabilities are near zero over the period 1906-1980, with a rapid acceleration over 1981-2012. The probability in 2012 has risen to 0.20. The uncertainty bands show that this value differs significantly from zero. The corresponding average return period is calculated as the inverse of this probability: once in five years. Finally, figure 3.4f shows that the probability P_{2012}^{11.0} is significantly higher than all other probabilities P_{t}^{11.0}.

TrendSpotteR 1.0 generates three graphs for checking model assumptions, see figure 3.5.

Both the ACF (figure 3.5a) and the innovation plot (figure 3.5b) show that the innovations are in good agreement with a white noise process. All 22 autocorrelations are within the 2σ confidence limits and show no periodic behavior. We have checked the homoscedasticity of the residuals by visual inspection of figure 3.5b. Figure 3.5c shows the normality plot for the innovations. The graph shows that these innovations are reasonably centered around a straight line. Thus, no log transformation or other Box-Cox transformation is needed here.

3.3.2 Global economic losses from weather-related disasters

Visser et al. (2012) [[27]] have analysed trends in weather-related disaster data, 1980-2010.

These data have been extracted from the CRED database on natural disasters. Here, we show the TrendSpotteR results for one series, that of global economic losses from weather-related disasters. Weather-related disasters are disasters due to hurricanes, extratropical