Trends in environmental data:
when uncertainty matters
Krista Jaarsma
Master Thesis in Mathematics
March 2013
Summary
A great deal of research addresses the behavior of environmental processes. Questions such as how these processes behaved in the past and how they will behave in the future are of interest. Within this context trend analysis is an important tool in the study of observed environmental data.
Trend values are estimates of the observations or predictions of future observations. Both the degree of uncertainty of these values and the detection of significant increases or decreases of the trend are of great importance, because they contribute to better predictions and estimates.
There is extensive literature on trend estimation. Several papers also address uncertainty in trend estimation in the form of a pointwise confidence interval for the trend estimates. Uncertainty in trend differences is covered in some articles as well. But full uncertainty information is missing for statistics such as return periods or the chance of crossing thresholds.
This thesis is novel in the sense that it describes a way to obtain this uncertainty information. I will present a way to construct two new confidence intervals. The first is a simultaneous confidence band for the estimated trend and trend differences. The second confidence interval (both pointwise and simultaneous) is for the probability of exceeding an arbitrary threshold level. Besides giving a procedure to produce these, a new program in R will be presented, called TrendSpotteR 1.0. In TrendSpotteR 1.0, a trend, trend differences, threshold exceedances and differences in these exceedances are generated with maximum information on uncertainty. In the last chapter I discuss a relevant extension of TrendSpotteR 1.0 to estimate trends in data sets which contain some huge outliers and are distributed according to the Gumbel distribution.
Master Thesis in Mathematics
Author: Krista Jaarsma
Supervisor: E.C. Wit
Second supervisor: A.E.P. Veldman
External supervisors: A.C. Petersen, H. Visser
Date: March 2013
Institute of Mathematics and Computing Science
P.O. Box 407, 9700 AK Groningen, The Netherlands
Contents
1 Introduction
2 IRW trend estimation
2.1 Trend estimation
2.2 Relevance of uncertainty information
2.3 Kalman filter
2.3.1 Filtering
2.3.2 Smoothing
2.3.3 Prediction
2.4 IRW trend model
3 Method and software
3.1 Trend ensembles and uncertainties
3.1.1 Error in exceedance probability
3.1.2 Simultaneous confidence bands
3.2 TrendSpotteR 1.0 software
3.2.1 Creating non-standard statistics
3.3 Examples
3.3.1 Spring temperatures 1906-2012
3.3.2 Global economic losses from weather-related disasters
4 Gumbel distribution
4.1 GEV distributions
4.2 Gumbel distribution in TrendSpotteR 1.0
5 Discussion and conclusion
Bibliography
Appendix
1 Introduction
Everywhere in the world, people try to visualize environmental processes. Governments have to adjust their policies to the (expected) changes in these environmental processes. Is global warming a fact, and how quickly is the earth's temperature increasing? How big are the consequences? Is the sea level rising significantly, and how fast? Besides governments, individuals and businesses are also interested in trends in environmental data. The tourism and recreation sector, for example, is interested in the changing temperature. Is it true that summers will become hotter? This last question is also of interest to healthcare institutes, since it is a well-known fact that increasing summer temperatures go along with an increase in the death rate. In addition to these few examples, one can think of many more reasons why the study of the behavior of environmental processes is essential.
Trend estimation is an important tool in the visualization of environmental processes.
Scanning the literature, one can find a great deal of information about trend estimation. One important aspect of trend estimation is the availability of uncertainty information about the estimated trend. More information about trend uncertainty results in better estimates, which matters because the estimates will mostly be used for important decision making.
In this report, which is an extended version of the paper of Jaarsma et al. (2013) [[14]] (see Appendix), a method will be described that generates uncertainty information for trend estimation not found in the literature so far: simultaneous confidence intervals, and confidence intervals for the chance of crossing a threshold. Beyond these specific uncertainties, the most powerful aspect of this method is that full uncertainty information can be obtained for arbitrary functions of the trend and for probabilities of exceeding some level.
This thesis will not only present the new method; the implementation of the method in a new package for R, called TrendSpotteR 1.0, will also be described. The package is an expansion of the TrendSpotter software developed by Visser, see Visser (2004) [[29]]. TrendSpotteR 1.0 is able to produce trends and chances of exceedance together with differences in trends and exceedance likelihoods, along with full uncertainty information. In this report, the functioning of TrendSpotteR 1.0 will be clarified with two examples.
Trend estimation in TrendSpotteR 1.0 is based on some normality assumptions. Not all datasets will satisfy this requirement. It is likely that series of extreme events do not satisfy the normality assumptions. Since it is the extreme events that often have the greatest impact, the last chapter of this thesis will focus on extreme environmental processes. One can think of floods, maximum temperatures reached each month, extreme rainfall, etc. It will be shown
how TrendSpotteR 1.0 can be extended for 'extreme' datasets distributed according to a so-called Gumbel distribution, which is one of the extreme value distributions.
2 IRW trend estimation
2.1 Trend estimation
There is a vast literature on the behavior of environmental processes. Questions like how these processes behaved in the past and how they will behave in the future are of great interest.
Within this context trend analysis is an important tool in the study of environmental data.
As stated by Harvey (1989) [[12]], there are two reasons for wishing to estimate trends in (environmental) time series data. The first reason is to give a representation of the data that changes relatively slowly (smoothly) over time. The second reason might be to find a series that, if extrapolated, gives the clearest indication of the future long-term movements in the series.
Scanning the environmental literature on trends shows that a wide variety of trend models have been applied. We mention some examples of trend models, without being complete: moving averages with a pre-defined window, ordinary least squares (OLS) linear trend models, polynomial trend fits, splines, exponential smoothing, trends based on Bayesian principles, trends in rare events based on logistic regression, smooth transition models, LOESS estimators, binomial filters, GARCH trends, Friedman super smoothers, wavelets, structural time-series trend models, robust regression trends (MM or LTS regression), and Box-Jenkins ARIMA models and variations (SARIMA, GARMA, ARFIMA). Next to these trend models, a number of statistical tests for trends exist: Student's t-test for sub-periods of time, Mann-Kendall tests for monotonic trends (with or without correction for serial correlation), and trend tests against long-memory time series.
It is important to realize that there is no best trend model. The choice of a particular model depends on one's goals or wishes. To clarify, we give three examples:
• Suppose one wants to estimate a trend using simple mathematics and generally available software. For this type of problem, trend estimation by a moving average (MA) is a good choice. However, MA has some drawbacks: (i) the choice of the averaging window is subjective, (ii) due to the window chosen, there is no trend estimate at the beginning and end of the time series, and (iii) no uncertainty information is available.
• Another case could be that one is interested in good prediction (extrapolation) performance. For this goal the group of ARIMA models is a good choice. However, these models have one important drawback: the actual trend in the sample period is not given (trend tests such as Mann-Kendall have the same disadvantage).
• If one wants, for example, a trend estimate along with uncertainty information, using generally available software, the OLS linear trend model is a good choice. This model has an obvious drawback: only linear trends are allowed. However, linear trends are not realistic for most environmental time series.
Please see Mills (2010) [[19]], Chandler and Scott (2011) [[3]], and Visser and Petersen (2012) [[31]] for further discussion of trend choices.
2.2 Relevance of uncertainty information
Confidence bands around a trend contain information about the certainty of the estimate. This is a very important aspect of trend estimation: the more that is known about the (un)certainty of the trend estimate, the more can be said about the estimate. When a prediction is made about the amount of rainfall for the next 10 years based on measured data, it is not very likely that the true value for the next year will be equal to the estimated trend value in that specific year. But if the estimate also has 95% pointwise confidence intervals, we know that with a chance of 95% the amount of rainfall will lie between the upper and lower limit of this confidence interval. In other words, our prediction is much stronger once we know the uncertainty of the estimate.
There are only a few flexible trend methods which give confidence bands for the trend estimates; we name LOESS estimators, DBM models and structural time series models (STMs). The model used throughout this thesis is called the Integrated Random Walk (IRW) model. This IRW trend model, sometimes denoted as the smooth trend model, is a sub-model from the class of STMs. The IRW trend model can be viewed as a special case of the so-called local linear trend model where the noise component of the level equation is set to zero (e.g., Commandeur and Koopman, 2007 - Ch. 3) [[6]]. If uncertainty is presented, it is by showing pointwise confidence limits for the trend $\mu_t$. In paragraph 2.4 we will show the formulae for the IRW trend model and go into more detail.
Visser (2004) [[29]] showed that the application of the IRW model, in combination with the discrete Kalman filter, allows one to estimate uncertainties in any trend difference $\mu_t - \mu_s$, where $t \le s$. These differences and uncertainties are important since they allow one to detect significant decreases or increases. In Visser (2004) [[29]] and Visser and Petersen (2012) [[31]], the following three functions of time are proposed:
• $\mu_t$ with corresponding $2\sigma_t$ pointwise confidence bands, $t = 1, \dots, N$.
• $\mu_t - \mu_{t-1}$, the first differences with corresponding $2\sigma_t$ pointwise confidence bands, $t = 2, \dots, N$.
• $\mu_N - \mu_t$, the lagged differences with corresponding $2\sigma_t$ pointwise confidence bands, $t = 1, \dots, N$.
Another advantage of having uncertainty information lies in the possibility of calculating chances of crossing pre-defined thresholds. E.g., if one estimates a trend in annually averaged temperatures, it may be of interest to calculate the time-varying probabilities of crossing a pre-defined temperature threshold. Please see Visser and Petersen (2012) [[31]] for an example (their figures 1, 4 and 6). The inverses of these chances are denoted as return periods.
In section 3.1 we introduce the method of estimating trend ensembles. A trend ensemble is a set of, say, 1000 IRW trend realizations which might occur with equal chance, based on the same IRW trend model. We will show how these trend realizations can be derived from the Kalman filter equations (section 2.3) and how these ensembles can be applied to extend the possibilities for estimating uncertainty bands. One such extension, discussed in paragraph 3.1.2, is the estimation of simultaneous confidence bands, in addition to pointwise confidence bands. The advantage of having simultaneous confidence bands is that these bands appeal to people's intuition. If we ask colleagues what the 95% uncertainty bands around a trend mean, they generally believe that the true trend will lie between the upper and lower limits with a chance of 95%. However, that interpretation is only true for simultaneous confidence bands, not for pointwise confidence bands (cf. http://en.wikipedia.org/wiki/Confidence_band). Thus, presenting simultaneous confidence bands rather than pointwise confidence limits has communicational advantages.
Finally, we will show how trend ensembles allow one to construct confidence bands for the probability of crossing thresholds. Uncertainties for these chances, or their inverses, average return periods, will be derived. These uncertainties cannot be obtained from standard theory (cf. Visser and Petersen, 2012, Section 7 [[31]]).
2.3 Kalman filter
Structural time series models can be estimated by the Kalman filter. The Kalman filter requires these models to be in the so-called state-space form. State-space models consist of a measurement equation
$$y_t = c_t' \alpha_t + \xi_t \qquad (2.1)$$
and a transition equation
$$\alpha_t = T \alpha_{t-1} + \eta_t \qquad (2.2)$$
Here, the prime stands for transpose. $y_t$ stands for the observation at time $t$, so it is a scalar. $\alpha_t$ is the $k \times 1$ state vector, $T$ is the $k \times k$ known system matrix and $c_t$ is a $k \times 1$ vector. $\eta_t$ is the $k \times 1$ vector of disturbances and $\xi_t$ is a scalar disturbance term. $\eta_t$ and $\xi_t$ are distributed independently of each other. Assuming that the model concerned is Gaussian, both $\eta_t$ and $\xi_t$ are normally and independently distributed; in other words $\eta_t \sim NID(0, \sigma^2 Q)$ and $\xi_t \sim NID(0, \sigma^2 h_t)$. Here, $\sigma^2$ and $h_t$ are both scalars and $Q$ is a $k \times k$ diagonal matrix. In the R file the following assumptions are made:
• The matrices $T$ and $Q$ are a priori known, time-independent matrices. When $Q$ is not known a priori, it can be determined using maximum likelihood optimization.
• $Q$ is a diagonal matrix with the variances of the noise vector $\eta_t$ on its main diagonal.
• $c_t$ and $h_t$ are known. $h_t = 1$ when the observation $y_t$ is not missing.
The Kalman filter returns (in the case of a Gaussian model) the Minimum Mean Square Linear Estimator (MMSLE) of $\alpha_t$. For more details, see Harvey (1989, Ch. 3) [[12]]. Now, let the vector $a_{t|s}$ denote the Minimum Mean Square Estimator (MMSE, which is an optimal estimator) of $\alpha_t$ conditional upon all information up to and including time $s$. More details can be found in Anderson and Moore (1979, Ch. 5) [[22]].
$$a_{t|s} = E\left[\alpha_t \mid y_1, \dots, y_s\right] \qquad (2.3)$$
The covariance matrix of $a_{t|s} - \alpha_t$, which is also called the Mean Square Error (MSE) matrix, is denoted by
$$\sigma^2 P_{t|s} = E\left[(a_{t|s} - \alpha_t)(a_{t|s} - \alpha_t)' \mid y_1, \dots, y_s\right] \qquad (2.4)$$
The Kalman filter distinguishes three different situations for which it generates an optimal estimate of $\alpha_t$:
• t = s which is called filtering
• t < s which is called smoothing
• t > s which is referred to as prediction
We will clarify and present a short overview with the corresponding equations for these situations in the subparagraphs below.
2.3.1 Filtering
The Kalman filter is recursive, which means that at a certain time $t-1$ we predict both $\alpha_t$ and $P_t$ for time $t$. Once $y_t$ is available, the so-called one-step-ahead prediction error is computed. This is the difference between the realized value of $y_t$ and the prediction of this value made at time $t-1$. Since $y_t$ is known at this moment, the estimates for $a$ and $P$ at time $t$ can be updated using the additional information in $y_t$. In formulae:
Prediction equations:
$$a_{t|t-1} = T a_{t-1|t-1} \qquad (2.5)$$
$$P_{t|t-1} = T P_{t-1|t-1} T' + Q \qquad (2.6)$$
One-step-ahead prediction error or innovation:
$$n_{t|t-1} = y_t - c_t' a_{t|t-1} \qquad (2.7)$$
The one-step-ahead prediction error is normally and independently distributed, i.e. $n_{t|t-1} \sim NID(0, \sigma^2 f_{t|t-1})$ with
$$f_{t|t-1} = c_t' P_{t|t-1} c_t + h_t \qquad (2.8)$$
Updating equations:
$$a_{t|t} = a_{t|t-1} + P_{t|t-1} c_t \, n_{t|t-1} \, f_{t|t-1}^{-1} \qquad (2.9)$$
$$P_{t|t} = P_{t|t-1} - P_{t|t-1} c_t c_t' P_{t|t-1} \, f_{t|t-1}^{-1} \qquad (2.10)$$
The initial value $a_{0|0}$, used to start the filter, can be chosen arbitrarily; take for example $a_{0|0}$ equal to the first observation. The initial value is $P_{0|0} = \kappa I$, where $\kappa$ is a large but finite number and $I$ is the $k \times k$ identity matrix. Take $\kappa$, for example, equal to 100 times the first observation. In this case one is sure that it is a finite number, but also that it is large relative to the observations. The filter will then generate its own starting values in, say, $N_s$ iterations. $N_s$ is also referred to as the adjusting time.
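To make the recursions concrete, the following is a minimal R sketch of the filtering equations (2.5)-(2.10) for a time-invariant system with $h_t = 1$ (no missing observations). The function name and interface are this sketch's own, not the TrendSpotteR internals.

kalman_filter <- function(y, T, cvec, Q, kappa = 100) {
  # Filtering recursions (2.5)-(2.10); returns the filtered states a_{t|t},
  # the scaled MSE matrices P_{t|t}, the innovations n_{t|t-1} and the
  # scaled innovation variances f_{t|t-1}.
  N <- length(y); k <- nrow(T)
  a <- matrix(0, k, N)
  P <- array(0, c(k, k, N))
  n <- f <- rep(NA_real_, N)
  a_prev <- rep(y[1], k)                    # a_{0|0}: arbitrary, e.g. the first observation
  P_prev <- diag(kappa * abs(y[1]) + 1, k)  # P_{0|0} = kappa*I, kappa large but finite
  for (t in 1:N) {
    a_pred <- T %*% a_prev                  # (2.5)
    P_pred <- T %*% P_prev %*% t(T) + Q     # (2.6)
    n[t] <- y[t] - drop(crossprod(cvec, a_pred))     # (2.7)
    f[t] <- drop(crossprod(cvec, P_pred %*% cvec)) + 1  # (2.8), h_t = 1
    K <- (P_pred %*% cvec) / f[t]           # Kalman gain
    a[, t] <- a_pred + K * n[t]             # (2.9)
    P[, , t] <- P_pred - K %*% crossprod(cvec, P_pred)  # (2.10)
    a_prev <- a[, t]; P_prev <- P[, , t]
  }
  list(a = a, P = P, n = n, f = f)
}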
2.3.2 Smoothing
Once the values $a_{t|t}$ are all available, it is possible to smooth these estimates. This is done by using not only the information in $y_s$ for $s \le t$, but also the additional information in $y_s$ for $s > t$.
Three types of smoothing can be used; for more information about these smoothing types, see Anderson and Moore (1979, Ch. 7) [[22]]. The one that is of interest for our type of trend estimation, and the one used in the R file, is fixed-interval smoothing. This fixed-interval smoother works with a backwards recursion and is denoted by $a_{t|N}$, with $N$ the sample size and $t \le N$.
$$a_{t|N} = a_{t|t} + P_t^* (a_{t+1|N} - T a_{t|t}) \qquad (2.11)$$
$$P_{t|N} = P_{t|t} + P_t^* (P_{t+1|N} - P_{t+1|t}) P_t^{*\prime} \qquad (2.12)$$
$$P_t^* = P_{t|t} T' P_{t+1|t}^{-1} \qquad (2.13)$$
with $t = N-1, \dots, 1$. The smoothed innovations or residuals are defined by
$$n_{t|N} = y_t - c_t' a_{t|N} \qquad (2.14)$$
These residuals are normally distributed, i.e. $n_{t|N} \sim N(0, \sigma^2 f_{t|N})$ with
$$f_{t|N} = c_t' P_{t|N} c_t + h_t \qquad (2.15)$$
2.3.3 Prediction
The Kalman filter is also able to predict trend estimates. After the $N$ observations have been processed by the Kalman filter, the $i$th prediction $a_{N+i|N}$ and $P_{N+i|N}$ can be generated with the following recursive formulae:
$$a_{N+i|N} = T a_{N+i-1|N} \qquad (2.16)$$
$$P_{N+i|N} = T P_{N+i-1|N} T' + Q \qquad (2.17)$$
The predictions of the observations $y_{N+i}$, based on the observations up to $y_N$, are then obtained as follows:
$$y_{N+i|N} = c_{N+i}' a_{N+i|N} \qquad (2.18)$$
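The smoothing recursions (2.11)-(2.13) and the prediction equations (2.16)-(2.18) can be sketched in the same style, continuing the hypothetical kalman_filter() above (again an illustration, not the TrendSpotteR code):

kalman_smooth <- function(filt, T, Q) {
  # Fixed-interval smoother (2.11)-(2.13): backwards recursion from t = N-1 to 1.
  a <- filt$a; P <- filt$P
  N <- ncol(a)
  aN <- a; PN <- P                           # will hold a_{t|N} and P_{t|N}
  for (t in (N - 1):1) {
    P_pred <- T %*% P[, , t] %*% t(T) + Q    # P_{t+1|t}
    Pstar  <- P[, , t] %*% t(T) %*% solve(P_pred)   # (2.13)
    aN[, t] <- a[, t] + Pstar %*% (aN[, t + 1] - T %*% a[, t])              # (2.11)
    PN[, , t] <- P[, , t] + Pstar %*% (PN[, , t + 1] - P_pred) %*% t(Pstar) # (2.12)
  }
  list(a = aN, P = PN)
}

predict_obs <- function(filt, T, Q, cvec, steps) {
  # i-step-ahead predictions (2.16)-(2.18), with c_t = cvec time-invariant.
  N <- ncol(filt$a)
  a <- filt$a[, N]; P <- filt$P[, , N]
  y_pred <- numeric(steps)
  for (i in 1:steps) {
    a <- T %*% a                         # (2.16)
    P <- T %*% P %*% t(T) + Q            # (2.17)
    y_pred[i] <- drop(crossprod(cvec, a))  # (2.18)
  }
  y_pred
}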
2.4 IRW trend model
As mentioned before, the trend model chosen for this thesis is the integrated random walk (IRW) trend model. This model can be seen as a natural extension of the OLS linear trend: both linear and more flexible trends can be estimated, along with uncertainty information. Furthermore, the trend is given over the full sample period and predictions can be generated along with uncertainty bands. Finally, the flexibility of the trend can be chosen by maximum likelihood optimization (based on minimization of one-step-ahead prediction errors). The presence of uncertainty information is an important feature of this approach, as shown in paragraph 2.2. In this paragraph, we will elaborate on the IRW trend model. Other references for IRW trends are: Kitagawa (1981) [[16]], Young et al. (1991) [[33]], Harvey (1989) [[12]], Mills (2010) [[19]], Visser (2004) [[29]], and Visser and Petersen (2009, 2012) [[30], [31]].
The IRW trend model has the following form:
Measurement equation:
$$y_t = \begin{pmatrix} 1 & 0 \end{pmatrix} \begin{pmatrix} \mu_{1,t} \\ \mu_{2,t} \end{pmatrix} + \xi_t \qquad (2.19)$$
Transition equation:
$$\begin{pmatrix} \mu_{1,t} \\ \mu_{2,t} \end{pmatrix} = \begin{pmatrix} 2 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} \mu_{1,t-1} \\ \mu_{2,t-1} \end{pmatrix} + \begin{pmatrix} \lambda_t \\ 0 \end{pmatrix} \qquad (2.20)$$
From these equations it is easy to see that the matrices $T$, $c_t$, $\alpha_t$ and $Q$ for the IRW trend model are:
$$T = \begin{pmatrix} 2 & -1 \\ 1 & 0 \end{pmatrix} \qquad (2.21)$$
$$c_t = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \qquad (2.22)$$
$$\alpha_t = \begin{pmatrix} \mu_{1,t} \\ \mu_{2,t} \end{pmatrix} \qquad (2.23)$$
$$Q = \begin{pmatrix} \sigma_\lambda^2 & 0 \\ 0 & 0 \end{pmatrix} \qquad (2.24)$$
There are still some unknowns in the model, such as $\sigma_\lambda^2$ in the matrix $Q$, and $\sigma^2$. These values can be estimated using maximum likelihood estimation. The log-likelihood of the filtering procedure is based on the normality of the noise processes $\xi_t$ and $\eta_t$. It is derived from the prediction-error decomposition
$$\log L = -\frac{1}{2}\left[(N - N_s)\ln(2\pi\sigma^2) + \sum_{t=N_s+1}^{N}\left(\ln(f_{t|t-1}) + \frac{1}{\sigma^2}\,\frac{n_{t|t-1}^2}{f_{t|t-1}}\right)\right] \qquad (2.25)$$
with $N_s$ the adjusting time. By maximizing this function, the estimated value of $\sigma^2$ is found to be
$$\hat\sigma^2 = \frac{1}{N - N_s} \sum_{t=N_s+1}^{N} \frac{n_{t|t-1}^2}{f_{t|t-1}} \qquad (2.26)$$
Substituting $\hat\sigma^2$ into $\log L$ and multiplying by $-2$ gives the concentrated log-likelihood function $L_c$:
$$\log L_c = \sum_{t=N_s+1}^{N} \log\left(\hat\sigma_{ML}^2 \, f_{t|t-1}\right) \qquad (2.27)$$
The value of $\hat\sigma_{ML}^2$ can now be found by minimizing $\log L_c$.
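As an illustration, the IRW system matrices (2.21)-(2.24) and the concentrated log-likelihood (2.27) can be set up in a few lines of R, here as a sketch on top of the hypothetical kalman_filter() of section 2.3; the brute-force grid search is in the spirit of the shooting method used by the software (section 3.2), but all names are this sketch's own:

T_irw <- matrix(c(2, 1, -1, 0), 2, 2)   # (2.21), filled column-wise
c_irw <- c(1, 0)                        # (2.22)

concentrated_loglik <- function(snr, y, Ns = 5) {
  # snr = sigma_lambda^2 / sigma^2; Q is expressed in units of sigma^2 (2.24)
  Q <- diag(c(snr, 0))
  filt <- kalman_filter(y, T_irw, c_irw, Q)
  idx <- (Ns + 1):length(y)
  sigma2_hat <- mean(filt$n[idx]^2 / filt$f[idx])  # (2.26)
  sum(log(sigma2_hat * filt$f[idx]))               # (2.27)
}

# Brute-force search over a log-spaced grid:
# grid <- 10^seq(-6, 2, length.out = 200)
# snr_hat <- grid[which.min(sapply(grid, concentrated_loglik, y = y))]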
3 Method and software
3.1 Trend ensembles and uncertainties
Using the IRW trend model and the Kalman filter on a suitable time series, optimal estimates for the trend $\mu_t$ can be found given the measurements $y_1, \dots, y_N$. The Kalman filter generates estimates of the trend values $\mu = (\mu_1, \dots, \mu_N)$ that follow a multivariate normal distribution if $\eta_t \sim N(0, \sigma_\eta^2)$ and $\xi_t \sim N(0, \sigma^2)$ for all times $t$. In chapter 2 we showed the workings and the formulae of the Kalman filter. From these formulae we can deduce the mean $\hat\mu = (\hat\mu_1, \dots, \hat\mu_t, \dots, \hat\mu_N)$ and the $N \times N$ covariance matrix $\Sigma$ of this multivariate normal distribution. They have the following form:
$$\hat\mu = (\hat\mu_1, \dots, \hat\mu_N) \quad \text{with} \quad \hat\mu_t = \begin{pmatrix} 1 & 0 \end{pmatrix} a_{t|N} \qquad (3.1)$$
$$\Sigma = \mathrm{Var}(\mu) = \sigma_{ML}^2 \begin{pmatrix} P_{1,1|N}^{1,1} & \cdots & P_{1,N|N}^{1,1} \\ \vdots & \ddots & \vdots \\ P_{N,1|N}^{1,1} & \cdots & P_{N,N|N}^{1,1} \end{pmatrix} \qquad (3.2)$$
where the equation for $P_{s,t|N}$ is given by De Jong and MacKinnon (1988) [[7]]:
$$P_{s,t|N} = P_s^* P_{s+1,t|N} \qquad (3.3)$$
The matrix $P_s^*$ is given in (2.13), and $P_{t,t|N} = P_{t|N}$ holds.
Once $\hat\mu$ and $\Sigma$ are known, we can generate a series of $M$ trends based on this multivariate normal distribution. There are two general approaches to sampling $\mu^{(k)}$ from the $N$-dimensional multivariate normal distribution $N(\hat\mu, \Sigma)$. The first approach is based on conditional distributions (Ripley, 1987 [[23]]). The second approach is the one followed by the R function mvrnorm(), found in the library MASS; this function is used in the TrendSpotteR software. Library MASS supports the functions and datasets described in Venables and Ripley (2002) [[26]]. Function mvrnorm() makes use of the fact that if $V_1, \dots, V_N$ are iid $N(0,1)$ random variables and $A$ is an $N \times N$ matrix, then $\mu^{(k)} = \hat\mu + AV$ is multivariate normal with mean $\hat\mu$ and covariances $E[(\mu_i^{(k)} - \hat\mu_i)(\mu_j^{(k)} - \hat\mu_j)]$ given by the $(i,j)$ element of the covariance matrix. Because of the linearity of the expectation, this covariance matrix is equal to
$$E\left[(\mu^{(k)} - \hat\mu)(\mu^{(k)} - \hat\mu)'\right] = E\left[(AV)(AV)'\right] = E\left[A V V' A'\right] = A\,E[VV']\,A' = A I A' = A A' \qquad (3.4)$$
Thus, to generate random vectors $\mu^{(k)}$, we have to find $A$, the square root of $\Sigma$, i.e. $\Sigma = AA'$. To find the matrix $A$, the R function mvrnorm() uses an eigenvalue decomposition. For details please see Kaas et al. (2008) [[15]]. The set of $M$ trends constructed in the way outlined above will be denoted here as a trend ensemble.
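In R, drawing such an ensemble is a one-liner once $\hat\mu$ and $\Sigma$ are available; a minimal sketch, assuming mu_hat (length N) and Sigma (N x N) have been assembled from the smoother output as in (3.1)-(3.2):

library(MASS)
M <- 1000
ensemble <- mvrnorm(n = M, mu = mu_hat, Sigma = Sigma)
# 'ensemble' is an M x N matrix: row k holds the k-th simulated trend mu^(k)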
The advantage of having a trend ensemble lies in the fact that the computation of confidence bands becomes easy; only simple statistical concepts are needed. E.g., the trend estimate $\hat\mu_t$ and its corresponding variance $\sigma_{\mu,t}^2$ follow directly from the $M$ estimates $\hat\mu_t^1, \dots, \hat\mu_t^M$: just take the mean and variance of these $M$ simulated trend estimates at time $t$. For large $M$ this mean and variance will converge to the mean and variance which follow directly from the Kalman filter formulae. In the same way, trend differences and uncertainty bands are calculated for any trend difference $\mu_t - \mu_s$: since we have $M$ estimates $\hat\mu_t^1, \dots, \hat\mu_t^M$ and $M$ estimates $\hat\mu_s^1, \dots, \hat\mu_s^M$, we also have $M$ estimates for $\hat\mu_t - \hat\mu_s$. Thus, we can calculate means, variances or any percentile on these $M$ values. E.g., 95% pointwise confidence bands follow from the 2.5th and 97.5th percentiles.
In fact, any trend function of interest, with a corresponding confidence interval, can be generated. One such trend function is the trend acceleration. In climate change research it is of interest whether trends in global mean sea level rise show accelerations, or in other words, whether the second derivative of $\mu_t$ is positive. Analogously to the trend differences $\mu_t - \mu_s$, we can compute $M$ differences $\mu_t - 2\mu_{t-1} + \mu_{t-2}$, along with any confidence band of interest.
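A sketch of these computations on the M x N ensemble matrix above (rows are simulated trends, columns are time points):

# 95% pointwise bands at each time t:
lower <- apply(ensemble, 2, quantile, probs = 0.025)
upper <- apply(ensemble, 2, quantile, probs = 0.975)

# First differences mu_t - mu_{t-1} and accelerations mu_t - 2 mu_{t-1} + mu_{t-2}
# per simulated trend, with percentile bands per time point:
d1 <- t(apply(ensemble, 1, diff))
d2 <- t(apply(d1, 1, diff))
accel_bands <- apply(d2, 2, quantile, probs = c(0.025, 0.975))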
Confidence limits for the chance of crossing a threshold $L$ (denoted by $P_t^L$) can also be derived from trend ensembles. By calculating the variance $\sigma_k^2$ of the residuals $y_1 - \hat\mu_1^k, \dots, y_N - \hat\mu_N^k$, we can compute chances $P_t^{(L,k)}$, $t = 1, \dots, N$, assuming $\tilde y_t^k$ to be normally distributed: $\tilde y_t^k \approx N(\hat\mu_t^k, \sigma_k^2)$. In this way we can generate $M$ chances $P_t^{(L,1)}, \dots, P_t^{(L,M)}$, and uncertainty bands for $P_t^L$ follow by computing variances or percentile values of interest based on these $M$ values $P_t^{(L,k)}$. Uncertainties for any chance difference $P_t^L - P_s^L$ can also be computed. To check the uncertainty of the exceedance of some threshold, we will derive in the subsection below a theoretical expression for the variance of this exceedance.
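A sketch of this computation, using the ensemble matrix from above; the observations are assumed to be in a vector y and the threshold value is illustrative:

L <- 11.0   # illustrative threshold
P_ens <- matrix(NA_real_, nrow(ensemble), ncol(ensemble))
for (k in 1:nrow(ensemble)) {
  sigma_k <- sd(y - ensemble[k, ])   # residual sd for ensemble member k
  # chance of exceeding L at each t under y ~ N(mu_t^(k), sigma_k^2):
  P_ens[k, ] <- 1 - pnorm(L, mean = ensemble[k, ], sd = sigma_k)
}
P_bands <- apply(P_ens, 2, quantile, probs = c(0.025, 0.975))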
3.1.1 Error in exceedance probability
Let us denote the chance of exceeding a threshold at level $L$ by $P_t^L$. TrendSpotteR has an option to compute this chance of exceedance, along with its 95% confidence intervals. As mentioned earlier, this is done by means of the generated trend ensemble. Because there is no literature on the uncertainties of the estimate of $P_t^L$, these uncertainties have to be checked. We will compare the standard deviation of $P_t^L$ found with the use of the trend ensemble to a theoretical expression for the variance of this chance.
The standard deviation of $P_t^L$ for every time $t$, calculated from the trend ensemble with 1000 $P_t^L$-curves (call this matrix PEnsemble), can be found with one single command in TrendSpotteR:
SDPtl <- apply(PEnsemble, 1, sd)
In the remainder of this paragraph, we derive a theoretical expression for this standard deviation. Again we note that $P_t^L$ equals the chance of exceeding level $L$ at time $t$. By definition:
$$P_t^L = \int_L^\infty \frac{1}{\sqrt{2\pi}\,\sigma_t} \exp\left(-\frac{1}{2}\left(\frac{y - \mu_t}{\sigma_t}\right)^2\right) dy \qquad (3.5)$$
We can write $\hat\sigma_t^2$ as follows:
$$\hat\sigma_t^2 = \hat\sigma^2 + \hat\sigma_\eta^2 = \hat\sigma_{ML}^2 h_t + \hat\sigma_{ML}^2 P_{t|N}^{1,1} \qquad (3.6)$$
A Taylor expansion for the second moment yields an approximation for the variance of the probability of exceeding some pre-defined threshold:
$$V\left(P_t^L\right) = V\left(g(\hat\mu_t, \hat\sigma_t^2)\right) \approx \dot g\begin{pmatrix} \mu_t \\ \sigma_t^2 \end{pmatrix}' \; V\begin{pmatrix} \hat\mu_t \\ \hat\sigma_t^2 \end{pmatrix} \; \dot g\begin{pmatrix} \mu_t \\ \sigma_t^2 \end{pmatrix} \qquad (3.7)$$
with $g$ equal to $P_t^L$. The next step is to derive expressions for the components of these matrices.
Derivation of $\dot g\begin{pmatrix} \mu_t \\ \sigma_t^2 \end{pmatrix}$.
The vector $\dot g\begin{pmatrix} \mu_t \\ \sigma_t^2 \end{pmatrix}$ is equal to $\begin{pmatrix} \frac{\partial}{\partial \mu_t} g(\mu_t, \sigma_t^2) \\ \frac{\partial}{\partial \sigma_t^2} g(\mu_t, \sigma_t^2) \end{pmatrix}$. The elements of this vector are derived below.
$$\begin{aligned}
\frac{\partial}{\partial \mu_t} g(\mu_t, \sigma_t^2) &= \frac{\partial}{\partial \mu_t}\left[P_t^L\right] \\
&= \frac{\partial}{\partial \mu_t} \int_L^\infty \frac{1}{\sqrt{2\pi}\,\sigma_t} \exp\left(-\frac{1}{2}\left(\frac{y-\mu_t}{\sigma_t}\right)^2\right) dy \\
&= \int_L^\infty \frac{\partial}{\partial \mu_t} \frac{1}{\sqrt{2\pi}\,\sigma_t} \exp\left(-\frac{1}{2}\left(\frac{y-\mu_t}{\sigma_t}\right)^2\right) dy \\
&= \int_L^\infty \frac{y-\mu_t}{\sigma_t}\,\frac{1}{\sigma_t}\,\frac{1}{\sqrt{2\pi}\,\sigma_t} \exp\left(-\frac{1}{2}\left(\frac{y-\mu_t}{\sigma_t}\right)^2\right) dy \\
&= \frac{1}{\sigma_t^2}\left[\int_L^\infty \frac{y}{\sqrt{2\pi}\,\sigma_t} \exp\left(-\frac{1}{2}\left(\frac{y-\mu_t}{\sigma_t}\right)^2\right) dy - \mu_t \int_L^\infty \frac{1}{\sqrt{2\pi}\,\sigma_t} \exp\left(-\frac{1}{2}\left(\frac{y-\mu_t}{\sigma_t}\right)^2\right) dy\right] \\
&= \frac{1}{\sigma_t^2}\left[E\,Y_{t,L} - \mu_t\right] P_t^L \\
&= \frac{1}{\sigma_t^2}\left[\mu_t + \sigma_t \left(P_t^L\right)^{-1} \varphi\!\left(\frac{L-\mu_t}{\sigma_t}\right) - \mu_t\right] P_t^L \\
&= \frac{1}{\sigma_t}\,\varphi\!\left(\frac{L-\mu_t}{\sigma_t}\right)
\end{aligned} \qquad (3.8)$$
with $\varphi$ the standard normal probability density function. Furthermore, $E\,Y_{t,L}$ is the mean of the truncated normal distribution. Truncated distributions are conditional distributions that result from restricting the domain of some other probability distribution. Such truncated distributions arise often in practical statistics, where one is interested in values which lie above or below some given threshold, or within a specified range. The probability density function $f$ of the truncated normal distribution, for $a \le x \le b$, has the following form:
$$f(x) = \frac{\frac{1}{\sigma}\,\varphi\!\left(\frac{x-\mu}{\sigma}\right)}{\Phi\!\left(\frac{b-\mu}{\sigma}\right) - \Phi\!\left(\frac{a-\mu}{\sigma}\right)}$$
with $\varphi(\xi)$ the probability density function of the standard normal distribution and $\Phi$ its cumulative distribution function. In our case $a = L$ and $b = \infty$; when $b = \infty$, $\Phi\!\left(\frac{b-\mu}{\sigma}\right) = 1$. Knowing this, one can see why the fifth and sixth lines of (3.8) are equal to each other: the first integral equals $E\,Y_{t,L}\,P_t^L$ by the definition of the truncated mean, and the second equals $\mu_t\,P_t^L$. The following step holds since the expected value of a random variable with this one-sided truncated normal distribution equals $\mu + \sigma\,\frac{\varphi\left(\frac{L-\mu}{\sigma}\right)}{1 - \Phi\left(\frac{L-\mu}{\sigma}\right)}$, while $P_t^L = 1 - \Phi\!\left(\frac{L-\mu_t}{\sigma_t}\right)$.
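The truncated-mean formula used in this last step can be checked numerically; a small sketch with illustrative values:

mu <- 2; sigma <- 1.5; L <- 3
z <- (L - mu) / sigma
theory <- mu + sigma * dnorm(z) / (1 - pnorm(z))  # E[Y | Y > L]
y_sim <- rnorm(1e6, mu, sigma)
mean(y_sim[y_sim > L])   # simulated mean; should be close to 'theory'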
We do something similar for $\frac{\partial}{\partial \sigma_t^2} g(\mu_t, \sigma_t^2)$:
$$\begin{aligned}
\frac{\partial}{\partial \sigma_t^2} g(\mu_t, \sigma_t^2) &= \frac{\partial}{\partial \sigma_t^2}\left[P_t^L\right] \\
&= \frac{\partial}{\partial \sigma_t^2} \int_L^\infty \frac{1}{\sqrt{2\pi\sigma_t^2}} \exp\left(-\frac{1}{2}\left(\frac{y-\mu_t}{\sigma_t}\right)^2\right) dy \\
&= \int_L^\infty \left[\frac{1}{\sqrt{2\pi\sigma_t^2}} \exp\left(-\frac{1}{2}\left(\frac{y-\mu_t}{\sigma_t}\right)^2\right) \frac{1}{2}\left(\frac{y-\mu_t}{\sigma_t^2}\right)^2 - \exp\left(-\frac{1}{2}\left(\frac{y-\mu_t}{\sigma_t}\right)^2\right) \frac{1}{2\sqrt{2\pi}\,(\sigma_t^2)^{3/2}}\right] dy \\
&= \int_L^\infty \frac{1}{2\sqrt{2\pi\sigma_t^2}} \exp\left(-\frac{1}{2}\left(\frac{y-\mu_t}{\sigma_t}\right)^2\right) \left[\left(\frac{y-\mu_t}{\sigma_t^2}\right)^2 - \frac{1}{\sigma_t^2}\right] dy \\
&= \int_L^\infty \frac{1}{2(\sigma_t^2)^2\sqrt{2\pi\sigma_t^2}} \exp\left(-\frac{1}{2}\left(\frac{y-\mu_t}{\sigma_t}\right)^2\right) \left[(y-\mu_t)^2 - \sigma_t^2\right] dy
\end{aligned} \qquad (3.9)$$
Derivation of $V\begin{pmatrix} \hat\mu_t \\ \hat\sigma_t^2 \end{pmatrix}$.
$V\begin{pmatrix} \hat\mu_t \\ \hat\sigma_t^2 \end{pmatrix}$ can be computed as the inverse of the Fisher information matrix, which is the negative of the expected Hessian matrix. The Hessian matrix of the log-likelihood has the following form:
$$H = \begin{pmatrix} \dfrac{\partial^2 \ln L}{\partial \mu_t^2} & \dfrac{\partial^2 \ln L}{\partial \mu_t \, \partial \sigma_t^2} \\ \dfrac{\partial^2 \ln L}{\partial \mu_t \, \partial \sigma_t^2} & \dfrac{\partial^2 \ln L}{\partial (\sigma_t^2)^2} \end{pmatrix} \qquad (3.10)$$
$H$ is a $2 \times 2$ matrix, where the element $\frac{\partial^2 \ln L}{\partial \mu_t^2}$ is known from the output of the Kalman filter and corresponds to the variance $\sigma_{ML}^2 P_{t|N}$. The elements $\frac{\partial^2 \ln L}{\partial \mu_t \, \partial \sigma_t^2}$ and $\frac{\partial^2 \ln L}{\partial (\sigma_t^2)^2}$ are still unknown.
Derivation of $\frac{\partial^2 \ln L}{\partial (\sigma_t^2)^2}$. We know that
$$\ln L = -\frac{1}{2}\left[(N - N_s)\ln(2\pi\sigma_t^2) + \sum_{t=N_s+1}^{N}\left(\ln(f_{t|t-1}) + \frac{1}{\sigma_t^2}\,\frac{\eta_{t|t-1}^2}{f_{t|t-1}}\right)\right] \qquad (3.11)$$
The first and second order derivatives of the log-likelihood are:
$$\frac{\partial \ln L(\sigma_t^2)}{\partial \sigma_t^2} = -\frac{N - N_s}{2\sigma_t^2} + \frac{1}{2(\sigma_t^2)^2} \sum_{t=N_s+1}^{N} \frac{\eta_{t|t-1}^2}{f_{t|t-1}} \qquad (3.12)$$
$$\frac{\partial^2 \ln L(\sigma_t^2)}{\partial (\sigma_t^2)^2} = \frac{N - N_s}{2(\sigma_t^2)^2} - \frac{1}{(\sigma_t^2)^3} \sum_{t=N_s+1}^{N} \frac{\eta_{t|t-1}^2}{f_{t|t-1}} \qquad (3.13)$$
The Hessian for $\sigma_t^2$ can be computed as follows:
$$H(\sigma_t^2) = \frac{\partial^2 \ln L(\sigma_t^2)}{\partial (\sigma_t^2)^2} = \frac{N - N_s}{2(\sigma_t^2)^2} - \frac{1}{(\sigma_t^2)^3} \sum_{t=N_s+1}^{N} \frac{\eta_{t|t-1}^2}{f_{t|t-1}} \qquad (3.14)$$
Its expectation is
$$\begin{aligned}
E[H] &= E\left[\frac{N - N_s}{2(\sigma_t^2)^2} - \frac{1}{(\sigma_t^2)^3} \sum_{t=N_s+1}^{N} \frac{\eta_{t|t-1}^2}{f_{t|t-1}}\right] \\
&= \frac{N - N_s}{2(\sigma_t^2)^2} - \frac{1}{(\sigma_t^2)^3}\, E\left[\sum_{t=N_s+1}^{N} \frac{\eta_{t|t-1}^2}{f_{t|t-1}}\right] \\
&= \frac{N - N_s}{2(\sigma_t^2)^2} - \frac{1}{(\sigma_t^2)^3}\,(N - N_s)\,\sigma_t^2 \\
&= \frac{N - N_s}{2(\sigma_t^2)^2} - \frac{N - N_s}{(\sigma_t^2)^2} \\
&= -\frac{N - N_s}{2(\sigma_t^2)^2}
\end{aligned} \qquad (3.15)$$
This holds since $\frac{N - N_s}{2(\sigma_t^2)^2}$ and $\frac{1}{(\sigma_t^2)^3}$ are constants, and $E\left[\sum_{t=N_s+1}^{N} \frac{\eta_{t|t-1}^2}{f_{t|t-1}}\right] = (N - N_s)\,\sigma_t^2$ by the definition (2.26) of our maximum likelihood estimator. The next step is to compute the variance of $\sigma_t^2$:
$$V(\sigma_t^2) = -(E[H])^{-1} = \left(\frac{N - N_s}{2(\sigma_t^2)^2}\right)^{-1} = \frac{2(\sigma_t^2)^2}{N - N_s} \qquad (3.16)$$
The standard deviation of $\sigma_t^2$ is now simply the square root of this variance.
Derivation of $\frac{\partial^2 \ln L}{\partial \mu_t \, \partial \sigma_t^2}$.
Since the Kalman filter states $\mu_t = c_t' a_{t|t-1}$ and $\eta_{t|t-1} = y_t - c_t' a_{t|t-1}$, we have:
$$\ln L = -\frac{1}{2}\left[(N - N_s)\ln(2\pi\sigma_t^2) + \sum_{t=N_s+1}^{N}\left(\ln(f_{t|t-1}) + \frac{1}{\sigma_t^2}\,\frac{(y_t - \mu_t)^2}{f_{t|t-1}}\right)\right] \qquad (3.17)$$
The derivative of $\ln L$ with respect to $\mu_t$ equals:
$$\frac{\partial \ln L}{\partial \mu_t} = \frac{1}{\sigma_t^2} \sum_{t=N_s+1}^{N} \frac{(\mu_t - y_t)}{f_{t|t-1}} \qquad (3.18)$$
Now, it is easy to see that:
$$\frac{\partial^2 \ln L}{\partial \mu_t \, \partial \sigma_t^2} = -\frac{1}{\sigma_t^4} \sum_{t=N_s+1}^{N} \frac{(\mu_t - y_t)}{f_{t|t-1}} \qquad (3.19)$$
Taking the expectation, we get:
$$\begin{aligned}
E\left[-\frac{1}{\sigma_t^4} \sum_{t=N_s+1}^{N} \frac{(\mu_t - y_t)}{f_{t|t-1}}\right] &= -\frac{1}{\sigma^4} \sum_{t=N_s+1}^{N} E\left[\frac{\mu_t - y_t}{f_{t|t-1}}\right] \\
&= -\frac{1}{\sigma^4} \sum_{t=N_s+1}^{N} E\left[\frac{\mu_t - (\mu_t + \xi_t)}{f_{t|t-1}}\right] \\
&= -\frac{1}{\sigma^4} \sum_{t=N_s+1}^{N} E\left[\frac{-\xi_t}{f_{t|t-1}}\right] \\
&= -\frac{1}{\sigma^4} \sum_{t=N_s+1}^{N} \frac{E[-\xi_t]}{E[f_{t|t-1}]} \\
&= -\frac{1}{\sigma^4} \sum_{t=N_s+1}^{N} \frac{0}{E[f_{t|t-1}]} = 0
\end{aligned} \qquad (3.20)$$
Every element of the matrix $V\begin{pmatrix} \hat\mu_t \\ \hat\sigma_t^2 \end{pmatrix}$ has now been derived. The matrix has the following form:
$$V\begin{pmatrix} \hat\mu_t \\ \hat\sigma_t^2 \end{pmatrix} = \begin{pmatrix} \sigma_{ML}^2 P_{t|N} & 0 \\ 0 & \dfrac{2(\sigma_t^2)^2}{N - N_s} \end{pmatrix} \qquad (3.21)$$
Check for the confidence intervals of $P_t^L$.
We found a theoretical expression for the variance, and hence the standard deviation (the square root of this variance), of $P_t^L$. As a simplification of (3.7), we can make the assumption that the variance of $\sigma_t^2$ is zero, which makes the computations a little easier. The expression for the variance of $P_t^L$ then becomes
$$V\left[P_t^L\right] = \left(\frac{\partial}{\partial \mu_t} g(\mu_t, \sigma_t^2)\right)^2 V[\mu_t] \qquad (3.22)$$
To check whether both methods, i.e. the theoretical derivation (3.22) and the derivation using the trend ensemble, result in corresponding values for the standard deviation of $P_t^L$, we require the ratio of the two standard deviations to be about $1 \pm 0.2$. It appears that the ratio satisfies this requirement when the dataset is not too small (roughly, number of observations $\ge 50$). To illustrate this, we have drawn 60 random variables from the standard normal distribution and computed the ratio for level 2, see column 5 of figure 3.1. As one can see, both methods find similar values for the standard deviation of the exceedance of level 2, since the ratio lies between 1 and 1.2. For a smaller number of observations the requirement is not satisfied, see figure 3.2. One idea is that this is due to the simplifying approximation (3.22) instead of (3.7), and that we need to take the variance of $\sigma_t^2$ into account instead of using the simplifying assumption $V[\sigma_t^2] \approx 0$. However, when we compare the ratios computed with (3.7) (column 6 of figures 3.1 and 3.2) and with (3.22) (column 5 of figures 3.1 and 3.2), the difference turns out to be negligible; the results are quite similar. The deviation between both methods for small data sets can probably be explained by the omission of the second and higher order terms in the Taylor expansion (3.7); these are probably not negligible due to the high variance in a small data set.
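The comparison itself is straightforward to script; a sketch, assuming the exceedance ensemble P_ens from section 3.1, the smoothed trend mu_hat, its pointwise standard deviation sd_mu and the total standard deviation sigma_t of (3.6) are available:

sd_ensemble <- apply(P_ens, 2, sd)
# Simplified delta method (3.22): |dP/dmu| * sd(mu), with dP/dmu from (3.8):
sd_theory <- dnorm((L - mu_hat) / sigma_t) / sigma_t * sd_mu
ratio <- sd_ensemble / sd_theory   # required to lie roughly within 1 +/- 0.2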
3.1.2 Simultaneous confidence bands
Besides the creation of confidence intervals for PtL, another application of trend ensembles is the calculation of simultaneous confidence bands, in addition to pointwise confidence bands.
Simultaneous confidence bands give upper and lower limits for the trend as a whole. In the context of trend ensembles, simultaneous 95% confidence limits can be estimated by finding a tight envelope around the estimates $\hat\mu_1, \dots, \hat\mu_N$ such that exactly $0.95 \cdot M$ simulated trends fall within this envelope, or equivalently, by the choice of the $0.05 \cdot M$ extreme trends which fall outside the envelope. This choice is not unique since the term 'extreme' is not unique. The procedure we have chosen to implement in the TrendSpotteR software is as follows (to illustrate, we choose an ensemble of 1000 trends, so $M = 1000$; an R sketch follows the list):
• Call the ensemble with 1000 trends $T_{t,i}$, with $i = 1, \dots, 1000$.
• $R_{t,i}$ is the ranked matrix where, for each time $t$, all 1000 values of $T_{t,i}$ are ranked from small to large.
• Compute the relative rank $R'_{t,i} = \left|R_{t,i} - \frac{1000+1}{2}\right|$ for each time $t$.
• Let $M_i$ be the vector containing, for every trend $i$, the sum of its 5 highest values of $R'_{t,i}$.
• Compute $q_i$ = the 95th percentile of $M_i$.
• Call $T^*_{t,i}$ the matrix where the trends with $M_i \ge q_i$ are removed from $T_{t,i}$. (Note: if more than 50 trends have $M_i \ge q_i$, then remove the, say, $n$ trends with $M_i > q_i$ and randomly remove $50 - n$ of the trends with $M_i = q_i$.)
• Now, the upper band of the simultaneous confidence interval is UB$_t = \max_i [T^*_{t,i}]$ and the lower band is LB$_t = \min_i [T^*_{t,i}]$.
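A sketch of this procedure in R, for an M x N ensemble matrix Tmat; the extremeness measure is read here as the sum of a trend's five largest relative ranks, which is one possible interpretation of the description above:

M <- nrow(Tmat)
R  <- apply(Tmat, 2, rank)               # ranks per time point t
Rp <- abs(R - (M + 1) / 2)               # relative ranks R'_{t,i}
Mi <- apply(Rp, 1, function(r) sum(sort(r, decreasing = TRUE)[1:5]))
qi <- quantile(Mi, probs = 0.95)
keep <- Mi < qi                          # drop the ~5% most extreme trends
# (ties at qi are ignored in this sketch)
UB <- apply(Tmat[keep, ], 2, max)        # upper simultaneous band
LB <- apply(Tmat[keep, ], 2, min)        # lower simultaneous band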
We note that simultaneous confidence limits are rarely found in the literature (Liu et al., 2008).
Figure 3.1: 60 random draws from the standard normal distribution. Column 2 shows the standard deviation of the exceedance of level 2 computed by the TrendSpotteR 1.0 method; column 3 shows the standard deviation computed analytically using (3.22); column 4 shows the extended analytical version of the standard deviation, (3.7); column 5 is the ratio of column 2 to column 3; the last column is the ratio of column 2 to column 4.
Figure 3.2: 20 random draws from the standard normal distribution. Column 2 shows the standard deviation of the exceedance of level 2 computed by the TrendSpotteR 1.0 method; column 3 shows the standard deviation computed analytically using (3.22); column 4 shows the extended analytical version of the standard deviation, (3.7); column 5 is the ratio of column 2 to column 3; the last column is the ratio of column 2 to column 4.
3.2 TrendSpotteR 1.0 software
The theory set out in chapter 2 and section 3.1 has been programmed in R. The software estimates IRW trends, along with 95% pointwise and simultaneous confidence bands; idem for the trend differences $\mu_t - \mu_{t-1}$ and $\mu_N - \mu_t$, and idem for probabilities of exceeding thresholds ($P_t^L$) or differences thereof ($P_N^L - P_t^L$). The software includes a main program (script), along with four functions. Furthermore, a concise user manual is supplied, along with the two datasets described in section 3.3. The files are as follows:
• ReadMe.txt. This file is the manual for running the software and addresses error messages.
• TrendSpotteR.R. This file contains the starting script of the software.
• OptionScreen.R. This function creates the opening screen for filling in the trend options.
See figure 3.3 for an example.
• Run.R. This function contains the algorithms for the Kalman filter and smoothing equations. The routine for maximum likelihood optimization is also implemented here. The optimization method chosen is a simple brute-force shooting method; this method avoids convergence to local minima in the one-dimensional parameter space.
• RunLn.R. Idem, for log-transformed data.
The two data files added to the software are called CNT Season.dat (cf. section 3.3.1) and GlobalDamage.dat (cf. section 3.3.2). These input files are provided in ASCII format (other formats are not supported by TrendSpotteR 1.0).
The TrendSpotteR 1.0 software output is organized in three folders that will be created in the R working directory when running the program. The first folder is called Figures. This folder includes all graphs TrendSpotteR 1.0 generates (cf. examples in section 3.3, and figures 3.4, 3.5, 3.6 and 3.7). The second folder, called OutputDir, provides the data underlying the graphs in Figures. These files can be used to design more advanced graphical presentations.
OutputDir also includes a file called Options, from which one can retrieve the input option information. The third output folder is called Ensemble, and consists of the ensemble files corresponding to trends, trend differences, threshold probabilities and differences in threshold probabilities (cf. figures 3.4b and 3.6b). Each of these files contains a first row with the time variable, followed by 1000 rows with simulated trends or threshold probabilities.
If TrendSpotteR 1.0 is loaded for the first time it will automatically install the R packages mvtnorm, tseries, tcltk, prettyR and MASS. Once TrendSpotteR 1.0 is running, an option screen will be shown (figure 3.3). Options are:
• a log transformation of the data. If this option is chosen, the IRW model is estimated on log-transformed data and the estimation results are transformed back to the original scale (except for the innovations).
• maximum likelihood optimization, or setting the smoothing parameter by hand.
• if the data file contains missing data, the missing data code can be given here.
• if threshold exceedances are of interest, a threshold can be given.
• if predictions of future trend values are of interest, the number of predictions can be given here.
Now the software will estimate trends and corresponding uncertainty bands. However, one has to check the validity of a number of assumptions underlying IRW trends. A first check is the whiteness of the residuals (the standardized innovation series, in Kalman filter terms). The homoscedasticity of the residuals should also be checked, as well as their normality. This normality is a prerequisite for using trend ensembles (and thus for estimating simultaneous confidence bands). We note that the pointwise confidence bands for trends and trend differences in TrendSpotteR do not depend on the normality of residuals; however, the 95% confidence limits should be read as $2\sigma$ confidence limits if normality is violated.
Standard graphs in TrendSpotteR for checking assumptions are the QQ-plot, the ACF plot and the actual plot of the residuals. For more detailed information on checking residuals we refer to Harvey (1989, Ch. 5) [[12]] and Chandler and Scott (2011, Section 3.3) [[3]].
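These checks require only base R; a sketch, assuming the standardized innovations are in a vector innov:

acf(innov, lag.max = 22)      # whiteness: autocorrelations within the 2-sigma limits?
plot(innov, type = "h")       # visual check for heteroscedasticity
qqnorm(innov); qqline(innov)  # normality: points close to a straight line?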
3.2.1 Creating non-standard statistics
In this section we will give two examples of simple extensions of the TrendSpotteR software, based on basic knowledge of R.
Assume one wants to present 80% pointwise confidence bands of the trend estimate $\mu_t$ instead of 95%. These 80% confidence bands can easily be deduced from the trend ensemble matrix $T$, containing 1000 simulated trends. The upper limit is simply generated by taking the 90th percentile of the matrix $T$, the lower limit by the 10th percentile. In R one can simply use the following commands to create these two limits:
upperLimit <- apply(T, 2, quantile, probs = 0.90)
lowerLimit <- apply(T, 2, quantile, probs = 0.10)
A second example: suppose one wants to apply a Box-Cox transformation to the data $X_t$ (the measurements). This transformation can be useful if the residuals are not normally distributed (Chandler and Scott, 2011 - Section 3.3.2) [[3]]. Box-Cox transformations have the following form:
$$Y_t = \begin{cases} \dfrac{X_t^\lambda - 1}{\lambda}, & \lambda \neq 0 \\ \log(X_t), & \lambda = 0 \end{cases} \qquad (3.23)$$
where $\lambda$ denotes the power of the transformation. The case $\lambda = 0$ (log transformation) is included in the TrendSpotteR software. However, if $\lambda \neq 0$, one may want to transform the trend results back to the original scale of $X_t$, i.e. $X_t = (\lambda Y_t + 1)^{1/\lambda}$. The following statement will place the back-transformed trend data, present in matrix $T$, in a new matrix $TT$:
TT <- (lambda * T + 1)^(1 / lambda)
Now, any means, variances and percentiles can be computed based on this $TT$ matrix. E.g., the following statements yield the 50th, 97.5th and 2.5th percentiles (the median trend and its 95% pointwise uncertainty bands):
Trend <- apply(TT, 2, quantile, probs = 0.5)
upperLimit <- apply(TT, 2, quantile, probs = 0.975)
lowerLimit <- apply(TT, 2, quantile, probs = 0.025)
Analogously, trend differences can be computed on the basis of this $TT$ matrix, along with pointwise uncertainty bands.
In chapter 4 we will show another example of the creation of non-standard statistics using TrendSpotteR 1.0. It will be explained how trends can be estimated with TrendSpotteR 1.0 for data which are distributed according to a Gumbel distribution.
3.3 Examples
Trend estimation with TrendSpotteR 1.0 is illustrated here for two examples. The first example concerns a series of century-long spring temperatures in the Netherlands. The second example is on global economic damage by weather-related disasters.
3.3.1 Spring temperatures 1906-2012
Seasonal temperatures in the Netherlands are given in the ASCII file CNT Season.dat. This file is part of the TrendSpotteR 1.0 software. The data file consists of six columns: year, winter temperatures, spring temperatures, summer temperatures, autumn temperatures and annual averages, all expressed in °C. Here, we estimate an IRW trend model for the spring temperatures, along with probabilities of crossing the threshold of 11.0 °C.
The corresponding input screen is given in figure 3.3. Background information on these temperature data has been given by Schrier et al. (2011) [[24]].
The TrendSpotteR 1.0 run generates six graphs with trends, trend differences, probabilities of crossing the threshold of 11.0 °C and differences thereof; see figures 3.4a through f.
Figure 3.4a shows the estimated trend $\hat\mu_t$, along with its corresponding confidence intervals (both simultaneous and pointwise). Figure 3.4b shows all 1000 simulated trends which have
Figure 3.3: Input options screen for the analysis of spring temperatures in the Netherlands, 1906-2012
been used to compute these simultaneous confidence bands. Figure 3.4c shows the first differences $\mu_t - \mu_{t-1}$, along with confidence bands. The graph shows that these differences are statistically significant ($\alpha = 0.05$) over a short time interval: the lower limit of the simultaneous confidence bands lies above zero over the period 1984-2007. In the same way, figure 3.4d shows that the trend differences $\mu_{2012} - \mu_t$ are statistically significantly positive for all years in 1906-2009.
Figure 3.4e shows the probability of crossing the threshold of 11.0 °C. The graph shows that the probabilities are near zero over the period 1906-1980, with a rapid acceleration over 1981-2012. The probability in 2012 has risen to 0.20. The uncertainty bands show this value to be statistically significant. The corresponding average return period is calculated as the inverse probability: once in five years. Finally, figure 3.4f shows that the probability $P_{2012}^{11.0}$ is significantly higher than all other probabilities $P_t^{11.0}$.
TrendSpotteR 1.0 generates three graphs for checking model assumptions, see figure 3.5.
Both the ACF plot (figure 3.5a) and the innovation plot (figure 3.5b) show that the innovations are in good agreement with a white noise process: all 22 autocorrelations lie within the $2\sigma$ confidence limits and show no periodic behavior. We have checked the homoscedasticity of the residuals by visual inspection of figure 3.5b. Figure 3.5c shows the normality plot for the innovations. The graph shows that the innovations are reasonably well centered around a straight line. Thus, no log transformation or other Box-Cox transformation is needed here.
3.3.2 Global economic losses from weather-related disasters
Visser et al. (2012) [[27]] have analysed trends in weather-related disaster data, 1980-2010.
These data have been extracted from the CRED database on natural disasters. Here, we show the TrendSpotteR results for one series, that of global economic losses from weather- related disasters. Weather-related disasters are disasters due to hurricanes, extratropical