Analysis of weather-related disasters using extreme value theory

David Bartels | January 6, 2021


Second assessor: dr. A.A. Tsvetkov


Abstract

by David Bartels


Contents

Abstract
1 Introduction
  1.1 Research questions
2 Theory
  2.1 Extreme value theory
    2.1.1 Block-maxima approach
    2.1.2 Peaks-over-threshold approach
      2.1.2.1 Threshold selection
  2.2 Exposure and vulnerability
  2.3 Hypotheses
3 Data and methodology
  3.1 Disaster data
  3.2 Normalisation of losses
  3.3 Candidate covariates
    3.3.1 Temperature anomaly (TA)
    3.3.2 El Niño and Southern Oscillation (ENSO)
    3.3.3 IPCC temperature anomaly projections
4 Results and discussion
  4.1 Block-maxima
    4.1.1 Stationary
    4.1.2 GEVD with covariates


Chapter 1

Introduction

As climate change intensifies in the decades to come, it is expected to increase the likelihood of high-loss extreme weather events. For example, a heatwave that occurred in central Europe in 2003 and caused 13.1 billion in damages was made four to ten times more likely to occur due to human interference with the climate (Phelan, 2011 [16]). More recently, California experienced record-breaking wildfires in 2017 and 2018, with estimated economic losses exceeding $40 billion. A study published this year found that climate change more than doubled the observed frequency of autumn days with extreme fire weather in recent years, compared to the 1980s (Goss et al., 2020 [8]).

These facts show that the insurability of extreme weather risk is jeopardised by climate change in a number of ways. From the (re)insurer's point of view, the possibility of extremely high losses can make the amount of capital required to guarantee solvency unfeasible. Furthermore, due to climate change, certain risks can become too hard to quantify, while quantifiability is a prerequisite for actuarial insurability. On the other side, rising premiums due to higher average losses and higher volatility might make insurance too expensive for policy holders to afford. This extends to insurers, who are also policy holders when they opt to reinsure (part of) their portfolio (Charpentier, 2008 [4]).

In order to get a good grasp of the projected future losses due to extreme weather, average losses per event are not detailed enough. Since events can occur that are so severe that the insured loss exceeds the capital of an insurance company, reinsurance is required to make sure such an event does not lead to the bankruptcy of the insurer. This risk cannot be quantified using average losses per event alone. The distribution of losses per event, however, is much more useful in this regard: it can be used to analyse what type of reinsurance is required for these events.

Coupling losses and temperature would be helpful in making a more robust framework for the estimation of future losses. Since the global average temperature can be used to approximate the advance of climate change, finding a way to link it to a distribution function of event severity would improve the predictability of future extreme weather losses. One quantification of temperature changes is the temperature anomaly, which is the difference between the measured temperature and the mean temperature in a chosen reference period (NASA, 2020 [11]).

1.1 Research questions

The aim of this thesis is to find a model that fits the probability distribution of the total damages per event. By event, a weather-related natural disaster is meant. This includes hydrological disasters (floods and landslides), meteorological disasters (storms, cold waves and heat waves) and climatological disasters (droughts and wildfires).

There is a trade-off to be made between the scope and the precision of the model. A model that covers all types of natural disasters in all regions of the world would be preferable to one with a narrower scope. However, many factors play a role in the severity of each event. These factors (the level of urbanisation, for instance) might be wildly different from region to region and between event types. Therefore, aggregating data from different regions or event types reduces the precision of the model. In this thesis, a balance is struck by looking at events that occurred in one specific continent only. Because most research in the literature appears to focus on the USA, the region considered in this thesis is North America.

When analysing disaster losses, there is an important distinction between total losses and insured losses. The former is an estimate of the total value of capital destroyed by the event, while the latter concerns only the payout to policy holders for insured damages. Because an estimation of future insured losses requires not only an estimation of future total losses but also a projection of the change in insurance habits, it adds a complex variable. That is why this thesis will focus on the total losses, and not the insured losses.

The aim of this thesis is summarized in the research question:

What is the distribution of the total capital losses per weather-related nat-ural disaster event in North America?

To further the insight into this matter, this paper will also look for an answer to the subquestion:


Chapter 2

Theory

2.1 Extreme value theory

Whereas most of the probability density of a random variable is typically concentrated around the average, extreme value theory (EVT) deals with the probability density in the tails of a distribution. Since the focus of this thesis is on extreme weather events, their distribution lies in the realm of extreme value theory. EVT can be used to analyse block maxima or peaks over a high threshold. The first approach will be used to estimate the distribution of the most extreme disasters in a given time frame, while the latter will be used to estimate the distribution of the exceedances per event over a predefined threshold of losses.

There are important differences between the two approaches. Ferreira and De Haan argue that peaks-over-threshold (POT) models use the available data more efficiently than block-maxima methods, as all observations over a certain threshold are included (Ferreira and De Haan, 2015 [7]). Conversely, block-maxima methods discard observations when there are multiple in one block, while retaining some lower observations when there is no extreme data point in a block. However, they note that block-maxima methods are more desirable when the data is only available in the form of periodic maxima, or if the observations are not independently and identically distributed (i.i.d.), for example when there is a seasonal periodicity. In such a case, using the maxima of blocks that are larger than the period of dependence can negate the seasonality (Ferreira and De Haan, 2015 [7]). The data used in this research is not in the form of periodic maxima, but since the natural disasters are weather-related, there is likely to be seasonal periodicity. Blocks of one year would factor out the periodicity caused by the seasons.

The GEV distribution is an effective tool to calculate return levels: the level that is expected to be exceeded once within a given period of time. However, since the GEV distribution describes how the block maxima (also known as the first-order statistics) are distributed, without specifying the distribution of smaller events, it is not suitable for the estimation of the losses per event. The peaks-over-threshold approach is preferable in this regard. Because the distribution of block maxima provides useful insight into the most extreme losses that can be expected to occur, this thesis will estimate that distribution using the block-maxima approach. To find the distribution of losses per event for events that exceed a high threshold, the peaks-over-threshold approach will be used.

2.1.1 Block-maxima approach

The block-maxima approach consists of dividing the data into periods, i.e. "blocks", and using only the largest observation in each block. As long as there are enough years of data available, the data is i.i.d. and the parameters of the distribution are time-independent, these maxima will be (approximately) GEV distributed. An important theorem in EVT is the Fisher-Tippett-Gnedenko theorem, which is also referred to as the extreme value theorem (Cannon, 2010 [2]).

Given i.i.d. random variables $X_1, X_2, \ldots$, consider the maximum of the first $n$ variables, $M_n = \max(X_1, X_2, \ldots, X_n)$ (Holland et al., 2012 [9]). Now suppose that there exist sequences $a_n \ge 0$ and $b_n \in \mathbb{R}$ such that
$$\lim_{n \to \infty} P\big(a_n(M_n - b_n) \le x\big) = G(x), \qquad (2.1)$$
where $G(x)$ is a non-degenerate distribution. Then the theorem states that $G(x)$ is always a function in the generalised extreme value (GEV) family of distribution functions.

All distribution functions in this family are of the form
$$G(x) = \exp\left\{ -\left[ 1 + \xi \, \frac{x - \mu}{\sigma} \right]^{-1/\xi} \right\}, \qquad (2.2)$$
defined for $x$ such that the term between square brackets is larger than zero. Furthermore, $\xi$ and $\mu$ must be finite and $\sigma$ strictly positive (Holland et al., 2012 [9]). In this parameterisation, $\mu$ is the location parameter and determines where most of the probability density is located, analogous to the mean of a Gaussian distribution. $\sigma$ is the scale parameter, which determines the width of the distribution, analogous to the standard deviation of a Gaussian. The third parameter, $\xi$, is called the shape parameter. It governs the tail behaviour of the distribution (Holland et al., 2012 [9]).
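As a concrete illustration of equation (2.2), the sketch below evaluates a GEV distribution with scipy. This is not code from the thesis; the parameter values are the stationary estimates reported later in section 4.1.1, used here purely as example numbers, and scipy parameterises the shape as c = −ξ (opposite in sign to the text).

```python
# A sketch, not thesis code: evaluating a GEV distribution as in equation (2.2).
# scipy.stats.genextreme uses shape c = -xi.
import numpy as np
from scipy.stats import genextreme

mu, sigma, xi = 6.120, 5.755, 0.88   # illustrative values (billions of USD)
gev = genextreme(c=-xi, loc=mu, scale=sigma)

x = np.array([5.0, 10.0, 25.0, 50.0, 100.0])
print(gev.cdf(x))     # P(yearly maximum <= x)
print(gev.ppf(0.99))  # level exceeded by the yearly maximum with probability 1%
```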


The GEV family can be divided into three classes based on the sign of the shape parameter. Distributions with a negative shape parameter belong to the Weibull class; distributions in this class all have finite right endpoints. If a distribution in this family has a positive shape parameter, it is in the Fréchet class of distributions. These distributions change relatively slowly for large $x$: their tails decay like a power function, where the rate of decay $\alpha = 1/\xi$, i.e. the inverse of the shape parameter, is referred to as the tail index. The right endpoint of Fréchet distributions is $x_F = \infty$. A class with tails that are not quite as heavy as the Fréchet distribution's, but with an infinite right endpoint nonetheless, is the Gumbel class. This encompasses all distributions in the GEV family with $\xi = 0$ (McNeil, Frey, Embrechts, 2015 [10]).

According to Cannon, the main assumptions of the extreme value theorem (note that this theorem arrives at stationary GEV-family distributions) are:

1. There are enough extreme values.

2. The data points are identically and independently distributed.

3. The series is stationary, i.e. the parameters of the distribution they come from are time-independent (Cannon, 2010 [2]).

Regarding the i.i.d. nature of the data, McNeil et al. point out that the block size is assumed to be large enough that even dependent underlying data will lead to independent block maxima. In the case of natural disasters, this would require using a block size that is large enough to include all disasters caused by the same weather phenomenon, for instance making sure that a hurricane season does not span multiple blocks. A block length of one year will filter out dependence on scales smaller than a year. However, yearly blocks will not smooth out oscillating processes with periods longer than a year (McNeil, Frey, Embrechts, 2015 [10]).
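The block-maxima construction described above can be sketched as follows. The column names and toy values are hypothetical, not the EM-DAT field names; the point is only the grouping of events into yearly blocks and keeping the largest loss per block.

```python
# A sketch with hypothetical column names: yearly block maxima of event losses.
import pandas as pd

events = pd.DataFrame({
    "year": [1980, 1980, 1981, 1981, 1981],
    "normalised_loss": [0.7, 2.3, 0.9, 5.1, 1.2],  # e.g. billions of USD
})

# One block per calendar year; keep only the largest event in each block.
block_maxima = events.groupby("year")["normalised_loss"].max()
print(block_maxima)
```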

2.1.2 Peaks-over-threshold approach

The other way to isolate the tail events is by only looking at observations that exceed a high threshold. The exceedances, i.e. (observation − threshold | observation > threshold), should then approximately follow a generalised Pareto distribution (GPD) (Ferreira and De Haan, 2015 [7]).

McNeil et al. define the GPD in their book as follows:
$$G_{\xi,\beta}(x) = \begin{cases} 1 - (1 + \xi x/\beta)^{-1/\xi}, & \xi \neq 0, \\ 1 - e^{-x/\beta}, & \xi = 0, \end{cases} \qquad (2.3)$$
where $\beta > 0$, and $x \ge 0$ when $\xi \ge 0$ and $0 \le x \le -\beta/\xi$ when $\xi < 0$. Like the GEV distribution, the GPD has a scale parameter ($\beta$) and, depending on the shape parameter ($\xi$), there are a few special cases. When

• $\xi > 0$, then $G_{\xi,\beta}$ is the same as the distribution function of an ordinary Pareto distribution with $\alpha = 1/\xi$ and $\kappa = \beta/\xi$.

• $\xi = 0$, then $G_{\xi,\beta}$ matches the distribution function of an exponential distribution.

• $\xi < 0$, then $G_{\xi,\beta}$ is a short-tailed, Pareto type II distribution.

These three cases are illustrated numerically in the sketch below.
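The sketch is not from the thesis; it uses scipy.stats.genpareto, whose shape parameter c corresponds to ξ in equation (2.3), to show the power-tail, exponential and finite-endpoint behaviour.

```python
# A numerical sketch of the three cases of equation (2.3).
import numpy as np
from scipy.stats import expon, genpareto

beta = 2.0
x = np.linspace(0.0, 5.0, 6)

heavy = genpareto(c=0.5, scale=beta)    # xi > 0: Pareto-type power tail
expo  = genpareto(c=0.0, scale=beta)    # xi = 0: exponential tail
short = genpareto(c=-0.5, scale=beta)   # xi < 0: finite right endpoint -beta/xi

print(heavy.cdf(x))
print(np.allclose(expo.cdf(x), expon(scale=beta).cdf(x)))  # True
print(short.support())                                     # (0.0, 4.0)
```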

McNeil et al. explain that, while the GEVD is the natural model for block-maxima, the GPD is a natural model for the excess distribution over high thresholds. For a random variable $X$ with distribution function $F$, the distribution function of this excess distribution is given by
$$F_u(x) = P(X - u \le x \mid X > u) = \frac{F(x + u) - F(u)}{1 - F(u)}, \qquad (2.4)$$
defined for $0 \le x < x_F - u$, where $u$ is the threshold and $x_F \le \infty$ is the right endpoint of $F$. $F_u$ is the distribution of the amount of loss exceeding the threshold $u$, given that $u$ is exceeded.

To get an idea of the average exceedance, the mean excess (or mean residual life) function is used. For a random variable $X$ with finite mean, this is given by
$$e(u) = E(X - u \mid X > u) \; [10]. \qquad (2.5)$$
The Pickands-Balkema-de Haan theorem states that a positive measurable function $\beta(u)$ can be found such that
$$\lim_{u \to x_F} \; \sup_{0 \le x < x_F - u} \left| F_u(x) - G_{\xi,\beta(u)}(x) \right| = 0 \qquad (2.6)$$
if and only if $F$ is in the maximum domain of attraction of some non-degenerate distribution function $H_\xi$, $\xi \in \mathbb{R}$. This is equivalent to saying that there exists a non-degenerate distribution function $H(x)$ that is a GEV distribution of type $H_\xi$, for which the following equation holds:
$$\lim_{n \to \infty} P\big((M_n - d_n)/c_n \le x\big) = \lim_{n \to \infty} F^n(c_n x + d_n) = H(x), \qquad (2.7)$$
for existing sequences of real constants $d_n$ and $c_n$, where $c_n > 0$ for all $n$ (McNeil, Frey, Embrechts, 2015 [10]).


In other words, if the normalised maxima of a distribution converge to a GEV distribution, that distribution's excess distribution converges to a GPD as the threshold is raised. Also, the shape parameter of both distributions will be the same (McNeil, Frey, Embrechts, 2015 [10]).

2.1.2.1 Threshold selection

It is important to choose an appropriate threshold when using the POT approach. A higher threshold excludes more observations, leading to more variance in the approximation of the model parameters. On the other hand, a lower threshold leaves in more observations that might not be in the tails of the distribution, which leads to a biased approximation. So there is a trade-off to be made.

As McNeil et al. explain, if $X$ has distribution function $F = G_{\xi,\beta}$, it follows from equation (2.4) that
$$F_u(x) = G_{\xi,\beta(u)}(x), \qquad \beta(u) = \beta + \xi u, \qquad (2.8)$$
where $0 \le u < \infty$ if $\xi \ge 0$, and $0 \le u \le -\beta/\xi$ if $\xi < 0$. The important take-away from this is the characterizing property of the GPD that the mean residual life function is linear in the threshold $u$ (McNeil, Frey, Embrechts, 2015 [10]).

The choice of $u$ can therefore be based on the mean residual life (mrl) function. In the plot of the mrl as a function of a candidate threshold $u_c$, there should be an interval of candidate thresholds over which the plot is approximately linear; the threshold is chosen at the start of that interval.
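A minimal sketch of the empirical mean residual life function used for this diagnostic is given below. It is not the thesis code; the 'losses' array is a synthetic stand-in for the normalised losses per event.

```python
# A sketch: empirical mean residual life (mean excess) over candidate thresholds.
import numpy as np

rng = np.random.default_rng(0)
losses = rng.pareto(a=1.5, size=500) + 0.1   # heavy-tailed placeholder data

candidates = np.quantile(losses, np.linspace(0.1, 0.9, 17))
mrl = [np.mean(losses[losses > u] - u) for u in candidates]

for u, e in zip(candidates, mrl):
    print(f"u = {u:7.3f}   e(u) = {e:7.3f}")
# The threshold is chosen where e(u) starts to behave linearly in u.
```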


2.2 Exposure and vulnerability

It is important to distinguish between hazard and risk. Hazard can be defined as "a potential threat to humans and their welfare arising from a dangerous phenomenon or substance that may cause loss of life, injury, property damage and other community losses or damage." A distinction is made between technological hazards, such as accidents or infrastructure failures, and natural hazards, which are caused by natural processes (Smith, 2013 [19]). The focus in this thesis will be on natural hazards. Risk, on the other hand, refers to the statistical probability of experiencing a hazard and its likely consequences. The likely consequences of an earthquake may be extremely severe, but there is little risk if the probability of such an earthquake occurring is extremely low.


2.3 Hypotheses

Based on the theory presented in this chapter, the following hypotheses are formulated about the weather-related natural disaster events in North America:

• The highest-loss events of each year will follow a distribution in the family of generalised extreme value distributions.

• Exceedances over high thresholds of normalised losses per extreme weather event will follow a generalised Pareto distribution.


Chapter 3

Data and methodology

3.1 Disaster data

The observations of natural disasters are taken from the EM-DAT database. It is publicly available and contains data on catastrophic events from 1900 to the present day. It includes information on casualties and the number of people affected but, most importantly, insured and uninsured losses. The EM-DAT database is kept up to date by the Centre for Research on the Epidemiology of Disasters (CRED), which is part of the Catholic University of Leuven.

The database contains 24,000 catastrophic events. Since the list of events is not as exhaustive for the early 1900s, we will focus on the years that have more complete data. According to Phelan, systematic recording of natural disaster events started in 1980 (Phelan, 2011 [16]).

Therefore, 1980-2019 seems like a good period to look at. Only events that are related to weather and climate will be considered. This leaves around 10,000 events in the categories "Climatological", "Hydrological" and "Meteorological".

The relevant aspects of each event recorded in the database are:

• The date on which the event started.

• The type of the disaster. To make sure only weather-related events are included in the analysis, the data points with type "meteorological", "climatological" or "hydrological" are used and the other types of disasters are excluded.

• The country in which the event occurred. Since disasters are complex and diverse in their causes and effects, the events that happened in the region of North America (USA and Canada) will be the focus. This leaves 534 observations.


• Total damages are the most important statistic, as the probability distribution of this is what the thesis aims to model.

• Insured damages are also included in the data and are most important to insurers. However, estimating insured losses instead of total damages would make the model complex to a level that is beyond the scope of this thesis.

All observations of losses will be corrected for inflation to the 2019 level. Since not all numerical optimisation methods can handle observations in the order of $10^{11}$, some graphs will show normalised losses (observations divided by the loss of the largest-loss event). This will be noted in the figures or their captions. To get an idea of the data, a scatterplot is shown below in figure 3.1. Both plots show the total damage per event in millions of 2019 inflation-corrected dollars. Plot A has a linear y-axis, while plot B has a logarithmic y-axis. Climatological disasters include droughts and wildfires, hydrological disasters include floods and landslides, and meteorological disasters comprise storms and extreme temperature events (only 10 events in total).

Looking at the scatterplots in figure 3.1, it is clear that there are only a few observations of disasters that caused less than 100 million USD in total damages. There is a simple explanation for this, as events are only included in the database if at least one of the following criteria is fulfilled:

• More than 10 deaths.

• More than 100 people affected, injured or homeless.

• The country affected by the event declares a state of emergency and/or appeals for international assistance.


Figure 3.1: Scatterplot of total losses of all weather-related disaster events in the USA and Canada between 1980 and 2019 recorded in the EM-DAT database. This includes climatological (red), hydrological (green) and meteorological (blue) disasters.

3.2 Normalisation of losses

Summarising the theory in chapter 2, the risk posed by a hazard can be expressed as (Smith, 2013 [19]):
$$\text{Risk} = \text{Hazard probability} \cdot \text{exposure} \cdot \text{vulnerability}. \qquad (3.1)$$
In this thesis, the observed losses incurred by disasters will be adjusted for the exposure present at the time of these events, in order to compare disasters as if they happened in the same year. Any trends that arise in these comparisons might be caused by either changes in vulnerability or changes in other factors. By introducing measurements of some factors as covariates in the estimation of the models, (part of) the trends may be explained by these covariates.


For years in which the capital stock data was not yet available, it was assumed that the growth of the capital stock in these years was the same as the average growth in the five years before. The normalisation procedure can be summarised as:

$$NL_i = L \cdot I_i \cdot C_i \cdot U_i, \qquad (3.2)$$

where:

• $i$ identifies the year in which an event took place,
• $NL_i$ is the normalised amount of losses caused by an event,
• $L$ is the observed loss caused by the event,
• $I_i$ is the inflation index in year $i$, with $I = 1$ in the reference year,
• $C_i$ is the capital stock index in year $i$, with $C = 1$ in the reference year,
• $U_i$ is the urbanisation index in year $i$, with $U = 1$ in the reference year.

The reference year in this research will be 2019. In other words, all losses will be normalised such that the events appear to have all occurred in 2019. Figure 3.2 shows the normalised losses of the events that were plotted in figure 3.1. When comparing both figures, be mindful that the axis ranges are different, approximately by a factor of 4.
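A minimal sketch of equation (3.2) is given below. The index values are made up for illustration; in the thesis they are derived from inflation, capital stock and urbanisation data, with all indices equal to 1 in 2019.

```python
# A sketch of equation (3.2) with hypothetical index values.
def normalise_loss(loss, inflation_idx, capital_idx, urbanisation_idx):
    """Express an observed loss as if the event had occurred in the reference year."""
    return loss * inflation_idx * capital_idx * urbanisation_idx

# Hypothetical event from 1995 with a nominal loss of 2.0 billion USD.
nl = normalise_loss(2.0e9, inflation_idx=1.7, capital_idx=1.8, urbanisation_idx=1.1)
print(f"normalised loss: {nl / 1e9:.2f} billion 2019 USD")
```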


3.3 Candidate covariates

One of the objectives of this thesis is to investigate whether introducing extra variables (covariates) in the estimation will produce models that more accurately describe the distributions of the yearly maxima and exceedances over a high threshold. Three different types of candidate covariates are proposed: temperature anomaly, a measure of the El Niño/La Niña cycle and a measure of the degree of urbanisation.

The deviance statistic is a measure of how much closer the linear models fit the data, compared to the stationary model. It is defined as
$$D = 2\big(\ell(M_1) - \ell(M_0)\big), \qquad (3.3)$$
where $\ell(M_i)$ is the maximised log-likelihood under model $M_i$. A positive value of $D$ indicates that $M_1$ performs better than $M_0$. The asymptotic distribution of $D$ is the $\chi^2$ distribution with $k$ degrees of freedom, where $k$ is the difference in dimensionality between $M_1$ and $M_0$. When the value of $D$ is large compared to the critical value of $\chi^2_k$, this suggests that $M_1$ explains substantially more of the observed variation than $M_0$ (Fawcett [5]). Since the difference between the stationary model and the linear models is 1 dimension, the critical value at the 5% significance level is $\chi^2_1(0.05) = 3.841$.

However, there is a trade-off to be made. When two models of different complexity perform equally well, the simpler, i.e. more parsimonious, model is preferred. In other words, parameters that are added to a model should justify the extra complexity by improving the accuracy of the model substantially. To quantify this, the Akaike information criterion (AIC) can be used. Agilan and Umamahesh use the AICc, which includes a correction for small sample sizes. The goal of the AICc is to provide a statistic that can be used to avoid underfitting as well as overfitting (Agilan and Umamahesh, 2017 [1]). It is calculated as follows:
$$\mathrm{AICc} = -2\log L + 2p + \frac{2p(p+1)}{n - p - 1}, \qquad (3.4)$$
where $-\log L$ is the minimised negative log-likelihood of the estimated model, $p$ is the number of parameters the model uses and $n$ is the sample size (Agilan and Umamahesh, 2017 [1]).
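The sketch below computes the deviance statistic (3.3) and the AICc (3.4) from the maximised log-likelihoods of two models; the log-likelihood values and parameter counts are placeholders, not thesis results.

```python
# A sketch of the deviance statistic and AICc comparison.
from scipy.stats import chi2

def aicc(loglik, p, n):
    """Akaike information criterion with small-sample correction."""
    return -2.0 * loglik + 2.0 * p + 2.0 * p * (p + 1) / (n - p - 1)

ll_m0, p0 = -112.4, 3   # stationary GEV: (mu, sigma, xi)
ll_m1, p1 = -109.1, 4   # GEV with one covariate in the location parameter
n = 40                  # number of yearly maxima

D = 2.0 * (ll_m1 - ll_m0)
print(f"D = {D:.2f}, 5% critical value = {chi2.ppf(0.95, df=p1 - p0):.3f}")
print(f"AICc difference = {aicc(ll_m0, p0, n) - aicc(ll_m1, p1, n):.2f}")
```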


Regarding their choice of parameterisation of the non-stationary GEVD, Agilan and Umamahesh note that although all three parameters (location, scale, shape) could be made variable, the scale parameter β is assumed to be constant. The reason for this is that allowing β to be variable requires long-term observations. The shape parameter ξ is also held constant in their models, because it is hard to estimate and it is unrealistic to assume that ξ would be a smooth function of time (Agilan and Umamahesh, 2017 [1]).

They conclude that a linear dependence of the location parameter on the covariates is the way to go, i.e. of the form $\mu(t) = \mu_0 + \theta \cdot \mathrm{cov}(t)$ or, equivalently, $\mu(\mathrm{cov}) = \mu_0 + \theta \cdot \mathrm{cov}$, where cov denotes the value of a covariate (Agilan and Umamahesh, 2017 [1]). GEVDs that use multiple covariates have location parameter $\mu(\vec{\mathrm{cov}}) = \mu_0 + \vec{\theta} \cdot \vec{\mathrm{cov}}$. In this parameterisation, $\theta$ and $\vec{\theta}$ are respectively a coefficient and a vector of coefficients to be estimated. They quantify the (linear) response of the location parameter to the covariates.
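A minimal sketch of such a fit is shown below: a GEV whose location parameter is linear in one covariate, estimated by numerically maximising the likelihood. This is not the thesis code; the covariate series and maxima are synthetic, and scipy's sign convention c = −ξ is used.

```python
# A sketch: MLE of a GEV with location mu = mu0 + theta * cov, constant sigma, xi.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import genextreme

cov = np.linspace(-0.2, 0.8, 40)                  # e.g. a temperature anomaly series
maxima = genextreme.rvs(c=-0.5, loc=5.0 + 4.0 * cov, scale=4.0, random_state=1)

def neg_log_lik(params):
    mu0, theta, sigma, xi = params
    if sigma <= 0:
        return np.inf
    mu = mu0 + theta * cov
    return -np.sum(genextreme.logpdf(maxima, c=-xi, loc=mu, scale=sigma))

fit = minimize(neg_log_lik,
               x0=[np.mean(maxima), 0.0, np.std(maxima), 0.1],
               method="Nelder-Mead")
mu0, theta, sigma, xi = fit.x
print(f"mu0 = {mu0:.2f}, theta = {theta:.2f}, sigma = {sigma:.2f}, xi = {xi:.2f}")
```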

3.3.1 Temperature anomaly (TA)

A relation is expected between disaster severity and temperature changes. It is important to choose the right TA dataset for the task at hand. Several sets of TA data are available. NASA provides the temperature anomaly over land in the US mainland. NOAA publishes the monthly combined land-ocean temperature anomaly for the Northern hemisphere and globally (NOAA, 2020 [14]). The combined land-ocean TA is expected to perform better as a covariate, since a large portion of the disasters in the EM-DAT dataset are tropical storms, which form over sea.

The Intergovernmental Panel on Climate Change (IPCC) publishes global temperature anomaly projections based on several Representative Concentration Pathways (RCPs). Since the aim of this thesis is to make projections of future natural disaster risk, a model that incorporates TA as a covariate would be ideal. The projected TA for each of the RCPs can then be used as input for the risk projections.

Therefore, the covariates that will be considered are the annual global temperature anomaly, the 5-year average global temperature anomaly and the 10-year moving average global temperature anomaly. These values are plotted in figure 3.3. There is a clear upward trend of around 0.37 degrees Celsius per decade, with little noise.


Table 3.1: Mean and likely range of global temperature anomaly (in degrees Celsius) relative to average temperatures in the period 1986-2005. Taken from the IPCC Fifth Assessment Report.

Scenario    2046-2065 Mean (likely range)    2081-2100 Mean (likely range)
RCP2.6      1.0 (0.4 to 1.6)                 1.0 (0.3 to 1.7)
RCP4.5      1.4 (0.9 to 2.0)                 1.8 (1.1 to 2.6)
RCP6        1.3 (0.8 to 1.8)                 2.2 (1.4 to 3.1)
RCP8.5      2.0 (1.4 to 2.6)                 3.7 (2.6 to 4.8)

Figure 3.3: Annual, 5-year and 10-year moving averages of the global temperature anomaly.

3.3.2 El Niño and Southern Oscillation (ENSO)


When the opposite happens, either due to the trade winds weakening or the water in the eastern Pacific warming up, this also causes a positive feedback. The increased convection in the central and eastern Pacific weakens the trade winds. This in turn reduces the transport of water from east to west, reducing the upwelling of cold water in the east, which results in warmer water in the east than usual. This combination of processes is called El Niño. Although no two El Niño/La Niña episodes are alike, they usually occur every 3-5 years, with neutral periods in between.

Pielke and Landsea (1999, [17]) look at the SST in the central Pacific region known as Niño 3.4. When these temperatures are 0.4 °C warmer than the long-term average SST in the months August, September and October, they classify that period as an El Niño. They note that these months are of particular interest, as 95% of the category 3, 4 and 5 Atlantic hurricane activity takes place in those months [17]. During El Niño, there is increased vertical wind shear. According to NOAA, vertical wind shear, i.e. differences in wind speed and direction in layers of air, has a negative effect on hurricane formation (NOAA, 2020 [13]). The increased vertical wind shear during El Niño events directly reduces the chances of a hurricane forming. During La Niña events, the opposite is true: reduced vertical wind shear provides better conditions for hurricane formation (Pielke and Landsea, 1999 [17]).

It has been found that both the frequency of damaging storms and the damage per storm are greater during La Niña events. That does not mean, however, that large hurricanes do not form during El Niño. Furthermore, they conclude that the ASO Niño 3.4 SSTs described above can be used as a statistically significant indicator of damage. They point out that this is especially useful for reinsurance companies and financial markets looking to assess the risk of catastrophic loss. Finally, they note that many other factors also affect the Atlantic hurricanes (Pielke and Landsea, 1999 [17]).

For the purpose of this thesis, the variable that will be used to indicate the state of the ENSO will be the Southern Oscillation Index (SOI). NOAA provides a database containing the SOI, which is calculated as

$$\mathrm{SOI} = \frac{STD_T - STD_D}{MSD}, \qquad (3.5)$$
where
$$STD_i = \frac{\text{Actual } SLP_i - \text{Mean } SLP_i}{\sqrt{\sum (\text{Actual } SLP_i - \text{Mean } SLP_i)^2 / N}}.$$
In the last equation, $N$ denotes the number of months, $\text{Actual } SLP_i$ represents the measured sea level air pressure (SLP) at location $i = T, D$ (Tahiti and Darwin) and $\text{Mean } SLP_i$ represents the mean SLP at location $i$ in the 1981-2010 base period. Note that the pressure measurements are taken at Tahiti (French Polynesia) and Darwin (Australia). Prolonged periods of negative (positive) values of SOI coincide with warmer (colder) sea surface temperatures in the east, typically seen during El Niño (La Niña) episodes (NOAA [15]). This thesis will use the SOI values made available by NOAA as the candidate covariate related to the ENSO. These values are shown in figure 3.4.

Figure 3.4: Scatterplot of monthly SOI values and smoothed line for reference.
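A minimal sketch of the standardisation in equation (3.5) is given below, with synthetic monthly pressure series standing in for the Tahiti and Darwin records. The final MSD normalisation follows NOAA's description as understood here and should be read as an assumption.

```python
# A sketch of equation (3.5) on synthetic sea-level pressure series.
import numpy as np

rng = np.random.default_rng(2)
slp_tahiti = 1013.0 + rng.normal(0.0, 2.0, size=480)   # 40 years of monthly values
slp_darwin = 1010.0 + rng.normal(0.0, 2.0, size=480)

def standardise(slp):
    anomaly = slp - slp.mean()                 # departure from the base-period mean
    return anomaly / np.sqrt(np.sum(anomaly**2) / len(slp))

diff = standardise(slp_tahiti) - standardise(slp_darwin)
soi = diff / np.sqrt(np.sum(diff**2) / len(diff))      # divide by the MSD
print(np.round(soi[:6], 2))
```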

3.3.3 IPCC temperature anomaly projections


Chapter 4

Results and discussion

4.1 Block-maxima

4.1.1 Stationary

In the block-maxima analysis, after normalising the losses, the largest damage event of each year was selected and these 40 yearly maxima were used to fit a GEV distribution. The yearly maxima are plotted in figure 4.1.

Figure 4.1: Normalised losses of the largest observations in each year.


Fitting GEV distributions to the yearly maxima resulted in the diagnostic plots shown in figure 4.2. In this figure, the top-left probability-probability (PP) plot shows the empirical and model cumulative distribution functions (cdf) plotted against each other. In the top right, the quantile-quantile (QQ) plot shows the empirical and model quantiles plotted against each other. A model that corresponds perfectly to the empirical data would have all points in these plots on the (blue) 45-degree line through the origin. The PP-plot shows little deviation from the blue line, while the QQ-plot only slightly underestimates the highest observed quantile (which represents hurricane Katrina in 2005). This means that the stationary GEVD accurately describes the data. Note that both axes of the QQ-plot represent losses per event in billions of US dollars. This estimation resulted in a GEV distribution with parameters (µ, β, ξ) = (6.120, 5.755, 0.8802), where the location and scale parameters are in billions of USD. The standard errors are (1.054, 1.291, 0.2052).

Figure 4.2: Diagnostic plots for the stationary GEV distribution fitted to the yearly maxima.

4.1.2 GEVD with covariates

GEVDs with non-stationary location parameter $\mu(\mathrm{cov}) = \mu_0 + \theta \cdot \mathrm{cov}$, where cov denotes the value of a covariate, were fitted for each of the candidate covariates.


Table 4.1: D-statistic and difference in AICc of GEVD fits with covariates, compared to the stationary model.

Covariates    D-statistic    AICc difference
None          0              0
AA            1.22           -1.26
TA            2.53           0.06
MA            3.02           0.54
SOI           3.45           0.97
AA, SOI       4.51           -0.59
TA, SOI       5.40           0.30
MA, SOI       7.30           2.20

Table 4.2: Estimated parameters of the GEVD with covariates 10-year moving average global TA (MA) and SOI. All parameters except ξ are given in billions of USD.

Variable    Estimate    Standard error
µ0          2.523       1.376
θ_MA        4.303       1.123
θ_SOI       0.7839      0.2888
β           4.059       1.424
ξ           1.482       0.4738

The resulting deviance statistics and AICc differences are shown in table 4.1. When adding a second covariate, SOI performs best. The AICc of the MA,SOI-model is more than 2 lower than that of the stationary model, suggesting that this model is substantially better and that the added complexity of the two covariates is justified. The estimated parameters for the MA,SOI-model are given in table 4.2.


4.2 Peaks-over-threshold

4.2.1 Threshold

To aid with the selection of an appropriate threshold, the mean residual life function is plotted in figure 4.4. It is hard to spot a clear point at which the plot becomes linear. Therefore, the stability of the shape parameter as a function of candidate thresholds is plotted in figure 4.5. The shape parameter is stable at around 0.80 from the 35% quantile candidate threshold onward, accentuated by the blue line. This suggests that the 35% quantile, i.e. the value that is higher than 35% of the observations (0.565 billion USD), would be a good threshold for the GPD estimation.

As a third way to look at what the most appropriate threshold is, the threshold weights are plotted in figure 4.6. The highest threshold weights are attained at the 0.35 quantile. Taking all these diagnostic methods into account, it is concluded that u = 0.5 billion USD is a suitable threshold for the estimation of the generalised Pareto distribution (GPD).

Figure 4.4: Plot of the mean residual life (mean excess) as a function of the candidate threshold in billions of USD. To make the plot easier to read, only candidate thresholds up to 30 billion USD are shown.


Figure 4.5: Plot used to evaluate the stability of the ξ estimation at different candidate thresholds.

Figure 4.6: Plot of threshold weight as a function of the quantile of the candidate threshold, using three different validation thresholds: the 0.7, 0.75, 0.8, 0.85, 0.9 and


4.2.2 Stationary

A GPD with stationary parameters was fitted to the data. This resulted in a GPD with scale $\beta = 1.784 \pm 0.1956$ billion USD and shape $\xi = 0.7313 \pm 0.1020$. Figure 4.7 shows the PP-plot and QQ-plot of the estimated model. Notice that both plots correspond very well with the blue line, which means the fitted and empirical probability distributions are almost the same. The highest observed quantile, which is dominated by hurricane Katrina in 2005, stands out. Figure 4.8 shows that the cdf of the estimated model and the empirical cumulative distribution function are similar.

Figure 4.7: Probability-probability plot and quantile-quantile plot of the estimated stationary GPD.
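A minimal sketch of such a fit is given below. It is not the thesis code: synthetic losses stand in for the normalised EM-DAT losses, and the GPD is fitted to exceedances over the 0.5 billion USD threshold with the location parameter fixed at zero.

```python
# A sketch: fitting a stationary GPD to exceedances over a fixed threshold.
import numpy as np
from scipy.stats import genpareto

losses = genpareto.rvs(c=0.73, scale=1.8, size=534, random_state=3)  # placeholder

u = 0.5
exceedances = losses[losses > u] - u

xi, _, beta = genpareto.fit(exceedances, floc=0.0)
print(f"beta = {beta:.3f}, xi = {xi:.3f}, exceedances used: {len(exceedances)}")
```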


4.2.3 Non-stationary

Table 4.3 shows the D-statistic and AICc difference of GPD models with covariates, compared to the stationary case. All single-covariate models seem to be an improvement on the stationary case. However, combining the SOI and one of the TA datasets as covariates improves the model significantly. The best fit is achieved using the global annual temperature anomaly (AA) and the SOI, although the other two measures of TA are not far off. The parameters of the models that use two covariates are shown in table 4.4. The estimated GPDs have parameters $(\text{scale}, \text{shape}) = (\beta, \xi)$, where $\beta = \beta_0 + \theta_i \cdot i + \theta_{SOI} \cdot \mathrm{SOI}$. In this case, $i = AA, TA, MA$ denotes the temperature anomaly-related covariate that was used.

The models show a positive dependency between SOI and the scale parameter. This suggests that disasters are prone to cause less damage in El Niño years than in neutral or La Niña years (which show the most extreme events). This relationship between SOI and the losses per event was expected.

Each model has a negative dependency between temperature anomaly and the scale parameter. This means that the peaks over the threshold are lower in warmer years. This is surprising, because the majority of observations are hurricanes, which tend to be more extreme when the water temperature is higher. Therefore, the opposite effect was expected. To verify this, a histogram of the normalised losses per event is plotted in figure 4.9. The result confirms that in years with a relatively low temperature anomaly, the distribution of losses per event was more heavy-tailed, i.e. there were more severe events.

4.2.4 Possible explanations

So what is the cause of this counter-intuitive result? In the methodology section, the choice was made to look for distributions with non-stationary scale parameters, while keeping the shape parameter constant. Does a different choice of parameterisation lead to different results? To investigate this, the fitting procedure was repeated for models with constant shape parameter and linearly dependent scale parameter and for models that have both parameters linearly dependent on the covariates. This resulted in negative coefficients corresponding to all temperature-related covariates in each parameter. In other words, the choice of parameterisation does not explain the negative dependence on temperature. Since fitting models with variable shape parameters is questionable (Agilan and Umamahesh, 2017 [1]), the decision to only model variability in the scale parameter is maintained.


Table 4.3: D-statistic and difference in AICc of GPD fits with covariates, compared to the stationary GPD model.

Covariates    D-statistic    AICc difference
None          0              0
AA            5.587449       3.545047
TA            3.296019       1.253617
MA            3.310428       1.268027
SOI           6.870596       4.828194
AA, SOI       16.334285      12.235048
TA, SOI       15.478531      11.379293
MA, SOI       15.474491      11.375254

Another possible explanation is that the factors used to adjust losses in real terms to account for urbanisation and the increase in capital stock were overestimated. This would lead to the overestimation of events that took place longer ago, in comparison to more recent events. It would therefore explain some of the downward trend that has been observed. However, analysis of the losses per event before normalisation results in GPD models with the same negative dependency between temperature anomaly and the scale parameter. Flaws in the normalisation procedure can therefore not be the explanation for the observed trend.


Table 4.4: Estimated parameters of the GPD models that use SOI and either the annual global TA (AA), the 5-year moving average global TA (TA) or the 10-year moving average TA (MA).

Variable            Stationary    SOI        AA, SOI    TA, SOI    MA, SOI
β0                  1.784         1.826      2.982      3.130      3.173
  standard error    0.1956        0.1996     0.4721     0.5112     0.5556
θ1                  n/a           n/a        -1.644     -2.074     -1.949
  standard error    n/a           n/a        0.5354     0.6704     0.6644
θ_SOI               n/a           0.4051     0.5394     0.5660     0.5883
  standard error    n/a           0.1094     0.1163     0.07722    0.09513
ξ                   0.7313        0.7148     0.6576     0.6662     0.6592
  standard error    0.1020        0.1005     0.09689    0.09593    0.09768

Figure 4.9: Histograms of the losses per event in the 20% coldest (top) and 20% warmest (bottom) years. The events are coloured according to their subgroup. Note that the $\log_{10}$ of the losses in millions of dollars is shown (i.e. 9 corresponds to $10^9$).


4.3 Projected distributions

The non-stationary GEVD and GPD models can be used to project future expected maximum yearly losses and losses per event, respectively. Assuming that the dependence of the distributions on the global temperature anomaly and SOI remains the same, projections of these covariates can be used as variables to generate distributions that are likely to occur in the future. For the future global TA projections, the IPCC pathways will be used. Since the ENSO is a cyclical physical process, this is expected to still be the case in the future. Therefore, three different ENSO states are proposed as examples: El Niño (with an SOI value of -1.5), neutral (SOI = 0) and La Niña (SOI = 1.5).

4.3.1 Block-maxima projections

The best non-stationary model was the MA,SOI-model, with parameters as shown in table 4.2. Figure 4.10 shows the return level plots of the MA,SOI-model under El Niño, neutral and La Niña conditions. As expected, losses are lowest (highest) in the El Niño (La Niña) situation. The present-day temperature anomaly was used for this plot. Plot 4.11 shows the return levels of the model using the mean 2046-2065 TA projections of the different pathways, as seen in table 3.1. The return level plots show that a higher temperature anomaly translates into higher return levels. This means that the yearly maximum damage event is expected to be more extreme for higher levels of the TA. Plot 4.12 shows the same, but using the mean 2081-2100 TA projections.


Figure 4.10: Return level plots of the MA,SOI-model, using different ENSO variables and today's temperature anomaly.
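A minimal sketch of how m-year return levels like those in figures 4.10-4.12 can be computed from the fitted parameters in table 4.2 is given below. The MA value of 2.0 is an arbitrary illustrative scenario, not an IPCC projection on the model's baseline.

```python
# A sketch: return levels of the GEVD with location depending on MA and SOI,
# using the point estimates from table 4.2 (billions of USD).
import numpy as np

mu0, theta_ma, theta_soi = 2.523, 4.303, 0.7839
beta, xi = 4.059, 1.482

def return_level(m, ma, soi):
    """Yearly-maximum loss level exceeded on average once every m years."""
    mu = mu0 + theta_ma * ma + theta_soi * soi
    y = -np.log(1.0 - 1.0 / m)
    return mu + (beta / xi) * (y ** (-xi) - 1.0)

for soi, label in [(-1.5, "El Nino"), (0.0, "neutral"), (1.5, "La Nina")]:
    levels = [round(return_level(m, ma=2.0, soi=soi), 1) for m in (10, 50, 100)]
    print(label, levels)
```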


4.3.2 Peaks-over-threshold projections

Because the trend found in the estimation of models using temperature anomaly as a covariate is questionable, the IPCC projections cannot be used to improve the projections of future exceedance distributions. However, the dependence of the scale parameter on the SOI is an improvement on the stationary model. The SOI-model can therefore be used to project expected (normalised) losses per event in excess of the threshold of 500 million USD. Note that, to calculate the exceedance distribution in a given year, these normalised losses should be multiplied by the index values of capital stock, urbanisation and inflation. The cumulative exceedance distribution of normalised losses over 500 million USD for different SOI values is plotted in figure 4.13.


Conclusion

From the EM-DAT database, observations of damage caused by weather-related disasters that occurred in North America between 1980 and 2019 were selected. The observations were normalised to account for increases in capital stock and urbanisation. Of the normalised data, the most damaging weather-related natural disaster events of each year were used to estimate a generalised extreme value distribution with constant parameters, resulting in a GEVD with parameters (µ, β, ξ) = (6.120, 5.755, 0.8802), where the location and scale parameters are in billions of USD, and the standard errors were (1.054, 1.291, 0.2052). This was a good fit for the data as a whole, confirming the hypothesis that the block maxima would follow a distribution in the GEV family.

Several GEV models with location parameters that are linearly dependent on different measures of temperature anomaly (TA) were constructed. The best fit was achieved using the global 10-year moving average TA. This was a better fit for the data than the stationary model, when looking at the deviance statistic and AICc.

The best GEVD model used both the 10-year moving average global TA and the Southern Oscillation Index (SOI) as covariates. This resulted in a non-stationary model with a location parameter that depends positively on both the temperature anomaly and the SOI. These estimated parameters can be found in table 4.2. Return levels were estimated for each of the representative concentration pathways defined by the IPCC and for three different states of the El Niño Southern Oscillation cycle.

A generalised Pareto distribution was estimated, using all observations of events that occurred between 1980 and 2019 and caused total normalised damage exceeding a threshold value of 500 million USD. This resulted in a reasonable fit of the exceedances over the threshold, confirming the hypothesis that exceedances over high thresholds of losses per event follow a generalised Pareto distribution. The stationary model gives insight into the distribution of losses per event under conditions comparable to those in the observation period (1980-2019).


A GPD estimation using temperature anomaly and SOI as covariates resulted in parameters for the temperature anomaly dependence that were negative. Temperature anomalies upwards of 2 degrees Celsius would cause these models' scale parameters to drop below zero. Since the GPD is only defined for positive scale parameters, the models with TA as a covariate cannot be used to generate reasonable projections of future exceedance distributions. However, the GPD model that incorporates SOI as a covariate is an improvement on the stationary model.

Bibliography

[1] V. Agilan and N. V. Umamahesh. Non-Stationary Rainfall Intensity-Duration-Frequency Relationship: a Comparison between Annual Maximum and Partial Duration Series. Water Resources Management, 31(6):1825–1841, Apr. 2017.

[2] A. J. Cannon. A flexible nonlinear modelling framework for nonstationary generalized extreme value analysis in hydroclimatology. Hydrol. Processes, 24(6):673–685, Mar. 2010.

[3] O. Cardona, M. Van Aalst, J. Birkmann, M. Fordham, G. Mc Gregor, P. Rosa, R. Pulwarty, E. Schipper, B. Sinh, H. Décamps, M. Keim, I. Davis, K. Ebi, A. Lavell, R. Mechler, V. Murray, M. Pelling, J. Pohl, A. Smith, and F. Thomalla. Determinants of risk: Exposure and vulnerability, volume 9781107025066, pages 65–108. Cambridge University Press, United Kingdom, Jan. 2012.

[4] A. Charpentier. Insurability of climate risks. The Geneva Papers on Risk and Insurance-Issues and Practice, 33(1):91–109, 2008.

[5] L. Fawcett. Chapter 5: Non-stationary extremes, unknown year. URL: https://www.mas.ncl.ac.uk/ nlf8/teaching/mas8391/background/chapter5.pdf (accessed December 5, 2020).

[6] R. C. Feenstra, R. Inklaar, and M. P. Timmer. Capital Stock at Constant National Prices for United States [RKNANPUSA666NRUG], retrieved from FRED, Federal Reserve Bank of St. Louis, 2015. URL: https://fred.stlouisfed.org/series/RKNANPUSA666NRUG (accessed January 5, 2021).

[7] A. Ferreira and L. de Haan. On the block maxima method in extreme value theory: PWM estimators. The Annals of Statistics, 43(1):276–298, Feb. 2015. arXiv: 1310.3222.

[8] M. Goss, D. L. Swain, J. T. Abatzoglou, A. Sarhadi, C. A. Kolden, A. P. Williams, and N. S. Diffenbaugh. Climate change is increasing the likelihood of extreme autumn wildfire conditions across California. Environmental Research Letters, 15(9):094016, 2020.

[9] M. P. Holland, R. Vitolo, P. Rabassa, A. E. Sterk, and H. W. Broer. Extreme value laws in dynamical systems under physical observables. Physica D: Nonlinear Phenomena, 241(5):497–513, Mar. 2012.

[10] A. J. McNeil, R. Frey, and P. Embrechts. Quantitative Risk Management, chapter 5, pages 135–166. Revised edition.

[11] NASA. Data.GISS: GISS Surface Temperature Analysis (v4): Analysis Graphs and Plots, 2020. URL: https://data.giss.nasa.gov/gistemp/graphs/ (accessed November 12, 2020).

[12] NOAA. El Niño — National Oceanic and Atmospheric Administration, 2015. URL: https://www.noaa.gov/education/resource-collections/weather-atmosphere/el-nino (accessed December 23, 2020).

[13] NOAA. How do hurricanes form?, 2020. URL: https://oceanservice.noaa.gov/facts/how-hurricanes-form.html (accessed January 3, 2021).

[14] NOAA. Global Surface Temperature Anomalies — Monitoring References — National Centers for Environmental Information (NCEI), 2020. URL: https://www.ncdc.noaa.gov/monitoring-references/faq/anomalies.php (accessed October 8, 2020).

[15] NOAA. Southern Oscillation Index (SOI) — Teleconnections — National Centers for Environmental Information (NCEI). URL: https://www.ncdc.noaa.gov/teleconnections/enso/indicators/soi/soi-calculation (accessed December 21, 2020).

[16] L. Phelan. Managing climate risk: extreme weather events and the future of insurance in a climate-changed world. Australasian Journal of Environmental Management, 18(4):223–232, 2011.

[17] R. A. Pielke and C. N. Landsea. La Niña, El Niño, and Atlantic hurricane damages in the United States. Bulletin of the American Meteorological Society, 80(10):2027–2034, Oct. 1999.

[18] R. A. Pielke Jr and C. W. Landsea. Normalized hurricane damages in the United States: 1925–95. Weather and Forecasting, 13(3):621–631, 1998.

[20] T. Stocker, D. Qin, G.-K. Plattner, M. Tignor, S. Allen, J. Boschung, A. Nauels, Y. Xia, V. Bex, and P. Midgley. Summary for policymakers, contribution of working group 1 to the fifth assessment report of the intergovernmental panel on climate change. Climate Change 2013: The Physical Science Basis., 2013.
