
Extreme value statistics

Bachelor's thesis

June 2016

Student: F.C. Verbon

Primary supervisor: Dr. A.E. Sterk


Abstract

In this thesis we study extreme value statistics. First the regularity conditions, the practical issues, and the basics of statistical modelling of sample spaces are discussed. Then the extremal types theorem is used to determine extreme value distribution models for large data sets. We compute the generalized extreme value model both from exactly known distributions and from data with unknown distributions. As an application we consider data on induced earthquakes in the Netherlands. In the last section we compute return levels to estimate the probability of extreme events.


Contents

1 Extreme value statistics
  1.1 History of extreme value theorem
  1.2 Usage and Paradigm
  1.3 Suitable regularity conditions
  1.4 Issues
  1.5 Overview of statistical modelling
      1.5.1 Discrete sample space
      1.5.2 Continuous sample space
      1.5.3 Maximum log-likelihood
      1.5.4 Model diagnostics

2 Classical extreme value theory and models
  2.1 Model formulation
  2.2 Exact extreme value statistics
  2.3 Generalized extreme value distribution
  2.4 Examples of exact distribution of the GEV
      2.4.1 Uniform distribution
      2.4.2 Exponential distribution
      2.4.3 Fréchet distribution
      2.4.4 Probability density function of every type

3 Estimating the GEV model from data
  3.1 Block maximum method
  3.2 Examples with the block maximum method
      3.2.1 Uniform distribution
      3.2.2 Normal distribution
      3.2.3 Earthquake data

4 Predictions
  4.1 Probability density functions
  4.2 Return levels
  4.3 Return levels of annual data
      4.3.1 Annual data example

A Matlab code
  A.1 Maxblock
  A.2 Parameter fitting
  A.3 P-P and Q-Q plot
  A.4 Parameter plotting


1 Extreme value statistics

1.1 History of extreme value theorem

Figure 1: Lisbon 1755

The pioneer of extreme value statistics is Leonard Henry Caleb Tippett (8 May 1902 - 9 November 1985), known professionally as L. H. C. Tippett, an English statistician born in London. He studied for his Master of Science in statistics under Karl Pearson at the Galton Laboratory, University College London, and under R.A. Fisher at Rothamsted. He spent his entire career, from 1925 to 1965, as a staff member of the Shirley Institute in Manchester. Along with R.A. Fisher and Emil Gumbel, he pioneered extreme value theory.

The problem of yarn breakage rates in weaving led Tippett to study the statistics of extreme events. He realized that the strength of a thread is controlled by the strength of its weakest fibres. Tippett obtained three asymptotic limits describing the distributions of extremes. Emil Julius Gumbel codified this theory in his 1958 book Statistics of Extremes, including the Gumbel distributions that bear his name. The Weibull distribution is named after the Swedish mathematician Waloddi Weibull, who described it in detail in 1951, although it was first identified by Fréchet (1927). The Fréchet distribution is named after Maurice Fréchet, who wrote a related paper in 1927.

Further work was done by Fisher and Tippett in 1928 and by Gumbel in the fifties. As a result of Tippett's work in the textile industry he was awarded the Shewhart Medal of the American Society for Quality Control [1, 2, 3].


Extreme value theory is used to model the risk of extreme, rare events, such as the 1755 Lisbon earthquake [4]. The earthquake, in combination with the fires and a tsunami it triggered, almost completely destroyed Lisbon and adjoining areas. Seismologists today estimate its magnitude in the range 8.5-9.0, and two different types of sedimentological data both suggest a recurrence time on the order of 1500-2000 years for an earthquake of this magnitude [5].

1.2 Usage and Paradigm

Extreme value statistics is a branch of statistics dealing with the statistical behaviour of extreme values. It provides a framework that enables extrapolation. Such extreme values are often dangerous events or risks with a bad outcome. Applications of extreme value statistics are numerous, such as the following:

• Insurance industry, risks

• Food science

• Ocean wave modelling

• Thermodynamics of earthquakes

• Meteorological change

• Weather extremes

And many more applications [6].

The statistics are focused on data, which are obtained from observations or measurements. The data are interpreted as realizations of random variables, denoted as a sequence X1, X2, . . . . The upper case Xi represents the random quantity whose realized value is xi. We assume that there is no limit on the number of observations we can make; otherwise the asymptotic arguments of limit theorems would fail. In the simplest case, suppose {X1, . . . , Xn} is a sequence of hourly sea-levels. Then define

Mn = max(X1, . . . , Xn),     (1)

which is the maximum over an n-observation period. If the exact statistical behaviour of the Xi were known, then the behaviour of Mn could be calculated exactly.

In practice, the behaviour of the Xi is unknown, making exact calculation of the distribution of Mn impossible. However, under suitable assumptions, the approximate behaviour of Mn for large values of n follows from detailed limit arguments by letting n → ∞. This leads to a family of models that can be estimated from the observed values of Mn. This approach is termed the 'extreme value paradigm'.

1.3 Suitable regularity conditions

The three basic conditions the model needs are as follows:

1. The data set must be large enough, and it must be possible to enlarge it by adding new data. This is because the extreme value theorem is developed using asymptotic arguments.

2. The data itself is derived under idealized circumstances. This is an important condition; otherwise the model would be good for nothing.

3. The last condition is that the data is used in such a way that the most useful information is retained. Recording extremes can waste information in practice: a common way of recording extreme data is to store only the maximum observed value over a period, ignoring the second largest value in the period, which could be higher than the maxima of other observed periods but is nevertheless excluded from the analysis. This gives a model of poorer quality.

These three conditions emphasize the importance of statistical implementation as a complement to the development of appropriate models for extremes.

1.4 Issues

There are issues with each way of building an extreme value model. The main ones are the following.

1. All techniques for estimating the parameters of the GEV distribution, such as maximum likelihood or Bayesian inference, have their pros and cons.

2. The quantification of the uncertainty of the parameters: it is a problem if we do not know whether the parameters are estimated with high or low certainty.

3. If the extreme value model performs badly on the data, then there is little hope of it working well in extrapolation [6].


1.5 Overview of statistical modelling

The basic ingredients of a statistical model are the following: a random variable X, which represents a quantity whose outcome is uncertain. If the outcome is known, it is denoted by the lower case x. The set of possible outcomes of X is denoted Ω, the sample space.

1.5.1 Discrete sample space

If the sample space Ω of X is discrete, like {0, 1, 2, . . . }, then X is a discrete random variable. Its probability distribution is determined by the probability mass function, which takes the form

f(x) = Pr{X = x},

for each value of x in Ω. Thus f(x) is the probability that the random variable X takes the value x. Two examples of families of models with a discrete sample space are the binomial distribution and the discrete uniform distribution.

1.5.2 Continuous sample space

The sample space Ω can also be continuous. Then we have a continuous random variable and the probability distribution function is defined as

F(x) = Pr{X ≤ x},

for each x in Ω. Differentiating the probability distribution function gives the probability density function

f(x) = dF/dx.

The expectation and the variance are defined as follows.

Definition 1 The expectation is E(X) = ∫ x f(x) dx.

Definition 2 The variance is Var(X) = ∫ {x − E(X)}² f(x) dx.

The standard deviation is defined as the square root of the variance, providing a measure of variability in the same units as X.


1.5.3 Maximum log-likelihood

Assume we have a large number of realisations x1, . . . , xn of a random variable having probability density function f(x|θ0), where θ0 is the true but unknown parameter; a generic parameter value is denoted θ. We try to find the value of the parameter under which the model best suits the known data. The likelihood function is

L(θ) = ∏_{i=1}^{n} f(xi|θ),     (2)

and the log-likelihood function is

l(θ) = log L(θ) = ∑_{i=1}^{n} log f(xi|θ).     (3)

The maximum likelihood estimator θ̂0 of θ0 is defined as the value of θ that maximizes the likelihood function. Since the logarithm is monotonic, the log-likelihood attains its maximum at the same point as the likelihood function, so the maximum likelihood estimator also maximizes the log-likelihood function.

The most used method is to differentiate the log-likelihood, equate it to zero, and check that the stationary point is a maximum, either by plotting or by differentiating again and verifying that the second derivative is negative.

Example 1 We have the data {5, 9, 3, 12, 14} and assume the random variable X has a Poisson distribution, f(x) = e^{−λ} λ^x / x!. What is the maximum likelihood estimate?

The parameter θ of the Poisson distribution is the single parameter λ:

L(θ) = L(λ)
     = P(X = 5) P(X = 9) P(X = 3) P(X = 12) P(X = 14)
     = f(5) f(9) f(3) f(12) f(14)
     = (e^{−λ} λ^5/5!)(e^{−λ} λ^9/9!) · · · (e^{−λ} λ^{14}/14!)
     = e^{−5λ} λ^{43} / (5! 9! 3! 12! 14!).

We then differentiate the log-likelihood function and find the maxima and minima where the slope is zero (Fermat's theorem):

l(λ) = 43 log λ − 5λ − log(5! 9! 3! 12! 14!),
(d/dλ) l(λ) = 43/λ − 5 = 0  ⇒  λ = 43/5.

We check that this is a maximum: (d²/dλ²) l(λ) = −43/λ² < 0. It is indeed a maximum, so the maximum likelihood estimate is λ̂ = 43/5.
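As a quick numerical cross-check (an added sketch, not part of the original derivation), the same estimate can be recovered in Matlab by minimizing the negative log-likelihood; the constant log(xi!) terms are dropped since they do not affect the maximizer, and poissfit from the Statistics Toolbox gives the closed-form answer.

% Minimal sketch: numerically verify the Poisson MLE for the data above.
x = [5 9 3 12 14];
negloglik = @(lambda) -(sum(x)*log(lambda) - numel(x)*lambda); % log(x_i!) terms dropped
lambdaHat = fminsearch(negloglik, 5)   % converges to 43/5 = 8.6
poissfit(x)                            % closed-form MLE: the sample mean, 8.6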

1.5.4 Model diagnostics

The model obtained from the data could still be bad. How do we check whether the model fits well? Suppose we have data x1, . . . , xn and that they are independent realizations from a common population with unknown distribution function F. We have an estimate of F, denoted F̂. We want to check the plausibility that the xi are a random sample from F̂. Let x(1), . . . , x(n) be the ordered sample, so that x(i) ≤ x(i+1) for all i ∈ {1, . . . , n − 1}. One way to see whether the model distribution fits the data is a probability plot, or P-P plot.

Definition 3 Given an ordered sample of independent observations from a population with estimated distribution function F̂, a probability plot consists of the points

{(F̂(x(i)), i/(n+1)) : i = 1, . . . , n}.

If F̂ is a reasonable model for the population distribution, the points of the probability plot should lie close to the unit diagonal; otherwise the plot gives evidence of a failure of F̂ as a model for the data. Although this method works, the most used is the quantile plot, or Q-Q plot [7], which is as follows.

Definition 4 Given an ordered sample of independent observations from a population with estimated distribution function F̂, a quantile plot consists of the points

{(F̂^{−1}(i/(n+1)), x(i)) : i = 1, . . . , n}.

The name 'quantile plot' derives from the fact that the quantities x(i) and F̂^{−1}(i/(n+1)) each provide an estimate of the i/(n+1) quantile of the distribution F. If F̂ is a reasonable estimate of F, then the quantile plot should also consist of points close to the unit diagonal. The probability plot and the quantile plot contain the same information, expressed on different scales; the perception gained on different scales can be important, and what looks like a reasonable fit on one scale may look poor on the other.
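Both definitions are easy to compute directly. The following minimal Matlab sketch assumes a data vector x and a fitted distribution object pd (for example from makedist, as in Appendix A.3) are already available:

% Sketch: P-P and Q-Q points for a sample x against a fitted distribution pd.
n  = numel(x);
xs = sort(x(:))';                   % ordered sample x_(1) <= ... <= x_(n)
p  = (1:n) / (n + 1);               % plotting positions i/(n+1)
figure                              % P-P plot: model vs empirical probabilities
plot(cdf(pd, xs), p, '.'), hold on, plot([0 1], [0 1]), hold off
figure                              % Q-Q plot: model quantiles vs ordered sample
plot(icdf(pd, p), xs, '.')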


2 Classical extreme value theory and models

2.1 Model formulation

In this chapter we develop a model which represents the cornerstone of extreme value theory. The model focuses on the statistical behaviour of

Mn = max{X1, . . . , Xn},

where X1, . . . , Xn is a sequence of independent random variables having a common distribution function F. In applications, the Xi usually represent values of a process measured on a regular time-scale; in other words, in practice Mn represents the observed maximum of the process over n time units.

2.2 Exact extreme value statistics

In theory the distribution of Mn can be derived exactly for all values of n:

Pr{Mn ≤ z} = Pr{X1 ≤ z, . . . , Xn ≤ z} = ∏_{i=1}^{n} Pr{Xi ≤ z} = {F(z)}^n.

One method is to estimate F from the observed data and then substitute this estimate into {F(z)}^n. The alternative is to accept that F is unknown and to look for approximate families of models for F^n, which can be estimated on the basis of the extreme data only. This is similar to the usual practice of approximating the distribution of sample means by the normal distribution, as justified by the central limit theorem. The arguments in this chapter are an extreme value analogue of the central limit theorem.

Definition 5 A sequence of random variables X1, X2, . . . having distribution functions F1, F2, . . . respectively, is said to converge in distribution to the random variable X with distribution function F if Fn(x) → F(x) as n → ∞ for all continuity points x of F.

Theorem 1 (Central limit theorem [6]) Let X1, X2, . . . be a sequence of independent and identically distributed random variables with mean µ and finite variance σ². Define

X̄n = (X1 + · · · + Xn)/n;

then, as n → ∞,

√n (X̄n − µ)/σ → Z ∼ N(0, 1).


This is remarkable since the asymptotic distribution of the sample mean is normal regardless of the distribution of the Xi. Analogous arguments are used in subsequent chapters to obtain approximating distributions for sample extremes.

We proceed by looking at the behaviour of F^n as n → ∞. But this alone is not enough: for any z < z+, where z+ is the upper end-point of F, i.e. the smallest value of z such that F(z) = 1, we have F^n(z) → 0 as n → ∞, so that the distribution of Mn degenerates to a point mass on z+. This difficulty is avoided by allowing a linear renormalization of the variable Mn:

Mn* = (Mn − bn)/an,     (4)

for sequences an > 0 and bn. In short, we do not want a step function at z+. We therefore seek limit distributions for Mn*, with appropriate choices of {an} and {bn}, rather than for Mn.

The entire range of possible limit distributions for Mn* is given by the extremal types theorem.

Theorem 2 (The extremal types theorem [6]) If there exist sequences of constants an > 0 and bn such that

Pr{(Mn − bn)/an ≤ z} → G(z) as n → ∞,     (5)

where G is a non-degenerate distribution function, then G belongs to one of the following families:

Type I (Gumbel): G(z) = exp{−exp[−(z − b)/a]},  −∞ < z < ∞;

Type II (Fréchet): G(z) = 0 if z ≤ b;  G(z) = exp{−[(z − b)/a]^{−α}} otherwise;

Type III (Weibull): G(z) = exp{−[−(z − b)/a]^{α}} if z < b;  G(z) = 1 otherwise.

The three families of the extremal types theorem have distinct forms of behaviour, determined by the tail behaviour of the distribution F of the random variables.

Consider the behaviour of G at z+, the smallest value of z such that F(z) = 1. For the Weibull distribution z+ is finite; for the Fréchet and Gumbel distributions z+ = ∞. The density of the Gumbel G decays exponentially and that of the Fréchet G decays polynomially. Just like the central limit theorem, the extremal types theorem is universal.


2.3 Generalized extreme value distribution

There are two weaknesses. First, a technique is required to choose which of the three families is most appropriate for the data at hand. Secondly, once such a decision is made, subsequent inferences presume this choice to be correct and do not allow for the uncertainty such a selection involves, even though this uncertainty may be substantial. It is straightforward to check that the three families can be combined into a single family of models having a distribution function of the following form.

Definition 6 The generalized extreme value (GEV) distribution is

G(z) = exp{−[1 + ξ(z − µ)/σ]^{−1/ξ}},     (6)

defined on the set {z : 1 + ξ(z − µ)/σ > 0}, where µ, ξ ∈ (−∞, ∞) and σ > 0. The most important parameter is ξ, since it determines the tail thickness. If ξ = 0 (interpreted as the limit ξ → 0) it is the type I, Gumbel family; if ξ > 0 it is type II, Fréchet; if ξ < 0 it is type III, Weibull.
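As an illustrative check (added here; the parameter values are arbitrary), the formula (6) can be evaluated directly and compared with Matlab's built-in gevcdf, which uses the same parametrization with k in the role of ξ:

% Sketch: evaluate the GEV cdf (6) directly and via gevcdf, for xi ~= 0.
G = @(z, mu, sigma, xi) exp(-(1 + xi.*(z - mu)./sigma).^(-1./xi));
z = 2.0;
G(z, 0, 1, 0.5)        % direct evaluation with (mu, sigma, xi) = (0, 1, 0.5)
gevcdf(z, 0.5, 1, 0)   % same value: gevcdf(z, k, sigma, mu) with k = xi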

2.4 Examples of exact distribution of the GEV

2.4.1 Uniform distribution

Assume we have a sequence of independent random variables X1, X2, . . . that share the same uniform distribution on [0, 1], and hence a continuous sample space. Let

Mn = max(X1, X2, . . . , Xn).     (7)

We see that

P(Mn ≤ z) = P(X1 ≤ z, X2 ≤ z, . . . , Xn ≤ z)
          = P(X1 ≤ z) P(X2 ≤ z) · · · P(Xn ≤ z)
          = z^n.     (8)

The limit is 0 for z < 1 and 1 for z = 1, a degenerate distribution, so we want to rescale. We have

P(an(Mn − bn) ≤ z) = P(Mn − bn ≤ z/an) = P(Mn ≤ z/an + bn).     (9)

Substituting (9) into (8), we get

P(Mn ≤ z/an + bn) = (z/an + bn)^n.     (10)

Now we want this to converge to some G(z) as n → ∞. Choosing an = n and bn = 1 we get

(z/n + 1)^n → e^z as n → ∞ (for z < 0).     (11)

We compare e^z with the GEV distribution (6). We see that ξ must be −1, and σ must be 1, since otherwise we would have a fraction that e^z does not have. Then µ needs to equal −1 so that 1 − (z − µ) = −z. Thus

(µ, σ, ξ) = (−1, 1, −1).
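A small Monte Carlo sketch (an added illustration; the block size and replication count are arbitrary) confirms the limit: for z < 0 the empirical probability P(n(Mn − 1) ≤ z) should approach e^z.

% Sketch: check the limit (11) by simulation for the uniform case.
n = 1000; m = 100000;              % block size and number of replications
Mn = max(rand(n, m));              % m realizations of the maximum M_n
z  = -1.5;
empirical  = mean(n*(Mn - 1) <= z) % rescaled maxima with a_n = n, b_n = 1
limitvalue = exp(z)                % limit e^z, approximately 0.2231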

2.4.2 Exponential distribution

If X1, X2, . . . is a sequence of independent standard exponential Exp(1) variables, then

F(x) = 1 − e^{−x} for x > 0.

For Mn we get

P(Mn ≤ z) = P(X1 ≤ z, X2 ≤ z, . . . , Xn ≤ z)
          = P(X1 ≤ z) P(X2 ≤ z) · · · P(Xn ≤ z)
          = F^n(z)
          = (1 − e^{−z})^n.     (12)

Rescaling gives

P(an(Mn − bn) ≤ z) = P(Mn − bn ≤ z/an) = P(Mn ≤ z/an + bn),     (13)

so substituting this gives

F^n(z) → F^n(z/an + bn)  and  (1 − e^{−z})^n → (1 − e^{−(z/an + bn)})^n.

If we choose an = 1 and bn = log(n), we get

(1 − e^{−z−log(n)})^n = (1 − e^{−z}/n)^n → exp(−e^{−z})     (14)

as n → ∞. Comparing (14) with the GEV distribution (6) gives (µ, σ, ξ) = (0, 1, 0), the Gumbel family.
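The same kind of Monte Carlo sketch (again an added illustration with arbitrary sizes) checks the Gumbel limit for exponential maxima:

% Sketch: check the limit (14) by simulation for the exponential case.
n = 1000; m = 100000;
Mn = max(exprnd(1, n, m));          % maxima of m samples of n Exp(1) variables
z  = 1.0;
empirical  = mean(Mn - log(n) <= z) % rescaled with a_n = 1, b_n = log(n)
limitvalue = exp(-exp(-z))          % Gumbel cdf, approximately 0.6922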

2.4.3 Fréchet distribution

If X1, X2, . . . is a sequence of independent standard Fréchet variables, then

F(x) = e^{−1/x} for x > 0.

Then

F^n(z/an + bn) = [exp(−1/(z/an + bn))]^n.

We let bn = 0 and an = n^{−1}; then we get

(e^{−1/(nz)})^n = e^{−1/z}

for every n, and hence also in the limit n → ∞. Comparing e^{−1/z} with (6), we have (µ, σ, ξ) = (1, 1, 1), a type II Fréchet family.

2.4.4 Probability density function of every type

The generalized extreme value distribution (6) combines all three types. The differences between the three are in the 'tail'. In Figure 2 we can also see that an increase of the scale parameter σ flattens the density and an increase of the location parameter µ shifts the whole density to the right. The shape parameter ξ changes the thickness of the tail and the tail behaviour: as ξ changes monotonically toward 0, the tail becomes thinner.


Figure 2: (a) Gumbel probability density plots with variation in σ and µ; (b) Fréchet and (c) Weibull probability density plots with variation in ξ.
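Plots like Figure 2 can be reproduced with the Statistics Toolbox function gevpdf; the sketch below (with illustrative parameter values, not necessarily those of the thesis figure) draws one density of each type:

% Sketch: one GEV density of each type, gevpdf(z, k, sigma, mu) with k = xi.
z = linspace(-4, 6, 500);
figure, hold on
plot(z, gevpdf(z,  0.0, 1, 0))   % type I, Gumbel (xi = 0)
plot(z, gevpdf(z,  0.5, 1, 0))   % type II, Frechet (xi > 0)
plot(z, gevpdf(z, -0.5, 1, 0))   % type III, Weibull (xi < 0)
legend('\xi = 0', '\xi = 0.5', '\xi = -0.5'), hold off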

3 Estimating the GEV model from data

3.1 Block maximum method

In order to estimate the GEV model, we first need a large data set. One way to extract the extreme values is to divide the data into equally sized blocks and take the maximum (or minimum) value of each block. This can still lead to a loss of information if some blocks contain numerous extreme values: the larger the blocks, the more often this happens and the worse the GEV model becomes. The number of values available to estimate the parameters of the GEV model also decreases as the block size increases, leading to a poor fit. If, on the other hand, the blocks are too small, the data will not represent an extreme value model, since the extreme values of small blocks are not extreme; the result would still look too similar to the original data.

This raises the question what a good block size is. When the GEV fits the data accurately and moderate changes of the block size do not change the ξ parameter much, the block size is acceptable.

Once we have set a block size, the block maximum method gives a new data set containing the extreme data. We use the maximum likelihood estimator of the GEV to find the parameters (ξ, σ, µ). Using a Q-Q plot we can check whether the estimated GEV distribution fits the extremes of the data, as in the sketch below.
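A minimal end-to-end sketch of this procedure, using the maxblock function from Appendix A.1 and Matlab's gevfit (the data set and block size here are arbitrary stand-ins):

% Sketch: block maximum method followed by a maximum likelihood GEV fit.
A = randn(1, 1e5);               % stand-in data set; replace with real data
b = 100;                         % chosen block size
C = maxblock(A, b);              % block maxima (Appendix A.1)
[parmhat, parmci] = gevfit(C);   % parmhat = [xi sigma mu], parmci = 95% CIs
pd = makedist('GeneralizedExtremeValue', 'k', parmhat(1), ...
    'sigma', parmhat(2), 'mu', parmhat(3));
figure, qqplot(C, pd)            % diagnostic check of the fit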

3.2 Examples with the block maximum method

3.2.1 Uniform distribution

We want to compute the GEV model of uniformly distributed variables between zero and one. We took a data set of 10^6 samples and for each block size from 1 to 150 we computed the GEV parameters as a function of the block size. The parameters are computed numerically with 95% confidence intervals.

In Figure 3 we see that the parameters (ξ, σ, µ) are approximately (−1, 0, 1), a type III Weibull, and that block sizes 1 to 10 result in a poor GEV model.

The extreme value model parameters (ξ, σ, µ) are better when the values from the block maximum method are more extreme; in this case that is when the block size is about 100 or more. When the block size is too large, the fit becomes less accurate, since there are barely any values left to fit. Therefore we do not need to look at block sizes of 150 and more. We check whether the fit is good by comparing the quantiles of the data with the quantiles of the estimated GEV distribution in a Q-Q plot.


Figure 3: (a) ξ, (b) σ, and (c) µ versus the block size, and (d) a Q-Q plot for which the parameters (ξ, σ, µ) = (−0.9725, 0.0194, 0.9801) were used, with block size 50.

Comment: the Q-Q plot of Figure 3d shows that the fit is rather good; only for the lower quantiles does the Q-Q plot deviate from the diagonal.

3.2.2 Normal distribution

We have generated a data set of 10^5 random variables from a normal distribution with mean 10 and standard deviation 4. From the new parameters (ξ, σ, µ) we see that the GEV is different and behaves differently. From Figure 4 we see that the parameters are approximately (0, 1, 24).

This is a type I Gumbel GEV; for block size 5,000 the values are (−0.0136, 0.9931, 24.6537). Since the Q-Q plot is good, we can accept these parameters for the GEV.


Figure 4: (a) ξ, (b) σ, and (c) µ versus the block size, and (d) a Q-Q plot for which the parameters (ξ, σ, µ) = (−0.0136, 0.9931, 24.6537) were used, with block size 5,000.

The fitting becomes less accurate when the block size is over 5,000, since the parameters change dramatically with fewer than 200 values to fit. However, the Q-Q plot for a block size of 5,000 is still good, since the upper right corner of the Q-Q plot in panel 4d is exactly on the diagonal line. The upper right corner of the Q-Q plot is the more important part, since it represents the more extreme values.

3.2.3 Earthquake data

In the Netherlands the KNMI keeps a public record of induced earthquakes [8] from 1986-12-26 until 2016-05-01. Induced seismicity refers to typically minor earthquakes and tremors that are caused by human activity altering the stresses and strains on the Earth's crust. From this record I extracted the magnitudes into a (1259, 1)-matrix. The lowest magnitude in the record is −0.8 and the highest is 3.6. Magnitude calculations are based on a logarithmic scale, so a ten-fold drop in amplitude decreases the magnitude by 1. The magnitude 3.6 earthquake was in Huizinge, Groningen. The second largest magnitude, 3.5, occurred in Westeremdum (Groningen) and in Alkmaar (North Holland). All of these earthquakes are caused by gas extraction in the neighbourhood. In Figure 5 we can see that common earthquakes are under a magnitude of 2, and we also see the history of the earthquakes between 1986 and 2016.

Figure 5: Induced earthquake data of the KNMI: (a) data plot with equal intervals, (b) histogram of the data.

From the data we can estimate the parameters (ξ, σ, µ) of the GEV distribution for each block size, as shown in Figure 6. The approximate parameters of the KNMI data set are (ξ, σ, µ) = (−0.5, 0.5, 2.5). This is a type III Weibull. We see that the Q-Q plot is good, since the extreme values in the upper right corner lie on the diagonal line.


Figure 6: (a) ξ, (b) σ, and (c) µ versus the block size for the KNMI record, and (d) a Q-Q plot with block size 40 and parameters (ξ, σ, µ) = (−0.5757, 0.5491, 2.7074).

4 Predictions

4.1 Probability density functions

From the previous chapter we know that for every data set we can obtain the parameters exactly if the distribution of the data is known. If we do not know the distribution exactly, then we can estimate the parameters, which we then need to check with a Q-Q plot. In Section 3.2 we gave three examples of estimating the parameters. Once the parameters have been obtained we can plot the density function, as in Section 2.4.4.


Figure 7: Probability density functions of the fitted GEV distributions: (a) for the uniform data of Section 3.2.1, with (ξ, σ, µ) = (−0.9725, 0.0194, 0.9801); (b) for the normal data of Section 3.2.2, with (ξ, σ, µ) = (−0.0136, 0.9931, 24.6537); (c) for the earthquake data of Section 3.2.3, with (ξ, σ, µ) = (−0.5757, 0.5491, 2.7074).

In Figure 7a we see that the fitted density ends at the value 1, which is exactly correct, since the maximum possible value of the uniform data is 1. In Figure 7b we see that a normal distribution with mean 10 and standard deviation 4 could give values of 30 and higher. In Figure 7c we see that the GEV density function stops at magnitude 3.67; therefore the Netherlands should not expect an induced earthquake of magnitude 3.7 or higher.


4.2 Return levels

Often the blocks are chosen to correspond to a time period of length one year, in which case n is the number of observations in a year and the block maxima are annual maxima. Estimates of extreme quantiles of the annual maximum distribution are then obtained by inverting (6), setting G(zp) = 1 − p. In common terminology, zp is the return level associated with the return period 1/p (a small probability means a long period before it happens again): the level zp is expected to be exceeded on average once every 1/p years. More precisely, zp is exceeded by the annual maximum in any particular year with probability p.

With the inverse of the GEV function we can calculate the probability of an extreme event occurring.

4.3 Return levels of annual data

To compute return levels we need the inverse of (6), which is

zp = µ − (σ/ξ)[1 − {−log(1 − p)}^{−ξ}]   for ξ ≠ 0,
zp = µ − σ log{−log(1 − p)}              for ξ = 0.
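A minimal sketch of this inversion in Matlab, using the fitted parameters of the annual example in Section 4.3.1 and the Statistics Toolbox function gevinv as a cross-check:

% Sketch: return level z_p for xi ~= 0, plus a cross-check via gevinv.
xi = -0.0904; sigma = 0.6763; mu = 10.5475;  % fitted parameters (Section 4.3.1)
p  = 0.01;                                   % return period 1/p = 100 years
zp      = mu - (sigma/xi)*(1 - (-log(1 - p))^(-xi))  % approximately 13.09
zpCheck = gevinv(1 - p, xi, sigma, mu)               % same value via inverse cdf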

4.3.1 Annual data example

In this example we have 365,000 values of data: for each day over 1,000 years we draw a random normal value with mean 5 and standard deviation 2. The GEV parameters we obtain are (ξ, σ, µ) = (−0.0904, 0.6763, 10.5475). With these we can compute the annual return levels.

Figure 8: Probability any sample is smaller than the (y-axis) value.


The domain of the inverse function is probability. So 60% as input gives a value such that the probability of an extreme value greater than or equal to that value is 40%, and 90% as input gives a value such that the probability of an extreme value greater than or equal to that value is 10%; in other words, the chance of a sample being less than or equal to that value is 90%. An input of 100% gives the maximum value possible under the GEV model, since all values should be less than or equal to it.

For the annual data with mean 5 and standard deviation 2 the annual maximum is above the value 10.67 roughly half of the time. 99% of the time the annual maximum is higher than 9.44; 1% of the time the value is 13.0928 or higher; and 0.01% of the time a value of 14.7750 or higher could occur, which has a return period of 10,000 years. In general we are only interested in the extreme scenarios of the last 1-10%.

The common terminology 1 − p can be confusing, but 100% corresponds to the maximum value that could occur according to the GEV model and 0% to the lowest value that could occur [6].

From all the data we have we can compute the return levels.

Probability   Uniform data   Normal data   Earthquake data   Annual data
90%           0.9978         26.8547       3.4001            11.9246
91%           0.9980         26.9613       3.4162            11.9854
92%           0.9983         27.0797       3.4330            12.0523
93%           0.9985         27.2129       3.4505            12.1269
94%           0.9987         27.3655       3.4690            12.2114
95%           0.9989         27.5446       3.4887            12.3092
96%           0.9992         27.7621       3.5099            12.4260
97%           0.9994         28.0399       3.5334            12.5724
98%           0.9996         28.4277       3.5603            12.7712
99%           0.9998         29.0821       3.5937            13.0928
99.5%         0.9999         29.7280       3.6160            13.3936
99.9%         1.0000         31.2010       3.6433            14.0220
100%          1.0000         97.6758       3.6612            18.0287

If we want to compute a return period, we need data measured at consistent intervals, such as each day or every x time units. Every data set we have, except for the irregular earthquake data, has this property. To calculate the return period we need to find p from 1 − p. If we take the 99% level, p is 1%, so the return period is 1/p = 100 blocks; thus once every 100 years we expect an annual maximum of 13.0928 or higher in the annual data. Likewise, the 50% level is exceeded once every 2 years on average.


Because the earthquake data does not consist of samples measured regularly in time, we can average the time period. There are 1259 earthquakes from 1986-12-26 until 2016-05-01, a time period of 11,033 days, so on average there is an earthquake every 8.7633 days. Thus an earthquake of magnitude 3.5937 (the 99% level) occurs every 876.33 days, i.e. every 28.78 months or approximately every 2.39 years. The reason this is inconsistent is that between 1986 and 1996 there are 108 earthquakes, between 1996 and 2006 there are 340, and between 2006 and 2016 there are 769. The last ten years have one earthquake per 4.7 days, almost double the average rate; the largest-magnitude induced earthquake occurred in the 2006-2016 period, with one of the second-largest (magnitude 3.5) also in 2006.
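The return-period arithmetic above can be verified with a few lines (an added sketch using only the numbers stated in the text):

% Sketch: average return period for the 99% level of the earthquake data.
days = 11033; quakes = 1259;
avgGap     = days / quakes        % approximately 8.7633 days between earthquakes
returnDays = 100 * avgGap         % 99% level: exceeded once per 100 earthquakes
returnYrs  = returnDays / 365.25  % approximately 2.39 years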


A Matlab code

A.1 Maxblock

This function transforms a row vector into a new row vector such that, for each block of size b, only the maximum is sent to the new row vector. The last block is often smaller and is ignored, since the length of the original row vector is often not divisible by b; including it would disrupt the data, because a short block cannot give extreme data. K is the number of complete blocks, which is automatically the length of the new row vector.

function [C] = maxblock(A, b)
%MAXBLOCK Block maxima of a row vector.
%   Input: a row vector A with indices 1 to L, and a block length b.
%   Output: a row vector C with the maximum of each complete block.
L = length(A);
B = A(1:(L - mod(L, b)));    % drop the incomplete last block
e = L - mod(L, b);
K = e / b;                   % number of complete blocks
C = zeros(1, K);
for i = 1:K
    j = i * b;
    C(1, i) = max(B((j - (b - 1)):j));   % maximum of block i
end
end


A.2 Parameter fitting

This function computes the GEV parameters of a block maxima vector. It only handles one row vector for a chosen block size b; since we do not know in advance which b gives the best parameters, the function is used later inside more general scripts.

% GEV estimator
function [P] = GEV(C)
%GEV Maximum likelihood fit of a GEV distribution to the block maxima C.
%   Returns P = [xi sigma mu] (gevfit's parameter order: k, sigma, mu).
paramEsts = gevfit(C);
P = paramEsts
end

A.3 P-P and Q-Q plot

This script makes a P-P plot of a row vector with a specific block size b, after running maxblock and GEV.

k     = P(1);
sigma = P(2);
mu    = P(3);

pd = makedist('GeneralizedExtremeValue', 'k', k, 'sigma', sigma, 'mu', mu);

figure
probplot(pd, C)


This script makes a Q-Q plot of a row vector with a specific block size b, after running maxblock and GEV. Note: the function qqplot has different built-in domains than probplot.

% needs C and P from GEV.

k     = P(1);
sigma = P(2);
mu    = P(3);

pd = makedist('GeneralizedExtremeValue', 'k', k, 'sigma', sigma, 'mu', mu);

figure
qqplot(C, pd)

A.4 Parameter plotting

The following script was used only for the earthquake data, since that is a small row vector. A is the matrix of the KNMI data. It prints the parameters for each block size b.

% base matrix Aar, before applying maxblock.

for i = 1:50
    b = i;
    C = maxblock(A, b);
    GEV(C)              % display the fitted parameters for this block size
end


This is the basic script to store all parameters in T, in this case for b ∈ [1, 60].

%GEV60
T = zeros(60, 3);

for i = 1:60
    b = i;
    C = maxblock(A, b);
    H = GEV(C);
    T(i, 1:3) = H;      % store [xi sigma mu] for block size b
end

figure
plot(T)

The same, slower version for large data sets.

%GEV5000 plot
T = zeros(5000, 3);

for i = 10:5000
    b = i;
    C = maxblock(A, b);
    H = GEV(C);
    T(i, 1:3) = H;      % store [xi sigma mu] for block size b
end

figure
plot(T)


References

[1] Fréchet, M. (1927). "Sur la loi de probabilité de l'écart maximum." Ann. Soc. Polon. Math. 6, 93.

[2] Fisher, R.A., Tippett, L.H.C. (1928). "Limiting forms of the frequency distribution of the largest and smallest member of a sample." Proc. Cambridge Philosophical Society 24, 180-190.

[3] Gumbel, E.J. (1958). "Statistics of Extremes." Columbia University Press, New York.

[4] Mendes-Victor, L., Oliveira, C., Azevedo, J., Ribeiro, A. (Eds.) (2009). "The 1755 Lisbon Earthquake: Revisited." Springer.

[5] Gutscher, M.-A., Université de Bretagne Occidentale, Brest, France. "The Great Lisbon earthquake and tsunami of 1755: lessons from the recent Sumatra earthquakes and possible link to Plato's Atlantis." 250th anniversary of the Lisbon earthquake.

[6] Coles, S. (2002). "An Introduction to Statistical Modeling of Extreme Values." Springer, pages 2-3, 27, 46, 49.

[7] Wilk, M.B., Gnanadesikan, R. (1968). "Probability plotting methods for the analysis of data." Biometrika 55 (1), 1-17.

[8] http://cdn.knmi.nl/knmi/map/page/seismologie/all_induced.pdf
