Masters Thesis
Time frame of Interaction
Author: Jier Nzuanzu Supervisor: Dr. Dim Coumou Sem Vijverberg Examiner: Rick Quax
A thesis submitted in partial fulfilment of the requirements for the degree of Master of Science in Computational Science
in the
Computational Science Lab Informatics Institute
Declaration of Authorship
I, Jier Nzuanzu, declare that this thesis, entitled ‘Time frame of Interaction’ and the work presented in it are my own. I confirm that:
This work was done wholly or mainly while in candidature for a research degree
at the University of Amsterdam.
Where any part of this thesis has previously been submitted for a degree or any
other qualification at this University or any other institution, this has been clearly stated.
Where I have consulted the published work of others, this is always clearly
at-tributed.
Where I have quoted from the work of others, the source is always given. With
the exception of such quotations, this thesis is entirely my own work.
I have acknowledged all main sources of help.
Where the thesis is based on work done by myself jointly with others, I have made
clear exactly what was done by others and what I have contributed myself.
Signed:
Date: 1 August 2020
UNIVERSITY OF AMSTERDAM
Abstract
Faculty of Science Informatics Institute
Master of Science in Computational Science Time frame of Interaction
by Jier Nzuanzu
Incorporating dynamics of climate precursors occurring on a certain timescale with a maximized signal-to-noise ratio is crucial to obtain best forecast skill. And these dy-namical properties are obtained by aggregating on precursors by choosing an appropriate window-size. Yet, some climate indices are not easily captured within one window-size as they may contain signals on multiple timescales. In our current time were rising temperatures are more common producing heatwaves, a better understanding of the multi-temporal aspect of climate indices is indispensable. In this research we develop an empirical baseline in which we are able to retrieve the timescale in which the signal-to-noise ratio between precursor and target is maximized. With the use of the community guided search, Response Guided Causal Precursor Detection (RGCPD) pre-processed with a discrete wavelet transform, we find that simple observational data can uncover buried signal by noise through the Momentary conditional Independence (MCI) measure of the causal discovery framework PCMCI. A robustness test evaluation of our baseline through AR(1) processes additionally reveal that multi-temporal timescale climate pre-cursors are difficulty captured in a linear lag fashion by PCMCI.
First and foremost I am grateful to the opportunity I have obtained from Dr. Coumou to study this subject, your overview in literature and simple yet difficult question has helped me enormously. Secondly, but not least, I speak my gratitude in the patience and openness you have shown toward me Sem Vijverberg. Who would have thought, from our first lecture in complex analysis in the same major to the same supervisor for our master thesis, Lennart Mettrop, thank you. You were there to witness the hardship and joys of this process. Willemijn, you have always been there, your enormous emotional support has helped me through to also realize that a hurdle a day does not mean the end of the world. Mom, although, you still will not understand what I actually I major in, I am thankful to you.
Contents
Declaration of Authorship i
Abstract iii
Acknowledgements iv
Contents v
List of Figures vii
List of Tables xi
List of Algorithms xii
Abbreviations xiii
1 Introduction 1
2 Background 9
2.1 Mode decomposition . . . 9
2.1.1 General Notion . . . 9
2.1.2 Discreet Wavelet Transform . . . 11
2.1.3 Maximum Overlap Discrete Wavelet Transform . . . 13
2.2 Autoregressive Models . . . 13
2.3 Community Guided search . . . 14
2.3.1 Response Guided . . . 14
2.3.2 Causal Precursor Detection . . . 15
2.3.2.1 PC . . . 15 2.3.2.2 MCI. . . 15 3 Methods 16 3.1 Observational data . . . 16 3.2 Artificial data . . . 19 3.3 Wavelet . . . 19 3.3.1 Choice Wavelet . . . 20 v
3.3.2 Choice Depth . . . 22
3.3.3 Filter length . . . 23
3.3.4 Property per level . . . 24
3.4 PCMCI . . . 27
3.5 Experimental Set-up . . . 27
3.5.1 Baseline observational data . . . 28
3.5.2 Baseline Artificial data. . . 29
3.5.2.1 Extension Baseline AR . . . 29
3.5.2.2 Extension Baseline AR Modified . . . 29
3.5.3 Evaluation on real precursor data. . . 29
4 Results 30 4.1 Observational Data . . . 30 4.2 AR . . . 32 4.3 Noisy AR . . . 33 4.4 Precursor SST . . . 36 4.4.1 Mid Pacific . . . 36 4.4.2 East Pacific . . . 37 5 Discussion 39 5.1 Signal decomposition . . . 39 5.2 Graphical Model . . . 41 5.3 Algorithm . . . 42
6 Conclusion and Future work 44 6.1 Conclusion . . . 44
6.2 Future Work . . . 45
List of Figures
3.1 Soil moisture data sampled on a daily mean . . . 17
3.2 SST data sampled on a daily mean . . . 17
3.3 Geostrofic height sample daily. . . 17
3.4 The different edge color represents the sign of the significance of the link,
a positive or negative in the present time τ = 0, different edges depicts
the link significance in different lags. The nodes’ color represent the
autocorrelation . . . 17
3.5 Mid-pacific Multiresolution analysis done with wavelet Daubechie 4. This
means that the wavelet contains on the first level decomposition 8 waves within a frame that will traverse the signal. As we go further down the wavelet widens and we obtain four wavelets and so on. Left are the scale approximations, showing the long trend of the data. Alternatively as we can see,the data smoothens. Right are the short change of variation cap-tured by the wavelets, the details. These are the high-frequency content of the data. Going further than level 7 for this data is not interesting as we can see in approximation as details; the signal are elongated. Moreover,
for this daily data we have in fact 27 = 128 days of approximation but
27−1 = 64 days of details.This is within our JJA-period. Approximation
are redundant by design and wavelet are numerically calculated by the
difference between approximation scales. . . 18
3.6 On the upper most plot we present the generated AR(1) data.
Subse-quent plots show the autocorrelation, Partial correlation, Periodogram, distribution, QQ-plot and Ljung-Box Pierce test. Especially the Partial correlation plot confirms that we sample an AR(1) as only the first lag is significant above the default 95% threshold interval. The Quantile-to-Quantile plot together with the distribution confirm that the AR(1) has been standardized. Lastly, the Ljung-Box Pierce test confirms that the residual of AR(1) up to lag 20 are have no autocorrelation. This means the data cannot reject the null hypothesis the data are independently
distributed. . . 20
3.7 Default shape wavelets and scale coefficients . . . 22
3.8 Different shape on level 11 decomposition . . . 22
3.9 A representation of the wavelet used to decompose an ECG signal, LA8 stands from Least Asymmetric wavelet with length 8. It meets the afore-mentioned property to be suited to denoise a signal. The scaling function that captures the long trend of signal,also known as approximation. Then the wavelet function that captures the subtle changes in the signal also known as details. Approximations are long evolving trend captured, low frequency and details are high frequency that can directly be translated to the sampling of a timeseries. Approximation are redundant due to the requirement of the dyadic grid and details are difference of consecutive ap-proximations. They have unit scale. Therefore the physical interpretation
uses 2m−1∆t instead of 2m approximation, m ∈ N. . . 22
3.10 A NPESS of the original ECG signal and the NPESS of each level show that higher level decomposition extract different components of the signal. The original signal itself and the first level decomposition are close. In term of NPESS, this shows that stopping at level 1 decomposition will
only retain more noise and nothing else. . . 22
3.11 Multi-resolution decomposition of the ECG signal with LA8. The ap-proximation shows the long trend that still is captured from the chosen depth. The vertical dotted line show how the wavelet extract these peaks at different levels. The higher the level, the slight displacement of the signal, the shape of the wavelet modifies along each level decomposition. Forcing alignment of the wavelet with the signal as Percival and Mofjeld [1997] promotes does solve this issue, but it does not result to the total variance of the original signal. This means adding up the variance of the
approximation and details levels. . . 23
3.12 An inspection of per level decomposition of ECG signal, autocorrelation and partial correlation. Upper graph is the ECG analysis itself, below is the approximation signal at level 5 decomposition that gives the promi-nent properties of ECG at that level decomposition. While the autocorre-lation strongly is present in signal as decomposition, detail 4 and 5 seem to show the important autocorrelation. The partial correlation however is very instructive in how actual independent the signals are, and each level decomposition show a different prominent lag. The most partial correla-tion driver is shown at level of approximacorrela-tion after decomposicorrela-tion, which
depict the slow trend of the signal. . . 25
3.13 Autocorrelation of ECG decomposition up to level 5, ECG included. . . . 25
3.14 Partial correlation of ECG decomposition up to level 5, ECG included. . . 25
3.15 Autocovariance per level decomposition using LA8 on ECG signal . . . . 26
3.16 Change of variance per level decomposition using LA8 on ECG signal. ECG variance of 1575.5 and 1574.0 when approximation and details
vari-ance are summed up. . . 26
3.17 What strikes on the left image is the resemblance with Figure 3.11 where we test the autocovariance per scale. This reflects directly the essence of the original signal. Intriguingly, we see that the last levels, 4 and 5, are small signal and quite noisy. On the right the variance per level with a
χ2 ≈ 5% confidence interval show how there is almost no variance further
down the decomposition. But also the confidence is very smaller due to
List of Figures ix
4.1 From left to right we see the MCI output of the PCMCI that shows
significance. Each level, on the x-axis has a positive peak as we expect it to be. The peak thus symbolise the ability of MCI to find the signal at the level of extraction. The not included levels show the same behaviour. The chosen depth decomposition is evaluated by the pywavelet package using MODWT transform until there exist no signal to decompose. This
means up to level 12 decomposition. . . 31
4.2 Left we see a NPESS decomposition of our target timeseries, level 10, 11, 0, 1
first two low energy signal are extracted from the level decomposition. However, level 0, 1 show an alternation between high, low partialized level
energy. This is reasonable as level 1 is on the daily timescale 21−1, 22−1
and level 1 on a 2 day timescale, both noisy timeseries. On the right we
see a variance plot per level decomposition, with a χ2 -confidence interval
of 5%. This translates to most information is within the interval levels
6, · · · , 11. A difference of 6 · 10−5 difference with the reconstructed
vari-ance by wavelet decomposition and the original varivari-ance of the signal.
The signal has been well captured. . . 31
4.3 On the left the autocorrelation of the original signal and its decomposed
levels. The approximated level depict a long trend result of the data to that level. As of autocorrelation, it shows highly correlated data, the partial correlation, plot on the right, however depicts just one distinct
signal. . . 32
4.4 An interesting result is that the negative MCI values of the signal are
close to the positive signal, as if it announces the arrival of a positive peak. Level 6 and 8 show the behaviour we expect. However, level 7 does not. Here we presented both the artificial precursor and modified target
without an added shifted lagged data to PCMCI . . . 32
4.5 A shifted lag included as precursor to find the peak as solution of the
absence of the peak with a plain AR process. Level 6, 7 depict a similarity as both have a negative value MCI, while a peak for level 8 is completely
absent. . . 33
4.6 A multi-resolution decomposition of the AR signal, left; green depict the
detail signal, high- frequency and red the low-frequency. We see that the choice of depth, clearly reduce the signal to almost a flat line. On the right side a NPESS where we immediately can notice that level 1,2 do not really convey the energy of the data. That is, the data is quite possibly
highly noisy on those levels. . . 34
4.7 Left we see the generated sine wave, just a small view, with an amplitude
of 5 and ω = 0.1. The amplitude correspond with the magnitude of the timeseries. The noisy sine just illustrate the distortion that might occur when the sine noise is perturbed with white noise. On the right is the multi-resolution of the composed AR + clean sine timeseries. Note the regularity starting from level 5, this is explained by the regularity that
the sine signal has and the matching frequency of the wavelet.. . . 34
4.8 An AR contaminated with a pure sine wave, each level shown and not
4.9 An AR contaminated with a pure sine wave. None of the level MCI peaks expected are found by PCMCI. Instead, level 6 and 7 show a negative peak. Level 8 do show a similar peak as the version without the lag shift included. But then the expected peak at level 9, not shown here, no MCI
peak is found. . . 35
4.10 From left to right the variance per wavelet level decomposition to illus-trate the difference of the SST, East, Mid-pacific with response variable temperature. Note that most information are contained in the lowest level
decomposition SST different from the temperature. . . 36
4.11 Mid-Pacific MCI peak for levels 4, 5 and 8, here the peak of MCI is consistently at level 9 or 10. Here we take the negative MCI value as the PCMCI finds a negative significant link consistently. Surprisingly level 8
shows some inconsistency and gives a reason to not fully trust the results. 36
4.12 A Mid pacific autocorrelation and partial correaltion accompanied with
similar analysis for the level decomposition of the target.. . . 37
4.13 East pacific MCI analysis for level 1 to 3, all other levels show consistent
peak. . . 38
4.14 East pacific autocorrelation and partial correlation and the target level
List of Tables
3.1 The values per level decomposition coefficients, there is no difference
be-tween approximation or details decomposition. As one level decomposi-tion composed of approximadecomposi-tion and detail contain a content of the data at that level. However, due to the sub-sampling impose by the dyadic grid of two, up-sampling is required to reconstruct the data from its orig-inal size, 14606 data-points. The first level reconstruction misses 6 data points due to boundary conditions. But as we see, more data is added by
up-sampling with value divisible by two . . . 18
1 Baseline Wavelet signal per level analysis, where each omitted level decom-position is seen as an artificial precursor and newly reconstructed signal as target. The target will contained the decomposed signal in additive way. The aim is to then recover the signal by the Graphical model PCMCI by the mean of a MCI peak value. This will quantify and indicate how
significant the level is. . . 28
2 An extension of the baseline, where we intend to have all values in the
same time unit, regardless of the Wavelet timescale decomposition. An extra added precursor is added to create more independent data points. This idependence occurs by lag shifting the m begin data points of the target, and m last data points of the modified target in powers of 2 as we
work in dyadic grid. . . 28
Abbreviations
Institute
CSL Computational Science Lab
UvA Universiteit van Amsterdam
VU Vrije Universiteit Amsterdam
Climate
SST Sea Surface Temperature
SM Soil Moisture
S2S Subseasonal to Seasonal
IPCC Intergovernmental Panel on Climate Change
ENSO El Ni˜no Southern Oscillation
NAO Northern Atlantic Oscillation
WPO Western Pacific Oscillation
EPO Eastern Pacific Oscillation
PNA Pacific North America
JJA June July August
Algorithms
RGCPD Response Guided Causal Precursor Detection
DWT Discrete Wavelet Transform
MODWT Maximum Overlap Discrete Wavelet Transform
PCMCI PC Momentary Conditional Independence
AR Auto Regression
WT Wavelet Transform
FT Fourier Transform
EMD Empirical Mode Decomposition
NPESS Normalised Partial Energy Sequence Signal
ICA Independant Component Analysis
EOF Empirical Orthogonal Function
SGSA Symplectic Geometry Spectrum Analysis
Causality
MIT Momentary Information Transfer
IT Information Transfer
CEN Causal Effect Network
MCI Momentary Conditional Information
GC Granger Causality
GGC Graphical Granger Causality
Chapter 1
Introduction
In recent years, we have seen a rise in high temperatures during summer periods, which
are known as a heatwaveCoumou and Rahmstorf[2012]. The Intergovernmental Panel
on Climate Change (IPCC) has urged governments to aim for lowering the CO2
in-take into the atmosphere of 2 degree Celsius Guiot and Cramer [2016], Change [2014].
With these climate developments in mind, the more frequent hot temperatures recorded, require a better understanding. How we could predict these phenomena accurately is one of the active research topics in climate science. That is not an easy task as in a climate system we are dealing with multi-timescale interactions of climate indices. For example between sea slowly fluctuating surface temperature (SST) regions in time and other varying processes as soil moisture temperature. This multi-timescale aspect poses a challenge in how to accurately predict or understand the physical interaction of these climate indices. While a SST region slowly changes in time, an anomaly herein might promote the occurrence of large-scale atmospheric (Rosby) waves through climate feed-backs as the Walker circulation. Via these waves, remote SST regions can influence circulation downstream, which in turn have an effect on the temperature at the surface. An known example of such mechanism is the ENSO mechanism that produces drought at one place on the earth and flood at another location. Incorporating the knowledge of the transportation of the information say, SST anomaly to temperature variations downstream to surface temperature, is thus challenging. When, this exchange happens between at least two different regions, is an intersting yet challenging question to ask. With this in mind, recent literature acknowledge the present challenge of a physical and statistical understanding of climate data, due to the process, data itself and computation,
Runge et al.[2019a]. Climate model output poses a challenge due to the time-dependant
nature giving rise to autocorrelation and time delays. At the data aggregation level, from the high-dimensional data are reduced by dimensionality reduction techniques.
For example, a selected time aggregation, sub-sampling which might introduce relations
between different climate indices, that might not be present, Smirnov and Bezruchko
[2012]. Moreover, Ebert-Uphoff and Deng [2014] illustrate that incorporating temporal
knowledge from climate data in a causal setting brings along some pitfalls. The source of pitfalls lies in the signal strength and the fact that causal discovery only picks up the strongest signal. Selecting a climate region that is of interest to study causaly is an active research area, a topic we will not cover. Nonetheless, we will make use of such framework, to deal with the processing and data challenge, that we intend to adress. The computation challenge is dealt by using a generation of AR(1) processes to overcome, what is common in climate data, ample size high-dimensionality.
The selected gridded regions from a climate model, however, have an influence on the temporal resolution; a time aggregation period and averaged time window of that period to search for a signal strength. This is a data challenge. Without a strong signal, no significant link. A lower temporal resolution might lead to a higher autocorrelation of a signal and this might be the only signal present. A higher resolution might be more descriptive, yet short duration effects might be absent. A trade-off is needed to overcome the physsical and statistical limitations presented by the data.
A way to overcome these limitations might be the ability to apply a statistical analysis
that result in a physical meaningful pattern, Ilin et al. [2006] effectively follows this
path. They present an adapted version of Idependant Component Analysis (ICA) - ICA in climate literature known as Emprirical Orthogonal Functions (EOF) - Denoise Source Separation (DSS). The appeal of DSS is that linearity assumption is weakend by two assumptions; the source data should be uncorrealated and it should hold some desired property, for example, non-Gaussian. The uncorrelatedness happens through whitening or sphering the data. These uncorrelated data components need an emphasis, therefore, there is the second assumption that uses a denoising algorithm, a linear and non-linear version. This framework has proved able to extract most prominent internannual time
scale well-known El Ni˜no-Southern Oscillation (ENSO). Addiitonally, slower frequency
components were extracted that provided a meaningful representation of climate vari-ability.
Similarly in Oh et al. [2003], we see an attempt to understand an aspect of the
cli-mate variability, where the focus is on solar irradiance by assuming external forcing, for example, anthropogenic to substantially contributes to low and high-frequency vari-ation. And this variation seems difficult to separate from an internal coupled climate system. Therefore, they apply a signal decomposition method; a Wavelet Transform to understand the characteristics of the climate variability given a small-sized dataset. With the decomposition method not only solar irradiance modes 11 and 85 years have
Introduction 3
been extracted but, a multi-linear regression has established their relative influence on climate on the Wavelet components. It seems that a mode decomposition method adds an advantage to uncovering a signal from climate data regardless of the composition.
Deo et al.[2017], reinforces the notion that a mode decomposition aids in understanding
the characteristics of climate data. Here a Wavelet decomposition method has been chosen as a preprocessing step for a machine learning neural network. They showed that with a preprocessing of a Wavelet method, the prediction error for a drought index improved. Additionally, their predictive model accuratively obtained the temporal
win-dow that physically is known from the drought index. Moreover, Anshuka et al. [2019],
show in their meta-review of data-driven methods that a preprocessing with Wavelet decomposition for a prediction model outperforms a regular neural network or known ARMA models for a drought index. Again, we note that having a mode decomposition step for climate data aids to analyse the physical understanding as it was the case for
Oh et al.[2003] or improve the predictability of an index forAnshuka et al. [2019].
However, in climate science, there are multiple methods to apply a mode decomposi-tion on time series; Fourier Transform, Empirical Mode decomposidecomposi-tion (EMD), Wavelet Transform, Symplectic geometry Spectrum Analysis (SGSA), Singular Spectrum
Anal-ysis Xie et al. [2014]. For example, Molla et al. [2006] use EMD to investigate climate
changes, specifically from rainfall data. The advantage of EMD is data-dependent; it extracts Intrinsic Mode Functions (IMF) which are separated in intrinsic oscillations using local temporal and structural characteristics of the data. Additionally, IMF’s
energy density distribution satisfies the χ2-distribution, Wu and Huang [2004]. This
property supports a statistical evaluation of their significance with the original signal. Additionally, IMF’s are approximately orthogonal; reconstruction of the original signal
from the IMF returns the original signal. With these properties, Molla et al. [2006],
show that principal frequencies seem to appear around the same peak of the decadal climate variability.
Yet, EMD faces the difficulty of solely being defined by an algorithm, thus lackink an
analytical formulation for a performance evaluation. Nonetheless, Franzke [2009] show
the so-thought existing trend of North Atlantic Oscillation (NAO) index might not be a significant trend when analysed by EMD against climate noise. A climate noise is a part of variability which originates from averaging timeseries whose intrinsic timescale is shorter than the averaging period of the signal, their aggregation. For example, a typical timescale associated with standard atmospherical processes is in the range of 5-10 days. This range, in turn, is associated with synoptic weather systems and teleconnection patterns. The research evaluated the trend by using a thousand realisations of AR(1) fitted processes and averaged it and applied the EMD decomposition. The explained
variance of the significant IMF’s lying within the confidence interval, 97.5 % and 2.5 %, on a log scale, hypothesised that to be outside of the confidence interval will mean the index is significantly different from climate noise.
Despite the success of EMD to actively retrieve non-linear trend in climate indices as
Franzke [2009] has shown, all previous methods are applied as a preprocessing step
c.q. the Wavelet decomposition. As Yuan et al. [2019] illustrate, the NAO index is
predicted with a high accuracy thanks to the of EMD decomposition. Or that Molla
et al.[2006] use Fourier Transformed surrogate data to evaluate the climate variability
by using the EMD-algorithm. Moreover, EMD, although proved successful in climate science, has drawbacks in its algorithm where for a decomposition of the signal it relies on extrema location, interpolation, and its recursive way of decomposing the signal. Additionally, the widely used Singular Spectrum Analysis, where the decomposition by the Singular Value Decomposition (SVD) still assumes linearity in which climate data
often does not posseses. Or rather exists in a short period. According to Xie et al.
[2014], SGSA, which borrows simplectic geometry strong mathematical foundation in
the study of Hamiltonian dynamic systems, fairs better in dealing with non-linearity. Nonetheless, this framework has not been widely used in literature.
This is not the case for Wavelet decomposition. As Paluˇs [2014] use Wavelet
decom-position as the principal analysis as it has similarity with EMD; the difference between these two is that EMD only requires the characteristics of the signal itself. A Wavelet decomposition, however, involves a choice of a Wavelet to match the features of the input
signal, yet a powerful method Anshuka et al.[2019]. InPaluˇs [2014] , the focus was on
enhancing surface air temperature from various European locations to understand the oscillatory and synchronisation phenomena of atmospheric dynamics on multiple tem-poral scales within a causal frame using Wavelet decomposition. From there on, they obtain the phase and amplitude. Their focus is on the possible influence of slow oscil-lation on the amplitude of higher-frequency variability in a multi-scale setting. These variables are necessary to establish a Conditional Mutual Information (CMI) measure. To be precise, conditioning of the phase by the forward lagged amplitude on the current and lagged backward amplitude.
Furthermore, they formed multiple hypotheses using Fourier Transformed (FT) surro-gate data. The simple premise represents a linear stochastic process in which no interac-tion between different temporal scale exists. This premise uses randomisainterac-tion of Fourier phases. The sophisticated hypothesis does expect an information transfer from larger to smaller timescales by the phase and amplitude locking. Due to the Wavelet Transform (WT), they have been able to show quantitative evidence for the causal cross-scale in-teractions. The conditional means of the amplitude on the present value of the phase in
Introduction 5
a binned fashion, accomplishes to quantify the effect size. This effect size is just the dif-ference between the maxima and minima of the bins. The decomposed binned signals, however, are only understandable relatively, due to the Wavelet decomposition prop-erty. The quantification of the phase, as an effect size effectively expresses the change of amplitude on the slow cycled represented by the phase of low frequency modes. This study showed an observation that probably a regional manifestation of a general weather phenomenon of cross-scale interaction in the atmospheric dynamics, local low-frequency modes influence.
This established connection byPaluˇs [2014] sheds light on how a multi-scale system
re-gional climate variability and its global counterpart are disentangled in a causal frame-work. This insight provides for a temporal interpretation. Interestingly, the causal framework used was Granger Causality (GC).
Yet, Granger Causality (GC) on itself is, for example, limited on the observable sampling interval of the time series. Additionally, GC is not able to detect almost instantaneous
signals, Runge et al. [2019a]. This limitation means that the causal direction is not
inferrable. Moreover, different causal network learning algorithms do identify these
contemporaneous signals. However, the cause is not always inferrable solely based on conditional independence based methods. To understand the multi-scale synergetic as-pect of climate data by just using a variant of GC is only permissible if more information is present or a completely different approach.
This is not to say that GC is not a powerful tool in climate science;Liu et al.[2010], show
that with an Hidden Markov Random Field (HMRF) and L1-regularized regression GC can successfully infer temporal relation of multiple multivariate time series. Monthly measures of climate and climate forcing were present in the analysis: temperature,
precipitation, vapor, cloud cover, wet days, frost days, CO2, H2, CO, solar radiation
and aerosols from CRU1, NOAA2 and NCDC3. All time series had a span of 13 years
and covered on 2.5 by 2.5 degree grid most of the USA. This approach however focusses on the statistically independence of each time series, and inferring the temporal aspect between these time series.
Additionally, in Lozano et al. [2009] we see a different use of GC, Graphical Granger
Causality (GGC), with respect to the spatial and temporal aspect of climate data. GGC is a collection of methods which combines graphical modelling with the notion of Granger Causality. Particularly a regression algorithm along GC is needed to establish causal links for each variable present in the graph. They use a slight different approach of
1
http://www.cru.uea.ac.uk/cru/data
2http://www.esrl.noaa.gov/gmd/dv/ftpdata.html 3
GGC, in which a Group Elastic Net (GEN) is introduced. GEN accounts for the spatio-temporal aspect of climate data: it groups time-lagged variables from different time steps of the same feature together, and applies the same grouping to grid-location data. This elastic property is necessary because they are interested in whether an entire time series provides additional information for a given point observation, rather than a specific lag. Besides the spatio-temporal modelling, they also investigate climate change attribution. This is established via extreme value theory, a natural family of probability distributions for modelling the magnitude of events, and of those events whose magnitude exceeds a threshold. With the combination of the Group Elastic Net they have been able to attribute the change in temperature to CO2 and other greenhouse gases, even in the presence of solar radiance; extreme value modelling has confirmed the intensity of extreme weather events; and the combination of GEN and extreme value modelling indicates the recurrence level of temperature as well as the influence of CO2 and other greenhouse gases.
But there exist different causal graph approaches beyond Granger causality. In Ebert-Uphoff and Deng [2012], we see the use of the PC algorithm as implemented in the TETRAD package. Their focus has been to derive hypotheses about the prominent atmospheric low-frequency modes, the Western Pacific Oscillation (WPO), Eastern Pacific Oscillation (EPO), Pacific North America (PNA) pattern and North Atlantic Oscillation (NAO), in a causal framework. Not only are they able to extract causal relationships between these tele-connections that conform with the physical processes, they state that this causal approach aids in formulating new hypotheses regarding the timescales between the tele-connections. With this causality framework, they essentially take a different route: where GGC requires independence between multiple variables, further separated with a regression, here a conditional independence test is required and the causal direction between nodes is explored. In this research they coin the directionality between nodes an information flow.
In Kretschmer et al. [2016] we see the use of this information flow more clearly. Here the focus was on causal discovery between Arctic climate actors and the winter circulation on weekly to monthly timescales, to uncover drivers of the circulation and give insight into the underlying mechanisms. The research coins this approach a Causal Effect Network (CEN). The preference for a CEN is its ability to overcome spurious correlations due to autocorrelation, indirect effects or common drivers, which is a property of the PC-MCI algorithm proposed by Runge et al. [2012a], Runge et al. [2012b] and introduced in Runge et al. [2014]. The main contribution of the algorithm is to account for autocorrelation in time series. This property greatly influences the analysis of climate data: autocorrelation in time series introduces ambiguity in the statistical and physical
interpretation of climate analysis. Moreover, the improvement of this algorithm compared to Ebert-Uphoff and Deng [2014] is that the PC step assumes directionality between nodes. Additionally, the MCI step uses the Momentary Information Transfer (MIT) approach instead of Information Transfer (IT), together with partial correlation as an independence test. This approach, where only a subset of the time-lagged variables obtained from the PC step is explored for further significance, answers the computational challenge of data complexity. The only two parameters needed for this graphical algorithm are the alpha value and the maximum lag τ_max. The alpha hyper-parameter serves to let the algorithm converge given a certain threshold.
The accessibility of PC-MCI might thus make it a favorable tool with respect to climate variability and its low-frequency data components. However, understanding extreme weather events and isolating signals for causal investigation is a challenging task, Kretschmer et al. [2017]. To overcome this challenge, a data-driven Response-Guided Causal Precursor Detection (RG-CPD) method with a CEN is introduced. The advantage of the method is the incorporation of tele-connections and the accounting for the different time representations of the tele-connections detected in the CEN. With such a framework they are able to recover the most relevant physical processes that are causal in a statistical sense. Moreover, modifying the time lags in the framework introduced flexibility in the prediction skill at sub-seasonal timescales for the Stratospheric Polar Vortex (SPV). The SPV has an influence on the winter weather, and RG-CPD has been able to uncover the physical process through the relevant causal predictors.
A next step is to make predictions using these causal precursors. Maximizing the signal-to-noise ratio is crucial to obtain the best forecast skill. We often apply binned temporal aggregation to both the target of interest and the precursor, with a window-size corresponding to the timescale of the precursor. Binned temporal aggregation refers to, for example, taking a 7-day period and down-scaling it to a single 7-day mean value. After down-sampling to a certain window-size, the timeseries naturally contains most variability on that timescale. Hence, aggregating to a window-size means that we attempt to focus on dynamics occurring on a certain timescale. This way, we try to maximize the signal of the precursor and eliminate as much noise as possible from the target. However, choosing a window-size to aggregate over may not be trivial. For example, precursors from the Pacific Ocean may contain signals on decadal and multi-annual, Newman et al. [2003], and arguably even on sub-seasonal, McKinnon et al. [2016], timescales.
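As a concrete illustration of binned temporal aggregation, the sketch below down-samples a synthetic daily series to 7-day means with numpy (the series and window are hypothetical, not the thesis data):

```python
import numpy as np

# Hypothetical daily series: 364 days, i.e. 52 complete 7-day bins
rng = np.random.default_rng(42)
daily = rng.standard_normal(364)

window = 7                      # window-size in days
n_bins = daily.size // window   # number of complete bins

# Binned temporal aggregation: one mean value per non-overlapping 7-day bin
weekly = daily[:n_bins * window].reshape(n_bins, window).mean(axis=1)

print(weekly.shape)  # (52,): the series now varies on the 7-day timescale
```

Changing `window` to, say, 14 aggregates the same series to bi-weekly means; this parameter is exactly the window-size whose choice is discussed above.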
Actually capturing the signal attributed to a low-frequency precursor is not trivial. When constructing a conventional statistical model for sub-seasonal forecasting, one has multiple data points per year, defined at a certain lag with respect to the target period, which has an equal number of data points. It is therefore not possible to aggregate the precursors with different window-sizes. Another potential solution is to keep a shared time-axis and to maximize the signal-to-noise ratio of the precursors by applying filtering techniques.
Here, we would like to develop an empirical approach to extract the optimal window-size, i.e. the one which maximizes the signal-to-noise ratio between precursor and target. To this end, we intend to combine a denoising method with the RGCPD framework. The chosen denoising method will serve to investigate different timescales between target and precursor, while PCMCI is used to account for the autocorrelation in these timescales. The goal is to assess the timescale at which the potential precursor has the strongest link with a given response variable, quantified by the MCI values.
This means we will focus on understanding this information transfer in a case study of the Eastern U.S., which is situated between the Atlantic and Pacific Ocean sea surface temperature (SST) regions. Here, we solely focus on the temporal aspect of our SST precursor regions and our target, the Eastern U.S. heatwave. This connection has been established by Kraaijeveld and Vijverberg [2019] using a clustering algorithm.
To understand this process, we need to address two challenges: the scarcity of data due to the different time aggregations in RG-CPD, and a method to establish the timescale at which precursor and target share the most information. The first challenge will be addressed by using an AR process to overcome the scarcity. Along with the AR process, we intend to methodically add more complexity to it, in order to establish a framework of interaction between response variable and potential precursor. To arrive at this goal we will investigate the following question:
Can we algorithmically detect the timescale at which the signal-to-noise ratio between precursor and target is maximized?
This research is further composed of a background section, where we elaborate on the theoretical concepts needed to understand the method and result sections. The method section goes into detail on the chosen settings, the challenges for our purpose and the designed algorithm. After the result section, we first discuss our results in depth and evaluate the hypothesis developed in the method section. Finally, we recapitulate the conclusions and finish with a look forward to future work.
Chapter 2
Background
In this chapter we explore how filtering algorithms work in order to overcome the time-resolution limitation. Next, we explore how to overcome data scarcity by presenting a widely known method in climate science, (V)AR data simulation. With these two methods in hand, we need a framework to investigate our main question; therefore, we properly introduce the RG-CPD framework and finish with the graphical algorithm used, PCMCI.
2.1
Mode decomposition
As we have seen in chapter 1, there exist different mode decomposition methods. However, choosing one comes with certain theoretical assumptions. Here, we elaborate on wavelet analysis as the de facto method for mode decomposition.
2.1.1 General Notion
To understand the discrete wavelet transform, a proper introduction to the wavelet transform is required, particularly how the discrete relates to the continuous wavelet transform. A wavelet is a function that converts a function or signal into a more succinct representation. With a wavelet function, two operations can be performed: a translation through the input, and a dilation of the wavelet, its scale. Applying these operations to a wavelet unfolds a new signal representation from the original signal. However, the choice of the original non-translated, non-dilated wavelet, the mother wavelet, is essential, as the shape of the wavelet influences the signal representation. This feature can also present an advantage, as different wavelets are capable of decomposing the same signal.
A wavelet function itself needs to obey certain criteria in order to be useful for signal decomposition. A wavelet should have the following property:
$$E = \int_{-\infty}^{+\infty} |\Psi(t)|^2 \, dt < +\infty, \qquad (2.1)$$
which means it should have finite energy, like the original signal f(x), so that the inner product is defined in the space where f(x) ∈ L²(ℝ), Mallat [1989]. Equation 2.1 can also be interpreted through two conditions: the integral of Ψ equals zero and the integral of its square equals 1, Percival and Walden [2000]. If both conditions hold, then for any ε with 0 < ε < 1 there must be an interval [−T, T] of finite length such that:

$$\int_{-T}^{T} \Psi^2(t)\, dt > 1 - \epsilon. \qquad (2.2)$$

This means that for values of ε close to zero, essentially all of the unit energy lies within a finite interval: a small wave, a wavelet. This is confirmed by the condition that the integral of the square of Ψ is one, which requires some non-zero values, while the integral of Ψ being zero implies that any positive values must be cancelled out; hence, a small wave. How we sample the wavelet within this finite interval is constrained by the admissibility argument. A wavelet is said to be Fourier admissible if:
$$\hat{\Psi}(f) \equiv \int_{-\infty}^{+\infty} \Psi(t)\, e^{-i 2\pi f t}\, dt, \qquad (2.3)$$

such that

$$C_{\Psi} = \int_{0}^{\infty} \frac{|\hat{\Psi}(f)|^2}{f}\, df < \infty \qquad (2.4)$$
hold. This implies that the wavelet has no zero-frequency component, or put differently, a zero mean. Each wavelet has a different admissibility constant Cψ, and complex wavelets must be real-valued in frequency and vanish for negative frequencies. This research will only focus on real wavelets. The advantage of the admissibility criterion, together with the previous criteria, is that a wavelet behaves as a band-pass filter: only those signal components within a finite frequency range (filter band), and in proportion characterized by the energy of the wavelet, are passed, Addison [2017].
The advantage of a wavelet filter for a signal f is that the wavelet function Ψ(t) can be manipulated in its dilation or contraction by a while translating by b-steps through the signal. This flexibility is a key difference between the wavelet filter and the Fourier transform of f, where the latter has a fixed window passing through the signal while the wavelet is easily manipulated. This advantage is what allows the wavelet to capture different aspects of the signal, or to zoom in on a specific area of the signal, Daubechies and Bates.
The discretized windowed Fourier transform,

$$T^{\mathrm{win}}_{m,n}(f) = \int f(s)\, g(s - n t_0)\, e^{-i m \omega_0 s}\, ds, \qquad (2.5)$$

where m, n ∈ ℤ and ω₀, t₀ > 0, has a similar formulation in the discretized wavelet transform.
Namely, the wavelet transform for a finite interval a, b can be formulated as
(Twavf )(a, b) = |a|12
Z
f (s)Ψ s − b
a
. (2.6)
A natural way to discretize equation 2.6 is to sample a on a logarithmic grid and to link the translation b to step locations, so that moving in the b direction is proportional to the scale a. We thus obtain the following discretization:
$$T^{\mathrm{wav}}_{m,n}(f) = |a_0|^{-m/2} \int f(s)\, \Psi\!\left(\frac{s - n b_0 a_0^m}{a_0^m}\right) ds = a_0^{-m/2} \int f(s)\, \Psi\!\left(a_0^{-m} s - n b_0\right) ds, \qquad (2.7)$$

where m ∈ ℕ controls the wavelet dilation, n the translation, a₀ is a fixed dilation step parameter set at a value greater than one, and b₀ is a location parameter that must be greater than zero. In both equations 2.5 and 2.7 we again see that every g_{ω,t} covers the same window width regardless of ω, while for the wavelet Ψ_{a,b} the width is adapted to the scale a: a flexible representation.
2.1.2 Discrete Wavelet Transform
Equation 2.7 provides the expression of the discrete wavelet transform that we will develop. Besides the continuous representation of the wavelet transform in equation 2.6, the discrete wavelet transform has merits of its own, Percival and Walden [2000]. As we know from Fourier analysis, a function f can be decomposed numerically by taking the dot product with its windowed Fourier atoms g_{m,n}(s) = g(s − n t₀) e^{−i m ω₀ s}. Shifting n by t₀-steps and their multiples allows the reconstruction of f from the dot products T^{win}_{m,n}(f). The analogy also holds for the wavelet transform: the T^{wav}_{m,n}(f) are known as wavelet coefficients, or detail coefficients, Daubechies [1988]. In the Fourier-analysis literature a compactly supported window g is preferred, and a very popular choice is a Gaussian g. The window g should be well concentrated in both time and frequency; if it is, then the content of the function f near time t and near frequency ω can loosely be interpreted.
Analogously, the wavelet transform requires these criteria in order to give a good representation of the function f while keeping its dynamics. That means a correct representation of the signal requires the energy of the resulting wavelet coefficients to lie within a range [AE, BE] of the original signal, i.e.

$$A E \le \sum_{m,n} \left|\langle f, \Psi_{m,n} \rangle\right|^2 \le B E.$$

Here A, B denote the frame bounds within which the dot products between wavelet and signal must lie, and E the energy of the original signal, Addison [2017]. The values of the frame bounds A, B depend on the choice of both parameters a₀, b₀. If A = B = 1, the wavelet family forms an orthonormal basis, such that the reconstruction of the original function f is possible.
A common choice of a₀, b₀ in 2.7 is 2, 1 respectively. Different choices are possible but are beyond the scope of our research, Daubechies [1988]. The aforementioned choice of 2, 1 is known in the literature as the dyadic grid arrangement. This is the simplest and most efficient discretization for practical purposes, creating an orthonormal basis with finite energy matching the signal f, Addison [2017]. The wavelet function then becomes

$$\Psi_{m,n}(t) = \frac{1}{\sqrt{2^m}}\, \Psi\!\left(\frac{t - n 2^m}{2^m}\right) \equiv 2^{-m/2}\, \Psi\!\left(2^{-m} t - n\right). \qquad (2.8)$$
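As a numerical check of equation 2.8, the sketch below evaluates the dyadic family for the Haar mother wavelet (a hypothetical choice for illustration; the thesis itself uses Daubechies wavelets) and verifies unit energy and orthogonality across scales:

```python
import numpy as np

def haar(t):
    # Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere
    return np.where((t >= 0) & (t < 0.5), 1.0,
                    np.where((t >= 0.5) & (t < 1.0), -1.0, 0.0))

def psi(m, n, t):
    # Dyadic-grid wavelet of equation 2.8: 2^(-m/2) * Psi(2^(-m) t - n)
    return 2.0 ** (-m / 2) * haar(2.0 ** (-m) * t - n)

dt = 1.0 / 1024
t = np.arange(0.0, 4.0, dt)

energy = np.sum(psi(1, 0, t) ** 2) * dt             # ~1: unit energy
overlap = np.sum(psi(0, 0, t) * psi(1, 0, t)) * dt  # ~0: orthogonal scales
print(energy, overlap)
```

The same two sums can be repeated for any pair (m, n) to confirm numerically that the dyadic family is orthonormal.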
Equation 2.8 is best explained through its role in a multi-resolution representation. This is a representation in which a signal f undergoes an incremental inspection from a coarse to a higher-resolution approximation through a scaling function φ, Mallat [1989], namely

$$\phi_{m,n}(t) = 2^{-m/2}\, \phi\!\left(2^{-m} t - n\right). \qquad (2.9)$$
Note the similarity with equation 2.8. The benefit of equation 2.9 is that under certain restrictions it has the same properties as formula 2.8. Practically, this means using the wavelet at different dilation levels m while traversing the function f. Put differently, the integral of f, invariant under shifts by n b₀, should equal a value α_{m,n}; for every dilation m only those α_{m,n}(f) that are local extrema are retained. This is in fact a redundant sub-sampling, proportional to 2^{−m} times the original f. The factor 2^m is simply the requirement that every decomposition be a scaled version of the previous level's decomposition. The decomposition in terms of its previous scale is a recursive relation, called the scaling equation, that describes the scaling function φ(t), namely:

$$\phi(t) = \sum_k c_k\, \phi(2t - k). \qquad (2.10)$$
Integrating both sides yields the condition that the scaling coefficients sum to 2, Σ_k c_k = 2; additionally, to preserve orthogonality, the sum of squares of these scaling coefficients equals 2. In practice, reversing and alternating the scaling coefficients defines the differencing filter of the associated wavelet equation:

$$\Psi(t) = \sum_k (-1)^k\, c_{N_k - 1 - k}\, \phi(2t - k), \qquad (2.11)$$

with N_k denoting the finite number of scaling coefficients. Where the wavelet transform captures the detail of a function f, the scaling function represents an approximation of f. Consequently, f can be written as the sum of the approximation up to a chosen scale m₀ and its detail coefficients up to that level m₀. This means that at a smaller scale m−1 we obtain an increased resolution of the signal as the sum of the approximation at scale m and the details at that same scale. Effectively, the wavelet equation 2.11 and scaling equation 2.10 together form a multi-resolution representation of a function f: the function passes through a highpass filter (the wavelet) and a lowpass filter (the scaling function). Iteratively, coarser approximations of f are computed recursively, and differencing consecutive approximations yields the detail components.
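One step of this highpass/lowpass multi-resolution scheme can be sketched with the Haar filter pair (chosen here for brevity; the thesis uses db4): the approximation is a pairwise average, the detail a pairwise difference, and the synthesis step reconstructs the signal exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(16)  # length divisible by 2 (dyadic requirement)

# One level of a Haar multi-resolution: lowpass -> approximation,
# highpass -> detail, each down-sampled by a factor two
approx = (x[0::2] + x[1::2]) / np.sqrt(2)
detail = (x[0::2] - x[1::2]) / np.sqrt(2)

# Synthesis: up-sample and invert the two filters
recon = np.empty_like(x)
recon[0::2] = (approx + detail) / np.sqrt(2)
recon[1::2] = (approx - detail) / np.sqrt(2)

print(np.allclose(recon, x))  # True: perfect reconstruction
```

Repeating the analysis step on `approx` gives the next, coarser level of the pyramid, which is exactly the recursive structure described above.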
2.1.3 Maximum Overlap Discrete Wavelet Transform
The MODWT distinguishes itself from the DWT by not requiring the data length to be divisible by two (the dyadic grid); it works on data of any length, Percival and Mofjeld [1997]. This means that the MODWT approximations and details are highly redundant and non-orthogonal. The MODWT is known in the literature under different names: translation/time-invariant DWT, undecimated/stationary DWT, Percival and Walden [2000]. The key difference is that the whole timeseries is circularly filtered by the normalized scaling function and wavelet, rescaled by a factor 2^{m₀}, with m₀ the final decomposition depth. Despite this redundancy, a MODWT multi-resolution analysis is possible: the wavelet and approximation coefficients are shift invariant, such that at each m-th decomposition level a shift of the timeseries X yields correspondingly shifted approximations and details. Moreover, the normalized filters are zero-phase filters, which benefits the time alignment of components of a timeseries with its decomposition. The energy argument with which we previously defined the DWT also allows the DWT to estimate the variance of a timeseries at its m₀ levels of approximation and detail decomposition. The same holds for the MODWT. The latter is more efficient, as it circularly shifts the timeseries and so avoids the boundary-condition issues present in the DWT, Percival [2008].
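A level-1 MODWT step with the rescaled Haar filters can be sketched directly in numpy (a toy illustration of the circular filtering, not the full pyramid algorithm): no sub-sampling occurs, any series length works, and the energy of the coefficients matches the input exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(101)  # any length works: no dyadic restriction

# Level-1 Haar MODWT: circular (periodic) filtering with rescaled filters
# {1/2, 1/2} (scaling) and {1/2, -1/2} (wavelet); output keeps length N
x_prev = np.roll(x, 1)   # circular shift handles the boundary
v1 = (x + x_prev) / 2.0  # approximation (lowpass)
w1 = (x - x_prev) / 2.0  # detail (highpass)

# MODWT preserves energy: ||w1||^2 + ||v1||^2 == ||x||^2
print(np.allclose(np.sum(w1**2) + np.sum(v1**2), np.sum(x**2)))
```

Because the filtering is circular, applying the same two lines to `np.roll(x, k)` simply rolls `v1` and `w1` by `k`, which is the shift-invariance property exploited above.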
2.2
Autoregressive Models
Among the statistical tools used in climate science, AR(p) and ARMA(p, q) processes are the most common, El-Askary et al. [2012]. Mostly these tools are used to fit existing data in order to conceptualize or extract the p, q coefficients underlying the data, Katz and Skaggs [1981].
In this research, we instead use AR(1) processes to generate data that mimic our data but are not limited by the amount of data. Therefore, we need to enforce some properties on those AR(1) processes.
When creating an AR(1) process, the polynomial describing the process must have its root outside the unit circle, which means we have a stationary process. To ensure this, the first coefficient should be strictly less than 1. An Augmented Dickey-Fuller (ADF) test is therefore required to check this stationarity restriction. The ADF null hypothesis is that the timeseries is non-stationary; in our use case the null hypothesis must be rejected at a given confidence level, Dickey and Fuller [1979]. Nonetheless, the generated process might be stationary but still show a trend. To address this, the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test, Hobijn et al. [2004], is used. The KPSS null hypothesis is that the timeseries is stationary; our timeseries should thus not reject this null hypothesis under the enforced restriction of trend stationarity. Finally, the timeseries should be standardized, as we aim to decompose these timeseries and each level of decomposition should have meaningful descriptive statistics. To this end the Quantile and Power transforms are applied, fitting the AR(1) timeseries to a normal distribution. The fitted AR(1) residuals should additionally contain no remaining autocorrelation. The Ljung-Box Pierce test evaluates this by checking that the residuals of the regressed AR(1) stay below a critical threshold.
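A stationary AR(1) realisation with the coefficient restriction above can be generated in plain numpy (a sketch; in the thesis the ADF, KPSS and Ljung-Box tests from statsmodels are then applied to such series):

```python
import numpy as np

def ar1(phi, n, burn=500, seed=0):
    """Generate a stationary AR(1): x_t = phi * x_{t-1} + eps_t, with |phi| < 1."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    for t in range(1, n + burn):
        x[t] = phi * x[t - 1] + eps[t]
    return x[burn:]  # drop the burn-in so the series starts in equilibrium

x = ar1(phi=0.7, n=10000)

# The sample lag-1 autocorrelation should recover phi for a stationary series
r1 = np.corrcoef(x[:-1], x[1:])[0, 1]
print(round(r1, 2))
```

With `phi` close to (or beyond) 1 the root approaches the unit circle, the series becomes (near) non-stationary, and the ADF test described above would fail to reject its null hypothesis.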
2.3
Community Guided search
2.3.1 Response Guided
The objective of the Response-Guided Causal Precursor Detection (RGCPD) framework is to extract statistically significant precursor regions from gridded data and combine them with a causal discovery algorithm, Kretschmer et al. [2017]. Gridded regions are selected with reference to a response variable: meteorological variables that in fact precede the response variable. Such a region is called a community, for example in sea surface temperature (SST) or soil moisture (SM). Each community variable is calculated for different lags using correlation maps. Grid cells of the same sign, positive or negative, are grouped if they pass a significance threshold, p < 0.005. A weighted area-average of each region is then calculated. This approach reduces dimensionality but preserves the spatio-temporal pattern of the community. To assess the quality of these correlation maps, k-fold cross-validation is employed for robustness. The extracted timeseries are then evaluated by the causal precursor detection step, which quantifies their links and handles the autocorrelation of each timeseries.
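The correlation-map step can be mimicked on toy data: correlate each grid cell with the response variable and keep the cells passing the significance threshold p < 0.005 (here via the Fisher z approximation; the planted precursor in cell 0 and all names are illustrative, not the RGCPD implementation):

```python
import math
import numpy as np

rng = np.random.default_rng(3)
n_time, n_cells = 200, 50
target = rng.standard_normal(n_time)

grid = rng.standard_normal((n_time, n_cells))
grid[:, 0] = target + 0.5 * rng.standard_normal(n_time)  # planted precursor

def corr_pvalue(a, b):
    """Pearson r and its two-sided p-value via the Fisher z approximation."""
    r = np.corrcoef(a, b)[0, 1]
    z = math.atanh(r) * math.sqrt(len(a) - 3)
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return r, p

# Mask of grid cells significantly correlated with the response variable
mask = np.array([corr_pvalue(grid[:, i], target)[1] < 0.005
                 for i in range(n_cells)])
print(mask[0], int(mask.sum()))
```

Cell 0 passes the threshold while pure-noise cells almost never do; grouping neighbouring significant cells of the same sign and averaging them would then give the community timeseries described above.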
2.3.2 Causal Precursor Detection
In the causal graphical model, the nodes are the timeseries and the edges are lag-specific directed links between timeseries. The direction of the links is established by a variant of the PC algorithm. The framework focuses on time-lagged causal discovery assessed with conditional independence testing, assuming time order, Runge [2018]. The main assumptions used in the PCMCI framework are causal sufficiency, the causal Markov condition and faithfulness. For observational data, the additional assumptions of no instantaneous effects and stationarity are required. Causal sufficiency means that no latent variables directly influence the timeseries; this assumption is the most important one in the PCMCI algorithm, Runge [2015]. The causal Markov condition concerns the independence of any timeseries Y_t ∈ X_t from the timeseries process X up to a lag τ, given its parents. By contraposition, dependence also holds, and thus connectedness of nodes follows. Additionally, the limitation up to lag τ reinforces the sufficiency argument. The faithfulness assumption guarantees that the graph entails all conditional independence relations implied by the Markov condition. With these assumptions the PCMCI algorithm forms a two-step algorithm: PC and MCI.
2.3.2.1 PC
This step starts from a fully connected, unconditioned graphical representation of the timeseries. The condition selection between nodes X^i_{t−τ}, X^j_t aims to find a causal link if X^i_{t−τ} ∈ P(X^j_t), the parents of timeseries X^j at lag τ. The parents of this timeseries are collected iteratively and ordered in ascending order of magnitude. The Markov assumption ensures that the parents form a sufficient conditioning set to establish conditional independence, using partial correlation as the default test. A hyper-parameter α is used to let the conditional independence test converge.
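The default partial-correlation independence test can be sketched in numpy: regress both variables on the conditioning set and correlate the residuals (toy common-driver data; the actual PCMCI implementation, e.g. in tigramite, is more involved):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000

# Common driver Z causes both X and Y: X and Y are correlated but
# conditionally independent given Z
z = rng.standard_normal(n)
x = z + 0.3 * rng.standard_normal(n)
y = z + 0.3 * rng.standard_normal(n)

def partial_corr(a, b, cond):
    """Correlation of a and b after regressing out the conditioning set."""
    C = np.column_stack([np.ones_like(a), cond])
    res_a = a - C @ np.linalg.lstsq(C, a, rcond=None)[0]
    res_b = b - C @ np.linalg.lstsq(C, b, rcond=None)[0]
    return np.corrcoef(res_a, res_b)[0, 1]

raw = np.corrcoef(x, y)[0, 1]   # large: spurious link via Z
cond = partial_corr(x, y, z)    # near zero: the link vanishes given Z
print(round(raw, 2), round(cond, 2))
```

This is exactly the mechanism by which conditioning on the parents removes spurious links due to common drivers.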
2.3.2.2 MCI
The MCI step uses the parents of X^j_t and X^i_{t−τ} estimated in the PC step to further filter out false positives. The inclusion of the estimated parents of X^i_{t−τ} accounts for autocorrelation in the source. Finally, the significance of the links can be assessed by the p-values or by the false discovery rate. The false discovery rate yields q-values, q = min(pm/r, 1), with p the original p-value, r the rank of the p-value from the PC step and m the total number of computed p-values, m = N²τ_max. Furthermore, the MCI step also uses partial correlation as its conditional independence test.
Chapter 3

Methods
The approach set out for this research is thus to algorithmically find the optimal timescale that maximizes the signal-to-noise ratio between target and precursor. To do so, we propose in a first stage to establish an algorithm for observational data as a baseline. The data used in this research originate from the ERA-Interim and ERA-5 reanalysis datasets, containing global gridded data extracted at 2-degree resolution; the lower the resolution in degrees, the finer the grid representation of the model. The data range used is 1979-2018. A reanalysis dataset is not a model itself but observations assimilated into the mentioned model. Three datasets have been collected: sea surface temperature (SST), soil moisture (SM) and geopotential height at 500 hPa (zH). For demonstration we focus on the SST dataset, as it has a known timescale. Our period of interest is the three months June-July-August (JJA), in which the heatwaves are recorded. The second part of the method is to test the robustness of the baseline using AR(1)-generated samples with the same time resolution as the observational data. The time resolution we employ is the daily mean of the observational data, as this generates a bigger sample size. We follow this recipe:
1. Multiresolution analysis of data;
2. Modification of multiresolution to be put through PCMCI;
3. Comparison of the decomposed levels with the original signal, in a qualitative manner, to assess the usefulness of PCMCI.
3.1
Observational data
From the RGCPD framework we obtain data that have significance with regard to the target variable of interest, temperature. Below we represent the sampling of the three observational datasets under the graphical model of PCMCI. The graph portrays significant relations between the geolocation data extracted from the RGCPD framework at the present sampling time, τ = 0, and in the past, τ = 1, 2.
Figure 3.1: Soil moisture data sampled on a daily mean.
Figure 3.2: SST data sampled on a daily mean.
Figure 3.3: Geopotential height sampled on a daily mean.
Figure 3.4: The edge colors represent the sign of the significant link, positive or negative, at the present time τ = 0; the different edges depict the link significance at different lags. The nodes' color represents the autocorrelation.
From the three datasets we narrow our investigation down to the SST data, as in the literature its timescale is known to be ≈ 10 years. A multiresolution analysis of our SST data is illustrated here for the Mid-Pacific precursor. We use the Daubechies 4 wavelet, as it is commonly used on hydrological data, Percival and Mofjeld [1997]. The 4 in the Daubechies wavelet name means the filter has a length of 8; Daubechies wavelets are asymmetric. This length is not to be confused with the width of the wavelet: the width changes during the multiresolution algorithm.
Applying the multi-resolution algorithm to the SST Mid-Pacific timeseries for inspection, we see in Figure 3.5 the high- and low-frequency components of the signal. The high-frequency components seem to persist for the first three levels, from a 1- to a 4-day mean, and are then separated out. Note that the low-frequency components live on a dyadic grid 2^m, with m the decomposition level. The detail components are essentially a differencing of subsequent levels, thus resulting in a 2^{m−1} scale. The high-frequency components thus help relate the sampling frequency ∆t to the decomposition level. However, by design the discrete wavelet transform down-samples the data by a factor of two at each step of the multiresolution algorithm. This is a limitation for continuing with this approach, as the following table suggests:
Table 3.1 thus provides a motivation to use a different method of decomposition. Moreover, we require timeseries larger than 14606 data points, which is only possible with the daily time aggregation in RGCPD. Generating AR(1) data provides a
Figure 3.5: Mid-Pacific multiresolution analysis done with the Daubechies 4 wavelet. At the first decomposition level the frame traversing the signal contains 8 wavelets; going further down, the wavelet widens and the frame contains four wavelets, and so on. Left are the scale approximations, showing the long-term trend of the data: as we can see, the data smoothens. Right are the short variations captured by the wavelets, the details; these are the high-frequency content of the data. Going further than level 7 for this data is not interesting, as in both approximations and details the signal is elongated. Moreover, for this daily data level 7 represents 2^7 = 128 days in the approximation but 2^{7−1} = 64 days in the details, which is within our JJA period. Approximations are redundant by design, and the details are numerically calculated as the difference between approximation scales.
Level   Coefficients   Reconstruction
  1         7303           14600
  2         3655           14602
  3         1831           14606
  4          919           14614
  5          463           14630
  6          235           14662
  7          121           14726
Table 3.1: The number of coefficients per decomposition level; there is no difference between the approximation and detail decompositions. One decomposition level, composed of an approximation and a detail, contains the content of the data at that level. However, due to the sub-sampling imposed by the dyadic grid of two, up-sampling is required to reconstruct the data at its original size of 14606 data points. The first-level reconstruction misses 6 data points due to boundary conditions, but as we see, more data are added by up-sampling with values divisible by two.
solution, and instead of the DWT we opt for the MODWT approach. This choice allows us to inspect a broader depth of decomposition levels. Additionally, any data length can be padded to a power of 2 and examined. This restriction adds redundancy, which is essentially the core aspect of the MODWT; however, every level is uniquely decomposed through the translation- and shift-invariance property of the MODWT. After the MODWT multi-resolution we use the unpadded version of every level. To effectively compare with the observational data, we standardize all timeseries as a preprocessing step before applying PCMCI.
3.2
Artificial data
For artificial data with the same characteristics as climate data, we follow Franzke [2009] in using AR(1) realisations, for which data limitation is not an issue. We opt to mimic the original data and its time aggregation, and therefore sample 5000 daily data points.
Figure 3.6 shows a first inspection of our generated AR(1) timeseries. As we can see, the generated values form a bell curve by default, which is beneficial as it suggests stationarity. This is a property we require, as PCMCI expects stationary data with no seasonality or trend. With the Augmented Dickey-Fuller test, Dickey and Fuller [1979], we test the stationarity of the data, and its mean stationarity via the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test, Hobijn et al. [2004]. To ensure normally distributed data we apply a normality test, d'Agostino [1971]. A value above the 5% threshold after the normality test leads to a PowerTransform or QuantileTransform of the data, Yeo and Johnson [2000]; effectively, Figure 3.6 shows the transformed series. Both methods preprocess the timeseries to have a normal distribution without affecting stationarity. Before and after the stationarity tests and the normality restriction, the Ljung-Box test is applied to the generated AR(1), Box and Pierce [1970]. This tests whether there is any autocorrelation left in the residuals of the fitted AR(1); put differently, if its residuals are purely random, then so are the original data. All data generation is done using the Python package statsmodels, Seabold and Perktold [2010].
3.3
Wavelet
The generated AR(1) timeseries can now be decomposed via a multi-resolution algorithm using wavelets. We apply the decomposition to investigate how to interpret the results and which settings are appropriate. The package PyWavelets1 is the only Python package available for working with the discrete wavelet transform, Lee et al. [2019].
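As a sketch of such a decomposition: PyWavelets does not ship a MODWT function, but its stationary wavelet transform (`pywt.swt`) is the closest shift-invariant, undecimated analogue. In PyWavelets' naming, `sym4` has filter length 8 and thus corresponds to the LA8 (least asymmetric, width 8) wavelet of Percival and Walden.

```python
import numpy as np
import pywt

# Shift-invariant (undecimated) decomposition with pywt.swt, the
# PyWavelets transform closest to the MODWT used in this thesis.
# 'sym4' (filter length 8) stands in for the LA8 wavelet.
np.random.seed(1)
signal = np.random.randn(1024)  # length must be divisible by 2**level
coeffs = pywt.swt(signal, wavelet="sym4", level=5)
# coeffs is a list of (approximation, detail) pairs, one per level,
# each array the same length as the input (no decimation).
```

Because nothing is decimated, every level keeps the original time axis, which is what makes the levels directly comparable as inputs to PCMCI.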
Figure 3.6: The uppermost plot presents the generated AR(1) data. Subsequent plots show the autocorrelation, partial autocorrelation, periodogram, distribution, QQ-plot and Ljung-Box-Pierce test. Especially the partial autocorrelation plot confirms that we sample an AR(1), as only the first lag is significant above the default 95% confidence interval. The quantile-to-quantile plot together with the distribution confirms that the AR(1) has been standardized. Lastly, the Ljung-Box-Pierce test confirms that the residuals of the AR(1) up to lag 20 have no autocorrelation, meaning we cannot reject the null hypothesis that the data are independently distributed.
3.3.1 Choice of Wavelet
To understand the decomposition, we need to understand the properties of the wavelet itself. A characteristic of the signal is better captured by a well-fitting wavelet. For the discrete wavelet transform, the Daubechies(N) and least asymmetric(N) wavelets, the latter also known as symlets(N), are suitable for climate data; here N represents the length of the chosen wavelet, Percival and Mofjeld [1997]. A wavelet also has a number of vanishing moments depending on its coefficient N, which determines how well the wavelet can capture polynomial behaviour of a signal. Moreover, wavelets and their scaling equations are constructed from their coefficients and
numerical computation, Wavelet database python2 and Wavelet database R3. Although the named wavelets are orthogonal, a slight difference in shape, denoted by N, has a significant impact on the result. Therefore Sang et al. [2016] proposed four criteria:
1. Progressive and linear phase;
2. Good localization both in time and frequency;
3. Should be adapted to trade-off between time and scale resolution;
4. Wavelet should meet the orthogonality property.
The slight difference in shape is further crystallized in requirement 3: different timeseries of the same length might not benefit from the same wavelet, as the underlying properties depend on the type of timeseries. The other criteria ensure that the decomposition of the signal into noise and its true components is approximately well separated. We investigate the choice of our wavelet by evaluating how well each level captures the energy of the original signal. For this we use the Normalized Partial Energy Sequence Signal (NPESS) introduced by Percival and Walden [2000]. A good wavelet choice should drive the NPESS close to unity with as few data points n as possible.
an ECG signal4 to illustrate the motivation for a certain choice of wavelet. From the choice of wavelet onward through our experiments we use the LA8 wavelet, as it has proven a good candidate that meets the above criteria. Moreover, Percival [2008] applies this wavelet successfully to analyse hydrological timeseries.
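Under our reading of Percival and Walden's definition, the NPESS can be sketched as follows; the function name and example values are ours.

```python
import numpy as np

def npess(x):
    """Normalized partial energy sequence: squared magnitudes sorted in
    descending order, cumulatively summed and normalized to end at 1."""
    energy = np.sort(np.asarray(x, dtype=float) ** 2)[::-1]
    return np.cumsum(energy) / energy.sum()

# A sequence dominated by one large coefficient reaches unity almost
# immediately; white noise instead climbs roughly linearly.
concentrated = npess([10.0, 0.1, 0.1, 0.1])
```

The faster the curve approaches unity, the more of the signal's energy is concentrated in a few coefficients at that level, which is exactly the criterion used to compare decomposition levels in Figure 3.10.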
From Figure 3.9 we can see a slight translation in both the scaling function and the wavelet, which translates to a different way of capturing the signal. In Figure 3.10 we see that the level 5 decomposition, compared to lower-level decompositions, shows a distinct but low NPESS signal. Decomposing beyond level 5, as we see in Figure 3.11, does not yield more information. Interestingly, the overall NPESS of the ECG signal with the LA8 wavelet gives the impression that the wavelet choice works almost too well: every level except level 0 approaches unity quite fast.
2 http://wavelets.pybytes.com/
3 http://staff.washington.edu/dbp/wmtsa.html
4 https://tinyurl.com/yd78uxn7
Figure 3.7: Default shapes of the wavelet and scaling coefficients.
Figure 3.8: Different shapes at level 11 decomposition.
Figure 3.9: A representation of the wavelet used to decompose an ECG signal; LA8 stands for least asymmetric wavelet with length 8. It meets the aforementioned properties required to denoise a signal. The scaling function captures the long trend of the signal, also known as the approximation; the wavelet function captures the subtle changes in the signal, also known as the details. Approximations are long evolving trends (low frequency) and details are high-frequency components that can directly be translated to the sampling of a timeseries. Approximations are redundant due to the requirement of the dyadic grid, and details are differences of consecutive approximations; they have unit scale. Therefore the physical interpretation uses 2^(m-1)∆t instead of 2^m for the approximation, m ∈ N.
[Figure 3.10 plot: "ECG NPESS with LA8" — partial energy against number of data points for the original ECG data and wavelet levels 0 through 5.]
Figure 3.10: The NPESS of the original ECG signal and the NPESS of each level show that higher-level decompositions extract different components of the signal. The original signal itself and the first level decomposition are close; in terms of NPESS, this shows that stopping at level 1 decomposition would only retain noise and nothing else.
3.3.2 Choice of Depth
How deep we want to decompose depends on what we want to extract from the original signal; capturing the full characteristic of the signal is the objective. Although Percival and Walden [2000] encourage searching for the adequate depth that captures a certain property of the signal, Sang et al. [2016] disregard this advice. Namely, decomposing comes with an uncertainty: how statistically reliable is the chosen depth? A certain choice of depth can lead to no signal being present, a flat line.
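A practical upper bound on the depth follows from the signal length and the wavelet filter length, and PyWavelets exposes this directly; as before, `sym4` stands in for LA8 since its filter length is 8, and the 5000-point length mirrors our sampled AR(1) series.

```python
import pywt

# Maximum meaningful DWT depth for a 5000-point series with an
# LA8-style (filter length 8) wavelet; beyond this level the filter
# is longer than the remaining (sub-sampled) signal allows.
filter_len = pywt.Wavelet("sym4").dec_len
max_level = pywt.dwt_max_level(data_len=5000, filter_len=filter_len)
```

This bound only says when decomposition stops being well defined; the statistical-reliability concern raised by Sang et al. [2016] can force a shallower choice.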