Measuring High-Frequency Contagion in Hawkes Jump-Diffusion Models Using Particle Filters

Jan Meppe

Prof R.J.A. Laeven

Prof H.P. Boswijk

November 5, 2018

Abstract

To study high-frequency contagion, we develop a novel estimation routine based on particle filtering to estimate a simplified version of the Hawkes jump-diffusion model (Aït-Sahalia, Cacho-Diaz, & Laeven, 2015). We provide a Monte Carlo simulation study and an empirical application using 5-minute stock index futures returns of the S&P 500 and the FTSE 100. We find evidence of high-frequency contagion from the US to the UK, but not vice versa. Our novel estimation routine works well in simulations, but estimation of the Hawkes parameters remains difficult on real data.

Keywords: Jump-diffusion, Hawkes process, particle filters, financial contagion, self-excitation, cross-excitation

Amsterdam School of Economics, University of Amsterdam

Supervisors: Department of Quantitative Economics, Amsterdam School of Economics, University of Amsterdam


Acknowledgements

First of all, I want to thank my two supervisors, Peter Boswijk & Roger Laeven, for their invaluable advice, guidance, and support throughout the process. It wasn't the easiest process and many setbacks had to be overcome; such is life. Furthermore, I'd like to thank my family, close friends, office-mates, people from TI, and everyone else. You know who you are.

External things are not the problem. It is your assessment of them. Which you can erase right now.


Contents

1 Literature review
  1.1 Financial contagion
  1.2 Hawkes processes
  1.3 Particle filters
    1.3.1 Sequential Importance Resampling (SIR)
    1.3.2 Fully Adapted Particle Filter (FAPF)
    1.3.3 Likelihood evaluation complications
  1.4 Continuous resampling
    1.4.1 Univariate case
    1.4.2 Bivariate case
2 Model
  2.1 Set up
  2.2 Optimal Univariate Filter
  2.3 Optimal Bivariate Filter
3 Simulation study
  3.1 Univariate simulation results
  3.2 Bivariate simulation results
4 Data
5 Results
6 Conclusion
References
A Appendix
  A.1 Removing intra-day volatility


It is a well-established fact that financial markets move together, especially during tumultuous periods. The past financial crises show that our current understanding of the complex interactions between financial markets is still inadequate. At the macro-level, we have observed this in the form of the Asian currency crisis in the 90s, the global financial crisis starting in 2008 and most recently the Euro-zone debt crisis. At the micro-level we observe this in our electronic exchanges as an increasing number of so-called flash crashes, periods of extreme intra-day price drops followed by an almost instantaneous rebound.

Understanding how international financial markets are connected is important for policymakers and investors (Pastor & Veronesi, 2012, 2013). Furthermore, research has also shown that intra-day price jumps have important implications for risk premia (Pan, 2002), option pricing (Merton, 1976), and risk management (Duffie, 2001). These intra-day jumps can propagate from one market to another, a process that is referred to as financial contagion. More specifically, we define financial contagion as the propensity of a shock in one market to increase the likelihood of a shock in another market (Aït-Sahalia et al., 2015). Fully understanding how this works is important to safeguard the future stability of our financial markets. Therefore, the aim of this thesis is to get a better understanding of how high-frequency shocks propagate through international financial markets.

Traditional models in quantitative finance have not been able to reproduce the extreme asset price crashes we regularly observe in crises. This spawned a new branch of literature using compound Poisson processes (or, more generally, Lévy processes) as the driving force behind the jumps (Merton, 1976; Bertoin, 1996; Sato, 1999; Tankov & Cont, 2003). While such models are able to generate jumps, they fail to reproduce the clustering and cross-market propagation of jumps commonly observed empirically during crises (Yu, 2004; Aït-Sahalia, Laeven, & Pelizzon, 2014; Carr & Wu, 2017). To address these shortcomings, the literature moved to using Hawkes (1971) processes as the underlying driver of these jumps. In contrast to models based on Lévy processes, models based on Hawkes processes allow for dependence between past and present jumps through self-excitation (a jump increasing the probability of a jump in the same series) and cross-excitation (a jump increasing the probability of a jump in another series). These two features are what make Hawkes processes capable of capturing jump clustering and jump propagation, and hence better model financial contagion.

In an influential paper, Aït-Sahalia et al. (2015) manage to embed Hawkes jumps in a jump-diffusion model and use it to study international financial contagion at the daily level. Like Aït-Sahalia et al. (2015), most of the work on financial contagion is done at lower frequencies (i.e. daily, monthly, yearly). Yet, with the increasing availability of high-frequency data we are given the opportunity to study the phenomenon of financial contagion at a much finer level of detail than ever before.

There is relatively little work studying high-frequency financial contagion. While the very recent work by Aït-Sahalia, Fan, Laeven, Wang, and Yang (2017) and Dungey, Erdemlioglu, Matei, and Yang (2018) develops statistical tests for high-frequency self- and cross-excitation, empirical evidence about the size, direction, and magnitude of contagion at the high-frequency level remains elusive. Of the few papers estimating intra-day (semi-)parametric models, two relevant ones are Bos (2008) and Stroud, Polson, and Müller (2004), both of whom use a constant jump intensity and Bayesian estimation techniques, bringing us to our next point.

To our knowledge, there is no research that estimates Hawkes jump-diffusion models using maximum likelihood and particle filters. Theoretically, maximum likelihood estimation (MLE) is the superior estimation technique in jump-diffusion models, providing asymptotic normality, consistency, and efficiency under only mild regularity conditions (Sorensen, 1991). In contrast to previous models, our jump intensity is path-dependent, which requires the use of particle filters to estimate the log-likelihood. Particle filtering furthermore allows us to separate the jumps from the volatility (Aït-Sahalia, 2004). Aït-Sahalia et al. (2015) have developed an estimation routine based on the method of moments. Another paper using a similar type of model with self-exciting jumps is that of Boswijk, Laeven, and Lalu (2015), who use a generalized method of moments (GMM) estimation procedure with continuous moments in combination with lower-frequency derivatives data. Bayesian techniques are also used (Fulop, Li, & Yu, 2014; Maneesoonthorn, Forbes, & Martin, 2017).

Thus, we identify two gaps in the literature. First, there is a lack of research and empirical evidence on high-frequency contagion. Second, most of the (high-frequency) jump-diffusion models using Hawkes jumps are estimated using either moment-based estimation (Aït-Sahalia et al., 2015; Boswijk et al., 2015) or Bayesian methodology (Fulop et al., 2014; Maneesoonthorn et al., 2017). It is in this gap in the literature that we make a contribution. Our first contribution is to provide the first steps in measuring high-frequency international financial contagion. Our second and arguably most important contribution is the development of a novel estimation routine for Hawkes jump-diffusion models.


This raises several questions. First, we ask whether it is possible to estimate jump-diffusion models with Hawkes jumps using particle filters and, if so, how. Second, we look at what kind of high-frequency contagion we can find. Is there actually any evidence for self-excitation in markets, or is there none? How strong an effect is it? What evidence is there for financial contagion through cross-excitation? Is the cross-excitation symmetric or asymmetric? For example, Aït-Sahalia et al. (2015) find a large asymmetrical cross-excitation effect of the US on other markets (but not vice versa); do we find the same?

To answer these questions we develop a new estimation routine for the Hawkes jump-diffusion model. Our work builds upon that of Malik and Pitt (2011), who show that many different econometric models can be cast into a state space form and estimated by simulated maximum likelihood using particle filters. We realized that a constant volatility version of the Hawkes jump-diffusion model can also be cast into this form, and hence be estimated using the same techniques. The particle filter is able to provide an estimate of the log-likelihood, yet this log-likelihood is stochastic and discontinuous in the parameters, which poses serious problems for parameter estimation. Malik and Pitt solve this by using a combination of common random numbers and smoothing the empirical cumulative distribution function (ECDF) of the weights to continuously resample the particles. We take the key elements from their method and apply and adapt them to the problem at hand.

We provide two Monte Carlo simulation studies (one univariate and one bivariate) to assess the finite-sample performance of our routine. For the particular parameter set based on Johannes, Polson, and Stroud (2009) our filter and estimation routine work well.

For our empirical application we use high-frequency stock index futures data from the S&P 500 and the FTSE 100. Because the American stock index futures trade around the clock, we are able to obtain perfectly synchronous observations within the standard (European) business hours. This allows us to estimate our model on this data and measure the amount of high-frequency contagion. Our results indicate that, while both markets exhibit self-exciting behavior, jumps in the US have a significantly larger impact on the UK market than vice versa, also called asymmetric cross-excitation. Our results echo the same sentiment found in the literature, but at a higher frequency. However, our novel parameter estimation routine is unable to provide statistically significant estimates of the Hawkes parameters.

The remainder of this thesis is organized as follows. Section 1 reviews the related literature. Section 2 describes the model, estimation routine, and optimal filtering equations. Section 3 contains simulation results. Section 4 describes the data. Section 5 contains the results. Section 6 concludes and provides directions for future research.


1 Literature review

This chapter of the thesis contains a comprehensive literature review. We first cover financial contagion, then Hawkes processes, and finally particle filters and the smoothing method of Malik and Pitt (2011) which lies at the center of our novel estimation routine.

1.1 Financial contagion

The global financial crisis of 2008 highlighted our limited understanding of how international financial markets are connected. Designing robust and stable financial systems has become a top priority for financial regulators, governments, and policymakers. To do so, it is necessary to gain a deeper understanding of how financial shocks propagate in time and space. To provide one example of what we call financial contagion we go back to the event that many now consider the start of the financial crisis: the bankruptcy of the Lehman Brothers investment bank.

On September 29th 2008, 14:00 (UTC+0), the United States House of Representatives voted against providing governmental support (a bail-out) to the Lehman Brothers and intra-day markets responded swiftly. Figure 1 shows the intra-day reaction to the announcement in the S&P 500 and FTSE 100 futures prices. We see an immediate drop of the S&P 500 followed by a drop of the FTSE 100 immediately after. We observe a similar contagion at lower frequencies over longer time periods (Aït-Sahalia et al., 2015).

Figure 1: Intra-day market reaction of the S&P 500 and FTSE 100 futures prices to the United States House of Representatives decision not to bail out the Lehman Brothers investment bank (September 29th 2008).


Figure 2: Returns of the S&P 500 index futures (panel a) and the FTSE 100 index futures (panel b).

Figure 2 shows the S&P 500 and FTSE 100 stock index futures returns over the period from 2006 to 2015. Extreme returns are clustered together both in time (along the horizontal axis) and in space (vertically, across the two markets).

Traditional stochastic volatility models in the literature have been unable to produce the financial contagion and the multivariate clusters of extreme price jumps we observe in crises. This led to the development of jump-diffusion models incorporating discontinuous jumps based on compound Poisson processes (or, more generally, Lévy processes) as the driving force behind these jumps. While these types of models were able to reproduce several salient features of financial crises, they failed to reproduce the time-series clustering and cross-market propagation of jumps commonly observed. Before delving deeper into these models, we ask ourselves where the term financial contagion comes from.

The origin of the terminology behind "financial contagion" can be traced back to the late 90s. Before 1997 the word "contagion" was used almost exclusively in a medical context to refer to the spread of diseases and viruses. This all changed in 1997 with the Southeast Asian currency crisis. Originating from Thailand, the effects of the crisis quickly spread to neighbouring countries and eventually even to Russia and Brazil, ultimately leading to the demise of the famed Long Term Capital Management (LTCM) hedge fund. The economic history of the world is littered with contagious financial crises, but in particular the financial crisis of 2008 and the Euro-zone debt crisis have sparked new interest in financial contagion and how to prevent, control, and contain it.

Before discussing further, it is important to define exactly what we mean by financial contagion. It is difficult to pin down precisely, as many different definitions of financial contagion exist in the literature. Forbes and Chinn (2004) provide an excellent empirical survey of the literature on financial contagion, including its different definitions. Because the goal of this thesis is to measure the degree of high-frequency contagion between international financial markets, and our model largely follows the one used in Aït-Sahalia et al. (2015), we use their definition of financial contagion: the propensity of a shock in one market to increase the likelihood of a shock in another market.

Financial contagion has been studied at different frequencies, but the bulk of the literature considers the daily (or lower-frequency) level. Cheung, Tam, and Szeto (2009) and Claessens and Forbes (2013) provide excellent surveys of this literature. Particularly relevant for this thesis is the work by Aït-Sahalia et al. (2014), Aït-Sahalia et al. (2015) and Hainaut and Moraux (2017).

Aït-Sahalia et al. (2014) and Aït-Sahalia et al. (2015) were among the first and most influential papers using Hawkes processes to better capture international financial contagion. The former applies Hawkes processes to model Euro-zone sovereign credit default swaps (CDS) and the latter uses them to model international financial contagion.

To do so, Aït-Sahalia et al. (2015) introduce the Hawkes jump-diffusion model, analogous to the well-known Poisson jump-diffusion model (Merton, 1976). The model has a drift term, a diffusion term, and a pure jump process that is a Hawkes process. Hawkes processes allow for path-dependence in the jumps, which generates feedback mechanisms leading to jump clustering and contagion to other markets. They use this model to study international financial contagion between several large stock markets, and develop a method-of-moments based parameter estimation routine. Aït-Sahalia et al. (2015) find both significant self-excitation in all markets and asymmetric cross-excitation: shocks in the US have a disproportionate effect on other countries, but not vice versa. Our model is a constant volatility version of Aït-Sahalia et al. (2015).

In an unpublished working paper, Hainaut and Moraux (2017) propose another jump-diffusion asset return model with Hawkes jumps. The key novelty in their work is that the baseline intensity of the jumps is itself another Hawkes process. Hence jumps can momentarily increase the baseline intensity for future jumps, leading to even more extreme jump clusters than in Aït-Sahalia et al. (2015). Hainaut and Moraux (2017) use a particle filter to simulate the log-likelihood, after which they correctly note that this creates a discontinuous log-likelihood, an issue that needs to be solved. Further explanation stops there, but results and parameter estimates are reported. This problem of a discontinuous log-likelihood (and our solution) is discussed in the chapter on particle filtering.

In contrast with the already existing body of literature on financial contagion, there is a lack of literature on high-frequency contagion. Instead of directly estimating intra-day models, high-frequency data is often used to augment models at the daily level. For example, Elerian, Chib, and Shephard (2001), Andersen, Bollerslev, Diebold, and Ebens (2001) and Barndorff-Nielsen and Shephard (2002) all augment daily SV models with intra-day data. It can also be used to calculate daily integrated variance in a non-parametric manner as is done in Barndorff-Nielsen and Shephard (2006).

All the previously mentioned work either studies contagion at the lower frequencies or uses high-frequency data to augment lower-frequency models. In the more theoretical niche of the high-frequency literature regarding Hawkes processes and financial contagion, Boswijk, Laeven, and Yang (2018) and Dungey et al. (2018) develop asymptotic tests for, respectively, self- and cross-excitation in a high-frequency setting.

Yet, estimation of actual intra-day models remains rare, because working with high-frequency data is difficult. Intra-day data exhibit strong daily patterns: it is a stylized fact in the literature that volatility is higher just after opening and just before closing and lower during lunch time. This, in combination with the opening and closing of the market, the persistent, stochastic, mean-reverting volatility, shocks that are difficult to anticipate, many parameters, and high volumes of data, makes parametric estimation of intra-day models a difficult task.

Bos (2008) and Stroud et al. (2004) are two of the very few papers that actually attempt to estimate intra-day parametric models. Bos (2008) models the log-price process as a jump-diffusion model with stochastic volatility, normally distributed jumps with a constant intensity, and market microstructure noise. Bos (2008) estimates the parameters with a Bayesian approach, using data augmentation for the unobserved components in the model. Stroud and Johannes (2014) model the log-price as a jump-diffusion with stochastic volatility and normally distributed jumps with a constant intensity. Volatility is decomposed into a slow and a fast component with different mean-reversion times. They filter out intra-day seasonalities using cubic smoothing splines (Weinberg, Brown, & Stroud, 2007). The model is estimated using Bayesian MCMC methods.

In summary, studying financial contagion matters because it has important implications for the stability of our financial system. We find a lack of empirical evidence on high-frequency contagion from models that use Hawkes processes as the underlying process behind the jumps. Hawkes processes have enabled these models to better capture the salient features observed empirically; we expound further upon Hawkes processes in the next section.

1.2 Hawkes processes

First introduced by and named after Hawkes (1971), Hawkes processes are a class of path-dependent point processes in which a jump can influence the future intensity. Hawkes processes are special in the sense that when a jump occurs in a series, this can directly influence the probability of the next jump. This in turn can lead to a series of self-reinforcing jumps: self-excitation. In a bivariate Hawkes process, a jump in one series can increase not only the probability of a jump in its own series but also the probability of a jump in the other series; this is called cross-excitation. If a Hawkes process is both self- and cross-exciting, it is referred to as mutually exciting.

The first practical application of Hawkes processes was by Ogata (1988), who used them to model earthquake aftershocks. After this, many others found useful applications for Hawkes processes, such as in social networks (Crane & Sornette, 2008), biology (Reynaud-Bouret, Schbath, et al., 2010), and crime (Mohler, Short, Brantingham, Schoenberg, & Tita, 2011). Applications of Hawkes processes in the financial econometrics literature are also plentiful. They are used, for example, to model the incoming flow of high-frequency transaction data (Bowsher, 2007), to model the yield curve (Salmon & Tham, 2008), to price credit derivatives (Errais, Giesecke, & Goldberg, 2010), and of course to measure international financial contagion between financial markets (Aït-Sahalia et al., 2015).

A formal definition of a Hawkes process is as follows.

Definition (Hawkes process). Consider a suitably filtered probability space (Ω, F, P) on which we define a counting process N(t). If the process N(t) has the following F_t-conditional intensity,
\[
\begin{aligned}
P\left[N(t + \Delta t) - N(t) = 0 \mid \mathcal{F}_t\right] &= 1 - \lambda_t \Delta t + o(\Delta t) \\
P\left[N(t + \Delta t) - N(t) = 1 \mid \mathcal{F}_t\right] &= \lambda_t \Delta t + o(\Delta t) \\
P\left[N(t + \Delta t) - N(t) > 1 \mid \mathcal{F}_t\right] &= o(\Delta t),
\end{aligned} \qquad (1)
\]
where this counting process' conditional intensity follows the dynamics
\[
\lambda_t = \lambda_\infty + \int_0^t \phi(t - u)\, dN(u), \qquad (2)
\]
for λ_∞ > 0 and φ : (0, ∞) → [0, ∞], which are called the background intensity and the kernel (or excitation function) respectively, then we say that N(t) is a Hawkes process. We note that the homogeneous Poisson process is a special case of the Hawkes process, obtained by setting φ(·) = 0.

While the definition mentioned above is often used and consistent with the literature, there exists an equivalent interpretation that is more intuitive. Let {t_1, t_2, ..., t_k} be the arrival times of the jumps up to time t; then the F_t-conditional intensity of the Hawkes process is given by
\[
\lambda_t = \lambda_\infty + \sum_{t_i < t} \phi(t - t_i). \qquad (3)
\]
This form is equivalent and again requires specifying the background intensity and excitation function. An often-used excitation function is the exponential kernel φ(t) = βe^{−αt}, also used in Hawkes' original paper. For α, β > 0 the conditional intensity then becomes
\[
\begin{aligned}
\lambda_t &= \lambda_\infty + \int_{-\infty}^{t} \beta e^{-\alpha(t-s)}\, dN(s) \qquad &(4) \\
&= \lambda_\infty + \sum_{t_i < t} \beta e^{-\alpha(t - t_i)}. \qquad &(5)
\end{aligned}
\]
The arrival of a jump immediately increases the intensity by β, after which the intensity decays exponentially at rate α back to the baseline (background) intensity.

Taken together, the joint process (N_t, λ_t) is Markov (Aït-Sahalia et al., 2015). This property will be exploited in setting up our particle filter. Under suitable parameter restrictions the intensity process is non-negative and stationary. For non-negativity we require λ_∞ ≥ 0, α ≥ 0, β ≥ 0, and for stationarity we require sufficiently strong mean reversion; that is, a sufficient condition for stationarity is α > β.

Figure 3: Example of a univariate Hawkes process with α = 0.30, β = 0.10, λ_∞ = 0.10.

Figure 3 shows a simulated sample path of a Hawkes process with baseline intensity λ_∞ = 0.10, mean-reversion parameter α = 0.30 and shock size β = 0.10, initialized with N_0 = 0. The intensity (blue line) stays constant until the first jump (red mark), after which there is a rapid series of jumps clustered together. In the absence of any further jumps the intensity decays back to its baseline. This example shows that Hawkes processes are well adapted to capture jump clustering.

In the bivariate case we define two point processes (N_{1,t})_{t≥0} and (N_{2,t})_{t≥0}, each with its own F_t-conditional jump intensity given by
\[
\begin{aligned}
P(N_{i,t+\Delta t} - N_{i,t} = 1 \mid \mathcal{F}_t) &= \lambda_{i,t}\,\Delta t + o(\Delta t) \\
P(N_{i,t+\Delta t} - N_{i,t} = 0 \mid \mathcal{F}_t) &= 1 - \lambda_{i,t}\,\Delta t + o(\Delta t) \\
P(N_{i,t+\Delta t} - N_{i,t} > 1 \mid \mathcal{F}_t) &= o(\Delta t)
\end{aligned} \qquad (6)
\]
for i = 1, 2. The F_t-conditional jump intensities of the two processes follow
\[
\begin{aligned}
d\lambda_{1,t} &= \alpha_1(\lambda_{1,\infty} - \lambda_{1,t})\,dt + \beta_{1,1}\,dN_{1,t} + \beta_{1,2}\,dN_{2,t} \\
d\lambda_{2,t} &= \alpha_2(\lambda_{2,\infty} - \lambda_{2,t})\,dt + \beta_{2,1}\,dN_{1,t} + \beta_{2,2}\,dN_{2,t},
\end{aligned} \qquad (7)
\]


for baseline intensities λ_{i,∞}, mean-reversion parameters α_i and shocks β_{i,j}, for i = 1, 2 and j = 1, 2. If β_{1,2} = β_{2,1} = 0 this reduces to two separate univariate Hawkes processes. However, when these terms are not zero, a jump in one series can increase the probability of a jump in the other series: cross-excitation. This cross-excitation is what will allow us to model spillovers from one market to another and measure the contagion between markets.

Similarly, it can be shown that, taken together, the tuple (N_{1,t}, N_{2,t}, λ_{1,t}, λ_{2,t}) is jointly Markovian. Non-negativity requires λ_{i,∞} ≥ 0, α_i ≥ 0, β_{i,j} ≥ 0 for i = 1, 2 and j = 1, 2. Stationarity requires α_i > β_{i,1} + β_{i,2} for i = 1, 2.

Figure 4: Example of a bivariate Hawkes process with α_1 = 0.30, α_2 = 0.50, β_11 = 0.03, β_12 = 0.15, β_21 = 0.09, β_22 = 0.05, λ_{1,∞} = 0.10, λ_{2,∞} = 0.05.

Figure 4 shows a simulated path of a bivariate Hawkes process. We see the two intensities (denoted by the lines) and the jump indicators (denoted by the marks). The intensity remains at its base level until the first jump in the first series, which increases both the intensity of the first series (blue line) and that of the second series (red line). We see that the intensity of the second process (red line) is excited more by the first series (blue line) than by itself. The bivariate Hawkes process is thus able to capture both the jump clustering and the jump contagion effects.


1.3 Particle filters

Introduced by Gordon, Salmond, and Smith (1993) (with the terminology popularized by Kitagawa (1996)), particle filters have become widely used methods for estimating non-linear, non-Gaussian state space models. The principal advantage of estimation methods based on particle filters is that they do not rely on linearisation of the model, but this comes at the cost of being computationally expensive. Thanks to the ever-decreasing cost of computational power these methods have become popular in many fields of science, such as computer vision, chemical engineering, ecological modeling, and of course economics.

We consider a time series y_t that depends contemporaneously on a hidden state x_t. We model this using a discrete non-linear, non-Gaussian state space model. Consider the following model, indexed by a static parameter vector θ,
\[
y_t \sim p(y_t \mid x_t, \theta) \qquad (8)
\]
\[
x_t \sim p(x_t \mid x_{t-1}, \theta), \qquad x_1 \sim p(x_1 \mid \theta) \qquad (9)
\]
for t = 1, ..., T. For brevity we omit θ wherever superfluous. We refer to Equation (8) and Equation (9) as the measurement equation and the state equation, respectively. It is furthermore assumed that the observations {y_t | x_t} are conditionally independent and that the states {x_t} are Markovian. We denote the information available up to and including time period t by F_t = {y_1, ..., y_t}.

The goal in this type of model is to separate the signal (state equation) from the noisy observations (measurement equation). This is done by finding the distribution of the state equation given some information set. We are interested in the following three distributions of the state: the predictive density p(x_t | F_{t−1}), the filtered density p(x_t | F_t), and the smoothed density p(x_t | F_T).

In this context, as the name suggests, particle filtering is a specific type of simulation-based method to estimate the filtered density p(x_t | F_t). This method was developed by Gordon et al. (1993), who called it the sampling importance resampling (SIR) algorithm.

Particle filters are useful because they are versatile, can be used to disentangle (stochastic) volatility from shocks, compare models, make volatility forecasts, and are easily augmented using additional option prices (Johannes et al., 2009).


Many advances have been made in the field with important references being Gordon et al. (1993); Kitagawa (1996); Berzuini, Best, Gilks, and Larizza (1997); Pitt and Shephard (1999).

1.3.1 Sequential Importance Resampling (SIR)

Sequential Importance Resampling (Gordon et al., 1993) is a method for sequentially drawing samples from p(x_t | F_t, θ). For the SIR to work, two requirements need to be satisfied. First, we need to be able to evaluate the measurement density p(y_t | x_t). Second, we need to be able to simulate from the transition density p(x_{t+1} | x_t).

The particle filter approximates p(x_t | F_t, θ) with a cloud of weighted particles {x_t^1, ..., x_t^N} with probabilities {π_t^1, ..., π_t^N}. Assume we have these particles from a previous iteration of the algorithm. Our goal is to obtain a draw from a distribution that approximates the true filtering density p(x_{t+1} | F_{t+1}). Using Bayes' rule we can write
\[
p(x_{t+1} \mid \mathcal{F}_{t+1}) \propto p(y_{t+1} \mid x_{t+1}) \int p(x_{t+1} \mid x_t)\, p(x_t \mid \mathcal{F}_t)\, dx_t.
\]

The simplest particle filter is the SIR particle filter (also called the bootstrap particle filter), defined below. The intuition behind the SIR particle filter is as follows. Given this period's particles, we first propagate the particles to the next period using the state equation. Then we resample these particles according to how likely each particle is to have generated the next observation.

Algorithm 1 (Bootstrap/SIR filter). Given N particles x_t^{(i)} ∼ p(x_t | F_t), i = 1, ..., N, for t = 0, ..., T − 1:

1. Sample N new particles from the bootstrap importance density, x̃_{t+1}^{(i)} ∼ p(x_{t+1} | x_t^{(i)}), for i = 1, ..., N.

2. Calculate the unnormalized weights and normalize them,
\[
w_{t+1}^{(i)} = p(y_{t+1} \mid \tilde{x}_{t+1}^{(i)}), \qquad \pi_{t+1}^{(i)} = \frac{w_{t+1}^{(i)}}{\sum_{j=1}^{N} w_{t+1}^{(j)}}.
\]
The weighted particles {x̃_{t+1}^{(i)}, π_{t+1}^{(i)}}, i = 1, ..., N, are now approximately distributed as p(x_{t+1} | F_{t+1}).

3. Resample the particles x_{t+1}^{(i)} from {x̃_{t+1}^{(1)}, ..., x̃_{t+1}^{(N)}} with probabilities {π_{t+1}^{(1)}, ..., π_{t+1}^{(N)}}.


In the last step the particles are resampled using a multinomial resampling scheme. Different schemes for this have been devised as well, see (Kitagawa, 1996; Liu & Chen, 1998; Carpenter, Clifford, & Fearnhead, 1999).

The output of the SIR can be used to approximate the log-likelihood. Using the prediction error decomposition, we can write the log-likelihood as
\[
\log L(\theta) = \sum_{t=0}^{T-1} \log p(y_{t+1} \mid \theta; \mathcal{F}_t).
\]
We approximate p(y_{t+1} | θ; F_t) with
\[
\hat{p}(y_{t+1} \mid \theta; \mathcal{F}_t) = \int p(y_{t+1} \mid x_{t+1})\, p(x_{t+1} \mid \mathcal{F}_t; \theta)\, dx_{t+1} \approx \frac{1}{N} \sum_{i=1}^{N} p(y_{t+1} \mid \tilde{x}_{t+1}^{(i)}) = \frac{1}{N} \sum_{i=1}^{N} w_{t+1}^{(i)}.
\]

In other words, the SIR approximates the log-likelihood by summing up the logarithms of the mean of the unnormalised weights. Del Moral (2004) shows that for N ≥ 2 this estimator for the likelihood is unbiased.
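A minimal, generic Python implementation of Algorithm 1 together with the likelihood estimator above might look as follows. The model-specific ingredients, a transition sampler and a measurement density, are passed in as functions; the local-level example at the bottom is purely illustrative.

```python
import numpy as np

def sir_filter_loglik(y, x0_particles, sample_transition, measurement_pdf, rng):
    """Bootstrap/SIR particle filter (Algorithm 1) returning the estimate
    log L(theta) = sum_t log( (1/N) * sum_i w_t^(i) )."""
    particles = np.asarray(x0_particles, dtype=float).copy()
    N = len(particles)
    loglik = 0.0
    for yt in y:
        # step 1: propagate the particles through the state equation
        particles = sample_transition(particles, rng)
        # step 2: weight each particle by how likely it was to generate y_t
        w = measurement_pdf(yt, particles)
        loglik += np.log(w.mean())
        # step 3: multinomial resampling (later replaced by smooth resampling)
        idx = rng.choice(N, size=N, p=w / w.sum())
        particles = particles[idx]
    return loglik

# Toy example: local-level model x_t = x_{t-1} + eta_t, y_t = x_t + eps_t
rng = np.random.default_rng(0)
x_true = np.cumsum(rng.normal(scale=0.5, size=200))
y = x_true + rng.normal(scale=1.0, size=200)
ll = sir_filter_loglik(
    y, rng.normal(scale=1.0, size=1000),
    sample_transition=lambda x, r: x + r.normal(scale=0.5, size=x.shape),
    measurement_pdf=lambda yt, x: np.exp(-0.5 * (yt - x) ** 2) / np.sqrt(2 * np.pi),
    rng=rng)
```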

1.3.2 Fully Adapted Particle Filter (FAPF)

Coined by Malik and Pitt (2011) as the fully adapted particle filter, this type of filter is a special case of the auxiliary particle filter (APF) introduced by Pitt and Shephard (1999). A particle filter can be adapted fully if it is possible to evaluate the predictive density,
\[
p(y_t \mid x_{t-1}) = \int p(y_t \mid x_t)\, p(x_t \mid x_{t-1})\, dx_t,
\]
and to sample from the filtering distribution of the latent states,
\[
p(x_t \mid x_{t-1}; y_t) = \frac{p(y_t \mid x_t)\, p(x_t \mid x_{t-1})}{p(y_t \mid x_{t-1})}.
\]
This is only possible in a small number of models, but it leads to significant gains in efficiency and speed. In that case, we resample the old particles first according to the predictive density, and then sample new particles using the transition density.


The FAPF is a special case of the APF. The main intuition behind the APF is that it flips the order of the SIR algorithm, using the information in the next period's observation more efficiently. First, using the predictive density, we look at which particles were likely to have generated next period's observation. Then, using these probabilities, we resample the old particles first, and after having resampled the old particles using next period's information, we propagate the particles to the next period using the state equation. The algorithm for the FAPF is as follows.

Algorithm 2 (FAPF). Assume we have samples x_t^{(i)} ∼ p(x_t | F_t), i = 1, ..., N. For t = 0, ..., T − 1:

1. Calculate the unnormalised weights using the predictive density,
\[
w_{t|t+1}^{(i)} = p(y_{t+1} \mid x_t^{(i)}), \qquad i = 1, \ldots, N,
\]
and normalise them,
\[
\pi_{t|t+1}^{(i)} = \frac{w_{t|t+1}^{(i)}}{\sum_{j=1}^{N} w_{t|t+1}^{(j)}}, \qquad i = 1, \ldots, N.
\]

2. Resample the old particles: draw N particles x̃_t^{(i)} from {x_t^{(1)}, ..., x_t^{(N)}} with probabilities {π_{t|t+1}^{(1)}, ..., π_{t|t+1}^{(N)}}, for i = 1, ..., N.

3. Sample new particles x_{t+1}^{(i)} ∼ p(x_{t+1} | x̃_t^{(i)}, y_{t+1}).

The output of the FAPF can again be used to approximate the log-likelihood, typically leading to more accurate results than the SIR. The log-likelihood is given by
\[
\log L(\theta) = \sum_{t=0}^{T-1} \log p(y_{t+1} \mid \theta; \mathcal{F}_t),
\]
where the output of the FAPF approximates p(y_{t+1} | θ; F_t) with
\[
\hat{p}(y_{t+1} \mid \theta; \mathcal{F}_t) = \frac{1}{N} \sum_{i=1}^{N} p(y_{t+1} \mid x_t^{(i)}) = \frac{1}{N} \sum_{i=1}^{N} w_{t|t+1}^{(i)}.
\]


1.3.3 Likelihood evaluation complications

After filtering out the latent states, we are interested in estimating the static parameter vector θ. Kantas et al. (2015) provide an excellent overview of the current state of the art methodology in estimating parameters. It turns out that estimating the parameter vector is often significantly harder than filtering out the latent states.

The log-likelihood is given by the joint density of the observations as a function of the parameters,
\[
\log L(\theta) = \log p(y_1, \ldots, y_T \mid \theta) = \sum_{t=1}^{T} \log p(y_t \mid \theta; \mathcal{F}_{t-1}),
\]
where
\[
p(y_t \mid \theta; \mathcal{F}_{t-1}) = \int p(y_t \mid x_t; \theta)\, p(x_t \mid \mathcal{F}_{t-1}; \theta)\, dx_t.
\]

The particle filter allows us to draw from an approximation of p(x_{t−1} | F_{t−1}; θ), and we assume we can draw from the transition density p(x_t | x_{t−1}; θ). By drawing samples from the particle filter and propagating them using the transition density we can generate samples from p(x_t | F_{t−1}; θ).

The particle filter can thus be used to estimate the log-likelihood as follows,
\[
\log \hat{L}_N(\theta) = \sum_{t=1}^{T} \log \hat{p}(y_t \mid \theta; \mathcal{F}_{t-1}) = \sum_{t=1}^{T} \log\left( \frac{1}{N} \sum_{i=1}^{N} w_t^{(i)} \right),
\]
where w_t^{(i)}, i = 1, ..., N, are the weights defined in Algorithms 1 and 2. However, this simulated log-likelihood is not smooth in the parameters θ. A small perturbation in θ should induce a small perturbation in the approximated log-likelihood, but this is not the case here. Even with a fixed seed (that is, drawing the same uniforms each time to generate new random variables), a small change in the parameters can induce a large (discontinuous) change in the generated particles. We solve this problem by smoothing the empirical CDF, as explained in the next section.
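To fix ideas, the estimation step can be sketched as follows: the simulated log-likelihood is computed with common random numbers (the same seed for every trial value of θ) and its negative is handed to a standard optimizer. The function particle_filter_loglik below is a hypothetical stand-in for any of the filters in this thesis; without the smooth resampling of the next section the objective remains discontinuous, which is why a derivative-free optimizer is used in this sketch.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(theta, y, n_particles=1000, seed=42):
    """Negative simulated log-likelihood with common random numbers:
    the seed is fixed so that all Monte Carlo randomness stays the same
    while theta varies."""
    rng = np.random.default_rng(seed)
    # particle_filter_loglik is a placeholder for a user-supplied filter
    return -particle_filter_loglik(theta, y, n_particles, rng)

# theta0 = np.array([...])                      # starting values
# result = minimize(neg_loglik, theta0, args=(y,), method="Nelder-Mead")
# theta_hat = result.x
```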


1.4 Continuous resampling

Because we can fully adapt our particle filter, only the resampling step introduces a discontinuity. Ideally, we would like a small perturbation in the parameters to induce a small perturbation in the log-likelihood. However, after we change the parameters slightly, the weights of the particle filter change slightly as well. A small change in the parameters can induce a sudden change in the particle weights, leading to different particles being resampled; this in turn leads to a discontinuity in the log-likelihood. To solve this issue and smooth out the resampling operation we adopt and adapt the approach of Malik and Pitt (2011).

1.4.1 Univariate case

The empirical cumulative distribution function (ECDF) is given by
\[
\hat{F}_N(x) = \sum_{i=1}^{N} \pi^{(i)} \mathbf{1}\{x - x^{(i)}\}, \qquad (10)
\]
where 1{·} denotes the indicator function returning one if its argument is positive and zero otherwise, the x^{(i)} are the sorted particles (in ascending order), and Σ_{i=1}^{N} π^{(i)} = 1. This ECDF is approximated by the following function,
\[
\tilde{F}_N(x) = \sum_{k=0}^{N} c_k\, G_k\!\left(\frac{x - x^{(k)}}{x^{(k+1)} - x^{(k)}}\right), \qquad (11)
\]
with x^{(N+1)} = ∞, x^{(0)} = −∞ and G_0(z) = G_N(z) = 1{z}. More importantly, we have c_0 = π^{(1)}/2, c_N = π^{(N)}/2 and c_k = (π^{(k)} + π^{(k+1)})/2 for all others in between, with k = 1, ..., N − 1. In general any cumulative distribution function can be taken for G_k(z), but for ease of implementation and interpretation we take G_k(z) = z.

Intuitively, the function F̃_N(x) linearly interpolates the ECDF F̂_N(x) between the mid-points. The distance between the actual empirical CDF and this approximation is of order 1/N (Malik & Pitt, 2011).

Following Malik and Pitt (2011) we use stratified sampling to generate samples from the inverted CDF, reducing sample impoverishment. Sample impoverishment happens when, after a certain number of particle filter iterations, the particles collapse to a single point. Usually, N uniform random variables u_1, ..., u_N ∼ U(0, 1) are generated. Stratified sampling (Kitagawa, 1996; Carpenter et al., 1999), on the other hand, requires only one uniform random variable u ∼ U(0, 1) to be generated. We can then generate ordered uniforms with u_i = (i − 1)/N + u/N for i = 1, ..., N.
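The two ingredients just described can be combined in a short sketch: linearly interpolate the ECDF between the mid-points of the sorted particles (Equation (11) with G_k(z) = z) and invert it at stratified uniforms generated from a single draw. This is a simplified rendering of the Malik and Pitt (2011) scheme, not their exact implementation.

```python
import numpy as np

def smooth_stratified_resample(particles, weights, rng):
    """Smooth (continuous) resampling in the spirit of Malik and Pitt (2011):
    invert a piecewise-linear interpolation of the ECDF at stratified uniforms."""
    order = np.argsort(particles)
    x = particles[order]
    pi = weights[order] / weights.sum()
    # CDF evaluated at the mid-points; the mass pi[0]/2 below x[0] and
    # pi[-1]/2 above x[-1] stays as a point mass at the extreme particles
    F_mid = np.cumsum(pi) - 0.5 * pi
    N = len(x)
    u = (np.arange(N) + rng.uniform()) / N      # one uniform spread over N strata
    # np.interp interpolates linearly between mid-points and clips at the ends
    return np.interp(u, F_mid, x)

# Example: smoothly resampling a cloud of intensity particles
rng = np.random.default_rng(0)
lam_particles = rng.gamma(shape=2.0, scale=0.5, size=1000)
w = rng.uniform(size=1000)
new_lam = smooth_stratified_resample(lam_particles, w, rng)
```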

In the univariate version of our model, the particles are the intensity and the jump indicator, x_t = (λ_t, ∆N_t). Each particle has a (normalized) weight associated with it. In Malik and Pitt (2011), in which this smooth resampling method is introduced, the latent state is the log-volatility. The ECDF of the particles is approximated by linearly interpolating the mid-points, which results in a smooth resampling operator.

However, because the jump indicator ∆N_t is a discrete random variable taking on two values (zero or one), its CDF (and hence also its empirical CDF) is by definition discontinuous. An initial idea is to approximate the inverted CDF of the jump particle with a logistic function, but this destroys the interpretability. This leaves us no choice but to develop a novel smooth resampling approach based on the intensity particles. Our approach is motivated by the (empirical) observation that a particle with a jump, ∆N_t^{(i)} = 1, is associated with a large intensity particle λ_t^{(i)}.

Without loss of generality (WLOG), we can sort the particles x_t^{(i)} = (λ_t^{(i)}, ∆N_t^{(i)}) for i = 1, ..., N such that λ_t^{(1)} ≤ ... ≤ λ_t^{(N)}. We then approximate the ECDF of the intensity as shown in (11). Using the stratified resampling approach we can generate uniform variates and invert the approximated ECDF. This gives new (smooth) draws λ_t^{(1)*}, ..., λ_t^{(N)*}. To generate ∆N_t^{(1)*}, ..., ∆N_t^{(N)*} we round (up or down) the newly drawn intensities to the nearest value among the old particles and take the associated ∆N_t^{(i)} particle. This last step can be skipped, as the jump particles are not used any further; only the intensity particles are used to generate next period's jumps.

1.4.2 Bivariate case

Inverting and smoothing the ECDF is straightforward in the univariate case. Smoothing the ECDF in the bivariate case is also straightforward, but sampling from it is not. Hence, the bivariate case requires a different approach. To continuously resample in the bivariate case, we use the conditionally independent approximation described in Malik and Pitt (2011). This method is an approximation but becomes exact as N → ∞ with N/P → ∞. Recall that our goal is to continuously sample N new particles from the weighted set {x^k, π_k}, where x^k = (x_1^k, x_2^k) for k = 1, ..., N.


The empirical cumulative distribution function (ECDF) of the first component x_1^k of the particle x^k is given by
\[
\hat{F}_N(x_1) = \sum_{i=1}^{N} \pi^{(i)} \mathbf{1}\{x_1 - x_1^{(i)}\}, \qquad (12)
\]
with x_1^{(i)} the sorted particles in ascending order. We divide the [0, 1] interval into P partitions and store the indices of the particles that fall into each of these P partitions, that is, we store
\[
R_p = \left\{ i \,\middle|\, \hat{F}_N\big(x_1^{(i)}\big) > \tfrac{p-1}{P} \text{ and } \hat{F}_N\big(x_1^{(i-1)}\big) < \tfrac{p}{P} \right\}, \qquad p = 1, \ldots, P. \qquad (13)
\]

If π_k = 1/N for all k, this reduces to storing the quantiles. Simply put, R_p stores all the indices for which the ECDF lies between (p − 1)/P and p/P. Now denote by N_p the number of indices in each individual partition R_p, such that we have
\[
R_p = \{ k_p^1, \ldots, k_p^{N_p} \}. \qquad (14)
\]

We now have to adjust the weights of the particles in each partition as follows,
\[
\tilde{\pi}_j =
\begin{cases}
P \times \left\{ \hat{F}_N\big(x_1^{(k_p^1)}\big) - (p-1)/P \right\}, & j = 1 \\[4pt]
P \times \pi_{k_p^j}, & j = 2, \ldots, N_p - 1 \\[4pt]
P \times \left\{ p/P - \hat{F}_N\big(x_1^{(k_p^{N_p-1})}\big) \right\}, & j = N_p.
\end{cases} \qquad (15)
\]
This ensures that Σ_{j=1}^{N_p} π̃_j = 1 within each partition. What we have done so far is divide the particles of the first component into partitions and normalize the weights within each partition.

What we now have to do is generate N/P samples from each of the P partitions for both x_1 and x_2. We sample x_1 and x_2 by applying the same continuous approximation as in the univariate case. That is, within each partition we use the ECDF of x_1 of the particles in that partition to generate N/P samples of x_1, and the ECDF of x_2 of all the particles in that partition to generate N/P samples of x_2.

It is then necessary to link the two samples together in a random fashion within each partition. The assumption is made that the draws within a partition are independent, so that we can link the generated samples together without issue. This is the reason why the method is called the conditionally independent resampling approximation. If we choose the number of partitions P to be particularly high, then this assumption is less likely to be violated, but the continuous ECDFs will be based on fewer points, which may become a problem. If we take P = 1, then we simply apply the continuous resampling to both x_1 and x_2 and assume they are independent. This method becomes exact asymptotically as N, P → ∞ with N/P → ∞ (Malik & Pitt, 2011). They find good results using P = 40 partitions with N = 8000 particles.

This concludes the literature review. We started by investigating financial contagion, after which we went over Hawkes processes and particle filters, describing the SIR and FAPF algorithms. We then discussed the problems caused by simulating the log-likelihood with particle filters and the solution of continuously resampling the particles. In the next section, we set up the model and derive the exact filtering equations.


2 Model

This section describes the model used and the optimal filtering equations in the univariate and bivariate case.

2.1 Set up

Consider a suitably filtered probability space (Ω, F, P) equipped with a right-continuous filtration {F_t}_{t≥0}, on which we define the log-price processes (Y_{i,t})_{t≥0} for two assets i = 1, 2. The log-price processes follow the continuous-time dynamics
\[
dY_{i,t} = \mu_i\,dt + \sigma_i\,dW_{i,t} + Z_{i,t}\,dN_{i,t}, \qquad i = 1, 2, \qquad (16)
\]
where µ_i is the drift, σ_i the volatility, W_{i,t} are (ρ-correlated) standard Brownian motions, Z_{i,t} ∼ N(µ_{i,J}, σ_{i,J}²) are the jump sizes, and (N_{i,t})_{t≥0} is a counting process governed by the following bivariate Hawkes process with mutually exciting dynamics,
\[
d\lambda_{i,t} = \alpha_i(\lambda_{i,\infty} - \lambda_{i,t})\,dt + \sum_{j=1,2} \beta_{i,j}\,dN_{j,t}. \qquad (17)
\]
For mean reversion and stationarity in the intensity dynamics we require α_i ≥ 0, β_{i,j} ≥ 0, λ_{i,∞} ≥ 0 and α_i ≥ β_{i,1} + β_{i,2}, i = 1, 2. The parameters in this model are given by θ = (µ_i, σ_i, α_i, β_{ij}, λ_{i,∞}, µ_{i,J}, σ_{i,J}), i = 1, 2, j = 1, 2. In the estimation routine we fix some of the parameters, as we are mainly interested in the Hawkes parameters α_i, β_{ij}, λ_{i,∞}, i = 1, 2, j = 1, 2. Using Hawkes processes for the jump intensity dynamics is what allows us to capture the high-frequency contagion effects between different international financial markets. This way, jumps affect both the jump intensity dynamics of the market itself and the jump intensity dynamics of neighbouring markets. A jump (dN_{i,t} = 1) not only increases its own jump intensity λ_{i,t} by β_{i,i} (the probability of a future jump in the same series), but also increases its neighbour's jump intensity λ_{j,t}, j ≠ i, by β_{j,i}.

Following in the footsteps of Merton (1976), Johannes et al. (2009), Boswijk et al. (2015), and Aït-Sahalia et al. (2015) we use normally distributed jumps. We do this because of parsimony and more importantly because it allows us to derive exact optimal filtering equations. Other jump size distributions are also possible such as the double exponential distribution used in Kou (2002) or the positive and negative exponential distributions used in Hainaut and Moraux (2017).


In essence, our model is a constant volatility version of the model in Aït-Sahalia et al. (2015) with the key novelty being the estimation routine.

2.2 Optimal Univariate Filter

In this section we derive the optimal filter for the univariate case. It turns out that in both the univariate and the bivariate case without stochastic volatility it is possible to fully adapt the particle filter, due to the tractable jump size distribution.

While the filter can still be fully adapted with µ ≠ 0, we start by fixing µ = 0. This is done for several reasons. First, in a high-frequency setting the mean return is not significantly different from zero. Second, this parameter is often hard to identify. In the univariate case the model is given by
\[
\begin{aligned}
dY_t &= \sigma\,dW_t + Z_t\,dN_t \\
d\lambda_t &= \alpha(\lambda_\infty - \lambda_t)\,dt + \beta\,dN_t.
\end{aligned} \qquad (18)
\]
We have t = 1, 2, ..., MT observations over T periods, where M denotes the number of intra-day intervals of length ∆ = 1/M. Denote the observed returns by y_t and the latent states by x_t = (λ_t, ∆N_t). We discretize the model with an Euler scheme,
\[
\begin{aligned}
y_t &\equiv Y_t - Y_{t-1} = \sigma\,\Delta W_t + Z_{t-1}\,\Delta N_t \\
\lambda_t &= \lambda_{t-1} + \alpha(\lambda_\infty - \lambda_{t-1})\,\Delta t + \beta\,\Delta N_t \\
\Delta N_t &= \begin{cases}
0 & \text{with prob } 1 - \lambda_{t-1}\Delta t + o(\Delta t) \\
1 & \text{with prob } \lambda_{t-1}\Delta t + o(\Delta t) \\
> 1 & \text{with prob } o(\Delta t),
\end{cases}
\end{aligned} \qquad (19)
\]
where ∆W_t = W_t − W_{t−1} ∼ N(0, ∆t) and ∆N_t = N_t − N_{t−1}.
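As an illustration, the discretized model (19) can be simulated directly; this is also how data sets of the kind used in the Monte Carlo study of Section 3 can be generated. The sketch below uses our own function and variable names, and the parameter ordering is an assumption.

```python
import numpy as np

def simulate_univariate_model(theta, T, M, seed=0):
    """Simulate returns from the Euler-discretized Hawkes jump-diffusion (19).
    theta = (sigma, alpha, beta, lam_inf, mu_J, sigma_J); dt = 1/M, M*T steps."""
    sigma, alpha, beta, lam_inf, mu_J, sigma_J = theta
    rng = np.random.default_rng(seed)
    n, dt = M * T, 1.0 / M
    y, lam_path = np.empty(n), np.empty(n)
    dN = np.zeros(n, dtype=int)
    lam = lam_inf
    for t in range(n):
        dN[t] = rng.uniform() < lam * dt                  # at most one jump per interval
        jump = rng.normal(mu_J, sigma_J) if dN[t] else 0.0
        y[t] = sigma * np.sqrt(dt) * rng.normal() + jump  # y_t = sigma*dW_t + Z*dN_t
        lam = lam + alpha * (lam_inf - lam) * dt + beta * dN[t]
        lam_path[t] = lam                                 # lambda_t after the update
    return y, lam_path, dN

# Example with the parameter values used later in the simulation study
y, lam_path, dN = simulate_univariate_model(
    theta=(0.82, 2.0, 1.0, 1.0, -2.5, 4.0), T=100, M=10)
```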

In order to show that the particle filter can be fully adapted, we have to show two things: first, that we can evaluate the predictive density, and second, that we can sample directly from the filtered density of the latent states. The predictive density is given by
\[
p(y_t \mid x_{t-1}) = \int p(y_t \mid x_t)\, p(x_t \mid x_{t-1})\, dx_t, \qquad (20)
\]
and the filtered density of the latent states is given by
\[
p(x_t \mid x_{t-1}; y_t) = \frac{p(y_t \mid x_t)\, p(x_t \mid x_{t-1})}{p(y_t \mid x_{t-1})}. \qquad (21)
\]

The first term we encounter in both Equation (20) and the numerator of Equation (21) is the likelihood p(y_t | x_t). Equation (19) shows that the likelihood is normally distributed,
\[
y_t \mid x_t = \sigma\,\Delta W_t + Z_{t-1}\,\Delta N_t \sim N(m_t, v_t),
\]
with m_t = µ_J ∆N_t and v_t = (σ√∆t)² + (σ_J ∆N_t)². Alternatively, we can write this as
\[
p(y_t \mid x_t) = \phi(y_t, m_t^1, v_t^1)^{\Delta N_t}\, \phi(y_t, m_t^0, v_t^0)^{1-\Delta N_t} = \phi_{\Delta N_t = 1}(y_t)^{\Delta N_t}\, \phi_{\Delta N_t = 0}(y_t)^{1-\Delta N_t},
\]
with m_t^1 = µ_J and v_t^1 = (σ√∆t)² + σ_J², and m_t^0 = 0 and v_t^0 = (σ√∆t)². Here φ(x, µ, σ²) denotes the pdf of a normally distributed random variable with mean µ and variance σ², evaluated at x, and φ_{∆N_t = j}(y) ≡ φ(y, m_t^j, v_t^j), j = 0, 1.

The second term we encounter in Equations (20) and (21) is the transition density p(x_t | x_{t−1}). To derive the transition density we start by noting that, conditional on ∆N_t, the distribution of λ_t is degenerate, that is,
\[
\begin{aligned}
p(x_t \mid x_{t-1}) &= p(\lambda_t, \Delta N_t \mid x_{t-1}) \\
&= p(\lambda_t \mid \lambda_{t-1}, \Delta N_{t-1}, \Delta N_t)\, p(\Delta N_t \mid \lambda_{t-1}, \Delta N_{t-1}) \\
&= \mathbf{1}\{\lambda_t = \lambda_{t-1} + \alpha(\lambda_\infty - \lambda_{t-1})\Delta t + \beta \Delta N_t\} \cdot p(\Delta N_t \mid \lambda_{t-1}, \Delta N_{t-1}),
\end{aligned} \qquad (22)
\]
where 1{·} denotes the indicator function. We assume that ∆t is small enough that at most one jump occurs in each interval.

The final piece of the puzzle is the predictive density, which can be found by summing out the latent states,
\[
\begin{aligned}
p(y_t \mid x_{t-1}) &= \int p(y_t \mid x_t)\, p(x_t \mid x_{t-1})\, dx_t \\
&= \phi_{\Delta N_t = 1}(y_t)\, p(\Delta N_t = 1 \mid x_{t-1}) + \phi_{\Delta N_t = 0}(y_t)\, p(\Delta N_t = 0 \mid x_{t-1}) \\
&= \sum_{j \in \{0,1\}} \phi_{\Delta N_t = j}(y_t)\, p(\Delta N_t = j \mid x_{t-1}).
\end{aligned}
\]
In other words, the predictive density is a mixture of normal distributions, and the distribution of the filtered states is given by
\[
p(x_t \mid x_{t-1}; y_t) = \frac{p(y_t \mid x_t)\, p(x_t \mid x_{t-1})}{p(y_t \mid x_{t-1})} = \frac{\phi_{\Delta N_t = 1}(y_t)^{\Delta N_t}\, \phi_{\Delta N_t = 0}(y_t)^{1-\Delta N_t}}{\sum_{j \in \{0,1\}} \phi_{\Delta N_t = j}(y_t)\, p(\Delta N_t = j \mid x_{t-1})}\, (\lambda_{t-1}\Delta t)^{\Delta N_t} (1 - \lambda_{t-1}\Delta t)^{1-\Delta N_t}. \qquad (23)
\]
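Plugging Equations (20)-(23) into Algorithm 2 gives the fully adapted filter for this model. The sketch below is a minimal rendering that uses plain multinomial resampling in step 2; in the actual estimation routine this step is replaced by the smooth resampling of Section 1.4.

```python
import numpy as np
from scipy.stats import norm

def fapf_univariate(y, theta, N=1000, dt=0.1, seed=0):
    """Fully adapted particle filter for the discretized model (19), returning the
    simulated log-likelihood and the filtered mean intensity (sketch only)."""
    sigma, alpha, beta, lam_inf, mu_J, sigma_J = theta
    rng = np.random.default_rng(seed)
    lam = np.full(N, lam_inf)                     # intensity particles
    sd0 = sigma * np.sqrt(dt)                     # return st.dev. without a jump
    sd1 = np.sqrt(sigma**2 * dt + sigma_J**2)     # return st.dev. with a jump
    loglik, lam_filtered = 0.0, []
    for yt in y:
        p1 = np.clip(lam * dt, 0.0, 1.0)          # P(Delta N_t = 1 | x_{t-1})
        f0 = norm.pdf(yt, 0.0, sd0)               # phi_{Delta N_t = 0}(y_t)
        f1 = norm.pdf(yt, mu_J, sd1)              # phi_{Delta N_t = 1}(y_t)
        w = f1 * p1 + f0 * (1.0 - p1)             # predictive density (20) per particle
        loglik += np.log(w.mean())
        idx = rng.choice(N, size=N, p=w / w.sum())                # step 2: resample old particles
        lam_r, p1_r = lam[idx], p1[idx]
        post_jump = f1 * p1_r / (f1 * p1_r + f0 * (1.0 - p1_r))   # filtered jump prob., Eq. (23)
        dN = rng.uniform(size=N) < post_jump      # step 3: draw Delta N_t
        lam = lam_r + alpha * (lam_inf - lam_r) * dt + beta * dN  # degenerate lambda update, Eq. (22)
        lam_filtered.append(lam.mean())
    return loglik, np.array(lam_filtered)

# Example (using data simulated with the sketch earlier in this section):
# ll, lam_hat = fapf_univariate(y, theta=(0.82, 2.0, 1.0, 1.0, -2.5, 4.0), dt=0.1)
```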


2.3 Optimal Bivariate Filter

Due to the relatively simple structure of the model, the filter can also be fully adapted in the bivariate case. In the bivariate case, our model is given by
\[
\begin{aligned}
dY_{1,t} &= \sigma_1\,dW_{1,t} + Z_{1,t}\,dN_{1,t} \\
dY_{2,t} &= \sigma_2\,dW_{2,t} + Z_{2,t}\,dN_{2,t},
\end{aligned} \qquad (24)
\]
where W_{1,t}, W_{2,t} are two ρ-correlated Brownian motions and N_{1,t} and N_{2,t} are two counting processes whose intensities are governed by a bivariate Hawkes process. We discretize the model with an Euler scheme,
\[
\begin{aligned}
y_{1,t} &= Y_{1,t} - Y_{1,t-1} = \sigma_1\,\Delta W_{1,t} + Z_{1,t-1}\,\Delta N_{1,t} \\
y_{2,t} &= Y_{2,t} - Y_{2,t-1} = \sigma_2\,\Delta W_{2,t} + Z_{2,t-1}\,\Delta N_{2,t} \\
\lambda_{1,t} &= \lambda_{1,t-1} + \alpha_1(\lambda_{1,\infty} - \lambda_{1,t-1})\Delta t + \beta_{11}\Delta N_{1,t} + \beta_{12}\Delta N_{2,t} \\
\lambda_{2,t} &= \lambda_{2,t-1} + \alpha_2(\lambda_{2,\infty} - \lambda_{2,t-1})\Delta t + \beta_{21}\Delta N_{1,t} + \beta_{22}\Delta N_{2,t} \\
\Delta N_{i,t} &= \begin{cases}
0 & \text{with prob } 1 - \lambda_{i,t-1}\Delta t + o(\Delta t) \\
1 & \text{with prob } \lambda_{i,t-1}\Delta t + o(\Delta t) \\
> 1 & \text{with prob } o(\Delta t).
\end{cases}
\end{aligned} \qquad (25)
\]

The derivation is similar to the univariate case. Denote the observations by y_t = (y_{1,t}, y_{2,t})′ and the states by x_t = (λ_{1,t}, λ_{2,t}, ∆N_{1,t}, ∆N_{2,t}). Again, we have to evaluate the predictive density,
\[
p(y_t \mid x_{t-1}) = \int p(y_t \mid x_t)\, p(x_t \mid x_{t-1})\, dx_t,
\]
and sample from the filtered density of the states,
\[
p(x_t \mid x_{t-1}; y_t) = \frac{p(y_t \mid x_t)\, p(x_t \mid x_{t-1})}{p(y_t \mid x_{t-1})}.
\]

Equation (25) shows that the likelihood p(y_t | x_t) ∼ N(µ̄, Σ) is multivariate normal with mean vector µ̄ and covariance matrix Σ,
\[
\bar{\mu} = \begin{pmatrix} \mu_{J,1}\,\Delta N_{1,t} \\ \mu_{J,2}\,\Delta N_{2,t} \end{pmatrix}, \qquad (26)
\]
\[
\Sigma = \begin{pmatrix} \sigma_1^2\,\Delta t & \rho\sigma_1\sigma_2\,\Delta t \\ \rho\sigma_1\sigma_2\,\Delta t & \sigma_2^2\,\Delta t \end{pmatrix} + \begin{pmatrix} (\sigma_{J,1}\Delta N_{1,t})^2 & 0 \\ 0 & (\sigma_{J,2}\Delta N_{2,t})^2 \end{pmatrix}. \qquad (27)
\]


We denote the likelihood by p(y_t | x_t) = φ_{∆N_1 = i, ∆N_2 = j}(y_t) for i, j = 0, 1. Similar to the univariate case, both intensities can be conditioned out,
\[
\begin{aligned}
p(x_t \mid x_{t-1}) &= p(\lambda_{1,t}, \lambda_{2,t}, \Delta N_{1,t}, \Delta N_{2,t} \mid x_{t-1}) \\
&= p(\lambda_{1,t} \mid \Delta N_{1,t}, \Delta N_{2,t})\, p(\lambda_{2,t} \mid \Delta N_{1,t}, \Delta N_{2,t})\, p(\Delta N_{1,t}, \Delta N_{2,t} \mid x_{t-1}) \\
&= \mathbf{1}\{\lambda_{1,t} = \lambda_{1,t-1} + \alpha_1(\lambda_{1,\infty} - \lambda_{1,t-1})\Delta t + \beta_{11}\Delta N_{1,t} + \beta_{12}\Delta N_{2,t}\} \\
&\quad\times \mathbf{1}\{\lambda_{2,t} = \lambda_{2,t-1} + \alpha_2(\lambda_{2,\infty} - \lambda_{2,t-1})\Delta t + \beta_{21}\Delta N_{1,t} + \beta_{22}\Delta N_{2,t}\} \\
&\quad\times p(\Delta N_{1,t}, \Delta N_{2,t} \mid x_{t-1}),
\end{aligned} \qquad (28)
\]
where
\[
p(\Delta N_{1,t}, \Delta N_{2,t} \mid x_{t-1}) = (\lambda_{1,t-1}\Delta t)^{\Delta N_{1,t}} (1 - \lambda_{1,t-1}\Delta t)^{1-\Delta N_{1,t}} (\lambda_{2,t-1}\Delta t)^{\Delta N_{2,t}} (1 - \lambda_{2,t-1}\Delta t)^{1-\Delta N_{2,t}}.
\]
Finally, the predictive density can be found by summing over all possible combinations of (∆N_{1,t}, ∆N_{2,t}),
\[
\begin{aligned}
p(y_t \mid x_{t-1}) &= \int p(y_t \mid x_t)\, p(x_t \mid x_{t-1})\, dx_t \qquad &(29) \\
&= \sum_{i=0,1} \sum_{j=0,1} \phi_{\Delta N_1 = i, \Delta N_2 = j}(y_t)\, p(\Delta N_{1,t} = i, \Delta N_{2,t} = j \mid x_{t-1}). \qquad &(30)
\end{aligned}
\]

The filtering density is then given by
\[
\begin{aligned}
p(x_t \mid x_{t-1}; y_t) &= \frac{p(y_t \mid x_t)\, p(x_t \mid x_{t-1})}{p(y_t \mid x_{t-1})} \\
&= \frac{\phi_{\Delta N_1 = \Delta N_{1,t},\, \Delta N_2 = \Delta N_{2,t}}(y_t)}{\sum_{i=0,1}\sum_{j=0,1} \phi_{\Delta N_1 = i, \Delta N_2 = j}(y_t)\, p(\Delta N_{1,t} = i, \Delta N_{2,t} = j \mid x_{t-1})} \\
&\quad\times (\lambda_{1,t-1}\Delta t)^{\Delta N_{1,t}} (1 - \lambda_{1,t-1}\Delta t)^{1-\Delta N_{1,t}} (\lambda_{2,t-1}\Delta t)^{\Delta N_{2,t}} (1 - \lambda_{2,t-1}\Delta t)^{1-\Delta N_{2,t}}.
\end{aligned}
\]

In this section we described the model and explicitly derived the optimal filtering equations by writing out the predictive densities and filtered densities of our model, both in the univariate and the bivariate case. This is made possible by the mathematically tractable jump size distribution and the constant volatility. In the next section we use these results to estimate the model on simulated data.


3 Simulation study

In this simulation study we analyze the accuracy, precision, and feasibility of our novel estimation routine. In our Monte Carlo simulation we use empirically plausible parameter values from Johannes et al. (2009). We can already expect beforehand that identification in this model is difficult: estimating the Hawkes parameters depends completely on the observed jumps, and if jumps are rare, identification will necessarily be weak as well. Nevertheless, our simulation study shows that, taking the same parameter values as Johannes et al. (2009), our filter is able to recover the jumps and the intensity and, more importantly, that our novel parameter estimation routine works. Before providing the simulation results, we start with an example run of the filter and with showing the impact of the smoothing procedure on the log-likelihood.

Figure 5: This figure shows the filter's performance using a single simulated dataset with the following parameters: T = 100, M = 10, ∆t = 1/M, S = 1000, µ = 0, σ = 0.82, α = 2, β = 1, λ_∞ = 1, µ_J = −2.5, σ_J = 4.0.

Figure 5 shows the performance of the filter using a single set of generated data. The data is generated with µ = 0, σ = 0.82, α = 2, β = 1, λ_∞ = 1, µ_J = −2.5, σ_J = 4.0, after which the filter is run once. In detecting jumps, the filter attains an accuracy of 97%, with the 3% misclassification coming from jumps that are too small to be picked up. The root mean squared error (RMSE) for filtering the intensity λ_t is 0.27 and the mean absolute error (MAE) is 0.12.

Figure 6: This figure shows the impact of the smoothing method on the surface of the log-likelihood. The left panel shows the log-likelihood without and the right panel with the smoothing procedure. One set of data is generated and we run the filter to calculate the log-likelihood for various values of (α, β). Similar figures arise as functions of the other parameters. True values are T = 100, M = 10, ∆t = 1/M, S = 1000, µ = 0, σ = 0.82, α = 2, β = 1, λ_∞ = 1, µ_J = −2.5, σ_J = 4.0.

Figure 6 shows the impact of the smoothing procedure on the surface of the log-likelihood. One set of data is generated with parameters T = 100, M = 10, ∆t = 1/M, S = 1000, µ = 0.00, σ = 0.82, α = 2, β = 1, λ_∞ = 1, µ_J = −2.5, σ_J = 4.0, after which the filter is run for various values of (α, β) to obtain an estimate of the log-likelihood, keeping the other parameters at their true values. The figure on the left shows the log-likelihood without the smoothing procedure, that is, with the standard resampling method obtained by inverting the ECDF of the weights. The right figure shows the surface of the log-likelihood with the smoothing procedure. The log-likelihood has been smoothed and, as noted in Malik and Pitt (2011), this allows us to use standard optimization techniques instead of having to resort to non-smooth optimization algorithms. While the log-likelihood does attain its maximum at the true values of (α, β), the figure shows that it is weakly identified along the diagonal where α = β.
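A log-likelihood surface of the kind shown in Figure 6 can be sketched by evaluating the simulated log-likelihood on a grid of (α, β) values, holding the other parameters at their true values and fixing the seed so that common random numbers are used across grid points. The snippet assumes the filter and simulated data from the sketches in Section 2.2.

```python
import numpy as np

alphas = np.linspace(1.0, 3.0, 41)
betas = np.linspace(0.2, 1.8, 41)
surface = np.empty((len(alphas), len(betas)))
for i, a in enumerate(alphas):
    for j, b in enumerate(betas):
        # same seed everywhere: common random numbers across (alpha, beta)
        surface[i, j], _ = fapf_univariate(
            y, theta=(0.82, a, b, 1.0, -2.5, 4.0), N=1000, dt=0.1, seed=42)
```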

3.1 Univariate simulation results

For the univariate simulation we generate 2000 data sets using the parameter values µ = 0, σ = 0.82, α = 2, β = 1, λ_∞ = 1, µ_J = −2.5, σ_J = 4.0 and estimate the model on each of the data sets. We fix µ, µ_J, and σ_J at their true values, as we are mainly interested in seeing whether we can recover the Hawkes parameters and the volatility. Figure 7 plots the empirical distribution functions of the estimates obtained from the Hawkes jump-diffusion model with 2000 simulated sample paths. The histograms indicate that the model parameters can be recovered with a reasonable degree of accuracy, but that there is a slight bias in most of the parameters, except for the mean-reversion speed α. We think this is caused by the small sample. Small-sample quantiles of the estimates are reported in Table 1 and parameter estimates are reported in Table 2.

Table 1: This table reports the quantiles of the small-sample distribution of the parameter estimates found using 2000 simulated sample paths. True parameters are given by T = 100, M = 10, ∆t = 1/M, S = 1000, µ = 0.05, σ = 0.82, α = 2, β = 1, λ_∞ = 1, µ_J = −2.5, σ_J = 4.0.

Parameter    10%      25%      50%      75%      90%
σ            0.772    0.783    0.807    0.825    0.844
α            1.822    1.932    2.034    2.156    2.306
β            0.878    0.922    0.972    1.025    1.071
λ_∞          0.877    0.918    0.960    1.003    1.044

3.2 Bivariate simulation results

For the bivariate simulation, we start by restricting µ_1 = µ_2 = 0. We also choose σ_1 = 0.82 and σ_2 = 0.50, with a correlation of ρ = 0.39. Furthermore, we restrict the two baseline intensities to be the same, λ_1,∞ = λ_2,∞ = 1, and we restrict the two mean-reversion parameters to be the same, α_1 = α_2 = 2. For the self- and cross-exciting terms we choose β_11 = 1, β_12 = 0.5 and β_21 = 0.50, β_22 = 1. Finally, the jump size means and variances are given by µ_J,1 = −2.5, σ_J,1 = 4.0 and µ_J,2 = −3.0, σ_J,2 = 4.0. As in the univariate case, we fix the jump parameters and do not estimate them. With these parameters we generate 1500 data sets and estimate the model on these generated data sets.


Figure 7: This figure plots the empirical distribution function of the univariate Hawkes jump-diffusion model parameters estimated with a particle filter, obtained from 2000 simulated paths. True values are given by T = 100, M = 10, ∆t = 1/M, S = 1000, µ = 0, σ = 0.82, α = 2, β = 1, λ_∞ = 1, µ_J = −2.5, σ_J = 4.0.

Table 2: This table reports the parameter estimates of the univariate Hawkes model with exponential decay. We report the true values, means and standard errors.

Parameter    True value    Mean     Std. error
µ            0.00          -        -
σ            0.82          0.807    0.040
α            2.00          2.051    0.255
β            1.00          0.974    0.092
λ_∞          1.00          0.959    0.078
µ_J          -2.50         -        -
σ_J          4.00          -        -


Figure 8: This figure plots the empirical distribution function of the bivariate Hawkes jump-diffusion self- and cross-excitation parameters estimated with a particle filter, obtained from 1500 simulated paths. True parameter values are T = 100, M = 10, S = 2000, P = 20, µ_1 = µ_2 = 0, σ_1 = 0.82, σ_2 = 0.50 with α_1 = α_2 = 2, β_11 = 1, β_12 = 0.5, β_21 = 0.50, β_22 = 1 and λ_1,∞ = λ_2,∞ = 1, µ_J,1 = −2.5, σ_J,1 = 4.0, µ_J,2 = −3.0, σ_J,2 = 4.0.

Figure 8 and Figure 9 plot the empirical distribution functions of the Hawkes jump-diffusion model parameters obtained from estimating the model 1500 times on different data sets generated with the same parameters. Table 3 shows the quantiles of the small sample distribution of the simulation results. Table 4 provides the estimated parameters for the bivariate Hawkes jump-diffusion model. The results indicate that, if we fix some of the parameters to their true values, the parameter estimates can be retrieved with reasonable precision. The estimated means are close to the true values and the standard errors are small.

This concludes the simulation study. We illustrated the performance of the filter and the smoothing procedure for the log-likelihood, and provided a univariate and a bivariate Monte Carlo simulation. We find that, for this particular set of parameters, the filter works well and the parameters can be estimated with reasonable precision. In the next section we describe the data on which we estimate the model in the empirical application.

Figure 9: This figure plots the empirical distribution functions of the diffusion, mean-reversion, baseline intensity, and correlation parameter estimates of the bivariate Hawkes jump-diffusion model, estimated with a particle filter and 1500 simulated paths. True parameter values are T = 100, M = 10, S = 2000, P = 20, µ1 = µ2 = 0, σ1 = 0.82, σ2 = 0.50 with α1 = α2 = 2, β11 = 1, β12 = 0.5, β21 = 0.50, β22 = 1 and λ1,∞ = λ2,∞ = 1, µJ1 = −2.5, σJ1 = 4.0, µJ2 = −3.0, σJ2 = 4.0.


Table 3:
This table reports the quantiles of the small sample distribution of the parameter estimates found using 1500 simulated sample paths. True parameter values are T = 100, M = 10, S = 2000, P = 20, µ1 = µ2 = 0, σ1 = 0.82, σ2 = 0.50 with α1 = α2 = 2, β11 = 1, β12 = 0.5, β21 = 0.50, β22 = 1 and λ1,∞ = λ2,∞ = 1, µJ1 = −2.5, σJ1 = 4.0, µJ2 = −3.0, σJ2 = 4.0.

Parameter    10%      25%      50%      75%      90%
σ1           0.751    0.789    0.832    0.882    0.904
α1           1.920    1.949    1.995    2.045    2.069
β11          0.921    0.959    1.001    1.048    1.072
β12          0.426    0.459    0.511    0.554    0.587
λ1,∞         0.924    0.956    1.009    1.054    1.086
σ2           0.431    0.469    0.513    0.554    0.580
α2           1.822    1.892    1.937    1.986    2.023
β21          0.413    0.445    0.487    0.529    0.568
β22          0.916    0.958    1.003    1.042    1.062
λ2,∞         0.929    0.958    1.003    1.062    1.099
ρ            0.303    0.329    0.381    0.421    0.461


Table 4:
This table reports the parameter estimates of the bivariate Hawkes jump-diffusion model simulations. We report the true values, mean and standard errors. True parameter values are given by T = 100, M = 10, S = 2000, P = 20, µ1 = µ2 = 0, σ1 = 0.82, σ2 = 0.50 with α1 = α2 = 2, β11 = 1, β12 = 0.5, β21 = 0.50, β22 = 1 and λ1,∞ = λ2,∞ = 1, µJ1 = −2.5, σJ1 = 4.0, µJ2 = −3.0, σJ2 = 4.0.

Parameter    True value    Mean      Standard error
µ1           0             -         -
σ1           0.82          0.835     0.080
α1           2             1.994     0.067
β11          1             0.997     0.060
β12          0.50          0.508     0.068
λ1,∞         1             1.009     0.072
µJ1          -2.5          -         -
σJ1          4             -         -
µ2           0             -         -
σ2           0.5           0.512     0.055
α2           2             1.936     0.075
β21          0.5           0.496     0.084
β22          1             0.998     0.059
λ2,∞         1             1.012     0.083
µJ2          -3            -         -
σJ2          4             -         -
ρ            0.39          0.3855    0.086


4 Data

The data for our empirical application consists of high-frequency S&P 500 and FTSE 100 stock index futures returns from January 2012 to December 2012. To calculate the 5-minute intra-day returns we take the difference of the logarithms of the last tick prices observed before two consecutive 5-minute marks. This frequency is regarded as the highest frequency at which market micro-structure issues, such as the bid-ask bounce, can be neglected (Andersen et al., 2001).
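As an illustration of this construction, the sketch below computes the 5-minute percentage log returns from raw tick data, assuming the ticks are held in a pandas Series of prices indexed by timestamp; the series name and the forward-filling of empty intervals are our own choices.

    import numpy as np
    import pandas as pd

    def five_minute_log_returns(ticks: pd.Series) -> pd.Series:
        # Keep the last tick observed in each 5-minute interval (i.e. the last
        # price before the 5-minute mark), then difference the log prices.
        prices = ticks.resample("5min", label="right", closed="right").last()
        prices = prices.ffill()      # carry the last price through empty intervals
        return 100 * np.log(prices).diff().dropna()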

Futures contracts trade on a quarterly cycle: the contracts offered expire in March, June, September, and December. To avoid expiration-related price effects we calculate the stock index futures returns based on the contract closest to maturity and roll over to the next contract on the first day of the month of expiration. For example, suppose it is January 1st. We calculate all intra-day returns using the contract that is closest to maturity, that is, the contract expiring in March. On March 1st, we roll over and calculate the returns based on the contract expiring in June.
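The roll rule can be summarized in a small helper; the sketch below, with our own naming, maps a calendar date to the (year, expiry month) of the contract whose prices we use on that date.

    import datetime as dt

    QUARTERLY_EXPIRIES = (3, 6, 9, 12)   # March, June, September, December

    def front_contract(date: dt.date) -> tuple:
        # Use the contract closest to maturity and roll on the first day of the
        # expiration month: in January and February we use the March contract,
        # and from March 1st onwards the June contract, and so on.
        for month in QUARTERLY_EXPIRIES:
            if date.month < month:
                return date.year, month
        return date.year + 1, QUARTERLY_EXPIRIES[0]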

High-frequency futures returns are particularly well-suited to study financial contagion for several reasons. First, using stock index futures alleviates the infrequent trading problem that occurs when using the underlying equity indexes (Martens, 2002). Stock index prices are based on the last observed transaction prices of the constituents. As a result, the stock index lags price developments, especially during the early hours when there is little volume in the underlying stocks. Second, stock index futures incorporate new information faster than the spot market due to lower transaction costs and less restrictive rules on short selling (Wu, Li, & Zhang, 2005). Strong empirical evidence is provided by Wallace, Kalev, and Lian (2014), who find that during the global financial crisis of 2008 the S&P 500 stock index futures dominated the price discovery process (compared to the underlying). Third, stock index futures trade significantly longer hours than regular stock indexes. Since 1994 the Chicago Mercantile Exchange (CME) has operated the GLOBEX system, on which the (E-mini) S&P 500 futures trade nearly around the clock with only a short break between 4:00pm and 5:00pm (UTC-6). The underlying equity index trades only during regular business hours (8:30am to 3:15pm UTC-6). However, because we study financial contagion we need perfectly synchronous observations, which means that we are limited to the trading hours in which the two markets overlap.

Summary statistics for the high-frequency stock index futures returns of the S&P 500 and FTSE 100 are presented in Table 5.


Table 5:
This table reports the descriptive statistics of the percentage log returns (100 × ∆ log(pt), where pt is the futures price) of the S&P 500 and FTSE 100 stock index futures over the period January–December 2012.

Region    N        Min       Max      Mean         Std       Skewness    Kurtosis
US        26244    -0.388    0.399    8.3·10−5     0.0198    0.2486      60.114
UK        26244    -0.151    0.144    10.8·10−5    0.0188    -0.0268     4.543

The means of both series are not significantly different from zero and the large kurtosis indicates fat tails. Surprisingly, for this sample the skewness of the S&P 500 returns is slightly positive; while uncommon, this is not unheard of (Noh & Kim, 2006; Abhyankar, Copeland, & Wong, 1997).

High-frequency intra-day returns exhibit regular intra-day fluctuations in volatility. The diurnal pattern is U-shaped, with high volatility at the beginning and end of the trading day and lower volatility around lunch. Carefully removing this pattern is important, as the periodic volatility could otherwise be mistaken for jump activity. Several methods have been proposed in the literature, such as cubic splines (Stroud & Johannes, 2014; Weinberg et al., 2007) and flexible Fourier forms (Slimane, Mehanaoui, Kazi, et al., 2014). Because we start from the assumption that the data contain jumps, we have to filter out this regular intra-day pattern using a method that is robust to jumps. Therefore, we follow the method proposed in Boudt, Croux, and Laurent (2008). The main intuition behind the method is to decompose the high-frequency return into a volatility term, a periodic term, and a random noise term, and then standardize the high-frequency returns using a jump-robust estimator of the periodic term; for details, see Appendix A.
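The sketch below illustrates the idea with a simplified jump-robust scale estimator (a scaled median absolute deviation per intra-day time slot); the actual estimator of Boudt, Croux, and Laurent (2008) used in Appendix A is more refined, so this should be read as an illustration rather than as our implementation.

    import numpy as np
    import pandas as pd

    def deseasonalize(returns: pd.Series) -> pd.Series:
        # Group the 5-minute returns by intra-day time slot and estimate the
        # periodic scale of each slot with the MAD, which is robust to jumps.
        slots = pd.Series(returns.index.time, index=returns.index)
        scale = returns.groupby(slots).transform(
            lambda x: np.median(np.abs(x - np.median(x))) / 0.6745
        )
        # Normalize the periodicity factor to unit mean square, then divide.
        factor = scale / np.sqrt((scale ** 2).mean())
        return returns / factor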


5 Results

This section contains the results of the univariate and bivariate empirical applications and some diagnostic statistics. We present and interpret the results and critically evaluate our methodology.

Table 6:
This table presents the parameter estimates of the univariate models estimated on the high-frequency returns of the S&P 500 (US) and the FTSE 100 (UK) in 2012. The number of asterisks (*, **, ***) denotes significance at the 95%, 97.5%, and 99.5% level, respectively. Standard errors are in parentheses.

Parameter    US           UK
σ            0.576***     0.440***
             (0.079)      (0.051)
α            55.177       27.581
             (210.190)    (205.191)
β            35.476       24.96
             (318.47)     (110.142)
λ∞           1.356***     2.615***
             (0.086)      (0.164)
µJ           -0.019       -0.014
             (0.053)      (0.025)
σJ           0.319***     0.083***
             (0.056)      (0.032)

We start with the univariate case. Fixing µ = 0 in both cases leaves us with 6 parameters to be estimated: σ, α, β, λ∞, µJ, σJ. Table 6 presents the parameter estimates for the univariate Hawkes jump-diffusion model estimated with our novel estimation routine based on particle filtering. In both the S&P 500 and the FTSE 100 the large estimated values of β indicate strong self-excitation. We estimate an average of 1.356 jumps per day for the S&P 500 and 2.615 for the FTSE 100. The estimated jump size means are negative and small, indicating that we find a larger number of smaller jumps throughout the day. To find the standard errors we numerically approximate the Hessian at the estimated parameter values. We see that the standard errors found using this method are particularly large for the mean-reversion parameter α and the self-excitation parameter β. The estimates for the volatility σ, the background intensity λ∞, and the jump standard deviation σJ are significant at the 0.1% level.
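For completeness, the sketch below shows how such standard errors can be obtained, assuming a function neg_loglik(theta) that returns the (smoothed) negative log-likelihood and the estimate theta_hat; both names are our own, and the step size is illustrative.

    import numpy as np

    def hessian_std_errors(neg_loglik, theta_hat, eps=1e-4):
        # Central-difference approximation of the Hessian of the negative
        # log-likelihood at theta_hat; the square roots of the diagonal of its
        # inverse are the asymptotic standard errors.
        k = len(theta_hat)
        H = np.zeros((k, k))
        f0 = neg_loglik(theta_hat)

        def shifted(i, h, j=None, g=0.0):
            t = np.array(theta_hat, dtype=float)
            t[i] += h
            if j is not None:
                t[j] += g
            return t

        for i in range(k):
            H[i, i] = (neg_loglik(shifted(i, eps)) - 2.0 * f0
                       + neg_loglik(shifted(i, -eps))) / eps ** 2
            for j in range(i + 1, k):
                H[i, j] = H[j, i] = (
                    neg_loglik(shifted(i, eps, j, eps))
                    - neg_loglik(shifted(i, eps, j, -eps))
                    - neg_loglik(shifted(i, -eps, j, eps))
                    + neg_loglik(shifted(i, -eps, j, -eps))
                ) / (4.0 * eps ** 2)
        return np.sqrt(np.diag(np.linalg.inv(H)))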


Table 7:
This table presents the parameter estimates of the bivariate model estimated on the high-frequency returns of the S&P 500 (US) and the FTSE 100 (UK) in 2012. The number of asterisks (*, **, ***) denotes significance at the 95%, 97.5%, and 99.5% level, respectively. Standard errors are in parentheses. Note that we have restricted α1 = α2 and λ1,∞ = λ2,∞ and fixed µ1 = µ2 = 0.

Parameter    US           Parameter    UK
σ1           0.624***     σ2           0.459***
             (0.021)                   (0.065)
α1           42.233       α2           42.233
             (66.834)                  (80.832)
β11          18.454       β21          4.532
             (201.304)                 (102.384)
β12          15.469       β22          28.584
             (128.654)                 (90.340)
λ1,∞         2.065***     λ2,∞         2.065***
             (0.065)                   (0.113)
µJ1          -0.058       µJ2          -0.121
             (0.0326)                  (0.0784)
σJ1          0.205*       σJ2          0.201*
             (0.1035)                  (0.092)
ρ            0.392***
             (0.092)

We now consider the bivariate case. Fixing µ1 = µ2 = 0, α1 = α2, and λ1,∞ = λ2,∞ leaves us with 13 parameters to be estimated in total: σ1, σ2, α = α1 = α2, β11, β12, β21, β22, λ∞ = λ1,∞ = λ2,∞, µJ1, σJ1, µJ2, σJ2, and ρ. Compared with the univariate case, the volatility estimate for the S&P 500 is slightly higher while that of the FTSE 100 is roughly the same. The mean-reversion parameter α lies in between the two univariate estimates. The parameters we are most interested in are the self- and cross-excitation parameters β11, β12, β21, β22, which measure the degree of contagion between the markets. Similar to the univariate case, we find that the S&P 500 shows strong self-excitation (indicated by the large β11) but also considerable cross-excitation, as indicated by the large value of β12. On the other hand, the FTSE 100 also exhibits strong self-excitation but little to no cross-excitation towards the US market, as indicated by the comparatively low β21. This finding is consistent with Aït-Sahalia et al. (2015), who find a similar asymmetric cross-excitation between the markets, but at the daily frequency. As in the univariate case, the volatility estimates σ1, σ2, the background intensities λ1,∞, λ2,∞, the jump size standard deviations σJ1, σJ2, and the correlation parameter ρ are all estimated significantly. The Hawkes parameters and the jump means are not significant. We note that the estimates of the jump means µJi, i = 1, 2, are considerably larger in absolute value in the bivariate case than in the two separate univariate cases.

Figure 10: This figure plots the path of the intensities for the S&P 500 (top panel) and the FTSE 100 (bottom panel) using the estimated parameters from the two separate univariate Hawkes jump-diffusion models.

Using the estimated parameters from the univariate Hawkes jump-diffusion models, we plot the time-varying and path-dependent jump intensity in Figure 10. The figure indicates the presence of jump clustering: periods with relatively few jumps alternate with periods of significant jump activity. The top panel of Figure 10 shows the estimated path of the jump intensity of the S&P 500 stock index futures and the bottom panel shows the estimated path of the jump intensity of the FTSE 100 stock index futures, both estimated using the parameter values from the two separate univariate cases.

Figure 11 shows the path-dependent intensity for the S&P 500 and the FTSE 100, but now using the estimated parameters of the bivariate Hawkes jump-diffusion model.


Figure 11: This figure plots the path of the intensities for the S&P 500 (top panel) and the FTSE 100 (bottom panel) using the estimated parameters from the bivariate Hawkes jump-diffusion model.

Figure 12: This figure plots the filtered jumps for the S&P500 and the FTSE 100 based on the estimated parameters in the univariate and bivariate model. The top two panels contain the filtered jumps of the univariate model and the bottom two panels contain the filtered jumps based on the bivariate model.
