
The Heston Model: Theory, Numerical Issues and Calibration


Acknowledgements

In the first place I thank my parents for their faithful support and encouragement during my studies.

I am very grateful to my supervisors dr. Nieuwenhuis and dr. Vellekoop, for suggesting the topic, spending a large amount of time on reading the draft of this thesis and for providing very useful comments throughout the duration of the project.

As this project has been initiated by The Derivatives Technology Foundation, most of the work was done in Amsterdam amongst the staff of AtomPro and Saen Options. I would like to thank everyone as I enjoyed my stay quite a lot, in particular dr. Van den Hijligenberg and dr. De Geeter for supervision, comments and for providing the necessary data. Special thanks go out to dr. Gaál for his interest in the project, his valuable on-the-spot suggestions and for occasionally providing the necessary distraction.

Of course I am also very much indebted to the staff of the Econometrics department at the University of Groningen, who took care of my mathematical and statistical education during the last five years. This is a proper place to express my gratitude for many interesting and challenging lectures I enjoyed. I would also like to express thanks to the staff of the History department, and in particular to my mentor dr. Wolffram, for encouraging me to pursue my studies of this beautiful subject in spite of many complications.

Many thanks also go out to the teachers at the Murmellius Gymnasium in Alkmaar who not only provided our class with the necessary skills for academic study, but were very keen at stimulating our interests in both arts and sciences at every possible occasion.


Contents

1 Outline
2 Stochastic Volatility Models
  2.1 General specification and important models
  2.2 Pricing
  2.3 Hedging
3 The Heston Model
  3.1 Properties of the Volatility SDE
  3.2 Pricing formula for European Call
    3.2.1 Extension: piecewise constant parameters
  3.3 Assessment of the Heston Model in previous works
4 Methods
  4.1 Fourier Transform Methods: the Heston (1993) approach
  4.2 Fourier Transform Methods: the Carr-Madan (1998) approach
  4.3 Monte Carlo Methods
5 The Gatheral (2005) calibration procedure
  5.1 Expression for BS implied variance under stochastic volatility
  5.2 Link between implied volatility and local volatility
  5.3 From local volatility to instantaneous volatility and the smile-parameter relationships
  5.4 Putting the relationship to use
6 Calibration
  6.1 Data
  6.2 Calibration Settings
    6.2.1 Optimization Routine
    6.2.2 Objective function
    6.2.3 Initial Value
    6.2.4 Weights
    6.2.5 Fit Measures
  6.3 Dividends
  6.4 Test Data
    6.4.1 Graphical illustration
    6.4.2 Simulation study
  6.5 AEX Index Data
    6.5.1 The basic Heston model
    6.5.2 Parameter Stability
    6.5.3 Piecewise Constant Parameters
  6.6 Additional calibration results
7 Applications
8 Conclusion
9 Appendix
  9.1 Derivations for the Heston model with piecewise constant parameters
  9.2 Local Volatility
    9.2.1 Main Result
    9.2.2 Relationship with instantaneous volatility


1 Outline

In the past twenty years numerous attempts have been made to relax the assumptions of the Black-Scholes (1973) option pricing formula, in particular the assumption of constant volatility [Hull and White (1987), Scott (1987), Stein and Stein (1991), Heston (1993), Ball and Roma (1994)]. Empirical studies based on different types of data show this assumption is counterfactual [Macbeth and Merville (1979), Jackwerth and Rubinstein (1996), Bakshi et al (1997), Fouque et al. (2000), Bree and Joseph (2003)]. The most quoted evidence consists of the sloping implied volatility surfaces observed in cross-sectional option data, the nonnormal distribution of the logarithm of asset returns and the historical time series properties of the variance of the stock returns process. Let us first formalize the implied volatility surface argument. Assume we have, under the risk-neutral measure Q, a predictable stock price process {St}, 0 ≤ t ≤ T governed by

dS(t) = rS(t)dt + σS(t)dW(t)   (1)

with respect to the filtration {F_t, 0 ≤ t ≤ T} generated by the Brownian motion W(t). The starting point is the Black-Scholes formula for the European Call Option

C_BS(S(t), T − t, r, σ, K) = S(t)N(d+(T − t, S(t))) − e^{−r(T−t)} K N(d−(T − t, S(t)))   (2)

d+(T − t, S(t)) = [log(S(t)/K) + (r + σ²/2)(T − t)] / (σ√(T − t))   (3)

d−(T − t, S(t)) = [log(S(t)/K) + (r − σ²/2)(T − t)] / (σ√(T − t))   (4)

and N(·) denotes the cumulative distribution function of the standard normal distribution. Define σ_Imp(C_M, S(t), T − t, r, K) to be that value of the Black-Scholes volatility parameter which makes the Black-Scholes Call option pricing formula fit the market price of a Call option C_M with strike price K and time to maturity T − t, while the value of the underlying is S(t) and the interest rate equals r:

σ_Imp := {σ ∈ R+ : C_M = C_BS(S(t), T − t, r, σ, K)}.   (5)

We can plot the value of σ_Imp across the grid of strikes and maturities (K, T) we observe in the market. Whereas theoretically this should be a flat surface in three-dimensional space, in practice it is curved along both the strike and the maturity dimension [Bakshi et al. (1997), Fouque et al. (2000), Gatheral (2006)]. The phenomenon is demonstrated in Figures 1 and 2 for empirical cross-section data on AEX index options, dating from May 2, 2006. The curved shape of implied volatility across the maturity dimension can be partly attributed to idealized assumptions of the basic Black-Scholes model other than the assumption of constant volatility; one may, for example, think of the constant interest rate assumption [Bakshi et al. (1997)]. Across the moneyness dimension, however, these effects are neutralized and the constant σ assumption can clearly be seen to be quite an idealization of reality.
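Equation (5) has no closed-form solution for σ, but since the Black-Scholes price is strictly increasing in volatility it is a one-dimensional root-finding problem. The following sketch, with hypothetical input numbers rather than actual AEX quotes, backs out a single implied volatility with a bracketing root-finder.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def bs_call(S, K, tau, r, sigma):
    """Black-Scholes call price, equations (2)-(4)."""
    d_plus = (np.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * np.sqrt(tau))
    d_minus = d_plus - sigma * np.sqrt(tau)
    return S * norm.cdf(d_plus) - K * np.exp(-r * tau) * norm.cdf(d_minus)

def implied_vol(C_market, S, K, tau, r):
    """Solve equation (5): find sigma such that the BS price matches the market price."""
    return brentq(lambda sig: bs_call(S, K, tau, r, sig) - C_market, 1e-6, 5.0)

# hypothetical quote: at-the-money call with 1.12 years to maturity
print(implied_vol(C_market=11.50, S=100.0, K=100.0, tau=1.12, r=0.03))
```

Repeating this for every (K, T) pair in the option universe produces the implied volatility plots of Figures 1 and 2.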

From Merton's (1973) general formulas we have the positive relationship between volatility and the price of an option, and we can also easily derive that the derivative of the Black-Scholes Call option price with respect to volatility has positive sign. Based on these notions we can draw a number of conclusions from our empirical evidence in Figure 1, but for clarity we will first introduce some definitions. Defining F(t) = e^{r(T−t)}S(t) to be the Forward Price of the stock, we henceforth name a Call option contract out-of-the-money (OTM) if F << K, at-the-money (ATM) if F ≈ K and in-the-money (ITM) if K << F.


Figure 1: Implied volatilities for AEX Index Calls, time-to-maturity 0.125 years and 1.12 years


relative values in the Black-Scholes world: for an absolute comparison we would have to specify a flat Black-Scholes volatility somehow, which is not realistic. For time-to-maturity 1.12 we see that the relative value of out-of-the money options is lower as compared to at-the-money and in-the-money options. This phenomenon is called a volatility skew. Figure 1 is quite representative for general market conditions as implied volatilities of options with short time-to-maturity often smile, whereas those with longer time-to-maturity often display a skew.

The implied volatilities reflect traders' preferences. Before the 1987 stock market crash, smiles and skews were virtually absent and the Black-Scholes assumptions worked rather well. However, this crash, leading to a decline of about twenty percent on a single day, demonstrated that the assumption of lognormal stock returns is not a good description of reality. From that period on, traders focused on the empirical log returns distribution and found it to be leptokurtic (that is, with fatter tails than the normal distribution, and more peaked) and often also skewed (not symmetric). Heston (1993) discusses the effects of taking skewness and kurtosis of the log returns distribution into account. He compares the resulting option prices to Black-Scholes option prices. He distinguishes two main effects. If kurtosis is controlled for and a distribution is skewed to the left (meaning it has a fat left tail), the relative prices of out-of-the-money options decrease as compared to at-the-money and in-the-money options. Intuitively, this is due to the fact that, simultaneously, the right tail will now contain less probability mass and hence the probability of a large upward move of the stock is lower than in the Black-Scholes model. Out-of-the-money options can only gain value if such an upward move does occur and hence it is reasonable to assign to them a relatively lower price if the log returns distribution is skewed to the left.

If skewness is controlled for and kurtosis is increased, this will lead to relatively higher values of in-the-money and out-of-the-money options as compared to at-the-money options. Intuitively, in-the-money options benefit as they are less vulnerable to a large price decrease, which is now more likely due to a fat left tail, than at-the-money options. Out-of-the-money options benefit because the right tail of the log returns distribution is now also fatter, which makes it more likely that they get in-the-money again.

Figure 3 is a typical histogram of daily log returns. It is skewed to the left and leptokurtic. Following our intuitive explanation, this will lead to relatively higher in-the-money option prices and hence to relatively high in-the-money implied volatilities, which is what we observe in Figure 1. In this figure the volatility of out-of-the-money options is higher than the volatility of at-the-money options for the shorter time-to-maturity: clearly the kurtosis effect is dominant here. It can be interpreted intuitively as the fear of a crash on the short term. For the longer time-to-maturity the skewness effect is dominant. It appears traders have different beliefs about the log returns distribution on the short and on the long run.

There are more reasons why the constant volatility assumption is an idealization, which become apparent from time series. In Figure 4 we have graphed a typical process of return volatility. We see that it is mean-reverting, i.e. it hovers around a fixed mean. A realistic market model should also possess this property in addition to allowing for a leptokurtic, skewed returns distribution. Moreover, modeling the correlation between the log returns and their volatility is very important as this is closely linked to the skewness in the log returns distribution. If there is a negative correlation between log returns and volatility, this will mean that volatility will rise if the stock price decreases and drop if the stock price increases. This could explain a fat left tail and a short right tail as the stock becomes more volatile as it goes down and less volatile if it goes up. For positive correlation, this argument can be reversed.


Figure 3: Typical histogram of daily log stock returns


2 Stochastic Volatility Models

2.1 General specification and important models

The most common type of models proposed for modeling the empirical findings mentioned in Chapter 1 postulates a second diffusion to model volatility. In a general form, under the market probability measure P

dS(t) = µ_S S(t)dt + f(V(t))S(t)dW1(t)   (6)

dV(t) = µ_V(t, S(t), V(t))dt + σ_V(t, S(t), V(t))dW2(t)   (7)

on a time axis T = [0, T], with {W1(t)} and {W2(t)} Brownian motions T × Ω → R with instantaneous correlation coefficient ρ and positive initial values for S(0) and V(0). Moreover we have the filtration {F_t}, 0 ≤ t ≤ T, generated by the two-dimensional Brownian motion {W1(t), B2(t)} from which we construct our correlated Brownian motions.1 The function f is assumed to satisfy regularity conditions; common specifications are for example f(y) = √y or f(y) = e^y. We assume there also is a bank account M(t) which evolves according to dM(t) = rM(t)dt.

The first papers using this framework are those of Hull and White (1987), Scott (1987) and Johnson and Shanno (1987). The most important specifications are listed in Table 1 (all parameters in this table are constant).2 All specifications use equation (6) for the stock price process. One attempts to obtain a parsimonious formulation for the volatility equation, which implies that µ_V and σ_V are chosen to be dependent on V(t), but not on t and S(t). This is due to the fact that volatility is not directly observed in the market and model fitting, either from cross-section option data or from historical time series of returns, is bound to be difficult anyway. In the following we will suppress the dependence for notational simplicity and write µ_V = µ_V(V(t)), σ_V = σ_V(V(t)). Except for a Taylor series expansion in the Hull-White model for the special case with ρ = 0 and µ_V and σ_V constant, no closed-form formulae are derived in these papers and option prices are calculated by means of Monte Carlo simulation, as discussed in section 4.3.

Table 1: Model specifications in the early stochastic volatility literature

model          volatility driving process           link function
Hull-White     dV = µ_V V dt + σ_V V dW2            f(V) = √V
Scott          dV = β(V̄ − V)dt + σ_V dW2            f(V) = V, f(V) = e^V
Stein-Stein    dV = −δ(V − V̄)dt + k dW2             f(V) = |V|
Heston         dV = κ(θ − V)dt + ω√V dW2            f(V) = √V

The ideas proposed in the early literature have been elaborated upon in subsequent years, with analytical formulae for the option price becoming the primary target for the researchers. Stein and Stein (1991) introduced Fourier Transform methods to obtain an analytical formula for the arithmetic Ornstein-Uhlenbeck volatility process with ρ = 0 (see Table 1).3 Heston (1993) managed to generalize the Fourier Transform approach to the case of correlated Brownian motions, modeling variance as a Cox-Ingersoll-Ross process. This procedure and its subsequent extensions (most notably Carr and Madan (1999)) have been the standard pricing methods in this model class ever since [cf. Bakshi et al. (1997), Mikhailov and Noegel (2003), Schoutens et al. (2003)]. Ball and Roma (1994) came up with a power expansion method as in Hull and White (1987). They generalized it to different specifications of the volatility equation, and found results similar to the Fourier Transform approach, but their method is still restricted to ρ = 0. It will not be elaborated here, as this restriction is at odds with empirical evidence in equity option markets.

1 If {W1(t), B2(t)} is a two-dimensional Brownian motion, then W1(t) and W2(t) = ρW1(t) + √(1 − ρ²)B2(t) are Brownian motions with instantaneous correlation coefficient ρ.

2 See also Fouque et al. (2000) for an overview.

3 Ball and Roma comment on a flaw in this paper due to improper handling of negative values of volatility, see


Before continuing, we should ask ourselves how these models could help us to capture the empirical properties we mentioned in Chapter 1. The mean-reversion property of the time series of the variance of asset returns will be captured by the volatility specifications of all models in Table 1 except the Hull-White model.4 A simulated path of volatility in the Heston model is displayed in Figure 5. Moreover, the parametrization of stochastic volatility models allows for a wide range of distributions of the logarithm of asset returns at maturity, featuring both skewness and kurtosis (Figure 6) and thus for the volatility smile and/or skew (Figure 7). A detailed discussion of these properties is presented in Chapter 3.

2.2 Pricing

In stochastic volatility models we can apply the conventional option pricing methods, being the Equivalent Martingale Measure (EMM) approach and the Self-Financing Portfolio approach. However, as the models are not complete, we require additional assumptions. Suppose we want to find the price of a European Call using the EMM approach. We assume there is an equivalent martingale measure Q such that under Q the discounted stock price S̃(t) = S(t)/M(t) is a martingale with respect to the filtration generated by the two-dimensional Brownian motion. By a fundamental argument5 it holds that if we price any option with maturity T at time t ∈ T using the expression C(t, S(t), V(t)) = E^Q(e^{−r(T−t)}C(T, S(T), V(T))|F_t), then there will be no arbitrage opportunity, which makes C(t, S(t), V(t)) a possible price for this option. Note that, by means of a standard derivation, under the market probability measure we can write equation (6) in the form

dS̃(t) = (µ_S − r)S̃(t)dt + f(V(t))S̃(t)dW1(t)   (8)

We apply Girsanov's Theorem6 to derive the risk-neutral probability measure. A useful thing to note, in order to arrive at the standard setting of this theorem, is the fact that two correlated Brownian motions W1(t), W2(t) can be written as

W1(t) = B1(t),   W2(t) = ρ ∫_0^t dB1(u) + √(1 − ρ²) ∫_0^t dB2(u)   (9)

where B1(t) and B2(t) are independent Brownian motions.7 Taking this into account we can rewrite equations (6)-(7) as

dS̃(t) = (µ_S − r)S̃(t)dt + f(V(t))S̃(t)dW1(t)   (10)

dV(t) = µ_V dt + σ_V(ρ dW1(t) + √(1 − ρ²) dB2(t))   (11)

where the Brownian motions W1(t) and B2(t) are independent.

Define W̃1(t) = W1(t) + ∫_0^t (µ_S − r)/f(V(u)) du, a Brownian motion with drift, to obtain

dS̃(t) = f(V(t))S̃(t)dW̃1(t)   (12)

dV(t) = (µ_V − σ_V ρ(µ_S − r)/f(V(t)))dt + σ_V(ρ dW̃1(t) + √(1 − ρ²) dB2(t))   (13)

According to the two-dimensional version of Girsanov's Theorem, writing (µ_S − r)/f(V(s)) = θ(s) and using the Radon-Nikodym derivative process

Z(t) = exp{−∫_0^t θ(u)dW1(u) − (1/2)∫_0^t θ(u)²du},   (14)

4 We will discuss this in detail in section 3.1.
5 See Fouque et al. (2000), sections 1.4 and 2.5.
6 E.g. in Shreve (2004), p. 224.


Figure 5: Simulation of implied volatility in the Heston model, for 1000 days. Parameter values κ = 5, θ = 0.05 (corresponding to a long-term implied volatility of 0.2236), ω = 0.2, V0 = 0.05

Figure 6: Simulated histogram of log returns at maturity in the Heston Model. Parameter values κ = 5, θ = 0.05, ω = 0.2, ρ = −0.8, V0 = 0.05. Time to Maturity: 1 year, 10,000 simulations.


Figure 7: Black-Scholes Implied volatilities for Option Contracts with Time to Maturity T = 1, for S(0) = 100, following from the Heston model with parameters κ = 5, θ = 0.05, ω = 0.2, ρ = −0.8, V0 = 0.05

{W̃1(t), B2(t)} is a two-dimensional Brownian motion under Q̃, where Q̃ is defined by Q̃(A) = ∫_A Z(T)(ω)dP(ω), ∀A ∈ F. Under this probability measure, stochastic differential equation (12) for the stock price process consists of an Ito integral only, which is a martingale; hence the probability measure Q̃ is indeed an equivalent martingale measure. Note that this is only one of infinitely many such measures. Define Q̃_γ to be the probability measure constructed analogously, but instead using the derivative process

Z(t) = exp{−∫_0^t θ(u)dW1(u) − ∫_0^t γ(u)dB2(u) − (1/2)∫_0^t θ(u)²du − (1/2)∫_0^t γ(u)²du}   (15)

where γ(t) is any suitably regular adapted process on [0, T]. Now {W̃1(t), B2(t) + ∫_0^t γ(u)du} is a two-dimensional Brownian motion under Q̃_γ, and equation (12) will once more be a martingale. Actually the system (12)-(13) is but a special (albeit important) case of a more general specification which features B̃2(t) = B2(t) + ∫_0^t γ(u)du instead of B2 in the second equation. By substitution we get

dS̃(t) = f(V(t))S̃(t)dW̃1(t)   (16)

dV(t) = (µ_V − σ_V(ρ(µ_S − r)/f(V(t)) + γ(t)√(1 − ρ²)))dt + σ_V(ρ dW̃1(t) + √(1 − ρ²) dB̃2(t))   (17)

Any specification of γ(t) in (17) produces an associated equivalent martingale measure Q̃_γ, which leads to option prices C(t, S(t), V(t)) = E^{Q̃_γ}(e^{−r(T−t)}C(T, S(T), V(T))|F_t) with, once more, an arbitrage-free market. The process γ(t) is coined the process of the 'market price of volatility risk'. It is not the task of the theorist to determine the 'correct value' of this process, and thus the 'correct' Equivalent Martingale Measure, at any time t. This measure is chosen by the market and it has to be retrieved empirically from market prices.8 It is common to postulate a specification for the γ(t) process, like for example γ(t) = λV(t) with λ constant, which is economically plausible and keeps the stochastic volatility model tractable analytically.9 The parameters of the chosen

8Bjork (2004) 220-223.


specification are then estimated from the prices of option contracts at a specific moment in time and the plausibility of the resulting model can be assessed by looking at its fit to the market prices. As γ(t) does not feature in the real-world models for the stock price10, the parameters used to model this process are not of major interest.11
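To make the role of γ(t) concrete, the following sketch simulates the system (16)-(17) with Heston-type coefficients (µ_V = κ(θ − v), σ_V = ω√v, f(v) = √v) for two different, arbitrarily chosen volatility-risk processes; all parameter values are illustrative assumptions. In both cases the sample mean of the discounted stock price at T stays at its initial value, illustrating that every admissible choice of γ yields an equivalent martingale measure.

```python
import numpy as np

rng = np.random.default_rng(0)

def discounted_stock_mean(gamma, n_paths=200_000, n_steps=250, T=1.0):
    """Euler simulation of (16)-(17) with mu_V = kappa*(theta - v), sigma_V = omega*sqrt(v),
    f(v) = sqrt(v); returns the Monte Carlo estimate of E[S~(T)] under Q~_gamma."""
    kappa, theta, omega, rho = 2.0, 0.05, 0.3, -0.7   # illustrative Heston-type coefficients
    mu_S, r = 0.10, 0.02                              # assumed market drift and risk-free rate
    dt = T / n_steps
    s = np.full(n_paths, 100.0)                       # discounted stock price, S~(0) = S(0)
    v = np.full(n_paths, 0.05)
    for _ in range(n_steps):
        dw1 = np.sqrt(dt) * rng.standard_normal(n_paths)
        db2 = np.sqrt(dt) * rng.standard_normal(n_paths)
        vol = np.sqrt(np.maximum(v, 0.0))
        # the volatility-risk process gamma only enters the drift of the variance equation (17)
        drift_v = kappa * (theta - v) - omega * rho * (mu_S - r) \
                  - omega * vol * gamma(v) * np.sqrt(1 - rho ** 2)
        s = s + vol * s * dw1
        v = v + drift_v * dt + omega * vol * (rho * dw1 + np.sqrt(1 - rho ** 2) * db2)
    return s.mean()

print(discounted_stock_mean(lambda v: np.zeros_like(v)))  # gamma = 0
print(discounted_stock_mean(lambda v: 0.5 * v))           # gamma(t) = 0.5 V(t)
```

Both printed values are close to 100 up to Monte Carlo error, even though the variance dynamics, and hence the option prices, differ between the two measures.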

2.3 Hedging

Stochastic volatility models contain two sources of randomness (two independent Brownian motions) and one underlying traded asset other than the risk-free asset (the stock). Hence, by e.g. Theorem 8.3.1 in Bjork (2004), a European call option cannot be replicated by a self-financing portfolio using the stock and the bank account.12 However, we can hedge with a combination of stock, bank account and another option on the same stock, with identical or longer time to maturity. Assume there exists an option pricing formula that can be used to price all European call options on a certain stock at a certain time moment, C_i = C_i(t, S(t), V(t)), where we index the options with different strikes and maturities by i. In particular, consider two maturities T1 and T2, such that T1 < T2, and one fixed strike K, and denote the corresponding option prices by C1(t, S(t), V(t)) and C2(t, S(t), V(t)). We can now construct a self-financing portfolio process which replicates C1 by means of S(t), M(t) and C2(t, S(t), V(t)). We look for adapted processes {w_j(t, S(t), V(t))}, 0 ≤ t ≤ T1, 1 ≤ j ≤ 3 (abbreviated to w_j(t) for convenience) with initial values w_j(0, S(0), V(0)) such that

dC1(t, S(t), V(t)) = w1(t)dS(t) + w2(t)dM(t) + w3(t)dC2(t, S(t), V(t))   (18)

C1(T1, S(T1), V(T1)) = w1(T1)S(T1) + w2(T1)M(T1) + w3(T1)C2(T1, S(T1), V(T1))   (19)

If we can find such processes then, by the no-arbitrage principle, for 0 ≤ t ≤ T1 the option price is given by

C1(t, S(t), V(t)) = w1(t)S(t) + w2(t)M(t) + w3(t)C2(t, S(t), V(t))   (20)

To find the adapted processes we apply the two-dimensional Ito formula13 to write out the self-financing condition (18). The subscripts in the formula denote derivatives with respect to the variables and we will abbreviate C_j(t, S(t), V(t)) to C_j. The result is

C1_t dt + C1_S dS(t) + C1_V dV(t) + (1/2)C1_SS dS(t)dS(t) + C1_SV dS(t)dV(t) + (1/2)C1_VV dV(t)dV(t) =   (21)

= w1(t)dS(t) + w2(t)dM(t) + w3(t)(C2_t dt + C2_S dS(t) + C2_V dV(t) + (1/2)C2_SS dS(t)dS(t) + C2_SV dS(t)dV(t) + (1/2)C2_VV dV(t)dV(t))   (22)

We can use dM(t) = re^{rt}dt and the fact that the quadratic variation [X, X](t) of an Ito process dX(t) = a(t)dW(t) + b(t)dt is given by ∫_0^t a(u)²du to rearrange and obtain

(C1_t + (1/2)C1_SS f(V(t))²S(t)² + C1_SV ρσ_V f(V(t))S(t) + (1/2)σ_V² C1_VV)dt + C1_S dS(t) + C1_V dV(t) =   (23)

= (w2(t)re^{rt} + w3(t)(C2_t + (1/2)C2_SS f(V(t))²S(t)² + C2_SV ρσ_V f(V(t))S(t) + (1/2)σ_V² C2_VV))dt + (w1(t) + w3(t)C2_S)dS(t) + w3(t)C2_V dV(t)   (24)

10 Fouque et al. (2000), 50.

11 For the Heston model that we will be fitting in the following chapters, it is common not to report the estimates of the market price of volatility risk process, but instead to rewrite the model in such a way that the parameters of this process are absorbed in the parameters which do feature in the real-world model. Compare e.g. Bakshi et al. (1997), Schoutens et al. (2003), Moodley (2005). In Mikhailov and Noegel (2003) the volatility risk parameter is estimated separately, however. We will follow the former approach.


To simplify notation, define L(C) = C_t + (1/2)C_SS f(V(t))²S(t)² + C_SV ρσ_V f(V(t))S(t) + (1/2)σ_V² C_VV; upon substitution in (23) and (24) this gives

L(C1)dt + C1_S dS(t) + C1_V dV(t) = (w2(t)re^{rt} + w3(t)L(C2))dt + (w1(t) + w3(t)C2_S)dS(t) + w3(t)C2_V dV(t)   (25)

This equality holds only in the case that the coefficients in front of the dt, dS(t) and dV(t) terms are equal, for the last two terms have independent Brownian motion components as long as ρ ∈ (−1, 1): we cannot replicate a Brownian motion by other means than by itself. From this it becomes clear that the processes {w1(t)} and {w3(t)} should satisfy the equations

C1_V = w3(t)C2_V   (26)

w1(t) + w3(t)C2_S = C1_S   (27)

which imply

w3(t) = C1_V / C2_V   (28)

w1(t) = C1_S − C2_S (C1_V / C2_V)   (29)

and from the self-financing equation (20) we have

w2(t) = (C1 − w1(t)S(t) − w3(t)C2) / M(t)   (30)

from which we obtain by substitution

w2(t) = (C1 − C1_S S(t) + (C1_V / C2_V)(C2_S S(t) − C2)) e^{−rt}   (31)

The only term which remains to be equated now is the dt term. Apparently we have to solve, under the boundary condition C1(T1, S(T1), V(T1)) = max(S(T1) − K, 0),14

L(C1) = r(C1 − C1_S S(t) + (C1_V / C2_V)(C2_S S(t) − C2)) + (C1_V / C2_V)L(C2)   (32)

which is equivalent to

(L(C1) − rC1 + rS(t)C1_S) − (C1_V / C2_V)(L(C2) − rC2 + rS(t)C2_S) = 0   (33)

Note now that a function satisfying L(C) − rC + rS(t)C_S = 0 will also satisfy (33). Moreover, the solution is not unique, as the solution to the partial differential equation L(C) − rC + rS(t)C_S − C_V a(t, S(t), V(t)) = 0, where a is an arbitrary function, is also a solution of the equation.15

This problem directly corresponds to the one we encountered in the previous section. The value of the process a(t, S(t), V(t)) at given t has to be retrieved from market data on the prices of derivative contracts, using for example one of the stochastic volatility models discussed, specifying a functional form for a and estimating its parameters. Note that if we choose, without loss of generality, the specification

a(t, S(t), V(t)) = µ_V − σ_V(ρ(µ_S − r)/f(V(t)) + γ(t, S(t), V(t))√(1 − ρ²))   (34)

for a, the corresponding solution of (33) will be the solution to the two-dimensional Feynman-Kac partial differential equation satisfied by C(t, S(t), V(t)) = E^{Q̃_γ}(e^{−r(T−t)}C(T, S(T), V(T))|F(t)), the price process we derived in the previous section.
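In practice the weights (28)-(29) are computed from whatever pricing function the model provides, with the sensitivities C_S and C_V approximated numerically. The sketch below assumes a generic pricing routine price(t, S, V, K, T) (for instance the Fourier pricer of section 4.1); the function name and the step sizes are illustrative assumptions.

```python
def hedge_weights(price, t, S, V, K, T1, T2, h_S=1e-2, h_V=1e-4):
    """Replicating weights (28)-(29) for the call (K, T1), hedged with the call (K, T2).
    The partial derivatives C_S and C_V are approximated by central finite differences."""
    def C_S(T):
        return (price(t, S + h_S, V, K, T) - price(t, S - h_S, V, K, T)) / (2 * h_S)
    def C_V(T):
        return (price(t, S, V + h_V, K, T) - price(t, S, V - h_V, K, T)) / (2 * h_V)
    w3 = C_V(T1) / C_V(T2)            # equation (28): ratio of volatility sensitivities
    w1 = C_S(T1) - C_S(T2) * w3       # equation (29): residual delta held in the stock
    return w1, w3
```

The remaining weight w2(t) then follows from the self-financing relation (30), i.e. from the cash position that makes the portfolio value match C1.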


3 The Heston Model

Many different stochastic volatility models have been proposed in the past 20 years, as has become clear from our literature review. Instead of implementing and calibrating a large number of models, we have chosen to analyze one model in-depth, using the newest calibration and pricing methods. We have chosen the Heston (1993) model for this purpose, as it is very widely used, both in the scientific literature and by practitioners [e.g. Bakshi et al. (1997), Detlefsen (2005), Gatheral (2006)]. Under the market measure P , the evolution of stock and volatility is given by:

dS(t) = µS(t)dt + √(V(t)) S(t)dW1(t)   (35)

dV(t) = κ(θ − V(t))dt + ω√(V(t)) dW2(t)   (36)

The Brownian motions satisfy [W1, W2](t) = ρt and the parameters are assumed to be constant and positive, except for ρ, which is in (−1, 1). Note that the second diffusion models the square of the volatility, as we have a relationship of the form σ(t) = f(V(t)) = √(V(t)).

Applying the procedure from section 2.2, transformation to the risk-neutral measure Q̃_γ yields the system

dS(t) = rS(t)dt + √(V(t)) S(t)dW̃1(t)   (37)

dV(t) = (κ(θ − V(t)) − ω(ρ(µ − r) + √(V(t))γ(t)√(1 − ρ²)))dt + ω√(V(t))(ρ dW̃1(t) + √(1 − ρ²) dB̃2(t))   (38)

where W̃1 and B̃2 are independent Brownian motions under Q̃_γ. It is assumed that the market price of volatility risk satisfies the relationship γ(t) = λV(t) in order for the partial differential equation corresponding to the model (see section 3.2) to have a tractable solution.

The model under the risk-neutral measure is usually written in the form (39)-(40), where the parameters marked with a star are different from those in the specification of the model under the market measure P. It is demonstrated in Heston (1993), page 335, that κ* = κ + λ and θ* = κθ/(κ + λ). This can be seen directly from the fundamental PDE corresponding to this model. Note that we will henceforth use the parameters corresponding to (39)-(40) throughout our work. In subsequent sections we will drop the stars.

dS(t) = rS(t)dt + √(V(t)) S(t)dW̃1(t)   (39)

dV(t) = κ*(θ* − V(t))dt + ω√(V(t)) dW̃2(t)   (40)
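As a quick illustration of the distributional claims of section 2.1 (and of Figure 6), the following sketch simulates (39)-(40) with a simple Euler scheme. The Heston parameter values follow Figure 6, while the interest rate, the random seed and the full-truncation treatment of negative variance values are assumptions; other discretizations are discussed in section 4.3.

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(1)

def heston_log_returns(S0=100.0, V0=0.05, r=0.02, kappa=5.0, theta=0.05,
                       omega=0.2, rho=-0.8, T=1.0, n_steps=250, n_paths=100_000):
    """Euler (full truncation) simulation of (39)-(40); returns log S(T) - log S(0)."""
    dt = T / n_steps
    x = np.full(n_paths, np.log(S0))
    v = np.full(n_paths, V0)
    for _ in range(n_steps):
        z1 = rng.standard_normal(n_paths)
        z2 = rho * z1 + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n_paths)
        vp = np.maximum(v, 0.0)              # full truncation: use max(v, 0) in the coefficients
        x = x + (r - 0.5 * vp) * dt + np.sqrt(vp * dt) * z1
        v = v + kappa * (theta - vp) * dt + omega * np.sqrt(vp * dt) * z2
    return x - np.log(S0)

ret = heston_log_returns()
print(skew(ret), kurtosis(ret))   # negative skewness and positive excess kurtosis, as in Figure 6
```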

3.1 Properties of the Volatility SDE

From the equation for V (t) it is intuitively clear that squared volatility is mean-reverting, as we will demonstrate explicitly in this section. The intuition is as follows: abstracting from the mean-zero random term, for a positive mean-reversion parameter κ, the drift term will be positive if V (t) < θ and it will be negative if V (t) > θ, which will push V (t) up to θ in the first case and down to θ in the second. This makes it plausible that, indeed, V (t) reverts to a fixed level θ. This is a desirable property for a stochastic volatility model, as empirical volatility/variance time series are indeed mean-reverting over time[e.g. Fouque et al (2000)].

Before discussing the properties of specification (40) it is valuable to take a look at the Ornstein-Uhlenbeck stochastic differential equation for volatility:

d√V(t) = −δ√V(t)dt + k dW̃2(t)   (41)

which is mean-reverting, with long-term expectation equal to zero. The properties of specification (40) are closely related, as √V(t) in (41) is an Ito process and we can apply Ito's lemma using the function f(x) = x² to obtain

dV(t) = (k² − 2δV(t))dt + 2k√V(t) dW̃2(t)   (42)

Using the substitutions ω = 2k, κ = 2δ and θ = k²/κ = k²/(2δ), we have the Heston specification (40) with long-term expectation equal to θ.

In the following we will derive the distribution of V(t) in (40)/(42), starting from the Ornstein-Uhlenbeck process. A more general specification for the Ornstein-Uhlenbeck process, allowing for a non-zero long-term expectation, is

d√V(t) = δ(m − √V(t))dt + k dW̃2(t),   δ > 0   (43)

We can derive an explicit solution by means of a variation of parameters argument. The goal is to eliminate the √V(t) term on the right hand side, in order to obtain a simple combination of a deterministic integral and an Ito integral there, while the left hand side remains tractable. Recall that by Ito's Lemma we will obtain, for a function f(t, √V(t)),

df(t, √V(t)) = f_1 dt + f_2 d√V(t) + (1/2)f_22 d[√V, √V](t)   (44)

This implies that if we manage to find a function f(t, √V(t)) with the properties

• f_t = δ√V(t) f_2

• f_22 = 0

we will have eliminated the √V(t) term. It is not very difficult to establish that the function f(t, √V(t)) = √V(t) e^{δt} satisfies these requirements. This brings us to the relationship

d(√V(t) e^{δt}) = δm e^{δt}dt + e^{δt}k dW̃2(t)   (45)

By integration,

√V(t) e^{δt} = √V(0) + δm ∫_0^t e^{δs}ds + k ∫_0^t e^{δs}dW̃2(s)   (46)

and by calculating the Riemann integral and rearranging

√V(t) = m + (√V(0) − m)e^{−δt} + k ∫_0^t e^{−δ(t−s)}dW̃2(s)   (47)

Making use of the fact that the Ito integral of a deterministic integrand, I(t) = ∫_0^t ∆(s)dW(s), is normally distributed with mean zero and variance ∫_0^t ∆²(s)ds (Shreve, p. 149-150), we obtain the distribution of √V(t):

√V(t) ~ N( m + (√V(0) − m)e^{−δt} , (k²/(2δ))(1 − e^{−2δt}) )   (48)

for all positive values of t, which means that this process belongs to the Gaussian class. For t → ∞ we have √V(t) ~ N(m, k²/(2δ)). This is called the invariant distribution of √V(t) in the Ornstein-Uhlenbeck process. The special case (41) has the same properties, with m set to 0 in the equations. Note the important difference between the variance of this process, which converges to a fixed value for t → ∞, and the variance in equations of the type dX(t) = a dt + σ dW(t), where the variance goes to infinity as t → ∞.

Having established the distribution of the Ornstein-Uhlenbeck process, we can now derive the distribution of V(t) in specification (40) in the Heston Model. From the above we know that V(t) is the square of a normally distributed random variable with, in general, nonzero mean. An analytical solution to the process does not exist. However, we can calculate the moment generating function for V(t) using the technique of completing the square.16

We want to derive

E{exp{u(√V(t))²}}   (49)


In general, for a normally distributed random variable x, with mean µ and variance σ²:

E{exp{ux²}} = ∫_{−∞}^{∞} exp{ux²}φ(x)dx   (50)

Substituting the normal probability density function φ(x) gives

E{exp{ux²}} = (1/√(2πσ²)) ∫_{−∞}^{∞} exp{ux² − (x − µ)²/(2σ²)}dx   (51)

The exponent in the integrand is of the form

ax² − b(x − c)²   (52)

where a = u, b = 1/(2σ²), c = µ. Completing the square leads to the equality

ax² − b(x − c)² = (a − b)(x − bc/(b − a))² + bc²(a/(b − a))   (53)

Plugging in µ and σ again and rewriting slightly leads to

−((1 − 2σ²u)/(2σ²))(x − µ/(1 − 2σ²u))² + uµ²/(1 − 2σ²u)   (54)

Note that the first term in this expression is the exponent term of a normal distribution with mean µ/(1 − 2σ²u) and variance σ²/(1 − 2σ²u). Due to the equality established above we can write

E{exp{ux²}} = (1/√(2πσ²)) exp{uµ²/(1 − 2σ²u)} ∫_{−∞}^{∞} exp{−((1 − 2σ²u)/(2σ²))(x − µ/(1 − 2σ²u))²}dx   (55)

and since the remaining integral is that of an unnormalized normal density with variance σ²/(1 − 2σ²u), it equals √(2πσ²/(1 − 2σ²u)), which amounts to

E{exp{ux²}} = (1/√(1 − 2σ²u)) exp{uµ²/(1 − 2σ²u)},   ∀u < 1/(2σ²)   (56)

This is the moment-generating function of the non-central chi-squared distribution. The condition u < 1/(2σ²) is required to ensure that the term σ²/(1 − 2σ²u) is positive, in order for its interpretation as a variance to hold.
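A quick numerical sanity check of (56), with arbitrary illustrative values of µ, σ and u satisfying u < 1/(2σ²):

```python
import numpy as np

rng = np.random.default_rng(2)

mu, sigma, u = 0.3, 0.2, 2.0                 # illustrative values; u < 1/(2*sigma^2) = 12.5
x = rng.normal(mu, sigma, size=2_000_000)
mc = np.exp(u * x ** 2).mean()               # Monte Carlo estimate of E[exp(u x^2)]
exact = np.exp(u * mu ** 2 / (1 - 2 * sigma ** 2 * u)) / np.sqrt(1 - 2 * sigma ** 2 * u)
print(mc, exact)                             # the two numbers agree up to Monte Carlo error
```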

The derivation of the moment generating function for process (40) is identical, using the mean and the variance of the Ornstein-Uhlenbeck process (equation 48) for µ and σ². From this formula we obtain raw and central moments by differentiation with respect to u, evaluating the expression at u = 0.17 The mean and variance in terms of the Heston parameters are given by

E[V(t)] = e^{−κt}V(0) + θ(1 − e^{−κt})   (57)

Var(V(t)) = (ω²/κ)V(0)(e^{−κt} − e^{−2κt}) + (θω²/(2κ))(1 − 2e^{−κt} + e^{−2κt})   (58)

So, once again, the variance of V(t) converges to a fixed value θω²/(2κ) as t → ∞ and the mean converges to θ.
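The formulas (57)-(58) can be checked against a crude simulation of (40). The sketch below uses the parameter values of Figures 10-12; the Euler discretization and the full-truncation fix for negative variance values are assumptions, so the agreement is only approximate (discretization and sampling error).

```python
import numpy as np

rng = np.random.default_rng(3)

def cir_moments_mc(V0=0.075, kappa=1.0, theta=0.05, omega=0.2, t=2.0,
                   n_steps=1000, n_paths=100_000):
    """Euler (full truncation) simulation of (40); sample mean and variance of V(t)."""
    dt = t / n_steps
    v = np.full(n_paths, V0)
    for _ in range(n_steps):
        vp = np.maximum(v, 0.0)
        v = v + kappa * (theta - vp) * dt + omega * np.sqrt(vp * dt) * rng.standard_normal(n_paths)
    return v.mean(), v.var()

def cir_moments_exact(V0=0.075, kappa=1.0, theta=0.05, omega=0.2, t=2.0):
    """Equations (57) and (58)."""
    m = np.exp(-kappa * t) * V0 + theta * (1.0 - np.exp(-kappa * t))
    var = (omega ** 2 / kappa) * V0 * (np.exp(-kappa * t) - np.exp(-2 * kappa * t)) \
        + (theta * omega ** 2 / (2 * kappa)) * (1.0 - np.exp(-kappa * t)) ** 2
    return m, var

print(cir_moments_mc())
print(cir_moments_exact())
```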

Thus, the Ornstein-Uhlenbeck process and the Cox-Ingersoll-Ross process have an important feature in common. The practical advantage of using the Cox-Ingersoll-Ross process is, however, that it never becomes negative. It can be shown that it can take zero values, unless 2κθ > ω². But suppose this happens at a certain t*; then the SDE simplifies to dV(t) = κθ dt, which means that the process will become positive again. This does not hold for the Ornstein-Uhlenbeck process described above. It has been demonstrated by Ball and Roma (1994), in their critique of the paper by Stein and Stein (1991), that simple solutions, like taking the absolute value of the process, are inappropriate.

17 This is quite an exercise and can best be done by an algebra program like Maple. An alternative derivation for

Figure 8: The expectation of the V(t) process over time, for different starting values V0 and κ = 1, θ = 0.05, ω = 0.2

Figure 9: The variance of the V(t) process over time, for different values of V0 and κ = 1, θ = 0.05, ω = 0.2

Figure 10: The expectation of the V(t) process over time, for different values of κ and V0 = 0.075, θ = 0.05, ω = 0.2

Figure 11: The variance of the V(t) process over time, for different values of κ and V0 = 0.075, θ = 0.05, ω = 0.2

Figure 12: The variance of the V(t) process over time, for different values of ω and V0 = 0.075, θ = 0.05, κ = 1. Note that ω does not affect the expectation of V(t).

Figure 14: Simulated V(t) process, for κ = 1 and κ = 5, V0 = 0.075, ω = 0.2, θ = 0.05.

Imposing a reflecting barrier complicates the analysis significantly. Hence, the CIR process is easier to handle in applications, while it has the same appealing features for volatility modeling as has the Ornstein-Uhlenbeck process.

In order to visualize the effects of the parameters on the mean and variance of the V(t) process, we have included a number of graphs. The impact of the initial value of the variance, V0, is demonstrated in Figures 8 and 9. Figure 8 demonstrates the mean-reversion property, as the mean of V(t) converges to the long-term average θ = 0.05. The paths of convergence of the variance of V(t) are shown in Figure 9. Figure 10 shows the effect of κ, which is called the mean-reversion parameter as it controls the speed of convergence to the long-term average. Indeed, for κ = 5 the mean value of the process approaches the long-term value θ = 0.05 much more quickly than for κ = 1 or κ = 0.5. Also, the long-term variance of V(t) differs over the values of κ, as is clear from equation (58). The larger κ is, the less volatile the variance process V(t) becomes (Figure 11). The ω parameter is related positively to the long-term variance (Figure 12), but it does not affect the mean of V(t) like κ does. Hence, the larger ω, the more volatile the V(t) process is in the long run. This is why ω is often called the 'volatility of volatility' in the literature, but note that κ has an impact on Var(V(t)) in the long run as well. Figures 13-15 depict simulations of V(t) processes for different parameter combinations for purposes of illustration.

3.2 Pricing formula for European Call

Having investigated the properties of the equation driving the volatility process, we continue the analysis of the Heston model focusing on pricing European Calls. We start out from equations (39)-(40), which hold under a risk-neutral probability measure Q̃_γ, assuming the specification γ(t, S(t), V(t)) = λV(t) for the process of the market price of volatility risk:

dS(t) = rS(t)dt + √(V(t)) S(t)dW̃1(t)

dV(t) = κ(θ − V(t))dt + ω√(V(t)) dW̃2(t)

We will use the Equivalent Martingale Measure approach, that is, we will look for a price

C(t, S(t), V(t)) = E^{Q̃_γ}{e^{−r(T−t)}(S(T) − K)^+ | F(t)},   0 ≤ t ≤ T   (59)

which satisfies the boundary condition

C(T, S(T), V(T)) = (S(T) − K)^+

The processes for stock and volatility are Markovian and hence (59) is equivalent to18

C(t, S(t), V(t)) = E^{Q̃_γ}{e^{−r(T−t)}(S(T) − K)^+ | S(t), V(t)},   0 ≤ t ≤ T.   (60)

Now define E^{Q̃_γ}_{t,x,y} h(X(T), Y(T)) to be the expectation of h(X(T), Y(T)) given that X(t) = x and Y(t) = y, for 0 ≤ t ≤ T and a function h(q, r). In particular, let h(s, v) = (s − K)^+ and define

c̃(t, s, v) = E^{Q̃_γ}_{t,s,v}(S(T) − K)^+   (61)

18 Intuitively, this means that the information contained in the values of stock and volatility at time t is the only


It can now be shown that c̃(t, s, v) is a martingale.19 Equation (59) corresponds to c(t, s, v) = e^{−r(T−t)}c̃(t, s, v). Due to the discounting term, c(t, s, v) is not a martingale; however, it is easily shown that e^{r(T−t)}c(t, s, v) is. We apply the two-dimensional Feynman-Kac Theorem to this expression20 to establish that c(t, s, v) is a solution of

c_t + rsc_s + (κθ − κv)c_v + (1/2)s²vc_ss + ρωsvc_sv + (1/2)ω²vc_vv = rc   (62)

under the boundary condition c(T, s, v) = (s − K)^+ ∀s ≥ 0, v ≥ 0. The intuition behind this result is as follows. Applying Ito's lemma to e^{r(T−t)}c(t, s, v) we obtain

d(e^{r(T−t)}c) = −re^{r(T−t)}c dt + e^{r(T−t)}(c_t dt + c_s ds + c_v dv + (1/2)c_ss ds ds + c_sv ds dv + (1/2)c_vv dv dv)   (63)

The term in front of dt can be obtained by writing the equation out; it is given by

−re^{r(T−t)}c + e^{r(T−t)}(c_t + rsc_s + (κθ − κv)c_v + (1/2)s²vc_ss + ρωsvc_sv + (1/2)ω²vc_vv)   (64)

and as we have established that e^{r(T−t)}c(t, s, v) is a martingale, intuitively this drift term should be equal to zero, which is exactly what equation (62) implies.

It can furthermore be demonstrated21 that this solution satisfies other boundary conditions implied by the market environment.

• c(t, 0, v) = 0 ∀0 ≤ t ≤ T, v ≥ 0 (If the stock value equals zero, so does the value of the call option)

• c(t, s, 0) = (s − e^{−r(T−t)}K)^+ ∀0 ≤ t ≤ T, s ≥ 0 (When volatility is zero, the payoff is deterministic. Note that in the risk-neutral world the drift of the stock is equal to the risk-free rate, so the stock has no discount factor in this formula, whereas the strike price has)

• lim_{s→∞} c(t, s, v)/(s − K) = 1 ∀0 ≤ t ≤ T, v ≥ 0 (If the stock price goes to infinity, the option value approaches the value of the stock itself, as the strike price is finite)

• lim_{v→∞} c(t, s, v) = s ∀0 ≤ t ≤ T, s ≥ 0

We could determine option prices by solving this partial differential equation numerically. We will work with the Fourier Inversion approach, however, which is derived in the original Heston paper, as this has considerable numerical advantages. For this, we require the characteristic function of the log stock price distribution at maturity. The intuition behind the derivation of this function is based on a particular interpretation of the Black-Scholes formula for the European call

C_BS(S(t), T − t, r, σ, K) = S(t)N(d+(T − t, S(t))) − e^{−r(T−t)}KN(d−(T − t, S(t)))   (65)

19 Let 0 ≤ t1 ≤ t2 < T; then E{c̃(t2, s, v)|F_{t1}} = E{E^{t2,s,v}(S(T) − K)^+|F_{t1}} = E{E[(S(T) − K)^+|S(t2), V(t2)]|F_{t1}} = E{E[(S(T) − K)^+|F_{t2}]|F_{t1}} = E{(S(T) − K)^+|F_{t1}} = E{(S(T) − K)^+|S(t1), V(t1)} = c̃(t1, s, v), where the third equality follows from the Markov property, the fourth from the Tower property, the fifth again from the Markov property and the final equality holds by definition.

20 Like in Shreve (2004), Chapter 6. Our specifications for stock and volatility satisfy all necessary regularity conditions.


d+(T − t, S(t)) = [log(S(t)/K) + (r + σ²/2)(T − t)] / (σ√(T − t))   (66)

d−(T − t, S(t)) = [log(S(t)/K) + (r − σ²/2)(T − t)] / (σ√(T − t))   (67)

Under the Black-Scholes assumptions and the risk-neutral measure,

S(T) = S(t) exp{σ(W̃(T) − W̃(t)) + (r − σ²/2)(T − t)}   (68)

from which, using W̃(T) − W̃(t) ~ N(0, T − t), we obtain

log S(T) = log(S(t)) − σ√(T − t) Y + (r − σ²/2)(T − t)   (69)

where Y follows a standard normal distribution. The probability that the option finishes in the money is now given by

P(S(T) > K | F_t) = P(log S(T) > log K) = P( log(S(t)/K) > σ√(T − t) Y − (r − σ²/2)(T − t) )
= P( Y < [log(S(t)/K) + (r − σ²/2)(T − t)] / (σ√(T − t)) ) = N(d−(T − t, S(t)))   (70)

So the normal probability in the second term of the Black-Scholes solution equals the risk-neutral probability of the stock price exceeding the strike price at maturity, conditional on F_t. The normal probability in the first term can also be interpreted as a conditional probability of finishing in-the-money, but under another measure, under which

dS(t) = (r + σ²)S dt + σS dW*(t)   (71)

In this setting we can also make use of the fact that the stock price process is Markov and apply the Feynman-Kac Theorem to obtain a probabilistic representation. Define

h(s) = I_{s≥K}

f1(t, s) = E^{t,s} h(S*(T))

f2(t, s) = E^{t,s} h(S(T)).

Making use of the interpretation of the Black-Scholes terms as probabilities, and of the fact that a probability can be written as the expectation of an indicator function, we now obtain the following representation of the Black-Scholes price:

C_BS(S(t), t, r, σ, K) = S(t)f1(t, S(t)) − e^{−r(T−t)}Kf2(t, S(t))   (72)

Note also that the boundary condition C_BS(S(T), T, r, σ, K) = (S(T) − K)^+ is satisfied, as f1(T, S(T)) = f2(T, S(T)) = 1 if S(T) > K and 0 otherwise.
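The two probability interpretations can be verified directly by simulation; the numbers below are arbitrary illustrative inputs.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)

S0, K, r, sigma, tau = 100.0, 110.0, 0.03, 0.25, 0.5   # illustrative inputs
z = rng.standard_normal(1_000_000)

# f2: probability of finishing in the money under the risk-neutral measure, cf. (70)
ST = S0 * np.exp((r - 0.5 * sigma ** 2) * tau + sigma * np.sqrt(tau) * z)
d_minus = (np.log(S0 / K) + (r - 0.5 * sigma ** 2) * tau) / (sigma * np.sqrt(tau))
print((ST > K).mean(), norm.cdf(d_minus))

# f1: the same event under the measure (71), in which the drift is r + sigma^2
ST_star = S0 * np.exp((r + 0.5 * sigma ** 2) * tau + sigma * np.sqrt(tau) * z)
d_plus = d_minus + sigma * np.sqrt(tau)
print((ST_star > K).mean(), norm.cdf(d_plus))
```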

The starting point in Heston (1993) is the assumption that the call price formula in the stochastic volatility model is analogous:

c(t, s, v) = sf1(t, log(s), v) − e^{−r(T−t)}Kf2(t, log(s), v)   (73)

where the logarithm of the stock price is taken for convenience. If this solution is to satisfy partial differential equation (62), then it can be derived22 that its component functions f1(t, x, v) and f2(t, x, v) (writing x = log(s)) should satisfy the partial differential equations

f1_t + (r + v/2)f1_x + (κθ − κv + ρωv)f1_v + (1/2)v f1_xx + ρωv f1_xv + (1/2)ω²v f1_vv = 0   (74)

f2_t + (r − v/2)f2_x + (κθ − κv)f2_v + (1/2)v f2_xx + ρωv f2_xv + (1/2)ω²v f2_vv = 0   (75)

These are the partial differential equations which, according to the Feynman-Kac Theorem, are the equivalents of f1(t, x, v) = E^{t,x,v} g(X*(T), V*(T)) and f2(t, x, v) = E^{t,x,v} g(X(T), V(T)), where the stochastic processes are given by

dX*(t) = (r + (1/2)V*(t))dt + √(V*(t)) dW̃1(t)   (76)

dV*(t) = (κθ − κV*(t) + ρωV*(t))dt + ω√(V*(t)) dW̃2(t)   (77)

dX(t) = (r − (1/2)V(t))dt + √(V(t)) dW̃1(t)   (78)

dV(t) = (κθ − κV(t))dt + ω√(V(t)) dW̃2(t)   (79)

and g(x, v) = I_{x≥log K}, in order to satisfy the boundary condition C(T, s, v) = (s − K)^+, just like in our preceding Black-Scholes example. The functions f1 and f2 once more have the interpretation of conditional probabilities of the option expiring in-the-money under two different measures, the second of which is the model risk-neutral measure.23

We cannot calculate these conditional probabilities directly, but instead we will make use of characteristic functions. By definition, the characteristic function of a conditional probability density function P_{X|Y,Z}(x|y, z) is given by

φ_{X|Y,Z}(u) = ∫_{−∞}^{∞} e^{iux} P_{X|Y,Z}(x|y, z)dx = E(e^{iuX}|y, z)   (80)

The relationship between (conditional) probabilities and the characteristic function follows from the Levy Inversion Formula24 and is given by

P_{X|Y,Z}(x ≥ a|y, z) = 1/2 + (1/2π) lim_{T→∞} ∫_{−T}^{T} (e^{−iua} φ_{X|Y,Z}(u))/(iu) du   (81)

which can be rewritten as

P_{X|Y,Z}(x ≥ a|y, z) = 1/2 + (1/π) ∫_0^∞ Re{ (e^{−iua} φ_{X|Y,Z}(u)) / (iu) } du.   (82)

So the strategy now is to obtain the characteristic functions corresponding to

f1(t, x, v) = P_{X*(T)|X(t),V(t)}(x*(T) > log(K) | x(t), v(t))

f2(t, x, v) = P_{X(T)|X(t),V(t)}(x(T) > log(K) | x(t), v(t))

and then insert them into formulae (81) and (82). This can be achieved by once more applying the Feynman-Kac Theorem. Note that

φ1(u) = φ_{X*(T)|X(t),V(t)}(u) = E(e^{iuX*(T)} | X(t), V(t))   (83)

φ2(u) = φ_{X(T)|X(t),V(t)}(u) = E(e^{iuX(T)} | X(t), V(t))   (84)

This is just the setting we have seen above, with the function h(t, x, v) = e^{iux}. The Feynman-Kac Theorem now states that these characteristic functions are the solutions of partial differential equations. These partial differential equations are given in the Appendix, which also contains a derivation of the solution of these equations. The solutions for j = 1, 2 are given by

φ_j(u) = e^{C(T−t,u) + D(T−t,u)v + iux}   (85)


where

C(T − t, u) = rui(T − t) − (2κθ/ω²){ ln[(1 − g e^{d(T−t)})/(1 − g)] + β(T − t) }

D(T − t, u) = (−2β/ω²) · (1 − e^{d(T−t)})/(1 − g e^{d(T−t)})

α = (ρωui − b_j + d)/2

β = (ρωui − b_j − d)/2   (86)

d = √((ρωui − b_j)² − ω²(2a_j ui − u²))

g = β/α

where a_1 = 1/2, a_2 = −1/2, b_1 = κ − ρω and b_2 = κ.25 A useful way of rewriting is by noting that φ1(u) = (1/S)φ2(u − i), which follows from straightforward substitution in the formulae in (85)-(86).26 We can now substitute expression (85) into formula (87):

f_j(t, x, v) = 1/2 + (1/π) ∫_0^∞ Re{ (e^{−iu ln(K)} φ_j(u)) / (iu) } du.   (87)

The integrand is real and can in general be computed numerically. It may not be smooth and well-behaved as is suggested in Heston (1993) however. To achieve a stable numerical algorithm some additional rewriting is necessary. These numerical issues will be discussed thoroughly in Chapter 4. Finally, substitution of f1, f2 in formula (73) gives us the desired prices.
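A compact sketch of this pricing recipe is given below: it implements the characteristic functions (85)-(86) exactly as written and evaluates (87) and (73) with an adaptive quadrature over the half-line. The parameter values in the example call follow Figure 7, while the interest rate is an assumption; as discussed in Chapter 4, for long maturities or extreme parameters the algebraically equivalent but numerically more stable formulation of section 4.1 should be preferred.

```python
import numpy as np
from scipy.integrate import quad

def heston_call(S, K, V0, tau, r, kappa, theta, omega, rho):
    """European call via equations (73), (85)-(87)."""
    x = np.log(S)

    def phi(u, j):
        a_j = 0.5 if j == 1 else -0.5
        b_j = kappa - rho * omega if j == 1 else kappa
        d = np.sqrt((rho * omega * u * 1j - b_j) ** 2 - omega ** 2 * (2 * a_j * u * 1j - u ** 2))
        beta = (rho * omega * u * 1j - b_j - d) / 2
        g = beta / ((rho * omega * u * 1j - b_j + d) / 2)          # g = beta / alpha
        C = r * u * 1j * tau - (2 * kappa * theta / omega ** 2) * (
            np.log((1 - g * np.exp(d * tau)) / (1 - g)) + beta * tau)
        D = (-2 * beta / omega ** 2) * (1 - np.exp(d * tau)) / (1 - g * np.exp(d * tau))
        return np.exp(C + D * V0 + 1j * u * x)

    def f(j):
        integrand = lambda u: (np.exp(-1j * u * np.log(K)) * phi(u, j) / (1j * u)).real
        return 0.5 + quad(integrand, 0.0, np.inf, limit=200)[0] / np.pi

    return S * f(1) - K * np.exp(-r * tau) * f(2)

# illustrative example: the parameters of Figure 7, one-year at-the-money call
print(heston_call(S=100.0, K=100.0, V0=0.05, tau=1.0, r=0.02,
                  kappa=5.0, theta=0.05, omega=0.2, rho=-0.8))
```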

The characteristic functions required to calculate prices in the Heston model are quite complicated and they have given rise to considerable confusion in the literature and amongst practitioners. The confusion has arisen due to two distinct features. First of all, the components of the characteristic function, and in particular C(T − t, u), can be written in different algebraically equivalent ways. Our formulation is based on Minenna27and is easily rewritten to the original one of Heston (1993). However, Schoutens, Simons and Tistaert (2004) and Gatheral (2005) use another formulation, which, although algebraically equivalent, turns out to be easier to handle numerically. We will treat this topic in section 4.1, show there the rewriting exercise necessary to obtain the alternative formulation and demonstrate why it is important to use it in numerical algorithms. This ambiguity, in addition to the fact that different symbols are used for the Heston parameters in the literature (what we call ω here is often referred to as η for example) often raises suspicion of typos and other errors against which we would like to warn the reader. The main references resolving the confusion caused by this issue are Lord and Kahl (2006) who clearly distinguish between the different formulations and their numerical consequences and Albrecher et al. (2006).

Second, we have found threads on Internet forums asking which is the characteristic function corresponding to the conditional distribution p_{X(T)|X(t),V(t)}(x(T)|x(t), v(t)) of the log stock price X(T) at maturity given current values of stock X(t) and volatility V(t), and whether the Heston model possesses two such functions. Of course the answer to this question is that this is the function we named φ2. The function φ1 is the characteristic function which corresponds to the conditional probability of X*(T), a random variable resulting from a different process, which is governed by equations (76)-(77). It is nothing but an auxiliary result for determining the option price.

25 Note that in e.g. Heston (1993) we have b_1 = κ + λ − ρω, b_2 = κ + λ, where λ is the parameter in our specification of the market price of volatility risk. Remember that the κ we use here already corresponds to the transformed risk-neutral system (39)-(40), so λ does not feature here. If one would like to estimate the λ parameter separately, the appropriate characteristic function would also be given by (85)-(87), only adjusting b_1 and b_2 to κ + λ − ρω and κ + λ respectively.

26 We can also take the term ruiτ out of the exponent, such that we have F = Se^{r(T−t)} instead of S in the denominator of this last formula.


3.2.1 Extension: piecewise constant parameters

An important extension [Mikhailov and Noegel (2003)] of the standard Heston model is the use of piecewise constant parameters, that is, in the most general case we assume

(κ, θ, ω, ρ, V0) = (κ_1, θ_1, ω_1, ρ_1, V0) for t ∈ [0, t_1),
               = (κ_2, θ_2, ω_2, ρ_2, V0) for t ∈ [t_1, t_2),
               ...
               = (κ_n, θ_n, ω_n, ρ_n, V0) for t ∈ [t_{n−1}, T]

for a certain n. Instead of the five Heston parameters, we now need to estimate 4n + 1 parameters (V0 plus four per time period). It is left open to the modeler how many time periods are chosen and which parameters are allowed to vary. This is a question of parsimony versus fit quality. For example, Haerdle and Weron (2005) allow the ρ and ω parameters to change over time and choose the moments at which the parameters change equal to the maturities in their option universe, de facto fitting each volatility slice separately. In this procedure it is not very hard to retrieve the values of θ(t), ω(t) and ρ(t) which have to be in operation in between the different maturities. However, a more parsimonious specification involving only a few changes over time might also improve the fit significantly. For the case in which parameter changes do not coincide with expirations, we can derive a pricing function by generalization of the characteristic function described above. This procedure is outlined in the Appendix.

Other possibilities, for example a functional specification of some parameters, are also considered in the literature [Mikhailov and Noegel (2003)]. Whereas the θ and κ parameters can be treated this way, this is much harder for the parameters ρ and ω, which turn out to be very important for the calibration.

3.3 Assessment of the Heston Model in previous works

In the past decade a considerable amount of theoretical and empirical contributions commenting on the properties and the performance of the Heston model has been published. We present a summary of this literature here. This will also allow us to take into account the noted shortcomings of the model during calibration.

The possible advantages of applying the Heston model, as compared to the Black-Scholes model with a single volatility for the entire option surface, have already been discussed in sections 2.1 and 3.1. It encompasses a rich variety of option pricing effects. In particular it can give rise to log asset return distributions exhibiting negative skewness and excess kurtosis, which are often encountered in empirical data, and to a wide range of implied volatility surfaces (Heston (1993): 338-340, Moodley (2005): 2-7). Moreover, the Heston model is analytically tractable due to the presence of a pricing formula for European Calls, which allows for calibration without having to resort to time-consuming Monte Carlo methods (Mikhailov and Noegel (2003): 79). These advantages come at the cost of parsimony, as the Heston model needs five parameters instead of one parameter in the Black-Scholes model, if the latter is used in the theoretically correct way. The most comprehensive study of stochastic volatility models is Bakshi et al. (1997). They study the pricing and hedging performance of the basic Heston model (SV), the Heston model with jumps (SVJ) and the Heston model with stochastic interest rates (SVSI). The dataset used comprises European Calls on the Standard and Poor's 500 Index, for the period June 1988-May 1991. Whereas in the conclusion of their article they decide in favor of using the Heston model and its extensions, as it performs far better than theoretical Black-Scholes, they find a considerable number of shortcomings along the way.


For each option contract, they find a value of V0 such that the price of the contract equals the model price (one could call this a Heston Implied Volatility). These values turn out not to be constant across the moneyness dimension, thus still yielding a smile, albeit less pronounced than in the Black-Scholes case. They also find (p. 2025-2026) that the values of the ρ and ω parameters they back out from their dataset (for example, for the model in which all moneyness and maturity categories are included, these are ρ = −0.64, ω = 0.39) are unrealistically large in absolute value compared to what is observed in time series data of stock and volatility.28

Still, the authors achieve a reduction of pricing errors of twenty to seventy percent using the Heston SV model and its extensions as compared to the Black-Scholes model with a single implied volatility. They also assess the dynamic hedging performance, both for single-instrument hedging and for Delta-neutral hedging. For the single-instrument hedge they find the SV model to work better for out-of-the-money options, but the performance is comparable to Black-Scholes for other contract types under daily rehedging. For Delta-neutral hedging the Heston model reduces hedging errors two- to threefold as compared to Black-Scholes, but this is partly due to the fact that the hedge in this model uses two assets, the stock and another option contract (see section 2.3). The authors also augment the Black-Scholes hedge with a position in an option contract, such that the hedge becomes vega-neutral (which is, as they remark, at odds with the theory) and compare the results with the SV model. It is found that the resulting hedge is not significantly worse than the Heston SV hedge, except for in-the-money options.

In spite of the shortcomings, the authors conclude that taking Stochastic Volatility into account is of first-order importance in improving upon the Black-Scholes formula. Further improvements of the model are suggested, especially allowing for jumps in the stock price model, as the authors find this to lead to an improvement in the pricing of short-term option contracts. Taking stochastic interest rates into account in turn leads to better long-term option prices.

Other literature containing notable empirical results on the Heston model includes Mikhailov and Noegel (2003), Gatheral (2004), Detlefsen (2005), Moodley (2005) and Schoutens et al. (2003). Mikhailov and Noegel (2003) obtain a very satisfying fit to option data on the S&P 500 index for July 12th 2002, with a maximum error of less than 0.15 percent. They find that the Heston model cannot accommodate the skew displayed by options near the expiration date (p. 79), which is consistent with the findings of Gatheral (2004: 33-34). In line with the findings of Bakshi et al. (1997), inclusion of jumps is suggested. This, however, requires three additional parameters, reducing the parsimony of the model and complicating calibration.

Mikhailov and Noegel (2003) also discuss the pitfalls of the calibration procedure of the standard Heston model, which are arguably its main drawback. They focus on the dependence of the calibration results on the initial value, if one uses a deterministic optimization algorithm (see section 6.2.1), and on the lack of stability of the parameters. The first point is illustrated in Moodley (2005:29), who finds totally different parameter values using three different optimization routines with the same starting values. Similar results could have been obtained using different initial parameter values in the calibration routine. This is why we have devoted considerable attention to the topic of selecting a meaningful initial value in this thesis. The second point, concerning the stability of Heston model parameters throughout time, has been researched in Detlefsen (2005). Analyzing daily option surfaces for the period of one year, and recalibrating daily, he has found the following coefficients of variation29 for the parameters: κ: 2.16, θ: 1.22, V_0: 0.41, ρ: 0.23, ω: 0.61.
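For reference, the coefficient of variation is presumably meant here in the usual sense, as the ratio of the standard deviation of the daily parameter estimates to (the absolute value of) their mean:

$$\mathrm{CV}(\hat\vartheta) \;=\; \frac{\operatorname{sd}(\hat\vartheta)}{\bigl|\operatorname{mean}(\hat\vartheta)\bigr|}$$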

These variations are quite considerable, the standard deviations being even larger than the means of the parameters for κ and θ. We have also analyzed this issue for our AEX datasets; there, at least in the short term, this feature seems to be less pronounced (see section 6.5.2). In spite of these complications, the Heston model is praised for its good fit on several occasions. Unfortunately, fit statistics are not very clear in most studies. Bakshi et al. only mention sums of squared errors, averaged over the days in their sample. Mikhailov and Noegel mention percentage errors, but it is not clear what kind of percentage errors they have in mind.

28 It can be shown that these parameters do not change when one goes from the market measure P to a risk-neutral measure Q.

4 Methods

In the stochastic volatility literature, three basic techniques of obtaining option prices can be found: Fourier Transform methods, Finite Difference/Finite Element methods for solving partial differential equations and Monte Carlo simulation methods. In this chapter we will discuss two applications of the Fourier Transform method and Monte Carlo simulation for the Heston model. Application to other models is fairly straightforward, compare e.g. Schoutens et al. (2003). We refrain from discussing the partial differential equation approach, based on numerical solution of (74)-(75), pioneered by Wiggins (1987). This method is the exclusive topic of a thesis by Kluge (2002) to which we refer the reader for all details of the procedure.

4.1 Fourier Transform Methods: the Heston (1993) approach

The most direct way of determining the option price in the Heston Model is to use the characteristic function as given in (85)-(87), numerically evaluate the integrals in expression (87) and plug the results into (73). The method can be applied for a wider range of Markovian diffusion SV models for which the characteristic function is available and formula (73) holds. Examples include the Heston model with jumps and/or stochastic interest rates, and the Stein/Stein model with Ornstein-Uhlenbeck volatility.

In the Heston model an acceptable approximation of the option price is readily obtained for reasonable parameter values. However, when calibrating we need reliable results for any parameter combination, as we do not control the parameters our optimization algorithm picks and we definitely do not want the pricing algorithm to fail in the middle of the calibration procedure. Moreover, we need accuracy up to one eurocent for pricing and even much more for the calculation of implied volatilities and of the Greeks. Unfortunately, it turns out that the integrand in (87) is not always well-behaved and smoothly decaying as is suggested in Heston (1993). For certain combinations of model and contract parameters it turns out to be oscillating instead (see Figures 16 and 17). We need to make sure that the method used for numerical evaluation of the integral is up to the task.
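For concreteness, a minimal sketch of this direct approach (in Python, using the textbook Heston (1993) characteristic functions, whose notation may differ slightly from (85)-(87)) could look as follows; note that it truncates the integration range and takes the complex logarithm on its principal branch, so it is exposed to exactly the numerical issues discussed below. The parameter values are those of the at-the-money example in Figure 16.

```python
import numpy as np
from scipy.integrate import quad

def heston_call_direct(S0, K, T, r, kappa, theta, omega, rho, V0):
    """Direct Heston (1993) pricing: C = S0*P1 - K*exp(-r*T)*P2, with P1, P2
    obtained by numerical integration of the characteristic-function integrands.
    Naive implementation: principal-branch complex log, truncated integration."""
    x = np.log(S0)

    def f(u, j):
        # characteristic functions f_j of ln(S_T) under the two relevant measures;
        # u_1 = 1/2, b_1 = kappa - rho*omega; u_2 = -1/2, b_2 = kappa (risk-neutral parameters)
        uj, bj = (0.5, kappa - rho * omega) if j == 1 else (-0.5, kappa)
        a, iu = kappa * theta, 1j * u
        d = np.sqrt((rho * omega * iu - bj) ** 2 - omega ** 2 * (2 * uj * iu - u ** 2))
        g = (bj - rho * omega * iu + d) / (bj - rho * omega * iu - d)
        C = r * iu * T + a / omega ** 2 * ((bj - rho * omega * iu + d) * T
            - 2.0 * np.log((1.0 - g * np.exp(d * T)) / (1.0 - g)))  # principal branch!
        D = (bj - rho * omega * iu + d) / omega ** 2 \
            * (1.0 - np.exp(d * T)) / (1.0 - g * np.exp(d * T))
        return np.exp(C + D * V0 + iu * x)

    def integrand(u, j):
        return np.real(np.exp(-1j * u * np.log(K)) * f(u, j) / (1j * u))

    P1 = 0.5 + quad(integrand, 1e-8, 200.0, args=(1,), limit=500)[0] / np.pi
    P2 = 0.5 + quad(integrand, 1e-8, 200.0, args=(2,), limit=500)[0] / np.pi
    return S0 * P1 - K * np.exp(-r * T) * P2

# Parameter values of the at-the-money example in Figure 16
print(heston_call_direct(S0=100, K=100, T=2, r=0.03,
                         kappa=1.0, theta=0.05, omega=0.2, rho=-0.8, V0=0.05))
```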

Three issues, which arise not only in the Heston case, have to be addressed. First, we need to ensure that the integrand is numerically tractable. It turns out that algebraically equivalent formulations may differ very considerably from a numerical perspective. Second, we need to deal with the infinite upper integration limit, which many numerical algorithms do not allow for; those which do, integrate only up to an upper limit N, thus introducing truncation error. Third, we need an integration scheme which can deal with oscillating integrands without producing inaccurate results or getting tied up in some ‘difficult’ interval of the integrand, leading to an increase of computing time.

These issues have been addressed in Kahl and Jaeckel (2006) and Lord and Kahl (2006a, 2006b). In Kahl and Jaeckel (2006) it is noted that in formulation (87), which is identical to Heston's (1993) formulation, the logarithm in the expression for C(T − t, u) causes numerical difficulties. The argument of the logarithm is a complex number, as for given model parameter values, d(u) and g(u) in (87) are functions R+ → C. Hence the logarithm is a function from C to C. Let us write

$$\ln\left(\frac{1 - g\,e^{d(T-t)}}{1 - g}\right) = \ln(x(u))$$

We can then write x(u) in Euler notation: x(u) = r e^{iφ(u)} = r e^{i(φ(u)+2kπ)}, φ ∈ [−π, π), k ∈ Z.

Then we get

log(x(u)) = log(|r|) + i(φ + 2kπ), k ∈ Z (88)


Figure 16: Easily tractable integrand in equation (87) for model parameters κ = 1, θ = 0.05, ω = 0.2, ρ = −0.8, V_0 = 0.05, with r = 0.03 and Call contract parameters K = 100, T = 2, when S_0 = 100 (at-the-money).

Figure 17: Oscillating integrand in equation (87) for model parameters κ = 1, θ = 0.05, ω = 0.2, ρ = −0.8, V_0 = 0.05, with r = 0.03 and Call contract parameters K = 100, T = 2, when S_0 = 5.


Suppose the argument φ(u) reaches π for some u∗, that is, the complex number x(u∗) takes a value on the negative real axis in the complex plane. Then we may rewrite x(u∗) = r e^{iπ} = r e^{−iπ}. Note however that

log(x(u∗)) = log(|r|) + iπ + 2kiπ ≠ log(|r|) − iπ + 2kiπ (89)

unless we have k = 0 on the left-hand side and k = 1 on the right-hand side. This is a so-called branch switch at u∗. But in general computer packages restrict the logarithm to the branch corresponding to k = 0, which means that the output for log(x(u)) will be log(|r|) + iφ, φ ∈ [−π, π), regardless. This means that if the situation we described occurs (and for some model/contract parameters it does!) the value of the logarithm returned by a software package will jump from lim_{φ↑π} log(|r|) + iφ to log(|r|) − iπ at u∗, and this will induce a discontinuity in C(T − t, u) and in f_j(u), j = 1, 2, distorting the integrand in (87) and leading to incorrect results. This is illustrated in Figure 18. The phenomenon occurs quite often for large values of the contract parameter T, but also for certain model parameter combinations. No general results are available on this, however. Kahl and Jaeckel (2006) develop an algorithm which can be applied to efficiently count the number of passages of the argument through the negative real axis in the complex plane (‘rotations’) and adjust the value of k in (88) accordingly. The method works for ρ < κω or Im(u) ≥ −κρω and κω ≤ ρ < 2κω. Note that the first condition covers the index option case which is of interest in this thesis, as ρ then typically is highly negative, whereas κ and ω have to be positive. Most studies also report values of κ higher than ω, so the whole range of ρ is practically covered. However, in a subsequent paper (Lord and Kahl (2006a)) it is found that this counting method is unnecessary, as using a reformulation of C(T − t, u) prevents the argument of the logarithm from crossing the negative real axis if the above conditions on ρ hold, so that the branch switching problem does not occur.30 This formulation is algebraically equivalent, but has quite different numerical implications. Recall we have

$$C(T - t, u) = r u i (T - t) - \frac{2\kappa\theta}{\omega^{2}}\left\{\ln\left(\frac{1 - g\,e^{d(T-t)}}{1 - g}\right) + \beta(T - t)\right\} \qquad (90)$$

Then the alternative formulation is obtained by focusing on the term within brackets and rewriting31

$$\ln\left(\frac{1 - g\,e^{d(T-t)}}{1 - g}\right) + \beta(T - t) = \ln\left(e^{d(T-t)}\,\frac{e^{-d(T-t)} - g}{1 - g}\right) + \beta(T - t) = d(T - t) + \ln\left(\frac{e^{-d(T-t)} - g}{1 - g}\right) + \beta(T - t) = \ln\left(\frac{e^{-d(T-t)} - g}{1 - g}\right) + \alpha(T - t)$$

where α(T − t) = β(T − t) + d(T − t).

This alternative formulation is to be preferred over the implementation of Kahl and Jaeckel's rotation count algorithm, as it turns out to be numerically more stable and saves computation time. The rotation count algorithm is not very complicated, and is quite fast and reliable itself, but it is made quite redundant by the rewriting exercise. Nevertheless, the complex logarithm appears in many models (Lord and Kahl (2006)) and hence it was deemed useful to illustrate the problems involved.
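As a small self-contained illustration in Python/NumPy (a sketch, not part of the original implementations), the principal-branch complex logarithm breaks the identity log(ab) = log(a) + log(b), which is exactly the freedom in k that causes the branch switch:

```python
import numpy as np

# With the principal branch (k = 0 in (88)) the identity log(a*b) = log(a) + log(b)
# fails for complex numbers: the results can differ by a multiple of 2*pi*i.
a, b = np.exp(2.5j), np.exp(2.0j)               # unit-modulus numbers with arguments 2.5 and 2.0
print(np.log(a * b))                            # argument 4.5 is wrapped back to 4.5 - 2*pi
print(np.log(a) + np.log(b))                    # 0 + 4.5j
print(np.log(a * b) - (np.log(a) + np.log(b)))  # differs by -2*pi*i (up to rounding)

# In the same way, ln((1 - g*exp(d*(T-t)))/(1 - g)) evaluated on the principal branch can
# jump by 2*pi*i as u varies, whereas the rewritten form keeps the rapidly rotating factor
# exp(d*(T-t)) outside the logarithm, which is what rules out the branch switch under the
# parameter conditions cited above.
```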

The second issue we need to address is the infinite upper integration limit. We face a tradeoff between accuracy and numerical stability of our algorithm here. From the perspective of accuracy we desire a high limit N. However, if it is so high as to reduce the integrand to values below machine precision for some n between 0 and N, our integration algorithm may fail. Due to the varying behavior of the integrand for different model and contract parameters, no unique N can be set. Fortunately, Kahl and Jaeckel (2006) have (after impressive computations in the complex plane) managed to rescale the integral from [0, ∞) to [0, 1] for the Heston model. For all computations involved we refer the reader to their 2006 paper. We only present the main idea here.

30 For a proof see page 13 of Lord and Kahl (2006a).

31 This formulation looks different from that of Lord and Kahl (2006a), but actually it is exactly the same.

For the integrands Int_j, j = 1, 2 in (87) it is first established32, by taking the limit of the various exponent terms, that for u → ∞

$$Int_j(u) \;\approx\; Int_j(0)\, e^{-u C_\infty}\, \frac{\sin(u\, t_\infty)}{u} \qquad (91)$$

where C_∞ is real and positive33 and t_∞ is real. This ensures that the asymptotic decay of the integrand is at least exponential.

Then the Change of Variables Theorem is applied as follows. Recall the Theorem states

$$\int_a^b f(u)\,du = \int_c^d f(u(x))\,u'(x)\,dx \qquad (92)$$

with a = u(c), b = u(d). Choosing u(x) = −ln(x)/C_∞ we obtain

$$a = 0 \;\rightarrow\; c = 1, \qquad b = \infty \;\rightarrow\; d = 0, \qquad u'(x) = -\frac{1}{x\,C_\infty}$$

resulting in

$$\int_0^\infty Int_j(u)\,du = -\int_1^0 \frac{Int_j\!\left(-\frac{\ln(x)}{C_\infty}\right)}{x\,C_\infty}\,dx = \int_0^1 \frac{Int_j\!\left(-\frac{\ln(x)}{C_\infty}\right)}{x\,C_\infty}\,dx \qquad (93)$$

Thus Kahl and Jaeckel obtain a transformation to a finite interval, eliminating truncation error. The upper integration bound still poses a problem, however, as u(1) = 0 and in this point Int_j(u) is not defined. However, application of l'Hôpital's Rule does yield limit expressions for lim_{u↓0} Int_j(u), j = 1, 2 in the Heston case. These limit expressions can be found in Kahl and Jaeckel (2006), page 9.
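As a toy Python check of the substitution (not the Heston integrand itself), the stand-in integrand below shares the exponential decay in (91) but omits the oscillating sin(u t_∞)/u factor to keep the example smooth; its exact value is known, so the truncated and transformed evaluations can be compared directly.

```python
import numpy as np
from scipy.integrate import quad

# Substitution u = -ln(x)/C_inf from (93), mapping [0, inf) onto (0, 1].
# Stand-in integrand f(u) = u*exp(-C_inf*u); its exact integral over [0, inf) is 1/C_inf**2.
C_inf = 0.5

def f_u(u):
    return u * np.exp(-C_inf * u)

def f_x(x):
    # transformed integrand f(-ln(x)/C_inf)/(x*C_inf); here it reduces to -ln(x)/C_inf**2
    u = -np.log(x) / C_inf
    return f_u(u) / (x * C_inf)

exact = 1.0 / C_inf ** 2
truncated = quad(f_u, 0.0, 40.0)[0]   # naive truncation of the infinite domain at N = 40
transformed = quad(f_x, 0.0, 1.0)[0]  # finite domain, no truncation error at all
print(exact, truncated, transformed)
```

The transformed Heston integrand itself can still oscillate near x = 0, which is one reason a robust adaptive quadrature, discussed next, is used on the unit interval.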

The integration bound issue being resolved, we follow Kahl and Jaeckel in the choice of their integration algorithm (quadrature), which is the one proposed by Gander and Gautschi (2000). Once more, we only outline the general idea and refer the reader to the original paper for details. The advocated procedure is the adaptive Gauss-Lobatto rule. ‘Adaptive’ means that the integral I = ∫_a^b f(x) dx is evaluated using two different algorithms, A and B, where, without loss of generality, B is more accurate than A. Then, if |I_A − I_B| < ε, where ε is a given tolerance level, the integration terminates. Otherwise, I is subdivided in two parts34, I_1 and I_2, then I_{A,1}, I_{B,1}, I_{A,2}, I_{B,2} are calculated and the comparison is repeated for the first and second part. If one of these intervals turns out not to satisfy |I_{A,i} − I_{B,i}| < ε, i = 1, 2, it is once again subdivided. Division continues until |I_{A,j} − I_{B,j}| < ε for all j. Suppose this occurs when we have subdivided into a total of n intervals; then the integration result equals Î = Σ_{i=1}^n I_{B,i}.

However, this can lead to a long sequence of subdivisions where the contribution of some subparts I_j to I is negligible. Hence it is reasonable to calculate an approximation I_s of I without using the adaptive procedure. In general a method more accurate than B is used for this. The interval I_j is then no longer subdivided if I_j < η I_s, where η is some tolerance level.

In the application of Gander and Gautschi (2000) the Gauss-Lobatto quadrature with two interior nodes is used to calculate I_{A,j}, with corresponding formula

$$I_A = \frac{h}{6}\left\{f(a) + f(b) + 5\left[f(m - \beta h) + f(m + \beta h)\right]\right\} \qquad (94)$$

$$h = \tfrac{1}{2}(b - a), \qquad m = \tfrac{1}{2}(a + b), \qquad \beta = \frac{1}{\sqrt{5}}$$

32 Kahl and Jaeckel 2006, page 8.

33 At least when |ρ| ≠ 1.

34 The number of parts may be different; two is used for illustration. The same holds for the way the domain of
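To illustrate the adaptive idea, the Python sketch below uses (94) as the basic rule. It is a simplified variant for illustration only: the more accurate rule B is taken to be the same Lobatto rule applied to the two halves of the interval, and the global estimate I_s that Gander and Gautschi use to stop refining negligible subintervals is omitted.

```python
import numpy as np

def lobatto4(f, a, b):
    # 4-point Gauss-Lobatto rule (94): (h/6)*{f(a) + f(b) + 5*[f(m - beta*h) + f(m + beta*h)]}
    h, m, beta = 0.5 * (b - a), 0.5 * (a + b), 1.0 / np.sqrt(5.0)
    return h / 6.0 * (f(a) + f(b) + 5.0 * (f(m - beta * h) + f(m + beta * h)))

def adaptive_lobatto(f, a, b, tol=1e-10):
    # Rule A: Lobatto on [a, b]; rule B (more accurate): Lobatto on the two halves.
    # If the two estimates disagree by more than tol, bisect the interval and recurse.
    m = 0.5 * (a + b)
    I_A = lobatto4(f, a, b)
    I_B = lobatto4(f, a, m) + lobatto4(f, m, b)
    if abs(I_A - I_B) < tol:
        return I_B
    return adaptive_lobatto(f, a, m, 0.5 * tol) + adaptive_lobatto(f, m, b, 0.5 * tol)

# Decaying, oscillating test integrand of the kind encountered in (87);
# the integral over [0, inf) equals 0.5/(0.5**2 + 5**2), and the tail beyond 40 is negligible.
f = lambda u: np.exp(-0.5 * u) * np.cos(5.0 * u)
print(adaptive_lobatto(f, 0.0, 40.0), 0.5 / (0.5 ** 2 + 5.0 ** 2))
```

(In Gander and Gautschi's algorithm, rule B is instead a Kronrod-type extension of the Lobatto rule that reuses its nodes, so the extra accuracy comes at little additional cost in function evaluations.)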
