## American option pricing and Deep Learning

### A study of the Generative Adversarial Network as alternative for Geometric Brownian Motion for option pricing by the Least Squares Monte Carlo

### Abstract

The paper presents a novel method for the pricing of American options with deep learning. It combines the data distribution capturing ability of the Generative Adversarial Network with the Least Squares Monte Carlo for pricing of the American put. The Least Squares Monte Carlo originally uses Geometric Brownian Motion under the risk-neutral measure Q as basis for simulating stock price paths. However, the Geometric Brownian Motion is flawed in its ability to capture the data distribution of log returns, as they often contain negative skewness. To make the GAN compatible with the Least Squares Monte Carlo and its risk-neutral valuation, a mean shift is performed, where the original physical measure P of the GAN is changed to the risk-neutral pricing measure Q. The paper shows it to be a viable method for valuing American put options, being able to produce similar or better results than the GBM in most cases. Furthermore, the mean shift improves the accuracy of estimated option prices by a large margin, showing the importance of risk-neutral valuation in option pricing.

### Author: Gilbert ter Beek - 11802464 Supervisor: Sander Barendse

### Second reader: Peter Boswijk

### University of Amsterdam MSc Econometrics Track: Financial Econometrics

### December 15, 2021

Statement of Originality

This document is written by Student Gilbert ter Beek who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.

### 1 Introduction

Valuation of European style options is performed analytically by models such as the Black-Scholes model. However, for American style options this is generally not the case. This problem is caused by the possibility of early exercise. Although the value of the American call can be derived analytically when the underlying asset pays no dividends or when its dividends are discrete, no analytical solution exists when dividends are paid continuously or the relevant contract is a put (Kim, 1990). For these types of options, valuation is primarily performed by numerical or approximation methods. This paper proposes a novel numerical method for valuation of the American put.

As deep learning models become more developed, they are capable of improving current methods of option pricing. Recently, Zhang et al. (2019) and Zhou et al. (2018) have proposed a Generative Adversarial Network (GAN) for stock market prediction. The GAN is a deep neural network which focuses on modelling distributions of data (Creswell et al., 2018). This paper uses the GAN to simulate stock price paths, which are then used for American option pricing with the Least Squares Monte Carlo (LSMC) of Longstaff and Schwartz (2001). The LSMC allows for risk-neutral pricing of the American put. It is originally implemented via Geometric Brownian Motion (GBM) under the risk-neutral measure Q. As the GAN does not simulate from the risk-neutral measure Q, a transformation of measures is applied to make it compatible with the LSMC. The performance of the GAN is then compared with the GBM, which, apart from being the original method for the stock price path simulation in Longstaff and Schwartz (2001), is also a general benchmark for stock price path simulation in option pricing.

The GAN was first introduced by Goodfellow et al. (2014). It consists of two individual neural networks, called agents, which compete with each other as in a zero-sum game. The first agent is the generative model G, which captures the data distribution and generates fake data as close as possible to the training data. The second agent is a discriminative model D, which estimates the probability that a sample came from the training data rather than from G. Therefore, G has to generate data such that D is not able to differentiate between real and fake. When G is properly trained, the GAN is able to generate new data which follows the distribution of the training data. This paper uses the GAN to learn the distribution of historical log returns, which can then be used to generate i.i.d. log returns. These generated log returns can then be transformed into stock price paths.

Goodfellow et al. (2014) state that the GAN is most straightforward to implement when both agents are Multilayer Perceptrons (MLP). However, any kind of neural network can be implemented. Zhang et al. (2019) propose a framework where G is a Long Short Term Memory (LSTM) model and D an MLP. LSTM has already been shown to be a very reliable method regarding stock market prediction (Nelson et al., 2017; Roondiwala et al., 2017). Another proposed framework by Zhou et al. (2018) uses LSTM as G and a Convolutional Neural Network (CNN) as D. This paper uses the MLP-MLP GAN structure as proposed by Goodfellow et al. (2014).

As mentioned, the method used for pricing options after the stock paths are simulated is the LSMC, which uses GBM under the risk-neutral measure Q as its original stock price simulation method. As the LSMC is based on risk-neutral valuation, this version of GBM is very convenient. However, the GAN simulates its log returns under the physical measure P, hence a mean shift is applied to the simulated log returns of the GAN. With this, the measure P is changed to some measure Q, such that the conditional expectation with respect to P of the mean shifted simulated log returns is equal to the conditional expectation of the simulated log returns with respect to Q. This is then made risk-neutral with the arbitrage theorem, and the mean shifted log returns then allow for risk-neutral valuation of the put. The LSMC has been shown to be very robust for pricing of the American put (Moreno and Navas, 2003). It is an algorithm based on backward induction dynamic programming. As there are several possible early exercise dates in American option pricing, the holder must decide at each date whether or not to exercise the option. This decision depends on the immediate exercise value and the continuation value of the option. To make such a decision, the LSMC states that the continuation value of the option can be estimated by means of a simple least squares regression. By applying this regression at each exercise date while moving backwards over the simulated paths, the final value of the option can be derived.

The benchmark model GBM, to which the GAN is compared, assumes the underlying asset to follow a log-normal distribution. This assumption is often violated, as log returns show negative skewness, whereas the log returns simulated by GBM have a skewness very close to zero (Ekholm and Pasternack, 2005). Therefore, GBM returns an unrealistic distribution of log returns. As the GAN focuses on modelling the data distribution of the corresponding variable, it should be able to better capture the stock return distribution and therefore be able to simulate more realistic stock price paths than GBM. Furthermore, GBM uses a constant parameter for its volatility. However, in real stock prices, volatility changes over time and is not constant. Moreover, deriving implied volatility for the American put requires considerable effort.

The paper's methods are applied to the stock Apple and the S&P 500 index. For both assets, estimates are compared for multiple strike prices and contracts with maturities of one month, three months and six months. Based on the Root Mean Squared Errors (RMSE), results show that the GAN in combination with LSMC is able to produce similar or better results than the GBM in most cases. The mean shift has played a significant role in this, showing the importance of risk-neutral valuation in option pricing. Moreover, the GAN is able to capture the negative skewness in the log returns, whereas GBM returns a skewness close to zero.

### 2 Literature review

### 2.1 Option pricing and Deep Learning

Although there are many published papers about deep learning and option pricing, none of them have combined deep learning with a numerical option pricing method in the way this paper does. For the LSMC specifically, since its introduction few applications with deep learning have been performed, where most focus on improving efficiency of the method. Ye and Zhang (2019) and Liang et al. (2021) for example, improve the LSMC by extending the basis function approach with deep learning models.

So far, most papers concerning option pricing using deep learning directly model the price of an option as a function of the price of the underlying asset, strike price, and other possible relevant option characteristics (Ruf and Wang, 2020). Many of these papers show promising results, being able to reproduce the Black-Scholes option pricing formula with high accuracy (Culkin and Das, 2017), while others claim to be able to outperform the Black-Scholes formula, questioning its accuracy in deep out of the money options and its ability to capture the volatility surface (Gradojevic et al., 2009; Ke and Yang, 2019). Ke and Yang (2019) specifically use LSTM and MLP to model these option prices, showing that they can already be very reliable on their own in option pricing. Although the results are promising, all these papers disregard risk-neutral valuation and its importance in option pricing.

Returning to the GAN, it has clearly become more prominent in financial time series simulation. Zhang et al. (2019), Zhou et al. (2018) and Koshiyama et al. (2021) have recently published papers to predict the stock market using a form of the GAN. This has been extended to the option pricing market by Wiese et al. (2019), who focus on simulating implied volatility and express option prices as a function thereof. As the GAN is becoming more popular in financial time series, its capabilities are quickly being extended, and due to the many different types of neural networks it can implement, many new forms such as the Quant GAN of Wiese et al. (2020) arise. They show that this form of GAN is capable of successfully generating multivariate financial time series, illustrating the broad range of the GAN and its applications.

As this paper solely uses GAN for stock price path simulation, it is worthwhile to consider other deep learning models on this aspect. Many deep neural networks have been studied for application in financial time series, among the more popular ones are LSTM and MLP (Sezer et al., 2020). They state that LSTM is successful in implementing the temporal characteristics of any time series signal, and has therefore been proven to be successful in forecasting stock prices. MLP, however, is more popular with index forecasting. Other noteworthy methods in their paper include Recurrent Neural Networks, Convolutional Neural Networks, Deep Belief Networks and Deep Reinforcement Learning.

This suggests that, as any neural network can be implemented, there are many other eligible GAN frameworks for stock path simulation which this paper does not consider, with many having the capability of outperforming the MLP-MLP GAN framework.

### 2.2 Option pricing theory

The most common option styles are the European and American option. A European option can only be exercised on its expiration date, whereas its American counterpart can be exercised at any time up to and including the date of expiration. The best known model for pricing European options is the Black-Scholes model of Black and Scholes (1972). This model delivers a closed form solution for the value of the European option by solving the Black-Scholes partial differential equation (PDE):

$$
\frac{\partial F}{\partial t}(t, S) + \frac{1}{2}\sigma^{2}S^{2}\frac{\partial^{2} F}{\partial S^{2}}(t, S) + r_f S\frac{\partial F}{\partial S}(t, S) = r_f F(t, S), \tag{1}
$$

where $F(t, S)$ is the derivative's price function, $S$ is the stock price, $r_f$ is the risk-free rate and $\sigma$ is the volatility. Solving this PDE with boundary condition $F(T, S) = \max(S - K, 0)$ leads to the Black-Scholes call price formula:

$$
C_t = F(t, S_t) = S_t\Phi(d_1) - \exp(-r_f(T - t))K\Phi(d_2), \tag{2}
$$

where $T - t$ is the time until maturity and $K$ is the strike price. $\Phi$ is the cumulative standard normal distribution function with

$$
d_1 = d_2 + \sigma\sqrt{T - t} = \frac{\log\left(\frac{S_t}{K}\right) + \left(r_f + \frac{1}{2}\sigma^{2}\right)(T - t)}{\sigma\sqrt{T - t}}. \tag{3}
$$

The same PDE can be solved with a different boundary condition $F(T, S) = \max(K - S, 0)$ to derive the put price formula:

$$
P_t = F(t, S_t) = \exp(-r_f(T - t))K\Phi(-d_2) - S_t\Phi(-d_1). \tag{4}
$$
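As a minimal sketch of equations (2) to (4), the two closed-form prices can be evaluated with the standard library alone. The function names and the numerical inputs below are illustrative assumptions and are not taken from the paper's data; put-call parity is used as a consistency check on the two formulas.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes(s: float, k: float, r_f: float, sigma: float, tau: float):
    """European call and put prices from equations (2)-(4); tau = T - t."""
    d1 = (math.log(s / k) + (r_f + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    call = s * norm_cdf(d1) - math.exp(-r_f * tau) * k * norm_cdf(d2)
    put = math.exp(-r_f * tau) * k * norm_cdf(-d2) - s * norm_cdf(-d1)
    return call, put


# Illustrative inputs only (assumed, not from the paper).
call, put = black_scholes(s=100.0, k=100.0, r_f=0.01, sigma=0.2, tau=0.5)

# Put-call parity: C - P should equal S - K * exp(-r_f * tau).
parity_gap = (call - put) - (100.0 - 100.0 * math.exp(-0.01 * 0.5))
```

The parity gap should vanish up to floating-point error, which confirms that the call and put expressions share the same $d_1$ and $d_2$.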

These two formulas are valid for the European call and put as they can only be exercised at maturity. However, for many American options, the formulas become invalid due to the possibility of exercising before maturity. The exception here is for American call options on a non-dividend paying stock, which will not be exercised early and therefore have the same value as the European version (Merton, 1973). Other ways to price options are the Binomial option pricing model of Cox et al. (1979) and Finite difference techniques introduced by Brennan and Schwartz (1978). These methods allow for both European and American option pricing. However, due to their complexity, they become impractical when calculating option prices over longer periods of time. Alternative methods stem from Monte Carlo simulation, which can be easily applied when the value of the option depends on multiple factors like early exercise and path-dependency. This makes Monte Carlo simulation methods a valid approach for pricing American options. The advantages of such methods lie in their computational speed and efficiency. Furthermore, Monte Carlo simulation methods are simple, transparent and flexible (Longstaff and Schwartz, 2001).

The LSMC of Longstaff and Schwartz (2001) states that the holder of an American option will have to consider at each exercise time whether to exercise the option or hold on to it. This decision depends on the immediate exercise value and the continuation value of the option. The continuation value is a result of the fundamental theorem of asset pricing, where the latter implies the principle of no arbitrage. This principle leads to risk-neutral valuation: a derivative price equals the expected discounted payoff under the risk-neutral probability measure Q. This makes it possible to define the continuation value as the conditional expectation of the discounted future cash flows under the risk-neutral measure Q. The exact definition can be found in section 3.2.2.

### 3 Methodology

### 3.1 Generative Adversarial Network

The GAN is a deep learning model that uses two underlying neural networks in order to capture distributions of data. The neural networks are called Generator (G) and Discriminator (D), which compete against each other within the GAN framework. The intuition behind the GAN is that G captures the data distribution and creates fake data as close as possible to the real data, while D estimates the probability that a sample came from the real data rather than from G. G and D are trained simultaneously, so each time D finishes its task, G is updated using backpropagation and generates more accurate simulations the next time. This training process continues until D is not able to distinguish between real and generated data.

3.1.1 Theoretical background

As proposed by Goodfellow et al. (2014), the paper uses an MLP for both G and D. Let $z_t$ be the input noise for the MLP Generator to transform, drawn i.i.d. from the prior $p_z(z) = N(0, 1)$. Then define a transformation of this input noise as $G(z; \theta_g)$, where $G$ is a differentiable function represented by an MLP with parameters $\theta_g$. After training, $G$ transforms the input noise $z$ such that its output resembles the historical log returns $R_t$. The objective of the GAN is thus to let the Generator learn a distribution $p_g$ that follows the real data distribution $p_{data}$. The Discriminator is defined as the MLP $D(R; \theta_d)$ that outputs a single scalar representing the probability that its input came from the real data $R_t$ rather than from $G$. $D$ is trained to maximize the probability of assigning the correct label to both real data samples and samples generated by $G$. $G$ is trained simultaneously to minimize $D$'s ability to differentiate between real and fake data, which is equal to minimizing the function $E_{z \sim p_z(z)}[\log(1 - D(G(z)))]$. This results in the following optimization problem for both $G$ and $D$:

$$
\min_G \max_D V(D, G) := E_{R \sim p_{data}(R)}[\log(D(R))] + E_{z \sim p_z(z)}[\log(1 - D(G(z)))], \tag{5}
$$

where $V(D, G)$ represents the value function of this two-player minimax game. When $p_g = p_{data}$, the Discriminator is unable to differentiate between the data distributions. This means that $D(R) = D(G(z)) = \frac{1}{2}$, and neither $G$ nor $D$ can improve any further. The framework's layout is shown in figure 1.

Figure 1: The layout of the MLP-MLP framework of GAN.

To provide some intuition for the value function, consider the binary cross-entropy loss function which resembles V (D, G), given by

$$
L = -\sum \left[ y\log(\hat{y}) + (1 - y)\log(1 - \hat{y}) \right], \tag{6}
$$

where $y$ is a label with value 1 for real data and 0 for fake data and $\hat{y}$ is the prediction of the model. This means that for $y = 1$ the prediction $\hat{y}$ is made on a sample from the real data distribution, and for $y = 0$ on a sample from the distribution of $G$. If only one observation is considered and the negative sign in front of the equation is omitted, the following results hold:

$$
\begin{aligned}
y = 1,\ \hat{y} = D(R) &\implies L = \log(D(R)), \\
y = 0,\ \hat{y} = D(G(z)) &\implies L = \log(1 - D(G(z))).
\end{aligned} \tag{7}
$$

Adding both these equations gives:

$$
L = \log(D(R)) + \log(1 - D(G(z))), \tag{8}
$$

which is already very similar to V (D, G). This equation is only for one data point, however, the entire dataset has to be considered, which is done by taking the expectation. For a discrete distribution this results in a summation, whereas a continuous distribution gives an integral:

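$$
\begin{aligned}
E[L] &= E[\log(D(R))] + E[\log(1 - D(G(z)))] \\
&= \sum p_{data}(R)\log(D(R)) + \sum p_z(z)\log(1 - D(G(z))) && \text{(discrete)} \\
&= \int p_{data}(R)\log(D(R))\,dR + \int p_z(z)\log(1 - D(G(z)))\,dz && \text{(continuous)}.
\end{aligned} \tag{9}
$$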

In the end, this is equivalent to the value function V (D, G).
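The link between the binary cross-entropy and the value function can be checked numerically. The sketch below uses hypothetical Discriminator outputs (the arrays `d_real` and `d_fake` are made up for illustration) and compares the sample analogue of $V(D, G)$ with the negative sum of two cross-entropy terms, one with label 1 and one with label 0. It also evaluates the value function at the equilibrium $D = \frac{1}{2}$, where it equals $2\log\frac{1}{2} = -\log 4$.

```python
import numpy as np


def bce(y, y_hat):
    """Binary cross-entropy of equation (6), averaged over the sample."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return -np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))


def value_function(d_real, d_fake):
    """Sample analogue of V(D, G): mean log D(R) + mean log(1 - D(G(z)))."""
    d_real, d_fake = np.asarray(d_real, float), np.asarray(d_fake, float)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))


# Hypothetical discriminator outputs for four real and four generated samples.
d_real = np.array([0.9, 0.8, 0.7, 0.6])
d_fake = np.array([0.2, 0.3, 0.4, 0.1])

v = value_function(d_real, d_fake)
# Same quantity from two cross-entropy terms with labels 1 (real) and 0 (fake).
v_from_bce = -(bce(np.ones(4), d_real) + bce(np.zeros(4), d_fake))

# At the equilibrium D = 1/2 everywhere, V equals -log 4.
v_equilibrium = value_function(np.full(4, 0.5), np.full(4, 0.5))
```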

3.1.2 Implementation

The GAN uses an MLP for both its Generator and Discriminator. The MLP is a class of feedforward Artificial Neural Network (ANN), although the term is also used for any feedforward ANN. The MLP Discriminator uses 3 hidden layers with 100 neurons each and the Leaky Rectified Linear Unit (Leaky ReLU) activation function as in Zhang et al. (2019), with α = 0.2:

$$
\varphi(x) = \begin{cases} 0.2x, & \text{if } x < 0, \\ x, & \text{if } x \geq 0. \end{cases} \tag{10}
$$

The Leaky ReLU allows for quicker convergence because it passes a non-zero gradient for negative inputs during backpropagation, which the standard ReLU does not. The output layer uses the sigmoid function, which is common for ANN classification frameworks. Furthermore, it has a range of (0, 1), and is therefore a very good fit for the Discriminator:

$$
\sigma(x) = \frac{1}{1 + e^{-x}}. \tag{11}
$$

The Discriminator also includes a dropout value of 0.3, which is a simple way to prevent neural networks from overfitting (Srivastava et al., 2014).

The MLP Generator is similar to the MLP Discriminator, except for its output layer activation function. As the Generator must be able to generate log returns, the output range (0, 1) of the sigmoid function does not suffice and the tanh activation function is used instead:

$$
\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}. \tag{12}
$$

Moreover, both MLPs use the binary cross-entropy loss function, which closely resembles the value function given in equation (5).
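To make the two architectures concrete, the following is a minimal NumPy sketch of the forward pass only. It assumes a one-dimensional noise input and a one-dimensional log return, a simple Gaussian weight initialisation, and no dropout (which would only be active during training); none of these details are specified in the text, and the helper names `init_mlp` and `forward` are hypothetical.

```python
import numpy as np


def leaky_relu(x, alpha=0.2):
    """Equation (10): slope alpha for negative inputs, identity otherwise."""
    return np.where(x < 0, alpha * x, x)


def sigmoid(x):
    """Equation (11)."""
    return 1.0 / (1.0 + np.exp(-x))


def init_mlp(rng, n_in, n_hidden=100, n_layers=3, n_out=1):
    """Random weights for an MLP with n_layers hidden layers (illustrative init)."""
    sizes = [n_in] + [n_hidden] * n_layers + [n_out]
    return [(rng.normal(0.0, 0.1, (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]


def forward(params, x, out_activation):
    """Leaky ReLU on the hidden layers, out_activation on the output layer."""
    for w, b in params[:-1]:
        x = leaky_relu(x @ w + b)
    w, b = params[-1]
    return out_activation(x @ w + b)


rng = np.random.default_rng(0)
generator = init_mlp(rng, n_in=1)       # maps noise z to a log return (assumed 1-D)
discriminator = init_mlp(rng, n_in=1)   # maps a log return to a probability

z = rng.standard_normal((64, 1))                        # one batch, z ~ N(0, 1)
fake_returns = forward(generator, z, np.tanh)           # tanh output in (-1, 1)
p_real = forward(discriminator, fake_returns, sigmoid)  # sigmoid output in (0, 1)
```

The only structural difference between the two networks in this sketch is the output activation, in line with the description above.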

After setting up the Generator and Discriminator, the GAN is trained with minibatch stochastic gradient descent. First, D is trained such that it is capable of classifying real or fake data and updated with the Adam optimizer. G is then updated using the Discriminator's error, also with the Adam optimizer. The training algorithm is documented in Appendix A. In this paper, the number of times that D is trained individually is k = 15, whereas the batch size is m = 64. The training process is performed for 500 epochs. After the training process, the Generator can simulate i.i.d. log returns similar to the distribution of the real log returns, which can then be transformed into stock price paths.
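The last step, turning simulated log returns into price paths, amounts to a cumulative sum followed by an exponential. The sketch below illustrates this under the assumption that the simulated returns are arranged as one row per path; the normal draws are placeholders standing in for the Generator's output, and `returns_to_paths` is a hypothetical helper name.

```python
import numpy as np


def returns_to_paths(log_returns, s0):
    """Turn an (n_paths, n_steps) array of log returns into price paths.

    Column 0 of the result holds the starting price s0.
    """
    cum = np.cumsum(log_returns, axis=1)
    cum = np.hstack([np.zeros((log_returns.shape[0], 1)), cum])
    return s0 * np.exp(cum)


# Placeholder draws standing in for the Generator's output (an assumption).
rng = np.random.default_rng(1)
sim_returns = rng.normal(0.0005, 0.02, size=(8, 3))
paths = returns_to_paths(sim_returns, s0=1.0)
```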

### 3.2 Least Squares Monte Carlo

The LSMC of Longstaff and Schwartz (2001) is an algorithm based on backward induction dynamic programming that solves for the continuation value of an American option at each stopping time. A stopping time indicates a time step where a decision for early exercise or holding on to the option must be made. This decision depends on the early exercise value and continuation value of the option, which are defined later on. The method is applied to every simulated path of the GAN and GBM, working backwards from the last time step to the first and comparing the different payoffs at each exercise date. When all paths have reached their initial point, the final value of the option can be computed.

The theoretical framework of GBM under the risk-neutral measure Q can be found in Appendix A.

3.2.1 A numerical example

Consider an American put option on a non-dividend stock with initial stock price 1 and strike value 1.10. Furthermore, let there be 8 simulated stock paths with a time frame of three steps and let the risk-free rate be rf = 0.0009, which is the daily U.S. 3 Month Treasury Bill rate of December 31, 2020.

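Table 1: Stock price paths.

| Path | t = 0 | t = 1 | t = 2 | t = 3 | Payoff at t = 3 |
|------|-------|-------|-------|-------|-----------------|
| 1 | 1.00 | 1.00 | 1.04 | 1.05 | 0.05 |
| 2 | 1.00 | 0.84 | 0.76 | 0.73 | 0.37 |
| 3 | 1.00 | 0.90 | 0.94 | 1.10 | 0.00 |
| 4 | 1.00 | 0.86 | 0.85 | 0.95 | 0.15 |
| 5 | 1.00 | 1.19 | 1.41 | 1.25 | 0.00 |
| 6 | 1.00 | 1.02 | 1.24 | 1.38 | 0.00 |
| 7 | 1.00 | 1.06 | 1.34 | 1.57 | 0.00 |
| 8 | 1.00 | 0.98 | 1.15 | 1.25 | 0.00 |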

The final put payoff equals the difference between the strike price and stock price if the resulting value is positive and zero otherwise. Therefore, the cash flow vector at t = 3 can easily be derived and is given in the rightmost column of table 1. If the put is in the money at time 2, then the option holder must decide whether to exercise the option or continue holding on to it. This decision is made by comparing the immediate payoff and continuation value, and choosing the alternative with the highest payoff. Longstaff and Schwartz (2001) state that the continuation value can be obtained by a simple least squares regression. Table 2 shows the state space of this regression at t = 2, where column X represents the stock prices for which the put is still in the money and column Y the discounted future cash flows for these paths.

In this paper, Y is regressed on X using the Laguerre basis functions, which are defined later on and denoted by $L_i(X)$, using only the paths where the option is in the money. For simplicity, only the first 2 terms of the Laguerre polynomials are used in this example. The value of continuing the option then equals the following conditional expectation:

$$
E[Y \mid X] = -0.027\,L_0(-2.705 + 3.556X) + 0.332\,L_1(-2.705 + 3.556X), \tag{13}
$$

where −0.027 and 0.332 are simply the coefficients of the regression. Evaluating this equation for each path gives the continuation values, which are then compared with the immediate exercise values to determine the optimal stopping time, shown in table 3.

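Table 2: Regression at t = 2.

| Path | Y | X |
|------|---|---|
| 1 | $0.05\,e^{-0.0009}$ | 1.04 |
| 2 | $0.37\,e^{-0.0009}$ | 0.76 |
| 3 | $0.00\,e^{-0.0009}$ | 0.94 |
| 4 | $0.15\,e^{-0.0009}$ | 0.85 |
| 5 | - | - |
| 6 | - | - |
| 7 | - | - |
| 8 | - | - |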

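Table 3: Decision for early exercise at t = 2.

| Path | Continuation | Exercise | Decision |
|------|--------------|----------|----------|
| 1 | -0.03 | 0.06 | Exercise |
| 2 | 0.30 | 0.34 | Exercise |
| 3 | 0.09 | 0.16 | Exercise |
| 4 | 0.20 | 0.25 | Exercise |
| 5 | - | - | - |
| 6 | - | - | - |
| 7 | - | - | - |
| 8 | - | - | - |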
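The regression at t = 2 can be reproduced with ordinary least squares on the four in-the-money paths of Table 2. The sketch below regresses on $L_0(X) = 1$ and $L_1(X) = 1 - X$ without the affine rescaling of X that appears in equation (13); this is a simplifying assumption, and it leaves the fitted continuation values unchanged because both designs span the same space of linear functions of X.

```python
import numpy as np

r_f = 0.0009
strike = 1.10
x = np.array([1.04, 0.76, 0.94, 0.85])                  # ITM prices at t = 2 (Table 2)
y = np.array([0.05, 0.37, 0.00, 0.15]) * np.exp(-r_f)   # discounted t = 3 payoffs

# Design matrix with the first two Laguerre terms, L_0(X) = 1 and L_1(X) = 1 - X.
design = np.column_stack([np.ones_like(x), 1.0 - x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

continuation = design @ coef     # fitted E[Y | X] for paths 1 to 4
exercise = strike - x            # immediate exercise values
decision = np.where(exercise > continuation, "Exercise", "Continue")
```

Rounded to two decimals, the fitted values agree with the continuation column of Table 3, and the exercise value exceeds the continuation value on all four paths.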

When the exercise value is larger than the continuation value, this exercise value will be the payoff at the corresponding time step, and all the payoffs in later time steps are removed. The cash-flow matrix at t = 2 is then given by table 4.

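Table 4: Cash flow matrix at time t = 2.

| Path | t = 1 | t = 2 | t = 3 |
|------|-------|-------|-------|
| 1 | - | 0.06 | 0.00 |
| 2 | - | 0.34 | 0.00 |
| 3 | - | 0.16 | 0.00 |
| 4 | - | 0.25 | 0.00 |
| 5 | - | 0.00 | 0.00 |
| 6 | - | 0.00 | 0.00 |
| 7 | - | 0.00 | 0.00 |
| 8 | - | 0.00 | 0.00 |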

Continuing the same process for time t = 1, using the values of table A1 for the regression and table A2 provided in Appendix A for the decision making process, gives the option cash flow matrix represented by table 5. With this option cash flow matrix, the final value of the put can be derived by discounting all cash flows to time t = 0 and taking the average of the row sums. This results in the final option price 0.14. Though simple, this example shows the intuition behind the LSMC and how a simple regression can be used to price an American option.

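Table 5: Final cash flow matrix of the option.

| Path | t = 1 | t = 2 | t = 3 |
|------|-------|-------|-------|
| 1 | 0.10 | 0.00 | 0.00 |
| 2 | 0.00 | 0.34 | 0.00 |
| 3 | 0.20 | 0.00 | 0.00 |
| 4 | 0.00 | 0.25 | 0.00 |
| 5 | 0.00 | 0.00 | 0.00 |
| 6 | 0.08 | 0.00 | 0.00 |
| 7 | 0.04 | 0.00 | 0.00 |
| 8 | 0.12 | 0.00 | 0.00 |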
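The final price follows directly from Table 5 by discounting each cash flow to t = 0 and averaging the row sums. A short check, assuming continuous discounting at the per-step rate $r_f = 0.0009$ used in the example:

```python
import numpy as np

r_f = 0.0009
# Rows are paths 1 to 8, columns are t = 1, 2, 3 (Table 5).
cash_flows = np.array([
    [0.10, 0.00, 0.00],
    [0.00, 0.34, 0.00],
    [0.20, 0.00, 0.00],
    [0.00, 0.25, 0.00],
    [0.00, 0.00, 0.00],
    [0.08, 0.00, 0.00],
    [0.04, 0.00, 0.00],
    [0.12, 0.00, 0.00],
])
discount = np.exp(-r_f * np.arange(1, 4))   # exp(-r_f * t) for t = 1, 2, 3
price = (cash_flows * discount).sum(axis=1).mean()
```

The result rounds to the option price of 0.14 reported in the text.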

3.2.2 Valuation framework

Longstaff and Schwartz (2001) assume a finite time horizon $[0, T]$, in which they define the probability space $(\Omega, \mathcal{F}, P)$ and an equivalent martingale measure $Q$. $\Omega$ represents the set of all simulated paths within the time horizon, with $\omega$ representing a single path. Let $t \in [0, T]$, then $\mathcal{F}_t$ represents the information set at time $t$. They then introduce the notation $C(\omega, s; t, T)$ as the path of option cash flows, conditional on the option not being exercised at or before time $t$ and on the option holder following the optimal stopping strategy for all $s \in (t, T]$. This optimal stopping strategy is decided by comparing the immediate exercise values and continuation values of the option, and choosing the corresponding strategy with the highest payoff.

The LSMC assumes a finite number of exercise dates $0 < t_1 < t_2 < \dots < t_K = T$, and considers the optimal stopping policy at each time step. At time $t_k$, the value of early exercise is known to the holder of the option, as it is simply the cash flow of this early exercise. The continuation value, which follows from the future cash flows when holding on to the option, cannot be determined as easily. Longstaff and Schwartz (2001) define this continuation value as

$$
F(\omega, t_k) = E_Q\left[\sum_{j=k+1}^{K}\exp(-r_f(t_j - t_k))\,C(\omega, t_j; t_k, T) \,\middle|\, \mathcal{F}_{t_k}\right], \tag{14}
$$

which is the conditional expectation of the discounted future cash flows with respect to the risk-neutral pricing measure $Q$, where $r_f$ is the risk-free rate. Using this conditional expectation, the problem of optimal exercise reduces to comparing the early exercise value and the continuation value.

3.2.3 The algorithm

The LSMC states that the conditional expectation can be approximated by least squares regression for each exercise time. At time $t_{k-1}$, it is assumed that $F(\omega, t_{k-1})$ can be expressed as a linear combination of orthonormal basis functions. Longstaff and Schwartz (2001) propose the set of (weighted) Laguerre polynomials as an example; however, Moreno and Navas (2003) show that different polynomial functions provide very similar results, so any orthonormal basis function can be used. Furthermore, they state that using more than four terms does not significantly change the final option prices. This paper uses the Laguerre polynomials up to four terms, given by:

$$
\begin{aligned}
L_0(X) &= 1, \\
L_1(X) &= -X + 1, \\
L_2(X) &= \frac{1}{2}\left(X^{2} - 4X + 2\right), \\
L_3(X) &= \frac{1}{6}\left(-X^{3} + 9X^{2} - 18X + 6\right).
\end{aligned} \tag{15}
$$
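The four polynomials of equation (15) are the standard (unweighted) Laguerre polynomials, so they can be cross-checked against NumPy's `numpy.polynomial.laguerre.lagvander`. The helper name `laguerre_basis` and the evaluation grid below are illustrative choices.

```python
import numpy as np
from numpy.polynomial import laguerre


def laguerre_basis(x):
    """First four Laguerre polynomials of equation (15), one column per term."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([
        np.ones_like(x),
        -x + 1.0,
        0.5 * (x**2 - 4.0 * x + 2.0),
        (-x**3 + 9.0 * x**2 - 18.0 * x + 6.0) / 6.0,
    ])


x = np.linspace(0.5, 1.5, 5)
explicit = laguerre_basis(x)
library = laguerre.lagvander(x, 3)   # Laguerre Vandermonde matrix, degrees 0 to 3
```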

The algorithm can then be described as follows. The initial step in the algorithm is to determine the cash flow vector $C_{t_K}$ at the last time step $t_K$, with

$$
C_{i,t_K} = \max(K - S_{i,t_K}, 0), \tag{16}
$$

where $i$ represents the individual stock paths. The second step is to calculate the cash flow $C_{t_k}$ of the option for time $t_k$, with $1 \leq k \leq T - 1$, and determine the exercise value and select those paths which have a positive payoff, such that

$$
C_{i,t_k} = \max(K - S_{i,t_k}, 0) > 0. \tag{17}
$$

The next step is to obtain the continuation values, which represent the conditional expectation of the discounted future cash flows under the risk-neutral measure $Q$, given the stock price paths. These values can be obtained by regressing the discounted future cash flows on the stock paths where the option is in the money at time $t_k$, using the Laguerre basis functions. Using a similar notation as in the previous section, but changing the symbol of the individual stock price paths from $\omega$ to $i$, gives the continuation value:

$$
F(i, t_k, S) = \sum_{j=0}^{3}\alpha_j L_j(S), \tag{18}
$$

where $\alpha_j$ are the coefficients of the regression, and the option is exercised early when

$$
C_{i,t_k} > F(i, t_k, S). \tag{19}
$$

Using the above equations, the algorithm steps backwards through time until the first step is reached and at each time step, early exercise is performed if the option is In The Money (ITM) and if the condition of equation (19) is met. If this is the case, all remaining future cash flows in the path are removed. When all paths reach their initial point, the full cash flow matrix is derived. Then by discounting all cash flows to t = 0, the option value can be calculated by taking the average over all row sums.
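The backward recursion can be sketched compactly in NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses unweighted Laguerre polynomials on the raw stock price (no rescaling), equally spaced exercise dates with a per-step risk-free rate, and it skips the regression when fewer in-the-money paths than basis terms are available. The function name `lsmc_put` is hypothetical. Instead of a full cash flow matrix, it tracks one realised cash flow per path, which is equivalent because each path has at most one non-zero cash flow.

```python
import numpy as np
from numpy.polynomial import laguerre


def lsmc_put(paths, strike, r_f, n_basis=4):
    """Price an American put by LSMC.

    paths : (n_paths, n_steps + 1) array, column 0 is the initial price.
    r_f   : risk-free rate per time step (continuous compounding).
    """
    disc = np.exp(-r_f)
    # Realised cash flow per path, valued at the current step; eq. (16) at maturity.
    value = np.maximum(strike - paths[:, -1], 0.0)

    for t in range(paths.shape[1] - 2, 0, -1):
        value = value * disc                       # discount one step back
        payoff = np.maximum(strike - paths[:, t], 0.0)
        itm = payoff > 0                           # eq. (17): in-the-money paths
        if itm.sum() < n_basis:
            continue                               # too few points for the regression
        basis = laguerre.lagvander(paths[itm, t], n_basis - 1)
        alpha, *_ = np.linalg.lstsq(basis, value[itm], rcond=None)
        continuation = basis @ alpha               # eq. (18)
        exercise = payoff[itm] > continuation      # eq. (19)
        idx = np.flatnonzero(itm)[exercise]
        value[idx] = payoff[idx]                   # overrides all later cash flows

    return float(np.mean(value * disc))            # discount from t = 1 to t = 0


# Paths of Table 1, priced with two basis terms as in the numerical example.
table1 = np.array([
    [1.00, 1.00, 1.04, 1.05],
    [1.00, 0.84, 0.76, 0.73],
    [1.00, 0.90, 0.94, 1.10],
    [1.00, 0.86, 0.85, 0.95],
    [1.00, 1.19, 1.41, 1.25],
    [1.00, 1.02, 1.24, 1.38],
    [1.00, 1.06, 1.34, 1.57],
    [1.00, 0.98, 1.15, 1.25],
])
price = lsmc_put(table1, strike=1.10, r_f=0.0009, n_basis=2)
```

Applied to the eight paths of the numerical example, the sketch reproduces the exercise decisions of Table 5 and a price that rounds to 0.14.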

3.2.4 On risk-neutrality of the GAN

The LSMC uses the risk-neutral pricing measure Q, which Longstaff and Schwartz (2001) implement via the stock path simulation of the GBM. The GAN, however, is not capable of directly using the risk-neutral pricing measure Q, as it originally uses the physical measure P to simulate the log returns. Therefore, to be able to use the LSMC in combination with the GAN, the original measure P must be changed to the risk-neutral pricing measure Q. This is realized by applying a mean shift to the measure P that makes it risk-neutral.

From the simulated one-period log returns of the GAN, $R_{t,t+1} \mid \mathcal{F}_P$ can be retrieved. Using this result, the one-period mean of the log returns can be estimated:

$$
E_P[R_{t,t+1} \mid \mathcal{F}_P] = \mu_P. \tag{20}
$$

This measure $P$ can be changed to some measure $Q$ by a mean shift that makes the mean equal to $\mu_Q$, which is made risk-neutral later on. This means that the log returns are location shifted under $Q$, which gives:

$$
\begin{aligned}
E_Q[R_{t,t+1} \mid \mathcal{F}_P] &= \mu_Q \\
&= E_P[R_{t,t+1} - (\mu_P - \mu_Q) \mid \mathcal{F}_P].
\end{aligned} \tag{21}
$$

Then, by first simulating from $\mathcal{F}_P$ and applying the mean shift, $\mathcal{F}_Q$ is simulated with:

$$
\tilde{R}_{t,t+1} = R_{t,t+1} - (\mu_P - \mu_Q). \tag{22}
$$

This implies that $\tilde{R}$ can be simulated and that it is possible to obtain approximations of the risk-neutral means of all kinds of functions $g(\cdot)$ of $R_{t,t+1}$ and variables from $\mathcal{F}_P$. This result is for the one-period log returns; however, it can be extended to multi-period log returns:

$$
\tilde{R}_{t,t+h} = R_{t,t+h} - h(\mu_P - \mu_Q), \tag{23}
$$

where $h$ can be any positive integer within the simulated log return horizon. With these multi-period log returns, the stock price paths under $Q$ can be calculated with:

$$
\tilde{P}_{t+h} = P_t\exp(\tilde{R}_{t,t+h}). \tag{24}
$$
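Equations (22) to (24) translate into a per-step shift followed by a cumulative sum. The sketch below is illustrative: the normal draws stand in for GAN output, `mu_q` is a hypothetical target mean rather than a calibrated value, and `risk_neutral_paths` is an assumed helper name.

```python
import numpy as np


def risk_neutral_paths(log_returns, p0, mu_p, mu_q):
    """Apply the mean shift of eq. (22) per step and build prices as in eq. (24).

    log_returns : (n_paths, n_steps) one-period log returns simulated under P.
    Returns the shifted price paths and the shifted multi-period log returns.
    """
    shifted = log_returns - (mu_p - mu_q)   # eq. (22), one-period shift
    cum = np.cumsum(shifted, axis=1)        # h-period log returns, eq. (23)
    return p0 * np.exp(cum), cum


rng = np.random.default_rng(2)
r_p = rng.normal(0.001, 0.02, size=(1000, 21))   # placeholder for GAN output
mu_p = r_p.mean()                                # estimate of eq. (20)
mu_q = 0.0002                                    # hypothetical target mean
prices_q, r_q = risk_neutral_paths(r_p, p0=100.0, mu_p=mu_p, mu_q=mu_q)

h = np.arange(1, 22)                             # horizons 1, ..., 21
```

Summing the shifted one-period returns over $h$ steps removes $h(\mu_P - \mu_Q)$ from the unshifted $h$-period return, which is exactly equation (23).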

Now consider the put premium $P(r_{t,t+h}, K)$ in terms of the simple return $r$ and strike price $K$, which must be divided by the starting price $S_0$ in the case of simple returns. Then by the arbitrage theorem:

$$
E_Q[\exp(-h r_f)\max(K - r_{t,t+h}, 0)] = P(r_{t,t+h}, K). \tag{25}
$$

The right-hand side can be observed in the market, whereas the left-hand side can be approximated by simulation from $\mathcal{F}_Q$. $\mu_Q$ can then be found by minimizing the squared distance between the left-hand side and right-hand side of the equation:

$$
\min_{\mu_Q}\left(P(r_{t,t+h}, K) - \frac{1}{N}\sum_{s=1}^{N}\exp(-h r_f)\max\left(K - \exp\left(R_{t,t+h} + h(\mu_Q - \mu_P)\right), 0\right)\right)^{2}. \tag{26}
$$

Finally, as $R_{t,t+h}$ and $\mu_P$ have already been derived from the simulation and $\mu_Q$ is now estimated, $\mathcal{F}_Q$ can be simulated using the risk-neutral measure $Q$ and the LSMC can be used to price the put. To improve the accuracy of $\mu_Q$, it is expressed as a function of the strike $K$, meaning that a different $\mu_Q$ is derived for every $K$.
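The minimisation in equation (26) is one-dimensional, so a plain grid search is enough to illustrate it. The sketch below makes several assumptions that are not in the text: the strike is expressed relative to the starting price (so `strike = 1.0` is at the money), the "market" premium is synthetic and generated from a known `mu_true` so that the recovery can be checked, and a grid search replaces whatever optimiser was actually used. The helper names are hypothetical.

```python
import numpy as np


def model_put(mu_q, r_h, mu_p, h, strike, r_f):
    """Monte Carlo left-hand side of eq. (25) with the mean shift of eq. (23)."""
    shifted = r_h + h * (mu_q - mu_p)
    payoff = np.maximum(strike - np.exp(shifted), 0.0)
    return np.exp(-h * r_f) * payoff.mean()


def calibrate_mu_q(premium, r_h, mu_p, h, strike, r_f, grid):
    """Grid search for the mu_Q that minimises the squared error of eq. (26)."""
    errors = np.array([(premium - model_put(m, r_h, mu_p, h, strike, r_f)) ** 2
                       for m in grid])
    return grid[np.argmin(errors)]


rng = np.random.default_rng(3)
h, r_f, strike = 21, 0.0009, 1.0
one_period = rng.normal(0.001, 0.02, size=(20_000, h))   # placeholder for GAN output
r_h = one_period.sum(axis=1)                             # h-period log returns under P
mu_p = one_period.mean()

mu_true = 0.0001                                         # hypothetical "true" value
premium = model_put(mu_true, r_h, mu_p, h, strike, r_f)  # synthetic market premium
grid = np.linspace(-0.002, 0.002, 401)
mu_hat = calibrate_mu_q(premium, r_h, mu_p, h, strike, r_f, grid)
```

Because the simulated put value is strictly decreasing in $\mu_Q$ whenever some paths end in the money, the squared error has a single minimum, which is why such a simple search suffices here. Repeating the call for each strike gives a strike-dependent $\mu_Q$.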

Another method to derive $\mu_Q$ uses the risk-free rate, relying on the fact that the conditional expectation of the one-period simple returns under the risk-neutral measure Q is equal to the risk-free rate in terms of simple return. This leads to:

$$
\begin{aligned}
E_Q[\exp(-r_f) \exp(\tilde{R}_{t,t+1}) \mid F_P] &= 1 \\
\Leftrightarrow \quad E_Q[\exp(\tilde{R}_{t,t+1}) \mid F_P] &= \exp(r_f),
\end{aligned} \tag{27}
$$

where the left-hand side of the lower equation can be written in terms of measure P, which gives:

$$
E_P[\exp(R_{t,t+1} - \mu_P + \mu_Q) \mid F_P] = \exp(r_f). \tag{28}
$$

As $\mu_P$ and $\mu_Q$ are scalars, they can be taken out of the conditional expectation:

$$
\frac{E_P[\exp(R_{t,t+1}) \mid F_P] \exp(\mu_Q)}{\exp(\mu_P)} = \exp(r_f). \tag{29}
$$

Then, by dividing both sides by $\exp(\mu_Q)\exp(r_f)$ and taking logarithms, an expression for $\mu_Q$ is obtained:

$$
\mu_Q = -\log\left(\frac{E_P[\exp(R_{t,t+1}) \mid F_P]}{\exp(\mu_P)\exp(r_f)}\right). \tag{30}
$$

Unlike the previous method, this method does not need any observed option prices from the market, as the risk-free rate and the results of the simulation provide sufficient information to calculate $\mu_Q$. Moreover, this method uses a single value of $\mu_Q$ for all strikes $K$.
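Equation (30) translates directly into a sample estimate. The sketch below assumes that the conditional expectation is replaced by a sample mean over simulated one-period log returns; the function name and the normally distributed demo returns are hypothetical and only serve to check the identity in (28).

```python
import numpy as np


def mu_q_from_risk_free(log_returns, rf):
    """Sample analogue of (30) from one-period log returns simulated under P."""
    mu_p = np.mean(log_returns)
    expected_gross = np.mean(np.exp(log_returns))   # estimates E_P[exp(R_{t,t+1})]
    mu_q = -np.log(expected_gross / (np.exp(mu_p) * np.exp(rf)))
    return mu_p, mu_q


# Illustrative inputs only: normal draws stand in for simulated log returns.
rng = np.random.default_rng(2)
demo_returns = rng.normal(0.001, 0.02, size=50_000)
demo_rf = 0.0009
demo_mu_p, demo_mu_q = mu_q_from_risk_free(demo_returns, demo_rf)

# After the shift, the average gross return matches exp(rf), as required by (28).
shifted = demo_returns - (demo_mu_p - demo_mu_q)
```

By construction, the mean of `np.exp(shifted)` equals `np.exp(demo_rf)` up to floating-point error, which is the risk-neutral condition of equation (27) in sample form.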

### 4 Data

The GAN is trained with historical data. Log returns are derived from the adjusted closing prices retrieved from the Yahoo Finance database, which means that they have been adjusted for stock splits and dividends. This study covers the daily historical prices of the stock Apple and the index Standard & Poor’s 500 (S&P) ranging from January 2000 up to and including December 2020, such that there are 21 years of stock data and 5284 observations of log returns. As can be seen in Figure B1 of Appendix B, Apple is a clear example of exponential growth, while the S&P’s growth is more constant.

Figures B2 and B3 show the log returns of the assets. Apple is much more volatile than the S&P and is highly volatile throughout the entire sample. For the S&P, however, recent log returns (considering the year 2020 and its pandemic) are more volatile than those of earlier years. This encourages the use of a form of moving average for estimating the volatility parameter of the GBM. Table 6 shows the main descriptive statistics of the log returns. Both assets, but Apple in particular, show an excessively high kurtosis and a negative skewness. This is also visible in the histograms of the log returns.

Table 6: Descriptive statistics of the historical log returns.

| Asset | Mean | Std. | Min | Max | Skewness | Kurtosis |
|---|---|---|---|---|---|---|
| Apple | 0.0010 | 0.0266 | -0.7313 | 0.1302 | -4.0953 | 111.9086 |
| S&P 500 | 0.0002 | 0.0126 | -0.1277 | 0.1096 | -0.3933 | 10.9431 |

Table 7 shows the parameters used for simulation of the GBM. The risk-free rate is $r_f = 0.0009$, which is the daily U.S. 3 Month Treasury Bill rate of December 31, 2020. The MA volatility is the daily average volatility derived with the simple Moving Average (MA) for both assets, using a rolling window of 50 trading days. Transforming this to annual volatility gives 43% and 29% for Apple and the S&P respectively. Another possible volatility estimate is the Exponentially Weighted Moving Average (EWMA), whose estimates can be found in the Geometric Brownian Motion subsection of Appendix A. In this paper, both volatility estimation methods are compared and the better performing one is chosen, which resulted in using the MA volatility.

Table 7: Parameters of the GBM.

| Asset | Underlying last | Risk-free rate $r_f$ | MA volatility $\sigma$ |
|---|---|---|---|
| Apple | 131.8770 | 0.0009 | 0.0268 |
| S&P 500 | 3756.0701 | 0.0009 | 0.0180 |
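The annual volatilities quoted above follow from the daily MA volatilities of Table 7 by the square-root-of-time rule. The short check below assumes 252 trading days per year, a convention the paper does not state explicitly; the function name is hypothetical.

```python
import math

TRADING_DAYS_PER_YEAR = 252  # assumption; the paper does not state the convention used


def annualize_volatility(daily_sigma, periods=TRADING_DAYS_PER_YEAR):
    """Scale a daily volatility to an annual one with the square-root-of-time rule."""
    return daily_sigma * math.sqrt(periods)


annual_apple = annualize_volatility(0.0268)   # daily MA volatility of Apple, Table 7
annual_sp500 = annualize_volatility(0.0180)   # daily MA volatility of the S&P 500, Table 7
```

Under this assumption the results round to the 43% and 29% reported in the text.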

The historical option prices are retrieved from the optionsdx^{1} database. The considered options have expiration dates January 29, February 19, March 18 and June 18, all within the year 2021, and are categorized as 1 month, 2 months, 3 months and 6 months puts respectively. Tables B1 and B2 in Appendix B show the put prices at market close on December 31, 2020 and their corresponding strike prices. The ITM strike prices are denoted with bold text in their respective tables and the bottom row shows the number of trading days until maturity of the option. Prices of the 2 months put are used for deriving $\mu_Q$ as explained in subsection 3.2.4. The other options are used to evaluate the performance of the paper’s methods in option pricing.

### 5 Results

This section contains the main results of the paper, with supplementary tables and graphs in Appendix C. All results have been computed with 100,000 simulated paths per model and dataset. First consider the GAN’s ability to capture the assets’ data distribution. Figures C1 and C2 show the histograms of the log returns simulated by the GAN and the GBM respectively. For the S&P they show that the spread of the log returns of the GBM is larger than that of the GAN. For Apple this is not the case.

^{1} www.optionsdx.com

This can be explained by the standard deviations captured by the GAN, which are 0.0122 and 0.0316 for the S&P and Apple respectively. For the S&P, this is very close to the standard deviation of the real historical data (0.0126), but far below the MA volatility used for the GBM (0.0180). For Apple, the GAN’s standard deviation is higher than both the historical standard deviation (0.0266) and the derived MA volatility (0.0268), leading to a larger spread in log returns. The skewness of the log returns simulated by the GAN for the S&P is −0.5764, which is close to the skewness observed in the data (−0.3933) and reproduces the negative skewness of the log returns, whereas the skewness of the GBM for the S&P is −0.0005, which is very close to zero and thus fails to capture this negative skewness. For Apple, the GBM also fails to capture the negative skewness, giving the same value of −0.0005, whereas the GAN gives a skewness of −0.2254, which differs substantially from the skewness observed in the data (−4.0953) but at least has the correct sign. These factors imply that the GAN has captured the log return distribution of the S&P better than that of Apple.

Figure 2: Index path simulations of the S&P 500.

Figure 2 shows the path simulations of the S&P 500 for the GAN and the GBM. The graphs only use the first 10,000 simulated paths, as plotting more paths would not make the display of the simulation clearer. Though the graphs look similar, the difference in the range of the y-axis is large. The GBM, whose range is between 2000 and 8000, shows more volatility than the GAN, whose range is between 2500 and 6500. This is in line with the histograms discussed in the previous paragraph. Furthermore, the GBM shows clear growth over time, which the GAN does not. Figure C3 in Appendix C shows the stock path simulations of Apple. In this case, there is less difference in the range of the y-axis and the graphs are more similar.

Once the log returns of the GAN are simulated, the mean shift can be applied and the options can be priced. The mean shift is computed for every strike price from the solutions for $\mu_Q$ of the optimization problem in equation (26), and its estimated values can be found in Table C1. This method for approximating $\mu_Q$ is chosen as the main method because it returns the lowest RMSEs in most cases. Table C1 shows that a different value of $(\mu_P - \mu_Q)$ is found for every strike price. The simulated stock prices under the mean shift are shown in Figure C4, where the mean shifts for K = 130 and K = 3750 have been chosen for Apple and the S&P respectively. Table 8 contains the estimated prices and the RMSE of the 1 month put for both Apple and the S&P, where the bold strike prices denote the ITM options. Based on RMSE, the GBM outperforms the GAN for Apple, but performs worse for the S&P. The GBM slightly undervalues the put for Apple and overvalues it for the S&P. The GAN overvalues the put for Apple; for the S&P it undervalues the three lowest strikes and overvalues the two highest.

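Table 8: Estimated prices of the 1 month put.

| Strike (Apple) | GAN | GBM | Real | Strike (S&P 500) | GAN | GBM | Real |
|---|---|---|---|---|---|---|---|
| 120 | 1.78 | 1.28 | 1.58 | 3650 | 41.41 | 51.55 | 47.40 |
| 125 | 3.03 | 2.48 | 2.68 | 3700 | 54.22 | 68.89 | 60.00 |
| 130 | 5.02 | 4.28 | 4.48 | 3750 | 73.00 | 89.41 | 77.32 |
| **135** | 7.64 | 6.70 | 7.00 | **3800** | 106.33 | 113.75 | 98.58 |
| **140** | 10.80 | 9.97 | 10.00 | **3850** | 136.37 | 142.38 | 127.27 |
| **145** | 14.70 | 13.83 | 14.35 | - | - | - | - |
| RMSE | 0.52 | 0.29 | - | - | 6.80 | 11.84 | - |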
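The RMSE row of Table 8 can be recomputed from the rounded prices in the table. The sketch below does this for the GAN columns; the helper name `rmse` is chosen here for illustration, and since the table entries are rounded to two decimals the recomputed values can differ slightly in the last digit from those reported.

```python
import numpy as np


def rmse(estimates, observed):
    """Root mean squared error between estimated and observed prices."""
    estimates = np.asarray(estimates, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((estimates - observed) ** 2)))


# GAN estimates and observed prices of the 1 month put, copied from Table 8.
apple_gan = [1.78, 3.03, 5.02, 7.64, 10.80, 14.70]
apple_real = [1.58, 2.68, 4.48, 7.00, 10.00, 14.35]
sp500_gan = [41.41, 54.22, 73.00, 106.33, 136.37]
sp500_real = [47.40, 60.00, 77.32, 98.58, 127.27]

rmse_apple_gan = rmse(apple_gan, apple_real)
rmse_sp500_gan = rmse(sp500_gan, sp500_real)
```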

Tables C2 and C3 show the estimated prices of the 3 months put and the 6 months put respectively. For Apple, the GAN performs almost as well as the GBM for the 3 months put and much better for the 6 months put. For the S&P, the GAN gives slightly worse estimates for the 3 months put, but much better estimates for the 6 months put. The GBM now undervalues almost all put contracts, which is especially the case for the 6 months put. Table C4 shows the RMSE when the options are categorized as ITM and Out of The Money (OTM). For the S&P, the strike of 3750 is omitted because it is almost At The Money and omitting it keeps the sample sizes equal. The table shows that for Apple, almost all OTM puts have a smaller RMSE than the ITM puts, suggesting that for Apple both the GAN and the GBM are slightly better at estimating OTM options. This is also the case for the S&P, where almost all OTM puts have a smaller RMSE for both the GAN and the GBM.

Table C5 shows the estimated prices of the GAN for the mean shift where $\mu_Q$ is derived with the risk-free rate as in equation (30). The resulting values for $\mu_Q$ are 0.00198 and 0.00007 for Apple and the S&P respectively, which lead to a mean shift of −0.00022 and −0.00447. These values are very different from the mean shift values of Table C1 and lead to very different put prices. The estimated prices of the options for the GAN without a mean shift are shown in Table C6 and an overview of the RMSEs is given in Table C7. For the S&P, the estimates for the mean shift derived with the risk-free rate perform the worst, whereas the estimates for the mean shift derived by optimization perform best. For Apple, the estimates without a mean shift perform the worst and the estimates for the mean shift derived by optimization perform best for the 3 months and 6 months puts; for the 1 month put the risk-free rate variant has a slightly lower RMSE (0.44 against 0.52). Overall, the mean shift derived by optimization gives a clear improvement over the estimates without the mean shift, with a much smaller RMSE in many cases. This result shows the importance of risk-neutral valuation in option pricing.

Recalling the decision to use the MA as the volatility estimation method for the GBM, a comparison with the EWMA is given in Table C8. For Apple, the RMSE of the MA is smaller in all cases. For the S&P, the RMSE of the MA is larger for the 1 month put but much smaller for the longer-term puts. These results show that, overall, the MA volatility is the better choice for both Apple and the S&P.

### 6 Conclusion

The paper has introduced a novel method for pricing American options by combining deep learning and Monte Carlo simulation, using the Generative Adversarial Network as the basis for the Least Squares Monte Carlo instead of the Geometric Brownian Motion under the risk-neutral measure Q. In the paper’s application, it has proven to be a viable method for pricing the American put, producing similar or better results than the GBM in most cases. Furthermore, the GAN approximates the distributions of the log returns better than the GBM, as it is able to capture the negative skewness in the data.

The effect of the mean shift shows the importance of risk-neutral valuation in option pricing. Despite its importance, risk-neutral valuation is often largely ignored in machine learning papers about option pricing. The paper has shown that implementing risk-neutral valuation in option pricing with deep learning leads to promising results and could be an important factor for future research.

All in all, combining deep learning with numerical option pricing methods is a promising starting point in the field of option pricing. The GAN in this paper is implemented to model only daily log returns, which means that the put is valued based on daily exercise. This limitation could be addressed by using high-frequency data, which allows for exercise at any point in time from initiation until the expiry date. Furthermore, to improve accuracy, the underlying MLP networks can be replaced with other deep learning methods more suitable for time series, such as the LSTM, and the model could also be extended with volatility models such as GARCH to incorporate volatility estimates. The GAN could also be implemented such that, instead of simulating i.i.d. log returns, it generates log returns conditional on previously generated log returns, using these as input noise for the Generator instead of samples from the standard normal distribution.

### References

Black, F. and Scholes, M. (1972). The valuation of option contracts and a test of market efficiency. The journal of finance, 27(2):399–417.

Brennan, M. J. and Schwartz, E. S. (1978). Finite difference methods and jump processes arising in the pricing of contingent claims: A synthesis. Journal of Financial and Quantitative Analysis, 13(3):461–474.

Cox, J. C., Ross, S. A., and Rubinstein, M. (1979). Option pricing: A simplified approach. Journal of financial Economics, 7(3):229–263.

Creswell, A., White, T., Dumoulin, V., Arulkumaran, K., Sengupta, B., and Bharath, A. A. (2018). Generative adversarial networks: An overview. IEEE Signal Processing Magazine, 35(1):53–65.

Culkin, R. and Das, S. R. (2017). Machine learning in finance: the case of deep learning for option pricing. Journal of Investment Management, 15(4):92–100.

Ekholm, A. and Pasternack, D. (2005). The negative news threshold—an explanation for negative skewness in stock returns. The European Journal of Finance, 11(6):511–529.

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative adversarial nets. Advances in neural information processing systems, 27.

Gradojevic, N., Gençay, R., and Kukolj, D. (2009). Option pricing with modular neural networks. IEEE transactions on neural networks, 20(4):626–637.

Ke, A. and Yang, A. (2019). Option pricing with deep learning. CS230: Deep Learning, 8:1–8.

Kim, I. J. (1990). The analytic valuation of american options. The Review of Financial Studies, 3(4):547–572.

Koshiyama, A., Firoozye, N., and Treleaven, P. (2021). Generative adversarial networks for financial trading strategies fine-tuning and combination. Quantitative Finance, 21(5):797–813.

Liang, J., Xu, Z., and Li, P. (2021). Deep learning-based least squares forward-backward stochastic differential equation solver for high-dimensional derivative pricing. Quantitative Finance, pages 1–15.

Longstaff, F. A. and Schwartz, E. S. (2001). Valuing american options by simulation: a simple least-squares approach. The review of financial studies, 14(1):113–147.

Merton, R. C. (1973). Theory of rational option pricing. The Bell Journal of economics and management science, pages 141–183.

Mina, J., Xiao, J. Y., et al. (2001). Return to riskmetrics: the evolution of a standard. RiskMetrics Group, 1:1–11.

Moreno, M. and Navas, J. F. (2003). On the robustness of least-squares monte carlo (lsm) for pricing american derivatives. Review of Derivatives Research, 6(2):107–128.

Nelson, D. M., Pereira, A. C., and de Oliveira, R. A. (2017). Stock market’s price movement prediction with lstm neural networks. In 2017 International joint conference on neural networks (IJCNN), pages 1419–1426. IEEE.

Roondiwala, M., Patel, H., and Varma, S. (2017). Predicting stock prices using lstm. International Journal of Science and Research (IJSR), 6(4):1754–1756.

Ruf, J. and Wang, W. (2020). Neural networks for option pricing and hedging: a literature review. Journal of Computational Finance, Forthcoming.

Sezer, O. B., Gudelek, M. U., and Ozbayoglu, A. M. (2020). Financial time series forecasting with deep learning: A systematic literature review: 2005–2019. Applied Soft Computing, 90:106181.

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research, 15(1):1929–1958.

Wiese, M., Bai, L., Wood, B., and Buehler, H. (2019). Deep hedging: learning to simulate equity option markets. arXiv preprint arXiv:1911.01700.

Wiese, M., Knobloch, R., Korn, R., and Kretschmer, P. (2020). Quant gans: Deep generation of financial time series. Quantitative Finance, 20(9):1419–1440.

Ye, T. and Zhang, L. (2019). Derivatives pricing via machine learning. Boston University Questrom School of Business Research Paper, (3352688).

Zhang, K., Zhong, G., Dong, J., Wang, S., and Wang, Y. (2019). Stock market prediction based on generative adversarial network. Procedia computer science, 147:400–406.

Zhou, X., Pan, Z., Hu, G., Tang, S., and Zhao, C. (2018). Stock market prediction on high-frequency data using generative adversarial nets. Mathematical Problems in Engineering, 2018.

### Appendix A

Algorithm A1: Minibatch stochastic gradient descent training of the Generative Adversarial Network. Let $k$ be the number of times that $D$ is trained individually and let $m$ be the size of the batch.

- while convergence is not found do
  - for $k$ steps do
    - Sample a batch of $m$ noise samples $\{z_1, z_2, \dots, z_m\}$ from the noise prior $p_g(z)$.
    - Sample a batch of $m$ examples $\{x_1, x_2, \dots, x_m\}$ from the real data generating distribution $p_{data}(x)$, i.e. log returns.
    - Update $D$ by ascending its stochastic gradient:

      $$
      \nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} \left[\log D(x_i) + \log\left(1 - D(G(z_i))\right)\right].
      $$
  - end for
  - Sample a batch of $m$ noise samples $\{z_1, z_2, \dots, z_m\}$ from the noise prior $p_g(z)$.
  - Update $G$ by descending its stochastic gradient:

    $$
    \nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log\left(1 - D(G(z_i))\right).
    $$
- end while

The gradient-based updates can use any standard gradient-based learning rule; this paper uses the Adam optimizer.
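To make the two minibatch objectives of Algorithm A1 concrete, the sketch below evaluates them for given discriminator outputs. It only computes the quantities whose gradients are taken and does not perform any training; the function names and the example probabilities are hypothetical.

```python
import numpy as np


def discriminator_objective(d_real, d_fake):
    """Minibatch average that D ascends: log D(x_i) + log(1 - D(G(z_i)))."""
    return float(np.mean(np.log(d_real) + np.log(1.0 - d_fake)))


def generator_objective(d_fake):
    """Minibatch average that G descends: log(1 - D(G(z_i)))."""
    return float(np.mean(np.log(1.0 - d_fake)))


# Hypothetical discriminator outputs for a batch of m = 4 samples:
# d_real = D(x_i) on real log returns, d_fake = D(G(z_i)) on generated ones.
d_real = np.array([0.9, 0.8, 0.7, 0.6])
d_fake = np.array([0.1, 0.2, 0.3, 0.4])
d_value = discriminator_objective(d_real, d_fake)
g_value = generator_objective(d_fake)
```

When the discriminator cannot tell real from generated samples, so that every output equals 0.5, the discriminator objective equals $2\log(0.5)$ and the generator objective equals $\log(0.5)$.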

Table A1: Regression at t = 1.

| Path | Y | X |
|---|---|---|
| 1 | $0.06 \cdot e^{-0.0009}$ | 1.00 |
| 2 | $0.34 \cdot e^{-0.0009}$ | 0.84 |
| 3 | $0.16 \cdot e^{-0.0009}$ | 0.90 |
| 4 | $0.25 \cdot e^{-0.0009}$ | 0.86 |
| 5 | - | - |
| 6 | $0.00 \cdot e^{-0.0009 \cdot 2}$ | 1.02 |
| 7 | $0.00 \cdot e^{-0.0009 \cdot 2}$ | 1.06 |
| 8 | $0.00 \cdot e^{-0.0009 \cdot 2}$ | 0.98 |

Note: rows 6, 7 and 8 are cash flows from t = 3, therefore they need to be discounted for one more time step.

Table A2: Decision for early exercise at t = 1.

| Path | Continuation | Exercise | Decision |
|---|---|---|---|
| 1 | 0.04 | 0.10 | Exercise |
| 2 | 0.28 | 0.26 | Continue |
| 3 | 0.19 | 0.20 | Exercise |
| 4 | 0.26 | 0.24 | Continue |
| 5 | - | - | - |
| 6 | 0.02 | 0.08 | Exercise |
| 7 | -0.05 | 0.04 | Exercise |
| 8 | 0.08 | 0.12 | Exercise |
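The decision column of Table A2 follows from comparing the two value columns: the option is exercised early when the immediate exercise value exceeds the estimated continuation value. A small check with the values of Table A2 (path 5 is left out, as it has no entries) could look as follows; the variable names are chosen here for illustration.

```python
import numpy as np

# Values copied from Table A2; path 5 has no entries and is skipped.
paths = np.array([1, 2, 3, 4, 6, 7, 8])
continuation = np.array([0.04, 0.28, 0.19, 0.26, 0.02, -0.05, 0.08])
exercise = np.array([0.10, 0.26, 0.20, 0.24, 0.08, 0.04, 0.12])

# Exercise early only where the immediate payoff beats the continuation value.
exercise_now = exercise > continuation
decisions = np.where(exercise_now, "Exercise", "Continue")
```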

### Geometric Brownian Motion

Following Longstaff and Schwartz (2001), the stock price $S_t$ is assumed to be a risk-neutral process that satisfies the following stochastic differential equation (SDE):

$$
dS_t = r_f S_t \, dt + \sigma S_t \, dW_t^Q, \tag{31}
$$

where $W_t^Q$ is a Brownian motion under the risk-neutral measure Q and the constants $r_f$ and $\sigma$ represent the risk-free rate and the volatility respectively. Solving this SDE for the initial value $S_0$ provides the analytical solution

$$
S_t = S_0 \exp\left(\left(r_f - \frac{\sigma^2}{2}\right) t + \sigma W_t^Q\right), \tag{32}
$$

which allows for risk-neutral stock price simulation. This version of the GBM is convenient in option pricing as it is risk-neutral under Q. Therefore, risk-neutral valuation is possible: the derivative price equals the expected discounted payoff under the risk-neutral measure Q, which is an important result of the arbitrage theorem.

As equation (32) uses the constant parameter $\sigma$, it needs to be estimated. This volatility can be estimated from the full historical sample, but such an estimate does not properly represent recent volatility in the stock market, which is affected by factors such as the coronavirus pandemic. Historical volatility is therefore an unsatisfactory estimate. Instead, a simple Moving Average with a window of 50 trading days is considered together with the EWMA, whose advantage lies in giving higher weights to more recent returns and lower weights to older returns. Its mathematical representation for the one-day volatility at time t is:

$$
\sigma_t^2 = \frac{1 - \lambda}{1 - \lambda^{M+1}} \sum_{i=0}^{M} \lambda^i (R_{t-i} - r_f)^2, \tag{33}
$$

where $M + 1$ is the number of one-day log returns, $R_t$ represents the log returns and $\lambda$ is the smoothing factor that determines the weights assigned to the log returns. Following Mina et al. (2001), the value of $\lambda$ is set to 0.97. Once the daily variance is obtained, it can be transformed to yearly volatility. Daily MA and EWMA parameter estimates are shown in Table A3.

Table A3: Volatility estimates of MA and EWMA.

| Asset | EWMA | MA |
|---|---|---|
| Apple | 0.0234 | 0.0268 |
| S&P 500 | 0.0151 | 0.0180 |
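The EWMA variance of equation (33) can be written as a weighted sum over the most recent $M + 1$ log returns. The sketch below is one way to code it, assuming the returns are passed oldest first; the function name and the ordering convention are choices made here and are not taken from the paper's code.

```python
import numpy as np


def ewma_variance(log_returns, rf, lam=0.97):
    """One-day variance of equation (33).

    log_returns: sequence ordered oldest first, so log_returns[-1] is R_t
                 and log_returns[0] is R_{t-M}
    rf: risk-free rate subtracted from every return, as in (33)
    lam: smoothing factor lambda
    """
    r = np.asarray(log_returns, dtype=float)
    m = r.size - 1
    weights = lam ** np.arange(m + 1)            # lambda^i for i = 0, ..., M
    newest_first = r[::-1]                       # R_t, R_{t-1}, ..., R_{t-M}
    norm = (1.0 - lam) / (1.0 - lam ** (m + 1))  # makes the weights sum to one
    return float(norm * np.sum(weights * (newest_first - rf) ** 2))


# With identical returns the weighted average collapses to (R - rf)^2.
constant_case = ewma_variance(np.full(100, 0.01), rf=0.0)

# A large most-recent return receives more weight than the same return far in the past.
recent_shock = ewma_variance([0.0] * 49 + [0.05], rf=0.0)
old_shock = ewma_variance([0.05] + [0.0] * 49, rf=0.0)
```

Taking the square root of the result gives the daily volatility, which is the quantity reported in Table A3.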

### Appendix B

Figure B1. Historical adjusted closing prices of Apple and the S&P 500.

Figure B2. Historical log returns of Apple and the S&P 500.

Figure B3. Histogram of historical log returns of Apple and the S&P 500.

Table B1: Historical option prices Apple.

| Strike price | 1 month | 2 months | 3 months | 6 months |
|---|---|---|---|---|
| 120 | 1.58 | 2.78 | 4.00 | 7.30 |
| 125 | 2.68 | 4.10 | 5.60 | 9.35 |
| 130 | 4.48 | 6.20 | 7.50 | 11.70 |
| **135** | 7.00 | 8.76 | 10.35 | 14.35 |
| **140** | 10.00 | 11.67 | 13.15 | 17.25 |
| **145** | 14.35 | 15.48 | 16.20 | 20.60 |
| Trading days | 19 | 33 | 53 | 120 |

Table B2: Historical option prices S&P 500.

| Strike price | 1 month | 2 months | 3 months | 6 months |
|---|---|---|---|---|
| 3650 | 47.40 | 72.01 | 103.20 | 183.33 |
| 3700 | 60.00 | 82.25 | 115.60 | 195.50 |
| 3750 | 77.32 | 98.85 | 135.00 | 213.10 |
| **3800** | 98.58 | 137.04 | 153.52 | 240.77 |
| **3850** | 127.27 | 163.42 | 190.65 | 282.84 |
| Trading days | 19 | 33 | 53 | 120 |

### Appendix C

Figure C1. Histograms of simulated log-returns of the GAN.

Figure C2. Histograms of simulated log-returns of the GBM.

Figure C3. Stock price simulations of Apple.

Figure C4. Stock price simulations of Apple and the S&P after the mean shift.

Table C1: Estimated values of the mean shift.

| Strikes Apple | Value mean shift | Strikes S&P 500 | Value mean shift |
|---|---|---|---|
| 120 | -0.00354 | 3650 | 0.00027 |
| 125 | -0.00355 | 3700 | 0.00005 |
| 130 | -0.00329 | 3750 | -0.00007 |
| 135 | -0.00309 | 3800 | 0.00011 |
| 140 | -0.00298 | 3850 | 0.00005 |
| 145 | -0.00269 | - | - |

Table C2: Estimated prices of the 3 months put.

| Strike (Apple) | GAN | GBM | Real | Strike (S&P 500) | GAN | GBM | Real |
|---|---|---|---|---|---|---|---|
| 120 | 4.11 | 3.43 | 4.00 | 3650 | 105.29 | 93.18 | 103.20 |
| 125 | 5.67 | 4.98 | 5.60 | 3700 | 110.03 | 111.37 | 115.60 |
| 130 | 8.06 | 6.95 | 7.50 | 3750 | 124.53 | 132.11 | 135.00 |
| 135 | 10.82 | 9.38 | 10.35 | 3800 | 167.14 | 155.18 | 153.52 |
| 140 | 13.83 | 12.25 | 13.15 | 3850 | 190.37 | 181.29 | 190.65 |
| 145 | 17.67 | 15.57 | 16.20 | - | - | - | - |
| RMSE | 0.73 | 0.72 | - | - | 8.13 | 6.59 | - |

Table C3: Estimated prices of the 6 months put.

| Strike (Apple) | GAN | GBM | Real | Strike (S&P 500) | GAN | GBM | Real |
|---|---|---|---|---|---|---|---|
| 120 | 6.40 | 5.77 | 7.30 | 3650 | 193.40 | 129.53 | 183.33 |
| 125 | 8.07 | 7.46 | 9.35 | 3700 | 177.24 | 147.85 | 195.50 |
| 130 | 10.84 | 9.48 | 11.70 | 3750 | 183.02 | 168.05 | 213.10 |
| 135 | 13.84 | 11.85 | 14.35 | 3800 | 240.41 | 190.22 | 240.77 |
| 140 | 16.91 | 14.53 | 17.25 | 3850 | 256.67 | 214.49 | 282.84 |
| 145 | 21.04 | 17.57 | 20.60 | - | - | - | - |
| RMSE | 0.79 | 2.37 | - | - | 20.12 | 53.71 | - |

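Table C4: RMSE of ITM and OTM options.

| Option type | Apple 1 month | Apple 3 months | Apple 6 months | S&P 500 1 month | S&P 500 3 months | S&P 500 6 months |
|---|---|---|---|---|---|---|
| GAN ITM | 0.63 | 0.97 | 0.44 | 8.45 | 9.63 | 18.51 |
| GAN OTM | 0.39 | 0.33 | 1.03 | 5.89 | 4.21 | 14.74 |
| GBM ITM | 0.34 | 0.85 | 2.76 | 15.14 | 6.72 | 60.11 |
| GBM OTM | 0.24 | 0.58 | 1.90 | 6.94 | 7.69 | 50.82 |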

Table C5: Estimated GAN prices with mean shift from the risk-free rate.

| Strike (Apple) | 1 month | 3 months | 6 months | Strike (S&P 500) | 1 month | 3 months | 6 months |
|---|---|---|---|---|---|---|---|
| 120 | 1.48 | 3.19 | 4.59 | 3650 | 32.56 | 73.74 | 119.81 |
| 125 | 2.60 | 4.57 | 6.06 | 3700 | 48.39 | 92.39 | 140.04 |
| 130 | 4.30 | 6.36 | 7.87 | 3750 | 69.13 | 114.34 | 162.24 |
| 135 | 6.60 | 8.57 | 10.02 | 3800 | 95.02 | 139.33 | 186.72 |
| 140 | 9.63 | 11.28 | 12.59 | 3850 | 126.13 | 167.47 | 213.08 |
| 145 | 13.44 | 14.55 | 15.58 | - | - | - | - |
| RMSE | 0.44 | 1.44 | 4.05 | - | 9.34 | 22.68 | 59.14 |

Table C6: Estimated GAN prices without a mean shift.

| Strike (Apple) | 1 month | 3 months | 6 months | Strike (S&P 500) | 1 month | 3 months | 6 months |
|---|---|---|---|---|---|---|---|
| 120 | 3.59 | 10.96 | 22.21 | 3650 | 36.16 | 85.74 | 146.92 |
| 125 | 5.62 | 13.79 | 25.53 | 3700 | 53.05 | 106.33 | 169.33 |
| 130 | 8.22 | 16.94 | 29.02 | 3750 | 74.95 | 129.78 | 193.45 |
| 135 | 11.38 | 20.38 | 32.65 | 3800 | 101.97 | 155.97 | 219.61 |
| 140 | 15.02 | 24.07 | 36.41 | 3850 | 134.15 | 185.30 | 247.32 |
| 145 | 19.05 | 27.96 | 40.28 | - | - | - | - |
| RMSE before | 3.94 | 9.68 | 17.67 | - | 6.91 | 9.51 | 28.66 |

Table C7: RMSEs of GAN estimates with mean shifts and without mean shift.

| Mean shift | Apple 1 month | Apple 3 months | Apple 6 months | S&P 500 1 month | S&P 500 3 months | S&P 500 6 months |
|---|---|---|---|---|---|---|
| Optimization | 0.52 | 0.73 | 0.79 | 6.80 | 8.13 | 20.12 |
| Risk-free rate | 0.44 | 1.44 | 4.05 | 9.34 | 22.68 | 59.14 |
| Without | 3.94 | 9.68 | 17.67 | 6.91 | 9.51 | 28.66 |

Table C8: Estimated GBM prices using the EWMA volatility estimate.

| Strike (Apple) | 1 month | 3 months | 6 months | Strike (S&P 500) | 1 month | 3 months | 6 months |
|---|---|---|---|---|---|---|---|
| 120 | 0.86 | 1.65 | 4.36 | 3650 | 36.55 | 67.55 | 93.03 |
| 125 | 1.88 | 2.90 | 5.91 | 3700 | 52.26 | 84.06 | 110.08 |
| 130 | 3.57 | 4.71 | 7.82 | 3750 | 71.87 | 103.81 | 129.26 |
| 135 | 6.02 | 7.13 | 10.13 | 3800 | 96.29 | 126.92 | 150.77 |
| 140 | 9.36 | 10.20 | 12.83 | 3850 | 126.14 | 153.41 | 175.29 |
| 145 | 13.41 | 13.94 | 15.95 | - | - | - | - |
| RMSE EWMA | 0.84 | 1.81 | 3.97 | - | 6.54 | 32.66 | 91.81 |
| RMSE MA | 0.29 | 0.72 | 2.37 | - | 11.84 | 6.59 | 53.71 |

### Python code snippets

    import numpy as np


    def GBM_simulation(mu, sigma, t, n):
        """
        Function to return a matrix of simulated price relatives S_t / S_0
        (the exponential of the cumulative log returns) for the Geometric
        Brownian Motion, where the rows denote different time steps and the
        columns denote the different paths; does not include P_0.

        mu: risk-free rate
        sigma: standard deviation of the log returns
        t: number of time steps
        n: number of paths
        """
        # set random seed
        np.random.seed(0)

        # set dt and dW, note that a single step dW is N(0, t-s = 1) in this case
        # as we take the cumulative sum afterwards
        # time index runs 1..t so that the drift of row j matches its j+1 cumulated shocks
        dt = np.arange(1, t + 1)
        single_dW = np.random.normal(size=(t, n))
        total_dW = np.cumsum(single_dW, axis=0)

        return np.exp((mu - sigma**2 / 2) * dt + sigma * total_dW.T).T

    def LSM(stock_paths, K, r):
        """
        Function which implements the Least Squares Monte Carlo algorithm for
        an American put option, returning the price.

        stock_paths: matrix of generated paths for the stock price, rows need
            to represent time steps and columns represent different paths
        K: strike price of option
        r: risk-free rate
        """
        # derive final cash flow vector
        cash_flow = np.maximum(K - stock_paths[-1, :], 0)

        # function is implemented such that discount factor reduces to np.exp(-r)
        discount_factor = np.exp(-r)

        # go backwards in the stock paths over the time steps, starting from
        # the second last
        for i in range(stock_paths.shape[0] - 2, 0, -1):
            # determine exercise value and boolean for if there is exercise possibility
            ex_value = np.maximum(K - stock_paths[i, :], 0)
            ex_bool = ex_value > 0

            # if there is no possible early exercise, no regression is needed
            if ex_bool.any():
                # use the Laguerre basis functions (the fit below is called with
                # degree 3) to regress the values of the discounted future cash
                # flows on the value of the paths where the option is in the money
                regression = np.polynomial.laguerre.Laguerre.fit(
                    stock_paths[i, :][ex_bool], cash_flow[ex_bool] * discount_factor, 3)
                Cont_value = regression(stock_paths[i, :])

                # create boolean for where to perform early exercise
                early_exercise = ex_bool & (ex_value > Cont_value)

                # update cash flow for early exercise values
                cash_flow = cash_flow * discount_factor
                cash_flow[early_exercise] = ex_value[early_exercise]
            else:
                cash_flow = cash_flow * discount_factor

        price = np.round(np.average(cash_flow * discount_factor), 4)
        return price

    # imports assumed from the names used in the listing (TensorFlow 2 Keras API)
    import random

    import tensorflow as tf
    from tensorflow.keras import initializers
    from tensorflow.keras.layers import Dense, Dropout, LeakyReLU
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.optimizers import Adam


    def define_discriminator(n_inputs=1):
        '''
        function to define the discriminator

        n_inputs: number of inputs
        '''
        np.random.seed(1)
        tf.random.set_seed(1)
        model = Sequential()
        # add input and hidden layers
        model.add(Dense(100,
                        kernel_initializer=initializers.RandomNormal(stddev=0.02),
                        input_dim=n_inputs))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.3))
        model.add(Dense(100))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.3))
        model.add(Dense(100))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.3))
        # output layer
        model.add(Dense(1, activation='sigmoid'))
        # compile model
        opt = Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)
        model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
        return model

    def train_discriminator(model, hist_data, n_epochs, n_batch):
        '''
        function to train the discriminator to detect fake input

        model: discriminator model
        hist_data: the training / real data
        n_epochs: number of training iterations
        n_batch: batch size for stochastic gradient descent
        '''
        half_batch = int(n_batch / 2)
        # run epochs manually
        random.seed(1)
        np.random.seed(1)
        tf.random.set_seed(1)
        for i in range(n_epochs):
            # get real examples
            index = np.random.randint(0, hist_data.shape[0], half_batch)
            X_real = hist_data[index]
            y_real = np.ones((half_batch, 1))
            # update model
            model.train_on_batch(X_real, y_real)
            # generate fake examples (helper not shown in this listing)
            X_fake, y_fake = generate_fake_samples_dis(half_batch)
            # update model
            model.train_on_batch(X_fake, y_fake)
            # evaluate the model
            _, acc_real = model.evaluate(X_real, y_real, verbose=0)
            _, acc_fake = model.evaluate(X_fake, y_fake, verbose=0)
            print(i, acc_real, acc_fake)

    def define_generator(latent_dim, n_outputs=1):
        '''
        function to define the generator

        latent_dim: dimension of input prior
        n_outputs: number of outputs
        '''
        tf.random.set_seed(1)
        model = Sequential()
        # input and hidden layers
        model.add(Dense(100,
                        kernel_initializer=initializers.RandomNormal(stddev=0.02),
                        input_dim=latent_dim))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dense(100))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dense(100))
        model.add(LeakyReLU(alpha=0.2))
        # output layer
        model.add(Dense(n_outputs, activation='tanh'))
        return model

    def define_gan(generator, discriminator):
        '''
        function to define the gan

        generator: generator model
        discriminator: trained discriminator model
        '''
        tf.random.set_seed(1)
        # make weights in the discriminator not trainable
        discriminator.trainable = False
        # connect them
        model = Sequential()
        # add generator
        model.add(generator)
        # add the discriminator
        model.add(discriminator)
        # compile model
        opt = Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)
        model.compile(loss='binary_crossentropy', optimizer=opt)
        return model

    def train(g_model, d_model, gan_model, hist_data, latent_dim, n_epochs, n_batch, k):
        '''
        function to train the generator for estimating log returns

        g_model: generator
        d_model: discriminator
        gan_model: GAN
        hist_data: historical data of stock returns