
This chapter focuses on investigating what model provides the best empirical fit in modelling and forecasting market impact. The market impact I of a meta-order is considered to be the expectation of implementation shortfall (IS), conditional on the meta-order characteristics X:

$$ I(X) = \mathbb{E}[\,\mathrm{IS} \mid X\,]. \tag{4.1} $$

Modelling and forecasting market impact can therefore be considered equivalent to regressing and point forecasting IS.

In particular, it is investigated whether any of the considered models significantly improves on the Power Law form (2.11) of the Square-Root Law in point forecasting IS.

The considered models include alternative parametric models, non-parametric models and models with additional explanatory variables. The models are systematically assessed and compared based on their point forecasts in a cross-validation (CV) setting.

The parametric models are estimated using (Weighted) Non-linear Least Squares (WNLS) and the associated standard errors are computed using a bootstrap.

4.1. Considered models

This section provides an overview of the considered market impact models.

4.1.1. Proposed in Literature

Although the Square Root Law model (2.10) and the Power Law form (2.11) are the most popular in market impact literature, alternative models have also been proposed.

For instance, Zarinelli et al. (2015) instead propose a logarithmic dependence between market impact I and relative order size π:

$$ I(X_i \mid \alpha, \beta) = \alpha \cdot \sigma_i \cdot \log(1 + \beta \cdot \pi_i). \tag{4.2} $$

They motivate this by arguing that the logarithmic function has a better empirical fit, especially for very small or large orders of magnitude of π, i.e. π < 10⁻³ or π > 10⁻¹.

Alternatively, literature in the locally linear order book (LLOB) framework proposes that the market impact function is better approximated by a function that is linear for small π and a square root for larger π (e.g. Donier et al., 2015; Bucci, Benzaquen, et al., 2019). This literature seemingly proposes no explicit form of such a model.

Therefore two forms are proposed here: first, a function with a smooth exponentially weighted transition¹ between a linear and a square-root function

$$ I(X_i \mid \alpha, \beta, \gamma) = \alpha \cdot \sigma_i \cdot \pi_i \cdot e^{-\gamma \cdot \pi_i} + \beta \cdot \sigma_i \cdot \sqrt{\pi_i} \cdot \left(1 - e^{-\gamma \cdot \pi_i}\right), \qquad \gamma \geq 0. \tag{4.3} $$

Alternatively, a continuous function with a discrete transition between a linear and a square-root function

$$ I(X_i \mid \alpha, \pi^*) = \begin{cases} \alpha \cdot \sigma_i \cdot \dfrac{\pi_i}{\sqrt{\pi^*}} & \text{for } \pi_i < \pi^* \\[4pt] \alpha \cdot \sigma_i \cdot \sqrt{\pi_i} & \text{for } \pi_i \geq \pi^*, \end{cases} \qquad \pi^* \geq 0. \tag{4.4} $$

Finally, Frazzini et al. (2018) propose an additive model with both a square-root and a linear part:

$$ I(X_i \mid \alpha, \beta, \gamma) = \alpha + \beta \cdot \sigma_i \cdot \pi_i + \gamma \cdot \sigma_i \cdot \sqrt{\pi_i}. \tag{4.5} $$

The latter model seems to be motivated mainly by its simplicity and empirical fit.²

¹ Note that lim_{π→0} e^(−γ·π) = 1 and lim_{π→∞} e^(−γ·π) = 0.

² Frazzini et al. (2018) estimate a linear regression model with the mentioned square-root and linear term and some additional explanatory variables.

4.1.2. Additional Benchmark Models

In addition to the models considered in earlier literature, some other ad-hoc model specifications are also considered. They serve as simple benchmark models and are proposed here without any additional theoretical or empirical motivation.

The following benchmark models are considered:

▷ Linear model in π

$$ I(X_i \mid \alpha, \beta) = \alpha + \beta \cdot \sigma_i \cdot \pi_i. \tag{4.6} $$

▷ Third-order polynomial in π

$$ I\left(X_i \mid \alpha, \vec{\beta}\right) = \alpha + \sum_{k=1}^{3} \beta_k \cdot \sigma_i \cdot \pi_i^{k}. \tag{4.7} $$

▷ Third-order polynomial in √ π

$$ I\left(X_i \mid \alpha, \vec{\beta}\right) = \alpha + \sum_{k=1}^{3} \beta_k \cdot \sigma_i \cdot \pi_i^{k/2}. \tag{4.8} $$
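For concreteness, the following minimal Python sketch writes several of the parametric forms above as functions of relative size π and volatility σ. It is illustrative only: the function names are introduced here, and the power-law exponent δ refers to form (2.11); none of this is taken from the thesis code.

```python
import numpy as np

# Illustrative implementations of candidate impact curves I(pi, sigma; theta).
# Parameter names follow the equations above; outputs are in the same units as sigma.

def impact_power_law(pi, sigma, alpha, delta):
    """Power Law form (2.11): I = alpha * sigma * pi**delta."""
    return alpha * sigma * pi**delta

def impact_logarithmic(pi, sigma, alpha, beta):
    """Logarithmic model (4.2): I = alpha * sigma * log(1 + beta * pi)."""
    return alpha * sigma * np.log1p(beta * pi)

def impact_smooth_transition(pi, sigma, alpha, beta, gamma):
    """Smooth transition (4.3): linear for small pi, square root for large pi."""
    w = np.exp(-gamma * pi)          # weight -> 1 as pi -> 0 and -> 0 as pi -> infinity
    return alpha * sigma * pi * w + beta * sigma * np.sqrt(pi) * (1.0 - w)

def impact_sqrt_plus_linear(pi, sigma, alpha, beta, gamma):
    """Additive model (4.5): intercept + linear term + square-root term."""
    return alpha + beta * sigma * pi + gamma * sigma * np.sqrt(pi)
```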

4.1.3. Non-parametric Benchmarks

In addition to the parametric models considered above, two less restrictive non-parametric models are also considered. If these models significantly outperform the parametric models, that might indicate that the collection of considered parametric models is too restricted. The two univariate non-parametric regression models are used to estimate the function f in the following regression:

$$ I(X_i) = \sigma_i \cdot f(\pi_i). \tag{4.9} $$

The first considered non-parametric model is a linear spline regression model, in which the f(·) in (4.9) is modelled as a continuous piecewise linear function. The linear spline is considered because it can yield reasonable approximations to many functions, but requires relatively little computational power in estimation and forecasting. See Appendix (B.1) for the details of the model and the estimation.
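As an illustration of how such a spline can be estimated, the sketch below fits a continuous piecewise-linear f to volatility-scaled shortfall by least squares on hinge basis functions. This is a minimal sketch only: the knot placement is left as an input, and Appendix B.1 describes the actual specification used in the thesis.

```python
import numpy as np

def fit_linear_spline(pi, y, knots):
    """Least-squares fit of a continuous piecewise-linear function f(pi).

    y is the volatility-scaled shortfall IS/sigma, cf. (4.9); the knot
    locations are illustrative inputs, not the ones used in the thesis.
    """
    basis = [np.ones_like(pi), pi] + [np.maximum(pi - k, 0.0) for k in knots]
    X = np.column_stack(basis)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)

    def predict(pi_new):
        # Rebuild the same basis at the new points and apply the fitted coefficients
        Xn = np.column_stack([np.ones_like(pi_new), pi_new]
                             + [np.maximum(pi_new - k, 0.0) for k in knots])
        return Xn @ coef

    return predict
```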

The second non-parametric estimator is the locally weighted scatterplot smoothing (Lowess) estimator of Cleveland (1979). Cameron and Trivedi (2005) argue that the Lowess estimator is a standard choice of local regression estimator, with several attractive properties compared to standard kernel regression: it uses a variable bandwidth, it is more robust against outliers and it provides more appropriate estimates near the boundary of the variable domain. The main disadvantage of the Lowess estimator is that producing forecasts with it has a very high computational cost.

See Appendix (B.2) for the details of the model and the estimation.
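A minimal sketch of the Lowess route, using the `lowess` smoother from `statsmodels`, is given below. The smoothing fraction `frac=0.3` is an arbitrary illustrative choice, not the setting used in the thesis, and the interpolation-based forecaster is a simplification.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def fit_lowess_impact(pi, sigma, IS, frac=0.3):
    """Estimate f in IS/sigma ~ f(pi) with Lowess; return a callable forecaster."""
    y = IS / sigma                                          # volatility-scaled shortfall, cf. (4.9)
    fitted = lowess(y, pi, frac=frac, return_sorted=True)   # columns: sorted pi, smoothed y
    x_grid, f_grid = fitted[:, 0], fitted[:, 1]

    def forecast(pi_new, sigma_new):
        # Interpolate on the smoothed curve, then rescale by volatility
        return sigma_new * np.interp(pi_new, x_grid, f_grid)

    return forecast
```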

4.1.4. Additional Characteristics

Meta-orders are of course not fully characterized by only the relative size π and stock volatility σ. It is therefore investigated whether incorporating additional characteristics can improve point forecasts of IS.

It is important to keep the forecasting models parsimonious, because there is much noise in measuring market impact. Therefore, additional characteristics are only incorporated if they pass the following selection procedure: first, the characteristic should have an intuitive connection to market impact, ideally established in earlier literature.

Furthermore, segmenting on the characteristic should lead to (economically) significant differences in average IS. Finally, the differences in average IS across segments of the characteristic should be consistent across different buckets of relative size π.

Based on the described selection procedure, the following characteristics are added to the model: a binary indicator for a large total market capitalization (≥ €10 Bln), a buy/sell indicator, a geographical region indicator and an indicator for emerging/developed markets. The visual inspection of the consistent effect of the characteristics on average IS is displayed in Appendix Figures C.1, C.2, C.3 and C.4.

The dummy variables are incorporated into the main models in an additive manner.³ Other additional characteristics that were considered include: bid-ask spread, indicators for the moment of the trade (e.g. month or day of the week) and absolute order size.

³ A multiplicative approach was also considered, but did not seem to improve forecast performance.
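As an illustration of the additive incorporation described above (not the thesis's exact notation), the Power Law form extended with dummy variables could read

$$ I(X_i) = \alpha \cdot \sigma_i \cdot \pi_i^{\delta} + \sum_{j} \theta_j \cdot D_{ij}, $$

where D_{ij} denotes the j-th dummy variable (e.g. large-cap, buy/sell, region, emerging/developed markets) for meta-order i and the coefficients θ_j are introduced here purely for illustration.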

4.2. Estimation and Assessment

4.2.1. Estimation

A natural choice for estimating a parametric form of the market impact function I(·; θ) is to estimate the parameter (vector) θ by Non-linear Least Squares (NLS). The NLS estimator can be computed⁴ as

$$ \hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{N} \bigl( \mathrm{IS}_i - I(X_i; \theta) \bigr)^2. \tag{4.10} $$

⁴ Note that it is generally not feasible to find closed-form solutions for computing θ̂, but numerical optimization schemes are still able to solve the minimization problem.

Alternatively, the heteroskedasticity introduced by the (stock-specific) price volatility can be taken into account by a Weighted Non-linear Least Squares (WNLS) estimator

$$ \hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{N} \left( \frac{\mathrm{IS}_i - I(X_i; \theta)}{\sigma_i} \right)^{2}. \tag{4.11} $$

The standard errors of the estimated parameters are computed using the bootstrap method as described in Cameron and Trivedi (2005): first, B = 1000 bootstrap resamples (with replacement) are drawn from the original sample. Then, the WNLS estimate (4.11) is computed for each bootstrap resample. Finally, the standard errors are computed as the standard deviation of the bootstrap estimates of the parameters.
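To make the estimation procedure concrete, the following is a minimal Python sketch, not the thesis's implementation. It assumes NumPy arrays `IS`, `pi` and `sigma`, uses the Power Law form as an example, and treats `theta0` and `B` as placeholder choices.

```python
import numpy as np
from scipy.optimize import least_squares

def wnls_fit(IS, pi, sigma, theta0=(1.0, 0.5)):
    """WNLS estimate (4.11) for the Power Law form I = alpha * sigma * pi**delta."""
    def weighted_residuals(theta):
        alpha, delta = theta
        return (IS - alpha * sigma * pi**delta) / sigma   # dividing by sigma gives the weighting
    return least_squares(weighted_residuals, x0=theta0).x

def bootstrap_se(IS, pi, sigma, B=1000, seed=0):
    """Bootstrap standard errors: re-estimate WNLS on B resamples drawn with replacement."""
    rng = np.random.default_rng(seed)
    n = len(IS)
    estimates = np.empty((B, 2))
    for b in range(B):
        idx = rng.integers(0, n, size=n)                  # resample observations with replacement
        estimates[b] = wnls_fit(IS[idx], pi[idx], sigma[idx])
    return estimates.std(axis=0, ddof=1)                  # SE = std dev of bootstrap estimates
```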

4.2.2. Model Assessment

In empirical modelling, it is crucial to also assess a model post-estimation. This section outlines some numerical measures that are used for assessing and comparing the different regression models. In particular, models are assessed by comparing model forecasts ŷ_i with observed outcomes y_i.

Assessing the performance on the data that is also used for estimating the model, i.e. in-sample, leads to the risk of overfitting. The model forecasts are therefore assessed out-of-sample. In particular, the models are assessed using (stratified) cross-validation, which is described below.

R² measure

A popular measure for statistical fit in linear regression models is the so-called R². Cameron and Trivedi (2005) propose several pseudo-R² measures, which extend the R² to non-linear regression models, including one based on the correlation between prediction and observation:

$$ R^2_{\mathrm{COR}} := \left( \frac{\sum_{i=1}^{N} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2 \cdot \sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})^2}} \right)^{2}. \tag{4.12} $$


The latter is chosen for the attractive property of always being between 0 and 1.⁵ Note that there also exist (pseudo-)R² measures that adjust for model complexity, to avoid overfitting. Such measures are less relevant in the context of out-of-sample model assessment.

Although some literature questions the validity of pseudo-R² measures for non-linear model selection (e.g. Spiess & Neumeyer, 2010), they are still commonly used in the literature for assessing market impact models (e.g. Almgren et al., 2005; Briere et al., 2020). This can be partially attributed to the overall popularity of the R² measure in asset management, especially for the assessment of linear factor models.

⁵ Using a different pseudo-R² measure seemingly does not change the results by a lot.

Prediction Error Measures

Prediction error measures are also popular for model assessment, especially if forecasting is a main purpose of the model. The two considered examples are the Root Mean Squared Error (RMSE) and the Mean Absolute Error (MAE):

$$ \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}, \qquad \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} |y_i - \hat{y}_i|. \tag{4.13} $$

MAE is sometimes considered to be less sensitive to outliers than RMSE.
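A minimal sketch of how the three measures can be computed from out-of-fold forecasts is given below; here the pseudo-R² is the correlation-based measure (4.12), and the function name is introduced purely for illustration.

```python
import numpy as np

def performance_measures(y, y_hat):
    """Out-of-fold performance: pseudo-R2 (squared correlation), RMSE and MAE."""
    r2_cor = np.corrcoef(y, y_hat)[0, 1] ** 2       # squared correlation, cf. (4.12)
    rmse = np.sqrt(np.mean((y - y_hat) ** 2))       # Root Mean Squared Error, cf. (4.13)
    mae = np.mean(np.abs(y - y_hat))                # Mean Absolute Error, cf. (4.13)
    return {"R2": r2_cor, "RMSE": rmse, "MAE": mae}
```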

Cross Validation

Performance measures can be computed out-of-sample by estimating the model on one part of the data and then computing the measures on another part of the data. However, it is crucial to have a large data set for estimating the model, because of the large noise in observing market impact. Therefore, K-fold Cross-Validation (CV) is preferable to a single split of the data (Friedman, 2017).

In K-fold CV the data is split into K parts (folds). The performance measures are then computed once on each fold, with the model estimated on the other folds. The performance measures are then finally averaged over all the folds to provide an estimate of out-of-sample performance.

Because the out-of-sample performance is estimated as a sample average, a naturally induced estimator of its standard error is the standard deviation over the folds divided by √K. Nadeau and Bengio (2003) argue, however, that a standard paired sample t-test using this estimated standard error is not appropriate for pairwise comparison of model performances, because the training folds overlap and are therefore not independent. They propose a corrected paired sample t-test. See Appendix (B.3) for a detailed discussion.
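The sketch below illustrates such a corrected test on per-fold performance differences, assuming the variance-inflation form 1/K + n_test/n_train from Nadeau and Bengio (2003); see Appendix B.3 for the exact form used in the thesis.

```python
import numpy as np
from scipy import stats

def corrected_paired_ttest(d, n_train, n_test):
    """Corrected paired t-test on per-fold performance differences d (one value per fold).

    The fold variance is inflated by n_test/n_train to account for the overlap
    between training folds (a sketch of the Nadeau-Bengio correction).
    """
    d = np.asarray(d, dtype=float)
    K = len(d)
    var_corrected = (1.0 / K + n_test / n_train) * d.var(ddof=1)
    t_stat = d.mean() / np.sqrt(var_corrected)
    p_value = 2 * stats.t.sf(np.abs(t_stat), df=K - 1)   # two-sided p-value
    return t_stat, p_value
```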

An easy way to split the data into K folds is through random selection. This may, however, lead to unbalanced folds, possibly skewing the performance results. Using stratified K-fold CV can solve the issue of unbalanced folds: the folds are then chosen such that each fold has (roughly) the same distribution of specified variables.
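A minimal sketch of such a stratified split, binning the continuous variable π into quantile bins and using the bins as stratification labels, is given below; the number of folds and bins is illustrative and does not correspond to the 1000 folds used in Section 4.3.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def stratified_folds_by_size(pi, K=10, n_bins=20, seed=0):
    """Split observations into K folds with (roughly) the same distribution of pi.

    pi is discretized into quantile bins that serve as stratification labels;
    K, n_bins and the binning scheme are illustrative choices.
    """
    edges = np.quantile(pi, np.linspace(0, 1, n_bins + 1)[1:-1])
    labels = np.digitize(pi, edges)                          # bin index per observation
    skf = StratifiedKFold(n_splits=K, shuffle=True, random_state=seed)
    dummy_X = np.zeros((len(pi), 1))                         # features are irrelevant for the split
    return list(skf.split(dummy_X, labels))                  # list of (train_idx, test_idx) pairs
```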


4.3. Results

4.3.1. Forecast Performances

Table 4.1 shows the computed performance measures for the models⁶ considered in Section 4.1. The displayed measures are computed using stratified 1000-fold CV. The stratification is such that, for each of the folds, the distribution of relative size π is as close as possible to the distribution in the full sample (see Table 3.1).

Table 4.1.: Estimated performance measures for predicting implementation shortfall (bps), displayed per model. The performance measures are computed using stratified 1000-fold Cross-Validation, with the stratification such that relative order size π follows a distribution close to that in Table 3.1. Out-of-sample performances that represent a statistically significant improvement on the Power Law model are displayed in bold text.

                            Reference                    RMSE      R²      MAE
Main Models
  Square Root Law           Equation (2.10)           122.138   2.55%   75.712
  Power Law                 Equation (2.11)           122.126   2.57%   75.713
  Logarithmic               Equation (4.2)            122.123   2.51%   75.730
  Square Root + Linear      Equation (4.5)            122.110   2.55%   75.722
  Smooth Transition         Equation (4.3)            122.133   2.55%   75.715
Additional Benchmarks
  Linear                    Equation (4.6)            122.355   2.40%   75.791
  Third order polynomial    Equation (4.7)            122.112   2.50%   75.736
  Square root polynomial    Equation (4.8)            122.083   2.56%   75.709
  LOWESS                    Section 4.1.3             122.126   2.50%   75.600
  Linear Spline             Section 4.1.3             122.100   2.50%   75.727
With Extra Variables
  Square Root Law           Eqn. (2.10) & Sec. 4.1.4  122.129   2.55%   75.687
  Power Law                 Eqn. (2.11) & Sec. 4.1.4  122.126   2.59%   75.715
  Logarithmic               Eqn. (4.2) & Sec. 4.1.4   122.100   2.59%   75.720
  Square Root + Linear      Eqn. (4.5) & Sec. 4.1.4   122.105   2.58%   75.731
  Smooth Transition         Eqn. (4.3) & Sec. 4.1.4   122.140   2.58%   75.707

Most notable in the results is that all considered models have poor forecasting performance, as can be seen in the consistently low values of the R² measure and the high values of the RMSE/MAE measures. For example, the R² never exceeds 2.6%⁷ and the RMSE is close to the unconditional standard deviation of IS. These results imply that only a small part of the variance in IS is explained by the considered models. It is important to note here that the poor forecasting performance is also observed for the non-parametric alternatives and the models with additional explanatory variables.

⁶ Model (4.4), a discrete transition from linear to square root, is left out. This is because the estimated threshold parameter π* is consistently estimated very close to 0, so that the estimated model simply represents a square root.

⁷ This is consistent with other empirical literature such as Almgren et al. (2005) or Briere et al. (2020).

Next to the overall poor performance, it is also notable that the differences in performance are very small. For example, the difference in RMSE between the best models is smaller than 0.1, whereas the 1000-fold Cross-Validation standard error is roughly 0.6. More specifically, only three performance measures yield a (pairwise) statistically significant improvement over the Power Law model and no model improves on the Power Law model in all three measures.

4.3.2. Estimated Parameters

The estimated parameters and the bootstrapped standard errors of the considered models are displayed in Appendix Table C.1.

The computed WNLS estimate for δ in the Power Law form (2.11) equals δ̂ = 0.463, with standard error 0.012. There is thus a small, but statistically significant, difference between the estimated exponent and the δ = 0.5 of the Square-Root Law (2.10): the estimate lies roughly (0.500 − 0.463)/0.012 ≈ 3 standard errors below 0.5.

Note that some of the parameters of the smooth transition model (4.3) are statistically insignificant. This is seemingly because the model is estimated to be very close to a pure square-root function. Also note that several of the estimated parameters of the extra variables are statistically insignificant, specifically for the region and market capitalization indicators.

4.4. Conclusion

This chapter addresses sub-question (Q.2), which asks what model is appropriate for forecasting expected IS, i.e. market impact.

The results in this chapter show that all considered models perform very poorly in forecasting expected IS. This cannot be (fully) attributed to model misspecification or omitted variables, as the poor performance is also observed when considering non-parametric models or models with additional explanatory variables. Instead, it may be attributed to IS being dominated more by market noise than by market impact.

Paraphrasing Bucci, Mastromatteo, et al. (2019), this can be explained as follows: even if a meta-order contributes 10% of the market volatility during its execution, it only explains 0.1² = 1% of the market variance, since variance scales with the square of volatility.

Using a systematic CV model assessment methodology, it is found that many of the performance differences between the models are economically and statistically insignificant. It is therefore concluded that it is not appropriate to perform model selection based only on the ranking of out-of-sample forecasting performance. In particular, it is found that none of the considered models consistently improves on the Power Law form of the Square-Root Law, which is considered in the heuristic model.

The Square-Root Law model has consistently been found to be an appropriate model across many empirical studies and many different data sets. Since the data set considered in this research does not lead to different conclusions, it seems most appropriate to follow earlier literature. Therefore the Power Law form (2.11) of the Square-Root Law is considered the most appropriate market impact model in the rest of the thesis.

Chapter 6 investigates whether any model can improve on the heuristic model in providing probabilistic forecasts of IS rather than point forecasts.