

Master’s Thesis

Redefining Risk

Evaluating Out-Of-Sample Performance of Semivariance Based Portfolios

Written by:

Mark Appel

Universiteit van Amsterdam

MSc Finance - Finance and Real Estate Finance Combination Track

July, 2021

Thesis Supervisor

Dr. Mario Bersem


Statement of Originality

This document is written by Mark Frederik Appel, who declares to take full responsibility for the contents of this document.

I declare that the text and the work presented in this document are original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


Abstract

Modern portfolio theory is perhaps the most popular theory within financial literature. Its popularity can be explained by the computational efficiency and intuitive framework, but it is often met with strong criticism.

The main criticism that will be addressed is that modern portfolio theory defines portfolio risk and variance to be synonymous. Contrary to variance, semivariance only measures dispersions below a certain threshold, commonly zero. In essence, this paper will compare portfolios generated using (co)variances as an input to their counterparts that use semi(co)variances as an input. It is found that semivariance-based optimisation is commonly superior in terms of returns and various risk measures, but is inferior in diversification when applied on an array of highly diverse assets.


Preface

Writing a thesis is widely considered one of, if not the most, challenging parts of one’s studies. It concludes an essential period of one’s life and requires significant effort. Throughout writing my thesis, I have received support from my supervisor, dr. Bersem, for which I am grateful. During our meetings I received important remarks, and I could not have concluded my thesis as well without them. Thank you. Furthermore, I would like to thank dr. Andonov for the assistance provided during the thesis seminars.

Furthermore, I want to express my gratitude to my excellent professors and lecturers at the University of Amsterdam. Prof. Francke, prof. Marcato, dr. Theebe, dr. Droës, dr. Schilder, dr. Booĳ and dr. van Ophem, thank you for the great education I have been able to enjoy during the first year of my Master’s. It was extremely unfortunate that, due to the pandemic, education had to be organised digitally, which I feel has - literally - distanced me further from you than I would have wanted. However, for the most part, everyone made a great effort in letting me and my fellow students enjoy the great education that we have received.

I want to thank my parents for how well they have always acted in my best interest. Without their support, I would never have been able to be where I am in life now. You have always been there for me and I will always be grateful for that.

Lastly, I would like to thank my dearest friends and my girlfriend. You have always inspired me and challenged my perceptions, which has widened and improved my understanding of the world around me. I have no doubt that we will always be there for each other to experience great times in the future, and support each other where possible.

Mark Appel


1 Introduction

1.1 Introduction

Portfolio optimisation is one of the most popular subjects within financial literature. Markowitz (1952) presented the mean-variance optimisation framework, which remains one of the most popular theories within financial literature nearly seven decades after publication, but which is also often criticised. Modern portfolio theory - the theory ‘around’ Markowitz’ mean-variance optimisation - assumes that a sample of returns can be used to estimate expected returns, as well as the covariances of the assets within the sample, with sufficient accuracy. Unfortunately, Markowitz’ model is sensitive to small changes in the inputs. Many researchers have made improvements to Markowitz’ model that partially, but not fully, circumvent these problems.

Most traditional financial models, including modern portfolio theory, define risk and volatility to be synonymous. Volatility - or standard deviation - is the degree to which a stochastic process varies around its mean over time. This variation can be negative or positive. In practice, investors will not experience an asset that has ‘unexpected positive surprises’ as negative. Downside risk, as the name implies, is a construct that considers only negative deviations. Minimising semivariance is hence more closely aligned with the investor’s objective than minimising variance. Popular metrics of downside risk include value-at-risk and expected shortfall (conditional value-at-risk), but these metrics do not suffice for portfolio optimisation. In mathematics, the moments of a function are quantities that describe the shape of the function’s plot, and variance is known as the second central moment. Semivariance, in turn, is an example of a partial moment.

Semideviation is intuitively similar to standard deviation; however, there are two main differences. Firstly, it only considers deviations that are below some threshold. Secondly, it measures deviations from this threshold, and not from the mean. Minimising downside risk allows the portfolio allocation to be centred on mitigating losses, rather than on mitigating uncertainty. The idea of downside risk-based optimisation came up shortly after Markowitz’ breakthrough publication, but due to the computational burden and the limited available theory, there was no applicable model until nearly half a century after Markowitz’ initial publication. The computational burden is large because, contrary to mean-variance optimisation - which takes only a vector of expected returns and a covariance matrix as inputs - the mean-semivariance algorithm of Markowitz, Starer, et al. (2020) uses the complete time series as input, which becomes increasingly burdensome for long time series of many assets. Besides the computational burden, the mathematical frameworks for working with partial moments were also developed only decades after Markowitz’ publication of modern portfolio theory.

This thesis compares the out-of-sample performance of variance-based optimisation methods to that of their semivariance-based counterparts. For instance, the minimum semivariance portfolio is the semivariance-based counterpart of the minimum variance portfolio. It finds that semivariance-based optimisation consistently achieves higher returns but is inferior to variance-based models in terms of diversification when applied to highly heterogeneous universes. That being said, when applied to stock markets, the results are favourable towards downside-risk based allocation methods. The portfolio optimisation methods are applied to the constituents of the S&P 500 and EuroSTOXX 50 indices, as well as to an array of indices that represent a large part of the investable universe. In addition, for each of these environments, multiple samples of different lengths are taken. Ultimately, the portfolios are tested on real-world out-of-sample data as well as on simulated data. In both cases, the results are in agreement with one another.

1.2 Research Questions

Mean-variance optimisation is practically synonymous with maximising an in-sample Sharpe Ratio, so it would not be surprising if a mean-variance optimal portfolio were to achieve a higher out-of-sample Sharpe Ratio than a mean-semivariance optimal portfolio (and, vice versa, a mean-semivariance optimal portfolio would be expected to achieve a higher Sortino Ratio). However, it is still a worthwhile comparison to make, because if a mean-semivariance optimal portfolio were to outperform a mean-variance optimal benchmark in terms of both the Sharpe Ratio and the Sortino Ratio, then that would be a strong indicator of outperformance.

The main research question is as follows.

Do portfolios that are constructed based on semivariance outperform counterparts based on variance?

The research question can be divided into a set of sub-questions, each of which compares semivariance-based portfolios to their ‘variance-based counterparts’.

1. Do semivariance-based portfolios achieve higher cumulative and/or mean returns?

2. Do semivariance-based portfolios experience less severe maximum drawdowns?

3. Do semivariance-based portfolios achieve higher diversification ratios?

4. Do semivariance-based portfolios achieve higher Sharpe Ratios?

5. Do semivariance-based portfolios achieve lower Sortino Ratios?


1.3 Hypotheses

The mean-semivariance framework by Markowitz, Starer, et al. (2020) unfortunately does not yield a mean-semivariance optimal portfolio, but such a portfolio can be approximated. Based on the previously mentioned sub-questions, the corresponding hypotheses are as follows, each stated relative to the ‘variance-based counterparts’.

1. Semivariance-based portfolios generate higher cumulative and mean returns.

Given the fact that variance-based portfolios effectively penalise above-mean returns, it is expected that portfolios that do not do so will achieve higher returns.

2. Semivariance-based portfolios experience less severe maximum drawdowns.

Given that semivariance-based portfolios can fully minimise negative deviations, it is expected that portfolios that are based on semivariances do not experience as severe drawdowns.

3. Semivariance-based portfolios achieve lower diversification ratios.

Given that the metric is based on volatility rather than semideviation, it is expected that variance-based portfolios outperform in maximising the out-of-sample diversification ratio.

4. Semivariance-based portfolios achieve lower Sharpe ratios.

5. Semivariance-based portfolios achieve higher Sortino ratios.

Because semivariance-based methods do not target volatility, it is expected that semivariance-based portfolios will not be able to achieve higher out-of-sample Sharpe Ratios. However, by the same reasoning, they are expected to achieve higher Sortino Ratios.


2 Literature Review

This section will provide a discussion of relevant literature. To keep the discussion structured, the section has been divided into subsections.

2.1 A Short History on Foundational Theories

The first mathematical analysis of asset prices by Bachelier (1900) is considered by many as the birth of quantitative financial theory. In his theory, Bachelier theorised that asset prices can be modelled as a Brownian motion, i.e. as a process where the growth rate follows from a Gaussian distribution. His argument, that the ‘expectation of the speculator is zero’, follows from the property of Brownian motions that past developments have no effect on future price developments. In other words, Bachelier modelled asset prices as martingales and stated that stock performance today is no indicator of future performance, which is remarkably similar to how many think about asset prices up to this day.

It was after the Great Depression that John Burr Williams published his doctoral thesis, of which Benjamin Graham published a summary (Williams and Graham (1939)). Williams’ theory stated that a stock is only worth the expected value of discounted dividends, stating ‘If earnings not paid out in dividends are all successfully reinvested, then these earnings should produce dividends later’, further concluding ‘In short, a stock is worth only what you can get out of it’. Williams’ work laid the foundations for Myron Gordon’s popular growth model (Gordon (1959)). Implicitly, Williams’ theory suggested that if an investor is only concerned with maximising future income streams, then the investor is also only concerned with maximising the expected portfolio value. It follows that an investor would only invest in one, seemingly most profitable, asset.

The fact that the theory rejected the notion of diversification was strongly criticised by Markowitz, saying that ‘the hypothesis (or maxim) that the investor does (or should) maximize discounted return must be rejected’, further stating that ‘diversification is both observed and sensible; a rule of behavior which does not imply the superiority of diversification must be rejected both as a hypothesis and as a maxim’ (Markowitz (1952)). It was ‘obvious’ that investors were concerned with both risk and return, and ‘variance came to mind as a measure of risk of the portfolio’ (Markowitz (1991)).


2.2 Modern Portfolio Theory

Markowitz (1952) presented mean-variance optimisation, i.e. maximising the expected portfolio returns relative to the amount of portfolio variance. Markowitz’ breakthrough theory is the foundation of modern portfolio theory. In practice, samples of asset returns are used to compute a covariance matrix and a vector of expected returns, which are used to estimate future portfolio returns and variance. Mean-variance optimisation optimises within the - arbitrarily selected - sample, which may lead to undesired outcomes. Therefore, Markowitz’ theory has received much criticism, most of which will be discussed briefly.

Small estimation errors in covariance structures can lead to large shifts in computed weights (Michaud (1989)). This is known as Markowitz’ curse and refers to the unfortunate situation that the more need there is for diversification, the less stable the outcomes of mean-variance optimisation are. This is because mean-variance optimisation requires the covariance matrix to be inverted. The condition number of a matrix is the ratio between the matrix’ largest eigenvalue and its smallest eigenvalue. For instance, an identity matrix only has unit eigenvalues and thus a condition number of one. An identity matrix would represent a correlation matrix of uncorrelated assets, as all non-diagonal entries are zero. Increasing the correlations, i.e. the non-diagonal entries of the matrix, thus increases the matrix’ condition number. A high condition number leads to the matrix being ill-conditioned, causing matrix inversion to yield vastly different outcomes for seemingly insignificant changes in matrix entries, which can easily be caused by estimation errors (Bailey and M. M. Lopez de Prado (2012) and López De Prado (2016)). Intuitively, one would expect longer samples to have fewer estimation errors, but Broadie (1993) explains a trade-off between estimation errors and nonstationarity. Nonstationarity implies that certain parameters shift over time, meaning that sampled historical returns contain less information about future correlations and expected returns the older they are.
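The link between correlation and the condition number can be illustrated with a minimal sketch. It assumes NumPy is available and uses a hypothetical equicorrelation matrix (every pair of assets shares the same correlation), which is an illustrative assumption and not a matrix used in this thesis; the function name is likewise hypothetical.

```python
import numpy as np

def equicorrelation_condition_number(rho, n_assets=10):
    """Condition number of an n x n correlation matrix whose
    off-diagonal entries all equal rho (illustrative assumption)."""
    corr = np.full((n_assets, n_assets), float(rho))
    np.fill_diagonal(corr, 1.0)
    eigenvalues = np.linalg.eigvalsh(corr)  # symmetric matrix, real eigenvalues
    return eigenvalues.max() / eigenvalues.min()

# The condition number grows as the assets become more correlated.
for rho in (0.0, 0.5, 0.9):
    print(rho, equicorrelation_condition_number(rho))
```

For this particular matrix the eigenvalues are 1 + (n − 1)ρ and 1 − ρ, so the condition number equals (1 + (n − 1)ρ)/(1 − ρ), which rises from 1 for uncorrelated assets towards infinity as ρ approaches one.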

Multiple solutions have been proposed to mitigate these problems. For instance, Jurczenko, Michel, and Teiletche (2013), Bednarek and Patel (2018), and Maillard, Roncalli, and Teïletche (2010), amongst many others, propose dropping the expected return vector altogether and optimising using only the covariance matrix. Alternatively, Clarke, De Silva, and Thorley (2002) suggest using additional constraints in the optimisation process to further express the manager’s view. Furthermore, Black and Litterman (1992) attempt to achieve robustness by involving one’s own forecasts as Bayesian priors. Moreover, Ledoit and Wolf (2004) make covariance matrices more robust by regularising the matrix through shrinkage, essentially reducing the most extreme entries of the matrix. Ultimately, on a more cynical note, some researchers propose dropping optimisation altogether and using naïve 1/𝑛 portfolios (DeMiguel, Garlappi, and R. Uppal (2009)).


2.3 Hierarchical Risk Parity

All aforementioned solutions partially circumvent the weaknesses of mean-variance optimisation but do not completely solve Markowitz’ curse, as the covariance matrix still needs to be inverted.

López De Prado (2016) presents a portfolio allocation method that requires neither matrix inversion, nor estimation of expected returns. Hierarchical Risk Parity uses hierarchical clustering methods to find fundamental, robust similarities between assets. The portfolio allocation takes place in three steps.

1. Tree Clustering

For $N$ assets, the $N \times N$ Pearson correlation matrix $\rho_{N \times N} = \{\rho_{i,j}\}_{i,j}$ is constructed, where $\rho_{i,j} = \rho[r_i, r_j]$ for all $i, j \in \{1, \dots, N\}$. The $N \times N$ angular distance matrix $D_{N \times N} = \{d_{i,j}\}_{i,j}$ is then computed, where

$$d_{i,j} = d[r_i, r_j] = \sqrt{\tfrac{1}{2}\,(1 - \rho_{i,j})}.$$

This angular distance matrix can be thought of as a transformation of the original correlation matrix in which high correlations lead to low distances. Next, the Euclidean distance matrix $\tilde{D}_{N \times N}$ is computed. It contains the Euclidean distance between all $N$ column vectors of the angular distance matrix,

$$\tilde{d}_{i,j} = \tilde{d}[D_i, D_j] = \sqrt{\sum_{n=1}^{N} (d_{n,i} - d_{n,j})^2},$$

$D_n$ being the $n$-th column vector of $D_{N \times N}$. This Euclidean distance matrix can be thought of as being similar to a covariance matrix. For instance, when assets $i$ and $j$ share similar angular distances with respect to all other assets, then assets $i$ and $j$ will have a low Euclidean distance, due to the summation operator operating over the entire angular distance matrix. The Euclidean distance matrix $\tilde{D}_{N \times N}$ is now a distance metric over the entire matrix, rather than between each individual pair of assets.

Tree clustering is an instance of agglomerative, bottom-up clustering. This means that initially, all data points can be seen as individual clusters, which are eventually linked up to the point where all data points are in a cluster-of-clusters. A pair of columns $(i^*, j^*)$ is clustered together such that $(i^*, j^*) = \arg\min_{i,j} \{\tilde{d}_{i,j}\}$ (where $i \neq j$). In other words, assets $i$ and $j$ are clustered together if they have the smallest Euclidean distance from each other, i.e. if they are most similar. Now that one cluster has been formed, the question arises: what is the distance between the newly formed cluster and the other assets?


Figure 2.1: Distance between clusters as determined by different linkage criteria. Panels, from left to right: Single Linkage, Complete Linkage, Ward's Linkage, Average Linkage, each showing Cluster A and Cluster B.

The four most common metrics to quantify distances amongst clusters, or between clusters and unclustered items, are listed below and visualised in Figure 2.1 (from left to right) (Al-Fuqaha (2014)).

a) Single linkage takes the distance between the two closest points in cluster A and cluster B, as the distance between cluster A and cluster B.

b) Complete linkage takes the distance between the two farthest points in cluster A and cluster B, as the distance between cluster A and cluster B.

c) Ward’s linkage takes the incremental sum of squares with respect to the cluster’s centre after clustering A and B as the distance between cluster A and B. In other words, the worse the centre of cluster A describes its data points after including cluster B, the more distant cluster B is from cluster A.

d) Average linkage takes the average distance between all points in cluster A and cluster B, as the distance between cluster A and cluster B.

Now, the two vectors corresponding to the assets that have been merged together are removed from $\tilde{D}$ and replaced by the vector containing the distances to the newly formed cluster, according to the selected linkage criterion. This is repeated until there is one cluster of clusters, so that all 𝑁 assets have been hierarchically clustered.


2. Quasi-Diagonalisation

The linkage matrix stores all information about our hierarchical clustering. It stores which two items or clusters belong to which cluster, and the distance between those items. Therefore, the covariance matrix is constructed following the order of the linkage matrix. As a result, the new covariance matrix is quasi-diagonal: more similar items are placed in proximity and thus larger values are closer to the matrix’ diagonal.

3. Recursive Bisection

Variance within clusters is calculated bottom-up based upon the quasi-diagonal covariance matrix, starting from the individual assets, up to the final clusters, using inverse variance weighting. Next, weights are allocated top-down, where each cluster is bisected into two subclusters. Weights within each cluster are allocated in inverse proportion to each subcluster’s variance, and normalised so that the weights sum to 1.
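The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the implementation of López De Prado (2016): it assumes NumPy is available, it uses a naive single-linkage clustering written out by hand (chosen purely for brevity; any of the linkage criteria above could be substituted), and all function names and the toy correlation matrix are hypothetical.

```python
import numpy as np

def angular_distance(corr):
    """d_ij = sqrt(0.5 * (1 - rho_ij)): high correlation gives a small distance."""
    return np.sqrt(np.clip(0.5 * (1.0 - corr), 0.0, None))

def euclidean_distance(d):
    """Euclidean distance between every pair of columns of d."""
    diff = d[:, :, None] - d[:, None, :]  # diff[n, i, j] = d[n, i] - d[n, j]
    return np.sqrt((diff ** 2).sum(axis=0))

def single_linkage_order(dist):
    """Steps 1 and 2: cluster bottom-up and return the assets in the
    order in which they appear in the resulting tree."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the two closest members.
                gap = min(dist[i, j] for i in clusters[a] for j in clusters[b])
                if best is None or gap < best[0]:
                    best = (gap, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0]

def cluster_variance(cov, items):
    """Variance of a cluster under inverse-variance weights."""
    sub = cov[np.ix_(items, items)]
    w = 1.0 / np.diag(sub)
    w /= w.sum()
    return float(w @ sub @ w)

def recursive_bisection(cov, order):
    """Step 3: bisect the ordered assets and weight each half
    inversely to its cluster variance."""
    weights = np.ones(len(order))
    stack = [list(order)]
    while stack:
        items = stack.pop()
        if len(items) < 2:
            continue
        left, right = items[: len(items) // 2], items[len(items) // 2:]
        var_l, var_r = cluster_variance(cov, left), cluster_variance(cov, right)
        alpha = 1.0 - var_l / (var_l + var_r)  # share allocated to the left half
        weights[left] *= alpha
        weights[right] *= 1.0 - alpha
        stack += [left, right]
    return weights

# Toy example (hypothetical numbers): assets 0-1 and 2-3 form two correlated pairs.
corr = np.array([[1.0, 0.9, 0.1, 0.1],
                 [0.9, 1.0, 0.1, 0.1],
                 [0.1, 0.1, 1.0, 0.8],
                 [0.1, 0.1, 0.8, 1.0]])
vol = np.array([0.10, 0.20, 0.15, 0.30])
cov = corr * np.outer(vol, vol)

order = single_linkage_order(euclidean_distance(angular_distance(corr)))
weights = recursive_bisection(cov, order)
```

Because the sketch works with index lists, the quasi-diagonal covariance matrix is never built explicitly; reordering `cov` by `order` would produce it. No matrix is inverted at any point, which is the property that motivates the method.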

For 𝑁 assets, a correlation matrix contains (𝑁²−𝑁)/2 pairwise correlations, or rather sceptically, (𝑁²−𝑁)/2 potential estimation errors. Figure 2.2 plots the minimum spanning tree based on a Euclidean distance matrix, where the linkage criterion used is Ward’s method. The minimum spanning tree visualises the distance between the assets and how they are linked. Visualising it this way helps intuition: a hierarchical structure relies on only 𝑁 − 1 codependencies, which are less prone to estimation error than (𝑁²−𝑁)/2 codependencies.

Figure 2.2: Minimum Spanning Tree representing clustered correlations

2.4 Asymmetric Distributions

So far, over a century of literature has been discussed and there is a clear trend: risk is modelled as variance (or standard deviation) and return series are treated as Gaussian. However, this is an extremely risky assumption to make. For instance, the standard deviation of monthly S&P 500 index returns in the period from 1950 up to 2020 is about 3.5%, with mean returns of 0.66% (Yiu (2020)). A 20% decline would lead to a 𝑍-statistic of 𝑍 = (−0.2 − 0.0066)/0.035 ≈ −6. Under the normal distribution, this event would be expected to occur once every 93,884,861 years. In the past 70 years, it occurred twice. Polish-born French-American mathematician Benoît Mandelbrot extensively researched stock return distributions, stating that ‘it is my opinion that these facts warrant a radically new approach to the problem of price variation’ (Mandelbrot (1963), Mandelbrot (1967), Mandelbrot and Taylor (1967), and Mandelbrot and Hudson (1998)). Mandelbrot explained that return distributions have fat tails, meaning that extreme outcomes have a characteristically higher probability than a Gaussian distribution implies. The collapse of Long-Term Capital Management - the hedge fund employing brilliant minds like Myron Scholes and Robert Merton - can be attributed to the underestimation of skewness and kurtosis of asset return distributions (Krugman (1998)). The Black-Scholes-Merton option pricing model heavily relies on the assumption of normally distributed returns, i.e. that asset prices follow a geometric Brownian motion. If an asset return distribution is fat-tailed, then out-of-the-money options will be heavily undervalued, as an extreme event affecting their moneyness is far more likely than a normal distribution implies.
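The tail-probability argument for a 20% monthly decline can be reproduced with the standard library alone. The sketch below is illustrative: the function name is hypothetical, and it assumes independent monthly returns and a one-sided normal tail. The resulting waiting time is very sensitive to how the inputs are rounded, so the sketch is meant to show the order of magnitude (tens of millions of years) rather than to reproduce the figure quoted above exactly.

```python
import math

def normal_tail_waiting_time(mean, std, threshold, periods_per_year=12):
    """Z-statistic, one-sided normal tail probability and expected
    waiting time in years for a return at or below `threshold`."""
    z = (threshold - mean) / std
    probability = 0.5 * math.erfc(-z / math.sqrt(2.0))  # P(Z <= z)
    years = 1.0 / (probability * periods_per_year)
    return z, probability, years

# Monthly S&P 500 figures quoted in the text: mean 0.66%, std 3.5%, decline of 20%.
z, probability, years = normal_tail_waiting_time(0.0066, 0.035, -0.20)
print(f"Z = {z:.2f}, p = {probability:.2e}, once every {years:,.0f} years")
```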

2.5 Downside risk

Roy (1952) agreed with Markowitz that rational investors diversify risk. He proposed a safety-first framework, where a rational investor would maximise expected returns relative to the probability that returns fall below some disaster level. Effectively, Roy stated that investors pick one threshold 𝑑 and then incorporate this threshold into their optimisation. Subsequent literature generalised this assumption to an 𝑛-th order safety-first principle (Bawa (1978) and Tse, J. Uppal, and White (1993)). Over time, more downside-risk metrics have been developed, such as VaR and CVaR, which are widely used.

Markowitz (1959) introduced the concept of semivariance, which measures only deviations below some pre-specified target 𝜏, such that 𝑆 = 𝔼(min[0, 𝑅 − 𝜏]²). Markowitz acknowledged that ‘semi-variance seems more plausible than variance as a measure of risk, since it is concerned only with adverse deviations’ (Markowitz (1991)). Intuitively, it is sensible that investors are not concerned with large, above-mean returns, despite such events contributing to variance. Markowitz continued: ‘But, as far as I know, to date no one has determined whether there is a substantial class of utility functions for which mean-semi-variance succeeds’. Semivariance is a lower partial moment, and the problem in optimising semivariance lies in determining the correlation between lower partial moments. Fishburn (1977), Harlow and Rao (1989), and Harlow (1991) made significant contributions to the development of generalised frameworks for lower partial moments, but still failed to incorporate correlations into the lower partial moment framework.

2.6 Downside Risk-based Portfolios

Semivariance (or semideviation) is much more obscure in both practice and academic literature than other downside risk measures such as VaR and CVaR. Campbell, Huisman, and Koedĳk (2001), for instance, introduce a framework for mean-VaR optimisation. There is a range of issues with this methodology. It requires investors to set an arbitrary probability level (Benninga and Wiener (1998)) and to predetermine a holding period. Mean-VaR optimal portfolios are highly sensitive to the confidence level selected (Campbell, Huisman, and Koedĳk (2001)) and are unlikely to outperform mean-variance portfolios (Alexander and Baptista (2002)) or mean-variance portfolios with a VaR constraint (Alexander and Baptista (2004)). Besides, VaR does not have the property of subadditivity∗.

The mean-semivariance optimisation can be performed using an implementation of the Critical Line Algorithm developed by Markowitz (Markowitz, Todd, et al. (1993) and Markowitz, Starer, et al. (2020)). A major shortcoming is that - unlike the mean-variance optimisation problem, which only requires a covariance matrix and vector of expected returns as input - mean-semivariance optimisation requires the full return series. Estrada (2008) gives an overview of multiple mathematical definitions of semivariance. Given a threshold value of $\tau$, for each observation of asset $i$’s and $j$’s returns $r$ in periods $t = \{1, \dots, T\}$, Estrada (2008) states that semivariance $s^2_{i,\tau}$ and semicovariance $s^2_{i,j,\tau}$ are best defined as follows, in accordance with Markowitz, Todd, et al. (1993):

$$s^2_{i,\tau} = \frac{1}{T}\sum_{t=1}^{T}\left[(r_{i,t}-\tau)^-\right]^2 \tag{2.1}$$

$$s^2_{i,j,\tau} = \frac{1}{T}\sum_{t=1}^{T}(r_{i,t}-\tau)^-\,(r_{j,t}-\tau)^- \tag{2.2}$$

where

$$(X)^- = \begin{cases} X, & \text{if } X < 0 \\ 0, & \text{otherwise} \end{cases}$$

Furthermore, Estrada (2008) introduces a ‘heuristic approach’ to mean-semivariance optimisation, where the semicovariance matrix is treated as if it were a covariance matrix and used in traditional mean-variance optimisation, and proves that the resulting portfolio weights are close, but not identical, to the optimal weights.
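As an illustration of definitions (2.1) and (2.2), the semicovariance matrix can be computed directly from a matrix of returns. The sketch below assumes NumPy is available; the function name and the small return sample are hypothetical.

```python
import numpy as np

def semicovariance_matrix(returns, tau=0.0):
    """Semicovariance matrix per (2.1)-(2.2).

    returns: array of shape (T, N), one row per period and one column per asset.
    tau:     threshold below which deviations count as downside.
    """
    r = np.asarray(returns, dtype=float)
    downside = np.minimum(r - tau, 0.0)            # (r - tau)^-, zero above the threshold
    return downside.T @ downside / r.shape[0]      # diagonal holds the semivariances

# Hypothetical sample: four periods, two assets.
sample = np.array([[ 0.02, -0.01],
                   [-0.03, -0.02],
                   [ 0.01,  0.04],
                   [-0.01,  0.00]])
semicov = semicovariance_matrix(sample, tau=0.0)
semideviations = np.sqrt(np.diag(semicov))
```

Under the heuristic approach described above, `semicov` would simply take the place of the covariance matrix in an existing mean-variance optimiser.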

∗ 𝑋 is subadditive if 𝑋(𝑤𝐴 + (1 − 𝑤)𝐵) ≤ 𝑤𝑋(𝐴) + (1 − 𝑤)𝑋(𝐵), as is the case with standard deviation, i.e. the standard deviation of a weighted sum is less than or equal to the weighted sum of the standard deviations.


2.7 Performance Assessment

Sharpe (1966) developed what came to be known as the Sharpe Ratio. Given a mean return $\mu$, risk-free rate $r_f$ and standard deviation $\sigma$, the Sharpe Ratio $SR$ equals the ratio of the excess return $(\mu - r_f)$ over the standard deviation. For that reason it is often referred to as a reward-to-variability ratio.

$$SR_i = \frac{\mu_i - r_f}{\sigma_i} \tag{2.3}$$

One should be aware that any estimated Sharpe Ratio is actually a sample Sharpe Ratio $\widehat{SR}$. Lo (2002) proved that, assuming Gaussian return distributions, estimated Sharpe Ratios are normally distributed with standard deviation $\sigma_{\widehat{SR}}$.

$$\sigma_{\widehat{SR}} = \sqrt{\frac{1}{T-1}\left(1 + \frac{1}{2}\widehat{SR}^2\right)} \tag{2.4}$$

Mertens (2002) generalised Lo’s findings, proving that estimated Sharpe Ratios are also normally distributed with standard deviation $\sigma_{\widehat{SR}}$ when the return distribution is non-Gaussian. Given a level of skewness $\gamma_3$ and kurtosis $\gamma_4$, the standard deviation can be computed as follows. For Gaussian distributions, which have $\gamma_3 = 0$ and $\gamma_4 = 3$, the equation collapses to the finding by Lo (2002).

$$\sigma_{\widehat{SR}} = \sqrt{\frac{1}{T-1}\left(1 + \frac{1}{2}\widehat{SR}^2 - \gamma_3\widehat{SR} + \frac{\gamma_4 - 3}{4}\widehat{SR}^2\right)} \tag{2.5}$$

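Given the aforementioned distribution, one can test the significance of Sharpe Ratios. Bailey and M. M. Lopez de Prado (2012) introduce the Probabilistic Sharpe Ratio $\widehat{PSR}$, which equals the probability that an estimated Sharpe Ratio $\widehat{SR}$ exceeds some given benchmark Sharpe Ratio $SR^*$.

$$\widehat{PSR}(SR^*) = \text{Prob}[SR^* \le \widehat{SR}]$$

$$\widehat{PSR}(SR^*) = Z\left[\frac{(\widehat{SR} - SR^*)\sqrt{T-1}}{\sqrt{1 - \gamma_3\widehat{SR} + \frac{\gamma_4 - 1}{4}\widehat{SR}^2}}\right] \tag{2.6}$$

Here $Z[\cdot]$ denotes the cumulative distribution function of the standard normal distribution.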

The Sortino Ratio $SR^-$ (Sortino and Meer (1991) and Sortino and Price (1994)) is remarkably similar to the Sharpe Ratio. Given a mean return $\mu$, target return $\tau$ and semideviation $s_\tau$, the Sortino Ratio computes the ratio between target outperformance $(\mu - \tau)$ and downside deviation $s_\tau$.

$$SR^-_i = \frac{\mu_i - \tau}{s_{i,\tau}} \tag{2.7}$$
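The ratios in (2.3) to (2.7) can be sketched as follows. The sketch assumes NumPy is available; the function names are hypothetical, and the use of the sample standard deviation (`ddof=1`) in the Sharpe Ratio is an assumption, since the text does not specify the estimator. The semideviation follows definition (2.1), and the Probabilistic Sharpe Ratio divides the excess over the benchmark by the standard deviation in (2.5), which is algebraically the same as (2.6).

```python
import math
import numpy as np

def sharpe_ratio(returns, risk_free=0.0):
    """Equation (2.3), using the sample standard deviation (assumption)."""
    r = np.asarray(returns, dtype=float)
    return float((r.mean() - risk_free) / r.std(ddof=1))

def sortino_ratio(returns, target=0.0):
    """Equation (2.7), with the semideviation taken from (2.1)."""
    r = np.asarray(returns, dtype=float)
    downside = np.minimum(r - target, 0.0)
    semideviation = math.sqrt(float(np.mean(downside ** 2)))
    return float((r.mean() - target) / semideviation)

def sharpe_ratio_std(sr, n_obs, skew=0.0, kurt=3.0):
    """Equation (2.5); the Gaussian defaults reduce it to (2.4)."""
    inner = 1.0 + 0.5 * sr ** 2 - skew * sr + (kurt - 3.0) / 4.0 * sr ** 2
    return math.sqrt(inner / (n_obs - 1))

def probabilistic_sharpe_ratio(sr, sr_benchmark, n_obs, skew=0.0, kurt=3.0):
    """Equation (2.6): standard normal cdf of the standardised excess."""
    z = (sr - sr_benchmark) / sharpe_ratio_std(sr, n_obs, skew, kurt)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical monthly returns, used only to exercise the functions.
example = [0.02, -0.01, 0.03, -0.02]
print(sharpe_ratio(example), sortino_ratio(example))
print(probabilistic_sharpe_ratio(sr=0.2, sr_benchmark=0.0, n_obs=60))
```

With a fixed estimated Sharpe Ratio above the benchmark, the Probabilistic Sharpe Ratio rises with the number of observations, which reflects the shrinking standard deviation in (2.5).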


The ratios discussed so far are all risk-adjusted return measures of some sort. These ratios can give a distorted image or a false sense of security if the investor has been lucky (Bailey and M. Lopez de Prado (2014) and M. Lopez de Prado and Lewis (2018)). Highly diversified portfolios are more robust to unexpected losses, so it is important to have a metric that measures the degree of diversification. Given a vector of portfolio weights $w$, a vector of standard deviations $\sigma$ and a covariance matrix $V$, the diversification ratio (Choueifaty and Coignard (2008)) is the ratio of the weighted sum of the assets’ standard deviations over the portfolio standard deviation. In other words, it is the ratio between the portfolio volatility if there were zero diversification benefits and the true portfolio volatility.

$$DivR = \frac{w^T \sigma}{\sqrt{w^T V w}} \tag{2.8}$$

Portfolio concentration is antithetical to diversification. The Herfindahl-Hirschman index (Hirschman (1964)) is commonly used in microeconomics, for example when calculating market concentration. Given a set of $n$ proportions $h$, such that $\sum_{i=1}^{n} h_i = 1$, the Herfindahl-Hirschman index equals the sum of the squared proportions $h_i^2$. The Herfindahl-Hirschman index can also be applied to portfolio weights. In this thesis, the resulting measure is referred to as the concentration ratio. For example, a $1/n$ portfolio on the S&P 500 would have a concentration ratio of $500 \cdot (1/500)^2 = 0.002$, whereas a portfolio holding only a single stock would have a concentration ratio of 1.

$$ConR = \sum_{i=1}^{N} w_i^2 \tag{2.9}$$
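Both diversification measures are straightforward to compute from the weights and the covariance matrix. The sketch below follows (2.8) and (2.9); it assumes NumPy is available, and the function names and the two-asset example are hypothetical.

```python
import numpy as np

def diversification_ratio(weights, cov):
    """Equation (2.8): weighted average volatility over portfolio volatility."""
    w = np.asarray(weights, dtype=float)
    v = np.asarray(cov, dtype=float)
    sigma = np.sqrt(np.diag(v))                 # individual asset volatilities
    return float(w @ sigma / np.sqrt(w @ v @ w))

def concentration_ratio(weights):
    """Equation (2.9): Herfindahl-Hirschman index applied to portfolio weights."""
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w ** 2))

# The 1/n example from the text: 500 equally weighted stocks.
equal_weights = np.full(500, 1.0 / 500)
print(concentration_ratio(equal_weights))

# Two uncorrelated assets with equal volatility, held in equal weights (hypothetical).
print(diversification_ratio([0.5, 0.5], np.diag([0.04, 0.04])))
```

In the second example the diversification ratio equals the square root of two, whereas two perfectly correlated assets would give a ratio of exactly one.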

2.8 Simulation

Investment strategies are often tested on both real data and simulated data. Most commonly, practitioners would use a geometric Brownian motion and a Cholesky decomposition to generate correlated data sets that reflect characteristics of return time series. However, such a procedure would completely disregard the non-normality of return distributions. Tuenter (2001) provides an algorithm which allows the user to fit a Johnson’s SU-distribution to a given set of the first four moments. Draws from this distribution can then be made correlated using copulas. Johnson’s SU-distributions are commonly used in finance for risk modelling, for instance in option pricing (Choi, Liu, and Seo (2018)). Unlike Gaussian distributions, which are fully characterised by the first two moments, Johnson’s SU-distributions can take a vastly wider range of ‘bell shapes’ and are thus more practical in describing stock return distributions.
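For reference, the conventional Gaussian procedure mentioned at the start of this section can be sketched as follows. This is only the baseline that the Johnson’s SU approach is meant to improve upon, not the simulation procedure used in this thesis; it assumes NumPy is available, and the function name and all parameter values are hypothetical.

```python
import numpy as np

def simulate_correlated_gbm(mu, sigma, corr, n_steps, dt=1.0 / 12.0, seed=0):
    """Correlated log-returns and price paths under geometric Brownian motion.

    mu, sigma: annualised drifts and volatilities, one entry per asset.
    corr:      target correlation matrix (must be positive definite).
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(np.asarray(corr, dtype=float))  # corr = chol @ chol.T
    shocks = rng.standard_normal((n_steps, len(mu))) @ chol.T  # correlated N(0, 1) draws
    log_returns = (mu - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * shocks
    prices = np.exp(np.cumsum(log_returns, axis=0))            # relative to a start price of 1
    return log_returns, prices

target_corr = np.array([[1.0, 0.6],
                        [0.6, 1.0]])
log_returns, prices = simulate_correlated_gbm([0.06, 0.04], [0.20, 0.10],
                                              target_corr, n_steps=20000)
sample_corr = np.corrcoef(log_returns.T)[0, 1]
```

The simulated log-returns are Gaussian by construction, so their skewness and excess kurtosis are close to zero, which is exactly the limitation described above.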


3 Methodology

Financial markets are highly unpredictable. Traditionally, correlation structures are estimated from sampled historical time series and extrapolated onto the future. However, financial markets are driven by real events, and market structures change over time. This makes financial literature and studies susceptible to data mining: by testing a bad idea on sufficiently many samples, an excellent result is likely to occur. It is, in fact, rather easy to find an extremely overfit strategy that promises great results. For that reason, replicability is of great importance, such that findings can be put to a stress test in other environments. The practice of testing a strategy on data other than those on which it was developed is called backtesting∗. Therefore, the performance analysis will be performed on three different datasets with different characteristics. Furthermore, for each dataset, multiple estimation periods are used as samples to base portfolio allocation upon. Each dataset will also be simulated for repeated robustness checks.

[Figure 3.1 panels. Datasets: S&P 500, EuroSTOXX 50, Market Indices. Allocation Methods: Minimum variance, Minimum semivariance, Mean-Variance optimal, Mean-Semivariance optimal, Hierarchical Risk Parity, Hierarchical Downside Risk Parity. Estimation Periods: Estimation period 1, Estimation period 2, Estimation period 3, Simulation estimation period.]

Figure 3.1: Overview of the different datasets, allocation methods and estimation windows used.

∗ Lawrence Berkeley National Laboratory has developed an interactive online tool that generates a set of random return series, creates an overfit strategy, and shows how such a strategy usually performs subpar when applied to other data.

Table 3.1: Overview of variance-based allocation methods and their semivariance-based counterparts

Variance-based              Semivariance-based
Minimum Variance            Minimum Semivariance
Mean-variance Optimal       Mean-semivariance Optimal
Hierarchical Risk Parity    Hierarchical Downside Risk Parity

Throughout this thesis, the main goal is to address whether and how semivariance-based optimisation procedures produce allocations superior to variance-based optimisation procedures. Each row in table 3.1 shows a variance-based procedure and its semivariance-based counterpart. The minimum semivariance optimisation will be performed using the algorithm by Markowitz, Starer, et al. (2020). The mean-semivariance optimisation is also based upon this algorithm but it is extended, which is explained in section 3.3. The paper also provides an implementation of Markowitz’ critical line algorithm, which is used to calculate the minimum-variance and mean-variance optimal portfolios. The hierarchical risk parity portfolios are generated by the algorithm in López De Prado (2016), where the only input (i.e., the covariance matrix) can be replaced with the semicovariance matrix.

3.1 Main Analysis

[Figure 3.2: timeline with axis years 2006 to 2020, marking Estimation period 1, Estimation period 2, Estimation period 3 and the Test period.]

Figure 3.2: For each dataset, three estimation windows are chosen

Figure 3.2 visualises the timeseries on which the portfolios will be optimised, and the out-of-sample test period in which they will be put to the test. This procedure resembles how investing would work in practice.

Different lengths of estimation windows are chosen for two reasons.

1. By choosing different lengths of estimation windows, the portfolios can be compared to see which method selects more robust portfolios, i.e. which optimisation method is more robust to estimation errors.

2. By choosing multiple portfolios, there is less space for potential idiosyncrasies that would generate misleading results.

3.2 Robustness Check: Simulation

Investment strategies are often tested on both real data and simulated data. Most commonly, practitioners would use a geometric Brownian motion and a Cholesky decomposition to generate correlated data sets, reflecting characteristics of return timeseries. However, such a procedure would completely disregard the non-normality of return distributions. Tuenter (2001) provides an algorithm which allows the user to fit a Johnson’s SU-distribution to a set of first four moments. From this distribution, draws can be made correlated using copulas. Johnson’s SU-distributions are commonly used in finance for risk modelling, for instance in option pricing (Choi, Liu, and Seo (2018)). Unlike Gaussian distributions, which are fully characterised by the first two moments, Johnson’s SU-distributions can take a vastly wider range of ‘bell-shapes’ and are thus more practical in describing stock distributions.

[Figure 3.3: timeline with axis years 2016 to 2025, marking the Estimation period and the Test set, the latter consisting of 100 different simulations.]

Figure 3.3: For each dataset, the portfolios are estimated based on real data and tested on 100 simulations

Simulation is a common method to test performance, because a multiplicity of simulations will contain unexpected events. Essentially, it helps to rule out the chance that the practitioner ends up with ‘false positive strategies’. The simulations are based on the ‘full dataset’, that is, the data upon which the simulation is based stretches from 2006 to 2020. The simulated test sets are treated as if they were the time series for 2021 to 2025. The simulations are generated in the following manner.

1. For each return series in each dataset, the mean, standard deviation, skewness and kurtosis are computed.

2. For each dataset, the correlation matrix is computed. If the matrix is not positive-definite^†, the nearest positive-definite correlation matrix is computed using a Python implementation of the method by Higham (1988).

3. Using a trading holiday calendar, it is found that there are 1,261 trading days from 2021 up to 2025.

4. The results from 1. and 2. are used as inputs for the Python implementation based upon the simulation method of Tuenter (2001) to generate 1,261 datapoints. This process is performed 100 times in total.
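The four steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in this thesis: the Johnson SU parameters in `su_params` are hypothetical values rather than the output of Tuenter's moment-matching algorithm, the copula is assumed to be Gaussian, and `clip_to_positive_definite` is a simple eigenvalue-clipping stand-in for the cited nearest-correlation-matrix routine. The function names and the example correlation matrix are illustrative only.

```python
import numpy as np
from scipy import stats

def clip_to_positive_definite(corr, eps=1e-8):
    """Clip eigenvalues at eps and rescale to a unit diagonal (simple stand-in)."""
    sym = (corr + corr.T) / 2.0
    eigval, eigvec = np.linalg.eigh(sym)
    fixed = (eigvec * np.clip(eigval, eps, None)) @ eigvec.T
    scale = np.sqrt(np.diag(fixed))
    return fixed / np.outer(scale, scale)

def simulate_returns(corr, su_params, n_days, rng):
    """Draw correlated returns: Gaussian copula with Johnson SU marginals."""
    chol = np.linalg.cholesky(clip_to_positive_definite(corr))
    z = rng.standard_normal((n_days, corr.shape[0])) @ chol.T  # correlated normals
    u = stats.norm.cdf(z)                                      # uniform marginals
    cols = [stats.johnsonsu.ppf(u[:, i], a, b, loc=loc, scale=scale)
            for i, (a, b, loc, scale) in enumerate(su_params)]
    return np.column_stack(cols)

# Hypothetical inputs for three assets (illustrative values only).
corr = np.array([[1.0, 0.6, 0.2],
                 [0.6, 1.0, 0.4],
                 [0.2, 0.4, 1.0]])
su_params = [(0.1, 1.5, 0.001, 0.01)] * 3   # (a, b, loc, scale) per asset
rng = np.random.default_rng(42)
simulations = [simulate_returns(corr, su_params, 1261, rng) for _ in range(100)]
```

Each element of `simulations` is one 1,261-day test set. Because the marginal transform is monotone, the rank dependence imposed by the correlated normals carries over to the simulated returns.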

†... which was the case for all correlation matrices.

3.3 The Mean-Semivariance Optimal Portfolio

The algorithm by Markowitz, Starer, et al. (2020) allows the user to find the minimum semivariance portfolio, maximise expected return given a maximum level of semivariance, or to minimise semivariance given a minimum required expected return. The target return will be set to zero (𝜏 = 0), such that semivariance can be interpreted as a penalty for incurring losses in the context of optimisation. It is possible to use some stylised facts of the efficient frontier to arrive at the mean-semivariance optimal portfolio.

1. The efficient frontier runs from the minimum variance portfolio to the asset with the highest expected return.

2. All points on the frontier - except for the optimal point - are surrounded by a point with a higher Sortino Ratio, and one with a lower Sortino Ratio.

[Figure 3.4 axes: Semideviation and Expected Return.]

Figure 3.4: Representation of the mean-semivariance optimisation algorithm

The algorithm works as follows; please refer to figure 3.4.

1. Find the minimum semivariance portfolio (golden star).

2. The optimal portfolio has an expected return and semivariance within the range of those of the minimum semivariance portfolio and the maximum return asset (blue dot). For a random level of expected returns (a point between the highest return asset and minimum semivariance portfolio), minimise the semideviation (purple cross). Calculate the Sortino Ratio of this portfolio.

3. Generate a new random portfolio somewhere between the minimum semivariance portfolio and the maximum expected return asset (green cross). If the green cross has a higher Sortino Ratio than the purple cross, then the optimal portfolio is somewhere between the blue dot and the purple cross. If not, the optimal portfolio is between the green cross and the golden star. Define the range where the mean-semivariance optimal portfolio can potentially be located as the ‘efficient range’. Save the weights of the portfolio with the highest Sortino Ratio so far.

4. Repeat 3. within the efficient range. After sufficient iterations, the efficient range shrinks until no superior portfolio can be found. Provided that the two stylised facts above hold, the algorithm therefore converges to the portfolio with the highest Sortino ratio.
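The shrinking of the efficient range can be illustrated with a short Python sketch. It makes several assumptions that are not part of the procedure above: a generic SLSQP optimiser from SciPy stands in for the algorithm by Markowitz, Starer, et al., portfolios are long-only, the candidate return levels are chosen by a golden-section rule instead of at random, the Sortino ratio uses a target of zero without a risk-free rate, and the return data are synthetic (in percent). All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def semideviation(w, returns):
    """Root mean squared portfolio return below the target of zero."""
    return np.sqrt(np.mean(np.minimum(returns @ w, 0.0) ** 2))

def sortino(w, returns):
    return (returns @ w).mean() / semideviation(w, returns)

def min_semivariance(returns, target_mean=None):
    """Long-only minimum semivariance weights, optionally at a fixed mean return."""
    n = returns.shape[1]
    mu = returns.mean(axis=0)
    cons = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]
    if target_mean is not None:
        cons.append({"type": "eq", "fun": lambda w: mu @ w - target_mean})
    res = minimize(lambda w: np.mean(np.minimum(returns @ w, 0.0) ** 2),
                   np.full(n, 1.0 / n), bounds=[(0.0, 1.0)] * n,
                   constraints=cons, method="SLSQP",
                   options={"ftol": 1e-12, "maxiter": 500})
    w = np.clip(res.x, 0.0, None)                # guard against tiny numerical violations
    return w / w.sum()

def max_sortino(returns, iterations=30):
    """Shrink the 'efficient range' of mean returns around the best Sortino ratio."""
    mu = returns.mean(axis=0)
    best = min_semivariance(returns)             # step 1: minimum semivariance portfolio
    lo, hi = mu @ best, mu.max()                 # efficient range of mean returns
    ratio = (np.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iterations):
        a, b = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
        wa, wb = min_semivariance(returns, a), min_semivariance(returns, b)
        if sortino(wa, returns) > sortino(wb, returns):
            hi, cand = b, wa                     # optimum lies in [lo, b]
        else:
            lo, cand = a, wb                     # optimum lies in [a, hi]
        if sortino(cand, returns) > sortino(best, returns):
            best = cand                          # keep the best portfolio so far
    return best

# Synthetic daily returns in percent for four hypothetical assets.
rng = np.random.default_rng(0)
returns = rng.normal([0.02, 0.05, 0.08, 0.03], [0.8, 1.0, 1.5, 0.6], size=(750, 4))
weights = max_sortino(returns)
```

The golden-section rule relies on the second stylised fact, namely that the Sortino ratio has a single peak along the frontier. If that does not hold for a given dataset, the sketch may settle on a local rather than the global optimum.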

3.4 Linkage Criterion

As discussed in section 2.3, the user must select a linkage criterion when using hierarchical risk parity. A good starting point to select a linkage criterion is to visualise the clustering. Multidimensional scaling allows the visualisation of distance matrices similarly to how a metro map visualises travel distances. Figure 3.5 plots the multidimensional scaling whilst visualising different clustering methods. Visually, single linkage and Ward’s method seem most successful, because those methods create clusters without significant overlap. However, single linkage typically produces large, generalised clusters, which is not preferable when data sets are not highly homogeneous. Therefore, Ward’s method is chosen because it creates less ‘generalised’ clusters.
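To make the comparison of linkage criteria concrete, the following minimal Python sketch clusters synthetic return series with both single linkage and Ward's method using SciPy. The two-factor return data are hypothetical, and the correlation-based distance sqrt(0.5 (1 - ρ)) is assumed here as the distance measure; the sketch does not reproduce figure 3.5.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(1)
factors = rng.standard_normal((500, 2))                 # two hypothetical common factors
loadings = np.repeat(np.eye(2), 3, axis=0)              # assets 0-2 load on A, 3-5 on B
returns = factors @ loadings.T + 0.3 * rng.standard_normal((500, 6))

corr = np.corrcoef(returns, rowvar=False)
dist = np.sqrt(np.clip(0.5 * (1.0 - corr), 0.0, None))  # correlation-based distance
np.fill_diagonal(dist, 0.0)
condensed = squareform(dist, checks=False)              # condensed form for linkage

# Cut each dendrogram into two clusters and compare the assignments.
labels = {method: fcluster(linkage(condensed, method=method), t=2, criterion="maxclust")
          for method in ("single", "ward")}
print(labels)
```

On clean data such as this, both criteria recover the two groups. Differences between the criteria tend to appear on noisier, more heterogeneous data, which is why a visual check on the actual datasets is useful.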

Figure 3.5: Multidimensional scaling allows the visualisation of clustering algorithms

3.5 Analysing Portfolios

Each allocation method is applied on each estimation window in each dataset (see figure 3.1). As a result, each method 𝑚 has a weights vector 𝑥^𝑚. For each portfolio, the out-of-sample returns are the product of the weights vector and the matrix of the out-of-sample returns. The portfolio returns are used to compute a set of statistics, which will be used to compare performance across different allocation methods. These include statistical (partial) moments such as the arithmetic mean return, volatility, semideviation, skewness and kurtosis, risk-adjusted return ratios such as the Sharpe ratio and the Sortino ratio, and other risk-indicators such as maximum drawdown, rolling volatility and value-at-risk.
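A minimal Python sketch of this computation is given below. It assumes simple (not log) returns, 252 trading days per year, a zero risk-free rate and target return, and a historical value-at-risk at the 5% level; the function name and the example numbers are hypothetical. Skewness, kurtosis and rolling volatility are omitted for brevity.

```python
import numpy as np

def portfolio_statistics(weights, oos_returns, periods_per_year=252, var_level=0.05):
    """Out-of-sample statistics for one weights vector (simple returns assumed)."""
    r = oos_returns @ weights                       # portfolio return per period
    downside = np.minimum(r, 0.0)                   # shortfalls below the zero target
    semidev = np.sqrt(np.mean(downside ** 2))
    wealth = np.cumprod(1.0 + r)                    # cumulative wealth path
    peak = np.maximum.accumulate(np.concatenate(([1.0], wealth)))[1:]
    return {
        "mean": r.mean() * periods_per_year,
        "volatility": r.std(ddof=1) * np.sqrt(periods_per_year),
        "semideviation": semidev * np.sqrt(periods_per_year),
        "sharpe": r.mean() / r.std(ddof=1) * np.sqrt(periods_per_year),
        "sortino": r.mean() / semidev * np.sqrt(periods_per_year),
        "max_drawdown": np.max(1.0 - wealth / peak),
        "value_at_risk": -np.quantile(r, var_level),
    }

# Three periods, two assets, equal weights (illustrative numbers).
oos_returns = np.array([[0.10, 0.00],
                        [-0.20, 0.00],
                        [0.05, 0.05]])
summary = portfolio_statistics(np.array([0.5, 0.5]), oos_returns)
```

In this toy example the portfolio returns are 5%, -10% and 5%, so the wealth path peaks at 1.05 and falls to 0.945, giving a maximum drawdown of 10%.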

For the simulated data, each allocation method has a single portfolio weights vector. Given that 100 simulations have been run, the result is that there are 100 sets of statistics to compare. However, it cannot be assumed that all statistics follow a Gaussian distribution, so there is a need for a non-parametric test. The Wilcoxon signed-rank test is the non-parametric counterpart of the paired Student’s 𝑡-test. It tests whether the pairwise differences between two samples differ significantly from zero. If so, the null hypothesis is rejected and it can be concluded that the samples are not drawn from identical distributions.

If the test is performed one-sided, the alternative hypothesis is that one of the samples tends to take higher values than the other.
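As an illustration, the two-sided and one-sided tests can be run with `scipy.stats.wilcoxon`. The Sortino ratios below are synthetic, hypothetical numbers and not results from this thesis; they only show how paired statistics from the same 100 simulations would enter the test.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(7)
# Hypothetical Sortino ratios of two methods across the same 100 simulations.
sortino_variance = rng.normal(0.50, 0.20, size=100)
sortino_semivariance = sortino_variance + rng.normal(0.05, 0.10, size=100)

# Two-sided: are the paired differences centred on zero?
two_sided = wilcoxon(sortino_semivariance, sortino_variance)
# One-sided: does the first sample tend to take higher values?
one_sided = wilcoxon(sortino_semivariance, sortino_variance, alternative="greater")
print(two_sided.pvalue, one_sided.pvalue)
```

The samples must be paired, that is, entry 𝑖 of both arrays must come from the same simulation, because the test works on the pairwise differences.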

Data 4

This chapter will discuss the origin of the data used, discuss how it has been altered and summarise it.

There are three datasets that will be analysed.

1. All firms currently listed on the Standard & Poor’s 500 index (S&P 500).

2. All firms currently listed on the EuroSTOXX 50 index.

3. A set of indices capturing a significant subset of the investable universe, such as stock market indices, private equity indices, fixed income indices, real estate indices, REIT indices and commodity indices.

This dataset will be referred to as the ‘broad indices’.

4.1 Data Selection and Cleaning

S&P 500

The S&P 500 measures the performance of approximately 500 U.S. listed firms. Thanks to the diversity of its constituents, it is widely regarded as a performance indicator of U.S. equity. For all firms currently constituting the S&P 500 index, adjusted closing prices have been collected for the range of 01-01-2006 up to 01-01-2021.

During that sample period, there have been several new listings and delistings. However, correcting for such cases is considered to be out-of-scope for the analysis. This can introduce a selection bias, as delistings can likely be correlated with performance (for instance, in the case of bankruptcy). Firms are excluded from the analysis of an estimation period if they do not have a full return series during that estimation period (see figure 3.2). A full table containing all stocks in the dataset with additional information can be found in appendix table 1. All data is sourced from Yahoo! Finance using an implementation for Python.

Figure 4.1: S&P 500’s efficient variance-frontier

Figure 4.2: S&P 500’s efficient semivariance-frontier
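The rule of dropping firms without a full series in an estimation period can be sketched as follows. This is a minimal illustration on a hypothetical price matrix with made-up tickers, assuming missing observations are stored as NaN; it is not the data pipeline used for the thesis.

```python
import numpy as np

def full_series_only(prices, tickers, start, stop):
    """Keep only assets with no missing prices in rows start..stop-1."""
    window = prices[start:stop]
    complete = ~np.isnan(window).any(axis=0)      # True where the column has no gaps
    kept = [t for t, ok in zip(tickers, complete) if ok]
    return window[:, complete], kept

# Hypothetical prices: "BBB" only starts trading in the third row.
tickers = ["AAA", "BBB", "CCC"]
prices = np.array([[10.0, np.nan, 5.0],
                   [10.5, np.nan, 5.1],
                   [10.2, 20.0, 5.3],
                   [10.8, 20.5, 5.2]])
window, kept = full_series_only(prices, tickers, 0, 4)
```

Applied to the full window, "BBB" is dropped; applied to a window starting at the third row, all three assets would be kept, which is why the set of included firms can differ per estimation period.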

One firm has been excluded from the analysis, being Caesars Entertainment (CZR). This stock experienced extremely high returns during the period of 01-01-2006 up to 01-01-2021. Including the firm in the analysis would result in almost completely concentrated mean-variance optimal and mean-semivariance optimal portfolios.

EuroSTOXX 50

Figure 4.3: EuroSTOXX 50’s efficient variance-frontier

EuroSTOXX 50 measures the return of the Eurozone’s fifty largest and most liquid stocks. The index covers about 62% of Eurozone equity in terms of market capitalisation. For all firms currently constituting the EuroSTOXX 50 index, adjusted closing prices have been collected for the range of 01-01-2006 up to 01-01-2021. A full table containing all stocks in the dataset with additional information can be found in appendix table 2. All data is sourced from Yahoo! Finance using an implementation for Python.

One firm has been excluded from the analysis, being Linde PLC (LIN.DE). The stock was temporarily delisted during the sample period. This leads to several problems, for instance a biased computation of expected (co)variance. Therefore, it was decided to exclude the firm altogether.

Figure 4.4: EuroSTOXX 50’s efficient semivariance-frontier

Table 4.1: Nationality of EuroSTOXX 50 constituents

Country        Stocks
France         17
Germany        16
Netherlands    6
Spain          4
Italy          3
Ireland        2
Belgium        1
Finland        1
Total          50

Broad Indices

A set of indices has been selected to analyse how different allocation methods perform in more heterogeneous environments. A selection of public equity indices, private equity indices, fixed income indices, real estate indices, REIT indices and commodity indices offers further diversification opportunities, including international diversification. A full table containing the indices in the dataset with their respective sources can be found in appendix table 3. Total return indices have been selected wherever available.

Public Equity

Stock market indices have been selected in such a way that the set is highly diverse. For instance, both large-cap and small-cap indices are included. Furthermore, the indices are internationally heterogeneous, as U.S., European and Asian indices are selected.

Figure 4.5: Broad indices’ efficient variance-frontier

Private Equity

Private equity concerns equity of firms that are not listed on a stock exchange. Therefore, private equity is less liquid and transparent, but nonetheless highly demanded by investors thanks to the attractive returns. Three private equity indices have been picked. The LPX50 index measures total returns of the fifty most liquid and largest (in terms of market capitalisation) listed private equity companies. Listed private equity concerns firms that invest in private equity, but are themselves listed on a stock exchange. The Thomson Reuters Private Equity Buyout Index replicates the performance of (leveraged) buyouts based on the performance of private equity sector-based portfolios. Buyouts constitute the largest class within private equity. Lastly, the Thomson Reuters Venture Capital Index replicates the performance of venture capital portfolios. Venture capital refers to investments in privately owned, small firms. Venture capital is considered to be the most risky of all private equity investments, because many ventures are eventually discontinued.

Figure 4.6: Broad indices’ efficient semivariance-frontier

Fixed-Return Indices

The Federal Reserve Economic Data database (FRED) offers freely-available indices, including fixed-income total return indices by ICE, Bank of America and Merrill Lynch. Two indices have been selected: the AAA U.S. Corporate bond index and the CCC & Lower U.S. High Yield index.

REIT Indices

Real estate investment trusts (REITs) are listed firms that invest in, operate, develop and/or finance real estate. Real estate includes a vast array of real assets, such as residential real estate and offices, but also less commonly thought-of real assets such as infrastructure, data centres and industrial complexes. Real estate markets, unlike public equity, are less transparent because prices are not public knowledge. Therefore, REITs are considered to be a more transparent and liquid method for achieving exposure to real estate markets, as they are publicly listed. Three international REIT indices have been selected. The GPR250 measures the total return of the 250 largest European REITs. The Tokyo Stock Exchange REIT Index measures the total return of all REITs listed on the Tokyo Stock Exchange. Lastly, Wilshire Global REIT Index measures the total return of the major global REITs, but most of its constituents are US-based.

Real Estate Indices

Although highly correlated, REITs do not fully capture exposure to real estate markets. Direct real estate is the most intuitive method to invest in real estate: a property is bought by an investor who possibly (re)develops the property, receives rents and eventually disposes of the property again. Real estate indices try to replicate the return of direct real estate investments, which is complex in nature. For instance, real estate prices are only determined when properties are sold. Whereas identical stocks are sold by the millions daily, a property is sold orders of magnitude less frequently. For a further discussion, refer to Geltner, MacGregor, and Schwann (2003).

Direct real estate investments are highly illiquid, so it was decided to select indices that correct for the lack of liquidity. Three location-based total return indices by MSCI have been selected, replicating liquidity-corrected returns of direct real estate investments in the U.S., U.K. and Europe (ex U.K.).

Commodity Indices

Lastly, commodities have interesting characteristics. For instance, precious metals such as gold and silver are considered to offer diversification benefits because they typically appreciate when there is economic uncertainty. On the other hand, the prices of commodities that are used industrially, such as steel and oil, are likely to correlate more strongly with equity markets, because industrial demand moves with economic activity.

The Thomson Reuters/CoreCommodity CRB Index measures developments in most commodity markets, for instance gasoline, copper and orange juice. Besides, the Philadelphia Stock Exchange Gold/Silver Index has been included to offer further diversification possibilities.

4.2 Descriptive Statistics

The data is summarised using box plots. Box plots give an intuitive view of the characteristics of the return distributions in the full sample, which ranges from 2006 to 2020.

Figure 4.7 plots the annualised mean of each asset’s returns. Clearly, the median S&P 500 firm had superior returns in the dataset compared to the median EuroSTOXX 50 firm or broad index.

Figure 4.7: Annualised mean return of all included data

Figure 4.8 plots the annualised volatility of each asset’s return series. Whilst offering seemingly higher returns, S&P 500 firms also had higher volatility.

Figure 4.8: Annualised volatility of all included data

Figure 4.9 plots the annualised semideviation of each asset’s return series. Compared to figure 4.8, it is striking that the disparity between S&P 500 firms and EuroSTOXX 50 firms is considerably smaller in terms of semideviation than in terms of volatility. This means that positive ‘swings’ contributed more to variance for firms in the S&P 500 than elsewhere, which also implies a strong positive skewness.

Figure 4.9: Annualised semideviation of all included data

Figure 4.10 plots the skewness of each asset’s return series. A Gaussian distribution would have zero skewness. In the dataset, S&P 500 firms mostly had strong positive skewness, while the broad indices all had negative skewness.

Figure 4.10: Return skewness of all included data

Figure 4.11 plots the kurtosis of each asset’s return series. A Gaussian distribution has a kurtosis of three. A probability distribution with kurtosis 𝛾₄ > 3 is called leptokurtic. Note that nearly all timeseries in the data are leptokurtic, making the approximation using Gaussian distributions problematic.

Figure 4.11: Return kurtosis of all included data. Note that nearly all distributions are leptokurtic.
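The descriptive statistics in this section can be computed along the following lines. This is a minimal sketch on synthetic data, assuming daily returns, 252 trading days per year and a semideviation target of zero; the function name is hypothetical. Note that `scipy.stats.kurtosis` reports excess kurtosis by default, so `fisher=False` is needed to obtain the convention used here, in which a Gaussian distribution has a kurtosis of three.

```python
import numpy as np
from scipy import stats

def describe_returns(daily_returns, periods_per_year=252):
    """Annualised mean, volatility and semideviation plus skewness and kurtosis."""
    r = np.asarray(daily_returns)
    return {
        "annualised_mean": r.mean() * periods_per_year,
        "annualised_volatility": r.std(ddof=1) * np.sqrt(periods_per_year),
        "annualised_semideviation": np.sqrt(
            np.mean(np.minimum(r, 0.0) ** 2) * periods_per_year),
        "skewness": stats.skew(r),
        "kurtosis": stats.kurtosis(r, fisher=False),  # Gaussian benchmark is 3
    }

rng = np.random.default_rng(3)
gaussian = describe_returns(rng.normal(0.0, 0.01, size=200_000))
fat_tailed = describe_returns(0.01 * rng.standard_t(df=5, size=200_000))
print(gaussian["kurtosis"], fat_tailed["kurtosis"])
```

The Student's 𝑡 sample is leptokurtic, whereas the Gaussian sample has a kurtosis close to three, mirroring the distinction drawn in figure 4.11.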

A further description of the dataset can be found in appendix section 1. It shows that daily returns of 15% are clearly not uncommon in the dataset, contrary to what would be expected from a Gaussian distribution. This extends to the lowest observed daily return for all assets in the dataset. The dataset includes the 2008 Great Financial Crisis, the 2012 European Sovereign Debt Crisis and the 2020 crash resulting from the ongoing COVID-19 pandemic. Especially for EuroSTOXX 50 firms, there have been severe single-day losses in the dataset. Under a Gaussian distribution, such events would be far too rare to occur this often, which further motivates semivariance-based optimisation.
