Non-parametric portfolio optimization



Jozef Battjes 10003337

MSc in Econometrics

Track: Financial Econometrics

Date of final submission: 22-05-2016

Supervisor: Simon Broda

Second Reader: Peter Boswijk

Abstract

The purpose of this thesis is to determine which of four non-parametric portfolio optimization methods, one semi-parametric portfolio optimization method and the equally-weighted portfolio performs best when the Sharpe ratio is taken as the financial performance measure. The portfolio optimization methods that are investigated are two variations of the portfolio optimization method based on the Extreme Risk Index, the Mean-Expected Shortfall optimization method and two variations of the minimum-variance portfolio. This thesis uses data on 26 stocks from the Dow Jones Index, spanning the period between 1-December-2004 and 1-December-2014, so 10 years of historical stock data are used. Walk forward optimization is used with optimization time window lengths of 1 year, 2 years, 3 years and 4 years to calculate the optimal returns for all the optimization methods. To compare the methods with each other, the cumulative wealths for all methods are calculated and the annualized Sharpe ratios and the transaction costs for all methods are estimated. However, since the annualized Sharpe ratios are only estimates, a pair-wise and a multiple Sharpe ratio test are performed to compare the true Sharpe ratios of all methods with each other. The results of both Sharpe ratio tests show that the true Sharpe ratios of the methods do not differ significantly, also when different optimization time window lengths are used. This means that, taking the Sharpe ratio as the financial performance measure, all the investigated methods perform the same.


This document is written by Student Jozef Battjes who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


1 Introduction

Background

There are several ways to create an optimized portfolio of stocks: a parametric method, a non-parametric method or a semi-parametric method can be used. A non-parametric method does not rely on the assumption that the data follow a given distribution, so it has no fixed number of parameters. A parametric method, on the other hand, does rely on the assumption that the data follow a given distribution, so it has a fixed number of parameters which can be estimated. A semi-parametric method is a combination of a parametric method and a non-parametric method. In this thesis mostly non-parametric methods will be investigated, together with one semi-parametric method. The reason one semi-parametric method is investigated is that a Student-t distribution is fitted to the data in one non-parametric method, so this method becomes semi-parametric in that case. Three main non-parametric portfolio optimization methods will be investigated, and for two of these methods two different variations will be investigated. Furthermore, one non-parametric method which requires no optimization will be investigated, which is the equally-weighted (EW) portfolio.¹

Empirical research has already been done in the area of non-parametric portfolio optimization. These comparison studies mostly compare non-parametric portfolio optimization methods with the EW portfolio, which is used as a benchmark method. To compare the methods, mostly Sharpe ratios and portfolio turnovers are investigated, which are estimated from the optimal returns of the optimization methods and the returns of the EW portfolio. These empirical research papers can mainly be divided into two categories: in one the EW portfolio outperforms optimized portfolios and in the other the EW portfolio is outperformed by optimized portfolios. Four empirical research papers that conclude that the EW portfolio outperforms optimized portfolios will be described first, after which two papers will be described that conclude that optimized portfolios outperform the EW portfolio.

For instance, De Miguel, Garlappi, & Uppal (2009) compare the Markowitz (MW) method and its extensions to naive diversification² and find that the EW portfolio outperforms the MW methods. This is because in all the different cases that are investigated in their paper, the EW portfolio has higher estimated Sharpe ratios and returns and lower transaction costs than the MW methods. The paper by Brown, Hwang, & In (2013) also shows that the EW portfolio outperforms optimal diversification. To calculate the optimal portfolio returns they used two variations of the mean-variance optimization method³, with and without short selling constraints, and the Bayes-Stein portfolios. They mainly used estimated Sharpe ratios to arrive at their conclusion. More evidence that the EW portfolio outperforms optimized portfolios is described in the paper by Kolusheva (2008). In this paper the EW portfolio is compared to both the MW method and the Bayes-Stein portfolios. The results show higher estimated Sharpe ratios and lower transaction costs for the EW portfolio compared to the optimized portfolios. Finally, the paper by Allen, McAleer, Powell, & Singh (2014) compares the Markowitz method with the EW portfolio and the Mean-ES optimal portfolio and also shows that the EW portfolio outperforms optimized portfolios, because the EW portfolio has a higher estimated Sharpe ratio than the optimized portfolios.

¹ The equally-weighted portfolio is also a non-parametric method since it assumes no distribution of the stock returns, but requires no optimization unlike the other five optimization methods. See Section 3.4.

² Naive diversification is the same as using the equally-weighted portfolio.

³ The mean-variance optimization method is the same as the Markowitz optimization method, first introduced in Markowitz (1952).

However, empirical research papers have also been published that show that the EW portfolio is outperformed by optimized portfolios. For instance, the paper by Kritzman, Page, & Turkington (2010) gives counter-evidence to the previously described papers, which conclude that optimized portfolios do not outperform the EW portfolio. They argue that the papers that conclude that the EW portfolio outperforms optimized portfolios use short-term rolling samples to estimate the optimal returns for all methods, and that these samples are not reliable. They state that when longer-term rolling samples are used, optimized portfolios outperform the EW portfolio. Another paper that shows that optimized portfolios outperform the EW portfolio is the paper by Mainik, Mitov, & Rüschendorf (2015). They describe how the portfolio optimization strategy based on the Extreme Risk Index (ERI) compares to the EW portfolio and the minimum-variance (MV) portfolio. They conclude that the portfolio optimization method based on the ERI outperforms the EW portfolio and the MV portfolio, because of a higher estimated Sharpe ratio and a higher annualized return.

Main question

The main question of this thesis is defined as follows: which of the four non-parametric portfolio optimization methods, the one semi-parametric portfolio optimization method and the equally-weighted portfolio performs best when the Sharpe ratio is taken as the financial performance measure? Answering this question is also the contribution of this thesis to the literature.

Methods

The first and second portfolio optimization methods that will be investigated in this thesis are two variations of the portfolio optimization method based on the ERI, where the same approach is used as in Mainik et al. (2015). This method minimizes the probability of only the largest portfolio losses and uses multivariate extreme value theory. Since this method is based only on the highest total losses in the historical data, it should perform well when large drawdowns occur, for instance during the financial crisis of 2007-08. The two variations of the portfolio optimization method based on the ERI use two different estimators of the tail index, which measures how fast the tail of the distribution of the losses of the stocks⁴ goes to zero. In one case the tail index is estimated with the Hill estimator, which is a non-parametric method. In the other case the tail index is estimated with an alternative estimator, by fitting a Student-t distribution to the logarithmic losses of the stocks, which is a parametric method. This is also why the non-parametric portfolio optimization method based on the ERI becomes a semi-parametric portfolio optimization method when the tail index is estimated with the degrees of freedom of the Student-t distribution.

The third portfolio optimization method that is investigated in this thesis is the Mean-Expected Shortfall optimization method, where the same approach is used as in Rockafellar and Uryasev (2000). Expected Shortfall (ES) is an alternative to the Value-at-Risk (VaR): for a given probability level, the VaR is the lowest amount η such that with that probability the loss will not exceed η, and the ES is the conditional expectation of the losses above η. In their paper they minimize an auxiliary function to minimize the ES and lower the VaR simultaneously, which can be done efficiently because it is a linear programming problem.
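To make the relation between the VaR and the ES concrete, the following minimal sketch evaluates an auxiliary function of the Rockafellar and Uryasev (2000) type on a fixed sample of losses. It is an illustration only and not the Matlab code of this thesis: the names `ru_objective`, `var_es` and `beta` (the probability level) are assumptions of the sketch, and no portfolio weights are optimized.

```python
def ru_objective(losses, eta, beta):
    """Auxiliary function: eta plus the scaled average loss in excess of eta."""
    excess = sum(max(0.0, x - eta) for x in losses)
    return eta + excess / ((1.0 - beta) * len(losses))


def var_es(losses, beta):
    """Empirical VaR and ES of a loss sample at probability level beta.

    The objective is convex and piecewise linear in eta, so a minimum is
    attained at one of the sample points. A minimizer is a VaR estimate
    and the minimum value is the ES estimate.
    """
    candidates = sorted(losses)  # ascending, so exact ties give the lowest eta
    eta = min(candidates, key=lambda e: ru_objective(losses, e, beta))
    return eta, ru_objective(losses, eta, beta)


# Hypothetical sample: ten equally likely losses 1, 2, ..., 10.
sample = [float(x) for x in range(1, 11)]
var_80, es_80 = var_es(sample, beta=0.8)
```

For this sample the ES at the 80% level equals the average of the two largest losses. In the optimization method itself the losses depend on the portfolio weights, and the weights and η are chosen jointly.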

The fourth and fifth optimization methods that are investigated in this thesis are two variations of the Markowitz (MW) method. The MW method is the best-known portfolio optimization method that will be investigated in this thesis. No constraint for a target return is used, because none of the other methods that are investigated in this thesis use this constraint, so the optimal portfolio found with the MW method is the minimum-variance (MV) portfolio. Since the MV portfolio is found by minimizing the variance of the portfolio return, it needs an estimate of the covariance matrix of the stock returns. In this thesis two different covariance matrices will be used to investigate the performance of the MV portfolio: the sample covariance matrix and the shrinkage covariance matrix⁵.
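For two stocks and no short selling, the MV weights have a simple closed form, which the sketch below uses to show how the covariance estimate enters the optimization. This is a hedged illustration rather than the implementation used in this thesis: the function names are hypothetical, and `shrink` only shows the general form of a linear shrinkage estimator, with the target matrix and the shrinkage intensity `delta` supplied by hand instead of being estimated as in Ledoit and Wolf (2003).

```python
def min_variance_two_assets(var1, var2, cov12):
    """Long-only minimum-variance weights (w1, w2) for two assets."""
    denom = var1 + var2 - 2.0 * cov12  # variance of the return difference
    if denom <= 1e-15:
        # Degenerate case: put everything in the asset with the lower variance.
        return (1.0, 0.0) if var1 <= var2 else (0.0, 1.0)
    w1 = (var2 - cov12) / denom        # unconstrained minimizer
    w1 = min(1.0, max(0.0, w1))        # clip to exclude short positions
    return (w1, 1.0 - w1)


def shrink(sample_cov, target, delta):
    """Convex combination delta * target + (1 - delta) * sample_cov."""
    return [[delta * t + (1.0 - delta) * s for s, t in zip(s_row, t_row)]
            for s_row, t_row in zip(sample_cov, target)]


# Hypothetical inputs: a sample covariance matrix and a diagonal target.
sample_cov = [[0.04, 0.01], [0.01, 0.09]]
target = [[0.04, 0.0], [0.0, 0.09]]
shrunk = shrink(sample_cov, target, delta=0.5)

w_sample = min_variance_two_assets(sample_cov[0][0], sample_cov[1][1], sample_cov[0][1])
w_shrunk = min_variance_two_assets(shrunk[0][0], shrunk[1][1], shrunk[0][1])
```

Because the portfolio variance is a convex quadratic in the first weight, clipping the unconstrained minimizer to the interval from zero to one gives the long-only solution in the two-asset case. With more than two stocks a quadratic programming solver would be needed instead.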

Finally the equally-weighted (EW) portfolio is investigated in this thesis. This method invests an equal proportion of wealth in all stocks every time the portfolio is rebalanced⁶. It is investigated because most previous empirical comparison studies on non-parametric portfolio optimization conclude that the optimal portfolios found do not consistently outperform the EW portfolio, which is further investigated in this thesis.

⁴ The losses of the stocks that are used as data in this thesis, for the optimization methods based on the ERI, are the logarithmic losses of the stocks.

⁵ As calculated in Ledoit and Wolf (2003).

⁶ See Section 3.5.1.

The methods will then be compared to each other to see which one performs best. First the optimized portfolio returns for the optimization methods and the returns for the EW portfolio are calculated based on 10 years of historical data with walk forward optimization⁷. These returns are calculated for four different optimization time window lengths (WLs): 1 year, 2 years, 3 years and 4 years. This is done to investigate whether the length of the optimization time window has an impact on the performance of the methods. Then the returns of all methods can be compared against each other. To do this, first the cumulative wealths for all methods are calculated to investigate which method most often has the highest cumulative wealth at 1-Dec-2014, which is the last day of the historical data time window. Secondly the annualized Sharpe ratios and the transaction costs for all methods and all WLs are estimated to investigate which method has the highest annualized Sharpe ratios and the lowest transaction costs.

However, since the annualized Sharpe ratios are estimated with the mean and the variance of the returns of each method, the true Sharpe ratios for each method are unknown. To find out whether the true Sharpe ratios are significantly different from each other, a pair-wise Sharpe ratio test and a multiple Sharpe ratio test are performed. Ledoit and Wolf (2008) developed a pair-wise Sharpe ratio test to compare two true Sharpe ratios in the case that non-i.i.d. data are used. In this test all the optimization methods are tested against the benchmark method, the EW portfolio. From the results of this test it can be concluded whether any of the portfolio optimization methods that are investigated in this thesis outperforms the EW portfolio and which optimization method performs best, or whether all methods perform the same. The other test that is performed to test whether the true Sharpe ratios are significantly different from each other is the multiple Sharpe ratio test by Wright, Yam, & Yung (2014). Their test compares all the true Sharpe ratios simultaneously to see whether at least one true Sharpe ratio is significantly different from the rest, but it does not tell which one(s). So from this test it can be concluded whether at least one method outperforms the other methods, but not which method(s). If neither the pair-wise nor the multiple Sharpe ratio test concludes that any of the true Sharpe ratios are significantly different, this means that all the non-parametric portfolio optimization methods, the semi-parametric portfolio optimization method and the EW portfolio perform the same. Both the pair-wise and the multiple Sharpe ratio test are performed for all the different WLs.
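The comparison procedure described above can be sketched as follows. The sketch is an illustration under stated assumptions, not the Matlab code of this thesis: weights are re-estimated every `horizon` periods on the preceding `window` periods and held as constant proportions in between, the risk-free rate is taken to be zero, 252 trading days per year are used for annualization, and all function names are hypothetical. Any optimizer that maps a window of historical returns to portfolio weights can be passed in; here the EW portfolio is used.

```python
import math


def equal_weights(history):
    """EW portfolio: the same proportion of wealth in every stock."""
    n_assets = len(history[0])
    return [1.0 / n_assets] * n_assets


def walk_forward(returns, window, horizon, optimizer):
    """Out-of-sample portfolio returns from a T x N list of asset returns."""
    portfolio_returns = []
    start = window
    while start < len(returns):
        # Estimate weights on the most recent `window` periods only.
        weights = optimizer(returns[start - window:start])
        # Apply them to the next `horizon` periods, which were not used above.
        for row in returns[start:start + horizon]:
            portfolio_returns.append(sum(w * r for w, r in zip(weights, row)))
        start += horizon
    return portfolio_returns


def cumulative_wealth(portfolio_returns, initial=1.0):
    wealth = initial
    for r in portfolio_returns:
        wealth *= 1.0 + r
    return wealth


def annualized_sharpe(portfolio_returns, periods_per_year=252):
    """Estimated Sharpe ratio, assuming a zero risk-free rate."""
    n = len(portfolio_returns)
    mean = sum(portfolio_returns) / n
    variance = sum((r - mean) ** 2 for r in portfolio_returns) / (n - 1)
    return math.sqrt(periods_per_year) * mean / math.sqrt(variance)


# Hypothetical returns of two stocks over four periods.
toy_returns = [[0.01, 0.03], [0.02, 0.00], [-0.01, 0.01], [0.00, 0.02]]
oos = walk_forward(toy_returns, window=2, horizon=1, optimizer=equal_weights)
wealth = cumulative_wealth(oos)
sharpe = annualized_sharpe(oos)
```

The value returned by `annualized_sharpe` is only an estimate, which is why the formal Sharpe ratio tests are needed before any ranking of the methods can be called significant.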

Scope and limitations

For this thesis 10 years of historical data are used, which span the period between 1-December-2004 and 1-December-2014, so the financial crisis of 2007-08 is included. However, the years 2015 and 2016 have not been included in the research. This can be a limitation because there might have been large changes in stock prices in those two years (for instance, a new financial crisis might have occurred), which could impact the returns that are calculated with all the methods investigated in this thesis. Moreover, only window lengths of 1, 2, 3 and 4 years have been investigated in the walk forward optimization and only 10 years of historical data are used, which may both be limitations because more data could give more accurate results. Furthermore, only 26 stocks from the Dow Jones (DJ) index are used, because four of the thirty stocks in the DJ index at 1-Dec-2014 had missing data points. This might be a limitation because the excluded stocks could impact the results, for instance if they show large losses.


Outline

This thesis is structured as follows. Section 2 contains the literature review of previous empirical comparison studies on non-parametric portfolio optimization. Section 3 describes the methods that are used. Section 4 describes the data that are used. Section 5 describes the results. Section 6 gives the conclusion and ends with a discussion. An appendix provides mathematical proofs and the Matlab code that is used.

2 Literature Review

This section contains the literature review. First four empirical research papers will be described that conclude that the equally-weighted (EW) portfolio outperforms optimized portfolios, taking the Sharpe ratio as the financial performance measure. Then two empirical research papers will be described that give the opposite result: they conclude that optimized portfolios outperform the equally-weighted portfolio, again using the Sharpe ratio as the financial performance measure.

2.1 Equally-weighted portfolio outperforms optimized portfolios

Most empirical research that has been conducted on the comparison of non-parametric portfolio optimization methods finds that the benchmark method, the EW portfolio, outperforms the optimized non-parametric portfolios. This is because the EW portfolio has a higher Sharpe ratio and a lower portfolio turnover than the compared optimal portfolios in these research papers. The four empirical research papers described below all conclude that the equally-weighted portfolio outperforms the optimized portfolios.

In the paper by De Miguel et al. (2009), the Markowitz (MW) method and its extensions are compared to the EW portfolio. They test 14 variations of the MW method against the EW portfolio, using seven different empirical data sets as well as simulated data. They use walk forward optimization with estimation windows of 60 and 120 months of monthly data to calculate the optimized returns for all the optimization methods. Then they estimate the Sharpe ratios and the portfolio turnovers and calculate the certainty equivalent returns (CEQs), which are all used to compare the methods against each other. They find that none of the variations of the MW method that are tested consistently outperforms the EW portfolio, since none of the optimized portfolios always has a higher Sharpe ratio or CEQ return than the EW portfolio. Besides that, the portfolio turnover of the EW portfolio is much lower than that of the MW methods. They also give an explanation for the poor performance of the MW methods: the estimation window lengths of 60 or 120 months are too short to calculate the optimal returns for the optimization methods. Moreover, they state that an estimation window length of 3000 or 6000 months is needed to calculate the optimal portfolios when there are respectively 25 or 50 assets available, for the MW methods to have a higher CEQ than the EW portfolio.
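The CEQ return and the portfolio turnover can be sketched as below. The CEQ is computed in the usual mean-variance form, as the mean return minus half the risk aversion times the variance, and the turnover as the average absolute trade needed to move from the drifted weights to the new target weights. The risk aversion `gamma = 1.0`, the function names and the toy numbers are assumptions of this sketch and are not taken from the reviewed paper.

```python
def ceq_return(returns, gamma=1.0):
    """Certainty equivalent return: mean - (gamma / 2) * sample variance."""
    n = len(returns)
    mean = sum(returns) / n
    variance = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean - 0.5 * gamma * variance


def drifted_weights(weights, asset_returns):
    """Weights just before rebalancing, after one period of asset returns."""
    grown = [w * (1.0 + r) for w, r in zip(weights, asset_returns)]
    total = sum(grown)
    return [g / total for g in grown]


def turnover(target_weights, asset_returns):
    """Average sum of absolute trades over all rebalancing dates.

    target_weights[t] are the weights chosen at date t and asset_returns[t]
    are the asset returns realised between date t and date t + 1.
    """
    trades = []
    for t in range(len(target_weights) - 1):
        before = drifted_weights(target_weights[t], asset_returns[t])
        after = target_weights[t + 1]
        trades.append(sum(abs(a - b) for a, b in zip(after, before)))
    return sum(trades) / len(trades)


# Hypothetical example: an EW portfolio of two stocks, one of which gains 10%.
ew_targets = [[0.5, 0.5], [0.5, 0.5]]
realised = [[0.10, 0.00], [0.00, 0.00]]
ew_turnover = turnover(ew_targets, realised)
```

Even the EW portfolio has a positive turnover, because price changes move the weights away from equal proportions between rebalancing dates; the trades needed to restore them are small compared with those of portfolios whose target weights themselves change.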

The paper by Brown et al. (2013) also shows that naive diversification outperforms optimal diversification. Their paper can be seen as an extension of the paper by De Miguel et al. (2009). To calculate the optimal portfolio returns they use two variations of the mean-variance optimization method, with and without short selling constraints, and the Bayes-Stein portfolios. They use two different data sets in their research. First they use 20 portfolios from the Fama-French four-factor model, and collect these data from Ken French’s website. Secondly they use 588 monthly stock returns, from January 1963 to December 2011, collected from the Center for Research in Security Prices (CRSP). They use walk forward optimization to calculate the optimal returns for every optimization method, with an optimization time window length of 120 months in every backtest to calculate the optimal weights for the next month.

When they use the portfolios as data, they find that the Sharpe ratio (0.1393) and the portfolio turnover (0.0197) of the EW portfolio are respectively higher and lower than the Sharpe ratio (0.0770) and the portfolio turnover (145.4396) of the mean-variance portfolio without short selling constraints. They also find that the EW portfolio has a higher CEQ return (0.0056) than the mean-variance portfolio without short selling constraints (-0.4344). Finally they find that the Bayes-Stein portfolios without and with short selling constraints have Sharpe ratios of 0.0795 and 0.1571, CEQ returns of -0.1656 and 0.0057 and portfolio turnovers of 46.7942 and 0.2315 respectively. So the EW portfolio has a higher Sharpe ratio, a higher CEQ return and a lower turnover than the Bayes-Stein portfolio without short selling constraints. Compared to the constrained Bayes-Stein portfolio it has a lower Sharpe ratio and CEQ return, but a much lower turnover. This means that the constrained Bayes-Stein portfolio does not consistently outperform the EW portfolio.

When they use stock returns as data, they find again that the EW portfolio has a higher Sharpe ratio (0.1921) than the mean-variance portfolio without short selling constraints (-0.0097) and with short selling constraints (0.0717), and that the EW portfolio has a lower portfolio turnover (0.0686) than the mean-variance portfolio without short selling constraints (24.6812). The CEQ return of the EW portfolio (0.0079) is also higher than that of the mean-variance method without short selling constraints (-2.9122) and with short selling constraints (0.0024). They conclude that the EW portfolio outperforms all the tested optimization methods, both when the portfolios and when the stock returns were used as data.

More evidence that the EW portfolio outperforms optimized portfolios is described in the paper by Kolusheva (2008). In the paper the EW portfolio is compared to both the mean-variance optimization method and the Bayes-Stein portfolios. Monthly total return data are used from the 10 sectors of the S&P 500 Index. The data span the period between 31-10-1989 and 30-11-2007, which contains a total of 218 monthly observations. The estimation window lengths (WLs) used for the walk forward optimization are 60, 90 and 120 months. The Sharpe ratios are estimated for all methods for a WL of 60 months, and are found to be: EW portfolio (0.1959), mean-variance portfolio with short sale constraint (0.1989), without short sale constraint (-0.0137) and with turnover constraint (0.1845), the Bayes-Stein portfolio with turnover constraint (0.1771) and finally the MV method with turnover constraint (0.1513). The portfolio turnovers are also calculated, and are found to be 0.0254 for the EW portfolio and 0.7784, 1.0633 and 1.0643 for the turnover-constrained Bayes-Stein, minimum-variance and mean-variance portfolios respectively. Furthermore, it was found that increasing the optimization WL decreased all Sharpe ratios but left the ordinal rankings of the methods largely unchanged. It is concluded that the EW portfolio outperforms the optimized portfolios, because the Sharpe ratios of the EW portfolio are mostly higher than the Sharpe ratios of the optimized portfolios and because the portfolio turnovers of the EW portfolio are always lower than the portfolio turnovers of the optimized portfolios. Furthermore it is concluded that the performance of the MW method is improved by including the short sale constraint, but that the slightly higher Sharpe ratio of the MW method in comparison to the EW portfolio in that case does not outweigh the much higher portfolio turnover.

Finally Allen et al. (2014) compare the MW method with the EW portfolio and the Mean-ES optimal portfolio and find that the EW portfolio outperforms the optimized portfolios. This paper can also be seen as an extension of the paper by De Miguel et al. (2009). They use data from Datastream, containing daily values of ten European stock indices, ranging from the beginning of 2005 to the end of 2013. They use walk forward optimization: the optimal portfolio weights for the optimization methods are calculated with two years of data, and these optimal weights are then used to calculate the optimal returns for the optimization methods for the next year. They repeat this process until the optimized returns for the year 2013 are calculated. They conclude that none of the methods consistently outperforms the EW portfolio, by comparing the Sharpe ratios of all the investigated methods with each other. This conclusion gives more evidence, after the three previously described papers, that optimized portfolios do not outperform the EW portfolio. Moreover, they find that the MW method with positive weights and an individual exposure of $w \le 0.4$ was the most successful optimization method. They also find that the Mean-ES optimal portfolio does not outperform the MW method, by comparing the Sharpe ratios, and that its performance depends on the quantile level chosen.
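The backtest schedule described in this paragraph can be written out explicitly. The sketch below lists the estimation and evaluation years under the assumption that the window is rolled forward by one calendar year at a time; the function name `walk_forward_windows` is hypothetical.

```python
def walk_forward_windows(first_year, last_year, train_years, test_years=1):
    """List of (estimation years, evaluation years) for a rolling backtest."""
    windows = []
    test_start = first_year + train_years
    while test_start + test_years - 1 <= last_year:
        train = list(range(test_start - train_years, test_start))
        test = list(range(test_start, test_start + test_years))
        windows.append((train, test))
        test_start += test_years  # roll the window forward
    return windows


# Two years of estimation data, one year of evaluation, data from 2005 to 2013.
schedule = walk_forward_windows(2005, 2013, train_years=2)
for train, test in schedule:
    print("estimate on", train, "evaluate on", test)
```

Under this assumption the first weights are estimated on 2005 and 2006 and evaluated on 2007, and the last weights are estimated on 2011 and 2012 and evaluated on 2013, so the first two years of data never appear in the out-of-sample returns.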

2.2 Optimized portfolios outperform equally-weighted portfolio

Kritzman et al. (2010) give counter-evidence to the previously described papers, which conclude that optimal portfolios do not outperform the EW portfolio. They argue that the papers that conclude that the EW portfolio outperforms optimal portfolios use short-term rolling samples to estimate the optimal returns, and that these samples are not reliable. So instead of the usual 60 or 120 months of return data for the short-term rolling samples, they use longer-term samples to calculate the optimized returns for all the portfolio optimization methods that they investigate. The optimal portfolios they investigate are the minimum-variance (MV) portfolio, a market portfolio and a portfolio constructed with a constant risk premium for every asset. They use 13 data sets comprising 1028 data series, constructing more than 50000 optimized portfolios. They also split up the portfolios in three groups: asset class, beta and alpha. They find that the EW portfolio is outperformed by the MV portfolio in all three groups, because the optimized portfolios have higher Sharpe ratios. Furthermore they find very low transaction costs for all methods, so the transaction costs can be neglected. They conclude that optimized portfolios, like the mean-variance portfolio, outperform the equally-weighted portfolio when longer-term samples are used to calculate the optimal returns.

Another paper that shows that optimized portfolios outperform the EW portfolio is the paper by Mainik et al. (2015). They describe how the portfolio optimization strategy based on the Extreme Risk Index (ERI) can be calculated mathematically and how this method compares to the EW portfolio and the MV portfolio. The portfolio optimization method based on the ERI uses multivariate extreme value theory to minimize the probability of large portfolio losses. They perform a backtest study with the daily returns of 444 stocks from the S&P 500, with data ranging from 2001 to 2011. They find that the ERI method outperforms both the EW portfolio and the MW method on stocks with heavy tails. In particular they find that the ERI method has an annualized return of 11.5% over 4 years including the financial crisis of 2007-08. This advantage outweighs the higher transaction costs of the ERI method.

In this thesis the previously described research papers will be extended and it will be investigated whether any of the four non-parametric portfolio optimization methods and the one semi-parametric portfolio optimization method, which are two variations of the portfolio optimization method based on the ERI, the Mean-ES optimal portfolio and two variations of the MW method, outperforms the EW portfolio. Moreover, this thesis will try to answer the question which of these methods performs best, taking the Sharpe ratio as the financial performance measure.

3 Methods

This section describes the methods that are used in this thesis to investigate the performance of all the non-parametric portfolio optimization methods, the one semi-parametric portfolio optimization method and the equally-weighted portfolio. First the two variations of the portfolio optimization method based on the Extreme Risk Index (ERI) will be described, then the Mean-ES optimization method and two variations of the Markowitz method. Then the equally-weighted portfolio will be described, which is used as a benchmark method. After that, the calculations of the Sharpe ratio, the pair-wise Sharpe ratio test and the multiple Sharpe ratio test are described. Finally it is described how the transaction costs are calculated and how the optimal returns for all optimization methods are calculated based on walk forward optimization.


3.1 Portfolio optimization based on the Extreme Risk Index

The most recently developed non-parametric portfolio optimization method that is investigated in this thesis is the method based on the Extreme Risk Index (ERI), developed by Mainik et al. (2015). As described in the Introduction, this method is based on multivariate extreme value theory so that it can minimize the probability of large portfolio losses. Therefore, the portfolio optimization method based on the ERI is expected to perform well when there is a high probability of large portfolio losses, as during the financial crisis of 2007-08.

3.1.1 Background theory

The portfolio optimization method based on the ERI uses multivariate regular variation (MRV). A random vector $X = [X_1, \ldots, X_N]$ is multivariate regularly varying if the joint distribution of its polar coordinates, $(Q, Z) = (\|X\|_1, X\|X\|_1^{-1})$, satisfies:

$$B\big((q^{-1}Q, Z) \mid Q > q\big) \xrightarrow{w} \rho_a \otimes \Psi, \qquad q \to \infty,$$

where $B(\cdot \mid Q > q)$ denotes the conditional distribution given $Q > q$, $\xrightarrow{w}$ stands for weak convergence and $\rho_a$ is the Pareto distribution, with $\rho_a((\nu, \infty)) = \nu^{-a}$ for $\nu \ge 1$, which describes the asymptotic distribution of $q^{-1}Q$. Then $\Psi$ is a probability measure on the 1-norm unit sphere, $S_1^N$, which is called the spectral measure of $X$ and describes the asymptotic distribution of $Z$. Finally the symbol $\otimes$ is the direct product of probability measures. This definition means that $Q$, which is asymptotically Pareto distributed, is asymptotically independent of $Z$, which is asymptotically distributed according to $\Psi$. Then $a > 0$ is the tail index, which is a measure for how fast the tail of the losses goes to zero. A property of the tail index is that it separates the finite moments of $Q$ from the infinite moments, so there holds that:

$$E[Q^c] \begin{cases} < \infty & \text{when } c < a, \\ = \infty & \text{when } c > a. \end{cases}$$

The lower the value of the tail index, the heavier the tail of the data is, which means there are more extreme values in the tail and the tail goes to zero very slowly. The Pareto distribution, which is used in the definition of MRV above, is a distribution with a heavy tail.⁸ Finally the L1-norm of $X$, $\|X\|_1$, is defined as $\sum_{i=1}^{N} |X_i|$, or in other words it is the sum of the absolute values of all elements of $X$, which is used for the calculation of the radial polar coordinate, $Q$. The angular polar coordinate, $Z$, is the direction in which $X$ is pointing. Furthermore the scaling property holds, i.e.,

$$P(X \in \nu A) = \nu^{-a} P(X \in A),$$

where $\nu \ge 1$, $\nu A = \{\nu x : x \in A\}$ and $A$ is a measurable set sufficiently far away from the origin, which means that $\|x\|_1 \ge t^*$ $\forall x \in A$, for some large $t^*$.

Now that MRV is defined, the Extreme Risk Index can be defined based on MRV and the scaling property as:

$$\gamma_w(\Psi, a) = \lim_{q \to +\infty} \frac{P(w^T X > q)}{P(\|X\|_1 > q)} = \int_{S_1^N} \max(0, w^T z)^a \, d\Psi(z). \qquad (1)$$
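As a worked illustration of (1), consider the following special case, which is an assumption made here for illustration and not a result of this thesis. Write $\Psi$ for the spectral measure and $\gamma_w$ for the ERI of portfolio $w$, let there be $N = 2$ stocks, and suppose that $\Psi$ puts mass $1/2$ on each of the unit vectors $(1, 0)$ and $(0, 1)$, so that extreme losses occur in one stock at a time. For long-only weights $w = (w_1, 1 - w_1)$ with $0 \le w_1 \le 1$, the integral in (1) becomes a sum over the two atoms:

$$\gamma_w = \tfrac{1}{2}\, w_1^{a} + \tfrac{1}{2}\,(1 - w_1)^{a}.$$

For $a > 1$ this expression is convex in $w_1$ and minimized at $w_1 = 1/2$, where $\gamma_w = 2^{-a}$. For $a < 1$ it is concave and minimized at $w_1 = 0$ or $w_1 = 1$, where $\gamma_w = 1/2$. In this special case, diversification therefore lowers the ERI only when the tail index is larger than one.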

3.1.2 Application to stock data

In this thesis all portfolio optimization methods and the EW portfolio are applied to real market data, which will be stocks from the Dow Jones Index.⁹ Short positions are excluded in the calculations of the optimal portfolio weights for all optimization methods, so the optimal portfolio weights calculated with the portfolio optimization methods based on the ERI are nonnegative and less than or equal to one.

First the portfolio weights $w_i$, for $i = 1, \ldots, N$, are defined as the proportions of wealth that are invested in each stock whenever the optimal portfolio weights are calculated according to the portfolio optimization method based on the ERI. The portfolio weights can be put into a vector as $w = [w_1, \ldots, w_N]$. For the portfolio optimization method based on the ERI, logarithmic losses are used. The logarithmic losses, $X_i(t)$, are calculated from the stock prices $S_i(t)$, for $i = 1, \ldots, N$ and $t = 0, 1, \ldots, T$, as:

$$X_i(t) = -\log\left(\frac{S_i(t)}{S_i(t-1)}\right) = \log(S_i(t-1)) - \log(S_i(t)). \qquad (2)$$

This gives a matrix $X$ of size $[T \times N]$ which contains all the logarithmic losses.¹⁰ The portfolio optimization method based on the ERI then uses the logarithmic losses in the matrix $X$ for optimization and calculates the optimal portfolio weights, $w^*$.
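The construction of the loss matrix in (2) can be sketched in a few lines. This is a minimal illustration and not the Matlab code of this thesis; the function name `log_losses` and the toy prices are hypothetical.

```python
import math


def log_losses(prices):
    """Matrix of logarithmic losses from a (T + 1) x N list of prices.

    Row t - 1 of the result holds X_i(t) = log S_i(t - 1) - log S_i(t),
    so a falling price gives a positive loss.
    """
    losses = []
    for t in range(1, len(prices)):
        row = [math.log(prev) - math.log(cur)
               for prev, cur in zip(prices[t - 1], prices[t])]
        losses.append(row)
    return losses


# Hypothetical prices of two stocks on three consecutive days (t = 0, 1, 2).
toy_prices = [[100.0, 50.0], [90.0, 55.0], [99.0, 55.0]]
X = log_losses(toy_prices)
```

With three price observations per stock the result has two rows, one for each day on which a loss can be computed.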

To be able to calculate the optimal portfolio weights, the Extreme Risk Index has to be estimated, which is described below. First the matrix containing all logarithmic losses, $X$, has to be transformed into its polar coordinates $Q$ and $Z$. This is done for all days, $t = 1, \ldots, T$. Then $X(t)$, the logarithmic loss vector for day $t$, is transformed as:

$$X(t) = (Q(t), Z(t)) = \left(\|X(t)\|_1, \; X(t)\,\|X(t)\|_1^{-1}\right),$$

where $\|X(t)\|_1$ is the L1-norm, which is the length of the [1 x N] vector $X(t)$, and $X(t)\,\|X(t)\|_1^{-1}$ is the direction in which $X(t)$ is pointing. This transformation is needed to be able to estimate $a$ and $\Psi$ in (1). Now $a$ can be estimated by applying the Hill estimator to the radial parts, $Q$, and is calculated as:

$$\hat{a} = \frac{I}{\sum_{j=1}^{I} \log\left( Q_{(j)} / Q_{(I+1)} \right)},$$

where $Q_{(j)}$ are the ordered radial parts from the historical data. These ordered radial parts are obtained by ordering all the radial parts from the historical data in descending order, from largest to smallest, so that $Q_{(1)} \geq \ldots \geq Q_{(T)}$. Then $I$ is the integer corresponding to the 10% largest radial parts, with which $Q_{(I+1)}$ can be determined: in the ordered radial parts vector, of size [1 x T], the $(I+1)$-th largest value is equal to $Q_{(I+1)}$. Then the sample index can be found from $Q_{(j)} = Q(i_j)$, where $i_j$ is the sample index of the order statistic in the full data set. These sample indexes are needed to obtain the angular parts corresponding to the 10% largest radial parts, which are used to estimate $\Psi$ in (1). Then the Extreme Risk Index can be estimated as:

$$\hat{\gamma}_w = \frac{1}{I} \sum_{j=1}^{I} \max(0, w^T Z(i_j))^{\hat{a}},$$

where $Z(i_j)$ are the angular parts corresponding to the 10% largest radial parts. Then the optimal portfolio weights can be calculated by minimizing the Extreme Risk Index estimator with respect to the portfolio weights $w$, as:

$$w^* = \arg\min_{w} \hat{\gamma}_w.$$

9 See Section 4.

3.1.3 Alternative estimator for the tail index

The alternative tail index estimator used in this thesis to investigate the performance of the portfolio optimization method based on the ERI is calculated from the degrees of freedom of the Student-t distribution. This is done by fitting the Student-t distribution to the matrix of logarithmic losses ($X$) and calculating the degrees of freedom $V$ for every stock separately. The average of the degrees of freedom $V$ found for all stocks is then the alternative estimator for the tail index $a$.
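The estimation steps of Section 3.1.2 (polar transformation, Hill estimator on the 10% largest radial parts, and the ERI estimator for a given weight vector) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the thesis: the function names, the integer rounding of the 10% cut-off and the choice to skip all-zero loss vectors are assumptions, and the toy data are constructed by hand.

```python
import math

def to_polar(x):
    """Split a loss vector into its radial part (L1-norm) and its angular part."""
    q = sum(abs(v) for v in x)
    return q, [v / q for v in x]

def hill_tail_index(X, top_percent=10):
    """Return (a_hat, angular parts belonging to the top_percent% largest radial parts)."""
    # Assumption: rows that are entirely zero have no direction and are skipped.
    polar = [to_polar(row) for row in X if any(v != 0.0 for v in row)]
    polar.sort(key=lambda qz: qz[0], reverse=True)      # Q_(1) >= ... >= Q_(T)
    I = max(1, len(polar) * top_percent // 100)          # needs more than I usable rows
    q_ref = polar[I][0]                                   # Q_(I+1), zero-based index I
    a_hat = I / sum(math.log(q / q_ref) for q, _ in polar[:I])
    return a_hat, [z for _, z in polar[:I]]

def eri_estimate(w, angles, a_hat):
    """Estimated Extreme Risk Index: average of max(0, w'Z)^a_hat over the extreme angles."""
    total = 0.0
    for z in angles:
        projection = sum(wi * zi for wi, zi in zip(w, z))
        total += max(0.0, projection) ** a_hat
    return total / len(angles)

# Hand-made toy losses for N = 2 stocks and T = 20 days.
X = [[math.exp(3), 0.0], [0.0, math.exp(2)], [math.exp(1), 0.0]]
X += [[0.05 * (k + 1), 0.05 * (k + 1)] for k in range(17)]
a_hat, angles = hill_tail_index(X)
eri_first_stock = eri_estimate([1.0, 0.0], angles, a_hat)
```

The optimal weights would then follow from minimizing `eri_estimate` over the nonnegative weights that sum to one, which requires a numerical optimizer and is left out of this sketch.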

3.2 Mean-Expected Shortfall optimization

The third method that will be investigated in this thesis is the Mean-Expected Shortfall optimization method, which will be described in this section. The same approach as in the paper by Rockafellar and Uryasev (2000) is used, to calculate the Mean-ES optimal portfolio. This section will first describe the background of the auxiliary function that Rockafellar and Uryasev use to calculate the Mean-ES optimal portfolio and then an application to stock data will be described.

3.2.1 Background theory

Expected Shortfall (ES) is an alternative to the Value-at-Risk (VaR), in the sense that the VaR is the lowest amount $\eta$ such that with probability $\beta$ the loss will not exceed $\eta$, and the ES is the conditional expectation of the losses above $\eta$. Since the ES is defined through the VaR in this way, it follows that the ES is never smaller than the VaR. Based on this information, Rockafellar and Uryasev then define an auxiliary function which can be minimized to find optimal portfolio weights. The advantage of this auxiliary function is that it reduces the minimization problem to a linear programming problem, which can be solved efficiently. To set up this function, definitions of the VaR and the ES are needed. The VaR$_\beta$, where $\beta$ is any probability level with $\beta \in (0, 1)$, is defined as:

$$\eta_\beta(w) = \min\{\eta \in \mathbb{R} : \Psi(w, \eta) \geq \beta\},$$

where $\Psi$ is the cumulative distribution function of the losses, which is defined as:

$$\Psi(w, \eta) = \int_{L(w,y) \leq \eta} p(y)\,dy,$$

where $L(w, y)$ is the loss associated with the portfolio $w \in \mathbb{R}^n$ and $p(y)$ is the density of the random vector of uncertainties $y \in \mathbb{R}^m$. The vector $w$ of portfolio weights belongs to the set of all available portfolios $W$. In this thesis $y$ will be the observed returns for the different stocks at every time period. Then the Expected Shortfall for a given probability $\beta \in (0, 1)$, or ES$_\beta$, is defined as:

$$\phi_\beta(w) = (1-\beta)^{-1} \int_{L(w,y) \geq \eta_\beta(w)} L(w, y)\,p(y)\,dy.$$

So this means that the ES$_\beta$ calculates the expected loss given that the loss exceeds the VaR$_\beta$.

Then the auxiliary function, which is used to calculate the optimal portfolio weights with, is defined as:

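$$F(w, \eta) = \eta + (1-\beta)^{-1} \int_{y \in \mathbb{R}^m} [L(w, y) - \eta]^+ \, p(y)\,dy, \tag{3}$$

where:

$$[t]^+ = \begin{cases} t & \text{when } t > 0, \\ 0 & \text{when } t \leq 0. \end{cases}$$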

If the function $F$ in (3) is minimized with respect to $\eta$, the ES can be derived. This can be done because the minimizer of $F$ with respect to $\eta$ can be plugged into $F$, after which the ES is found.11 This result is very useful since now the ES can be derived without having to calculate the VaR first, which makes the minimization problem easier to solve.

Then ES$_\beta$ can be minimized to find the optimal portfolio weights $w^*$ by using the auxiliary function:

$$\min_{w \in W} \phi_\beta(w) = \min_{(w, \eta) \in W \times \mathbb{R}} F(w, \eta).$$

So by minimizing $F$ with respect to $w$ and $\eta$ simultaneously, the result is equal to minimizing ES$_\beta$ with respect to $w$ only. This is a very useful result, because the optimal portfolio weights $w^*$ can be found without working with the ES directly, which depends on the VaR and is therefore hard to minimize.

To be able to minimize the function $F$, the integral in the right-hand side of (3) needs to be approximated. This can be done based on a sample set $y_1, \ldots, y_H$ from $p(y)$, where $y_h$ is a vector of losses with $h = 1, \ldots, H$. Then the convex function $F$ is approximated as:

$$\tilde{F}(w, \eta) = \eta + \frac{1}{H(1-\beta)} \sum_{h=1}^{H} [L(w, y_h) - \eta]^+. \tag{4}$$
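To make the role of the auxiliary function concrete, the sketch below evaluates the sample approximation (4) for a fixed portfolio, so that only the portfolio losses $L(w, y_h)$ enter, and minimizes it over $\eta$. It relies on the fact that (4) is convex and piecewise linear in $\eta$ with kinks at the sample losses, so a minimizer can be found among the sample losses. The function names and the toy losses are hypothetical.

```python
def aux_function(losses, eta, beta):
    """Sample approximation (4) of F for one fixed portfolio with the given losses."""
    H = len(losses)
    return eta + sum(max(0.0, loss - eta) for loss in losses) / (H * (1.0 - beta))

def minimize_over_eta(losses, beta):
    """Minimize (4) over eta by checking the kinks, which lie at the sample losses."""
    best_eta = min(losses, key=lambda eta: aux_function(losses, eta, beta))
    return best_eta, aux_function(losses, best_eta, beta)

# Hypothetical portfolio losses for H = 10 scenarios.
losses = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
var_like, es_like = minimize_over_eta(losses, beta=0.75)
```

The minimizing $\eta$ plays the role of the VaR and the minimal value plays the role of the ES, which illustrates why the ES can be obtained without computing the VaR first.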

Then the Mean-ES optimization method is applied to stock data in this thesis. Therefore the portfolio weights $w_i$ for $i = 1, \ldots, N$ are defined the same as before, as the proportions of wealth that are invested in all stocks, whenever the optimal portfolio weights are calculated according to the Mean-ES optimization method. Then the portfolio weights can be put into a vector as $w = [w_1, \ldots, w_N]$. The following two constraints should hold for the portfolio weights, whenever the portfolio weights are optimized according to the Mean-ES optimization method:

$$\sum_{i=1}^{N} w_i = 1, \qquad w_i \geq 0 \quad (i = 1, \ldots, N).$$

So all portfolio weights sum to one and short selling is not allowed, so all portfolio weights have to be nonnegative. These two constraints make up the feasible set of portfolios, denoted by $W$. Furthermore, the constraint for the target return is left out, to make the Mean-ES method better comparable to the other investigated methods in this thesis, which also do not use this target return constraint. Then $R_{it}$ are defined as the logarithmic stock returns of stock $i$ on day $t$. Then the loss of this portfolio on day $t$ is defined as:

$$L(w, R_t) = -[w_1 R_{1t} + \ldots + w_N R_{Nt}] = -w'R_t.$$

So the loss is the negative of the sum of the portfolio weights multiplied by the returns in the same period. Then the auxiliary function (3) is approximated with (4), by replacing $y$ with $R$, by rewriting $L(w, R_h)$ to $-w'R_h$ and finally by replacing $h = 1, \ldots, H$ with $t = 1, \ldots, T$. Then (4) can be calculated as:

$$\tilde{F}(w, \eta) = \eta + \frac{1}{T(1-\beta)} \sum_{t=1}^{T} [-w'R_t - \eta]^+.$$

Then $\tilde{F}(w, \eta)$ has to be minimized over $W \times \mathbb{R}$ to find the optimal portfolio weights $w^*$. But this minimization problem can be reduced to linear programming by using the auxiliary variables $\theta_t$, where $\theta_t = [-w'R_t - \eta]^+$, for $t = 1, \ldots, T$. After rewriting, the new linear objective function is:

$$\eta + \frac{1}{T(1-\beta)} \sum_{t=1}^{T} \theta_t.$$

The objective function can then be minimized subject to the following linear constraints:

$$\sum_{i=1}^{N} w_i = 1, \qquad w_i \geq 0 \quad (i = 1, \ldots, N), \qquad \theta_t \geq 0 \quad (t = 1, \ldots, T), \qquad w'R_t + \eta + \theta_t \geq 0 \quad (t = 1, \ldots, T).$$

Since the objective function and all constraints are linear, this minimization problem can be solved by linear programming. After the objective function is minimized with respect to $w$, $\eta$ and $\theta_t$, the optimal portfolio weights $w^*$ are found.

3.3 Markowitz optimization

The fourth and fifth methods that are investigated in this thesis are two variations of the Markowitz method, based on two different estimators of the covariance matrix. These two methods are described in this section.

Optimizing a portfolio based on the Markowitz method (MW) means optimizing the portfolio based on Modern Portfolio Theory (MPT)12. The idea behind the MW method is to construct a portfolio of assets, which will be only stocks in this research, that maximizes the expected return for a given level of risk or, equivalently, minimizes the risk for a given level of expected return. The level of risk in the MW method is measured by the variance of the portfolio. This means that a portfolio with a higher variance has a higher risk than a portfolio with a lower variance. So the goal is to find the portfolio with the lowest variance for a given (prefixed) level of expected return, also called the target return, which is the optimal portfolio according to the MW method. By optimizing the portfolio according to the MW method, the efficient frontier is found, which consists of the highest returns for given levels of variance. However, since the other methods that are investigated in this thesis do not use the target return constraint, and to make the methods better comparable, this target return constraint is also left out in the MW method. As a result, the variance of the portfolio is minimized without the target return constraint, so only the portfolio with the minimal variance is calculated from the efficient frontier. This portfolio with the minimal variance of all portfolios on the efficient frontier is called the minimum-variance (MV) portfolio. Furthermore, the MW method needs an estimate of the covariance matrix, for which two different estimators are used in this thesis: the sample covariance matrix and the shrinkage covariance matrix.


Then the MW method is applied to stock data in this thesis. The idea is that the MW method minimizes the variance $\hat{\sigma}^2 = w'\hat{\Sigma}w$, for a given estimate of the covariance matrix $\hat{\Sigma}$, with respect to the portfolio weights $w$. In the same way as in the previously described methods, the portfolio weights $w_i$ are defined as the proportion of wealth that is invested in every stock for $i = 1, \ldots, N$, so that $w = [w_1, \ldots, w_N]$. Then the optimal portfolio weights can be calculated according to the MW method as:

$$w^* = \arg\min_{w} \; w'\hat{\Sigma}w,$$

where $\hat{\Sigma}$ is the covariance matrix estimator for the logarithmic returns of all stocks. The objective function is minimized such that the following constraints hold:

$$w'1_N = 1,$$

$$w_i \geq 0 \quad (i = 1, \ldots, N),$$

where $1_N$ is a column vector of ones with length $N$. The first constraint ensures that all portfolio weights sum to one. The second constraint ensures that all portfolio weights are nonnegative, so short selling is not possible. Then, as already explained, the constraint:

$$w'\mu = R^+,$$

with $\mu$ the estimated mean of the daily returns and $R^+$ a target return, with $R^+ > 0$, is left out of the optimization problem because the other methods also do not use this constraint. Therefore, the Markowitz method finds the minimum-variance portfolio.

Since the objective function is quadratic and all the constraints are linear, this optimization problem is a quadratic programming (QP) problem which can be solved efficiently. However, to be able to minimize the objective function the covariance matrix needs to be estimated. Two different ways to estimate the covariance matrix are described below.
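The long-only minimum-variance problem can be illustrated with a small Python sketch. Instead of a dedicated QP solver, the sketch below uses projected gradient descent with a Euclidean projection onto the set of nonnegative weights that sum to one; this is a swapped-in technique chosen to keep the example self-contained, not the solver used in the thesis. The function names, the step size rule and the number of iterations are assumptions.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative, tau = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0.0:      # holds for a prefix of the sorted values
            tau = candidate
    return [max(x - tau, 0.0) for x in v]

def min_variance_weights(cov, steps=5000):
    """Minimize w' cov w over long-only weights that sum to one."""
    n = len(cov)
    # The largest absolute row sum bounds the largest eigenvalue, giving a safe step size.
    lipschitz = 2.0 * max(sum(abs(c) for c in row) for row in cov)
    step = 1.0 / lipschitz
    w = [1.0 / n] * n                  # start from the equally-weighted portfolio
    for _ in range(steps):
        grad = [2.0 * sum(cov[i][j] * w[j] for j in range(n)) for i in range(n)]
        w = project_to_simplex([w[i] - step * grad[i] for i in range(n)])
    return w

# Hypothetical covariance matrices for N = 2 stocks.
w_interior = min_variance_weights([[1.0, 0.0], [0.0, 4.0]])
w_corner = min_variance_weights([[1.0, 2.0], [2.0, 9.0]])
```

In the first example both weights are strictly positive, while in the second example the nonnegativity constraint is binding and all wealth is put in the first stock.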

3.3.2 Different covariance matrix estimators

The minimum-variance (MV) portfolio is calculated with an estimate of the covariance matrix, also called the variance-covariance matrix. In this thesis, two different covariance matrix estimators are investigated. First, the matrix $R$ contains the logarithmic returns calculated from the stock price matrix. This means that the matrix $R$ has size [T x N], where $T$ is the total length of the history and $N$ is the total number of stocks that are investigated. Then the first covariance matrix estimator that is investigated, the sample covariance matrix, is defined as:

$$S = \frac{1}{T-1} \sum_{t=1}^{T} (R_t - \bar{R})'(R_t - \bar{R}), \tag{5}$$

where $\bar{R} = \frac{1}{T} \sum_{t=1}^{T} R_t$ has size [1 x N] and $R_t$ is a [1 x N] vector of all logarithmic stock returns at time $t$.

Secondly, the shrinkage estimator by Ledoit and Wolf (2003) is used as an estimator for the covariance matrix. It consists of a convex combination of the sample covariance matrix and the single-index covariance matrix. The idea behind the shrinkage covariance matrix is that the sample covariance matrix is unbiased but has a high variance, and that the single-index covariance matrix is biased but has a low variance. This means that a combination of the two results in an improved estimator, the shrinkage covariance matrix estimator. This shrinkage covariance matrix estimator is based on the single-index model13, which is defined as:

$$R_{it} = \mathrm{constant}_i + \beta_i R_{0t} + \varepsilon_{it},$$

where $R_{it}$ are the logarithmic returns for stock $i$ at time $t$ for $i = 1, \ldots, N$ and $t = 1, \ldots, T$, $R_{0t}$ are the logarithmic market returns14 at time $t$ and $\beta_i$ are the slopes. The idea behind the single-index model described above is that the returns $R$ are only dependent on each other through the effect of one factor, which is the market factor $R_0$. This means that the residuals, $\varepsilon_{it}$, are uncorrelated with each other and uncorrelated with $R_{0t}$. The covariance matrix implied by the single-index model is:

$$\bar{\Sigma} = \sigma_0^2 \beta\beta' + \Delta,$$

where $\sigma_0^2$ is the variance of the market returns $R_0$, $\Delta$ is the diagonal matrix with the variances of the residuals ($\mathrm{var}(\varepsilon_i) = \Delta_{ii}$) on its diagonal and $\beta$ is the vector of slopes defined before. Then the single-index covariance matrix can be estimated as:

$$\bar{S} = S_0^2 \hat{\beta}\hat{\beta}' + \hat{\Delta},$$

where $S_0^2$ is the sample variance of $R_0$, $\hat{\Delta}$ is a diagonal matrix with the estimated residual variances ($\hat{\Delta}_{ii}$) on its diagonal, and $\hat{\beta}$ are the slope estimates. Then the single-index covariance matrix can be calculated, by calculating:

$$\hat{\beta} = \frac{\mathrm{cov}(R, R_0)}{\mathrm{var}(R_0)}, \qquad S_0^2 = \mathrm{var}(R_0), \qquad \hat{\Delta}_{ii} = \mathrm{var}(R_i) - \hat{\beta}_i^2 \, \mathrm{var}(R_0).$$
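The three quantities above are enough to build the single-index covariance matrix estimate. The following Python sketch does so for a small, made-up return matrix, taking the market factor to be the cross-sectional mean of the returns as in this thesis. The function names and the data are hypothetical, and the sketch is an illustration rather than the Matlab code that was actually used.

```python
from statistics import mean

def sample_cov(x, y):
    """Sample covariance with divisor T - 1."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def single_index_cov(R):
    """Single-index covariance estimate from a [T x N] return matrix R."""
    n = len(R[0])
    market = [mean(row) for row in R]                 # market factor R_0: mean over stocks
    var0 = sample_cov(market, market)                 # S_0^2
    columns = [[row[i] for row in R] for i in range(n)]
    beta = [sample_cov(col, market) / var0 for col in columns]
    resid = [sample_cov(col, col) - b * b * var0 for col, b in zip(columns, beta)]
    cov = [[var0 * beta[i] * beta[j] + (resid[i] if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    return cov, beta

# Hypothetical log returns for T = 5 days and N = 3 stocks.
R = [[0.01, 0.02, -0.01],
     [-0.02, 0.00, 0.01],
     [0.03, 0.01, 0.02],
     [0.00, -0.01, -0.02],
     [0.01, 0.03, 0.00]]
F, beta = single_index_cov(R)
```

By construction the diagonal of the result equals the sample variances of the individual stocks, so only the off-diagonal entries differ from the sample covariance matrix.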

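Then the shrinkage covariance matrix estimator15 can be calculated, based on the shrinkage intensity, $\delta$, as:

$$\hat{S} = \delta \bar{S} + (1 - \delta) S, \tag{6}$$

where $S$ is the sample covariance matrix, $\bar{S}$ is the single-index covariance matrix estimator and $\delta$ is the shrinkage intensity. The optimal shrinkage intensity is defined as:

$$\delta^* = \frac{\sum_{i=1}^{N} \sum_{j=1}^{N} \left[ \mathrm{var}(s_{ij}) - \mathrm{cov}(\bar{s}_{ij}, s_{ij}) \right]}{\sum_{i=1}^{N} \sum_{j=1}^{N} \left[ \mathrm{var}(\bar{s}_{ij} - s_{ij}) + (\bar{\sigma}_{ij} - \sigma_{ij})^2 \right]},$$

where $s_{ij}$ is the $(i,j)$-th entry of $S$, $\bar{s}_{ij}$ is the $(i,j)$-th entry of $\bar{S}$, $\bar{\sigma}_{ij}$ is the $(i,j)$-th entry of $\bar{\Sigma}$, and $\sigma_{ij}$ is the $(i,j)$-th entry of the true covariance matrix $\Sigma$. Of course the true covariance matrix $\Sigma$ and $\bar{\Sigma}$ are unknown. Therefore $\delta^*$ has to be estimated, which is done as:

$$\hat{\delta}^* = \frac{1}{T} \, \frac{\pi^1 - \pi^2}{\pi^3},$$

where:

$$\pi^1 = \sum_{i=1}^{N} \sum_{j=1}^{N} \pi^1_{ij}, \qquad \pi^1_{ij} = \mathrm{AsyVar}[\sqrt{T} s_{ij}],$$

$$\pi^2 = \sum_{i=1}^{N} \sum_{j=1}^{N} \pi^2_{ij}, \qquad \pi^2_{ij} = \mathrm{AsyCov}[\sqrt{T} \bar{s}_{ij}, \sqrt{T} s_{ij}],$$

$$\pi^3 = \sum_{i=1}^{N} \sum_{j=1}^{N} \pi^3_{ij}, \qquad \pi^3_{ij} = (\bar{\sigma}_{ij} - \sigma_{ij})^2,$$

and $\pi^1_{ij}$, $\pi^2_{ij}$ and $\pi^3_{ij}$ need to be consistently estimated to calculate $\pi^1$, $\pi^2$ and $\pi^3$. These can be consistently estimated as:

$$\hat{\pi}^1_{ij} = \frac{1}{T} \sum_{t=1}^{T} \left( (R_{it} - \mu_i)(R_{jt} - \mu_j) - s_{ij} \right)^2.$$

Then the diagonal of $\pi^2_{ij}$ is estimated as $\hat{\pi}^2_{ii} = \hat{\pi}^1_{ii}$, and for $i \neq j$ as:

$$\hat{\pi}^2_{ij} = \frac{1}{T} \sum_{t=1}^{T} \left[ \frac{s_{j0} s_0 (R_{it} - \mu_i) + s_{i0} s_0 (R_{jt} - \mu_j) - s_{i0} s_{j0} (R_{0t} - \mu_0)}{s_0^2} \, (R_{0t} - \mu_0)(R_{it} - \mu_i)(R_{jt} - \mu_j) - \bar{s}_{ij} s_{ij} \right],$$

where $s_{j0}$ is the covariance between the market and stock $j$, $s_0$ is the sample variance of the market returns (denoted $S_0^2$ above) and $\mu_0$ is the sample mean of the market returns. Furthermore $\mu_i$ and $\mu_j$ are the sample means of assets $i$ and $j$. Finally $\pi^3_{ij}$ can be consistently estimated by its sample counterpart. When all calculations are completed, the optimal shrinkage intensity $\hat{\delta}^*$ can be calculated.

13 As in Sharpe (1963).

14 In this thesis the market factor $R_0$ is chosen to be the mean of the logarithmic return matrix $R$. So for the [T x N] matrix of logarithmic returns $R$, $R_0$ is a vector of size [T x 1] which contains the means at every time $t$.

15 For the calculations of the shrinkage covariance matrix, Matlab code from the website of M. Wolf is used.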


3.4 Equally-weighted portfolio

Then the sixth method that is investigated is the equally-weighted (EW) portfolio. The EW portfolio is investigated in this thesis because most previous empirical research on the comparison of non-parametric portfolio optimization methods also compares these methods with the EW portfolio, since it performs very well in terms of the Sharpe ratio and the portfolio turnover. The EW portfolio is also used as the benchmark in the pair-wise Sharpe ratio test, against which all optimization methods are tested pair-wise.16 The EW portfolio invests an equal proportion of the total wealth in all stocks that are available. This means that every time the EW portfolio is rebalanced17, the proportion of wealth $w_i$ invested in every stock is the same. So whenever the portfolio is rebalanced according to the EW portfolio, there holds that:

$$w_i = \frac{1}{N} \quad (i = 1, \ldots, N), \qquad \sum_{i=1}^{N} w_i = 1.$$

3.5 Comparison

This section describes how the four non-parametric portfolio optimization methods, the one semi-parametric optimization method and the equally-weighted portfolio are compared to each other. First, the optimal portfolio weights for all investigated methods in this thesis are calculated as described before. Then, to be able to compare the methods with each other, the optimal portfolio returns and the returns of the EW portfolio need to be calculated. This is done with walk forward optimization, which is described below. Thereafter the Sharpe ratio, which is used as the financial performance measure, is explained. Finally the pair-wise and the multiple Sharpe ratio tests are described, which are used to test whether the true Sharpe ratios are significantly different.

3.5.1 Walk forward optimization

In this thesis walk forward optimization (WFO) is used to calculate the optimal returns for all the investigated optimization methods. Walk forward optimization is based on multiple backtests performed on the 10 years of historical stock data18. Every separate backtest consists of in-sample data and out-of-sample data, which together form a time window. The in-sample data is the data with which the optimal parameters are calculated, which in this thesis are the optimal portfolio weights, and the out-of-sample data is used to calculate the optimal returns for all the optimization methods. The in-sample data time window is also called the optimization time window and the out-of-sample data time window is also called the testing time window in this thesis. So the idea is that the optimal portfolio weights are calculated for all the optimization methods based on the in-sample data, and these weights are then used to calculate the optimal returns with the data in the out-of-sample data time window. In this thesis, different in-sample data time window lengths (WLs) are investigated, namely 1 year, 2 years, 3 years and 4 years. The out-of-sample data time window length for all separate backtests and for all methods is five days in this thesis, which is one trading week. This means that the portfolios of all methods are rebalanced weekly.19 Weekly rebalancing has been chosen because it is not realistic to rebalance the portfolio every day.

So if the WL is 1 year, the optimization methods use the first year of the historical data to optimize the portfolio weights, and the optimal portfolio weights found are applied from the next day onward. Based on these optimal weights the optimal returns for the whole week are calculated. Once these calculations are completed, the in-sample data time window is moved forward 5 days and the process is repeated. The process is repeated until the in-sample data time window cannot be moved forward 5 days anymore, which means all optimal returns for all optimization methods are calculated. Then the optimized returns of all methods and the returns of the EW portfolio can be compared with each other, also for all different WLs, to see which method performs best.

16 See Section 3.5.3.

17 See Section 3.5.1.

To describe the length of all the optimized return vectors after the walk forward optimizations are completed for all optimization methods for one of the WLs, and to describe the length of the return vector of the EW portfolio, $T'$ is used.20 So $T'$ will be used in this thesis from here on to describe the lengths of the optimized return vectors for all the optimization methods and the length of the return vector of the EW portfolio. Since the first one, two, three or four years of $T$, dependent on the WL used, are only used for optimization, $T'$ is smaller than the entire historical data length $T$.
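The rolling scheme of the walk forward optimization can be sketched as a generator of index ranges, as shown below in Python. The function name, the assumption of 252 trading days per year with 2520 days in total, and the stopping rule (stop as soon as no complete five-day testing window is left) are illustrative assumptions and need not match the exact implementation of the thesis.

```python
def walk_forward_windows(n_days, window_len, test_len=5):
    """Yield (in_sample, out_of_sample) day ranges for walk forward optimization."""
    start = 0
    # Assumption: stop when no complete testing window is left after the in-sample window.
    while start + window_len + test_len <= n_days:
        in_sample = range(start, start + window_len)
        out_of_sample = range(start + window_len, start + window_len + test_len)
        yield in_sample, out_of_sample
        start += test_len              # move the optimization window forward 5 days

# Hypothetical setting: 10 years of 252 trading days and a WL of 1 year.
windows = list(walk_forward_windows(n_days=2520, window_len=252))
n_out_of_sample_days = sum(len(test) for _, test in windows)   # plays the role of T'
```

In each window the weights would be optimized on the in-sample range and held fixed over the following five days, after which both ranges shift by five days.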

3.5.2 Sharpe ratio

To be able to compare the optimized returns of all the optimization methods and the returns of the equally-weighted (EW) portfolio with each other, a financial performance measure is needed. In this research the Sharpe ratio21 is used as the financial performance measure. The Sharpe ratio measures the ratio of the excess return to the risk of a portfolio. If, for instance, one of the methods compared in this research has a higher Sharpe ratio than another method, this means that the method with the higher Sharpe ratio gives a higher excess return than the other method for the same level of risk. In the definition of the Sharpe ratio, risk is measured with the standard deviation of the excess returns. Mathematically the Sharpe ratio for method $k$, for $k = 1, \ldots, K$ and $K = 6$ in this thesis, is defined as:

$$SR_k = \frac{E[r_k - r_f]}{\sqrt{\mathrm{var}(r_k - r_f)}}, \tag{7}$$

where $r_k$ is a [T' x 1] vector of returns calculated with method $k$, which are optimal returns in the case an optimization method is used and returns in the case the EW portfolio is used. Then $r_f$ contains the daily returns of the risk-free rate, which in this thesis is the daily 3-Month Treasury-bill (T-bill) rate. Note that since daily returns are used in this thesis, the daily Sharpe ratio is calculated. To annualize this daily Sharpe ratio, it has to be multiplied by $\sqrt{252}$, since there are 252 trading days in one year.

19 WFO is not applied to the EW portfolio since it does not require optimization, but the EW portfolio is rebalanced weekly like the optimization methods.

20 $T'$ has different lengths when different WLs are used.

21 First introduced by Sharpe (1966).
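The estimated annualized Sharpe ratio in (7) can be computed with a few lines of Python, as in the sketch below. The function name and the toy returns are hypothetical, and the use of the sample variance with divisor $T' - 1$ is an assumption about how the variance is estimated.

```python
import math
from statistics import mean, variance

def annualized_sharpe(returns, risk_free, periods_per_year=252):
    """Estimated Sharpe ratio of daily excess returns, scaled by sqrt(252)."""
    excess = [r - f for r, f in zip(returns, risk_free)]
    daily = mean(excess) / math.sqrt(variance(excess))   # sample variance, divisor n - 1
    return daily * math.sqrt(periods_per_year)

# Hypothetical daily returns and a zero risk-free rate, for illustration only.
returns = [0.01, 0.03, 0.02, 0.04]
risk_free = [0.0, 0.0, 0.0, 0.0]
sharpe = annualized_sharpe(returns, risk_free)
```

Because the ratio is scale free, multiplying all excess returns by the same positive constant leaves the estimated Sharpe ratio unchanged.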

When all the Sharpe ratios are calculated for all methods and all different WLs, they can be compared to each other. Because a higher Sharpe ratio is better than a lower Sharpe ratio, since more return is obtained for the same level of risk, the highest Sharpe ratio corresponds to the best method. However, since the Sharpe ratios are calculated with estimates of the mean and the variance of the optimized excess returns of all the optimization methods and of the excess returns of the EW portfolio, the true Sharpe ratios of all the methods remain unknown, because the true distributions of the optimal returns and the returns of the EW portfolio are unknown. Therefore, comparison of the true Sharpe ratios is based on statistical inference. To be able to compare the true Sharpe ratio of every optimization method against the true Sharpe ratio of the EW portfolio, the pair-wise Sharpe ratio test by Ledoit and Wolf (2008) is used. After the pair-wise Sharpe ratio tests are performed, multiple Sharpe ratio tests are also performed. For this, the multiple test by Wright, Yam, & Yung (2014) is used. This test compares the Sharpe ratios of all methods, including the EW portfolio, simultaneously against each other.

3.5.3 Pair-wise Sharpe ratio test

This section describes the pair-wise test by Ledoit and Wolf (2008). It will start with a description of the pair-wise Sharpe ratio test by Jobson and Korkie (1981) as corrected by Memmel (2003), which is only valid for i.i.d. data. Then the notation used in the pair-wise Sharpe ratio test, which is later needed to derive this test, is described. Finally the pair-wise Sharpe ratio test is derived, which is valid for i.i.d. data and non-i.i.d. data.

So before the pair-wise Sharpe ratio test by Ledoit and Wolf is derived, first the Sharpe ratio test derived by Jobson and Korkie and corrected by Memmel is explained. This test uses the investment strategies $c$ and $q$, which have the excess returns $r_{tc}$ and $r_{tq}$ respectively, for $t = 1, \ldots, T'$, where the investment strategies $c$ and $q$ have a bivariate return distribution, the observed returns are a strictly stationary time series, $T'$ is the length of the excess return vectors and excess returns are calculated by subtracting the risk-free rate from the returns. The distribution parameters are then:

$$\mu = \begin{pmatrix} \mu_c \\ \mu_q \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sigma_c^2 & \sigma_{cq} \\ \sigma_{cq} & \sigma_q^2 \end{pmatrix}.$$


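Then the estimated Sharpe ratio difference is equal to:

$$\hat{\Delta} = \widehat{SH}_c - \widehat{SH}_q = \frac{\hat{\mu}_c}{\sqrt{\hat{\sigma}_c^2}} - \frac{\hat{\mu}_q}{\sqrt{\hat{\sigma}_q^2}},$$

and the true difference of the Sharpe ratios is equal to:

$$\Delta = SH_c - SH_q = \frac{\mu_c}{\sqrt{\sigma_c^2}} - \frac{\mu_q}{\sqrt{\sigma_q^2}}.$$

Then let $u = (\mu_c, \mu_q, \sigma_c^2, \sigma_q^2)'$ and $\hat{u} = (\hat{\mu}_c, \hat{\mu}_q, \hat{\sigma}_c^2, \hat{\sigma}_q^2)'$. The standard error for $\hat{\Delta}$ can then be computed, by first calculating the asymptotic distribution as:

$$\sqrt{T'}(\hat{u} - u) \xrightarrow{d} N(0; \Omega),$$

where the asymptotic covariance matrix is equal to:

$$\Omega = \begin{bmatrix} \sigma_c^2 & \sigma_{cq} & 0 & 0 \\ \sigma_{cq} & \sigma_q^2 & 0 & 0 \\ 0 & 0 & 2\sigma_c^4 & 2\sigma_{cq}^2 \\ 0 & 0 & 2\sigma_{cq}^2 & 2\sigma_q^4 \end{bmatrix},$$

and then applying the delta method. If the data used are i.i.d., then $H_0: \Delta = 0$ can be tested against $H_1: \Delta \neq 0$.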
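The delta method step can be made explicit with a short Python sketch. It uses the gradient of $\mu_c / \sigma_c - \mu_q / \sigma_q$ with respect to $(\mu_c, \mu_q, \sigma_c^2, \sigma_q^2)$ and the asymptotic covariance matrix given above, and then forms a two-sided test based on the standard normal distribution. The function name and the toy excess returns are hypothetical, and the sketch is only meaningful under the i.i.d. assumption stated in the text.

```python
import math
from statistics import NormalDist, mean

def sharpe_difference_test_iid(rc, rq):
    """Delta-method test of equal Sharpe ratios for i.i.d. excess returns rc and rq."""
    T = len(rc)
    mc, mq = mean(rc), mean(rq)
    vc = sum((x - mc) ** 2 for x in rc) / (T - 1)
    vq = sum((y - mq) ** 2 for y in rq) / (T - 1)
    cov = sum((x - mc) * (y - mq) for x, y in zip(rc, rq)) / (T - 1)
    sc, sq = math.sqrt(vc), math.sqrt(vq)
    diff = mc / sc - mq / sq
    # Gradient with respect to (mu_c, mu_q, sigma_c^2, sigma_q^2).
    grad = [1.0 / sc, -1.0 / sq, -mc / (2.0 * sc ** 3), mq / (2.0 * sq ** 3)]
    omega = [[vc, cov, 0.0, 0.0],
             [cov, vq, 0.0, 0.0],
             [0.0, 0.0, 2.0 * vc ** 2, 2.0 * cov ** 2],
             [0.0, 0.0, 2.0 * cov ** 2, 2.0 * vq ** 2]]
    quad = sum(grad[i] * omega[i][j] * grad[j] for i in range(4) for j in range(4))
    se = math.sqrt(quad / T)
    z = diff / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return diff, se, z, p_value

# Hypothetical daily excess returns of two strategies.
rc = [0.01, -0.02, 0.03, 0.00, 0.02, -0.01]
rq = [0.00, 0.01, -0.01, 0.02, 0.01, 0.03]
diff, se, z, p_value = sharpe_difference_test_iid(rc, rq)
```

Swapping the two strategies changes the sign of the estimated difference but leaves the standard error and the p-value unchanged, which is a simple sanity check on the implementation.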

The test described above, by Jobson and Korkie and corrected by Memmel, can no longer be used if the data are non-i.i.d. Since in this thesis the optimized returns of all optimization methods and the returns of the EW portfolio are autocorrelated, they are non-i.i.d.22, and therefore the test by Jobson and Korkie and corrected by Memmel cannot be used. To account for non-i.i.d. data and data with heavy tails, Ledoit and Wolf (2008) developed a pair-wise Sharpe ratio test. They use a robust inference method based on a studentized bootstrap confidence interval to test the true Sharpe ratios pair-wise against each other. Before this test is derived, first the notation that is used is described, after which the pair-wise Sharpe ratio test by Ledoit and Wolf is derived.

22 See Section 5.1.

Ledoit and Wolf work with uncentered second moments, with $\zeta_c = E(r_{1c}^2)$ and $\zeta_q = E(r_{1q}^2)$, and their sample equivalents $\hat{\zeta}_c$ and $\hat{\zeta}_q$. Then let $v = (\mu_c, \mu_q, \zeta_c, \zeta_q)'$ and $\hat{v} = (\hat{\mu}_c, \hat{\mu}_q, \hat{\zeta}_c, \hat{\zeta}_q)'$. Then the estimated Sharpe ratio difference is equal to:

$$\hat{\Delta} = \frac{\hat{\mu}_c}{\sqrt{\hat{\zeta}_c - \hat{\mu}_c^2}} - \frac{\hat{\mu}_q}{\sqrt{\hat{\zeta}_q - \hat{\mu}_q^2}} = f(\hat{v}),$$

and the true Sharpe ratio difference is equal to:

$$\Delta = \frac{\mu_c}{\sqrt{\zeta_c - \mu_c^2}} - \frac{\mu_q}{\sqrt{\zeta_q - \mu_q^2}} = f(v).$$

Then the delta method implies that:

$$\sqrt{T'}(\hat{\Delta} - \Delta) \xrightarrow{d} N(0; \nabla' f(v) \, \Psi \, \nabla f(v)),$$

where $\Psi$ is the limiting covariance matrix of $\sqrt{T'}(\hat{v} - v)$. The standard error for $\hat{\Delta}$ can then be estimated with a consistent estimator $\hat{\Psi}$ for $\Psi$, and is equal to:

$$SE(\hat{\Delta}) = \sqrt{\frac{\nabla' f(\hat{v}) \, \hat{\Psi} \, \nabla f(\hat{v})}{T'}}, \tag{8}$$

where:

$$\nabla' f(\mu_c, \mu_q, \zeta_c, \zeta_q) = \left( \frac{\zeta_c}{(\zeta_c - \mu_c^2)^{1.5}}, \; -\frac{\zeta_q}{(\zeta_q - \mu_q^2)^{1.5}}, \; -\frac{1}{2} \frac{\mu_c}{(\zeta_c - \mu_c^2)^{1.5}}, \; \frac{1}{2} \frac{\mu_q}{(\zeta_q - \mu_q^2)^{1.5}} \right).$$

Then the pair-wise Sharpe ratio test by Ledoit and Wolf (2008) can be derived. They propose two methods to test if the true Sharpe ratios are significantly different from each other, both of which are based on the bootstrap23. The idea is that the null hypothesis ($H_0: \Delta = 0$) is tested against the alternative hypothesis ($H_1: \Delta \neq 0$), which means that $H_0$ is rejected if the two tested true Sharpe ratios are significantly different from each other. This can be tested by constructing a two-sided studentized bootstrap confidence interval for the true difference of the Sharpe ratios ($\Delta$), with a nominal coverage probability level of $(1 - \alpha)$; if 0 is not contained in this interval, $H_0$ is rejected. Alternatively, a two-sided bootstrap p-value ($pv$) can be calculated, which rejects $H_0$ if $pv < \alpha$. In this research $\alpha$ is equal to 5%. To construct the two-sided studentized bootstrap confidence interval, the distribution of the studentized test statistic has to be approximated, which is done as follows:

$$\mathcal{L}\left( \frac{|\hat{\Delta} - \Delta|}{SE(\hat{\Delta})} \right) \approx \mathcal{L}\left( \frac{|\hat{\Delta}^* - \hat{\Delta}|}{SE(\hat{\Delta}^*)} \right),$$

where $\hat{\Delta}^*$ is the estimated difference of the bootstrap Sharpe ratios, $\hat{\Delta}$ is the estimated difference of the estimated Sharpe ratios from the original data and $\Delta$ is the true difference of the Sharpe ratios. Then $z^*_{|\cdot|,\lambda}$ can be defined as the $\lambda$-quantile of $\mathcal{L}(|\hat{\Delta}^* - \hat{\Delta}| / SE(\hat{\Delta}^*))$, where $\mathcal{L}(\Theta)$ is the distribution of the random variable $\Theta$. With this information, the $1 - \alpha$ confidence interval for $\Delta$ can be constructed as:

$$\hat{\Delta} \pm z^*_{|\cdot|,1-\alpha} \, SE(\hat{\Delta}). \tag{9}$$


This method is used because $z^*_{|\cdot|,1-\alpha}$ is bigger than $z_{1-\alpha/2}$ when the data that are used are heavy-tailed or autocorrelated. Then the bootstrap standard error is calculated as:

$$SE(\hat{\Delta}^*) = \sqrt{\frac{\nabla' f(\hat{v}^*) \, \hat{\Psi}^* \, \nabla f(\hat{v}^*)}{T'}}. \tag{10}$$

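The alternative to constructing a confidence interval is to compute a p-value. This two-sided p-value is calculated based on $U$ bootstrap resamples, with $u = 1, \ldots, U$, as:

$$pv = \frac{\#\{\tilde{d}^{*,u} \geq d\} + 1}{U + 1}, \tag{11}$$

where $\tilde{d}^{*,u}$ is the centered studentized statistic computed from the $u$-th bootstrap sample, calculated as:

$$\tilde{d}^{*,u} = \frac{|\hat{\Delta}^{*,u} - \hat{\Delta}|}{SE(\hat{\Delta}^{*,u})}.$$

In this equation $d$ is the original test statistic:

$$d = \frac{|\hat{\Delta}|}{SE(\hat{\Delta})}.$$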
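Once the bootstrap statistics are available, the p-value in (11) is a simple count. The Python sketch below shows this step in isolation; the function names and the numbers are hypothetical, and the bootstrap differences and standard errors are assumed to have been computed elsewhere.

```python
def centered_studentized(delta_boot, delta_hat, se_boot):
    """Bootstrap statistic: |bootstrap difference - original difference| / bootstrap SE."""
    return abs(delta_boot - delta_hat) / se_boot

def bootstrap_p_value(d, d_boot):
    """Two-sided bootstrap p-value: (#{d_boot >= d} + 1) / (U + 1)."""
    exceed = sum(1 for d_u in d_boot if d_u >= d)
    return (exceed + 1) / (len(d_boot) + 1)

# Hypothetical original statistic and U = 4 bootstrap statistics.
d = 2.0
d_boot = [0.5, 1.0, 2.5, 3.0]
pv = bootstrap_p_value(d, d_boot)
```

Adding one to both the numerator and the denominator ensures that the p-value can never be exactly zero, however extreme the original statistic is.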

The standard error of $\hat{\Delta}$, $SE(\hat{\Delta})$, is computed as in (8), where the estimator of the limiting covariance matrix ($\hat{\Psi}$) is calculated by kernel estimation with the prewhitened QS kernel. The standard error for $\hat{\Delta}^*$, $SE(\hat{\Delta}^*)$, is calculated as in (10), where $\hat{\Psi}^*$ needs to be estimated. Ledoit and Wolf use the circular block bootstrap to generate bootstrap data. This means that blocks of return pairs are resampled for every bootstrap sample instead of individual pairs. The blocks that are resampled have a fixed size $b \geq 1$. Then the estimated bootstrap covariance matrix is calculated as:

$$\hat{\Psi}^* = \frac{1}{l} \sum_{j=1}^{l} \xi_j \xi_j',$$

with:

$$\xi_j = \frac{1}{\sqrt{b}} \sum_{t=1}^{b} y^*_{(j-1)b+t} \quad (j = 1, \ldots, l), \qquad l = \left\lfloor \frac{T'}{b} \right\rfloor,$$

and:

$$y^*_t = (r^*_{tc} - \hat{\mu}^*_c, \; r^*_{tq} - \hat{\mu}^*_q, \; r^{*2}_{tc} - \hat{\zeta}^*_c, \; r^{*2}_{tq} - \hat{\zeta}^*_q)' \quad (t = 1, \ldots, T'),$$

where $\hat{v}^* = (\hat{\mu}^*_c, \hat{\mu}^*_q, \hat{\zeta}^*_c, \hat{\zeta}^*_q)$ is the estimator of $v = (\mu_c, \mu_q, \zeta_c, \zeta_q)$ based on the bootstrap data and $(r^*_{tc}, r^*_{tq}, r^{*2}_{tc}, r^{*2}_{tq})$ are the bootstrap counterparts of $(r_{tc}, r_{tq}, r^2_{tc}, r^2_{tq})$. The only thing left to calculate $SE(\hat{\Delta}^*)$ is the optimal block size. The algorithm that is used to calculate the optimal block size for the circular block bootstrap method is described below.24
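The resampling step of the circular block bootstrap can be sketched in Python as follows. Blocks of consecutive days are drawn with a random starting point and wrap around the end of the sample, and the same days are used for both strategies so that return pairs stay together. The function names, the fixed seed and the truncation of the last block to the sample length are assumptions made for this illustration.

```python
import random

def circular_block_indices(n, block_size, rng):
    """Draw n time indices in blocks of consecutive days that wrap around the sample end."""
    indices = []
    while len(indices) < n:
        start = rng.randrange(n)
        indices.extend((start + k) % n for k in range(block_size))
    return indices[:n]                 # assumption: the last block is cut to length n

def resample_pairs(rc, rq, block_size, seed=0):
    """Resample return pairs jointly, so that day t of both strategies stays together."""
    rng = random.Random(seed)
    idx = circular_block_indices(len(rc), block_size, rng)
    return [rc[i] for i in idx], [rq[i] for i in idx]

# Hypothetical example with 10 days and block size 4.
example_idx = circular_block_indices(10, 4, random.Random(1))
boot_c, boot_q = resample_pairs(list(range(10)), [10 + x for x in range(10)], 4)
```

Keeping blocks of consecutive days intact is what allows the bootstrap data to retain the short-run dependence of the original return series.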

24 For the calculations in Matlab, code from Ledoit and Wolf is used, which can be found on the website of M. Wolf. The code for the calculation of the p-values for the pair-wise Sharpe ratio tests by Ledoit and Wolf can also be found on his site.


To find the optimal block size $b$, the following function has to be minimized:

$$|g(b) - (1 - \alpha)|,$$

where $g(b) = 1 - \lambda$ is the true coverage level of the bootstrap confidence interval with nominal confidence level $1 - \alpha$ when block size $b$ is used, conditional on the true underlying probability mechanism $F$ that has generated the $T'$ sample return pairs. However, the function $g(\cdot)$ is unknown and has to be estimated with the bootstrap, which is done as follows:

(1) Fit a semi-parametric model $\hat{F}$ to the observed sample returns.

(2) Fix a selection of reasonable block sizes $b$.

(3) Generate $P$ bootstrap samples from the $T'$ sample return pairs of the investment strategies $c$ and $q$. Then compute a confidence interval $CI_{p,b}$ for each $p = 1, \ldots, P$ and each $b$, with nominal level $1 - \alpha$ for $\hat{\Delta}$.

(4) Calculate: $\hat{g}(b) = \#\{\hat{\Delta} \in CI_{p,b}\} / P$.

(5) Finally compute the $\hat{b}$ that minimizes: $|\hat{g}(b) - (1 - \alpha)|$.

In the algorithm $P = 1000$ is used in this thesis. Then the optimal block size can be found, which is one of the following block sizes: {1, 2, 4, 6, 8, 10}.25

3.5.4 Multiple Sharpe ratio test

The multiple Sharpe ratio test by Wright, Yam, & Yung (2014) is used to compare the true Sharpe ratios of all the investigated methods in this thesis, including the EW portfolio, against each other simultaneously. This way it can be tested if at least one true Sharpe ratio is significantly different from the rest, but it remains unknown which one(s). Their multiple test is an improvement of the multiple test by Leung and Wong (2006), which is only valid for i.i.d. data. Wright, Yam, & Yung (WYY) extended the test by Leung and Wong to the case where the data used are non-i.i.d. Since the data that are used in this thesis are autocorrelated, where the data are the optimized returns of every optimization method and the returns of the EW portfolio, the test by WYY is used in this thesis.26 In this section the improved multiple Sharpe ratio test will be derived. First the asymptotic distribution of the estimated Sharpe ratios will be derived, after which the hypothesis tests are defined for both the i.i.d. case and the non-i.i.d. case.

First, WYY use the same definition of the Sharpe ratio as in (7), namely:

$$SR_k = \frac{E[r_k - r_f]}{\sqrt{\mathrm{var}(r_k - r_f)}},$$

where $r_k$ is an optimized return vector when an optimization method is used and a return vector when the EW portfolio is used. Here $k = 1, \ldots, K$, where $K = 6$ in this thesis because six methods are investigated in total, and $r_f$ is the risk-free rate, which in this thesis is the 3-month T-bill rate. Then the sample means of the excess returns and the sample means of squared excess

[25] The optimal block size was found to be 10; see Section 5.


returns for all the methods $k = 1, \ldots, K$ are defined as follows:

$$\hat{m}^k_1 = \frac{1}{T_0} \sum_{t=1}^{T_0} r_{kt}, \qquad \hat{m}^k_2 = \frac{1}{T_0} \sum_{t=1}^{T_0} (r_{kt})^2,$$

where $r_{kt}$ is the excess return of method $k$ on day $t$, for all $t = 1, \ldots, T_0$. As described before, $T_0$ is the length of the optimized return vectors of all optimization methods and of the return vector of the EW portfolio. All moments are then collected in a vector:

$$\bar{r}_{T_0} = \begin{bmatrix} \hat{m}^1_1 & \ldots & \hat{m}^K_1 & \hat{m}^1_2 & \ldots & \hat{m}^K_2 \end{bmatrix},$$

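and let $\mu$ denote the population analogue. Then it holds by the multivariate central limit theorem that:

$$\sqrt{T_0}\,(\bar{r}_{T_0} - \mu) \xrightarrow{d} N(0, \Sigma),$$

where $\Sigma$ is the asymptotic covariance matrix:

$$\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{bmatrix},$$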

with $\Sigma_{11}$ the asymptotic covariance matrix of the excess returns of size $[K \times K]$, $\Sigma_{22}$ the asymptotic covariance matrix of the squared excess returns of size $[K \times K]$ and $\Sigma_{12}$ the asymptotic covariance matrix of the excess returns and the squared excess returns together, which is also of size $[K \times K]$. Then it follows by the multivariate delta method and the definition of the Sharpe ratio that:

$$\sqrt{T_0}\,(\hat{SR} - SR) \xrightarrow{d} N(0, \Lambda \Sigma \Lambda'), \qquad (12)$$

where $\hat{SR}$ is the $[K \times 1]$ vector of estimated Sharpe ratios for all methods and $\Lambda'$ is defined as:

$$\Lambda' = \begin{bmatrix}
\dfrac{m^1_2}{(m^1_2 - (m^1_1)^2)^{3/2}} & 0 & \ldots & 0 \\
0 & \dfrac{m^2_2}{(m^2_2 - (m^2_1)^2)^{3/2}} & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \ldots & 0 & \dfrac{m^K_2}{(m^K_2 - (m^K_1)^2)^{3/2}} \\
\dfrac{-m^1_1}{2(m^1_2 - (m^1_1)^2)^{3/2}} & 0 & \ldots & 0 \\
0 & \dfrac{-m^2_1}{2(m^2_2 - (m^2_1)^2)^{3/2}} & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \ldots & 0 & \dfrac{-m^K_1}{2(m^K_2 - (m^K_1)^2)^{3/2}}
\end{bmatrix}.$$
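The entries of $\Lambda'$ are the partial derivatives of the Sharpe ratio, written as a function of the first two moments, $SR = m_1 / \sqrt{m_2 - (m_1)^2}$. The following minimal Python sketch checks these derivatives against finite differences. The function names and the toy moment values are chosen here for illustration only and are not realistic daily return moments.

```python
import math


def sharpe_from_moments(m1, m2):
    """Sharpe ratio as a function of the first two raw moments of excess returns."""
    return m1 / math.sqrt(m2 - m1 ** 2)


def sharpe_gradient(m1, m2):
    """Analytic partial derivatives of SR with respect to m1 and m2.

    These are the diagonal entries of the upper and lower block of Lambda'.
    """
    v32 = (m2 - m1 ** 2) ** 1.5
    return m2 / v32, -m1 / (2.0 * v32)


def finite_difference_gradient(m1, m2, h=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    d_m1 = (sharpe_from_moments(m1 + h, m2) - sharpe_from_moments(m1 - h, m2)) / (2 * h)
    d_m2 = (sharpe_from_moments(m1, m2 + h) - sharpe_from_moments(m1, m2 - h)) / (2 * h)
    return d_m1, d_m2


# Toy moments chosen so that the variance m2 - m1^2 equals 1.
analytic = sharpe_gradient(0.5, 1.25)
numeric = finite_difference_gradient(0.5, 1.25)
```

For these toy values the analytic gradient is (1.25, -0.25), and the finite-difference approximation agrees with it up to numerical error.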


The multiple Sharpe ratio test that WYY use tests whether the true Sharpe ratios of all methods are simultaneously equal. They define the null hypothesis as:

$$H_0: SR_1 = SR_2 = \ldots = SR_K,$$

where $K$ is the total number of methods that are tested. Six methods are tested in this thesis, so $K = 6$. Based on $K = 6$, $H_0$ can be reformulated as:

$$H_0: \Pi \, SR = \begin{bmatrix} SR_1 - SR_2 \\ SR_2 - SR_3 \\ \vdots \\ SR_5 - SR_6 \end{bmatrix} = 0, \qquad \text{where:} \qquad \Pi = \begin{bmatrix} 1 & -1 & 0 & \ldots & 0 \\ 0 & 1 & -1 & \ldots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \ldots & 0 & 1 & -1 \end{bmatrix}$$

is of size $[(K-1) \times K]$. The alternative hypothesis is then defined as:

$$H_1: \Pi \, SR = \begin{bmatrix} SR_1 - SR_2 \\ SR_2 - SR_3 \\ \vdots \\ SR_5 - SR_6 \end{bmatrix} \neq 0.$$

In the case of i.i.d. data $H_0$ is rejected if:

$$T^2 = T_0 \, (\Pi \hat{SR})' \, (\Pi \hat{\Omega} \Pi')^{-1} \, (\Pi \hat{SR}) > \chi^2_{K-1}(\alpha),$$

where $T^2$ is the test statistic, $T_0$ is the length of the excess return series of all methods, $\hat{SR}$ is the vector of estimated Sharpe ratios, $\chi^2_{K-1}(\alpha)$ is the critical value of the chi-squared distribution with $K-1$ degrees of freedom at significance level $\alpha$, and $\hat{\Omega}$ is the estimator of the asymptotic covariance matrix. This asymptotic covariance matrix is calculated as in (12), by $\Lambda \Sigma \Lambda'$, where $\Lambda$ and $\Lambda'$ are calculated by replacing their entries with their sample estimates and $\Sigma$ is estimated by the sample covariance matrix of the excess returns and the squared excess returns. The significance level $\alpha$ is chosen to be 5% in this thesis. With these ingredients the test statistic and the critical value can be calculated. When $T^2 > \chi^2_{K-1}(\alpha)$, $H_0$ is rejected in favour of $H_1$, which means that at least one true Sharpe ratio is significantly different from the other true Sharpe ratios. If $H_0$ is not rejected, there is no significant evidence of a difference and the true Sharpe ratios are treated as equal.
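As a rough illustration of the i.i.d. version of the test, the sketch below computes $T^2$ with NumPy. The function name `multiple_sharpe_test`, the simulated returns and the hard-coded 5% critical value for 5 degrees of freedom (approximately 11.0705) are assumptions made for this example; they are not taken from the thesis or from WYY. The optional argument `sigma` is a hypothetical hook that allows a different estimate of $\Sigma$ to be supplied.

```python
import numpy as np

# 5% critical value of the chi-squared distribution with K - 1 = 5 degrees of freedom.
CHI2_CRIT_5DF_5PCT = 11.0705


def multiple_sharpe_test(excess, sigma=None):
    """Return (T2, SR_hat) for a (T0, K) array of excess returns.

    If sigma is None, Sigma is estimated by the sample covariance matrix of
    the excess returns and the squared excess returns (the i.i.d. case).
    """
    excess = np.asarray(excess, dtype=float)
    T0, K = excess.shape

    # Sample moments and estimated Sharpe ratios, SR = m1 / sqrt(m2 - m1^2).
    m1 = excess.mean(axis=0)
    m2 = (excess ** 2).mean(axis=0)
    sr_hat = m1 / np.sqrt(m2 - m1 ** 2)

    if sigma is None:
        # 2K x 2K sample covariance of [r, r^2].
        sigma = np.cov(np.hstack([excess, excess ** 2]), rowvar=False)

    # Lambda is K x 2K: two diagonal blocks with the delta-method derivatives.
    v32 = (m2 - m1 ** 2) ** 1.5
    lam = np.hstack([np.diag(m2 / v32), np.diag(-m1 / (2.0 * v32))])
    omega = lam @ sigma @ lam.T

    # Pi is (K-1) x K with rows (..., 1, -1, ...), so Pi @ SR stacks SR_j - SR_{j+1}.
    pi = np.eye(K - 1, K) - np.eye(K - 1, K, k=1)
    diff = pi @ sr_hat

    t2 = T0 * diff @ np.linalg.solve(pi @ omega @ pi.T, diff)
    return float(t2), sr_hat


# Made-up i.i.d. excess returns for K = 6 methods over T0 = 500 days.
rng = np.random.default_rng(0)
simulated = rng.normal(0.0005, 0.01, size=(500, 6))
t2, sr_hat = multiple_sharpe_test(simulated)
reject_h0 = t2 > CHI2_CRIT_5DF_5PCT
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse of $\Pi \hat{\Omega} \Pi'$ explicitly, which is a common numerical choice and not something prescribed by the test itself.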

If the data (the excess returns for every method) are not i.i.d., then $\Sigma$ can be consistently estimated by using the Newey-West covariance matrix. The


asymptotic Newey-West covariance matrix estimator is calculated as:

$$\hat{\Sigma} = \hat{V}_0 + \sum_{l=1}^{m} \left(1 - \frac{l}{m+1}\right) [\hat{V}_l + \hat{V}_l'], \qquad \hat{V}_l = \frac{1}{T_0} \sum_{t=l+1}^{T_0} h^*(r_t, \bar{r}_{T_0}) \, h^*(r_{t-l}, \bar{r}_{T_0})',$$

where $m = T_0^{1/3}$ and $h^*(\cdot)$ are the moment conditions, $h^*_k = r_k - \Upsilon_k$ $(k = 1, \ldots, K, K+1, \ldots, 2K)$.

3.5.5 Transaction costs

The transaction costs are also investigated for all methods and for all different WLs. The proxy used for the transaction costs in this thesis is the portfolio turnover, which is calculated from the absolute values of the rebalancing trades as follows:

$$\tau(t) = \sum_{i=1}^{N} |w_i(t) - w_i(t^-)|,$$

where $w_i(t^-)$ and $w_i(t)$ are the portfolio weights of asset $i$ before and after rebalancing at time $t$. To be able to compare the transaction costs of the methods with each other, the average portfolio turnover of every method $j$ is calculated as:

$$\bar{\tau}_j = \frac{1}{T_0} \sum_{t=1}^{T_0} \tau_j(t), \qquad (13)$$

when daily rebalancing is used. When weekly rebalancing is used, the average transaction costs for all methods and for all WLs are calculated as the average of the transaction costs over all times at which the portfolio is rebalanced, or in other words, the average of the weekly transaction costs.
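The turnover proxy can be illustrated with a short Python sketch. The helper `drift_weights` rests on an assumption that is not stated in the text, namely that the weights before rebalancing are the previous weights after they have drifted with the asset returns of the period. All function names and numbers are made up for this example.

```python
def drift_weights(weights, returns):
    """Weights just before rebalancing, assuming buy-and-hold drift over one period.

    This drift rule is an assumption of this sketch, not taken from the thesis.
    """
    grown = [w * (1.0 + r) for w, r in zip(weights, returns)]
    total = sum(grown)
    return [g / total for g in grown]


def turnover(w_before, w_after):
    """tau(t): sum over assets of |w_i(t) - w_i(t^-)|."""
    return sum(abs(after - before) for before, after in zip(w_before, w_after))


def average_turnover(turnovers):
    """Average of the turnovers over all rebalancing times, as in (13)."""
    return sum(turnovers) / len(turnovers)


# Two assets, equally weighted target, made-up returns for two periods.
target = [0.5, 0.5]
period_returns = [[0.10, -0.10], [0.00, 0.02]]
taus = []
for r in period_returns:
    before = drift_weights(target, r)      # weights after the market move
    taus.append(turnover(before, target))  # trade back to the target weights
avg_tau = average_turnover(taus)
```

In the first period the weights drift to 0.55 and 0.45, so trading back to equal weights gives a turnover of about 0.10.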

4 Data

In this section the data used in this thesis are described. The data consist of historical stock returns of 26 stocks from the Dow Jones (DJ) index, collected over a period of 10 years, where each year consists of 252 trading days. The stock returns are collected from 1-December-2004 until 1-December-2014, so the financial crisis of 2007-08 is included. The 3-Month Treasury-Bill (T-Bill) rate, which is used as the risk-free rate, is collected over the same 10 years. The stock returns data are downloaded from Yahoo Finance. Of the 30 stocks in the DJ index at 1-December-2014, 26 stocks are used in this thesis. Four of these 30 stocks are not used, because their data were incomplete due to randomly missing stock returns and/or because the particular stock had not yet been introduced to the DJ index at 1-Dec-2004. The following four stocks from the DJ index at 1-Dec-2014 have not been used, described by their
