Better than In-Sample Optimal Index Tracking with a Simple Heuristic


Monday, August 31, 2016


Nick W. Koning

S2038404

Abstract


1 Introduction

A stock index, like the Standard & Poor’s 500, is a statistical measure that summarizes the value of a stock market. Stock indexes are commonly constructed by taking a (capitalization) weighted average of a collection of stocks that is representative for the entire market. As they capture the overall performance of the market, actively managed funds are often benchmarked against indexes. However, it is still a much researched question whether actively managed funds outperform stock indexes in the long run, especially when considering management fees (see e.g. Kosowski et al., 2006; Barras et al., 2010; Fama and French, 2010; Pástor et al., 2015).

As it remains uncertain whether the benefits of an active investment strategy outweigh the costs, many investors would be satisfied by mimicking the performance of a stock index. However, stock indexes are mathematical constructs that are not traded on a stock market. So, to fully replicate the index, one would have to purchase stocks of the constituents in proportions that equal their weight in the index. This is costly for two reasons. First, stocks are only traded in integer quantities, so a large investment is required to ensure that the proportions sufficiently match the weights in the index. Second, there may be high transaction and administrative costs involved with managing such a large portfolio.


also been used by others (Xu et al., 2016; Ruiz-Torrubiano and Suárez, 2009). I will refer to this stylized formulation as the “Sparse Index Tracking Problem”. In this problem, the tracking error of a tracking portfolio is minimized, given that the portfolio consists of no more than k different constituents. This tracking error and the restrictions are discussed in detail in Section 2.

To construct such a tracking portfolio, historical data of the index and its constituents can be used. Using this data, a portfolio can be selected that minimizes the tracking error within the sample. Unfortunately, the presence of a sparsity constraint makes finding the optimal portfolio NP-hard. This implies that the problem cannot be easily solved to optimality in practical situations. Hence, many heuristics and algorithms have been proposed to find a good solution. In Section 3, I will provide a short overview of some of these attempts and discuss two approaches in-depth.

One characteristic that is shared by all previous work is the fixation on obtaining optimal in-sample performance. I propose a different view on the problem and argue that optimal in-sample performance does not guarantee great out-of-sample performance due to the non-stationarity of the index tracking problem. Since the best out-of-sample performance is the desired goal, I propose a simple heuristic that performs equally or better out-of-sample than the optimal in-sample portfolio and another state-of-the-art approach. This heuristic and its out-of-sample performance are discussed in Sections 3.4 and 3.5, respectively.


heuristic. Additionally, I consider the changes in constituent selection when varying the maximum number of constituents permitted in the portfolio. Finally, I look at how the selection of constituents may vary over time.

A possibility to further improve the out-of-sample performance is to use data weighting. By decreasing the importance of data in the distant past that may no longer be relevant, a tracking portfolio with better performance for the future may be selected. An empirical analysis comparing different weighting schemes is conducted in Section 5.


2 Index Tracking

In this section, the formulation of the index tracking problem is discussed. Additionally, various alternative approaches from the literature are considered.

As mentioned in the introduction, historical data on the stock index and its constituents is required to construct a good tracking portfolio. To formalize the index tracking problem, the data will be defined as follows. Let the n-vector y contain the returns of the index over a time period from 1 to n. Furthermore, let the n × p-matrix X denote the returns of the constituents, where each row is a moment in time and each column contains all observations for a specific constituent. Here, the return of constituent i at time period t is defined as

$$X_{t,i} = \frac{P_{t,i} - P_{t-1,i}}{P_{t,i}},$$

where $P_{t,i}$ is the price of one share of constituent i at time t. Finally, let the p-vector $\beta_t$ contain the portfolio weights. These portfolio weights represent the proportion of the value of a portfolio that is given to each constituent. In this section, we will only consider a fixed time frame, so $\beta_t$ is shortened to β. A moving time frame is considered in Section 4.3.


2.1 The Objective Function

The first component of the problem is the objective. As we are interested in tracking an index as closely as possible, an obvious objective is to minimize the difference between the true returns of the index and the returns of the portfolio. However, there are multiple ways to measure this “tracking error”. The traditional measure is the γ-norm, defined as

$$\|y - X\beta\|_\gamma = \left(\sum_{t=1}^{n} |y_t - X_t\beta|^\gamma\right)^{1/\gamma},$$

for some given vector of weights β, where $y_t$ and $X_t$ denote the tth element and tth row of y and X, respectively. Meade and Salkin (1990), Roll (1992) and Beasley et al. (2003) recommend selecting γ = 2, which is equivalent to the Euclidean norm and uses quadratic errors. As this is the most commonly used objective, it will be used to define the Sparse Index Tracking Problem.
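As a minimal sketch of this measure, assuming NumPy and a hypothetical helper name `tracking_error` (neither is part of the original text), the γ-norm of the tracking residual can be computed as follows. The toy returns and weights are made up purely for illustration.

```python
import numpy as np

def tracking_error(y, X, beta, gamma=2.0):
    """gamma-norm of the residual y - X beta (hypothetical helper)."""
    residual = np.asarray(y) - np.asarray(X) @ np.asarray(beta)
    return float(np.sum(np.abs(residual) ** gamma) ** (1.0 / gamma))

# Made-up toy data: 3 periods, 2 constituents.
y = np.array([0.010, -0.020, 0.015])
X = np.array([[0.012, 0.008],
              [-0.018, -0.022],
              [0.020, 0.010]])
beta = np.array([0.6, 0.4])

te_quadratic = tracking_error(y, X, beta)            # gamma = 2, Euclidean norm
te_absolute = tracking_error(y, X, beta, gamma=1.0)  # gamma = 1, absolute deviations
```

With γ = 2 the function coincides with the Euclidean norm of the residual, which is the choice used in the remainder of the text.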

Nevertheless, other formulations have also been considered. For example, Rudolf et al. consider four formulations of the tracking error using absolute deviations. Alternatively, Barro and Canestrelli (2009), Rohweder (1998) and Adcock and Meade (1994) include transaction cost as a penalty in the objective function in addition to the tracking error. This is also discussed by di Tollo and Maringer (2009). Related to this, Lobo et al. (2007) suggest minimizing the transaction cost, given a bound on the tracking error. Instead, the Sparse Index Tracking Problem captures these transaction costs in the sparsity constraint, which will be introduced in Section 2.3.


error and the excess return

$$\min_\beta \; \lambda \|y - X\beta\|_\gamma + (1 - \lambda)\,\iota'(y - X\beta),$$

where ι is a conformable vector of ones and $\lambda \in \mathbb{R}$. This is usually referred to as enhanced index tracking (Canakgoz and Beasley, 2009). Similarly, Fang and Wang (2005) use a bi-objective model including both the excess return and tracking error.

Additionally, Wang (1999) creates an objective function that tracks multiple indexes. Furthermore, Jansen and Van Dijk (2002) and Coleman et al. (2006) consider index tracking when using a portfolio constructed with a discrete number of stocks.

To conclude, there are many possible objectives for index tracking. But, in this thesis only the quadratic tracking error objective

$$\min_\beta \; \|y - X\beta\|_2$$

is considered. Solving this unconstrained problem is trivial. However, the index tracking problem features some constraints on the portfolio. These constraints are discussed in the next subsections.

2.2 The Portfolio Constraint

Let us now continue to the constraints. The first constraint to be considered is the portfolio constraint:

$$\iota'\beta = 1,$$

where ι is a p-vector of ones. This constraint ensures that the weights contained in β sum up to 1. This way, each element of β represents the share of total investment dedicated to its corresponding constituent. So, the ith constituent represents a share $\beta_i$ of the total value of the portfolio.

Note that this implies that, due to price changes, the portfolio has to be rebalanced continuously to ensure that the weights remain equal to the portfolio weights. In practice, such frequent rebalancing would lead to excessive transaction costs. Hence, the portfolios are often rebalanced after a fixed period of time or if the difference between the portfolio weights and desired weights becomes too large.

To readers unfamiliar with index tracking, this constraint may be recognised from the famous mean-variance portfolio selection problem (Markowitz, 1952), given by

$$\min_\beta \; \beta'X'X\beta \quad \text{s.t.} \quad \iota'\beta = 1, \quad m'\beta = \bar{m}, \qquad (1)$$

where $\bar{m}$ is some desired level of return and $m' = \frac{1}{n}\iota'X$. In fact, the (unconstrained) index tracking problem

$$\min_\beta \; \|y - X\beta\|_2 \quad \text{s.t.} \quad \iota'\beta = 1,$$

is a generalization of the mean-variance problem. To see this, let us rewrite the objective function

$$\|y - X\beta\|_2 = \sqrt{(y - X\beta)'(y - X\beta)}.$$

Note that minimizing this objective function is equivalent to minimizing $(y - X\beta)'(y - X\beta)$.

Writing out the brackets and removing the constant yields the optimization problem

$$\min_\beta \; \beta'X'X\beta - 2y'X\beta \quad \text{s.t.} \quad \iota'\beta = 1.$$

For the special case $y' = \frac{\lambda}{n}\iota'$, where λ is the optimal Lagrange multiplier associated with the second constraint in (1), the index tracking problem is equivalent to the mean-variance portfolio selection problem.
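As a worked step, using only the symbols defined above, the expansion behind "writing out the brackets" reads term by term:

$$(y - X\beta)'(y - X\beta) = y'y - 2\,y'X\beta + \beta'X'X\beta.$$

The two cross terms combine because $y'X\beta$ is a scalar and therefore equals $\beta'X'y$. The term $y'y$ does not depend on β, so dropping it leaves the minimizer unchanged.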

2.3 The Sparsity Constraint

The presence of a constraint that limits the number of constituents is essential to the index tracking problem. Without such a limitation, a full replication of the index would be the outcome of the problem. However, fully replicating an index comes with high costs.

By introducing a sparsity constraint, the cost of tracking the index may be reduced. In practice, it is seen that even index funds and exchange traded funds that can pool overhead costs often own only a fraction of the constituents of the underlying index (Rudd, 1980; Fuller, 2008). The costs involved with tracking an index are twofold: the transaction cost of rebalancing the portfolio and the administrative cost. I will discuss the former in more depth.


there are often transaction costs involved with every trade, rebalancing is much costlier when the portfolio holds a large number of different constituents. This holds especially when small capitalization stocks are involved, due to their low liquidity (Keim and Madhavan, 1997; Keim, 1999).

Additionally, some constituents may be dropped from the index and replaced by others. Under full replication this would always force a trade, while under partial replication there will only be a transaction if the dropped constituent was included in the portfolio.

Although it is possible to explicitly include the rebalancing cost, one would have to model the rebalancing process and have accurate data on transaction costs. Using a sparsity constraint as a proxy yields a far more stylized model. The model including this constraint is given by

$$\min_\beta \; (y - X\beta)'(y - X\beta) \quad \text{s.t.} \quad \iota'\beta = 1, \quad \|\beta\|_0 \le k,$$

where $\|\cdot\|_0$ denotes the number of non-zero elements of its argument and $k \in \mathbb{N}$ is a parameter. This leaves one final constraint to allow the formulation of the Sparse Index Tracking Problem.


may have a strong negative correlation with many other constituents. Another reason can be that it is caused by the small sample size.

Jagannathan and Ma (2003) show that placing non-negativity restrictions on portfolio weights may improve performance in finite samples, even when there are negative weights in the true population. Additionally, as portfolios with short positions are difficult to implement, portfolio managers often self-impose no-short-positions constraints (Jagannathan and Ma, 2003). Hence, there exist arguments from both performance and practice to include the no-shorting constraint

$$\beta \ge 0,$$

in index tracking.

The inclusion of a no-shorting constraint in addition to the portfolio constraint (see Section 2.2) limits the maximum size of individual weights to 1. However, it may also be worthwhile to consider further restrictions on the size of the individual weights. In practice, a solution with a tracking portfolio that assigns a very large weight to a single share would likely be mistrusted and considered a fluke. By placing a “ceiling” constraint M on the maximum size of every weight, this problem may be avoided. Furthermore, it is also possible to place a minimum or “floor” constraint on the non-zero weights, which is considered by Maringer and Oyewumi (2007).


may also impose sparsity as it sets many otherwise negative weights to exactly zero.

Another benefit of the floor and ceiling constraints is the reduction of the feasible region of the problem. By choosing tighter bounds on the permissible values of the weights, one can significantly boost the speed of the Mixed-Integer Optimization software. This is further discussed in Section 3.2.

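Including a no-shorting constraint and a ceiling constraint brings us to the final formulation of the Sparse Index Tracking Problem:

$$\min_\beta \; (y - X\beta)'(y - X\beta) \quad \text{s.t.} \quad \iota'\beta = 1, \quad \|\beta\|_0 \le k, \quad 0 \le \beta \le M.$$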

3 Approaches to index tracking

Many different approaches have been considered for index tracking. In this section I will first discuss some of the previous work done to solve the index tracking problem. After this, mixed integer quadratic optimization is treated in more detail. Furthermore, a recent approach based on non-monotone projected gradient descent (Xu et al., 2016) is explained in detail. Loosely based on this approach, I then propose a simple heuristic for the Sparse Index Tracking Problem. Finally, I compare the out-of-sample performance of this heuristic with the mixed-integer quadratic optimization and non-monotone projected gradient descent based methods.

3.1 Previous work

Due to the many different formulations and interpretations of the index tracking problem, there have been very diverse attempts at solving it. In this subsection I will give a short overview of some of these attempts.


effort. For an overview of meta heuristics in index tracking, see di Tollo and Maringer (2009).

In a regularization based approach, Wu et al. (2014) apply the non-negative lasso to index tracking. Similarly, Wu et al. (2014) use the non-negative elastic net. However, both papers make no mention of a portfolio constraint, which could cause conflict with the ℓ1-penalization (Chen et al., 2013; Fastrich et al., 2014). It is also not clear whether the penalty parameters were selected in such a way that the portfolio constraint is enforced. In an alternative regularization based approach, Fastrich et al. (2014) apply q-norm regularization to promote sparsity.

Another approach which has only recently become feasible is Mixed-Integer (Quadratic) Optimization (MIO) as a primary method to find a solution. This is discussed and implemented by Canakgoz and Beasley (2009). In the next subsection I will discuss MIO in more detail.

3.2 Mixed-Integer Optimization


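The Mixed-Integer Quadratic Optimization formulation for the Sparse Index Tracking Problem is given as follows,

$$\begin{aligned}
\min_\beta \;\; & \beta'X'X\beta - 2\,y'X\beta \\
\text{s.t.} \;\; & \iota'\beta = 1, \\
& \iota'z \le k, \\
& 0 \le \beta_i \le M z_i, \quad i = 1, \dots, p, \\
& z \in \{0, 1\}^p.
\end{aligned} \qquad (2)$$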

This formulation can be entered into a solver like Gurobi, which can solve this problem to optimality. Unfortunately, although a good (and perhaps even optimal) solution is often found relatively quickly, it may take very long to confirm its optimality. To give an example of the scale: an optimal solution found in a few seconds may take hours to be confirmed optimal. Fortunately, it is also possible to simply terminate the solver after a specified time. In addition to the best known feasible solution, it will also report its currently known bounds on the optimal value of the solution. However, these bounds are not of much use to the index tracking problem.


solution. Unfortunately it is impossible to know for sure whether the optimal solution is excluded unless the optimal solution is known or M = 1.

However, it is unlikely in practice that the weight of a single constituent would be substantially higher than any of the others. As k increases, the unit mass of available weight is allowed to be spread over more variables, so the maximum weight is expected to decline in k. Of course, some of the weights may still be higher than others, but I found that setting $M = \max(2/k,\ 0.1)$ gave reliable results.

In Section 3.5, MIO is applied to a series of empirical datasets and used as a benchmark for the two other algorithms.

3.3 Non-monotone Projected Gradient Descent

This subsection discusses the non-monotone projected gradient descent algorithm proposed by Xu et al. (2016) applied to the Sparse Index Tracking Problem. To my knowledge, their algorithm yields the best known out-of-sample performance for the Sparse Index Tracking Problem. In this section I will explain this approach and present their algorithm. The work of Xu et al. (2016) is somewhat related to Bertsimas et al. (2016) who consider subset selection in general, and I will follow most of their notation.


Problem by

$$\Delta_k^M = \left\{ \beta : \|\beta\|_0 \le k, \;\; \iota'\beta = 1, \;\; 0 \le \beta \le M \right\}.$$

Next, consider again the quadratic objective function for index tracking

$$f(\beta) = (y - X\beta)'(y - X\beta).$$

This function has a Lipschitz continuous gradient, which implies that there exists an ℓ > 0 s.t.

$$\|\nabla f(\beta) - \nabla f(\tilde{\beta})\| \le \ell \|\beta - \tilde{\beta}\|.$$

Now suppose we have some current solution β, then it is possible to construct an upper bound of f(η): for a convex function f(β) ≥ 0 with a Lipschitz continuous gradient we have that for all L > ℓ

$$f(\eta) \le Q_L(\eta, \beta) \equiv f(\beta) + \frac{L}{2}\|\eta - \beta\|_2^2 + \nabla f(\beta)'(\eta - \beta), \qquad (3)$$

which gives us an upper bound for f(η). Note that this inequality holds with equality if β = η. This upper bound may then be minimized given the constraints of the index tracking problem. However, before this is done some more notation has to be introduced.
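The bound in (3) can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original method: it uses NumPy, randomly generated toy data, and the fact that for the quadratic objective the Hessian is 2X'X, so twice its largest eigenvalue of X'X serves as a Lipschitz constant ℓ of the gradient. The function names `f`, `grad_f` and `Q` are chosen here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 8                                   # toy dimensions (assumption)
X = rng.normal(scale=0.02, size=(n, p))        # made-up constituent returns
y = X @ np.full(p, 1.0 / p) + rng.normal(scale=0.001, size=n)

def f(beta):
    """Quadratic tracking objective (y - X beta)'(y - X beta)."""
    r = y - X @ beta
    return float(r @ r)

def grad_f(beta):
    """Gradient of f: -2 X'(y - X beta)."""
    return -2.0 * X.T @ (y - X @ beta)

# The Hessian of f is 2 X'X, so 2 * lambda_max(X'X) is a Lipschitz constant.
ell = 2.0 * np.linalg.eigvalsh(X.T @ X)[-1]

def Q(eta, beta, L):
    """Quadratic upper bound Q_L(eta, beta) from (3)."""
    d = eta - beta
    return f(beta) + 0.5 * L * float(d @ d) + float(grad_f(beta) @ d)

beta = rng.dirichlet(np.ones(p))               # two arbitrary points on the simplex
eta = rng.dirichlet(np.ones(p))
L = 1.1 * ell                                  # any L > ell
gap = Q(eta, beta, L) - f(eta)                 # non-negative if the bound holds
```

For this objective the gap equals (L/2)‖η − β‖² − (η − β)'X'X(η − β), which is non-negative whenever L is at least ℓ and is zero at η = β.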


is introduced by Xu et al. (2016) to correct weights that don’t fulfil the floor and ceiling constraints. If a weight is below the floor value 0 of the no-shorting constraint, the weight is set equal to 0. Alternatively, if it is higher than the ceiling constraint value M , it is set equal to M .

Furthermore, Xu et al. (2016) propose a simple way to find

$$\lambda^* = \Big\{ \lambda : \sum_{i \in I_k^*} \Pi_{[0,M]}(\beta_i + \lambda) = 1 \Big\},$$

where $I_k^*$ contains the indices of the k largest elements of β. In words, λ* is the value that should be added to each of the k largest coefficients to ensure that the output of $\Pi_{[0,M]}(\beta_i + \lambda^*)$ satisfies the portfolio constraint.

Next, the hard-thresholding function $H_k^M(\beta)$, with elements

$$H_k^M(\beta)_i = \begin{cases} \Pi_{[0,M]}(\beta_i + \lambda^*) & \text{if } i \in I_k^*, \\ 0 & \text{otherwise}, \end{cases}$$

can be introduced. This function returns a vector with every element that corresponds to one of the k highest elements in β set equal to $\Pi_{[0,M]}(\beta_i + \lambda^*)$, and all other elements equal to zero. This imposes the sparsity constraint.


constraint set $\Delta_k^M$ is equal to

$$\hat{\beta}_L = \arg\min_{\eta \in \Delta_k^M} Q_L(\eta, \beta) = H_k^M\!\left(\beta - \tfrac{1}{L}\nabla f(\beta)\right).$$

For a derivation of this result see Bertsimas et al. (2016). Applying non-monotone projected gradient descent (NPG) (Birgin et al., 2000) to $H_k^M\!\left(\beta - \tfrac{1}{L}\nabla f(\beta)\right)$, the upper bound given in (3) can be minimized. Let parameters $L_{\max} > L_{\min} > 0$, τ > 1, ε > 0, c > 0, S > 1 and $N \in \mathbb{N}^+$ be given. Given data X, y, $\Delta_k^M$, the NPG based algorithm is then given in Algorithm 1.

Data: X, y, $\Delta_k^M$, N, τ, ε, $L_{\min}$, $L_{\max}$, S
Result: β*
Set n equal to the number of elements in y; k = 1;
Randomly choose $\beta_0 \in \Delta_k^M$;
while s < S and $\frac{1}{n}\|y - X\beta_s\| - \frac{1}{n}\|y - X\beta_{s-1}\| < \varepsilon$ do
    Randomly choose $L_j \in [L_{\min}, L_{\max}]$;


This non-monotone projected gradient descent algorithm is equivalent to projected gradient descent for N = 1, requiring a monotone improvement in every step. However, setting N > 1 permits a worse solution to be accepted, as long as its objective value is better than the worst of the past N solutions.

Bertsimas et al. (2016) use a similar approach for the regular subset selection problem without floor/ceiling constraints or a portfolio constraint. Using the solution of their algorithm as a warm start for a mixed-integer optimization solver, they find a substantial speed-up compared to not providing a warm start. Unfortunately, using the solution from the algorithm by Xu et al. (2016) as a warm start for an MIO solver does not provide any noticeable speed-up. The reason for this is that its in-sample performance is relatively bad, causing the solution to be discarded immediately.


3.4 The Non-Negative Least Squares Heuristic

In this subsection, I propose a simple heuristic based on Non-Negative Least Squares (NNLS) as an alternative approach to the Sparse Index Tracking Problem. The idea behind the heuristic is to approach the sparse index tracking problem to find the most robust solution instead of the optimal in-sample solution. The heuristic can be computed in three steps.

The first step is to perform NNLS on the data, which yields the least squares solution given that all parameters are non-negative:

$$\hat{\beta}_{NNLS} = \arg\min_{\beta \ge 0} \; (y - X\beta)'(y - X\beta).$$

In an index tracking context, this imposes the no-shorting constraint. In addition, NNLS has the benefit that it is feasible even if p ≫ n and that it produces sparse solutions. Similar to the Lasso, it will select at most min(n, p) non-zero variables; however, this should be more than sufficient in practical index tracking applications. Finally, it may be efficiently computed using an algorithm proposed by Lawson and Hanson (1995), which is implemented in the R-package ‘nnls’ (Mullen and van Stokkum, 2012).
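The first step can be sketched in Python, assuming SciPy's `scipy.optimize.nnls` as a stand-in for the R package used in the thesis. The data below are randomly generated for illustration only, and the variable names are not from the original text.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)
n, p = 60, 20                                  # toy dimensions (assumption)
X = rng.normal(scale=0.02, size=(n, p))        # made-up constituent returns
true_w = np.zeros(p)
true_w[:5] = 0.2                               # made-up index composition
y = X @ true_w + rng.normal(scale=0.001, size=n)

# nnls returns the non-negative least squares solution and the residual 2-norm.
beta_nnls, residual_norm = nnls(X, y)
n_nonzero = int(np.count_nonzero(beta_nnls))   # size of the selected support
```

All entries of `beta_nnls` are non-negative by construction, and `n_nonzero` shows how many constituents the fit retains before any cardinality restriction is applied.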


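$\hat{\beta}^k_{NNLS}$, with elements

$$\hat{\beta}^k_{NNLS,i} = \begin{cases} \hat{\beta}_{NNLS,i} & \text{if } i \in I^k, \\ 0 & \text{otherwise}, \end{cases}$$

where $\hat{\beta}_{NNLS,i}$ denotes the ith element of $\hat{\beta}_{NNLS}$.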

Now the vector $\hat{\beta}^k_{NNLS}$ may not conform to the portfolio and ceiling constraints as described in Sections 2.2 and 2.4, respectively. In other words, its elements may not add up to 1, and some individual weights may exceed M. In the final step, this is resolved by applying the function $\Pi_{[0,M]}(\cdot)$ as defined in (4) to the non-zero elements of $\hat{\beta}^k_{NNLS}$. The outcome of the NNLS Heuristic is then given by $\tilde{\beta}^{k,M}_{NNLS}$ with elements

$$\tilde{\beta}^{k,M}_{NNLS,i} = \begin{cases} \Pi_{[0,M]}(\hat{\beta}^k_{NNLS,i} + \lambda^*) & \text{if } i \in I^k, \\ 0 & \text{otherwise}, \end{cases}$$

where λ* is selected to ensure that the coefficients add up to 1:

$$\lambda^* = \Big\{ \lambda : \sum_{i \in I^k} \Pi_{[0,M]}(\hat{\beta}^k_{NNLS,i} + \lambda) = 1 \Big\}.$$


Data: X, y, k, M
Result: β*
begin
    $\hat{\beta}_{NNLS} = \arg\min_{\eta \ge 0} \{\|y - X\eta\|_2^2\}$;
    Let $I^k$ contain the indices of the k largest elements of $\hat{\beta}_{NNLS}$;
    $\lambda^* = \{\lambda : \sum_{i \in I^k} \Pi_{[0,M]}(\hat{\beta}^k_{NNLS,i} + \lambda) = 1\}$;
    Let p be equal to the number of columns of X;
    for i ∈ {1, . . . , p} do
        $\beta^*_i = \Pi_{[0,M]}(\hat{\beta}_{NNLS,i} + \lambda^*)$ if $i \in I^k$, and 0 otherwise;
    end
    $\beta^* = (\beta^*_1, \dots, \beta^*_p)'$
end
Algorithm 2: NNLS Heuristic

Instead of applying many projected gradient descent steps as proposed by Xu et al. (2016), this algorithm requires just a single run of non-negative least squares. In contrast to their approach that requires 7 parameters to be chosen beforehand, the NNLS Heuristic requires no tuning of parameters.
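The three steps of Algorithm 2 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the thesis implementation: it uses SciPy's `scipy.optimize.nnls` instead of the R package, the helper names `find_shift` and `nnls_heuristic` are hypothetical, and λ* is located by plain bisection, which is a choice made here and not necessarily the procedure of Xu et al. (2016). The data are randomly generated.

```python
import numpy as np
from scipy.optimize import nnls

def find_shift(values, M, tol=1e-12):
    """Bisection for lambda* such that sum_i clip(v_i + lambda, 0, M) = 1.

    Assumption: k * M >= 1, otherwise no feasible shift exists.
    """
    if len(values) * M < 1.0:
        raise ValueError("infeasible: k * M < 1")
    # The clipped sum is 0 at lo and k * M at hi, and non-decreasing in between.
    lo, hi = -values.max(), M - values.min()
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.clip(values + mid, 0.0, M).sum() < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def nnls_heuristic(X, y, k, M):
    """NNLS fit, keep the k largest coefficients, shift and clip to sum to 1."""
    beta_nnls, _ = nnls(X, y)
    support = np.argsort(beta_nnls)[-k:]        # indices of the k largest coefficients
    lam = find_shift(beta_nnls[support], M)
    beta = np.zeros(X.shape[1])
    beta[support] = np.clip(beta_nnls[support] + lam, 0.0, M)
    return beta

rng = np.random.default_rng(2)
n, p = 80, 30                                   # toy dimensions (assumption)
X = rng.normal(scale=0.02, size=(n, p))         # made-up constituent returns
w = rng.dirichlet(np.ones(p))                   # made-up index weights
y = X @ w + rng.normal(scale=0.001, size=n)

k = 5
M = max(2.0 / k, 0.1)                           # ceiling as suggested in Section 3.2
beta_star = nnls_heuristic(X, y, k, M)
```

The returned vector has at most k non-zero entries, each between 0 and M, and the entries sum to 1 up to the bisection tolerance.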


fixation on in-sample optimality due to overfitting on the sample.

I hypothesize that selecting the optimal set of portfolio constituents in-sample makes the solution less generalizable to future observations. So, I instead prioritize the selection by picking the constituents with the largest coefficients to obtain a robust selection. In the next subsection, this heuristic is compared with the MIO and NPG based approaches on empirical data.

3.5 Comparison

In this subsection, the performance of the NNLS-algorithm is compared with the performance of MIO and NPG. For this comparison, data from the OR-library (Beasley, 1990; Beasley et al., 2003; Canakgoz and Beasley, 2009) is used. This data consists of 290 weekly observations of the returns of 8 indexes and their varying number of constituents. The number of constituents varies from 31 in the smallest dataset to 2151 in the largest dataset. As in Ruiz-Torrubiano and Suárez (2009) and Xu et al. (2016), the data is split into two sets of 145 observations. The first 145 observations are used to fit the model. The second 145 observations are used to measure the out-of-sample performance. This performance is measured by the mean-squared error (MSE).
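The evaluation protocol described above can be sketched as follows, assuming NumPy and taking the MSE to be the mean of the squared tracking residuals. The helper names `split_in_out` and `mse` are hypothetical, and the data are randomly generated stand-ins for the OR-library series.

```python
import numpy as np

def split_in_out(y, X, n_in):
    """Split the sample chronologically: first n_in rows in-sample, rest out-of-sample."""
    return (y[:n_in], X[:n_in]), (y[n_in:], X[n_in:])

def mse(y, X, beta):
    """Mean of the squared tracking residuals (assumed MSE definition)."""
    r = y - X @ beta
    return float(np.mean(r ** 2))

rng = np.random.default_rng(3)
T, p = 290, 10                                  # 290 weekly observations, toy p
X = rng.normal(scale=0.02, size=(T, p))         # made-up constituent returns
y = X @ np.full(p, 1.0 / p) + rng.normal(scale=0.001, size=T)

(y_in, X_in), (y_out, X_out) = split_in_out(y, X, 145)
beta = np.full(p, 1.0 / p)                      # placeholder portfolio for illustration
mse_in = mse(y_in, X_in, beta)
mse_out = mse(y_out, X_out, beta)
```

In the comparison, `beta` would be replaced by the portfolio fitted on the first half only, so that `mse_out` is a genuine out-of-sample figure.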

The values for k were chosen in the same way as Xu et al. (2016) and Ruiz-Torrubiano and Suárez (2009), with the exception of the S&P500 dataset. For this dataset, the values they chose were so high that the cardinality constraint was often not binding.


the six smallest datasets were completed on a 3.40 GHz 4-core Intel i5-4670K CPU, while the computations on the two largest datasets were made on a 3.50 GHz 4-core Intel i5-4690K CPU. Both of these are standard home computer processors.

The MIO results were obtained by using Gurobi 6.5 through its R interface. For each value of k, the MIO-solver was terminated if optimality was confirmed or when the runtime exceeded 1800 seconds. The time limit of 1800 seconds was chosen because of practicality, but it must be noted that on the five smallest datasets, no better solution was found after 200 seconds in any case. For the MIO algorithm, the very wide ceiling constraint $M = \max(2/k,\ 0.1)$ was chosen. Although it is possible that this caused a cut-off of an optimal solution, this seems highly unlikely.

The NPG results were obtained using the same settings as used by Xu et al. (2016): $L_{\min} = 10^{-8}$, $L_{\max} = 10^{8}$, τ = 2, $c = 10^{-4}$, $\varepsilon = 10^{-6}$, S = 10000 and N = 3 for the smallest five datasets, while N = 5 was used for the largest three datasets. However, instead of using a single run with randomized initialization of $\beta_0$ and $L_0$ as suggested by Xu et al. (2016), I performed 100 runs with random initialization and averaged the resulting mean-squared error. This adjustment was chosen because using a single run yielded unstable outcomes. It has to be noted that the performance of the NPG algorithm was found to be substantially worse than reported by Xu et al. (2016), when averaging over 100 runs. Even when reducing the stopping criterion ε to the much stricter $\varepsilon = 10^{-12}$, the performance did not change noticeably.


Table 1: In-sample (in) and out-of-sample (out) MSE for the five smallest datasets; "opt" marks MIO solutions with confirmed optimality.

                                 MIO       MIO       NPG       NPG      NNLS      NNLS
Index               k             in       out        in       out        in       out
Hang Seng (p=31)    5  opt   4.13e-5   7.22e-5   6.90e-5   6.81e-5   6.50e-5   4.36e-5
                    6  opt   3.03e-5   4.76e-5   4.91e-5   5.38e-5   4.56e-5   4.77e-5
                    7  opt   2.37e-5   3.81e-5   3.53e-5   3.80e-5   3.03e-5   3.04e-5
                    8  opt   1.91e-5   2.90e-5   2.94e-5   3.12e-5   2.57e-5   2.98e-5
                    9  opt   1.62e-5   2.58e-5   2.23e-5   2.66e-5   2.03e-5   2.07e-5
                   10  opt   1.35e-5   2.06e-5   1.91e-5   2.26e-5   1.80e-5   2.49e-5
DAX (p=85)          5  opt   2.21e-5  10.18e-5   5.35e-5  18.45e-5   3.58e-5  10.14e-5
                    6  opt   1.76e-5   8.94e-5   6.39e-5  12.24e-5   2.64e-5   8.87e-5
                    7  opt   1.37e-5   8.46e-5   5.53e-5  12.49e-5   2.80e-5   8.86e-5
                    8  opt   1.11e-5   7.93e-5   3.26e-5  12.83e-5   2.59e-5   8.12e-5
                    9  opt   0.92e-5   7.78e-5   2.43e-5  12.37e-5   2.45e-5   7.94e-5
                   10        0.81e-5   7.48e-5   2.84e-5  10.95e-5   2.16e-5   7.62e-5
FTSE (p=89)         5  opt   6.42e-5  15.81e-5  12.09e-5  13.60e-5  17.34e-5  14.94e-5
                    6  opt   4.96e-5  11.19e-5   8.92e-5  11.13e-5  13.07e-5  12.68e-5
                    7  opt   3.83e-5   9.07e-5   7.42e-5   9.40e-5  14.14e-5  10.15e-5
                    8  opt   2.90e-5   9.66e-5   6.20e-5   8.11e-5  10.86e-5   7.75e-5
                    9        2.49e-5   8.59e-5   5.08e-5   6.70e-5   8.31e-5   6.48e-5
                   10        2.18e-5   8.01e-5   4.23e-5   6.24e-5   6.51e-5   6.31e-5
S&P 100 (p=98)      5  opt   4.50e-5  11.42e-5  10.51e-5  15.33e-5   6.89e-5  12.80e-5
                    6  opt   3.37e-5  10.07e-5   8.22e-5   12.3e-5   5.02e-5  10.47e-5
                    7        2.76e-5   7.80e-5   6.76e-5  10.79e-5   5.47e-5   7.01e-5
                    8        2.27e-5   6.76e-5   5.24e-5   9.22e-5   4.27e-5   5.83e-5
                    9        1.94e-5   5.91e-5   4.39e-5   7.89e-5   3.39e-5   5.37e-5
                   10        1.66e-5   5.55e-5   3.67e-5   6.62e-5   3.92e-5   4.74e-5
Nikkei (p=225)      5        5.46e-5  16.29e-5  11.03e-5  18.10e-5  18.00e-5  15.08e-5
                    6        5.16e-5   7.85e-5   8.23e-5  14.99e-5  18.91e-5  15.18e-5
                    7        4.17e-5   9.87e-5   6.85e-5  13.46e-5  12.63e-5   8.65e-5
                    8        2.80e-5   9.13e-5   5.69e-5  11.85e-5  12.46e-5  10.18e-5
                    9        2.69e-5   6.58e-5   4.65e-5  10.21e-5  10.99e-5   8.92e-5
                   10        2.39e-5  10.44e-5   4.01e-5   9.33e-5   7.10e-5   6.67e-5


The results for the smallest 5 datasets are reported in Table 1. Indicated in bold is the best out-of-sample performance for each setting. Note that when counting the number of settings of best out-of-sample performance, the NNLS heuristic substantially outperforms the NPG algorithm and slightly outperforms the MIO algorithm. It even outperforms MIO out-of-sample in many cases where MIO achieves confirmed in-sample optimality. Additionally, even in cases where the NNLS heuristic does not have the best performance, it is almost always very close to the best performance.

Although the NNLS heuristic regularly outperforms the MIO approach out-of-sample, its in-sample performance is considerably worse. This suggests that the generalizability of the MIO solutions to the second half of the data suffers from overfitting on the first half of the data. In contrast, the NNLS heuristic finds very robust solutions.

A similar performance is found for the largest three datasets, as reported in Table 2. The NPG algorithm does perform better than MIO on these larger datasets. However, the NNLS algorithm still finds the best solution in most cases. In two cases, MIO finds a solution that suffers from an extremely high out-of-sample tracking error, despite good in-sample performance. These high out-of-sample errors were not caused by an error in the computation. Instead, the algorithm seems to have constructed a highly unfortunate portfolio.


NP-hard and can be solved in less than a second. This also explains why the performance for k = 150 and k = 200 is identical. However, despite the confirmed in-sample optimality in these four cases, the out-of-sample tracking error is outperformed by the other approaches.

Table 2: In-sample (in) and out-of-sample (out) MSE for the three largest datasets; "opt" marks MIO solutions with confirmed optimality.

                                        MIO          MIO        NPG        NPG       NNLS       NNLS
Index                     k              in          out         in        out         in        out
S&P 500 (p=457)          10         2.47e-5     22.46e-5    9.37e-5   35.19e-5    7.39e-5   18.83e-5
                         20         1.04e-5     18.81e-5    4.01e-5   21.38e-5    2.81e-5   13.07e-5
                         30         0.42e-5     13.00e-5    2.48e-5   16.09e-5    1.46e-5   13.61e-5
                         40         0.20e-5     11.76e-5    1.98e-5   14.58e-5    0.74e-5   11.46e-5
                         50         0.12e-5     11.18e-5    1.57e-5   12.74e-5    0.42e-5   12.22e-5
                         60         0.08e-5     12.15e-5    1.43e-5   12.68e-5    0.30e-5   11.26e-5
Russell 2000 (p=1318)    80         0.05e-5  41318.97e-5    0.53e-5   29.92e-5    1.98e-5   33.70e-5
                         90         0.01e-5     28.79e-5    0.38e-5   29.07e-5    2.10e-5   33.34e-5
                        100         0.01e-5     24.18e-5    0.28e-5   22.29e-5    1.90e-5   33.04e-5
                        120         0.00e-5   4254.96e-5    0.42e-5   18.09e-5    1.82e-5   32.33e-5
                        150  opt    0.00e-5     28.95e-5    0.03e-5   22.63e-5    2.23e-5   32.89e-5
                        200  opt    0.00e-5     28.95e-5    0.19e-5   23.40e-5    2.23e-5   32.89e-5
Russell 3000 (p=2151)    80         0.01e-5     16.41e-5    0.35e-5   15.75e-5    0.84e-5   13.15e-5
                         90         0.01e-5     19.98e-5    1.14e-5   14.22e-5    0.80e-5   13.19e-5
                        100         0.01e-5     16.70e-5    0.27e-5   13.64e-5    0.88e-5   13.11e-5
                        120         0.00e-5     15.55e-5    0.43e-5   14.24e-5    0.36e-5   13.58e-5
                        150  opt    0.00e-5     14.68e-5    0.09e-5   12.54e-5    0.39e-5   13.49e-5
                        200  opt    0.00e-5     14.68e-5    0.28e-5   12.21e-5    0.39e-5   13.49e-5


4 Analysis of Selected Portfolios

In this section, the selection of the index tracking portfolios is analysed. In the first subsection, the MIO and NNLS based approaches are compared by investigating whether they select similar constituents. Next, the second subsection analyses the influence of a change in the maximum number of constituents k on the selected set of constituents. In the final subsection, I consider the selection of the tracking portfolios over time. By sliding a moving window over the data, I examine how $\beta_t$ changes over time. To my knowledge, none of these topics has been considered before.

4.1 Selection difference NNLS and MIO

This subsection discusses the overlap between the variables selected by the NNLS and MIO approaches. The NPG algorithm was excluded because its selection varied substantially over different random initializations. As was explained in Section 3.2, the MIO approach attempts to select the optimal subset of k variables and their portfolio weights to find the best possible in-sample fit. On the other hand, the NNLS Heuristic simply selects the variables that have the largest coefficients in the non-negative least squares solution. Although these two approaches are different in nature, they do have substantial overlap in the variables that are selected.


(DAX100), the FTSE 100 (FTSE100) and the Standard and Poor’s 100 (SP100) datasets. This is expected as the constituents with the largest coefficients in the NNLS solution are likely to describe the index well in-sample.

On the largest two datasets, the Russel 2000 (RSL2000) and Russel 3000 (RSL3000), the overlap is considerably smaller. A reason for this could be that these large datasets contain far more constituents than observations, giving the MIO approach many options to pick from to find the optimal in-sample fit. One outlier is the Nikkei 225 (NK225) dataset, which even shows zero overlap for some settings. Finally, no definite pattern emerges that correlates with k.

From these results it can be concluded that although the overlap is sub-stantial in some of the smaller datasets, it is clear that both approaches yield different selections of variables.

 k   HS50  DAX100  FTSE100  SP100  NK225  |    k  SP500  |    k  RSL2000  RSL3000
 5   0.60    0.60     0.40   0.60   0.00  |   10   0.30  |   80     0.22     0.29
 6   0.83    0.67     0.33   0.67   0.00  |   20   0.50  |   90     0.21     0.21
 7   0.86    0.57     0.43   0.57   0.14  |   30   0.63  |  100     0.24     0.29
 8   0.88    0.62     0.50   0.62   0.25  |   40   0.60  |  120     0.20     0.24
 9   0.78    0.56     0.56   0.56   0.11  |   50   0.70  |  150     0.26     0.36
10   0.80    0.60     0.60   0.50   0.20  |   60   0.80  |  200     0.26     0.36


4.2 Selection stability

A feature of the NNLS Heuristic is its stability in selection when varying k. I measure this by looking at the proportion of constituents that is retained when k is increased. As the NNLS Heuristic selects the constituents with the k largest coefficients, the set of constituents selected for k = κ will always contain the constituents selected for k < κ, ∀κ ∈ N+. For example, if k is increased by 1, it retains the current set of constituents in the portfolio and adds an additional constituent. This is a desirable property for an investor who may be interested in changing the number of constituents in an existing portfolio.
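The nesting property described above can be illustrated with a minimal sketch. It covers only the selection step and assumes the non-negative least squares coefficients have already been computed; the function name `select_top_k`, the example vector `beta` and the choice to skip zero coefficients are illustrative assumptions, not taken from the thesis.

```python
def select_top_k(coefficients, k):
    """Return the indices of the k largest strictly positive coefficients.

    Assumption: zero coefficients are never selected, since they carry
    no weight in the non-negative least squares solution.
    """
    positive = [i for i, c in enumerate(coefficients) if c > 0]
    ranked = sorted(positive, key=lambda i: coefficients[i], reverse=True)
    return ranked[:k]


# Hypothetical NNLS solution for seven constituents (illustrative values).
beta = [0.00, 0.31, 0.05, 0.22, 0.00, 0.18, 0.24]

# Each selection contains the previous one, so raising k only adds names.
previous = set()
for k in range(1, 6):
    current = set(select_top_k(beta, k))
    assert previous <= current
    previous = current

print(select_top_k(beta, 3))  # indices ordered by coefficient size
```

Because the ranking does not depend on k, the selections are nested by construction, which is what makes the retention rate of this selection rule equal to one.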

On the other hand, using MIO, the set is re-selected for each value of k to find the optimal constituents for that particular setting. As a result, the portfolio for k = κ may have no overlap with the portfolio for k = κ + 1, meaning that none of the constituents in the current portfolio would be retained when increasing k. Such an outcome is highly undesirable for an investor, who could incur large transaction costs to increase the number of constituents in the portfolio.

In practice, the MIO solution often has a high retention rate of the existing portfolio when increasing k, but this is not guaranteed. Table 4 shows the retention rates for MIO on the OR-library datasets for multiple values of k.


setting, implying full retention when increasing k from 150 to 200. The reason for this is that the sparsity constraint is not binding for either of these levels and the sparsity is imposed by the no-shorting constraint.

From these results it may be concluded that the NNLS Heuristic is more attractive than the MIO approach for investors who have no hard preference for some fixed value of k, as its selected portfolio changes only slightly when changing k. This holds especially when tracking indexes with a large number of constituents.

k        HS50  DAX100  FTSE100  SP100  NK225  |  k         SP500  |  k           RSL2000  RSL3000
5 to 6   0.80    0.80     0.80   1.00   0.00  |  10 to 20   0.40  |  80 to 90       0.34     0.25
6 to 7   1.00    1.00     1.00   1.00   0.50  |  20 to 30   0.50  |  90 to 100      0.28     0.41
7 to 8   0.86    1.00     1.00   0.86   0.00  |  30 to 40   0.73  |  100 to 120     0.32     0.40
8 to 9   1.00    1.00     1.00   1.00   0.12  |  40 to 50   0.62  |  120 to 150     0.36     0.25
9 to 10  0.78    1.00     1.00   1.00   0.22  |  50 to 60   0.70  |  150 to 200     1.00     1.00

Table 4: The proportion of constituents that is retained when increasing k, using MIO on the OR-library datasets for multiple levels of k. A value of zero at “5 to 6” implies that none of the constituents selected for k = 5 were still selected for k = 6, while a value of one implies that all of them were retained.
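As a minimal sketch of how such a retention rate could be computed, the snippet below divides the number of retained constituents by the size of the smaller portfolio. This denominator is an assumption, chosen because it matches values such as 0.80 for “5 to 6” (four out of five retained). The function name and the example selections are hypothetical.

```python
def retention_rate(old_selection, new_selection):
    """Share of the old portfolio's constituents kept in the new portfolio."""
    old, new = set(old_selection), set(new_selection)
    if not old:
        raise ValueError("old_selection must contain at least one constituent")
    return len(old & new) / len(old)


# Hypothetical selections for increasing k (constituent identifiers).
selections = {
    5: {1, 2, 3, 4, 5},
    6: {1, 2, 3, 4, 9, 10},
    7: {1, 2, 3, 4, 9, 10, 11},
}

levels = sorted(selections)
for k_old, k_new in zip(levels, levels[1:]):
    rate = retention_rate(selections[k_old], selections[k_new])
    print(f"{k_old} to {k_new}: {rate:.2f}")
```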

4.3 Constituent selection over time


Unfortunately, the OR-library datasets consist of anonymized data, so it is unknown which sector each of the constituents operates in. Hence, another dataset is used, taken from the CRSP/Compustat Merged Database (Center for Research in Security Prices, 2012). The dataset consists of the daily returns of 1103 constituents of the S&P500 between January 1970 and December 2015, in addition to the daily returns of the S&P500 index itself. Furthermore, information about their Global Industry Classification Standard (GICS) sector is available from the same source. Combining this data, it is possible to discover what type of companies are selected at each date over a 45-year period.

In theory, it would be possible to construct a tracking portfolio using the data of the entire 45-year period. However, over this period, many of the constituents that were initially in the index have been replaced and may no longer even exist. This would leave only a small portion of the constituents, those that have data over all 45 years, to construct the tracking portfolio. Additionally, since 45 years is a long time in financial data, information from 45 years ago is unlikely to be relevant for constructing an index tracking portfolio today.


To analyse the type of constituents that are selected, I compute the total weight of the portfolio that is given to constituents from each GICS sector. Specifically, for each day, the portfolio is estimated using the previous 365 trading days. Then, for constituents belonging to the same sector, their weights are summed. For constituents that have no assigned GICS sector, the weight is added under “NA”. The results of this procedure are displayed as levelplots in Figure 1, for k = 10, k = 30 and k = 100.

Comparing the three plots in Figure 1, the portfolio weights are spread more evenly as k increases. However, the patterns in the intensity remain similar. Before these patterns are interpreted, it must be noted that the dates on the horizontal axis are the final days of the 365 trading day windows. So, each observation represents the approximately one and a half years that precede it.

The levelplots in Figure 1 show dramatic shifts in the sector composition of the portfolios. Especially for a low value of k, the portfolio weight assigned to companies in some sector may jump from close to 0 to over 0.4 in just a few months’ time. This suggests that tracking portfolios with a small k may have to be reconstructed frequently. For larger values of k, this problem is still present, but it is less dramatic as the weights are considerably more spread out. This suggests that a trade-off has to be made between the lower investment, transaction and administrative costs of a low k, and the costs of the more frequent reconstruction of the entire portfolio that a low k requires.
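The aggregation step can be sketched as follows. This is a minimal illustration under stated assumptions: the portfolio weights for a window are taken as given (the estimation itself is not shown), each window ends on the date it is reported for, and the names `sector_weights`, `window_bounds` and the ticker symbols are hypothetical.

```python
from collections import defaultdict


def sector_weights(weights, sectors):
    """Sum portfolio weights per sector; unknown sectors are booked under "NA"."""
    totals = defaultdict(float)
    for name, weight in weights.items():
        totals[sectors.get(name) or "NA"] += weight
    return dict(totals)


def window_bounds(n_obs, window=365):
    """Yield (start, end) pairs so that data[start:end] holds `window` observations."""
    for end in range(window, n_obs + 1):
        yield end - window, end


# Hypothetical portfolio weights estimated on one window.
weights = {"AAA": 0.5, "BBB": 0.25, "CCC": 0.25}
sectors = {"AAA": "Financials", "BBB": "Financials"}  # "CCC" has no GICS sector

print(sector_weights(weights, sectors))
print(list(window_bounds(5, window=3)))
```

Repeating the aggregation for every window yields one column of sector weights per date, which is the kind of grid a levelplot displays.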

[Figure 1: levelplots of the total portfolio weight per GICS sector over time. Panels: (a) k = 10, (b) k = 30, (c) k = 100.]

intensity of constituents from the Financial sector at the start of the 2007 Financial Crisis. Additionally, a strong change in the intensity of constituents from the Information Technology sector is present during the time of the Dot-com Bubble in 2000.

Of course, it is possible that these same patterns are also shown by the true weights of the index. However, some patterns emerge for small values of k that do not emerge for larger k. For example, Healthcare seems to be the dominant sector between 1990 and 2000 for k = 10, but not for k = 100.


5 Data Weighting Schemes

To select the weights of a tracking portfolio, one has to use historical data of the index and its constituents. Although it is obvious that recent data should be used, a trade-off has to be made in choosing the maximum age of the data. On the one hand, data that is too old may no longer be relevant. On the other hand, a sufficiently large dataset is required to obtain accurate portfolio weights.

One way to approach this is to select some point in the past and only use data created after that point. This is illustrated in Figure 2, where every observation before observation 90 receives no weight and every succeeding observation receives full weight. Such a rigid cut-off point is implicitly used in all the preceding sections and nearly all of the index tracking literature.


However, there exists an alternative approach: by placing different weights on every observation, a smooth decline in importance can be produced. Such weighting of data is common practice in time series analysis. Yet, there is little mention of the use of weighting in the index tracking literature. In this section, I investigate whether weighting data may improve the index tracking performance. To do this, I consider the commonly used exponential weighting and a linear weighting scheme. I assess the performance of these weighting schemes by estimating tracking portfolios on both weighted and unweighted data and comparing their out-of-sample performance.

5.1 Weighting schemes

The most common form of data weighting in time series is exponential weighting. The concept of exponential weighting is to give each observation a weight that is a factor (1 − α) smaller than the weight of the observation that follows it. So, the last observation receives a full weight of 1, the second-to-last observation a weight of (1 − α), continuing to the pth observation in the past, which receives a weight of (1 − α)^(p−1). This weighting scheme allows the user to choose the value of the parameter α. Setting α = 0 is equivalent to not weighting the data at all. Conversely, when choosing α = 1, only the final observation is used. Hence, α should be chosen between 0 and 1 if it is believed that the earlier observations contain some information, but that this information is less relevant than that of the more recent observations.
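A minimal sketch of the exponential weighting described above is given below. The function name `exponential_weights` and the ordering of the output (oldest observation first) are assumptions made for illustration.

```python
def exponential_weights(n_obs, alpha):
    """Exponential weights, ordered from the oldest to the newest observation.

    The newest observation gets weight 1 and the p-th observation in the
    past gets weight (1 - alpha) ** (p - 1).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return [(1.0 - alpha) ** (n_obs - 1 - t) for t in range(n_obs)]


print(exponential_weights(3, 0.5))  # smooth decline towards older data
print(exponential_weights(3, 0.0))  # alpha = 0: no weighting at all
print(exponential_weights(3, 1.0))  # alpha = 1: only the final observation
```

One common way to use such weights in a least squares fit, which may differ from the procedure used in this thesis, is to multiply each observation of the index and the constituents by the square root of its weight before estimation.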


older observations may receive too much importance. So, α should be selected to fit the problem at hand. Figure 3 displays the exponential weighting for several values of α.

Figure 3: Some exponential weighting schemes for several different values of the parameter α, where the 145th observation is the current observation.


Figure 4: The linear weighting scheme, where the 145th observation is the current observation.

5.2 Comparison

To assess the effectiveness of data weighting, I consider three different ways of weighting: exponential with α = 0.005, exponential with α = 0.05 and linear weighting. I apply these weighting schemes to the same OR-library datasets and settings as used in Section 3.5. The effectiveness is then measured by computing the ratio of the out-of-sample tracking error with weighting to the out-of-sample tracking error without weighting, so that a ratio below 1 indicates that weighting improves the tracking. These ratios are displayed in Tables 5, 6 and 7, for the exponential weighting with α = 0.005, exponential weighting with α = 0.05 and linear weighting, respectively. Unfortunately, due to an issue with the Gurobi license, the MIO approach could not be used on the largest datasets for the latter two weighting schemes.
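The ratio can be sketched as follows. The tracking error is assumed here to be the mean squared difference between the index return and the portfolio return; this may differ from the definition in Section 2. All names and the return data are hypothetical.

```python
def tracking_error(index_returns, constituent_returns, weights):
    """Mean squared difference between index and portfolio returns (assumed form)."""
    squared_errors = []
    for index_return, row in zip(index_returns, constituent_returns):
        portfolio_return = sum(w * r for w, r in zip(weights, row))
        squared_errors.append((index_return - portfolio_return) ** 2)
    return sum(squared_errors) / len(squared_errors)


def tracking_error_ratio(index_returns, constituent_returns,
                         weighted_fit, unweighted_fit):
    """Out-of-sample tracking error with weighting relative to without."""
    baseline = tracking_error(index_returns, constituent_returns, unweighted_fit)
    if baseline == 0.0:
        raise ZeroDivisionError("unweighted tracking error is zero")
    return tracking_error(index_returns, constituent_returns, weighted_fit) / baseline


# Hypothetical out-of-sample returns: three days, two constituents.
index_oos = [0.010, -0.020, 0.015]
constituents_oos = [[0.012, 0.008], [-0.018, -0.024], [0.020, 0.010]]

# Hypothetical portfolio weights from a weighted and an unweighted fit.
ratio = tracking_error_ratio(index_oos, constituents_oos,
                             weighted_fit=[1.0, 0.0],
                             unweighted_fit=[0.5, 0.5])
print(ratio > 1.0)  # True here: weighting made the out-of-sample fit worse
```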


three weighting schemes. In most cases the ratio of the tracking errors is very close to 1, indicating no improvement due to weighting. There are cases for which the tracking error decreases, but they are as numerous as cases where the tracking error increases. This overall ineffectiveness of reducing the importance of older data suggests that this old data is still relevant to the selection of a good tracking portfolio.

Comparing across the three different weighting schemes, the exponential weighting with α = 0.05 appears to perform worst. Recalling Figure 3, it can be seen that this scheme only considers the recent past and discards much of the earlier data. This reinforces the idea that the early data is still highly relevant to the selection of the tracking portfolio.

Finally, I make a comparison across the MIO, NPG and NNLS based approaches. Of these, the NPG approach appears to suffer most from the use of weighting, with many cases of substantially worse out-of-sample performance. The MIO and NNLS based approaches do not necessarily suffer from the use of weighting. However, the weighting also does not provide persistent improvements.


Dataset                   k   MIO   NPG  NNLS | Dataset                   k   MIO   NPG  NNLS | Dataset                   k   MIO   NPG  NNLS
Hang Seng (p = 31)        5  1.01  1.15  1.43 | S&P 100 (p = 98)          5  1.16  0.80  1.00 | Russell 2000 (p = 1318)  80  0.99  1.09  0.95
                          6  0.97  0.82  0.98 |                           6  1.01  1.00  1.00 |                          90  1.08  1.00  0.95
                          7  0.94  0.69  1.00 |                           7  0.98  1.22  1.24 |                         100  0.88  1.38  0.99
                          8  0.99  0.74  1.00 |                           8  0.99  2.81  1.00 |                         120  0.80  0.96  0.95
                          9  0.92  0.90  1.01 |                           9  1.30  0.79  1.00 |                         150  1.03  0.98  0.94
                         10  0.96  1.16  0.60 |                          10  0.84  1.18  1.00 |                         200  0.80  1.08  0.94

DAX (p = 85)              5  0.99  0.41  1.00 | Nikkei (p = 225)          5  1.20  0.76  1.01 | Russell 3000 (p = 2151)  80  1.00  0.75  0.86
                          6  1.05  0.82  1.00 |                           6  1.95  0.69  0.86 |                          90  1.00  0.98  0.84
                          7  1.01  0.97  1.00 |                           7  1.16  1.70  1.05 |                         100  1.00  1.12  0.82
                          8  1.02  1.79  1.07 |                           8  1.21  1.16  0.75 |                         120  1.00  1.19  0.84
                          9  0.98  0.82  1.00 |                           9  1.54  0.81  0.95 |                         150  0.97  1.21  0.84
                         10  0.97  0.56  1.00 |                          10  1.22  1.45  1.10 |                         200  0.98  1.15  0.82

FTSE (p = 89)             5  0.88  1.24  1.06 | S&P 500 (p = 457)        10  1.07  1.18  1.18
                          6  1.00  1.04  0.96 |                          20  0.71  0.92  0.97
                          7  0.97  1.41  1.00 |                          30  0.75  1.81  1.03
                          8  0.68  1.02  1.01 |                          40  0.94  1.27  1.04
                          9  0.98  0.82  0.95 |                          50  1.00  0.99  0.95
                         10  1.02  0.75  0.80 |                          60  0.84  0.88  1.06

Table 5: Ratio of the out-of-sample tracking error of exponential α = 0.005 weighting and no weighting.

Dataset                   k   MIO   NPG  NNLS | Dataset                   k   MIO   NPG  NNLS | Dataset                   k   MIO   NPG  NNLS
Hang Seng (p = 31)        5  0.69  1.62  1.44 | S&P 100 (p = 98)          5  1.32  1.63  0.90 | Russell 2000 (p = 1318)  80     -  1.26  1.09
                          6  0.91  1.76  0.97 |                           6  0.76  1.38  0.92 |                          90     -  0.65  1.10
                          7  0.90  1.97  1.28 |                           7  1.52  1.35  1.17 |                         100     -  1.06  1.09
                          8  0.98  2.25  0.88 |                           8  1.29  1.72  1.22 |                         120     -  1.14  1.07
                          9  0.84  1.53  1.03 |                           9  1.21  2.05  1.24 |                         150     -  0.96  1.05
                         10  0.97  2.13  0.93 |                          10  1.24  1.56  1.30 |                         200     -  0.69  1.05

DAX (p = 85)              5  1.62  1.32  0.90 | Nikkei (p = 225)          5  2.55  1.07  1.19 | Russell 3000 (p = 2151)  80     -  1.53  0.97
                          6  1.16  1.55  0.98 |                           6  2.22  1.12  1.13 |                          90     -  2.03  0.99
                          7  1.03  1.89  0.97 |                           7  2.30  1.18  1.71 |                         100     -  1.65  1.00
                          8  1.18  1.46  1.02 |                           8  0.67  1.69  1.26 |                         120     -  1.34  0.97
                          9  1.11  1.80  1.04 |                           9  3.17  1.53  1.17 |                         150     -  1.06  1.01
                         10  1.58  1.64  1.11 |                          10  1.24  1.29  1.40 |                         200     -  1.47  1.01

FTSE (p = 89)             5  1.30  1.30  0.88 |                           5  1.33  1.13  1.22
                          6  1.12  1.48  0.79 |                           6  1.29  0.98  1.18
                          7  1.19  1.16  0.72 |                           7  0.85  0.99  0.89
                          8  0.99  1.51  0.87 |                           8  0.94  1.13  1.01
                          9  0.84  1.48  0.77 |                           9  0.88  0.88  0.72
                         10  0.77  1.18  0.74 |                          10  1.02  0.89  0.84

Table 6: Ratio of the out-of-sample tracking error of exponential α = 0.05 weighting and no weighting.


Dataset                   k   MIO   NPG  NNLS | Dataset                   k   MIO   NPG  NNLS | Dataset                   k   MIO   NPG  NNLS
Hang Seng (p = 31)        5  1.02  0.90  1.40 | S&P 100 (p = 98)          5  1.13  1.36  1.01 | Russell 2000 (p = 1318)  80     -  1.01  0.87
                          6  0.90  1.08  0.96 |                           6  0.84  0.94  1.01 |                          90     -  1.28  0.94
                          7  0.83  1.04  1.12 |                           7  0.97  1.11  1.24 |                         100     -  0.99  0.95
                          8  0.96  1.11  1.17 |                           8  0.94  1.11  1.00 |                         120     -  0.81  0.94
                          9  0.88  1.05  1.15 |                           9  1.02  1.38  1.00 |                         150     -  1.07  0.93
                         10  1.10  1.30  0.69 |                          10  0.81  1.10  1.10 |                         200     -  1.10  0.93

DAX (p = 85)              5  1.09  1.09  1.00 | Nikkei (p = 225)          5  1.23  0.81  1.00 | Russell 3000 (p = 2151)  80     -  0.82  0.95
                          6  1.13  1.18  0.99 |                           6  1.06  0.89  0.92 |                          90     -  1.25  0.97
                          7  1.23  1.31  0.99 |                           7  0.83  0.93  1.41 |                         100     -  1.09  0.96
                          8  1.26  1.02  1.06 |                           8  0.73  0.97  1.05 |                         120     -  1.23  0.97
                          9  1.05  1.18  1.00 |                           9  1.37  1.36  1.16 |                         150     -  0.94  0.97
                         10  1.01  0.92  1.00 |                          10  0.77  1.13  1.37 |                         200     -  0.85  0.97

FTSE (p = 89)             5  0.72  1.10  1.01 |                           5  1.09  1.11  1.16
                          6  0.97  1.27  0.98 |                           6  0.99  0.92  0.93
                          7  0.92  1.08  0.91 |                           7  0.56  1.14  0.76
                          8  0.73  1.05  0.94 |                           8  0.95  1.01  1.04
                          9  0.74  1.00  1.01 |                           9  0.70  0.86  1.05
                         10  0.72  1.06  0.85 |                          10  1.10  0.86  1.24

Table 7: Ratio of the out-of-sample tracking error of linear weighting and no weighting.


6 Conclusion

To summarize this thesis, the Sparse Index Tracking Problem was introduced and a short overview of the index tracking literature was provided. Furthermore, a mixed integer optimization approach and a non-monotone projected gradient descent approach were discussed. A simple heuristic based on non-negative least squares was then introduced and demonstrated to yield competitive out-of-sample results on empirical data compared to state-of-the-art approaches. This suggests that the focus on in-sample optimality in the literature may have been unproductive.


References

Adcock, C. and Meade, N., 1994. A simple algorithm to incorporate transactions costs in quadratic optimisation. European Journal of Operational Research, 79(1):85–94.

Barras, L., Scaillet, O., and Wermers, R., 2010. False discoveries in mutual fund performance: Measuring luck in estimated alphas. The Journal of Finance, 65(1):179–216.

Barro, D. and Canestrelli, E., 2009. Tracking error: a multistage portfolio model. Annals of Operations Research, 165(1):47–66.

Beasley, J. E., 1990. OR-Library: distributing test problems by electronic mail. Journal of the Operational Research Society, 41(11):1069–1072.

Beasley, J. E., Meade, N., and Chang, T.-J., 2003. An evolutionary heuristic for the index tracking problem. European Journal of Operational Research, 148(3):621–643.

Bertsimas, D., King, A., Mazumder, R., et al., 2016. Best subset selection via a modern optimization lens. The Annals of Statistics, 44(2):813–852.

Bianchi, D. and Gargano, A., 2011. High-dimensional index tracking with cointegrated assets using an hybrid genetic algorithm. Manuscript available at: http://ssrn.com/abstract, 1785908.

Birgin, E. G., Martínez, J. M., and Raydan, M., 2000. Nonmonotone spectral projected gradient methods on convex sets. SIAM Journal on Optimization, 10(4):1196–1211.


Center for Research in Security Prices. CRSP/Compustat Merged, 2012. URL http://wrds-web.wharton.upenn.edu/wrds/. [Online; accessed 14-August-2016].

Chen, C., Li, X., Tolman, C., Wang, S., and Ye, Y., 2013. Sparse portfolio selection via quasi-norm regularization. arXiv preprint arXiv:1312.6350.

Coleman, T. F., Li, Y., and Henniger, J., 2006. Minimizing tracking error while restricting the number of assets. Journal of Risk, 8(4):33.

Derigs, U. and Nickel, N.-H., 2004. On a local-search heuristic for a class of tracking error minimization problems in portfolio management. Annals of Operations Research, 131(1-4):45–77.

di Tollo, G. and Maringer, D. Metaheuristics for the index tracking problem. In Metaheuristics in the Service Industry, pages 127–154. Springer, 2009.

Eddelbüttel, D. A hybrid genetic algorithm for passive management. In Working Paper presented at the Second conference on computing in economics and Finance, Society of computational economics, Geneves, Suisse, 1996.

Fama, E. F. and French, K. R., 2010. Luck versus skill in the cross-section of mutual fund returns. The Journal of Finance, 65(5):1915–1947.

Fang, Y. and Wang, S.-Y. A fuzzy index tracking portfolio selection model. In International Conference on Computational Science, pages 554–561. Springer, 2005.


q-norm constraints for index tracking. Quantitative Finance, 14(11):2019–2032.

Fuller, S. L., 2008. The evolution of actively managed exchange-traded funds. The Review of Securities and Commodities Regulation, 41(8):89–96.

Gilli, M. and Këllezi, E. The threshold accepting heuristic for index tracking. In Financial Engineering, E-Commerce and Supply Chain, pages 1–18. Springer, 2002.

Guastaroba, G. and Speranza, M. G., 2012. Kernel search: An application to the index tracking problem. European Journal of Operational Research, 217(1):54–68.

Gurobi. Gurobi 6.5 performance benchmarks, 2016. URL http://www.gurobi.com/pdfs/benchmarks.pdf. [Online; accessed 14-August-2016].

Jagannathan, R. and Ma, T., 2003. Risk reduction in large portfolios: Why imposing the wrong constraints helps. The Journal of Finance, 58(4):1651–1684.

Jansen, R. and Van Dijk, R., 2002. Optimal benchmark tracking with small portfolios. The Journal of Portfolio Management, 28(2):33–39.

Keim, D. B., 1999. An analysis of mutual fund design: the case of investing in small-cap stocks. Journal of Financial Economics, 51(2):173–194.


Kosowski, R., Timmermann, A., Wermers, R., and White, H., 2006. Can mutual fund stars really pick stocks? New evidence from a bootstrap analysis. The Journal of Finance, 61(6):2551–2595.

Lawson, C. L. and Hanson, R. J., 1995. Solving least squares problems, volume 15. SIAM.

Li, Q., Sun, L., and Bao, L., 2011. Enhanced index tracking based on multi-objective immune algorithm. Expert Systems with Applications, 38(5):6101–6106.

Lobo, M. S., Fazel, M., and Boyd, S., 2007. Portfolio optimization with linear and fixed transaction costs. Annals of Operations Research, 152(1):341–365.

Maringer, D. and Oyewumi, O., 2007. Index tracking with constrained portfolios. Intelligent Systems in Accounting, Finance and Management, 15(1-2):57–71.

Markowitz, H., 1952. Portfolio selection. The Journal of Finance, 7(1):77–91.

Meade, N. and Salkin, G. R., 1990. Developing and maintaining an equity index fund. Journal of the Operational Research Society, 41(7):599–607.

Mullen, K. M. and van Stokkum, I. H. M. nnls: The Lawson-Hanson algorithm for non-negative least squares (NNLS), 2012. URL https://CRAN.R-project.org/package=nnls. R package version 1.4.


Rohweder, H. C., 1998. Implementing stock selection ideas: Does tracking error optimization do any good? The Journal of Portfolio Management, 24(3):49–59.

Roll, R., 1992. A mean/variance analysis of tracking error. The Journal of Portfolio Management, 18(4):13–22.

Rudd, A., 1980. Optimal selection of passive portfolios. Financial Management, pages 57–66.

Ruiz-Torrubiano, R. and Suárez, A., 2009. A hybrid optimization approach to index tracking. Annals of Operations Research, 166(1):57–71.

Shapcott, J., 1992. Index tracking: genetic algorithms for investment portfolio selection. Edinburgh Parallel Computing Centre, EPCC–SS92–24.

Wang, M. Y., 1999. Multiple-benchmark and multiple-portfolio optimization. Financial Analysts Journal, 55(1):63–72.

Wu, L., Yang, Y., and Liu, H., 2014. Nonnegative-lasso and application in index tracking. Computational Statistics & Data Analysis, 70:116–126.

Xu, F., Xu, Z., and Xue, H., 2012. Sparse index tracking: An l1/2 regularization based model and solution.


Contents

1 Introduction
2 Index Tracking
  2.1 The Objective Function
  2.2 The Portfolio Constraint
  2.3 The Sparsity Constraint
3 Approaches to index tracking
  3.1 Previous work
  3.2 Mixed-Integer Optimization
  3.3 Non-monotone Projected Gradient Descent
  3.4 The Non-Negative Least Squares Heuristic
  3.5 Comparison
4 Analysis of Selected Portfolios
  4.1 Selection difference NNLS and MIO
  4.2 Selection stability
  4.3 Constituent selection over time
5 Data Weighting Schemes
  5.1 Weighting schemes
  5.2 Comparison
6 Conclusion
References