
Market timing of exchange-traded funds investors


Academic year: 2021


Abstract

The returns of investors in exchange-traded funds (ETFs) depend not only on the funds they pick, but also on the timing and magnitude of their capital flows. This paper assesses the differences between investor and security returns for ETFs between 1993 and March 2018 by using the money-weighted return methodology. We find that the annual money-weighted returns depend on the time horizon of measurement, but that a significant deviation between time-weighted and money-weighted returns exists, ranging from 7.74% to -1.80% annually. There is an indication that this effect is driven by return-chasing behaviour of investors, but this is less prominent for the biggest exchange-traded funds. Finally, we find a link between the economic cycle and market timing of ETF investors, but more research is required to confirm this.

Author: Rinke Bakker Student number: s2393328

University: Rijksuniversiteit Groningen Faculty: Faculty of Economics and Business Degree program: MSc. Finance

Thesis Supervisor: Sibrand Drijver Second Assessor: TBD

Date submitted: 08/06/2018


1. Introduction

Exchange-traded funds (ETFs) have enjoyed spectacular growth over the last two decades and have become a major asset class in the United States (U.S.). Assets under management in ETFs climbed from just 95 million USD in 1993 to 3.4 trillion USD in March 2018 (ICI.org, 2018). By offering low costs, increased liquidity and transparent pricing, ETFs have steadily gained market share relative to mutual funds and closed-end funds in the U.S. investment landscape (Collins et al., 2018). For investors, it is important to consider what their actual return is when deciding to invest in ETFs, as Dichev (2007) highlighted that there can be significant differences between investor returns and security returns. Time-weighted returns (TWR) are those quoted by funds, but they do not consider the timing and magnitude of capital flows in and out of funds. By assessing investments as a sequence of capital flows, returns can be weighted according to money invested, similar to an internal rate of return calculation. We contribute to the literature by assessing the difference between money-weighted returns (MWR) and time-weighted returns for ETFs, which shows the impact of timing decisions on investor returns.

Research on MWRs (or dollar-weighted returns) has been performed for major U.S. indices, mutual funds and hedge funds (Dichev, 2007; Dichev and Yu, 2011; Friesen and Sapp, 2007). These studies assess investor timing ability as TWRs minus MWRs, which hereafter is referred to as the performance gap. Previous research has found differences in the performance gap not only across asset classes but also over time. The research has attributed the performance gap to return-chasing behaviour of investors. Investors can chase superior returns (or alpha) by investing in funds that outperformed in the past (Friesen and Sapp, 2007). By replicating their methodology for ETFs, we can exclude a search for alpha, as most ETFs have a passive strategy in which they replicate indices in risk and return (Clifford et al., 2014). This was assessed for new ETF fund flows between 2001 and 2010 by Clifford et al. (2014), who found that investment behaviour is similar for ETFs and mutual funds. However, they did not determine whether the performance gap was similar. Therefore, this study answers the questions of whether there is a performance gap for ETFs and what the drivers of the performance gap are by investigating three hypotheses. First, we determine whether there is a robust difference between TWRs and MWRs for investors. Second, we assess whether return-chasing behaviour can explain this gap. Third, we explore the performance gap over time and link this to the economic cycle.

equal to 6.14%, and from 2013 to 2018, the performance gap became negative at -1.80%. Therefore, the drivers of the performance gap are assessed in a multivariate regression, and we find that the expense ratio and the arithmetic return positively influence the performance gap, whereas volatility lowers the performance gap. Moreover, investors exhibit return-chasing behaviour, which is one of the factors that could explain the performance gap. This effect disappears when focussing on the largest ETFs, which is attributed to relatively more investments from more sophisticated institutional investors. Finally, we find evidence that there is a link between the economic cycle and the performance gap, indicating that the performance gap becomes worse during economic downturns. However, this requires additional research to confirm, as we used yearly data and a sample of ETFs that only experienced two full economic cycles (NBER, 2018).

The implications of these results are that investor returns can significantly deviate from fund returns due to the magnitude of capital flows. Market timing of investors differs over time, becoming worse during an economic downturn. Moreover, this paper adds to the body of empirical research that finds evidence of return-chasing behaviour of investors, even when alpha should not be present due to the passive nature of ETFs. This effect is not present for major institutional investors, indicating that these are more sophisticated and potentially exhibit fewer behavioural biases.

This paper is structured as follows. First, the intuition behind TWRs is assessed, followed by the structural differences between ETFs and mutual funds. Next, behavioural biases are assessed and linked to the performance gap. This is supplemented by empirical research on MWRs and the performance gap. Here, we focus on the work of Dichev (2007), Friesen and Sapp (2007) and Dichev and Yu (2011), as we follow them closely in methodology and structure. This is compared to the work of Clifford et al. (2014), who focussed on ETFs. The literature section is concluded with hypotheses on the performance gap. Next, the methodology for calculating the performance gap is presented, followed by bootstrapping hypothesis testing to control for non-normality. The


2. Literature review and hypotheses

This section first explores performance measurements from an investor’s point of view. Next, the intuition behind MWRs is discussed. This is followed by the theoretical drivers of a gap between investor returns and security returns. After the theoretical background, the empirical results of various authors who apply the MWR methodology are discussed. Previous research has focussed on market indices, hedge funds and mutual funds, whereas this paper assesses ETFs. Nevertheless, the focus of this paper is on the work of Dichev (2007), Dichev and Yu (2011) and Friesen and Sapp (2007), which is followed closely in structure and methodology. This also allows this work to compare the results of ETFs to investor performance in mutual funds, hedge funds and broad indices. The link between previous research and ETFs is subsequently made. As ETFs have a different structure, this could lead to disparity in investor returns. Finally, the section is concluded with the development of three hypotheses to evaluate whether there is a gap in investor performance and what its drivers are.

2.1 Performance gap

The application of the MWR methodology to evaluate investor performance was first proposed by Illmer and Marty (2003), who argued that the MWR is the true return from an investor’s point of view and therefore key when deciding on investment opportunities. The concept of the MWR is well documented in corporate finance as a method to assess investments as a sequence of cash flows (Bodie et al., 2012), but it is not often applied to determine investor returns. By considering the timing of investors’ cash flows, the discrepancy between security and investor returns can be determined. Specifically, TWRs (buy-and-hold returns) are based on both the benchmark effect, which is the return due to the decision to invest in a benchmark strategy, and the management effect, by which asset allocation and individual stock selection contribute to the return. This directly assumes equal weighting of investments over time, which is unlikely. When adding the timing effect, which is the influence of the magnitude of investor capital flows over time, the MWR is determined (Illmer and Marty, 2003). The difference between the TWR and MWR shows the performance gap over the investor horizon that is attributable to market timing (Dichev, 2007).

investors becomes worse due to higher trading costs relative to financial institutions. Second, the expenses on the fund result in a deviation of reported returns. Exchange-traded funds have historically had lower fees compared to mutual funds, as the brokerage firm carries the costs (Valle, Meade and Beasley, 2014). In addition, passively managed funds have lower costs than actively managed funds. Third, taxation is a factor that further deviates investor returns from reported buy-and-hold returns.

ETFs are quite similar to passively managed mutual funds, as both generally try to replicate the risk and return of a basket of securities. The difference between them, however, is that ETFs are traded continuously on exchanges, whereas mutual funds are only traded once a day at a price equal to the net asset value. The share redemption and creation processes are also quite different for ETFs, as shares are traded at an index, and new shares are only created by an authorised participant (AP) when the price deviates too much from the net asset value. In that case, the AP sells new shares in the market and buys the underlying securities simultaneously. This can result in an arbitrage opportunity for the AP, which gives it an incentive to keep the price close to the net asset value. There is also a difference in taxation when comparing ETFs to mutual funds based on operating expenses. The structure of ETFs results in tax efficiencies relative to mutual funds, as they engage less in internal trading and in turn create fewer taxable events. Finally, ETFs do not track their benchmarks perfectly. In 2011, only 11% reproduced both the mean and volatility within 1% deviation annually, based on a sample of 822 ETFs (Valle et al., 2014). A tracking error can lead to an ETF earning an alpha, but this is generally seen as negative (Clifford et al., 2014).

2.2 Behavioural drivers

To understand why a gap between fund performance and investor performance can exist, various theories from behavioural finance are assessed. The key elements to consider here are how rational investors are and how their decisions to time their investments affect their returns. We focus on prospect theory, overconfidence and representativeness to explain market timing and, in turn, the performance gap.

The most widely used method in finance for individual investor behaviour is prospect theory (Ackert and Deaves, 2009), which was developed by Tversky and Kahneman (1992). They empirically

and Statman (1985), who found that investors in mutual funds and stocks hold on to losers for too long and sell winners too fast, due to the reluctance to admit wrong judgement and satisfaction from gains. If investors on average hold on to losers too long and sell winners too fast, they will underperform compared to a buy-and-hold strategy, measured by TWRs. When comparing sophisticated institutional investors with private investors, the disposition effect is still present with sophisticated investors but to a lesser extent (Barber and Odean, 2007; Feng and Seasholes, 2005). Another reason why investments could be timed poorly and result in underperformance is overconfidence, which can also lead to excessive active trading. If investors overestimate the precision of their knowledge relative to others, they are unrealistically optimistic about their own results. This is reinforced by the fact that investors overestimate their own contribution to past results. When considering that for every trade, there is someone taking the opposite side, it seems unlikely that they have a knowledge advantage (Daniel and Hirshleifer, 2015). Overconfidence is observed for both individual and more sophisticated investors and affects their financial decisions (Ben-David et al., 2013; Glaser et al., 2012; Puri and Robinson, 2007). This can manifest itself in irrational excessive trading and potentially lead to bad market timing. This active investment puzzle is well documented. For example, Daniel and Hirshleifer (2015) reported that the average annualised turnover for the largest 500 stocks in the US between 1980 and 2014 was 223%, equal to around 100 billion USD per day. All of these trades can significantly affect the MWRs of investors.

Bad timing could also be the result of the representativeness heuristic, which is closely related to overconfidence. Representativeness causes subjective judgement regarding the extent to which probabilities are evaluated. Kahneman and Frederick (2002) argued that answers to a difficult question are substituted by answers to an easier one, which biases decision making. By overestimating the probability that one high fund return is an indication of a high mean return, investors can display return-chasing behaviour. This could also be a reason that investment

investments in which the capital is most productive. However, this should not be expected for ETFs, as there is no predictability of returns of market indices (Clifford et al., 2014).

2.3 Empirical Research

To assess empirical research on the performance gap, we consulted previous papers that have applied this methodology. Dichev (2007) was the first to apply it over a large timeframe to U.S. market indices. He found a significant performance gap between 1926 and 2004 for both global markets and U.S. markets and that aggregate stock returns around the world are lower following capital inflows and higher following outflows. This resulted in a performance gap of 1.3% for the New York Stock Exchange and 5.6% for Nasdaq. Overall, Dichev (2007) implied that the cost of equity is lower than previously assumed. However, Keswani and Stolin (2008) questioned his methodology and results and found the opposite for certain periods, indicating that investors earn additional returns with market timing. They argued that average investor investment horizons are shorter; thus, the methodology should be applied to shorter periods. In a similar manner, the global findings were not robust, due to a significant increase in volume over the years and better coverage in Datastream over time.

The methodology was also applied to mutual funds by Friesen and Sapp (2007), who found that poor timing resulted in a performance gap equal to 1.56% annually for a sample of 7,125 funds during the 1991–2004 timeframe. While previous studies did assess MWRs, they did not do so at the individual fund level; working at the fund level allowed Friesen and Sapp (2007) to explicitly control for selection ability. Nevertheless, previous research on MWRs at aggregate levels for mutual funds displayed similar results (Braverman and Wohl, 2005; Nesbitt, 1995), namely evidence for a performance gap. Moreover, Friesen and Sapp (2007) found evidence of return-chasing behaviour, as poor market timing largely erases the positive alpha presented by better-performing funds. They applied both the Fama-French three-factor model and the Carhart four-factor model to assess performance and concluded that funds with the highest alpha have the greatest performance gap; thus, the ability to select outperforming funds does not correlate with the ability to time investments efficiently. Contrary to the findings of Keswani and Stolin (2008), Friesen and Sapp (2007) did not find any pattern in the performance gap over time.

In the context of ETFs, Clifford et al. (2014) assessed fund flows for ETFs between 2000 and 2010 and found that there was return-chasing behaviour similar to mutual funds and hedge funds. They stated that due to the passive strategy of most ETFs, return-chasing behaviour is not caused by pursuit of superior funds or managers. In addition, market timing was assessed by creating a portfolio with fund inflows and a portfolio that has outflows. They then shorted the market portfolio to create a zero-cost portfolio. They deduced that for various subsamples, there is no evidence of smart fund flows and attributed it to return-chasing behaviour. The application of alpha for ETFs is interesting, as there is no pursuit of superior skilled management (Berk and Green, 2004; Clifford et al., 2014). A difference between ETFs and mutual funds can occur from the investor clientele. In 2017, institutional investors owned up to 59% of all U.S. ETFs (Mercado, 2017; Rennison, 2017). The interest of institutional investors is attributed to two factors. First, it allows for strategic factor exposure for a low cost. Second, it allows for effective liquidity management since ETFs can be traded intra-daily, given that enough liquidity is present in the market. The performance gap for ETF investors can be worse compared to mutual funds, as they can be traded intra-daily. Barber and Odean (2002) tested the performance of investors who switched from phone-based to online trading during the 1990s. They found an increase in trading and moreover underperformance compared to the market equal to 3%. There is no clear relation between ability to trade intra-daily and market-timing; however, it is possible that this can cause deviations when comparing the performance gap.

2.4 Hypotheses

This section discusses the three hypotheses and the main research question. Based on both behavioural and empirical literature, we can expect a performance gap for investors in ETFs. The main research question is as follows: What drives the difference between investor and security returns? To evaluate this, three hypotheses are constructed in relation to deviation between TWRs and MWRs.

was driven by business cycles. In addition, they found both positive and negative performance gaps, where a negative performance gap results in additional investor returns above security returns due to efficient market timing. Therefore, the first hypothesis is that there is a significant gap between security and investor returns. This hypothesis is tested by following the methodology of Dichev (2007), and the results are subsequently compared to the performance gap for indices, mutual funds and hedge funds.

Second, return-chasing behaviour is assessed. Jegadeesh and Titman (1993) found that momentum strategies, or buying past winners and selling past losers, can result in abnormal returns. This was confirmed by Sapp and Tiwari (2004), who determined that investors chase past winners rather than use momentum-style investing. For mutual funds, Friesen and Sapp (2007) documented that their results were in line with return-chasing behaviour. Finally, Clifford et al. (2014) assessed fund flows for ETFs up to 2010 and concluded that return-chasing behaviour is present for investors. The second hypothesis consequently is that future cash flows are correlated to past returns.

Third, another of the potential drivers of a performance gap is assessed, namely whether this is influenced by economic business cycles. As Keswani and Stolin (2008) concluded for major indices, the performance gap can vary over time. They showed that the economic cycle could be a driver. The economic cycle is usually measured by various macroeconomic variables, such as unemployment rate, gross domestic product (GDP), interest rates, production indices and inflation (Diebold and Rudebusch, 1994). Investor expectations of future returns are also highly correlated with past returns (Greenwood and Shleifer, 2014), which could explain a larger gap during economic downturns. In addition, there is evidence that funds flow into markets after additional past returns and prior to poor future returns (Baker and Wurgler, 2000). To assess this, the third hypothesis states that the performance gap is negatively related to the economic cycle.
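The second hypothesis (future cash flows are correlated with past returns) can be illustrated with a minimal sketch. It is not the regression used in this paper; it only computes a plain Pearson correlation between the return in one period and the scaled capital flow in the next. The function names and the sample numbers are hypothetical and chosen purely for illustration.

```python
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Plain Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def lagged_flow_correlation(returns: List[float],
                            flow_ratios: List[float],
                            lag: int = 1) -> float:
    """Correlate the return of period t - lag with the flow ratio of period t.

    flow_ratios is assumed to hold capital flow divided by market
    capitalisation for each period (an illustrative scaling).
    """
    if lag < 1 or lag >= len(returns):
        raise ValueError("lag must be between 1 and len(returns) - 1")
    return pearson(returns[:-lag], flow_ratios[lag:])


# Hypothetical data: inflows tend to follow good returns.
returns = [0.10, -0.05, 0.20, -0.15, 0.08, 0.03]
flow_ratios = [0.02, 0.06, -0.01, 0.11, -0.04, 0.05]
rho = lagged_flow_correlation(returns, flow_ratios)
```

A clearly positive value of `rho` would be in line with return-chasing behaviour in this toy setting; a formal test would still need controls such as those discussed later in the paper.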

3. Methodology

This section discusses how the MWR is calculated and how this relates to the buy-and-hold return or TWR. Next, the implications of the MWRs are assessed, and the bootstrap methodology to test for significance for value-weighted portfolios is introduced. Finally, the Fama-French three-factor model and Carhart four-factor model, which are used to control for factor risk, are presented.

3.1 Money-weighted return

is especially interesting for ETFs, as they have seen a large influx of funds in the past couple of years, which signifies that the later years are more important than the initial periods when assessing the aggregate return for ETF investors.

Dichev (2007) proposed viewing investments and disinvestments as capital flows and solving for the internal rate of return (IRR) to obtain the MWRs. The first step is determining the capital flows that must be discounted. The capital flow at time $t$ is defined as the market capitalisation at time $t$ minus the market capitalisation of the previous period grown at the return $r_t$ earned during the period. This is displayed in Equation 1, in which return is calculated as a continuously compounded return. A positive capital flow at time $t$ indicates an investor contribution compared to $t-1$, and a negative capital flow represents an outflow of funds.

$$\text{Capital flow}_t = \text{Market capitalisation}_t - \text{Market capitalisation}_{t-1} \times (1 + r_t) \tag{1}$$
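As a minimal sketch of Equation 1, the snippet below derives the capital flows from a series of market capitalisations and period returns. The function name and the numbers are hypothetical and serve only as an illustration.

```python
from typing import List


def capital_flows(market_caps: List[float], returns: List[float]) -> List[float]:
    """Equation 1: flow_t = MC_t - MC_{t-1} * (1 + r_t) for t = 1..T.

    market_caps holds MC_0..MC_T and returns holds r_1..r_T.
    """
    if len(market_caps) != len(returns) + 1:
        raise ValueError("need exactly one more market cap than returns")
    return [
        market_caps[t] - market_caps[t - 1] * (1.0 + returns[t - 1])
        for t in range(1, len(market_caps))
    ]


# Hypothetical fund: 100 grows by 100% to 200, investors add 200 (MC_1 = 400),
# then the fund loses 50% and nobody trades (MC_2 = 200).
flows = capital_flows([100.0, 400.0, 200.0], [1.0, -0.5])
```

In this toy case the first flow is an inflow of 200 and the second flow is zero, because the change in market capitalisation in the second period is explained entirely by the return.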

The MWR is defined as the rate of return that sets the initial market capitalisation plus the sum of the discounted capital flows over the period equal to the discounted ending market capitalisation, as presented in Equation 2. The MWR is indicated by $r_{mw}$.

$$\frac{\text{Market capitalisation}_T}{(1 + r_{mw})^T} = \text{Market capitalisation}_0 + \sum_{t=1}^{T} \frac{\text{Capital flow}_t}{(1 + r_{mw})^t} \tag{2}$$

The TWR, or buy-and-hold return, does not consider the timing of the cash flows of investors and is based purely on price appreciation or depreciation and dividends, which are reinvested at the end of the period. Equation 3 demonstrates that the time-weighted returns are equal to the compounded returns.

$$r_{tw} = \left(\prod_{t=1}^{T} (1 + r_t)\right)^{\frac{1}{T}} - 1 \tag{3}$$
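The compounding in Equation 3 can be sketched as follows. The snippet also computes the arithmetic mean return for comparison, since the two can differ considerably when returns are volatile. Function names and sample returns are hypothetical.

```python
from typing import List


def time_weighted_return(returns: List[float]) -> float:
    """Equation 3: geometric average of the period returns r_1..r_T."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth ** (1.0 / len(returns)) - 1.0


def arithmetic_mean_return(returns: List[float]) -> float:
    """Simple average of the period returns, shown only for comparison."""
    return sum(returns) / len(returns)


# Hypothetical two-period example: +100% followed by -50%.
example = [1.0, -0.5]
twr = time_weighted_return(example)
mean = arithmetic_mean_return(example)
```

In this example a buy-and-hold investor ends exactly where they started, so the TWR is zero, while the arithmetic mean of the two returns is 25%.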

By plugging Equation 1 into Equation 2 and rearranging it, one can see that the timing of the investor’s capital flows influences returns directly. Equation 4 shows this and, more specifically, that MWRs are average returns that are value weighted over time.

$$\sum_{t=1}^{T} \frac{\text{Market capitalisation}_{t-1}}{(1 + r_{mw})^{t-1}} \times r_{mw} = \sum_{t=1}^{T} \frac{\text{Market capitalisation}_{t-1}}{(1 + r_{mw})^{t-1}} \times r_t \tag{4}$$
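The rearrangement from Equations 1 and 2 to Equation 4 can be written out step by step. The shorthand $MC_t$ for market capitalisation at time $t$ is introduced here only to keep the lines short; it is not notation from the original text.

$$
\begin{aligned}
\frac{MC_T}{(1+r_{mw})^T} &= MC_0 + \sum_{t=1}^{T} \frac{MC_t - MC_{t-1}(1+r_t)}{(1+r_{mw})^t} && \text{(Equation 1 in Equation 2)} \\
\sum_{t=1}^{T} \frac{MC_{t-1}(1+r_t)}{(1+r_{mw})^t} &= MC_0 + \sum_{t=1}^{T-1} \frac{MC_t}{(1+r_{mw})^t} = \sum_{t=1}^{T} \frac{MC_{t-1}}{(1+r_{mw})^{t-1}} && \text{(the } MC_T \text{ terms cancel)} \\
\sum_{t=1}^{T} \frac{MC_{t-1}}{(1+r_{mw})^{t-1}}(1+r_t) &= \sum_{t=1}^{T} \frac{MC_{t-1}}{(1+r_{mw})^{t-1}}(1+r_{mw}) && \text{(multiply by } 1+r_{mw}\text{)} \\
\sum_{t=1}^{T} \frac{MC_{t-1}}{(1+r_{mw})^{t-1}}\, r_t &= \sum_{t=1}^{T} \frac{MC_{t-1}}{(1+r_{mw})^{t-1}}\, r_{mw} && \text{(subtract the common sum)}
\end{aligned}
$$

Dividing by the sum of the weights $w_t = MC_{t-1}/(1+r_{mw})^{t-1}$ gives $r_{mw} = \sum_t w_t r_t / \sum_t w_t$, which is the sense in which the MWR is a value-weighted average of the period returns.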

repurchases or dividend buybacks also influence this process (Dichev and Yu, 2011). This is also applied to create value-weighted portfolios rather than equally weighted portfolios. By aggregating fund flows, one is able to assess the impact on the average ETF investor.

To attribute the impact of market timing on investor returns, the performance gap is defined as the difference between the TWR and MWR, as in Equation 5.

$$\text{performance gap} = r_{tw} - r_{mw} \tag{5}$$

A problem with this methodology is that a high-order polynomial with both positive and negative capital flows has multiple roots at which the polynomial equals 0, so multiple solutions satisfy Equation 2. Many of these roots are either complex numbers or real numbers below -100% (Friesen and Sapp, 2007). A potential resolution would be to assess fund flows on an aggregate level, which shortens the calculation. However, as Friesen and Sapp (2007) stated, this could discard valuable information and moreover does not allow for explicitly controlling for selection ability. Therefore, we use -100% as a constraint on the solution and select the value closest to the TWR in case the algorithms find multiple solutions after double calculations.
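The root-selection rule described above can be sketched as follows. This is not the Excel Solver procedure used in this paper; it swaps in a simple grid scan with bisection so that the example stays self-contained. The bounds, the grid size, the function names and the sample data are assumptions made for illustration, and the lower bound sits slightly above -100% so that discounting stays well defined.

```python
from typing import List, Optional


def twr(returns: List[float]) -> float:
    """Equation 3: geometric average of the period returns."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth ** (1.0 / len(returns)) - 1.0


def mwr(market_caps: List[float], returns: List[float],
        lower: float = -0.99, upper: float = 2.0,
        steps: int = 3000) -> Optional[float]:
    """Solve Equation 2 for r_mw inside [lower, upper].

    All sign changes on an evenly spaced grid are refined by bisection;
    if several roots exist, the one closest to the TWR is returned.
    Roots that fall exactly on a grid point or pairs of roots inside one
    grid cell can be missed, so this is a sketch for short series only.
    """
    flows = [market_caps[t] - market_caps[t - 1] * (1.0 + returns[t - 1])
             for t in range(1, len(market_caps))]
    horizon = len(flows)

    def residual(rate: float) -> float:
        total = market_caps[0]
        for t, flow in enumerate(flows, start=1):
            total += flow / (1.0 + rate) ** t
        return total - market_caps[-1] / (1.0 + rate) ** horizon

    roots = []
    left, f_left = lower, residual(lower)
    for i in range(1, steps + 1):
        right = lower + (upper - lower) * i / steps
        f_right = residual(right)
        if f_left * f_right < 0.0:
            a, b, f_a = left, right, f_left
            for _ in range(100):
                mid = 0.5 * (a + b)
                f_mid = residual(mid)
                if f_a * f_mid <= 0.0:
                    b = mid
                else:
                    a, f_a = mid, f_mid
            roots.append(0.5 * (a + b))
        left, f_left = right, f_right
    if not roots:
        return None
    target = twr(returns)
    return min(roots, key=lambda r: abs(r - target))


# Hypothetical fund: investors pile in after a +100% period, then lose 50%.
caps, rets = [100.0, 400.0, 200.0], [1.0, -0.5]
rate = mwr(caps, rets)
gap = twr(rets) - rate  # Equation 5: performance gap
```

In this toy example the buy-and-hold return is zero, while the money-weighted return is clearly negative because most of the capital arrived just before the loss, so the performance gap is positive.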

3.2 Bootstrap significance testing

Investment returns are considered non-normal and are influenced by dependencies, such as correlated residuals across firms and across time (Petersen, 2009). For the performance gap of individual funds, an examination of the residuals gives no indication that this is the case. However, when assessing value-weighted portfolios, this becomes a problem since it results in only one observation. Therefore, we use the bootstrap methodology for all value-weighted portfolios. This method is also applied by Dichev (2007), Keswani and Stolin (2008) and Dichev and Yu (2011).

the solution for MWRs with randomly reshuffled vector of returns $i$ and $k$ as the number of replications. The average of the bootstrapped estimates, $\bar{\theta}$, is used only for estimating the standard deviation (Stata, 2017).

$$\widehat{se} = \left[\frac{1}{k-1} \sum_{i=1}^{k} \left(\hat{\theta}_i - \bar{\theta}\right)^2\right]^{\frac{1}{2}} \tag{6}$$

The t-statistics and p-values in turn can be calculated with Equation 7, with $H_0$ as the null hypothesis, $\hat{\theta}$ as the observed value for the t-statistic, and $\hat{\theta}^*$ as the random variable for the t-statistic, assuming that the null hypothesis holds.

$$\rho = \Pr\left(\hat{\theta}^* \geq \hat{\theta} \mid H_0\right) \tag{7}$$
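A minimal sketch of the bootstrap in Equations 6 and 7 is given below. It rests on assumptions that the text does not spell out: returns are reshuffled while the sequence of scaled capital flows (flow divided by the previous market capitalisation) is kept fixed, the market capitalisation path is rebuilt from the reshuffled returns, and the p-value is the share of reshuffled gaps at least as large as the observed gap. The simplified root finder, the function names and the data are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def twr(returns: List[float]) -> float:
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth ** (1.0 / len(returns)) - 1.0


def mwr(caps: List[float], returns: List[float], lower: float = -0.99,
        upper: float = 2.0, steps: int = 600) -> Optional[float]:
    """Grid scan plus bisection for Equation 2; root closest to the TWR."""
    flows = [caps[t] - caps[t - 1] * (1.0 + returns[t - 1])
             for t in range(1, len(caps))]

    def residual(rate: float) -> float:
        total = caps[0]
        for t, flow in enumerate(flows, start=1):
            total += flow / (1.0 + rate) ** t
        return total - caps[-1] / (1.0 + rate) ** len(flows)

    roots, left, f_left = [], lower, residual(lower)
    for i in range(1, steps + 1):
        right = lower + (upper - lower) * i / steps
        f_right = residual(right)
        if f_left * f_right < 0.0:
            a, b, f_a = left, right, f_left
            for _ in range(80):
                mid = 0.5 * (a + b)
                f_mid = residual(mid)
                if f_a * f_mid <= 0.0:
                    b = mid
                else:
                    a, f_a = mid, f_mid
            roots.append(0.5 * (a + b))
        left, f_left = right, f_right
    target = twr(returns)
    return min(roots, key=lambda r: abs(r - target)) if roots else None


def bootstrap_gap(caps: List[float], returns: List[float],
                  k: int = 200, seed: int = 0) -> Tuple[float, float, float]:
    """Return (observed gap, bootstrap standard error, p-value)."""
    base = twr(returns)  # unchanged by reshuffling
    observed_mwr = mwr(caps, returns)
    if observed_mwr is None:
        raise ValueError("no MWR root found inside the bounds")
    observed = base - observed_mwr
    ratios = [(caps[t] - caps[t - 1] * (1.0 + returns[t - 1])) / caps[t - 1]
              for t in range(1, len(caps))]
    rng = random.Random(seed)
    draws = []
    for _ in range(k):
        shuffled = returns[:]
        rng.shuffle(shuffled)
        path = [caps[0]]
        for r, q in zip(shuffled, ratios):
            path.append(path[-1] * (1.0 + r + q))  # return plus scaled flow
        rate = mwr(path, shuffled)
        if rate is not None:
            draws.append(base - rate)
    if len(draws) < 2:
        raise ValueError("too few successful replications")
    theta_bar = sum(draws) / len(draws)
    # Equation 6, using the number of successful replications as k.
    se = (sum((d - theta_bar) ** 2 for d in draws) / (len(draws) - 1)) ** 0.5
    # Equation 7: share of reshuffled gaps at least as large as observed.
    p_value = sum(1 for d in draws if d >= observed) / len(draws)
    return observed, se, p_value


caps = [100.0, 115.0, 120.75, 205.28, 178.59, 183.95, 189.47]
rets = [0.10, -0.05, 0.20, -0.15, 0.08, 0.03]
observed, se, p_value = bootstrap_gap(caps, rets)
```

Keeping the scaled flows in place while reshuffling returns breaks any link between flows and subsequent returns, which is one way to represent a null hypothesis of no timing ability.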

3.3 Fama-French three-factor model and Carhart four-factor model

Finally, to control for performance, both the Fama-French three-factor model and the Carhart four-factor model are applied to the individual ETFs over their lifetime. Equation 8 shows the three-factor model, with MRP equal to the return on the market portfolio, SMB as small minus big, or factor loading to small market capitalisation stocks, and finally, HML as high minus low, or high book to market capitalisation resulting in factor exposure to value stocks over growth stocks (Fama and French, 1993). Excess return is the return on the security minus the risk-free rate, which is equal to the monthly treasury-bill rate for the US. The intercept alpha captures the outperformance relative to a fund’s factor (risk) exposure. These variables are retrieved for a global portfolio from the website of Kenneth French. We use monthly global factors, as some ETFs also have a large exposure to assets outside the US.

$$\text{Excess return}_t = \alpha + \beta_1 \times MRP_t + \beta_2 \times SMB_t + \beta_3 \times HML_t + \varepsilon_t \tag{8}$$

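The Carhart four-factor model adds the momentum factor, WML. This factor captures the empirical observation that a security tends to continue rising if increasing and continue to decline if it is already declining (Carhart, 1997). This is displayed in Equation 9.

$$\text{Excess return}_t = \alpha + \beta_1 \times MRP_t + \beta_2 \times SMB_t + \beta_3 \times HML_t + \beta_4 \times WML_t + \varepsilon_t \tag{9}$$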
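The factor regressions described above can be sketched with ordinary least squares. The snippet below is a minimal illustration that fits the four-factor specification on synthetic data; it solves the normal equations directly to avoid external dependencies, and the function names, coefficients and simulated factor series are all hypothetical rather than taken from this paper's dataset.

```python
import random
from typing import List


def ols(y: List[float], X: List[List[float]]) -> List[float]:
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    n = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def four_factor_fit(excess: List[float], mrp: List[float], smb: List[float],
                    hml: List[float], wml: List[float]) -> List[float]:
    """Return [alpha, beta_MRP, beta_SMB, beta_HML, beta_WML]."""
    X = [[1.0, m, s, h, w] for m, s, h, w in zip(mrp, smb, hml, wml)]
    return ols(excess, X)


# Synthetic monthly data with known coefficients and no noise.
rng = random.Random(1)
months = 120
mrp = [rng.gauss(0.005, 0.04) for _ in range(months)]
smb = [rng.gauss(0.001, 0.03) for _ in range(months)]
hml = [rng.gauss(0.002, 0.03) for _ in range(months)]
wml = [rng.gauss(0.004, 0.04) for _ in range(months)]
true_coefs = [-0.003, 1.10, 0.10, 0.05, -0.07]
excess = [true_coefs[0] + true_coefs[1] * m + true_coefs[2] * s
          + true_coefs[3] * h + true_coefs[4] * w
          for m, s, h, w in zip(mrp, smb, hml, wml)]
fitted = four_factor_fit(excess, mrp, smb, hml, wml)
```

Dropping the `wml` column from the design matrix gives the three-factor specification. The fitted alpha here is a monthly figure; any annualisation is a separate step.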

4. Data and descriptive statistics

This section provides a detailed description of how the data sample and various sub-samples were acquired. Next, it describes how the sample of ETFs and the ETF market evolved over time. This section is concluded by descriptive statistics for the full sample.

The initial dataset was acquired from NASDAQ, ETFdb and Firstbridge. From this, a sample was created, filtered on availability of data through Thomson Reuters Datastream. This resulted in an initial sample of 1,947 ETFs. These were filtered further for a minimum of 12 months of data available to accurately assess factors such as volatility, in line with mutual fund literature (Evans, 2010). For these ETFs, monthly data on total returns from the total return index and market capitalisation were acquired, resulting in 1,719 ETFs. A final selection was made based on a maximum error of 0.10 (around 0.001% error in IRR) when calculating the MWR through the Generalized Reduced Gradient (GRG) Solver and the Evolutionary Solver in Excel, resulting in a total of 1,630 ETFs where a solution was found. The Evolutionary Solver process was only applied to the initial sample, as the process is time intensive. No significant deviations in the resulting MWR were found when comparing methods. Of these 1,630 ETFs, an additional four observations were dropped due to Datastream reporting single monthly returns above 100% that were not verifiable with Yahoo Finance. As this is an internal rate of return calculation, multiple outcomes are possible when the sign of the cash flows changes repeatedly; thus, setting boundaries is required. We used a constraint of minimum -100% and maximum +200%, which is similar to Friesen and Sapp (2007), who used 100%. The maximum constraint of 200% was employed since leveraged ETFs are also included in the sample. The sample consisted of 99% US-based ETFs and accounted for more than 90% of the total U.S. ETF market in 2017 when comparing market capitalisation to other sources (Collins et al., 2018; Mercado, 2017; Rennison, 2017).

For these calculations, monthly returns were used to reduce the influence of database errors compared to daily observations and still allow for accurate estimation of timing ability. This permitted comparisons with other asset classes from Dichev (2007), Friesen and Sapp (2007) and Dichev and Yu (2011).

Data on NAV and outstanding shares were also evaluated; however, due to limited data prior to 2014 available in Thomson Reuters Datastream, market capitalisation was used to calculate capital flows. While differences between NAV and share price might be present due to temporary premiums and discounts, the effect should be minimal assuming efficient arbitrage by the authorised

categories were based on data from ETFdb and cross-checked with ETF descriptions from Firstbridge. The expense ratio was also acquired from ETFdb and corresponded to the year 2018. Finally, indicators for economic activity and GDP growth as proxies for business cycles were acquired from the Federal Reserve Bank of St. Louis. For 2017, an estimate of GDP growth from the OECD is used.

4.1 Descriptive statistics


Table 1. Descriptive statistics: sample over time

| Year | Cumulative # of funds | # of new funds | Capital flow (in millions of USD) | Capital flow / market capitalisation | Annual TWR (value-weighted) | Market capitalisation (in millions of USD) |
|---|---|---|---|---|---|---|
| 1993 | 1 | 1 | 7 | 0.072 | 8.91% | 102 |
| 1994 | 1 | 0 | 38 | 0.386 | -0.69% | 99 |
| 1995 | 2 | 1 | 318 | 0.551 | 37.77% | 577 |
| 1996 | 19 | 17 | 547 | 0.073 | 2.50% | 7,515 |
| 1997 | 19 | 0 | 2,681 | 0.294 | -8.52% | 9,116 |
| 1998 | 29 | 10 | 4,285 | 0.312 | 3.62% | 13,754 |
| 1999 | 30 | 1 | 3,251 | 0.128 | 29.09% | 25,366 |
| 2000 | 75 | 45 | 28,931 | 0.327 | -13.97% | 88,431 |
| 2001 | 96 | 21 | -17,963 | -0.278 | -19.39% | 64,643 |
| 2002 | 107 | 11 | 39,525 | 0.404 | -16.07% | 97,807 |
| 2003 | 119 | 12 | 11,127 | 0.083 | 20.17% | 134,494 |
| 2004 | 151 | 32 | 37,947 | 0.191 | 12.83% | 198,166 |
| 2005 | 197 | 46 | 76,867 | 0.255 | 9.44% | 301,631 |
| 2006 | 311 | 114 | 45,095 | 0.112 | 13.71% | 402,248 |
| 2007 | 469 | 158 | 63,604 | 0.123 | 12.01% | 517,177 |
| 2008 | 554 | 85 | 210,118 | 0.507 | -46.91% | 414,060 |
| 2009 | 636 | 82 | 124,051 | 0.175 | 35.99% | 707,925 |
| 2010 | 751 | 115 | 124,026 | 0.136 | 9.56% | 909,426 |
| 2011 | 887 | 136 | 105,294 | 0.105 | -0.83% | 1,005,137 |
| 2012 | 983 | 96 | 153,766 | 0.123 | 8.66% | 1,253,292 |
| 2013 | 1,098 | 115 | 158,419 | 0.099 | 13.75% | 1,595,132 |
| 2014 | 1,237 | 139 | 170,763 | 0.090 | 8.03% | 1,898,332 |
| 2015 | 1,422 | 185 | 209,593 | 0.100 | -0.96% | 2,102,703 |
| 2016 | 1,604 | 182 | 209,394 | 0.088 | 2.95% | 2,381,549 |
| 2017 | 1,626 | 22 | 412,927 | 0.126 | 18.43% | 3,267,513 |

The table shows the full sample of 1,626 exchange-traded funds, gathered from ETFdb.com, Firstclass.com and Thomson Reuters Datastream. Capital flow is the sum of the total fund flows, calculated by the following equation: $\text{capital flow}_t = \text{market capitalisation}_t - \text{market capitalisation}_{t-1} \times (1 + r_t)$, where $t$ is the time in months and $r_t$ the monthly return based on the total return index in USD. Capital flow over market capitalisation shows the magnitude of capital flows. The annual time-weighted returns are the buy-and-hold returns for one year, and the final column shows the market capitalisation in USD.

The average age of the funds was 88 months, which equals slightly more than 7 years. The market capitalisation was on average equal to roughly 800 million USD. The expense ratio ranged from 0.03% to 9.67%; the maximum belongs to the VanEck Vectors Income ETF and was caused by significant indirect fees to other funds. Overall, the average expense ratio in our sample was equal to 0.52%, which is roughly equal to the expense ratio for ETFs in 2017 reported by the Investment Company Institute, which tracks the U.S. investor market (Collins et al., 2018). The problem with the expense ratio is that it is based on figures for 2018 only, potentially biasing the study’s results. Therefore, interpretation must be performed with caution. The average GDP growth over the full lifetime of the fund ranged from 0.7% to 2.3% annually. The Fama-French three-factor model and Carhart four-factor model were applied to equity ETFs, with similar alphas of -3.8% and -3.3% respectively. Thus, on average, ETFs do not earn any alpha and underperform when compared to a standard benchmark; however, this offers no information on the tracking error. Clifford et al. (2014) argued that alpha loss can also occur due to expenses, so we controlled for this in our multivariate regressions. Moreover, momentum (WML) seemed to have a high loading on average, equal to -0.963. This could indicate that funds invest more in stocks with declining prices. Additional descriptive statistics for a subsample from 2009 onward and the panel dataset that tracks the performance gap over time are available in Appendix 2 and 3. Moreover, the full description and sources are also available in Appendix 1. Section 5.3 examines the performance gap over time with a panel dataset. Here, we used annual data for all the variables other than the arithmetic returns. We used 1-year-lagged arithmetic returns to control for return-chasing behaviour.

In Section 5, we also categorize ETFs based on their fund strategy. Alternative ETFs consist of funds that employ various active strategies in topics such as inflation expectations, long/short, managed futures and merger arbitrage strategies. These can generally be defined as non-traditional


Table 2. Descriptive statistics: a cross-sectional analysis

| Variable | Observations | Mean | Median | Maximum | Minimum | Std. Dev. |
|---|---|---|---|---|---|---|
| Fund-specific characteristics | | | | | | |
| Time-weighted returns | 1,626 | 4.13% | 4.34% | 105.76% | -83.92% | 10.36% |
| Money-weighted returns | 1,626 | 4.35% | 3.96% | 49.18% | -54.50% | 8.99% |
| Performance gap | 1,626 | -0.22% | 0.18% | 61.61% | -87.45% | 6.67% |
| Age in months | 1,626 | 88.7 | 83.0 | 302.0 | 12.0 | 55.5 |
| Average market cap (in millions of USD) | 1,626 | 792.62 | 89.77 | 70,049.16 | 0.48 | 3,119.83 |
| Expense ratio (2017) | 1,626 | 0.52% | 0.48% | 9.67% | 0.03% | 0.38% |
| Cash flow (in % of Market Cap.) | 1,626 | 12.93% | 4.63% | 1785.93% | -7.13% | 62.95% |
| Standard deviation of returns | 1,626 | 0.051 | 0.045 | 0.296 | 0.000 | 0.035 |
| Average GDP growth | 1,626 | 1.414 | 1.542 | 2.300 | 0.711 | 0.474 |
| Arithmetic return (annualised) | 1,626 | 6.43% | 6.18% | 116.83% | -56.32% | 9.41% |
| Fama-French three-factor model | | | | | | |
| Alpha (annualised) | 1,137 | -0.038 | -0.021 | 0.255 | -0.737 | 0.085 |
| HML | 1,137 | 0.099 | 0.072 | 7.366 | -8.032 | 0.649 |
| MRK | 1,137 | 1.129 | 1.076 | 5.576 | -4.037 | 0.634 |
| SMB | 1,137 | 0.105 | 0.026 | 4.754 | -4.727 | 0.685 |
| Carhart four-factor model | | | | | | |
| Alpha (annualised) | 1,137 | -0.033 | -0.015 | 0.404 | -0.719 | 0.088 |
| HML | 1,137 | 0.059 | 0.033 | 6.666 | -6.428 | 0.618 |
| MRK | 1,137 | 1.115 | 1.056 | 5.581 | -3.534 | 0.630 |
| SMB | 1,137 | 0.090 | 0.009 | 5.218 | -4.335 | 0.689 |
| WML | 1,137 | -0.071 | -0.065 | 1.895 | -1.680 | 0.286 |

The table presents the descriptive statistics of the sample. An in-depth list describing all variables and sources is available in Appendix 1. Time-weighted returns are calculated as the annual buy-and-hold return over the entire sample. Money-weighted return is calculated by solving for $r_{mw}$ in the following equation:

$$\frac{\text{market capitalisation}_T}{(1+r_{mw})^T} = \text{market capitalisation}_0 + \sum_{t=1}^{T} \frac{\text{capital flow}_t}{(1+r_{mw})^t}$$

The gap is defined as the difference between time-weighted returns and money-weighted returns. For equity ETFs, factor exposure is calculated, in which HML is the high minus low book value factor, MRK is the market portfolio, SMB is the small market capitalisation factor and WML is the momentum factor.
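As an illustration of the money-weighted return equation, the following is a minimal Python sketch that solves for the per-period rate by bisection. It is not the procedure used in this study, which relied on a spreadsheet solver; the function name, the search bounds and the two-year example figures are assumptions made for illustration only. Note that when net flows change sign, the equation can have more than one root, and bisection returns only one root inside the bracket.

```python
def money_weighted_return(cap_0, flows, cap_T, lo=-0.99, hi=10.0, tol=1e-12):
    """Per-period rate r solving cap_T/(1+r)^T = cap_0 + sum_t flows[t-1]/(1+r)^t.

    cap_0 : market capitalisation at t = 0
    flows : net capital flows at the end of periods 1..T (inflows positive)
    cap_T : market capitalisation at t = T
    Returns None when [lo, hi] does not bracket a sign change.
    """
    T = len(flows)

    def residual(r):
        pv_flows = sum(f / (1 + r) ** t for t, f in enumerate(flows, start=1))
        return cap_0 + pv_flows - cap_T / (1 + r) ** T

    f_lo, f_hi = residual(lo), residual(hi)
    if f_lo * f_hi > 0:
        return None  # no root bracketed by the (assumed) search bounds
    for _ in range(200):
        mid = (lo + hi) / 2
        f_mid = residual(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2


# Hypothetical two-year example: 100 grows by 10% to 110, investors add 50,
# then the fund loses 10% and ends at 144.
mwr = money_weighted_return(100.0, [50.0, 0.0], 144.0)
twr = (1.10 * 0.90) ** 0.5 - 1   # annualised buy-and-hold return
gap = twr - mwr                  # positive: investors did worse than buy-and-hold
```

In this hypothetical case the inflow arrives just before the loss, so the money-weighted return falls below the time-weighted return and the gap is positive.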

4.2 Control variables

there is no direct theoretical link between volatility and economic cycles, Fornari and Mele (2013) found that financial volatility can predict 30% of post-war economic activity in the US. Therefore, we also considered this factor, as Keswani and Stolin (2007) argued that an economic cycle can influence the performance gap. Performance was considered to control for differences in fund styles, as they could vary widely in the sample. For this, we used arithmetic returns, the Fama-French three-factor model and the Carhart four-factor model, similar to Friesen and Sapp (2007) and Clifford et al. (2014). The calculations for these factor models were based on Equations 8 and 9.

5. Results

This section first assesses the performance gap for individual funds in different size deciles and per fund class, as defined by ETFdb.com. Second, value-weighted portfolios for the fund classes and over various time periods are evaluated. Next, multivariate regressions are discussed to assess the drivers of the performance gap and control for various factors. This is followed by an evaluation of return-chasing behaviour. After this, a panel dataset based on the sample is presented to assess whether the performance gap varies over time and is related to the business cycle. The section concludes with a discussion of robustness and limitations.

5.1 Individual and portfolio level money-weighted returns

Table 3 shows the TWRs, MWRs and the performance gap. In Panel A, the individual ETFs are divided into 10 size deciles of 162 or 163 ETFs each; in Panel B, they are grouped per ETF class. Panel A makes clear that there is a large discrepancy in size between ETFs: the largest 163 ETFs account for 75% of the sample, ranging from 490 million to 1 trillion USD in average market capitalisation. The performance gap in Column 6 is not significant for any size group; moreover, the sign of the gap varies across the groups. Thus, we are unable to conclude that there is a performance gap for ETF investors.
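The significance of a mean performance gap can be illustrated with a one-sample t statistic against zero. This is a minimal sketch under the assumption that the test is applied to the per-fund gaps within a group; the exact test specification is not given in this section, and the function name and inputs are hypothetical.

```python
import math
import statistics


def one_sample_t(gaps):
    """t statistic and degrees of freedom for H0: mean gap = 0."""
    n = len(gaps)
    mean = statistics.fmean(gaps)
    std_err = statistics.stdev(gaps) / math.sqrt(n)  # sample std. dev. (n - 1)
    return mean / std_err, n - 1


# Hypothetical per-fund gaps (TWR minus MWR), not taken from the sample.
t_stat, dof = one_sample_t([1.0, 2.0, 3.0])
```

The p-value then follows from the Student t distribution with `dof` degrees of freedom, for which a statistics library would normally be used.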


Table 3. Individual fund returns

Panel A: Individual fund returns per size decile

| Size | # of funds | Total of average market capitalisation (in millions of USD) | TWR | MWR | Gap (p-value) |
|---|---|---|---|---|---|
| 1st decile | 163 | 510 | 5.02% | 5.14% | -0.12% (0.548) |
| 2nd decile | 162 | 1,690 | 5.51% | 5.34% | 0.17% (0.889) |
| 3rd decile | 163 | 3,831 | 3.46% | 3.36% | 0.10% (0.963) |
| 4th decile | 162 | 6,586 | 4.50% | 4.36% | 0.15% (0.876) |
| 5th decile | 163 | 11,616 | 2.38% | 3.51% | -1.13% (0.324) |
| 6th decile | 162 | 19,632 | 2.97% | 3.12% | -0.15% (0.970) |
| 7th decile | 163 | 35,810 | 2.94% | 2.86% | 0.07% (0.953) |
| 8th decile | 162 | 64,028 | 4.75% | 5.02% | -0.28% (0.599) |
| 9th decile | 163 | 145,155 | 5.21% | 5.76% | -0.55% (0.507) |
| 10th decile | 163 | 999,937 | 4.58% | 5.03% | -0.45% (0.442) |

Panel B: Individual fund returns per ETF class

| ETF class | # of funds | Total of average market capitalisation (in millions of USD) | TWR | MWR | Gap (p-value) |
|---|---|---|---|---|---|
| Alternatives | 27 | 1,403 | 2.12% | 1.77% | 0.35% (0.768) |
| Bond | 284 | 243,608 | 2.15% | 1.28% | 0.87%*** (0.005) |
| Commodity | 111 | 62,494 | -5.03% | -3.09% | -1.94% (0.164) |
| Currency | 30 | 7,406 | -0.49% | 1.24% | -1.72%* (0.056) |
| Equity | 1,137 | 942,592 | 5.71% | 6.01% | -0.30% (0.847) |
| Real Estate | 40 | 31,291 | 2.91% | 3.14% | -0.23% (0.893) |
| Total | 1,626 | 1,288,794 | 4.130% | 4.350% | -0.220% (0.801) |

*** is significant at 1%; ** is significant at 5%; * is significant at 10%. P-values are displayed between brackets. Significance is calculated with a Student's t-test. The groups of Panel A are ranked on size, and Panel B shows the performance gap based on ETF class. Time-weighted return is the buy-and-hold return on an annual basis. Money-weighted return is the $r_{mw}$ that solves the following equation:

$$\frac{\text{market capitalisation}_T}{(1+r_{mw})^T} = \text{market capitalisation}_0 + \sum_{t=1}^{T} \frac{\text{capital flow}_t}{(1+r_{mw})^t}$$

The gap is defined as the difference between time-weighted and money-weighted returns. Alternative ETFs are those that employ various hedge-fund-like strategies, such as arbitrage and long-short methods.


Table 4. Value-weighted portfolios

Panel A: Portfolio returns per ETF class

| ETF class | # of funds | Years | TWR | MWR | Gap (p-value) |
|---|---|---|---|---|---|
| Alternatives | 27 | 8.83 | 1.08% | 0.84% | 0.25% (0.147) |
| Bond | 284 | 15.50 | 4.14% | 2.78% | 1.35%* (0.091) |
| Commodity | 111 | 13.17 | 2.43% | -1.08% | 3.51% (0.426) |
| Currency | 30 | 12.08 | 2.36% | 4.25% | -1.89% (0.643) |
| Equity | 1,138 | 25 | 4.13% | 6.14% | -2.01% (0.573) |
| Real Estate | 40 | 17.58 | 5.38% | 3.89% | -1.49% (0.789) |

Panel B: Portfolio returns for all funds in subsamples

| Subsamples | # of funds | Years | TWR | MWR | Gap (p-value) |
|---|---|---|---|---|---|
| Early period (1993-1999) | 32 | 6.83 | 9.24% | 9.25% | -0.01% (0.999) |
| 1st middle period (2000-2008) | 512 | 9 | -5.80% | -13.54% | 7.74%** (0.038) |
| 2nd middle period (2009-2012) | 1,005 | 5 | 9.93% | 3.80% | 6.14%*** (0.002) |
| Late period (2013-2018) | 1,626 | 4.25 | 9.85% | 11.65% | -1.80%** (0.018) |
| 1st middle excluding 2008 (2000-2007) | 441 | 8 | 1.20% | 6.65% | -5.45%*** (0.000) |
| After 2008 (2009-2018) | 1,626 | 9.25 | 9.89% | 8.48% | 1.41% (0.407) |
| Total | 1,626 | 25 | 3.80% | 5.37% | -1.57% (0.573) |

*** is significant at 1%; ** is significant at 5%; * is significant at 10%. P-values are displayed between brackets. Significance is calculated with a bootstrap significance test, which is examined in Section 3.2. Panel A shows the performance gap per ETF class, and Panel B shows the performance gap per subsample period. All portfolios consist of individual ETFs, weighted in terms of market capitalisation. Time-weighted return is the buy-and-hold return on an annual basis. Money-weighted return is the $r_{mw}$ that solves the following equation:

$$\frac{\text{market capitalisation}_T}{(1+r_{mw})^T} = \text{market capitalisation}_0 + \sum_{t=1}^{T} \frac{\text{capital flow}_t}{(1+r_{mw})^t}$$

The gap is defined as the difference between time-weighted and money-weighted returns. Alternative ETFs are those that employ various hedge-fund-like strategies, such as arbitrage and long-short methods.

Similarly, for the third period, from 2009 to 2012, investors underperformed by 6.14% due to poor market timing. In the final period, there was a performance gap of -1.80%, indicating that investors earned additional returns through efficient market timing. For the period from 2009 up to 2018 as a whole, the performance gap is equal to 1.41%. Overall, the gap for the total value-weighted portfolio was not significant, and the performance gap seemed to vary over time in magnitude. Nevertheless, we can confirm our first hypothesis, that there is a performance gap for ETF investors, but that it does depend on the time horizon. This is in line with the findings of Keswani and Stolin (2007). To assess the drivers of the performance gap and control for fund characteristics, various cross-sectional multivariate regressions are analysed in the next section.
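The construction of a value-weighted portfolio can be sketched as follows. The sketch assumes that fund market capitalisations are summed per period, that fund returns are weighted by beginning-of-period market capitalisation, and that the net capital flow is the part of the change in capitalisation not explained by returns. These aggregation choices and all names and figures are assumptions for illustration, not a description of the exact procedure behind Table 4.

```python
def value_weighted_portfolio(caps, returns):
    """Aggregate funds into one portfolio.

    caps[i][t]      : market cap of fund i at the end of period t (t = 0..T)
    returns[i][t-1] : return of fund i over period t (t = 1..T)
    Returns (portfolio caps, portfolio returns, implied net capital flows).
    """
    n_periods = len(returns[0])
    port_caps = [sum(c[t] for c in caps) for t in range(n_periods + 1)]
    port_returns, port_flows = [], []
    for t in range(1, n_periods + 1):
        # Weight each fund's return by its beginning-of-period market cap.
        r_t = sum(c[t - 1] * r[t - 1] for c, r in zip(caps, returns)) / port_caps[t - 1]
        port_returns.append(r_t)
        # Assumed flow definition: cap_t - cap_{t-1} * (1 + r_t).
        port_flows.append(port_caps[t] - port_caps[t - 1] * (1 + r_t))
    return port_caps, port_returns, port_flows


def time_weighted_return(period_returns):
    """Annualised buy-and-hold (geometric mean) return."""
    growth = 1.0
    for r in period_returns:
        growth *= 1 + r
    return growth ** (1 / len(period_returns)) - 1


# Two hypothetical funds over two years; fund B receives an inflow of 50 in year 1.
caps = [[100.0, 110.0, 99.0], [100.0, 150.0, 135.0]]
returns = [[0.10, -0.10], [0.00, -0.10]]
port_caps, port_returns, port_flows = value_weighted_portfolio(caps, returns)
twr = time_weighted_return(port_returns)
```

The resulting portfolio capitalisations and flows can then be entered into the money-weighted return equation, and the gap is the time-weighted return minus that money-weighted return.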

5.2 Determinants of the performance gap

This section analyses the determinants of the performance gap with multivariate regressions, presented in Table 5. The multivariate regressions are based on the full sample, on the sample from 2009 onward, on equity ETFs only and on bond ETFs only. Here, we assess which factors are significant drivers of the performance gap. We control for characteristics such as age, size, expense ratio, the percentage of cash flows, volatility and risk-adjusted performance. We take the average of each characteristic over the full duration of the fund or of the subsample. Size is the natural logarithm of the average market capitalisation. The independent variables of GDP growth and age are dropped in some models, due to multicollinearity issues, to allow for interpretation of other variables. Potential multicollinearity is measured by variance inflation factors, which are presented in the appendix for each model. The main cause of multicollinearity is high correlation among independent variables. This can result in abnormal standard errors, making it impossible to distinguish between variables and their effect on the performance gap (Farrar and Glauber, 1967). Other authors have argued that


Table 5. Multivariate regression performance gap

| Variable | (1) Full sample | (2) Top 10% market capitalisation | (3) 2009 subsample | (4) Bonds | (5) Equity Fama-French three-factor | (6) Equity Carhart four-factor |
|---|---|---|---|---|---|---|
| Age in months | 0.02% (0.169) | 0.01% (0.350) | | 0.00%*** (0.000) | 0.01%* (0.059) | 0.01%* (0.083) |
| Average market capitalisation (ln) | 0.18%** (0.034) | 0.22% (0.670) | 0.07% (0.418) | 0.00% (0.612) | 0.25%* (0.093) | 0.25%* (0.089) |
| Expense ratio (2017) | 170.13%** (0.024) | 453.51%*** (0.000) | 137.32%** (0.027) | 0.72% (0.245) | 155.04%* (0.076) | 155.29%* (0.077) |
| Cash flow (in % of Market Cap.) | -0.49% (0.283) | 0.22% (0.454) | -0.40% (0.724) | -1.07% (0.882) | 0.23% (0.697) | 0.21% (0.735) |
| Standard deviation of returns | -83.33%*** (0.000) | -45.38%*** (0.088) | -83.31%*** (0.000) | -0.32% (0.209) | -97.56%*** (0.000) | -96.10%*** (0.000) |
| Average GDP growth | 0.47% (0.349) | 1.94% (0.014) | | 16.56%*** (0.000) | 0.24% (0.715) | 0.21% (0.749) |
| Arithmetic return (annual) | 25.69%*** (0.000) | 20.13%*** (0.045) | 24.79%*** (0.000) | 0.00%*** (0.000) | | |
| Factor models | | | | | | |
| Alpha FF/CH4 (annual) | | | | | 19.53%*** (0.000) | 20.24%*** (0.000) |
| MRK | | | | | 2.10%** (0.015) | 2.14%** (0.014) |
| SMB | | | | | 1.55%** (0.027) | 1.47%** (0.043) |
| HML | | | | | -1.05% (0.198) | -1.04% (0.214) |
| WML | | | | | | 1.39% (0.317) |
| Observations | 1,626 | 163 | 1,622 | 284 | 1,137 | 1,137 |
| R-squared | 0.300 | 0.283 | 0.264 | 0.158 | 0.223 | 0.224 |
| Adjusted R-squared | 0.294 | 0.235 | 0.259 | 0.140 | 0.216 | 0.217 |
| Fund-fixed effects | Yes | Yes | Yes | No | No | No |

Empty cells denote variables that are not included in the model.

*** is significant at 1%; ** is significant at 5%; * is significant at 10%. P-values are displayed between brackets. Significance is calculated with a Student's t-test with White-adjusted standard errors. The dependent variable is the performance gap, defined as time-weighted returns minus money-weighted returns. We use an ordinary least squares regression model in which the intercept and fund-fixed effects based on the asset class are omitted from the presented results. Average GDP growth and age in months are omitted in the 2009 subsample due to multicollinearity issues. Both the Fama-French three-factor model and the Carhart four-factor model coefficients are estimated by individual ordinary least squares regressions for each equity ETF over its lifetime.

excluded. Four additional observations were removed because the GRG Solver process was unable to find a solution for the MWRs. Model 4 focussed on bond ETFs only to assess whether the drivers of the performance gap differ for bond ETFs. Finally, Models 5 and 6 applied the Fama-French three-factor model and Carhart four-factor model respectively to better control for risk-adjusted return. This is in line with Friesen and Sapp (2007) and Clifford et al. (2014). These models were only applied to equity ETFs, hence the decrease in observations. In the first three models, fund-fixed effects were used to control for the various ETF classes that were present in the sample, similar to how Clifford et al. (2014) controlled for different fund styles.

Model 1 with the full sample demonstrated that, with fund-fixed effects, the major factors influencing the performance gap were the expense ratio, standard deviation of returns and arithmetic returns, with coefficients of 170%, -83% and 25% respectively. This indicates that a 1% increase in the expense ratio results in a 1.7% increase in the performance gap. In a similar manner, a 1% increase in volatility results in a 0.83% decrease in the performance gap, and a 1% increase in arithmetic returns results in a 0.25% increase. These numbers are quite high, but one has to consider the mean values of 0.52%, 0.051 and 6.41% for the expense ratio, standard deviation and arithmetic returns respectively. Model 1 had the highest R-squared, indicating that it is best suited to explain the performance gap. However, a potential bias might arise when comparing the other models, as the first three models used fund-fixed effects. The findings were similar when assessing the top 10% ETFs in market capitalisation in Model 2. However, the impact of the expense ratio here was even larger with a coefficient of 4.53. The average expense ratio was equal to 0.28%; thus, a slightly higher expense ratio will have a significant impact on the performance gap. This should be interpreted with caution, as we used data based on 2018 only for the expense ratio. A potential explanation for the large difference in

with the conclusions of Greenwood and Shleifer (2014), who found that investor expectations of future returns are highly correlated with past returns. However, this is not enough evidence to confirm that GDP growth is linked to the performance gap for bond ETFs. Therefore, we also examine this with a panel dataset in Section 5.4. Models 5 and 6 are quite similar, with adjusted R-squared values equal to 0.22. Thus, adding the momentum factor does not have a major influence in explaining the performance gap. Exposure to market risk and to firms with low market capitalisation has a minor but positive and significant effect on the performance gap.

Overall, for volatility there was a significant negative relationship with the performance gap, except with the subsample for bonds only in Model 4. Thus, the higher the volatility, the better investors are at timing their cash flows. This finding is also confirmed by Chen and Liang (2007) and Busse (1999) for hedge funds and mutual funds respectively. This is contrary to the findings of Friesen and Sapp (2007), who determined that investors who invest in mutual funds have worse market timing for funds that have high volatility, which they attributed to return-chasing behaviour. The difference for ETFs could be that institutional investors invest significantly in ETFs. However, the negative relation between volatility and the performance gap becomes less prominent when focussing on the largest ETFs, which have a relatively higher share of institutional investors (Rennison, 2017). We were unfortunately unable to separate institutional capital flows from private investors’ capital flows to further assess this anomaly.

The arithmetic returns had a positive relation to the performance gap, indicating that better-performing funds, irrespective of risk, are associated with worse market timing by investors. However, due to the nature of a multivariate regression, we are unable to attribute this to return-chasing behaviour. A relationship between past returns and the current performance gap would be in line with the findings of Clifford et al. (2014), who concluded that fund flows into ETFs are unable to anticipate returns.

5.3 Return-chasing behaviour

To assess return-chasing behaviour, we examine whether previous returns can explain capital flows. This is of importance because the key driver of MWRs is the (discounted) cash flows. The implication here is that if past returns result in additional cash flows, investors are exhibiting return-chasing behaviour. To control for the difference in absolute cash flows, the cash flows were scaled as a percentage of market capitalisation. Table 6 shows the mean correlation between capital flows and past and future returns, as an explanation for the performance gap. Overall, based on Panel A, we found no relation when including all funds that have data available for a full year. The mean correlation between the relative cash flows and returns 1 year prior was equal to 0.003, with a relatively high standard deviation of 0.6. All other periods had a negative correlation between net cash flows and past returns, indicating that past returns lower capital inflows. This is contrary to the research of Dichev and Yu (2011), who found a positive mean correlation between past returns and capital flows, indicating that return-chasing behaviour is present. When we assessed the correlation between quarterly capital flows and returns in Panel B, we found a low correlation equal to 0.083 between returns of the past quarter and current capital flows. This is an indication of return-chasing behaviour, but we do not have conclusive evidence that return-chasing behaviour is present for ETFs, as Dichev and Yu found a mean


Table 6. Correlation of capital flows and past/future returns

Panel A: Correlation of yearly capital flows

| | Funds | Mean | Standard deviation | P1 | P10 | P25 | P50 | P75 | P90 | P100 |
|---|---|---|---|---|---|---|---|---|---|---|
| t-1 return | 1,127 | 0.003 | 0.591 | -1.00 | -0.98 | -0.40 | 0.00 | 0.40 | 0.99 | 1.00 |
| t-2 return | 1,004 | -0.076 | 0.543 | -1.00 | -0.83 | -0.47 | -0.07 | 0.26 | 0.68 | 1.00 |
| t-3 return | 898 | -0.091 | 0.522 | -1.00 | -0.80 | -0.47 | -0.08 | 0.24 | 0.59 | 1.00 |
| t+1 return | 1,127 | -0.109 | 0.536 | -1.00 | -0.88 | -0.50 | -0.11 | 0.24 | 0.64 | 1.00 |
| t+2 return | 1,004 | -0.099 | 0.537 | -1.00 | -0.90 | -0.49 | -0.09 | 0.28 | 0.61 | 1.00 |
| t+3 return | 898 | -0.070 | 0.546 | -1.00 | -0.85 | -0.48 | -0.04 | 0.29 | 0.68 | 1.00 |

Panel B: Correlation of quarterly capital flows

| | Number of observations | Mean | Standard deviation | P1 | P10 | P25 | P50 | P75 | P90 | P100 |
|---|---|---|---|---|---|---|---|---|---|---|
| t-1 return | 1,619 | 0.083 | 0.341 | -1.00 | -0.32 | -0.08 | 0.09 | 0.27 | 0.45 | 1.00 |
| t-2 return | 1,592 | 0.025 | 0.361 | -1.00 | -0.41 | -0.14 | 0.05 | 0.21 | 0.42 | 1.00 |
| t-3 return | 1,532 | 0.012 | 0.342 | -0.99 | -0.36 | -0.15 | -0.01 | 0.18 | 0.43 | 1.00 |
| t+1 return | 1,619 | -0.029 | 0.345 | -1.00 | -0.43 | -0.20 | -0.02 | 0.14 | 0.35 | 1.00 |
| t+2 return | 1,591 | -0.022 | 0.357 | -1.00 | -0.42 | -0.18 | -0.02 | 0.14 | 0.38 | 1.00 |
| t+3 return | 1,533 | -0.044 | 0.352 | -1.00 | -0.47 | -0.20 | -0.03 | 0.12 | 0.34 | 1.00 |

This table presents the Pearson correlation between capital flows as a percentage of market capitalisation and returns from 1 to 3 years prior and from 1 to 3 years afterwards. The number of observations drops due to the inclusion of variables with less than 36 months of data.
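The per-fund correlations summarised in Table 6 can be illustrated with a short Python sketch that pairs scaled capital flows with lagged or leading returns. The function names, the sign convention for the lag and the example series are assumptions for illustration; how flows were scaled and aligned in the study itself is only described at a high level here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences (nan if undefined)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        return float("nan")
    return s_xy / math.sqrt(s_xx * s_yy)


def lagged_correlation(scaled_flows, returns, lag):
    """Correlate flow_t with return_{t-lag}; a negative lag uses future returns."""
    if lag >= 0:
        flows_part = scaled_flows[lag:]
        returns_part = returns[:len(returns) - lag]
    else:
        flows_part = scaled_flows[:lag]
        returns_part = returns[-lag:]
    return pearson(flows_part, returns_part)


# Hypothetical fund whose flows are exactly twice the previous period's return.
returns = [0.10, -0.05, 0.20, 0.00, 0.15]
scaled_flows = [0.0, 0.20, -0.10, 0.40, 0.00]
corr_past = lagged_correlation(scaled_flows, returns, lag=1)
```

A fund with only two paired observations always yields a correlation of exactly 1 or -1, which may contribute to the extreme percentiles in the yearly panel.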

5.4 Time-series analysis

To assess whether there is a link between the performance gap and business cycle, as Keswani and Stolin (2007) posited, we use an unbalanced panel dataset that shows the continuous performance gap over time. For this reason, we included previous year returns also when calculating the


Figure 1. Performance gap for a value-weighted portfolio

Figure 1 shows the performance gap of a value-weighted portfolio over time, starting in 2001 with monthly intervals. The performance gap is defined as the difference between the time-weighted returns and money-weighted returns. The GDP growth is the growth in GDP in the US, and the leading indicator is an indicator that consists of 10 economic factors that measures business trends in the US. The left-hand axis shows the leading indicator, while the right-hand axis shows the performance gap and GDP growth.

Figure 1 shows the GDP growth and performance gap of a value-weighted portfolio from 2000 onward, in percentages on the right-hand axis, and the leading indicator on the left-hand axis. The figure seems to indicate that the performance gap is mostly positive, signifying that the negative total performance gap from Table 4 is not robust and is dependent on the time horizon over which MWRs are calculated. Moreover, it shows that there might be a correlation between the performance gap and the business cycle. Therefore, we assess a panel dataset to examine whether the economic cycle is the driver when controlling for other factors.

Table 7 shows five different fixed-effects panel regressions. Based on the Hausman test, we reject the null hypothesis that individual effects are uncorrelated with other regressors and therefore conclude that a fixed-effects model is the correct specification. This is also in line with the approach of Clifford et al. (2014), who used both month- and fund-fixed effects, corresponding with cross-sectional and time-fixed effects. Both the Fama-French three-factor model and the Carhart four-factor model are dropped due to insignificant results in Table 5 and the time-intensive task of calculating them annually for over 1,600 ETFs, which is not within the scope of this research. In contrast to Table 5, Table 7 considers previous-year arithmetic returns rather than current arithmetic returns, to further assess whether return-chasing behaviour is present for investors. Moreover, standard deviations of returns were based on the current year, in line with Clifford et al. (2014), to control for current volatility. We were unable to control for the expense ratio, as only data for 2018 was available.
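The idea behind a fixed-effects panel regression can be illustrated with the within estimator for a single regressor: each fund's own mean is subtracted from its observations, which removes any time-constant fund effect before the slope is estimated. This is a simplified sketch with fund-fixed effects only and one explanatory variable; the function name and the example data are hypothetical, and the models in Table 7 include several regressors.

```python
from collections import defaultdict


def within_slope(fund_ids, x, y):
    """Fixed-effects (within) slope of y on x with one intercept per fund."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # fund -> [sum x, sum y, count]
    for fund, x_i, y_i in zip(fund_ids, x, y):
        entry = totals[fund]
        entry[0] += x_i
        entry[1] += y_i
        entry[2] += 1
    numerator = denominator = 0.0
    for fund, x_i, y_i in zip(fund_ids, x, y):
        sum_x, sum_y, count = totals[fund]
        dx = x_i - sum_x / count   # deviation from the fund's own mean
        dy = y_i - sum_y / count
        numerator += dx * dy
        denominator += dx * dx
    return numerator / denominator


# Hypothetical unbalanced panel: gap = fund effect + 0.5 * lagged return.
fund_ids = ["A", "A", "A", "B", "B"]
lagged_return = [0.01, 0.03, 0.05, 0.02, 0.04]
gap = [0.105, 0.115, 0.125, -0.19, -0.18]
slope = within_slope(fund_ids, lagged_return, gap)
```

In this constructed example the two funds have very different average gaps, yet the within estimator recovers the common slope of 0.5 because the fund-specific levels are removed by demeaning.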

Table 7. Fixed-effects panel data regression

| Variable | (7) GDP growth | (8) Leading index | (9) Top 10% based on market capitalisation | (10) Bonds | (11) Equity |
|---|---|---|---|---|---|
| Age in months | 0.04%ǂǂǂ (0.000) | 0.04%ǂǂǂ (0.000) | 0.02%* (0.070) | 0.02%* (0.078) | 0.04%*** (0.000) |
| Average market capitalisation (ln) | -0.06% (0.805) | -0.04% (0.849) | 0.25% (0.600) | 0.23% (0.539) | -0.14% (0.616) |
| Cash flow (in % of Market Cap.) | 0.08%ǂ (0.030) | 0.08%ǂ (0.041) | 0.01% (0.736) | 0.01% (0.114) | -0.08% (0.741) |
| Standard deviation of returns | -17.75%ǂ (0.043) | -5.72% (0.484) | 21.94% (0.164) | 31.55% (0.164) | -23.33%** (0.023) |
| GDP growth (annual) | -1.20%ǂǂǂ (0.000) | | -0.71%*** (0.002) | -0.11% (0.480) | -1.55%*** (0.000) |
| Leading index (annualised) | | -1.93%ǂǂǂ (0.000) | | | |
| Arithmetic returns (t-1) | 68.96%ǂǂǂ (0.000) | 59.03%ǂǂǂ (0.000) | 25.66% (0.182) | 123.85%*** (0.004) | 123.85%*** (0.004) |
| Observations | 1575 | 1575 | 157 | 272 | 1105 |
| R-squared | 0.129 | 0.123 | 0.041 | 0.071 | 0.117 |
| Adjusted R-squared | 0.128 | 0.128 | 0.121 | 0.031 | 0.066 |

Empty cells denote variables that are not included in the model.

*** is significant at 1%; ** is significant at 5%; * is significant at 10%. P-values are displayed between brackets. Models 7 and 8 use Bonferroni-corrected alphas: ǂǂǂ is significant at 1%, ǂǂ is significant at 5% and ǂ is significant at 10%. Significance is calculated with a Student's t-test with White-adjusted standard errors. The dependent variable is the performance gap, defined as the difference between time-weighted returns and money-weighted returns. A least-squares model with annual data and fixed effects is used, where the intercept is omitted from the presented results. The average market capitalisation is taken as the natural logarithm. Arithmetic returns have a 1-year lag.
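A Bonferroni correction divides each nominal significance level by the number of tests considered. The sketch below shows how the markers for corrected significance could be assigned; the number of tests is a parameter, and the value of 2 used in the example is purely hypothetical, as are the function names.

```python
def bonferroni_levels(alphas, n_tests):
    """Corrected significance levels: each nominal alpha divided by n_tests."""
    return [alpha / n_tests for alpha in alphas]


def bonferroni_marks(p_value, n_tests, alphas=(0.01, 0.05, 0.10)):
    """Number of significance marks after correction (3 = 1%, 2 = 5%, 1 = 10%)."""
    for marks, level in zip((3, 2, 1), bonferroni_levels(alphas, n_tests)):
        if p_value < level:
            return marks
    return 0


# Hypothetical illustration with n_tests = 2: thresholds become 0.005, 0.025, 0.05.
marks_a = bonferroni_marks(0.004, n_tests=2)  # below 0.005
marks_b = bonferroni_marks(0.030, n_tests=2)  # between 0.025 and 0.05
marks_c = bonferroni_marks(0.060, n_tests=2)  # above 0.05
```

The correction is conservative: a p-value that would pass the uncorrected 5% level can lose one or more marks once the thresholds are divided by the number of tests.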

Bonferroni significance is applied to Models 7 and 8 to adjust the standard alpha levels to

The age of the ETF had a minor influence on the performance gap (0.04%), with significance at 1% for Models 7, 8 and 11. Thus, the performance gap was larger for funds that were older. The cash flow also had a minor but positive relationship to the performance gap, indicating that funds that receive more capital on average have worse returns for investors. The standard deviation of returns had a significant negative impact in Model 7, but this disappeared in Model 8 when using the leading index to control for the economic cycle. This was potentially caused by the correlation between economic cycle and volatility (Fornari and Mele, 2013), which was higher for the leading index compared to GDP growth. This finding is robust to replacing current-year volatility with previous-year volatility (results unreported). Thus, investors are better at timing the market when volatility is high. This is in line with the research of Chen and Liang (2007) and Busse (1999), who found that professional investors display better market timing with high volatility. However, whereas in Table 5 the influence of volatility was smaller when considering only the largest ETFs, it is insignificant here. The economic cycle has a minor negative effect on the performance gap, indicating that with low GDP growth, the performance gap is higher, which is in line with Figure 1. With a 1% increase in GDP growth, there was a 0.12% decrease in the performance gap on average. This is an indication that there is a link between the economic cycle and the performance gap. However, a key issue is that the sample of ETFs only went through two economic cycles; thus, to be able to confirm this hypothesis for ETFs, one has to assess this over a longer horizon (NBER, 2018). The final finding resulting from the panel dataset is that historical average returns significantly affected the performance gap for all models, except for the top 10% of ETFs in size. This is an indication that return-chasing behaviour was present, which results in poor market timing by investors. This effect seemed to be more severe for bond and equity ETFs compared to the full sample. Moreover, this effect was not present for the top 10% ETFs, potentially caused by institutional investors. Overall, combined with the correlation between last-quarter returns and current cash flows, we can confirm the third hypothesis that investors in ETFs exhibit return-chasing behaviour, negatively affecting their returns, which is in line with the research of Dichev and Yu (2011), Friesen and Sapp (2007) and Clifford et al. (2014).

5.5 Robustness and limitations

discontinued ETFs. In addition, by not allowing factor weights of our performance measurement to vary over time when individually assessing funds, a bias might arise. For our panel dataset, we used annual data, which might influence the performance gap, as we have seen that the performance gap depends on the time horizon.

First, leveraged ETFs might skew the results, as capital flows in these assets are more volatile, potentially resulting in extreme performance gaps (Clifford et al., 2014). However, when we exclude these from our analysis, we do not find any major differences in results, as displayed in Appendix 6. Second, there is a potential selection bias in the data when not including discontinued ETFs; however, due to the nature of our source, a commercial party, we have no information available to assess the impact. One could argue that the impact should theoretically be minimal, as ETF assets are split legally from the custodian and have the net asset value of the fund as insurance.

Nevertheless, it would be interesting to see the results when assessed with a more complete database. Third, there might be a bias due to including results of the incubation period (first 12 months), which is usually controlled for in mutual fund literature (Evans, 2010). Due to the public nature of ETFs, we argue that this should have no influence. However, due to time limitations, we are unable to confirm whether this influences our results.

Fourth, by using the Fama-French three-factor model and Carhart four-factor model over the full duration, rather than using rolling 24-month windows like Friesen and Sapp (2007) and Dichev and Yu (2011), the results are biased. However, Clifford et al. (2014) did not find any different results for their sample with various performance windows for just arithmetic returns, ranging from 1 to 36 months. Nonetheless, it would be interesting to see if the factor loadings are important when determining the performance gap when the factors for individual funds are allowed to change over time and apply this to the panel dataset.


6. Conclusions

The return of ETF investors depends not only on the type of funds in which they choose to invest but also the magnitude of their capital flows in and out of funds. This results in a distortion between investor returns and security returns. We measured this by computing MWRs, which discount the net cashflows of ETFs over time to derive an internal rate of return on the investment. We compared this to TWRs, which do not consider the timing aspect. We have provided a contribution to the literature by applying this to ETF investors by using a sample of 1,626 ETFs from 1993 up to 2018. Based on ETF classes, only real estate and alternative ETFs have a significant performance gap, but the sample sizes for these classes are too small for conclusive evidence. However, when we assessed value-weighted portfolios, which weight ETFs according to their market capitalisation, we found that for the average investor, the returns depend heavily on the time frame of investing. Between 2000 and 2008, there was a performance gap of 7.74%. Between 2009 and 2012, a gap of 6.14% exists, and for the final period of 2013 until 2018, investors earned additional returns due to market timing equal to 1.80%. This is contrary to the results of Friesen and Sapp (2007), who did not find any major deviations over time for mutual fund investors.

To investigate the drivers of the performance gap, cross-sectional multivariate regressions were first assessed. The main evidence indicated that the expense ratio and fund performance increase the performance gap. Thus, investors who are able to select better-performing funds do not seem to be able to time their investments efficiently. Moreover, the most expensive funds are generally those with active management and are associated with bad market timing. Furthermore, the higher the volatility of returns, the better the market timing. This effect disappears in a panel dataset for the largest 10% of ETFs.

Moreover, we found evidence of return-chasing behaviour of ETF investors, which is in line with our hypothesis and previous literature. Average past returns significantly affect the performance gap. However, the relationship between past returns and current cash flows is of a smaller magnitude compared to the study of Dichev and Yu (2011). This could potentially be explained because more sophisticated institutional investors are also actively using ETFs.

especially considering that the sample between 2000 and 2017 only went through two full business cycles (NBER, 2018).


7. References

Ackert, L. and Deaves, R. 2009. Behavioral Finance: Psychology, Decision-Making, and Markets.

Baker, M. and Wurgler, J. 2000. The Equity Share in New Issues and Aggregate Stock Returns. The Journal of Finance 55, 2219-2257.

Barber, B. M. and Odean, T. 2007. All that Glitters: The Effect of Attention and News on the Buying Behavior of Individual and Institutional Investors. The Review of Financial Studies 21, 785-818.

Barber, B. M. and Odean, T. 2002. Online Investors: Do the Slow Die First? The Review of Financial Studies 15, 455-488.

Barber, B. M. and Odean, T. 2000. Trading is Hazardous to Your Wealth: The Common Stock Investment Performance of Individual Investors. The Journal of Finance 55, 773-806.

Ben-David, I., Graham, J. R. and Harvey, C. R. 2013. Managerial Miscalibration. The Quarterly Journal of Economics 128, 1547-1584.

Berk, J. B. and Green, R. C. 2004. Mutual Fund Flows and Performance in Rational Markets. Journal of Political Economy 112, 1269-1295.

Bodie, Z., Kane, A. and Marcus, A. J. 2012. Essentials of Investments, 9th Edition.

Braverman, O. and Wohl, A. 2005. The (Bad?) Timing of Mutual Fund Investors.

Busse, J. A. 1999. Volatility Timing in Mutual Funds: Evidence from Daily Returns. The Review of Financial Studies 12, 1009-1041.

Carhart, M. M. 1997. On Persistence in Mutual Fund Performance. The Journal of Finance 52, 57-82.

Chen, Y. and Liang, B. 2007. Do Market Timing Hedge Funds Time the Market? Journal of Financial and Quantitative Analysis 42, 827-856.

Clifford, C. P., Fulkerson, J. A. and Jordan, B. D. 2014. What Drives ETF Flows? Financial Review 49, 619-642.

Collins, S., Antoniewicz, R., Holden, S. and Steenstra, J. 2018. 2018 Investment Company Fact Book. Investment Company Institute.

Daniel, K. and Hirshleifer, D. 2015. Overconfident Investors, Predictable Returns, and Excessive Trading. Journal of Economic Perspectives 29, 61-88.

Dichev, I. D. 2007. What are Stock Investors’ Actual Historical Returns? Evidence from Dollar-Weighted Returns. American Economic Review 97, 386-401.

Dichev, I. D. and Yu, G. 2011. Higher Risk, Lower Returns: What Hedge Fund Investors really Earn. Journal of Financial Economics 100, 248-263.

Diebold, F. X. and Rudebusch, G. D. 1994. Measuring Business Cycles: A Modern Perspective.

Emsbo-Mattingly, L. and Hofschire, D. 2017. The Business Cycle Approach to Asset Allocation. Fidelity Investments.
