Change-Point Detection on Swing Funds

Academic year: 2021

University of Amsterdam
MSc in Stochastics and Financial Mathematics

Master Thesis

Change-Point Detection on Swing Funds

Author: Lin Lou
Examiner: Dr. A.J. van Es
Supervisor: Dr. Jan Jaap Hazenberg


Acknowledgements

Foremost, I would like to thank my supervisor Dr. A.J. (Bert) van Es for his patience, motivation, humor, and knowledge. He is also the person who recruited me to this Master program; to some extent he helped me fulfil a dream of mine. Two years ago, I was wondering whether I should go to work or try to study something I had always wanted to. After the submission of this thesis, I can say that I have no regrets about my school days. Without his guidance throughout the research and writing, this thesis could not have been completed.

Besides my supervisor, I would also like to express my sincere gratitude to Dr. Jan Jaap Hazenberg, who offered me an excellent internship opportunity. He led me to a new and challenging research subject. During the time I worked for him, I learned a great deal from him and his team. His professional experience also helped me crack many research questions and inspired me to develop the methodologies.

I also want to thank Willem van der Scheer, Wilmar Langerak and Ilya Kulyatin, three former colleagues of mine, for the stimulating discussions on the paper and methodologies, for the days we worked together, and for the trust they had in me.

Last but not least, I would like to thank my parents. My love for them cannot be expressed in words.


Abstract

In this thesis we develop a method to find the threshold of trading volume in a swing pricing policy. The method is based on the simulation of market excess returns given different threshold values; the best estimate of the swing pricing threshold is the one that fits the swing pricing pattern best. Since a fund management company may change its swing pricing policy, our study also aims to determine which change-point detection methods for multivariate observations are most suitable for our research purpose. Empirically, we find that the Bai-Perron break point analysis gives satisfactory results, and we examine whether alternative methods can outperform it. The recent nonparametric change-point detection method developed by Matteson and James (2013) is based on divisive and agglomerative clustering, and also works for multivariate distributions. Combined with our threshold detection model, however, we do not find that the two new methods clearly improve the results.


Contents

Contents
List of Figures
List of Tables
Nomenclature

1 Introduction
  1.1 Anti-Dilution
  1.2 Change-Point Analysis

2 Swing Pricing
  2.1 Determining the Threshold within Fixed Regime Time Interval
  2.2 Estimating Change Points of Swing Regimes
  2.3 Test Statistics

3 Hierarchical Change Point Analysis
  3.1 General Framework
  3.2 Agglomerative Clustering
    3.2.1 Divergence in Multivariate Distributions
    3.2.2 Merging
  3.3 Change Point Estimation
    3.3.1 Estimating the Locations of Change Points
    3.3.2 Hierarchical Estimating Methods
    3.3.3 Significance Testing

4 Simulation
  4.1 Independent Sequences
  4.2 Autocorrelated Sequence

5 Empirical Results
  5.1 Data Description
  5.2 Empirical Threshold and Change Points

6 Conclusion

Bibliography

Appendix A Change-Point Detection Algorithm
  A.1 E-Divisive Algorithm
  A.2 E-Agglomerative Algorithm

Appendix B Some Matlab Codes
  B.1 Threshold Finder Model
  B.2 The E-Divisive Distance


List of Figures

3.1 22 Public Utilities
3.2 Ward's Clustering
3.3 e-distance alpha=1
3.4 Euclidean Clustering
3.5 E-Divisive Algorithm
3.6 E-Agglomerative Algorithm
4.1 Change in a Univariate Gaussian Sequence
4.2 Changes in Correlation and Parameters
4.3 Only Correlation Change
5.1 Correlogram of JPM
5.2 Correlogram of CS
5.3 Correlogram of Natixis
5.4 Market Excess Returns of Natixis


List of Tables

2.1 General Swing Pattern
4.1 Detected Change Points in Simulated VAR(1)
4.2 Detected Correlation Change in Simulated VAR(1)
5.1 Fund Info
5.2 Sample Market Excess Returns
5.3 Sample Flow Percentage
5.4 Thresholds Detected without Change Point Study
5.5 Detected Change Points (Bai and Perron)
5.6 Swing Pricing Threshold (Bai and Perron)
5.7 Detected Change Points (E-Divisive)
5.8 Detected Change Points (E-Agglomerative)


Chapter 1

Introduction

This thesis is part of a study that I conducted at ING Investment Management (ING IM). In this thesis, we develop a unique methodology that allows us to find the swing threshold and detect changes in a fund's anti-dilution policy, known as swing pricing, from its daily NAV, number of outstanding shares and benchmark time series. As a continuation of my work at ING IM, we propose alternative methods to detect change points in a fund's swing pricing policy, in order to see if there is a better alternative to our model. In the empirical part of the thesis, we test the methodology on a sample of high yield bond funds. This category has relatively high transaction costs in the underlying portfolios, resulting in a favorable signal-to-noise ratio.

1.1 Anti-Dilution

The first open-end fund worldwide, The Massachusetts Investors' Trust, was launched in the United States (U.S.) in 1924 (Wilcox, 2003, p. 645) [29]. After a relatively slow start, this fund would become the largest U.S. investment fund in the 1950s. The open-end concept was copied by other U.S. funds from the 1920s onward and in other countries after World War II (Slot, 2004, p. 107-108) [25], and this fund structure has become the dominant structure for funds globally. Mutual funds in Europe, set up in accordance with the UCITS Directive, are obliged to have an open-end structure, providing investors the possibility to buy or sell units at least twice a month. In practice, substantially all UCITS funds provide daily liquidity.

An advantage of open-end funds is that investors can always redeem their fund investment at a price based on the Net Asset Value (NAV). When investors do not like the results of the fund in which they are invested, they can sell their investment and effectively fire the management company. However, there is a disadvantage associated with the liquidity of open-end funds. When the price of a fund on a dealing day is the same for redemptions and subscriptions and only reflects the value of the net assets of the fund per share, the transaction costs incurred at the portfolio level of the fund as a result of net subscriptions into or net redemptions from the fund fall on all existing investors in the fund, not on the investors trading that day. The transaction costs in the portfolio caused by capital activity at the fund level dilute the value of existing shareholders' interests. In effect, the performance of the fund is reduced by these transaction costs and wealth is transferred from existing investors to investors subscribing or redeeming.

This dilution effect can be mitigated when funds are not traded at NAV but apply a bid-and-ask price, where the difference between NAV and price accrues to the fund, or use swing pricing. Swing pricing is an approach where there is one transaction price per day, but that price is higher than the NAV when there are net subscriptions and lower when there are net redemptions. This type of anti-dilution measure is not compliant with U.S. fund regulation. In Europe, the UCITS Directive neither obliges nor forbids funds to implement such an anti-dilution policy. This has resulted in different regulations and market practice across the various markets within the European Union. In Luxembourg and Ireland, the two largest UCITS domiciles for cross-border fund distribution (Lipper, 2010) [16], it is left to the discretion of the fund management company whether or not any anti-dilution measures are implemented, resulting in differences between fund providers and individual funds in the anti-dilution policy applied. This ranges from funds not taking any anti-dilution measures to funds applying a full swing policy, whereby the mechanism is applied on each dealing day that there are subscriptions or redemptions. Others apply partial swing pricing, whereby the mechanism is applied only when the total net cash flow exceeds a certain pre-set threshold. The level of the anti-dilution charge, the swing factor, is not standardized either.

Both in Luxembourg and Ireland, fund management companies tend to provide only minimal transparency with regard to their anti-dilution policy. Typically, fund providers publish only daily transaction prices, not the NAV before the anti-dilution adjustment or the factor by which the NAV is adjusted to counter dilution. The thresholds applied are not disclosed either, with the argument that disclosure would encourage "gaming" of the threshold, i.e. large investors spreading their subscriptions and redemptions over multiple days in order not to breach the threshold and activate the swing pricing mechanism on any day. For fund investors, this obscurity has a number of disadvantages. Firstly, when considering a subscription or redemption, an investor does not know whether swing pricing will occur, nor, when it does, by which percentage. Secondly, an investor cannot evaluate the effectiveness of the anti-dilution policy of a fund compared to other funds available. Thirdly, swing pricing makes it more difficult to evaluate the performance of a fund. A fund's performance depends on the investments made by the fund, which in turn reflect the skill of the portfolio manager, and on the anti-dilution policy applied. Performance statistics published by a fund and available from sources such as Morningstar are not decomposed into components reflecting portfolio performance and the swing pricing policy, respectively. These disadvantages combined make it difficult for a fund investor to select the appropriate fund for investment.

1.2 Change-Point Analysis

Change point analysis is the process of detecting distributional changes within observations. Nowadays, with the growing importance of financial modeling, change point analysis has increasingly been applied in the financial industry. In the finance literature, one of the most popular empirical techniques for assessing the informational impact of an 'event' is the ubiquitous 'event' study, pioneered by Fama et al. (1969) [8] in a famous study of stock splits, and by Ball and Brown (1968) [4] in their analysis of the impact of changes in accounting numbers. In our study, we examine swing pricing events, which are the co-movements between a fund's returns and its flows.

It is crucial to know where a swing pricing policy changes. A change in a swing pricing policy does not only refer to a change in the threshold volume of selling and buying, but also to the moment swing pricing is first applied to a fund. For some funds the swing pricing policy was introduced years ago, while other funds have implemented the anti-dilution mechanism only recently. Knowing when the swing policy was introduced to a fund is relevant for studying the fund management company's strategy. Moreover, a fund management company may also change its swing policy. Hence, given a fund's information for a period, if we know when the policy is introduced and when it changes, we can divide this period into subperiods according to those changes. Afterwards we can study the swing pricing policy in each subperiod to get the whole picture of the fund's swing pricing for the period.

The history of change point analysis dates back some sixty years. Page (1954) [22] devised CUSUM as a method to detect changes in a process parameter, and proposed a criterion for deciding when to take corrective action. In general, change point analysis may be performed in either parametric or nonparametric settings. Parametric analysis necessarily assumes that the underlying distributions belong to some known family, and the likelihood function plays a major role. For example, in Lange, Carlin et al. (1992) [14] and Lavielle and Teyssiere (2006) [15] the analysis is performed by maximizing a log-likelihood function, while Page (1954) examines the ratio of log-likelihood functions to estimate change points. Bai and Perron (1998) [2] consider issues related to multiple structural changes, occurring at unknown dates, in the linear regression model estimated by least squares. Additionally, Davis et al. (2006) [6] combine the log-likelihood, the minimum description length, and a genetic algorithm in order to identify change points. Nonparametric alternatives are applicable in a wider range of applications than parametric ones (Hariz et al., 2007) [9]. Erdman and Emerson (2007) [7] design a method to perform Bayesian single change point analysis of univariate time series; it returns the posterior probability of a change point occurring at each time index in the series. Nonparametric approaches often rely heavily on the estimation of density functions (Kawahara and Sugiyama, 2011) [10], though they have also been performed using rank statistics (Lung-Yut-Fong et al., 2011) [17].

Another issue in change point analysis is dimensionality. Most of the methods mentioned in the last paragraph focus on univariate data. Killick and Eckley (2011) [11] provide many methods for performing change point analysis of univariate time series; currently, their method is only suitable for finding changes in mean or variance. Although their method only considers the case of independent observations, the theory and application behind the implemented methods allow for certain types of serial dependence (Killick et al., 2011) [12]. Ross and Adams (2012) [24] similarly provide a variety of methods for performing change point analysis of univariate time series. These methods range from those that detect changes in independent Gaussian data to fully nonparametric methods that can detect general distributional changes.

We will mainly focus on a newly developed change-point detection method for our swing pricing study. Matteson and James (2013) [19] propose a nonparametric approach based on Euclidean distances between sample observations. It is simple to calculate and avoids the difficulties associated with multivariate density estimation; their method does not require any estimation of density functions. The proposed approach is motivated by methods from cluster analysis (Szekely and Rizzo, 2005) [26] and is illustrated in Chapter 3. Although Matteson and James (2013) [19] relax the assumptions on the underlying multivariate distribution to a large extent, they still assume independence. In Chapter 4, however, we will see that serial dependence, which usually occurs in time series data, can jeopardize the performance of their method. In Chapter 5, we also see that their method does not outperform the Bai-Perron method for threshold detection on swing funds.

The layout of the thesis is as follows. Chapter 2 explains the effects a swing pricing policy has on a fund's returns and flow percentage, which is defined as the percentage ratio of a fund's flow at day t to the fund's size at the previous day. In this chapter we also introduce the method devised by Bai and Perron (1998) [2], which works well for swing regime changes in practice. In Chapter 3 we examine an alternative to the Bai-Perron method: the nonparametric approach proposed by Matteson and James, which is based on hierarchical clustering with the e-distance derived by Szekely and Rizzo (2005) [26]. In Chapter 4 we run simulations of this new method to check whether it works for autocorrelated observations. The empirical results appear in Chapter 5. Chapter 6 concludes.


Chapter 2

Swing Pricing

This chapter aims to give a brief introduction to some financial concepts and to explain the mechanism of swing pricing.

2.1 Determining the Threshold within Fixed Regime Time Interval

Let R^s_t and Div^s_t be the share class¹ return at time t and the daily dividend of the underlying share class s. Let N(s) denote the number of shares of share class s, and NAV^s_t the price of share class s at time t. Note that the fund return is based only on non-hedged share classes in the base currency of the fund, since otherwise the benchmark-adjusted return would be affected by currency results. Suppose there are n non-hedged share classes in a fund; the weight of share class s at time t is given by

$$w^s_t = \frac{NAV^s_t \times N(s)}{\sum_{s=1}^{n} NAV^s_t \times N(s)} \qquad (2.1)$$

To introduce the threshold detection model, we also need to calculate the return of share class s in a fund at time t. Define R^s_t as share class s's return at time t, R^i_t as the daily fund return of fund i at time t, w^s_t as the weight of share class s at time t, and ε_t as the daily market excess return. With n share classes in fund i, we have:

$$R^s_t = \frac{NAV^s_t + Div^s_t}{NAV^s_{t-1}} - 1 \qquad (2.2)$$

$$R^i_t = \sum_{s=1}^{n} w^s_t R^s_t \qquad (2.3)$$

¹ A designation applied to a specified type of security such as common stock or mutual fund units. Companies that have more than one class of common stock usually identify a given class with alphabetic markers, such as "Class A" shares and "Class B" shares. Different share classes within the same entity typically confer different rights on their owners.
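The aggregation in (2.1)-(2.3) is straightforward to implement. The thesis's own code is in Matlab (Appendix B); the following is a minimal Python sketch of the same arithmetic, with all function and variable names hypothetical:

```python
import numpy as np

def fund_return(nav, nav_prev, div, shares):
    """Aggregate share-class returns into the fund return, per (2.1)-(2.3).

    nav, nav_prev, div: per-share-class arrays of NAV at t, NAV at t-1,
    and daily dividend; shares: numbers of outstanding shares N(s).
    """
    r = (nav + div) / nav_prev - 1              # (2.2) share-class returns
    w = nav * shares / np.sum(nav * shares)     # (2.1) value weights at t
    return float(np.sum(w * r))                 # (2.3) weighted fund return
```
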

By definition, the market excess return is the difference between a fund's returns and its underlying benchmark² returns. Let R^b_t be the benchmark return at time t, defined by P_t/P_{t−1} − 1, where P_t is the benchmark's price at time t. In reality, however, funds may be priced at different times than benchmarks. We have seen multiple funds whose returns are highly correlated with the benchmark's returns on both the same day and the day before. This can happen when, for example, the benchmark is priced in the afternoon and the fund is priced at noon. It is probably related to T+1 accounting, which means that funds at day t publish NAVs calculated using t−1 prices based on transactions on t−2. Hence, the fund price will absorb the market information of the current day and the day before. R^b_{t−1} is the benchmark return lagged by one day. We therefore propose to calculate the market excess return via the regression

$$R^i_t = \alpha + \beta_1 R^b_t + \beta_2 R^b_{t-1} + \varepsilon_t, \qquad (2.4)$$

where ε_t is normal white noise. Also note that autocorrelation within the benchmark's returns barely exists; hence multicollinearity is not a concern in the above regression model.
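Equation (2.4) is an ordinary least-squares regression, so the market excess return series can be obtained as its residuals. Below is a minimal Python sketch (the thesis's implementation is in Matlab; function and variable names here are hypothetical):

```python
import numpy as np

def market_excess_returns(fund_ret, bench_ret):
    """Estimate (2.4): R_t = alpha + b1*B_t + b2*B_{t-1} + eps_t.

    fund_ret, bench_ret: 1-D arrays of daily returns, same length.
    Returns the residual series eps_t (the market excess returns),
    dropping the first day, which has no lagged benchmark return.
    """
    y = np.asarray(fund_ret, float)[1:]                    # R_t, t = 2..T
    b = np.asarray(bench_ret, float)
    X = np.column_stack([np.ones(len(y)), b[1:], b[:-1]])  # [1, B_t, B_{t-1}]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)           # (alpha, beta1, beta2)
    return y - X @ coef                                    # residuals = excess returns
```
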

For determining potential swing pricing events, the basic idea is that we use the typical characteristics of a swing pricing pattern. A swing pricing pattern is characterized by a large market excess return (due to the price adjustment of the fund in order to avoid dilution) at the start of the pattern. The large market excess return is in the same direction as an above-threshold flow (as the NAV swings upward in the case of an inflow and downward in the case of an outflow). Once flows fall below the threshold, the swing pricing pattern ends: the initial market excess return is followed by one of the same order of magnitude, but in the opposite direction.

² A standard against which the performance of a security, mutual fund or investment manager can be measured.

We introduce a characteristic function, named Return Direction (RD), to describe whether the tracking error is in excess of a hurdle interval³. Given the hurdle interval, the observed tracking errors over the sample period are translated into the RD vector with elements 1, 0, and −1. We also introduce another vector, f, which indicates whether there is a big flow on a business day. Let F be the observed flow percentage, defined as the ratio of the fund flow⁴ to the fund size on the day before, and let Tr be the swing threshold implemented in the fund. More precisely, the flow percentage at day t is the ratio of fund flows at day t over the sum of the total net assets of the fund at day t−1 and the fund flows at day t−1. If F_t ≥ Tr or F_t ≤ −Tr, then f_t takes the value 1 or −1 respectively; otherwise it is zero. A general swing pattern is illustrated in the table below, where RD stands for the direction of the movements of market excess returns: "1" ("−1") refers to a positively (negatively) large excess return and flow percentage, and "0" means that the magnitude of the market excess return and flow percentage is within the normal range. We discuss how to define "large" after the table.

Table 2.1 General Swing Pattern

Day   RD    f
 1     1    1
 2     0    1
 3     0    1
 4    -1   -1
 5     1    1
 6     0    1
 7    -1    0

Inspired by this simulated pattern, we propose a model to find the swing threshold. The market excess series is equivalent to the residual series in (2.4). The hurdle rates that determine RD's values are taken to be intervals whose upper/lower bounds are the mean of the residual series (µ_ε) plus/minus one standard deviation (σ_ε). We assign values to RD as follows.

$$RD_t = \begin{cases} -1 & \varepsilon_t \le \mu_\varepsilon - \sigma_\varepsilon \\ \;\;\,1 & \varepsilon_t \ge \mu_\varepsilon + \sigma_\varepsilon \\ \;\;\,0 & \text{otherwise} \end{cases} \qquad (2.5)$$

³ The hurdle interval is an interval of normal values of market excess returns. If a market excess return falls in the interval, we conclude that it is within the normal range of market excess return movements. Otherwise we classify the market excess return as abnormal, possibly caused by swing pricing.

⁴ The net of all cash inflows and outflows in and out of various financial assets. Fund flow is usually measured on a monthly or quarterly basis. The performance of an asset or fund is not taken into account, only share redemptions (outflows) and share purchases (inflows).

We still need to develop this definition further before it can be used for swing pricing threshold detection. We should take into account only those market excess returns that are generated by swing pricing patterns; we refer to the non-zero elements of RD as "abnormal". The reason is that fund returns can sometimes be volatile for causes other than swing pricing, and including all abnormal market excess returns would jeopardize the precision of our model. Under this assumption, we give zero values to those non-zero RDs that are not in the pattern of swing pricing. In other words, we ignore an RD if there is no big in/out flow on the RD's corresponding trading day or on the trading day before. We denote by RD∗_t the "clean" vector after removing non-zero RDs not caused by swing pricing. So RD∗_t reads:

$$RD^*_t = \begin{cases} -1 & \varepsilon_t \le \mu_\varepsilon - \sigma_\varepsilon \\ \;\;\,1 & \varepsilon_t \ge \mu_\varepsilon + \sigma_\varepsilon \\ \;\;\,0 & f_{t-1} = 0,\ f_t = 0 \end{cases} \qquad (2.6)$$

Under these assumptions, for any given threshold we obtain a vector f, and we expect a simulated RD vector that is in line with the swing pattern. The goal of the model is to find the threshold that minimizes the difference between the simulated market excess return directions RD(Tr) and the observed ones RD∗:

$$\arg\min_{Tr}\ N = \mathrm{COUNT}\big(RD(Tr) \ne RD^*\big) \qquad (2.7)$$

Note that the function COUNT in (2.7) counts the number of unmatched time points between RD(Tr) and RD∗. (2.7) can be viewed as a general form of the Hamming distance, in which we take into account not only 0 and 1 but also −1. Tr is the pre-set threshold value for one simulation. For each simulation trial, we fix the value of the threshold through the sample period. Assuming that from day t there is a big flow, that on day τ there is not, and that there are big flows within the interval [t, τ), the following algorithm simulates a swing pattern on [t, τ]. The Matlab code for the optimization of the COUNT function is given in Appendix B. Note: before time τ the flow indicator f can only take a value of 1 or −1, not 0.

Data: values of f within one swing time interval, t < u < τ
if f_u · f_{u+1} = −1 then
    RD(Tr)_{u+1} · f_u = 1
else if f_u · f_{u+1} = 1 then
    RD(Tr)_{u+1} = 0
else
    RD(Tr)_τ = −f_{τ−1}
end

Algorithm 1: Simulation Procedure for RD(Tr)

The above description is a simulation procedure for one swing pattern; the computer implementation automatically moves to the start of the next swing pattern. The simulation runs through threshold values from 0% to 20% in increments of one basis point, so in total there are 2,000 trials of simulated RDs for each fund. We also calculate the number of dates where RD(Tr) is non-zero and the number of matched points between RD(Tr) and RD∗. Thus we can compute the ratio of simulated market excess returns explained by swing pricing, i.e. how many points in RD∗ match RD(Tr). This ratio serves as a benchmark to conclude whether a fund is swung or not: we take a ratio larger than 80% as an indication that the fund bears a swing policy. We choose 80% as a benchmark ratio because we have seen that it works empirically.
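To illustrate the search in (2.7), the sketch below grid-searches the threshold in 1 bp steps. It is a simplified Python illustration, not the thesis's Matlab implementation: for brevity it compares the flow indicator f itself with the observed RD vector rather than running the full swing-pattern simulation of Algorithm 1, and all names are hypothetical.

```python
import numpy as np

def rd_vector(eps):
    """Map market excess returns to directions per (2.5):
    1 above mu+sigma, -1 below mu-sigma, 0 otherwise."""
    mu, sig = eps.mean(), eps.std()
    return np.where(eps >= mu + sig, 1, np.where(eps <= mu - sig, -1, 0))

def flow_indicator(flow_pct, tr):
    """f_t = 1 if F_t >= Tr, -1 if F_t <= -Tr, else 0."""
    return np.where(flow_pct >= tr, 1, np.where(flow_pct <= -tr, -1, 0))

def find_threshold(eps, flow_pct, max_tr=0.20, step=0.0001):
    """Grid-search Tr from 0% to 20% in 1 bp steps, minimizing the
    generalized Hamming distance (2.7) between flow-implied directions
    and the observed RD vector (simplified: f stands in for RD(Tr))."""
    rd_star = rd_vector(np.asarray(eps, float))
    grid = np.arange(step, max_tr + step, step)
    costs = [np.count_nonzero(flow_indicator(flow_pct, tr) != rd_star)
             for tr in grid]
    return grid[int(np.argmin(costs))]
```
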

2.2 Estimating Change Points of Swing Regimes

A potential problem for the threshold detection model arises if the fund changed from one swing regime to another during the sampling period. It could be the case that a fund has had multiple swing thresholds, or that the fund's swing policy was implemented after the start date of the observations. Our threshold detection model works under the assumption that the swing policy was already implemented before the start date and remained unchanged. If this is not true, then even if we can generate the optimal threshold value for a fund, it is still possible that a ratio lower than 80% will lead us to conclude, wrongly, that the fund is not swung. Therefore, it is crucial to investigate whether there are multiple swing regimes in a fund, and if so, when they change.


This question boils down to the famous break-point analysis. The changes in swing regimes involve two time series, ε and F (the market excess returns and the flow percentages). We do not want to investigate the changes in the two time series separately. Instead, we aim to see at which points they change together, i.e. where the correlation of these two time series changes. We will use the method proposed by Bai and Perron (1998) [2]. For now we assume the number of break points is known. We consider the following linear regression with m breaks (m+1 regimes):

$$\varepsilon_t = F_t z_j + \mu_t, \qquad (2.8)$$

where t = T_{j−1}+1, …, T_j and j = 1, …, m+1. Note that T_1 < T_2 < ⋯ < T_m < T_{m+1} = T. F_t is the covariate and z_j (j = 1, …, m+1) is the corresponding coefficient; µ_t is the disturbance at time t. The indices (T_1, …, T_m), or break points, are explicitly treated as unknown. The estimation method is based on the least-squares principle. For the m-partition (T_1, …, T_m), the associated least-squares estimates of z_j (j = 1, …, m+1) are obtained by minimizing the sum of squared residuals

$$\sum_{j=1}^{m+1} \sum_{t=T_{j-1}+1}^{T_j} \left[\varepsilon_t - F_t z_j\right]^2.$$

Let ẑ_j = ẑ(T̂_j) denote the resulting estimates. Substituting them into the objective function and denoting the resulting sum of squared residuals as S_T(T_1, …, T_m), we have

$$S_T(T_1,\dots,T_m) = \sum_{j=1}^{m+1} \sum_{t=T_{j-1}+1}^{T_j} \left[\varepsilon_t - F_t \hat z_j\right]^2. \qquad (2.9)$$

The estimated break points are given by

$$(\hat T_1,\dots,\hat T_m) = \arg\min_{T_1,\dots,T_m} S_T(T_1,\dots,T_m), \qquad (2.10)$$

where the minimization is taken over all partitions (T_1, …, T_m) such that T_j − T_{j−1} ≥ h > 1, for a constant h. Hence the break-point estimators are minimizers of the objective function under the restriction T_j − T_{j−1} ≥ h. Finally, the regression parameter estimates are the associated least-squares estimates at the estimated m-partition T̂_j.

In line with the test statistics proposed by Bai and Perron (1998) [2], described in the next section, we will first test no break versus a fixed number of breaks. Afterwards we will perform the Bai-Perron test of m−1 versus m break points to determine the number of regime changes in our funds.


The optimal number of breaks is estimated by the procedure suggested by Yao (1987) [30], who showed that the number of breaks can be consistently estimated using the Bayesian Information Criterion, defined as

$$BIC(m) = \ln \hat\sigma^2(m) + p^* \ln(T)/T, \qquad (2.11)$$

where p^* = (m+1)q + m, \hat\sigma^2(m) = T^{-1} S_T(\hat T_1,\dots,\hat T_m), and q is the number of covariates subject to structural changes. In our analysis the model is subject to pure structural change, so q equals 1. The term p^* ln(T)/T serves as a penalty function: the more change points, the larger the penalty. Hence the minimum BIC attains the optimal number of change points. We first apply the Bai-Perron analysis to all funds to obtain the change points. After that we divide each fund into segments, on which we apply our threshold detection model.
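The least-squares break-point estimation of (2.9)-(2.10) combined with the BIC selection of (2.11) can be sketched with a standard dynamic program over candidate partitions. This is an illustrative Python sketch under the pure-structural-change setting (single covariate, q = 1), not Bai and Perron's actual algorithm; all names are hypothetical.

```python
import numpy as np

def ssr_table(eps, F, h):
    """ssr[i, j]: SSR of regressing eps on F (no intercept, per (2.8))
    over segment [i, j] inclusive, for segments of length >= h."""
    n = len(eps)
    ssr = np.full((n, n), np.inf)
    for i in range(n):
        for j in range(i + h - 1, n):
            x, y = F[i:j + 1], eps[i:j + 1]
            z = (x @ y) / (x @ x)          # segment OLS slope z_j
            r = y - z * x
            ssr[i, j] = r @ r
    return ssr

def bai_perron(eps, F, max_m=3, h=10):
    """DP search for the partition minimizing (2.9), then BIC (2.11)
    to pick the number of breaks.  Returns the break indices (segment
    end positions), possibly empty."""
    eps, F = np.asarray(eps, float), np.asarray(F, float)
    n = len(eps)
    ssr = ssr_table(eps, F, h)
    cost = {0: ssr[0]}                     # cost[m][t]: best SSR, m breaks on [0..t]
    back = {}
    for m in range(1, max_m + 1):
        cost[m] = np.full(n, np.inf)
        back[m] = np.zeros(n, int)
        for t in range(n):
            for s in range(t):             # s: end of previous segment
                c = cost[m - 1][s] + ssr[s + 1, t]
                if c < cost[m][t]:
                    cost[m][t], back[m][t] = c, s
    # BIC(m) = ln(SSR/T) + p* ln(T)/T with p* = (m+1)q + m, q = 1
    bic = [np.log(cost[m][n - 1] / n) + ((m + 1) + m) * np.log(n) / n
           for m in range(max_m + 1)]
    m = int(np.argmin(bic))
    breaks, t = [], n - 1
    for k in range(m, 0, -1):              # backtrack the break dates
        t = back[k][t]
        breaks.append(t)
    return sorted(breaks)
```
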

2.3 Test Statistics

In this section we have a look at the test statistics proposed by Bai and Perron (1998) [2]. Details of the method and proofs are in their paper. Since the swing regime change in our case is subject to a pure structural change, the test statistic implemented in our study is a simple version of that of Bai and Perron. We need to impose some restrictions on the possible values of the break dates. In particular, each break date must be asymptotically distinct and bounded away from the boundaries of the sample. Also, as in the method of sliding windows (see Basseville and Nikiforov, 1993) [5], a trimming parameter is required so that the minimal length h of a segment is fixed. To this effect, we let λ_i = T_i/T for i = 1, …, m, and introduce, for some arbitrary positive number θ, a trimming parameter imposing a minimal segment length h, i.e. θ = h/T. Write

$$\Lambda_\theta = \big\{(\lambda_1,\dots,\lambda_m) : |\lambda_{i+1} - \lambda_i| \ge \theta,\ i = 1,\dots,m-1,\ \lambda_1 \ge \theta,\ \lambda_m \le 1 - \theta\big\}. \qquad (2.12)$$

Substituting these into the objective function and denoting the resulting sum of squared residuals as S_T(T_1, …, T_m), the estimated break points (T̂_1, …, T̂_m) are

$$(\hat T_1,\dots,\hat T_m) = \arg\min_{(\lambda_1,\dots,\lambda_m)\in\Lambda_\theta} S_T(T_1,\dots,T_m), \qquad (2.13)$$

i.e. with the minimization taken over all partitions (T_1, …, T_m) such that T_i − T_{i−1} ≥ h = Tθ.
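As a small illustration of the trimming set in (2.12), the following hypothetical Python helper checks whether a set of break fractions λ_i = T_i/T is admissible:

```python
def admissible(lams, theta):
    """Check that break fractions lambda_i = T_i/T lie in the trimmed
    set Lambda_theta of (2.12): consecutive gaps of at least theta,
    and breaks bounded away from the sample edges by theta."""
    if not lams:
        return True
    gaps_ok = all(b - a >= theta for a, b in zip(lams, lams[1:]))
    return gaps_ok and lams[0] >= theta and lams[-1] <= 1 - theta
```
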


As discussed earlier, we first consider the sup F∗⁵-type test of no structural break (m = 0) versus k breaks (m = k). Let (T_1, …, T_k) be a partition such that T_i = [Tλ_i]. Define a conventional matrix R such that (Rz)′ = (z′_1 − z′_2, …, z′_k − z′_{k+1}), and

$$F_T(\lambda_1,\dots,\lambda_k) = \frac{1}{T}\left(\frac{T - k - 1}{k}\right) \hat z' R' \left(R \hat V(\hat z) R'\right)^{-1} R \hat z, \qquad (2.14)$$

where V̂(ẑ) is an estimate of the covariance matrix of ẑ that is robust to serial correlation and heteroscedasticity. The robust estimate of the covariance matrix can be obtained via the method derived in Newey and West (1987) [21].

We need to introduce some notation. For $i = 1,\dots,m$, let $T_i^0$ be the true break point and $\Delta T_i^0 = T_i^0 - T_{i-1}^0$. Define

\[
\Omega = \lim_{\Delta T_i^0 \to \infty} (\Delta T_i^0)^{-1} \sum_{r=T_{i-1}^0}^{T_i^0} \sum_{t=T_{i-1}^0}^{T_i^0} E(z_r z_t' \mu_r \mu_t). \tag{2.15}
\]

An estimate of $\Omega$ can be constructed using the covariance matrix estimator of Andrews (1993) [1] applied to the vector $z_t\hat\mu_t$, where $\hat\mu_t$ is the estimated value of $\mu_t$. Let $\Sigma = \Sigma_i I$, where $I$ is an identity matrix; a consistent estimate of the covariance matrix of $\hat z$ is

\[
\hat V(\hat z) = \lim_{T\to\infty} T\,(\bar F'\bar F)^{-1}\bar F'\,\Omega\,\bar F(\bar F'\bar F)^{-1}, \tag{2.16}
\]

where $\bar F$ is the matrix which diagonally partitions $F$ at the $m$-partition $(T_1,\dots,T_m)$, i.e., $\bar F = \operatorname{diag}(\bar F_1,\dots,\bar F_{m+1})$ with $\bar F_i = (f_{T_{i-1}+1},\dots,f_{T_i})$. Allowing for serial correlation in the errors, the test statistic may be rather cumbersome to compute. However, one can obtain a much simpler, yet asymptotically equivalent, version by using the estimate of the break dates obtained from the minimization of the sum of squared residuals. Define

\[
\hat V(\hat z) = \left(\frac{\bar F'\bar F}{T}\right)^{-1}. \tag{2.17}
\]

This procedure is asymptotically equivalent since the break dates are consistent even in the presence of serial correlation. The details can be seen in their paper.

$^5$To avoid confusion, we use $F^*$ to denote the F-test instead of $F$; in Chapter 2, $F$ was introduced as the vector of flow percentages.


The statistic $F^*$ in (2.14) is simply the conventional F-statistic for testing $z_1 = \dots = z_{k+1}$ against $z_i \ne z_{i+1}$ for some $i$, given the partition $(T_1,\dots,T_k)$. In line with the test in Andrews (1993), the $\sup F^*$ test is defined as

\[
\sup F_T^*(k) = \sup_{(\lambda_1,\dots,\lambda_k)\in\Lambda_\theta} F_T(\lambda_1,\dots,\lambda_k), \tag{2.18}
\]

where $(\hat\lambda_1,\dots,\hat\lambda_k)$ minimizes the sum of squared residuals under the specified trimming value $\theta$.

Bai and Perron also propose a test for $l$ versus $l+1$ breaks. The test amounts to the application of $l+1$ tests of the null hypothesis of no structural change versus the alternative hypothesis of a single change. It is applied to each segment containing the observations $\hat T_{i-1}+1$ to $\hat T_i$ for $i = 1,\dots,l+1$, using the convention that $\hat T_0 = 0$ and $\hat T_{l+1} = T$. We conclude in favor of a model with $l+1$ breaks if the overall maximum of the $\sup F^*$ type statistics is sufficiently large. Bai and Perron (2003) [3] also provide critical values for the test statistics; in our study we stick to the critical values they provide.


Chapter 3

Hierarchical Change Point Analysis

3.1 General Framework

In this chapter we discuss another nonparametric change-point detection method, proposed by Matteson and James (2013) [19], which is designed for fitting multiple change points to multivariate data. Change point analysis is the process of assessing distributional changes within time-ordered observations, an important tool for the analysis of financial data. The method of Matteson and James (2013) [19] initiates a different approach, based on maximizing Euclidean distances across sample observations to detect multiple breakpoints. This method can detect any distributional change and does not make any distributional assumptions beyond the existence of the α-th moment. Estimation is performed in such a way that both the number and the locations of change points are detected simultaneously.

The framework of their method can be described as follows. Let $Z_1, Z_2, \dots, Z_T \in \mathbb{R}^d$ be an independent sequence of time-ordered observations. For simplicity, one can think of the scenario where there is only one change point in the observations, $\tau$, which breaks the observations into two different distributions, $F_1$ and $F_2$ respectively. Note that these two distributions are not necessarily known. Hence, we have $Z_1, \dots, Z_\tau \overset{iid}{\sim} F_1$ and $Z_{\tau+1}, \dots, Z_T \overset{iid}{\sim} F_2$. Next, similar to the Bai-Perron break point analysis discussed in the previous chapter, they test for homogeneity in distributions. However, unlike the F-test which Bai and Perron (1998) [2] employ, they perform a permutation test using resampling statistics; we explain this in detail in Section 3.3.3. Finally, they extend their approach to a more general framework in which multiple change points can be detected with a hierarchical clustering technique.


3.2 Agglomerative Clustering

3.2.1 Divergence in Multivariate Distributions

Ward (1963) [28] suggested a general hierarchical clustering procedure in which the criterion for selecting the optimal pair of clusters to merge at each step is the optimal value of an objective function; many clustering procedures are encompassed by this framework. The objective function Ward uses minimizes the increase in the total within-cluster sum of squared errors, which is based on squared Euclidean distance. The advantage of this method is that the two properties of cluster homogeneity and cluster separability are incorporated in the cluster criterion. However, Ward's method has a strong shortcoming. Milligan (1980) [20] finds that Ward's method tends to join clusters with a small number of observations, is strongly biased toward producing clusters with roughly the same number of observations, and is very sensitive to outliers. Moreover, Ward's method joins clusters to maximize the likelihood at each level of the hierarchy under the assumption of a multivariate normal mixture. Szekely and Rizzo (2005) [26] extend Ward's method: their cluster distance is based on an $\alpha$-power of Euclidean distance with $\alpha \in (0,2]$, rather than only the squared Euclidean distance of Ward (1963) [28].

Let $(t,x)$ denote the scalar product of vectors $t, x \in \mathbb{R}^d$, so $(t,x) = t_1x_1 + \dots + t_dx_d$. For a complex-valued function $\phi(\cdot)$, the conjugate is denoted by $\bar\phi$, and $|\phi|^2$ is defined as $\phi\bar\phi$. The Euclidean norm of $x \in \mathbb{R}^d$ is denoted by $|x|$ unless there is any ambiguity. Let $X'$ be an independent copy of $X$.

For any random variables $X, Y \in \mathbb{R}^d$, the characteristic functions of $X$ and $Y$ are $\phi_X$ and $\phi_Y$ respectively, so $\phi_X(t) = E\exp(i(t,X))$ and $\phi_Y(t) = E\exp(i(t,Y))$. Matteson and James (2013) [19] measure the divergence between multivariate distributions by

\[
\int_{\mathbb{R}^d} |\phi_X(t) - \phi_Y(t)|^2\,\omega(t)\,dt, \tag{3.1}
\]

where $\omega(\cdot)$ is a positive weight function for which the integral exists. We will use the following weight function proposed by Szekely and Rizzo (2005),

\[
\omega(t;\alpha) = \left(\frac{2\pi^{d/2}\Gamma(1-\alpha/2)}{\alpha 2^\alpha \Gamma((d+\alpha)/2)}\,|t|^{d+\alpha}\right)^{-1}, \tag{3.2}
\]


where $\alpha \in (0,2)$, $d$ denotes the dimension of the multivariate distribution and $\Gamma$ is the gamma function. Then, if $E|X|^\alpha + E|Y|^\alpha < \infty$, substituting (3.2) into (3.1), the divergence measure is defined as

\[
D(X,Y;\alpha) = \int_{\mathbb{R}^d} |\phi_X(t)-\phi_Y(t)|^2 \left(\frac{2\pi^{d/2}\Gamma(1-\alpha/2)}{\alpha 2^\alpha\Gamma((d+\alpha)/2)}\,|t|^{d+\alpha}\right)^{-1} dt. \tag{3.3}
\]

Given that $X, X', Y$ and $Y'$ are mutually independent and $E|X|^\alpha + E|Y|^\alpha < \infty$, an alternative divergence measure based on Euclidean distances, defined by Szekely and Rizzo (2005) [26], is

\[
\mathcal{E}(X,Y;\alpha) = 2E|X-Y|^\alpha - E|X-X'|^\alpha - E|Y-Y'|^\alpha. \tag{3.4}
\]

It is not immediately obvious that this distance is non-negative; Szekely and Rizzo (2005) [26] prove the non-negativity in the following theorem.

Theorem 1 (Szekely and Rizzo (2005)). Suppose $X, X' \in \mathbb{R}^d$ are independent and identically distributed (iid) with distribution $F_1$, $Y, Y' \in \mathbb{R}^d$ are iid with distribution $F_2$, $E|X|^\alpha < \infty$ and $E|Y|^\alpha < \infty$. Then

\[
\mathcal{E}(X,Y;\alpha) = 2E|X-Y|^\alpha - E|X-X'|^\alpha - E|Y-Y'|^\alpha \ge 0, \tag{*}
\]

and equality holds if and only if $X$ and $Y$ are identically distributed.

Proof. See Szekely and Rizzo (2005).

The following lemma, derived by Szekely and Rizzo (2005) [26], links the divergence measure in (3.3) to the distance in (*), which motivates a simple empirical divergence measure for multivariate distributions.

Lemma 1 (Szekely and Rizzo (2005)). For any pair of independent random vectors $X, Y \in \mathbb{R}^d$, and for any $\alpha \in (0,2)$, if $E(|X|^\alpha + |Y|^\alpha) < \infty$, then $\mathcal{E}(X,Y;\alpha) = D(X,Y;\alpha)$, $\mathcal{E}(X,Y;\alpha) \in [0,\infty)$, and $\mathcal{E}(X,Y;\alpha) = 0$ if and only if $X$ and $Y$ are identically distributed.

Let $X \sim F_1$ and $Y \sim F_2$ for arbitrary distributions $F_1$ and $F_2$. Moreover, let $\alpha \in (0,2)$ be such that $E|X|^\alpha + E|Y|^\alpha < \infty$. Let $\mathbf{X}_n = \{X_i : i = 1,2,\dots,n\}$ and $\mathbf{Y}_m = \{Y_j : j = 1,2,\dots,m\}$ be independent iid samples from $F_1$ and $F_2$, respectively. Lemma 1 suggests an empirical divergence measure analogous to (3.4). Define

\[
\hat{\mathcal{E}}(\mathbf{X}_n,\mathbf{Y}_m;\alpha) = \frac{2}{mn}\sum_{i=1}^{n}\sum_{j=1}^{m}|X_i-Y_j|^\alpha - \frac{1}{n^2}\sum_{i=1}^{n}\sum_{k=1}^{n}|X_i-X_k|^\alpha - \frac{1}{m^2}\sum_{j=1}^{m}\sum_{k=1}^{m}|Y_j-Y_k|^\alpha. \tag{3.5}
\]
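The empirical measure (3.5) is straightforward to compute from pairwise distances. The following Python sketch is illustrative (the function name is ours, not the authors'); for $\alpha = 2$ the measure reduces to twice the squared distance between the sample means, which the toy example below confirms.

```python
import numpy as np

def empirical_divergence(x, y, alpha=1.0):
    # E_hat(X_n, Y_m; alpha) of (3.5): means of alpha-powers of pairwise
    # Euclidean distances, between and within the two samples.
    x = np.asarray(x, float); y = np.asarray(y, float)
    if x.ndim == 1: x = x[:, None]
    if y.ndim == 1: y = y[:, None]
    def mean_dist(a, b):
        return (np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2) ** alpha).mean()
    return 2.0 * mean_dist(x, y) - mean_dist(x, x) - mean_dist(y, y)

# Toy bivariate samples: x_bar = (0.5, 0.5), y_bar = (2.5, 2.5),
# so for alpha = 2 the divergence equals 2 * |x_bar - y_bar|^2 = 16.
x = np.array([[0.0, 0.0], [1.0, 1.0]])
y = np.array([[2.0, 2.0], [3.0, 3.0]])
e2 = empirical_divergence(x, y, alpha=2.0)
```

The measure is zero for identical samples and strictly positive otherwise, in line with Lemma 1.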

As in Ward's method, the above empirical measure takes into account the within- and between-cluster divergence. However, it is not transparent why Szekely and Rizzo call this measure an extension of Ward's. To see the connection with Ward's distance, we relate the within distances $|X-X'|^\alpha$ and $|Y-Y'|^\alpha$ to the between distance $|X-Y|^\alpha$. Let $\bar X$ and $\bar Y$ denote the centroids of the samples of $X$ and $Y$, and replace $\alpha$ by 2; then

\[
\sum_{i=1}^{n}|X_i-Y_j|^2 = \sum_{i=1}^{n}|X_i-\bar X+\bar X-Y_j|^2 = nD_1 + n|\bar X-Y_j|^2,
\]

where $D_1$ is the sample average of $|X_i-\bar X|^2$, i.e. $D_1 = \frac{1}{n}\sum_{i=1}^{n}|X_i-\bar X|^2$ (the cross term vanishes when summing over $i$). Applying the same reasoning again, we have

\[
\sum_{j=1}^{m}\sum_{i=1}^{n}|X_i-Y_j|^2 = mnD_1 + n\sum_{j=1}^{m}|\bar X-Y_j|^2 = mnD_1 + n\sum_{j=1}^{m}|\bar X-\bar Y+\bar Y-Y_j|^2 = mnD_1 + mnD_2 + nm|\bar Y-\bar X|^2 = nm\big(D_1+D_2+|\bar Y-\bar X|^2\big),
\]

where $D_2 = \frac{1}{m}\sum_{j=1}^{m}|Y_j-\bar Y|^2$. Similarly, $\sum_{k=1}^{n}\sum_{i=1}^{n}|X_i-X_k|^2 = 2n^2D_1$ and $\sum_{k=1}^{m}\sum_{j=1}^{m}|Y_j-Y_k|^2 = 2m^2D_2$. Plugging these terms into (3.5), we see

\[
\hat{\mathcal{E}}(\mathbf{X}_n,\mathbf{Y}_m;2) = \frac{2}{mn}\,nm\big(D_1+D_2+|\bar Y-\bar X|^2\big) - \frac{1}{n^2}\,2n^2D_1 - \frac{1}{m^2}\,2m^2D_2 = 2|\bar Y-\bar X|^2.
\]


By the strong law of large numbers, (3.5) converges almost surely to (3.4) as $n\wedge m\to\infty$. With equation (3.5), one does not need to estimate the Euclidean divergence by performing $d$-dimensional integration. Furthermore, Matteson and James (2013) [19] introduced a scaled empirical divergence to perform change point analysis. Let

\[
\hat Q(\mathbf{X}_n,\mathbf{Y}_m;\alpha) = \frac{mn}{m+n}\,\hat{\mathcal{E}}(\mathbf{X}_n,\mathbf{Y}_m;\alpha). \tag{3.6}
\]

One can see that, for $\alpha = 2$, (3.6) is a weighted squared Euclidean distance between cluster centers. Szekely and Rizzo (2005) [26] call this distance measure the e-distance. For equal distributions, Rizzo and Szekely (2010) [23] show that $\hat Q(\mathbf{X}_n,\mathbf{Y}_m;\alpha)$ converges in distribution to a non-degenerate random variable $Q(X,Y;\alpha)$ as $n\wedge m\to\infty$. If the multivariate distributions are unequal, $\hat Q(\mathbf{X}_n,\mathbf{Y}_m;\alpha)$ diverges to infinity almost surely as $n\wedge m\to\infty$. This asymptotic result motivates the statistical tests described in Section 3.3.3.

3.2.2 Merging

After defining the divergence between multivariate distributions, sample observations are merged according to minimum distance. Observations are merged from bottom to top, starting from singletons: we calculate the distance between each pair of singletons and merge the two with minimum distance. An infinite family of agglomerative hierarchical clustering algorithms is represented by the following recursive formula, derived by Lance and Williams (1967) [13] and known as the Lance-Williams algorithm.

Suppose that two clusters $C_i$ and $C_j$ are to be merged. Let $d_{ij}$, $d_{ik}$ and $d_{jk}$ be the pairwise distances between the clusters $C_i$, $C_j$ and $C_k$, and denote the distance between the merged cluster $C_{ij}$ and $C_k$ by $d_{(ij)k}$. The Lance-Williams algorithm computes $d_{(ij)k}$ recursively by

\[
d_{(ij)k} = \alpha_i d_{ik} + \alpha_j d_{jk} + \beta d_{ij} + \gamma|d_{ik} - d_{jk}|,
\]

where $\alpha_i$, $\alpha_j$, $\beta$ and $\gamma$ are parameters, which may depend on cluster sizes, that together with the cluster distance function $d_{ij}$ determine the clustering algorithm. Ward's minimum variance method fits this recursion. Consider clusters $C_i$, $C_j$ and $C_k$ with sizes $n_i$, $n_j$ and $n_k$ respectively:

\[
d(C_{ij}, C_k) = \frac{n_i+n_k}{n_i+n_j+n_k}\, d(C_i,C_k) + \frac{n_j+n_k}{n_i+n_j+n_k}\, d(C_j,C_k) - \frac{n_k}{n_i+n_j+n_k}\, d(C_i,C_j).
\]

Then the Lance-Williams parameters for Ward's minimum variance method are $\alpha_i = \frac{n_i+n_k}{n_i+n_j+n_k}$, $\alpha_j = \frac{n_j+n_k}{n_i+n_j+n_k}$, $\beta = -\frac{n_k}{n_i+n_j+n_k}$ and $\gamma = 0$. It has been shown that the e-distance is a general form of Ward's method, allowing $\alpha$ to be chosen arbitrarily from $(0,2]$. Hence, not surprisingly, the e-distance is also computed recursively with the same Lance-Williams parameters as in Ward's method. Consequently, we have

\[
\hat Q(C_{ij},C_k) = \frac{n_i+n_k}{n_i+n_j+n_k}\,\hat Q(C_i,C_k) + \frac{n_j+n_k}{n_i+n_j+n_k}\,\hat Q(C_j,C_k) - \frac{n_k}{n_i+n_j+n_k}\,\hat Q(C_i,C_j).
\]
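The recursive update above can be written as a one-line function; this Python sketch (with an illustrative name) applies Ward's Lance-Williams parameters, which by the discussion above also drive the e-distance update.

```python
def lance_williams_update(d_ik, d_jk, d_ij, n_i, n_j, n_k):
    # Lance-Williams recursion with Ward's parameters:
    #   alpha_i = (n_i + n_k)/N, alpha_j = (n_j + n_k)/N, beta = -n_k/N, gamma = 0,
    # where N = n_i + n_j + n_k.
    N = n_i + n_j + n_k
    return ((n_i + n_k) / N) * d_ik + ((n_j + n_k) / N) * d_jk - (n_k / N) * d_ij

# Distance from the merged cluster C_ij to C_k, for three singleton clusters:
# (2/3)*2 + (2/3)*4 - (1/3)*1 = 11/3.
d_merged = lance_williams_update(2.0, 4.0, 1.0, 1, 1, 1)
```

Only the three current pairwise distances and the cluster sizes are needed, so a full dendrogram can be built without ever revisiting the raw observations.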

To illustrate e-distance clustering, we provide in this subsection an example comparing it with Ward's method. We use an example available on the internet, which demonstrates clustering techniques with different linkage measurements such as average linkage and simple linkage; we exemplify e-distance clustering on the same example data. As a comparison, we also give the dendrogram of agglomerative clustering under Ward's minimum variance method. A Matlab code developed by Szekely and Rizzo for this merging method is attached in Appendix B. For the e-distance measure we consider both $\alpha = 1$ and $\alpha = 2$. The data can be downloaded at http://faculty.smu.edu/tfomby/eco5385/data/Utilities.xls.

Suppose that we are interested in forming groups of 22 US public utilities. There are 8 measurements on each utility, as shown in the following figure; the objects to be clustered are the utilities. An example where clustering would be useful is a study to predict the cost impact of deregulation. To do the requisite analysis, economists would need to build a detailed cost model of the various utilities. It would save a considerable amount of time and effort if we could cluster similar types of utilities, build detailed cost models for just one typical utility in each cluster, and then scale up from these models to estimate results for all utilities.


Figure 3.1 22 Public Utilities

In the above figure there are 22 public utilities, each with eight variables: X1: fixed-charge coverage ratio (income/debt); X2: rate of return on capital; X3: cost per KW capacity in place; X4: annual load factor; X5: peak KWH demand growth from 1974 to 1975; X6: sales (KWH use per year); X7: percent nuclear; X8: total fuel costs (cents per KWH).

The next three figures show Ward's distance clustering, the e-distance clustering with $\alpha = 1$, and the Euclidean distance clustering which is the default in many software packages. In this example, the first two divergence measures give the same clusters; however, we consider that a coincidence. In the original paper of Szekely and Rizzo (2005) [26], Ward's method and e-distance clustering indeed generate different results.

Figure 3.2 Ward’s Clustering

Figure 3.3 e-distance Clustering (α = 1)

Figure 3.4 Euclidean Clustering

3.3 Change Point Estimation

3.3.1 Estimating the Locations of Change Points

In this section we define a change point based on the e-distance measure. There are many different ways in which change point analysis can be performed, from purely parametric methods to those that are distribution free. There is no research on changes in swing pricing strategy yet, and we do not know why an investment company changes its swing pricing policy; hence we do not know which parameters in the underlying distributions will actually change. The change point detection method we introduce in this thesis is nonparametric and designed to perform multiple change point analysis while making as few assumptions as possible. Estimation can be based on either a hierarchical divisive or an agglomerative algorithm. Divisive estimation sequentially identifies change points via a bisection algorithm; the agglomerative algorithm estimates change point locations by determining an optimal segmentation. Both approaches are able to detect any type of distributional change within the data.


For ease of notation, define the sets $X_\tau = \{Z_1, Z_2, \dots, Z_\tau\}$ and $Y_\tau(\kappa) = \{Z_{\tau+1}, Z_{\tau+2}, \dots, Z_\kappa\}$. We locate a change point where it maximizes the Euclidean divergence:

\[
(\hat\tau, \hat\kappa) = \operatorname*{arg\,max}_{(\tau,\kappa)} \hat Q\big(X_\tau, Y_\tau(\kappa); \alpha\big). \tag{3.7}
\]

If we know that there is at most one change point, we set $\kappa = T$. Venkatraman (1992) [27] mentions that it may be more difficult to detect certain types of distributional changes in the multiple change point setting using only bisection, because the mixture distribution in $Y_\tau(T)$ may be indistinguishable from the distribution in $X_\tau$. Hence, for detecting multiple change points, $\kappa$ is allowed to vary.
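For the single change point case ($\kappa = T$), the maximization in (3.7) amounts to scanning $\tau$ and keeping the split with the largest $\hat Q$. The following Python sketch does exactly that; the function names and the simulated data are illustrative, not the authors' implementation.

```python
import numpy as np

def e_div(x, y, alpha=1.0):
    # Empirical divergence (3.5) from pairwise Euclidean distances.
    def md(a, b):
        return (np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2) ** alpha).mean()
    return 2.0 * md(x, y) - md(x, x) - md(y, y)

def q_hat(x, y, alpha=1.0):
    # Scaled divergence (3.6): Q_hat = mn/(m+n) * E_hat.
    n, m = len(x), len(y)
    return n * m / (n + m) * e_div(x, y, alpha)

def locate_change_point(z, alpha=1.0, min_size=5):
    # One bisection step of (3.7) with kappa fixed at T:
    # scan tau and keep the split maximizing Q_hat.
    T = len(z)
    best_tau, best_q = None, -np.inf
    for tau in range(min_size, T - min_size + 1):
        q = q_hat(z[:tau], z[tau:], alpha)
        if q > best_q:
            best_tau, best_q = tau, q
    return best_tau, best_q

# Toy bivariate series with a distributional change at t = 50.
rng = np.random.default_rng(1)
z = np.concatenate([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
tau_hat, q_max = locate_change_point(z)
```

With a clear mean shift, the estimated location lands at or very near the true change point.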

Iteratively applying this method, we can estimate multiple change points; this is similar to the approach of Bai and Perron (1998) [2]. Suppose that $k-1$ change points have already been detected, at locations $0 < \hat\tau_1 < \dots < \hat\tau_{k-1} < T$. This gives $k$ clusters $\hat C_1, \dots, \hat C_k$, with $\hat C_i = \{Z_{\hat\tau_{i-1}+1}, \dots, Z_{\hat\tau_i}\}$, using the convention $\hat\tau_0 = 0$ and $\hat\tau_k = T$. We then apply the single change point technique to each cluster. For the $i$th cluster, denote the proposed change point by $\hat\tau(i)$ and the associated constant by $\hat\kappa(i)$. Now, let

\[
i^* = \operatorname*{arg\,max}_{i\in\{1,\dots,k\}} \hat Q\big(X_{\hat\tau(i)}, Y_{\hat\tau(i)}(\hat\kappa(i)); \alpha\big), \tag{3.8}
\]

where $X_{\hat\tau(i)}$ and $Y_{\hat\tau(i)}(\hat\kappa(i))$ are defined within the $i$th cluster. The corresponding test statistic is

\[
\hat q_k = \hat Q\big(X_{\hat\tau_k}, Y_{\hat\tau_k}(\hat\kappa_k); \alpha\big), \tag{3.9}
\]

where $\hat\tau_k = \hat\tau(i^*)$ is the $k$th estimated change point, located in cluster $i^*$, and $\hat\kappa_k$ is the corresponding constant.

3.3.2 Hierarchical Estimating Methods

In cluster analysis we wish to partition the observations into homogeneous subsets; without constraints, such subsets need not be contiguous in time. Matteson and James (2013) [19] introduce a new technique for multiple change point detection based on hierarchical clustering of time-ordered data. They mainly focus on divisive clustering, i.e., clustering from top to bottom. In this thesis, however, we discuss both divisive clustering and agglomerative clustering, i.e., clustering from bottom to top, and keep the names of the clustering methods consistent with theirs: E-Divisive and E-Agglo, respectively. For E-Divisive, multiple change points are estimated by iteratively applying a


procedure for locating a single change point. The statistical significance of an estimated change point is determined through a permutation test. This test plays a critical role in our nonparametric analysis, because the distribution of the test statistic depends on the distributions of the observations, which are unknown in general. Given the importance of this test, we discuss it further in the subsequent section. The E-Divisive procedure is spelled out in the algorithms in the Appendix and is illustrated in the following figure.

Figure 3.5 E-Divisive Algorithm

As mentioned in Matteson and James (2013) [19], the agglomerative approach runs much faster than the E-Divisive approach. This reduction is accomplished by considering only a relatively small subset of possible change point locations; a similar restriction to the E-Divisive approach does not result in any computational savings. This method requires that an initial segmentation of the data be provided. Let $Z_1, Z_2, \dots, Z_T$ be independent, each with finite $\alpha$th absolute moment, for some $\alpha\in(0,2)$. Suppose we are initially provided a clustering $\mathcal{C} = \{C_1, C_2, \dots, C_n\}$ of $n$ clusters, where each cluster is assumed to contain more than a single observation. To merge clusters, we proceed in the following way. Let $C_i = \{Z_\kappa, Z_{\kappa+1}, \dots, Z_{\kappa+s}\}$ and $C_j = \{Z_\eta, Z_{\eta+1}, \dots, Z_{\eta+t}\}$.


Standard agglomerative clustering does not carry over directly to change point analysis: if observations are merged simply because they are close to each other, as in Figures 3.3 and 3.4, the time ordering is distorted and the true change points may not be detected.

To preserve the time ordering, $C_i$ and $C_j$ are allowed to merge only if they are adjacent, that is, if $\kappa+s+1 = \eta$ or $\eta+t+1 = \kappa$. To identify which adjacent pair of clusters to merge, we use a goodness-of-fit statistic. For $\mathcal{C} = \{C_1, C_2, \dots, C_n\}$, define

\[
\hat G_n(\mathcal{C};\alpha) = \sum_{i=1}^{n-1} \hat Q(C_i, C_{i+1}), \tag{3.10}
\]

where $C_i$ and $C_{i+1}$ are adjacent. Hence, in time-ordered observations, we expect this algorithm to work as shown in the following figure.

Figure 3.6 E-Agglomerative Algorithm

This statistic is optimized by merging the pair of adjacent clusters that results in either the largest increase or the smallest decrease of the statistic's value. Repeating this process and recording the goodness-of-fit statistic at each step eventually merges all observations into a single cluster. The number of change points is then estimated by the clustering that maximized the goodness-of-fit statistic over the entire merging sequence.
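One merge step of this procedure can be sketched as follows, here for a univariate series with $\alpha = 1$ and a toy three-segment initialization; all helper names are ours and this is not the authors' implementation.

```python
import numpy as np

def e_div(x, y, alpha=1.0):
    # Empirical divergence (3.5) for univariate samples.
    def md(a, b):
        return (np.abs(a[:, None] - b[None, :]) ** alpha).mean()
    return 2.0 * md(x, y) - md(x, x) - md(y, y)

def q_hat(x, y, alpha=1.0):
    # Scaled e-distance (3.6) between two clusters.
    n, m = len(x), len(y)
    return n * m / (n + m) * e_div(x, y, alpha)

def goodness_of_fit(clusters, alpha=1.0):
    # G_hat of (3.10): sum of Q_hat over adjacent cluster pairs.
    return sum(q_hat(clusters[i], clusters[i + 1], alpha)
               for i in range(len(clusters) - 1))

def merge_step(clusters, alpha=1.0):
    # Try every adjacent merge (preserving time order) and keep the one
    # that yields the largest resulting goodness-of-fit.
    best_g, best_seg = -np.inf, None
    for i in range(len(clusters) - 1):
        trial = (clusters[:i]
                 + [np.concatenate([clusters[i], clusters[i + 1]])]
                 + clusters[i + 2:])
        g = goodness_of_fit(trial, alpha)
        if g > best_g:
            best_g, best_seg = g, trial
    return best_g, best_seg

# Three adjacent segments; the first two come from the same regime.
c1 = np.array([0.0, 0.1, 0.2])
c2 = np.array([0.1, 0.2, 0.3])
c3 = np.array([5.0, 5.1, 5.2])
g_after, segmentation = merge_step([c1, c2, c3])
```

As expected, the step merges the two similar segments, keeping the boundary with the large divergence intact.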


The algorithm of this method can be seen in the appendix.

Overfitting is always a concern in change point analysis. As in Bai and Perron (1998) [2], we need a penalty to alleviate the overfitting problem. Thus, the change point locations are estimated by maximizing

\[
\tilde G_K = \hat G_K + \mathrm{Penalty}(K), \tag{3.11}
\]

where $K$ is the number of change points. A natural choice is $\mathrm{Penalty}(K) = -K$.

3.3.3 Significance Testing

As mentioned earlier, since the distributions of the observations are not known, a nonparametric testing method is required. Here we also use the resampling method proposed by Matteson and James (2013) [19]: specifically, a permutation test. The test determines the statistical significance of a change point given the previously estimated change points, and can also serve as a stopping criterion for the proposed iterative estimation procedure. Suppose that we have already found $k-1$ change points, which give us $k$ clusters. For a newly detected change point $\hat\tau_k$, the associated test statistic is $\hat q_k$ as in (3.9). If the value of $\hat q_k$ is above the critical value, then a significant change in distribution within one of the existing clusters is detected. However, a precise critical value is not available without knowledge of the underlying distributions; therefore, a permutation test is needed.

Under the null hypothesis of no additional change points, the permutation test works as follows. Note that observations are permuted only within, not across, clusters. First, the observations within each cluster are permuted to construct a new sequence of length $T$, each such permuted sequence being equally likely. The same estimation procedure is then applied to the permuted sequence. For a long sequence, for example more than 700 observations in our swing fund study, enumerating all permutations is practically infeasible because their number is enormous. Hence, in this thesis we follow the number of random permutations suggested by the authors, namely $R = 499$: the resampling draws 499 random permutations regardless of the sample length. With this number, the procedure is repeated $R$ times and the


approximate p-value is defined as

\[
p = \frac{\sum_{r=1}^{R+1} \mathbf{1}\{\hat q_k^{(r)} \ge \hat q_k\}}{R+1}, \tag{3.12}
\]

where $\hat q_k^{(r)}$ denotes the statistic computed on the $r$th resample. If we fix the significance level at $p_0 = 0.05$, then the candidate change point is rejected when $p \ge p_0$ and the detection algorithm terminates; otherwise, the procedure estimates an additional change point location. The permutation test may be performed after the E-Divisive procedure reaches a predetermined number of clusters, to quickly provide initial estimates. Also note that this procedure is sensitive to autocorrelation in the observations; we take this issue into account in Chapter 5.

3.4 Time Series Analysis

The underlying assumption of the e-distance measure is that the observations are independent, which is usually not true for time-ordered observations: in time series analysis one typically encounters autocorrelation. In our swing pricing study, the market excess return defined in Chapter 2 is free of autocorrelation by definition. The flow percentage is defined as the ratio of the fund flow at day $t$ to the fund size at day $t-1$. By the pricing rule of a mutual fund, the fund price is usually correlated with that of the previous day; hence the vector of flow percentages is exposed to autocorrelation as well, which jeopardizes the independence assumption underlying the distance measure. In practice, if a fund is not actively traded, for example if there are days on which the fund is not traded at all, the severity of the autocorrelation in the flow percentage is reduced. From our experience we note that a change in swing pricing policy often indicates a shift in the correlation between the vector of market excess returns and the vector of flow percentages. To capture this concurrent relationship, we need another model.

A model that takes into account, or approximates, multivariate dynamic relationships is the VAR($p$), the vector autoregression of order $p$. Consider a VAR($p$) model for the $2\times1$ vector of time series $y_t = (y_{1t}, y_{2t})'$ with autoregressive order $p$:

\[
y_t = c + A_1 y_{t-1} + \dots + A_p y_{t-p} + \nu_t,
\]

where the $A_i$ are $2\times2$ coefficient matrices and $c$ is a $2\times1$ vector of intercepts. $\nu_t$ is a $2\times1$ vector of disturbances with the following properties:

\[
E(\nu_t) = 0, \qquad E(\nu_t\nu_t') = \Sigma_\nu, \qquad E(\nu_t\nu_s') = 0 \ \text{for } s \ne t.
\]

By these assumptions, $\{\nu_t\}$ is a sequence of serially uncorrelated random vectors with concurrent full-rank covariance matrix $\Sigma_\nu$. The concurrent relationship between $y_1$ and $y_2$ is measured by the off-diagonal elements of $\Sigma_\nu$. Replacing $y_1$ and $y_2$ by $e$ and $F$ as defined in Chapter 2 makes the VAR($p$) model relevant to our study. In our swing pricing study, we use a VAR(2) model for the funds employed. Even for funds that do not suffer from serial correlation, we still apply the VAR model, in order to treat the more general problem uniformly. The hierarchical change point technique based on the e-distance measure is then applied to the disturbance vector.


Chapter 4

Simulation

In the first section of this chapter we show two simulation examples provided by Matteson and James (2013) [18], illustrating the performance of the nonparametric methods on independent sequences. Since our study is based on time series analysis, we also aim to examine whether the change-point detection methods based on hierarchical clustering still work when the observations are serially correlated. Thus we also simulate an autocorrelated multivariate sequence to assess the performance of these methods.

4.1 Independent Sequences

We begin with the simple case of identifying changes in univariate normal distributions. For this we sequentially generate 100 independent samples from each of the following normal distributions:

N(0, 1), N(0, √3), N(2, 1), and N(2, 2).

With $\alpha = 1$, the E-Divisive method identifies change points at 108, 201 and 308 with p-values of 0.002, 0.002 and 0.010, respectively. The E-Agglomerative method gives similar results: 101, 201 and 301 with p-values of 0.004, 0.002 and 0.012, respectively. The following figure shows the simulated independent Gaussian observations with changes in mean or variance. Dashed vertical lines indicate the change point locations estimated by the E-Divisive method with $\alpha = 1$; solid vertical lines indicate the true change point locations.


Figure 4.1 Change in a Univariate Gaussian Sequence

As mentioned in Chapter 2, we suspect that a threshold change in a swing pricing policy can be seen as a change in the correlation between the vector of market excess returns and the vector of flow percentages. Hence, we consider the case where the marginal distributions remain the same but the joint distribution changes. Suppose that we have trivariate normal distributions with mean vector $\mu = (0,0,0)^\top$ and the following covariance matrices:

\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad
\begin{pmatrix} 1 & 0.9 & 0.9 \\ 0.9 & 1 & 0.9 \\ 0.9 & 0.9 & 1 \end{pmatrix}, \qquad
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]

We use the R package mvtnorm to generate the observations: 250 normal observations with each of the covariance matrices above, 750 observations in total, so that there are two change points, at 251 and 501. We use both the E-Divisive and the E-Agglomerative method to detect the change points. The E-Divisive method detects change points at 251 and 502, with p-values of 0.002 and 0.002, respectively. The E-Agglomerative method detects change points at 301 and 501, with p-values of 0.002 and 0.005, respectively. The results are the same as those shown in [18]. We also note that the first change point detected by the E-Agglomerative method is lagged by 50 observations.
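The same simulation design can be reproduced with numpy instead of R's mvtnorm; the following sketch (seed and names illustrative) generates the three-regime sequence with the covariance matrices given above.

```python
import numpy as np

# 250 draws per regime: independent, strongly correlated, independent again,
# giving 750 observations with true change points at 251 and 501.
rng = np.random.default_rng(42)
d = 3
independent = np.eye(d)
correlated = np.full((d, d), 0.9)
np.fill_diagonal(correlated, 1.0)

segments = [rng.multivariate_normal(np.zeros(d), cov, size=250)
            for cov in (independent, correlated, independent)]
z = np.vstack(segments)  # 750 x 3 array fed to the change point detectors
```

Note that the marginals are standard normal in every regime, so only a method sensitive to the joint distribution, such as the e-distance, can pick up these change points.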


In the previous example, the change points were placed at equal distances in the simulated random vectors. To test how the hierarchical change point technique works in a more general scenario, we place the change points unequally: we generate 150 multivariate normal observations with covariances of zero, 300 observations with covariances of 0.9, and 300 observations with covariances of 0. The change points detected by the E-Divisive method are 151 and 454, with p-values of 0.002 and 0.002, respectively. The E-Agglomerative method finds changes at 158 and 452, with p-values of 0.004 and 0.004, respectively. We see that both the E-Divisive and the E-Agglomerative method generate satisfactory results for independent sequences.

4.2 Autocorrelated Sequence

In this section we test the performance of the hierarchical nonparametric change point method on a stationary but autocorrelated process. A stochastic process $y_t$ is stationary if its first and second moments are time invariant: in particular, if $E[y_t] = \mu$ for all $t$ and $E[(y_t-\mu)(y_{t-h}-\mu)^\top] = \Gamma_Y(h) = \Gamma_Y(-h)^\top$ for all $t$ and $h = 0,1,2,\dots$, where $\mu$ is a vector of finite mean terms and $\Gamma_Y(h)$ is a matrix of finite covariances. The most tractable model we can think of is a multivariate Gaussian process, for which the covariance matrix can be formed easily; hence we assume that the underlying distribution of the autocorrelated process is normal. Although in Chapter 3 we suggest using the residual vector of a VAR($p$) process to avoid the dependency problem, the empirical finding in our study is that the correlation between the market excess return and the flow percentage of a fund increases when a swing pricing policy is implemented. The observations of flow percentage may be autocorrelated, as discussed in Chapter 3. However, we want to investigate a more general problem, so we consider two autocorrelated processes, $X$ and $Y$, whose correlation changes over time.

We start with autocorrelation at lag 1. The problem can be stated as follows: we want to simulate two processes $X_t = a_1X_{t-1}+\theta_t$ and $Y_t = a_2Y_{t-1}+\omega_t$ such that $\operatorname{corr}(X_t,Y_t) = \rho$. We can also write these two processes as

\[
\begin{pmatrix} X_t \\ Y_t \end{pmatrix}
= \begin{pmatrix} a_1 & 0 \\ 0 & a_2 \end{pmatrix}
\begin{pmatrix} X_{t-1} \\ Y_{t-1} \end{pmatrix}
+ \begin{pmatrix} \theta_t \\ \omega_t \end{pmatrix}. \tag{4.1}
\]


This is a VAR(1) process $Z_t = FZ_{t-1}+\eta_t$, where $Z_t = (X_t, Y_t)^\top$ and $\eta_t = (\theta_t, \omega_t)^\top$, and $F$ is the coefficient matrix. $\eta_t$ is a two-dimensional zero-mean white noise process, and it is assumed that $E[Z_{t-1}\eta_t^\top] = 0$. Post-multiplying by the transpose of $Z_t$ and taking expectations, we have

\[
E[Z_tZ_t^\top] = F\,E[Z_{t-1}Z_{t-1}^\top]\,F^\top + E[\eta_t\eta_t^\top].
\]

Denoting the covariance matrix of $Z_t$ by $\Sigma$ and that of $\eta_t$ by $Q$, we have

\[
\Sigma = F\Sigma F^\top + Q,
\]

where the residuals are assumed to have unit variances. Denote the components of $\Sigma$ by $\sigma$ and those of $Q$ by $q$; for convenience, we use the index 1 for $X$ and $\theta$ and 2 for $Y$ and $\omega$, so that

\[
\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}, \qquad
Q = \begin{pmatrix} 1 & q_{12} \\ q_{21} & 1 \end{pmatrix}.
\]

This equation can be solved by applying two properties of vectorization: $\operatorname{vec}(A+C) = \operatorname{vec}(A)+\operatorname{vec}(C)$ and $\operatorname{vec}(ABC) = (C^\top\otimes A)\operatorname{vec}(B)$. Applying the second property with $A = F$, $B = \Sigma$ and $C = F^\top$ gives $\operatorname{vec}(F\Sigma F^\top) = (F\otimes F)\operatorname{vec}(\Sigma)$, so the solution is

\[
\operatorname{vec}(\Sigma) = [I - F\otimes F]^{-1}\operatorname{vec}(Q),
\]

where $\operatorname{vec}$ stands for vectorization, $\otimes$ for the Kronecker product, and $I$ is the identity matrix.

Since we are dealing with $2\times2$ matrices, and since for convenience the error terms are assumed to have unit variances, it is not hard to write this out precisely:

\[
\begin{pmatrix} \sigma_{11} \\ \sigma_{12} \\ \sigma_{21} \\ \sigma_{22} \end{pmatrix}
=
\begin{pmatrix}
1-a_1^2 & 0 & 0 & 0 \\
0 & 1-a_1a_2 & 0 & 0 \\
0 & 0 & 1-a_1a_2 & 0 \\
0 & 0 & 0 & 1-a_2^2
\end{pmatrix}^{-1}
\begin{pmatrix} 1 \\ q_{12} \\ q_{21} \\ 1 \end{pmatrix}.
\]
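The vec solution is easy to verify numerically. The sketch below (parameter values illustrative) solves for $\Sigma$ via the Kronecker formula and checks it against the closed-form entries of the diagonal $4\times4$ system above.

```python
import numpy as np

# Solve Sigma = F Sigma F' + Q via vec(Sigma) = (I - F kron F)^{-1} vec(Q),
# using column-major (Fortran-order) vectorization.
a1, a2, q12 = 0.5, 0.3, 0.4
F = np.diag([a1, a2])
Q = np.array([[1.0, q12], [q12, 1.0]])

vec_q = Q.flatten(order="F")
vec_sigma = np.linalg.solve(np.eye(4) - np.kron(F, F), vec_q)
Sigma = vec_sigma.reshape(2, 2, order="F")
```

In particular, $\sigma_{11} = 1/(1-a_1^2)$ and $\sigma_{12} = q_{12}/(1-a_1a_2)$, exactly as the diagonal system predicts, and $\Sigma$ satisfies the fixed-point equation $\Sigma = F\Sigma F^\top + Q$.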
