
Faculty of Economics and Business

Amsterdam School of Economics

Requirements thesis MSc in Econometrics.

1. The thesis should have the nature of a scientific paper. Consequently, the thesis is divided up into a number of sections and contains references. An outline can be something like (this is an example for an empirical thesis, for a theoretical thesis have a look at a relevant paper from the literature):

(a) Front page (requirements see below)

(b) Statement of originality (compulsory, separate page)

(c) Introduction

(d) Theoretical background

(e) Model

(f) Data

(g) Empirical Analysis

(h) Conclusions

(i) References (compulsory)

If preferred you can change the number and order of the sections (but the order you use should be logical) and the heading of the sections. You have a free choice how to list your references but be consistent. References in the text should contain the names of the authors and the year of publication. E.g. Heckman and McFadden (2013). In the case of three or more authors: list all names and year of publication in case of the first reference and use the first name and et al. and year of publication for the other references. Provide page numbers.

2. As a guideline, the thesis usually contains 25-40 pages using a normal page format. All that actually matters is that your supervisor agrees with your thesis.

3. The front page should contain:

(a) The logo of the UvA, a reference to the Amsterdam School of Economics and the Faculty as in the heading of this document. This combination is provided on Blackboard (in MSc Econometrics Theses & Presentations).

(b) The title of the thesis

(c) Your name and student number

(d) Date of submission of the final version

(e) MSc in Econometrics

(f) Your track of the MSc in Econometrics

Quantifying responses to central advisory communication in private banks

A case study at Van Lanschot investment bank

Eveline Duyster (10759808)

Supervisor: Prof. Dr. H.P. Boswijk

Second reader: Dr. J.C.M. van Ophem

MSc in Econometrics

Financial Econometrics Track

University of Amsterdam

15 August 2018


Statement of Originality

This document is written by Student Eveline Duyster who declares to take full responsibility for the contents of this document.

I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it.

The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


Abstract

This thesis uses discrete choice modelling to estimate the responses to content produced by Van Lanschot private bank. It compares a Multinomial Logit Model, a Random Multinomial Logit Model (which combines a random-forest-like subsampling method with the Multinomial Logit Model) and a two-stage noncompensatory inference model proposed by Gilbride and Allenby (2004). Several hypotheses related to different aspects of the problem are tested on the Multinomial Logit Model. The Multinomial Logit Model is only estimated on an aggregated specification, whereas the other two models are estimated on both the aggregated specification and a full specification.

The Random Multinomial Logit Model is found to outperform the standard Multinomial Logit Model and the two-stage model based on both the PCC and the AuC in an out-of-bag sample. The two-stage model outperforms the Multinomial Logit Model only on the AuC, not on the PCC.

The hypothesis tests indicate that different clients (as measured by different clusters) respond differently to the content produced. They also indicate that different clients act differently with regard to different types of products, that the effect of content produced is dependent on its type and that the buying response does not follow the same data generating process as the selling one. It thus remains evident that the needs of clients differ, and the service delivered should be tailored to these needs.


Contents

1 Introduction
2 Theoretical Framework
   2.1 Literature Review
   2.2 Segmentation of client groups
3 Methodology
   3.1 Deciding not to switch
   3.2 Switching costs
   3.3 Models
      3.3.1 Mixed Multinomial Logit Model
      3.3.2 Random Multinomial Logit Model
      3.3.3 Consider-Then-Choose Noncompensatory Inference
      3.3.4 Comparison of the Models
4 Data
   4.1 Van Lanschot Background
   4.2 Data Description
      4.2.1 Content Database
      4.2.2 Characteristics Database
      4.2.3 Transactions Database
   4.3 Specification of the choice set
   4.4 Combining the different datasets
   4.5 Descriptive statistics
5 Results
   5.1 Segmentation
   5.2 Multinomial Logit Model
      5.2.1 Coefficients and Hypothesis Testing
      5.2.2 Performance
   5.3 Random Multinomial Logit Model
      5.3.1 Variable Importance and Partial Dependence
      5.3.2 Performance
   5.4 Consider-then-choose inference
      5.4.1 Model output
      5.4.2 Performance
6 Discussion
   6.1 Model comparison
   6.2 Areas of concern
   6.3 Further Research
7 Conclusion
A K-means segmentation
   A.1 The algorithm
   A.2 Results description
B Markov Chain Monte Carlo (MCMC) methods
   B.1 Gibbs Sampler
   B.2 Metropolis Algorithm
   B.3 Griddy Gibbs
C Partial Dependence Plot
D Estimation Algorithm of the two-stage model
E Additional information on the dataset
   E.1 Different types of content
   E.2 Types of products
   F.1 The Multinomial Logit Model
   F.2 The Random Multinomial Logit Model
   F.3 Two-step based inference


1 Introduction

Many private banks send out detailed newsletters and other sorts of advice centrally as part of their service. While this is something that is done frequently and something that clients pay for as part of the services of their respective bank, little information is usually acquired about the responses this central communication induces. Content is produced primarily on the 'gut feel' of the content producers, while data-driven quantifications of its effect are often lacking. This thesis aims to shed light on this issue, as more knowledge on the response of customers is vital in improving this aspect of the services. This thesis concerns a case study of Van Lanschot, a private bank located in the Netherlands.

The issue is especially important to address right now, as the bank is exploring more technical extensions of its services that may, for example, be used to improve and personalise the online banking experience. As this thesis is being written, the bank is introducing a new app, which can be used by financial advisers to push relevant content to their clients. There is, however, at the moment no direction as to which content is relevant for which client; this is something that the adviser has to determine on his own. As this thesis concerns the effect of central advisory communication on different types of clients, it may not only aid in the production of more relevant content, but also in the application of this app. Different clients have been shown to behave differently when it comes to investing (i.a. Gonzalez-Carrasco, Colomo-Palacios, Lopez-Cuadrado, Garcia-Crespo and Ruiz-Mezcua, 2012) and thus more insight into this aspect is relevant in personalising the content.


Research Questions

The central question this thesis wishes to answer is: What is the effect of central advisory communication on the change in a client’s portfolio? To incorporate the considerable amount of heterogeneity, sub-questions that are of interest are:

1. Do different client groups respond differently to the same news?

2. Do different types of instruments on the advisory list induce different types of behaviour?

3. Do different types of central communication provoke different types of reactions (newsletter, trading ideas, alerts etc.)?

4. Is the reaction the same when considering the choice to buy an instrument and the choice to sell one?

The study at hand aims to answer the research sub-questions specified above by employing a Multinomial Logit model (MNL) with group specific intercepts. This model will be compared with two other models to provide more insights into the data generating process. The models are also compared based on performance measures. The second model used is the Random Multinomial Logit Model (RMNL) (Prinzie and Van den Poel, 2008), which combines the machine learning technique of random forests with the multinomial logit model. Lastly, the study employs a Two-Stage Consider-Then-Choose model with noncompensatory inference. The model is based on the algorithm introduced by Gilbride and Allenby (2004).

This thesis is organised as follows: first, the theoretical background is discussed. Then, the econometric models used in this thesis are explained in more detail. Chapter 4 discusses the data and provides a short description of the Van Lanschot bank. The next chapter analyses the results. These results are discussed in Chapter 6. Finally, Chapter 7 provides a conclusion.


2 Theoretical Framework

This chapter reviews the literature on the topic and discusses the client segmentation used in this thesis.

2.1 Literature Review

The reaction of investors to news has been analysed often, but research in this area has focused primarily on either the response of companies and individual investors to national news or on the added value of financial advisers to portfolios. This thesis combines the two and specifically concerns the added value of, and response to, centrally communicated advice by private banks. In doing so, it aims to aid banks, and in particular the Van Lanschot private bank, in producing more relevant content and personalising the content they produce.

One of the main aims of this study is to segment the responses of different types of clients and thus to help improve the personalisation of content. Quantifying clients' responses to different types of news is vital in determining which client to show which news message. The final goal is to show clients only the news that is relevant to them. With respect to this subject, Gonzalez-Carrasco et al. (2012) have done research on PB-ADVISOR, a system intended to advise private bankers based on fuzzy and semantic techniques. The system provides private bankers with tailor-made advice for their clients based on the clients' characteristics. They find that this advice can be made very accurate based on the client's own characteristics (age, income, marital status, gender, education) and the characteristics of his or her investment behaviour (confidence, character, risk taking, emotion on risk, risk description).


The relevance of these findings to the research at hand is that we can indeed determine a client’s interest fairly accurately based on the information that is already registered in the system.

However, the difference in portfolio decisions may not only be influenced by the difference in the characteristics of a client, but also by the characteristics of their assigned financial adviser. Direr and Visser (2013) segment not clients, but financial advisers and study their advised portfolio allocations. They find that financial advisers play an important role in the investment decisions of their clients. They also find that higher educated advisers tend to sell riskier allocations and that female advisers tend to sell less risky funds. As all trades made are still run past financial advisers1, their opinion is a relevant aspect of the decision process that needs to be taken into account.

The added effect of financial advisers, however, stretches beyond merely showing a client relevant content and advice. The adviser is expected to contribute to a client’s portfolio in ways he would not be able to do himself, often because of a lack of expertise. An aspect that one could look at with regard to this contribution is portfolio diversification.

Gaudecker (2015) has investigated the effect of financial literacy and financial advice on the diversification of household portfolios. He argues that investment mistakes are less frequent as investors have more knowledge and as they receive more advice. He finds that households that are less financially literate and / or do not rely on professional or private contacts for advice have a less diversified portfolio on average. If combined, the effects are the largest. This could also imply that the effects of central communication are more pronounced for the clients who are not financially literate themselves.

Kramer (2012) has investigated the added value of financial advisers to individual investors' portfolio decisions. He compares the portfolios of advised and self-directed investors. He finds that the differences in the characteristics of advised and self-directed investors are quite small, whereas the differences in portfolio composition are noteworthy. He also finds that advised portfolios contain significantly less equity and more fixed income securities. He attributes this to the fact that investors who seek advice would typically be more risk averse. He finds no significant difference in the performance of the two portfolios, but the advised portfolios tend to be better diversified. Based on this finding, the paper concludes that advisers do add positive value. In relation to the topic at hand, it is relevant to note that the paper finds that advised investors execute almost twice as many trades as self-directed investors (0.27 vs 0.14 per month). He attributes this to the fact that advised portfolios are generally better diversified and that this diversification requires more trades. He also notes that there is a lot of heterogeneity regarding this characteristic: 45% of the investors never trade, whereas the 1% most active investors turn their portfolio over approximately 1.5 times annually.

1 If clients wish to make trades at Van Lanschot, they still need to contact their adviser to do so. Clients have the final say in what their investment is going to be, but the effect of the opinion of their financial adviser cannot be ignored.

All research mentioned above concerns the effect of financial advice on different aspects of an individual’s portfolio. Although this is important to consider when trying to quantify the effect of centrally communicated news, it fails to incorporate an important aspect of the problem at hand, which is the abundance of information clients sort through as they process this centrally communicated advice. The bank sends out an average of around 20 news messages per day, each of considerable length and sometimes with attachments and links to other news. As it is not likely that every client combs through all the news messages and is well-informed about all the products the bank offers, the approach of personally communicated advice may not fully extend to this situation, where a client needs to decide which information to consider and which information not to. In other words, a financial adviser can be assumed to be fully informed, while this is unrealistic for individual clients. They may not arrive at the most important information, as they have to be selective in their uptake. The two most relevant studies with regard to this topic are by Barber and Odean (2007) and Monti, Boero, Berg, Gigerenzer and Martignon (2012).

Barber and Odean (2007) investigate the effect of attention, measured by news data, on the buying behaviour of individual and institutional investors. They build on the study by Odean (1999), who proposes that investors only buy stocks that initially grab their attention. This does not mean that they buy all stocks that do so. Barber and Odean (2007) find that individual investors are net buyers of attention-grabbing stocks. They hypothesise that this is caused by the fact that investors have many different options with regard to the stocks they can buy. This effect is not found in selling stocks, as the investors do not face the same problem there. They further hypothesise that preferences determine choices after attention has determined the choice set. They base their results on an extension of the model of Kyle (1985). This is a dynamic model of insider trading with sequential auctions that identifies three different traders: the single risk neutral insider, the random noise trader and the competitive risk neutral market trader. The model is created to examine the informational content of prices, the liquidity characteristics of markets and the value of private information to insiders. Regarding the latter, it determines the speed at which information is incorporated into prices. The presence of different types of traders is what makes the model less suitable for this specific research, as we expect all financial advisers, and thus indirectly the investors, to be fully informed.

A more suitable model can be found in the research of Monti et al. (2012), who investigated the behaviour of different investors in response to financial advice, with an emphasis on the difference between experienced and inexperienced investors (bank customers and university students). They also specifically examine the information sets these investors consider. They suggest using noncompensatory decision tree models to do so, an approach that is followed in this thesis. The dataset considered in the research is quite limited, with a total of 71 people participating in four experimental tasks. Due to this limitation, extra care should be taken when considering the results found. With regard to which information investors look at first, experienced investors first look at risk (89%), whereas inexperienced investors look first at risk only around 50% of the time. The screening of information is done by a mix of lexicographic (satisfy all relevant aspects) and tallying (average of relevant aspects must satisfy a certain level) rules. The decision making process is modelled by a multinomial logit model.

Some notes of warning with regard to the fragility of this research come from the presence of other media sources that may corrupt the measurement of a client's response to a news message communicated by Van Lanschot. Connor (2013) measured the effect of investors' information sharing and use in virtual communities. He finds that investors rely heavily on interpersonal and popular sources of information. He also finds that investors are not critical with regard to these sources of information. This may cause problems not only with regard to the client's response to news communicated by Van Lanschot (he may not weigh this accurately against news communicated by other, less reliable, sources of information), but also with regard to an adviser's response to a proposition made by the client, as a client might lose credibility by reacting to unreliable news sources.

The importance of this thesis stretches beyond just quantifying the responses to centrally communicated advice; it is also aimed at aiding the bank in improving its online services. As in most sectors at the present time, the banking sector is experimenting with the implementation of technology. Interactive media is important within the framework of this study, as client and adviser are expected to communicate more and more via the app². In light of this thesis, the work of Stewart and Pavlou (2002) provides a philosophical perspective on the effectiveness of interactive media and how to measure it. They argue that measuring the effectiveness of marketing communications should be done by focusing on the interaction instead of the behaviour of the marketer or the consumer. They use structuration theory to validate their argument. This is another limitation of this thesis, as data on this interaction is currently scarce or sometimes not available at all. As soon as the media provided by the bank have become more interactive, revising parts of this thesis may be necessary.

² At the moment, Van Lanschot is experimenting with an app that advisers can use to push relevant content to their clients.

2.2 Segmentation of client groups

In order to correct for the considerable amount of heterogeneity, clients will be segmented into different client groups. This method is used instead of estimation with individual fixed effects, as the theory on individual behaviour is not always satisfactory, whereas theory on group behaviour is (Smith, 1956). In this setting, estimating fixed effects on an individual level with the specification elaborated upon in Subsection 3.3.1 is not expected to yield accurate results, as the panel is relatively short (T = 24), and thus we would not have enough observations per individual³.

Aspects that will be considered in this segmentation are:

1. Type of contract (active, intensive, basic)
2. Amount of capital in the account
3. Risk profile
4. Type of products owned
5. Frequency of trading
6. Volume of trading (relative to total capital)

Segmentation based on the first two is straightforward. The other features are incorporated into the segmentation using K-means clustering⁴. This has been shown to work well in customer segmentation problems (i.a. Chiu, Chen, Kuo and Chun-Ku, 2009). A model with fixed effects based on these clusters is then estimated. The respective form depends on the model at hand, as further explained in Section 3.3.

³ Note that this is different from fixed effects estimation in a linear model, in which the fixed effects can be partialled out by using techniques like first differences and within estimation. Here, fixed effects are estimated by the technique of Chamberlain (1980), further elaborated upon in Subsection 3.3.1. This would require the estimation of an individual specific intercept and the amount of observations per individual is of interest.
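As an illustration of this clustering step, the sketch below standardises the last four aspects and applies K-means. The column names and the number of clusters are hypothetical assumptions for illustration, not the thesis's actual specification.

```python
# Illustrative sketch of the K-means segmentation step (hypothetical column
# names and cluster count); contract type and capital are handled separately.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def segment_clients(clients: pd.DataFrame, n_clusters: int = 4) -> pd.DataFrame:
    # Features 3-6 from the list above, under assumed column names.
    features = ["risk_profile", "product_type_share",
                "trade_frequency", "trade_volume_relative"]
    X = StandardScaler().fit_transform(clients[features])  # scale so no feature dominates
    clients = clients.copy()
    clients["segment"] = KMeans(n_clusters=n_clusters, n_init=10,
                                random_state=0).fit_predict(X)
    return clients
```

The resulting segment label can then serve as the grouping variable for the group-specific intercepts discussed in Section 3.3.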


3 Methodology

3.1 Deciding not to switch

When considering the clients who decide not to invest in a new product (or sell a product), two different types of scenarios can be distinguished. The client either finds that he already has the optimal portfolio given the scenario and information at hand (the switching costs are higher than the added benefit), or the client is 'absent'. The latter can very broadly be defined as clients who should switch but do not. They are the 'lazy' customers.

The models should take into account which type of customer they are dealing with. The active customer will be influenced by the news messages and might act on them. The inactive customer is assumed not to act on his own and to be responsive to news messages only if he is directly approached by his financial adviser to make such a trade. Those scenarios are likely to occur when we are dealing with an alert, rather than with the other forms of news. These clients are expected to have less frequently scheduled contact than their active counterparts. They may also have less capital in their portfolio, as studies have shown that many advisers have the tendency to approach their clients with more capital first (Kramer, 2012).

Clients that are active will switch if the added benefit of switching is higher than the costs of doing so. Thus, if the switching costs are higher than the added utility, an active client is also not expected to switch. This can be defined as having the optimal portfolio given the situation at hand. The switching costs are further explained in Section 3.2.

Another aspect of (in)activity of clients is the frequency of contacts with their financial adviser. If a client is absent, then the contact is infrequent and the contact is always initiated by the financial adviser. In order to determine whether a client is absent, the overall average amount of self-initiated contact between a client and his or her adviser will be calculated. If the client is more than two standard deviations below the mean, he satisfies this condition for being classified as inactive.

If the client has not had frequent contact (as defined above) with his or her adviser, we assume that he or she is an inactive customer. Thus, they will not be affected by the content of the news articles and therefore will not be considered in the analysis. It is interesting for the bank to further delve into the underlying reasons of why customers are inactive and how they can ensure that customers turn active again. This topic, however, is beyond the scope of this thesis.
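A minimal sketch of the two-standard-deviation rule described above, assuming a hypothetical table with one row per client-month and a column counting self-initiated contacts:

```python
import pandas as pd

def flag_inactive(contacts: pd.DataFrame) -> pd.Series:
    """Flag clients whose self-initiated contact is more than two standard
    deviations below the overall mean (hypothetical column names)."""
    per_client = contacts.groupby("client_id")["self_initiated_contacts"].mean()
    threshold = per_client.mean() - 2 * per_client.std()
    return per_client < threshold  # True = classified as inactive
```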

3.2 Switching costs

For customers who are active, the models described in this section implicitly assume that the alternative with the highest expected utility will most likely be chosen from the set of alternatives. With regard to the utility of the alternative of not switching, the functional form of this utility is more evident than that of the alternatives that involve switching. This is due to the fact that switching imposes switching costs, such as the costs a client pays per transaction and the costs incurred by spending time and effort on making the transaction happen. In these scenarios, the benefits should thus outweigh the costs in order for a client to make a switch.

Switching costs are common in the estimation of continuous processes, like labour supply. Often used functions in the context of continuous processes are Linear Quadratic Adjustment Costs (LQAC). Engsted and Haldrup (1994) define the loss as being proportional to the squared adjustment in the target variable. Their scenario, however, is different from the choice to invest in a certain product, as in this framework investment is a binary variable. There should be no significant increase in the cost when the size of the investment goes up, as neither the fee nor the discomfort of the information gathering process increases.

Because this adjustment formula is therefore not appropriate for this study, the alternative specification of the switching costs introduced by Grzybowski (2008) is used. Grzybowski (2008) identifies the switching costs by introducing a dummy if an individual switches from alternative k to alternative j. The utility function can then be specified as follows:

U_{ijt} = r_j + \alpha p_{ijt} + \beta x_{jt} + \gamma_j z_{it} + \sum_{k=1}^{J} w_k s_{ijkt} + \xi_{ij} + \varepsilon_{ijt} = V_{ijt}(\xi_{ij}) + \varepsilon_{ijt}, \qquad (3.1)

where U_{ijt} represents the utility for individual i at time t associated with choice j, r_j is an alternative dependent intercept, p_{ijt} are individual specific, alternative varying variables (in the model of Grzybowski, this is the price), x_{jt} are variables that vary over time and over alternatives but not over individuals (such as the quality of the service of the alternative), s_{ijkt} is a set of dummies for switching from alternative k to alternative j¹, \xi_{ij} represents persistent consumer heterogeneity and \varepsilon_{ijt} is the idiosyncratic unobserved preference variable (such as taste). The parameters w_k represent the discomfort of switching. In this thesis, the switching costs as defined by Grzybowski (2008) are used², but they are latent, as they are absorbed into the intercept. These statements are further elaborated upon in Subsection 3.3.1.

¹ Grzybowski (2008) actually leaves out the subscript j because it is automatically implied, as the utility of alternative j is considered. However, for clarification of the notation the subscript is included here. This has no consequences for the specification of the model.

² The alternative from which a client switches is not defined and not of interest, because this thesis considers the choice to act (either buy or sell) and i.a. provisions are to be paid in both cases, so the subscript k can be left out and there is no longer a sum over the different values of k. There is thus simply one dummy for all alternatives, except for the one not to act.

3.3 Models

This section describes the different models employed in this thesis. First, the mixed multinomial logit model will be considered. This will then be extended in the consideration of the random multinomial logit model. Lastly, the two-stage consider-then-choose model will be discussed.

3.3.1 Mixed Multinomial Logit Model

Firstly, the mixed multinomial logit model will be considered in this study. This model is preferred over the conditional logit model and the standard multinomial logit model as it allows some parameters to vary over the alternatives, while others do not. It thus allows for additional flexibility. We assume the additive random utility model with type-1 extreme value distributed errors³, thus U_{ij} = x'_{ij}\beta + w'_i\gamma_j + \varepsilon_{ij} = V_{ij} + \varepsilon_{ij}. The probability of client i then choosing to invest in alternative j is specified as follows:

p_{ij} = \frac{e^{V_{ij}}}{\sum_{l=1}^{m} e^{V_{il}}} = \frac{e^{x'_{ij}\beta + w'_i\gamma_j}}{\sum_{l=1}^{m} e^{x'_{il}\beta + w'_i\gamma_l}}, \qquad (3.2)

where x_{ij} are the variables that vary with the alternatives (and the individuals) and w_i are the variables that do not. Note that the latter have a coefficient that varies over the alternatives, thus their effect may still vary over the alternatives as well.

The dependent variable is the choice to invest in or sell a certain type of instrument in a certain sector, including the choice not to act. The exact specification is further elaborated upon in Chapter 4.

The model is extended by estimating it with fixed effects. This is motivated by the fact that this reduces omitted variable bias and that unobserved heterogeneity can be correlated with observed covariates. More specifically, fixed effects will be estimated on the group level, instead of on the individual level. As the different groups are expected to have homogeneous responses, this method should have no significant impact on the accuracy of the estimations. However, the grouping of the data on this level does allow the variance to be reduced.

The exact specification of the model with fixed effects is based on the work of Chamberlain (1980). He introduces a group varying, alternative varying intercept to the regular multinomial logit model. His model is based on the fact that the frequency of the occurrences of alternative j in a group is a sufficient statistic for the intercept in the model (\alpha_{ij}, denoted as \xi_{ij} in the switching model). The intercept can be interpreted as the tendency of individual i to choose alternative j. In the model of Chamberlain with group fixed effects, all groups get a group specific, alternative specific intercept. Estimation is then done by specifying the conditional likelihood for the whole time series of each group⁴ by

\Pr\left(y_i \,\middle|\, \sum_t \delta_{y_{it},1}, \ldots, \sum_t \delta_{y_{it},J}\right) = \frac{\prod_{t=1}^{T_i} \exp(x'_{ijt}\beta + w'_{it}\gamma_j)^{\delta_{y_{it},j}}}{\sum_{\upsilon_i \in \Upsilon_i} \prod_{t=1}^{T_i} \exp(x'_{ijt}\beta + w'_{it}\gamma_j)^{\delta_{\upsilon_{it},j}}}, \qquad (3.3)

where t represents the period, T_i is the number of periods for which individual i is observed, \delta_{y_{it},j} is 1 if individual i chooses alternative j at time t and \Upsilon_i = \{\upsilon = (\upsilon_1, \ldots, \upsilon_T) \mid \upsilon_t = 0 \text{ or } 1 \text{ and } \sum_t \upsilon_t = \sum_t y_{it}\} (the alternative set for individual i). Taking the logarithm and averaging over this gives the full likelihood function:

\frac{1}{N} \sum_{i=1}^{N} \ln \frac{\exp\left(\sum_{t=1}^{T_i} \delta_{y_{it},j}\,(x'_{ijt}\beta + w'_{it}\gamma_j)\right)}{\sum_{\upsilon_i \in \Upsilon_i} \exp\left(\sum_{t=1}^{T_i} \delta_{\upsilon_{it},j}\,(x'_{ijt}\beta + w'_{it}\gamma_j)\right)}, \qquad (3.4)

where i represents an individual. Maximising this function (maximum likelihood) with respect to \beta and \gamma_j gives \hat{\beta}_{ML} and \hat{\gamma}_{j,ML}.

⁴ One must note that the original model from Chamberlain (1980) is not based on mixed multinomial logit models, but on the standard multinomial logit model, and that the formula is thus altered in order to incorporate the mixed specification.

Note that the 'not buying' option will also be included in the alternatives. If the client is inactive, he is assumed to never make a trade, thus these customers are not included in the model estimation. In the chapter on data (Chapter 4), the implementation of the different variables is explained and substantiated.

One can now immediately see that the switching costs, as defined by Grzybowski (2008), become part of the intercept. As the intercepts are group dependent and the frequency of occurrences of an alternative in a group is a sufficient statistic for these, the switching costs are merely added to the intercept (for all choices except for the choice not to invest, as no switching costs are associated with this decision).

A logical choice with respect to the model specification would be to extend the model into a nested multinomial logit model, with the first level being the choice to invest or not and the second level being in which type of instrument to invest. The choice was made not to incorporate this nested structure, to allow for correlation between not investing and the other individual alternatives.

It should be noted that the multinomial logit model suffers from the curse of dimensionality, thus it is necessary to perform feature selection before implementing this model. Software packages often lack this feature and it is therefore done manually.
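To make the estimation step concrete, the sketch below fits a multinomial logit with cluster-specific intercepts using statsmodels. It is a simplified illustration under assumed column names; the alternative-varying regressors of the mixed specification in (3.2) and the Chamberlain conditional likelihood in (3.3)-(3.4) are not represented here.

```python
import pandas as pd
import statsmodels.api as sm

def fit_mnl(df: pd.DataFrame):
    """Multinomial logit with cluster-specific intercepts (illustrative sketch).

    Assumes: `choice` is the chosen alternative (including 'no action'),
    `segment` is the K-means cluster, and the remaining columns are
    individual-specific regressors (hypothetical names).
    """
    X = pd.get_dummies(df["segment"], prefix="seg", drop_first=True).astype(float)
    X = pd.concat([X, df[["capital", "risk_profile", "n_articles_read"]]], axis=1)
    X = sm.add_constant(X)
    y = df["choice"].astype("category").cat.codes
    return sm.MNLogit(y, X).fit(method="newton", maxiter=100, disp=False)
```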

3.3.2 Random Multinomial Logit Model

The Random Multinomial Logit model (RMNL) considered in this study is based on Prinzie and Van den Poel (2008). This model combines the machine learning technique called random forest models with the multinomial logit model. It builds a random forest of multinomial logit models. The authors find that it produces results superior to the standard multinomial logit model.


The Random Forest Algorithm

The random forest supervised learning algorithm is an algorithm that works on decision trees. It builds an ensemble of these trees. It can be used for both classification and regression trees. Most of the time (also in this thesis), it builds these trees on the idea of bootstrap aggregation (bagging). As decision trees are dependent on the training data, it makes sense to draw training data iteratively and aggregate the results. This is exactly what bagging does. It trains a model on each sample. The bagging method thus uses the combination of different outcomes to come to the best results.

As decision trees are greedy and consider all variables at each node, bagging can yield many similar decision trees. Moreover, individual decision trees have been found to be very unstable (Breiman, 1996). The random forest deals with this disadvantage. At each node, the random forest algorithm selects m out of M variables randomly. The best split is selected out of these m using some prespecified criterion. The random forest algorithm thus introduces extra randomness into the model, because in deciding how to split a node, it searches for the best feature within a random set of features. Each tree is grown using the Classification and Regression Trees (CART) methodology. The main elements of CART are:

1. Rules for splitting data at a node based on the value of a variable
2. Stopping rules for deciding if a branch should be terminated
3. A prediction for the target variable in each terminal node.

The main advantage of the random forest algorithm over regular decision trees is thus that it is not sensitive to the problems of overfitting, as it produces different trees every time due to the random component and then combines these trees. In doing so, the random forest eliminates the decision trees’ instability disadvantage and also allows it to cope with large feature spaces. The error rate of the random forest is dependent on the correlation between two trees in the forest and the strength of each individual tree in the forest. Specifically, as m grows larger, the strength and correlation of the model do too.

The classification tree recursively partitions observations into subgroups with a more homogeneous categorical response (Breiman, 1984). In light of this specific study, one could for example think about the responses to different types of news or different types of products on the advisory list. During the classification, the algorithm ensures that only the features needed for the test pattern under consideration are involved.


The random forest algorithm works as follows:

1. Select m out of the M features randomly, with m << M.
2. Among these m features, determine the best split point and calculate the node d.
3. Split the node up into daughter nodes using the best split.
4. Continue steps 1-3 until l nodes are reached.
5. Create n of these trees by repeating steps 1-4.
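For reference, a CART-based random forest with the m-out-of-M feature subsampling described above is available off the shelf; the sketch below shows the relevant settings (the parameter values are illustrative, not those used in the thesis).

```python
from sklearn.ensemble import RandomForestClassifier

# Standard CART-based random forest: n trees, each split considers m of the M features.
forest = RandomForestClassifier(
    n_estimators=500,      # n trees
    max_features="sqrt",   # m out of M features considered at each split
    oob_score=True,        # evaluate on the out-of-bag observations
    random_state=0,
)
# forest.fit(X_train, y_train); forest.oob_score_ then gives the OOB accuracy.
```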

The Random Multinomial Logit Model

The Random Multinomial Logit Model (RMNL) incorporates the random forest decision trees into the regular MNL. As random forests are able to deal with large feature spaces because of the random feature selection, this is interesting in the context of multinomial logit models, as it is exactly this curse of dimensionality⁵ that makes these models hard to implement in some situations. The main difference between the Random Multinomial Logit and the Random Forest is that the trees in the random forest are generated using the CART methodology, whereas in the RMNL the trees are individual multinomial logit models. In doing so, the RMNL is able to also quantify the effect of alternative specific variables (such as the number of articles produced), while the standard random forest technique only deals with individual specific variables by repeatedly generating groups of observations with homogeneous responses.

Let (Y, X) be a pair of random variables. The RMNL makes a hypothesis about the true function f: y = f(x), based on a training set of N_1 observations⁶. There is no specific form this f(x) is limited to taking. In doing so, the learning algorithm outputs a classifier h_c (the hypothesis⁷). Given a loss function L, the goal of the RMNL is to find a classifier, or set of classifiers, which minimises this loss function for a test set of N_2 observations. In this thesis, the loss function will be specified as the number of misclassified observations, following Prinzie and Van den Poel (2008):

L = \sum_{i=1}^{N_2} I\left(h_c(X_i) \neq Y \mid X_i\right). \qquad (3.5)

The Random Multinomial Logit Model considers R 'trees'. Each tree is created by randomly selecting N_1 samples with replacement from the entire dataset. The out-of-the-bag (OOB) data is obtained by removing this bootstrap sample from the entire dataset. Each tree is generated by randomly selecting m out of the M features (variables) from the bootstrap sample. The loss function, as defined above, is obtained by comparing the predicted outcome with the actual outcome. The bagged predictor is obtained from the majority vote: the instance is classified into the class having the most votes over all logit models. An important realisation with regard to this method is that the trees are thus not grown using the CART methodology. The trees are evaluated using multinomial logit models. Thus we get R multinomial logit models, each considering N_1 observations and m random features. The results are later aggregated.

⁵ One must only select the best features to incorporate in the model.

⁶ The training set contains about two thirds of the observations.

The relative variable importance is calculated by permuting the values of a variable for each tree, recalculating the difference in L and averaging this over all trees. To measure the importance of the j-th variable, this variable is randomly permuted in the OOB data. The value V_j = |r_b - r_b^j| is then called the importance of the j-th variable for tree b, where r_b is the number of correct votes in the untouched OOB data and r_b^j is the number of correct votes in the permuted data. The final importance is calculated by averaging the values of V_j over all trees. The term variable importance thus reflects how much misclassification error is associated with wrongly measuring the value of the respective variable. The higher the value, the more important the variable (this value is named the raw importance score by Prinzie and Van den Poel, 2008). By dividing the raw importance score by its standard deviation, one gets a z-score.
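A compact sketch of the raw importance score just described, computed for one fitted model on its OOB sample (the data layout, a plain numpy feature matrix and a prediction callable, is a hypothetical assumption):

```python
import numpy as np

def raw_importance(predict, X_oob: np.ndarray, y_oob: np.ndarray, j: int,
                   rng: np.random.Generator) -> int:
    """|r_b - r_b^j|: correct votes on untouched vs. permuted OOB data for feature j."""
    r_b = np.sum(predict(X_oob) == y_oob)        # correct votes, untouched OOB data
    X_perm = X_oob.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])  # permute feature j only
    r_bj = np.sum(predict(X_perm) == y_oob)       # correct votes, permuted data
    return abs(int(r_b) - int(r_bj))
```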

It is important to note that the same specifications that are employed in the standard multinomial logit model are also employed here⁸. Thus the multinomial logit models on the trees of the random forest are, for example, also estimated with fixed effects.
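To make the procedure above concrete, the following sketch draws R bootstrap samples, fits a multinomial logit on m randomly chosen features for each, and classifies by majority vote. It is a simplified illustration (no fixed effects, hypothetical data layout, and it assumes every class appears in each bootstrap sample), not the implementation used in the thesis.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def fit_rmnl(X: pd.DataFrame, y: np.ndarray, R: int = 100, m: int = 5, seed: int = 0):
    """Random Multinomial Logit sketch: R bootstrap samples, m random features each."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(R):
        rows = rng.choice(len(X), size=len(X), replace=True)                 # bootstrap sample
        cols = list(rng.choice(X.columns.to_numpy(), size=m, replace=False))  # feature subset
        exog = sm.add_constant(X.iloc[rows][cols].astype(float))
        fit = sm.MNLogit(np.asarray(y)[rows], exog).fit(disp=False)
        models.append((cols, fit))
    return models

def predict_rmnl(models, X_new: pd.DataFrame) -> np.ndarray:
    """Classify each observation by majority vote over the individual MNLs."""
    votes = np.stack([
        np.asarray(fit.predict(sm.add_constant(X_new[cols].astype(float),
                                                has_constant="add"))).argmax(axis=1)
        for cols, fit in models
    ])
    # most frequent predicted class per observation
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```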

3.3.3 Consider-Then-Choose Noncompensatory Inference

The relevance of the last model is motivated by, for example, the works of Yee, Dahan, Hauser and Orlin (2007) and Dieckmann, Dippold and Dietrich (2009). Its main benefit over the MNL and RMNL is that it no longer assumes that economic agents have full knowledge of all different alternatives. This thesis will consider a two-step consider-then-choose model, which in the first step identifies the feasible choice set for each client and in the second step determines the choice the client makes from this feasible choice set. In other words, the previous models assume that each client carefully considers all pieces of information and that he or she integrates them into some common currency, such as expected utility, following a complex weighting scheme. As complete consideration of all funds is believed to be unrealistic, the consider-then-choose estimations are expected to be more accurate than the traditional logit model (Dieckmann et al., 2009). Results by e.g. Czerlinski, Gigerenzer and Goldstein (1999) show that simple heuristics can be more robust, extracting only important and reliable information from the data, while complex strategies weigh all pieces of evidence, thus extracting much noise and thereby resulting in accuracy losses when making predictions for new data (overfitting). The conjunctive and disjunctive inference considered in this thesis are an important category of these simple heuristics.

The consider-then-choose model assumes that there is some sort of threshold surrounding the consideration process. Consideration sets involve abrupt behaviour changes under discrete circumstances. These changes lead to abrupt changes in choice probability sets (Gilbride and Allenby, 2004). This can be interpreted as follows: if an alternative is in the feasible choice set, then normal random utility theory applies. If not, its probability is zero. This means that the likelihood function is discontinuous and not necessarily concave, unlike the case for regular discrete choice models like the multinomial logit model. Gradient-based methods are therefore not suitable, as the continuity and concavity of a likelihood function is exactly what methods like Newton-Raphson need in order to maximise it. Thus, such methods no longer apply in this case. One way of dealing with this is by trying to 'smooth out' the likelihood function by introducing additional parameters and error terms. Another way of dealing with this behaviour is by using a two-stage model based on Bayesian inference. This is done in this thesis.

A simple discrete-choice model without choice sets, like the multinomial logit model, assumes the following likelihood:

\Pr(i) = \Pr(V_i + \varepsilon_i > V_j + \varepsilon_j \;\forall j) = \Pr(\varepsilon_j < V_i - V_j + \varepsilon_i \;\forall j)
= \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{V_i - V_j + \varepsilon_i} \cdots \int_{-\infty}^{V_i - V_m + \varepsilon_i} f(\varepsilon_j) \cdots f(\varepsilon_m)\, d\varepsilon_j \cdots d\varepsilon_m \right] f(\varepsilon_i)\, d\varepsilon_i, \qquad (3.6)

with V_i representing the utility associated with choice i and where the \varepsilon's can follow any distribution. For the multinomial logit model, these are type-1 extreme value distributed. For the (multinomial) probit model, these are normally distributed.


The first step in the two-step framework concerns the screening process. There are various ways of estimating this screening process. We define compensatory and noncompensatory decision rules, as well as conjunctive and disjunctive rules within the noncompensatory framework.

A compensatory decision rule is one that assumes that all aspects of an option are evaluated and translated into a common currency, like utility. If the utility of an alternative rises above a threshold \gamma, then it will be considered in the feasible choice set. The compensatory screening rule thus has the following form:

I(V_j > \gamma) = 1, \qquad (3.7)

where I(V_j > \gamma) equals 1 if the decision rule is satisfied, meaning that an option will be considered if its utility rises above the threshold \gamma.

Within the framework of noncompensatory decision rules, we distinguish conjunctive and disjunctive screening rules. Both of these involve the consideration of certain features of the alternatives. In the context of this thesis this could for example be the type of product (stock, index etc.) or a risk classification. The conjunctive rule assumes that every single one of the m features must be above its corresponding threshold for an alternative to be evaluated in the feasible choice set. The conjunctive screening rule is consistent with an elimination-by-aspects screening process. This rule thus takes the following form:

\prod_{m} I(x_{jm} > \gamma_m) = 1, \qquad (3.8)

where I(x_{jm} > \gamma_m) equals 1 if the decision rule is satisfied. Thus all conditions need to be satisfied in order for an option to be considered.

The disjunctive rule is another noncompensatory rule, which assumes that at least one of the features of an alternative must be above a certain threshold in order for the alternative to be considered. The disjunctive (noncompensatory) rule is:

\sum_{m} I(x_{jm} > \gamma_m) \geq 1, \qquad (3.9)

where I(x_{jm} > \gamma_m) is 1 if the decision rule is satisfied. Thus at least one of the conditions needs to be satisfied in order for an option to be considered.

If an alternative is in the choice set, then its choice probability is determined relative to the other alternatives in the set. If it is not, then its choice probability is zero.
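The three screening rules in (3.7)-(3.9) can be expressed compactly; the sketch below evaluates them for a single alternative, with feature values and thresholds given as plain arrays (an illustration only, not code from the thesis).

```python
import numpy as np

def compensatory(V_j: float, gamma: float) -> bool:
    # (3.7): alternative j is considered if its utility exceeds the threshold
    return V_j > gamma

def conjunctive(x_j: np.ndarray, gamma: np.ndarray) -> bool:
    # (3.8): every feature must exceed its own threshold
    return bool(np.all(x_j > gamma))

def disjunctive(x_j: np.ndarray, gamma: np.ndarray) -> bool:
    # (3.9): at least one feature must exceed its threshold
    return bool(np.any(x_j > gamma))
```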


As stated earlier, by introducing a feasible choice set in the first step, the likelihood function becomes irregular. The model considered in this thesis deals with this problem by employing Bayesian inference. Bayesian econometrics requires the definition of a prior probability distribution of the parameters, which is defined without considering the data at hand.

The reduced-form model that comes into play when using such consideration set decision rules is \Pr(i) = \Pr(V_i + \varepsilon_i > V_j + \varepsilon_j \;\forall j \text{ such that } I(x_j, \gamma) = 1), where the indicator function equals one if the alternative is in the choice set, x_j reflects the decision rule applied to the process and \gamma specifies the threshold. This incorporates the discontinuity into the model. The alternatives that do not have the acceptable attribute level are not considered and have a choice probability of zero.

In this study, independent data is not available to identify the feasible choice set and thus the set becomes a latent construct that is simultaneously estimated with the observed choices (Gilbride and Allenby, 2004). This yields additional complications as (3.6) depends on the choice set at hand. The thesis follows the Bayesian approach by Gilbride and Allenby (2004), where the model structure allows for screening alternatives to be based on the attribute levels and / or the overall value of the offering. The screening thus can be done either by compensatory, conjunctive or disjunctive rules, or a mixture of them. It employs data augmentation and Markov Chain Monte Carlo methods (MCMC). The model specifies hierarchical priors, meaning that the parameters in the prior are modelled themselves as having a distribution. The parameters within these distributions are referred to as hyperparameters. The model originally introduced by Gilbride and Allenby (2004) does not involve a time component. As this study deals with a panel data type structure, this time component is introduced where appropriate. The specification of z (equivalent to U in the specification of the multinomial logit model) still incorporates fixed effects by including a group dependent intercept.

Data Augmentation

Data augmentation can be defined as the procedure of adding imputed values as if they were observed data (Cameron and Trivedi, 2005). With regards to this procedure, Tanner and Wong (1987) show that the posterior based on only the observed data is intractable, but that it becomes tractable after data augmentation.

Gilbride and Allenby (2004) state that direct evaluation of the probability in (3.6) can be avoided in a Bayesian estimation if all V's are augmented with a vector of latent variables z, with

z_{it} = V_{it} + \varepsilon_{it}, \qquad \varepsilon_{it} \sim N(0, 1). \qquad (3.10)

The model can then be written hierarchically as follows:

y \mid z,

z \mid V,

with y denoting the observed choice outcomes. This thus states that the observed choice outcomes (0 or 1, y) depend on the utility assigned to that specific choice (U), which in turn depends on a random component as well as a deterministic component (V).

This can then be estimated using MCMC⁹ by drawing iteratively from the following two conditional distributions:

f(z \mid y, V) = \text{Truncated Normal}(V, 1), \qquad (3.11a)

f(V \mid z) = N(\bar{z}, I/n), \qquad (3.11b)

where 1 denotes the identity matrix. The truncation in the distribution of z is caused by the fact that V_i should be the maximum of all V_j if option i is chosen.

This is an example of a hierarchical prior, where the parameter V is itself specified as having a distribution. Thus, by replacing the upper limits of the integral in (3.6) by the truncation points drawn from a truncated normal distribution, the data augmentation provides a way to avoid the estimation of this integral.

Application to choice models

The choice model can be written hierarchically as

y \mid z, I(x, \gamma),

z \mid V,

where z_{it} = V_{it} + \varepsilon_{it} as specified before and I(x, \gamma) is equal to 1 if the noncompensatory decision rule is satisfied. The estimation is done by drawing iteratively from the following conditional distributions:

z \mid y, V, I(x, \gamma),

\gamma \mid y, z, x,

V \mid z.

The first follows a truncated normal distribution, which is defined for the alternatives in the choice set. If an alternative is not in the choice set (specified by the indicator function), then it follows a non-truncated normal distribution. Truncation happens because the values of z may never be larger than the value of z that corresponds to y = 1. If alternatives are not in the choice set, this is irrelevant as these options will never be chosen; \Pr(j)_{hit} = 0 if the conditions (compensatory, conjunctive or disjunctive) are not satisfied.

The second depends on the observed data (y) and the augmented parameter (z). It takes the form of an indicator function, which is 1 if a value of γ is permissible. Permissible values for γ are those that lead to a choice set where the maximum of the augmented variable z corresponds to the observed choice. Variation in z corresponds to different choice sets.

The third conditional distribution is analogous to (3.11b). It can also be parameterised as a regression function. The draw of V can be generated from the standard normal distribution theory.

The above method allows the model parameters to be drawn from the full conditional distribution despite the irregularity of the likelihood surface. The model defining V, \Pr_t(i) = \Pr(V_{it} + \varepsilon_{it} > V_{jt} + \varepsilon_{jt} \;\forall j), becomes a standard discrete-choice model. Given z, some of the values of \gamma are acceptable and others are not. Across multiple draws, the acceptable range of \gamma varies and the Markov Chain converges in distribution to the full conditional distribution of all model parameters. The algorithm demands a set of possible values of \gamma to be specified and then subsets the set of acceptable values of \gamma from this. The first results are dependent on the set of \gamma's that are specified at the beginning. The \gamma that is used in the analysis is then determined using a Griddy Gibbs algorithm¹⁰. More information on the exact estimation algorithm can also be found in Appendix D.
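As an illustration of the data-augmentation step, the sketch below performs one draw of the latent z for a single choice occasion: truncated-normal draws keep the observed choice's z largest within the consideration set, while screened-out alternatives get unrestricted normal draws. It is a simplified, hypothetical fragment of such a Gibbs sampler, not the full algorithm of Appendix D (which also draws V and the thresholds γ).

```python
import numpy as np
from scipy.stats import truncnorm

def draw_z(V: np.ndarray, chosen: int, in_set: np.ndarray,
           rng: np.random.Generator) -> np.ndarray:
    """One draw of the latent utilities z with unit variance, as in (3.10).

    V      : deterministic utilities of all alternatives
    chosen : index of the observed choice (assumed to be in the consideration set)
    in_set : boolean mask of the consideration set produced by the screening rule
    """
    # Screened-out alternatives get unrestricted normal draws (their z is irrelevant).
    z = rng.normal(loc=V, scale=1.0)
    others = [j for j in range(len(V)) if in_set[j] and j != chosen]
    # The chosen alternative's z must exceed the z of every other considered alternative.
    upper = max((z[j] for j in others), default=-np.inf)
    z[chosen] = truncnorm.rvs(upper - V[chosen], np.inf,
                              loc=V[chosen], scale=1.0, random_state=rng)
    # Every other considered alternative's z must stay below the chosen alternative's z.
    for j in others:
        z[j] = truncnorm.rvs(-np.inf, z[chosen] - V[j],
                             loc=V[j], scale=1.0, random_state=rng)
    return z
```

A full sampler would cycle this draw with the conditional draws of V and of the thresholds γ described above.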


If one considers the compensatory model, the probability of individual h in buying scenario i choosing alternative j given some threshold takes the following form; the selected alternative has the greatest utility among the options for which V_{hik} > \gamma_k:

\Pr(j)_{hit} = \Pr\left( z_{hijt} > z_{hikt} \;\forall k \text{ such that } I(V_{hikt} > \gamma_{ht}) = 1 \right), \qquad (3.12)

with

\gamma_{ht} \sim N(\bar{\gamma}, \sigma_\gamma^2), \qquad (3.13)

to allow for heterogeneity of the cutoff value. Note that the value of \gamma is different for each individual and also varies over time. It is, however, always drawn from the same distribution. It is very well possible that the distribution of \gamma is different for different individuals and dependent on their characteristics. It is also likely that this \gamma shows a time trend or correlation with its lags. These issues can be dealt with by including the characteristics of individuals in the distribution and, for example, by specifying a VAR-type model for \gamma. These extensions, however, are beyond the scope of this thesis. The assumptions made are restrictive and this is one of the limitations of the model as defined in this thesis.

If one considers the conjunctive decision rule, the probability of individual h in buying scenario i choosing alternative j given some threshold takes the following form:

\Pr(j)_{hit} = \Pr\left( z_{hijt} > z_{hikt} \;\forall k \text{ such that } \prod_{m} I(x_{hikmt} > \gamma_{hmt}) = 1 \right), \qquad (3.14)

with

\gamma_{hmt} \sim N(\bar{\gamma}_m, \sigma_m^2), \qquad (3.15)

to allow for heterogeneity of the cutoff value.

If one considers the disjunctive decision rule, the probability of individual h in buying scenario i choosing alternative j given some threshold takes the following form:

\Pr(j)_{hit} = \Pr\left( z_{hijt} > z_{hikt} \;\forall k \text{ such that } \sum_{m} I(x_{hikmt} > \gamma_{hmt}) \geq 1 \right). \qquad (3.16)

If one considers a mixture of the disjunctive and conjunctive decision rules to allow for structural heterogeneity, the probability of individual h choosing alternative j given some threshold takes the following form:

\Pr(j)_{hit} = \phi \,\Pr\left( z_{hijt} > z_{hikt} \;\forall k \text{ such that } \prod_{m} I(x_{hikmt} > \gamma_{hmt}) = 1 \right)
+ (1 - \phi) \,\Pr\left( z_{hijt} > z_{hikt} \;\forall k \text{ such that } \sum_{m} I(x_{hikmt} > \gamma_{hmt}) \geq 1 \right). \qquad (3.17)

In this thesis, the model with conjunctive decision rules will be estimated. This choice is in line with the results of Gilbride and Allenby (2004), who find that this specification outperforms the other models (as well as the regular probit model). The exact estimation algorithm, which closely follows the appendix of Gilbride and Allenby (2004), can be found in Appendix D.

3.3.4 Comparison of the Models

This section gives a broad overview of the similarities and differences of the three models. This is done based on the design of the models, their advantages and disadvantages, their output and their expected predictive performance.

The standard multinomial logit model is the most well-known and most often used model of the three. Its approach is similar to many standard econometric models, assuming i.a. full knowledge of all different alternatives. Its disadvantages include the aforementioned assumptions (as satisfaction of these assumptions is critical for the unbiasedness and consistency of the estimations) and the fact that it is unable to deal with large feature spaces. Its advantages include that it is likely to be more computationally efficient than the other models, as the RMNL estimates R MNLs with 5 regressors (the MNL estimates 1 MNL with 25 regressors) and the two-stage model has to find the conditional distribution first through MCMC methods, which has in previous research been very computationally intensive (i.a. Belloni and Chernozhukov, 2009). It outputs coefficients, whose magnitude and direction can be interpreted with relative ease.

The RMNL also assumes full knowledge of all different alternatives, but is able to deal with large feature spaces. This is the main advantage it has over the standard MNL. On top of this, Prinzie and Van den Poel (2008) have also shown that it yields superior predictive performance even if it is possible to estimate an MNL. Due to the fact that it subsequently draws bootstrap samples and subsets of the regressors, it is not able to provide interpretable coefficients: the coefficients estimated are subject to omitted variable bias, and this is also its main disadvantage. The magnitude and direction of the effects of the regressors can only be approximated by z-values and Partial Dependence Plots¹¹. Its main advantage over regular random forest type algorithms is that it is able to deal with alternative dependent variables, such as the number of articles.

Finally, the two-stage model has an advantage over the other two models because it no longer assumes full knowledge of the alternatives. Its main disadvantage is that it is presumed to be very computationally intensive (i.a. Belloni and Chernozhukov, 2009). It returns coefficients that are, if the model is specified correctly, not subject to bias or inconsistency. This makes the model output easier to interpret than the RMNL model output. Based on hit frequency, the model has been shown by Gilbride and Allenby (2004) to perform better than the standard probit model, which is assumed to yield performance similar to the MNL. To the author's knowledge, it has never been compared directly to the RMNL model and thus it is not known which model will outperform the other. It should be noted that the probit model in this specification follows a random coefficient regression and that the variables included in this model are thus slightly different. This is further elaborated upon in Subsection 5.4.1.


Data

4.1 Van Lanschot Background

Van Lanschot is a private bank located primarily in The Netherlands. Although the bank provides many services, it is specialised in supporting clients with their investments. The bank is open to more affluent individuals, as the minimum free capital needed is 500,000 euros. In exchange, clients are assigned a private banker, who guides them when making their financial decisions.

Van Lanschot offers three options for investing: investing on your own, asset management (in which Van Lanschot manages your assets for you) and investment advice, in which decisions are made together with an advisor. In the last option, a new investment can be initiated by both the individual and the advisor, which is important to keep in mind when interpreting the outcomes of this study. Within investment advice, the bank again offers three different levels of service: basic, active and intensive. The main difference between the three is that they carry successively higher standard fees and successively lower fees per transaction. Individuals who trade often are therefore expected to have a larger probability of being in the intensive service level than in the basic service level.

As part of the service provided by Van Lanschot, clients have access to a website and an app, on which the bank posts its central advisory communication. This content aims to inform investors about current financial affairs and to inspire them for future transactions.


4.2 Data Description

The data used in this study is divided into data relating to content, data relating to a client's characteristics and portfolio, and data relating to the transactions. The latter two are kept separate because the characteristics are assumed to stay constant throughout a month and serve as the control variables in the models, whereas the transactions form the dependent variable.

The characteristics data is available at the monthly level: at the end of each month, a snapshot is taken and saved as the client's characteristics for that month. The transactions are available on the individual level. The relevant client characteristics are taken to be the characteristics at the end of the prior month. Relevant news articles are the articles dated less than one month prior to the decision. The choice of one month was made because an advisor has to call his or her client within two weeks in case of an alert or another change of opinion on the instrument towards negative; one month should therefore allow the advisor to have made the transaction.
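A minimal sketch of this linking rule is given below. The column names and the 31-day approximation of 'one month' are illustrative assumptions, not the actual database fields: client characteristics are taken from the snapshot at the end of the prior month, and content counts as relevant if it was published less than one month before the transaction date.

import pandas as pd

# Hypothetical frames with illustrative column names, not the actual database fields.
transactions = pd.DataFrame({
    "account": [1, 1],
    "date": pd.to_datetime(["2017-05-10", "2017-06-02"]),
    "isin": ["XX0000000001", "XX0000000002"],
    "amount_eur": [5000, -2000],
})
characteristics = pd.DataFrame({
    "account": [1, 1],
    "month_end": pd.to_datetime(["2017-04-30", "2017-05-31"]),
    "risk_profile": [3, 3],
})
content = pd.DataFrame({
    "isin": ["XX0000000001"],
    "published": pd.to_datetime(["2017-04-20"]),
    "content_type": ["Trading Idea"],
})

# Characteristics are taken from the snapshot at the end of the month preceding the transaction.
transactions["prior_month_end"] = (transactions["date"].dt.to_period("M").dt.to_timestamp()
                                   - pd.Timedelta(days=1))
merged = transactions.merge(characteristics, left_on=["account", "prior_month_end"],
                            right_on=["account", "month_end"], how="left")

# Articles are relevant if published less than one month (approximated as 31 days) before the decision.
merged = merged.merge(content, on="isin", how="left")
merged["relevant"] = (merged["date"] - merged["published"]) < pd.Timedelta(days=31)
print(merged[["account", "date", "isin", "risk_profile", "content_type", "relevant"]])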

This section discusses the three different data sources and addresses the different variables that are included in them. It also provides some descriptive statistics on these variables.

4.2.1 Content Database

This database was created largely for this study. Some variables were extracted from the content management system, the database behind the Van Lanschot website. These were manually extended by adding ISIN codes (unique codes identifying the instrument at hand), fund names and advice types.

The study is limited to 'call-to-action' news messages. This is done for practical purposes, but also because the main aim of these messages is to persuade investors to make transactions. Other types of content may, for example, be written merely to inform clients, so their value cannot be measured in mutations made. Within these messages, Trading Ideas, Alerts, Conviction Lists, Dividend Top 10's, Kempen Favourite Lists and Investment Ideas are distinguished. A short description of each of these types of content is given in Appendix E.1.

The database is structured on an instrument level. This means that it contains the ISIN code of the instrument, the name of the instrument, the advice (positive or negative), the title of the article, a unique ID of the message, the publication date and the type of news message. Multiple rows per article may thus be present, as one article often covers various instruments. Likewise, multiple rows per instrument are likely to be present, corresponding to different articles.
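To make this instrument-level structure concrete, the rows below sketch what the content database looks like; all ISIN codes, names, titles and IDs are invented for illustration. One article can span several instruments, and one instrument can appear in several articles.

import pandas as pd

# Illustrative rows only; ISIN codes, fund names, titles and IDs are made up.
content_db = pd.DataFrame({
    "isin":       ["XX0000000001", "XX0000000002", "XX0000000001"],
    "instrument": ["Fund A",        "Stock B",       "Fund A"],
    "advice":     ["positive",      "positive",      "negative"],
    "title":      ["Conviction List May", "Conviction List May", "Alert on Fund A"],
    "article_id": [101, 101, 102],
    "published":  pd.to_datetime(["2017-05-02", "2017-05-02", "2017-06-01"]),
    "type":       ["Conviction List", "Conviction List", "Alert"],
})

print(content_db.groupby("article_id").size())   # article 101 covers two instruments
print(content_db.groupby("isin").size())         # instrument XX0000000001 appears in two articles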

4.2.2 Characteristics Database

The database on the characteristics contains i.a. all instruments (and the capital per instrument) included in a client's portfolio for each snapshot. The database is structured on the account level. One should take into account that it is very likely that an individual holds multiple accounts (even within the investment advice branch of Van Lanschot). Each account has a corresponding relation number, which can be an individual, multiple individuals, a private company, etc. Each relation has one or more product holders, corresponding to the individual or individuals behind the relation. The head product holder is regarded as the natural person owning the account. One person may thus correspond to different relations and one relation may in turn own multiple accounts. This study is primarily concerned with the mutations on an account level, which means that the response of one individual may be recorded multiple times.

On these individuals, the frequency of contact they had with their financial advisor in 2017 is also collected. The reporting of this variable is very limited: the frequency is reported, but the reporting of the nature of the contact¹ is questionable. The variable is still deemed trustworthy enough to identify the dormant clients. Clients whose amount of contact lies two standard deviations or more below the average² are removed from the observations. These are classified as the dormant clients as specified in Section 3.1 and are not expected to be influenced by content.

Of each product, the ISIN code, the type of product³, the sector corresponding to the product (if applicable), the exchange on which the product is traded, the country of origin and the country of the Initial Public Offering (IPO) are documented.
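The dormant-client cut-off described above can be implemented as in the sketch below, with simulated contact frequencies and illustrative column names; the thesis reports a mean of about 50.4 contact moments and a standard deviation of 23.8, so the cut-off lies just below 3 contact moments per year.

import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Simulated yearly contact frequencies per client (column names are illustrative).
contacts = pd.DataFrame({
    "client": np.arange(500),
    "contacts_2017": rng.normal(50.4, 23.8, size=500).clip(min=0).round(),
})

# Clients two standard deviations or more below the average are classified as dormant and removed.
threshold = contacts["contacts_2017"].mean() - 2 * contacts["contacts_2017"].std()
active = contacts[contacts["contacts_2017"] > threshold]
dormant = contacts[contacts["contacts_2017"] <= threshold]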

Another important aspect included in this database is the covenanted risk profile, the date of this agreement and the current risk profile. The covenanted risk profile and the current risk profile often deviate from each other, but policy dictates that they should not deviate too much or for too long; financial advisers are obliged to notify their clients if such deviations persist for a certain period of time. The risk profile may influence a client's decision on whether or not to invest in a product, as the product may not fit their profile. Clients may also decide to buy or sell certain positions in order to move closer to their agreed risk profile. The current portfolio is also of interest with regard to this issue, as portfolios are advised to be sufficiently diversified, among other things to reduce the risk of the portfolio. The type of products owned by a client may thus not only provide a proxy of the client's interest when it comes to investing, but also of the possibilities with regard to diversification.

¹ This can for example be a phone call or an email; some advisers are more specific than others about the subjects reviewed in the contact.
² The average amount of contact is 50.4 and the standard deviation is 23.8, so clients with fewer than 3 contact moments per year are considered dormant.
³



Another variable included in the database is the service level. Within the advisory branch of the investment bank, basic service, active service and intensive service are distinguished. Their main relevance to this study is that these levels carry a successively higher standard fee and a successively lower fee per transaction. Intensive clients are thus expected to trade more frequently than clients in the basic service level. Intensive clients may therefore also have more capital in their portfolio, as these clients have more investment possibilities (some products can only be invested in above a certain level of capital) and a greater need for diversification. Other variables in the database include the account manager (private banker) and the borrowing capacity.

4.2.3 Transactions Database

The transactions database contains all transactions per account (and thus per relation and per person). The capital involved in a transaction is recorded both in the currency of the fund and in euros, as are the number of units traded and the price per unit. It is important to note that the aggregated month-over-month change of all portfolios in a certain instrument and the aggregated transactions in that same instrument are not the same: using the delta in the characteristics database would result in different transaction values than using the transactions database. This can be attributed to fluctuations in the price of the asset throughout the month. The transaction amount is measured at the time of the transaction, whereas a delta calculated from the portfolios is measured as the total invested value at the end of month t minus the total invested value at the end of month t − 1. This study maintains the transactions database as the source measuring the mutations made in the portfolios.
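The difference between the two measures can be illustrated with a small numerical example (all numbers are hypothetical): prices move between the transaction date and the month-end snapshots, so the recorded transaction amount and the month-over-month portfolio delta do not coincide.

# Hypothetical position in one instrument.
units_start, price_prev_month_end = 10, 98.0    # holdings and price at the end of month t-1
units_bought, price_at_transaction = 2, 100.0   # purchase during month t
price_month_end = 95.0                          # price at the end of month t

# As recorded in the transactions database (measured at the time of the transaction).
transaction_amount = units_bought * price_at_transaction                  # 200.0

# As implied by the difference between the two month-end portfolio snapshots.
portfolio_delta = ((units_start + units_bought) * price_month_end
                   - units_start * price_prev_month_end)                  # 12*95 - 10*98 = 160.0

print(transaction_amount, portfolio_delta)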

4.3 Specification of the choice set

In order to accurately estimate a discrete choice model, the choice set, which originally contains 5,124 choices (one for each ISIN code), needs to be aggregated. The most obvious choice set one could specify consists of the option 'not to buy or sell', the option 'to buy or sell something that no content was written on' and the option 'to buy or sell one of the instruments that was written on in the last month'. The latter option would need to be separated into all different instruments in order to quantify the effect of different types of content. However, this would lead to a parameter space that could be too large and complex to estimate: if content was written on a certain instrument but only one person decided to invest in it, an accurate model cannot be estimated. Another complication of specifying the choice set this way is that the choice set becomes time-varying, as different content is written each month. Specifying the choice set instead as all instruments written on during the whole time period (2 years) would increase the parameter space so much that estimation would be infeasible. This thesis therefore specifies the choice set in a way that may not be obvious at first sight, but that makes more sense with the above complications in mind.

The choice set is specified as the decision 'not to invest' and the decisions to buy (or sell) an instrument of type x in sector y. The sectors are determined based on the Global Industry Classification Standard (GICS) taxonomy; instruments are assigned to sectors by the finance department of Van Lanschot. As not all combinations occur in the data, this yields 34 different decisions. On the content level, the dataset specifies how many messages of a specific content type were written on that instrument type and sector in that month.
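On the content side, this aggregation amounts to counting messages per content type, instrument type, sector and month, roughly as in the sketch below (hypothetical rows and illustrative column names).

import pandas as pd

# Hypothetical content rows: content type, instrument type, GICS sector and publication month.
content = pd.DataFrame({
    "content_type":    ["Trading Idea", "Alert", "Trading Idea", "Conviction List"],
    "instrument_type": ["stock", "stock", "bond", "stock"],
    "sector":          ["Energy", "Energy", "Undefined", "Financials"],
    "month":           pd.PeriodIndex(["2017-05", "2017-05", "2017-05", "2017-06"], freq="M"),
})

# Number of messages of each content type per instrument-type/sector combination per month.
counts = (content.groupby(["month", "instrument_type", "sector", "content_type"])
                 .size()
                 .rename("n_messages")
                 .reset_index())
print(counts)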

Two drawbacks of this structure are identified. Firstly, the dataset may be too aggregated to measure the effects of content: content written on one instrument could generate a lot of traction on that instrument, but this effect may be negligible once aggregated into the 'type-and-sector' format. Secondly, many instruments are not assigned to a sector, causing them to fall into a large 'undefined' category.

4.4 Combining the different datasets

Firstly, the transaction dataset was extended by including all account numbers. If an account did not invest, then the transaction price was set to zero and the ISIN code was left unknown. The choice then corresponds to 'not investing'. Each row of the combined dataset contains the account number, the period indicator, the relative amount of capital in stocks, in bonds and in investment funds (measured at the end of 2017), the measured risk profile and the covenanted risk profile (and their deviation), the amount of euros in the transaction (zero if no transaction took place), the relative amount of instruments in the portfolio in each sector, the sector of each instrument, the type of each instrument, and the amount and type of content for that specific instrument type and sector.

In this dataset, the sector and instrument type of the traded instrument and a dummy indicating whether the observation contains an actual transaction (equal to one if there was no transaction) are combined into a choice variable, a number between 1 and 34.
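The construction of this choice variable can be sketched as follows; the rows and column names are hypothetical, and in the actual data the codes run from 1 to 34 rather than the handful shown here.

import pandas as pd

# Hypothetical rows: the type and sector of the traded instrument,
# plus a dummy equal to one if the row contains no transaction.
df = pd.DataFrame({
    "no_transaction":  [1, 0, 0, 0],
    "instrument_type": [None, "stock", "stock", "bond"],
    "sector":          [None, "Energy", "Financials", "Undefined"],
})

# Choice 1 is 'not to invest'; every observed type-and-sector combination gets its own code.
combo = (df["instrument_type"].fillna("") + " / " + df["sector"].fillna(""))
combo = combo.where(df["no_transaction"] == 0)
df["choice"] = pd.Categorical(combo).codes + 2        # combination codes start at 2
df.loc[df["no_transaction"] == 1, "choice"] = 1        # no-transaction rows get choice 1

print(df)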

It should be noted that although the dataset may resemble panel data, it is in fact not. This is mainly because each account can make multiple trades in a month, creating more than one possible observation point per account in that month. Although it would be possible to aggregate these observations, by i.a. taking the largest transaction in a month as the observation for an account in that month, this is not done in the estimations. This choice is based on the fact that a negative transaction often precedes a positive one, as this simply corresponds to withdrawing money from one instrument and investing it in another. The model is, however, still able to incorporate some of the fixed effects over time by including a cluster-dependent intercept.

4.5 Descriptive statistics

A total of 5,961 different accounts can be distinguished in the data. All of these accounts make at least one transaction in the period between March 2016 and March 2018. In total, this leads to a dataset of 507,588 observations, thus on average 85 observations per account number (either 'empty' ones or ones containing transactions). Regarding just the transaction data, each account makes on average 78 transactions over the course of the two-year period, with a maximum of 1,522 transactions and a minimum of 1 transaction. This distribution is thus heavily right-skewed.

Descriptive statistics for content related measures are provided in Table 4.1. Descriptive statistics on the different choices made are provided in Table 4.2.


Figure 4.1: The histogram of trades from March 2016 until March 2018. This shows a very skewed distribution.

Table 4.1: Descriptive statistics with respect to content related variables

Statistic                            Value   Amount of Instruments¹
Amount of call-to-action messages      236                     646
Amount of Alerts                        51                      61
Amount of Conviction Lists              57                     198
Amount of Dividend Top 10's              8                      80
Amount of Kempen Favourite Lists        10                      60
Amount of Investment Ideas              62                     158
Amount of Trading Ideas                 48                      89
