
Ignorance and Sophistication: the Impact of Heterogeneity on Mortgage

Prepayments

MSc Thesis (Afstudeerscriptie)

written by

Mathijs van der Vlies

(born April 27th, 1993 in Dronten, the Netherlands)

under the supervision of S. (Stratos) Nikolakakis and Dr. P.J.C. (Peter) Spreij, and submitted to the Board of Examiners in partial fulfillment of the requirements for the degree of

MSc in Stochastics and Financial Mathematics

at the Universiteit van Amsterdam.

Date of the public defense: August 19, 2016

Members of the Thesis Committee:
Prof. dr. J.V. (Jasper) Stokman
Dr. P.J.C. (Peter) Spreij
Dr. A. (Asma) Khedher


Abstract

A crucial element of modelling prepayment risk in mortgages is the heterogeneity of mortgagors, which can explain qualitative phenomena such as burnout. The aim of this thesis is to investigate the sources of heterogeneity in mortgage prepayments from a liquidity risk perspective. Supplementary to burnout, the heterogeneity in mortgage rate expectations is modelled by allocating heterogeneous prepayment incentive functions to clients. Multinomial logistic regression is applied to estimate the probabilities of mortgagors prepaying over a 30-year horizon. These probabilities serve as input for the generation of the behavioural cash flow calendar, the ultimate risk metric estimated by the model. Underlying mortgage rates are simulated with a two-factor Hull-White model.

The study finds that burnout is a significant source of prepayment risk and that its inclusion raises the sensitivity of the projected behavioural calendar to interest rate fluctuations. Additional uncertainty is reported as a result of the heterogeneous expectation functions, warranting further research into the allocation procedure using client characteristics.


Contents

1 Introduction 5

1.1 Historical Background . . . 5

1.2 Liquidity Risk Management . . . 6

1.2.1 Liquidity Risk Management Cycle . . . 7

1.3 Prepayment risk in residential mortgages . . . 8

1.4 Research Framework . . . 9

1.5 Mortgage characteristics . . . 11

1.5.1 Fixed or variable interest rate . . . 11

1.5.2 Amortization types of mortgages . . . 12

1.5.3 Prepayment . . . 12

1.5.4 Nationale Hypotheek Garantie . . . 13

2 Model Description 14

2.1 Model summary . . . 14

2.2 Binary choice modelling . . . 15

2.2.1 Non-linear model for probabilities . . . 15

2.2.2 Probit and logit model . . . 15

2.2.3 Estimation of a logit model . . . 16

2.3 Multinomial choice modelling . . . 17

2.3.1 Multinomial Logit . . . 17

2.3.2 Estimating the multinomial logit model . . . 18

2.3.3 Independence of Irrelevant Alternatives . . . 19

2.3.4 Multinomial Probit . . . 20

2.3.5 Motivation for multinomial logit . . . 20

2.4 Prepayment model . . . 21

2.4.1 Data structure . . . 21

2.4.2 Dependent variable . . . 22

2.5 Explanatory variable selection: General to specific approach . . . 22

2.5.1 Model specification tests . . . 23

2.5.2 Model diagnostics . . . 24

2.5.3 Out-of-time analysis . . . 26

2.5.4 Definition of the prepayment incentive . . . 26

2.6 Forecasting explanatory variables: stress scenarios . . . 27

2.6.1 Loan-specific risk drivers . . . 27

2.6.2 Macro-economic risk drivers . . . 28

2.7 Single Monthly Mortality rates . . . 28

2.7.1 SMM for the event Conversion . . . 29

2.7.2 SMM for the event Curtailment . . . 30


2.8.1 Contractual calendar . . . 31

2.8.2 Behavioural calendar . . . 32

2.8.3 Weighted Average Life . . . 32

3 Heterogeneous Clients 33

3.1 Heterogeneous Expectations . . . 33

3.1.1 Motivation for expectation functions . . . 34

3.1.2 Scenario Analysis . . . 35

3.2 Burnout . . . 36

3.2.1 Centering of burnout . . . 37

3.2.2 Model specifications including burnout . . . 38

4 HJM Yield Curve Model 39

4.1 Interest rate definitions and products . . . 39

4.1.1 Zero-Coupon Bonds . . . 39

4.1.2 Interest rate definitions . . . 40

4.1.3 Interest rate products . . . 41

4.1.4 Swaptions . . . 43

4.2 Heath-Jarrow-Morton (HJM) Methodology . . . 45

4.2.1 Motivation . . . 45

4.2.2 The HJM framework . . . 45

4.2.3 Risk neutral dynamics . . . 48

4.3 Market Price of Risk . . . 49

4.3.1 Volatility structure and model assumptions . . . 49

4.3.2 Distribution of R(t, T ) given Fs . . . 50

4.3.3 Market price of risk estimation . . . 52

4.4 Calibration on swaption prices . . . 54

4.4.1 The price of a swaption under the HJM model . . . 55

4.4.2 Minimization method . . . 58

4.5 Simulation of mortgage rates . . . 59

4.5.1 Simulation of yield curve . . . 59

4.5.2 Conversion to mortgage rates . . . 59

4.5.3 Fitting to stress scenarios . . . 61

5 Data 62

5.1 Calibration data . . . 62

5.2 Interest rate data . . . 63

5.2.1 Historical yield curves . . . 63

5.2.2 Swaptions . . . 64

5.2.3 Mortgage rates . . . 64

5.3 Macroeconomic variables . . . 64

5.3.1 Adjusted HPI scenarios . . . 65

5.3.2 Mortgage rate scenarios . . . 66

6 Model Assessment 68

6.1 Model assumptions . . . 68

6.2 Yield curve model assessment . . . 69

6.2.1 Parameter stability . . . 70

6.2.2 Model fit to swaption prices . . . 71


6.2.4 Mortgage rate simulations . . . 75

6.3 Multinomial logit model assessment . . . 76

6.3.1 Model specification tests . . . 76

6.3.2 Model diagnostics tests . . . 79

6.4 Performance testing . . . 84

6.5 Model assessment conclusion . . . 85

7 Risk Metrics Calculation 86

7.1 Behavioural maturity calendar . . . 86

7.2 Weighted Average Life . . . 88

8 Conclusion 91

A Definitions 93

B Stochastic Calculus 96

B.1 Probability Space . . . 96

B.2 Stochastic Integration . . . 97

B.3 Itô’s Formula . . . 98

B.4 Financial Market and Martingale Measures . . . 98

B.5 Fubini’s Theorem . . . 99

C Swaption volatilities and prices 101

D Analysis of Logistic Regression 103

D.1 NoBurnout specification . . . 103

D.2 AvgBurnout specification . . . 104

D.3 AvgBurnoutxPI specification . . . 104

D.4 AvgRollingBurnout specification . . . 105

D.5 AvgRollingBurnoutxPI specification . . . 105

E Autocorrelation tests 106

E.1 NoBurnout specification . . . 106

E.2 AvgBurnout specification . . . 108


Preface

This thesis marks the final step in obtaining the Master of Science degree in Stochastics and Financial Mathematics at the University of Amsterdam (UvA). The research was conducted during a seven-month internship in the Assets, Liabilities, Markets & Treasury Modelling (ALM/T) department at ABN AMRO N.V.

As a student with a primarily theoretical background, modelling prepayment in mortgages from a behavioural perspective was a new challenge that I was eager to take up. Even though I have learned many things about the product and the risks associated with it, I feel that there is more complexity than I could ever imagine. When data can never hope to describe the world fully, the temptation always exists to try one more solution, searching for a perfection which can never be found. Being able to let go and finish the work that I have started is a skill that will be valuable for the rest of my career.

I would like to express my gratitude to Stratos Nikolakakis, my company supervisor and the head of the liquidity team within ALM/T, for giving me the opportunity to apply my theoretical studies in a professional environment. His guidance through every step of the internship has been invaluable to me. Aside from being knowledgeable in many areas of financial modelling, he has been a source of inspiration in my transition from academia into the financial industry. Furthermore, I would like to thank my supervisor Peter Spreij and my second reader Asma Khedher from the UvA for helping me improve the quality of my research and its documentation. Many colleagues within ABN AMRO have helped me on various aspects of the thesis. In particular, I would like to thank Peter den Iseger and Manuel Ballester for their help in modelling the mortgage rate. Furthermore, I would like to thank my fellow members of the liquidity team. They have been great colleagues and friends, and I look forward to working with them as a full-time employee at the bank.

Finally, I thank my family and friends for their support and for their advice in the career choices that inevitably presented themselves during my time at the bank.


Chapter 1

Introduction

This chapter serves to introduce the concept of liquidity risk in the current regulatory environment, focusing on the mortgage product. The research objective is formulated, and mortgage characteristics relevant to the model are discussed. Section 1.1 puts the current risk regulations in historical perspective. Section 1.2 describes how liquidity risk is managed within the bank. Section 1.3 discusses the liquidity risk present in residential mortgages and provides an overview of the relevant risk metrics that a model should compute. The research objective is formulated in Section 1.4, which also provides an overview of the remainder of the thesis. Finally, a more detailed description of the mortgage product and its characteristics is provided in Section 1.5.

1.1 Historical Background

The subprime mortgage crisis of 2007-2008, which was triggered by a large decline in housing prices after the collapse of the housing bubble, had dire consequences for the banking system. Borrowers were struggling to refinance their loans as housing prices fell. As variable rate mortgages reset at higher rates, lenders were faced with many mortgage delinquencies. In the financial crisis that followed, large financial institutions collapsed, including Lehman Brothers and Bear Stearns. As uncertainty reigned over who would be next to fall, the credit market became paralysed and trillions of dollars in market capitalization were lost.

The blow was also felt in the Netherlands. The bank Fortis, which had acquired the Dutch parts of ABN AMRO, faced liquidity issues due to a bank run as well as a shortage of loans from other banks, prompting the nationalization of ABN AMRO and Fortis on 3 October 2008. Furthermore, the Dutch government made EUR 20 billion available for healthy banks and insurance companies as a buffer against the tumultuous market. ING, the biggest financial institution in the Netherlands, received EUR 10 billion. Other financial institutions which applied for governmental support were AEGON (EUR 3 billion) and SNS (EUR 750 million).

Some of the biggest shocks to the banking system occurred due to liquidity issues. Lehman Brothers and Bear Stearns both faced bank runs that led to their demise. As financial institutions were concerned about the exposure of other banks to subprime-related assets, banks stopped trading with one another and started hoarding liquid buffers. As banks became less willing to lend, investors faced funding risk, raising margins and thus further worsening funding conditions. A downward liquidity spiral was the result.

In response to the deficiencies in financial regulation during the crisis of 2007-2008, the Basel III framework was published in December 2010, imposing stricter capital demands as well as explicit demands on the liquidity position of banks. Monitoring tools such as the Liquidity Coverage Ratio (LCR)¹ and the Net Stable Funding Ratio (NSFR)² were introduced, which determine the quantity of liquid assets that a financial institution must hold as a buffer against liquidity risk. In addition, De Nederlandsche Bank (DNB) introduced the Internal Liquidity Adequacy Assessment Process (ILAAP) in June 2011, requiring banks to thoroughly assess their liquidity risk management.

These regulations spawned the liquidity side of risk management, which had been neglected in the past. This thesis covers the liquidity risk associated with a particular product: residential mortgages. The following sections explain how this liquidity risk is assessed and what its impact is for this product.

1.2 Liquidity Risk Management

Banks are liquidity providers for both depositors and borrowers. At any time, clients can come in and withdraw funds, and from a bank's perspective, these events occur largely at random. Banks must be able to meet their obligations at the desired times. In order to fund long-term assets, banks use short-term funding. These funding sources are not always reliable: due to market movements or defaults, funding sources can dry up, which in turn can cause banks to fail on their obligations.

Banks are particularly susceptible to two types of liquidity risk:

• Funding liquidity risk: this risk is bank-specific. It results from the inability of an institution to meet payment obligations in a timely manner, both when repaying debt and when funding loan commitments. It is the primary source of liquidity risk for a bank.

• Market liquidity risk: this risk is product-specific. It arises when a transaction cannot be conducted in a timely manner, or can only be conducted at a discount to its original value, due to insufficient depth of its market.

Often, liquidity events will be of a behavioural nature. For example, a bank run occurs when clients fear the collapse of a bank and rush to withdraw their savings before it is too late. However, less extreme events can also be called liquidity events. For example, every mortgage carries liquidity risk because clients may choose to deviate from the contractual agreement by prepaying (part of) the outstanding balance of a mortgage before it matures, causing an unexpected cash flow which disturbs the usual funding plan. We refer to this risk as behavioural liquidity risk.

Within ABN AMRO, models have been developed to assess this type of risk. Each of these models assesses the liquidity risk associated with specific products on the balance sheet. The model landscape for the behavioural liquidity risk models consists of five categories. The first category covers the behavioural liquidity risk models related to the Banking Book of ABN AMRO. The Banking Book liquidity risk models can be divided into Assets and Liabilities related models. Both groups can be further subdivided into Non-Maturing and Maturing Assets/Liabilities, resulting in four types of assets/liabilities. For each asset/liability type we list typical items and associated liquidity events.

• Maturing Assets (MA): include Residential Mortgages (RM) and Term Loans (TL). Typical liquidity events: prepayment, conversion, maturity extension/reduction.

¹ The LCR is a ratio quantifying the amount of high-quality liquid assets (such as government bonds) that an institution holds, which can, if needed, be converted easily into cash to survive a 30-day stress scenario.

² The NSFR ensures that banks hold a minimum amount of stable (based on liquidity) funding over a one-year horizon.


• Non-Maturing Assets (NMA): include Current Accounts Debit and Credit Cards (ICS). Typical liquidity events: utilization.

• Maturing Liabilities (ML): include Term Deposits (TD). Typical liquidity events: attrition, balance fluctuation.

• Non-Maturing Liabilities (NML): include Savings and Current Accounts. Typical liquidity events: roll over, early withdrawal.

The second category in the model landscape includes the behavioural liquidity risk model for the Trading Books of ABN AMRO. This model is associated with the Trading Securities in the Capital Markets Solutions (CMS) portfolios and this risk is measured by the Haircut model (HC).

The third category in the model landscape covers the liquidity risk related to the collateralized derivative portfolio of ABN AMRO. This is captured by the CSA behavioural Liquidity Risk model.

The fourth category in the model landscape deals with the behavioural liquidity risk model for Clearing.

Finally, the fifth category in the model landscape includes the Treasury related behavioural liquidity risk model. More specifically, this category includes the intra-day liquidity model.

DNB set up the Internal Liquidity Adequacy Assessment Process (ILAAP) in the Netherlands, on top of the existing Basel III framework. The framework requires Dutch banks to provide a description, accompanied by an internal assessment, of the way in which liquidity risk is managed in the organization. Topics covered include an internally required minimum level of liquidity that is maintained as a buffer, the suitability of the current liquidity profile of the institution, and the level of actual liquidity expressed in absolute amounts, applied ratios and limit breaches.

The ILAAP program, as initiated by DNB, contains two crucial elements. Firstly, the qualitative elements and quantitative elements of the ILAAP are described. The qualitative part elaborates on such aspects as expectations relating to the strategies, procedures and measures and the liquidity cushions to be maintained by the institution. The quantitative part, which supports the qualitative part, contains standards for limits, stress tests, maturity calendars, liquidity ratios (LCR, NSFR) and monitoring tools. Secondly, a self-assessment procedure is introduced that Dutch Banks should follow in order to assess their liquidity risk management, and improve it, where necessary.

1.2.1 Liquidity Risk Management Cycle

Like any other type of risk, liquidity risk within ABN AMRO is managed through a set of steps which form a cycle. First, the liquidity risk strategy is formulated. Then, the liquidity risk is assessed and reported to the regulatory bodies. Finally, a funding plan is set up, after which the strategy is formulated again. The steps are detailed below:

1. The Liquidity Strategy of the Bank is formulated. Banks face many types of risk (such as market risk, interest rate risk, credit risk and liquidity risk). The so-called risk appetite, defining the amount of risk of each type that the bank is willing to accept, is formulated. If the liquidity risk of the bank is too high, the risk must be transformed into another type of risk, or must be reduced. Since the crisis, ABN AMRO has committed itself to a moderate risk profile. The risk appetite is adjusted accordingly.

2. The Liquidity risk in the Balance Sheet is estimated. Various items on the balance sheet carry liquidity risk. For each of these items a behavioural model is developed.


Behavioural liquidity risk is estimated through well-specified econometric models. Both client-specific and macro-economic variables are included. Through behavioural cash flows (cash flows that include both contractual cash flows and liquidity events), a behavioural calendar is generated that predicts all cash flows for a certain time horizon.

3. Stress testing and scenario analysis: the performance of the bank's portfolio under stressed circumstances is tested. The bank needs to be able to meet its payment obligations in these situations. The stress scenarios may be bank-specific or market-specific (or a combination of the two), and should be extreme but plausible. Typically, the stress scenarios are chosen to last from one month (short-term) up to one year (long-term).

4. Following from these analyses, a Limit Framework is set up. For different time horizons, liquidity limits are defined that may only be breached by traders in exceptional circumstances.

5. Liquidity Risk Reporting takes place. Among other quantities, the LCR and NSFR are reported to the regulatory bodies (De Nederlandsche Bank, European Central Bank). Furthermore, liquidity gap³ reports are produced: liquidity cash flow mismatches are reported via the behavioural calendar.

6. Funds Transfer Pricing (FTP⁴) and Funding Plan: after the liquidity risk has been assessed for the full portfolio of the bank, a funding plan is formulated for every product on the balance sheet, taking this risk into account. Through FTP, the liquidity risk is priced (a premium is charged to account for the liquidity risk) for every transaction that the bank makes. The liquidity buffer is managed and a funding contingency plan is set up in the event of an economic emergency.

After the sixth step, the risk appetite is formulated again with the new risk metrics, starting the cycle again from step 1. The second and third steps are in the scope of this thesis. The next section will outline the product and the liquidity risk that it carries for the bank.

1.3 Prepayment risk in residential mortgages

This thesis focuses on liquidity risk that arises from residential mortgages. ABN AMRO has a leading market position in this product. According to the 2015 annual report, the Dutch mortgage market amounted to EUR 638 billion in outstanding loans at 30 September 2015, of which ABN AMRO held a share of approximately 23%. Moreover, ABN AMRO reached a market share of approximately 20% in new mortgage production, capturing the number one market position in 2015. It is evident that mortgages play a big role in the bank's profit and loss.

For residential mortgages, behavioural liquidity risk is defined as follows:

The risk that cash flows deviate from the cash flows following from the contractual agreement.

All mortgages include various forms of optionality. During the lifetime of a mortgage, clients may choose to repay (partly or in full) the outstanding debt before the mortgage matures.

³ The liquidity gap denotes the difference between a firm's assets and its liabilities, caused by said assets and liabilities not sharing the same properties. This gap can be positive or negative, depending on whether the firm has more assets than liabilities or vice versa.

⁴ FTP is a method used to individually measure how each source of funding contributes to overall profitability.


Moreover, a client may negotiate a change in the terms of the contract, yielding a lower interest rate or a different repayment schedule. These events all change the cash flows that would normally occur, causing liquidity risk for the bank. We label these events as prepayments. The liquidity risk caused by prepayments materializes in two ways:

• Prepayments cause a funding gap: cash that was expected to arrive later now arrives early, and must be reinvested earlier to fund other transactions.

• Since prepayments reduce the outstanding balance that the client owes, interest income on the prepaid amount is foregone.

Modelling the liquidity risk in the residential mortgage portfolios entails modelling the probabilities of the various discrete prepayment options per client. These probabilities serve to generate the two main risk metrics of a liquidity-risk-specific prepayment model:

• Behavioural calendar: the behavioural calendar is a monthly forecast of cash flows generated by the existing portfolio over a horizon of 30 years. The calendar incorporates not only contractual repayments, but also (behavioural) prepayments. The forecast does not include the possibility of mortgages being offered to new clients.

• Weighted Average Life (WAL): the WAL denotes the average amount of time before each euro of principal is repaid.

The behavioural calendar provides the bank with the liquidity risk profile of the mortgage portfolio. It allows the bank to assess how much funding is needed for the current portfolio over the time horizon. By determining the change in the behavioural calendar in various stress scenarios, the associated liquidity risk is assessed. The WAL condenses the behavioural calendar into one number that is easily compared between stress scenarios.
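The WAL can be computed directly from the principal cash flows in a behavioural calendar. The sketch below is illustrative only: the function name and the monthly-bucket convention are assumptions, not definitions taken from the thesis.

```python
# Illustrative sketch: Weighted Average Life (WAL) from a behavioural
# cash flow calendar. `principal_cf[t-1]` holds the principal repaid in
# month t; names and conventions here are assumptions.

def weighted_average_life(principal_cf):
    """Average time (in years) until each euro of principal is repaid."""
    total = sum(principal_cf)
    # weight each month's principal repayment by its time, expressed in years
    return sum((t / 12) * cf for t, cf in enumerate(principal_cf, start=1)) / total

# Toy calendar: 100 repaid after exactly one year and 100 after two years
calendar = [0.0] * 11 + [100.0] + [0.0] * 11 + [100.0]
print(weighted_average_life(calendar))  # 1.5
```

Computed per stress scenario, this single figure makes a 30-year calendar easy to compare across scenarios, as described above.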

1.4 Research Framework

When the thesis was initiated, an existing model for prepayment risk in residential mortgages was already in place, and the objective was to extend it. While the model was already quite extensive, it quickly became apparent that it could be improved. The existing model was used to predict client prepayment behaviour and obtain a forecast of future behavioural cash flows (as opposed to contractual cash flows). The model recognized that clients do not always behave optimally, so that option-theoretic models are ruled out. Instead, it proposed a multinomial logit approach, including both macroeconomic and mortgage-specific information.

However, due to data limitations, client data was not included in the model. In particular, the model assumes that, on average, all clients are affected equally by their macro-economic and mortgage-specific circumstances.

This is where the greatest challenge of the thesis lies. By nature, prepayment behaviour is a problem in which differences between clients matter. These differences are referred to as heterogeneity. Without heterogeneity, phenomena such as burnout, which depresses prepayments as a mortgage portfolio ages due to the exodus of fast prepayers, and non-optimal prepayment cannot be adequately explained. The heterogeneity of clients can be observed in historical data. Some clients seem to follow the mortgage market closely and prepay at low interest rates. But clients ignorant of the market are also observed: these clients miss many good prepayment opportunities and are slow to react to interest rate movements.

One can easily imagine these types of clients. However, quantifying the heterogeneity proves to be a challenging task. This thesis extends the existing prepayment model by accounting for heterogeneity, motivated by a mix of expert input and directly observed behaviour. The research objective is formulated as follows:

Research objective

Forecast prepayment probabilities by taking into account the heterogeneity of clients, given the uncertainty in mortgage rate movements.

The objective is attained through the following approach. For every month and every loanpart, a number of explanatory variables is available within ABN AMRO. These explanatory variables are segmented into different groups. Macro-economic variables, such as the House Price Index and the interest rate which ABN AMRO charges on new mortgages, are included, as well as mortgage-specific information, such as the amortization type and the client's prevailing mortgage rate. For each variable, parameters are estimated through a multinomial logit regression describing the effect that each variable has on the probabilities of the various prepayment events.
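As a rough illustration of how a multinomial logit turns a linear index per prepayment event into probabilities: each event gets a score x·βⱼ, and the probabilities are the softmax of those scores. The coefficients and event labels below are invented for the example, not estimated values from the thesis.

```python
import math

# Illustrative multinomial logit: P(event j | x) is the softmax of the
# linear indices x . beta_j. Coefficients and event labels are invented.

def mnl_probabilities(x, betas):
    scores = [sum(b * v for b, v in zip(beta, x)) for beta in betas]
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Two covariates (intercept, prepayment incentive) and three events,
# e.g. no prepayment, curtailment, full prepayment:
x = [1.0, 0.02]
betas = [[0.0, 0.0], [-3.0, 20.0], [-4.0, 30.0]]
probs = mnl_probabilities(x, betas)
print(probs)  # three probabilities summing to one
```

In the thesis's setting these monthly event probabilities, one vector per loanpart, are what feed the behavioural cash flow forecast.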

The explanatory variable that is expected to have the greatest effect on prepayment rates is the difference between a client's prevailing mortgage rate and the interest rate offered on a new mortgage. A client with a high prevailing mortgage rate is expected to have a higher probability of prepayment. The existing model refers to this difference as the prepayment incentive. It is in this incentive that the new model incorporates the heterogeneity of clients. New prepayment incentives are defined for a number of different client profiles. These incentives take into account a client's expectation of future incentives, which may be a simple or complex function of past incentives, depending on the sophistication of the client. Different distributions of clients along the client profiles are proposed for impact analysis with respect to the behavioural calendar.

An additional explanatory variable called burnout captures heterogeneity of clients from a different perspective. This variable, which is a function of prepayment incentives over the life of a mortgage, captures how actively a client is monitoring prepayment opportunities. The two resulting sources of heterogeneity capture both the sophistication and prepayment activity of clients.
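A minimal numeric sketch of these two heterogeneity ingredients is given below. The incentive, expectation function and burnout variable are defined precisely later in the thesis; the functions here are simplified stand-ins with illustrative rates.

```python
# Simplified stand-ins for the prepayment incentive, a naive expectation
# function, and a burnout-style variable. Illustrative only, not the
# thesis's exact definitions.

def incentive(contract_rate, market_rate):
    """Spread between the client's rate and the current market mortgage rate."""
    return contract_rate - market_rate

def rolling_expectation(past_incentives, window=12):
    """An unsophisticated client: expected incentive = recent average."""
    recent = past_incentives[-window:]
    return sum(recent) / len(recent)

def burnout_variable(past_incentives):
    """Accumulated positive incentive left unexercised over the loan's life."""
    return sum(i for i in past_incentives if i > 0)

# A client paying 4.5% while market rates drift from 5.0% down to 3.5%:
history = [incentive(0.045, r) for r in (0.050, 0.040, 0.035)]
print(rolling_expectation(history))
print(burnout_variable(history))
```

A sophisticated client profile would replace `rolling_expectation` with a more forward-looking function; a high burnout value flags a client who has repeatedly ignored profitable prepayment opportunities.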

The prepayment probabilities must be forecast into the future to determine the runoff of the mortgage portfolio. This requires forecasting the explanatory variables of the multinomial logit regression, most importantly the prepayment incentive. Taking into account the various heterogeneous prepayment incentives, paths of interest rates need to be modelled. For this, an interest rate model is necessary: a model is implemented which is capable of simulating future interest rate paths based on historical yield curves.

Two risk metrics are calculated as a result of the model: the behavioural calendar and the WAL. In order to provide the liquidity profile of the bank, these metrics are calculated in various scenarios. Firstly, the sensitivity to different interest rate scenarios (generated by the yield curve model) and house price scenarios (given exogenously) is assessed. Secondly, the impact of different distributions for the client profiles is evaluated.

This document is structured as follows. Chapter 1 introduces the reader to the mortgage portfolio and to the regulation surrounding the product. The types of mortgages are explained, and the optionality of a mortgage is discussed.

In Chapter 2, the underlying multinomial logit model is described. The estimation procedure is discussed and the assumptions are considered. A framework is set up for determining the explanatory variables to include in the model. Stress scenarios for the future values of the macroeconomic variables are provided. Furthermore, the methodology for determining the portfolio runoff given the prepayment probabilities is explained.


Chapter 3 discusses the heterogeneity that is incorporated in the prepayment incentive. Two elements are introduced to incorporate heterogeneity: heterogeneous expectation functions for the prepayment incentive and a new explanatory variable called burnout.

In Chapter 4, a yield curve model is proposed to forecast paths of mortgage rates. The HJM framework of the model is explained, and its volatility structure is calibrated on swaption prices. A trend adjustment is made through an estimation on historical yield curves and an additional spread based on historical mortgage rates finalizes the approach.

Chapter 5 provides a description of the data used. The available explanatory variables of the logistic regression are listed, along with the calibration instruments of the mortgage rate model. Finally, stress scenarios of the macroeconomic variables are provided.

In Chapter 6, the performance of the model is assessed. The performance of the yield curve model regarding its fit to the market and the plausibility of its simulations is investigated. Several specifications for the multinomial logit regression are compared through various tests on the parameters of the regression and on the residuals. To test the accuracy of the forecasts, an out-of-time test is performed which compares predicted prepayment rates with observed rates.

The results of the model are presented in Chapter 7. In several scenarios for the macroeconomic variables and client profile distributions, the behavioural calendar and the WAL are computed. The impact of the scenarios on these quantities is assessed.

1.5 Mortgage characteristics

A mortgage loan, also referred to as a mortgage, is a loan taken out by purchasers of real estate in order to fund the purchase. The loan is collateralized by the house or property: if the mortgagor (borrower) defaults on the loan or is otherwise unable to abide by its terms, the mortgage allows the mortgagee (lender) to repossess the property, so that it can be sold to pay off the loan. A distinction can be made between residential and non-residential (company-owned) mortgages. This thesis focuses on residential mortgages, which comprise most of the Bank's mortgage portfolio.

A mortgage typically lasts between 10 and 30 years. During this time, the client pays off parts of the initial loan according to a specific repayment scheme, and pays interest over the outstanding principal. A mortgage consists of one or more loanparts: these are the basic components of a mortgage. For example, a client may have a mortgage of EUR 300,000 consisting of a EUR 200,000 fixed rate bullet loanpart and a EUR 100,000 variable rate level pay loanpart.

1.5.1 Fixed or variable interest rate

The client can choose among two interest rate types, according to his/her risk appetite:

• Fixed rate mortgage: a mortgage where the interest rate remains constant for a set period, the interest term. Payment amounts remain fixed during the interest term, even if the interest rate goes up. On the other hand, there is no benefit when the interest rate goes down. After the fixed rate period ends, the mortgage is either switched to a variable rate mortgage or a new fixed period is agreed upon with the lender. Note that prepayment penalties only exist during the fixed rate period, and not during the variable rate period. The fixed rate period typically ranges from five to ten years. The fixed rate mortgage is fully exposed to interest rate risk and liquidity risk.


• Variable rate mortgage: a mortgage loan where the interest rate is periodically adjusted based on an index which reflects the cost to the lender of borrowing on the credit markets. Among the most common indices are the rates on 1-year Constant Maturity Treasury (CMT) securities and the London Interbank Offered Rate (LIBOR). With this type of mortgage, the payments fluctuate in line with changes to the variable rate: the monthly payments can go up as well as down. Variable rate loans are only exposed to liquidity risk and not to interest rate risk.

We can view the variable rate mortgage as a fixed rate mortgage with a fixed rate period of just one month. Hence, each month, the client of a variable rate mortgage has the same options that the client of a fixed rate mortgage has at a reset date.

1.5.2 Amortization types of mortgages

Clients pay off the initial principal of a mortgage according to a specific repayment scheme. This repayment scheme is known as the method of amortization. The following amortization types can be distinguished within Dutch mortgages:

• Annuity (level pay) mortgage: Each month the borrower repays both interest and part of the notional. The total amount paid each period stays fixed throughout the lifetime of the mortgage. This figure includes both notional repayments and interest. As a result, the redemption amount increases over time but the interest paid over the outstanding decreases. The interest rate can be both fixed or variable. The borrower is allowed to prepay a fixed percentage (depending on the conditions of the mortgage) of the original loan amount per year without penalty.

• Linear mortgage: Each month the borrower repays both the interest and part of the notional. The difference with the annuity mortgage is that the redemption amount stays constant, but the total payment does not (due to the interest over the outstanding decreasing over time). The interest rate can be both fixed or variable. The borrower is allowed to prepay a fixed percentage (depending on the conditions of the mortgage) of the original loan amount per year without penalty.

• Interest-only (bullet) mortgage: Each month the borrower repays only interest, no redemption takes place. The outstanding amount stays constant throughout the term of the mortgage. At the end of the term the borrower has to repay the full notional. The interest rate can be both fixed or variable. The borrower is allowed to prepay a fixed percentage (depending on the conditions of the mortgage) of the original loan amount per year without penalty.
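The three amortization types can be illustrated with a small sketch. This is a simplified example, not the Bank's actual schedule logic; the function name and conventions (monthly compounding, no prepayments or costs) are hypothetical.

```python
def amortization_schedule(principal, annual_rate, months, kind):
    """Return a list of (redemption, interest) per month for a simplified
    annuity, linear or bullet loanpart. Illustrative sketch only."""
    r = annual_rate / 12.0  # monthly interest rate
    schedule = []
    outstanding = principal
    if kind == "annuity":
        # constant total payment: P * r / (1 - (1 + r)^-n)
        payment = principal * r / (1 - (1 + r) ** -months)
    for m in range(months):
        interest = outstanding * r
        if kind == "annuity":
            redemption = payment - interest       # redemption grows over time
        elif kind == "linear":
            redemption = principal / months       # constant redemption
        elif kind == "bullet":
            redemption = principal if m == months - 1 else 0.0  # repay at maturity
        schedule.append((redemption, interest))
        outstanding -= redemption
    return schedule

ann = amortization_schedule(200_000, 0.03, 360, "annuity")
lin = amortization_schedule(200_000, 0.03, 360, "linear")
bul = amortization_schedule(100_000, 0.03, 360, "bullet")
```

For the annuity loanpart the total payment (redemption plus interest) is the same every month, for the linear loanpart the redemption is the same every month, and for the bullet loanpart the full notional is repaid in the final month.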

The contractual repayment of a loanpart follows one of these repayment schemes. However, in every mortgage a prepayment option is embedded. This option is discussed in the next section.

1.5.3 Prepayment

At any point in time until the maturity of the mortgage, a client has the option to prepay (part of) the amount that is still outstanding. Exercising this option effectively reduces the maturity, since following a prepayment, the client will pay off the mortgage earlier than before. Prepayment occurs for various reasons. A client might be in possession of a fixed rate mortgage with a high interest rate. If the interest rate on the market is sufficiently low, the client might decide to refinance to take advantage of the lower rate. Conversely, if a client owns a variable rate mortgage, they could refinance into a fixed rate mortgage to lock in a low rate. Or, given that the client has saved enough funds, a client might prepay the mortgage altogether and avoid any further interest payments.

But clients have non-financial reasons for prepaying as well. Relocation to a new house is a big factor here. The client sells the house to raise money for prepayment, and takes out a mortgage on a new property. Such a decision is often influenced not by interest rate considerations, but by the ability to sell the old house.

In general, mortgages cannot be fully prepaid without incurring an additional charge. This charge is known as the prepayment penalty. Usually the penalty is the net present value of the difference between the contract rate and the market mortgage rate. Mortgage lenders have agreed to allow a small percentage of the initial principal to be prepaid every year. For most mortgages, 10% of the initial principal is penalty free, although mortgages with a penalty free prepayment rate of 20% also exist. Prepayment penalties apply only during the fixed rate period, and only if the market mortgage rate is lower than the contract mortgage rate.
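A common way to compute such a penalty can be sketched as below: the net present value of the monthly rate differential over the remaining fixed rate period, applied to the amount prepaid above the penalty-free allowance. This is a simplified illustration under those assumptions, not the Bank's exact penalty formula; the function name is hypothetical.

```python
def prepayment_penalty(prepaid, initial_principal, contract_rate, market_rate,
                       months_left, penalty_free_frac=0.10):
    """Simplified penalty: NPV of the monthly rate difference on the amount
    prepaid above the penalty-free allowance. Illustrative sketch only."""
    if market_rate >= contract_rate:
        return 0.0  # no penalty when the market rate is not lower
    penalty_free = penalty_free_frac * initial_principal
    charged = max(prepaid - penalty_free, 0.0)
    monthly_diff = (contract_rate - market_rate) / 12.0
    # discount the monthly interest loss at the market rate
    d = 1.0 + market_rate / 12.0
    return sum(charged * monthly_diff / d ** m for m in range(1, months_left + 1))

# e.g. EUR 50.000 prepaid on a loan of EUR 200.000, 5% contract vs 3% market,
# five years of fixed rate period remaining
penalty = prepayment_penalty(50_000, 200_000, 0.05, 0.03, 60)
```

Note that prepaying no more than the penalty-free 10% of the initial principal, or prepaying when the market rate exceeds the contract rate, yields a zero penalty in this sketch.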

Because refinancing a mortgage into a lower rate can still be seen as a net profit for the borrower, the choice was made not to account for the prepayment penalty in determining the value of the prepayment option for a borrower. However, because of the penalty free percentage, partial prepayments often amount to 10% of the initial principal. The prepayment model takes this observation into account.

Prepayment penalties are quite susceptible to regulation. Since November 2013, it is possible for underwater mortgages to prepay, without penalty, the difference between the current mortgage amount and the current WOZ (Waardering Onroerende Zaken) value of the property. This regulation reduces the risk that the borrower ends up with a residual debt. Moreover, no prepayment penalty is applied when a sudden or unexpected change in the personal circumstances of the mortgagor occurs, such as death. Additionally, no penalty is charged on events like moving or selling the house, or at any interest rate reset date.

Because prepayment can occur under various circumstances with various implications, we distinguish different types of prepayment:

• Curtailment (partial prepayment): Part of the outstanding principal is repaid. The client can prepay up to 10-20% every year without penalty, depending on the mortgage;

• Conversion: A loanpart is terminated and replaced by a new loanpart;

• Full prepayment: A loanpart is terminated and not replaced by a new loanpart.

1.5.4 Nationale Hypotheek Garantie

Finally, a regulatory concept which is unique to the Dutch mortgage market is the Nationale Hypotheek Garantie (NHG) (English: National Mortgage Guarantee), which is an insurance on mortgages. In case the client is unable to meet the remaining payments and the house is sold for less than the value of the mortgage, the NHG pays the remainder of the outstanding debt to the mortgage lender. The insurance is paid for by a single premium which is a fixed percentage of the total value of the mortgage. The NHG removes default risk for the bank, so the bank is able to offer the mortgage at a lower rate.


Chapter 2

Model Description

This chapter contains a description of the techniques that the model employs in order to calculate prepayment probabilities and the corresponding behavioural cash flows. The underlying theory is discussed and it is explained how the assumptions of the model are dealt with.

First, a summary of the model structure is given.

2.1 Model summary

The residential mortgages portfolio is modelled on a loanpart level. For every loanpart across the portfolio, probabilities are modelled on a monthly basis for four different event types:

(i) Full Prepayment: The loanpart is fully prepaid.

(ii) Curtailment: The loanpart is partially prepaid.

(iii) Conversion: The loanpart is terminated and replaced by a new loanpart.

(iv) Nothing happens: None of the above situations apply.

The probabilities of these events are estimated through a multinomial logit regression. Based on historical data on prepayments and on various explanatory variables (e.g. mortgage rates, loan age and macroeconomic variables) which are correlated with the prepayment rates, parameters are estimated that signify the impact of the explanatory variables on prepayments.

In summary, the model framework consists of six steps:

1. The parameters of the multinomial logit model are estimated on the historical data set.

2. Monthly forecasts of the explanatory variables are obtained for a 30 year horizon.

3. For each contract and month within the prediction horizon, probabilities are calculated for each prepayment event type using the multinomial logit model.

4. Based on these probabilities, Single Monthly Mortality rates (SMMs) are calculated.

5. Based on contractual specifications, contractual cash flows are generated.

6. Behavioural cash flows are generated by adjusting the outstanding principal using contractual cash flows and SMM rates.
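Steps 4-6 can be sketched in a stylized form: apply an SMM to the outstanding balance after the contractual redemption. This is a toy illustration with a constant SMM and a flat contractual redemption, whereas in the model the SMM varies per month and per event type and the contractual schedule follows the loanpart's amortization type; the function name is hypothetical.

```python
def behavioural_cashflows(principal, monthly_redemption, smm, months):
    """Adjust a contractual principal run-off with a constant Single Monthly
    Mortality rate. Stylized sketch of steps 4-6 only."""
    flows, outstanding = [], principal
    for _ in range(months):
        redemption = min(monthly_redemption, outstanding)   # contractual cash flow
        prepayment = smm * (outstanding - redemption)       # SMM on remaining balance
        flows.append(redemption + prepayment)               # behavioural cash flow
        outstanding -= redemption + prepayment
    return flows, outstanding

flows, remaining = behavioural_cashflows(100_000.0, 500.0, 0.01, 120)
```

The behavioural run-off is faster than the contractual one: every monthly cash flow is at least the contractual redemption, and the sum of the behavioural cash flows plus the remaining outstanding equals the initial principal.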


The process behind these steps is explained in the following sections. Sections 2.2 and 2.3 offer a theoretical description of the logit model and its multinomial extension. Its implementation is covered in section 2.4. This covers the first and third steps. Step 2 is briefly discussed in section 2.6. The procedure for determining SMMs for each event type is explained in section 2.7. The generation of the behavioural cash flows using contractual specifications (steps 5 and 6) is outlined in section 2.8.

2.2 Binary choice modelling

A common problem that econometricians face is modelling the decisions of individuals. For example, a person may decide to vote for one of two candidates of an election, or an individual may be employed or unemployed. Similarly, a potential buyer may or may not react to a new offer. Usually, this binary set of outcomes is labelled 1 ("success") or 0 ("failure"). In statistics, such a choice is often modelled by estimating the probability of choosing 1 versus the probability of choosing 0. We let $y_i$ be the choice of individual $i$ and we are interested in the probabilities $P(y_i = 1) = \pi_i$ and $P(y_i = 0) = 1 - \pi_i$. The index $i$ is very important here. Depending on the circumstances and characteristics of an individual, the probability of success will differ. For example, the voting behaviour of an individual is dependent on their political views. Individuals may have a harder time getting a job when they have a lower education. And buyers will react differently to offers depending on their interests.

2.2.1 Non-linear model for probabilities

The starting point for binary choice modelling is a non-linear regression model. Let us introduce some notation. Suppose we have data associated with $n$ individuals. For $i = 1, \ldots, n$, let $y_i$ be the binary random variable denoting the choice that individual $i$ makes. We denote $\pi_i = P(y_i = 1) = 1 - P(y_i = 0)$, the probability of $y_i$ being equal to 1. In standard econometric notation, we let $k$ be the number of explanatory variables. These variables are contained in the $n \times k$ matrix $X$, where we denote by $x_i'$ the $i$'th row of $X$. The parameters are denoted by the vector $\beta$ of length $k$. The objective is to write the probabilities in the following form

\[ \pi_i = F(x_i'\beta). \tag{2.1} \]

Here $F$ may in principle be any function on $\mathbb{R}$. In order to ensure that the $\pi_i$ are probabilities, the obvious choice is to let $F$ be a cumulative distribution function. Since such a function is non-decreasing, this has the added benefit of making the effects of the explanatory variables monotonous as well. Namely, if $\beta_j > 0$, then an increase (decrease) in $x_{ij}$ will lead to an increased (decreased) probability $\pi_i$, and vice versa when $\beta_j < 0$. This aids in the interpretation of the model.

2.2.2 Probit and logit model

The model described above not only depends on the explanatory variables $X$, but also on the choice of the function $F$. In practice, one would pick either the cumulative normal distribution function

\[ F(x) = \Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2} \, dz, \]

or the cumulative logistic distribution function

\[ F(x) = \Lambda(x) = \frac{e^x}{1 + e^x}. \]

The model that employs the first function is known as the probit model; the logit model makes use of the logistic function.

The normal and logistic distribution functions have similar characteristics. Their densities are both symmetric and unimodal and both distributions have mean zero. The models will produce similar results in terms of estimated probabilities. However, note that, unlike the cumulative logistic distribution function Λ(x), the integral Φ(x) has no analytical expression and will require a numerical integration algorithm. Calculation of this one-dimensional integral is still quite fast, so no compelling reason is observed to prefer either model.
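The similarity of the two functions can be checked numerically with the standard library. One detail to keep in mind in such a comparison (an added observation, not from the text above): the logistic distribution has standard deviation $\pi/\sqrt{3}$ rather than 1, so the argument of $\Lambda$ is rescaled before comparing.

```python
import math

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Lam(x):
    """Logistic CDF."""
    return 1.0 / (1.0 + math.exp(-x))

# compare on a grid after matching standard deviations (logistic sd = pi/sqrt(3))
scale = math.pi / math.sqrt(3.0)
max_gap = max(abs(Phi(x) - Lam(scale * x))
              for x in [i / 10.0 for i in range(-40, 41)])
```

The maximum gap on this grid is a few percentage points at most, illustrating why probit and logit produce similar estimated probabilities.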

The multinomial extensions of the logit and probit models add some new problems to consider. The motivation for selecting the multinomial logit model is given in section 2.3.5.

2.2.3 Estimation of a logit model

For the moment we will focus on the logit model. Writing out equation (2.1) for $F(x) = \Lambda(x)$ yields

\[ \pi_i = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}. \]

On the other hand, the complementary probability is equal to

\[ 1 - \pi_i = \frac{1}{1 + \exp(x_i'\beta)}. \]

We can then write $\log\left(\frac{\pi_i}{1-\pi_i}\right)$ as a regular regression:

\[ \log\left(\frac{\pi_i}{1 - \pi_i}\right) = x_i'\beta. \tag{2.2} \]

The expression $\log\left(\frac{\pi_i}{1-\pi_i}\right)$ is referred to as the log-odds. It describes the relative preference of option 1 with respect to option 0. It will be convenient to consider the logit model as a linear regression of the form (2.2).

Typically, the logit model (2.2) is estimated using Maximum Likelihood Estimation (MLE). Suppose we observe the data $y_1, \ldots, y_n$, which are mutually independent but not identically distributed. The objective is to find a good estimator for $\beta$. For $i = 1, \ldots, n$, the random variable $y_i$ is Bernoulli distributed with parameter $\pi_i$. The probability density $p_{i,\beta}$ of such a random variable with respect to the counting measure is

\[ p_{i,\beta}(y_i) = \pi_i^{y_i}(1 - \pi_i)^{1-y_i} = \Lambda(x_i'\beta)^{y_i}\left(1 - \Lambda(x_i'\beta)\right)^{1-y_i} = \frac{\exp(y_i x_i'\beta)}{1 + \exp(x_i'\beta)}. \]

Since $y_1, \ldots, y_n$ are independent, the likelihood function $p_\beta^n$ is simply the product of the probability densities, evaluated at their respective observations $y_i$:

\[ p_\beta^n(y_1, \ldots, y_n) = \prod_{i=1}^{n} p_{i,\beta}(y_i) = \prod_{i=1}^{n} \Lambda(x_i'\beta)^{y_i}\left(1 - \Lambda(x_i'\beta)\right)^{1-y_i} = \prod_{i=1}^{n} \frac{\exp(y_i x_i'\beta)}{1 + \exp(x_i'\beta)}. \]


For ease of notation, denote by $L(\beta) = p_\beta^n(y_1, \ldots, y_n)$ the likelihood of the observations $y_1, \ldots, y_n$. The corresponding log-likelihood is

\[ l(\beta) = \log L(\beta) = \sum_{i=1}^{n} y_i x_i'\beta - \sum_{i=1}^{n} \log\left(1 + \exp(x_i'\beta)\right). \]

The log-likelihood is maximized by solving the first order conditions. Denoting by $\lambda(t) = \frac{e^t}{(1+e^t)^2}$ the density corresponding to the cumulative distribution function $\Lambda(t)$, the $k$ first order conditions are given by

\[ \dot{l}(\beta) := \frac{\partial l(\beta)}{\partial \beta} = \frac{\partial}{\partial \beta}\left( \sum_{i=1}^{n} y_i x_i'\beta - \sum_{i=1}^{n} \log\left(1 + \exp(x_i'\beta)\right) \right) = \sum_{i=1}^{n} y_i x_i - \sum_{i=1}^{n} \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)} x_i = \sum_{i=1}^{n} (y_i - \pi_i)\, x_i = 0. \]

These zeros are found using numerical approximations, such as Newton-Raphson. The solution $(\beta_1, \ldots, \beta_k)$ is unique and forms a global maximum, since the Hessian matrix

\[ \frac{\partial^2 l(\beta)}{\partial \beta \partial \beta'} = \frac{\partial \dot{l}(\beta)}{\partial \beta'} = -\sum_{i=1}^{n} \frac{\exp(-x_i'\beta)}{\left(1 + \exp(-x_i'\beta)\right)^2}\, x_i x_i' = -\sum_{i=1}^{n} \lambda(x_i'\beta)\, x_i x_i', \]

is negative definite. This facilitates numerical approximation of the maximum, and numerical methods generally converge quite fast.
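The Newton-Raphson iteration for these first order conditions can be sketched in a few lines of pure Python. This is a toy two-parameter example on hypothetical data, not the production estimation code: the gradient is $\sum_i (y_i - \pi_i) x_i$ and the Hessian is $-\sum_i \pi_i(1-\pi_i) x_i x_i'$, and each step solves the $2 \times 2$ system explicitly.

```python
import math

def fit_logit(X, y, iters=25):
    """Newton-Raphson for the logit log-likelihood with k = 2 parameters.
    Gradient: sum (y_i - pi_i) x_i; Hessian: -sum pi_i (1 - pi_i) x_i x_i'."""
    b = [0.0, 0.0]
    for _ in range(iters):
        g = [0.0, 0.0]
        H = [[0.0, 0.0], [0.0, 0.0]]
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-(b[0] * xi[0] + b[1] * xi[1])))
            for a in range(2):
                g[a] += (yi - p) * xi[a]
                for c in range(2):
                    H[a][c] -= p * (1 - p) * xi[a] * xi[c]
        det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
        # Newton step: b <- b - H^{-1} g (2x2 inverse written out)
        b = [b[0] - (H[1][1] * g[0] - H[0][1] * g[1]) / det,
             b[1] - (-H[1][0] * g[0] + H[0][0] * g[1]) / det]
    return b

# toy data: intercept plus one regressor, deliberately not separable
X = [(1.0, x) for x in (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0, -1.5, 1.5)]
y = [0, 0, 1, 0, 1, 1, 0, 1]
beta = fit_logit(X, y)
```

At the returned estimate the gradient $\sum_i (y_i - \pi_i) x_i$ is numerically zero, i.e. the first order conditions above are satisfied.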

2.3 Multinomial choice modelling

In prepayment modelling, a binary choice formulation is not satisfactory, because prepayment risk is described by more than two events (e.g. full prepayment, partial prepayment, no prepayment). Therefore, a model is defined whose dependent variable has more than two possible values. Such a model is called multinomial.

2.3.1 Multinomial Logit

The logit model described earlier lends itself to a natural extension when more than two event types are considered. We start by introducing extra notation for the multinomial model. For ease of reference we give each alternative an index $1, \ldots, J$. The observations are still numbered $1, \ldots, n$. We denote by $\pi_{ij} = P(y_i = j)$ the probability of individual $i$ choosing alternative $j$. For each observation $i$ and alternative $j$, the explanatory variables are collected in the vector $x_{ij}$ (which is a $k \times 1$ vector). The first element of $x_{ij}$ is the constant term $x_{ij,1} = 1$. Furthermore, for each alternative $j$ the parameter vector is denoted by the $k \times 1$ vector $\beta_j$.

The binomial logit model can be extended to the multinomial case by selecting one of the $J$ responses as a pivot (throughout this thesis, we select the last response $J$ for this purpose), and modelling the log-odds of the responses $j = 1, \ldots, J-1$ with respect to the last one. The relevant equation, for $i = 1, \ldots, n$, becomes

\[ \log\left(\frac{\pi_{ij}}{\pi_{iJ}}\right) = x_{ij}'\beta_j. \tag{2.3} \]

We can extract from this equation the probabilities $\pi_{ij}$ as follows:

\[ \pi_{ij} = \pi_{iJ} \cdot e^{x_{ij}'\beta_j}, \quad i = 1, \ldots, n. \tag{2.4} \]

Since the $\pi_{ij}$ are probabilities, we must have $\sum_{j=1}^{J} \pi_{ij} = 1$, and therefore we can sum over the probabilities to get

\[ 1 = \pi_{iJ}\left(1 + \sum_{j=1}^{J-1} e^{x_{ij}'\beta_j}\right). \tag{2.5} \]

Substituting (2.5) into (2.4) yields

\[ \pi_{ij} = \frac{e^{x_{ij}'\beta_j}}{1 + \sum_{l=1}^{J-1} e^{x_{il}'\beta_l}}, \quad j = 1, \ldots, J-1, \tag{2.6} \]

\[ \pi_{iJ} = \frac{1}{1 + \sum_{l=1}^{J-1} e^{x_{il}'\beta_l}}. \tag{2.7} \]
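Equations (2.6)-(2.7) amount to a softmax in which the pivot's linear index is fixed at zero. A minimal sketch (pure Python, with hypothetical values for the linear indices $x_{ij}'\beta_j$):

```python
import math

def mnl_probabilities(scores):
    """Map the J-1 linear indices x_ij' beta_j (the pivot J has index 0)
    to the J choice probabilities of equations (2.6)-(2.7)."""
    denom = 1.0 + sum(math.exp(s) for s in scores)
    return [math.exp(s) / denom for s in scores] + [1.0 / denom]

# e.g. J = 4 with illustrative indices for full prepayment, curtailment
# and conversion; the last entry is the pivot "nothing happens"
probs = mnl_probabilities([-3.0, -2.0, -4.0])
```

The probabilities sum to one by construction, and taking $\log(\pi_{ij}/\pi_{iJ})$ recovers the linear index, as in equation (2.3).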

2.3.2 Estimating the multinomial logit model

A joint distribution of the random variables $y_i$ is needed for estimating the parameters $\beta_1, \ldots, \beta_{J-1}$ of the multinomial logit model. It is assumed that the $y_i$ are independent. So assume we have observed the data $y_1, \ldots, y_n$. Then the log-likelihood can be written as

\[ l_{MNL}(\beta_1, \ldots, \beta_{J-1}) = \log\left(L_{MNL}(\beta_1, \ldots, \beta_{J-1})\right) = \sum_{i=1}^{n} \sum_{j=1}^{J} 1_{\{y_i = j\}} \log(\pi_{ij}) = \sum_{i=1}^{n} \log(\pi_{i y_i}). \tag{2.8} \]

Here $\pi_{i y_i}$ denotes $\pi_{ij}$ for the particular $y_i = j$ that was realized. If we substitute the probabilities (2.6) into (2.8), the log-likelihood becomes

\[ l_{MNL}(\beta_1, \ldots, \beta_{J-1}) = \sum_{i=1}^{n} \left( \sum_{j=1}^{J-1} 1_{\{y_i = j\}}\, x_{ij}'\beta_j - \log\left(1 + \sum_{l=1}^{J-1} e^{x_{il}'\beta_l}\right) \right). \]

The gradient of the log-likelihood consists of $(J-1)$ stacked $k \times 1$ vectors:

\[ \frac{\partial l_{MNL}}{\partial \beta_j} = \sum_{i=1}^{n} \left( 1_{\{y_i = j\}}\, x_{ij} - \frac{e^{x_{ij}'\beta_j}}{1 + \sum_{l=1}^{J-1} e^{x_{il}'\beta_l}}\, x_{ij} \right) = \sum_{i=1}^{n} \left( 1_{\{y_i = j\}} - \pi_{ij} \right) x_{ij}, \quad j = 1, \ldots, J-1. \]


The Hessian matrix of the log-likelihood is a $(J-1)k \times (J-1)k$ matrix consisting of the $k \times k$ blocks

\[ \frac{\partial^2 l_{MNL}}{\partial \beta_j \partial \beta_j'} = -\sum_{i=1}^{n} \frac{e^{x_{ij}'\beta_j}}{1 + \sum_{l=1}^{J-1} e^{x_{il}'\beta_l}} \left( 1 - \frac{e^{x_{ij}'\beta_j}}{1 + \sum_{l=1}^{J-1} e^{x_{il}'\beta_l}} \right) x_{ij} x_{ij}' = -\sum_{i=1}^{n} \pi_{ij}(1 - \pi_{ij})\, x_{ij} x_{ij}', \quad j = 1, \ldots, J-1, \]

on the diagonal, and the blocks

\[ \frac{\partial^2 l_{MNL}}{\partial \beta_j \partial \beta_h'} = \sum_{i=1}^{n} \frac{e^{x_{ij}'\beta_j} \cdot e^{x_{ih}'\beta_h}}{\left(1 + \sum_{l=1}^{J-1} e^{x_{il}'\beta_l}\right)^2}\, x_{ij} x_{ih}' = \sum_{i=1}^{n} \pi_{ij}\pi_{ih}\, x_{ij} x_{ih}', \quad j, h = 1, \ldots, J-1,\ j \neq h, \]

off the diagonal. This matrix is negative definite, so the log-likelihood has a unique maximum. This solution is found by approximating the zero of the gradient via numerical methods such as Newton-Raphson.

2.3.3 Independence of Irrelevant Alternatives

The above multinomial logit model is based on the assumption that the probability of choosing one alternative versus the probability of choosing another only depends on the characteristics corresponding to these two alternatives. To state this with mathematical rigour, consider an individual $i$ and two alternatives $j$ and $h$. Then the ratio between $\pi_{ij}$ and $\pi_{ih}$ is given by

\[ \frac{\pi_{ij}}{\pi_{ih}} = \frac{\pi_{ij}}{\pi_{iJ}} \cdot \frac{\pi_{iJ}}{\pi_{ih}} = \frac{e^{x_{ij}'\beta_j}}{e^{x_{ih}'\beta_h}}. \]

Notice that the ratio only depends on the explanatory variables $x_{ij}, x_{ih}$ corresponding with individual $i$ and the choices $j$ and $h$, along with their parameters $\beta_j, \beta_h$. So the relative odds of choosing between the alternatives $j$ and $h$ do not depend on the presence of any other alternative $l \neq j, h$. This assumption is known as the Independence of Irrelevant Alternatives (IIA).

This assumption does not always hold. For example, suppose that $J = 3$ and that the alternatives $j = 1$ and $j = 2$ can be considered functionally equivalent for an individual $i$ (this may occur when an individual is faced with the choice of taking a red bus, a blue bus or a car as a travel method). Denote by $\pi_{i(1\vee2)}$ the probability of choosing either of the first two alternatives. Because alternatives 1 and 2 are functionally equivalent, one should have that

\[ \frac{\pi_{i1}}{\pi_{i3}} = \frac{\pi_{i2}}{\pi_{i3}} = \frac{\pi_{i(1\vee2)}}{2\pi_{i3}}. \]

However, because alternative $1 \vee 2$ shares the explanatory variables and parameters with the individual alternatives 1 and 2, the IIA assumption implies that

\[ \frac{\pi_{i1}}{\pi_{i3}} = \frac{\pi_{i2}}{\pi_{i3}} = \frac{\pi_{i(1\vee2)}}{\pi_{i3}}, \]

yielding a consistent overestimation of the log-odds of these two alternatives.

To avoid such situations in our model, we need to make sure that our alternatives have distinct characteristics that are relevant to the client. (Long & Freese, 2003) suggest that: "It appears that the best advice regarding IIA goes back to an early statement by McFadden (1973), who wrote that multinomial and conditional logit models should be used only in cases where the alternatives can 'plausibly be assumed to be distinct and weighted independently in the eyes of the decision maker'."
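The red-bus example can be made concrete numerically: under the multinomial logit, adding or removing an alternative leaves the odds between the remaining ones unchanged. A small demonstration with hypothetical linear indices (1.0 for each bus, 2.0 for the car):

```python
import math

def choice_probs(scores):
    """Multinomial logit probabilities for given linear indices."""
    denom = sum(math.exp(s) for s in scores)
    return [math.exp(s) / denom for s in scores]

# red bus, blue bus (identical index) and car
p3 = choice_probs([1.0, 1.0, 2.0])
# drop the blue bus: the odds of red bus vs car are unchanged under IIA
p2 = choice_probs([1.0, 2.0])
ratio3 = p3[0] / p3[2]
ratio2 = p2[0] / p2[1]
```

With both buses present, the logit assigns the car probability $e^2/(2e + e^2) \approx 0.58$, whereas treating the two equivalent buses as a single alternative would give $e^2/(e + e^2) \approx 0.73$: the bus option as a whole is overweighted, which is exactly the IIA failure described above.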

An alternative model that is sometimes used when the IIA assumption does not seem plausible is the multinomial probit model.


2.3.4 Multinomial Probit

The multinomial probit model is a generalization of the binomial probit model. The $J$ possible outcomes are generated by a latent variable model

\[ U_{ij} = x_{ij}'\beta_j + \epsilon_{ij}, \quad \epsilon_i \sim N(0, \Omega), \tag{2.9} \]

where the $U_{ij}$ are called the latent variables: these represent the utility that individual $i$ receives from alternative $j$. The $\epsilon_{ij}$ are individual specific and represent unmodelled factors in individual preferences; the vector $\epsilon_i$ is modelled as a multivariate normally distributed random variable. Note that the utilities $U_{ij}$ are not observed. It is assumed that the $i$'th individual chooses the alternative $j$ that yields the greatest utility $U_{ij}$. The observed choices $y_i$, given the explanatory variables and parameter values, are distributed as

\[ \pi_{ij} = P[y_i = j] = P[U_{ij} \geq U_{ih},\ h = 1, \ldots, J] = P[U_{ij} - U_{ih} \geq 0,\ h = 1, \ldots, J]. \]

Since the observed choices $y_i$ only depend on the differences between the utilities $U_{ij}$, separate coefficients cannot be identified for all $j = 1, \ldots, J$: we set all components of $\beta_J$ equal to 0 and consider the differences $Z_{ij} := U_{ij} - U_{iJ}$, $j = 1, \ldots, J-1$. Then

\[ \pi_{ij} = P[y_i = j] = P[Z_{ij} \geq Z_{ih},\ h = 1, \ldots, J-1, \text{ and } Z_{ij} \geq 0]. \]

Thus the probabilities $P[y_i = j]$ are completely determined by the joint distribution of the $Z_{ij}$. Following from the multivariate normal distribution of the $\epsilon_i$, these variables $Z_{ij}$ are multivariate normally distributed with a $(J-1) \times (J-1)$ covariance matrix $\Sigma$, derived from $\Omega$. In principle, no restrictions are placed on this covariance matrix. Correlations can exist between $Z_{ij}$ and $Z_{ih}$ for $j \neq h$. As a consequence, the multinomial probit model does not share the IIA property that the multinomial logit model has.
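Because the probit choice probabilities involve a multivariate normal integral with no closed form, they are in practice approximated by simulation. A crude Monte Carlo sketch (hypothetical utility means, and independent standard normal errors for simplicity; a general $\Omega$ would require correlated draws):

```python
import random

def probit_probs_mc(means, n_draws=100_000, seed=7):
    """Estimate P[alternative j maximizes utility] for latent utilities
    U_j = mean_j + eps_j with independent standard normal errors.
    Crude Monte Carlo sketch; illustrative only."""
    rng = random.Random(seed)
    counts = [0] * len(means)
    for _ in range(n_draws):
        u = [m + rng.gauss(0.0, 1.0) for m in means]
        counts[u.index(max(u))] += 1  # chosen alternative = argmax utility
    return [c / n_draws for c in counts]

# three alternatives with increasing systematic utility
probs = probit_probs_mc([0.0, 0.5, 1.0])
```

Even this crude estimator needs many draws per observation, which illustrates why repeated likelihood evaluations make the multinomial probit computationally burdensome.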

2.3.5 Motivation for multinomial logit

The multinomial probit model is theoretically an attractive option for modelling prepayment probabilities. It is more general than the multinomial logit model because the IIA assumption does not need to be made. The correlation between $Z_{ij}$ and $Z_{ih}$ for $j \neq h$ might, for example, measure the extent to which a preference for curtailment over doing nothing may be correlated with a preference for prepayment over doing nothing (where doing nothing corresponds to choice $J$ in this case).

On the other hand, it should be mentioned that the multinomial probit model suffers from computational problems. For estimation of the model, the maximum likelihood estimator for the parameters $\beta_j$ must be calculated. Neither the multinomial logit nor the multinomial probit model offers an exact solution. Therefore, the maximum must be approximated numerically by a Newton-Raphson method. This requires a large number of evaluations of the log-likelihood. This is not feasible in the multinomial probit case, which involves a $(J-1)$-dimensional integral that cannot be computed in a satisfactory time frame. The multinomial probit model is therefore considered unfit for prepayment modelling.

The multinomial logit model is chosen for its relative ease of computation. Care is taken to ensure that the modelled alternatives are distinct.


2.4 Prepayment model

Now that the theoretical properties of multinomial logit regressions have been discussed, let us define the model in the context of prepayment modelling. We show how the data is structured, and define the dependent variable.

2.4.1 Data structure

The data for the model is organized as panel data, where each observation can be associated with a point in time $t$ and a specific loanpart $i$, i.e. we add the dimension time to the dataset to obtain two-dimensionally indexed observations $(i, t)$, with $i = 1, \ldots, n$ and $t = t_0, \ldots, T$. The frequency of the observations is monthly. The decision to model the evolution of loanparts as opposed to mortgages has been made because the mortgage rate type (fixed or variable) and amortization type are defined on a loanpart level and are expected to have a significant effect on prepayments.

However, this choice does come at a price: since mortgages may consist of multiple loanparts, loanparts belonging to the same mortgage will not be independent. This violates the independence of residuals assumption. Because of the large sample size and the low number of loanparts that constitute a mortgage, this is not expected to have a significant impact on the accuracy of the estimated prepayment probabilities.

Because a logistic regression models probabilities which are not observed directly, the usual residual tests applicable to linear regressions cannot be performed. Instead, aggregate residuals are produced in the following manner. Denote by $P_{i,t}$ the outstanding of loanpart $i$ at time $t$. Then, for $j = 1, \ldots, J$, the estimated rate $\pi_{t,j}$ of event $j$ at time $t$ is defined as the weighted average

\[ \pi_{t,j} = \sum_{i} \frac{P_{i,t}}{\sum_{i'} P_{i',t}}\, \pi_{(i,t),j}. \]

Similarly, define the observed rate $\pi_{t,j}^*$ of event $j$ at time $t$ as a weighted average of realized events:

\[ \pi_{t,j}^* = \sum_{i} \frac{P_{i,t}}{\sum_{i'} P_{i',t}}\, 1_{\{y_{i,t} = j\}}. \]

Then, for $j = 1, \ldots, J-1$, the residual $\epsilon_{t,j}$ is defined as the difference between the log-odds of these rates, i.e.

\[ \epsilon_{t,j} = \log\left(\frac{\pi_{t,j}}{\pi_{t,J}}\right) - \log\left(\frac{\pi_{t,j}^*}{\pi_{t,J}^*}\right). \tag{2.10} \]

The logit model assumes that $\epsilon_{(i,t),j}$ follows a logistic distribution, which is very close to a normal distribution. Using the Central Limit Theorem, it then follows that the aggregated residuals are asymptotically normally and independently distributed (NID) if the model is well-specified:

\[ \epsilon_{\cdot,j} \sim NID(0, \sigma_j). \tag{2.11} \]

The NID assumption is typically tested by testing for normality, heteroskedasticity and autocorrelation. The tests performed are described in section 2.5.
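The aggregation behind (2.10) can be sketched directly. The numbers below are toy values (in the model the per-loanpart probabilities come from the fitted logit), and for brevity the sketch takes $J = 2$, so that the pivot rate is simply one minus the event rate.

```python
import math

def weighted_rate(outstandings, values):
    """Outstanding-weighted average of per-loanpart values at one date."""
    total = sum(outstandings)
    return sum(p * v for p, v in zip(outstandings, values)) / total

def log_odds_residual(est_j, est_J, obs_j, obs_J):
    """Residual (2.10): estimated log-odds minus observed log-odds."""
    return math.log(est_j / est_J) - math.log(obs_j / obs_J)

# toy portfolio of three loanparts at one date, event j = full prepayment
P = [100_000.0, 50_000.0, 250_000.0]
pi_hat = [0.010, 0.020, 0.005]    # modelled event probabilities per loanpart
realized = [0.0, 1.0, 0.0]        # loanpart 2 actually prepaid this month
est_j = weighted_rate(P, pi_hat)
obs_j = weighted_rate(P, realized)
eps = log_odds_residual(est_j, 1 - est_j, obs_j, 1 - obs_j)
```

Here the observed (outstanding-weighted) prepayment rate far exceeds the modelled rate, so the residual is strongly negative; a well-specified model would produce a sequence of such residuals that behaves like the NID noise in (2.11).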


2.4.2 Dependent variable

The dependent variable $y_{(i,t)}$ corresponds with prepayment events. There exist various types of prepayment events. More specifically, one can prepay fully or partially, or one can alter the conditions of the mortgage. It is necessary to distinguish these types of prepayment because the events are affected differently by the explanatory variables in use. Increasing the number of prepayment events will increase the predictive power of the model. However, the addition of prepayment events requires more parameters to be estimated and implies less data per event type.

At each point in time $t$ for loanpart $i$ the dependent variable $y_{(i,t)}$ is modelled as a random variable. The following four events were selected as possible values for the dependent variable:

\[ y_{(i,t)} \in \{1 \text{ (Full Prepayment)},\ 2 \text{ (Curtailment)},\ 3 \text{ (Conversion)},\ 4 \text{ (Nothing)}\}. \tag{2.12} \]

It is important to be able to distinguish between these prepayment event types. Namely, the multinomial logit model is used to construct a projection of the future cash flows. This is done by calculating, for each prepayment type, an estimate of the prepayment amount as a percentage of the outstanding. This percentage differs per prepayment type (e.g. in the event of 'Full Prepayment' the full principal is repaid, whereas 'Curtailment' may entail any percentage strictly between 0 and 100).

2.5 Explanatory variable selection: General-to-specific approach

The task of empirical modelling of economic processes – such as prepayments – is an arduous one. The economy is a complicated, dynamic, non-linear, simultaneous, high-dimensional, and evolving entity; social systems alter over time; laws change; and technological innovations occur. Conversely, economic theories are abstract and highly simplified. These theories also change over time, with conflicting rival explanations sometimes coexisting. The data evidence is tarnished: economic magnitudes are inaccurately measured and subject to substantive revisions, and many important variables are not even observable. The data themselves are often time series where samples are short, highly aggregated, heterogeneous, time-dependent, and interdependent.

Nevertheless, the aspiration is to find interpretable dependencies between economic variables observed in the data. The approach proposed by the London School of Economics (LSE) has emerged as a leading methodology for empirical modelling [7]. One of the main tenets of the approach is general-to-specific modelling, sometimes abbreviated as "Gets". In general-to-specific modelling, empirical analysis starts with a general statistical model that captures the essential characteristics of the underlying dataset, i.e. a congruent model. That general model is then reduced in complexity by eliminating statistically insignificant variables, checking the validity of the reductions at every stage to ensure congruence of the finally selected model.

Congruence of a model describes the ability of a model to accurately portray the dependencies of variables in the data. At each step, the congruence is checked through a set of tests. The testing and performance framework of the liquidity models is summarized in Table 2.1.

The two categories Model specification testing and Model diagnostics concern the congruence tests. The purpose of each category is described in the following paragraphs.

Model specification testing:
• Statistical significance of parameters (Wald test)
• Information criteria (AIC, BIC)

Model diagnostics:
• Unit root test
• Autocorrelation test
• Normality tests
• Heteroskedasticity test
• RESET test

Model performance:
• Estimated vs actual residuals
• Out-of-time test
• Out-of-sample test

Table 2.1: Testing and model performance framework

First, model specification tests analyse the proper inclusion or exclusion of explanatory variables. Ideally, an econometric model should explain the dependent variable with as few explanatory variables as possible. Including too many explanatory variables results in over-fitting and reduces the predictive power of the model. However, care should be taken when leaving out explanatory variables, since the exclusion of a relevant variable will introduce bias in the parameters. A model which establishes a good trade-off is called a parsimonious model. The statistical significance tests of parameters aid in selecting the explanatory variables to exclude from the model. Through the information criteria (AIC, BIC) one can ascertain whether a model performs better after excluding a parameter. The tests are described in subsection 2.5.1.

Second, the residuals of the estimated regression are assessed by means of residual diagnostics tests. The objective is to discover a parsimonious model with residuals that satisfy particular properties. More specifically, the residuals should be independent and identically distributed. The i.i.d. assumption is critical for any valid inference from the model. For interpreting most of the tests, normality of the residuals is required. Moreover, normality of the residuals yields consistent parameter estimates. The unit root test checks whether the data used is stationary and fit for regression. The RESET test assesses whether non-linear combinations of the explanatory variables help explain the dependent variable, through which one can detect misspecification of the model. The tests are described in subsection 2.5.2.

The final category of tests concerns model performance. These tests assess the predictive power of the model and are applied after the general model has been reduced to its most parsimonious form. Through the out-of-time and out-of-sample tests, one analyses whether the model, estimated on the in-sample data, also produces accurate results out-of-sample. For the out-of-time test, the in-sample and out-of-sample data are separated in time. In the out-of-sample test, the sample is split such that both subsamples are representative of the whole sample. These tests are described in subsection 2.5.3.
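The difference between the two validation splits can be sketched as follows; this is a minimal Python illustration on hypothetical data (the array shapes, sample sizes and the 24-month holdout are assumptions, not taken from the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical panel: 120 monthly observations of two variables.
months = np.arange(120)
data = rng.normal(size=(120, 2))

# Out-of-time split: estimation and validation samples are separated in
# time, e.g. the last 24 months are held out for validation.
in_time = data[months < 96]
out_time = data[months >= 96]

# Out-of-sample split: a random subset is held out, so that both samples
# are representative of the whole observation period.
holdout = rng.choice(120, size=24, replace=False)
mask = np.zeros(120, dtype=bool)
mask[holdout] = True
in_sample = data[~mask]
out_sample = data[mask]

print(in_time.shape, out_time.shape, in_sample.shape, out_sample.shape)
```

Both splits hold out the same number of observations; they differ only in whether the holdout is a contiguous final period or a random subset.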

For the purposes of the thesis, we proceed with the existing prepayment model as a given and explore the addition of variables into the model. The final selection of variables is described in subsection 2.5.4.

2.5.1 Model specification tests

Several selection criteria can be used to distinguish variables that are useful predictors from irrelevant variables. Throughout this subsection we adopt the following notation:

• n is the number of observations;

• k is the number of estimated parameters;

• L is the maximized value of the likelihood function.

We consider three criteria for explanatory variable selection: the Akaike Information Criterion, the Bayesian Information Criterion and the Wald Test. The first two criteria are used to determine the relative quality of model specifications. The criteria reward models with high maximum likelihood and penalize large numbers of parameters. The latter criterion is a standard statistical test to test the significance of a parameter.

2.5.1.1 Akaike Information Criterion (AIC)

The AIC evaluates the difference between a candidate model and the true model by means of the Kullback-Leibler divergence (K-L distance), a measure of the discrepancy between two probability distributions. Applying this measure to an estimated econometric model leads to the following criterion:

AIC = 2k − 2 log(L).

The model with the lowest AIC is preferred.

2.5.1.2 Bayesian Information Criterion (BIC)

The BIC is founded in Bayesian statistics. The statistic is obtained by considering the posterior probability among the available models. Ignoring terms which vanish as the sample size tends to infinity, the statistic is given by

BIC = (1/2) k log(n) − log(L).

The model with the lowest BIC is preferred. The criterion typically penalises models with additional parameters more than the AIC does.
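A minimal sketch of comparing candidate models with both criteria, using the formulas above; the model names, log-likelihoods and parameter counts are purely hypothetical:

```python
import numpy as np

def aic(log_l, k):
    """Akaike Information Criterion: AIC = 2k - 2 log(L)."""
    return 2 * k - 2 * log_l

def bic(log_l, k, n):
    """Bayesian Information Criterion in the form used here:
    BIC = (1/2) k log(n) - log(L)."""
    return 0.5 * k * np.log(n) - log_l

# Hypothetical candidates: (maximized log-likelihood, number of parameters).
candidates = {"full": (-410.0, 8), "reduced": (-412.5, 4)}
n = 360  # e.g. 30 years of monthly observations

for name, (log_l, k) in candidates.items():
    print(name, round(aic(log_l, k), 2), round(bic(log_l, k, n), 2))
```

In this example the reduced model loses only 2.5 units of log-likelihood while saving four parameters, so both criteria prefer it over the full model.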

2.5.1.3 Wald Test

The Wald test is a statistical test used to assess whether a maximum likelihood estimate β̂n differs significantly from a proposed value β0. The test is governed by the following null and alternative hypotheses:

H0 : β = β0,
H1 : β ≠ β0.

The statistic used to test the null hypothesis is

W := (β̂n − β0) / √Var(β̂n).

The asymptotic normality of the maximum likelihood estimator β̂n implies that W is asymptotically normally distributed with mean 0 and variance 1. Hence, W² is asymptotically χ²-distributed with one degree of freedom.

For the purposes of selecting explanatory variables, one uses the proposed value β0 = 0 to test whether an explanatory variable can be left out.
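As an illustration, the Wald statistic and its χ²(1) p-value can be computed as follows; the coefficient estimate and its variance are hypothetical numbers, not model output:

```python
import numpy as np
from scipy import stats

def wald_test(beta_hat, var_beta_hat, beta_0=0.0):
    """Two-sided Wald test of H0: beta = beta_0 against H1: beta != beta_0."""
    w = (beta_hat - beta_0) / np.sqrt(var_beta_hat)
    # Under H0, W^2 is asymptotically chi-squared with 1 degree of freedom.
    p_value = stats.chi2.sf(w ** 2, df=1)
    return w, p_value

# Hypothetical estimate of a regression coefficient and its variance.
w, p = wald_test(beta_hat=0.35, var_beta_hat=0.01)
print(round(w, 2), round(p, 4))
```

With W = 3.5 the p-value is well below 5%, so the hypothesis β0 = 0 would be rejected and the variable retained.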

2.5.2 Model diagnostics


2.5.2.1 Autocorrelation test

The autocorrelation tests examine whether the residuals are independently distributed. A possible explanation for rejecting the null hypothesis of no autocorrelation is the presence of a trend or seasonal effect that is not captured by the model. Autocorrelation is tested in two ways:

• The Breusch-Godfrey Lagrange multiplier (LM) test is used to test for serially correlated residuals. The order of the test should be based on the order of the seasonality. In our model we expect yearly and quarterly seasonality and therefore use the 12th order. The null hypothesis of the Breusch-Godfrey LM test is that there is no serial correlation of any order up to the order tested.

• Partial autocorrelation function: the partial autocorrelation function is commonly used to determine the number of lags in an autoregressive model. It is defined as follows: given a time series zt, the partial autocorrelation of lag k (denoted a(k)) is the correlation between zt and zt+k with the linear dependence on the intermediate values zt+1, . . . , zt+k−1 removed, i.e.

a(1) = cor(zt, zt+1),
a(k) = cor(zt+k − Pt,k(zt+k), zt − Pt,k(zt)) for k ≥ 2,

where Pt,k(x) denotes the projection of x onto the space spanned by zt+1, . . . , zt+k−1. A correlogram that plots the autocorrelation and partial autocorrelation functions along with the critical values (at a 5% significance level) then indicates the correction that one may need to apply to the residuals.
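The projection definition above can be implemented directly; the sketch below computes a(k) in plain numpy on a simulated AR(1) series (the AR coefficient, sample size and seed are assumptions for illustration):

```python
import numpy as np

def pacf(z, k):
    """Partial autocorrelation at lag k: correlate z_t and z_{t+k} after
    projecting both onto the span of the intermediate z_{t+1},...,z_{t+k-1}."""
    n = len(z)
    if k == 1:
        return np.corrcoef(z[:-1], z[1:])[0, 1]
    # One row of intermediate lags per t, plus an intercept column.
    rows = np.array([z[t + 1:t + k] for t in range(n - k)])
    X = np.column_stack([np.ones(len(rows)), rows])
    targets = np.column_stack([z[k:], z[:n - k]])  # (z_{t+k}, z_t)
    # Residuals after least-squares projection onto the span of X.
    resid = targets - X @ np.linalg.lstsq(X, targets, rcond=None)[0]
    return np.corrcoef(resid[:, 0], resid[:, 1])[0, 1]

# Simulated AR(1): the PACF should be large at lag 1 and near zero beyond.
rng = np.random.default_rng(1)
z = np.zeros(2000)
for t in range(1, 2000):
    z[t] = 0.7 * z[t - 1] + rng.normal()
print(round(pacf(z, 1), 2), round(pacf(z, 2), 2))
```

For an AR(p) process the partial autocorrelations beyond lag p are (asymptotically) zero, which is exactly why the correlogram is used to pick the number of autoregressive lags.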

2.5.2.2 Heteroskedasticity test

To test whether the residuals have a constant variance over time (homoskedasticity), two tests are performed. The tests are performed on a time window ranging from 1 to 12 lags to see whether the variance changes over this period. The Lagrange multiplier test is performed by modelling the residuals εt as an ARCH process:

εt = σt Zt,

where Zt is a white noise process and the series σt² is modelled by

σt² = α0 + α1 ε²t−1 + · · · + αm ε²t−m.

The null hypothesis of the Lagrange multiplier test for ARCH disturbances is

H0 : α1 = · · · = αm = 0.

The alternative hypothesis is that at least one of the ARCH components is significant. Rejection of the null hypothesis signifies that the residuals are heteroskedastic.
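The LM test for ARCH effects can be sketched via Engle's auxiliary regression: regress the squared residuals on m of their own lags and compare n·R² with a χ²(m) distribution. The residual series below is simulated white noise, so the test should not reject:

```python
import numpy as np
from scipy import stats

def arch_lm_test(resid, m):
    """LM test for ARCH effects: auxiliary regression of squared residuals
    on m lags of themselves; n * R^2 is asymptotically chi-squared(m)."""
    e2 = resid ** 2
    y = e2[m:]
    # Intercept plus the lagged squared residuals e2_{t-1}, ..., e2_{t-m}.
    X = np.column_stack([np.ones(len(y))] +
                        [e2[m - j:-j] for j in range(1, m + 1)])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r2 = 1 - np.sum((y - X @ beta) ** 2) / np.sum((y - y.mean()) ** 2)
    lm = len(y) * r2
    return lm, stats.chi2.sf(lm, df=m)

# Homoskedastic white noise: the null of no ARCH effects should survive.
rng = np.random.default_rng(2)
lm, p = arch_lm_test(rng.normal(size=1000), m=12)
print(round(lm, 2), round(p, 3))
```

Running the same test on residuals simulated from an ARCH process would instead produce a large n·R² and a p-value near zero.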

2.5.2.3 Normality test

The following tests are used to examine whether the residuals are normally distributed. All tests are based on some measure of the difference between the normal cumulative distribution function F, fitted using the sample mean and variance, and the empirical distribution function Fn.
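A Kolmogorov-Smirnov-type distance between Fn and the fitted normal F illustrates the idea; the residuals below are simulated, and the statistic is computed as the largest gap at the sorted sample points (a close approximation of the exact KS supremum):

```python
import numpy as np
from scipy import stats

# Hypothetical residuals; here actually drawn from a normal distribution.
rng = np.random.default_rng(3)
resid = rng.normal(loc=0.1, scale=2.0, size=500)

# Fitted normal cdf F (sample mean and variance) at the sorted residuals,
# against the empirical distribution function F_n.
x = np.sort(resid)
F = stats.norm.cdf(x, loc=resid.mean(), scale=resid.std(ddof=1))
F_n = np.arange(1, len(x) + 1) / len(x)
d = np.max(np.abs(F_n - F))
print(round(d, 3))
```

For genuinely normal residuals this distance is small and shrinks roughly like 1/sqrt(n); a large value indicates departure from normality. Note that because F is fitted to the same sample, the plain KS critical values do not apply directly (Lilliefors-type corrections are needed).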
