Prepayment risk modeling of Dutch mortgages : a neural networks approach

MASTER THESIS

[NON-CONFIDENTIAL VERSION]

MSc FINANCE: QUANTITATIVE FINANCE

Prepayment Risk Modeling of Dutch Mortgages:

A Neural Networks Approach

Student:

Julia Subotniaya

Supervisors:

prof. dr. Marc K. Francke (UvA)

Victor A. Popa (NIBC Bank N.V.)

University of Amsterdam

Amsterdam Business School

Statement of Originality

This document is written by Student, Julia Subotniaya, who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document are original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.

Confidentiality Agreement

This document falls under the standard Non-disclosure Agreement (NDA) between the student, Julia Subotniaya, the academic establishment, University of Amsterdam, and the financial institution, NIBC Bank N.V. This non-confidential version of the thesis does not contain confidential information of the NIBC Bank N.V., such as key statistics and important strategic information that the accepting party chose not to disclose. In parallel, the complete confidential version of this thesis has been provided to the academic supervisor, prof. dr. Marc Francke.

Acknowledgements

The author expresses her gratitude to the NIBC Bank N.V. and to the department of Financial Markets Risk, led by Eloy Cosijn, where this research took place. Special thanks are addressed to the Head of the Risk Analytic & Model Validation team, Victor A. Popa, for the opportunity to be part of the team, for the invaluable expertise and for the support at each step of the development process. This research would not be complete without the Risk Management team experts Konstantin Vitalski, Dimitar Mechev, and Jeroen Schubert, who provided advanced feedback and crucial comments throughout the process.

The author wishes to express her profound acknowledgement to the professor of Real Estate Valuation at the University of Amsterdam, Dr. Marc Francke, who kindly agreed to become the mentor and the academic supervisor for this Master Thesis. Your insights, your guidance and your continuous engagement in the process allowed this project to be successfully completed and made this experience exceptionally memorable.

The author thanks her family and close friends, whose encouragement and support made this master program and this research project possible.

Abstract

Residential mortgage contracts constitute a large capital share for virtually any Dutch bank or financial institution. One of the peculiar features embedded in the mortgage contract is an implicit option to repay a loan prior to its contractual maturity, which gives rise to prepayment risk. This thesis attempts to model prepayment risk using the portfolio of residential mortgages held by NIBC Bank N.V. As a multitude of macro-, borrower- and loan-specific factors influence the decision to repay a loan prematurely, this thesis seeks to analyze these factors and to effectively capture the nonlinear functional relationship between the explanatory variables and the prepayment rate by using a set of supervised learning techniques, Artificial Neural Networks (ANNs). Several fitting-type networks are developed, estimated and assessed at both loan and portfolio levels. Based on the suggested metrics, the best performing models are chosen. Special attention is paid to the computational time required for model training. This thesis adds to the existing empirical literature on prepayment risk by additionally using the ANN approach to identify relevant determinants of prepayments. For this, weight matrix analysis is implemented on the selected models.

List of Abbreviations

ANN Artificial Neural Networks

CBS Central Bureau of Statistics

CPH Cox Proportional Hazard Model

CPR Conditional Prepayment Rate

DNB Dutch Central Bank

EBA European Banking Authority

EC Economic Capital

FRP Fixed Rate Period

HPI House Price Index (base year = 2015)

IMF International Monetary Fund

IRRBB Interest Rate Risk in a Banking Book

LReLU Leaky Rectified Linear Unit

LtFV Loan-to-Foreclosure Value

LtI Loan-to-Income

LtMV Loan-to-Market Value

LtV Loan-to-Value

MIR Mortgage Interest Tax Relief System

MLP Multilayer Perceptrons

MSE Mean Squared Error

NHG National Hypotheek Garantie (National Mortgage Guarantee Scheme)

NID Neural Interpretation Diagram

OLS Ordinary Least Squares

PReLU Parametric Rectified Linear Unit

ReLU Rectified Linear Unit

SGD Stochastic Gradient Descent

Introduction

In 2017, the total outstanding mortgage debt in the Netherlands reached 670 billion euros, as documented by the Dutch National Bank, with mortgage loan portfolios forming an essential component across all banks and other financial institutions. A peculiar feature of a mortgage loan is that it grants the mortgagor (borrower) an option to prematurely repay the principal, with or without penalties, to the mortgagee (lender), exposing the latter to liquidity risk (uncertain future cash flows) and interest rate risk (a mismatch between prevailing and contractual rates) (Van Bussel, 1998). Yet this implicit call option embedded in the mortgage contract is not always optimally exercised: borrowers may not prepay in times of sufficiently low interest rates, and borrowers may prepay when the rates available in the market exceed the contractual rate (Charlier & Van Bussel, 2003). By making the otherwise definite maturity of the contract stochastic, prepayment risk complicates valuation and risk management for mortgage originators, which, in turn, emphasizes the importance of accurately constructed prepayment risk models and of understanding the risk factors that drive these prepayments.

There exists a broad set of risk factors that influence the decision to prepay a mortgage loan. Commonly, these determinants are divided into three categories: macroeconomic (refinancing incentive, seasonality, house prices, etc.), borrower-specific (age, income, creditworthiness, etc.) and loan-specific (redemption type, fixed rate period, loan-to-market ratios, etc.). Given the multitude of factors that can potentially explain prepayment behaviour among mortgagors and the complexity of such functional relationships, modelling prepayment risk is a non-trivial task. The existing empirical literature divides prepayment models into option-theoretic and exogenous models: while the former seek to model prepayment under the strict assumption of optimal exercise of the option to prepay under no-arbitrage conditions, the latter attempt to incorporate the heterogeneous nature of mortgagors' behaviour and account for sub-optimal option exercise. However, the traditional statistical models rely heavily on strict assumptions of normality and independence of predictors, and are often characterized by the relative absence of evident empirical association between explanatory variables and the dependent target.

Recently, more advanced machine learning techniques applied to risk modelling, such as Artificial Neural Networks (ANNs), have gained popularity due to their adaptive learning, their ability to incorporate highly nonlinear relationships between explanatory and response variables, and their overall superior performance compared to traditional statistical models (Giesecke, Sirignano & Sadhwani, 2016). Developed as an attempt to simulate the human brain, ANNs have been proven to be flexible and robust approximators of any nonlinear non-parametric function with multiple influencing factors (Hornik, Stinchcombe & White, 1989; Hornik, 1991). Additionally, ANNs can be utilized for variable selection purposes, which yields considerable insight into the underlying effects of the selected model predictors (Guyon & Elisseeff, 2003).

Thus, the theoretical objective of this research consists in modelling the prepayment risk associated with residential mortgage contracts using the ANN methodology and exploring the applications of ANNs in determining relevant risk factors that influence prepayment rates by means of weight matrix analysis. As opposed to the common practice of using ANNs for classification-type problems, the objective of the thesis lies in examining the contribution and empirical relevance of the explanatory variables, or neural network inputs, to the prepayment rate predictions, or neural network outputs. An additional novelty of the research consists in using both loan- and portfolio-level datasets for the model training phase and the subsequent risk factor analysis.

The applied objective of the research is to develop a prepayment risk model for the NIBC Bank N.V. by implementing the artificial neural networks technique to detect a set of risk factors that largely influence the prepayment rates at loan and portfolio levels. The introduction of this prepayment model would provide sounder insights into the underlying effects of model predictors on the formation of prepayment rates and would allow for more accurate quantification of prepayment risk, an essential component of interest rate risk in the banking book (IRRBB). Moreover, given the changing regulatory landscape, the research would allow for more sensible estimations of economic capital (EC), as prescribed by the European Banking Authority (EBA).

The thesis is organized as follows. Chapter 1 provides the general background of the Dutch mortgage market, including Dutch-specific features of mortgage loans and regulatory aspects of the market. Chapter 2 focuses on the prepayment option embedded in the mortgage contract as a financial interest rate instrument, introduces various risk factors that influence premature mortgage terminations and discusses conventional econometric models for prepayment risk with extensions to alternative machine learning techniques, such as Artificial Neural Networks (ANNs). Next, Chapter 3 offers a methodological introduction to ANNs that can further be used in the context of modelling prepayment risk. Specifically, a condensed overview of the ANN methodology is given in Section 3.1, with applications to prepayment risk modelling in Section 3.2 and techniques for weight matrix analysis in Section 3.3; a detailed extension of the ANN methodology is provided in Appendices A and B. Chapter 4 contains the data description, variable construction, and pre-processing methods; key statistics and visual analysis are reported too. In Chapter 5, the performance of ANN models is compared at loan and portfolio levels of aggregation; subsequently, the best fitting models are selected based on the suggested performance metrics; next, the main results with respect to the significant risk factors for the corresponding models are presented in Section 5.2. Section 5.3 provides the ANN robustness checks. Chapter 6 concludes and outlines potential directions for further research.

1. The Dutch Mortgage Market

This chapter introduces the developments in the Dutch residential mortgage market and the corresponding product offerings. Section 1.2 outlines the types of mortgage loans existing on the market. Next, several important features characteristic of Dutch mortgage loans, such as insurance and taxation schemes, are discussed in Section 1.3.

1.1 Historical Development

Outstanding mortgage obligations held by Dutch households accounted for as much as 93.82 percent of countrywide GDP in 2017 (87.01 percent in 2004), making the Netherlands one of the most indebted nations worldwide, on par with Denmark, Sweden and Ireland¹. Figure 1.1 illustrates the evolution of the total mortgage debt outstanding along with the gross domestic product in the Netherlands for the past 10 years. According to the Dutch Central Bank projections, the mortgage debt is expected to grow further from €665,000 million in 2017 to €875,000 million in 2025. Yet these mortgage debt statistics should not be perceived as solely an increase in newly issued mortgages, but rather as reflecting the projected growth in house prices and an increase in the number of owner-occupied houses. Figure 1.1 additionally visualizes the house price trend.

Figure 1.1: Mortgage debt outstanding (Source: DNB), Dutch Gross Domestic Product (Source: CBS), in millions of euros; House Price Index, base year 2015 (Source: CBS)

As of 2013, the Dutch mortgage market has regained strength, fueled by the economic upturn. As the mortgage lending market has been operating under the stricter regulatory conditions set by the Basel III Accord², the tighter capital requirements have infused more stability into the banks' mortgage portfolios. Additionally, the mortgage market in the Netherlands has been taking shape in an environment of persistently low interest rates, which brought about demand for loans with longer fixed interest terms of 25 and 30 years. Supplemented by more stringent income and LtV requirements, along with less attractive tax-induced loan undertakings, these measures are expected to bring further stabilization and growth of the mortgage market in the Netherlands.

¹ Organization for Economic Co-operation and Development [OECD] (2018)

1.2 Mortgage Loan Types

A mortgage contract is a long-term loan, issued by a bank or other financial intermediary, for the purpose of funding a real estate purchase by a mortgagor. In the Netherlands, the most common mortgage loan has a maturity of 30 years, with an average fixed rate period (FRP) of 10 years in 2010 and 15 years in 2017³.

Mortgage contracts can be categorized by the interest rate schemes they offer, such as fixed rate and variable rate mortgages. With the former category, a mortgagor is obliged to pay a fixed interest rate throughout the lifetime of the loan, the upside of which is protection from potential upward movements in market interest rates and the downside of which is an interest rate locked in for the long term, such as 15, 20 or 30 years. The demand for this mortgage type can be explained by the implied protection of the borrower against anticipated interest rate fluctuations. Hence, the risk associated with a fixed-rate mortgage alternates between the borrower and the lender, as dictated by the prevailing interest rate environment in the market.

A popular alternative is the variable rate mortgage loan, whose rate fluctuates in tandem with the market reference rate (usually Euribor) and can be reset at various points in time. Under a variable-rate mortgage, the contract rate is reset to the rate prevailing in the market at specified intervals, i.e. at the end of each agreed-upon fixed rate period. The demand for this mortgage type is justified by the fact that the mortgage contract now has both fixed and variable features. The risk of the contract is here (partially) transferred from the mortgagee to the mortgagor (Van Bussel, 1998).

Next, it is important to differentiate mortgage contracts by their amortization schedules, or redemption types. Table 1.1 briefly summarizes this classification.

1.3 Features of Dutch Mortgage Loans

As of 1996, Dutch mortgage suppliers agreed upon a so-called Interbank Credit Code of Conduct, which prescribes a set of minimum requirements with respect to contract conditions⁴, such as borrowing limits (maximum LtI ratio tied to household income levels), security limits (maximum LtV ratio of 100 percent, as of 2018), and the penalty-free percentage in case of early mortgage prepayment. The penalty costs are aimed at discouraging prepayments and are calculated as the discounted difference between the future monthly interest payments of the newly originated mortgage and the existing payable interest on the loan up until the next reset date. Commonly, around 10 percent of the initial capital amount of the mortgage loan can be prepaid penalty-free in a given calendar year. Mortgage lenders waive penalties in the case of bankruptcy, sale of the collateral or decease of the borrower, and if the prepaid amount does not exceed the amount prescribed by each bank: some banks set the amount to 15 to 20 percent of the debt outstanding (Nederlandse Vereniging van Banken, 2014).
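The penalty rule described above can be illustrated with a minimal Python sketch. It is not the calculation used by any particular bank: the function name, the monthly discounting at the market rate, the zero floor and the constant penalized balance are all simplifying assumptions introduced here for illustration; only the 10 percent penalty-free allowance and the "discounted interest difference until the next reset" idea come from the text.

```python
def prepayment_penalty(prepaid_amount, original_principal, contract_rate,
                       market_rate, months_to_reset, free_fraction=0.10):
    """Illustrative penalty: present value of the interest the lender loses.

    Assumptions (not from the source): monthly compounding, discounting at
    the market rate, and a constant penalized balance until the reset date.
    """
    # Only the part above the yearly penalty-free allowance is penalized.
    penalized = max(prepaid_amount - free_fraction * original_principal, 0.0)
    # Monthly interest lost per euro; zero if market rates exceed the contract rate.
    monthly_gap = max(contract_rate - market_rate, 0.0) / 12.0
    monthly_discount = 1.0 + market_rate / 12.0
    # Discount each lost monthly interest payment back to today.
    return sum(penalized * monthly_gap / monthly_discount ** m
               for m in range(1, months_to_reset + 1))
```

Under these assumptions the penalty vanishes when the prepaid amount stays within the allowance or when market rates are at or above the contract rate, which matches the intuition that the lender suffers no reinvestment loss in those cases.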

² www.bis.org

³ www.nvb.nl

⁴ www.dnb.nl

Sub-category / Prepayment schedule

Main mortgage products since 01/01/2013

Annuity/Level
Interest: significant interest is paid at the beginning of the contract, decreasing as the notional is repaid.
Principal: monthly repayment of the notional, with a small amount at the beginning of the contract that increases over time.

Linear
Interest: interest is paid on the outstanding loan amount; the interest amount decreases over time.
Principal: the notional amount is spread equally across the term of the contract, with a fixed amount repaid each month. The sum of the interest and the monthly repayment decreases over time.

Main mortgage products prior to 01/01/2013

Interest-only/Bullet
Interest: interest is paid during the full term of the contract; the amount is relatively stable over time, unless reset.
Principal: no regular payments during the term of the contract; the notional is fully repaid at the contract's maturity.

Savings Mortgage
Interest: monthly (fixed or variable) interest is paid.
Principal: regular deposits are made on the savings account, with the accumulated amount matching the notional at maturity. The return on savings is used to repay the principal.

Investment Mortgage
Interest: monthly fixed interest is paid.
Principal: regular deposits are made on the account; a non-amortizing schedule is combined with an investment product, with the accumulated amount not always matching the notional at maturity.

Life Insurance Mortgage
Interest: monthly fixed interest is paid.
Principal: regular deposits are made on the account, with the accumulated amount covered by the insurance policy. The return on the savings is used to repay the principal.

Table 1.1: Mortgage types, by redemption schedule
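To make the difference between the three amortizing patterns in Table 1.1 concrete, the following Python sketch generates monthly interest and principal flows for annuity, linear and interest-only loans. It is a simplified illustration under assumed conventions (constant rate, monthly compounding at one twelfth of the annual rate, no resets or prepayments); the function and argument names are hypothetical.

```python
def monthly_schedule(principal, annual_rate, months, kind):
    """Return a list of (interest, principal_repaid) tuples, one per month."""
    r = annual_rate / 12.0
    balance = principal
    schedule = []
    # Constant total payment of an annuity loan (standard annuity formula).
    if r > 0:
        annuity_payment = principal * r / (1.0 - (1.0 + r) ** -months)
    else:
        annuity_payment = principal / months
    for month in range(1, months + 1):
        interest = balance * r
        if kind == "annuity":
            repaid = annuity_payment - interest   # grows over time
        elif kind == "linear":
            repaid = principal / months           # fixed every month
        elif kind == "interest_only":
            repaid = principal if month == months else 0.0  # bullet at maturity
        else:
            raise ValueError("unknown redemption type: " + kind)
        balance -= repaid
        schedule.append((interest, repaid))
    return schedule
```

For example, `monthly_schedule(300000, 0.03, 360, "linear")` yields a fixed monthly repayment with a declining interest component, whereas the `"annuity"` variant keeps the sum of both components constant.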

Next, the Dutch tax structure has been formulated in a way that facilitates home ownership and boosts mortgage loan undertakings: interest payments on mortgage loans for primary residences are fully tax deductible, i.e. they reduce the borrower's taxable income. Until 2013, such tax relief opportunities had been exploited mainly by mortgagors with savings-type mortgages (see Table 1.1), as these loans are characterized by the features of both a debt element (mortgage interest costs are tax deductible) and a savings element (savings accounts are not taxed). In recent years, following a set of reforms by the Dutch government in the mortgage interest tax relief system (MIR), tax-incentivized mortgage loans have been gradually phased out (Mastrogiacomo, 2013). Currently, besides mortgage loans issued prior to 2013, only linear and annuity types of mortgages are eligible for MIR.

Another Dutch-specific feature is the presence of the Nationale Hypotheek Garantie (National Mortgage Guarantee, or NHG) provided by the Waarborgfonds Eigen Woningen (Homeownership Guarantee Fund, or WEW), the body which insures the lending party against adverse unforeseen financial and personal events, such as unemployment or disability, that the mortgagor might face. Since 1956, municipal governments had been pivotal insurers that helped to stimulate home ownership in the Netherlands and to eliminate the credit risk faced by Dutch mortgage lenders. Since 1995, NHG has operated on the national level and is guaranteed by the Dutch government (Van Bussel, 1998). The program operates in the following way⁵: a potential borrower with a mortgage loan principal below €265,000 in 2016 can apply for the NHG, and whenever the borrower does not meet his/her payment obligations and the collateral is sold at a price that is not sufficient to repay the loan, NHG undertakes to reimburse the difference⁶. In that case, the debt is partially transferred from the lender to the foundation. This transfer of default risk comes at a fee (100 basis points of the loan as of 2014) that the mortgagor pays at origination and which also entitles the borrower to discounts on the mortgage rate, depending on LtV ratios.

⁵ www.nhg.nl

⁶ Regardless of the redemption type of the loan, the difference is calculated according to the annuity-type repayment schedule less the outstanding amount of the loan. Following the crisis of 2008-2010, only amortizing mortgages can be insured by NHG (Francke & Schilder, 2014).

2. Mortgage Prepayment

This chapter introduces the prepayment risk embedded in a mortgage contract (Section 2.1). Next, Section 2.2 provides a detailed overview of the risk factors that influence the mortgagor's decision to prepay. Section 2.3 discusses three sets of existing approaches in the literature to model early mortgage terminations. Along with the traditional ways of modelling prepayments, the general empirical use of machine learning methods in modelling mortgage risks, such as mortgage defaults and prepayments, is discussed in Section 2.4.

2.1 Introduction to Prepayment Risk

The standard residential mortgage contract contains an option for premature voluntary loan repayments, which deviate from the anticipated contractual cash flow repayment schedule (Kolbe, 2008). Therefore, prepayment risk can be defined as the risk associated with early repayments. The complexity of the risk is increased, on the one hand, by a potentially sub-optimal exercise of the option embedded in the mortgage contract and, on the other hand, by the heterogeneous nature of borrowers' behavior. As noted by Deng, Zheng and Ling (2005), once a rational well-informed agent in a perfectly competitive market opts to exercise his/her option to prepay sub-optimally, the financial contingent claims pose two different types of risk to a mortgagee: liquidity risk and interest rate risk. The uncertain maturity of a mortgage contract affects the liquidity profile of the lending party: as a result of unscheduled mortgage prepayments, a mortgagee may overestimate future funds as they deviate from contractual cash flows. This is often referred to as liquidity risk (Deng, Quigley & Van Order, 2000). In turn, interest rate risk arises from the mismatch between the contractually fixed cash flows and those actually received. If at any point in time the interest rate falls below the fixed contract rate, the borrower may decide to repay the principal earlier than outlined by the contract, ceteris paribus. As the lender receives the funds prematurely, he/she is forced to find an alternative use for the capital, e.g. the funds will have to be reinvested at lower market rates (Perry, Robinson, and Rowland, 2001). Hence, it is apparent that a timely recognition of prepayment risk exposure may limit potentially significant losses for lending parties.

The key determinant of prepayment is so-called mortgage refinancing, i.e. prepayment of the notional ahead of the contractual schedule in response to decreasing market rates, followed by taking out a new loan in the market at a lower rate. Relocation is another incentive for premature repayment, as the borrower decides to sell the collateral. Low and high curtailments (partial prepayments) and default should also be considered among the options. The low partial prepayments take up only a small part of the overall prepayments on the NIBC Bank N.V. books and are highly bank-tailored; therefore, they will not be considered in this research. Default is usually either modelled separately or not accounted for, as a common practice in the Netherlands is for the mortgagee to contact the national credit registry, which provides a detailed history of individual loan status and of the borrower's creditworthiness, prior to the issuance of a new loan (Charlier & Van Bussel, 2003). Tightening of regulatory conditions and enhanced financial background checks have gradually reduced the relative importance of the option to default¹. Given the relative size of the mortgage portfolio held by the NIBC Bank N.V. and the available data on prepayment causes, [confidential] of prepayments are refinance-related (Stucken, 2017). The remaining amount includes a sale of the collateral, a default or a demise of the mortgagor, the causes for which penalties do not apply. For risk management purposes and the calculation of penalties, the further analysis therefore focuses on refinance-based prepayments only.

2.2 Determinants of Prepayments

The decision to prepay a mortgage contract is the compound outcome of financial and non-financial aspects that simultaneously affect the mortgagor's decision to refinance or to relocate. Within academia, a wide variety of explanatory variables has been proposed and tested (Clapp, Goldberg, Harding & LaCour-Little, 2001; Alink, 2002). Table 2.1 summarizes the key risk factors that drive prepayment rates at the macroeconomic, borrower-specific and loan-specific levels. Additionally, Table 2.1 describes the underlying mechanism of each factor in the decision to refinance or relocate and lists the academic papers that both incorporated a specific variable and obtained significant empirical results.

Additionally, Alink (2002) comprehensively discusses many other determinants, such as the rank of a mortgage², property use (Buy-to-Live versus Buy-to-Let contracts), the profile of the borrower (full versus partial employment) and the size of the loan. For further analysis, the explanatory variables have been chosen based on their empirical durability and availability in the data set; these variables are marked as '+' in Table 2.1. The borrower characteristics, such as age, geographic location and creditworthiness, are only fractionally available in the data set and are therefore excluded from further analysis.

¹ In 2013, Fitch Rating Agency ranked the Netherlands among the lowest in Europe in default probabilities (Germany: 4.1%, The Netherlands: 3.9%, Belgium: 3.7%).

² Lien priority of the mortgage, or mortgage rank, is the rank of the lending party's claim on the property, e.g. first versus eighth rank: in case of a first-rank mortgage, the lender has the main claim on the property in case of default; a higher rank implies that the property is claimed first by other parties holding a prior claim on the same property (Alink, 2002).

Determinant (Go/No Go) / Description / Used by

Macro-economic

Mortgage Market Rates (+)
Description: Mortgagors are more likely to refinance their loans when current mortgage market rates or long-term interest rates (value for the 3Y tenor of the country's yield curve) are below the contractual loan rate at the last reset. Either set of rates is used to construct a refinancing incentive that measures the extent to which mortgagors are encouraged to prepay the loan prematurely.
Used by: Alink (2002), Van Bussel (1998), Deng (2000)

Term Structure of Interest Rates (+)
Description: see Mortgage Market Rates; the description above covers both sets of rates.
Used by: Alink (2002), Calhoun and Deng (2002)

Seasonality (+)
Description: Different prepayment rates are recorded in particular months, with peaks in July (borrowers tend to relocate during holidays) and December (borrowers tend to refinance at the year end due to tax incentives).
Used by: Alink (2002), Van Bussel (1998), Charlier and Van Bussel (2002), Hayre (2003)

House Price Index (+)
Description: Higher property prices encourage prepayments, as borrowers can profit from house sales; falling house prices, in turn, may limit refinancing-driven prepayments as a result of the property falling into negative equity.
Used by: Clapp et al. (2001), Charlier and Van Bussel (2002)

Borrower-specific

Income (+)
Description: Borrowers with low levels of income are less likely to exercise the option to prepay, even when it is in their financial interest to do so.
Used by: Clapp et al. (2001), Deng (2000)

Age of the Borrower (-)
Description: Higher age of the borrower decreases the probability of relocating, as young people tend to change houses more often; lower age of the borrower decreases the probability of refinancing, as the moving option is used more often.
Used by: Alink (2002), Clapp et al. (2001), Charlier and Van Bussel (2002)

Geographical Location (-)
Description: Lower moving- and refinance-driven prepayments in the West of the country, as the region is characterized by a higher level of urbanization, measured by a high address density of the surroundings.
Used by: Alink (2002), De Jong (1998), Deng (2000)

Creditworthiness (-)
Description: Borrowers with weak credit reliability are more likely to be constrained in their ability to refinance the loan or to relocate.
Used by: Clapp et al. (2001), Alink (2002)

Loan-specific

Redemption Type (+)
Description: Prepayment rates vary across mortgages with different redemption types (applicable to mortgage components): prepay early in amortization scheme when part of notional amount is small.
Used by: Alink (2002), Charlier and Van Bussel (2002), Van Bussel (1998)

Fixed Rate Period (+)
Description: The longer the fixed interest rate period, the lower the prepayment rates, as locking in a long fixed period signals unwillingness to make decisions about the future.
Used by: Clapp et al. (2001)

Remaining Fixed Rate Period (+)
Description: Mortgages are prepaid more often in the last month before the interest rate reset, and borrowers tend to move in the last months before the next reset.
Used by: Alink (2002)

Penalty Proxy (+)
Description: Penalties increase the costs of refinancing the mortgage and thus reduce the prepayment rates.
Used by: Charlier and Van Bussel (2002)

Loan Age (+)
Description: The higher the loan age, the more likely the loan is to be prepaid (alternative to a "burnout" effect).
Used by: Clapp et al. (2001), Deng (2000)

Loan-to-Value (+)
Description: Higher loan-to-market mortgages exhibit lower prepayment rates, as a mortgage with lower leverage implies higher overall wealth of the household, which is, on average, indicative of low prepayment probabilities.
Used by: Alink (2002), Calhoun and Deng (2002), Deng (2000)

National Mortgage Guarantee (+)
Description: Guaranteed mortgages are prepaid less, because early redemption may lead to a loss of the guarantee; after refinancing, a new loan may exceed the LtV ratios that are required to be NHG-insured.
Used by: Charlier and Van Bussel (2002)

Table 2.1
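As an illustration of how some of the determinants listed above could be turned into model inputs, the Python sketch below builds a refinancing incentive and seasonal indicators for a single loan-month. The definitions are assumptions made here for illustration, not the variable construction used in this thesis: the incentive is taken as the simple spread between the contract rate and the market rate (a ratio would be an equally plausible choice), and all function and key names are hypothetical.

```python
def refinancing_incentive(contract_rate, market_rate):
    """Spread-based incentive: positive when refinancing lowers the rate."""
    return contract_rate - market_rate


def seasonal_dummies(month):
    """Indicators for the peak months named in the table (July, December)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    return {"is_july": int(month == 7), "is_december": int(month == 12)}


def loan_features(contract_rate, market_rate, month, months_to_reset, has_nhg):
    """Collect a few macro- and loan-specific inputs for one loan-month."""
    features = {
        "refi_incentive": refinancing_incentive(contract_rate, market_rate),
        "months_to_reset": months_to_reset,   # remaining fixed rate period
        "nhg": int(has_nhg),                  # National Mortgage Guarantee flag
    }
    features.update(seasonal_dummies(month))
    return features
```

A call such as `loan_features(0.045, 0.03, 12, 6, True)` returns a dictionary with a positive incentive and the December indicator set, which could then be fed into any of the model families discussed in Section 2.3.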

2.3 Prepayment Models

The abundant academic literature on prepayment risk models can be partitioned into three sub-groups, based on the optimality of the prepayment exercise: models with optimal option-theoretic prepayments, non-optimal exogenous models, and strictly empirical models.

In the first subset of models, optimal-prepayment mortgage valuation was advocated in the extensive doctoral thesis by Van Bussel (1998), who was among the first to provide insights into the valuation of a Dutch mortgage loan, given its prepayment risk restrictions. Namely, in a setting of perfectly competitive markets, a rational mortgagor makes an endogenous decision to exercise the implicit call option to prepay the mortgage loan if and only if it is in-the-money, i.e. when market interest rates fall below the current contractual rate. A set of studies demonstrates that this is not always the case (Dunn & McConnell, 1981; Kau, Keenan, Muller & Epperson, 1992), which makes the optimality-based techniques inappropriate for accurate prepayment rate predictions and for subsequent contractual cash flow calculations.

As a borrower's decision to repay has a contingent and not necessarily optimality-driven nature, a second group of prepayment models has emerged. This transitory group of exogenous models uses the option-theoretic model as a foundation to incorporate exogenous determinants of prepayments, such as transaction costs and other heterogeneous frictions. It allows for the effective disentangling of deterministic and stochastic components in quantifying prepayments. For example, the work by Kau and Slawson (2002) offers a transitory model that focuses on incorporating borrower-originated heterogeneity, while preserving optimality in loan terminations. Next, the paper by Charlier and Van Bussel (2003) looks at the sub-optimal exercise of the mortgage prepayment option as a feature that restrains accurate instrument valuation, and concludes that correct identification of mortgage termination drivers, such as mortgage type and tax regimes, facilitates more optimal option exercise and allows for factual prepayment rates to be used for mortgage pricing. The work by Archer and Ling (1993) accords with these findings and adds the "burnout" phenomenon as an explanatory variable for sub-optimal loan exercise. The burnout is expected to capture the general tendency of some mortgagors to prepay as soon as they recognize the benefits of early mortgage termination, while others fall into the category of "slow prepaying borrowers". Overall, the dynamics of burnout consolidate both the heterogeneity in mortgagors' characteristics and the aging of the mortgage by incorporating the changes in mortgage pool composition (Charlier & Van Bussel, 2003). Further, given the possibility of partial penalty-free prepayments, Kuijpers and Schotman (2007) model partially callable linear and annuity mortgage loans using modified binomial trees, to accommodate the limitation of the non-optimal exercise of the prepayment option.
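The contrast between the frictionless option-theoretic rule and an exogenous rule with transaction costs can be sketched in a few lines of Python. This is a deliberately stylized toy, not a reproduction of any of the cited models: the interest saving is undiscounted, the balance is held constant until the reset date, and both function names and the cost parameter are assumptions introduced for illustration.

```python
def prepays_optimally(contract_rate, market_rate):
    """Frictionless rule: exercise whenever the option is in-the-money."""
    return market_rate < contract_rate


def prepays_with_frictions(contract_rate, market_rate, balance,
                           months_to_reset, transaction_cost):
    """Toy exogenous rule: prepay only if the interest saved beats the costs."""
    # Undiscounted interest saved per month until the next reset (assumed proxy).
    monthly_saving = max(contract_rate - market_rate, 0.0) / 12.0 * balance
    return monthly_saving * months_to_reset > transaction_cost
```

With a contract rate of 4.0 percent and a market rate of 3.9 percent, the frictionless rule already triggers a prepayment, while the rule with a 1,000 euro cost on a 200,000 euro balance and 12 months to reset does not, which is one simple way to see why observed prepayments can look sub-optimal.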

From the applied perspective, Kau and Slawson (2002) highlight the necessity to determine exogenous factors influencing sub-optimal behavior among mortgagors by turning to observed prepayment rates to formulate the prepayment function. When modeling prepayment behavior, strictly empirical models include the binary choice model (Cox & Snell, 1981), the Cox proportional hazard model (Cox, 1972) and their extensions to competing risks models (Clapp et al., 2001). Green and Shoven (1983) pioneered the usage of the latter to assess the sensitivity of prepayments, measured by mortgage turnover rates, to interest rates based on a sample of 3,938 loans. In contrast to the endogenous set of models, the empirical prepayment models attempt to establish the relationship between the observed unscheduled mortgage repayments and a set of explanatory variables.

Additionally, it is important to differentiate the body of literature on the basis of the level of the data used. Particularly, aggregated pool-level mortgage loan data predominated in earlier research on prepayment risk: Kang and Zenios (1992), Singh and McConnell (1996), along with the paper by Clapp et al. (2001). In contrast, loan-level data makes it possible to incorporate crucial borrower-specific characteristics, such as income, age and geographic location. The approach has been adopted in more recent research by Charlier and Van Bussel (2003) and Sterk (2004), as detailed data has become more accessible. Nonetheless, pool-level data makes it possible to capture the aggregated effects of risk factors on the overall portfolio: Kang and Zenios (1992) examine the aggregated behavior of the pool of homeowners to model the prepayment of mortgage-backed securities (MBS), and to further price the portfolio of MBS; next, Kalotay, Yang and Fabozzi (2004) point out that once the mortgage holder chooses to hold a pooled portfolio of mortgages, it is appropriate to measure the aggregated prepayment rate of such a pool as an asset.

2.4 Machine Learning Applications

In modeling risks associated with residential mortgages, conventional statistical models, such as multinomial regression and the Cox Proportional Hazard (CPH) model discussed in Section 2.3, have been the central choice among scholars. Yet, the success and effectiveness of these models is conditional on satisfying such restrictive assumptions as normality, linearity and the existence of a functional relationship between the dependent variable and independently and identically distributed predictors (Zhang, Hu, Patuwo & Indro, 1999). To address these limitations, and to relax the assumption of linearity specifically, many researchers resort to machine learning methods.

Recent empirical studies confirm that machine learning techniques outperform traditional econometric models in many financial applications: in predicting mortgage loan defaults, the random forest algorithm surpasses statistical approaches (Ghatasheh, 2014); when applied to mortgage loan data for default identification, k-nearest neighbour techniques outperform the probit regression (Galindo & Tamayo, 2000); and the linear regression approach markedly under-performs the classification and regression tree (CART) technique in consumer credit-default forecasts (Khandani, Kim & Lo, 2010).

Recent studies on mortgage risks extensively utilize a set of supervised learning algorithms, such as artificial neural networks (Giesecke et al., 2016). Having gained the predictive edge in medicine, criminology, meteorology, and the environmental domain, fields that require accuracy, flexibility and speed in complex pattern recognition tasks, artificial neural networks have carried their popularity over into the financial world (Trippi & Turban, 1992; Zahedi, 1996). In mortgage prepayment problems, the ANN has proven to be a robust tool: as option-theoretic and exogenous models fail to fully incorporate the non-linearity of the relationship between predictor variables, e.g. the effect of refinance incentives on the prepayment rates, or to capture the responsiveness of a priori unknown risk factors (Waller & Aiken, 1998), it is critical to let the data alone dictate the methodological architecture of the model. Applied to the American mortgage market, the work by Waller and Aiken (1998) develops an ANN structure that outperforms the logistic regression in classifying realized prepayments for a sample of 406 observations of residential loans. The emphasis of that paper is to examine the predictive validity of ANNs using solely loan-level characteristics as explanatory variables.

Although quite cumbersome in execution and manifold in underlying theory, the ANN is a functional alternative to the classical statistical models (Zhang et al., 1999) and remains one of the most effective learning tools (Bishop, 2006). The ability to generalize is among the biggest advantages of the ANN: after being trained, the network can accurately infer functional relationships on the unseen (out-of-sample) part of the data (see Appendix A for a detailed overview of the ANN training process).

Additionally, most recent studies demonstrate that ANNs can be used for time series problems and are successful in various forecasting applications (Zhang et al., 1999). However, incorporating a time series component into the neural network architecture is non-trivial and often requires using deep learning techniques: for example, recurrent neural networks (RNN), such as Long Short-Term Memory (LSTM) models, rely on long-term dependencies in past information in order to produce the forecasts (Kaastra & Boyd, 1996). The block of neural networks that incorporates the sequential component of time-series data is beyond the scope of this thesis.


3. Artificial Neural Networks

This chapter focuses on the introduction and applications of the artificial neural network (ANN) as a system of supervised learning techniques to model the prepayment risk in Dutch residential mortgages. A brief theoretical overview of ANNs is presented in Section 3.1; a more extended discussion of the steps of ANN modeling can be found in Appendix A and B. Section 3.2 describes the methodological steps employed in training and selecting the network with the best performance indicators. Section 3.3 outlines the ways of identifying the most influential variables that affect prepayment rates through weights analysis. The chapter uses the notation given by Bishop (2006) and follows the step-by-step procedure used by Kaastra and Boyd (1996). Throughout the chapter, the number of network layers is denoted as L, with one input layer and L − 1 "hidden" l-layers (the network is commonly referred to as an (L − 1)-layer network). A visual representation of a typical neural net is given in Figure 3.1.

3.1 Overview of ANN

A typical multi-layer network can be envisioned as a structure that takes the vector of explanatory variables, or inputs $x = \{x_1, \dots, x_D\}$, passes them through several layers, denoted as l, of processing neurons $z = \{z_1, \dots, z_M\}$, and results in a single outcome or a set of outcomes stored in a vector $y = \{y_1, \dots, y_K\}$ that is a sufficient approximation of the target vector $\{t_n\}$. An arranged neural network is characterized by strong interconnectedness and a feed-forward topology of layers, which makes it possible to put the collected attributes through a series of functional transformations $f(\cdot)$ on the hidden layers, which are responsible for identifying patterns in the data, to produce the output(s) of the model. A parallel can be drawn with the ordinary least squares regression, with the independent variables being inputs, the MSE minimization problem being a simplified form of the ANN's transformation function, and the dependent variable being the output (Kaastra & Boyd, 1996).

In the neural net, the input layer of neurons should first be linearly combined to produce weighted input that can be further transformed and passed to the adjacent layers. Therefore, M linear combinations of the inputs in the x-vector in the first hidden layer can be formulated as follows:

$$a_j^{(1)} = \sum_{i=1}^{D} w_{ji}^{(1)} x_i + b_j^{(1)} \tag{3.1}$$

where:

j = 1, ..., M, with M the number of hidden neurons in the l-layer,

i = 1, ..., D, with D the number of input neurons,

$a_j$ = weighted (linearly transformed) input for hidden neuron j,

$w_{ji}$ = weight of the i-input for the hidden j-neuron,

$b_j$ = bias parameter for the hidden j-neuron.

Using vector notation, Eq. 3.1 becomes $a^{(1)} = w^{(1)} x + b^{(1)}$, with both the weighted inputs, $a^{(1)}$, and the biases¹, $b^{(1)}$, being M-dimensional vectors, and $w^{(1)}$ being a matrix of size M × D (one row per hidden neuron j, one column per input i) on the first hidden layer of the net.

Figure 3.1: Typical architecture of multi-layer perceptron ANN (inputs $x_1, \dots, x_D$ on the input layer, hidden units $z_1, \dots, z_M$ on the hidden l layer, outputs $y_1, \dots, y_K$ on the output L layer)

Next, a non-linear differentiable activation function $f(\cdot)$ (see Appendix A.3) is applied to transform the weighted inputs into so-called hidden units. The chosen function controls the flow of information that passes through neurons and prevents this information from reaching large values. On the first hidden layer, the following applies:

$$z_j^{(1)} = f(a_j^{(1)}) \tag{3.2}$$

Using a similar procedure, transformations of weighted input combinations can be computed for all hidden l layers, with the information flowing forward, up to the output layer L. There, the hidden units of the preceding layer are linearly combined and transformed using the output activation function $f_o(\cdot)$ (see Appendix A.3), which results in:

$$a_k^{(L)} = \sum_{j=1}^{M} w_{kj}^{(L)} z_j^{(L-1)} + b_k^{(L)}, \tag{3.3}$$

$$y_k^{(L)} = f_o(a_k^{(L)}) \tag{3.4}$$

where:

j = 1, ..., M, with M the number of hidden neurons in the l-layer,

k = 1, ..., K, with K the number of outputs in the L-layer,

$a_k$ = weighted (linearly transformed) input for output neuron k,

$w_{kj}$ = weight of the hidden j-neuron for the output k-neuron,

$b_k$ = bias parameter for the output k-neuron,

$y_k$ = set of outputs.

Combining these steps makes it possible to formulate the general nonlinear function of the 2-layer neural network:

$$y_k(x, w) = f_o\left( \sum_{j=1}^{M} w_{kj}^{(L)} f\left( \sum_{i=1}^{D} w_{ji}^{(1)} x_i + b_j^{(1)} \right) + b_k^{(L)} \right) \tag{3.5}$$

¹ The purpose of the bias is somewhat similar to the intercept in a simple linear regression problem: it shifts the

In this way, the neural network model establishes a deterministic non-linear function from the input variable $x = \{x_1, \dots, x_D\}$ to the output variable $y = \{y_1, \dots, y_K\}$ using a set of adjustable scalar parameters, the weights and biases, stored in the vectors w and b, respectively. Models with such two-staged processes are also called multilayer perceptrons (MLP), and steps (3.1)-(3.4) are referred to as forward propagation.
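As an illustration of Eqs. 3.1-3.4, the forward pass of a network with one hidden layer can be sketched in plain Python. This is a minimal sketch, not the implementation used in the thesis: the tanh hidden activation, the identity output activation and all weight values are assumptions chosen only to keep the arithmetic easy to follow (the actual activation functions are discussed in Appendix A.3).

```python
import math


def forward(x, w1, b1, w2, b2, f=math.tanh, f_o=lambda a: a):
    """Forward propagation for one hidden layer (Eqs. 3.1-3.4).

    w1 has one row per hidden neuron j and one column per input i;
    w2 has one row per output k and one column per hidden neuron j.
    The default activations (tanh, identity) are illustrative assumptions.
    """
    # Eq. 3.1: weighted inputs of the hidden layer
    a1 = [sum(w_ji * x_i for w_ji, x_i in zip(row, x)) + b_j
          for row, b_j in zip(w1, b1)]
    # Eq. 3.2: hidden units
    z = [f(a_j) for a_j in a1]
    # Eq. 3.3: weighted inputs of the output layer
    a2 = [sum(w_kj * z_j for w_kj, z_j in zip(row, z)) + b_k
          for row, b_k in zip(w2, b2)]
    # Eq. 3.4: outputs
    return [f_o(a_k) for a_k in a2]


# Hypothetical network with D = 2 inputs, M = 2 hidden neurons, K = 1 output.
x = [1.0, 2.0]
w1 = [[0.5, -0.25], [0.0, 0.0]]
b1 = [0.0, 0.0]
w2 = [[2.0, 1.0]]
b2 = [0.5]
y = forward(x, w1, b1, w2, b2)
print(y)
```

With these values both hidden weighted inputs are zero, so the hidden units are zero and the output reduces to the output bias.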

For an ANN to be evaluated further, the error function has to be chosen and, consequently, minimized w.r.t. the chosen target set, $\{t_n\}$. For example, in a problem with N input vectors $\{x_n\}$ with corresponding targets² $\{t_n\}$, the general objective (cost) function to be minimized is formulated as a sum of N individual loss functions $E_n$ (here: MSE cost function³):

$$E(w, b) = \frac{1}{2} \sum_{n=1}^{N} \| y(x_n, w) - t_n \|^2 \tag{3.6}$$

with $\| \cdot \|$ being the Euclidean vector norm. It is, therefore, necessary to choose an optimization algorithm that dynamically updates the weights matrix so as to minimize the error function E(w, b). Essentially, a training algorithm is applied to the non-linear function (3.5) on every iteration, which allows for modification of both w and b in an optimal, i.e. MSE-minimizing, way (Zhang et al., 1999).

Although it is not always feasible to find a converging solution to the minimization problem, there are numerous training methods available that aim to reach, if not a global, then a sufficiently good local minimum on the decision surface, while avoiding saddle points (Fletcher, 2013). The most commonly used method for training a neural network is a two-staged algorithm⁴ that involves, first, evaluating the derivatives of the chosen cost function (Eq. 3.6) and then applying these evaluations to adjust weights and biases using different optimization techniques, e.g. gradient-based learning. Strictly speaking, only the first stage of the training process is referred to as backpropagation, or backprop (LeCun, Bottou, Orr & Müller, 1998). The second stage uses so-called gradient descent: in order to locate the minimum of Eq. 3.6, w and b are updated step-wise in the direction of steepest descent, i.e. of the fastest decrease of the error function:

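$$w := w - \eta \nabla_w E(w, b) \tag{3.7}$$

$$b := b - \eta \nabla_b E(w, b) \tag{3.8}$$

where η is the learning rate (a scalar constant, η > 0), $\nabla_w$ is the vector of partial derivatives w.r.t. the weights,

$$\left( \frac{\partial E(w, b)}{\partial w_{j1}}, \frac{\partial E(w, b)}{\partial w_{j2}}, \dots, \frac{\partial E(w, b)}{\partial w_{ji}} \right),$$

and $\nabla_b$ is the vector of partial derivatives w.r.t. the biases,

$$\left( \frac{\partial E(w, b)}{\partial b_{j1}}, \frac{\partial E(w, b)}{\partial b_{j2}}, \dots, \frac{\partial E(w, b)}{\partial b_{ji}} \right).$$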

² Independently and identically distributed (Bishop, 2006).

³ An alternative cost function is the cross-entropy error function, which is applied in most classification problems (G. P. Zhang, 2000).

⁴ An alternative technique of finding the best performing local minima utilizes sequential quadratic programming (SQP) techniques and, as opposed to backpropagation, is derivatives-free (Rios & Sahinidis, 2013). Other training algorithms are described in Appendix B.1.


The learning rate, denoted as η in the gradient equations (3.7) and (3.8), is an essential hyperparameter of ANN training; this constant term defines the speed with which parameters, such as the weight and bias matrices, are being updated (Kaastra & Boyd, 1996).
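The update rules in Eqs. 3.7 and 3.8 can be illustrated on the simplest possible model. The sketch below is an assumption-laden toy example rather than the training routine of the thesis: it fits a single weight and bias, y = w·x + b, to hypothetical noise-free data by minimizing the MSE cost of Eq. 3.6, and the learning rate and number of steps are arbitrary illustrative choices.

```python
def gradient_descent(xs, ts, eta=0.05, steps=2000):
    """Batch gradient descent for y = w * x + b under the cost of Eq. 3.6."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Residuals y(x_n) - t_n for the current parameters
        residuals = [w * x + b - t for x, t in zip(xs, ts)]
        # Partial derivatives of E = 1/2 * sum(residual^2) w.r.t. w and b
        grad_w = sum(r * x for r, x in zip(residuals, xs))
        grad_b = sum(residuals)
        w -= eta * grad_w  # Eq. 3.7
        b -= eta * grad_b  # Eq. 3.8
    return w, b


# Hypothetical targets generated by t = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ts = [1.0, 3.0, 5.0, 7.0]
w, b = gradient_descent(xs, ts)
print(round(w, 6), round(b, 6))
```

A learning rate that is too large for the curvature of the cost surface would make these updates diverge, which is one reason the choice of η matters.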

Gradient descent can be evaluated using the whole dataset (batch gradient descent); alternatively, the weight vector can be updated one randomly chosen data point at a time in a sequential manner (stochastic gradient descent, SGD). The former method uses Eqs. 3.7-3.8 to pass through the entire data set and find an average gradient within an initialized basin (in Figure 3.2a, any convex area on the functional surface), while the latter samples data points with replacement to obtain an estimate of the gradient, which involves a certain degree of noise. Such noise in the gradient estimate makes it possible to detect minima of various depths by jumping between basins (LeCun et al., 1998). Figure 3.2 visualizes an arbitrary non-linear error function with several extrema, in both surface and corresponding contour form: while batch gradient descent halts once a relatively good minimum has been detected (in subplot 3.2b, stopping at a depth of -2.89), the SGD method is able to escape the minimum it has found in order to locate a deeper one (in subplot 3.2b, a jump from -2.89 to -5.63).
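The stochastic variant can be sketched in the same toy setting. In the hedged example below, one data point is drawn with replacement per update, so each step follows a noisy estimate of the gradient; the linear model, the data, the learning rate and the random seed are all illustrative assumptions, and the convex toy problem does not reproduce the basin-hopping behaviour of Figure 3.2.

```python
import random


def sgd(xs, ts, eta=0.05, steps=5000, seed=0):
    """Stochastic gradient descent for y = w * x + b, one sample per update."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        n = rng.randrange(len(xs))   # draw one data point with replacement
        r = w * xs[n] + b - ts[n]    # residual of that single point
        w -= eta * r * xs[n]         # single-sample version of Eq. 3.7
        b -= eta * r                 # single-sample version of Eq. 3.8
    return w, b


# Hypothetical targets generated by t = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ts = [1.0, 3.0, 5.0, 7.0]
w_sgd, b_sgd = sgd(xs, ts)
print(round(w_sgd, 6), round(b_sgd, 6))
```

Because the targets here are noise-free, the single-sample updates settle on the same solution as the batch version; with noisy data they would keep fluctuating around it.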

Estimating gradient descent for the purpose of w- and b-adjustment (Stage 2) involves the evaluation of numerous partial derivatives of the chosen cost function in a backwards manner, i.e. backpropagation (Stage 1). If the data set comprises a number of data points, the derivative of the aggregate cost function w.r.t. weights and biases is the sum of the derivatives of the individual loss functions w.r.t. an individual weight $w_{ji}^{(l)}$ and bias $b_j^{(l)}$:

$$\frac{\partial E(w, b)}{\partial w_{ji}^{(l)}} = \sum_{n=1}^{N} \frac{\partial E_n(w, b)}{\partial w_{ji}^{(l)}} \tag{3.9}$$

$$\frac{\partial E(w, b)}{\partial b_j^{(l)}} = \sum_{n=1}^{N} \frac{\partial E_n(w, b)}{\partial b_j^{(l)}} \tag{3.10}$$

The remainder of the thesis skips operations on the bias term, as they are assumed to be identical to the ones applied to the weights. According to Bishop (2006), an activated bias unit on any hidden l-layer and the output L-layer is often fixed at the +1 level:

$$z_j^{(l)} = f(b_j^{(l)}) = 1 \;\wedge\; z_k^{(L)} = f_o(b_k^{(L)}) = 1.$$

As the flow of information runs backwards, the evaluation of derivatives starts at the output layer L. There, a partial derivative of a loss function in Eq. 3.9 can be expressed using the chain rule, as the cost function $E_n$ depends on a given weight only implicitly, through the activated output units $y_k^{(L)}$ of the linearly weighted inputs $a_k^{(L)}$, where $y_k^{(L)} = f_o(a_k^{(L)}) = f_o\left( \sum_{j=1}^{M} w_{kj}^{(L)} z_j^{(L-1)} + b_k^{(L)} \right)$:

$$\frac{\partial E_n}{\partial w_{kj}^{(L)}} = \frac{\partial E_n}{\partial y_k^{(L)}} \frac{\partial y_k^{(L)}}{\partial w_{kj}^{(L)}} \tag{3.11}$$

Further, to compress the notation of multiple partial derivatives, it is common to introduce the error term δ, which is the partial derivative of a loss function w.r.t. the linearly weighted input of neuron j on the L-layer, $a_j^{(L)}$:

$$\delta_j^{(L)} \equiv \frac{\partial E_n}{\partial a_j^{(L)}} = \sum_{k=1}^{K} \frac{\partial E_n}{\partial y_k^{(L)}} \frac{\partial y_k^{(L)}}{\partial a_j^{(L)}} \tag{3.12}$$

Figure 3.2: Extrema on an arbitrary decision surface, an example: (a) surface view, (b) contour view

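When k = j, the activated neuron on the output layer, $y_k^{(L)}$, depends solely on the weighted input on the output layer L, $a_j^{(L)}$; when $k \neq j$, the derivative $\partial y_k^{(L)} / \partial a_j^{(L)}$ vanishes, so Eq. 3.12 can be rewritten as follows:

$$\delta_j^{(L)} = \frac{\partial E_n}{\partial y_j^{(L)}} \frac{\partial y_j^{(L)}}{\partial a_j^{(L)}} \tag{3.13}$$

Using Eq. 3.4, the error on the output layer can be simplified to:

$$\delta_j^{(L)} = \frac{\partial E_n}{\partial y_j^{(L)}} f_o'(a_j^{(L)}) \tag{3.14}$$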

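Going backwards along the network layers, the error on the l-layer can be formulated in terms of the (l + 1)-layer:

$$\delta_j^{(l)} = \frac{\partial E_n}{\partial a_j^{(l)}} = \sum_{k} \frac{\partial E_n}{\partial a_k^{(l+1)}} \frac{\partial a_k^{(l+1)}}{\partial a_j^{(l)}} = \sum_{k} \delta_k^{(l+1)} \frac{\partial a_k^{(l+1)}}{\partial a_j^{(l)}}, \tag{3.15}$$

where the sum runs over the neurons k on the (l + 1)-layer and, analogously to Eq. 3.1, $a_k^{(l+1)} = \sum_{j=1}^{M} w_{kj}^{(l+1)} z_j^{(l)} + b_k^{(l+1)} = \sum_{j=1}^{M} w_{kj}^{(l+1)} f(a_j^{(l)}) + b_k^{(l+1)}$. Hence, the expression $\partial a_k^{(l+1)} / \partial a_j^{(l)}$ can be obtained by differentiating this sum:

$$\frac{\partial a_k^{(l+1)}}{\partial a_j^{(l)}} = w_{kj}^{(l+1)} f'(a_j^{(l)}) \tag{3.16}$$

Finally, combining Eq. 3.16 and Eq. 3.15, the error of neuron j on any hidden layer l is:

$$\delta_j^{(l)} = f'(a_j^{(l)}) \sum_{k} w_{kj}^{(l+1)} \delta_k^{(l+1)} \tag{3.17}$$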

Summarizing the steps of the backpropagation algorithm, it is first necessary to propagate the network forward using Eqs. 3.1-3.4 and to evaluate the output error $\delta^{(L)}$ for all units on the L-layer using Eq. 3.14; next, to backpropagate the error through the layers l = (L − 1), (L − 2), ..., 1 by utilizing Eq. 3.17; and finally, to evaluate the gradient of the chosen cost function using Eq. 3.9.
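The steps summarized above can be checked numerically on a small example. The following sketch makes several assumptions that are not taken from the thesis: a single hidden layer with tanh activation, an identity output activation, the single-observation loss $E_n = \frac{1}{2}\|y - t\|^2$, and arbitrary weight values. It also uses the standard result that the derivative of $E_n$ w.r.t. a weight equals the error term of the receiving neuron times the activation of the sending neuron, and compares one backpropagated derivative with a finite-difference approximation.

```python
import math


def forward(x, w1, b1, w2, b2):
    """Forward pass (Eqs. 3.1-3.4) with tanh hidden units and identity output."""
    a1 = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(w1, b1)]
    z = [math.tanh(a) for a in a1]
    y = [sum(w * zj for w, zj in zip(row, z)) + b for row, b in zip(w2, b2)]
    return z, y


def loss(x, t, w1, b1, w2, b2):
    """Single-observation loss E_n = 1/2 * ||y - t||^2."""
    _, y = forward(x, w1, b1, w2, b2)
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, t))


def backprop(x, t, w1, b1, w2, b2):
    """Weight gradients of E_n via the error terms of Eqs. 3.14 and 3.17."""
    z, y = forward(x, w1, b1, w2, b2)
    # Eq. 3.14 with identity output activation: f_o'(a) = 1
    delta_out = [yk - tk for yk, tk in zip(y, t)]
    # Eq. 3.17 with tanh'(a) = 1 - tanh(a)^2 = 1 - z^2
    delta_hid = [(1.0 - zj ** 2) * sum(w2[k][j] * delta_out[k]
                                       for k in range(len(w2)))
                 for j, zj in enumerate(z)]
    grad_w2 = [[dk * zj for zj in z] for dk in delta_out]
    grad_w1 = [[dj * xi for xi in x] for dj in delta_hid]
    return grad_w1, grad_w2


# Hypothetical inputs, target and parameters.
x, t = [0.5, -1.0], [0.3]
w1, b1 = [[0.1, -0.2], [0.4, 0.3]], [0.0, 0.1]
w2, b2 = [[0.7, -0.5]], [0.2]
grad_w1, grad_w2 = backprop(x, t, w1, b1, w2, b2)

# Central finite-difference check of the derivative w.r.t. w1[0][1].
eps = 1e-6
w1_plus = [row[:] for row in w1]
w1_minus = [row[:] for row in w1]
w1_plus[0][1] += eps
w1_minus[0][1] -= eps
numeric = (loss(x, t, w1_plus, b1, w2, b2)
           - loss(x, t, w1_minus, b1, w2, b2)) / (2 * eps)
print(abs(numeric - grad_w1[0][1]) < 1e-7)
```

Summing such per-observation gradients over all N observations gives the aggregate gradient of Eq. 3.9.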

3.2 Methodological Steps

In order to employ ANNs in modeling prepayment risk, as well as to establish the best performing network that can further be used to determine the relation between the determinants of prepayments and prepayment rates, the methodological steps given in Section 3.1 are followed.

In the context of prepayments, the input vector $x = \{x_1, \dots, x_D\}$, the independent variables in the ordinary least squares (OLS) notation, consists of those explanatory variables that are described in Table 2.1 and are marked '+' as available. For the portfolio-level analysis, the target vector $t = \{t_n\}$, the dependent variable in the OLS notation, consists of mortgage-pool prepayment rates, while for the loan-level dataset, $t = \{t_n\}$ consists of granular loan prepayments. The former net is aimed at fitting explanatory variables to the rate of mortgage pool prepayments, while the latter fits the explanatory variables to individual loan prepayment rates. Based on the selected network architecture and the peculiarities of the data set, the input vector is propagated both forward and backwards to find the best fit for the corresponding targets. The procedure results in the vector of fitted outcomes $y = \{y_1, \dots, y_K\}$ that serves as a sufficient approximation of either target vector $t = \{t_n\}$.

Although time-series by nature, the portfolio- and loan-level datasets in this thesis are regarded as static and treated as cross-sectional, as the aim of the network fitting in this research is to extract the most influential features that affect prepayment rates, as opposed to the portfolio developments throughout time. For the purpose of incorporating time-dependencies of the prepayment rates, the seasonality input is used (see Section 4.2) and additional sample-separating robustness checks are performed in Section 5.3. See Chapter 6 for possible ways to use ANNs for time-series and panel types of datasets.

To approximate the set of inputs to historical targets, the appropriate algorithm for training the network has to be chosen. Section 3.1 describes the most commonly used training procedure of backpropagation with gradient descent. However, the choice of the training algorithm is ideally contingent on the specifics of the data under inspection. For more complex non-linear dependencies, such as those characterizing the prepayment determinants and associated rates of prepayments, Bishop (2006) recommends applying the second-order training procedures described in Appendix B.1. Therefore, it is first necessary to select a training procedure that can deliver sufficient model performance.

Upon training, the algorithms for both portfolio- and loan-level datasets are assessed on the quality of their performance, which is measured by such metrics as the value of the cost function, e.g. the MSE E(w, b) (Eq. 3.6), at the validation and test stages (see Appendix B.1); the correlation measure, R, of predictive accuracy between the estimated output vector $y = \{y_1, \dots, y_K\}$ and the target vector $t = \{t_n\}$ (portfolio- or loan-level targets); and, finally, the number of iterations (epochs) required for convergence along with the elapsed computational time. These measures are evaluated cumulatively. Finally, only two of the best performing models per level of aggregation are used for the analysis of influencing risk factors by means of examining the matrix of connector weights.

3.3 Weights Analysis

Several approaches can be applied in order to identify and appropriately interpret the risk factors, i.e. the model inputs stored in vector x, that predominate in dictating the prepayment rates t on both granular and aggregate levels. Often, visualizing the connecting weights (the weights matrix w) on a neural network diagram such as Figure 3.1 is opted for as the way to detect the influence of the weights on the output. Such Neural Interpretation Diagrams (NIDs) are proposed by Özesmi and Özesmi (1999). Nonetheless, this method provides only graphical interpretability and does not identify the empirical relation between input and output layers. Alternatively, Olden and Jackson (2002) propose calculating the product of the connector weights on the hidden layer l and the output layer weights. The resulting values are summed up for each input $\{x_1, \dots, x_D\}$ to establish the most valuable inputs for a given net. However, when taking the product of weights, the sign is altered, making the direction of the influence of the inputs on the output unclear (Olden, Joy & Death, 2004). Among the most popular methods of pinpointing influential inputs, based on the weights and bias matrices, is Garson's algorithm (Garson, 1991). The method takes the hidden weights matrix, separates it into input components and takes the absolute values of each input weight across all hidden layers. The approach makes it possible both to track the magnitude of the influence of each input and to preserve the direction of its contribution.

This thesis uses the method developed by Garson (1991) to highlight the risk factors that dictate the prepayment behaviour of mortgages and to demonstrate the explanatory power, along with the weight directions, of the selected inputs (see Table 2.1). For example, in a fitting-type neural network that consists of 3 inputs, $x = \{x_1, \dots, x_3\}$, and one output, $y = y_1$, and is structured as a network with one hidden layer l of hidden size M = 2, the connector weights would be analyzed as follows:

              Hidden neuron j = 1    Hidden neuron j = 2    Input contribution, w_i
Input, x1     w_{1,1} = -2.34        w_{2,1} = 3.05         w_1 = 3.05
Input, x2     w_{1,2} = 2.63         w_{2,2} = -12.34       w_2 = -12.34
Input, x3     w_{1,3} = 6.17         w_{2,3} = 3.02         w_3 = 6.17

Table 3.1: Connector weight quantification by Garson (1991), an example
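The selection shown in Table 3.1 can be reproduced with a few lines of Python. The sketch below only mirrors what the example table displays, namely that each input is assigned its connector weight of largest magnitude with the sign kept; it is an illustration of that table rather than a full implementation of the procedure in Garson (1991), and the function and variable names are hypothetical.

```python
def dominant_weight(weights):
    """Return the weight with the largest absolute value, keeping its sign."""
    return max(weights, key=abs)


# Connector weights from Table 3.1: one list per input, one entry per hidden neuron.
hidden_weights = {
    "x1": [-2.34, 3.05],
    "x2": [2.63, -12.34],
    "x3": [6.17, 3.02],
}

contribution = {name: dominant_weight(ws) for name, ws in hidden_weights.items()}
# Rank the inputs by the magnitude of their contribution, largest first.
ranking = sorted(contribution, key=lambda name: abs(contribution[name]), reverse=True)
print(contribution)
print(ranking)
```

In this example x2 has the largest contribution in magnitude and a negative direction, followed by x3 and x1.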


4. Data Description

The model accommodates the internal contractual data set consisting of the fixed rate loans in the Dutch mortgage portfolio held by NIBC. Specifically, data on both the portfolio and the loan level¹ are used. The granular monthly data span from January 1, 2008 up until February 28, 2018, inclusive. To illustrate the size of the mortgage portfolio throughout the years, Table 4.1 presents the number of loans outstanding each year on the books of NIBC Bank N.V. When presented in the cross-sectional dimension, the final portfolio-level sample consists of [confidential] observations, and the loan-level sample totals [confidential] data points: as the time component is eliminated, observations for each time period are perceived as data points on the cross-sectional dimension². For the two datasets, a corresponding number of targets has to be generated.

Year 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018

Number of Loans

Table 4.1: [confidential] Number of loans in the final sample, by year

To avoid instabilities at the training stage of the ANN and ensure convergence, it is crucial to transform both targets and inputs in a way that would eliminate the bias towards the variables with large magnitudes, i.e. prevent gradient descent from excessive zigzagging. The goal of a chosen transformation method is to equalize the variability of targets and inputs, based on their statistical moments (LeCun et al., 1998); hence, the transformation applied on each variable is correspondingly indicated and follows the notations given by Aksoy and Haralick (2001).

4.1 Targets

To quantify the notion of prepayment outlined in Section 2.3, the concept of the Single Month Mortality of a mortgage loan, $SMM_t$, can be used. The rate is defined as the proportion of the remaining mortgage debt that prepays in a given month t (Veronesi, 2010):

$$SMM_t = \frac{\text{Prepaid Debt}_t}{\text{Total Debt Outstanding}_t}$$

The annualized percentage of this rate determines the Conditional Prepayment Rate, $CPR_t$, of a mortgage loan:

$$CPR_t = 1 - (1 - SMM_t)^{12} \tag{4.1}$$
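A short worked example may help to fix the relation between the two rates. The sketch below implements the SMM definition and Eq. 4.1 directly; the function names and the debt amounts are hypothetical.

```python
def smm(prepaid_debt, total_debt_outstanding):
    """Single Month Mortality: share of outstanding debt prepaid in month t."""
    return prepaid_debt / total_debt_outstanding


def cpr(smm_t):
    """Conditional Prepayment Rate: annualized SMM (Eq. 4.1)."""
    return 1.0 - (1.0 - smm_t) ** 12


# Hypothetical month: 1 of 100 units of outstanding debt is prepaid.
rate = cpr(smm(1.0, 100.0))
print(round(rate, 6))
```

A monthly rate of 1% thus annualizes to roughly 11.4%, slightly less than twelve times the monthly rate because of compounding.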

Monthly historical CPR percentages, corresponding to the prepayments made for the purpose of refinancing and partial repayments that exceed penalty-triggering percentages (see Section 1.3), have been constructed for the chosen time period at both the loan level (using the prepaid loan amount and the remaining mortgage debt for a loan) and the portfolio level (using the aggregate prepaid amount in the portfolio and the remaining mortgage debt in the portfolio). To stabilize targets, CPR values subsequently undergo Gaussian normalization (z-score standardization) to N(0, 1):

$$x_i' = \frac{x_i - \mu_x}{\sigma_x}, \tag{4.2}$$

where $x_i'$ is the transformed value of the initial variable $x_i$, $\mu_x$ is the central tendency of x in a given range (population mean), and $\sigma_x$ represents the variability of x in a given range (population standard deviation).

¹ As a standard practice, one loan in the portfolio may consist of up to 10 loan parts, each characterized by a specific interest rate type, amortization schedule and fixed rate period; hence, it is reasonable to regard a loan part as an individual loan.
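The standardization in Eq. 4.2 can be sketched with the Python standard library. The function name and the sample values below are hypothetical; `statistics.pstdev` is used because the equation refers to the population standard deviation.

```python
import statistics


def z_score(values):
    """Standardize values to zero mean and unit population variance (Eq. 4.2)."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]


# Hypothetical series with mean 5 and population standard deviation 2.
raw = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
standardized = z_score(raw)
print(standardized)
```

After the transformation the series has mean 0 and standard deviation 1, so targets and inputs of very different magnitudes become comparable.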

Figure 4.1: [confidential, y-axis hidden] Evolution of the conditional prepayment rates, monthly

Figure 4.1 visualizes the monthly development of the aggregate portfolio normalized rates from January 2008 to February 2018. Prepayment rates appear to fluctuate significantly over the time period, which is partially attributed, among other factors, to the general developments on the housing market³ visualized in Figure 1.1.

4.2 Inputs: Macroeconomic

Following the list of prepayment determinants in Table 2.1, the input variables can be formulated, constructed, and subsequently transformed as part of the pre-processing stage of the ANN methodology.

One of the most influential macro-level explanatory variables is the refinancing incentive, which captures the propensity of the mortgagor to prepay when the available mortgage market rate is below the contractual rate, i.e. the rate fixed at the last reset (Charlier & Van Bussel, 2003). In its basic form, the refinancing incentive at time t is given by Eq. 4.3:

$$\text{incentive}_t = r_k^{loan} - r_t^{market} \tag{4.3}$$

where:

k = previous reset date at time t, k ≤ t,

$r^{loan}$ = current rate on a loan,

$r^{market}$ = corresponding mortgage market rate.

³ Stucken (2017)

The mortgage market rates, $r_t^{market}$, are provided by MoneyView, an agency that administers data on various financial products across the Netherlands. The data set contains weekly information on the rates offered by Dutch mortgage providers for different loan types. Consequently, loans in the NIBC portfolio are matched to the appropriate monthly-averaged market rate based on the redemption type, the presence of the National Mortgage Guarantee (NHG), the fixed-rate period (FRP) and the loan-to-market value (LtMV) of the loan. The final refinancing incentive variable is standardized using the z-score method (see Eq. 4.2).

Additionally, in the past decade Dutch banks have been bound to adopt the reduction of Loan-to-Value limits: recommendations made by the International Monetary Fund (IMF) initiated the gradual reduction of the ratio cap from 104% in 2014 to 100% as of January 1, 2018⁴. In order to incorporate policy changes, the refinancing incentive is fixed at -1 once the LtMV of the loan in the portfolio exceeds the regulatory limit in a specific year⁵. To provide an example of such a transformation, the refinancing incentive for any month between 2008 and 2015 can be specified as follows:

$$\text{incentive}_t = \begin{cases} r_k^{loan} - r_t^{market} & \text{if } t \in (2008, 2014), \\ -1 & \text{if } t \in [2014, 2015) \wedge \text{LtMV} \geq 104\%. \end{cases} \tag{4.4}$$

Conditioning the refinancing incentive on such a policy change makes it possible to implicitly incorporate the structural break in the prepayment rates.
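The construction of the incentive variable in Eqs. 4.3 and 4.4 can be sketched as a small function. This is a hedged illustration, not the thesis code: the function name, the argument names and the rate values are hypothetical, the cap is passed in by the caller (no cap before 2014, 104% in 2014), and returning the basic spread when a cap applies but is not breached is an assumption, since Eq. 4.4 does not spell out that case. The subsequent z-score standardization is not included.

```python
def refinancing_incentive(r_loan, r_market, ltmv=None, ltmv_cap=None):
    """Refinancing incentive: loan rate minus market rate (Eq. 4.3),
    fixed at -1 when the LtMV breaches the regulatory cap (Eq. 4.4).

    ltmv_cap=None means that no regulatory cap applies in the given year.
    Falling back to the spread when the cap is not breached is an assumption.
    """
    if ltmv is not None and ltmv_cap is not None and ltmv >= ltmv_cap:
        return -1.0
    return r_loan - r_market


# Hypothetical loan fixed at 4.5% while the matched market rate is 3.5%.
before_cap = refinancing_incentive(0.045, 0.035)                         # e.g. 2012
breached = refinancing_incentive(0.045, 0.035, ltmv=1.06, ltmv_cap=1.04)  # 2014
within = refinancing_incentive(0.045, 0.035, ltmv=0.90, ltmv_cap=1.04)    # 2014
print(before_cap, breached, within)
```

Passing the cap as an argument keeps the year-specific regulatory limits outside the function, so the same code can be reused as the cap is lowered towards 100%.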

The next set of prepayment determinants includes monthly historical interest rates; specifically, three-month swap rate curves for the euro⁶ are used. Monthly term structures include rates on twenty-eight tenors (maturities from 1 week to 50 years, further denoted using the W1 and Y50 notations); all tenors are used for the analysis as 28 input variables⁷. According to Gou and Fyfe (2004), networks that utilize datasets containing a certain level of multicollinearity perform sufficiently well when appropriate regularization methods are used (see Appendix B.2). Figure 4.2 highlights the variation of yield curve shapes in the crisis period (4.2a) versus the most recent 2 years (4.2b). This variation is expected to have a significant effect on the contributing risk factors. Therefore, additional robustness checks will be performed on financial crisis/non-crisis data points. As of 2015, negative interest rates are not uncommon; the choice of the activation function (see Appendix A.3), therefore, is partially dictated by the presence of negative rates.

⁴ www.dnb.nl

⁵ The regulatory requirements with respect to LtI limits are omitted, as the maximum LtI is normally determined on the loan-level basis and varies significantly, see Table 4.4

⁶ Bloomberg: ZERO EUR vs 3M swap

⁷ Common alternatives are five year swap rate (Sterk, 2004) or term structure shape variables: shift, curvature

Figure 4.2: Monthly term structures of EUR vs. 3M swap rates: (a) period of 2008-2009, (b) period of 2016-2018

Lastly, as prepayments are cyclically distributed over the year with peaks in December and July (see Figure 4.1), it is important to include the variable(s) that capture this seasonal pattern. First, the year is divided into 4 seasons: January-March (base category), April-June, July-September, and October-December, for each of which corresponding seasonal identifier variables are created. Alternatively, 11 identifier variables, one for each month, can be created (with the 12th month as the base category) using binary {0, 1} encoding.
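Both encodings can be sketched as follows. The function names are hypothetical, and it is an assumption of this sketch that the base category is represented by all indicators being zero, so that the seasonal encoding yields three indicators and the monthly encoding eleven.

```python
def season_dummies(month):
    """0/1 indicators for Apr-Jun, Jul-Sep and Oct-Dec; Jan-Mar is the base."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    season = (month - 1) // 3  # 0 = Jan-Mar, ..., 3 = Oct-Dec
    return [1 if season == s else 0 for s in (1, 2, 3)]


def month_dummies(month):
    """0/1 indicators for months 1-11; the 12th month is the base category."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return [1 if month == m else 0 for m in range(1, 12)]


print(season_dummies(7))   # July falls in the July-September season
print(month_dummies(12))   # December is the base category
```

Leaving out one category avoids perfectly collinear indicator inputs, which is why a base category is used in both variants.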

4.3 Inputs: Borrower- and Loan-specific

The only borrower-specific determinant of mortgage prepayments in the analysis is the Loan-to-Income variable, measured as the ratio of the original notional to the gross income of a mortgagor at loan origination. The variable is readily available in the database and can be subsequently normalized to the [-1, 1] bounds:

$$x_i' = \frac{x_i - \dfrac{x_{max} + x_{min}}{2}}{\dfrac{x_{max} - x_{min}}{2}}, \tag{4.5}$$

where $x_{min}$ is the minimum value the variable $x_i$ takes in a given range and $x_{max}$ is the maximum value $x_i$ takes in a given range.
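Eq. 4.5 maps the smallest observed value to -1 and the largest to +1. A minimal sketch, with a hypothetical function name and hypothetical Loan-to-Income values:

```python
def scale_to_unit_range(values):
    """Rescale values linearly to the [-1, 1] interval (Eq. 4.5)."""
    x_min, x_max = min(values), max(values)
    midpoint = (x_max + x_min) / 2
    half_range = (x_max - x_min) / 2
    return [(v - midpoint) / half_range for v in values]


# Hypothetical Loan-to-Income ratios.
lti = [2.0, 4.0, 6.0]
scaled = scale_to_unit_range(lti)
print(scaled)
```

The sketch assumes that the variable is not constant, since a zero range would make the denominator vanish.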

Among loan-specific characteristics, the redemption schedules of the mortgage contracts describe the overall composition of the portfolio. All loan types specified in Table 1.1 are present in the data set; based on the classification provided by NIBC, they map as follows: Annuity ("Level Mortgage"), Linear ("Linear"), Interest-only ("Interest Only", "Bridge Loan", "Credit", "Life", "Unit Linked", "Universal Life"), Savings ("Savings Mortgage", "Investment").

Figure 4.3: [confidential, y-axis hidden] Composition of the mortgage portfolio, by redemption type

The large share of interest-only mortgages, combined with savings products, has been diminishing as a result of tightened post-crisis lending conditions along with the removal of tax benefits in 2013; annuity and linear mortgages have evidently increased in popularity in recent years. Given the distribution of mortgage types in the portfolio, the indicator variable for the redemption schedule takes the value 1 if the mortgage is interest-only, and 0 otherwise.

Next, the interest FRP of a mortgage loan commonly varies from one to thirty years, with a longer period reflecting the willingness to lock in a certain interest rate and an unwillingness to repay prematurely (Van Bussel, 1998). Table 4.2 lists the average CPR per FRP bucket and provides supporting evidence of such dynamics. In the portfolio, 45% of the loans, on average, have the interest rate fixed for five to ten years, and only 2% of loans have an FRP that exceeds twenty-five years (see Figure 4.4). Corresponding indicator variables are created for each FRP bucket.

FRP <5Y 5Y - 10Y 10Y - 15Y 15Y - 25Y >25Y

Average CPR

Table 4.2: [confidential] Average CPR, by FRP bucket

The time until the next interest reset influences the decision to prepay, as mortgages with one month of remaining fixed rate period are the most prone to prepayment. Consequently, the bucketing for the remaining FRP variable is constructed in the following way: 1 month before reset, 2 - 6 months, 6 - 12 months, and more than a year before the next reset. Corresponding indicator variables are created for each remaining FRP bucket.


Figure 4.4: [confidential, y-axis hidden] Composition of the mortgage portfolio, by FRP bucket

Defined as the ratio of the loan amount to the property value, the indexed loan-to-market value (LtMV) variable is negatively associated with prepayment rates. To capture this relationship, the LtMV is bucketed into four categories: LtMV of less than 75%, 75% - 90%, 90% - 100%, and LtMV exceeding 100%. Corresponding indicator variables are created for each indexed LtMV bucket. Table 4.3 shows the average prepayment rates in NIBC's portfolio for the corresponding LtMV buckets. An LtMV ratio of more than 100% is symptomatic of loans issued prior to 2018.

LtMV <75% 75% - 90% 90% - 100% >100%

Average CPR

Table 4.3: [confidential] Average CPR, by LtMV bucket
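The bucket indicators described for the FRP and the indexed LtMV can be sketched as follows. The function name `bucket_indicators`, the edge constants, and the convention that a value lying exactly on a boundary falls into the higher bucket are assumptions; the text does not state how boundary values are assigned.

```python
from bisect import bisect_right

# Bucket boundaries taken from the text; the constant names are illustrative.
FRP_EDGES_YEARS = [5, 10, 15, 25]  # <5Y, 5Y-10Y, 10Y-15Y, 15Y-25Y, >25Y
LTMV_EDGES = [0.75, 0.90, 1.00]    # <75%, 75%-90%, 90%-100%, >100%


def bucket_indicators(value, edges):
    """One-hot encode value into len(edges) + 1 ordered buckets.

    Assumed convention: a value equal to an edge belongs to the higher bucket.
    """
    k = bisect_right(edges, value)  # index of the bucket containing value
    return [int(i == k) for i in range(len(edges) + 1)]


if __name__ == "__main__":
    print(bucket_indicators(7, FRP_EDGES_YEARS))  # [0, 1, 0, 0, 0]
    print(bucket_indicators(1.05, LTMV_EDGES))    # [0, 0, 0, 1]
```

A loan with a seven-year FRP activates only the 5Y - 10Y indicator, and a loan with an LtMV of 105% activates only the last LtMV indicator.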

Penalties associated with prepayments, a characteristic feature of Dutch mortgages, are an important risk factor in determining prepayment rates. The framework for penalty calculations is determined internally within financial institutions, depends on the type of mortgage redemption, and involves extensive loan-level calculations. For the purpose of incorporating prepayment penalties, an approximate proxy is used: penalties are expected to be high when there is a high incentive to refinance and the mortgage contract has a long remaining FRP. The product of the two defines the simplified penalty proxy, which is further normalized to the [-1, 1] bounds using Eq. 4.5, where +1 is suggestive of a high likelihood of prepayment with associated penalty costs.
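The penalty proxy described above (the product of refinancing incentive and remaining FRP, rescaled with Eq. 4.5) can be sketched as follows. The function name `penalty_proxy`, the units of the inputs, and the sample values are hypothetical; how the refinancing incentive itself is measured is defined elsewhere in the thesis and is not reproduced here.

```python
def penalty_proxy(incentives, remaining_frp):
    """Product of refinancing incentive and remaining FRP, scaled to [-1, 1]."""
    if len(incentives) != len(remaining_frp):
        raise ValueError("inputs must have equal length")
    raw = [inc * rem for inc, rem in zip(incentives, remaining_frp)]
    lo, hi = min(raw), max(raw)
    if hi == lo:
        raise ValueError("proxy is constant; Eq. 4.5 is undefined")
    midpoint, half_range = (hi + lo) / 2, (hi - lo) / 2
    # Eq. 4.5 applied to the raw product.
    return [(p - midpoint) / half_range for p in raw]


if __name__ == "__main__":
    # Hypothetical incentives and remaining FRP in months.
    print(penalty_proxy([0.0, 0.5, 1.0], [12, 60, 120]))  # [-1.0, -0.5, 1.0]
```

The loan that combines the highest incentive with the longest remaining FRP receives the value +1.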

The average loan age, measured as a number of months between the start date of the loan and the time of observation, is [confidential]. The variable is normalized to the [0, 1] range:

\[ x'_i = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}, \tag{4.6} \]

where $x_{\min}$ is the minimum value variable $x_i$ takes in a given range and $x_{\max}$ is the maximum value $x_i$ takes in a given range.
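As a cross-check on the two scalings used in this section, the [-1, 1] normalization of Eq. 4.5 can be written as an affine transformation of the [0, 1] normalization of Eq. 4.6. The symbol $u_i$ is introduced here only for illustration and does not appear elsewhere in the thesis.

```latex
\[
u_i = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} \in [0, 1]
\quad\Longrightarrow\quad
2u_i - 1
= \frac{2x_i - x_{\max} - x_{\min}}{x_{\max} - x_{\min}}
= \frac{x_i - \frac{x_{\max} + x_{\min}}{2}}{\frac{x_{\max} - x_{\min}}{2}}
\in [-1, 1].
\]
```

The two normalizations therefore differ only by a shift and a factor of two, so they preserve the same ordering and relative spacing of observations.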

Lastly, the binary variable 'NHG', which indicates those mortgages that fall under the Nationale Hypotheek Garantie program, is included. In line with the empirical findings presented in Table 2.1, the average rate of prepayment for mortgages insured by the program is [confidential]; in the absence of the NHG guarantee, the conditional prepayment rate equals, on average, [confidential]. The input set comprises 5 macro-economic variables, 28 variables characterizing the term structure of interest rates, 1 borrower-specific variable, and 17 loan-specific variables, for a total of 51 input variables; the summary statistics of the transformed input variables and the target CPR values are presented in Table 4.4. The scaled data lies in the (−∞, +∞) range, which should be consistent with the range of the chosen transfer function in Section 3.1.

Variable | Mean | St.Dev. | Skewness | Kurtosis | Min | Max
Refinancing Incentive
Season (January - March)
Season (April - June)
Season (July - September)
Season (October - December)
Loan-to-Income
Redemption Type
FRP (<5Y)
FRP (5Y - 10Y)
FRP (10Y - 15Y)
FRP (15Y - 25Y)
FRP (>25Y)
Remaining FRP (<1M)
Remaining FRP (1M - 6M)
Remaining FRP (6M - 12M)
Remaining FRP (>12M)
LtV (<75%)
LtV (75% - 90%)
LtV (90% - 100%)
LtV (>100%)
Loan Age
National Mortgage Guarantee
Penalty Proxy
CPR
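The input count quoted in the text can be verified with a short tally. The group sizes are those stated in the text; the split of the 17 loan-specific variables is an assumption inferred from the rows of Table 4.4 and the bucket definitions in this section, and the dictionary names are illustrative.

```python
# Group sizes as stated in the text.
INPUT_GROUPS = {
    "macro-economic": 5,
    "swap-rate tenors": 28,
    "borrower-specific": 1,
    "loan-specific": 17,
}

# Assumed breakdown of the loan-specific group, inferred from Table 4.4.
LOAN_SPECIFIC = {
    "redemption type": 1,
    "FRP buckets": 5,
    "remaining FRP buckets": 4,
    "LtMV buckets": 4,
    "loan age": 1,
    "NHG": 1,
    "penalty proxy": 1,
}

total_inputs = sum(INPUT_GROUPS.values())
loan_specific_total = sum(LOAN_SPECIFIC.values())

if __name__ == "__main__":
    print(total_inputs, loan_specific_total)  # 51 17
```

Under this assumed breakdown, the loan-specific rows add up to the 17 variables stated in the text, and the four groups add up to 51 inputs.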
