Modelling the Prepayment of Mortgages Using Survival Analysis Techniques

Academic year: 2021


Tautan Bogdan S2079550


Abstract

Modelling failure rates in the financial sector has attracted growing interest in recent decades as familiarity with survival analysis techniques has increased. For financial institutions and portfolio managers, the early prepayment of mortgages is an important factor; in the case of securitization, the duration of the interest payments may be one of their main concerns. Survival analysis techniques are widely used by researchers and are well established in medicine, for instance for predicting early spell transitions of patients who are kept under observation. The models used in this thesis, applied to information from mortgage contract holders, are the Cox model and the Accelerated-Failure Time model. Several factors are taken into account, such as the incentive of mortgagors to prepay the contract earlier than expected. Here the interest rate plays perhaps one of the most important roles. The refinancing incentive is introduced as a time-dependent covariate, and other factors which describe the behaviour of mortgagors are also used within the models. The semi-parametric model proves to be the better specification.

Contents

1 Introduction

2 Key aspects in modelling the early prepayment of mortgages

2.1 The mortgage market in the Netherlands

2.2 Literature review

2.3 Data description

3 Mathematical formulation of the models

3.1 Survival analysis concepts

3.2 The parametric model

3.3 The semi-parametric model

3.4 Modelling diagnostics and goodness-of-fit

3.5 Extending the modeling possibilities

4 Empirical Analysis

4.1 The Kaplan-Meier estimate of the survival function

4.2 Semi-parametric model estimation

4.3 The Cox model with time-dependent covariates

4.4 Accelerated-Failure Time model

4.5 Summary

5 Conclusions

5.1 Alternative extensions and improvements in modelling early prepayment

A Risk factors used in the literature

B Parametric models

C Estimation of the parameters

List of Figures

3.1.1 Right censoring expressed for 3 different contracts

4.1.1 The survival probabilities for savings and redemption-free contracts

4.2.1 Smoothed scaled Schoenfeld residuals for the semiparametric model with fixed-time covariates

4.2.2 Martingale residuals

4.2.3 Cox-Snell Residuals for assessing the fit of the model

4.2.4 Survival and hazard rates for savings mortgage loans and the contracts with fixed-period interest rate

4.2.5 Survival and hazard rates for redemption-free mortgage loans and the contracts with fixed-period interest rate

4.3.1 Survival probabilities for both product types including time-dependent risk factors

4.3.2 Prepayment rates for both product types including time-dependent risk factors

4.4.1 Comparison of the distributional assumptions for the refinancing incentive in case of Savings mortgage product

4.4.2 Comparison of the distributional assumption for the refinancing incentive in case of Redemption-free mortgage product

4.4.3 The standardised residuals from Weibull model plotted along with the survival distribution of the covariates - Savings and Redemption-free mortgages

4.4.4 Estimated survival functions for both products

4.5.1 The Cox, AFT and Non-Parametric estimated survival probabilities for both

List of Tables

2.3.1 Fixed-rate periods with their respective number of contracts

4.1.1 Duration analysis for different categories of mortgage loans, age of the mortgagor and the socio-economic status of the zip-code area

4.2.1 Survival analysis: univariate and multivariate approach

4.2.2 Cox models with and without stratification on type of mortgage along with their global test for proportional hazards assumption

4.2.3 Cox model with stratification on type of mortgage and fixed interest rate along with test for proportional hazards assumption

4.2.4 Prepayment and median

4.3.1 Multivariate Cox model with time-dependent covariates

4.3.2 Predicted median duration for the time-dependent Cox model

4.4.1 Comparison of AIC values for different distributional assumptions of AFT model

4.4.2 The estimated AFT models by product type

4.4.3 Median duration for the Weibull models

4.5.1 Survival probabilities, AIC and Median values estimated by the Non-parametric, AFT and Cox models

B.0.1 Illustration of the parametric models

D.0.2 Cox model 1 including 10 degrees of freedom

1 Introduction

on prepayment, survival analysis becomes an important tool due to the formulation of the models and their flexibility in different situations. Survival analysis considers the time elapsed between a fixed starting point and the occurrence of an event. Since the prepayment of mortgages is observed over a long period, survival analysis can be very helpful. The models which I use are the Accelerated-Failure Time model and the Cox model. Using survival analysis brings model selection to the forefront of the research. The main research question of the thesis is:

"Which of the models better describes the early prepayment of mortgages?"

The Accelerated-Failure Time model belongs to the family of parametric models. Its baseline function is estimated parametrically and follows a specific distribution, so prepayment is modelled with respect to the underlying distribution. In the analysis of survival data, a parametric specification may be the better choice as long as the right distribution is used. By contrast, the semi-parametric Cox model does not involve a distributional assumption: its baseline hazard function is estimated non-parametrically. The outcomes and properties of the models may, in theory, be similar, while their interpretation is slightly different. Tests of adequacy will therefore be used to ensure that the models fit the data well. A good description of survival analysis models can be found in the medical literature. Studies such as Bradburn, Clark, Love and Altman (2003) look mainly at transitions of patients from one treatment to another or, in the case of a lung cancer trial, from treatment to death. Similar papers cover a wide variety of cases in which duration analysis plays an important role. The main question of the thesis leads to several sub-questions on how the models are used and how their adequacy of fit is assessed.

1. How should the risk factors be selected in order to model prepayment better?

2 Key aspects in modelling the early prepayment of mortgages

Several aspects which are important to a mortgagee are discussed in this chapter. As discussed, the duration and the market factors pose a serious risk for mortgagees. Besides the question of how a portfolio of mortgages is financed, mortgagees also have to take other aspects into consideration. Examples are the expenses involved in the administration of the contract, IT costs or any other related costs which may arise during the term of the loan. Such costs must be allocated fairly and managed well over the whole life of the portfolio.

2.1 The mortgage market in the Netherlands

The Netherlands is one of the most diversified mortgage markets in Europe. According to De Nederlandsche Bank and Statistics Netherlands (CBS), the Dutch mortgage market is one of the largest among European countries, with gross mortgage debt totalling 106% of GDP. This percentage is equivalent to a debt of EUR 640 billion. Several associations help Dutch mortgagors and mortgagees make better decisions or mitigate risk. On the one hand, Vereniging Eigen Huis helps Dutch individuals choose a mortgage that is appropriate for their financial condition. On the other hand, an advantage for investors or financial institutions is that they can also be insured in the Netherlands: the National Mortgage Guarantee (NHG) is an insurance scheme that pays the mortgagee when a specific event occurs. However, the mortgagee has to meet several criteria. At this moment one can choose from the following types of mortgages: annuity mortgage, linear mortgage, investment mortgage, bank mortgage, traditional mortgage with life insurance, endowment, savings and redemption-free mortgage. The first two were the most common in the past; the borrowers had to make regular payments of the principal, while the value of the mortgage decreased due to gradual prepayments. The other types include an investment which mortgagors can initiate in order to cover their debts, at least until the end of the contract. Concerning the actual modelling, I will focus on savings and redemption-free mortgages.

to the lender and also chooses a savings account in order to cover his or her debts. This should amount to the full payment to the lender at the end of the contract. The regular payments made by mortgagors are determined in such a way that the principal amount at the end of the contract is enough to pay the loan.

• A redemption-free mortgage is a type of contract known in the financial market of the Netherlands under the name of interest-only mortgage. In this case, as in the previous, capital accumulation does not take place. The amount borrowed is usually lower than in other types of mortgages as this type of contract involves credit risk. The mortgagor is obliged at the end of the contract to sell the house, to start or refinance a new mortgage contract or to fully pay the borrowed capital if he or she chose to save separately a regular amount. Since this mortgage is interest-only, regular predetermined payments are lower compared with the other products.


selling of the house or when a fixed interest rate period ends and the mortgagor is supposed to initiate another contract. In the Netherlands the most frequent periods of fixed interest rates are between 5 and 10 years, but they can vary from 1 to 15 years and even more. Even though this penalty can be handled and sometimes taken into account by the mortgagee, the interest rate poses one of the most significant risks.

2.2 Literature review

Predicting mortgage prepayment has become increasingly important in the Netherlands as a result of the evolution of the market in the last two decades. Important existing studies give researchers the opportunity to extend their modelling strategies and contribute new ones. The following are some of the best-known studies in the area of mortgage prepayment.

Aggregate level studies

• Richard and Roll (1989) compared and estimated prepayment with relation to the United States market by emphasising the size of the Mortgage-Backed Securities (MBS) and their growth in that period of time. Their model is based on a multiplication between the refinancing incentive, a seasoning multiplier, a month multiplier and a burnout factor. They will be described later in this chapter.

• Kang and Zenios (1992) study on a pool level the prepayment of mortgages referring directly to the pricing of MBS. In the same manner as Richard and Roll (1989) they consider the seasonal variations, seasoning of the mortgage pool, burn-out effect and the refinancing incentive. The model is based on a multiplicative relation between the risk factors.

• Golob and Pohlman (1994) highlight some advantages and disadvantages of modelling data containing individual loans versus using a model on an aggregate level. They make a close analysis of the Wharton prepayment model and its components. The model contains the same four components as mentioned in the previous two papers.

• Dunn and McConnell (1981) bring upfront the pricing of Government National

adjustment term proportional to the spot interest rate and the cash flows from any continuously paid security.

• In the Netherlands, Van Bussel (1998) looks at the impact and valuation of the interest rate and mortgages and presents further new challenges for the researchers within the country.

Loan level studies

• Green and Shoven (1986) and Schwartz and Torous (1992) estimate a proportional hazards model to show how variations in interest rates affect prepayments. The model incorporates the effect of seasoning as the age of the mortgage, and a function including the market value of a mortgage and the principal outstanding for a given mortgaged house value.

• Charlier and Van Bussel (2001) adopt the same strategy of using the proportional hazards model and apply it to savings and redemption-free mortgages. The risk factors used in the model are related to seasonality, burn-out and the refinancing incentive.

• Hayre (2003) is one of the most important studies made in the Netherlands to forecast prepayments for a Dutch MBS portfolio. His study contains the aforementioned risk factors.

• Koning, Sterken and Jacobs (2007) use the Cox proportional hazards model. This paper is the main reference of my thesis, as I have used its strategy and approach to the subject as a benchmark. The main risk factors described in this paper include the loan size, the socio-economic factor of the zip-code area, the age of the mortgagor and the refinancing incentive. The main analysis includes only the refinancing incentive as covariate.

2.3 Data description

As mentioned before, the data relate to two types of mortgages, the savings mortgage and the redemption-free mortgage. The data set contains 5,000 contracts of each type. The sample is large enough to avoid biased estimates. The data were provided by a Dutch institution and were used in a similar way in the paper written by Koning, Sterken and Jacobs (2007). Overall, the data set contains 92 variables and 10,000 loans initiated by mortgagors. The covariates that describe the contracts are monitored over a period of 6 years, between 1998 and 2003, and it is recorded whether or not a prepayment occurred within this time frame. The earliest initiation date available for a contract is 15 September 1972 and the latest contractual end date is 1 June 2036. It is interesting to see how the contracts are distributed with respect to the fixed interest rate period. In Table 2.3.1 the periods are divided into 5 categories, with the highest number of contracts found for a period between 7 and 10 years. This is also related to the amount of capital which is borrowed when the contract is initiated.

Period of fixed interest rate            Number of contracts
Fixed rate between start and 1 year      1030
Fixed rate between 1 and 5 years         1180
Fixed rate between 5 and 7 years         811
Fixed rate between 7 and 10 years        2600
Fixed rate between 10 and 15 years       2162
Fixed rate for 15 or more years          2205

Table 2.3.1: Fixed-rate periods with their respective number of contracts

From the available data I am using the following risk factors:

• the age of the mortgagor when the contract is initiated;

• the size of the borrowed principal amount;

• the interest rate between 1998 and 2003;

• the socio-economic status of the zip-code area, captured in 2002;

• an indicator which will show if the prepayment has occurred or not;

• fixed-rate period for individual contracts.

Apart from these important variables, the data also contain other risk factors such as marital status, birth date and income, but with some missing information. All of these can be important factors for a study, but the probability that an institution keeps such specific information is very low. The refinancing incentive already described is presented here as formulated by Koning, Sterken and Jacobs (2007). The formula is given by:

$$RFI_{it} = \frac{\sum_{\tau=0}^{n} \dfrac{(cr_i - mr_{n,t})\, PP_i}{(1+d)^{\tau}}}{PP_i} \times 100\% \qquad (2.3.1)$$

Here $n$ is the number of months until the next interest date, $d$ the discount rate, $cr_i$ the contract interest rate, $mr_{n,t}$ the market rate at time $t$ for an $n$-month loan and $PP_i$ the value
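As an illustration of equation (2.3.1), the refinancing incentive can be sketched in a few lines of Python. This is a minimal sketch, not the computation used in the thesis: the function name `refinancing_incentive`, its argument names and the example values are hypothetical, and it assumes that the contract rate, the market rate and the discount rate are all expressed per month.

```python
def refinancing_incentive(contract_rate, market_rate, n_months,
                          discount_rate, principal=1.0):
    """Refinancing incentive in percent, following equation (2.3.1).

    Assumption: all three rates are per-month rates, matching the monthly
    index tau = 0, ..., n used in the sum.
    """
    discounted_gain = sum(
        (contract_rate - market_rate) * principal / (1.0 + discount_rate) ** tau
        for tau in range(n_months + 1)  # tau runs from 0 to n inclusive
    )
    # Dividing by the principal expresses the gain relative to the loan size.
    return discounted_gain / principal * 100.0


# Hypothetical contract: 0.5% monthly contract rate, 0.4% monthly market rate,
# 36 months until the next interest date.
rfi = refinancing_incentive(contract_rate=0.005, market_rate=0.004,
                            n_months=36, discount_rate=0.004)
print(f"RFI: {rfi:.2f}%")
```

Because the principal appears in both the numerator and the denominator, the result does not depend on the loan size; a positive value indicates that the contract rate exceeds the market rate, so that refinancing would be attractive.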

3 Mathematical formulation of the models

In this section I briefly introduce the basic survival analysis concepts, along with the mathematical formulation of the models I use. A mortgage loan can sometimes be seen as a life insurance or a contract which runs for a long period of time. As the main concern of this thesis is prepayment behaviour, the borrower’s transition from one state to another should be modelled well. Here survival analysis plays the most important role.

3.1 Survival analysis concepts

Regarding the duration of the contract, the main point of interest is the time to transition from one spell to another. In survival analysis this is often called a time-to-event process. One example is a young couple getting married: starting a family calls for a new, more spacious house, which requires new investments. More specifically, each individual who already has a contract may move from one state to another as long as he or she has not ended the contract. Moreover, in our data set the average length of the signed contracts is around 30 years, and some loans are signed for even longer periods. Many events can take place in this time. The main focus of our study is to observe the transition of the individual to the spell of prepaying the mortgage contract. This is also called a one-state transition, which is crucial in determining the prepayment rates of mortgages. Besides the transition and the time-to-event data, duration analysis models also account for how the information related to borrowers is captured in the time frame. Here features such as censoring and truncation are very important when formulating the models.

When a survival time is known only to have begun or ended within a particular interval of time, it is known in the survival analysis literature as a censored survival time. If the information is censored, the length of time between the entry into the study and the exit is not known exactly. Censoring is divided into two types, right and left censoring.

Denoting by $T$ the lifetime associated with a certain loan and by $C_r$ a fixed right-censoring time, the exact lifetime $T$ is known only when $T$ is less than or equal to $C_r$. In our case the lifetime may

length of time between the entry and the occurrence of an event is not known. The mortgages can be expressed by pairs of variables $(T', \delta_i)$, where $T'$ is equal to $T$ when the lifetime is observed and to $C_r$ if it is censored. Then $T' = \min(T, C_r)$. Right censoring is further divided into several types. Type I censoring occurs when the prepayment is observed only if it happens prior to some prespecified time; progressive Type I censoring occurs when contracts have different fixed censoring times. When contracts or householders have their own fixed censoring time, with different starting times and a predetermined end time of the study (e.g. 2003), this is referred to as generalized Type I censoring. Type II censoring takes place when the study ends once a prespecified number of contracts has been prepaid. For example, if 200 contracts are under observation, they are followed until a prespecified number (e.g. 100) have been prepaid. Competing risks arise when the interest lies in estimating the marginal distribution of a prepayment event, but some contracts experience a competing event which leads to their removal from the study. When the event of interest is prepayment, such a competing event might be default, which makes the prepayment event unobservable. A more general type of censoring is interval censoring, in which a contract is only known to have been prepaid within an interval.

In Figure 3.1.1, $T' = C_{r1}$ and $T' = C_{r3}$ ($\delta = 0$) represent the right-censored times for contracts 1 and 3 respectively, while $T' = T_2$ ($\delta = 1$) is the prepayment which occurred for contract number 2.

Figure 3.1.1: Right censoring expressed for 3 different contracts
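The right-censoring rule $T' = \min(T, C_r)$ with its prepayment indicator can be sketched as follows. This is a hedged illustration only: the function name `right_censor` and the three example contracts are hypothetical and merely mimic the situation of Figure 3.1.1, in which contracts 1 and 3 are censored and contract 2 is prepaid.

```python
def right_censor(lifetimes, censor_times):
    """Return the observed times T' = min(T, C_r) and the indicators delta.

    delta = 1 when the prepayment is observed (T <= C_r) and
    delta = 0 when the contract is right censored.
    """
    observed = [min(t, c) for t, c in zip(lifetimes, censor_times)]
    delta = [1 if t <= c else 0 for t, c in zip(lifetimes, censor_times)]
    return observed, delta


# Hypothetical lifetimes in years; the study window closes after 6 years.
true_lifetimes = [8.0, 3.0, 12.0]
censoring_times = [6.0, 6.0, 6.0]

observed, delta = right_censor(true_lifetimes, censoring_times)
print(observed)  # [6.0, 3.0, 6.0]
print(delta)     # [0, 1, 0]
```

In real data only the pairs of observed time and indicator are available; the true lifetimes of censored contracts are shown here solely to make the mechanism visible.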


$C_l$. The event of prepayment occurs before being captured in the study at time $C_l$. The length of time from the entry until the transition is not known. The lifetime $T$ is known only when $T \ge C_l$ and, in contrast to right censoring, the data may be represented by $T' = \max(T, C_l)$, where $T'$ is equal to $T$ when the lifetime is observed.

Under truncation, only the contracts whose event times fall within a certain time interval $(F_l, F_r)$ are observed. Contracts which do not fall within the given time interval offer no information to the investigator. In the same manner as censoring, truncation of survival time data is split into different categories. Left truncation takes place when $F_r$ (the right end of the interval) is infinite. In this case only the contracts whose event time $T$ exceeds the left truncation time $F_l$ are observed, that is, $T$ is observed when $F_l < T$. The term delayed truncation is used when contracts which were prepaid before the truncation time are not observed. Such contracts are usually observed until they are prepaid or censored. On the other hand, right truncation occurs when $F_l = 0$. The survival time is observed for $T < F_r$. Here, contracts which experience the exit event prior to the moment of entry in the study are not considered in the study.

Regarding censoring and truncation, the moment of making a transition from one state to another does not depend on the transition history prior to the entry into the initial state captured by the study time interval. In the same manner, the entry into the state which is modelled is exogenous. The moment of prepayment is described by the explanatory variables already presented in Chapter 2.

Considering this, the probability that a contract lasts longer than a given time $t$ is given by:

$$S(t) = \Pr(T > t) = \int_t^{\infty} f(u)\,du \qquad (3.1.1)$$

from which:

$$f(t) = -\frac{dS(t)}{dt} \qquad (3.1.2)$$

Here $f(t)$ is the probability density function of $T$ evaluated at $t$; it is a non-negative function. The hazard function is formulated as:

$$h(t) = \lim_{\Delta t \to 0} \frac{\Pr[t \le T < t + \Delta t \mid T \ge t]}{\Delta t} \qquad (3.1.3)$$

When $T$ is a continuous variable, as in our case, the hazard function takes the form:

$$h(t) = \frac{f(t)}{S(t)} = -\frac{d \ln S(t)}{dt} \qquad (3.1.4)$$

The cumulative hazard function $H(t)$ is given by:

$$H(t) = \int_0^t h(u)\,du = -\ln[S(t)] \qquad (3.1.5)$$

From this we can again define a relationship between the hazard and survival functions:

$$S(t) = \exp[-H(t)] = \exp\left[-\int_0^t h(u)\,du\right] \qquad (3.1.6)$$
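The relations between the hazard, the cumulative hazard and the survival function can be checked numerically. The sketch below assumes, purely for illustration, a Weibull lifetime with survival function $S(t) = \exp[-(t/\lambda)^k]$; the parameter values and helper names are hypothetical and are not estimates from the thesis data.

```python
import math


def weibull_survival(t, shape, scale):
    """S(t) = exp(-(t / scale) ** shape) for a Weibull lifetime."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t, shape, scale):
    """h(t) = f(t) / S(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def cumulative_hazard(hazard, t, steps=20000):
    """Approximate H(t), the integral of h(u) from 0 to t, by the midpoint rule."""
    width = t / steps
    return sum(hazard((i + 0.5) * width) for i in range(steps)) * width


shape, scale, t = 1.5, 10.0, 7.0  # hypothetical parameter values
H = cumulative_hazard(lambda u: weibull_hazard(u, shape, scale), t)
S = weibull_survival(t, shape, scale)

# The integrated hazard should match -ln S(t), and exp(-H(t)) should match S(t).
print(abs(H - (-math.log(S))) < 1e-4)
print(abs(math.exp(-H) - S) < 1e-4)
```

The midpoint rule is used only because it is short and avoids evaluating the hazard at zero; any numerical integration routine would serve the same purpose.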


$$\hat{S}(t) = \begin{cases} 1 & \text{if } t < t_1, \\ \prod_{t_i \le t} \left(1 - \dfrac{d_i}{Y_i}\right) & \text{if } t_1 \le t \end{cases} \qquad (3.1.7)$$

Here $t_i$ represents the unique event times, ordered for a simplified calculation, $Y_i$ is the number of contracts at risk and $d_i$ is the number of failures at time $t_i$. The estimator is a function with jumps at the observed times.¹ Non-parametrically it is very easy to estimate the survival or hazard function using these formulations. When a distribution is specified, these equations are written in a different form.
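The product-limit estimator of equation (3.1.7) is short enough to write out directly. The following is a minimal sketch in plain Python, assuming right-censored data given as durations and event indicators; the function name `kaplan_meier` and the five example contracts are hypothetical.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier estimate as a list of (t_i, S_hat(t_i)) pairs.

    durations: observed times T' for each contract
    events:    1 if the prepayment was observed, 0 if right censored
    """
    event_times = sorted({t for t, e in zip(durations, events) if e == 1})
    survival, curve = 1.0, []
    for t_i in event_times:
        # Y_i: contracts still under observation just before t_i
        at_risk = sum(1 for t in durations if t >= t_i)
        # d_i: prepayments observed exactly at t_i
        failures = sum(1 for t, e in zip(durations, events)
                       if t == t_i and e == 1)
        survival *= 1.0 - failures / at_risk
        curve.append((t_i, survival))
    return curve


# Hypothetical data: five contracts, two of them censored.
durations = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
for t_i, s in kaplan_meier(durations, events):
    print(t_i, round(s, 4))
```

For these five contracts the estimate drops to 0.8 at time 2, to 0.6 at time 3 and to 0.3 at time 5; before the first event time it equals 1, and censored observations lower the number at risk without producing a jump.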

3.2 The parametric model

The parametric models used so far in the literature are the parametric Proportional Hazards model and the Accelerated-Failure Time model. The parametric proportional hazards model has the same interpretation and properties as the Cox proportional hazards model, described mathematically in the following section. Its baseline hazard is estimated parametrically, making use of distributional assumptions on survival, and its interpretation is based on hazard ratios. In order to discuss prepayment from a different perspective I have chosen the second approach. To my knowledge, this model has not been used before in the area of mortgage prepayment. In the Accelerated-Failure Time model the assumption is not made with regard to the hazard function, as it is in the semi-parametric and parametric proportional hazards models. Instead, the effect of the risk factors is understood as a multiplicative effect on survival time: it changes survival time, or log survival time, by accelerating the time to the event of prepayment. The linear form of the model can be written mathematically as:

$$\ln(T) = \beta' X + \varepsilon \qquad (3.2.1)$$

Here $\ln(T)$ represents the log of the survival time $T$, $\beta$ is the vector of coefficients, $X$ is a vector of risk factors and $\varepsilon$ is the error term. When the logarithmic form is used, the models may be referred to as log-normal, log-logistic, etc. Sometimes the untransformed $T$ is used. I have decided to follow the same interpretation as in Harrell Jr. (2001) or Jenkins (2005).

¹ Greenwood’s formula for the variance of the Kaplan-Meier estimator is $\hat{V}[\hat{S}(t)] = \hat{S}(t)^2 \sum_{t_i \le t} \frac{d_i}{Y_i (Y_i - d_i)}$. The standard error of the Kaplan-Meier estimator is $\sqrt{\hat{V}[\hat{S}(t)]}$.

A general form of the model which includes the baseline survivor function $S_0$ may be written as $S(t|X) = S_0(t/\varphi)$. The baseline survivor function contains the so-called acceleration factor $\varphi = \exp(\sum_{i=1}^{n} \beta_i X_i)$. Therefore, the acceleration factor depends on the covariates and acts as a time scaling factor. When the value of the acceleration factor, also written as $\varphi = \exp(\beta)$, is $\varphi < 1$, the time to the moment of prepayment is accelerated. Conversely, when $\varphi > 1$, the survival time is lengthened. Considering that a distributional assumption will be made, the accelerated-failure time model can be rewritten in a different way:

$$S(t|X) = \psi\left(\frac{\ln(t) - X\beta}{\sigma}\right) \qquad (3.2.2)$$

Here $\psi$ represents the standardised survival distribution, which has to be specified, and $\sigma$ is the scale parameter. One of the most important steps in estimating a good parametric model is choosing the right distribution. The distributions available so far are the Weibull, Exponential, Lognormal, Log-Logistic and Generalized Gamma. The coefficients and parameters are estimated by maximum likelihood. The AFT model is often compared with the parametric proportional hazards model. The risk measure is now called the time ratio, instead of the hazard ratio. The time ratio follows from the expression for the survival time:

$$T = \exp(\beta' X)\exp(\varepsilon) \qquad (3.2.3)$$

The time ratio of two contracts $i$ and $j$ that are identical except for the $k$th characteristic is:

$$\frac{T_i}{T_j} = \exp[\beta_k (X_{ik} - X_{jk})] \qquad (3.2.4)$$

If $X_{ik} - X_{jk} = 1$, the time ratio is $T_i / T_j = \exp(\beta_k)$. It is interesting to see how the acceleration factor varies over different covariates, and how prepayments are slowed down or sped up. The effects will be better illustrated by fitting the models and by comparing different distributional assumptions, which will be done in the next chapter.
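The time ratio of equation (3.2.4) can be illustrated with a Weibull AFT model. The sketch below assumes the standard extreme-value form of the standardised Weibull survival function, $\psi(z) = \exp(-\exp(z))$, so that the median survival time is $\exp(X\beta)(\ln 2)^{\sigma}$. The coefficients, the scale parameter and the two contracts are hypothetical values chosen for illustration, not estimates from the thesis.

```python
import math


def weibull_aft_survival(t, linear_predictor, sigma):
    """S(t|X) = psi((ln t - X beta) / sigma) with psi(z) = exp(-exp(z))."""
    z = (math.log(t) - linear_predictor) / sigma
    return math.exp(-math.exp(z))


def weibull_aft_median(linear_predictor, sigma):
    """Time at which S(t|X) = 0.5, namely exp(X beta) * (ln 2) ** sigma."""
    return math.exp(linear_predictor) * math.log(2.0) ** sigma


beta = [3.0, -0.4]   # hypothetical intercept and one covariate effect
sigma = 0.8          # hypothetical scale parameter
x_i = [1.0, 1.0]     # contract i: covariate equal to 1
x_j = [1.0, 0.0]     # contract j: identical except covariate equal to 0

lp_i = sum(b * x for b, x in zip(beta, x_i))
lp_j = sum(b * x for b, x in zip(beta, x_j))

median_i = weibull_aft_median(lp_i, sigma)
median_j = weibull_aft_median(lp_j, sigma)
time_ratio = median_i / median_j

# The ratio of medians equals exp(beta_k) for a one-unit covariate difference.
print(abs(time_ratio - math.exp(beta[1])) < 1e-12)
```

With a negative coefficient the time ratio is below one, so the contract with the higher covariate value is expected to prepay sooner; the ratio does not depend on the scale parameter.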

3.3 The semi-parametric model

The semi-parametric Cox model is part of the class of Proportional Hazards models, also known as multiplicative hazard models. In the last decade it has been the model most used by researchers, including in many medical or treatment studies involving time-to-event data. Therneau and Grambsch (2000) conducted their studies on primary biliary cirrhosis by testing the semi-parametric model and offering ways of extending it. Harrell Jr. (2001) described several techniques for working with survival data by comparing different types of models and regression strategies. The semi-parametric model has found its place in economics as well. Lane, Looney and Wansley (1986) present the Cox proportional hazards model by applying it to the prediction of bank failures. Let $T_j$ be the time on study of the $j$th mortgagor, $\delta_j$ the indicator of prepayment for the $j$th borrower and $X_j(\tau) = (X_{j1}(\tau), \ldots, X_{jp}(\tau))^t$ a vector of covariates or risk factors that affect the survival distribution and the prepayment rate of a contract. The notation $T'$ indicates the point in time when an event may become right censored. The vector $X_j(\tau)$ can consist of time-dependent variables whose values change over time. For example, time-dependent covariates might be the interest rates or the refinancing incentive. When the argument $\tau$ is omitted, I refer to fixed-time covariates. Now let $h(t|X)$ be the hazard rate, or prepayment rate, at time $t$ for a mortgagor with the covariates included in $X$. The Cox (1972) model is:

$$h(t|X) = h_0(t)\exp(\beta' X) \qquad (3.3.1)$$

Here $h_0(t)$ is the baseline hazard function, which is not treated parametrically and whose shape is not specified. A parametric form is assumed only for the effect of the risk factors, hence the name semi-parametric proportional hazards model. The vector of coefficients $\beta = (\beta_1, \ldots, \beta_p)$ represents a set of parameters. Therefore, equation 3.3.1 can be written as:

$$h(t|X) = h_0(t)\exp\left(\sum_{i=1}^{p} \beta_i X_i\right) \qquad (3.3.2)$$

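give rise to their hazard ratios:

$$\frac{h(t|X)}{h(t|X')} = \frac{h_0(t)\exp\left(\sum_{i=1}^{p}\beta_i X_i\right)}{h_0(t)\exp\left(\sum_{i=1}^{p}\beta_i X'_i\right)} = \exp\left(\sum_{i=1}^{p}\beta_i (X_i - X'_i)\right) \qquad (3.3.3)$$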

The ratio is constant over time because the baseline hazard cancels. This proportionality constant is the so-called hazard ratio or relative risk: it shows how a change in a covariate changes the hazard, where an increase in the hazard drives the survival function downwards. The $\beta$ coefficients are estimated by maximising a likelihood function. The function proposed by Cox (1975) is called the partial likelihood. Denote by $X_{(i)k}$ the $k$th covariate associated with the contract whose failure time is $t_i$ and by $R(t_i)$ the set of contracts that are under study at a time just prior to the event time $t_i$. Assuming that there are no ties between the events and that the events are ordered by time, $t_1 < \ldots < t_n$, the partial likelihood is:

$$L(\beta) = \prod_{i=1}^{n} \frac{\exp\left[\sum_{k=1}^{p}\beta_k X_{(i)k}\right]}{\sum_{j \in R(t_i)} \exp\left[\sum_{k=1}^{p}\beta_k X_{jk}\right]} \qquad (3.3.4)$$

The calculation uses the covariates of the contract that fails at each event time together with the covariates of all contracts still at risk at that time. The log-likelihood $LL(\beta) = \ln[L(\beta)]$ then becomes:

$$LL(\beta) = \sum_{i=1}^{n}\sum_{k=1}^{p} \beta_k X_{(i)k} - \sum_{i=1}^{n} \ln\left[\sum_{j \in R(t_i)} \exp\left(\sum_{k=1}^{p} \beta_k X_{jk}\right)\right] \qquad (3.3.5)$$

By maximising equation 3.3.4 we obtain the maximum partial likelihood estimates. When ties are present between event times, the estimation can be carried out using either Breslow’s or Efron’s approximation method. In S3, within the coxph function, Efron’s method is applied by default, while Breslow’s approximation is available as an option. Appendix C presents their mathematical formulation.
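The partial log-likelihood of equation (3.3.5) can be evaluated directly for a small data set. The sketch below is an illustration in plain Python, not the coxph implementation: it assumes there are no tied event times, lets censored contracts contribute only through the risk sets, and uses the hypothetical function name `cox_partial_loglik` with made-up data.

```python
import math


def cox_partial_loglik(beta, times, events, covariates):
    """Partial log-likelihood of the Cox model, assuming no tied event times.

    beta:       list of p coefficients
    times:      observed time for each contract
    events:     1 if prepaid at that time, 0 if right censored
    covariates: one list of p fixed-time covariate values per contract
    """
    def linear_predictor(x):
        return sum(b * v for b, v in zip(beta, x))

    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i != 1:
            continue  # censored contracts only enter through the risk sets
        # R(t_i): contracts still under study just prior to t_i
        risk_sum = sum(math.exp(linear_predictor(covariates[j]))
                       for j, t_j in enumerate(times) if t_j >= t_i)
        loglik += linear_predictor(covariates[i]) - math.log(risk_sum)
    return loglik


# Hypothetical data: four contracts, one binary covariate, one censored case.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 0, 1, 1]
covariates = [[1.0], [0.0], [1.0], [0.0]]

print(cox_partial_loglik([0.0], times, events, covariates))
print(cox_partial_loglik([1.0], times, events, covariates))
```

With all coefficients equal to zero every contract in a risk set is equally likely to fail, so each event contributes minus the log of the risk-set size. Maximising this function over the coefficients, for example with a Newton-type routine, would give the partial likelihood estimates.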

3.4 Modelling diagnostics and goodness-of-fit

Several methods of assessing the adequacy of a model have been presented in the literature so far. The majority cover the semi-parametric approach and give more flexibility in choosing its right form. For AFT models, on the other hand, assessing adequacy mainly concerns the right choice of distribution and relies on more subjective methods based on graphical analysis. I present here a few of the theoretical aspects regarding the AFT and the Cox model; they are illustrated further in the chapter containing the empirical analysis.

For the Accelerated-Failure Time model the focus is on the underlying distribution of the baseline hazard function. Several graphical methods are available for assessing the best distributional form. Regarding prepayment, the contraction of the elapsed time until the event occurs is described as a function of the risk factors. If there is no non-linearity in the model, the risk factors affect the logarithm of time linearly. Therefore, a suitable distribution should be chosen. Graphically, plotting $\log(-\log(S(t)))$ against the log of time should show parallel, straight lines across the levels of a covariate. If a single continuous predictor is examined, it can be discretised into quartiles and checked graphically. After choosing an adequate distribution, one is concerned with the assessment of model fit. Using the same strategy, the standardized residuals for the AFT model are calculated. The residuals are obtained using the formula:

$$ r = \frac{\log(T) - X\hat{\beta}}{\sigma} \qquad (3.4.1) $$

The Kaplan-Meier estimate of the distribution of residuals is plotted together with the estimated survival from the model. They should behave linearly with each other and show little or no deviation. Another graphical approach is to plot both survival functions, one estimated with respect to the chosen distribution and the other estimated non-parametrically. The method will be empirically tested in the next chapter. For the semi-parametric model, somewhat clearer methods of investigation are available.


This is often used as a test for time-dependency of risk factors. The Cox model assumes that the hazard ratio of two risk factors is constant over time. Looking at equation 3.3.1, the baseline hazard is a function of t but does not involve the set of risk factors found in the exponential expression. When the risk factors are time-independent, the PH assumption must hold. When covariates are time-dependent, the assumption is no longer satisfied by the model; this form is called the Extended Cox model. The χ2 statistic is calculated for each of the risk factors in the model along with a probability value, or p-value. The p-value is derived from the standard normal statistic and is given for each of the covariates. When the p-value is higher than 0.05, the proportional hazards assumption holds and the formulation of the model is correct. The more strongly the assumption holds, the closer the p-value is to 1. The p-value can be influenced by the sample size, because the proof of the null hypothesis does not involve a statistical test. One method I will present in the thesis involves the calculation of the Schoenfeld residuals. The residuals developed by Schoenfeld (1982) are used for assessing the proportionality of hazards by a visual technique. Therneau and Grambsch (2000) illustrate the residuals for both time-varying and fixed covariates. I denote by $t_1, \dots, t_s$ the s unique ordered event times, by $X_i$ one of the risk factors for the i-th contract and by $\bar{x}_j$ the weighted mean of the risk factors $X_i$:

$$ \bar{x}_j(\hat{\beta}) = \frac{\sum_{j \in R(t_i)} \exp(X_i\hat{\beta})\, X_i}{\sum_{j \in R(t_i)} \exp(X_i\hat{\beta})} \qquad (3.4.2) $$

Here $R(t_i)$ is the risk set of contracts at time $t_i$. With $\delta_i$ an indicator of whether contract i is prepaid, the Schoenfeld residuals can be written as:

$$ s_k = \delta_i \{ X_i - \bar{x}_j(\hat{\beta}) \} \qquad (3.4.3) $$
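The residual in equation 3.4.3 can be illustrated with a minimal Python sketch for a single time-fixed covariate. The function name, the toy durations and the value of beta are assumptions made for illustration; they are not taken from the thesis data, and the sketch is not the routine used in the thesis.

```python
import math

def schoenfeld_residuals(times, events, x, beta):
    """Schoenfeld residuals for one covariate: one residual per prepaid contract.

    times  : observed durations
    events : 1 if the contract was prepaid, 0 if censored
    x      : covariate value of each contract
    beta   : coefficient (assumed to be already estimated)
    """
    residuals = []
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored contracts contribute no residual
        # Risk set R(t_i): contracts still under observation at time t_i
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        weights = [math.exp(beta * x[j]) for j in risk]
        # Weighted mean of the covariate over the risk set (equation 3.4.2)
        x_bar = sum(w * x[j] for w, j in zip(weights, risk)) / sum(weights)
        residuals.append((t_i, x[i] - x_bar))
    return residuals

# Hypothetical toy data: five contracts, three prepayments
times = [5, 8, 12, 12, 20]
events = [1, 0, 1, 1, 0]
x = [1.0, 0.0, 1.0, 0.0, 0.5]
res = schoenfeld_residuals(times, events, x, beta=0.0)
```

Plotting the second element of each pair against the first gives the kind of visual check of proportional hazards described above: a trend over time suggests a time-varying effect.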

The functional form of a covariate may be tested here using the Martingale Residuals. The method is also used in the computation of Cox-Snell residuals, computed for the overall goodness-of-fit of the model. Cox and Snell (1968) presented graphical methods for checking how well a semi-parametric model describes the data. Thus, if the model is well represented by the covariates then the cumulative hazard H(t|Xj) of a mortgagor given the set of risk


values, the residuals are defined as:

$$ r_j = \hat{H}_0(T_j)\exp\left(\sum_{k=1}^{n} X_{jk}\beta_k\right), \quad j = 1, \dots, n \qquad (3.4.4) $$

The baseline hazard can be estimated with the help of the Breslow estimator. After the calculation of the residuals $r_j$, the estimated cumulative hazard is plotted against their values. This should give a straight line with slope 1 starting from the origin. The Martingale residuals are then calculated using equation 3.4.4 and an event indicator of prepayment $\delta_j$ for each jth individual.

$$ \hat{M}_j = \delta_j - r_j = \delta_j - \hat{H}_0(T_j)\exp\left(\sum_{k=1}^{n} X_{jk}\beta_k\right) \qquad (3.4.5) $$

One of the properties of Martingale residuals is that they sum to zero, $\sum_{i=1}^{n} \hat{M}_i = 0$. Their values lie between −∞ and 1, which makes them asymmetric around 0. The techniques are common in verifying how the semi-parametric approach performs.
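As a rough illustration of equations 3.4.4 and 3.4.5, the sketch below computes the Breslow estimate of the baseline cumulative hazard, the Cox-Snell residuals and the Martingale residuals for one covariate. The function name, the toy data and the value of beta are hypothetical and only serve to show the mechanics.

```python
import math

def cox_snell_and_martingale(times, events, x, beta):
    """Return (Cox-Snell residuals, Martingale residuals) for one covariate."""
    risk_score = [math.exp(beta * xi) for xi in x]
    event_times = sorted({t for t, d in zip(times, events) if d})

    # Breslow estimator: at each event time the baseline cumulative hazard
    # jumps by (number of events) / (sum of risk scores in the risk set).
    increments = []
    for t in event_times:
        d = sum(1 for tj, dj in zip(times, events) if dj and tj == t)
        denom = sum(r for tj, r in zip(times, risk_score) if tj >= t)
        increments.append((t, d / denom))

    cox_snell, martingale = [], []
    for j in range(len(times)):
        h0 = sum(inc for t, inc in increments if t <= times[j])
        r_j = h0 * risk_score[j]            # equation 3.4.4
        cox_snell.append(r_j)
        martingale.append(events[j] - r_j)  # equation 3.4.5
    return cox_snell, martingale

# Hypothetical toy data
times = [5, 8, 12, 12, 20]
events = [1, 0, 1, 1, 0]
x = [1.0, 0.0, 1.0, 0.0, 0.5]
cs, mart = cox_snell_and_martingale(times, events, x, beta=0.3)
```

With the Breslow estimator the Martingale residuals returned here sum to zero up to rounding error, and none of them exceeds 1, in line with the properties stated above.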

3.5 Extending the modeling possibilities

Among the modelling possibilities, prepayment is more likely to be captured well by a multivariate analysis. Univariate analysis is easier to compute but less informative. Some effects are shared and correlated between covariates, which is an advantage of modelling them together. In the univariate model only one risk factor is taken into account, without involving any other partial effects on the hazard or survival functions. Moreover, the multivariable approach is more informative when interpreting how the risk factors affect the prepayment rates. A mortgagor may be influenced at one point to prepay his or her mortgage as a consequence of several risk factors (e.g. a younger person may be tempted to relocate because of a new job, or because of a family that needs more space, but also because of other factors such as the refinancing incentive). The risk factors considered here are already mentioned in Chapter 2, while interactions between them will be tested in the following chapter.


the refinancing incentive includes the fixed interest rate period, after a while it changes. This involves a change over time of the covariate. Moreover, when the market interest rate changes over time, the refinancing incentive changes over time. A detailed analysis can be found in Therneau and Grambsch (2000). On the one hand, time-dependent covariates are clearly an advantage, as I can make use of the refinancing incentive of the mortgagors; on the other hand, they are computationally very demanding. Fox (2002) presents briefly but thoroughly the Cox model including time-fixed and time-dependent covariates. In that study, in order to be able to manage the data, he created a function in S-plus called fold. When time-dependent covariates are necessary, the existing data expand to a different dimension. In our case, if I want to incorporate the incentive, the data will have 463,777 entries, although there were only 10,000 contracts. This is mainly because each loan is considered within a time frame and the function creates a row for each period.
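The expansion described above can be sketched in a few lines of Python. This is not the fold function of Fox (2002); the function name, the field names and the monthly incentive values are assumptions chosen only to show how one contract becomes one row per period.

```python
def expand_contract(contract_id, duration, prepaid, incentive_by_period):
    """Split one contract into (start, stop] rows, one per period.

    duration            : number of observed periods (e.g. months)
    prepaid             : True if the contract was prepaid at the end
    incentive_by_period : time-dependent covariate, one value per period
    """
    rows = []
    for start in range(duration):
        stop = start + 1
        rows.append({
            "id": contract_id,
            "start": start,
            "stop": stop,
            # The event can only occur in the last observed period
            "event": int(prepaid and stop == duration),
            "incentive": incentive_by_period[start],
        })
    return rows

# Hypothetical contract observed for three periods and then prepaid
rows = expand_contract("A", 3, True, [0.2, 0.5, 1.1])
```

Applied to every loan, this kind of expansion explains the growth in size: 463,777 rows for 10,000 contracts corresponds to roughly 46 rows per contract on average.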

Another means of improvement is to stratify the model by using one or more covariates. Usually this happens when a covariate is not representative for the study, or its effect cannot be captured. Harrell Jr. (2001) states that if too much stratification is made in a model, estimates can be biased and give a wrong estimation of hazards. Regarding the graphical check of the hazard function, in this thesis, the hazard rates or prepayments are smoothed using spline functions.


4

Empirical Analysis

This chapter is divided into three parts. The first part presents the non-parametric survival estimates, the second part covers the estimation of the models under the semi-parametric approach, and the last part presents the results given by the Accelerated-failure time model.

4.1 The Kaplan-Meier estimate of the survival function

It is useful to look first at the non-parametric survival estimates of the contracts and at the number of prepaid contracts during the time interval. The non-parametric method does not control for continuous covariates, so the risk factors are categorical. Computing the Kaplan-Meier estimates for a covariate is equivalent to estimating a Cox model without risk factors but stratified on that covariate. The median duration of the whole portfolio is close to 8 years and 5 months. Table 4.1.1 includes the age, principal amount and the socio-economic status of the zip-code area. For each category the median duration and number of prepayments are presented.

                              Contracts   Events   Median   Lower CI*   Upper CI
All contracts                     10000     3073      103          98        109
Saving Mortgage                    5000     1651      114         107        121
Redemption free                    5000     1422       94          88        105
Age 18 to 30                       1384      550       77          72         86
Age 30 to 40                       4283     1287      110         102        120
Age 40 to 48                       2515      724      119         108        131
Age 48 to 90                       1792      500      106          93        122
Size 0 to 38,934**                 2500      856       84          76         90
Size 38,934 to 68,067              2702      871       94          86        108
Size 67,067 to 103,870             2298      750      101          92        111
Size 103,870 to 1,701,676          2500      596      148         133        169
High status***                     3214      941      121         110        133
Medium status                      3211     1015      102          94        113
Low status                         3210     1027       92          86         99

* CI - Confidence interval
** The values are expressed in euro
*** Levels of the socio-economic status
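The Kaplan-Meier survival estimate and the median duration reported per category can be sketched as follows. The code is a minimal illustration on hypothetical durations, not the procedure or the data behind the table; the function names are assumptions.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (event time, survival probability)."""
    curve, surv = [], 1.0
    for t in sorted({t for t, d in zip(times, events) if d}):
        at_risk = sum(1 for tj in times if tj >= t)
        d = sum(1 for tj, dj in zip(times, events) if dj and tj == t)
        surv *= 1.0 - d / at_risk  # conditional probability of surviving past t
        curve.append((t, surv))
    return curve

def median_duration(curve):
    """First time at which the survival estimate drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # the curve never reaches 0.5, so the median is undefined

# Hypothetical durations in months; 1 = prepaid, 0 = censored
durations = [3, 5, 5, 8, 10, 12]
prepaid = [1, 1, 0, 1, 0, 1]
km = kaplan_meier(durations, prepaid)
median = median_duration(km)
```

When fewer than half of the contracts in a group are prepaid within the observation window, the survival curve may never reach 0.5 and no median can be reported for that group.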


After a very simple estimation we see that a total of 3,073 contracts were prepaid between 1998 and 2003. Most of the events came from the borrowers whose age was between 30 and 40 when they initiated the contract. The prepayment in terms of the size of the principal is quite similar for all four amounts borrowed, with a lower prepayment rate for the most expensive contracts, which also proved to have a higher duration. Considering now the socio-economic status of the zip-code area where mortgagors live, we can see that the prepayment occurred in approximately the same manner across the three levels, with a somewhat higher number of events in the areas with a lower status. Figure 4.1.1 shows the difference in duration between the savings mortgage loan and the redemption-free mortgage.


4.2 Semi-parametric model estimation

In the case of the Cox model the easiest way to see how the behaviour of mortgagors affects the prepayment is to start with time-fixed covariates. It is useful as a starting point to see how each risk factor affects the prepayment rate of mortgages. Accordingly, the univariate and multivariate approaches already discussed are presented in Table 4.2.1.

                               Univariate Analysis          Multivariate Analysis
Covariate                      β        HR       p-value    β        HR       p-value
Age 30-40                      -0.364   0.695    0          -0.336   0.715    0.000
Age 40-48                      -0.378   0.685    0          -0.397   0.672    0.000
Age 48-90                      -0.262   0.769    0          -0.340   0.712    0.000
Size 38,934 to 68,067          -0.141   0.868    0.004      -0.119   0.888    0.015
Size 67,067 to 103,870         -0.136   0.873    0.007      -0.078   0.925    0.130
Size 103,870 to 1,701,676      -0.463   0.629    0.000      -0.366   0.693    0.000
Fixed rate 1-5                  0.425   1.530    0.000       0.094   1.100    0.360
Fixed rate 5-7                  0.179   1.196    0.077      -0.304   0.738    0.076
Fixed rate 7-10                -0.174   0.840    0.052      -0.668   0.512    0.000
Fixed rate 10-15               -0.205   0.814    0.027      -0.794   0.452    0.000
Fixed rate >15                 -0.714   0.4904   0.000      -1.309   0.270    0.000
Interest Rate 5.4-6.2           0.200   1.221    0.005       0.417   1.518    0.000
Interest Rate 6.2-6.9           0.367   1.443    0.000       0.657   1.931    0.000
Interest Rate 6.9-7.6           0.239   1.271    0.001       0.777   2.176    0.000
Interest Rate 7.6-10.5          0.380   1.462    0.000       1.066   2.906    0.000
Medium status                   0.140   1.150    0.0017      0.104   1.110    0.020
Low status                      0.177   1.195    0.000       0.082   1.090    0.055

Table 4.2.1: Survival analysis: univariate and multivariate approach

At this point an interpretation can be made by looking at the hazard ratios or exp(βi).
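The link between a coefficient and its hazard ratio can be checked with a short Python sketch. The helper names are assumptions; the coefficient used is the univariate estimate for the age group 30-40 from Table 4.2.1.

```python
import math

def hazard_ratio(beta):
    """Hazard ratio exp(beta) of a category relative to the reference group."""
    return math.exp(beta)

def percent_change_in_hazard(beta):
    """Relative change of the hazard, in percent, implied by beta."""
    return (math.exp(beta) - 1.0) * 100.0

# Univariate coefficient for Age 30-40 in Table 4.2.1
hr_age_30_40 = hazard_ratio(-0.364)
change_age_30_40 = percent_change_in_hazard(-0.364)
```

A hazard ratio of about 0.695 means that, in the univariate model, the prepayment hazard of this age group is roughly 30% lower than that of the reference group.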


a model including all important or available risk factors. Harrell Jr. (2001) reduces the number of factors in the model by estimating an amount of shrinkage. This involves the computation of the Akaike Information Criterion. The criterion helps in determining which model performs better. The AIC is calculated using the maximum log-likelihood and the number of covariates k involved in the model:

$$ \mathrm{AIC} = -2\,\text{log-likelihood} + 2k \qquad (4.2.1) $$

The calculation of AIC values will be used not only in determining the best model, but also in determining which covariates are included in the model. One of the strategies is to create a stepwise function which starts by calculating the AIC value with all covariates and at each step moves forward by adding a covariate, or backward by deleting one. The AIC values of the created models are then compared. The strategy Harrell Jr. (2001) uses is based on the calculation of the AIC value on the χ2 scale. This is calculated with the use of the likelihood ratio χ2 statistic, by AIC = LR − 2k. The shrinkage estimate is then calculated by $\hat{\gamma} = \frac{LR\,\chi^2 - k}{LR\,\chi^2}$. This represents the quantification of the amount of overfitting present in the model. The lowest AIC of all univariate models is given by the model using the size of the loan as predictor, with a value of 48084.16 and a likelihood ratio test of 80.67. Of course, the likelihood ratio value is much bigger when the log-likelihood of the model is greater, and the AIC value of 47481.01 obtained when more factors are included shows that this increases the accuracy in modelling the prepayment. The proportional hazards assumption, already mentioned in Chapter 3, should also be tested. In the same way, the covariates which were not introduced should be incorporated by stratification. Before proceeding with the indicated steps, more modelling is needed. In Table 4.2.2, Model 1 is estimated including the age of the mortgagor, loan size and the interest rate, all of them using the same quartiles as shown in Table 4.2.1. Model 2 uses the same covariates, except loan size, plus the socio-economic status. The second part of the table illustrates the models stratified on the type of the mortgage, with more satisfying results that are still not good enough.
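The three quantities used in this model selection strategy can be written as a short Python sketch. The function names are assumptions; the numerical example uses the likelihood ratio statistic of 184.6 with 3 degrees of freedom reported in Table 4.2.3.

```python
def aic(log_likelihood, k):
    """AIC from the maximised log-likelihood and k covariates (equation 4.2.1)."""
    return -2.0 * log_likelihood + 2.0 * k

def aic_chi2_scale(lr_chi2, k):
    """AIC on the chi-square scale: larger values indicate a better model."""
    return lr_chi2 - 2.0 * k

def shrinkage(lr_chi2, k):
    """Shrinkage estimate: values close to 1 suggest little overfitting."""
    return (lr_chi2 - k) / lr_chi2

# Likelihood ratio statistic and degrees of freedom from Table 4.2.3
gamma_hat = shrinkage(184.6, 3)
aic_chi2 = aic_chi2_scale(184.6, 3)
```

Note the difference in direction: on the usual scale of equation 4.2.1 the model with the lowest AIC is preferred, whereas on the χ2 scale a higher value is better.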

The values of the AIC are similar and hard to judge, but after stratification, even if the proportional hazards assumption is violated, the second model including the socio-economic status of the zip code area presents better results. Applying also the shrinkage factors this


Models      Df    AIC         χ2        Global PH test

Models before stratification
Model 1     10    48003.42    56.858    0.0000000142
Model 2      9    48068.62    45.0311   0.000000911

Models after stratification
Model 1     10    44419.66    45.110    0.00000208
Model 2      9    44454.25    29.108    0.000621

Table 4.2.2: Cox models with and without stratification on type of mortgage along with their global test for proportional hazards assumption

agrees with our AIC value: for Model 1 the value is 0.94, for Model 2 it is 0.9. The closer the value is to 1, the better the fitted model. This value serves as a guide to how well the model will predict with respect to its likelihood, but this is not a case we consider here. The proportional hazards assumption is mostly violated due to the change over time of the risk factors. When the assumption holds, the p-value should be higher than 5%. When the sample is large, the probability value may satisfy a level of at least 1%. For the interest rate, in both cases it is very close to 0. Even if we seek more stability in the model, it is already obvious that the covariates should be presented as time-dependent risk factors. In addition, more stratification can be made. One covariate is the fixed period of the interest rate. Interaction between variables was also tested in order to see whether there is any correlation between risk factors and whether they can be modelled together. For both models presented earlier, the interaction of the interest rate with age, the socio-economic status of the mortgagor and the loan size showed no significant results. The lowest p-value (0.296) was given by the interaction between the principal amount and the interest rate, in Model 1. The p-values for the interaction with age and financial status in Model 2 are 0.459 and 0.510 respectively. The interaction between age and interest rate in the first model is again non-significant, with a p-value of 0.456. An interaction is usually present when the effect of one risk factor depends on the value of another. In our case, given the information in the data, such an interaction might not be captured. One example is the age and the socio-economic status, which in many cases might have nothing to do with the interest rate: a mortgagor pays a certain interest rate based on the type of contract he or she chose at the beginning. In other cases the interactions between covariates might be captured in the data by other risk factors. So far, from an interpretation point of view, the strategy was based on using categorized covariates. Further on, the covariates will be continuous. The model I have reached is found in Table 4.2.3:

Covariate                β        HR      se(β)   z        p-value   ρ        χ2      p-value
Age of mortgagor         -0.010   0.989   0.002   -4.682   0.000      0.033   4.116   0.042
Socio-economic status     0.077   1.080   0.020    3.838   0.000     -0.002   0.007   0.931
Medium Rate               0.262   1.300   0.021   12.409   0.000     -0.014   0.705   0.400
Global                                                                        4.693   0.195

AIC = 34822.47
LR test = 184.6, 3 df, p = 0
Wald = 184.6, 3 df, p = 0
Score test = 184.7, 3 df, p = 0

Table 4.2.3: Cox model with stratification on type of mortgage and fixed interest rate along with test for proportional hazards assumption


Figure 4.2.1: Smoothed scaled Schoenfeld residuals for the semiparametric model with fixed-time covariates


$$ H(t|X^{*}, X_1) = H_0(t)\exp(\beta^{*}X^{*})\exp[f(X_1)] \qquad (4.2.2) $$

The function f is found by estimating the semi-parametric model; the Martingale residuals are then obtained and plotted against the value of the covariate $X_1$. In this way the Martingale residuals can be interpreted as the excess of prepayments present in the data and not captured by the model. The smooth curve indicates that sudden departures from zero do not take place and that the linear behaviour is respected.

Figure 4.2.2: Martingale residuals


the model. Figure 4.2.3 shows a plot of the residuals against the cumulative hazard rate.

Figure 4.2.3: Cox-Snell Residuals for assessing the fit of the model

The estimated cumulative hazard rate plotted against the residuals should follow, without deviating too much, a straight line with slope 1 starting from the origin. The fit of the model seems very good and, as a consequence, I present the results in Table 4.2.4.

Fixed period interest rate    Contracts   Prepaid   Median   L 95%   U 95%

Savings mortgage
until 1 year                         90         5        -       -       -
1 to 5 years                        312       142       62      53      72
5 to 7 years                        538       230       84      74      87
7 to 10 years                       967       316      124     108
10 to 15 years                     1489       580      139     118     161
Over 15 years                      1604       378        -     194       -

Redemption free mortgage
until 1 year                        940       151       59      52      71
1 to 5 years                        868       312       52      49      61
5 to 7 years                        273       115       66      57      83
7 to 10 years                      1633       494       96      87     115
10 to 15 years                      685       210      124     105     169
Over 15 years                       601       140      178     158     257

Table 4.2.4: Prepayment and median


Interpreting the results in Table 4.2.3, one can see that a 1 percentage point increase in the financial possibilities, measured by the socio-economic status of the zip-code area, leads to an 8% increase in the prepayment rate. The age of the mortgagor, at this point, does not seem very relevant for the prepayment rate: an increase in age lowers the prepayment rate by 1.1%. The medium interest rate has the most significant impact. A 1 percentage point increase in the medium interest rate leads to an increase of the hazard rate by 30%. The estimates of survival and the smoothed prepayment rates for both products are shown in Figures 4.2.4 and 4.2.5.


Figure 4.2.5: Survival and hazard rates for redemption-free mortgage loans and the contracts with fixed-period interest rate


are smoothed with the use of natural splines, also known as restricted cubic splines. The piecewise cubic polynomial involves pieces joined at unique observed values of the hazard rates. From a technical point of view, the functions used in R, the software I used to create the smoothed functions, include predict and sm.spline. The choice of the curve is made with respect to a first-order derivative and the number of degrees of freedom. Examples of the use of the smoothing functions may be found in Hastie and Tibshirani (1990) and Harrell Jr. (2001).

4.3 The Cox model with time-dependent covariates

One way of testing for time-dependency was to look at the proportional hazards assumption. When time-dependent risk factors are present, the assumption no longer holds. In the previous models the assumption did not hold, except when continuous factors were combined with a lot of stratification. The interest rate showed significant changes over time. Considering time-dependent covariates, the refinancing incentive may be used. A time-fixed variable keeps the same value for a given contract over time, while a time-dependent variable may take different values over time. The refinancing incentive cannot be captured as a time-independent risk factor because of the change in the interest rate and the fixed-rate period. To follow the idea of comparing the relative risk within categories of risk factors, I use a model of similar form to that in Table 4.2.3, except that the age of the mortgagor and the socio-economic status of the zip-code area are now discretized. Table 4.3.1 presents the risk factors and their significance level in the model.

Risk factors             β        HR      se(β)   z        Pr(>z)

Savings mortgage
Refinancing incentive     0.000   1.000   0.003    0.045   0.964
Age 30-40                -0.317   0.729   0.064   -4.985   0.000
Age 40-48                -0.395   0.674   0.067   -5.876   0.000
Age 48-90                -0.309   0.734   0.080   -3.869   0.000
Medium status             0.135   1.145   0.062    2.176   0.030
Low status                0.198   1.219   0.062    3.204   0.001

Redemption free mortgage
Refinancing incentive     0.037   1.038   0.004   10.169   0.000
Age 30-40                -0.199   0.820   0.087   -2.296   0.022
Age 40-48                -0.223   0.800   0.084   -2.638   0.008
Age 48-90                -0.183   0.832   0.076   -2.422   0.015
Medium status             0.066   1.068   0.068    0.970   0.332
Low status                0.105   1.111   0.068    1.557   0.119


In our case we did not capture the presence of time-independent covariates. Considering now equation 3.3.2, the hazard at time t always depends on the value of the risk factor $X_i(t)$. At the same time there is one coefficient that characterises the risk factor $X_i(t)$. The extended data set is large in comparison with the previous one. Graphical tests to gauge the goodness-of-fit of the model are uninformative and may be rather subjective when such a large amount of data is involved. Interpreting the results in Table 4.3.1, in the case of the savings mortgages the refinancing incentive is not significant. Its hazard ratio may reflect the type of the contract, as these loans involve separate savings accounts and the regular payments are set so that the loan can be fully repaid at the end of the term. On the other hand, the refinancing incentive is significant for the interest-only contracts: a 1 percentage point increase in the refinancing incentive increases the prepayment by 3.8%. The socio-economic status of the zip-code area shows a higher prepayment rate in the areas with a low and medium status than for the houses situated in the high status area. For the redemption-free contracts this covariate is insignificant in the model, and the effect is smaller in comparison with the other type of contract. The age of the mortgagor, with respect to the youngest category, slows down the prepayment in all its quartiles. It might be interesting to see how the median duration changed. This is presented in Table 4.3.2.

Product                          Events   Median   Lower 95% CL   Upper 95% CL
The Savings Mortgage               1615      116            109            120
The Redemption-free Mortgage       1356      106             97            118

Table 4.3.2: Predicted median duration for the time-dependent Cox model


Figure 4.3.1: Survival probabilities for both product types including time-dependent risk factors


4.4 Accelerated-Failure Time model

The accelerated-failure time model, in terms of interpretation, studies the effect of the covariates on the survival time. The starting point is the analysis of Harrell Jr. (2001). The first step in estimating a parametric model is to choose the right distribution. One strategy is to plot the cumulative hazard function of one covariate against the logarithm of time. If the plotted lines are parallel to each other, they indicate a good fit of the distribution.

Figure 4.4.1: Comparison of the distributional assumptions for the refinancing incentive in case of Savings mortgage product

In Figure 4.4.1 it is obvious that the nonparametric approach is the last that has to be considered as the lines when the Weibull distribution is considered are almost everywhere parallel with few exceptions. A good parallelism can be seen in the logistic approach but this also corresponds to the case of the Gaussian distribution. For the redemption-free mortgages


the parallel lines are shown in Figure 4.4.2 again with a more stringent view in the Weibull model.

Figure 4.4.2: Comparison of the distributional assumption for the refinancing incentive in case of Redemption-free mortgage product


Model            Log likelihood    AIC

Savings mortgage
Exponential      -16711.89         33437.77
Weibull          -16254.58         32525.15
Log-Normal       -16515.92         33047.84
Log-Logistic     -16603.78         33223.55

Redemption free mortgage
Exponential      -13802.76         27619.53
Weibull          -13796.84         27609.67
Log-Normal       -14722.66         29461.31
Log-Logistic     -14836.19         29688.38

Table 4.4.1: Comparison of AIC values for different distributional assumptions of AFT model

The AIC values for the different distributions in Table 4.4.1 show that the Weibull model fits the data best. For the redemption-free mortgages the Weibull distribution is close to the exponential one, where the scale is set to 1 and the log-likelihood value changes only a little. I consider the Weibull formulation superior for both products. It is worth looking first at how the time ratio between covariates influences prepayment. In Table 4.4.2 both models are presented and show very interesting results regarding the "contraction" or "stretching" of survival time.

Risk factors             β        Std. Error   TR[exp(β)]   P-value

Savings mortgage
Refinancing incentive     0.011   0.0015       1.13         0.000
Age 30-40                 0.228   0.0321       1.26         0.000
Age 40-48                 0.373   0.0337       1.45         0.000
Age 48-90                 0.308   0.0401       1.36         0.000
Medium status            -0.061   0.0313       0.94         0.051
Low status               -0.117   0.0312       0.89         0.000

Redemption-free mortgage
Refinancing incentive    -0.028   0.003        0.82         0.000
Age 30-40                 0.045   0.080        1.00         0.573
Age 40-48                 0.067   0.078        1.05         0.389
Age 48-90                -0.003   0.070        1.07         0.957
Medium status            -0.045   0.064        1.02         0.472
Low status               -0.015   0.063        0.97         0.807

Table 4.4.2: The estimated AFT models by product type


expressed also as a time ratio and shows how the survival time is "stretched" or "contracted". One example, provided by Allison (1995), illustrates within the aforementioned formulation how dogs age faster than humans do. If 1 year for a dog is equivalent to 7 years for a human, the acceleration factor is ϕ = 7. Looking at the mathematical formulation of the model, S0(t) represents the survival probability of a human, while S(t|X) is the survival


Figure 4.4.3: The standardised residuals from the Weibull model plotted along with the survival distribution of the covariates - Savings and Redemption-free mortgages

In Figure 4.4.3, for both loan types, the lines along the residuals represent the levels of the covariates. The estimation of survival under both Weibull models is shown graphically in the next figure.


Harrell Jr. (2001) presents the graphical method of estimating the survival times as another goodness-of-fit test. The estimated survival function is plotted along with the non-parametric survival. If they do not deviate too much from each other, the model fits the data well. In the case of the Redemption-free contracts the model seems to underestimate the survival probabilities and tends to deviate a bit. After inspecting the model, one might be interested in the median duration of a portfolio. The Savings mortgage is a shorter contract, while the redemption-free mortgage is slower in terms of redeeming the principal amount. In the case of AFT models with Weibull specification the median duration is computed by $\hat{T}_{0.5}|X = \exp[X\hat{\beta} + \hat{\sigma}\psi^{-1}(0.5)]$. Table 4.4.3 depicts the median duration as estimated by the models, both including the same risk factors.

Product                          Median duration (months)
The Saving mortgage              115
The Redemption free mortgage     156

Table 4.4.3: Median duration for the Weibull models
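The median formula for the Weibull AFT model, and its link to the time ratios of Table 4.4.2, can be sketched in Python. The sketch assumes that ψ is the survival function of the standard extreme value distribution, ψ(u) = exp(−exp(u)), so that ψ⁻¹(0.5) = log(−log(0.5)). The linear predictor and scale used below are hypothetical values, not estimates from the thesis.

```python
import math

def weibull_aft_median(linear_predictor, sigma):
    """Median survival time under a Weibull AFT model.

    Assumes psi(u) = exp(-exp(u)), hence psi^{-1}(0.5) = log(-log(0.5)).
    """
    return math.exp(linear_predictor + sigma * math.log(-math.log(0.5)))

# Hypothetical baseline: linear predictor log(100) months, scale sigma = 1
base = weibull_aft_median(math.log(100.0), 1.0)

# Adding a coefficient of 0.308 (Age 48-90, savings mortgage, Table 4.4.2)
shifted = weibull_aft_median(math.log(100.0) + 0.308, 1.0)
ratio = shifted / base
```

The ratio of the two medians equals exp(0.308), about 1.36, which is the time ratio reported for that age group: the coefficient stretches the median duration by that factor whatever the baseline is.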


4.5 Summary


          Savings Mortgage                        Redemption-Free Mortgage
Months    AFT        Cox       Non-Parametric     AFT        Cox        Non-Parametric
12        0.992      0.984     0.986              0.969      0.958      0.963
24        0.969      0.937     0.940              0.930      0.888      0.893
36        0.932      0.870     0.871              0.889      0.804      0.811
48        0.884      0.792     0.792              0.847      0.728      0.731
60        0.825      0.720     0.720              0.804      0.664      0.661
72        0.759      0.656     0.652              0.762      0.605      0.589
84        0.688      0.604     0.599              0.721      0.556      0.533
96        0.615      0.554     0.548              0.680      0.523      0.485
108       0.541      0.520     0.514              0.641      0.499      0.455
120       0.469      0.486     0.481              0.604      0.470      0.422
132       0.401      0.457     0.453              0.567      0.448      0.392
AIC       32525.15   23660.3   -                  27609.67   19015.77   -
Median    115        121       114                156        106        94

Table 4.5.1: Survival probabilities, AIC and Median values estimated by the Non-parametric, AFT and Cox models


Figure 4.5.1: The Cox, AFT and Non-Parametric estimated survival probabilities for both products


5

Conclusions

Final remarks always come with more opportunities for extending the models. The purpose of this thesis was to compare two different methods of investigating the early prepayment of mortgages, namely the Cox model and the Accelerated-Failure Time model. To address this purpose, three sub-questions were formulated and treated over 5 chapters. Chapter 1 included a brief introduction to the thesis and the description of the research questions. Chapter 2 examined the mortgage market in the Netherlands, the literature and the data. While Chapter 3 discussed the mathematical formulation of the models, Chapter 4 applied these formulations empirically. In this regard, the study was based on a large number of contracts, with a set of well-defined risk factors.

The first sub-question was developed in Chapter 4. By using the non-parametric estimates we could see how the categories within the risk factors affected the survival. Subsequently, the distinction between the univariable and multivariable analysis had been highlighted. In order to capture and estimate how the behaviour of mortgagor may affect the prepayment rate, all risk factors had been elaborated. The significance within the model of the covariates was tested. Using a step function, the covariates were excluded and added within the model by comparing the AIC value of each constructed model. The models with the lowest AIC value were preferred. In a similar vein, the calculation of the AIC values on χ2 scale helped us quantify the overfitting present in the model by estimating a shrinkage factor. Stratification on the product type and fixed-interest rate period was considered when the covariates were time fixed.


Four different distributions were calculated. The standardised residuals were plotted for both products along with the survival distribution of the covariates. The overall goodness-of-fit of the selected Weibull model was checked, as in Harrell Jr. (2001), by plotting the estimated survival function along with the non-parametric survival curves.

Sub-question 3 was addressed through the interpretation of the estimated results. The Cox and AFT models have different interpretations, but similar properties. The effect size for the Cox model is measured by the hazard ratio. For the AFT model, the interpretation is given by an acceleration factor, which decelerates or accelerates the survival. In Table 4.3.1, where the Cox model was presented for the savings mortgages, belonging to the age category of 48 to 90 years, relative to the younger group between 18 and 30 years old, decreases the hazard by 26%. In the same context, the AFT model prolonged the survival time of a contract by an acceleration factor of 1.36. The important factor on both sides was the socio-economic status of the zip code area, while the refinancing incentive was insignificant for this type of product. For redemption-free contracts, we confirm the importance of the refinancing incentive among mortgagors. A 1 percentage point increase in this covariate increases the prepayment rate by 3.8% when the Cox model is estimated. For the AFT model, this shortened the survival time of the contract by an acceleration factor of 0.82. The summary in Chapter 4, part 5, included the interpretation of the models and showed how they describe the prepayment in terms of survival probabilities and estimated median duration.


may ignore it. One big disadvantage of the semi-parametric model is the proportional hazards assumption. However, using time-dependent covariates in a model has two advantages. One is the incorporation of the refinancing incentive, and the other is that the proportional hazards assumption is no longer a matter of concern. With regard to model diagnostics, the AFT model has less developed techniques than the Cox model. Moreover, when the distribution is not well specified for the AFT model, it may lead to wrong estimates. So far, the Cox model is more widely used in the literature. From this point of view, but also statistically, according to the estimates, the Cox model is preferred. A model that treats mortgages at loan level seems to be very informative and allows the mortgagee to study the mortgagor’s behaviour in more depth. Even if the results were very informative, some of the risk factors were not significant within the models for each product’s specification. For such problems more modelling is required. The next section presents further opportunities to develop and improve the modelling of early prepayment of mortgages.

5.1 Alternative extensions and improvements in modelling early prepayment

Regression Assumptions. In this thesis, linearity was assumed within the models. When modelling continuous variables, relaxing the linearity assumption might be very helpful. One way to do this is to use spline functions or spline transformations of the covariates. A restricted cubic spline function may be a very useful approach. In the case of the Cox model with two covariates, its form can be written as:

$$h(t|X) = h_0(t)\exp\bigl(\beta_1 X_1 + f(X_2)\bigr) \qquad (5.1.1)$$

Here a four-knot spline function $f(X_2)$ would contain two spline components $X_2'$ and $X_2''$ and may be written as $f(X_2) = \beta_2 X_2 + \beta_3 X_2' + \beta_4 X_2''$. In this way linearity can be tested, since $f$ is linear in $X_2$ when $\beta_3 = \beta_4 = 0$. Harrell Jr. (2001) provides a good theoretical background on spline functions and on how linearity can be tested. The same applies to the AFT model.
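The construction of the two spline components can be sketched as follows. This is a minimal illustration that assumes the truncated-power form of the restricted cubic spline described by Harrell, including his normalisation by the squared knot range; the function names and the knot positions in the usage lines are hypothetical and are not taken from the thesis.

```python
def restricted_cubic_components(x, knots):
    """Return the k-2 nonlinear components of a restricted cubic spline at x."""
    t = list(knots)
    k = len(t)
    if k < 3:
        raise ValueError("at least three knots are needed")
    gap = t[-1] - t[-2]          # distance between the last two knots
    scale = (t[-1] - t[0]) ** 2  # normalisation that keeps components on the scale of x

    def pos3(z):
        """Truncated cubic (z)_+^3."""
        return max(z, 0.0) ** 3

    components = []
    for j in range(k - 2):
        value = (pos3(x - t[j])
                 - pos3(x - t[-2]) * (t[-1] - t[j]) / gap
                 + pos3(x - t[-1]) * (t[-2] - t[j]) / gap)
        components.append(value / scale)
    return components

def spline_term(x, knots, beta2, beta3, beta4):
    """f(X2) = beta2*X2 + beta3*X2' + beta4*X2'' for a four-knot spline."""
    x_prime, x_double_prime = restricted_cubic_components(x, knots)
    return beta2 * x + beta3 * x_prime + beta4 * x_double_prime

# Hypothetical knots for a covariate measured on an arbitrary scale.
knots = [1.0, 2.0, 3.0, 4.0]
print(restricted_cubic_components(2.5, knots))
print(spline_term(2.5, knots, beta2=0.3, beta3=0.0, beta4=0.0))  # purely linear case
```

Both components are zero below the first knot and linear beyond the last knot, which is what makes the spline "restricted". Setting `beta3` and `beta4` to zero recovers the linear specification, so a joint test of these two coefficients is a test of linearity.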

Unobserved Heterogeneity. The differences between contracts are captured by the vector of risk factors in each model. Some covariates that were not observed might nevertheless have an effect on survival. Such unobserved contract-level effects are referred to as "frailties". One representation of the model may be: $h(t|X) = h_0(t)\exp(\beta'$

frailties and is independent and identically distributed with some prespecified distribution. Another way to write the model is $h(t|X) = h_0(t)\,u_i \exp(\beta' X_i)$. As an example, the $u_i$'s can be an i.i.d. sample of gamma random variables with density function:

$$g(u) = \frac{u^{1/\theta - 1}\exp(-u/\theta)}{\Gamma[1/\theta]\,\theta^{1/\theta}} \qquad (5.1.2)$$

In this parameterization the frailty distribution has mean 1 and variance $\theta$. A large value of $\theta$ indicates a high degree of heterogeneity among contracts. The models used so far assume that the survival times of contracts are independent of each other. When this assumption is violated, a frailty approach will lead to a better analysis.
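The role of $\theta$ can be checked numerically. The sketch below draws frailties from the gamma density (5.1.2), which corresponds to shape $1/\theta$ and scale $\theta$, and applies one of them multiplicatively to a hazard. The function names, the seed and the numerical values are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics

def sample_gamma_frailties(theta, n, seed=0):
    """Draw n i.i.d. frailties u_i from the gamma density (5.1.2)."""
    rng = random.Random(seed)
    # Shape 1/theta and scale theta give mean 1 and variance theta.
    return [rng.gammavariate(1.0 / theta, theta) for _ in range(n)]

def individual_hazard(baseline_hazard, u, linear_predictor):
    """h(t|X) = h0(t) * u_i * exp(beta'X_i) at a fixed time t."""
    return baseline_hazard * u * math.exp(linear_predictor)

# Hypothetical heterogeneity level theta = 0.5.
frailties = sample_gamma_frailties(theta=0.5, n=50000, seed=1)
print(statistics.mean(frailties))       # close to 1
print(statistics.pvariance(frailties))  # close to theta

# A contract with frailty above 1 prepays faster than its covariates suggest.
print(individual_hazard(baseline_hazard=0.02, u=1.5, linear_predictor=0.0))
```

Because the frailties have mean 1, they leave the average hazard level unchanged and only spread the contracts around it; the spread grows with $\theta$.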

Modelling the interest rate. Modelling and forecasting the yield curve related to the market interest rate may provide a good forecast of the refinancing incentive. Considering that a mortgage is a high-value financial contract settled for a long period, forecasting market interest rates might be very helpful. The model I suggest is the Nelson-Siegel model. Diebold and Li (2003) discuss its aspects and interpretation in more detail. The corresponding yield curve is:

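$$y_t(\tau) = \beta_{1t} + \beta_{2t}\left(\frac{1 - e^{-\lambda_t \tau}}{\lambda_t \tau}\right) + \beta_{3t}\left(\frac{1 - e^{-\lambda_t \tau}}{\lambda_t \tau} - e^{-\lambda_t \tau}\right) \qquad (5.1.3)$$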

Here $\tau$ denotes the maturity and $\lambda_t$ governs the rate of exponential decay. The factors $\beta_{1t}$, $\beta_{2t}$ and $\beta_{3t}$ are known as latent dynamic factors. The first one, interpreted as the level, indicates an equal increase in all yields. The second one represents the slope; when it increases, it has a stronger effect on short-term yields than on long-term yields. The third one, related to the curvature of the yield curve, puts more weight on medium-term yields. Equation 2.3.1 includes the market interest rate, and if the portfolio is up to date, it would be interesting to adopt such a model in order to evaluate how the refinancing incentive can affect prepayment.
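The interpretation of the three factors follows directly from their loadings, which can be evaluated with a short sketch. The function names and factor values below are hypothetical; the decay value 0.0609 is the one commonly attributed to Diebold and Li for maturities measured in months and is used here only as an assumption.

```python
import math

def nelson_siegel_loadings(tau, lam):
    """Loadings on the level, slope and curvature factors at maturity tau."""
    x = lam * tau
    slope = (1.0 - math.exp(-x)) / x
    curvature = slope - math.exp(-x)
    return 1.0, slope, curvature

def nelson_siegel_yield(tau, beta1, beta2, beta3, lam):
    """Yield at maturity tau for given factor values and decay parameter."""
    level_l, slope_l, curvature_l = nelson_siegel_loadings(tau, lam)
    return beta1 * level_l + beta2 * slope_l + beta3 * curvature_l

# Hypothetical factor values; maturities in months.
for months in (3, 12, 60, 120):
    y = nelson_siegel_yield(months, beta1=5.0, beta2=-2.0, beta3=1.0, lam=0.0609)
    print(months, round(y, 3))
```

The level loading is constant at 1, the slope loading falls from 1 towards 0 as maturity grows, and the curvature loading starts near 0, peaks at medium maturities and decays again. This is why the three factors act on all, short-term and medium-term yields respectively.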

Duration of a contract. From the perspective of a mortgagee, the duration of a contract and its interest rate can impose high risk on both assets and liabilities. One important tool for evaluating the duration of a financial instrument is the Macaulay duration: $D = \frac{1}{P}\sum_{t=1}^{T} t\,\frac{CF_t}{(1+r)^t}$. Here $t$ represents the maturity of the individual cash flows, $CF_t$ is the
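The Macaulay duration formula can be illustrated with a small computation. The sketch assumes the usual convention that $P$ is the price, i.e. the sum of the discounted cash flows, and that $r$ is the yield per period; the function names and the cash flows are hypothetical and do not come from the mortgage portfolio.

```python
def present_value(cash_flows, r):
    """Price P: cash flows CF_t for t = 1..T discounted at the per-period rate r."""
    return sum(cf / (1.0 + r) ** t for t, cf in enumerate(cash_flows, start=1))

def macaulay_duration(cash_flows, r):
    """Present-value-weighted average time to the cash flows."""
    price = present_value(cash_flows, r)
    weighted = sum(t * cf / (1.0 + r) ** t
                   for t, cf in enumerate(cash_flows, start=1))
    return weighted / price

# Hypothetical three-period instrument paying 5 per period plus 100 at maturity.
coupon_flows = [5.0, 5.0, 105.0]
print(present_value(coupon_flows, 0.05))
print(macaulay_duration(coupon_flows, 0.05))

# With a single payment at maturity the duration equals the maturity.
print(macaulay_duration([0.0, 0.0, 100.0], 0.05))
```

Intermediate payments pull the duration below the contractual maturity, which is the mechanism through which early prepayment shortens the effective duration of a mortgage.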


References

[1] Bagdonavicius, V., Nikulin, M. (2002) Accelerated Life Models Modeling and Statistical Analysis Chapman and Hall/CRC, London

[2] Bradburn, M.J., Clark et al. (2003) Survival Analysis Part I, II, III and IV: Multivariate data analysis - an introduction to concepts and methods British Journal of Cancer

[3] Brennan, M. and Schwartz, E.S. (1985) Determinants of GNMA mortgage prices AREUEA Journal 13, 209

[4] Bussel, A.P.J.M. (1998) Valuation and interest rate risk of mortgages in The Netherlands Ph. D. thesis, University of Maastricht

[5] Charlier, E. and Bussel, A. van (2001) Prepayment behaviour of Dutch mortgages: an empirical analysis Real Estate Economics

[6] Cziraky, D. PhD Forecasting the Yield Curve with S-Plus Wilmott magazine

[7] Diebold, F. and Li, C. (2003) Forecasting the Term Structure of Government Bond Yields University of California, Riverside and University of Pennsylvania and NBER

[8] Diez, D. Survival Analysis in R

[9] Dunn, K. and McConnell, J. (1981) Valuation of GNMA mortgage-backed securities The Journal of Finance 36, 599-616

[10] Fox, J. (2002) Cox Proportional-Hazards Regression for Survival Data (Appendix to An R and S-PLUS Companion to Applied Regression)

[11] Golub, W.B. and Pohlman, L. (1994) Mortgage prepayments and an analysis of the Wharton prepayment model Interfaces 24, 282-296

[12] Green, J. and Shoven J.B. (1986) The effects of interest rates on mortgage prepayments Journal of Money, Credit, and Banking 18, 41-59


[14] Harrell, Jr., F.E. (2001) Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis Vanderbilt University School of Medicine, Springer, New York

[15] Hayre, L. (2003) Prepayment modeling and valuation of Dutch mortgages The Journal of Fixed Income 12, 25-47

[16] Hastie, T. J. and Tibshirani, R. J. (1990) Generalized Additive Models Chapman and Hall

[17] Hull, J. (2009) Options, futures and other derivatives, 7th edition Upper Saddle River, New-Jersey

[18] Jenkins, S. P., (2005) Survival Analysis

[19] Kalbfleisch, J.D. and Prentice, R. (2002) The Statistical Analysis of Failure Time Data - Second Edition Wiley-Interscience, New Jersey

[20] Kang, P. and Zenios, A.S., (1992) Complete prepayment models for mortgage-backed securities Management Science 38, 1665-1685

[21] Klein, J.P., Moeschberger, M. (2003) Survival Analysis, Techniques for Censored and Truncated Data - Second edition Springer, New York

[22] Kleinbaum, D. and Klein, M. (2005) Survival Analysis, A self learning text - Second edition Springer Science+Business Media, New York

[23] Koning, R., Sterken, E. and Jacobs, J., (2007) Modelling prepayment risk of mortgage loans

[24] Lane, W.R. , Looney, S.W. et al (1986) An application of the Cox Proportional Hazards model to Bank Failure Journal of Banking and Finance 10/1986, 511-531

[25] Resti, A. and Sironi, A. (2007) Risk management and shareholders' value in banking. From Risk Measurement Models to Capital Allocation Policies Wiley Finance, England

[26] Richard, S.F. and Roll, R. (1989) Prepayments on Fixed-Rate Mortgage-Backed
