• No results found

Modelling expectations of death in the pension insured population

N/A
N/A
Protected

Academic year: 2021

Share "Modelling expectations of death in the pension insured population"

Copied!
96
0
0

Bezig met laden.... (Bekijk nu de volledige tekst)

Hele tekst

(1)

Modelling expectations of death

in the pension insured population

P.A.Klaren s1376039

Master Thesis, March 2008

Econometrics and Operations Research Faculty of Economics and Business Rijksuniversiteit Groningen

Supervision:

(2)
(3)

Contents

1 Introduction 1

1.1 Key Notions . . . 1

1.2 Risk of longevity and risk of short life . . . 3

1.3 Research objectives and organization of the thesis . . . 7

1.3.1 Research question . . . 8 1.4 Remarks . . . 9 2 Data description 10 2.1 Male participants . . . 10 2.2 Female participants . . . 10 2.3 Male survivors . . . 11 2.4 Female survivors . . . 12 3 Mortality Rates 13 3.1 Introduction . . . 13

3.2 Estimating the expectation of death . . . 13

3.3 Confidence intervals for the expectation of death . . . 15

3.4 Application to the data . . . 16

4 Generalized Linear Models 19 4.1 Introduction . . . 19

4.2 Normal Linear Model . . . 19

4.3 Weighted least squares . . . 20

4.3.1 Properties of distributions in the exponential family . . . 22

4.3.2 Maximum likelihood estimation . . . 23

4.4 Binomial distribution . . . 26

4.5 Application to the data . . . 29

5 B-splines 31 5.1 Introduction . . . 31

5.2 Polynomial interpolation . . . 31

5.3 Piecewise polynomial interpolation . . . 33

(4)

5.5 Application to the data . . . 37

6 P-splines 40 6.1 Introduction . . . 40

6.2 P-splines . . . 40

6.3 Penalized likelihood and the choice of the penalty parameter . . . 43

6.4 Application to the data . . . 45

7 Measuring relative expectations of death 48 7.1 Introduction . . . 48

7.2 Testing the difference in expectations of death between various years . . . 49

7.3 Annual Exposure Mortality Ratio . . . 50

7.4 Alternative ratios . . . 51

7.5 Application to the data . . . 52

8 Conclusions and recommendations 54 A Appendix to chapter 1 56 A.1 Derivation of the expected number of survivors’ pension payments . . . 56

B Appendix to chapter 3 58 B.1 Derivation of formula (3.3) . . . 58

B.2 Derivation of equality (3.7) . . . 58

B.3 Derivation of the limits of confidence interval (3.10) . . . 59

B.4 The figures showing the results of applying equalities (3.3), (3.5) and (3.10) 60 C Appendix to chapter 4 66 C.1 Derivation of formula (4.5) . . . 66

C.2 Derivation of equalities (4.9) and (4.10) . . . 67

C.3 Derivation of equality (4.12) . . . 68

C.4 Derivation of equality (4.14) . . . 69

C.5 Derivation of equality (4.16) . . . 69

C.6 Derivation of equality (4.33) up to (4.35) . . . 70

C.7 The figures belonging to the fitted model (4.40) . . . 71

D Appendix to chapter 5 75 D.1 Derivation of the uniqueness of the interpolating polynomial . . . 75

D.2 Properties of B-splines . . . 76

D.3 The figures belonging to the fitted model (5.15) . . . 77

E Appendix to chapter 6 80

(5)

Chapter 1

Introduction

In this chapter a short introduction into pension insurance will be given, as far as this con-cerns the problem handled in this work. First a number of key notions will be explained. After that the problem handled, the research objectives and organization of this thesis will be set out.

1.1

Key Notions

• Someones age during a calender year is considered to be the number of years he or she has lived at the end of that calender year. So if a person was born in 1984, his or her age is considered to be 24 during the entire year 2008.

• Pension is the collective noun for periodical payments that replace former income out of labour in case of old-age, decease or disability. Common features of pension are that the payments end when the person receiving the pension deceases and that the accrual takes place in connection with labour. This last feature distinquishes pension from social insurance and annuities.

• A pension scheme is the total of agreements that an employer and an employee have concerning the pension of the employees. In a pension scheme, the definition of age can differ from the one used here.

• A pension claim is a right to future pension payments on the basis of a pension scheme. • A pension premium is a periodical payment to the company administering the pension plan in order to finance the pension claims. An example of how to determine the height of a pension premium will be given later in this introduction. A pension premium is called a net premium if it is equal to the expected value of the pension payments. If there is just one period of pension accrual, the premium is called a single premium.

• An early leaver is an employee who leaves his employer for any reason other than de-cease before he reaches the pensionable age as describen by the pension scheme.

(6)

participants. In this work, only people who obtain pension claims on the basis of their own labour are considered to be participants.

• A pension fund is a fund that collects money resulting from a pension scheme in order to fullfill the pension claims that result from the pension scheme.

• A mortality rate is considered to be the observed number of deaths over a period of a calender year in a closed group of people divided by the total number of people present in the group at the beginning of that calender year.

• An expectation of death is an expected mortality rate and will be denoted by qx for

men and qy for women. In this notation, x and y represent the age of the persons involved.

• A mortality table is a statistical overview of an expectation of death of a group of people, for instance men in the Dutch population. The most recent Dutch mortality ta-bles are the “Gehele Bevolking Mannnen (GBM)” and the “Gehele Bevolking Vrouwen (GBV)”, established over the period 2000-2005. These mortality tables have been made by the “Actuarieel Genootschap (AG)”, that is why these tables are called “AG tables”. An example of a mortality table will be given later in this introduction.

• The reserve is the amount of money that, together with the future payments resulting from the pension scheme, should be present in a pension fund in order to fullfill the future pension claims resulting from the pension scheme.

• An old-age pension is a pension that is destined for the financial care of a participant who reaches the pensionable age for old-age pension as described by the pension scheme. • A partners’ pension is a pension that is intended for the financial care of the partner of a participant with whom the participant shares a household. A partners’ pension is paid from the moment the participant deceases. A partner of a participant is not a participant by him- or herself.

• An orphans’ pension is a pension that is intended for the financial care of a child of a participant. The orphans’ pension is paid from the moment the participant deceases and to the moment the child steps across the pensionable age as described for orphans’ pension by the pension scheme. A child of a participant is not a participant by him- or herself. • Survivors’ pension is the collective noun for orphans’ pension and partners’ pension. •The pension insured population is formed by participants together with survivors of participants.

• The risk of longevity is the possibility that a pension insured person, either partici-pant or survivor, may live longer than could be expected for on the basis of the handled mortality table. This risk is especially relevant in old-age pensions. The concept of risk of longevity will be further explained in the following section.

(7)

1.2

Risk of longevity and risk of short life

In table 1.1 an example of a mortality table is given. The figures in the table are collected from the “AG Prognosetafel” for the Dutch population in 2007, although some decimals have been left out for clarity. There are two things striking in this table:

1. The expectation of death increases (not strictly) by age.

2. The expectation of death is nowhere higher for women than it is for men at the same age.

The first characteristic is always apparant for ages over 25 and both sexes. The second characteristic is apparant in every Dutch mortality table.

Age qx qy 53 0.0040 0.0030 54 0.0050 0.0040 55 0.0050 0.0040 56 0.0060 0.0040 57 0.0060 0.0050 58 0.0070 0.0050 59 0.0080 0.0050 60 0.0090 0.0060 61 0.0100 0.0060 62 0.0110 0.0070 63 0.0120 0.0070 64 0.0130 0.0080 65 0.0150 0.0090

Table 1.1: An example of a mortality table: AG Prognosetafel 2007

In order to explain the concepts of risk of longevity and risk of short life, consider a group of a thousand men of age 55. They all receive a single old-age pension payment of 161,546 Euro on the first of january of the year that they turn 66. This means that they all have age 66 at that moment.

(8)

We do have to take into account the interest rate. The 10-year discount rate is equal to: 1

1.0410 = 0.6756

So that each of these men has to pay a net single premium of:

0.6756 ×916.30 × 161, 546

1000 = 100, 000 Euro

Now suppose that these thousand men all have a wife of age 53. We now consider a thousand couples. Suppose they all want a partners’ pension of one payment of 181,825 Euro for their wives in case they decease. The wife of a participant who deceases receives the payment. Suppose the payment takes place at the end of the year in which the man deceases.

This means that a wife will have to survive the year in which her husband deceases in order to receive the payment. Suppose they agree that payment will only occur if a participant dies before the first of january of the year that he turns 66, so for a period of ten years. Assume that the interest is again constant at 4% and that mortality table 1.1 is used. Also assume that the probability of death of a man and his wife are independent. What would the net premium for this partners’ pension be?

According to mortality table 1.1, the expected number of wives that receive this pension:

1000 × qx=55× (1 − qy=53) = 4.985 (1.1)

The expected number of payments at the end of the second year, based on mortality table 1.2, is given by the following equations:

1000 ×(1 − qx=55) × qx=56× (1 − qy=53) × (1 − qy=54) = 5.928

The equations above are true since payment at the end of the second year can only happen to wives who survive their first year together with their man. In the second year, the man has to decease and his wife has to survive in order to receive the payment. The calculations for the third to the tenth year will not be displayed here but can be found in the appendix of chapter 1. The total expected number of wives who receive the payment is equal to 81.41.

The net single premium that the men have to pay for this partners’ pension is equal to:

0.6756 ×81.41 × 181, 825

1000 = 10, 000 Euro

(9)

expected number of pay-outs differs a lot.

Also notice that the pension premiums above have been based on the expectations of death as described by mortality table 1.1. What would happen if these expectations of death do not match the mortality rates?

Suppose that mortality table 1.1 overestimates the mortality rates for men of each age by 20%, so that the mortality rates for these thousand men are 80% of the expectations of death in mortality table 1.1. Also assume that the expectations of death concerning the thousand wives are correct for each age. The correct mortality table would have looked like mortality table 1.2.

If the net pension premiums for the old-age pension would be calculated on the basis of

Age qx qy 53 0.0032 0.0030 54 0.0040 0.0040 55 0.0040 0.0040 56 0.0048 0.0040 57 0.0048 0.0050 58 0.0056 0.0050 59 0.0064 0.0050 60 0.0072 0.0060 61 0.0080 0.0060 62 0.0088 0.0070 63 0.0096 0.0070 64 0.0104 0.0080 65 0.0120 0.0090

Table 1.2: The correct expectations of death for the thousand couples.

the mortality table 1.2, the expected number of payments for the old-age pension would be given by the following equations:

1000

64

Y

x=55

(1 − qx) = 932.52

So the net single premium for the old-age pension should be:

0.6756 ×932.52 × 161, 546

1000 = 101, 770 Euro

This means that the net single premium should be 1.77% higher than it would be on the basis of mortality table 1.1. In total, the pension fund has a shortage of:

(10)

This difference shows that overestimating mortality rates can lead to a situation in which a pension fund does not receive enough premium to build its reserve. That is why this is called a risk of longevity.

What would the influence of overestimating mortality rates be on the premium for the partners’ pension?

The total expected number of payments based on the mortality table 1.2, is given 65.63. The calculation of the expected number of payments is the same as in formula 1.1 and so can be found in the appendix of chapter 1. The net single premium should be:

0.6756 × 65.63 × 18, 825

1000 = 8, 061 Euro

The equalities above show that calculations based on the correct mortality table lead to a pension premium that is 19.4% lower than the original pension premium, based on mor-tality table 1.1. In total, the pension fund a surplus of:

1000 × (10, 000 − 8, 061) = 1, 940, 000 Euro

One can see that if overestimating the mortality rates of the participants leads to a surplus on partners’ pension, underestimation will lead to a shortage. That is why the possibility of people living shorter than was expected is called a risk of short life in this case. So far, we have only considered a bias in the expectations of death concerning the partic-ipants. In case of a survivors’ pension, the expectations of death of the survivors matter as well. As one can see from the examples discussed here, the possibility of people living longer has two opposite effects on a pension fund. For an old-age pension, the effect is a shortage and for a partners’ pension, the effect is a surplus. What effect is dominant to a pension fund, depends on the composition of pensions with pension claims to match that a pension fund has.

In order to determine reserves, AG tables are most common to use. These tables are based on mortality rates of the Dutch population over the past few years. However, apply-ing these tables has a disadvantage: the construction of these tables is based on mortality rates of the entire Dutch population.

However, not everybody in the Dutch population is a participant. Since pension funds are primarily interested in the mortality rates of participants, it could be that their needs are not well served by the AG tables.

This would be the case if the AG tables systematically under- or overestimate the mortal-ity rates of participants. One of the questions to be answered in this work is whether the expectations of death of participants are well described by the AG tables. The following paragraph will explain that there are reasons to question the applicability of AG tables to pension premium calculations.

(11)

mortality rates. Employment itself is both a cause and a result of a better health. Much research has been done on the relationship between employment and expectations of death. Below I quote a few of the articles on this subject. I consider three reasons to assume that AG tables may overestimate the mortality rates of participants:

1. Employment is a cause of a better health. It is a cause since “full-time employment predicts a significantly slower decline in health for both sexes” and “the changes in health of persons part-time employed generally fall somewhere between those of persons employed full-time and those not employed”. [Ross and Mirowsky (1995)]

2. Employment is also a result of health since “initial health significantly improves the odds of full-time employment, adjusting for initial employment”. [Ross and Mirowsky (1995)] 3. Phelan et al. (2004) have shown that for more preventable causes of death, the mortality rates strongly depend on a persons’ socio-economic status and income. Their hypothesis was that people with a higher socio-economic status and a higher income have access to more resources that protect health. Their results support this hypothesis. The same con-clusion is drawn by Hadley and Osei (1982), who state that “their results tend to support the hypothesis that higher income is associated with lower mortality rates”.

Beside these three reasons, Houston (1959) has also found that mortality rates are signifi-cantly lower for groups of insured people than for the whole population.

These factors together give reason to think that AG tables may overestimate mortality rates of participants. Consequently, a risk of longevity is present. This is good reason to have a closer look at mortality rates of participants.

However, participants are not the only group of interest to pension funds. As explained before, the mortality rates of survivors are also relevant. This is why we will also have a closer look at the mortality rates of survivors. Together, participants and survivors will be named the pension insured population. Strictly this is not correct since people who have a latent survivors’ pension are also insured for their pension. However, since pension funds do not keep track of mortality rates of this particular group, they are left out of this work.

1.3

Research objectives and organization of the thesis

(12)

1.3.1

Research question

Our main goal is to see whether using AG tables causes a risk of longevity or short life for pension funds. Therefore we want to measure the relative expectation of death in the pension insured population. This way we can determine whether the expectations of death in the pension insured population are well represented by AG tables.

In order to answer our main research question, we will have to answer the sub questions below.

For the pension insured population we will have to determine our own expectations of death. Therefore we need a model to describe the exectations of death. Define the smoothness of a function f : IR 7→ IR on an interval (a,b) with a, b ∈ IR and a < b byRb

a{f (x )(2 )}2dx .

The smaller this integral is, the more smooth is the function f . Define simplicity of a function as requiring few parameters. We assume expectations of death can be represented by a simple and smooth function.

We will derive maximum likelihood estimators for the expectations of death. This way we will see how the estimated expectations of death in our data population proceed over the ages.

We want to have an upper and a lower confidence limit of the expectations of death in our data population. These limits can be used by pension funds for prudence rules. Therefore we will derive confidence intervals.

After this, we will start searching for an appropriate model for the expectations of death. The question to be answered is: how much smoothness should we impose on the expecta-tions of death in our data? We will consider mortality rates to be generated by a random variable. The variance in the outcomes of that random variable will decrease as the num-ber of observations increases. Therefore mortality rates in a large population will generally proceed smoother over the ages than in a small population. This is important since we have a have a relatively small population. Our population is relatively small since we only use the pension insured population of The Netherlands and make a subdivision into that population. Therefore it is possible that we need estimates that do not stick as strong to the observations as the estimates would in a large population. At the same time we want to avoid overfitting. Overfitting means that in estimating the expectations of death, the estimated values stick too close to the observations.

First we will apply a generalized linear model with only two parameters to the ex-pectations of death. For this first model, exex-pectations of death are fairly simple and smooth represented.

(13)

compared to B-splines. Smoothness however can be increased. At the end end of this chapter we will conclude which model is most suitable to fit our data.

After we have chosen the best model to fit our data, we will determine the relative expec-tations of death in the pension insured population. The question to be answered is in what way we can measure this the best. We will compare three different indices and choose the best for our data.

1.4

Remarks

• Forecasting expectations of death will not be a goal in this work. Requirements of pension funds for forecasts of expectations of death are hard to meet since forecasts up to 50 years ahead are required for determining pension premiums and reserves. Human mortality so far ahead depends on the impact of for instance medical advances, new infectious diseases and even disasters. [Currie et al. (2004)] Another argument to not choose for forecasting is our lack of historical data for the Dutch insured population.

(14)

Chapter 2

Data description

In this chapter, the data used in this work will be described. As explained in the intro-duction a subdivision into men and women, participants and survivors will be made. The data are collected from 12 Dutch pension funds, over 2005 and 2006. The data over 2005 are from 9 pension funds, the data over 2006 are from 12 pension funds.

The pension funds differ considerably in number of insured people, ranging from 71.441 to 1.240.685.

The data correspond to lives, not to claims. Therefore someone with more than one pen-sion claim will be counted for just one life.

The mortality rates in the pension insured population under study will not be displayed in this chapter. Later on when the methods will be applied, the mortality rates will be shown. For now, we will focus on the number of participants in the data set and their ages.

2.1

Male participants

The male participants are made up out of three groups: male employees, male early leavers and male participants receiving pension other then survivors’ pension.

In total there were 2.949.887 male participants at the beginning of 2005 and 3.281.582 at the beginning of 2006 included in this work. In figure 2.1 the subdivision over all ages has been displayed.

2.2

Female participants

The female participants are made up out of three groups: female employees, female early leavers and all female participants receiving pension other then survivors’ pension.

(15)

Figure 2.1: Respective the number of male participants in 2005 and 2006. has been displayed.

Figure 2.2: Respective the number of female participants in 2005 and 2006.

2.3

Male survivors

The male survivors are made up out of males receiving orphans’ pension or partners’ pension.

(16)

Figure 2.3: Respective the number of male survivors in 2005 and 2006.

2.4

Female survivors

The female survivors are made up out of all females receiving orphans’ pension or partners’ pension.

In total there were 195.850 female survivors at the beginning of 2005 and 256.935 at the beginning of 2006 included in this work. In figure 2.4 the subdivision over all has been displayed.

(17)

Chapter 3

Mortality Rates

3.1

Introduction

In this chapter the derivation of an estimator for the expectation of death will be consid-ered. This way we can see how smooth the expectations of death in our data population proceed over the ages. After that we will derive confidence intervals for the expectations of death in order to have upper and lower limits for the expectations of death. Confidence intervals can be useful in determining prudence rules for using the estimated expectations of death.

First we will have to introduce some notation. Consider a pension insured population in year t and all at age xi. Denote by:

• Rt

xi, i = 1 , .., N the number of people in the population at the beginning of year t at age

xi with R, xi, t , N ∈ IN and Ω ={1,..,R};

• At

xi the observed number of deceased at age xi in year t with xi, t ∈ IN and A ∈ Ω ;

• qt

xi the expectation of death at age xi in year t with q

t

xi ∈ [0, 1].

The expectation of death will be considered the probability of death for each individual. From now on, the age xi and the year t will be left out of this notation since they are

not relevant for the calculations in the context of this chapter. By that we mean that the calculations in this chapter are the same for every age and every year.

3.2

Estimating the expectation of death

Denote by K the random variable that represents the number of deaths that occur in a year in the population. So A is a realization of K . Assume the probability of death for the individuals to be independent from each other. Then K can be considered the sum of Bernoulli random variables Xi, i = 1, ..., R, each with probability q ∈ [0, 1]. Thus Xi has

expectation q and variance q(1 − q).

(18)

equality:

P (K = k) = R!

k!(R − k)!q

k(1 − q)R−k

The expectation of K can be deriven by the following equalities:

E[K] = E " R X i=1 Xi # = R X i=1 E[Xi] = Rq (3.1)

The variance of K can be deriven analogously since the Bernoulli trials are independent from each other

Var[K] = Var[ R X i=1 Xi] = R X i=1 Var[Xi] = Rq(1 − q) (3.2)

In practice, q is unknown. All we can observe are R and A. From this we can derive a maximum likelihood estimator q∗ ∈ [0, 1] for q.

When deriving a maximum likelihood estimator, the probability distribution (3.1) is re-garded as a measure L(q) of agreement between any nominated value of q and the obser-vations. Hence explaining the name likelihood. It is then quite natural to select the point with the highest likelihood if an estimate of q is required. [Azzalini (1996)]

The maximum likelihood estimator q∗ for q is given by the following equality:

q∗ = A

R (3.3)

The derivation of the maximum likelihood estimator q∗ can be found in the appendix of chapter 3. Since we can write K =PR

i =1Xi, A is a realization of PRi =1Xi. Therefore we

get the following equalities:

E[q∗] = E[ R X i=1 Xi/R] = E[ R X i=1 Xi]/R = q

Thus q∗ is an unbiased estimator of q. The variance of the maximum likelihood estimator q∗ is found by the following equalities:

Var[q∗] = Var[ R X i=1 Xi/R] = Var[ R X i=1 Xi]/R2 = q(1 − q)/R

Since we do not know the true value of q, we can substitute it by the maximum likelihood estimator q∗. The variance of q∗ can therefore be estimated by the following equality:

Var[q∗] ≈ q

(1 − q)

(19)

3.3

Confidence intervals for the expectation of death

In this section whenever we do not refer to a source we will follow the treatment of con-fidence intervals by Forfar et al. (1988). The estimator of the variance of q∗ in equation (3.4) above can be used to derive a confidence interval for q. Denote by zα that number

such that the area under the standard normal density function to the right of zα is α. If

we assume that the mortality rate is one value from a normal distribution for which the variance is given by equation (3.4), then the appropriate confidence interval for the mean of the distribution is given by:

 q ∗− z α " q∗(1 − q∗) R #1/2 , q∗+ zα " q∗(1 − q∗) R #1/2  (3.5)

We can use the maximum likelihood estimator q∗ = A/R in this confidence interval so that the confidence interval (3.5) can be rewritten to:

  A − zα[A(1 − (A/R))] 1/2 R , A + zα[A(1 − (A/R))] 1/2 R   (3.6)

However, the limits of this confidence interval can be outside the interval of [0,1]. This is not realistic since q is bounded to this interval by definition. The upper limit of confidence interval (3.5) is greater than 1 if the following inequality holds:

A > R

(1 + (z2 α/R))

(3.7) The lower limit is negative when the following inequality holds:

A < z 2 α 1 + (z2 α/R) (3.8) The derivation of equalities (3.7) and 3.8) can be found in the appendix of chapter 3. One can see that the limits of the confidence bound can be outside the interval [0,1] when expectations of death are very high or very low. For instance if one wants to construct a 95% confidence interval for the expectation of death in a population of a thousand men. In that case zα =1.96 and R =1000. The equations above would give an upper limit of

confidence interval (3.6) greater than 1 if the following inequality holds:

A > 1000

1 + (1.962/1000) = 996

The upper limit of the confidence interval (3.6) would be negative if the following inequality holds:

A < 1.96

2

(20)

Applying confidence interval (3.5) has three disadvantages:

1. The confidence bounds can be outside the interval [0,1]. Since the expectation of death is bounded to this interval by definition, confidence bounds should be in this interval as well.

2. This method assumes the variance is equation (3.4) to be correct. This is a disadvantage since equation (3.4) is only an estimate of the true variance of q∗.

3. The assumption of a normally distributed q∗.

The first and second disadvantage described above can be tackled by the following ap-proach, which depends solely on a normal approximation of the distribution of q∗. In formulae (3.1) and (3.2) it has been derived that A has expected value Rq and variance Rq(1 − q). If we now assume A to be normally distributed, a α%-confidence interval has to satisfy the following inequality:

A − Rq [Rq(1 − q)]1/2 < zα (3.9)

For zα >0, inequality (3.9) can be rewritten to form a α%-confidence interval for q that is

given by the following lower and upper bound: [Forfar et al. (1988), pg. 11] 2A + z2 α− zα[zα2 + 4A(1 − (A/R))]1/2 2(R + z2 α) ,2A + z 2 α+ zα[z2α+ 4A(1 − (A/R))]1/2 2(R + z2 α) ! (3.10) The lower limit of the confidence interval (3.10) is nonnegative and the upper limit of the confidence interval (3.10) is never greater than 1. The proof of these statements can be found in the appendix of chapter 3.

3.4

Application to the data

In this section we will apply the methods discussed earlier in this chapter. For all of the data, the maximum likelihood estimators q∗ and the confidence intervals (3.5) and (3.10) have been determined. In figure 3.1 the results are shown for the male and female participants in 2005. In the left graphs, the confidence intervals constructed by formula (3.5) are shown. In the right graphs, the confidence intervals constructed by formula (3.10) are shown. The figures for all the other available data and the S-Plus script producing these figures can be found in the appendix of chapter 3.

The maximum likelihood estimators for the expectation of death are shown by the open circles. In the left figures, the confidence intervals (3.5) are shown and in the right figures, the confidence intervals (3.10) are shown. The lower limits of the confidence intervals are shown by the dashed lines, the upper limits by the solid lines.

(21)

Figure 3.1: Above: maximum likelihood estimators and confidence intervals for the expec-tations of death for male participants 2005. Below: maximum likelihood estimators and confidence intervals for the expectations of death for female participants 2005.

naked eye. We have used a normal scale for the expectations of death since this seems to give the best interpretable confidence bounds. Also, it is not possible to show non-positive confidence bounds on a logarithmic scale.

The estimated expectations of death seem to proceed fairly smooth up to high ages. To see that this is not true for all ages, we have shown the estimated expectations of death for the same categories of data on a 10 log scale in figure 3.2.

The two types of confidence intervals only seem to differ if the expectations of death are at the limits of the interval [0,1]. Confidence intervals constructed by (3.5) have bounds outside the interval [0,1] in some cases. Confidence intervals constructed by (3.10) do not have bounds outside the interval [0,1], as was expected.

(22)
(23)

Chapter 4

Generalized Linear Models

4.1

Introduction

In this chapter some of the concepts of generalized linear models will be explained. We will follow the treatment by Dobson (2002). At the end of this chapter we will apply a linear logistic model to the data under study. We have chosen for this type of model since it is mathematically one of the simplest ways to model expectations of death, using only two parameters.

The concepts discussed in this chapter will be necessary in applying the models in this chapter and in chapters 5 and 6. This chapter is aimed at doing just that and therefore does not give a complete outline of generalized linear models. Since generalized linear models are an extension to the normal linear model, we will start with this simplest model.

4.2

Normal Linear Model

Consider:

• Y = [Y1...YN] with N ∈ IN to be independent random variables which are assumed to

share the same probability density function with realizations yi, i = 1, .., N

• xi=[xi1...xip] with xij ∈ IR, j = 1, ..., p; p ∈ IN explanatory variables with xi the i -th

col-umn of the regression matrix X ; • βT=[β

1...βp] with βj ∈ IR, j = 1, ..., p; p ∈ IN parameters;

• T=[

1, ..., N] with i = 1, ..., N independent, indentically distributed random variables

with i ∼ N (0, σ2) with realizations ei, i = 1, ..., N .

Y and X will also be called respectively the random component and the systematic component of the model. The normal linear model is specified as following:

(24)

The normal distribution has the following probability density function: f (y; u, σ) = 1 (2πσ2)1/2 exp[− 1 2σ2(y − µ) 2] (4.2)

Equality (4.2) can be rewritten as:

f (y; u, σ) = exp{(yµ − µ2/2)/σ2− 1 2[y

22+ log(2πσ2)]} (4.3)

Usually (4.1) is written in the following form:

y = Xβ +  (4.4)

The maximum likelihood estimator ˆβ for β is given by the following equality:

ˆ

β = (XTX)−1XTy (4.5)

The derivation of this estimator can be found in the appendix of chapter 4. Consider I the N × N unity matrix. We will introduce the hat matrix H here since it will be useful later on. H is defined by the following equality:

H = X(XTX)−1XT (4.6)

The hat matrix is idempotent and has rank p. The vector of residuals e can be written in the following form:

e = y − X ˆβ = y − X(XTX)−1XTy = [I − H]y

The squared residuals eTe can also be expressed in terms of the observations y and the

hat matrix H . The relation is given by the following equalites: eTe = ([I − H]y)T([I − H]y) = yT[I − H]y

4.3

Weighted least squares

Generalized linear models extend the normal linear model in the following two ways: 1. The response variables Yi need not be normally distributed. Instead, the Yi have

distributions that are part of the exponential dispersion family of distributions. The properties of these distributions will be discussed further on in this chapter.

2. The relationship between the response variables Yi and the explanatory variables xi

need not be of the simple linear form (4.1). Instead there can be a monotone, continuous, differentiable function g : IR 7→ IR relating E (Yi) = µi to the linear component xTi β, that

is:

(25)

This function g is called the link function. For the normal linear model a possible link function is the indentity function, as we saw in equality (4.1):

g(µi) = µi

Consider a single random variable Y whose probability density function f : IR 7→ IR+

de-pends on two parameters θ, φ ∈ IR. Denote by a, b, c : IR 7→ IR known functions. Then a probability density function belongs to the exponential dispersion family if it can be written in the following form:

f (y; θ, φ) = exp[yθ − b(θ)

a(φ) + c(y, φ)] (4.7)

In this relation, θ is the canonical parameter and represents the location. φ is called the dispersion parameter and represents the scale. Various members of the exponential family are defined by specifying a, b and c.

The normal distribution is part of the exponential dispersion family of distributions. The probability density function can be written in the form of equality (4.7) as we did in equal-ity (4.3) using the following parameters: [McCullagh and Nelder (1989)]

θ = µ, b(θ) = µ2/2 φ = σ2, a(φ) = φ c(y, φ) = −1

2[y

22+ log(2πσ2)]

For convenience we will deal with a subclass of the exponential dispersion family, the so-called exponential family. Moreover, we do not need the exponential disperison family when dealing with the binomial distribution as we will when modelling expectations of death.

Consider a single random variable Y whose probability density function f : IR 7→ IR+

de-pends on one single parameters θ ∈ IR. Denote by a, b, c, d : IR 7→ IR known functions. Then a probability density function belongs to the exponential family if it can be written in the following form:

f (y; θ) = exp[a(y)b(θ) + c(θ) + d(y)] (4.8)

The dispersion parameter φ is not part of this definition. Therefore the normal distribution cannot be written in the form of equality (4.8) unless the variance σ2 is assumed to be a constant or nuisance parameter. If we do that, we can represent the equality (4.3) in this form by choosing the following functions a, b, c and d :

(26)

In case a(y) = y, the exponential family is said to be in canonical form. By choosing the right link function, it is always possible to convert a probability density function in the exponential family into its canonical form. The link function that does this is called the canonical link function. For the normal linear model, the unit link function is the canonical link function.

4.3.1

Properties of distributions in the exponential family

For the normal linear model, the parameters β can be estimated by solving the maximum likelihood equations in a closed form. For generalized linear models, there is usually no closed form solution to those equations. To obtain an estimator ˆβ for β we will use the Newton-Raphson method. To apply this method we will make use of the properties of probability density functions in the exponential family. We start with the expectation and variance of a(Y ), which are given by:

E[a(Y )] = −c0(θ)/b0(θ) (4.9)

Var[a(Y )] = b

00(θ)c0(θ) − c00(θ)b0(θ)

[b0(θ)]3 (4.10)

The derivation of these formulae can be found in the appendix of chapter 4. We proceed with the likelihood of a distribution in the exponential family. From (4.8), the log-likelihood l as a function for a distribution in the exponential family is given by:

l(y; θ) = a(y)b(θ) + c(θ) + d(y)

The derivative of l (y; θ) with respect to θ, also called the score statistic U is given by: ∂l(y; θ)

∂θ = U (y; θ) = a(y)b

0(θ) + c0(θ)

The score statistic can be seen as a random variable depending on Y , that is:

U (Y ; θ) = a(Y )b0(θ) + c0(θ) (4.11)

Equality (4.11) leads us to the following property of U that we will use later on.

E(U0(Y ; θ)) = −J (4.12)

(27)

4.3.2

Maximum likelihood estimation

Consider the generalized linear model specified as following: E(Yi) = µi

g(µi) = xTi β

Where the Yi are distributed according to a probability density function from the

expo-nential family. Define ηi by the equality

g(µi) = ηi (4.13)

If g is the canonical link function, we can rewrite a probability density function (4.8) in the exponential family to:

f (y; θ) = exp[yη + c(θ) + d(y)]

The goal of this section is to find the maximum likelihood estimatorsmaximum likelihood estimator ˆβj for the parameters βj, j = 1, .., p with βj ∈ IR. For each Yi, the log-likelihood

function li for a single observation is given by:

li(yi; θi) = yib(θi) + c(θi) + d(yi)

From this we can deduce the following equality for Uj:

Uj(yi; θi) = N X i=1 " (yi− µi) Var (Yi) xij ∂µi ∂ηi # (4.14) The derivation of equality (4.14) can be found in the appendix of chapter 4.

The maximum likelihood estimator ˆβj is found by setting the score statistic in equality

(4.14) equal to zero. [Dobson (2002)] We will use the Newton-Raphson method to do this. Define the N × N diagonal matrix W by the elements:

wii= 1 Var(Yi) ∂µi ∂ηi !2 (4.15) Then the variance matrix J can be written as following:

J = XTW X (4.16)

The derivation of equation (4.16) can be found in the appendix of chapter 4. Also, U can be written in matrix notation as:

U = XTW (y − µ) ∂η ∂µ

!

(28)

We will not discuss the Newton-Raphson method. Instead we will simply give the iterative procedure used to find the solution for U (y; θ) = 0. Denote by ˆβ(m) the vector of estimates

of the parameters β1, .., βp at the m-th iteration. Consider J(m), U(m), l(m)(y; θ), µ(m),

V(0 ), η(m) and W(m) respectively the information matrix, score statistic, log-likelihood, fitted values, the variance function of y, the linear predictor and the weight matrix evalu-ated at ˆβ(m). Then the iterative equation following from the Newton-Raphson method

is given by: ˆ β(m) = βˆ(m−1)− U (m−1) U0(m−1) = ˆβ (m−1)+ U(m−1) J(m−1) (4.18)

The starting point for this iterative procedure is an estimate µ(0 ). We can use the data themselves for this, possibly adjusted to prevent us from trying to evaluate undefined function values. From this first estimate µ(0 ) we can derive η(0 ), ∂η∂µ(0 )(0 )



and V(0 ). These first two estimates can be found by the relation (4.13) and V(0 ) can be found by using the assumed distribution of Y . After that we can determine W(0 ), J(0 ) and U(0 ) using to

equalities (4.15), (4.16) and (4.17).

Multiplying both sides of equality (4.18) by J(m−1 ) gives

J(m−1)βˆ(m) = J(m−1)βˆ(m−1)+ U(m−1) (4.19)

Therefore ˆβ(1 ) can be found by:

ˆ

β(1) = J

(0)βˆ(0)+ U(0)

J(0) (4.20)

Consider the difference δ ˆβ(m) = ˆβ(m)− ˆβ(m−1 ) between two consecutive estimates. This

adjustment δ ˆβ(m) to the previous estimate ˆβ(m−1 ) is given by the following equality:

J(m−1)δ ˆβ(m) = U(m−1) = XTW(m−1)(y − µ(m−1)) ∂η (m−1) ∂µ(m−1) ! (4.21) Notice that J(m−1)βˆ(m−1) = XTW(m−1)X ˆβ(m−1) = XTW(m−1)η(m−1) (4.22) Therefore the new estimate ˆβ(m) = ˆβ(m−1 )+ δ ˆβ(m) satisfies the equations

(29)

l(m)(y; θ) − l(m−1 )(y; θ) is sufficiently small. [McCullagh and Nelder (1989)] Define a new vector of dependent variables z by:

z(m)= η(m)+ (y − µ(m)) ∂η

(m)

∂µ(m)

!

(4.24) Notice that z is formed by applying a linearized form of the link function to the data: [McCullagh and Nelder (1989)]

g(y) ≈ g(µ) + (y − µ)g0(µ) = η + (y − µ) ∂η

∂µ

!

(4.25)

It then follows from equality (4.24) and equation (4.23) that ˆβ can be found by solving the following iterative procedure:

ˆ

β(m) = (XTW(m−1)X)−1XTW(m−1)z(m−1) (4.26) Equation (4.26) has the same form as equation (4.5) for the normal linear model except that there is now a weight matrix W included. Also in this regression z is the dependent variable, not y. Equation (4.26) has to be solved iteratively since in general z and W both depend on the fitted values µ as can be seen from equalities (4.15) and (4.24). The hat matrix H for this generalized linear model is given by:

H = X(XTW X)−1XTW

The vector of residuals can be expressed in terms of the hat matrix H and the observations z as following:

e = z − X ˆβ = z − X(XTW X)−1XTW z = (I − H)z (4.27)

eTe = ([I − H]z)T([I − H]z) = zT[I − H]z (4.28)

The fit of the maximum likelihood estimator ˆβ is often assessed using the deviance D , also called the log-likelihood statistic. The deviance is a measure of how good a model fits compared to a model in which there are N parameters. This final model will be called the full model. Define ˆβmax the maximum likelihood estimator of β in the full model.

Then the deviance D is defined by the following equation:

D = 2[l( ˆβmax) − l( ˆβ)] (4.29)

Formula (4.29) shows that small values of D indicates there is no evidence of a lack of fit of the model. In order to determine what is small, we need a sampling distribution of D . We will not derive the sampling distribution of D here, the reader is referred to Dobson (2002). D asymptotically has a χ2(N − p, v )-distribution, where v is the non-centrality

(30)

The other important measure of discrepancy is the generalized Pearson χ2 statistic. Consider µi, i = 1, .., N the fitted mean values and V the estimated variance for the

dis-tribution involved. Then the generalized Pearson χ2 statistic takes the following form:

[McCullagh and Nelder (1989)]

χ2 = N X i (yi− µi)2 V (µi) (4.30)

χ2 asymptotically has a χ2(N − p, v )-distribution.

Another way of assessing the fit of our model is to look at the residuals. Although there are several types of standardized residuals known in the literature, we will discuss only one. The Pearson residual χi is given by the following equality:

χi =

(yi− µi)

q

V (µi)

(4.31)

The Pearson residual is the simple residual scaled by the estimated standard deviation of yi. The Pearson residual is closely related to the Pearson χ2 statistic through the following

relation: [McCullagh and Nelder (1989)]

χ2 = N X i=1 χ2i (4.32)

4.4

Binomial distribution

In chapter 3 we have seen that the number of deceases in a population can be modelled by a binomially distributed random variable. In this section we will see that the binomial distribution is part of the exponential family. After that, the score statistic, information and the deviance for the binomial distribution will be derived.

Let the random variable Kxi be the number of deceases in a population of Rxi ∈ IN people

of age xi; i = 1, .., N with N ∈ IN . The population has expectation of death qxi ∈ [0, 1].

Then the probability of kxi ∈ IN deaths is given by the following probability density function

f : [0, 1] 7→ [0, 1]: f (qxi; kxi) = Rxi! kxi!(Rxi− kxi)! qxkxii (1 − qxi) Rxi−kxi

For this binomial distribution, kxi, i = 1, .., N are the observations that replace yi in the

previous section. qxi, i = 1, ..., N are the parameters of interest that will be modelled.

However, qxi does not replace µi in the previous section. Since E(Kxi) = Rxiqxi, we do not

model the expected number of deaths. Instead we model the expectations of death qxi. Rxi

is considered to be a nuisance parameter, comparable to the variance σ2 of the normal

(31)

The binomial probability density function can be rewritten in the form of equality (4.8) as following: f (qxi) = exp " kxiln qxi− kxiln(1 − qxi) + Rxiln(1 − qxi) + ln Rxi! kxi!(Rxi− kxi)! !# with a(kxi) = kxi, b(qxi) = ln  qxi 1−qxi  , c(qxi) = Rxi ln(1 − qxi) and d (kxi) = ln  Rxi! kxi!(Rxi−kxi)!  . Therefore the binomial distribution is part of the exponential family. The binomial distri-bution has g(qxi) = ln



qxi 1 −qxi



as its canonical link function. This is called the logit link function. The log-likelihood function l for a single observation is given by:

li(qxi; kxi) = kxiln qxi − kxiln(1 − qxi) + Rxiln(1 − qxi) + ln

Rxi!

kxi!(Rxi− kxi)!

!

The score statistic Uj(Kxi; qxi) is found by applying equality (4.14) with yi = kxi, µi = Rxiqxi

and Var(Yi) = Rxiqxi(1 − qxi): Uj(Kxi; qxi) = N X i=1 " (Kxi− Rxiqxi) Rxiqxi(1 − qxi) xij ∂(Rxiqxi) ∂ηxi # = N X i=1 [(Kxi − Rxi)xij] (4.33)

Applying equality (C.18) gives the information J and the weight matrix W : Jjk = N X i=1 xijxik Rxiqxi(1 − qxi) ∂(Rxiqxi) ∂ηxi !2 = N X i=1 xijxikRxiqxi(1 − qxi) (4.34)

Therefore the weights wii in W for the binomial distribution are given by:

wii = Rxiqxi(1 − qxi) (4.35)

The derivation of equalities (4.34) and (4.35) can be found in the appendix of this chapter. Since E(Kxi) = Rxiqxi we get E(U (Kxi; qxi)) = 0, as was expected.

Filling in equality (4.25) gives the following dependent variable zxi in the maximum

likeli-hood estimation for the binomial distribution: zxi = ηxi +

kxi − Rxiqxi

Rxiqxi(1 − qxi)

(4.36) Notice that the logit function is not defined for qxi = 0 and qxi = 1. Therefore we will not

use qxi = kxi/Rxi as first estimators for qxi. Instead we will use q

(0 )

(32)

This first estimator qx(0 )i takes on values in the interval (0, 1).

If the response variables Kxi, i = 1, .., N with N ∈ IN are independent and Kxi ∼ Bin (Rxi, qxi)

then the log-likelihood function l (qxi; kxi) is given by:

l(qxi; kxi) = N X i=1 kxiln qxi − kxiln(1 − qxi) + Rxiln(1 − qxi) + ln Rxi! kxi!(Rxi− kxi)! !

In a full model, β = [q1∗, ..., qN∗]. The maximum likelihood estimators are qx

i = kxi/Rxi as

derived in chapter 3. So the maximum value of the log-likelihood function l ( ˆβmax) that

belongs to the full model is given by: l( ˆβmax; kxi) = N X i=1 kxiln kxi Rxi ! − kxiln Rxi − kxi Rxi ! + Rxiln Rxi− kxi Rxi ! + ln Rxi! kxi!(Rxi − kxi)! !

For any other model with p < N , p ∈ IN parameters, let ˆqxi denote the maximum likelihood

estimators for the probabilities and let ˆkxi = Rxiˆqxi denote the fitted values. Then the

log-likelihood function evaluated at these values is given by:

l( ˆβ; kxi) = N X i=1 kxiln ˆ kxi Rxi ! − kxiln Rxi − ˆkxi Rxi ! + Rxiln Rxi − ˆkxi Rxi ! + ln Rxi! kxi!(Rxi− kxi)! !

Therefore the deviance is given by:

D = 2[l( ˆβmax; kxi) − l( ˆβ; kxi)] = 2 N X i=1 " kxiln kxi ˆ kxi ! + (Rxi − kxi) ln Rxi− kxi Rxi− ˆkxi !#

Filling in equality (4.30) with yi = kxi, ˆµi = Rxiˆqxi and Var(ˆµi) = Rxiˆqxi(1 − ˆqxi) gives us

the Pearson χ2 statistic for the binomial distribution:

χ2 = N X i=1 (kxi − Rxiqˆxi) 2 Rxiqˆxi(1 − ˆqxi)

D and χ2 asymptotically have a χ2(N − p, v )-distribution. In case of the binomial

dis-tribution, v = 1. D and χ2 are asymptotically equivalent for the binomial distribution. However there is some evidence to suggest that χ2 is often better than D because D is

unduly infuenced by very small frequencies. However, both the approximations are poor if the expected number of deceases is too small. [Dobson (2002)]

Applying equality (4.31) gives the Pearson residual χi for the binomial distribution:

χi =

(kxi − Rxiqˆxi)

q

Rxiqˆxi(1 − ˆqxi)

(33)

4.5

Application to the data

In this section we will apply the methods discussed in this chapter to our data. Therefore we will fit a generalized linear model to the mortality data used in this work. We have seen in chapter 3 that qxi = Axi/Rxi is an unbiased estimator for qxi, so that E(q

xi) = qxi.

We will fit the following relation:

g(qxi) = β1+ β2xi (4.38)

Where xi, i = 1, .., N with N ∈ IN is again the age of the population of size Rxi. Since we

fit two parameters, p = 2. N differs per categorie of data, since the number of available ages is not the same for every categorie of insured people. We will use the logit link function. By inverting the logit function we can also write the relation (4.38) as following:

qxi =

exp(β1+ β2xi)

1 + exp(β1+ β2xi)

(4.39) For the male participants in 2005, ˆβ1 = -10.665, ˆβ2 = 0.102 and N = 90. D = 777.5 and

χ2 = 721.5, both giving a p-value of 0.00. This indicates that there is evidence of a lack

of fit.

For the female participants in 2005, ˆβ1 = -12.725, ˆβ2 = 0.121 and N = 90. D = 692.7 and

χ2 = 933.6, both giving a p-value of 0.00. This indicates that there is evidence of a lack

of fit.

Figure 4.1 shows the results for the male and female participants. The open circles are the maximum likelihood estimators qx

i and the solid line is the fit

ˆ qxi =

exp( ˆβ1+ ˆβ2xi)

1 + exp( ˆβ1+ ˆβ2xi)

(4.40) The expectations of death are plotted on a 10 log scale.

(34)

Figure 4.2: The Pearson residuals for respectively the male and female participants in 2005.

What strikes in figure 4.1 is that the fit is especially not so good for the lowest ages. However, if we look at the pearson residuals in figure 4.2 we see that the standardized residuals are the largest for the highest ages.

For the other categories of data, the results are shown in the appendix of this chapter. For almost all models, the deviance test and the Pearson χ2 statistic indicate a lack of fit.

This is why we will use another model in the following chapter.

(35)

Chapter 5

B-splines

5.1

Introduction

In this chapter piecewise polynomial interpolation will be introduced to model expectations of death. It appears that B-splines provide a basis for the space of piecewise polynomial functions. We will largely follow the treatment of splines by De Boor (1978). After B-splines have been introduced we will apply them to our data. This way we can see whether piecewise polynomial interpolation provides a better tool to model expectations of death in relatively small populations than logistic regression.

5.2

Polynomial interpolation

First we have to define the concept of a polynomial function. Consider • βs ∈ IR; s = 1, .., k ; k ∈ IN ;

• a polynomial function of order k to be a function p : IR 7→ IR of the following form: p(x) =

k

X

s=1

βsxs−1

Note that p(x ) has a degree smaller than k . Consider

• a knot sequence (tj)j +kr =j: a finite sequence of real points, not necessarily distinct.

• the function p : IR 7→ IR to agree with the function g : IR 7→ IR at t if for every point that occurs m times in the sequence (tr)j +kr =j the following equality holds:

∂(j−1)

∂x(j−1)p(tr) =

∂(j−1)

∂x(j−1)g(tr); for j = 1, ..., m (5.1)

If tj, .., tj +k are distinct points and g(tr) are given data, there exists exactly one polynomial

p of order k + 1 for which p(tr) = g(tr), ∀r ∈ [j, .., j + k ]. The proof of this statement can

be found in the appendix of chapter 5. Now consider

(36)

coefficient of the polynomial of order k + 1 which agrees with g at the knots tj, ..., tj +k.

It is denoted by

[tj, ..., tj+k]g (5.2)

If some or all of the points tj, ..., tj +k coincide, this definition is to be taken in the sense

of relation (5.1). Now we will state an important property of divided differences that we will not prove here but that will be useful in later derivations. Take k , r , s ∈ IN . Then for every given function g : IR 7→ IR and any two distinct points tr, ts in the sequence tj, .., tj +k

the following equality holds: [De Boor (1978), pg. 8] [tj, ..., tj+k]g =

[tj, ..., tr−1, tr+1, ....tj+k]g − [tj, ..., ts−1, ts+1, ....tj+k]g

ts− tr

(5.3) From relation (5.2) it follows that if a polynomial pj agrees with g at t1, ..., tj for j = k

and j = k + 1 then

pk+1(x) = pk(x) + (x − t1)...(x − tk)[t1, ..., tk+1]g (5.4)

For, we know that pk +1 − pk is a polynomial of order k + 1 which vanishes at t1, .., tk and

has [t1, ..., tk +1]g as its leading coefficient. Therefore pk +1 − pk must be given by:

pk+1(x) − pk(x) = (x − t1)...(x − tk)[t1, ..., tk+1]g

From this property we can show that divided differences can be used to build up the inter-polating polynomial by adding the interpolation points one at a time. The interinter-polating polynomial pk +1 of order k + 1 that agrees with g at t1, ..., tk +1 can therefore be written

in its Newton form: [De Boor (1978), pg 4]

pk+1(x) = k+1

X

j=1

(x − t1)...(x − tj−1)[t1, ..., tj]g(x) (5.5)

Where (x − ti)...(x − tj) = 1 for j < i . We will give an example of how to find the

di-vided differences of a function g by adding one interpolation point at a time. From this analysis we will find the unique polynomial pk +1 of order k + 1 that agrees with g at

tr, r = j , .., j + k .

Example 1. Consider g(t , x ) = (t − x )3+. We will fix x and consider g(t , x ) as a func-tion of t alone. The resulting number of course depends on the particular value of x we chose. Consider x = 3 so that g(t ) = (t − 3)3

+ and consider the knot sequence

t1 = 2, t2 = 3, t3 = 4, t4 = 5. We can derive the divided differences one at a time:

• a polynomial of order 1 is a constant and g(2) = 0 so [t1]g(t ) = 0.

• the agreeing polynomial of order 2 at t1, t2 is a linear function given by:

(t − t1)

g(t1) − g(t2)

t1− t2

(37)

g(t1) − g(t2) = 0 and thus [t1, t2]g(t ) = 0.

• the agreeing polynomial p3(t ) of order 3 at t1, t2, t3 must be of the form c(t − 2)(t − 3)

with a constant c ∈ IR since g(t ) vanishes at t1 and t2. Setting p3(t3) = g(t3) gives c =

1/2. Therefore [t1, t2, t3]g(t ) = 1/2.

• (p4(t ) − p3(t )) must be of the form c(t − 2)(t − 3)(t − 4) with a constant c ∈ IR since

(p4(t ) − p3(t )) vanishes at t1, t2 and t3. Setting p4(t4) − p3(t4) = g(t4) gives c = 5/6.

Therefore [t1, t2, t3, t4]g(t ) = 5/6.

From this analysis one can conclude that p4 is given by:

p4(t) = p1(t) + (p2(t) − p1(t)) + (p3(t) − p2(t)) + (p4(t) − p3(t))

= 0 + 0 + 1/2(t − 2)(t − 3) + 5/6(t − 2)(t − 3)(t − 4)

5.3

Piecewise polynomial interpolation

For a smooth and efficient interpolating we will now go to piecewise polynomial interpo-lating with higher order pieces. The most popular choice is a piecewise cubic interpointerpo-lating function. In order to explain this type of interpolating functions, we will first introduce the notion of piecewise polynomial functions.

• Consider the knot sequence (tj)n+1j =1, tj ∈ IR ∀j .

• Pj(x ); j = 1, .., n a sequence of n ∈ IN polynomials, each of order k ∈ IN .

A piecewise polynomial function f : IR 7→ IR of order k is defined by the following relation:

f (x) = Pj(x) if tj < x < tj+1; ∀j ∈ [1, ..., n] (5.6)

So f (x ) is continuous if Pj(tj +1) = Pj +1(tj +1) ∀j ∈ [2, ..., n]. If f is not continuous it has

two values at the inner knots t2, .., tn, namely Pj(tj +1) and Pj +1(tj +1). In order to obtain

a single valued function we arbitrarily choose to make f right continuous, that is

f (tj) = Pj+1(tj). (5.7)

We denote the collection of all piecewise polynomial functions of order k with knot sequence (tj)n+1j =1 by IPk,t. We will give now an example of piecewise polynomial interpolation.

Example 2. Consider the real numbers (τi)ni =1 and a function g. Given the data

g(τ1), ..., g(τn) with a = τ1 < ... < τn = b, we construct a piecewise cubic polynomial

inter-polant f to g as following. On each interval [τi, τi +1] we have f agree with some polynomial

Pi of order 4:

(38)

The i -th polynomial piece Pi is made to satisfy the following conditions:

Pi(τi) = g(τi), Pi(τi+1) = g(τi+1), i = 1, ..., n − 1

Pi0(τi) = si, P

0

i(τi+1) = si+1, i = 1, ..., n − 1 (5.9)

Where si, i = 1, ..., n are free parameters. The resulting piecewise polynomial function f

agrees with g at τ , is continuous and has a continuous first derivative on [a, b]. Notice that we can freely choose the slopes si, i = 1, .., n.

We can compute the i -th polynomial piece Pi by using its Newton form:

Pi(x) = Pi(τi) + (x − τi)[τi, τi]Pi+ (x − τi)2[τi, τi, τi+1]Pi+ (x − τi)2(x − τi+1)[τi, τi, τi+1, τi+1]Pi

We determine its coefficients using equality (5.3) and based on the data in equalities (5.9): [τi]Pi = g(τi) and [τi+1]Pi = g(τi+1)

[τi, τi]Pi = si and [τi+1, τi+1]Pi = si+1

[τi, τi+1]Pi = [τi, τi+1]g

[τi, τi, τi+1]Pi = ([τi, τi+1]g − si)/∆τi

[τi, τi+1, τi+1]Pi = (si+1− [τi, τi+1]g)/∆τi

[τi, τi, τi+1, τi+1]Pi = (si+1+ si− 2[τi, τi+1]g)/(∆τi)2

Where ∆τi = τi +1 − τi. These equalities show that in terms of the shifted powers (x − τi)r, r ∈ IN ,

Pi(x) = c1,i+ c2,i(xτ − i) + c3,i(xτ− i)2+ c4,i(xτ − i)3 (5.10)

with

c1,i = Pi(τi) = g(τi)

c2,i = P

0

i(τi) = si

c3,i = Pi”(τi)/2 = [τi, τi, τi+1]Pi− ∆τi([τi, τi, τi+1, τi+1]Pi)

= ([τi, τi+1]g − si)/∆τi− c4,i∆τi

c4,i = P”

0

i (τi)/6 = (si+ si+1− s[τi, τi+1]g)/(∆i)2

Different piecewise cubic interpolation methods can differ in the choice of the slopes si, i = 1, ..., n.

(39)

5.4

B-splines

We will now turn to B-splines as an alternative way of representing piecewise polynomial interpolation. Every space IPk,t has a basis consisting of B-splines. We will come back

to this later but first we will give a definition of B-splines. After that we will give some examples of B-splines.

B-splines provide an efficient way of representing piecewise polynomial functions. The polynomials are only defined on the appropriate intervals as we will see later in this chapter. Moreover B-splines make it easier to compute derivatives, as we will see in the following chapter.

The j -th B-spline as a function of x ∈ IR of order k for the knot sequence (tr)j +kr =j, tr ∈ IR

is denoted by Bj ,k ,t(x ) and is given by the equality:

Bj,k,t(x) = (tj+k− tj)[tj, ..., tj+k](x − t)k−1+ , x ∈ IR (5.11)

The k -th divided difference of the function (x − t )k −1 is to be taken by fixing x and

con-sidering (x − t )k −1 as a function of t alone as we did in example 1 of this chapter. The resulting number varies as we vary x and so we obtain a function Bj ;k ;t of x .

Bj ,k ,t(x ) is zero outside the domain spanned by k + 1 knots. The proof of this statement

can be found in the appendix of chapter 5. Denote by ∂

∂xB→(x ; j , k , t ) and ∂

∂xB←(x ; j , k , t ) respectively the left and right derivative of

Bj ,k ,t(x ). We will now give three examples of B-splines from Schumaker (1980), pg. 136.

Example 3. Consider the knot sequence t1 = 0, t2 = 1, t3 = 2 and t4 = 3. We will give

an example of a B-spline of order 2. The B-spline B1 ,2 ,t(x ) is then given by the following

equalities: B1,2,t(x) =      x for 0 ≤ x < 1 (2 − x) for 1 ≤ x ≤ 2

This B-spline consists of two linear pieces; one linear piece from t1 to t2, and one from

t2 to t3. The two linear pieces join at t2 since B1 ,2 ,t(1) = 1 from both sides. The first

derivatives do not match since ∂x∂ B→(x ; 1, 2, t ) = 1 and ∂x∂ B←(x ; 1, 2, t ) = −1 at t2. The

B-spline is based on the three adjacent knots t1, ..., t3.

Example 4. Consider the same knot sequence as in example 1. The B-spline B1 ,3 ,t(x ) is

then given by the following equalities:

B1,3,t(x) =                x2/2 for 0 ≤ x < 1 (−2x2+ 6x − 3)/2 for 1 ≤ x < 2 (3 − x)2/2 for 2 ≤ x ≤ 3

This B-spline consists of three quadratic pieces, joined at two knots. At t2, B1 ,3 ,t(x ) = 1/2

from both sides and ∂x∂ B1 ,3 ,t(x ) = 2 from both sides. However ∂

2

(40)

∂2

∂x2B←(x ; 1, 3, t ) = −2 at t2.

At t3, B1 ,3 ,t(x ) = 1/2 from both sides and ∂x∂ B1 ,3 ,t(x ) = −1 from both sides. However ∂2

∂x2B→(x ; 1, 3, t ) = −2 and

∂2

∂x2B←(x ; 1, 3, t ) = 1 at t3. So at the joining knots, the

ordi-nates and first derivatives coincide but the second derivatives do not. The B-spline is based on four adjacent knots: t1, ..., t4. In the left part of figure 5.1, the graph for this quadratic

B-spline is shown.

Figure 5.1: Examples of respectively a third and a fourth order B-spline.

Example 5. Now we will give an example of a B-spline of order 4. Consider the same knot sequence as in example 1. The B-spline B1 ,4 ,t(x ) is then given by the following equalities:

B1,4,t(x) =                          x3/6 for 0 ≤ x < 1 (−3x3 + 12x2− 12x + 4)/6 for 1 ≤ x < 2 (3x3− 24x2+ 60x − 44)/6 for 2 ≤ x < 3 (3x3− 24x2+ 60x − 44)/6 for 3 ≤ x ≤ 4

This B-spline consists of four cubic pieces, joined at three knots. At t2, B1 ,4 ,t(x ) = 1/6

from both sides and ∂x∂ B1 ,4 ,t(x ) = 1/2 and ∂

2

∂x2B1 ,4 ,t(x ) = 1 from both sides. However ∂3

∂x3B→(x ; 1, 4, t ) = 3 and ∂

3

∂x3B←(x ; 1, 4, t ) = −3 at t2.

For t3, equivalent results hold. The ordinates, first and second dervatives coincide, but

the third derivatives do not match. At t4, clearly all derivatives coincide. The B-spline is

based on five adjacent knots: t1, ..., t5. In the right part of figure 5.1, the graph for this

cubic B-spline is shown.

B-splines can also be represented by the following relation: Bj,k,t(x) = (tj+k − tj)

j+k

X

r=j

dr(x − tr)k−1+ (5.12)

Where dr, r = j, ..., k are coefficients in IR. This relation is derived in the appendix of this

chapter. The B-spline in example 3 can be represented in this form by choosing d1 = 1/3,

d2 = −2/3 and d3 = 0. The B-spline in example 4 can be represented in this form by

choos-ing d1 = 1/6, d2 = −1/2, d3 = 1/2 and d4 = 0. Finally, the B-spline in example 5 can be

(41)

We define a spline function of order k with knot sequence (tj)nj =1 to be a linear

combina-tion of B-splines as funccombina-tions of x of order k . Consider aj ∈ IR, j = 1, .., n to be the spline

coefficients. Define Bj ,k ,t(x ) as in equality (5.11). The spline function f : IR 7→ IR of order

k with knot sequence t is defined by the rule: fk;t(x) = n X j=1 ajBj;k;t(x) (5.13) = n X j=1 aj(tj+k − tj)[tj, ..., tj+k](x − t)k−1+ (5.14)

We defined a spline like this since B-splines form a basis for every space IPk,t. It follows

from relation (5.12) that Bj ,k ,t(x ) is a piecewise polynomial function of order k for the knot

sequence tj, .., tj +k. In order for every space IPk,t to have a basis consisting of B-splines

Bj ,k ,t(x ), it remains to show that the the n B-splines Bj ,k ,t(x ) are linearly independent.

For the proof of this statement, we refer to [De Boor (1978), pg. 116].

5.5

Application to the data

In this section we will apply spline functions to the data under study.

Consider ages xi, i = 1, ..., N with N ∈ IN . We divide [x1, xN] into n intervals of equal

lenght by the knot sequence (tj)n+1j =1. We consider tj +1 − tj = c ∀j ∈ [1, .., n] for some

con-stant c ∈ IR+, t1 = x1 and tn+1 = xN. Knots that are constructed in this way are called

equidistant knots. t2, ..., tn are the inner knots.

Let the random variable Kxi be the number of deceases in a population of Rxi ∈ IN people

of age xi; i = 1, .., N with N ∈ IN . The population has expectation of death qxi ∈ [0, 1]

with maximum likelihood estimator qx

i. Consider Bj ,k ,t(x ) as in equality (5.11). We fit to

our data the following relation: g(qxi) =

n

X

j=1

ajBj,k,t(xi) ∀i ∈ [1, .., N ] (5.15)

Then the objective function S to minimize for this spline function is given by

S = N X i=1    ˆ qxi− g −1 ( n X j=1 ajBj,k,t(xi))    2 (5.16) Where ˆqxi is the maximum likelihood estimator given in chapter 4. We have a number of

choices to make in relation (5.15):

• we use the same link function as in the previous chapter, the logit link function. • we use a spline function of order 4.

• we use one interval [tj, tj +1] for every 5 observations, rounding the number of intervals n

(42)

For the male participants 2005 we have N = 85, so we take n = 17 intervals and 18 knots. We have only 85 observations since the model (5.15) cannot be used for observations of qx

i = 1 or q

xi = 0. The logit function is not defined for these values, so there are no

observations to model. We will fit the following relation: ln qxi 1 − qxi ! = 17 X j=1 aj(tj+3− tj)[tj, ..., tj+3](xi− t)3+ i = 1, .., 85 (5.17) Denote by:

• a = (aj)T, j = 1,..,17 the vector of spline coefficients, aj ∈ IR ∀j ;

• B = (Bij), i = 1,...,85 j = 1,..,17 the regression matrix. The rows are formed by the

splines evaluated in xi, i = 1,..,85.

Relation (5.17) can be fitted by the following equality: ln qxi

1 − qxi

!

= Ba (5.18)

The fit of model (5.15) for the male participants in 2005 is shown on a 10log scale in the

left part figure 5.2. What strikes in this picture is that the spline function fits the data even better than the model in equation (4.39). Especially for the lower ages this model gives estimates that stick closer to the maximum likelihood estimators than the estimates in chapter 4. For this fit, D = 106.75 and χ2 = 109.48, giving p-values of respectively 0.05 and 0.03. This indicates that there is evidence of a lack of fit of the model.

For the female participants in 2005, we have N = 86, so we have n = 18 intervals and 19 knots. The fit is also shown on a 10 log scale in the right part of figure 5.2. For this fit, D = 79.64 and χ2 = 79.14, giving p-values of respectively 0.64 and 0.66. This indicates that there is no evidence of a lack of fit of the model.

The results for the other data under study can be found in the appendix of chapter 5. An

Figure 5.2: B-splines fitted to the mortality rates of respective male and female participants in 2005.

(43)
(44)

Chapter 6

P-splines

6.1

Introduction

In the previous chapter we have seen that B-splines give a flexible model for the expec-tations of death. In this chapter we will show how we can avoid overfitting. We will largely follow the treatment by Eilers and Marx (1996) who proposed to use a relatively large number of knots and to avoid overfitting by penalizing differences of the coeffients of adjacent B-splines.

6.2

P-splines

Consider λ a penalty parameter, λ ∈ IR and consider the spline coefficients aj, j = 1 , ..., n

as in the previous chapter. Define ∆aj = aj − aj −1 and ∆qaj = ∆q−1∆aj for q ∈ {2, 3, ...}.

So for instance ∆2a

j is given by the following equalities:

∆2aj = ∆∆aj = ∆(aj− aj−1)

= aj− 2aj−1+ aj−2

The objective function to be minimized for P-splines in the article by Eilers and Marx (1996) is given by S = N X i=1    yi− n X j=1 ajBj,k,t(xi)    2 + λ n X j=q+1 (∆qaj)2 (6.1)

Referenties

GERELATEERDE DOCUMENTEN

In response to research objectives 2 (Build an understanding of the current wood fuel flow in the town of Tsumeb) and objective 3 (Identify the opportunities that exist in

Using structural methods such as TGA, infrared spectroscopy and X-Ray powder diffraction and combining it with existing knowledge of yttrium carboxylates and the

Scientia Militaria – top downloaded article Vol. 20 no. 2 (1990) Downloads:  4 681 DOI (Usage tracked via Crossref) Article Metrics Finding References

The study has aimed to fill a gap in the current literature on the relationship between South Africa and the PRC by looking at it as a continuum and using asymmetry

Er zijn meer factoren die ervoor zorgen dat de (meeste huisgenoten van) mensen met dementie zich geen voorstelling kunnen maken van het nut van informele hulp.. De angst voor

In this paper it was shown how for algebraic statisti- cal models finding the maximum likelihood estimates is equivalent with finding the roots of a polynomial system.. A new method

Seminar &#34;Modelling of Agricultural and Rural Develpment Policies. Managing water in agriculture for food production and other ecosystem services. Green-Ampt Curve-Number mixed

activiteiten. Aangezien Nepal geen grote veiligheidsrisico’s vormt voor de Europese Unie, is er in geboden hulp geen inhoudelijke focus op het behartigen van