
Modelling long term survival with non-proportional hazards

Perperoglou, A.

Citation: Perperoglou, A. (2006, October 18). Modelling long term survival with non-proportional hazards. Retrieved from https://hdl.handle.net/1887/4918

Version: Corrected Publisher's Version

License: Licence agreement concerning inclusion of doctoral thesis in the Institutional Repository of the University of Leiden

Downloaded from: https://hdl.handle.net/1887/4918


Chapter 1

Introduction

Survival analysis is the field of statistics that focuses on time until an event occurs; for instance, time until the relapse or death of a patient, time to machine failure, time to cessation of smoking, time to birth, or even time until a game of darts is completed [8]. In medical studies, however, survival analysis is in most cases associated with time until death or relapse of a patient. Obviously, in such a study it is not always feasible for the researcher to wait until all events occur. That causes the main difficulty of the field: the outcome is not fully observable. The cases for which the outcome is not yet known are called censored, or in other words “alive”. According to Stephen Senn [86], the reason the term censored is used is that in the pessimistic vocabulary of biostatisticians life is a temporary state, and someone who is alive is simply not yet dead. The statistician would like to know how long a patient who is suffering from a disease will survive, given that he is censored and thus not (yet) dead. Therefore, one of the main interests of survival analysis is prediction, based on the censored information that the researcher has for his subjects. In this work I attempt something somewhat ambitious: prediction about the distant future. That is, I try to model long term survival, for patients who are censored for a long time and refuse to move to the “death” state.


The surgical treatment that some patients receive might be important for survival rates during the first days after surgery, but as time elapses the effect might wash away. The same might hold for a drug given at the start of therapy. As time passes, the drug effects will eventually die out, and it will make no difference what kind of drug a patient received 15 years ago when his or her disease was treated.

The aim of this thesis is to investigate the techniques used in survival analysis and explore how some well established models perform in the analysis of long follow-up studies. My goal is to evaluate the performance of some well known techniques but also to propose new methodology for data analysis and modelling. Furthermore, I will compare these approaches and investigate their differences and similarities. In this introduction I will present the proportional hazards model (Section 1.1), which is always the starting point in modern survival analysis, followed by models with time varying effects of the covariates. In Section 1.3 I will introduce the reader to reduced-rank hazard regression. The gamma frailty model is presented in Section 1.4, and in Section 1.5 I will discuss a generalization of that model, the relaxed Burr model. Note that in this short introduction I will only briefly present the features of the models and not discuss the disadvantages of each modelling technique; these are described in the main body of the text.

The last section of the introduction discusses a different topic: ways to handle overdispersion in generalized linear models. The work is within the scope of the thesis, since I discuss alternative ways of dealing with survival data within a Poisson regression framework. I will briefly introduce this research in Section 1.7.

Finally, in the Appendix I will illustrate the use of the R package coxvc, which was built to fit some of the models discussed in this thesis.

1.1 The Cox proportional hazards model


be compared with respect to one characteristic. When information on more characteristics was available one could use parametric models, in which the distribution of the baseline hazard had to be specified. For details on actuarial methods refer to [79]; detailed presentations of survival techniques can be found in [54, 25, 7, 21, 57].

In 1972 Sir David Cox published one of the most cited papers [34] in the history of science, “Regression models and life tables” [24]. In this work Cox discussed a new technique for dealing with time to event data, using a linear regression-like approach. Assume information on $p$ covariates, and let $\beta$ be a vector of unknown coefficients. Then the hazard at time $t$ can be defined as:

$$h(t \mid X) = h_0(t) \exp(X\beta) \qquad (1.1)$$

where $X$ is a row vector containing the patient's characteristics and $h_0(t)$ is an unknown function called the baseline hazard, giving the hazard for the standard set of conditions when $X = 0$. This is a semi-parametric model, since there are parametric assumptions on the effects of the covariates on the hazard, but no assumptions are needed on the distribution of the baseline.

For estimating the model Cox introduced the partial likelihood, which allowed the baseline to have an arbitrary non-parametric form. Let $t_1 < t_2 < \ldots < t_D$ denote the ordered event times and $X_{(i)}$ the vector of covariates at time $t_i$. The partial likelihood in a Cox model is then defined as:

$$pl(\beta) = \prod_{i=1}^{D} \frac{\exp(X_{(i)}\beta)}{\sum_{j \in R(t_i)} \exp(X_j \beta)} \qquad (1.2)$$

where $R(t_i)$ is the risk set, the set of all subjects who are still under study at the time just prior to $t_i$. Since the partial likelihood does not depend on the baseline hazard, no assumptions are needed for $h_0(t)$. The estimates can be found by maximizing (1.2), or the log of the likelihood, using an iterative technique such as Newton-Raphson.

An important feature of the Cox model and the partial likelihood is the risk set. At each time point where an event occurs, the people still at risk are counted and form the set of cases who are still “alive” and under observation. At exactly that time point the model compares the average characteristics of the people at risk to those of the cases experiencing an event. In other words, the Cox model can be seen as a case-control study, evaluated at each time point where an event occurs.
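As a minimal sketch of fitting such a model, consider the coxph function of the survival package in R (the software used throughout this thesis; see Section 1.8); the lung data set and the covariates here are purely illustrative, not data analysed in this thesis:

library(survival)  # provides coxph() and example data sets

# Fit a proportional hazards model; Surv(time, status) encodes the
# (possibly censored) outcome, age and sex enter as covariates
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
summary(fit)  # coefficients, hazard ratios exp(beta) and tests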


model. Assume two individuals with covariate values $X$ and $X^*$ respectively; then the ratio of their hazards will be:

$$\frac{h(t \mid X)}{h(t \mid X^*)} = \frac{h_0(t) \exp(X\beta)}{h_0(t) \exp(X^*\beta)} = \exp((X - X^*)\beta)$$

a ratio that does not depend on time, which is why the hazards are called proportional.

Although the baseline hazard is not essential for estimation, in practice it is of major importance. Once the baseline hazard is estimated, the survival function may be estimated by:

$$\hat{S}(t \mid X) = e^{-\hat{H}_0(t)\exp(X\hat{\beta})}$$

where $\hat{H}_0(t)$ is the cumulative baseline hazard given by:

$$\hat{H}_0(t) = \sum_{t_i \le t} \frac{d_i}{\sum_{j \in R(t_i)} \exp(X_j \hat{\beta})} \qquad (1.3)$$

where $d_i$ is the status indicator (0 censored, 1 event). This is the Breslow estimator [16], proposed in 1975; another estimator was proposed by Kalbfleisch and Prentice (1973) [53]. When the Cox model was proposed, the fact that the baseline hazard could be neglected was very desirable, especially given the limited computational abilities of early computers. In the course of events, however, the baseline hazard came to be neglected by researchers, although studying its behaviour is of great importance.
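Continuing the illustrative fit above, a brief sketch of how the estimated cumulative baseline hazard and a predicted survival curve can be extracted; basehaz() returns a Breslow-type estimate:

# Cumulative baseline hazard H0(t) for X = 0 (Breslow-type estimate)
H0 <- basehaz(fit, centered = FALSE)
head(H0)

# Predicted survival curve S(t|X) for a hypothetical new patient
new_patient <- data.frame(age = 60, sex = 1)
plot(survfit(fit, newdata = new_patient), xlab = "Days", ylab = "S(t|X)")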

Since the Cox model is considered the ‘null’ model in survival analysis, I will use it to analyse different data sets in every chapter of this text (with the exception of Chapter 6). The model will serve as the starting point for each analysis and as a reference for comparing and evaluating other methods.

1.2 Time varying effects models

A natural extension of the simple proportional hazards model in (1.1) is to include an interaction of the covariates with time functions. Cox proposed this in his original paper, mainly as a way of testing the proportionality assumption. A Cox model with time varying effects of the covariates can be written as:

$$h(t \mid X) = h_0(t) \exp[X\beta(t)] \qquad (1.4)$$


The covariates are fixed, but the coefficients of these covariates interact with time and change at different time points. Cox considered the interaction of covariates with simple time functions such as $t$, $\log(t)$, $\sqrt{t}$. For instance, in a simple model with one covariate, say $X_1$, that is assumed to have a varying effect, a time varying effects model could be written as:

$$h(t \mid X_1) = h_0(t) \exp[X_1\beta_1 + X_1\beta_2 f_1(t) + X_1\beta_3 f_2(t)]$$

where $f_1(t)$ and $f_2(t)$ are any time functions. When the restriction $f_1(0) = 0$, $f_2(0) = 0$ is imposed, $\beta_1$ describes the effect of the fixed covariate right at the start of the study. Moreover, when reporting the results of such a fit it is advantageous to present a graph of the covariate effects versus time, instead of just reporting the coefficient estimates. A plot will visualize the effects and how they vary through time, and will reveal any interesting pattern that may emerge.
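For illustration, such an interaction with $f_1(t) = \log(t)$ can be fitted with the tt (time transform) device of coxph; this is only a sketch on the same illustrative data, not the estimation routine developed in Chapter 3:

# Time varying effect of age: beta(t) = beta1 + beta2*log(t)
fit_tv <- coxph(Surv(time, status) ~ age + tt(age), data = lung,
                tt = function(x, t, ...) x * log(t))
summary(fit_tv)  # beta2 far from 0 suggests a non-proportional effect of age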

Among others, Gray [40] proposed using spline functions to model time varying effects, Hastie and Tibshirani [43] discussed models with varying coefficients, Verweij and van Houwelingen [103] suggested fitting time varying effects using a penalty on the likelihood to control the pattern of the time effect, while Berger et al. [13] proposed time varying effects models with the use of fractional polynomials as time functions. In most of the literature on the subject, the choice of the time functions is an issue to be addressed. I present Cox models with time varying effects of the covariates in more detail in Chapter 2, discuss an efficient algorithm for estimation in Chapter 3 and compare the approach with other modelling techniques in Chapter 5.

1.3 Reduced-Rank hazard regression

Consider a Cox model with time varying effects, in which all the covariates interact with the same time functions. In that special case, equation (1.4) can be written as

$$h(t \mid X) = h_0(t) \exp(X \Theta F'(t)) \qquad (1.5)$$

where $F(t)$ is a vector of time functions and $\Theta$ the matrix of unknown coefficients, called the structure matrix. The choice of different ranks for the structure matrix gives birth to a new class of models, the reduced-rank hazard models.
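To make the factorization concrete, here is a small sketch with made-up numbers: for $p = 3$ covariates and $q = 2$ time functions, a rank-1 structure matrix $\Theta = B\Gamma'$ needs only $p + q - 1 = 4$ free parameters instead of $pq = 6$:

# Rank-1 structure matrix Theta = B %*% t(Gamma); values are arbitrary
B     <- matrix(c(0.8, -0.3, 0.5), ncol = 1)  # p x 1: covariate weights
Gamma <- matrix(c(1.0, -0.6), ncol = 1)       # q x 1: time function weights
Theta <- B %*% t(Gamma)                       # p x q coefficient matrix
qr(Theta)$rank                                # confirms the rank is 1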

Reduced-rank regression was originally suggested by Anderson [10] as a technique in multivariate linear regression for finding linear combinations of the covariates and reducing the dimension of the problem. Since the introduction of the model the method has received little attention. Burket [18] presented a study of reduced-rank models for prediction in psychometric applications, Izenman [52] gave the asymptotic (large-sample) distributions of the various estimated reduced-rank regression coefficient matrices, Tso [96] illustrated the connection between reduced-rank regression and canonical analysis, and Davies and Tso [27] proposed a least squares estimate for the parameters employing the singular value decomposition and the Eckart-Young algorithm. Yee and Hastie [110] used reduced-rank methodology for a class of vector generalized linear and additive models. In their work they argued that the major reason reduced-rank regression has found only few applications is that it was restricted to settings where the response was continuous, and thus never left the Gaussian family. A review of these multivariate methods can be found in [82].

In Chapter 2 I implement the ideas of reduced-rank regression in survival analysis. I present a Cox model with time varying effects of the covariates and apply a reduced-rank factorization to the matrix of unknown coefficients. This new class of survival models is particularly useful for parsimonious modelling, but also provides the means of identifying the patterns of the covariate effects while avoiding overfitting the data. I illustrate how to estimate the model based on the partial likelihood, discuss the choice of time functions and give motivation for using reduced-rank models when modelling survival data. Finally, the reduced-rank models, the Cox model with time varying effects and a gamma frailty model are compared. In Chapter 3 I propose an algorithm for fitting these models and illustrate software written in R for estimating the parameters and computing their standard errors. In Chapter 5 I show the practical use of reduced-rank models in a big data set of breast cancer patients with information on many covariates, and compare the results to other fitting approaches.

1.4 Frailty models


Since then, several authors have worked on frailty models and their estimation [20, 49, 1, 56]. Although the idea that each person is unique and has his or her own frailty is trivial, modelling such heterogeneity could not easily be implemented in statistics. A frailty model is a random effects model, very much in the same sense as a mixed model in linear regression analysis. However, the censoring mechanism and the fact that frailties can never be observed make frailty models slightly more complicated.

To model individual heterogeneity, a random effect, say $Z$, is included in the model, which acts multiplicatively on the hazard:

$$h(t \mid X, Z) = h_0(t) \exp(X\beta) Z$$

The random effect is assumed to follow a certain distribution, mainly chosen for mathematical convenience. Although there have been several proposals for the choice of the frailty distribution, such as Gaussian or positive stable distributions [48], a gamma distribution is a common choice, due to its desirable mathematical properties. In this text I only consider individual frailty models in which the random effect follows a gamma distribution.
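As a sketch, a gamma frailty term can be added to a Cox model with the frailty() device of the survival package; the kidney data set (two recurrence times per patient, so here the frailty is shared within patient rather than strictly individual) is purely illustrative:

# Cox model with a gamma distributed frailty per patient
fit_fr <- coxph(Surv(time, status) ~ age + sex +
                  frailty(id, distribution = "gamma"),
                data = kidney)
summary(fit_fr)  # includes an estimate of the frailty variance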

Since the frailties are not observable, inference can be made on the marginal hazard, which is obtained by integrating out the random effects. In the case of a gamma frailty, the random effect is assumed to follow a gamma distribution with variance $\xi$ and mean 1. The Laplace transformation is then used to obtain the marginal hazard:

$$h(t \mid X) = \frac{h_0(t) \exp(X\beta)}{1 + \xi H_0(t) \exp(X\beta)} \qquad (1.6)$$

This model is also known as the Burr model. Starting from (1.6) it is easy to show that

$$S(t \mid X) = \exp\Big(-\int_0^t h(s \mid X)\,ds\Big) = [1 + \xi H_0(t) \exp(X\beta)]^{-1/\xi}$$
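The marginal survival function is straightforward to evaluate directly; a sketch with arbitrary parameter values, for illustration only:

# Burr (marginal gamma frailty) survival: S(t|X) = (1 + xi*H0(t)*exp(Xb))^(-1/xi)
burr_surv <- function(H0t, xb, xi) (1 + xi * H0t * exp(xb))^(-1 / xi)

t <- seq(0, 10, by = 0.1)
plot(t, burr_surv(H0t = t^1.2, xb = 0.5, xi = 1),  # Weibull-type H0(t) = t^1.2
     type = "l", xlab = "t", ylab = "S(t|X)")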


see Therneau and Grambsch [92], Chapter 9. Other sources on frailty models can be found in Andersen et al. [7], Chapter 9, and Hougaard [50].

Frailty models are used not only for modelling individual heterogeneity, but also as a general model modifier that can explain any possible lack of fit. They can be viewed as a one dimensional extension of the Cox model, achieved by the inclusion of the random effect $Z$. That extra dimension relaxes the proportionality assumption, since when $H_0(t) \to \infty$ the hazard behaves like $h_0(t)/[\xi H_0(t)]$, which does not depend on $X$. So, the hazard ratios converge to 1 for each pair of different covariate values $(X, X^*)$. However, the limiting hazard need not drop to zero. In the special case that $h_0(t) = H_0(t) = \exp(t)$, the limiting hazard becomes just $1/\xi$.
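For clarity, the limit follows in one step from (1.6): when $\xi H_0(t)\exp(X\beta)$ dominates the 1 in the denominator,

$$h(t \mid X) = \frac{h_0(t)\exp(X\beta)}{1 + \xi H_0(t)\exp(X\beta)} \;\approx\; \frac{h_0(t)\exp(X\beta)}{\xi H_0(t)\exp(X\beta)} = \frac{h_0(t)}{\xi H_0(t)},$$

so the covariates cancel and all hazard ratios tend to 1.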

The gamma frailty model will be discussed further in Chapter 2, where I attempt a comparison with reduced-rank models. In Chapter 4 I will attempt a generalization of the model to include time dependent frailties, and I will illustrate the practical use of the model further in Chapter 5. All the frailty models in this thesis have been fitted using the penalized likelihood approach.

1.5 Relaxed Burr model

The marginal hazard of the frailty model as given in Chapter 2 (equation 1.6) comes from the Burr distribution [19], which is why it is also (rarely) called the Burr model. In the previous section I discussed some of the desirable features of the Burr model. However, there is one serious limitation that goes against biological reasoning: the random effect acts multiplicatively on the hazard with a value that does not depend on time. In real applications it would be more natural to assume that the frailty changes with time, in the same sense that a covariate effect may be strengthened or weakened as time passes. Nevertheless, modelling time dependent frailties is a very complicated task from a mathematical point of view.


1.6 Cure rate mixture models

In some analyses of time to event data the emphasis is on the identification of long-term survivors that may be present in the population, and on whether those cases can be considered ‘immune’ or ‘cured’. In many medical applications a subject can become ‘immune’ to a type of infection or a certain virus and thus may never experience the event of interest. The aim of cure rate mixture models, or simply cure models, is to identify those immune cases and model the survival of the subjects as a mixture of two components: the probability of being cured and the probability of survival conditional on not being cured.
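A sketch of the population survival function that such a mixture implies, with an arbitrary logistic cure probability and an exponential latency distribution, purely to illustrate the structure:

# Population survival implied by a mixture cure model:
#   S_pop(t|z) = pi(z) + (1 - pi(z)) * S_u(t|z)
cure_surv <- function(t, z, gamma = c(-1, 0.8), rate = 0.3) {
  pi_z <- plogis(gamma[1] + gamma[2] * z)  # logistic model for the cure fraction
  s_u  <- exp(-rate * t)                   # exponential survival of the uncured
  pi_z + (1 - pi_z) * s_u
}
curve(cure_surv(x, z = 1), from = 0, to = 20, ylab = "S(t)")
# the curve plateaus at pi(z) instead of dropping to zero: the cured fraction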

Cure models are often used in criminology, where the event of interest is usually reoffending or reconviction [85, 23], in econometrics [89] and elsewhere; for a detailed discussion of these mixture models see Maller and Zhou [65]. In this text I use a non-parametric cure rate model: a mixture of a proportional hazards model for the probability of survival and a logistic regression model for the probability of being cured. I discuss this model in Chapter 5 as an additional method to fit data from long follow-up studies, briefly present a way of estimation, and comment on the results when compared to other models.

1.7 Overdispersion modelling with individual deviance effects and penalized likelihood

The last chapter of the thesis was inspired by an attempt to model survival data using a Poisson regression framework. In the old days, before the proportional hazards model was introduced, survival data were analyzed by counting the number of deaths and person-years in an age-period-cohort model. Using this set-up, a Poisson model can be used for estimation within the standard framework of generalized linear models. I implemented this setting and tried to include an individual random effect, in the same sense as a gamma frailty model, within the Poisson likelihood, in which a ridge penalty should be added to control the magnitude of the random effect. Although this attempt was not successful (the extra parameters added to the model were not random effects), this approach gave rise to the idea of individual deviance effects, which can be used in all kinds of generalized linear models as a powerful explanatory tool.
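A minimal sketch of that classical set-up, with invented counts: deaths are modelled as Poisson with the log person-years as an offset:

# Age-period table of deaths and person-years (numbers invented)
d <- data.frame(
  age    = factor(rep(c("40-49", "50-59", "60-69"), each = 2)),
  period = factor(rep(c("1970s", "1980s"), times = 3)),
  deaths = c(5, 7, 12, 15, 30, 28),
  pyears = c(1000, 900, 800, 750, 600, 550)
)
fit_p <- glm(deaths ~ age + period + offset(log(pyears)),
             family = poisson, data = d)
summary(fit_p)  # residual deviance well above its df signals overdispersion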


an algorithm for fast algebraic manipulation of large systems of equations. Although this last chapter is not directly associated with the rest of the methodology presented in the thesis, the main idea was born from this research, and may easily be implemented in survival analysis.

1.8 Software

Most of the analyses in this thesis were performed using the R statistical software [81], which is open source and available for download at http://www.r-project.org. I have found R to be powerful software that can be used in a wide range of analyses. One of its advantages is the flexibility it offers in programming new algorithms and distributing them to other users through the World Wide Web, in the form of packages including documentation and help files.

When fitting models with time varying effects of the covariates I found that the available software was not efficient and posed some serious limitations on the analysis of large data sets. In Chapter 3 I illustrate an algorithm and the corresponding software package coxvc_1-0-1, written in R, for fitting Cox models with time varying effects of the covariates and reduced-rank models. An updated version of the package, coxvc_1-1-1, can be downloaded from the homepage of the author at http://clinicalresearch.nl/personalpage, while a manual of the package is presented in the Appendix.


1.9 Submission and publication

This thesis is a collection of different papers that have been published or submitted for publication:

Chapter 2: A. Perperoglou, S. le Cessie, H.C. van Houwelingen, Reduced-rank hazard regression for modelling non-proportional hazards, Statistics in Medicine (2006) 25: 2831-2845. DOI: 10.1002/sim.2360

Chapter 3: A. Perperoglou, S. le Cessie, H.C. van Houwelingen, A fast routine for fitting Cox models with time varying effects of the covariates, Computer Methods and Programs in Biomedicine (2006) 81(2): 154-161. DOI: 10.1016/j.cmpb.2005.11.006

Chapter 4: A. Perperoglou, H.C. van Houwelingen, R. Henderson, A relaxation of the gamma frailty (Burr) model, to appear in Statistics in Medicine.

Chapter 5: A. Perperoglou, A. Keramopoullos, H.C. van Houwelingen, Approaches in modelling long term survival: an application to breast cancer, submitted for publication.
