• No results found

A Comparison of Different Approaches to Nonparametric Inference for Subdistributions

N/A
N/A
Protected

Academic year: 2021

Share "A Comparison of Different Approaches to Nonparametric Inference for Subdistributions"

Copied!
93
0
0

Bezig met laden.... (Bekijk nu de volledige tekst)

Hele tekst

(1)

A Comparison of Different

Approaches to Nonparametric

Inference for Subdistributions

Johannes Mertsching

Johannes.Mertsching@gmail.com

Master Thesis

Supervision: dr. Ronald B. Geskus prof. Chris A.J. Klaassen

Korteweg-de Vries Institute for Mathematics Faculty of Sciences

University of Amsterdam

(2)
(3)

Abstract

Geskus (2011) conjectured that nonparametric inference for subdistributions in the presence of both left truncation and right censoring may be done as for proper distributions after reweighing the underlying data. As a consequence, standard software from survival analysis could be used for competing risks analysis. In this thesis, two estimation problems (variance estimation for cause-specific cu-mulative incidence estimators; K-sample test for competing risks data) are ex-amined in different right censoring and left truncation frameworks. The asymp-totic properties and the small sample performance of estimators obtained using Geskus’ approach are scrutinized and compared to other estimators from the literature.

Information

Title: A Comparison of Different Approaches to Nonparametric Inference for Subdistributions

Author: Johannes Mertsching, Johannes.Mertsching@gmail.com, 10225285 Supervisors: dr. R.B. Geskus, prof.dr. C.A.J. Klaassen

Second reviewer: dr. A.J. van Es Date of submission: June 17, 2013

Korteweg-de Vries Institute for Mathematics University of Amsterdam

Science Park 904, 1098 XH Amsterdam http://www.science.uva.nl/math

(4)

Contents

Introduction 1

1 Foundation 2

1.1 Survival analysis without competing risks . . . 2

1.2 Competing Risks . . . 6

1.3 Censoring frameworks . . . 9

1.4 Essential concepts . . . 12

1.4.1 Weak convergence of random functions . . . 12

1.4.2 Preliminaries from martingale theory . . . 13

1.4.3 Supplementary results . . . 15

2 Variance Estimation for Cause-Specific Cumulative Incidence Estimators 17 2.1 A variance estimator under complete information . . . 17

2.2 Generalization to censoring complete information . . . 21

2.3 Variance estimation under random censoring information . . . . 23

2.3.1 The Betensky variance estimator . . . 23

2.3.2 Gray’s variance estimator . . . 25

2.3.3 Geskus’ variance estimator . . . 27

2.4 A failure term . . . 28

2.5 A simulation study . . . 29

3 Hypothesis Testing in Competing Risks Models 37 3.1 A one-sample test under complete information . . . 38

3.2 K-sample tests under complete information . . . 39

3.3 Testing under censoring complete information . . . 42

3.4 Testing under random censoring information . . . 43

3.4.1 Gray’s approach . . . 44

3.4.2 Geskus’ approach . . . 45

3.4.3 An approach by Bajorunaite and Klein . . . 47

3.5 A simulation study . . . 48

(5)

4 Generalization to Left Truncated Data 50 4.1 Truncation frameworks . . . 50 4.2 Martingale considerations under truncation complete information 52 4.3 Estimation under random truncation information . . . 52 4.4 Estimation under both left truncation and right censoring . . . . 53

5 Summary and Suggestions for Further Research 55

Appendices

A 59

A.1 Equivalence of the PLE and the AJE under complete information 59 A.2 Equivalence of the Betensky and the Geskus variance estimator

when there is complete information . . . 59 A.3 An example showing σRC2 6= σ2

CC for the variance estimation of

the product-limit estimator . . . 61 A.4 An example showing that σ2

CC(t) 6= σ2RC(t) for the K-sample test 67

B Relevant R Code 71

C Detailed Simulation Results 75

C.1 Replication of simulation in Braun and Yuan (2007) . . . 75 C.2 Simulation for counterexample . . . 85 C.3 LogRank test simulation for counterexample . . . 85

(6)

Introduction

This thesis is settled in the area of survival analysis where one tries to estimate default distributions. Applications are widely spread and are for example found in medicine (where default is often associated with death) but for instance also in finance (e.g. credit default probabilities) and actuarial sciences (e.g. loss payment probabilities).

Often, default occurs due to different causes and one is interested in estimat-ing cause-specific quantities. An exemplary quantity of interest is the probability of experiencing an event of a specific type by a given time. Frequently, we find ourselves in a framework where the occurrence of the event of interest cannot be observed if another event happened before. Such risks are called competing risks. In their presence, we encounter subdistributions whose estimation we shall consider here.

In practice, inference is further complicated by right censoring and left trun-cation. Right censoring is the mechanism based on the idea that some individuals are lost to observation at one point in time before experiencing an event. Left truncation is the problem that some individuals are not observed at all since they experience an event before they could enter the study. We shall assume that the right censoring and left truncation mechanisms are independent of the times to event.

We are going to compare several approaches to nonparametric inference for subdistributions for different kinds of censoring and truncation. In Geskus (2011), it is conjectured that inference for subdistributions may be done simi-larly to inference for proper distribution after reweighing the underlying data. As a consequence, software from standard survival analysis could be applied to competing risks problems where available software is not as rich. We aim at investigating this conjecture in two applications. In chapter 2, we survey the point-wise inference for estimators of the cause-specific cumulative incidence function, i.e. the probability to experience an event of interest by a given time. Several approaches to estimate its variance have been introduced and asymp-totic properties and small-sample performance shall be investigated here. Next, in chapter 3, we will do the same for approaches in simultaneous inference by considering K-sample tests. Estimation up to this point will consider different kinds of right censoring. In chapter 4, we will additionally allow for left trun-cation and show how the given arguments can be extended. A summary and suggestions for further research are given thereafter.

(7)

Chapter 1

Foundation

In this chapter, we want to lay the groundwork for later chapters. In a first section, relevant quantities from survival analysis without competing risks are introduced and their standard estimators are presented. Thereafter, the same is done for the competing risks framework. We shall consider two kinds of right censored data. Both types are introduced in this chapter as well. Concepts we will frequently use throughout this thesis are defined next. Our notation, which will to a large extent follow that in Gray (1988), is introduced along the way.

1.1

Survival analysis without competing risks

We will be concerned with the analysis of time-to-event data in a sample of size n. For individual i, we denote his event time by Ti. Without explicitly mentioning

it in the sequel, we will always assume that the event times are independent identically distributed so that there is one common event time distribution. We will refer to the distribution function as the cumulative incidence function and denote it by

F (t) := P (Ti ≤ t).

Its complement, the survival function, will be denoted by S(t) := 1 − F (t).

We will also write

¯

F (t) := 1 − F (t)

and employ a similar notation for any distribution function. Consequently, S(t) = ¯F (t). Unless mentioned otherwise, we will always assume that F (t) is absolutely continuous with respect to Lebesgue measure and denote its density by f (t). Based thereon, we define the hazard rate λ(t) by

λ(t) := lim h↓0 P (t ≤ Ti ≤ t + h|Ti ≥ t) h = f (t) 1 − F (t−)

(8)

Since the cumulative incidence function is a non-decreasing function, the hazard rate is non-negative. It is also referred to as the hazard function or the incidence function. Due to the above definition of the hazard rate, we see that

S(t) = exp  − Z t 0 λ(s)ds  . The cumulative hazard is defined by

Λ(t) := Z t

0

λ(s)ds

and it is immediately clear that S(t) = exp (−Λ(t)). This relation is the reason for the importance of the product integral in survival analysis. The product integral is defined as the limit of products over partitions with a decreasing mesh size just as the ’ordinary’ integral is defined as the limit of sums. We present the definition of the one-dimensional product integral as a special case of the multidimensional definition in Andersen et al. (1993, section II.6). Definition 1.1 (Product integral) Let X(t) be a cadlag function of locally bounded variation on [0, τ ]. We define for t ∈ [0, τ ]

Y (t) = t R 0 (1 + dX(s)) := lim |ti−ti−1|→0 n Y i=1 (1 + X(ti) − X(ti−1)). Here, 0 = t0 < ... < tn= t is a partition of [0, t].

Existence, important properties and further key results related to the product integral are presented in Andersen et al. (1993, section II.6). Using the product integral, S(t) = exp (−Λ(t)) can equivalently be expressed as

S(t) =

t

R

0

(1 − dΛ(s)). (1.1)

In the sequel, we briefly want to motivate the form of the estimators of the above introduced quantities. Furthermore, we will state their asymptotic properties since they will be used throughout this thesis. We will employ the following notation from standard survival analysis (without competing risks):

ri(t) := 1[Ti≥t](= 1 if individual has not experienced an event before time t) ,

Ni(t) := 1[Ti≤t](= 1 if individual experienced an event by time t) .

For any process X(t), we will define X(t−) := lims↑tX(s). Using this,

(9)

Based on the above definition, we introduce the following counting processes: r(t) :=

n

X

i=1

ri(t) = number at risk at time t,

N (t) :=

n

X

i=1

Ni(t) = total number of observed events by time t.

We denote the ordered event times by t(j) and assume that there are d(t(j)) events at time t(j). We note that d(t(j)) = N (t(j)) − N (t(j)−). Recall that

λ(t) = lim

h↓0

P (t ≤ Ti ≤ t + h|Ti≥ t)

h .

For small h, λ(t)h can therefore be interpreted as the conditional probability of experiencing an event in [t, t + h] given that there was no event before time t. Thus, the hazard rate is estimated by

ˆ

λ(t) = d(t) r(t).

Based thereon, the cumulative hazard is estimated by ˆ Λ(t) = X t(j)≤t d(t(j)) r(t(j)) .

Frequently, we shall write this using an integral notation as ˆ Λ(t) = Z t 0 1 r(s)dN (s).

This estimator is referred to as the Nelson-Aalen estimator. In section 1.4.2, we shall introduce a martingale central limit theorem. Using it, the asymptotics of the Nelson-Aalen estimator can be established. Let us denote by T the support of S(t), i.e. a time interval of the form [0, τ ] or [0, τ ) for some given τ ∈ (0, ∞]. The following theorems are proved in Andersen et al. (1993, p. 190, 191) and shall only be stated here.

Theorem 1.1 (Asymptotics Nelson-Aalen estimator I) Let t ∈ T be given and define J (s) := 1[r(s)>0]. Suppose, for n → ∞,

Z t 0 J (s) r(s)λ(s)ds P → 0 and Z t 0 (1 − J (s))λ(s)ds→ 0.P Then sup s∈[0,t] ˆ Λ(s) − Λ(s) P → 0 as n → ∞.

(10)

Theorem 1.2 (Asymptotics Nelson-Aalen estimator II) For t ∈ T , sup-pose there is a non-negative function r◦ such that rλ◦ is integrable on [0, t]. Then,

σ2(t) = Z t

0

λ(u) r◦(u)du

is well-defined. Furthermore assuming that, as n → ∞, n Z s 0 J (u)λ(u) r(u)du P → σ2(s) for all s ∈ [0, t], (1.2) n Z t 0 J (u)λ(u) r(u)1 h√ n J (u) r(u) > idu P → 0 for all  > 0, √ n Z t 0 (1 − J (u)) λ(u)du→ 0,P we get the following convergence:

n ˆΛ − Λ  D

→ U on D[0, t] as n → ∞.

Here, U is a Gaussian martingale (see formula (1.12) for a definition) with U (0) = 0 and V ar(U (t)) = σ2(t).

Here, D[0, t] denotes the space of cadlag functions on [0, t] endowed with the Skorokhod topology, which is explained in more detail in section 1.4.1. Note that (1.2) implicitly defines r◦.

The most popular estimator of the survival function, the Kaplan-Meier esti-mator, is based on the relation in (1.1) and given by

ˆ S(t) := Y t(i)≤t  1 −d(t(i)) r(t(i))  = t R 0  1 − d ˆΛ(s). (1.3)

Its asymptotics can be established similarly to those of the Nelson-Aalen esti-mator by employing martingale properties. The proof of the following statement can be found in Andersen et al. (1993, page 261 ff.).

Theorem 1.3 (Asymptotics Kaplan-Meier estimator I) Suppose the con-ditions in Theorem 1.1 hold true. Then

sup s∈[0,t] ˆ S(s) − S(s) P → 0 as n → ∞.

Theorem 1.4 (Asymptotics Kaplan-Meier estimator II) Suppose the con-ditions in Theorem 1.2 hold true. Then, we get the following convergence:

n ˆS − S  D

→ SU on D[0, t] for n → ∞

(11)

Again, we refer to section 1.4.1 for an explanation of weak convergence on D[0, t]. Estimation will usually be complicated by right censoring and left truncation. We denote by Ci the censoring time for individual i and by Li his truncation

time and assume that Ci and Li are independent of Ti. The corresponding

distributions are denoted by

G(t) := P (Ci ≤ t),

H(t) := P (Li ≤ t).

We redefine ri(t) in the following way:

ri(t) := 1[Ti∧Ci≥t>Li].

Hence, an individual is at risk only if it has neither experienced an event nor censoring and if it entered the study. The above considerations for estimation of λ(t), Λ(t) and S(t) remain valid using this adapted ’at risk’-definition.

In order to estimate the censoring distribution, we will use a product-limit estimator that we obtain after swapping the roles of Tiand Ci. For this purpose,

let us denote the ordered observed censoring times by c(j) and the number of

censorings at time c(j) by m(c(j)). We assume that c(j) > t(k) in case they happen at the same time point, i.e. if there are censorings and events at the same time points, the censored individuals still remain in the risk set at this particular point and only pass out after. A product-limit estimator of G is then given by ˆ ¯ G(t) = Y c(j)≤t  1 − m(c(j)) r(c(j)) − d(c(j))  .

A similar approach yields an estimator of the truncation distribution (Geskus, 2011). We denote by `(j)the ordered truncation times and assume that there are

w(`(j)) individuals getting censored at `(j). We assume that `(j) > (t(k)∧ c(k0))

in case they happen at the same time. A product-limit estimator is obtained by reversing the time axis since Li is right truncated by Ti∧ Ci. Therefore, −Li is

left truncated by −(Ti∧ Ci) and the known product-limit estimator now reads

as ˆ H(t) = Y −`(j)<−t  1 −w(`(j)) r(`(j))  = Y `(j)>t  1 −w(`(j)) r(`(j))  .

1.2

Competing Risks

In this thesis, we want to investigate competing risks models. In such models, there are multiple failure types and we are interested in estimating cause-specific quantities. We will denote the observed event type for individual i by Di and

it is assumed that there are m competing events. For any quantity, we will indicate failure type by subscript. For example, keeping in mind that the overall number of events was denoted by N (t), we will denote the counting process that

(12)

is counting the number of events of type k that happened by time t by Nk(t).

As another example, we will use

Tik= event time for the event of type k for individual i.

If there was an event of type k, we will assign Til the value ∞ for l 6= k.

Consequently, we for example obtain the nice property Fk(t) = P (Ti≤ t, Di = k) = P (Tik ≤ t).

We assume that Fk has density fk. To abbreviate notation, we will frequently

use Tkinstead of Tikwhich should not be confused with the overall event time of

individual k. We are aware that the same ambiguity arises for other quantities, for example Nk(t) could as well be the indicator denoting whether individual

k experienced an event by time t. However, it will always be clear from the context how Nk, Tk and similar expressions should be interpreted.

In the presence of competing risks, several types of hazards exist. The marginal hazard of an event is the hazard in the situation when other events do not occur. Estimation of the marginal hazard is only possible when the times to event are independent for different risk types and this independence cannot be tested for (Tsiatis, 1975). We shall not consider this hazard type further in this thesis.

A second hazard type is the cause-specific hazard which is defined in the following way: λk(t) := lim δ↓0 P (t ≤ Tik ≤ t + δ|Ti≥ t) δ = fk(t) 1 − F (t−).

Hence, we note that for small δ, δλk(t) can be interpreted as the conditional

probability of experiencing an event of type k in [t, t + δ] given there was no event (of any kind) before time t. We denote by dk(t(i)) the number of observed

events of type k at time t(i). The definition of λk(t) suggests to estimate it by

ˆ λk(t) =

dk(t)

r(t) .

Based thereon, the cumulative cause-specific hazard Λk(t) :=

Rt 0λk(s)ds can be estimated by ˆ Λk(t) = Z t 0 1 r(s)dNk(s).

The cause-specific hazard treats competing events as censorings. However, it is important to note that this type of hazard does not correspond to the cause-specific cumulative incidence function Fk(t) = P (Tk≤ t). In fact, this

observa-tion gave rise to the definiobserva-tion of a third type of hazard, namely the subdistri-bution hazard. It is defined by

γk(t) := lim δ↓0 P (t ≤ Tik ≤ t + δ|Tik ≥ t) δ = fk(t) 1 − Fk(t−) .

(13)

Here, δγk(t) can now for small δ be interpreted as the conditional probability to

experience an event of type k in [t, t + δ] given there was no event of interest(!) before time t. Therefore, in order to estimate this quantity, a redefined risk set is used. Let us for now assume that there is neither censoring nor truncation. We will call this situation complete information and additionally consider right censoring in section 1.3 and left truncation in chapter 4. Then, we do also consider individuals to be at risk with respect to this changed meaning if they experience a competing event. They only leave the risk set if they experience the event of interest. Based on these considerations, we define

Rik(t) = 1[Tik≥t] and Rk(t) = n X i=1 Rik(t).

Since this quantity is only needed for the event of interest, we shall write R(t) = Rk(t). We note that individuals that experience a competing event will then

always be at risk which implies R(t) ≥ r(t). This ’at risk’ definition might appear counter intuitive at first sight but it is useful in the analysis of the subdistribution hazard since we obtain the following estimator of the subdistribution hazard for failure type k:

ˆ γk(t) =

dk(t)

R(t).

Naturally, the cumulative subdistribution hazard Γk(t) :=

Rt 0γk(s)ds can be estimated by ˆ Γk(t) = Z t 0 1 R(s)dNk(s). It can further be shown that

¯ Fk(t) = t R 0 (1 − dΓk(t)).

A product-limit type estimator of the cause-specific cumulative incidence func-tion is therefore given by

ˆ FkP L(t) = 1 − t R 0  1 − dˆΓk(t)  .

An alternative estimation approach is based on the relationship Fk(t) =

Z t

0

S(u−)dΛk(u).

We get an estimator of Fk(t) when replacing S and Λkby their estimators. This

estimator, commonly referred to as the Aalen-Johansen estimator, is given by ˆ FkAJ(t) = Z t 0 ˆ S(u−) r(u) dNk(u). (1.4)

(14)

Under complete information, we show in part A.1 that ˆFkAJ(t) = ˆFkP L(t). How-ever, it will turn out to be crucial in the sequel that there are censoring types where this does not hold true anymore. Under complete information, it can be shown that the following process is a local square integrable martingale (see section 1.4.2) with respect to Fk = {Fk(t)}t≥0, Fk(t) = σ{Nik(u), u ≤ t, i =

1, ..., n} (Gray, 1988):

Mik(t) = Nik(t) −

Z t

0

γk(s)Rik(s)ds. (1.5)

Note that Fkis the filtration that does not contain information about the times

to the competing events. Then, Mk(t) = n X i=1 Mik(t) = Nk(t) − Z t 0 γk(s)R(s)ds

is a local square integrable martingale with respect to Fk as well.

1.3

Censoring frameworks

In this thesis, we are concerned with two different types of right censoring. For both types, an event for individual i can only be observed if it is under observation (not censored). This motivates the definition of

Nic(t) := Z t

0

1[Ci>u]dNi(u).

Similarly, we define Nikc(u). The two censoring frameworks differ with respect to available information about censoring times, i.e. there are different underlying filtrations.

The first situation is commonly called censoring complete information or administrative censoring. In here, we assume that censoring times are known for all(!) individuals, even for those who experienced an earlier event. An individual that experienced a competing event will still be considered to be ’at risk’, but only up to its censoring time. After, it leaves the risk set as any censored individual does. We note that all the individuals that did not experience the event of interest are treated similarly. This is an important observation since estimation of the subdistribution hazard does therefore not require knowledge of competing event times. These considerations are reflected in the following filtration (Fine and Gray, 1999, p. 499): Fk,CC = {Fk,CC(t)}t≥0 with

Fk,CC(t) = σ1[Ci>u], 1[Ci>u]Nik(u), u ≤ t, i = 1, ..., n .

We define Rik,CC(u) := 1[Ci>u][1 − Nik(u−)]. Individual i is ’at risk’ with

re-spect to the above meaning iff Rik,CC(u) = 1. Again, we will commonly leave

out the k in the subscript and denote the total number at risk in this setting by RCC(t) =

n

X

i=1

(15)

Under censoring complete information, Mikc(t) := Nikc(t) −Rt

0Ri,CC(u)γk(u)du

is a local square integrable martingale (Fine and Gray, 1999). (Andersen et al., 1993, p. 139 ff.) showed that Mikin (1.5) preserves its local martingale property

with respect to Fk∗= {Fk∗(t)}t≥0 given by

Fk∗(t) = σ1[Ci>u], Nik(u), u ≤ t, i = 1, ..., n

when independence of censoring and event times is assumed. Note that Mik(t) = Z t 0 dNik(u) − Z t 0 Rik(u)γk(u)du, Mikc(t) = Z t 0 1[Ci>u]dNik(u) − Z t 0

1[Ci>u]Rik(u)γk(u)du.

The predictability of 1[Ci>t] with respect to F

k and its boundedness therefore

imply that Mikc is local square integrable martingale with respect to Fk∗ as well. Note that Fk,CC(t) ⊂ Fk∗(t). Hence, the innovation theorem (Andersen et al.,

1993, p. 79 ff.) implies that Nikc has the following intensity process with respect to Fk,CC:

E [γk(t)Ri,CC(t)|Fk,CC(t−)] = γk(t)Ri,CC(t).

The latter equality follows since γk(u) is a deterministic function and Ri,CC is

predictable with respect to Fk,CC(t). Therefore, we conclude that the intensity

process (and hence the compensator) of Nikc with respect to Fk,CC(t) is the same

as with respect to Fk∗(t). Thus, Mikc is a local square integrable martingale with respect to Fk,CC(t).

The second censoring situation we encounter is random censoring infor-mation. Censoring times are only known if they happened before the event times. For an individual that experiences an event before it gets censored, the censoring time is unknown. If all the individuals that experienced a competing event stayed in the redefined at risk set forever, this would not account for the censoring that individuals would have experienced if they had not passed out be-fore. However, since censoring is assumed to be independent of the event times, the unobserved censoring times can be estimated. Fine and Gray (1999) intro-duced such an estimator that assigns time-dependent weights to all individuals. Individual i has weight Ri,RC(t) given by

Ri,RC(t) = ˆ ¯ G(t−) ˆ ¯ G((t ∧ T−ik)−) . (1.6)

Here, T−ik denotes the time at which individual i experiences an event other

than that of type k.

It is important to note that Mikc∗(t) := Nikc(t) −Rt

0 Ri,RC(u)γk(u)du is not a

local martingale with respect to a filtration not containing information about the competing event times. This particularly distinguishes the two censoring information frameworks. Under censoring complete information, Mikc essentially is a local martingale since only information about the independent censoring

(16)

times has to be added to Fk to make Ri,CC predictable. This is different for

Ri,RC. Only individuals that experienced a competing event may contribute to

RRC with a weight different from 1. Hence, information about competing event

times has to be available to make Ri,RC predictable. For example, one may try

to consider ˜Fk= { ˜Fk(t)}t≥0 with

˜

Fk(t) = Fk(t) ∨ σ

n

1[Ci>u], ˆG ((u ∧ T−ik)−) , u ≤ t, i = 1, ..., n

o

. (1.7)

Note that this filtration contains information about competing event times. However, augmenting the filtration by information about competing event times invalidates the martingale property of Mik in (1.5) which is the crucial starting

point of the reasoning under censoring complete information. We conclude that the same martingale arguments cannot be applied anymore. One may wonder whether Mikc∗ is a local martingale with respect to a different filtration. Such a filtration does not exist. This is a consequence of asymptotic results we establish in chapters 2 and 3 with the help of a counterexample in section A.3. However, it remains unclear what compensator Nikc has with respect to ˜Fk in (1.7).

In order to obtain predictability of Ri,RC, we associate the following filtration

with the random censoring information situation: FRC= {FRC(t)}t≥0 with

FRC(t) = σDiNic(u), 1[Ci>u](1 − Ni(u)), u ≤ t, i = 1, ..., n .

Based thereon, Ri,RC(t) can be determined. The total number at risk in the

random censoring information framework is then given by RRC(t) =

n

X

i=1

Ri,RC(t).

Later on, we will frequently make use of the relationship 1

n|RCC(t) − RRC(t)|

a.s.

→ 0 as n → ∞ (1.8)

which we shall prove here. For this purpose, we first note that Beyersmann et al. (2012, p. 129, 130) showed

lim

n→∞|E[Ri,CC(t)] − E[Ri,RC(t)]| = 0. (1.9)

Furthermore, since max {E[|Ri,CC(t)|], E[|Ri,RC(t)|]} ≤ 1 < ∞, an application

of the strong law of large numbers yields for n → ∞ 1 nRRC(t) − E[Ri,RC(t)] a.s. → 0, (1.10) 1 nRCC(t) − E[Ri,CC(t)] a.s. → 0. (1.11)

Combining (1.9), (1.10) and (1.11), we obtain (1.8). Hence, we see that RRC(t)

(17)

censoring information to censoring complete information seems feasible. Asymp-totics in the censoring complete information framework can be established us-ing martus-ingale arguments and it is temptus-ing to apply similar techniques under random censoring information. If this were true indeed, any function of the subdistribution hazard could be estimated as in the standard survival setting after assigning time-dependent weights to all the individuals. This approach we will refer to as ’the reweighing approach’. As a consequence, standard software could be used for competing risks analysis. The applicability of the reweighing approach shall be investigated in the sequel.

1.4

Essential concepts

In this part, we briefly want to state concepts and theorems that we will use frequently in later chapters. First and foremost, this is a martingale central limit theorem that will later allow us to derive asymptotic properties of estimators elegantly. In order to make the convergence therein explicit, weak convergence of random functions is briefly introduced first. Further results that we will use in later parts are stated thereafter.

1.4.1 Weak convergence of random functions

Let us denote by D(T ) the space of cadlag functions on T where T now is the interval [0, τ ) or [0, τ ] for some given τ ∈ (0, ∞]. We define a metric on D(T ) in the following way (Jakubowski, 2007):

Let Λ be the space of strictly increasing functions from T onto itself. For two elements x, y ∈ D(T ), we define ds(x, y) := inf λ∈Λ  sup t∈T |λ(t) − t| + sup t∈T |x(λ(t)) − y(t)|  .

This metric was introduced by Skorokhod and is therefore referred to as the Skorokhod metric. We aim at endowing D(T ) with a metric that makes it a Polish space. Endowed with Skorokhod’s metric ds, D(T ) becomes a Banach

space only. However, Billingsley (1968) proved that there is an equivalent metric d0s that makes the Banach space separable and thus a Polish space.

Based on the above considerations, we now introduce weak convergence of random functions. Suppose X(n), X ∈ D(T ). We say X(n) converges weakly to X (and write X(n) → X), if for all bounded, real-valued and continuousD functions f :

Ef (X(n)) → Ef (X) as n → ∞

where the latter convergence is to be understood with respect to d0s. A gen-eralization of the Skorokhod topology to multidimensional spaces (and hence a generalization of weak convergence to that of multidimensional random func-tions) is sketched in Jakubowski (2007).

(18)

1.4.2 Preliminaries from martingale theory

Many arguments throughout this thesis are based on martingale theory. The key in most of the arguments providing large sample properties is a martingale central limit theorem that we want to introduce in this section. However, in order to do so, we shall briefly recap martingale-related definitions first.

We find ourselves in a filtered probability space where we denote the filtration by F = {F (t), t ∈ T }. A cadlag process M is called a martingale on T with respect to F if it is integrable on T , adapted to F and if it additionally satisfies the martingale property, i.e. for 0 ≤ s ≤ t ∈ T :

E[M (t)|F (s)] = M (s). It is called square integrable if it additionally satisfies

sup

t∈T

E[M (t)2] < ∞.

A stochastic process is said to satisfy a property locally if that property is satisfied along a fundamental (or localizing) sequence of stopping times. A sequence Tn of stopping times is called fundamental if it is non-decreasing and if

∀t ∈ T : P (Tn≥ t) = 1 as n → ∞.

For example, we say that M is a local square integrable martingale if there is a fundamental sequence Tn such that MTn is a square integrable martingale. Here, we employ the notation

MTn(t) := M (Tn∧ t) and call MTn the stopped process.

For a square integrable martingale M , it is known that M2 is a non-negative submartingale. An application of the Doob-Meyer theorem guarantees the ex-istence of a predictable, increasing process (Spreij, 2013, p. 19) that we will denote by < M > such that

M2− < M >

is a martingale. < M > is called the quadratic variation process. Based thereon, it will turn out to be useful to define a (predictable) covariation process for two square integrable martingales M1 and M2 by

< M1, M2 >:=

1

4[< M1+ M2> − < M1− M2 >] .

It can be shown (Spreij, 2013) that < M1, M2> is the unique process that can

be written as the difference of two natural processes such that M1M2− < M1, M2>

(19)

is a martingale. Here, a process X is called natural if it is increasing and if it further satisfies the following equality for any right-continuous, bounded martingale M : E " Z (0,t] M (s)dX(s) # = E " Z (0,t] M (s−)dX(s) # .

An extension of quadratic variation and predictable covariation to local mar-tingales is feasible. For a local martingale M , we define its quadratic variation process as the unique finite variation cadlag predictable process such that

M2− < M >

is a local martingale. A similar extension of the predictable covariation process from martingales to local martingales is straightforward (Andersen et al., 1993, p. 68).

A process X is a semimartingale if there is an increasing, predictable process A such that M = X − A is a local martingale. A is frequently called the compensator of X and the Doob-Meyer decomposition theorem (Spreij, 2013, p. 11) gives conditions under which A exists. For a semimartingale X, we define its quadratic covariation by

< X >:=< M > .

Similarly, we define the predictable covariation process for two semimartingales. For a cadlag process X, let us define the jump process

∆X(s) = X(s) − X(s−).

Hence, if X is left-continuous, ∆X ≡ 0. For a local square integrable martin-gale M and  > 0 given, define A(t) := Ps≤t∆M (s)1[|∆M (s)|>]. Denote its

compensator by ˜A. We define M := A− ˜A. It can be shown that M − M

has jumps smaller than 2 only (Rebolledo, 1980).

In order to state a multi-dimensional martingale central limit theorem, we con-sider a k-dimensional sequence of local square integrable martingales,

M(n)=M1(n), ..., Mk(n).

Denote by < M(n) > the k × k-matrix containing the quadratic covariation processes < Mi(n), Mj(n) >. Suppose that V is a k × k-dimensional continuous, deterministic, positive semidefinite matrix-valued function on T . Let M(∞)be a continuous Gaussian vector martingale with < M(∞)>= V. Such a martingale

is characterized by two properties:

M(∞)(t) − M(∞)(s) ∼ Nk(0, V(t) − V(s)) , (1.12)

(20)

V is chosen such that M(∞) always exists (Andersen et al., 1993).

We are now able to introduce the desired convergence theorem. The mar-tingale central limit theorem can be viewed as a generalization of the stan-dard central limit theorem. Whereas the stanstan-dard central limit theorem makes a statement about the convergence of the sum of independent identically dis-tributed random variables, the martingale central limit theorem generalizes this from random variables to stochastic processes, namely martingales. Different versions exist that differ with respect to restrictions on the martingale. The following version from Andersen et al. (1993, p. 83) is the one that is used fre-quently in the context of survival analysis and that will turn out to be sufficient for our purposes.

Theorem 1.5 (Martingale central limit theorem) Suppose T0 ⊂ T .

More-over, let M(n)be defined as above. Furthermore, denote by M(n) the K-dimensional

vector where the ith component equals Mi(n). If we assume that, as n → ∞, < M(n)> (t)→ V(t) for all t ∈ TP 0,

< Mi(n)> (t)→ 0 for all t ∈ TP 0, i ∈ {1, ..., K},  > 0,

the following convergence holds true as n → ∞:  M(n)(t1), ..., M(n)(tl)  D →M(∞)(t1), ..., M(∞)(tl)  for t1, ..., tl∈ T0.

If T0 is dense in T and if it contains τ if T = [o, τ ], we get

M(n) D→ M(∞) in (D(T ))k as n → ∞

and < M(n)> converges uniformly on compact subsets of T in probability to V. (D(T ))k is the space of Rk-valued cadlag functions on T endowed with the Skorokhod topology.

The importance of the above martingale central limit theorem in survival analysis is based on the observation that M (t) := N (t) −Rt

0 λ(s)r(s)ds is a local

square integrable martingale with respect to an adequate filtration (Andersen et al., 1993, p. 73). Furthermore, when integrating with respect to M (t), it is known that the stochastic integral is a local martingale if the integrand is a lo-cally bounded predictable process. If M is a local square integrable martingale, integrating with respect to M (t) even yields a local square integrable martin-gale. Under suitable conditions the martingale central limit theorem becomes applicable and statements about the asymptotic behavior of a sequence of such integrals can be made.

1.4.3 Supplementary results

An equation that we will encounter in connection with product integrals is the Duhamel equation (Andersen et al., 1993, p. 90). It relates two product integrals and later-on allows us to apply martingale arguments. Without proving it, we shall state its one-dimensional version here.

(21)

Lemma 1.1 (Duhamel equation) Suppose Y (t) = t R 0 (1 + dX(s)), Y0(t) = t R 0 1 + dX0(s)

and further assume that Y0 is non-singular. Then Y (t) Y0(t) − 1 = Z t 0 Y (s−) Y0(s) d(X(s) − X 0 (s)).

In particular, if X(t) − X0(t) is a local square integrable martingale, Duhamel’s equation implies that YY (t)0(t)− 1 is a local square integrable martingale if

Y (s−) Y0(s) is

(22)

Chapter 2

Variance Estimation for

Cause-Specific Cumulative

Incidence Estimators

In this chapter, we aim at estimating the variance of estimators of the cause-specific cumulative incidence function. Based thereon, we are able to do inference at fixed time points. For example, we can then construct confidence intervals for this very estimator.

First, we are going to explain variance estimation under complete information followed by an examination of the censoring complete information case. Both settings allow for a straightforward derivation using martingale arguments. The random censoring information situation is considered thereafter. We will briefly show how estimators from the literature can be derived and thereafter explain why an application of the reweighing approach suggested in Geskus (2011) is not feasible.

2.1

A variance estimator under complete information

Let us first consider the complete information setting, i.e. we assume that there is neither right censoring nor left truncation. Using a martingale central limit theorem, we derive an estimator having the desired asymptotic properties. The given arguments are entirely taken from the proof by Andersen et al. (1993, p. 256 ff.) for the standard survival setting without competing risks. However, we shall use a different underlying martingale.

The most crucial change to the derivation in Andersen et al. (1993) for this purpose is that we are in the filtered probability space where the filtration is now given by Fk = {Fk(t)}t≥0 with

Fk(t) = σ{Nik(u), u ≤ t, i = 1, ..., n}.

This is the filtration that does not contain information about the time to the competing events. Let us for now assume that Γk(t) is absolutely continuous so

(23)

that γk(t) exists. Recall that in this setting,

Mk(t) = Nk(t) −

Z t

0

γk(s)R(s)ds (2.1)

is a local square integrable martingale with respect to Fk (Gray, 1988). Such a

process is the starting point in Andersen et al. (1993) and from here we proceed exactly as done there. We define Sk as the ’survival of event type k’-function,

i.e.

Sk(t) := ¯Fk(t) = exp (−Γk(t)).

We denote the support of Sk by T . When at least 2 event types happen with

a positive probability, Fk(∞) = P (Di = k) < 1 and T = [0, ∞]. Under the

assumption given above formula (2.1), Sk(t) is a continuous function. Using the

definition of the product integral, we write this as Sk(t) =

t

R

0

(1 − dΓk(s)).

Hence, a product limit estimator of Sk(t) is given by

ˆ SkP L(t) = t R 0  1 − dˆΓk(s)  .

We keep in mind that the cumulative subdistribution hazard can be estimated by a Nelson-Aalen type estimator, i.e.

ˆ Γk(t) = Z t 0 J (s) R(s)dNk(s) with J (t) := 1[R(t)>0]. Therefore, ˆ SkP L(t) = t R 0  1 − J (s)dNk(s) R(s)  .

The following theorem gives conditions for uniform consistency of this estimator. It can be established as done in (Andersen et al., 1993, theorem IV.3.1) using Mk instead of M . The conditions are therefore similar to those in Theorem 1.1.

Theorem 2.1 (Asymptotics of Sk I) Suppose t ∈ T . Furthermore, assume

that, as n → ∞, Z t 0 J (s) R(s)γk(s)ds P → 0, Z t 0 (1 − J (s))γk(s)ds→ 0.P

Then ˆSkP L is uniformly consistent on [0, t], i.e. sup

s∈[0,t]

| ˆSkP L(s) − Sk(s)| P

(24)

To proceed, we define in analogy with (4.3.3) in Andersen et al. (1993) Γ∗k(t) :=

Z t

0

J (u)γk(u)du

and based thereon

Sk∗(t) := exp (−Γ∗k(t)) =

t

R

0

(1 − dΓ∗k(s)).

Let us note that Γ∗k(t) ≤ Γk(t) and consequently Sk∗(t) ≥ Sk(t). Later on, we

shall use the relationship ˆ Γk(t) − Γ∗k(t) = Z t 0 J (s) R(s)dNk(s) − Z t 0 J (s)γk(s)ds = Z t 0 J (s) R(s)dMk(s).

We observe that Sk∗(t) > 0 by the way we defined it and therefore the Duhamel equation from section 1.4.3 becomes applicable. It yields

ˆ SkP L(t) Sk∗(t) − 1 = − Z t 0 ˆ SkP L(s−) Sk∗(s) d(ˆΓk− Γ ∗ k)(s) which we rewrite as ˆ SkP L(t) Sk∗(t) − 1 = − Z t 0 ˆ SkP L(s−)J (s) Sk∗(s)R(s) dMk(s). (2.2)

The integrand is bounded by S1

k(t) and we therefore conclude that

ˆ Sk(t)

Sk∗(t) − 1 is a

local square integrable martingale with expectation zero. Hence, E " ˆSP L k (t) Sk∗(t) # = 1.

Since we observed earlier that Sk∗(t) ≥ Sk(t), we therefore conclude that E ˆSkP L(t) ≥

Sk(t). The same holds true in the standard survival setting (Andersen et al.,

1993, p. 257). To get an estimator for the variance of Sk(t), we use formula

(2.2) again and note that < Mk> is the compensator of Mk2. Hence,

V ar ˆSP L k (t) Sk∗(t) − 1 ! = E ( ˆSP L k (t) Sk∗(t) − 1 )2 = E < Sˆ P L k Sk∗ − 1 > (t). Using the predictability of SˆkP L(s−)J (s)

Sk∗(s)R(s) and further employing the martingale

property in (2.1), we compute the quadratic variation of SˆkP L(t)

Sk∗(t) − 1: < Sˆ P L k Sk∗ − 1 > (t) = Z t 0 ( ˆSP L k (s−) Sk∗(s) )2 J (s) R(s)γk(s)ds.

(25)

From (2.1), it is known that sups∈[0,t]| ˆSkP L(s) − Sk∗(s)| → 0.P Furthermore, since we assume that Sk(t) is continuous, we know that sups∈[0,t]| ˆSkP L(s) −

ˆ

SkP L(s−)|→ 0 as n → ∞. Therefore, we conclude thatP V ar ˆSP L k Sk∗ ! − Z t 0 J (s) R(s)γk(s)ds P → 0 as n → ∞.

Thus, replacing γk(s) by its natural estimate ˆγk(s), we get a variance estimator

d V ar ˆSP L k (t) Sk(t) ! = Z t 0 J (s) R2(s)dNk(s).

If we denote the variance of ˆSkP L(t), the product-limit estimator, by σ2P L(t), we therefore conclude that

ˆ σP L2 (t) = ˆSP Lk (t) 2Z t 0 J (s) R2(s)dNk(s).

This estimator, which is commonly called the Aalen-type estimator, turned out not to be the preferred one in the standard survival setting without competing risks (Therneau, 1999). An estimator that was found to perform preferably in small samples is obtained when extending the model. It is no longer assumed that Γk(t) =

Rt

0 γk(s)ds and Sk(t) is therefore not necessarily continuous. Using

this more general approach, we may work with Mk(t) = Nk(t) −

Z t

0

R(s)dΓk(s).

Using the predictability of R and equation (2.4.2) in Andersen et al. (1993), we conclude that this local martingale has quadratic variation process

< Mk > (t) = Z t 0 R(s)(1 − ∆Γk(s))dΓk(s). Hence, < ˆ SkP L Sk∗ − 1 > (t) = Z t 0 ( ˆSP L k (s−) Sk∗(s) )2 J (s) R(s)(1 − ∆Γk(s)dΓk(s). By replacing Sk∗ by ˆSkP L and Γk by ˆΓk we then obtain

d V ar ˆSP L k (t) Sk(t) ! = Z t 0 J (s) R(s)(R(s) − ∆Nk(s)) dNk(s).

This yields the Greenwood-type estimator for the variance of the estimator of the cause specific cumulative incidence function:

ˆ σGw2 (t) = ( ˆSkP L(t))2 Z t 0 1 R(s)(R(s) − ∆Nk(s)) dNk(s).

(26)

The large sample properties now follow similarly to those of the Kaplan-Meier estimator in Andersen et al. (1993, theorem IV.3.2.) when replacing M by Mk.

Note that the required conditions are therefore similar in nature to those in Theorem 1.2. Without a proof, we shall first state the necessary conditions and then give the convergence results we get for an increasing sample size n. Even though this is not explicitly indicated in the notation, we note that all the quantities will depend on n.

Theorem 2.2 (Asymptotics of Sk II) For some t ∈ T , suppose there is a

non-negative function ˜R such that γk(s)

˜ R(s) is integrable on [0, t]. Then, σP L2 (t) := Sk2(t) Z t 0 γk(s) ˜ R(s)ds is well-defined. If we further assume that, as n → ∞,

∀s ∈ [0, t] : n Z s 0 J (u) R(u)γk(u)du P → σ 2 P L(s) Sk2(s) , (2.3) ∀ > 0 : n Z t 0 J (s) R(s)γk(s)1 h √ nJ (s)R(s) > ids P → 0, and √n Z t 0 (1 − J (u))γk(s)ds P → 0, we get √ n( ˆSkP L(t) − Sk(t)) D → N (0, σ2 P L(t)) for n → ∞.

Finally, the above derived variance estimates are uniformly consistent in the following sense: sup s∈[0,t] |nˆσGw2 (s) − σP L2 (s)|→ 0 as n → ∞,P sup s∈[0,t] |nˆσAa2 (s) − σP L2 (s)|→ 0 as n → ∞.P

Note that ˜R(t) is implicitly defined by (2.3).

To summarize, we derived a variance estimator for ˆSP L

k (t). Since Sk(t) =

¯

Fk(t), this is then obviously a variance estimator for ˆFkP L(t) := 1− ˆSkP L(t) as well.

In section A.1, we show that under complete information, ˆFkAJ(t) = ˆFkP L(t) and their variances evidently equal. Therefore, the above derived variance estimator also estimates the variance of the Aalen-Johansen estimator.

2.2

Generalization to censoring complete information

In a next step, we want to generalize the above setting slightly to the censoring complete information situation. In this setting, an individual that experienced a competing event is considered to be at risk to experience the event of interest up

(27)

to his censoring time. After, it leaves the risk set as any censored individual does. Recall from section 1.3 that in this setting, the underlying filtration is given by Fk,CC = {Fk,CC(t)}t≥0and that Mikc(t) := Nikc(t)−

Rt

0Ri,CC(u)γk(u)du is a local

square integrable martingale with respect to Fk,CC. Define Nkc(t) =

Pn i=1Nikc(t). We conclude that Mkc(t) := Nkc(t) − Z t 0 RCC(u)γk(u)du

is a local square integrable martingale with respect to Fk,CC as well. Hence, we

can apply martingale properties as done above in the complete information situ-ation. Again, we will obtain a variance estimator of ˆSkP L(t) since the derivation makes use of the product limit form

t R 0 (1 − dΓk(s)) = exp  − Z t 0 γk(u)du  .

In the censoring complete information situation, Γk(u) is estimated by

ˆ Γk(t) = Z t 0 J (s) RCC(s) dNkc(s) and the product-limit estimator is therefore given by

t R 0  1 − dN c k(s) RCC(s)  .

However, in the censoring complete information situation, ˆ SkP L(t) = t R 0  1 − dN c k(s) RCC(s)  6= Z t 0 ˆ S(s−) r(s) dN c k(s) = ˆSkAJ(t).

The martingale derivation remains valid but we will eventually get a variance estimator for ˆ SkP L(t) := t R 0  1 − dN c k(s) RCC(s)  .

The obtained variance estimator will not estimate the variance of the Aalen-Johansen estimator. However, for later purposes, let us summarize the asymp-totic properties that we get under conditions that are similar to those in section 2.1 with ˜R(t) replaced by ˜RCC(t). The product-limit estimator ˆSkP L(t) will again

be uniformly consistent and in the censoring complete information setting and it will further satisfy

√ n( ˆSP Lk (t) − Sk(t)) D → N (0, σ2CC(t)) as n → ∞ with σ2CC(t) := (Sk(t))2 Z t 0 γk(s) ˜ RCC(s) ds.

(28)

An estimator of the variance will then be given by ˆ σ2CC(t) = ( ˆSkP L(t))2 Z t 0 1 RCC(s)(RCC(s) − ∆Nkc(s)) dNkc(s)

and again, the variance estimate is uniformly consistent in the following sense: sup

s∈[0,t]

|nˆσCC2 (s) − σCC2 (s)|→ 0 as n → ∞.P

2.3

Variance estimation under random censoring

in-formation

In section 1.3, we have argued that Mc∗

k (t) = Nkc(t) −

Rt

0RRC(s)γk(s)ds is not a

local martingale under random censoring information. Hence, arguments differ-ent from the censoring complete information framework have to be applied to variance estimation.

Various ways of estimating the variance of the Aalen-Johansen estimator under random censoring information have been introduced. A brief overview is given in Braun and Yuan (2007). We will call the estimator that was found to perform best in simulation studies under random right censoring informa-tion (Braun and Yuan, 2007) and in the presence of addiinforma-tional left truncainforma-tion (Allignol et al., 2010) the Betensky estimator. We will introduce truncation in section 4.1. In the present section, the Betensky estimator is derived first from the multistate model. Thereafter, an alternative approach by Gray (1988) is presented. Next, we will show that the variance estimator proposed by the approach in Geskus (2011) does not have the desired asymptotics. A possible correction term is discussed afterwards.

2.3.1 The Betensky variance estimator

The Betensky variance estimator was derived by Betensky and Schoenfeld (2001) and Gaynor et al. (1993) using different approaches. Allignol et al. (2010) showed that their estimator is equivalent to the Greenwood-type estimator from the multistate model.

We want to sketch how the Greenwood-type estimator can be derived from the multistate model by following the arguments in Andersen et al. (1993). For this purpose, we briefly explain the multistate model first:

Recall that there are m competing risks. We now consider an (m + 1)-dimensional state space, say 0, 1, ..., m and model the movement of an individual as a Markov process with initial state 0 in t = 0. The probability to experience a transition from state a to state b in [s, t] is denoted by Pab(s, t). We denote by

P(s, t) the (m+1)×(m+1)-matrix collecting the various transition probabilities, P(s, t) = {Pab(s, t)} , a, b = 0, ..., m.

(29)

Suppose there are n independent individuals having the above behavior. For a 6= b, the number of direct, observed transitions from a to b at [0, t] is denoted by Nab(t) and it can be shown that Nab(t) has intensity process λab(t)ra(t) where

λab(t) is the transition intensity for one individual and ra(t) is the total number

of individuals in state a at time t. Based on these considerations, Andersen et al. (1993) derive a product-limit estimator of P(s, t), the (multidimensional) Aalen-Johansen estimator, which is given by

ˆ P(s, t) = t R s  I + d ˆΛ(u).

Here, ˆΛ(u) is an estimator of Λ(u), i.e. of the matrix having as its entries the cumulative intensities Λab(u) =

Ru

0 λab(s)ds. These intensities can be estimated

by a Nelson-Aalen-type estimator, ˆ Λab(u) = Z u 0 1 ra(s) dNab(s).

The derivation of the Betensky variance estimator now reads fairly similar to that of the Greenwood estimator in the setting without competing risks. Andersen et al. (1993) define Λ∗(t) as the matrix with components

Λ∗ab(t) =

Z t

0

1[ra(s)>0]λab(s)ds

and show that ˆΛ(t) − Λ∗(t) is a local square integrable martingale. Using a multidimensional form of the Duhamel equation from section 1.4.3, they then prove that ˆ P(s, t)P∗(s, t)−1− I = Z t 0 ˆ P(s, u−)d ˆΛ(t) − Λ∗(t)  P∗(s, u)−1.

Based on this martingale representation of the Aalen-Johansen estimator, one can derive a covariance estimator in a similar way as it was done in section 2.1 based on (2.2). However, the necessary computations are now multidimensional and hence somewhat more complex, see Andersen et al. (1993, p. 290 ff.) for the explicit formulas.

We obtain the competing risks models by setting for a 6= b, a 6= 0 all the transition intensities λab(t) equal to zero. Therefore, in the competing risks

setting, the (m + 1) × (m + 1)-dimensional transition matrix is of the form

P(s, t) =        P00(s, t) P01(s, t) P02(s, t) . . . P0m(s, t) 0 1 0 . . . 0 0 0 1 . . . 0 .. . ... ... . .. ... 0 0 0 . . . 1        .

Consequently, their covariance estimator can be applied in order to estimate the variance of the Aalen-Johansen estimator in the competing risks setting as well.

(30)

2.3.2 Gray’s variance estimator

Gray (1988) suggested a different variance estimator. Even though his variance estimate tends to overestimate the actual variance in small samples (Braun and Yuan, 2007), the following considerations give rise to an estimator that has a similar structure to the one we get using the reweighing approach. It shall further be of use in later chapters when considering K-sample tests.

Gray makes use of the stochastic processes Mi(t) = Nic(t) −

Z t 0

r(u)λi(u)du.

Here, i denotes failure type. He establishes that the Mi are orthogonal square

integrable local martingales with predictable variation processes < Mi > (t) =

Z t

0

r(u)λi(u)du

with respect to FRC (Gray, 1988). We shall employ the notation M−k(u) :=

P

i6=kMi(u) and use a similar notation for any other quantity such as λ−k(t) :=

P

i6=kλi(t) and F−k(t) := Pi6=kFi(t). The following decomposition, that was

later also used by Lin (1997), is the starting point of his derivation: ˆ Fk(t) − Fk(t) = Z t 0 ˆ S(u)d ˆΛk(u) − Z t 0 S(u)dΛk(u) = Z t 0 ˆ S(u) r(u)dMk(u) + Z t 0  ˆS(u) − S(u) dΛk(u). Recall that Fk(t) = Rt

0S(u−)dΛk(u). S is assumed to be continuous, thus

Fk(t) =

Rt

0 S(u)dΛk(u). Assume that S(t) > 0. Then,

ˆ Fk(t) − Fk(t) = Z t 0 ˆ S(u) r(u)dMk(u) + Z t 0 ˆS(u) − S(u) S(u) ! dFk(u)

An application of Duhamel’s equation from section 1.4.3 yields ˆ S(t) − S(t) S(t) = − Z t 0 ˆ S(u−) r(u)S(u)dM (u) where M (u) =Pm

i=1Mi(u). We emphasize that as a consequence, ˆ S(t)−S(t)

S(t) is a

continuous process of finite variation. This is important to note since accordingly the integration by parts formula from ordinary integration holds true. Hence,

Z t 0 ˆS(u) − S(u) S(u) ! dFk(u) = ˆS(t) − S(t) S(t) ! Fk(t) − Z t 0 Fk(u−)d ˆS(u) − S(u) S(u) ! = ˆS(t) − S(t) S(t) ! Fk(t) + Z t 0 Fk(u−) ˆS(u−) r(u)S(u) dM (u).

(31)

Using the above mentioned integral representation for S(t)−S(t)ˆ S(t) , we rewrite this as Z t 0 ˆS(u) − S(u) S(u) ! dFk(u) = −Fk(t) Z t 0 ˆ S(u−) r(u)S(u)dM (u) + Z t 0 Fk(u−) ˆS(u−) r(u)S(u) dM (u) = Z t 0 [Fk(u−) − Fk(t)] ˆS(u−) r(u)S(u) dM (u).

Therefore, we conclude that ˆ Fk(t) − Fk(t) = Z t 0 ˆ S(u) r(u)dMk(u) + Z t 0 [Fk(u−) − Fk(t)] ˆS(u−) r(u)S(u) dM (u).

Let us assume that the conditions in theorem 1.3 hold true. Then, ˆS(t) is known to be a uniformly consistent estimator of S(t). Thus, asymptotically,

ˆ Fk(t) − Fk(t) = Z t 0 S(u) r(u)dMk(u) + Z t 0 [Fk(u−) − Fk(t)] r(u) dM (u) = m X l=1 Z t 0  S(u) r(u)1[l=k]+ [Fk(u−) − Fk(t)] r(u)  dMl(u).

Recall that Mk and M−k are independent and further

< Mk> (t) =

Z t

0

r(s)λk(s)ds.

An application of the martingale central limit theorem yields asymptotic nor-mality of √n ˆFk(t) − Fk(t)



. We further determine its variance. Recall that S(t) = 1 − Fk(t) − F−k(t) and that we assume Fkto be continuous. Furthermore,

using the predictability of the integrand, we conclude that σRC2 (t), the variance of limn→∞ √ nn ˆFk(t) − Fk(t) o , is given by σRC2 (t)= limP n→∞n m X l=1 Z t 0  S(u) r(u)1[l=k]+ [Fk(u) − Fk(t)] r(u) 2 d < Ml> (u) = lim n→∞n Z t 0 (1 − F−k(u))2 r(u) λk(u)du + Z t 0 (Fk(u))2 r(u) λ−k(u)du (2.4) + (Fk(t))2 Z t 0 λk(u) + λ−k(u) r(u) du −2Fk(t) Z t 0 1 − F−k(u) r(u) λk(u)du + Z t 0 Fk(u) r(u) λ−k(u)du  .

Replacing unknown quantities by their estimates, this gives rise to a variance estimate based on Gray (1988).

(32)

2.3.3 Geskus’ variance estimator

In the setting without competing risks, the survival estimator, i.e. the Kaplan-Meier estimator, is given by (1.3) and an estimator of its variance was found to be (Andersen et al., 1993) ˆ σGw2 (t) = ( ˆS(t))2 Z t 0 1 r(s)(r(s) − ∆Nc(s))dN c(s).

The Aalen-Johansen estimator under random censoring information can be writ-ten as a product-limit estimator,

ˆ SkAJ(t) = R s≤t  1 − dN c k(s) RRC(s)  ,

and the reweighing approach in Geskus (2011) suggests to estimate its variance by ˆ σGe2 (t) = ( ˆSkAJ(t))2 Z t 0 1 RRC(s)(RRC(s) − ∆Nkc(s)) dNkc(s). If Mkc∗(t) = Nkc(t) −Rt

0RRC(s)γk(s)ds was a local square integrable martingale

with respect to a filtration G (which is possibly different from the filtration under random censoring information), martingale arguments as in the censoring complete information situation would apply. In particular, ˆσGe2 (t) would satisfy

|nˆσ2Ge(t) − σRC2 (t)|→ 0 as n → ∞.P

We are going to show that this is not the case. In fact, we will prove that n|ˆσ2CC(t) − ˆσGe2 (t)|→ 0 as n → ∞.P (2.5) For this purpose, suppose ˆSkP L is the product-limit estimator under censoring complete information and ˆSkAJ is the Aalen-Johansen estimator under random censoring information. We first note that, under conditions similar to the ones in theorem 2.1,

sup

s∈[0,t]

| ˆSkAJ(s) − ˆSkP L(s)|→ 0 as n → ∞P

since both quantities estimate Sk(t) uniformly consistently. We showed in (1.8)

that

1

n|RCC(t) − RRC(t)|

a.s.

→ 0 for n → ∞.

Further using the dominated convergence theorem for stochastic integrals (Prot-ter, 2004, p. 176), we get, as n → ∞, n Z t 0 1 RRC(s)(RRC(s) − ∆Nkc(s)) − 1 RCC(s)(RCC(s) − ∆Nkc(s)) dNkc(s) P → 0.

A triangle inequality type argument then establishes n|ˆσCC2 (t) − ˆσGe2 (t)| → 0P and we conclude that (2.5) holds true. Therefore, we see that ˆσ2Ge(t) estimates

(33)

the variance of the product-limit estimator in the censoring complete informa-tion setting with informainforma-tion from the random censoring informainforma-tion situainforma-tion only. However, the counterexample in section A.3 shows that σ2CC(t) = σRC2 (t) does generally not hold true. The product-limit estimator from the censoring complete information framework itself may not be determined based on the in-formation from the random censoring inin-formation. We are therefore not able to construct confidence intervals with the above variance estimate.

Even though this is not true when there is censoring, we show in A.2 that ˆ

σGe2 = ˆσBe2 under complete information.

2.4

A failure term

In this section, we want to examine how σRC2 (t) and σ2CC(t) differ. If we man-aged to find an easy-to-compute expression for the difference, we could possibly correct the Geskus variance estimator by the relevant quantity. Furthermore, we would possibly be able to say under what conditions the transferred variance estimate is applicable.

From the martingale arguments in section 2.2, we obtain

σ2CC(t)= limP n→∞(Sk(t)) 2 n Z t 0 1 RCC(s) γk(s)ds.

Furthermore, σRC2 (t) is given in formula (2.4). Together with formula (12) in Geskus (2011), this yields

\[
\sigma^2_{CC}(t) - \sigma^2_{RC}(t) \stackrel{P}{=} \lim_{n\to\infty} n \left\{ (I) - 2F_k(t)\,(II) + (F_k(t))^2\,(III) \right\}
\]

where (I), (II) and (III) are given by

\[
\begin{aligned}
(I) &= \int_0^t \frac{1}{r(s)} \left( \frac{\hat S(s-)}{\hat S_k(s-)}\,\gamma_k(s) - (S_{-k}(s))^2 \lambda_k(s) - (1 - S_k(s))^2 \lambda_{-k}(s) \right) ds \\
(II) &= \int_0^t \frac{1}{r(s)} \left( \frac{\hat S(s-)}{\hat S_k(s-)}\,\gamma_k(s) - S_{-k}(s) \lambda_k(s) - (1 - S_k(s)) \lambda_{-k}(s) \right) ds \\
(III) &= \int_0^t \frac{1}{r(s)} \left( \frac{\hat S(s-)}{\hat S_k(s-)}\,\gamma_k(s) - \lambda_k(s) - \lambda_{-k}(s) \right) ds.
\end{aligned}
\]

For the standard survival analysis framework with only one risk type, we see that the difference of $\sigma^2_{RC}$ and $\sigma^2_{CC}$ is zero (there is only one type of hazard and $\lambda_{-k}(s) = 0$).

Due to the complicated nature of the above variance expressions, we have not yet been able to determine a relation between $\sigma^2_{CC}$ and $\sigma^2_{RC}$ such as $\sigma^2_{CC} \ge \sigma^2_{RC}$.


2.5 A simulation study

In order to examine the small sample performance of the Geskus estimator, we performed a simulation study. Time-to-event data were generated as in Braun and Yuan (2007), i.e. based on a bone marrow transplant example, with additional left truncation. The event of interest is relapse and there are two kinds of competing events: death and aGVHD (acute graft-versus-host disease). Times to relapse were simulated from a Weibull distribution with hazard $\lambda_r(t) = \kappa_r \rho_r (\rho_r t)^{\kappa_r - 1}$. The competing event times were generated from an exponential distribution with mean 10 (time to death) and from a Weibull distribution with hazard $\lambda_a(t) = \kappa_a \rho_a (\rho_a t)^{\kappa_a - 1}$ (aGVHD). We simulated competing risks data in four settings that differ with respect to the parameters $\kappa_r$, $\rho_r$, $\kappa_a$ and $\rho_a$. The choice of parameters is displayed in Table 2.1. The parameters are chosen such that in settings A and B the event of interest has a decreasing hazard, whereas in settings C and D its hazard is increasing. Both cases are paired with a decreasing (settings A and C) or increasing (settings B and D) hazard for time to aGVHD.

Setting   κ_r    ρ_r    κ_a    ρ_a
A         0.5    0.2    0.5    0.2
B         0.5    0.2    2      0.2
C         2      0.2    0.5    0.2
D         2      0.2    2      0.2

Table 2.1: Parameter values for the investigated settings.

We considered censoring times that were uniformly distributed on [0, 10].

In addition to the data generation in Braun and Yuan (2007), we also simulated left truncation times that were uniformly distributed on [0, 5] with probability p.t and equal to 0 otherwise. We simulated data for p.t = 0 (no truncation), p.t = 0.3 (slight truncation) and p.t = 0.8 (heavy truncation). As in Braun and Yuan (2007), we evaluated performance for sample sizes n = 20, n = 50 and n = 100. Due to the left truncation in our simulation, some individuals were not observed at all (study entry time larger than the censoring time or the event time). Since the investigated samples are small already, we decided to generate new proposals for the individuals that were not observed, in order to guarantee that our variance estimates are based on samples of size n. We note that the individuals that had to be replaced entered the study late, whereas a newly generated individual again entered the study late with probability p.t < 1. Therefore, the actual fraction of individuals entering the study at time t > 0 is smaller than p.t. Together with the complete simulation results, this fraction is reported in the appendix.
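To make the data-generating mechanism concrete, the following is a minimal R sketch of one simulated data set along the lines described above; the function and variable names (sim_crisk, entry, time, status) are illustrative and not taken from the code actually used for this thesis.

```r
## Minimal sketch of the simulation design described above. A Weibull hazard
## kappa*rho*(rho*t)^(kappa-1) corresponds to rweibull(shape = kappa,
## scale = 1/rho) in R's parameterisation.
sim_crisk <- function(n, kappa_r, rho_r, kappa_a, rho_a, p.t) {
  t_relapse <- rweibull(n, shape = kappa_r, scale = 1 / rho_r)  # event of interest
  t_death   <- rexp(n, rate = 1 / 10)                           # exponential, mean 10
  t_agvhd   <- rweibull(n, shape = kappa_a, scale = 1 / rho_a)  # aGVHD
  t_cens    <- runif(n, 0, 10)                                  # uniform censoring on [0, 10]
  ## left truncation time: uniform on [0, 5] with probability p.t, otherwise 0
  t_entry   <- ifelse(runif(n) < p.t, runif(n, 0, 5), 0)

  t_event <- pmin(t_relapse, t_death, t_agvhd)
  cause   <- apply(cbind(t_relapse, t_death, t_agvhd), 1, which.min)
  time    <- pmin(t_event, t_cens)
  status  <- ifelse(t_event <= t_cens, cause, 0)                # 0 = censored

  ## individuals whose entry time exceeds their observed time are not observed;
  ## in the simulation such rows were replaced by newly generated individuals
  data.frame(entry = t_entry, time = time, status = status,
             observed = t_entry < time)
}

## Setting A with slight truncation and n = 50:
dat <- sim_crisk(50, kappa_r = 0.5, rho_r = 0.2, kappa_a = 0.5, rho_a = 0.2, p.t = 0.3)
```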

We chose B = 1000 as our simulation size. However, empty inner risk sets of two kinds occurred. Firstly, for some data sets, $R(t_0) = 0$ at times $t_0$ smaller than the investigated time points t = 1 and t = 3, but $R(t) > 0$ at earlier and later time points (singletons of type 1). In this case, the data set was not taken into consideration, since the product-limit estimator of the cause-specific cumulative incidence function will be zero at all time points larger than $t_0$. Furthermore, data sets in which $r(t_0) = 0$ but $r(t) > 0$ at times $t > t_0$ (singletons of type 2) cannot be handled by available software for subdistribution analysis. Hence, such data sets were not taken into consideration either. The number of problematic data sets of both types is reported in the appendix as well. Note that any singleton of type 1 will also be of type 2.

We computed a number of different variance estimates. As a reference point, we chose the empirical variance, which is also displayed in the tables below ("emp Var"). For each estimator, the tables display the fraction of the empirical variance that is explained by its average over the B samples (relative variance). The closer this value is to 1, the better the estimator performs. In the sequel, we briefly explain how the different variance estimators were computed:

The Geskus variance estimator ($\hat\sigma^2_{Ge}$) was computed using the standard survival package after transforming the original data set into a data set in which individuals that experienced a competing event obtained a time-dependent weight. For this part, we used the crprep function. Using the reweighed data set, we then used the survfit function to compute the Geskus variance estimator. Here, we used the Greenwood-type estimator, since this has been found to perform preferably in the standard survival setting (Therneau, 1999). In the displayed simulation outcome, this estimator is denoted by "Geskus1". We also computed the Aalen-type estimator based on the reweighed data set ("Geskus2"), but indeed found the Greenwood-type estimator to perform better.
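As an illustration of this computation, a minimal sketch is given below. It assumes the crprep function from the mstate package and the survfit function from the survival package; the argument names and the name of the weight column in the reweighed data set (weight.cens) follow the package documentation but may differ between versions, and the input data set dat with columns id, time and status (1 = event of interest, 0 = censored) is the hypothetical one from the simulation sketch above.

```r
## Hedged sketch of the "Geskus1" computation: reweigh the data with crprep()
## and treat 1 - F_k as a survival curve in survfit(); the default
## Greenwood-type variance of survfit() then serves as the variance estimate.
library(mstate)    # provides crprep()
library(survival)  # provides survfit() and Surv()

wdat <- crprep(Tstop = "time", status = "status", data = dat,
               trans = 1,   # code of the event of interest
               cens  = 0,   # code of censoring
               id    = "id")

fit <- survfit(Surv(Tstart, Tstop, status == 1) ~ 1,
               data = wdat, weights = weight.cens)

## estimate of 1 - F_k and its standard error at t = 1 and t = 3
summary(fit, times = c(1, 3))
```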

The variance estimator by Betensky/Gaynor et al. ($\hat\sigma^2_{Be}$) is implemented in the etm package and denoted by "Betensky". For the case of right censored data only (p.t = 0), we also computed Gray's variance estimator ($\hat\sigma^2_{Gr}$; "Gray") using the cuminc function in the cmprsk package. Since this function does not allow for left truncation, the corresponding cells are left blank when individuals may enter late. The different variance estimates are computed at two time points, t = 1 and t = 3. The simulation results are displayed below.
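For the right censored case, the reference computation of Gray's estimator could look as follows; cuminc and timepoints are functions of the cmprsk package, and dat again refers to the hypothetical data set used above.

```r
## Hedged sketch of the "Gray" column: cuminc() returns, per cause, the
## estimated cumulative incidence and its variance; timepoints() evaluates
## both at the requested time points.
library(cmprsk)

gray_fit <- cuminc(ftime = dat$time, fstatus = dat$status, cencode = 0)
timepoints(gray_fit, times = c(1, 3))
```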

Despite not having the desired asymptotic properties, Geskus' estimator of the variance appears to give reasonable estimates in small samples. From our simulation, we cannot discern any significant advantage of the Betensky variance estimator over the Geskus variance estimator. In all settings, both estimators do equally well and regularly better than Gray's variance estimator. Whereas Gray's variance estimator tends to overestimate the true variance, the Betensky variance estimator and the Geskus variance estimator both tend to underestimate it. In the absence of left truncation (p.t = 0), Geskus' estimator appears to be larger than the Betensky estimator. Considering that they both tend to underestimate the actual variance, Geskus' estimator regularly does better. This reverses when there is left truncation. In that case, Geskus' variance estimator appears to be smaller than the Betensky variance estimate and the Betensky estimate regularly performs better.

Based on the different variance estimates, we also computed confidence intervals on the log-log scale (Therneau, 1999) and the corresponding coverage probabilities. They are reported together with the complete simulation results in the appendix.
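A minimal sketch of such a log(-log)-transformed interval, applied to the survival-type quantity $\hat S_k = 1 - \hat F_k$, is given below; the function name and arguments are illustrative, and we assume that this is the transformation underlying the intervals of Therneau (1999).

```r
## Log(-log)-transformed confidence interval for a survival-type estimate
## S_hat = 1 - F_hat with standard error se. By the delta method, the standard
## error of log(-log(S_hat)) is se / (S_hat * |log(S_hat)|).
loglog_ci <- function(S_hat, se, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  se_ll <- se / (S_hat * abs(log(S_hat)))
  c(lower = S_hat^exp(z * se_ll),    # lower confidence bound for S_hat
    upper = S_hat^exp(-z * se_ll))   # upper confidence bound for S_hat
}

## Example: an estimate of 0.75 with standard error 0.06
loglog_ci(0.75, 0.06)
```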


p.t   n    t   emp Var   Geskus1   Geskus2   Betensky   Gray
0     20   1   0.01077   0.93351   0.87477   0.93091    1.01907
           3   0.01349   0.90488   0.83500   0.88176    1.04928
      50   1   0.00409   1.01109   0.98581   1.00951    1.04511
           3   0.00502   0.99654   0.96699   0.98474    1.04450
      100  1   0.00188   1.10789   1.09407   1.10639    1.12547
           3   0.00248   1.01955   1.00452   1.00927    1.03870
0.3   20   1   0.01135   0.94659   0.88338   0.94613
           3   0.01382   0.90465   0.83435   0.89275
      50   1   0.00431   1.02804   1.00073   1.02764
           3   0.00497   1.03391   1.00327   1.02966
      100  1   0.00237   0.93584   0.92349   0.93560
           3   0.00262   0.98186   0.96750   0.97894
0.8   20   1   0.01710   0.89705   0.80666   0.90709
           3   0.01801   0.85194   0.77028   0.88527
      50   1   0.00683   0.93188   0.89582   0.94051
           3   0.00701   0.90282   0.86967   0.93585
      100  1   0.00323   0.99542   0.97644   1.00300
           3   0.00326   0.98086   0.96312   1.01364

Table 2.2: Simulation results for setting A for different levels of left truncation


It is interesting to investigate how $\hat\sigma^2_{Be} - \hat\sigma^2_{Ge}$ behaves. As an example, we present some observations for n = 20 in setting C at t = 3 in the absence of truncation. Figure 2.1(a) confirms that $\hat\sigma^2_{Ge} \ge \hat\sigma^2_{Be}$. The larger the variance estimates, the larger their difference tends to become.
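A plot like Figure 2.1(a) can be produced directly from the stored simulation output; in the following sketch, var_ge and var_be are assumed to be vectors holding the Geskus and Betensky variance estimates at t = 3 for the retained samples (the names are illustrative).

```r
## Scatter plot of the difference between the two variance estimates against
## the size of the Geskus estimate, in the spirit of Figure 2.1(a)
plot(var_ge, var_ge - var_be,
     xlab = "Geskus variance estimate",
     ylab = "Geskus - Betensky",
     main = "Setting C, n = 20, t = 3, no truncation")
abline(h = 0, lty = 2)  # reference line where both estimates coincide
```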

In Figure 2.1(b), we see that large differences of the two variance estimates tend to go along with large estimates of the overall cumulative incidence function. We present one possible explanation. The Betensky variance estimator takes into account the competing event times whereas they are ignored by the Geskus variance estimator. It is known that the cause-specific cumulative incidence functions of the event of interest and of the competing event add up to the overall cumulative incidence function. When the overall cumulative incidence function has a value close to 1, this therefore poses a restriction on future estimates of the cause-specific cumulative incidence function of the event of interest. This restriction is ignored by Geskus’ variance estimator. However, we note that this apparently does not play a role in the absence of both left truncation and right censoring since the variance estimators coincide in this situation.

In Figure 2.1(c), it appears that the difference of the two variance estimates does not depend as much on the estimate of the cause-specific cumulative incidence function. The overall cumulative incidence is the sum of both cause-specific cumulative incidence functions.
