Academic year: 2021

Modelling the antidepressant’s market behavior after patent expiries.

Evidence from the Netherlands

by

Petros Pechlivanoglou

Thesis Supervisors:

Prof. dr. T.J. Wansbeek

Prof. dr. M.J. Postma

Dr. L. Spierdijk

DEPARTMENT OF ECONOMETRICS, FACULTY OF ECONOMICS

UNIVERSITY OF GRONINGEN, THE NETHERLANDS


Abstract

Background: Controlling pharmaceutical expenditure is of particular interest to governments. Patent expiries of drugs are important, since dispensing (cheaper) generic agents results in lower health care costs. Therefore, estimating the effects of generic substitution and the factors influencing the switch to a generic product is highly relevant, as it can provide useful information for health care decisions.

Objective: To define switching patterns for drugs after patent expiry and to investigate the duration of brand product use until patients switch to generic products, including estimation of the effect of different covariates on this switch.

Methods: Data on antidepressant use were obtained from IADB.nl, comprising pharmacy dispensing records of around 500,000 patients in the Netherlands. For the analyses, we focused on patients receiving a Fluoxetine product. Duration analysis techniques were applied to estimate the probability over time of patients switching to a generic product. Non-parametric survival and hazard estimations were performed, and different parametric hazard regressions (using different distributions) were fitted to the data. Subsequently, we focused on multivariate analysis to estimate the effect of different covariates on the switching probability. Since we used interval censored data, we applied discrete duration methods. A theoretical and practical comparison of discrete versus continuous methods was made before applying the former. The problems of unobserved heterogeneity in the estimated model and of endogeneity of some of the covariates were addressed by estimating a generalized mixed model with random effects and by using instrumental variables, respectively. Finally, the same analysis was applied to the patent expiries of other branded products from different drug groups, and the results were compared between groups.

Results: Duration analysis results show a higher probability of patients switching to a generic product during the first year after patent expiry. For Fluoxetine, switching probabilities were largely affected by the prescribers’ and dispensers’ inclination to provide generic drugs, but were also significantly affected by the price difference between the branded and generic drugs. The correction for unobserved heterogeneity showed no significant variance between individuals, but correction for endogeneity resulted in a significantly different coefficient for the endogenous covariate. The second-stage analysis showed that the diffusion of the new product is driven mainly by the difference in price in the early period after patent expiry. However, further research on this comparison is needed.

Contents

1 Introduction
1.1 Motivation, goals and questions
1.2 Structure
2 Pharmaceutical background
2.1 Health Policy and Price regulations
2.2 Generic products
2.3 Generic Substitution
2.3.1 Pharmacists’ incentive
2.3.2 Patients’ incentive
2.3.3 Doctors’ incentive
2.3.4 Therapeutic Substitution
2.4 Existing Studies
2.5 Antidepressants
2.5.1 Depression
2.5.2 Treatment
3 Methods
3.1 Duration analysis
3.1.1 The Survival Function
3.1.2 The Hazard Function
3.2 Censoring Problem
3.3 Non-Parametric Estimation
3.4 Parametric Estimation
3.5 Semi-parametric estimation
3.5.1 Continuous time analysis
3.5.2 The Proportional hazard model
3.5.3 Testing the proportionality assumption
3.5.4 Visual tests
3.5.5 Formal tests
3.6 Discrete time analysis
3.6.1 Relation between continuous and discrete methods
3.6.2 Episode-splitting
3.7 Endogeneity and instrumental variables
3.7.1 2SLS Inconsistency
3.8 Unobserved Heterogeneity
3.8.1 Continuous case
3.8.2 Discrete case
4 Empirical analysis
4.1 Data
4.1.1 IADB.nl
4.1.2 The Antidepressants
4.1.3 Fluoxetine Users
4.2 Duration analysis results
4.2.1 Non-parametric estimation
4.2.2 Parametric estimation
4.2.3 Semi-Parametric estimation
4.3 Application in different expired patents


1 Introduction

1.1 Motivation, goals and questions

The aim of this study is to analyse the effect of the patent expiry of Fluoxetine, nearly seven years after the first generic substitute for Prozac was introduced (December 1999). A set of research questions is raised concerning the response of the antidepressant market to the introduction of generic Fluoxetine and the effect on the main market participants: patients, prescribers and pharmacists:

• How does the introduction of generic Fluoxetine affect the probability that a patient will stop using the branded alternative?

• What is the effect of time on the above mentioned probability?

• Does the inclination of the prescriber or the pharmacist to prescribe generic rather than branded drugs affect the patient’s switching probability from a brand product to a generic?

• What is the causality, in case of an existing relation, between drug prices and prescribers’ decisions?

In particular, the following analyses are performed in order to give answers to the main research questions:

• Non-parametric methods are applied to determine the effect of time on the switching probability.

• Parametric methods are applied to model the duration until a patient’s switch.

• Discrete duration methods are applied to estimate the effect of a set of covariates on the switching probability.


In the last-mentioned analysis it is considered important to include methods that can control for possible unobserved heterogeneity, as well as for the problem of endogenous covariates. In particular, a generalized mixed model with random effects is applied to control for unobserved heterogeneity, and the method of instrumental variables is used to deal with the endogenous covariates.

1.2 Structure

This thesis is organised as follows. In section 2 the pharmaceutical background is introduced and an overview of the problems to be studied is presented. Section 3 describes the methodology used in this study. Section 4 presents the results, while in the last section we provide the general conclusions of this thesis, together with some discussion of the pharmaceutical and econometric topics covered.

2 Pharmaceutical background

2.1 Health Policy and Price regulations

The Dutch government, like most governments, is particularly interested in controlling pharmaceutical expenses. Reimbursement of drug expenses accounts for more than 10% of total health care costs, so a way of controlling these expenses is needed. The Dutch government has two ways to regulate drug prices: maximum reimbursement prices are set through the reference pricing system (Geneesmiddelen Vergoedings Systeem - GVS) and the drug prices law (Wet Geneesmiddelen Prijzen - WGP) (Boersma et al., 2005). Following a specific procedure, the maximum reimbursement price is set according to these two mechanisms.

Under this system, the producer of a generic pharmaceutical product may set the price of its product close to this maximum price. Evidence for this also comes from the fact that generic prices are only around 20% lower than the price of the brand product. In this way the generic company maximizes its profit, as the patient will, theoretically, be indifferent when deciding between the brand and the generic drug. According to King and Kanavos (2000), patients do not complain if they are asked to switch from a brand to a generic product, as long as this switch brings them a benefit. For this reason the Dutch government has questioned the effect of generic drugs on price and cost reduction. In this study we examine whether this concern of the Dutch government can be supported with analyses of prescription-level data. Results show that whenever the generic price is set close to, or even above, the price of the existing brand drug, the desired cost-cutting switch to generic products does not occur.

2.2 Generic products


2.3 Generic Substitution

When a generic product enters the market, usually at a lower price than the branded product in order to gain market share, the phenomenon of generic substitution takes place. Generic substitution is the switching of a patient from a branded drug to a generic copy. In general, the promotion of generic substitution is done in different ways across countries. Mostly the incentives are financial, but sometimes (usually as a result of governmental decisions) non-financial incentives are also given, both to patients and physicians. Compulsory generic substitution, public campaigns, and financial incentives to physicians and pharmacists are some of the measures used. Non-financial incentives, such as listing only generic products in the prescriber’s electronic drug database or a change in medical education toward prescribing generics, are also observed across countries.

The market for generic products is expanding at a pace that varies across countries. The main reasons for the uneven development of the generic market across countries are differences in:

• the quality of generic products,

• the constitution of the system of patent protection,

• the governmental policy, and

• the mentality of the patients and prescribers within the specific countries.

In countries with strict regulations on lowering branded products’ prices, such as France and Italy, or countries where patent protection is weak, such as Greece, far fewer generic products have diffused into the pharmaceutical market.

2.3.1 Pharmacists’ incentive

In the Netherlands, the most crucial decision for generic substitution is taken by the pharmacist. He/she has the right to dispense either of the two categories of medicine (brand or generic). Unless the prescriber explicitly indicates on the prescription that the branded drug should be dispensed for certain compliance reasons, the pharmacist is entitled to dispense the generic even if the branded drug is suggested by the prescriber.

The profit that a pharmacist earns is a fixed amount for every prescription dispensed. Therefore, one would expect pharmacists to be indifferent about dispensing equivalent drugs. However, pharmacists are often credited with extra profit for dispensing generic products, by the insurance companies, the government or the producer of the generic drug. As a result, a pharmacist acting rationally should dispense the generic copy in order to receive this extra profit. Sometimes, however, producers of branded products offer discounts to the pharmacist, hampering this so-called generic substitution.

2.3.2 Patients’ incentive

patients away from the decision of using a generic drug (e.g., a different dosage or color of the pill).

2.3.3 Doctors’ incentive

The decision on the exact medicine prescribed is of course taken by the doctor. Whether a doctor is inclined to prescribe a particular medicine is very important for the probability that the patient will eventually receive a prescription for this specific medication. As specialist doctors are supposed to be informed earlier about new entrants in each pharmaceutical market, they are expected to be the first to be influenced by a change that occurs after a patent expiry. General practitioners are claimed to observe what the specialists are prescribing and then adjust their own prescribing in the same direction (Laat et al., 2002).

Plenty of research has been done in the past to understand and analyze this prescribing behavior and, finally, to make it a factor in a mathematical model. Coscelli (2000) tries to extract, from a set of pharmaceutical data, the reasons why a prescriber decides to prescribe one out of several therapeutically equivalent drugs. He tests the hypothesis that doctors and patients are indifferent between therapeutically equivalent drugs. He assumes, however, that there is no price competition, as the system in Italy in the period of his research did not allow price competition between drugs with the same active substance. Under this assumption he finds that the ’habit’ of branded prescription is the driver of prescribing decisions. However, as mentioned above, the conditions in the Italian pharmaceutical market at the time of that research are far from similar to those in the Netherlands nowadays.1

1 For example, generic substitution by the pharmacist is still not allowed in Italy, while in the


2.3.4 Therapeutic Substitution

Together with the phenomenon of generic substitution, another effect takes place in the pharmaceutical market after the date of the patent expiry. Due to the entrance of the generic product, the conditions change in the market of the therapeutic class to which the branded product belongs. For example, if compulsory substitution exists, then patients who were used to taking a specific kind of medicine (size, dose, color) will have to switch to something they might not like or be familiar with. Also, doctors are sometimes not satisfied with the new generic product. Therefore the prescriber might prefer switching the patient to a different treatment method that is also applicable for the particular disease. This switching from one therapeutic method to another is called therapeutic substitution (Klok et al., 2006). We should note that therapeutic substitution also exists before the patent expiry (due, for example, to compliance problems of patients with a specific medication), but it is claimed that the patent expiry increases the phenomenon of therapeutic substitution (Klok et al., 2006). Therapeutic substitution is indeed an interesting research topic, because a ’leakage’ of patients from a therapeutic method where generic substitutes exist, and where treatment cost is therefore lower, would yield a higher treatment cost. However, we are not going to deal with it in the present thesis as it deviates significantly from our main research topic.

2.4 Existing Studies

the effect on health care cost savings after a patent expires and generic products enter the market.

Generic substitution and the diffusion of generic substitutes in the antidepressant market, as shown in Druss (2004), is another characteristic example. The patent expiry of Prozac raised the interest of generic producers in entering the antidepressant market. A Cox proportional hazard model is used to analyze the effect of certain variables on the probability of a patient’s switch to generic Fluoxetine. In Lanjouw (1988) the impact of health policy choices on the decision of entry for new drugs is identified and an effort is made to measure this effect. The estimations are done through a probit and a hazard model, estimating the probability and the speed of a product entering a country’s drug market with respect to the regulations that apply in the different countries investigated.

2.5 Antidepressants

2.5.1 Depression


suffer from depression.

2.5.2 Treatment

There are several ways to treat depression. Most often patients are advised to follow a treatment involving psychotherapy or medication, or both. Other types of therapy include electroconvulsive therapy, herbal therapies, etc.

Treatment with medication is thought to be more effective for patients with moderate to severe symptoms of depression. However, patients respond differently to the existing types of medication. There are also regulations that prohibit the use of several medication methods in special populations (e.g. children). Depression treatment has evolved over time and new treatment methods have been discovered, which nevertheless do not fully substitute the old ones. Therefore, several medication classes for treating depression exist nowadays. They are divided into two main categories:

• Tricyclic Antidepressants

• Selective Serotonin Reuptake Inhibitors (SSRIs)


3 Methods

The importance of cost reduction in the health care sector by means of generic substitution, combined with the significant number of patent expiries in the class of SSRIs in the last decade, indicates that analyzing the antidepressant market around the period of patent expiries may give us interesting information concerning switching behavior to generic products. In particular, measuring the duration until an antidepressant user first uses a generic from the same drug group can provide us with information on the diffusion of generic products in the market.

Duration data are most often analyzed with statistical techniques that belong to the family of survival or duration analysis. With the help of these techniques, we can statistically analyse the duration for which a patient was treated with the branded drug until he/she switched to a generic. This duration can also be fitted to a distribution so that it can be described by a parametric function. Finally, by using duration analysis techniques, we can observe the effect that certain characteristics of patients, drugs, prescribers or pharmacists may have on the probability of a patient switching to a generic.

3.1 Duration analysis

Duration analysis, also known as survival analysis in medicine or failure time analysis in reliability theory, is probably one of the best-known fields in statistics. This is due to the fact that duration data are very often met in practice (time until death, time to a machine’s failure, etc.). However, duration data cannot always be handled with simpler techniques such as Ordinary Least Squares (OLS), mainly because of censoring and time-varying covariates. Therefore, other approaches should be considered for the estimation of duration data.


3.1.1 The Survival Function

Let T be the duration from a given time point until an individual faces some event of interest. The survival function is defined as the probability that an individual faces the event after some point t in time. If we denote by f(t) the density function of T, the expression for survival, in continuous time, is:

S(t) = Pr(T > t) = ∫_t^∞ f(x) dx.

The survival function equals 1 − F(t), where F(t) = Pr(T ≤ t) is the cumulative distribution function at time t. The survival function is a monotonic, non-increasing function, bounded in the interval [0, 1]. Through the survival function we can derive (in continuous time) the density function f(t), as these two functions are connected through the following relation:

f(t) = ∂F(t)/∂t = −∂S(t)/∂t.

3.1.2 The Hazard Function

The hazard function captures the conditional probability of an individual facing the analysed event at time t, given that he/she survived (did not face the event) up to that time. It is formulated as:

h(t) = lim_{Δt→0} Pr[t ≤ T < t + Δt | T ≥ t] / Δt.

That is, the probability of the event occurring in a small interval Δt, given that the individual has survived at least until t.


The hazard rate can be expressed through the density and the survival function:

h(t) = f(t)/S(t) = [−∂S(t)/∂t] / S(t) = −∂ log S(t)/∂t = −∂ log[1 − F(t)]/∂t.

From the above relation we can see that S(t) can be expressed as a function of h(t) in the following way:

S(t) = exp(−∫_0^t h(u) du).

The value ∫_0^t h(u) du is also known as the integrated hazard and is denoted by H(t). It is frequently used, as its graphical representation can sometimes be much more informative than a plot of the hazard function: the integrated hazard varies less over time, since the estimated values are aggregated (Kiefer, 1988). For the discrete case the integrated hazard is defined as:

H(t) = Σ_{i=0}^{k} h(t_i),

where t_i indicates the distinct duration times up to time t_k.
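These relations can be verified numerically. A minimal sketch, assuming a Weibull duration with arbitrary scale λ and shape p (values chosen purely for illustration), checks that h(t) = −∂ log S(t)/∂t and S(t) = exp(−H(t)):

```python
import math

# Weibull duration: S(t) = exp(-(lam*t)**p), h(t) = lam*p*(lam*t)**(p-1),
# H(t) = (lam*t)**p. Parameter values below are arbitrary.
lam, p = 0.5, 1.5

def S(t): return math.exp(-(lam * t) ** p)
def f(t): return lam * p * (lam * t) ** (p - 1) * S(t)   # density f = h*S
def h(t): return f(t) / S(t)                             # hazard  h = f/S
def H(t): return (lam * t) ** p                          # integrated hazard

t, eps = 2.0, 1e-6
# h(t) should equal -d log S(t)/dt (central finite difference):
numeric_h = -(math.log(S(t + eps)) - math.log(S(t - eps))) / (2 * eps)
print(abs(h(t) - numeric_h) < 1e-6)         # True
print(abs(S(t) - math.exp(-H(t))) < 1e-12)  # True: S(t) = exp(-H(t))
```

Any other positive hazard specification would pass the same checks; the Weibull is used only because its S, f, h and H all have closed forms.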

3.2 Censoring Problem


• Right censoring: the type of censoring most often faced in practice. Within this class we can distinguish three different types of right censoring.

– Fixed censoring: occurs when an individual i is followed for the whole study period but no event occurs during the follow-up. The individual is then censored with a fixed censoring time T.

– Random censoring: the individual can exit the study without facing an event at any point in time. The individual is considered censored if he/she exits the study sample early (regardless of the reason) or does not face an event until the end of the study.

– Competing risks: in this censoring type the individual can face more than one type of event during the study period. An example, common in cancer studies, is the case where the competing risks are a recurrence of the disease (relapse) and death in remission (or treatment-related mortality). These two risks represent the two ways a patient may fail the therapy, namely relapse or death due to the toxicity of the therapy itself.

• Left censoring: often we want to observe the duration of an event from the time it started to the time it ended. When individuals enter the study period without the initiation of the event being known, we face the problem of left censoring. Left censoring is much more difficult to deal with.


3.3 Non-Parametric Estimation

When someone is interested in the distribution of duration data, empirical estimates of that distribution are very informative. Usually, non-parametric estimation is applied to the survival and integrated hazard functions, although methods for estimating the hazard rate exist as well (Kiefer, 1988). However, inference from the latter non-parametric estimate is more difficult due to the estimated hazard rate’s shape.

The most widely used non-parametric estimate of the survival function is the Kaplan-Meier estimator. This is a step function based on the number of events and censored observations at each distinct (increasing) duration time.

If we denote by

• t_(1) < t_(2) < . . . < t_(k): the distinct ordered duration times in our sample,

• d_j: the number of events at time t_(j), for j = 1, . . . , k,

• n_j: the number of individuals at risk (still survived) just before time t_(j),

• h_j = d_j/n_j: the empirical hazard at time t_(j),

then the Kaplan-Meier estimator can be expressed as:

Ŝ(t) = ∏_{j: t_(j) ≤ t} (1 − h_j) = ∏_{j: t_(j) ≤ t} (1 − d_j/n_j).
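A minimal implementation of the estimator, applied to an illustrative set of switching durations (times in months and event indicators below are made up for the example):

```python
from collections import Counter

def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates at the distinct event times.
    times: observed durations; events: 1 = event observed, 0 = censored."""
    deaths = Counter(t for t, e in zip(times, events) if e == 1)  # d_j per time
    surv, s = {}, 1.0
    for t in sorted(deaths):
        n_j = sum(1 for x in times if x >= t)   # at risk just before t
        s *= 1.0 - deaths[t] / n_j              # multiply by (1 - d_j/n_j)
        surv[t] = s
    return surv

# Toy data: durations until switch (event = 1) or censoring (event = 0).
times  = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 1, 0, 0, 1]
print(kaplan_meier(times, events))
```

Censored observations (events = 0) never trigger a step but still count toward the risk sets n_j of earlier times, which is exactly how censoring enters the estimator.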


3.4 Parametric Estimation

When no explanatory covariates are present, it is often useful to estimate a parametric model based only on the sample durations, by assuming that they follow an underlying distribution. In this way the only parameters that need to be estimated are the distribution parameters.

Let f(T_i) be the density function of the duration T_i for any individual i, let S(T_i) be the corresponding survival function, and let d_i be an event indicator for every individual, with d_i = 1 when the event is observed for individual i and d_i = 0 when the individual is censored. We assume that the duration variable T is i.i.d. The estimation can be done with maximum likelihood methods. If we ignored censoring, the likelihood function would be given by:

L = ∏_{i=1}^{n} f(T_i).

For survival data, though, the information on the censored individuals spans only until the end of their observation period. Hence the likelihood function for the whole sample is the product of the contributions of the censored and non-censored individuals:

L = ∏_{i=1}^{n} f(T_i)^{d_i} S(T_i)^{1−d_i}. (1)

The above expression can be rewritten as a function of the hazard rate, as we know that f(T_i) = h(T_i)S(T_i). Therefore,

L = ∏_{i=1}^{n} h(T_i)^{d_i} S(T_i)

is the likelihood to be maximized.
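The censored likelihood (1) can be made concrete with an exponential hazard h(t) = λ, for which the log-likelihood is Σ_i [d_i log λ − λT_i] and the maximizer has the closed form λ̂ = Σ_i d_i / Σ_i T_i. The data below are invented for illustration; a crude grid search stands in for a proper optimizer:

```python
import math

# Illustrative censored durations: d_i = 1 if the event was observed, 0 if censored.
T = [1.5, 2.0, 0.7, 3.2, 4.1, 2.6]
d = [1,   1,   0,   1,   0,   1]

def log_lik(lam):
    # log L = sum_i [ d_i*log f(T_i) + (1-d_i)*log S(T_i) ]
    #       = sum_i [ d_i*log(lam) - lam*T_i ]   for the exponential
    return sum(di * math.log(lam) - lam * ti for ti, di in zip(T, d))

# Closed-form MLE: events divided by total exposure time.
lam_hat = sum(d) / sum(T)

# A grid search over lambda lands on (approximately) the same value.
grid = [i / 1000 for i in range(1, 2000)]
lam_grid = max(grid, key=log_lik)
print(round(lam_hat, 3), lam_grid)  # both approximately 0.284
```

Note how the two censored observations contribute only exposure time (−λT_i) and no log λ term, which pulls the estimate below the naive 1/mean of the observed durations.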

Distributions commonly used for (non-negative) durations are the Exponential, Weibull, Lognormal, Log-logistic, Generalized Gamma and Burr. Each distribution has specific properties concerning the effect of time on the hazard. The Exponential assumes a constant hazard; the Weibull a monotonically increasing or decreasing hazard; the Lognormal and Log-logistic can have a hazard that first increases and then decreases (the Log-logistic differing from the Lognormal in that its hazard can be non-zero at the start of the study period). There are nested classes of distributions (e.g. Generalized Gamma, Burr) which make it easier to select the most appropriate distribution for a specific dataset. Below we present the Generalized Gamma distribution in detail, as it nests several distributions such as the Lognormal, Weibull and Exponential (the Exponential being a special case of the Weibull).

3.4.1 Generalized Gamma

The Generalized Gamma has the property of nesting a number of distributions, according to the values of its parameters. Let λ, p and k be the location, scale and shape parameters respectively. The density function of the Generalized Gamma is then:

f(t) = λp(λt)^{pk−1} exp[−(λt)^p] / Γ(k).

The above expression yields the:

• Gamma density function, when p = 1,

• Weibull density function, when k = 1,

• Exponential density function, when p = 1 and k = 1,

• Lognormal density function, when k → ∞.

Each of these distributions is thus a special case of the Generalized Gamma. Although standard estimation routines for the Generalized Gamma are not available in most statistical software packages, the relation between the Generalized Gamma and the Gamma distribution makes estimation easy to implement manually. In particular, when T ∼ GG(λ, p, k), then (λT)^p ∼ Gamma(k, 1). Maximizing the likelihood in (1) using the transformed data and the Gamma density function yields the estimators for the parameter vector θ = (λ, p, k).
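Both the k = 1 reduction and the Gamma transformation can be checked numerically; the parameter values below are arbitrary and chosen only for illustration:

```python
import math

def gg_pdf(t, lam, p, k):
    # Generalized Gamma density: lam*p*(lam*t)**(p*k-1)*exp(-(lam*t)**p)/Gamma(k)
    return lam * p * (lam * t) ** (p * k - 1) * math.exp(-(lam * t) ** p) / math.gamma(k)

lam, p, k, t = 0.8, 1.7, 2.3, 1.4

# 1) k = 1 collapses to the Weibull density.
weibull = lam * p * (lam * t) ** (p - 1) * math.exp(-(lam * t) ** p)
print(abs(gg_pdf(t, lam, p, 1.0) - weibull) < 1e-12)  # True

# 2) Change of variables: if T ~ GG(lam, p, k) then U = (lam*T)**p ~ Gamma(k, 1),
#    i.e. f_T(t) = f_U(u) * dU/dT must hold pointwise.
u = (lam * t) ** p
du_dt = p * lam * (lam * t) ** (p - 1)                 # Jacobian dU/dT
gamma_pdf_u = u ** (k - 1) * math.exp(-u) / math.gamma(k)
print(abs(gg_pdf(t, lam, p, k) - gamma_pdf_u * du_dt) < 1e-12)  # True
```

The second check is exactly the identity that justifies estimating the Generalized Gamma via transformed data and a standard Gamma likelihood.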

3.5 Semi-parametric estimation

3.5.1 Continuous time analysis

We proceed with the multivariate analysis, where the duration variable is considered to be observed in continuous time. Much research has been done on the estimation of duration models in continuous time. Models with extensive applications in the literature include the Accelerated Failure Time (AFT) model (Klein and Moeschberger, 1996), the weighted least squares model proposed by Stute (1993) and the proportional hazard model proposed by Cox (1972). The last model allows us to estimate the effect of the covariates on the hazard of an event, contrary to the other methods, like the AFT model, where the effect is directly on the expected survival time of the individual.

3.5.2 The Proportional hazard model

Cox (1972) proposed the following model in order to estimate the effect of several covariates on the hazard rate h(t):

h(t|z_i) = h_0(t) g(β, z_i),

meaning that the hazard rate is the product of a (known) function g(β, z_i), where z_i is a vector of K covariates for individual i and β the corresponding coefficient vector, and of a baseline hazard rate h_0(t). The most common choice for the covariate function is

g(β, z_i) = exp(z_iβ),

so the model can be written as:

h(t, z_i) = h_0(t) exp(z_iβ). (2)

The model is called semi-parametric because the baseline hazard is estimated non-parametrically, while only the coefficients of the covariates are estimated parametrically.

In order for this model to be estimated, however, a proportionality assumption is made. This assumption can be illustrated through the following example. Consider two individuals i = 1 and i = 2 and a given duration t̃. According to the equation above, the ratio of their hazards will be

h_1(t̃)/h_2(t̃) = [h_0(t̃) exp(z_1β)] / [h_0(t̃) exp(z_2β)] = exp[(z_1 − z_2)β]. (3)

The ratio of the two hazards in (3) does not depend on survival time. Hence we can conclude that, for two individuals, the effects of the covariates are proportional through time.
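A short numerical illustration of (3): with any baseline hazard, the ratio of two individuals' hazards is constant over time. The baseline function and coefficients below are assumed values for the sketch:

```python
import math

# Arbitrary baseline hazard; its exact shape is irrelevant to the ratio.
def h0(t): return 0.2 + 0.05 * t

beta = [0.4, -0.8]
z1, z2 = [1.0, 0.0], [0.0, 1.0]

def hazard(t, z):
    return h0(t) * math.exp(sum(b * x for b, x in zip(beta, z)))

# The hazard ratio exp[(z1 - z2) beta] does not depend on t:
ratios = [hazard(t, z1) / hazard(t, z2) for t in (0.5, 1.0, 5.0, 20.0)]
expected = math.exp(sum(b * (a - c) for b, a, c in zip(beta, z1, z2)))
print(all(abs(r - expected) < 1e-12 for r in ratios))  # True
```

The baseline h_0(t) cancels in every ratio, which is precisely why the Cox model can leave it unspecified.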


The partial likelihood expresses the probability of an individual facing an event at time t_i, conditional on being in the set of individuals at risk at time t_i. So the conditional probability at the moment m of facing the event is the probability that individual i faces the event, relative to the probability that the rest of the individuals at risk do not face it. If we denote the probability of switching for an individual at time t_i by f(β, z_i), and by R(t_i) the set of individuals at risk just prior to t_i, then this conditional probability can be expressed as follows:

L_m = f(β, z_m) / Σ_{l∈R(t_m)} f(β, z_l). (4)

Continuing, we can substitute f(β, z_m) = h(β, z_m)S(β, z_m). Since all individuals in the risk set R(t_m) have survived until t_m, their survival functions will be the same. Using these two identities we find that

L_m = h(β, z_m)S(β, z_m) / Σ_{l∈R(t_m)} h(β, z_l)S(β, z_l) = h(β, z_m) / Σ_{l∈R(t_m)} h(β, z_l). (5)

Substituting (2) into (5), and taking into consideration that the baseline hazard is assumed to be the same for all individuals, results in

L_m = exp(z_mβ) / Σ_{l∈R(t_m)} exp(z_lβ). (6)

Each event that some individual faces contributes to the likelihood function according to (6). Therefore the partial likelihood will be

L = ∏_{m=1}^{M} L_m,

where M is the total number of distinct event times. Maximizing L with respect to β yields the estimated values of the coefficients β̂.
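The partial likelihood above can be computed directly on a toy dataset with a single covariate; a crude grid search then plays the role of the maximization step (all data values below are illustrative):

```python
import math

# Toy data: (duration, event indicator: 1 = event observed, covariate z).
data = [(2.0, 1, 0.0), (3.0, 1, 1.0), (4.0, 0, 1.0), (5.0, 1, 0.0), (7.0, 1, 1.0)]

def log_partial_lik(beta):
    ll = 0.0
    for t_m, d_m, z_m in data:
        if d_m == 1:  # only event times contribute terms
            # risk set: everyone still under observation just before t_m
            risk = [z for (t, d, z) in data if t >= t_m]
            ll += beta * z_m - math.log(sum(math.exp(beta * z) for z in risk))
    return ll

# Grid search over beta; a Newton step would do the same job more efficiently.
grid = [i / 100 for i in range(-300, 301)]
beta_hat = max(grid, key=log_partial_lik)
print(beta_hat)
```

Censored records (the third row) contribute no term of their own but still appear in the risk sets of earlier event times, mirroring how censoring enters (4)–(6).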

3.5.3 Testing the proportionality assumption

One major assumption the Cox model makes is that the effect of the covariates on the hazard of an event is proportional through time. In practice, however, this assumption is often violated. Including covariates that have falsely been assumed to have a proportional effect on the hazard leads to biased estimates (Schemper, 1992). Small violations of the proportionality assumption have a minor effect on the coefficient of the covariate. However, when the violation is large, when a large share of the sample is censored, or when the effect of the covariate on the duration is large, the bias is much larger. Therefore, before deciding on the final model, proportionality tests must be applied.

To test whether the covariates indeed have a proportional effect, several tests can be applied. We can separate them into two categories, visual tests and formal tests. Below we present some of them.

3.5.4 Visual tests

A relatively simple way to check the proportionality assumption for a certain variable is to compare non-parametric survival estimates for different values of the covariate. Plotting the estimates in the same plot gives evidence of potential proportionality violations. In particular, if the survival curves cross each other, there is an indication of a proportionality violation.

3.5.5 Formal tests


One formal test includes interactions between the time variable and the covariates in the Cox regression. The interaction between the time variable and each covariate shows the effect of this covariate on the hazard rate through time. This requires, however, that the data be episode-split, so that one observation in the dataset represents one time interval (see section 3.6.2); in this way the time variable can be created. A statistically significant interaction coefficient implies an interaction between the variable and time, and therefore a proportionality violation.
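Episode-splitting, detailed in section 3.6.2, can be sketched as follows: each subject's spell is expanded into one row per time interval, with the event indicator switched on only in the final interval. Unit-length intervals and the covariate name below are assumed for illustration:

```python
import math

def episode_split(duration, event, covariates):
    """Split one subject's spell into unit-length person-period rows.
    Each row: (interval index j, covariates, y_j), where y_j = 1 only in the
    final interval and only if the event (e.g. the switch) was observed."""
    n_intervals = math.ceil(duration)
    rows = []
    for j in range(1, n_intervals + 1):
        last = (j == n_intervals)
        rows.append((j, covariates, 1 if (last and event == 1) else 0))
    return rows

# A patient who switched to a generic after 3.4 months, with one covariate:
for row in episode_split(3.4, 1, {"generic_price_ratio": 0.8}):
    print(row)
```

After splitting, the interval index (or functions of it, such as interactions with covariates) can enter the regression directly, which is exactly what the formal proportionality test requires.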

3.6 Discrete time analysis

There are times in duration analysis where the event occurs within some time interval. This is in contrast to the continuous estimation where we consider that an event can occur in any point in (continuous) time. The data in that case are considered as interval censored. Different techniques of duration analysis should be applied in that cases to encounter for interval censoring. In particular, assume that the whole study period is splitted into r time intervals with interval boundaries α1(= 0), α2, . . . , αr:

[α1, α2], (α2, α3], . . . , (αj−1, αj], . . . , (αr−1, αr].

Since we are now interested in the survival of an individual beyond the end of an interval, the survival function at the end of interval αj is defined as:

S(αj) = Pr(T > αj).

In contrast to the continuous hazard rate, which refers to a single infinitely small moment in time, the discrete hazard refers to a whole interval. It is defined as:

h(αj) = [Pr(T > αj−1) − Pr(T > αj)] / Pr(T > αj−1) = [S(αj−1) − S(αj)] / S(αj−1) = 1 − S(αj)/S(αj−1).

Since the survival function at an interval is the probability that an individual does not experience the event until the end of interval αj, it can also be expressed as the product of the probabilities of not experiencing the event within all previous time intervals, up to and including the current interval. Therefore the survival function can also be written as:

S(αj) = ∏_{k=1}^{j} [1 − h(αk)],  or equivalently  log S(αj) = Σ_{k=1}^{j} log[1 − h(αk)].
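A small numerical example may help. With four hypothetical interval hazards, the survival function computed as the product of (1 − h_k) agrees with its log-scale version, and the hazards can be recovered back from the survival function via h_j = 1 − S(αj)/S(αj−1):

```python
import math

# Hypothetical discrete hazards for four consecutive intervals.
h = [0.30, 0.20, 0.10, 0.05]

# Survival at the end of interval j: product of (1 - h_k) for k <= j.
S, s = [], 1.0
for hk in h:
    s *= 1.0 - hk
    S.append(s)

# Equivalent computation on the log scale.
logS = [sum(math.log(1.0 - hk) for hk in h[:j + 1]) for j in range(len(h))]

# Recovering the hazard from the survival function: h_j = 1 - S_j / S_{j-1}.
h_back = [1.0 - S[0]] + [1.0 - S[j] / S[j - 1] for j in range(1, len(S))]
```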


Recall that the survival function for the continuous time case is S(t) = exp(−∫_0^t h(u) du).

3.6.1 Relation between continuous and discrete methods

We can show the relation between the continuous and the discrete hazard models by deriving an estimate of parameters for the interval censored data using the properties of the continuous time hazard. The survival function at time αj is now given by:

S(αj) = exp(−∫_0^{αj} h(u) du).

Since by (2) h(t, z_i) = h_0(t) exp(z_iβ), we can continue as follows:

S(αj) = exp(−∫_0^{αj} h_0(u) exp(z_iβ) du) = exp(−exp(z_iβ) ∫_0^{αj} h_0(u) du).

Using the definition of the discrete hazard above, we can define the hazard function for the discrete case as:

h(αj, z_i) = 1 − S(αj)/S(αj−1)
= 1 − exp(−exp(z_iβ) ∫_0^{αj} h_0(u) du) / exp(−exp(z_iβ) ∫_0^{αj−1} h_0(u) du)
= 1 − exp(exp(z_iβ) [∫_0^{αj−1} h_0(u) du − ∫_0^{αj} h_0(u) du]),

which implies that

1 − h(αj, z_i) = exp(−exp(z_iβ) ∫_{αj−1}^{αj} h_0(u) du),

and subsequently

log(−log[1 − h(αj, z_i)]) = z_iβ + log ∫_{αj−1}^{αj} h_0(u) du. (8)

Since we assumed that the hazard is relatively small, (8) can be approximated with the help of a logit link function. In particular, as h(αj, z_i) → 0,

log(−log[1 − h(αj, z_i)]) ≈ log(h(αj, z_i)).

Using the same assumption,

log[h(αj, z_i)/(1 − h(αj, z_i))] = log(h(αj, z_i)) − log(1 − h(αj, z_i)) ≈ log(h(αj, z_i)).

Hence, (8) can be rewritten as

log[h(αj, z_i)/(1 − h(αj, z_i))] = z_iβ + log ∫_{αj−1}^{αj} h_0(u) du (9)
log[h(αj, z_i)/(1 − h(αj, z_i))] = z_iβ + γ, (10)

where γ = log ∫_{αj−1}^{αj} h_0(u) du. We therefore showed that applying continuous time analysis to discrete level data yields results that can be approximated with the help of the logit model.
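The quality of this approximation can be checked numerically: for a small hazard the complementary log-log and logit links are almost indistinguishable, while for a large hazard they diverge. The two hazard values below are arbitrary illustrations.

```python
import math

def cloglog(h):
    """Complementary log-log link: log(-log(1 - h))."""
    return math.log(-math.log(1.0 - h))

def logit(h):
    """Logit link: log(h / (1 - h))."""
    return math.log(h / (1.0 - h))

# For a small hazard both links are close to log(h); the discrepancy
# between them grows with the size of the hazard.
gap_small = abs(cloglog(0.01) - logit(0.01))
gap_large = abs(cloglog(0.40) - logit(0.40))
```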

3.6.2 Episode-splitting


In episode splitting, the dataset is expanded so that each individual contributes one record for every time interval during which he/she is observed, until the event occurs. We provide a simple example in Table 1. We assume three different individuals who are followed through time. Each individual has a duration time, which is the time until either experiencing the event or exiting the study (censoring), and three explanatory covariates: two time-constant, age and gender, and one time-varying, price, measuring the difference between two prices. A unique ID is assigned to every individual prior to episode splitting. We split the dataset in such a way that every individual has multiple records according to his/her duration. So, in the example, the first person now has three records, since his duration variable has a value of 3. Every new record refers to an increment of the duration during which the individual is still in the study. The two new variables start and stop indicate the starting and ending time point, respectively, to which the record refers. The time-constant variables remain the same in every record. The time-varying covariate, since it varies over the time that the individual is in the study and has not yet experienced the event, takes distinct values according to the increment to which the record refers. For example, the first individual in Table 1 has exactly the same values for the variables age and gender, since these are time-constant, but the variable price takes the value 0.821 in the first week of the study period, 0.762 in the second, etc. A new unique id, new person id, is included for every observation in the dataset. The new censoring indicator censoring, created after the multiplication of the data, takes the value 0 when the event does not occur within the time increment defined by the two variables start and stop, and 1 when the event occurs within this increment.

In the way described above, we are able to transform duration data into binary data. After the data are transformed, we can handle them more easily with the help of binary regression models. A drawback of the episode splitting method, however, is that the sample size becomes much larger than the original one, which results in long computing times.


Table 1: Data Episode Splitting (Example)

new person id  person id  duration  start  stop  censoring  age  gender  price
            1          1         3      0     1          0   40       1  0.821
            2          1         3      1     2          0   40       1  0.762
            3          1         3      2     3          1   40       1  0.544
            4          2         4      0     1          0   25       0  0.821
            5          2         4      1     2          0   25       0  0.762
            6          2         4      2     3          0   25       0  0.544
            7          2         4      3     4          0   25       0  0.469
            8          3         6      0     1          0   35       1  0.821
          ...        ...       ...    ...   ...        ...  ...     ...    ...
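The transformation of Table 1 can be sketched in a few lines of plain Python. The persons, durations and the first four per-period prices are the example's values; the last two price values are hypothetical, since the table is truncated.

```python
# Episode splitting in plain Python, reproducing the layout of Table 1.
# The last two price values are hypothetical (the table is truncated).
prices = [0.821, 0.762, 0.544, 0.469, 0.512, 0.488]  # price per period

persons = [
    # (person_id, duration, event_observed, age, gender)
    (1, 3, 1, 40, 1),
    (2, 4, 0, 25, 0),
    (3, 6, 1, 35, 1),
]

rows, new_id = [], 0
for pid, dur, event, age, gender in persons:
    for t in range(dur):            # one record per elapsed time unit
        new_id += 1
        rows.append({
            "new_person_id": new_id, "person_id": pid, "duration": dur,
            "start": t, "stop": t + 1,
            # indicator = 1 only in the interval where the event occurs
            "censoring": 1 if (event == 1 and t == dur - 1) else 0,
            "age": age, "gender": gender, "price": prices[t],
        })
```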

In the previous sections we hinted at the similarity between the likelihood function of a discrete time proportional hazard model and that of a logistic regression model. In the following we show this similarity by deriving the likelihood function of the logistic regression model from the one of the proportional hazard model.

Due to the fact that we observe events occurring in time intervals, and not at exact points in time (e.g. weeks), the hazard h_i(αj) of an event occurring to individual i is:

h_i(αj) = Pr(T_i = αj | T_i ≥ αj).

We define the survival and density function for one individual i as:

S_i(αj) = Pr(T_i > αj) = ∏_{k=1}^{j} [1 − h_i(αk)]
f_i(αj) = Pr(T_i = αj) = h_i(αj) S_i(αj−1).

The episode splitting is essential because now we can use the new censoring indicator y_ik, which takes the value 1 if individual i experiences the event in interval k and 0 otherwise. The likelihood function is defined as:

L = ∏_{i=1}^{n} ∏_{k=1}^{j} f_i(αk)^{y_ik} S_i(αk)^{1−y_ik}
= ∏_{i=1}^{n} ∏_{k=1}^{j} h_i(αk)^{y_ik} S_i(αk−1)^{y_ik} S_i(αk)^{1−y_ik}
= ∏_{i=1}^{n} ∏_{k=1}^{j} h_i(αk)^{y_ik} [S_i(αk)/(1 − h_i(αk))]^{y_ik} S_i(αk)^{1−y_ik}
= ∏_{i=1}^{n} ∏_{k=1}^{j} [h_i(αk)/(1 − h_i(αk))]^{y_ik} S_i(αk)
= ∏_{i=1}^{n} [ ∏_{k=1}^{j} [h_i(αk)/(1 − h_i(αk))]^{y_ik} ∏_{k=1}^{j} (1 − h_i(αk)) ]. (11)

By taking the logarithm of (11) we have:

ℓ = Σ_{i=1}^{n} Σ_{k=1}^{j} y_ik log[h_i(αk)/(1 − h_i(αk))] + Σ_{i=1}^{n} Σ_{k=1}^{j} log(1 − h_i(αk))
= Σ_{i=1}^{n} Σ_{k=1}^{j} [y_ik log h_i(αk) − y_ik log(1 − h_i(αk)) + log(1 − h_i(αk))]
= Σ_{i=1}^{n} Σ_{k=1}^{j} [y_ik log h_i(αk) + (1 − y_ik) log(1 − h_i(αk))].

Written back in the original form, the likelihood function becomes:

L = ∏_{i=1}^{n} ∏_{k=1}^{j} h_i(αk)^{y_ik} (1 − h_i(αk))^{1−y_ik}, (12)

which is the likelihood function of a logistic regression model.
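As an illustration of this equivalence, the sketch below maximizes a likelihood of the form (12) by Newton-Raphson on synthetic binary records with one covariate; the data-generating values are arbitrary assumptions.

```python
import math
import random

# Newton-Raphson maximisation of a logistic likelihood of the form (12)
# on synthetic records; the data-generating values are arbitrary.
random.seed(1)

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

beta_true = [-1.0, 0.8]              # intercept and slope used to simulate
Z, y = [], []
for _ in range(4000):
    z = random.gauss(0.0, 1.0)
    Z.append([1.0, z])
    y.append(1 if random.random() < logistic(beta_true[0] + beta_true[1] * z) else 0)

beta = [0.0, 0.0]
for _ in range(25):                   # Newton-Raphson updates
    g = [0.0, 0.0]                    # score vector
    H = [[0.0, 0.0], [0.0, 0.0]]      # information matrix
    for zi, yi in zip(Z, y):
        p = logistic(beta[0] * zi[0] + beta[1] * zi[1])
        for a in range(2):
            g[a] += (yi - p) * zi[a]
            for b in range(2):
                H[a][b] += p * (1.0 - p) * zi[a] * zi[b]
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    beta = [beta[0] + (H[1][1] * g[0] - H[0][1] * g[1]) / det,
            beta[1] + (H[0][0] * g[1] - H[1][0] * g[0]) / det]
```

With a large sample the maximum likelihood estimate lands close to the values used to generate the data.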



3.7 Endogeneity and instrumental variables

Although duration analysis overcomes difficulties concerning the time varying nature of the data and the problem of censoring, there are more obstacles, of both a practical and a theoretical nature, that have to be overcome. First, a 'chicken and egg' dilemma arises when we include drug prices in our analysis. This endogeneity problem occurs because the relation between the prescriber's behavior and the price fluctuations runs in both directions. When doctors consider a specific pharmaceutical too expensive to be prescribed, the consumption of this product can be expected to decrease, which forces the producing company to reduce the price in order to make the pharmaceutical more attractive to the prescribers. So, the chance that a patient will be prescribed a generic drug affects the price level. On the other hand, a product that is considered to be as effective as an alternative pharmaceutical but costs less should be attractive to the prescriber. Therefore, the price level affects the chance that a patient will be switched to a generic drug.

A common way to deal with endogeneity is two-stage least squares (2SLS). There, the relation between the instruments and the endogenous variables is first defined through a regression analysis, and then predictions for the endogenous variables are made with the help of the instruments. Subsequently, the original model is estimated including all the exogenous variables plus the projections of the endogenous variables on the instruments. In the framework of duration data, where the logistic regression is used, 2SLS is not applicable, since the second regression that needs to be estimated is not linear (see section 3.7.1). However, the IV estimation method is still applicable; in the IV method the estimation is done in one step. We will explain in more detail the IV estimation for a logistic regression with a logit link function.

Before we continue, we compress the notation for the episode-split data as follows. For n individuals i = 1, . . . , n with durations j = 1, . . . , T_i, we consider an N × 1 vector y and an N × K matrix Z, where N = Σ_{i=1}^{n} T_i. Let

y = (y_11, . . . , y_1T_1, . . . , y_n1, . . . , y_nT_n)′
Z = (z_11, . . . , z_1T_1, . . . , z_n1, . . . , z_nT_n)′.

Also, β is the K-vector of coefficients and h is a function of z_i and β.

The likelihood function derived in (12) is maximized with respect to the vector of coefficients β when the following first order conditions are satisfied:

(1/N) Σ_{i=1}^{N} [y_i − h(z_i, β̂)] (∂h(z_i, β̂)/∂β̂) z_i / (h(z_i, β̂)[1 − h(z_i, β̂)]) = 0. (13)

Since the hazard function h(z_i, β) is expressed with the logit link function, we know that ∂h(z_i, β̂)/∂β̂ = h(z_i, β̂)[1 − h(z_i, β̂)]. Using the above, (13) becomes:

(1/N) Σ_{i=1}^{N} [y_i − h(z_i, β̂)] z_i = 0. (14)

Simultaneously solving the above K moment conditions with respect to β gives us the Generalized Method of Moments (GMM) equivalent of Maximum Likelihood. With the help of GMM we will show that it is possible to handle endogenous variables with the method of Instrumental Variables (IV). Meijer and Wansbeek (2005) offer a detailed introduction to estimation with GMM.

In the presence of endogeneity in some of the covariates, (14) does not give consistent estimates, i.e. the expected moment conditions are different from zero (Foster, 1997; Davidson and MacKinnon, 1993). For this reason we need to find a new set of variables w, called the instruments, which set the new moment conditions equal to zero. The new estimation function will be:

Σ_{i=1}^{N} w_i [y_i − h(z_i, β̂)] = 0.

If not all the variables are endogenous, then w consists of the instrumental variables for the endogenous ones, while the exogenous variables serve as instruments for themselves. Denoting by h(z_i, β̂) the logit term exp(z_iβ̂)/(1 + exp(z_iβ̂)), the GMM estimation function for the IV case will be:

Σ_{i=1}^{N} w_i (y_i − exp(z_iβ̂)/(1 + exp(z_iβ̂))) = 0. (15)

Formula (15) gives us K different equations which, solved with respect to β̂, give us the estimated coefficients. If there exists more than one instrument for an endogenous variable, then the model is over-identified and a weighting matrix should be introduced among the instruments:

argmin_β (y − h(Z, β))′ P_W (y − h(Z, β)),

where P_W = W(W′W)^{−1}W′ is the projection matrix of the instruments and h(Z, β) is the N × 1 vector with typical element

h(z_i, β) = exp(z_iβ)/(1 + exp(z_iβ)).

However in our case we deal only with a just-identified model. Since (15) cannot be solved explicitly, numerical solution is required for the estimators to be computed.
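A minimal numerical sketch of such a solution is given below. It applies a Newton iteration to the moment conditions (15) on simulated data in which, by construction, one covariate is endogenous and one instrument is available for it; all names and parameter values are illustrative assumptions.

```python
import numpy as np

# Newton iteration on just-identified moment conditions of the form (15).
# The data-generating process is a stylised assumption: z2 is endogenous
# because it shares the unobserved term u with the outcome, and w2 is
# its instrument.
rng = np.random.default_rng(0)
n = 5000
u = rng.normal(size=n)                     # unobserved confounder
w2 = rng.normal(size=n)                    # instrument
z2 = 0.8 * w2 + 0.6 * u                    # endogenous covariate
Z = np.column_stack([np.ones(n), z2])
W = np.column_stack([np.ones(n), w2])      # exogenous vars instrument themselves

eta = -0.5 + 1.0 * z2 + 0.6 * u            # index depends on u -> endogeneity
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

beta = np.zeros(2)
for _ in range(50):
    h = 1.0 / (1.0 + np.exp(-(Z @ beta)))
    m = W.T @ (y - h) / n                            # moment conditions
    G = -(W * (h * (1.0 - h))[:, None]).T @ Z / n    # their derivative matrix
    beta = beta - np.linalg.solve(G, m)              # Newton step

h = 1.0 / (1.0 + np.exp(-(Z @ beta)))
final_moments = W.T @ (y - h) / n                    # numerically zero at the solution
```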

Before we proceed to the estimation of the asymptotic covariance matrix of the estimator β̂, we should first estimate the matrix G(β), consisting of the first order derivatives of the moment conditions. Analytically,

G(β) = ∂(Σ_{i=1}^{N} w_i [y_i − h(z_i, β)])/∂β = −Σ_{i=1}^{N} w_i z_i′ h(z_i, β)(1 − h(z_i, β)).

Using the results from above, β̂ has the following asymptotic distribution:

N^{1/2}(β̂ − β) →d N(0, V_W).

The asymptotic covariance matrix is

V_W = (G′Ψ^{−1}G)^{−1},

where Ψ is the covariance matrix of the moment conditions. Therefore, a consistent estimator for Ψ is needed in order to estimate the asymptotic covariance of the IV estimators. A simple way to get a consistent estimate of the covariance matrix of the moment conditions Ψ is to use the sample estimate of the covariance:

Ψ̂ = (1/N) Σ_{i=1}^{N} ê_i² w_i w_i′, with ê_i = y_i − h(z_i, β̂).


This yields an estimate of the variance of the estimates:

V̂_1 = ((1/N) Ĝ′(W′∆̂W)^{−1}Ĝ)^{−1},

where ∆̂ is the matrix with typical diagonal element

∆̂_ii = ê_i² = (y_i − h(z_i, β̂))².

Alternatively, one could use the expected value of the variance of the moment conditions g_i = w_i[y_i − h(z_i, β)]:

E(g_i g_i′) = E[w_i (y_i − h(z_i, β))(y_i − h(z_i, β))′ w_i′]
= w_i w_i′ E(y_i − h(z_i, β))²
= w_i w_i′ h(z_i, β)(1 − h(z_i, β)).

If we take into account the expected value of the variance for the moment conditions, then the variance of the estimators becomes :

V̂_2 = ((1/N) Ĝ′(W′P̂W)^{−1}Ĝ)^{−1},

where P̂ is the diagonal matrix with typical diagonal element

P̂_ii = h(z_i, β̂)(1 − h(z_i, β̂)).
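The two covariance estimators can be sketched numerically as follows; the data, the instruments and the value of β̂ are illustrative assumptions, and in practice β̂ would come from solving (15).

```python
import numpy as np

# Numerical sketch of the covariance estimators V1-hat (squared residuals)
# and V2-hat (expected variance h(1-h)). All inputs are illustrative.
rng = np.random.default_rng(42)
N = 2000
W = np.column_stack([np.ones(N), rng.normal(size=N)])             # instruments
Z = np.column_stack([np.ones(N), 0.7 * W[:, 1] + 0.3 * rng.normal(size=N)])
beta_hat = np.array([-0.4, 0.9])                                  # assumed estimate
h = 1.0 / (1.0 + np.exp(-(Z @ beta_hat)))
y = (rng.random(N) < h).astype(float)

G = -(W * (h * (1.0 - h))[:, None]).T @ Z     # first derivatives of the moments

e2 = (y - h) ** 2                             # diagonal of Delta-hat
V1 = np.linalg.inv(G.T @ np.linalg.inv(W.T @ (W * e2[:, None])) @ G / N)

p = h * (1.0 - h)                             # diagonal of P-hat
V2 = np.linalg.inv(G.T @ np.linalg.inv(W.T @ (W * p[:, None])) @ G / N)
```

Since here the model is correctly specified, E(ê_i²) equals h(1 − h) and the two estimates are close; under misspecification V̂_1 is the more robust choice.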

To test for the presence of endogeneity we use the Hausman test (Hausman, 1978). It is a method based on the difference between the GMM-IV and Maximum Likelihood (ML) estimated coefficients and asymptotic covariances. In particular, the Hausman statistic is defined as:

Ξ = (β̂_0q − β̂_IVq)′ (V̂_IV − V̂_0)^{−1}_q (β̂_0q − β̂_IVq),

where q denotes the position of the endogenous covariate(s), β̂_IVq are the IV estimated coefficients, β̂_0q are the ML estimated coefficients for these covariates, and V̂_IV and V̂_0 are the covariance matrices for the IV and ML method, respectively. Only the estimator of the endogenous covariates is used because, if it is orthogonal to the rest of the covariates, the difference of the covariance matrices will contain many zeros (Kennedy, 2003). Ξ asymptotically follows a χ²_h distribution, where h is the number of endogenous covariates. A value of the test statistic larger than the critical value rejects the null hypothesis of no endogeneity of any of the covariates.
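For a single endogenous covariate the statistic is a one-line computation. The sketch below plugs in the diffprice coefficients and standard errors reported in the empirical section (Tables 6 and 7); treating the reported standard errors as the relevant (co)variances is a simplification.

```python
import numpy as np

# Hausman statistic for one endogenous covariate (h = 1). The inputs are
# the diffprice entries of Tables 6 and 7; using the reported standard
# errors as the full (co)variances is a simplification.
beta_ml = np.array([2.893])              # ML estimate of diffprice
beta_iv = np.array([3.440])              # IV estimate of diffprice
V_ml = np.array([[0.212 ** 2]])          # ML variance of that coefficient
V_iv = np.array([[0.224 ** 2]])          # IV variance of that coefficient

diff = beta_ml - beta_iv
Xi = float(diff @ np.linalg.inv(V_iv - V_ml) @ diff)

CHI2_1_95 = 3.841                        # 5% critical value of chi2(1)
reject_no_endogeneity = bool(Xi > CHI2_1_95)
```

With these rounded inputs the statistic comes out near 57, of the same order as the 55.5 reported in Table 7; the gap reflects rounding and the full covariance matrices used in the actual computation.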

3.7.1 2SLS Inconsistency

We claimed above that, in a logistic regression, 2SLS is not an appropriate method since it gives inconsistent estimates. This occurs because the dichotomous nature of the transformed data does not allow the use of a two stage approach (Foster, 1997; Davidson and MacKinnon, 1993).

Suppose that, as in the linear 2SLS procedure, the projections P_W Z of the covariates on the instrument space were substituted into the minimization function. Then minimizing the function

min_β (y − h(P_W Z, β))′ (y − h(P_W Z, β))

would give an inconsistent estimator of β. On the contrary, the nonlinear GMM-IV estimation discussed before uses the minimization criterion

min_β (y − h(Z, β))′ P_W (y − h(Z, β)).

Minimization of the latter function gives consistent estimates of β, since h(Z, β) (and not Z, as in the 2SLS procedure) is projected on the instrument space.

3.8 Unobserved Heterogeneity

3.8.1 Continuous case

Several methods have been proposed in the literature for correcting for unobserved heterogeneity in a model. The technique used depends on the type of estimation method one decides to use. In duration analysis, for the continuous case, the most common method is that of the unobserved frailty parameter.

In a Cox proportional hazard model, unobserved heterogeneity is accounted for by adding to the Cox formula an extra unobservable individual variable that scales the no-frailty component. The unobserved random variable V_i is assumed to have the following properties:
• V_i > 0,
• E(V_i) = 1,
• V_i is independent of the rest of the variables of the model,
• its distribution is assumed to be known, with finite variance.

Using the notation introduced in section 3.6.2, we consider the model to be estimated after the addition of the parameter:

h = h_0 V_i exp(z_ijβ),

where V_i > 0. The parameter V_i is the so-called frailty parameter of the individuals in the study set. It captures the different probability of every individual to experience an event, due to unobserved characteristics of the individual. The assumption made previously that E(V_i) = 1 implies that for given z_ij the average value of the hazard is E(h) = h_0 exp(z_ijβ) (Wooldridge, 2002).

Due to the fact that the unobserved variable V_i is unidentified, we have to impose a distributional assumption on it. There is a number of contradicting opinions in the literature about the distribution imposed on the frailty parameter. The most common choice for the distribution of the unobserved parameter is the Gamma distribution, mainly for computational reasons. In the most formal article, to our knowledge, Abbring and van den Berg (2006) have proved that under mild conditions the distribution of the unobserved parameter converges to a Gamma distribution; in particular, it converges to the Gamma distribution for long durations and if the effect of the interaction term of the duration and the covariates on the hazard is negative. However, other distributions have been used, such as the Lognormal or the inverse Gaussian (Klein and Moeschberger, 1996). Some authors are even critical of adding an unobserved heterogeneity term to the model at all, especially when it is applied in univariate models.

When the explanatory power of the observed covariates is weak, the problem created by the omitted variables is even more severe. Jenkins (2005) shows that not allowing for unobserved heterogeneity results in:

• underestimation of the estimated coefficients of the covariates,

• over-estimation of negative duration dependence or, equivalently, under-estimation of positive duration dependence, as well as

• violation of the proportional effect of a covariate on the hazard, since the effect of the covariate now declines with time.

However care should be taken when interpreting the results after the addition of a frailty parameter since the estimated coefficients are not interpreted in the same way.

3.8.2 Discrete case

In the discrete case, unobserved heterogeneity can be accounted for by adding a random effects parameter to the binary regression model; the role of the random effects parameter is similar to adding a frailty parameter in the Cox model. Estimation of a binary regression with random effects, however, is more complicated than in the continuous Cox model, as we will show below.

We assume that every individual i, with duration until the event equal to T_i, has an unobserved probability for the event to occur. Therefore we allow the probability of the binary response variable y_ij to depend on a set of covariates captured in the matrix Z and a new random effects variable ε_i, which captures the unobserved individual probability. So we are interested in the probability of an event occurring at a given time point αj for an individual i, conditional on the random effect ε_i. Therefore, the likelihood for an individual will be:

L_i = ∏_{k=1}^{j} h(z_ik, ε_i, β)^{y_ik} (1 − h(z_ik, ε_i, β))^{1−y_ik}.

Here we make the assumption that the response variables y are independent within every individual, given the random effects. This assumption is essential to allow the multiplication of the probabilities above, in order to derive the conditional probability of the response vector; it is known as the conditional independence assumption (DePuy et al., 2005). Integrating the individual conditional likelihood L_i over the random effect gives us the marginal density function

λ_i = ∫_{−∞}^{∞} ∏_{k=1}^{j} h(z_ik, ε_i, β)^{y_ik} (1 − h(z_ik, ε_i, β))^{1−y_ik} f(ε_i) dε_i,

where f(ε_i) is the density of the distribution that the random effects are assumed to follow. Maximizing the marginal likelihood L = ∏_{i=1}^{n} λ_i, or equivalently the log-likelihood ℓ = Σ_{i=1}^{n} log(λ_i), yields ML estimates for the coefficients of the covariates in Z and for the parameters of the random effects distribution.

It is common in the literature to assume that the random effects follow a normal distribution (Larsen et al., 2000; Schall, 1991; Stiratelli et al., 1984). Using this assumption, the marginal log-likelihood can be written as:

ℓ = Σ_{i=1}^{n} log ∫_{−∞}^{∞} ∏_{k=1}^{j} h(z_ik, ε_i, β)^{y_ik} (1 − h(z_ik, ε_i, β))^{1−y_ik} (1/(√(2π)σ)) exp(−ε_i²/(2σ²)) dε_i.

However, rewriting the above formula in a form that includes only the distributional parameters of the random effects instead of the unobserved parameters themselves (i.e. integrating out the random effects) and then maximizing the likelihood function is not possible here: the formula cannot be rewritten in closed form, and numerical maximization is required. There is a large literature on methods for maximizing this formula; most of them use Bayesian approaches or estimation through simulation. An overview of some of these methods can be found in DePuy et al. (2005) and Stiratelli et al. (1984).
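One standard numerical approach, shown as a sketch below, is Gauss-Hermite quadrature: after the change of variables ε = √2 σ x the integral becomes (1/√π) ∫ g(√2 σ x) e^{−x²} dx, which the quadrature rule approximates as a weighted sum. The covariate values, coefficients and σ used here are hypothetical.

```python
import numpy as np

# Gauss-Hermite quadrature approximation of the marginal likelihood of
# one individual under a normal random effect. All inputs are hypothetical.
nodes, weights = np.polynomial.hermite.hermgauss(30)

def marginal_likelihood_i(z, y, beta, sigma):
    """lambda_i: integral of prod_k h^y (1-h)^(1-y) against N(0, sigma^2)."""
    total = 0.0
    for x, w in zip(nodes, weights):
        eps = np.sqrt(2.0) * sigma * x            # change of variables
        hk = 1.0 / (1.0 + np.exp(-(z @ beta + eps)))
        total += w * np.prod(hk ** y * (1.0 - hk) ** (1.0 - y))
    return total / np.sqrt(np.pi)

# One individual observed for four intervals, switching in the last one.
z = np.column_stack([np.ones(4), [0.10, 0.15, 0.20, 0.25]])
y = np.array([0.0, 0.0, 0.0, 1.0])
beta = np.array([-2.0, 1.0])

lam = marginal_likelihood_i(z, y, beta, sigma=0.5)
```

As a sanity check, letting σ shrink to zero recovers the plain conditional likelihood evaluated at ε = 0.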

In Jenkins (2005) and Bradley et al. (2001) the authors follow an approach in which imposing a specific distribution on the random effects is avoided. Since, however, the number of individuals exceeds the number of available degrees of freedom, they consider a number of groups M. Each individual has a probability π_m, m = 1, . . . , M, of belonging to one of the M groups, where Σ_{m=1}^{M} π_m = 1. We base our estimations on the idea that the intercept can vary between groups. In particular, consider again the hazard function h_i(αj) for an individual i that belongs to a group m, where now

h_m(αj) = exp(z_iβ + ε_m)/(1 + exp(z_iβ + ε_m)),

and the within-group likelihood contribution is

L_m = [h_m(αj)/(1 − h_m(αj))]^{c_i} ∏_{k=1}^{j} [1 − h_m(αk)],

where c_i is the censoring indicator. Therefore the contribution to the likelihood function of an individual i with spell duration αj will be

L_i = Σ_{m=1}^{M} π_m L_m.

As mentioned above, the interpretation of the estimated coefficients differs between the two cases (i.e. whether or not random effects are taken into consideration). When unobserved heterogeneity is not taken into consideration, the interpretation of the estimated coefficient exp(β) is the average marginal effect of the specific covariate on the response variable, when the covariate increases by one unit, for any individual. When the individual effect is included in the model, the marginal effect of a coefficient is the same only for (hypothetical) individuals that have the same underlying unobserved heterogeneity.

4 Empirical analysis

4.1 Data

4.1.1 IADB.nl

At the department of Social Pharmacy & Pharmacoepidemiology (Sociale Farmacie & Farmacoepidemiologie), the IADB.nl database has been developed over the last decennium. IADB stands for InterAction DataBase. IADB.nl has evolved from a database originally started in 1995 with about 10 pharmacies, primarily in the city of Groningen, to currently 50 pharmacies in the provinces of Groningen, Friesland, Overijssel and Gelderland. The adherent population of the 50 pharmacies is approximately 500,000 persons. The database comprises the full populations of the cities of Groningen and Zwolle. The adherent population grew during the nineties and has been more or less stable at the mentioned 500,000 since 1997.

4.1.2 The Antidepressants

The antidepressants dataset extracted from the IADB consisted initially of 1.5 million recipes and 89 thousand patients. We distinguished the different drug groups according to their ATC (Anatomical Therapeutic Chemical) code. Each ATC code refers to a different class of drugs which, regardless of whether they are branded or generic, use the same active substance. In the IADB, 26 different drug groups of antidepressant medications can be found. For the purpose of the analysis we focused on three of those groups. One reason for this selection was that the patent expiry of the branded drug of the group should occur during our sample period. Table 2 shows the number of recipes as well as the recipe share per drug group in the IADB.

Name Recipes Share

1 FLUOXETINE 139505 0.089

2 CITALOPRAM 116313 0.074

3 PAROXETINE 414837 0.265

Table 2: Number of recipes and recipe share of Fluoxetine, Citalopram and Paroxetine

Figure 1 presents the evolution of the share of the three drug groups over time, measured in Defined Daily Doses (DDDs).

Figure 1: Share of Paroxetine, Fluoxetine and Citalopram over time


4.1.3 Fluoxetine Users

Focusing on the analysis of Fluoxetine, we can see that as soon as the generic product was introduced to the market there was an almost immediate response from the consumers: in less than a year the generic product had a market share within the drug group of almost 80% (Figure 2).

Figure 2: Percentage of brand and generic Fluoxetine

For every patient we used information concerning each individual’s characteristics, the characteristics of his/her prescriber and pharmacist as well as the market conditions at the time of the switch. In particular we have focused our analysis on the following variables:


• The binary covariate event, showing whether a patient has switched to a generic drug or is censored.

• The binary covariate gender, showing the gender of the patient.

• The age of the patient, age.

• The prescriber's inclination to prescribe generics, artsrate, measured as the number of generic recipes prescribed by a specific doctor divided by his/her total number of recipes.

• The pharmacist's inclination to dispense generics, apotrate, measured as the number of generic recipes dispensed by a specific pharmacist divided by his/her total number of recipes.

• The price difference variable diffprice. To construct this variable we considered, for both branded and generic drugs, the declaration price divided by the defined daily doses (DDDs) for which the two types of drugs were found to be prescribed in the IADB. The difference between these two values constitutes the covariate diffprice.

Below we provide some descriptive statistics concerning the Fluoxetine users and the information extracted from their recipes included in the dataset.

              duration  event  gender   age  artsrate  apotrate  diffprice
Minimum              1   0.00    0.00   6.9      0.00      0.00      −0.36
1st Quartile         8   1.00    0.00  41.8      0.32      0.38       0.14
Median              13   1.00    1.00  50.9      0.45      0.50       0.22
Mean                46   0.85    0.68  52.5      0.44      0.49       0.19
3rd Quartile        48   1.00    1.00  60.7      0.57      0.59       0.27
Maximum            315   1.00    1.00  98.1      0.98      1.00       0.46

Table 3: Descriptive statistics of the Fluoxetine users

In Table 3 we can see that women form the majority in the dataset (68%). This is possibly related to the nature of depression, as women are twice as likely to suffer from (mild) depression incidents (Wilkinson et al., 1999). Also, 86% of the users of Fluoxetine after the patent expiry date decided to switch to generic products. The average duration until a user either switched to generic or stopped using the brand Fluoxetine was 46 weeks, while 25% of the patients had stopped or switched within the first 2 months after the generic was introduced. As far as the age of the patients studied is concerned, we can see that the sample was mostly concentrated between 40 and 50 years old (Figure 3). This could be explained by the nature of the disease, as depression mainly affects middle-aged men and women in their late 30's and early 40's (Wilkinson et al., 1999).

4.2 Duration analysis results

4.2.1 Non-parametric estimation

Figure 4 presents the Kaplan-Meier estimator of the probability of remaining on Prozac, the branded product of the Fluoxetine drug group. We can clearly see the steep decrease in the probability of continuing to use the branded product in the first 30 weeks. The solid line represents the estimator, while the dashed lines form the 95% confidence interval for the survival probability.

Only approximately 40% of the patients were still using the brand Fluoxetine 30 weeks after the patent expiry. From this point onwards the survival probability seems to decline at a constant rate. Similar inference can be drawn from the empirical hazard rate estimator. In Figure 5a we can distinguish the higher conditional probability of switching in the first four months, followed by a decreasing hazard rate until nine months


Figure 3: Age distribution of Fluoxetine users

after, where the hazard rate remains approximately constant. In order to verify this, we estimate the Kaplan-Meier integrated hazard, through which we can capture the cumulative probability of switching. The estimated graph is presented in Figure 5b. A concave integrated hazard function implies a hazard that declines with duration (Kiefer, 1988), i.e. switching is more likely to occur in the period close to the patent expiry date. The slope of the integrated hazard in Figure 5b is less steep from the 12th month until almost three years after the patent expiry, and from there onwards the hazard seems to remain constant.

All the non-parametric estimations yielded results that imply a high probability of


Figure 4: Survival Probability to continue using a branded product for Fluoxetine users

switching in the first period after patent expiry and a relatively constant hazard rate thereafter. This result is useful not only because it helps us understand the relation between time and the switching probability, but also because this information is needed in the next sections, where we apply parametric and semi-parametric estimations.

4.2.2 Parametric estimation


Figure 5: Hazard and integrated hazard of switching to a generic product for Fluoxetine users

The non-parametric estimates suggest a form of duration dependence that in theory could be captured by the Generalized Gamma distribution. In particular, the distribution of the hazard rate is initially increasing and then decreasing through time. Therefore, we expected the Generalized Gamma to be able to approximate the duration data accurately. The results are shown in Table 4.

            Value  Std.Error  z-score  p-value
Shape       −0.92
log(Scale)   0.18       0.02     9.31     0.00
Location     2.53       0.03    83.62     0.00

Table 4: Generalized Gamma distribution fit on duration until switching data

The scale parameter of the Generalized Gamma distribution is significantly different from 1, since the logarithm of the parameter is significantly different from zero, so the Gamma distribution is not the best approximation. Also, the negative value of the shape parameter indicates a bad approximation by the Lognormal distribution.

Due to the lack of information for deciding upon a well-fitting distribution, we also fitted a loglogistic distribution to the duration data, since it has the desired properties for our analysis but is not nested in the Generalized Gamma family. The results are shown in Table 5.

            Value  Std.Error  z-score  p-value
Location     3.01       0.03    87.31     0.00
log(Scale)  −0.22       0.02   −10.34     0.00

Table 5: Loglogistic distribution fit to the duration until switching data

To visualize the fit of these distributions on the duration data we plotted together the Kaplan-Meier estimator of the survival function with the parametric survival functions in Figure 6. We can see that the fit is considerably better for the Generalized Gamma distribution.

The usefulness of parametric modelling stems from the fact that the hazard distribution can be identified by estimating only a small number of parameters, as well as from the fact that the same distribution fitted to these data can be fitted to a similar case (e.g. another drug group), so that the goodness of fit as well as the differences between the drug groups' hazards can be assessed.

4.2.3 Semi-Parametric estimation


Figure 6: Gen.Gamma and Loglogistic distributions fit on survival probability on the branded drug for Fluoxetine users

Since the data are interval censored, we applied discrete duration methods. Before we proceeded to the estimation of the models, however, we had to make sure that the proportionality assumption we made concerning the covariates and time is not violated. In particular, we applied the visual method discussed in section 3.5.4.

The visual tests suggested that patients in the different groups of each time-constant covariate on average have the same survival probability through time. The covariate age, however, gives some small evidence of violation of the proportionality assumption; nevertheless, we decided not to remove this variable from the estimations.

Figure 7: Visual Proportionality tests for the time-constant covariates.


In the model we also included the logarithm of the duration at the end of every observed time interval for each individual, in order to identify the form of duration dependence (Jenkins, 2005). A positive estimated coefficient for this variable would indicate an increasing hazard rate, a negative coefficient a decreasing hazard, and a coefficient equal to zero a constant hazard. All the continuous covariates were measured at their mean values, so that the interpretation of the coefficients is easier (Kiefer, 1988). The results of the logistic regression estimation are presented in Table 6.

              Values   Std.Error   p-value
intercept     -4.015     0.055     <0.001
gend          -0.077     0.051      0.128
age            0.000     0.002      0.392
diffprice      2.893     0.212     <0.001
apotrate      -0.723     0.122     <0.001
artsrate       0.314     0.155      0.052
log(dur)      -0.619     0.030     <0.001

Table 6: Logistic Regression on duration until switching to generic Fluoxetine
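The discrete-time setup behind Table 6 can be sketched as follows: each spell of length T is expanded into T person-period rows with a binary outcome that equals 1 only in the interval where the switch occurs (and 0 throughout for censored spells); a logistic regression of this outcome on the covariates plus log(duration) then yields the reported model. Field names below are illustrative, not those of the IADB.nl data:

```python
import math

def expand_spell(patient_id, duration, switched, covariates):
    """Return person-period rows: one per interval the patient is at risk."""
    rows = []
    for t in range(1, duration + 1):
        rows.append({
            "id": patient_id,
            "logdur": math.log(t),                 # duration-dependence term
            "y": 1 if (switched and t == duration) else 0,
            **covariates,                          # time-constant covariates
        })
    return rows

# one switcher (switched in month 3) and one censored spell (4 months observed)
data = (expand_spell(1, 3, True,  {"gend": 0, "age": 41}) +
        expand_spell(2, 4, False, {"gend": 1, "age": 55}))
print(len(data), sum(r["y"] for r in data))  # 7 rows, exactly 1 event
```

The expanded rows can then be passed to any standard logistic-regression routine, which is what makes the discrete approach attractive for interval-censored data.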

The significance of the coefficient for the variable diffprice implies that a patient is more likely to switch to a generic product when the price difference between the branded and the generic product is large. Similarly, patients are more likely to be switched to a generic drug by a doctor who is more inclined to prescribe generics. The opposite holds for the pharmacist: the more often a pharmacist dispenses generics, the less likely it is that he/she will switch a patient from a branded product to a generic. As far as the shape of the hazard is concerned, the hazard decreases over time. This is in accordance with our non-parametric estimations, where a decreasing hazard was already observed. The variables age and gender do not appear to affect the switching probability, as their coefficients are not statistically significant.
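The decreasing hazard can be made concrete with the reported coefficients. For a reference patient with gend = 0 and, under the assumption that the centred continuous covariates contribute zero to the linear predictor at their means, the fitted discrete hazard implied by Table 6 is logistic(-4.015 - 0.619 log t):

```python
import math

def hazard(t, intercept=-4.015, b_logdur=-0.619):
    """Fitted switching hazard at interval t for the reference patient
    (sketch; centred covariates assumed to contribute zero)."""
    eta = intercept + b_logdur * math.log(t)
    return 1.0 / (1.0 + math.exp(-eta))

for t in (1, 3, 6, 12):
    print(t, round(hazard(t), 4))
```

The per-interval switching probability starts below 2% and falls steadily with time on the branded product, matching the negative coefficient on log(dur).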


correcting for endogeneity. In both cases, however, the estimators point toward the same positive relation between the price difference and the probability of switching. The estimators for the remaining covariates are almost identical to those obtained when endogeneity is not taken into consideration.

              Values   Std.Error   p-value
intercept     -4.044     0.055     <0.001
gend          -0.075     0.051      0.136
age            0.000     0.002      0.390
diffprice      3.440     0.224     <0.001
apotrate      -0.717     0.122     <0.001
artsrate       0.313     0.155      0.053
log(dur)      -0.649     0.029     <0.001
Hausman test  55.513               <0.001

Table 7: Logistic Regression on duration until switching to generic Fluoxetine - Instrumental Variables

Testing for the presence of endogeneity in the variable diffprice with the help of the Hausman test rejects the hypothesis of a non-endogenous covariate (Table 7). Therefore the model that accounts for endogeneity was selected.
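For a single suspected-endogenous coefficient, a Hausman-type contrast reduces to a scalar: H = (b_IV - b)^2 / (se_IV^2 - se^2), compared against a chi-squared(1) critical value. As a rough check (not the thesis's exact computation), plugging in the rounded diffprice estimates from Tables 6 and 7 gives a statistic of the same order as the reported 55.5:

```python
def hausman_scalar(b, se, b_iv, se_iv):
    """Scalar Hausman contrast for one coefficient (sketch).

    Valid when the IV estimator is less efficient, i.e. se_iv > se.
    """
    return (b_iv - b) ** 2 / (se_iv ** 2 - se ** 2)

# rounded diffprice estimates from Tables 6 (plain) and 7 (IV)
H = hausman_scalar(b=2.893, se=0.212, b_iv=3.440, se_iv=0.224)
print(round(H, 1))  # well above the 5% chi-squared(1) cutoff of 3.84
```

The gap between this back-of-the-envelope value and the reported statistic is attributable to the rounding of the tabled estimates.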

We continue by fitting a generalized mixed model with random effects, ignoring for the moment the presence of endogeneity. We decided to apply both the parametric and the non-parametric method introduced in section 3.8.2 for estimating a generalized mixed model. Table 8 shows the results for all the covariates. For the parametric mixed model we assume that the random effects follow a normal distribution. For the non-parametric one we make the assumption of four mass points.


                 Parametric R.Effects           Non-Parametric R.Effects
              Values  Std.Error  p-value      Values  Std.Error  p-value
intercept     -4.015    0.049    <0.001          -        -        -
gend          -0.077    0.054     0.150       -0.077    0.052     0.133
age            0.000    0.002     0.858        0.000    0.002     0.390
diffprice      2.893    0.204    <0.001        2.897    0.199    <0.001
apotrate      -0.723    0.124    <0.001       -0.725    0.121    <0.001
artsrate       0.314    0.156     0.045        0.315    0.156     0.048
log(dur)      -0.619    0.024    <0.001       -0.620    0.023    <0.001
MASS1                                         -2.520    0.184    <0.001
MASS2                                         -1.934    0.085    <0.001
MASS3                                         -1.496    0.082    <0.001
MASS4                                         -1.138    0.117    <0.001
Variance R.Effects: 5e-10
Mixture proportions: MASS1 = 0.046, MASS2 = 0.454, MASS3 = 0.454, MASS4 = 0.0454

Table 8: Parametric and Non-parametric Generalized Mixed Models

The variance of the distribution of the random effects is close to zero, which indicates no presence of unobserved heterogeneity. In the non-parametric model we observe a small difference in the coefficients of the mass points; all coefficients, however, point in the same negative direction.
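In the non-parametric (Heckman-Singer style) specification, the intercept is replaced by K mass points m_k with probabilities pi_k, and each patient's likelihood contribution is a probability-weighted mixture of the K conditional spell likelihoods. A minimal sketch of one patient's log-likelihood contribution, using the mass points from Table 8 (mixture proportions rounded so that they sum to one) and an illustrative log(dur) coefficient:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def patient_loglik(rows, mass, pi, beta):
    """Log-likelihood contribution of one patient's person-period rows.

    rows: dicts with covariate vector 'x' and binary outcome 'y';
    mass, pi: mass points and their mixture proportions.
    """
    mix = 0.0
    for m, p in zip(mass, pi):
        contrib = 1.0
        for r in rows:
            h = logistic(m + sum(b * x for b, x in zip(beta, r["x"])))
            contrib *= h if r["y"] == 1 else (1.0 - h)
        mix += p * contrib                 # weight conditional likelihoods
    return math.log(mix)

mass = [-2.520, -1.934, -1.496, -1.138]    # mass points from Table 8
pi   = [0.046, 0.454, 0.454, 0.046]        # proportions (rounded to sum to 1)
# toy 2-period spell with x = [log(t)]; patient switches in period 2
rows = [{"x": [math.log(1)], "y": 0}, {"x": [math.log(2)], "y": 1}]
print(patient_loglik(rows, mass, pi, beta=[-0.62]))
```

Maximising the sum of such contributions over patients, jointly in the mass points, proportions and slopes, is what the non-parametric column of Table 8 reports.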

4.3 Application in different expired patents


probability of a patient on the branded product and the price difference between the branded and generic products. We were also particularly interested in identifying a pattern in the introduction of the generics in the market. For example, as we will see later on, the entrance of the generic product into the market for Paroxetine had a much more modest effect than in the case of Fluoxetine.


Figure 8: Price difference and Survival probability on the branded product for Fluoxetine (a), Paroxetine (b) and Citalopram (c) users

5 Conclusion
