
Analyzing improvement in Dutch patients undergoing total hip replacement using sample selection models


BSc Econometrics

Track: Econometrics

Bachelor Thesis

Analyzing Improvement in Dutch patients undergoing Total Hip Replacement using Sample Selection Models

by Maurits Oostenbroek (10350500)

June 26, 2018

12 ECTS, April-June 2018

Supervisor: K.J. van Garderen

Assessor: Nancy Bruin


Contents

Introduction

2 Theoretical background

2.1 Quality of Life

2.1.1 Tools

2.1.2 QoL tools in Analysis

2.1.3 Used tools

2.2 Review of possible regressors

2.3 Limited dependent variables models

2.3.1 Collinearity of selection bias

2.4 Model Choice

2.5 Theoretical Assumptions

3 Methodology

3.1 Data

3.1.1 Description

3.1.2 Preparation

3.1.3 Variable selection

3.1.4 Summary statistics

3.2 Estimation

3.2.1 Model Description

3.2.2 Collinearity of selection bias term

3.2.3 Variable tests

4 Results and Analysis

4.1 Exclusion criterion

4.2 Collinearity

4.3 Variables

5 Conclusion

Appendix A ASA classification

Appendix B Stepwise exclusion

B.1 Step 1

B.2 Step 2

B.3 Step 3

B.4 Step 4

B.5 Step 5

B.6 Step 6


List of Abbreviations

2PM Two-Part Model

ASA American Society of Anesthesiologists

HOOS Hip disability and Osteoarthritis Outcome Score

LIML Limited Information Maximum Likelihood (also Heckman two-step method)

MLE Maximum Likelihood Estimation

OHS Oxford Hip Score

PROMs Patient Reported Outcome Measures

QoL Quality of Life

SF-36 Medical Outcome Study Short-Form-36

THR Total Hip Replacement

VAS Visual Analogue Scale


Abstract

Introduction: Previous research on improvement after Total Hip Replacement (THR) neglects the prior selection process. This research aims to examine possible selection effects in estimating the parameters of patients' pre-operative characteristics for improvement after THR using sample selection models. Furthermore, the performance of the Heckman two-step method, the Tobit type 2 model and the Two-Part model is examined.

Theoretical Background: The Heckman method is susceptible to collinearity problems. If collinearity problems occur, the Two-Part model may be more robust; otherwise the Tobit type 2 model is more efficient than the Heckman two-step method.

Methods: The analysis is performed using backward selection. Collinearity problems are detected using correlations and the condition number. The selection effect is tested using a t-test on the inverse Mills ratio in the Heckman two-step method and a Wald test in the Tobit type 2 model.

Results and Analysis: The inverse Mills ratio and the correlation coefficient are insignificant. The estimated parameters are similar in the Heckman two-step method, the Tobit type 2 model and the Two-Part model.

Conclusion: Selection bias seems to have a limited effect in the studied sample.

Statement of Originality

This document is written by student Maurits Oostenbroek, who declares to take full responsibility for the contents of this document.

I declare that the text and the work presented in this document are original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


Introduction

Osteoarthritis is the most prevalent joint disease; hip osteoarthritis is its second most prevalent type, after knee osteoarthritis. Every year in the Netherlands, 1.4 per 1000 men and 2.5 per 1000 women are newly diagnosed with hip osteoarthritis (NIVEL, 2017; Ministerie van Volksgezondheid Welzijn en Sport, 2017). Osteoarthritis has the ninth highest disease burden in the Netherlands (NIVEL, 2017; Ministerie van Volksgezondheid Welzijn en Sport, 2017). Total Hip Replacement (THR) is a surgical intervention primarily used to relieve pain from osteoarthritis and to improve mobility. To improve decision making when considering total hip replacement and to enhance personalized healthcare, it is useful to be able to predict the extent of improvement after the intervention.

In medicine there is currently no objective measure of improvement. One measure commonly used in healthcare to evaluate improvement is Quality of Life (QoL), which is a subjective measure. Several tools have been developed in recent years to assess QoL. Some examples are the EQ-5D (Rabin, Oemar, Oppe, Janssen, & Herdman, 2015), the Oxford Hip Score (OHS) (Dawson, Fitzpatrick, Murray, & Carr, 1996), the Hip disability and Osteoarthritis Outcome Score (HOOS) (Klässbo, Larsson, & Mannevik, 2003), the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) (Roorda et al., 2004) and the Medical Outcome Study Short-Form-36 (SF-36). The OHS, HOOS and WOMAC are specific to hip disease, whereas the EQ-5D and SF-36 are general health status questionnaires.

QoL tools have been used in studies regarding THR. Aalund, Glassau and Hansen (2017) used the EQ-5D to estimate the impact of age on quality of life improvement after THR. They found that older age predicted higher EQ-5D scores after surgery. Quintana, Escobar, Aguirre, Lafuente, & Arenaza (2009) used the SF-36 and WOMAC to identify several predictive factors in Spanish patients. Clinical parameters such as being female, older age, contralateral osteoarthritis and having comorbidities predicted less improvement. Pre-intervention health status and mental health status were uniformly related to changes in QoL. Additionally, Quintana, Bilbao, Escobar, Azkarate, & Goenaga (2009) created decision trees based on their earlier findings to classify a patient's indication for surgery as inappropriate, uncertain or appropriate. The most important factors in the decision trees are pre-operative WOMAC functional limitation scores and WOMAC pain scores.


These studies mainly identify factors that influence improvement. However, little research has been conducted on prediction ex ante. Yen, Bispo, Lopes Paiva, Tiago de Souza, & Lopes Neto (2018) used different machine learning algorithms to predict the OHS score after THR and obtained accurate results. However, only patients who had already been selected for treatment were studied, so the process by which patients are selected for THR was neglected.

Heckman (1979) showed that neglecting the selection process can bias the estimates and proposed the Heckman two-step method to correct for this bias. Since then, several other models have been developed for sample selection problems. To assist decision making, predictions of the potential outcome should be made before selection for THR, and a correction for the selection process should be included.

This research aims to examine possible selection effects in estimating the parameters of patients' pre-operative characteristics for improvement after THR using sample selection models. Since sample selection models are not commonly applied in medicine, the performance of the Heckman two-step method, the Tobit type 2 model and the Two-Part model is assessed for future use in this field. The structure of this paper is as follows. Chapter 2 contains the theoretical background, providing an overview of past research on measuring improvement after THR and of possible sample selection models. Chapter 3 describes the dataset and methodology used. Chapter 4 presents the results and analysis, followed by a discussion and comparison to the literature. Finally, Chapter 5 contains the conclusion of this paper.


2 Theoretical background

In this research, patient characteristics are used to predict the outcome after a total hip replacement. This chapter provides theoretical background on measuring quality of life and common ways to analyse it. Subsequently, three Patient Reported Outcome Measures (PROMs) used for osteoarthritis patients are explained. Several views on decisive regressors for predicting quality of life after THR are discussed. Then the models available for estimation and their points of criticism are reviewed, and the tests to assess these models are examined. Finally, the theoretical assumptions used in this research are stated.

2.1 Quality of Life

2.1.1 Tools

Healthcare nowadays has two purposes. Firstly, there are treatments with the intention to cure a disease. Secondly, there are treatments with palliative intent; these treatments focus on improving health in chronic diseases. Traditionally, the effect of treatments is measured by objective measurements, such as the cure rate, biological response to treatment and survival. In modern healthcare, patients and investigators argued that subjective indicators should also be considered in assessing the effect of treatment (Fayers & Machin, 2007, p. 3). These subjective indicators are referred to as indicators of Quality of Life (QoL).

But what is quality of life? Quality of life is a vague concept with a wide variety of applications. Several tools have been developed, and they focus on different aspects. Common aspects of quality of life on which tools focus are the disease's impact on physical, psychological and social functioning. Two types of tools have been developed: generic tools that assess general well-being, and disease-specific tools that incorporate disease-specific symptoms. QoL has recently become increasingly important as an outcome variable (Fayers & Machin, 2007, p. 3). It has been argued that small benefits from cancer treatment can be outweighed by a reduction in QoL and by therapy costs (Buccheri, Ferrigno, Curcio, Vola, & Rosso, 1989). Since chronic diseases cannot be cured, their treatment focuses mainly on survival and relief of symptoms. In these diseases quality of life has become a major outcome variable (Fayers & Machin, 2007, p. 12).


2.1.2 QoL tools in Analysis

Many tools have been developed over the years for different purposes. In this paper multiscale tools are used for analysis. A multiscale tool consists of $n$ questions, each scored on a $k$-point Likert scale from 0 to $k-1$. A common practice is to aggregate the scores on all questions into one total score, which is used for analysis (Fayers & Machin, 2007, p. 216).
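As a minimal sketch of this aggregation (the function names are hypothetical and not taken from any tool's scoring manual), summing $n$ items scored 0 to $k-1$ gives a total between 0 and $n(k-1)$. For the OHS layout of 12 items on a 5-point scale, described in Section 2.1.3, this yields the range 0 to 48.

```python
def total_score(responses, k):
    """Sum item responses scored 0..k-1 on a k-point Likert scale.

    Raises ValueError for out-of-range items, since such an item would
    silently distort the aggregated score.
    """
    for r in responses:
        if not 0 <= r <= k - 1:
            raise ValueError(f"response {r} outside 0..{k - 1}")
    return sum(responses)


def score_range(n_items, k):
    """Minimum and maximum attainable total score."""
    return 0, n_items * (k - 1)


# Example: 12 items on a 5-point scale (the OHS layout)
print(score_range(12, 5))                 # (0, 48)
print(total_score([4] * 6 + [2] * 6, 5))  # 36
```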

A tool needs several properties to be useful for analysis. First of all, it should be valid. Validation is defined as: "the process of determining whether there are grounds for believing that the tool measures what it is intended to measure, and that it is useful for its intended purpose." (Fayers & Machin, 2007, p. 77). In this case, the purpose is measuring QoL in patients undergoing total hip replacement.

The most important properties for a tool in this research are responsiveness and sensitivity. Responsiveness refers to the ability of a tool to detect change over time within one patient. Sensitivity is the capacity to detect differences between groups or individuals. The spread of scores in a population influences these properties. For example, if a large proportion of the observations has the minimum score (a floor effect) or the maximum score (a ceiling effect), the tool is less able to discriminate between cases, which reduces its responsiveness and sensitivity.
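Floor and ceiling effects can be quantified as the share of observations at the minimum and maximum score. A minimal sketch follows; the scores are made-up illustrative numbers, not data from this study, and no particular threshold for "too large" is implied.

```python
def floor_ceiling_shares(scores, min_score, max_score):
    """Share of observations at the minimum (floor) and maximum (ceiling) score."""
    n = len(scores)
    if n == 0:
        raise ValueError("no scores")
    floor = sum(1 for s in scores if s == min_score) / n
    ceiling = sum(1 for s in scores if s == max_score) / n
    return floor, ceiling


# Hypothetical post-operative OHS totals (range 0-48), for illustration only
scores = [48, 48, 47, 45, 48, 30, 48, 12, 44, 48]
print(floor_ceiling_shares(scores, 0, 48))  # (0.0, 0.5)
```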

Another important property of a QoL tool is the distribution of its score. The aggregated scores of a multiscale QoL tool are ordinal scores from 0 to $n(k-1)$. A single numerical step in the score does not necessarily correspond to the same qualitative difference everywhere on the scale. To be able to use the scores as an outcome measure, it is assumed that differences of one step have the same qualitative meaning over the full range; in other words, differences in score are linear (Fayers & Machin, 2007, p. 291).

Change in QoL can be analysed with a linear model in two ways. The first is to use the follow-up value $Q_1$ as the outcome variable and the baseline measurement $Q_0$ as a regressor. This model assumes that the baseline measurement influences later measurements. The second is to use the change between follow-up and baseline measurement as the outcome variable: $\Delta QoL = Q_1 - Q_0$. An advantage of the $\Delta QoL$ model is that the distribution of change is more likely to be symmetrical (Fayers & Machin, 2007, p. 290). A disadvantage is that if the correlation between $Q_0$ and $Q_1$ is lower than 0.5, the variance of $\Delta QoL$ is larger than the variance of $Q_1$. Additionally, the second method relies on the assumption that scores are linear (Fayers & Machin, 2007, pp. 290-291). Linear models to analyse $Q_1$ and $\Delta QoL$ are:

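Model 1: $$Q_1 = \beta_0 + \beta^{1}_{\text{treatment}}\, x + \beta_{QoL}\, Q_0 + \varepsilon_1$$

Model 2: $$\Delta QoL = \beta_0 + \beta^{2}_{\text{treatment}}\, x + \varepsilon_2$$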
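The 0.5 threshold for the correlation follows from a short calculation. A sketch, assuming (as the statement implicitly does) that $Q_0$ and $Q_1$ have equal variance $s^2$ and correlation $r$:

```latex
\begin{aligned}
\operatorname{Var}(\Delta QoL) &= \operatorname{Var}(Q_1) + \operatorname{Var}(Q_0) - 2\operatorname{Cov}(Q_0, Q_1) \\
&= s^2 + s^2 - 2 r s^2 = 2 s^2 (1 - r), \\
2 s^2 (1 - r) > s^2 &\iff r < \tfrac{1}{2}.
\end{aligned}
```

If the follow-up variance differs from the baseline variance, the threshold shifts accordingly.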

However, these equations are used to estimate the Average Treatment Effect (ATE), $\beta_{\text{treatment}}$. In this paper the focus is not on estimating the ATE but on predicting the treatment effect for individual patients. Therefore, these equations are not used; instead, the sample selection models specified in Section 2.3 are used.

2.1.3 Used tools

In this paper three tools, or Patient Reported Outcome Measures (PROMs), are reviewed: the EQ-5D-3L, the Oxford Hip Score (OHS) and the Hip disability and Osteoarthritis Outcome Score (HOOS). The EQ-5D is a general QoL tool that consists of 5 dimensions (or questions), each answered with one of three options, and a separate Visual Analogue Scale (VAS). The EQ-5D can be analysed using one of the 243 health states or using the VAS score. The multidimensional health states can be converted to a utility index ranging from -0.59 to 1 (Lamers, McDonnell, Stalmeier, Krabbe, & Busschbach, 2006). This questionnaire is mostly used for general healthcare evaluation and cost-utility analysis (Longworth & Rowen, 2013).
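The number of health states is simply $3^5 = 243$: five dimensions, each with three levels. The sketch below enumerates the profiles using the usual EQ-5D-3L five-digit notation (levels coded 1 to 3, from "11111" to "33333"). It only counts profiles and does not apply any utility tariff.

```python
from itertools import product

DIMENSIONS = 5      # EQ-5D-3L dimensions
LEVELS = (1, 2, 3)  # three response options per dimension

# Each health state is a five-digit profile, e.g. "11111" (no problems) to "33333".
states = ["".join(map(str, profile)) for profile in product(LEVELS, repeat=DIMENSIONS)]

print(len(states))            # 243
print(states[0], states[-1])  # 11111 33333
```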

The OHS was developed specifically for total hip replacement (Dawson et al., 1996). It consists of 12 questions, each scored on a 5-point Likert scale, which are summed to a total score ranging from 0 to 48 (Murray et al., 2007). Oppe, Devlin and Black (2011) tried to obtain utilities for the OHS by linking the OHS to the EQ-5D, but it seems that differences in conceptual basis made this impossible.

The HOOS was developed from the Knee injury and Osteoarthritis Outcome Score (KOOS) and the WOMAC score (Nilsdotter, Lohmander, Klässbo, & Roos, 2003). It consists of 40 items in 5 subscales: pain, symptoms, function in daily living, function in sport and recreation, and hip-related QoL. Each item is scored on a 5-point Likert scale. Within each subscale the item scores are aggregated and then normalised to a score from 0 to 100. A total score can be obtained by weighting and aggregating the subscale scores.
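The text states only that subscale scores are aggregated and normalised to 0-100. A minimal sketch, assuming the KOOS/HOOS-style transformation in which items are scored 0 (no problems) to 4 (extreme problems) and the subscale score is $100 - \bar{s} \cdot 100/4$ for mean item score $\bar{s}$, so that 100 means no problems. This convention is an assumption here and should be checked against the official HOOS scoring guide. The subscale weights for a total score are not given in the text and are not modelled.

```python
def hoos_subscale(items, max_item=4):
    """Normalise one HOOS subscale to 0-100 (assumed KOOS/HOOS convention).

    Items are assumed to be scored 0 (no problems) to max_item (extreme
    problems); the mean item score is rescaled so that 100 = no problems.
    """
    if not items:
        raise ValueError("empty subscale")
    mean = sum(items) / len(items)
    return 100.0 - mean * 100.0 / max_item


print(hoos_subscale([0, 0, 0, 0]))  # 100.0
print(hoos_subscale([4, 4, 4]))     # 0.0
print(hoos_subscale([1, 2, 3, 2]))  # 50.0
```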

Harris et al. (2016) performed a systematic review of the PROMs commonly used in patients undergoing hip and knee replacement to identify the most promising tools. They found that many questionnaires lack evidence on their measurement properties. They searched for evidence and scored the following properties: reproducibility, validity, responsiveness, construct, interpretability and floor/ceiling/precision effects. They defined: "Interpretability relates to the degree to which one can assign qualitative meaning – that is, clinical or commonly understood connotations – to a tool's quantitative change in score." (Harris et al., 2016, p. 103). The EQ-5D had very little evidence for most of these properties; only for interpretability in hip-specific use was there some good evidence in its favor. The HOOS has some good evidence in favor of construct and some limited evidence for responsiveness and floor/ceiling/precision effects. The OHS was deemed the most promising tool, with good evidence in favor of construct, responsiveness and interpretability. However, the evidence for its floor/ceiling/precision effects is mixed.

2.2 Review of possible regressors

One of the key parts of predicting the outcome after a total hip replacement is determining the influence of various parameters on change in health-related QoL. Several earlier studies aimed to identify these predictors. Quintana, Escobar, Aguirre, Lafuente, & Arenaza (2009) used multivariate linear regression models to analyse the influence of pre-intervention clinical, sociodemographic and health status parameters on change in health-related QoL. They conclude that clinical parameters such as being female, older age, contralateral osteoarthritis and having comorbidities predicted less improvement in health-related quality of life after a THR, and that pre-intervention mental health scores also predicted less improvement. Vissers et al. (2012) performed a systematic review and found limited and conflicting evidence regarding the influence of psychological factors on the outcome of THR. Racial and ethnic factors also seem to influence the outcome: Lavernia, Alcerro, Contreras, & Rossi (2011) showed that patients of African American ethnicity had worse outcomes than others. Yen et al. (2018) used the pre-operative EQ-5D index, self-perceived disability, problems while shopping, circulatory diseases and pre-operative problems while climbing stairs in their prediction model. Hofstede et al. (2016) performed a systematic review of preoperative predictors. Their main conclusions are that worse preoperative function and severe radiological osteoarthritis predicted more improvement, that results for age, gender, comorbidity, pain and pre-operative quality of life were conflicting across studies, and that patients with a higher BMI had worse outcomes.


2.3 Limited dependent variables models

In medicine, most research focuses on the impact of treatment. When analysing the impact of treatment, groups of patients who received treatment are compared with patients who did not. To obtain a clear picture of the impact of treatment, the groups have to be as similar as possible. The gold standard in methodology is a prospective study in which treatment is randomly assigned, preferably double-blinded (i.e. blinded to both patient and doctor), which bypasses selection (Higgins & Green, 2008, p. 212). However, many available databases were not set up prospectively and contain only observational data. In this type of data, a selection process has determined who was treated and who was not. Generally, the outcome of a treatment can only be observed if the patient has been selected. In observational studies, two processes can be modeled. First, the outcome equation can be modeled for patients who have been operated on. Let $z_i = 1$ if patient $i$ has received treatment and $z_i = 0$ if not. $y_i$ is the outcome measure and $x_i$ a corresponding $k \times 1$ vector of regressors, $\beta$ is a $k \times 1$ vector of parameters and $\varepsilon_i$ is the error term. The outcome equation is then defined as:

$$y_i = \begin{cases} x_i'\beta + \sigma\varepsilon_i & \text{if } z_i = 1 \\ \text{not observed} & \text{if } z_i = 0 \end{cases} \tag{1}$$

$z_i$ is the binary result of the selection process. This selection process can be modeled using a latent variable $z_i^* = w_i'\gamma + \omega_i$: if $z_i^* > 0$, then $z_i = 1$, otherwise $z_i = 0$. Here $w_i$ is an $m \times 1$ vector of regressors, $\gamma$ is an $m \times 1$ vector of parameters and $\omega_i$ is the error term. The selection equation is then defined as:

$$z_i = \begin{cases} 1 & \text{if } w_i'\gamma + \omega_i > 0 \\ 0 & \text{if } w_i'\gamma + \omega_i \le 0 \end{cases} \tag{2}$$

The vectors $w_i$ and $x_i$ do not necessarily contain the same variables, which distinguishes this model from other limited dependent variable models such as the truncated and the censored model. If the error terms of equations (1) and (2) are not independent, OLS estimation of $\beta$ on the selected sample is inconsistent. To show this inconsistency, assume that the errors have a bivariate normal distribution, that the values of $(w_i, x_i)$ are fixed, and that the error terms $(\omega_i, \varepsilon_i)$ are independent across observations. Because of the scaling term $\sigma$ in equation (1), the variances can be normalised to $E[\omega_i^2] = 1$ and $E[\varepsilon_i^2] = 1$, with covariance $E[\omega_i \varepsilon_i] = \rho$. So:


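$$\begin{pmatrix} \omega_i \\ \varepsilon_i \end{pmatrix} \sim \text{NID}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right)$$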

The inverse Mills ratio is defined as:

$$\lambda_i = \lambda(w_i'\gamma) = \frac{\phi(w_i'\gamma)}{\Phi(w_i'\gamma)} = E[\omega_i \mid \omega_i > -w_i'\gamma] \tag{3}$$
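Equation (3) can be checked numerically. The sketch below (helper names are hypothetical) compares $\phi(c)/\Phi(c)$ with the truncated mean $E[\omega \mid \omega > -c]$ of a standard normal, computed independently by trapezoidal integration; the two should agree up to integration error.

```python
import math


def norm_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def inverse_mills(index):
    """lambda(w'gamma) = phi(w'gamma) / Phi(w'gamma), as in equation (3)."""
    return norm_pdf(index) / norm_cdf(index)


def truncated_mean(index, upper=10.0, steps=200_000):
    """E[omega | omega > -index] for standard normal omega.

    Trapezoidal integration of x * phi(x) over (-index, upper), divided by
    P(omega > -index) = 1 - Phi(-index). The tail beyond `upper` is negligible.
    """
    lower = -index
    h = (upper - lower) / steps
    total = 0.0
    for j in range(steps + 1):
        x = lower + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * x * norm_pdf(x)
    return total * h / (1.0 - norm_cdf(lower))


for c in (-1.0, 0.0, 1.5):
    print(c, round(inverse_mills(c), 6), round(truncated_mean(c), 6))
```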

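Selection bias follows from:

$$E[\varepsilon_i \mid z_i = 1] = \rho\, E[\omega_i \mid \omega_i > -w_i'\gamma] = \rho\lambda_i$$

$$E[y_i \mid z_i = 1] = x_i'\beta + \sigma E[\varepsilon_i \mid z_i = 1] = x_i'\beta + \rho\sigma\lambda_i$$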

Heckman (1979) characterised this problem in OLS as a specification error in which the bias term is an omitted variable. He proposed to include the bias term in the main regression. Let $\eta_i = y_i - E[y_i \mid z_i = 1]$. The outcome equation then becomes:

$$y_i = x_i'\beta + \rho\sigma\lambda_i + \eta_i, \qquad E[\eta_i \mid z_i = 1] = 0 \tag{4}$$

The proposed method contains the following steps:

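1. Estimate the parameters $\gamma$ of the probability that $z_i^* > 0$ in equation (2) by probit on the full sample, maximising

$$\log L(\gamma) = \sum_{i:\, z_i = 1} \log \Phi(w_i'\gamma) + \sum_{i:\, z_i = 0} \log\bigl(1 - \Phi(w_i'\gamma)\bigr) \tag{5}$$

Use the estimates $\hat\gamma$ to compute the inverse Mills ratio:

$$\hat\lambda_i = \lambda(w_i'\hat\gamma) = \frac{\phi(w_i'\hat\gamma)}{\Phi(w_i'\hat\gamma)}$$

2. Estimate equation (4) in the selected subsample by OLS, with $\hat\lambda_i$ in place of $\lambda_i$:

$$y_i = x_i'\beta + \rho\sigma\hat\lambda_i + \eta_i \tag{6}$$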
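The two steps can be sketched in Python with NumPy. This is an illustrative implementation under stated assumptions, not the estimation code used in this thesis: the probit in step 1 is fitted by Fisher scoring, the data are simulated with made-up parameter values, and an exclusion restriction (a variable in $w_i$ but not in $x_i$) is built in. Standard errors are deliberately not returned, because the OLS standard errors from step 2 are not valid, as noted in the next paragraph.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])


def norm_pdf(x):
    return np.exp(-0.5 * x ** 2) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + _erf(x / math.sqrt(2.0)))


def probit(W, z, max_iter=100, tol=1e-10):
    """Step 1: maximise the probit log-likelihood (5) by Fisher scoring."""
    gamma = np.zeros(W.shape[1])
    for _ in range(max_iter):
        idx = W @ gamma
        cdf = np.clip(norm_cdf(idx), 1e-12, 1.0 - 1e-12)
        pdf = norm_pdf(idx)
        score = W.T @ (pdf * (z - cdf) / (cdf * (1.0 - cdf)))
        info = (W * (pdf ** 2 / (cdf * (1.0 - cdf)))[:, None]).T @ W
        step = np.linalg.solve(info, score)
        gamma = gamma + step
        if np.max(np.abs(step)) < tol:
            break
    return gamma


def heckman_two_step(X, W, y, z):
    """Return (beta_hat, rho_sigma_hat, gamma_hat).

    Step 1 uses the full sample; step 2 is OLS of y on [X, lambda_hat]
    in the selected subsample only, as in equation (6).
    """
    gamma = probit(W, z)
    sel = z == 1
    idx = W[sel] @ gamma
    lam = norm_pdf(idx) / norm_cdf(idx)
    coef, *_ = np.linalg.lstsq(np.column_stack([X[sel], lam]), y[sel], rcond=None)
    return coef[:-1], coef[-1], gamma


# Simulated illustration; every parameter value below is made up.
rng = np.random.default_rng(0)
n = 20_000
x1 = rng.normal(size=n)
excl = rng.normal(size=n)  # enters W but not X (exclusion restriction)
X = np.column_stack([np.ones(n), x1])
W = np.column_stack([np.ones(n), x1, excl])
rho, sigma = 0.6, 1.0
omega = rng.normal(size=n)
eps = rho * omega + math.sqrt(1.0 - rho ** 2) * rng.normal(size=n)
z = (W @ np.array([0.2, 0.5, 1.0]) + omega > 0).astype(float)
y = X @ np.array([1.0, 2.0]) + sigma * eps  # only used where z == 1

beta_hat, rho_sigma_hat, gamma_hat = heckman_two_step(X, W, y, z)
print(beta_hat, rho_sigma_hat, gamma_hat)
```

With these settings, `beta_hat` should be close to (1, 2) and `rho_sigma_hat` close to $\rho\sigma = 0.6$. Dropping the $\hat\lambda$ column and running OLS on the selected subsample would be expected to give biased estimates of $\beta$ here, because $\rho \neq 0$ and $\hat\lambda_i$ is correlated with `x1`.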

Heckman also shows that the error terms $\eta_i$ are heteroskedastic, so the conventional OLS standard errors are not valid. Heckman (1979) derived an asymptotic covariance matrix, which Greene (1981) later simplified, to estimate the correct standard errors. Cameron and Trivedi (2005, p. 550) also suggest the bootstrap to estimate the correct standard errors.


Efron and Tibshirani (1994, p. 10) described the bootstrap as "a computer-based method for assigning measures of accuracy to sample estimates". The bootstrap relies on the assumption that the original sample is reasonably representative of the population. Resampling (with replacement) from the original sample is then similar to drawing a new sample from the population (Varian, 2005). The bootstrap can be used to estimate the sampling distribution of almost any statistic (Varian, 2005). Bootstrap standard errors are estimated using an algorithm described by Efron and Tibshirani (1994, p. 47).
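A minimal NumPy sketch of the bootstrap standard error: resample rows with replacement, recompute the statistic, and take the standard deviation of the replications with divisor $B-1$. The function name is hypothetical. For the Heckman method, `statistic` would be a function that re-runs both steps (probit and OLS) on each resampled full sample; here a case with a known answer, the standard error of a mean, is used as a sanity check.

```python
import numpy as np


def bootstrap_se(data, statistic, n_boot=2000, seed=0):
    """Nonparametric bootstrap standard error of `statistic`.

    Each replication draws len(data) rows with replacement and recomputes
    the statistic; the standard error is the sample standard deviation
    (ddof=1) of the replications.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    reps = np.array([statistic(data[rng.integers(0, n, size=n)])
                     for _ in range(n_boot)])
    return reps.std(axis=0, ddof=1)


# Sanity check: the bootstrap SE of a mean should be close to s / sqrt(n).
rng = np.random.default_rng(1)
sample = rng.normal(loc=5.0, scale=2.0, size=400)
print(bootstrap_se(sample, np.mean))
print(sample.std(ddof=1) / np.sqrt(len(sample)))
```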

Heckman noted that the OLS estimator in the second step is consistent but not efficient, and that the method should primarily be used in an exploratory way. He proposed using its estimates as starting values for maximum likelihood (ML) estimation. However, this suggestion dates from a time when maximum likelihood estimation demanded much computing power. Other names for the Heckman two-step method are Heckit and Heckman limited information ML (LIML).

The Heckman two-step method is not efficient, mainly because only the selected subsample is used to estimate the outcome parameters, so information is lost. Amemiya (1985, p. 386) uses the full sample to derive a maximum likelihood estimator and defines this model as Tobit type 2, also known as the Full Information Maximum Likelihood (FIML) estimator:

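$$\log L(\beta, \gamma, \sigma, \rho) = \sum_{i:\, z_i = 0} \log\bigl(1 - \Phi(w_i'\gamma)\bigr) + \sum_{i:\, z_i = 1} \log \Phi\!\left( \frac{w_i'\gamma + \frac{\rho}{\sigma}(y_i - x_i'\beta)}{\sqrt{1 - \rho^2}} \right) + \sum_{i:\, z_i = 1} \left( -\tfrac{1}{2}\log\sigma^2 - \tfrac{1}{2}\log(2\pi) - \frac{1}{2\sigma^2}(y_i - x_i'\beta)^2 \right) \tag{7}$$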
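Coding the log-likelihood (7) directly is a useful check on the formula. The pure-Python sketch below (function names hypothetical) only evaluates the log-likelihood; maximising it would require passing it to a numerical optimiser, which is not shown. A convenient property to verify is that with $\rho = 0$ the log-likelihood separates into the probit log-likelihood (5) plus an ordinary normal regression log-likelihood on the selected sample, which is why $\rho = 0$ corresponds to the absence of selection bias.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def tobit2_loglik(beta, gamma, sigma, rho, X, W, y, z):
    """Log-likelihood (7) of the Tobit type 2 model; needs sigma > 0 and |rho| < 1."""
    ll = 0.0
    for xi, wi, yi, zi in zip(X, W, y, z):
        index = dot(wi, gamma)
        if zi == 0:
            # not selected: only P(z_i = 0) = 1 - Phi(w_i'gamma) contributes
            ll += math.log(1.0 - norm_cdf(index))
        else:
            resid = yi - dot(xi, beta)
            arg = (index + rho * resid / sigma) / math.sqrt(1.0 - rho ** 2)
            ll += math.log(norm_cdf(arg))
            # normal log-density of y_i given x_i
            ll += (-0.5 * math.log(sigma ** 2) - 0.5 * math.log(2.0 * math.pi)
                   - resid ** 2 / (2.0 * sigma ** 2))
    return ll


# Small made-up example (y is ignored where z == 0)
X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0], [1.0, 0.0]]
W = [[1.0, 0.3], [1.0, -0.7], [1.0, 1.2], [1.0, 0.1]]
y = [1.7, 0.0, 3.1, 0.4]
z = [1, 0, 1, 1]
beta, gamma, sigma = [0.5, 1.0], [0.1, 0.8], 1.2
print(tobit2_loglik(beta, gamma, sigma, 0.0, X, W, y, z))
print(tobit2_loglik(beta, gamma, sigma, 0.5, X, W, y, z))
```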

Duan, Manning, Morris, & Newhouse (1983), however, rejected the Tobit type 2 model and the Heckman two-step method, since these methods estimate unconditional outcomes. They proposed the Two-Part model, which estimates conditional outcomes. It is similar to the Heckman two-step method, with one difference: it omits the inverse Mills ratio from the outcome equation. The Two-Part model models $y_i$ conditional on $z_i^*$ being positive. The expected value of $y_i$ is then:

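$$E[y_i \mid z_i = 1] = x_i'\beta + \sigma E[\varepsilon_i \mid z_i = 1] = x_i'\beta,$$

since the Two-Part model assumes $E[\varepsilon_i \mid z_i = 1] = 0$.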


2.3.1 Collinearity of selection bias

A common problem in two-step sample selection models is identification. The selection equation and the outcome equation often share many, if not all, of their variables. If the covariates in $w_i$ and $x_i$ are identical, identification depends only on the nonlinearity of the inverse Mills ratio function $\lambda(\cdot)$; see equation (3). However, $\lambda(\cdot)$ is approximately linear over most of its argument (Leung & Yu, 2000, p. 177). Leung and Yu (2000, pp. 178, 196) specified two conditions which, if both are satisfied, make $x_i$ and $\lambda(w_i'\hat\gamma)$ highly collinear, so that collinearity problems are likely. First, most of the values of $w_i'\hat\gamma$ in the treated group fall in the linear part of $\lambda(\cdot)$, that is, most values of $w_i'\hat\gamma$ are smaller than 3. Second, $x_i$ and $w_i'\hat\gamma$ are highly collinear.
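The near-linearity of $\lambda(\cdot)$ can be made concrete by correlating an evenly spaced index with $\lambda$ of that index. In the sketch below the index ranges are illustrative choices, not Leung and Yu's own settings. Over a range such as $[-3, 0]$ the correlation is almost perfect (negative, since $\lambda$ is decreasing), so without an exclusion restriction $x_i$ and $\hat\lambda_i$ would be nearly collinear.

```python
import math


def inverse_mills(c):
    """lambda(c) = phi(c) / Phi(c) for a standard normal."""
    pdf = math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))
    return pdf / cdf


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def grid(lo, hi, n=601):
    """n evenly spaced points from lo to hi inclusive."""
    return [lo + (hi - lo) * j / (n - 1) for j in range(n)]


# Illustrative ranges for the index w_i'gamma_hat
for lo, hi in [(-3.0, 0.0), (-3.0, 3.0), (0.0, 5.0)]:
    index = grid(lo, hi)
    r = pearson(index, [inverse_mills(c) for c in index])
    print(f"index in [{lo}, {hi}]: corr(index, lambda) = {r:.4f}")
```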

To reduce the collinearity problems arising from the inverse Mills ratio, an exclusion restriction is imposed on the variables of the selection and outcome equations. Little and Rubin (1987, p. 230) state that at least one variable in the selection equation should be excluded from the outcome equation. This means that at least one variable should be a good predictor of selection, but not of the outcome. Rendtel (1992) shows that imposing exclusion restrictions without testing whether the excluded variables also directly affect the outcome can influence the performance of the estimation methods.

However, even if the exclusion restriction is met, collinearity problems may persist in two-step estimation methods. Collinearity of the estimated inverse Mills ratio $\hat\lambda$ can be tested using correlations and the condition number (Leung & Yu, 1996). If collinearity problems with $\hat\lambda$ arise, Leung and Yu (2000) suggest some remedies. Including more variables in the selection equation could stretch the range of $w_i'\hat\gamma$, spreading some of its values into the non-linear part of the $\lambda(\cdot)$ curve. Furthermore, imposing more exclusion restrictions could further alleviate the problems.

2.4 Model Choice

Puhani (2000) discusses the performance of the Heckman two-step method, the Tobit type 2 model and the Two-Part model. He reviews several Monte Carlo simulations and draws the following conclusions. First, the most important differences arise from differences between the regressors in the selection equation and the outcome equation. The degree of collinearity between the inverse Mills ratio and the $x_i$ regressors is the most decisive criterion for judging the appropriateness of the Heckman two-step method and the Tobit type 2 model in comparison to the Two-Part model. In cases with a high correlation between the error terms of the outcome and the selection equation, the Heckman two-step method is inefficient and the Two-Part model might be more robust. Furthermore, in cases without collinearity problems the Heckman two-step method is consistent, but the Tobit type 2 model is more efficient.

The difference between the Heckman two-step method and the Two-Part model (2PM) is the inclusion of the selection bias term. If there is no bias, the Two-Part model is the true model and the most appropriate one for estimation. Heckman (1979) proposed a test for selection bias consisting of a t-test on the inverse Mills ratio. However, Puhani (2000) points out that collinearity limits the power of the t-test due to variance inflation, so he advises testing for collinearity using the condition number of the regressors. The cut-off point for this condition number is arbitrary: Belsley, Kuh & Welsch (1980, p. 105) proposed a cut-off of 30, while Leung and Yu (1996) suggested 20. If the condition number exceeds the cut-off, there is collinearity and the 2PM is recommended; otherwise the Tobit type 2 model (or the Heckman two-step method) is recommended (Leung & Yu, 1996).
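A minimal NumPy sketch of this decision rule. The condition number is computed following the usual Belsley, Kuh & Welsch convention (each column scaled to unit length, not centred; ratio of the largest to the smallest singular value). In practice the matrix would hold the outcome regressors plus $\hat\lambda$ from the first step; here synthetic columns stand in for $\hat\lambda$, and the function names are hypothetical. The default cut-off is 30; Leung and Yu's 20 can be passed instead.

```python
import numpy as np


def condition_number(X):
    """Condition number of X after scaling each column to unit length."""
    X = np.asarray(X, dtype=float)
    Xs = X / np.linalg.norm(X, axis=0)
    s = np.linalg.svd(Xs, compute_uv=False)
    return s.max() / s.min()


def recommended_model(X_with_lambda, cutoff=30.0):
    """Above the cut-off: collinearity, prefer the 2PM; otherwise Tobit type 2."""
    if condition_number(X_with_lambda) > cutoff:
        return "Two-Part model"
    return "Tobit type 2"


rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-3.0, 0.0, size=n)
ones = np.ones(n)
# Synthetic stand-ins for lambda_hat: one almost an exact linear function of x,
# one unrelated to x.
lam_collinear = 1.0 - 0.8 * x + rng.normal(scale=0.01, size=n)
lam_distinct = rng.normal(size=n)
X_bad = np.column_stack([ones, x, lam_collinear])
X_ok = np.column_stack([ones, x, lam_distinct])

for name, X in [("collinear", X_bad), ("distinct", X_ok)]:
    print(name, round(condition_number(X), 1), recommended_model(X))
```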

The Tobit type 2 model is more efficient than the Heckman two-step method. However, maximum likelihood estimation relies on the assumed distribution of the errors, in this case a bivariate normal distribution, and it is not known whether this distribution is correct. Maximum likelihood estimators are generally inconsistent when the distribution is misspecified, especially in limited dependent variable models (Smith, 1989). Klaauw and Koning (2003) introduced a test for the normality assumption in sample selection models based on semiparametric estimation. Newey (2009) proposed an estimation method in which no distribution needs to be specified, which might offer a solution if the normality assumption does not hold. However, semiparametric estimation is not used in this paper, so the normality assumption cannot be tested.

2.5 Theoretical Assumptions

On the basis of theories about QoL tools and the evidence provided by Harris et al. (2016), it is assumed that the aggregated scores of the HOOS and OHS have linear differences. Furthermore, considering the possible existence of a selection bias, the Heckman two-step method will be used to estimate the model. For the probit in the first step of the Heckman two-step method, it is assumed that the errors of the selection equation are normally distributed. To apply the Tobit type 2 model, it is assumed that the errors of the selection and outcome equations have a bivariate normal distribution. However, tests of appropriateness have to be conducted to validate the use of these models.


3 Methodology

In this chapter the methodology of this research is discussed. First, the description and preparation of the data are discussed. Then the estimation models for this research are specified. Tests of the assumptions of the estimation methods are reviewed, and methods for adjusting the model are discussed.

3.1 Data

3.1.1 Description

The data were collected at Bergman Clinics Naarden, which has a specific patient population. The clinic does not have an intensive care unit, so it only treats patients with a low risk of complications. The risk of complications is assessed using the American Society of Anesthesiologists (ASA) classification (Daabiss, 2011): the higher the classification, the higher the risk. The ASA classes are specified in Appendix A. Bergman Clinics Naarden only treats patients with a classification lower than ASA III.

The data were collected in a prospective study with the goal of assessing important factors in decision making and comparing performance. Both patients who received a total hip replacement and patients who did not were included. All patients included in the dataset gave consent and were anonymised. The data are split into 6 types of variables: patient information, radiology outcomes, physical examination, operation data, clinical information and Patient Reported Outcome Measures (PROMs). Patient and clinical information were acquired for all patients who were admitted. Radiology, physical examination and operation information were acquired for the patients who were examined. PROMs were administered at admission ($t = 0$) and repeated 3, 12 and 24 months ($t = 1, 2, 3$ respectively) after treatment for patients who were operated on, and otherwise 3, 12 and 24 months after admission to the study. The study is ongoing, meaning that the majority of follow-up measurements have not been acquired yet.

3.1.2 Preparation

The data preparation consisted of 4 steps. First, new variables were created. Second, the separate datasets were prepared for merging. Then the separate tables were merged into one table suitable for estimation. Finally, the dataset was cleaned.


Several primary variables were missing from the dataset and had to be derived by comparing the separate sets. There was no variable specifying whether the patient received treatment. To obtain this information, it was checked for every patient whether there was an entry in the operation data: if so, the treatment dummy was set to 1, otherwise to 0. Furthermore, the total PROM scores had not been calculated. For the OHS, the total score is the sum of the scores of all questions. For the HOOS, the subscale scores were calculated as the mean of the scores for the questions in each subscale. The age of the patient was calculated as the age at the $t = 0$ PROM measurement. For the physical examination only the range of motion was given. To use it in the analysis, dummy variables were created to indicate impaired function. Endorotation and abduction are considered the most important movements in the physical examination; each was considered impaired if its range of motion was less than 20 degrees.
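The derived variables can be sketched in plain Python. The record layouts, field names (`patient`, `endorotation`, `abduction`) and patient numbers below are hypothetical and only illustrate the rules stated above: a treatment dummy from the presence of an operation entry, the OHS total as a sum of 12 items, age in completed years at the $t = 0$ measurement, and impairment dummies for a range of motion below 20 degrees.

```python
from datetime import date

IMPAIRED_BELOW_DEGREES = 20  # threshold from the text, for both movements


def treatment_dummy(patient_ids, operation_rows):
    """1 if the patient has an entry in the operation data, else 0."""
    operated = {row["patient"] for row in operation_rows}
    return {pid: int(pid in operated) for pid in patient_ids}


def ohs_total(items):
    """OHS total score: the sum of the 12 item scores."""
    if len(items) != 12:
        raise ValueError("the OHS has 12 items")
    return sum(items)


def age_at(birth, measured):
    """Age in completed years on the date of the t = 0 PROM measurement."""
    had_birthday = (measured.month, measured.day) >= (birth.month, birth.day)
    return measured.year - birth.year - (0 if had_birthday else 1)


def impairment_dummies(range_of_motion):
    """Dummies for impaired endorotation and abduction (range below 20 degrees)."""
    return {f"{movement}_impaired": int(range_of_motion[movement] < IMPAIRED_BELOW_DEGREES)
            for movement in ("endorotation", "abduction")}


# Hypothetical example records
print(treatment_dummy([101, 102, 103], [{"patient": 101}, {"patient": 103}]))
print(ohs_total([3, 2, 4, 1, 2, 3, 0, 2, 1, 4, 3, 2]))
print(age_at(date(1950, 6, 30), date(2017, 6, 29)))
print(impairment_dummies({"endorotation": 15, "abduction": 30}))
```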

The tables had several problems that had to be resolved before the sets could be merged. First, the radiology and physical examination tables contained multiple rows for one patient, because in some cases both hips were examined. No variable specified which side caused the complaints. For patients receiving treatment, the affected hip could be identified from the operation data. For patients not receiving treatment, however, there was no operation report, so it was unclear which side was impaired. Therefore, for all patients receiving treatment, the examinations of the contralateral hip were dropped. For patients not receiving treatment, the most complete entry was kept and the others were dropped. This left one row per patient in each table, which is convenient for merging the datasets into one set for analysis.
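The rule for keeping one examination row per patient can be sketched as below. We assume the operation data record the operated side in a hypothetical column `side`, using the same coding as the examination table, and that each patient has at most one operation. Without that second condition the `map` lookup would fail. "Most complete" is taken to mean the row with the fewest missing values.

```python
import pandas as pd

def one_row_per_patient(exam, operations):
    """Keep one examination row per patient (all column names hypothetical)."""
    # Operated side per patient, e.g. "L" or "R"; assumes one operation per patient.
    op_side = operations.set_index("patient_id")["side"]
    exam = exam.copy()
    exam["op_side"] = exam["patient_id"].map(op_side)

    # Treated patients: keep only the operated hip, i.e. drop the contralateral side.
    treated = exam[exam["op_side"].notna() & (exam["side"] == exam["op_side"])]

    # Untreated patients: keep the most complete row (fewest missing values).
    untreated = exam[exam["op_side"].isna()].copy()
    untreated["n_missing"] = untreated.isna().sum(axis=1)
    untreated = (untreated.sort_values(["patient_id", "n_missing"])
                 .drop_duplicates("patient_id", keep="first")
                 .drop(columns="n_missing"))

    return pd.concat([treated, untreated]).drop(columns="op_side").reset_index(drop=True)
```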

The PROMs dataset also contained multiple rows per patient: for each measurement moment t = 0, 1, 2, 3, a new row with the corresponding value of t was entered. To use the data from the different moments, the dataset was split into 4 tables, one for each time. As mentioned before, many follow-up measurements have not been conducted yet, which leaves few observations for t ≥ 2. Therefore, only the datasets for t = 0 and t = 1 were used.
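Splitting the long PROMs table into one table per measurement moment can be sketched as follows. The column names `patient_id` and `t` are hypothetical. The suffix added to the score columns is our choice; it keeps the t = 0 and t = 1 scores apart when the tables are later merged side by side.

```python
import pandas as pd

def split_by_moment(proms, moments=(0, 1)):
    """Split the long PROMs table into one table per moment t (names hypothetical).

    Non-key columns get a suffix such as '_t0' so the tables can be merged later.
    """
    tables = {}
    for t in moments:
        rows = proms[proms["t"] == t].drop(columns="t").set_index("patient_id")
        tables[t] = rows.add_suffix(f"_t{t}").reset_index()
    return tables
```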

All datasets contained the patient number and were merged using this variable as an index. Several types of merges were used. An inner merge (intersection) of two datasets keeps only the rows whose patient number appears in both, so it contains full entries of the variables from both datasets. A left (right) merge keeps all rows of the left (right) dataset. It fills in the new variables where they are available and leaves them empty otherwise.


variables were merged. Not all variables are necessary for the selection equation, so an inner merge was not required. The major variables are the treatment dummy, PROM outcomes, age, gender, radiology outcome and physical examination. The clinical variables matter mainly for the outcome equation. Full entries of them are therefore not necessary for the selection equation, and missing values are allowed in the dataset. First, all entries with missing values for variables important to the selection equation were dropped. Then the variables important to the outcome equation were checked, and only entries with missing values in the treatment group were dropped. The process is shown schematically in figure 1.
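The merging and cleaning rules above can be sketched as follows. The per-moment PROM tables, the variable lists and the column names are hypothetical inputs. The sketch shows how an inner merge, a left merge and the two missing-value rules fit together. It is not the exact sequence of merges used for the thesis data.

```python
import pandas as pd

def merge_and_clean(patients, proms_t0, proms_t1, selection_vars, outcome_vars):
    """Merge on the patient number and drop incomplete rows (names hypothetical)."""
    # Inner merge: only patients that also have a baseline (t = 0) PROM.
    df = patients.merge(proms_t0, on="patient_id", how="inner")
    # Left merge: keep all these patients; t = 1 columns are NaN if not yet measured.
    df = df.merge(proms_t1, on="patient_id", how="left")
    # Selection-equation variables must be complete for every patient.
    df = df.dropna(subset=selection_vars)
    # Outcome-equation variables are only required in the treatment group.
    incomplete_treated = (df["treated"] == 1) & df[outcome_vars].isna().any(axis=1)
    return df[~incomplete_treated].reset_index(drop=True)
```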

Figure 1

3.1.3 Variable selection

In medicine, many observed variables generally overlap in goal and relevance. In physical examination, for example, ten different variables collectively lead to the conclusion that function is impaired. Including all of them leads to high-dimensional models, which could suffer from multicollinearity. The choice of variables is therefore a crucial task. First, the available variables for the selection equation are reviewed, and then those for the outcome equation.

Decision making in medicine mostly follows guidelines, so the indications described in the guidelines are expected to be important factors in the selection equation. Gademan et al. (2016) reviewed guidelines to find common indications for total hip replacement. They found that pain, function, radiological changes and failed conservative therapy were the indication criteria used most often in guidelines. However, within these indications no agreement on cut-off points was found. These domains were taken into account in selecting regressors for the selection equation. All potential representations of these domains in the available data are shown in table 1. Individual PROM questions usually have a restricted set of possible answers on an ordinal (Likert) scale. These variables should therefore be avoided in the equation if there is an alternative. Aggregated scores on one of the subscales may be useful instead of separate question scores. Furthermore, the total score of one of the PROMs might be a useful regressor as an overall indication of the pain and function domains. Chapter 2, theoretical background, shows that the OHS is deemed the most promising PROM. Therefore the OHS total score was chosen as the variable for Quality of Life (QoL). However, the OHS does not have subscores for the different components of QoL. The HOOS does supply subscores for pain, daily life, symptoms, sports and quality of life, which can be used as regressors for these specific domains of QoL.

Section 2.2 reviews potential predictors of the outcome after a total hip replacement. Hofstede et al. (2016) conducted a systematic review to identify preoperative variables that affect the outcome after total hip replacement. They found that worse preoperative function and more severe radiological osteoarthritis predicted more improvement. There was conflicting evidence for age, gender, comorbidity, pain and quality of life. The potential variables for each of these factors are shown in table 1. Only variables with a low percentage of missing values are evaluated, since further reduction of the sample size should be avoided. Many observations are missing in the physical examination data. Endorotation is a key variable in assessment, since impaired endorotation restricts the function of the hip. Endorotation also has the fewest missing values. Therefore the dummy for impaired endorotation is chosen as the variable representing hip function.


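Table 1: Potential variables

Domain | Potential variable | Type
Pain | HOOS pain mean score | aggregated PROM: score 1-5
Pain | EQ-5D question | PROM: Likert scale 1-5
Pain | OHS question | PROM: Likert scale 1-5
Function/physical examination | Flexion | dummy
Function/physical examination | Adduction | dummy
Function/physical examination | Abduction | dummy
Function/physical examination | Extension | dummy
Function/physical examination | Endorotation | dummy
Function/physical examination | Exorotation | dummy
Function/physical examination | HOOS symptoms mean score | aggregated PROM: score 1-5
Radiology | Osteophytes | scale 0-2
Radiology | Cysts | dummy
Radiology | Deformed joint | dummy
Radiology | Joint space narrowing | scale 0-3
Radiology | KL score | scale 0-4 (takes all of the above into account)
Failed conservative therapy | none | -
Quality of life | OHS total score at t = 0 | aggregated PROM: score 0-48
Quality of life | EQ-5D VAS | single score 0-100
Clinical | BMI ≥ 30 | dummy
Clinical | Age | continuous
Clinical | Smoking | dummy
Clinical | Gender | dummy (female = 1)
Clinical | Education | scale 0-3
Comorbidity | Depression | dummy
Comorbidity | Cardiovascular | dummy
Comorbidity | Pulmonary | dummy
Comorbidity | Rheumatic disease | dummy
Comorbidity | Diabetes mellitus | dummy
Comorbidity | Liver failure | dummy
Comorbidity | Any comorbidity | dummy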

Some domains, such as radiology and QoL, contain many variables. In both domains the variables collectively lead to the decision to perform a total hip replacement. Some of these variables are correlated with each other, which causes multicollinearity. Multicollinearity should be avoided. Perfectly collinear regressors make the regressor matrix singular, so that estimation is impossible, and strongly correlated regressors inflate the standard errors. The correlations between the variables within these domains are shown in figure 2. There is high correlation between the PROM scores.
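Figure 2 inspects these correlations visually. A small helper can also list the pairs of candidate regressors whose absolute Pearson correlation exceeds a threshold. The threshold of 0.8 used here is a hypothetical choice and not one taken from the thesis.

```python
import pandas as pd

def high_correlation_pairs(df, threshold=0.8):
    """List variable pairs whose absolute Pearson correlation exceeds `threshold`.

    The default threshold of 0.8 is an illustrative assumption.
    """
    corr = df.corr()
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if abs(r) > threshold:
                pairs.append((cols[i], cols[j], round(float(r), 3)))
    return pairs
```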


Figure 2: Correlation of PROM and radiology

(a) PROMs (b) radiology

The lower-left panels of each figure show pairwise scatter plots, the diagonal shows a histogram of each variable, and the upper-right panels display the correlations.

Furthermore, the number of observations is quite low. This causes problems when estimating high-dimensional models, such as large standard errors in OLS and non-convergence in maximum likelihood estimation. A selection of regressors should therefore be made. Radiographs were scored using the KL grading system, which incorporates the severity of joint space narrowing, osteophytes, cysts and deformation (Kellgren & Lawrence, 1957). It can therefore be used as a variable for the extent of radiological osteoarthritis. Since it cannot be predicted in advance how many regressors the model can include, we start with the largest possible model. When the maximum likelihood estimation does not converge, the least relevant regressor is dropped. This process is repeated until the model can be estimated.
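The stepwise procedure above can be sketched as a small loop. We assume the regressors are ranked in advance from most to least relevant. The `converges` callable is a hypothetical wrapper around the actual maximum likelihood fit and returns whether the optimiser converged.

```python
from typing import Callable, List, Sequence

def largest_estimable_model(regressors: Sequence[str],
                            converges: Callable[[List[str]], bool]) -> List[str]:
    """Drop the least relevant regressor until the model can be estimated.

    `regressors` is assumed to be ordered from most to least relevant;
    `converges` is a hypothetical wrapper around the MLE fit.
    """
    current = list(regressors)
    while current:
        if converges(current):
            return current
        current.pop()  # drop the least relevant remaining regressor
    raise RuntimeError("no estimable model found")
```

In practice `converges` would fit the Tobit type 2 model on the given regressors and report the optimiser's convergence flag.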

3.1.4 Summary statistics

Some summary statistics of the dataset used are shown in table 2. The KL score, HOOS pain score, HOOS symptoms score and OHS total score appear to be higher in the treated group, and impaired endorotation appears to be more common there. Depression, smoking and BMI over 30 each have a prevalence below 15 percent in both groups.

3.2 Estimation

In this research the parameters of the outcome regression are estimated by the Heckman two-step method and by the Tobit type 2 model. First the estimated models are stated. Then the tests for collinearity and possible solutions to it are discussed. Finally the tests for the relevance of the parameters are discussed.


Table 2: Summary statistics

Group | Total (n = 221) | Treated (n = 187) | Non-treated (n = 34)
Mean (sd) | | |
KL score | 2.86 (0.91) | 3.04 (0.80) | 1.85 (0.82)
HOOS pain mean score | 2.25 (0.69) | 2.29 (0.66) | 2.07 (0.83)
HOOS symptoms mean score | 2.39 (0.72) | 2.44 (0.68) | 2.07 (0.83)
OHS total score | 22.27 (7.93) | 22.81 (7.64) | 19.32 (8.93)
Age | 64.76 (7.77) | 64.75 (7.88) | 64.79 (7.22)
Number (%) | | |
Female | 129 (58%) | 108 (58%) | 21 (62%)
Depression | 6 (3%) | 5 (3%) | 1 (3%)
Smoking | 27 (12%) | 23 (12%) | 4 (11%)
BMI > 30 | 25 (11%) | 23 (12%) | 2 (6%)
Impaired endorotation | 203 (92%) | 175 (94%) | 28 (82%)

3.2.1 Model Description

The starting model includes the same variables for the Heckman two-step method and the Tobit type 2 model. The selection equation is given in equation (2) and the outcome equation in equation (1). The dependent variable z_i of the selection equation is the treatment dummy. To start with the most potentially explanatory variables, w_i includes HOOS pain, KL score, HOOS symptoms and impaired endorotation. The dependent variable y_i of the outcome equation is the difference in OHS score between t = 1 and t = 0. x_i consists of the OHS total score at t = 0, KL score, gender, age, depression, BMI over 30 and smoking.
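The Tobit type 2 model is estimated by maximum likelihood. The sketch below gives the standard log-likelihood of this model with bivariate normal errors. We assume, without checking, that it coincides with the thesis's equation (7), which is not reproduced here. Untreated patients contribute P(z_i = 0) = Φ(−w_i'γ). Treated patients contribute the density of y_i times P(z_i = 1 | y_i). Using ρ = tanh(a) and σ = exp(s) is our choice, so that an unconstrained optimiser keeps |ρ| < 1 and σ > 0.

```python
import numpy as np
from scipy.stats import norm

def tobit2_loglik(params, y, X, z, W):
    """Tobit type 2 log-likelihood with bivariate normal errors.

    params = [gamma (k_w values), beta (k_x values), atanh(rho), log(sigma)].
    W and X should contain a constant column; y is only used where z == 1.
    """
    k_w, k_x = W.shape[1], X.shape[1]
    gamma, beta = params[:k_w], params[k_w:k_w + k_x]
    rho, sigma = np.tanh(params[-2]), np.exp(params[-1])
    index = W @ gamma
    sel = z == 1
    # Untreated patients: log P(z_i = 0) = log Phi(-w_i' gamma).
    ll_out = norm.logcdf(-index[~sel]).sum()
    # Treated patients: log f(y_i) + log P(z_i = 1 | y_i).
    resid = (y[sel] - X[sel] @ beta) / sigma
    cond = (index[sel] + rho * resid) / np.sqrt(1.0 - rho**2)
    ll_in = (norm.logcdf(cond) + norm.logpdf(resid) - np.log(sigma)).sum()
    return ll_out + ll_in
```

To estimate the model, one could minimise the negative of this function with `scipy.optimize.minimize`, for example starting from the two-step estimates. With ρ = 0 the log-likelihood separates into a probit part and a normal regression part on the treated subsample, which is the Two-Part model.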

In the Heckman two-step method, the selection equation is estimated using probit, and the inverse Mills ratio λ is computed from the estimated index. Then λ̂ is included as an extra covariate in the outcome equation: x*_i = [x_i, λ̂_i]. The corresponding parameters b* of the outcome equation are estimated using OLS. As mentioned in chapter 2, the outcome equation suffers from heteroskedasticity. The variance-covariance matrix is therefore estimated following Greene (2003, p. 785). Additionally, the bootstrap is used to estimate the standard errors of the parameters. The Tobit type 2 model is estimated using maximum likelihood, and its log-likelihood is given in equation (7).

3.2.2 Collinearity of selection bias term

Section 2.3.1 discusses the causes of collinearity of the selection bias term. To prevent collinearity, an exclusion restriction is imposed: at least one variable should be a good predictor of selection, but not of the outcome. The studies of Gademan et al. (2016) and Hofstede et al. (2016) suggest that pain is a good predictor of selection but not of the outcome, so it meets the exclusion criterion. To test whether the variable is appropriately excluded, it is first included in the outcome equation and the parameters are estimated using the Heckman two-step method. The influence of the potentially excluded variable on y is then tested using a t-test. If the test is not significant, the variable can be legitimately excluded.

To further assess possible collinearity, the correlation and the condition number are used. The correlation is measured by the R² of an auxiliary regression of λ̂_i on the covariates included in x_i. The corresponding variance inflation factor is VIF = (1 − R²)^(−1). The condition number is calculated as suggested by Belsley, Kuh & Welsch (1980, p. 104):

$$\kappa(X) = \frac{\sigma_{\max}}{\sigma_{\min}}$$

where σ_max and σ_min are the largest and smallest singular values of the singular value decomposition (SVD) of X, respectively. The scale of the columns affects the condition number, so Belsley, Kuh & Welsch (1980, p. 120) proposed scaling the columns to unit length. This paper uses the cut-off of 20 suggested by Leung and Yu (1996). The condition number only indicates whether there is collinearity, not whether the inverse Mills ratio causes it. The condition number is therefore calculated both with and without λ̂ in X. If it rises substantially, it may be concluded that λ̂ is collinear. To further evaluate possible collinearity, the spread of w_i'γ̂ is inspected in a histogram. If collinearity problems arise, this research cannot include more variables in the selection equation, since the analysis already starts with the most variables included. Puhani (2000) suggested that if collinearity problems cannot be solved, the Two-Part model may be the most robust estimator. The Two-Part model is therefore also estimated to compare the results.
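Both diagnostics can be sketched with NumPy. The first is the R² of the auxiliary regression of λ̂ on the columns of X, together with the implied VIF. The second is the condition number after scaling the columns to unit length. The function names are hypothetical, and X is assumed to contain the constant column.

```python
import numpy as np

def aux_r2_and_vif(mills, X):
    """R^2 of regressing lambda_hat on X (with constant) and VIF = 1 / (1 - R^2)."""
    coef, *_ = np.linalg.lstsq(X, mills, rcond=None)
    resid = mills - X @ coef
    r2 = 1.0 - resid @ resid / np.sum((mills - mills.mean()) ** 2)
    return r2, 1.0 / (1.0 - r2)

def condition_number(X):
    """Belsley-Kuh-Welsch condition number: unit-length columns, sigma_max / sigma_min."""
    Xs = X / np.linalg.norm(X, axis=0)          # scale every column to unit length
    s = np.linalg.svd(Xs, compute_uv=False)     # singular values
    return s.max() / s.min()
```

The test of whether λ̂ causes collinearity compares `condition_number(X)` with `condition_number(np.column_stack([X, mills]))` against the cut-off of 20. As an example of the VIF, the auxiliary R² of 0.765 reported for the final Heckman model in Table 4 corresponds to VIF = 1/(1 − 0.765) ≈ 4.26.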

3.2.3 Variable tests

In predicting the outcome of total hip replacement, the selection of variables is critical. The selection process is believed to possibly bias the parameter estimates, so this effect should be tested. In the Heckman two-step method a t-test can be performed on β_λ. If it is significant, the Heckman two-step method is justified and should perform well. If it is not, subsample OLS might give better results. In the Tobit type 2 model, the correlation ρ between the errors of the two equations is identified, and this parameter can be tested using the Wald test.

The relevance of variables is tested in both the selection equation and the outcome equation. The Heckman two-step method uses two kinds of tests in each step. In the first (probit) step, the Wald test checks individual variable significance and the LR test tests several variables jointly. In the second (OLS) step, the t-test is used for individual significance and the F-test for multiple variables. In the Tobit type 2 model, the Wald test is used for individual relevance and the LR test for multiple variables in both the selection and outcome equation. Starting with the full model, variables are dropped when their parameters are not significant. After multiple variables have been dropped, the LR test or F-test checks the validity of dropping all of them jointly. If subsample OLS seems necessary, t-tests and F-tests are used to check the significance of the covariates. The ability to explain variation is assessed using the R² of the second stage of the Heckman two-step method and of the Two-Part model.


4 Results and Analysis

In this chapter the results and analysis are discussed. First the validity of the exclusion restriction is discussed. Then the results of collinearity tests are stated and analysed. Next the effect of the selection bias is reviewed. Finally the variables in the models and the performance of the models are discussed.

4.1 Exclusion criterion

The direct effect of the excluded variable is tested with the Heckman two-step method, with the excluded variable included in the outcome equation. The HOOS pain score appears to be significant only in the selection equation. It may therefore be concluded that the HOOS pain score can be validly excluded from the outcome equation.

4.2 Collinearity

Leung and Yu (2000) showed two causes of collinearity of the inverse Mills ratio λ. The first is values of w_i'γ̂ in the linear range of λ(·). The second is high correlation between x_i and w_i. Figure 3 shows a histogram of the predicted values w_i'γ̂. Most of these values are below 3, which is the linear part of the λ(·) curve, and this is a sign of collinearity problems. The condition number of X is 32.01 without λ(w_i'γ̂) and 50.73 with λ(w_i'γ̂). Both conditions mentioned by Leung and Yu (2000) are satisfied, which indicates collinearity problems with λ.

Figure 3: Histogram of predicted w_i'γ̂ of the first model
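Leung and Yu's argument rests on λ(·) being close to linear over a limited range of the index. This can be checked numerically by fitting a straight line to λ(c) = φ(c)/Φ(c) on an interval and reporting the R² of that fit. The sketch below uses hypothetical helper names and an evenly spaced grid, which simplifies matters because the predicted values are not uniformly spread. It uses the range [-1.381, 3.196] of w_i'γ̂ reported in section 4.3 for the first model.

```python
import numpy as np
from scipy.stats import norm

def inverse_mills(index):
    """lambda(c) = phi(c) / Phi(c), computed on the log scale for stability."""
    return np.exp(norm.logpdf(index) - norm.logcdf(index))

def linear_fit_r2(lo, hi, n=200):
    """R^2 of a straight-line fit to lambda on an evenly spaced grid over [lo, hi]."""
    c = np.linspace(lo, hi, n)
    lam = inverse_mills(c)
    slope, intercept = np.polyfit(c, lam, 1)
    resid = lam - (slope * c + intercept)
    return 1.0 - resid.var() / lam.var()

r2_observed = linear_fit_r2(-1.381, 3.196)  # range of w_i' gamma_hat, first model
r2_narrow = linear_fit_r2(-1.0, 1.0)
```

A high R² means that λ̂ behaves almost like a linear function of the index, and therefore of the regressors. The narrower the range, the closer to 1 the fit becomes.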

However, the condition number of X without λ is also larger than 20, which indicates multicollinearity problems within X itself. To decrease collinearity within X, variables that are correlated with each other should be dropped. To alleviate collinearity with λ(w_i'γ̂), Leung and Yu (2000) show that it helps to expand the range of w_i'γ̂. This can be done by including more variables in the selection equation and by imposing more exclusion restrictions on the outcome equation. However, including variables that have no influence is not necessarily better. A balance must therefore be struck between excluding irrelevant variables and including variables to alleviate collinearity. Since collinearity problems do occur, Puhani's (2000) suggestion to use the Two-Part model is followed.

4.3 Variables

The results of the stepwise exclusion are shown in appendix B. The results of the first and the final model are shown in tables 3 and 4. In the final Heckman two-step model the condition number is 8.44 with λ(w_i'γ̂) and 5.14 without it, which shows that the collinearity problems have been solved.

The exclusion of variables has been tested using t-tests, F-tests and LR tests, and the results are shown in tables 3 and 4. First, LR tests are performed in the probit step of the Heckman two-step method and in the Tobit type 2 model to exclude covariates from the selection equation. They test β = 0 for both impaired endorotation and the HOOS symptoms score against β ≠ 0. The test statistics are 1.10 for the Heckman two-step method and 0.91 for the Tobit type 2 model. Neither test is significant, so both variables can be dropped from the equation. However, reducing the number of variables in the selection equation could reduce the spread of w_i'γ̂ and increase collinearity problems. With HOOS symptoms and impaired endorotation included, the range of w_i'γ̂ is [-1.381, 3.196]. With both excluded it is [-1.076, 3.160]. The ranges barely change, so the variables can be dropped.
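The LR statistic is twice the difference between the unrestricted and restricted log-likelihoods. Under the null hypothesis it is χ²-distributed, with degrees of freedom equal to the number of restrictions. A minimal sketch with SciPy, applied to the two statistics reported above (df = 2):

```python
from scipy.stats import chi2

def lr_test(loglik_unrestricted, loglik_restricted, df):
    """LR statistic 2 * (llf_U - llf_R) and its chi-squared p-value."""
    stat = 2.0 * (loglik_unrestricted - loglik_restricted)
    return stat, chi2.sf(stat, df)

# p-values for dropping HOOS symptoms and impaired endorotation (df = 2)
for label, stat in [("Heckman probit step", 1.10), ("Tobit type 2", 0.91)]:
    print(label, round(chi2.sf(stat, df=2), 3))
```

The p-values are about 0.58 and 0.63, far above conventional significance levels. This is consistent with dropping both variables.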


Table 3: Selection results Heckman vs Tobit

Dependent variable: Treatment dummy

 | Heckman first (1) | Heckman final (2) | Tobit first (3) | Tobit final (4)
KL score | 0.924*** (0.148) | 0.940*** (0.147) | 0.937*** (0.151) | 0.955*** (0.149)
HOOS pain mean | 0.320 (0.220) | 0.433** (0.168) | 0.318 (0.221) | 0.430** (0.166)
Endorotation | 0.332 (0.387) | | 0.276 (0.395) |
HOOS symptoms mean | 0.147 (0.211) | | 0.148 (0.211) |
Constant | −2.638*** (0.650) | −2.288*** (0.550) | −2.617*** (0.645) | −2.316*** (0.548)

LR-χ² test selection, symptoms and endorotation (df = 2): 1.10 (Heckman), 0.91 (Tobit)

Note: (Standard error) *p<0.1; **p<0.05; ***p<0.01

Second, the covariates of the outcome equation are tested. The t-test is used for individual parameters, and the F-test (Heckman) and LR test (Tobit type 2) for multiple parameters. Surprisingly, age is significant when smoking, BMI and comorbidity are included, but not when one of them is excluded. It is also interesting that both the F-test in the second step of the Heckman two-step method and the LR test in the Tobit type 2 model are insignificant for the inclusion of BMI, smoking, comorbidity and age. The final model includes only the preoperative HOOS pain score and KL score in the selection equation, and the preoperative OHS total score and KL score in the outcome equation. Table 5 shows the bootstrap standard errors SD_boot for the final Heckman model. The bootstrap could not be performed for the first model, since multicollinearity caused singularity problems. The bootstrap standard errors are larger than the conventional estimates, but none of the conclusions change.
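For each parameter, Table 5 reports the original estimate, the bootstrap bias, SD_boot and a 95% interval. A generic case-resampling bootstrap that produces these four quantities can be sketched as follows. Whole patient rows are resampled with replacement. The estimator is passed in as a function, for example a hypothetical wrapper around a Heckman two-step fit. The percentile interval is our assumption, since the text does not state which interval type was used.

```python
import numpy as np

def bootstrap_summary(estimator, data, n_boot=999, seed=0):
    """Case-resampling bootstrap: original estimate, bias, SD and percentile 95% CI.

    `estimator` maps an (n, k) data array to a parameter vector.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    original = np.asarray(estimator(data))
    draws = np.array([estimator(data[rng.integers(0, n, size=n)])
                      for _ in range(n_boot)])
    bias = draws.mean(axis=0) - original
    sd = draws.std(axis=0, ddof=1)
    ci = np.percentile(draws, [2.5, 97.5], axis=0)  # rows: lower, upper bound
    return original, bias, sd, ci
```

A resample can be degenerate, for example when it contains too few untreated patients. As the text notes for the first model, the fit can then become singular, so a practical wrapper would need to catch such failures.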

The selection bias is tested in both the Heckman two-step model and the Tobit type 2 model. In the Heckman two-step model the parameters of λ(w_i'γ̂) were 5.660 for the full model and 5.968 for the final model. These estimates are quite large, but neither is significant. In the Tobit type 2 model the correlation is estimated at 0.1965 in the full model and 0.1937 in the final model, and it is not significant in either. The results of the Heckman two-step method and the Tobit type 2 model both indicate low correlation between the errors of the selection equation and the outcome equation. Additionally, the estimated parameters are robust when the Two-Part model is used. It therefore seems that in this sample the selection process does not significantly bias the parameter estimates for the full sample. The influence of the selection process has not been studied before, so future research is necessary to confirm this finding.

In this research the Heckman two-step method, the Tobit type 2 model and the Two-Part model are used to analyse predictors of improvement after THR. Their performance is quite similar: all estimated parameters lie within each other's 95% confidence intervals. The standard errors of the Tobit type 2 model are mostly smaller than those of the Heckman two-step method, which reflects the efficiency of the Tobit type 2 model. However, there is no clear evidence of selection bias, so the Two-Part model might be the true model, in which case it is the efficient estimation method. Other sample selection models may also prove useful, such as the semiparametric models described by Newey (2009) and Klaauw and Koning (2003), since they do not require assumptions on the error distributions.

Interestingly, most variables mentioned in the literature as influential for the outcome after a total hip replacement (THR) did not have a significant effect in this research. These include gender, age, comorbidity and BMI over 30. One reason might be that this sample is limited. The patients in this sample are relatively healthy (below ASA III), which results in a low prevalence of high-risk overweight and comorbidity. The low prevalence and lower severity of these conditions might lead to an underestimation of their effects. Furthermore, the follow-up used is 3 months after surgery, because too few follow-ups 1 year after surgery were observed. The full effect of the surgery may become more apparent after a longer time. Additionally, a larger dataset may make the effects of currently insignificant variables, such as age, more evident.

Apart from the intercept, the radiological severity of the disease and the PROM score before the operation were the only significant explanatory variables. Furthermore, this sample showed no evidence of selection bias. Nevertheless, even with few variables a reasonable adjusted R² was found: 0.466 in the Heckman two-step method and 0.463 in the Two-Part model. It may therefore be concluded that in the studied population the outcome after a THR can be predicted reasonably well using only the radiological severity of osteoarthritis and the OHS total score before the operation. The results of this research could be used to predict future follow-ups in the same population, in order to evaluate the predictive performance. However, more research is necessary to evaluate the predictive performance of sample selection models in the medical literature.
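The reported adjusted R² values follow from the standard adjustment 1 − (1 − R²)(n − 1)/(n − k − 1), with k slope regressors. We assume that the second-stage regressions use the 187 treated patients and that k counts λ̂ as a regressor in the Heckman model. Under these assumptions the adjustment reproduces the values in Tables 4 and 6.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k regressors (excluding the constant)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Final models on the 187 treated patients
print(round(adjusted_r2(0.475, 187, 3), 3))  # Heckman: OHS t = 0, KL score, lambda -> 0.466
print(round(adjusted_r2(0.469, 187, 2), 3))  # Two-Part: OHS t = 0, KL score -> 0.463
```

The first models work the same way. With seven covariates plus λ̂ (k = 8) and R² = 0.495, the Heckman model gives 0.472, which matches Table 4.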


Table 4: Outcome results Heckman vs Tobit

Dependent variable: OHS difference

 | Heckman first (1) | Heckman final (2) | MLE first (3) | MLE final (4)
OHS score t = 0 | 0.920*** (0.077) | 0.928*** (0.078) | 0.894*** (0.070) | 0.891*** (0.070)
KL score | 2.739** (1.330) | 3.417** (1.364) | 1.868** (0.828) | 2.119*** (0.804)
Age | 0.129** (0.065) | | 0.130** (0.065) |
Female | −1.419 (1.062) | | −1.424 (1.063) |
Depression dummy | −3.072 (3.164) | | −3.244 (3.197) |
Smoking | −0.822 (1.563) | | −0.740 (1.576) |
BMI > 30 | 1.869 (1.560) | | 1.871 (1.574) |
Constant | −23.607*** (7.064) | −18.627*** (5.899) | −19.801*** (5.380) | −12.924*** (3.428)
Selection correction | | | |
Inverse Mills ratio | 4.281 (4.058) | 5.968 (4.063) | |
Collinearity | | | |
Condition number x | 32.01 | 5.14 | |
Condition number x* | 50.73 | 8.44 | |
Correlation (R²aux) | 0.77 | 0.765 | |
Exclusion tests outcome: F-test (df = 1,5) 1.6016 (Heckman); LR-χ² (df = 5) 8.3756 (MLE)
Observations | 221 | 221 | 221 | 221
R² | 0.495 | 0.475 | |
Adjusted R² | 0.472 | 0.466 | |
σ | 7.228 | 7.631 | 6.970*** | 7.136***
ρ | 0.592 | 0.782 | 0.166 | 0.194


Table 5: Bootstrap statistics final Heckman equation

 | Original | Bias | SD_boot | 95% conf. interval
Selection | | | |
KL score | 0.940* | 0.041 | 0.185 | (0.559, 1.264)
HOOS pain | 0.432* | 0.010 | 0.201 | (0.023, 0.818)
Constant | −2.288* | −0.093 | 0.652 | (−3.57, −1.018)
Outcome | | | |
OHS score t = 0 | 0.928* | −0.010 | 0.070 | (0.787, 1.071)
KL score | 3.417* | −0.139 | 1.485 | (0.806, 6.965)
Constant | −18.627* | 0.739 | 6.168 | (−34.3, −8.04)
Inverse Mills ratio | 5.968 | −0.331 | 5.220 | (−2.135, 19.329)
Sigma | 7.631* | 0.201 | 1.187 | (6.431, 12.407)
Rho | 0.782 | −0.117 | 0.550 | (−0.320, 1.634)

Note: * if 0 is not in the confidence interval

Table 6: Two-Part Model

Dependent variable: Treatment dummy

 | 2PM first (1) | 2PM final (2)
KL score | 0.924*** (0.148) | 0.955*** (0.149)
HOOS pain mean | 0.320 (0.220) | 0.430** (0.166)
Endorotation | 0.332 (0.387) |
HOOS symptoms mean | 0.147 (0.211) |
Constant | −2.638*** (0.650) | −2.316*** (0.548)
Observations | 221 | 221

Note: (Standard error) *p<0.1; **p<0.05; ***p<0.01

Dependent variable: OHS difference

 | 2PM first (1) | 2PM final (2)
OHS score t = 0 | 0.884*** (0.069) | 0.879*** (0.069)
KL score | 1.546** (0.671) | 1.733*** (0.663)
Age | 0.130* (0.066) |
Female | −1.433 (1.087) |
Depression dummy | −3.321 (3.267) |
Smoking | −0.686 (1.608) |
BMI > 30 | 1.876 (1.610) |
Constant | −18.373*** (4.987) | −11.210*** (2.763)
Observations | 187 | 187
R² | 0.492 | 0.469
Adjusted R² | 0.472 | 0.463

Note: (Standard error) *p<0.1; **p<0.05; ***p<0.01


5 Conclusion

This paper provides empirical evidence on the relationship between selection for total hip replacement and the effects of personal characteristics on improvement after a total hip replacement. The factors that influence the outcome have been studied before, but only in the treated population. However, the potential improvement is uncertain before the decision to treat has been made (ex ante).

The main purpose of this paper was to evaluate, using sample selection models, whether the selection process affects the estimated effects of patients' pre-operative characteristics on improvement after THR. The parameter of the inverse Mills ratio is not significant, and the estimated correlation in the Tobit type 2 model is low and insignificant. Additionally, the effects estimated by the Heckman two-step method and the Tobit type 2 model are robust in the Two-Part model. Together these results support the conclusion that the effect of selection bias is limited.

Furthermore, the Heckman two-step method, the Tobit type 2 model and the Two-Part model were compared. The Tobit type 2 model had smaller standard errors than the Heckman two-step method, showing its greater efficiency. However, the lack of evidence for selection bias suggests that the Two-Part model is the true model, in which case the Two-Part model is the efficient estimation method.

To predict the individual improvement after THR as well as possible, all factors mentioned in the literature were initially included in the model. In this population there is little evidence that smoking, age, BMI over 30, gender or depression affect improvement. The only factors with a significant effect in all models are the preoperative OHS total score for quality of life and the KL score for the extent of radiological osteoarthritis. In the final model the adjusted R² was 0.466, so these two factors explain the variation in improvement reasonably well. However, the sample studied in this research is limited, with only 221 observations. Further research in larger samples is warranted to confirm these effects.


References

Aalund, P. K., Glassou, E. N., & Hansen, T. B. (2017). The impact of age and preoperative health-related quality of life on patient-reported improvements after total hip arthroplasty. Clinical Interventions in Aging, 12, 1951–1956. doi: 10.2147/CIA.S149493

Amemiya, T. (1985). Advanced Econometrics. Oxford: Basil Blackwell.

Belsley, D. A., Kuh, E., & Welsch, R. E. (1980). Regression diagnostics : identifying influential data and sources of collinearity. Wiley.

Buccheri, F. G., Ferrigno, D., Curcio, A., Vola, F., & Rosso, A. (1989, may). Continuation of chemotherapy versus supportive care alone in patients with inoperable non-small cell lung cancer and stable disease after two or three cycles of MACC. Results of a randomized prospective study. Cancer, 63(3), 428–432. doi: 10.1002/1097-0142(19890201)63:3<428::AID-CNCR2820630305>3.0.CO;2-V

Cameron, A. C., & Trivedi, P. K. (2005). Microeconometrics : methods and applications.

Daabiss, M. (2011, mar). American Society of Anaesthesiologists physical status classification. Indian journal of anaesthesia, 55(2), 111–115. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/21712864 doi: 10.4103/0019-5049.79879

Dawson, J., Fitzpatrick, R., Murray, D., & Carr, A. (1996, jun). Comparison of measures to assess outcomes in total hip replacement surgery. Quality in health care: QHC, 5(2), 81–88. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/10158596 http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=PMC1055370

Duan, N., Manning, W. G., Morris, C. N., & Newhouse, J. P. (1983). A Comparison of Alternative Models for the Demand for Medical Care. Journal of Business & Economic Statistics, 1(2), 115–126. Retrieved from http://www.tandfonline.com/doi/abs/10.1080/07350015.1983.10509330 doi: 10.1080/07350015.1983.10509330

Efron, B., & Tibshirani, R. (1994). An introduction to the bootstrap. Chapman & Hall. Retrieved from https://www.crcpress.com/An-Introduction-to-the-Bootstrap/Efron-Tibshirani/p/book/9780412042317

Fayers, P. M., & Machin, D. (2007). Quality of Life: The assessment, analysis and interpretation of patient-reported outcomes (2nd ed.). Chichester: John Wiley & Sons. doi: 10.1002/9780470024522

Gademan, M. G. J., Hofstede, S. N., Vliet Vlieland, T. P. M., Nelissen, R. G. H. H., & Marang-van de Mheen, P. J. (2016). Indication criteria for total hip or knee arthroplasty in osteoarthritis: a state-of-the-science overview. BMC musculoskeletal disorders, 17(1), 463. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/27829422 doi: 10.1186/s12891-016-1325-z

Greene, W. H. (1981). Sample Selection Bias as a Specification Error: A Comment. Econometrica, 49(3), 795–798. Retrieved from http://www.jstor.org/stable/1911523


Greene, W. H. (2003). Econometric analysis. Prentice Hall. Retrieved from https://books.google.nl/books/about/Econometric_Analysis.html?id=JJkWAQAAMAAJ&redir_esc=y

Harris, K., Dawson, J., Gibbons, E., Lim, C. R., Beard, D. J., Fitzpatrick, R., & Price, A. J. (2016). Systematic review of measurement properties of patient-reported outcome measures used in patients undergoing hip and knee arthroplasty. Patient related outcome measures, 7, 101–108. Retrieved from https://www.dovepress.com/systematic-review-of-measurement-properties-of-patient-reported-outcom-peer-reviewed-fulltext-article-PROM doi: 10.2147/PROM.S97774

Heckman, J. (1979). Sample Selection Bias as a Specification Error. Econometrica, 47(1), 153–161. Retrieved from https://econpapers.repec.org/RePEc:ecm:emetrp:v:47:y:1979:i:1:p:153-61

Higgins, J. P., & Green, S. (2008). Chapter 8: Assessing risk of bias in included studies. In: Higgins JPT, Green S (editors). Chichester (UK): John Wiley & Sons.

Hofstede, S. N., Gademan, M. G. J., Vliet Vlieland, T. P. M., Nelissen, R. G. H. H., & Marang-van de Mheen, P. J. (2016). Preoperative predictors for outcomes after total hip replacement in patients with osteoarthritis: a systematic review. BMC Musculoskeletal Disorders, 17(1), 212. doi: 10.1186/s12891-016-1070-3

Kellgren, J. H., & Lawrence, J. S. (1957). Radiological assessment of osteo-arthrosis. Ann. rheum. Dis, 16. Retrieved from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1006995/pdf/annrheumd00183-0090.pdf

Klässbo, M., Larsson, E., & Mannevik, E. (2003). Hip disability and osteoarthritis outcome score. An extension of the Western Ontario and McMaster Universities Osteoarthritis Index. Scandinavian journal of rheumatology, 32(1), 46–51. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/12635946

Lamers, M. L., McDonnell, J., Stalmeier, M. P. F., Krabbe, M. P. F., & Busschbach, V. J. J. (2006, jun). The Dutch tariff: results and arguments for an effective design for national EQ-5D valuation studies. Health Economics, 15(10), 1121–1132. Retrieved from https://doi.org/10.1002/hec.1124 doi: 10.1002/hec.1124

Lavernia, C. J., Alcerro, J. C., Contreras, J. S., & Rossi, M. D. (2011). Ethnic and Racial Factors Influencing Well-being, Perceived Pain, and Physical Function After Primary Total Joint Arthroplasty. Clinical Orthopaedics and Related Research, 469(7), 1838. Retrieved from https://doi.org/10.1007/s11999-011-1841-y doi: 10.1007/s11999-011-1841-y

Leung, S. F., & Yu, S. (1996). On the choice between sample selection and two-part models. Journal of Econometrics, 72(1), 197–229. Retrieved from http://www.sciencedirect.com/science/article/pii/0304407694017204 doi: 10.1016/0304-4076(94)01720-4

Leung, S. F., & Yu, S. (2000). Collinearity and Two-Step Estimation of Sample Selection Models: Problems, Origins, and Remedies. Computational Economics, 15(3), 173–199. Retrieved from http://link.springer.com/10.1023/A:1008749011772

Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data. Wiley. Retrieved from https://books.google.nl/books/about/Statistical_Analysis_With_Missing_Data.html?id=w40QAQAAIAAJ&redir_esc=y

Longworth, L., & Rowen, D. (2013). Mapping to obtain EQ-5D utility values for use in NICE health technology assessments. Value in Health, 16(1), 202–210. Retrieved from http://dx.doi.org/10.1016/j.jval.2012.10.010 doi: 10.1016/j.jval.2012.10.010

Ministerie van Volksgezondheid Welzijn en Sport. (2017). Artrose | Cijfers & Context | Huidige situatie | Volksgezondheidenzorg.info [Osteoarthritis | Figures & context | Current situation]. Retrieved 2018-04-13, from https://www.volksgezondheidenzorg.info/onderwerp/artrose/cijfers-context/huidige-situatie#node-prevalentie-en-aantal-nieuwe-gevallen-van-artrose

Murray, D. W., Fitzpatrick, R., Rogers, K., Pandit, H., Beard, D. J., Carr, A. J., & Dawson, J. (2007). The use of the Oxford hip and knee scores. Journal of Bone and Joint Surgery - British Volume, 89-B(8), 1010–1014. Retrieved from http://www.bjj.boneandjoint.org.uk/cgi/doi/10.1302/0301-620X.89B8.19424 doi: 10.1302/0301-620X.89B8.19424

Newey, W. K. (2009, jan). Two-step series estimation of sample selection models. Econometrics Journal, 12, S217–S229. Retrieved from http://doi.wiley.com/423X.2008.00263.x doi: 10.1111/j.1368-423X.2008.00263.x

Nilsdotter, A. K., Lohmander, L. S., Klässbo, M., & Roos, E. M. (2003). Hip disability and osteoarthritis outcome score (HOOS)–validity and responsiveness in total hip replacement. BMC musculoskeletal disorders, 4, 10. doi: 10.1186/1471-2474-4-10

NIVEL. (2017). NIVEL Zorggegevens eerste lijn [NIVEL primary care data]. Retrieved 2018-04-13, from https://bronnen.zorggegevens.nl/Bron?naam=NIVEL-Zorgregistraties-eerste-lijn

Oppe, M., Devlin, N., & Black, N. (2011). Comparison of the underlying constructs of the EQ-5D and Oxford hip score: Implications for mapping. Value in Health, 14(6), 884–891. Retrieved from http://dx.doi.org/10.1016/j.jval.2011.03.003 doi: 10.1016/j.jval.2011.03.003

Puhani, P. (2000). The Heckman Correction for Sample Selection and Its Critique. Journal of Economic Surveys, 14(1), 53–68. Retrieved from http://doi.wiley.com/10.1111/1467-6419.00104 doi: 10.1111/1467-6419.00104

Quintana, J. M., Bilbao, A., Escobar, A., Azkarate, J., & Goenaga, J. I. (2009). Decision trees for indication of total hip replacement on patients with osteoarthritis. Rheumatology, 48(11), 1402–1409. Retrieved from https://academic.oup.com/rheumatology/article-lookup/doi/10.1093/rheumatology/kep264 doi: 10.1093/rheumatology/kep264

Quintana, J. M., Escobar, A., Aguirre, U., Lafuente, I., & Arenaza, J. C. (2009). Predictors of health-related quality-of-life change after total hip arthroplasty. Clinical Orthopaedics and Related Research, 467(11), 2886–2894. doi: 10.1007/s11999-009-0868-9

Rabin, R., Oemar, M., Oppe, M., Janssen, B., & Herdman, M. (2015). EQ-5D-5L user guide. Basic information on how to use the EQ-5D-5L instrument (April), 28. Retrieved from http://www.euroqol.org/fileadmin/user_upload/Documenten/PDF/Folders_Flyers/EQ-5D-5L_UserGuide_2015.pdf

Rendtel, U. (1992). On the Choice of a Selection-Model When Estimating Regression Models with Selectivity. Discussion Papers of DIW Berlin. Retrieved from https://ideas.repec.org/p/diw/diwwpp/dp53.html

Roorda, L. D., Jones, C. A., Waltz, M., Lankhorst, G. J., Bouter, L. M., van der Eijken, J. W., . . . Suarez-Almazor, M. E. (2004, jan). Satisfactory cross cultural equivalence of the Dutch WOMAC in patients with hip osteoarthritis waiting for arthroplasty. Annals of the Rheumatic Diseases, 63(1), 36–42. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/14672889 http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=PMC1754708 doi: 10.1136/ARD.2002.001784

Smith, R. J. (1989). On the Use of Distributional Mis-specification Checks in Limited Dependent Variable Models. Economic Journal, 99(395), 178–192. Retrieved from https://econpapers.repec.org/article/ecjeconjl/v_3a99_3ay_3a1989_3ai_3a395_3ap_3a178-92.htm

van der Klaauw, B., & Koning, R. H. (2003). Testing the Normality Assumption in the Sample Selection Model with an Application to Travel Demand. Journal of Business & Economic Statistics, 21, 31–42. Retrieved from https://www.jstor.org/stable/1392348 doi: 10.2307/1392348

Varian, H. (2005). Bootstrap Tutorial. Mathematica Journal, 9(4), 768–775.

Vissers, M. M., Bussmann, J. B., Verhaar, J. A. N., Busschbach, J. J. V., Bierma-Zeinstra, S. M. A., & Reijman, M. (2012). Psychological factors affecting the outcome of total hip and knee arthroplasty: A systematic review. Seminars in Arthritis and Rheumatism, 41(4), 576–588. Retrieved from http://dx.doi.org/10.1016/j.semarthrit.2011.07.003 doi: 10.1016/j.semarthrit.2011.07.003

Yen, T. K., Bispo, A. S., Lopes Paiva, D., Tiago de Souza, L. G., & Lopes Neto, E. B. (2018). Predictive Model and Web-based Calculator for the Oxford Hip Score After Total Hip Replacement. Orthopedic & Muscular System, 07(01), 1–6. Retrieved from https://www.omicsonline.org/open-access/predictive-model-and-webbased-calculator-for-the-oxford-hip-score-after-total-hip-replacement-2161-0533-1000252-98677.html doi: 10.4172/2161-0533.1000252

Appendix


B Stepwise exclusion

B.1 Step 1

Table 7: Exclusion step 1

Dependent variable: selection
                       (1) Heckman    (2) MLE
KL score               0.930∗∗∗       0.945∗∗∗
                       (0.148)        (0.151)
HOOS pain mean         0.418∗∗        0.419∗∗
                       (0.169)        (0.168)
Endorotation           0.305          0.254
                       (0.386)        (0.393)
Constant               −2.506∗∗∗      −2.493∗∗∗
                       (0.620)        (0.619)

Dependent variable: OHS difference
                       (1) Heckman    (2) Tobit
OHS score t = 0        0.919∗∗∗       0.893∗∗∗
                       (0.076)        (0.070)
KL score               2.781∗∗        1.860∗∗
                       (1.342)        (0.825)
Age                    0.132∗∗        0.130∗∗
                       (0.065)        (0.065)
Female                 −1.415         −1.423
                       (1.061)        (1.064)
Depression             −3.049         −3.245
                       (3.164)        (3.197)
Smoking                −0.821         −0.735
                       (1.562)        (1.575)
BMI > 30               1.889          1.877
                       (1.559)        (1.574)
Constant               −23.939∗∗∗     −19.803∗∗∗
                       (7.208)        (5.397)
Inverse Mills Ratio    4.430
                       (4.102)

Observations           221            221
R²                     0.495
Adjusted R²            0.472
ρ                      0.611

Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
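The Inverse Mills Ratio row is the selectivity regressor of the two-step estimator. In Heckman (1979), the conditional mean of the outcome for patients who are in the sample gains the term β_λ·λ(zγ), where λ(z) = φ(z)/Φ(z) is evaluated at the probit index zγ from the selection equation. The Python sketch below computes that term for a single patient. It assumes that column (1) of Table 7 is the two-step fit and plugs in its selection estimates and its Inverse Mills Ratio coefficient (4.430). The covariate values (KL score 3, HOOS pain mean 2.0, endorotation coded 0) and all helper names are hypothetical and chosen only for illustration.

```python
import math

# Selection-equation (probit) estimates from Table 7, column (1).
# Assumption: column (1) is the Heckman two-step fit; endorotation is a 0/1 dummy.
GAMMA = {
    "constant": -2.506,
    "kl_score": 0.930,
    "hoos_pain_mean": 0.418,
    "endorotation": 0.305,
}
# Coefficient on the inverse Mills ratio in the outcome equation (Table 7, column (1)).
BETA_LAMBDA = 4.430


def std_normal_pdf(z: float) -> float:
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z); erfc keeps precision in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def inverse_mills(z: float) -> float:
    """Heckman selectivity regressor lambda(z) = phi(z) / Phi(z)."""
    return std_normal_pdf(z) / std_normal_cdf(z)


def selection_index(kl_score: float, hoos_pain_mean: float, endorotation: float) -> float:
    """Probit index z*gamma for one (hypothetical) patient."""
    return (
        GAMMA["constant"]
        + GAMMA["kl_score"] * kl_score
        + GAMMA["hoos_pain_mean"] * hoos_pain_mean
        + GAMMA["endorotation"] * endorotation
    )


if __name__ == "__main__":
    # Hypothetical patient, for illustration only.
    z = selection_index(kl_score=3, hoos_pain_mean=2.0, endorotation=0)
    lam = inverse_mills(z)
    print(f"index = {z:.3f}, lambda = {lam:.3f}, correction = {BETA_LAMBDA * lam:.3f}")
```

Because λ(z) decreases as z grows, the correction term is largest for patients with a low selection index, that is, those least likely to be observed in the sample.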


B.2 Step 2

Table 8: Exclusion step 2

Dependent variable: selection
                       (1) Heckman    (2) MLE
KL score               0.940∗∗∗       0.954∗∗∗
                       (0.147)        (0.149)
HOOS pain mean         0.433∗∗        0.430∗∗
                       (0.168)        (0.167)
Constant               −2.288∗∗∗      −2.315∗∗∗
                       (0.550)        (0.548)

Dependent variable: OHS difference
                       (1) Heckman    (2) Tobit
OHS score t = 0        0.930∗∗∗       0.896∗∗∗
                       (0.077)        (0.070)
KL score               3.149∗∗        1.931∗∗
                       (1.353)        (0.809)
Age                    0.130∗∗        0.130∗∗
                       (0.064)        (0.065)
Female                 −1.374         −1.412
                       (1.059)        (1.063)
Depression             −2.800         −3.198
                       (3.164)        (3.197)
Smoking                −0.891         −0.757
                       (1.551)        (1.576)
BMI > 30               1.858          1.872
                       (1.547)        (1.574)
Constant               −25.477∗∗∗     −20.112∗∗∗
                       (7.192)        (5.343)
Inverse Mills Ratio    5.660
                       (4.021)

Observations           221            221
R²                     0.497
Adjusted R²            0.475
ρ                      0.762

Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
