
Mapping quality of life outcome measures using response mapping


Academic year: 2021


BSc Econometrics and Operational Research

Track: Econometrics

Bachelor Thesis

Mapping Quality of Life Outcome Measures

using Response Mapping

by

Eric Knot 11059028

June 26, 2018

12 ECTS April-June 2018 Supervisor: K.J. van Garderen


Statement of Originality

This document is written by Eric Knot, who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document are original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.

Abstract

This study provides an empirical comparison of two different models for response mapping of quality of life PROMs. It also refines and further validates a procedure for mapping disease-specific PROMs to generic measures such as the EQ-5D. Mean EQ-5D utility index values can be predicted from Hip Osteoarthritis Outcome Scores with a high degree of accuracy. The results and procedure described in this study are of use for future mapping endeavours when it comes to mapping disease-specific to generic Quality of Life PROMs.


Contents

1 Introduction

2 Theoretical Framework
2.1 Quality of Life
2.2 A general measure: the EuroQol-5D
2.3 Disease-specific measures
2.4 Mapping to EQ-5D
2.4.1 Estimation Sample
2.4.2 Model Specification
2.4.3 Model Type
2.5 Model and Hypotheses

3 Data and Methods
3.1 Dataset
3.2 Methodology
3.2.1 Correlation Analysis
3.2.2 Data split
3.2.3 Estimation
3.2.4 Models
3.2.5 Prediction
3.2.6 Validation

4 Results and Analysis
4.1 Correlation Analysis
4.2 Model and Prediction
4.3 Validation
4.4 Discussion
4.4.1 Comparison with other research
4.4.2 Limitations of Study

References

A Appendix
A.1 Regression Output for the Continuous Model
A.2 Summary of Correlation characteristics
A.3 CCA Method
A.3.1 Objectives of Canonical Correlation Analysis
A.3.2 Design
A.3.3 Assumptions
A.3.4 Deriving Canonical Functions and Assessing Fit
A.3.5 Interpreting Canonical Variates
A.3.6 Validation and Diagnosis
A.4 CCA Results
A.4.1 Limitations and Extensions of Study
A.4.2 Tests of Significance
A.4.3 R-output of CCA


1 Introduction

Osteoarthritis (OA) is one of the most common chronic health conditions in the world and a major cause of pain and disability in adults. As of 2016, there were over 1.2 million patients with OA in the Netherlands alone (Dutch Institute for Healthcare Research, 2017). Worldwide, 7.4 in every 100 adults over 60 years of age are affected (Quintana, Bilbao, Escobar, Azkarate, & Goenaga, 2009).

Total Hip Replacement (THR) is a surgical procedure used to alleviate pain and to increase mobility and quality of life, primarily in patients suffering from OA. In 2015, 28,798 THR operations were performed in the Netherlands, almost solely as treatment for OA (LROI, 2016). In the past decades much research has been conducted regarding the appropriateness of THR as a medical intervention. An intervention is described as appropriate if "the expected health benefit exceeds the expected negative consequences by a sufficiently wide margin" (Quintana, Bilbao, Escobar, Azkarate, & Goenaga, 2005). Quintana et al. (2005) found that 24.1 percent of all THR operations were uncertain in terms of appropriateness and 5.1 percent were deemed inappropriate. Appropriateness was evaluated using explicit criteria developed by a panel of experts following the RAND methodology, including variables such as duration of hospital stay, pain scores, complications, mobility and events of re-intervention. At the time, however, no explicit measure of quality of life (QoL) was used to evaluate appropriateness.

Accurately measuring QoL has recently become a major goal in health care, because of its promise as a measure of the quality and value of care for an individual patient, or of a specific medical intervention. There is, however, currently no single objective, international standard for measuring quality of life in medicine. One well-established, multidisciplinary measure is the EQ-5D. This is a generic, five-dimensional patient-reported outcome measure (PROM) in which patients self-report their QoL by answering five questions on a Likert scale. Though the EQ-5D is effective because of its universal applicability and ease of use, multiple other PROMs have been designed specifically for patients suffering from OA or undergoing THR, such as the Hip Disability and Osteoarthritis Outcome Score (HOOS) and the Oxford Hip Score (OHS). These scores encompass a much broader range of questions, designed to specify a patient’s QoL more precisely. Among others, Connor-Spady et al. (2018) have compared the validity and responsiveness of the EQ-5D with the OHS in patients with OA undergoing THR, finding a very strong correlation between these scores and almost no difference in response rates.


As more and more PROMs and other disease-specific measures of quality of life are devised and used, it is becoming increasingly important to be able to "map" them to generic measures such as the EQ-5D. Such generic measures are valuable and necessary for health care valuation methods that use concepts such as Quality-Adjusted Life Years (QALYs) and other (health) economic variables.

The purpose of this study is to establish a procedure for accurately predicting generic QoL PROMs from disease-specific QoL PROMs. The following subquestions are defined:

1. Can the HOOS be used to accurately predict individual and mean EQ-5D outcome scores?

2. What mapping algorithm or estimation technique is most suitable for prediction?

3. How might procedures described in earlier research be improved using modern econometric techniques?

Data was used from a large private orthopedic surgery clinic in the Netherlands, encompassing a range of medical variables including EQ-5D-3L, OHS and HOOS scores of over 2200 patients before, three months after and one year after having undergone THR in the past five years. Regression techniques and neural networks are used to establish predictions, and the value of descriptive research such as correlation analysis is discussed. Results are compared to previous research and conclusions are drawn about the most adequate mapping procedures.

In this study, literature review as well as data studies are used. In chapter two, findings from relevant literature are summarized in a theoretical framework. In chapter three, the statistical methods used in this study are described and a complete description of the dataset is provided. In chapter four, results and analysis with respect to all three subquestions are presented and discussed. Chapter five summarizes and concludes with recommendations for mapping disease-specific to generic PROMs such as the EQ-5D using response mapping.


2 Theoretical Framework

In this study the main goal is to research whether and how generic, established quality of life measures can be accurately predicted from disease-specific ones. Being able to "map" onto generic measures of quality of life is extremely important in health economics, because it allows for the comparison and valuation of medical interventions such as the one considered in this study. Specifically, this study focuses on patients who suffer from osteoarthritis and have undergone Total Hip Replacement (THR) surgery in an orthopedic clinic in the Netherlands.

In this theoretical framework, first a definition of quality of life is discussed, as well as possible ways of measuring quality of life using PROMs. Thereafter an important, well-established generic PROM, the EuroQol-5D, is introduced.

In the main part of this chapter, the various steps of mapping procedures are described. Various models that have proven adequate in accurately predicting generic scores are presented, as well as less successful models and shortcomings of existing econometric techniques or study designs.

The chapter concludes with a hypothesized best procedure and model for mapping from a disease-specific to a generic measure of quality of life.

2.1 Quality of Life

The World Health Organization declares health to be "a state of complete physical, mental and social well-being, and not merely the absence of disease" (Tan-Torres & Baltussen, 2003). The first attempt to institutionalize the relation between quality of life and health was the introduction of the term health related quality of life (HRQoL). This remains a loose definition, because it is unclear which dimensions such a construct should comprise. In general, however, more and more medical decision-making processes incorporate HRQoL as an important outcome measure, in addition to clinical benefits or risks and side effects. This has led to the development of a broad collection of generic and disease-specific measures of quality of life over the past decades (Tan-Torres & Baltussen, 2003).

2.2 A general measure: the EuroQol-5D

One of the most commonly used generic health status instruments is the EQ-5D, a PROM introduced by the EuroQol group in 1990 (Whitehead & Shehzad, 2010). The EQ-5D is a questionnaire comprising an evaluating and a descriptive part. In the descriptive part, a patient’s current health status is described in five dimensions: mobility, self-care, usual activities, pain/discomfort and anxiety/depression. Mobility deals with being able to walk independently, self-care with the ability to wash and dress oneself. Usual activities describes the ability to conduct usual household or leisure activities. Pain/discomfort and anxiety/depression are direct measures of the degree to which the patient experiences them. Patients score each dimension on a 3-point (EQ-5D-3L) or 5-point (EQ-5D-5L) scale, choosing the statement that most accurately describes their current health state. This means, for example, that there are $3^5 = 243$ possible health states in the EQ-5D-3L, ranging from 11111 (no problems in any dimension) to 33333 (extreme problems in all dimensions). In the evaluating part, patients evaluate their overall health state using a visual analogue scale (EQ-VAS) on which they score their health state from 0 to 100, with 100 indicating the best possible health (Whitehead & Shehzad, 2010).
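The count of possible health states in the descriptive system can be checked with a short sketch. It only assumes that a health state is written as a string of five level digits, as in the examples 11111 and 33333; the function and constant names are illustrative.

```python
from itertools import product

# The five EQ-5D dimensions, in the order used for the digit string.
DIMENSIONS = ("mobility", "self-care", "usual activities",
              "pain/discomfort", "anxiety/depression")
LEVELS_3L = (1, 2, 3)


def enumerate_health_states(levels=LEVELS_3L, n_dimensions=len(DIMENSIONS)):
    """Return every health state as a digit string such as '11111'."""
    return ["".join(str(level) for level in state)
            for state in product(levels, repeat=n_dimensions)]


states = enumerate_health_states()
print(len(states), states[0], states[-1])  # 243 11111 33333
```

Passing five levels instead of three gives the number of states of the EQ-5D-5L in the same way.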

The EQ-5D can be interpreted as a scale in the form of these 243 individual health states, but can also be converted into a so-called utility index score. Methods for ascribing these HRQoL weights include the Visual Analogue Scale (VAS) and time-trade-off (TTO). TTO presents patients with a choice between living the remainder of their life in their current condition, and thus with their current quality of life, and living for a shorter period of time in full health. The ratio of the time in full health to the survival time in the current condition at the so-called "point of indifference" between the two choices serves as the HRQoL weight. Time-trade-off has become the preferred method for indexing utility, because it is a ’choice task’ rather than a ’rating task’, which runs the risk of scaling bias (Whitehead & Shehzad, 2010). Using TTO, value sets have been created in many different countries, each of which can be interpreted as the "societal valuation of the respondents’ health state" in that country. The exact valuation differs per country; for the original UK system the index ranges from -0.59 (worst possible health) to 1.00 (perfect health). An advantage of using these value sets, and thus an index score, is that they deliver a one-dimensional, internationally comparable utility score based on a patient’s EQ-5D. A possible disadvantage is the lack of compelling theoretical support for a linear and continuous utility index derived from such a general population sample (Krabbe, Weijnen, Brooks, Rabin, & Charro, 2003).
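To illustrate how a value set turns a health state into an index score, the sketch below applies an additive tariff to an EQ-5D-3L state. The decrements are assumed to be those of the widely used UK TTO tariff; they are not taken from this thesis and should be checked against the published value set before any use. All names are illustrative.

```python
# Assumed UK TTO decrements per dimension and level (not from this thesis).
DECREMENTS = {
    "mobility":           {1: 0.0, 2: 0.069, 3: 0.314},
    "self_care":          {1: 0.0, 2: 0.104, 3: 0.214},
    "usual_activities":   {1: 0.0, 2: 0.036, 3: 0.094},
    "pain_discomfort":    {1: 0.0, 2: 0.123, 3: 0.386},
    "anxiety_depression": {1: 0.0, 2: 0.071, 3: 0.236},
}
CONSTANT_ANY_PROBLEM = 0.081  # subtracted once if any dimension is above level 1
N3_ANY_LEVEL_3 = 0.269        # subtracted once if any dimension is at level 3


def utility_index(state):
    """Map a state string such as '21121' to a utility index value."""
    levels = [int(digit) for digit in state]
    if len(levels) != 5 or any(level not in (1, 2, 3) for level in levels):
        raise ValueError("expected five digits, each 1, 2 or 3")
    value = 1.0
    if any(level > 1 for level in levels):
        value -= CONSTANT_ANY_PROBLEM
    if any(level == 3 for level in levels):
        value -= N3_ANY_LEVEL_3
    for dimension, level in zip(DECREMENTS, levels):
        value -= DECREMENTS[dimension][level]
    return round(value, 3)


print(utility_index("11111"), utility_index("33333"))
```

Under these assumed decrements the best and worst states reproduce the end points of the UK range quoted above.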


2.3 Disease-specific measures

Among the disease-specific measures of quality of life for patients with osteoarthritis undergoing Total Hip Arthroplasty is the Hip Disability and Osteoarthritis Outcome Score (HOOS). The HOOS is designed to evaluate symptoms and functional limitations of the hip. It consists of 40 separate items, assessing five patient-relevant dimensions: pain (ten items), symptoms such as stiffness and range of motion (five items), activity limitations in daily living (17 items), sport and recreation function (four items) and hip-related quality of life (four items). Every question is answered on a Likert scale ranging from 0 to 4. The total score is transformed to a 0-100 scale running from worst to best. The HOOS has been validated extensively and has been found to be a valid and responsive measure of quality of life (Nilsdotter, Lohmander, Klässbo, & Roos, 2003).
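The transformation to a 0-100 scale can be illustrated with a short sketch. It assumes the normalization commonly given in the HOOS scoring instructions (100 minus the mean item score rescaled by the maximum item score of 4), which is not spelled out in the text above; the function name is illustrative.

```python
def hoos_subscale_score(item_responses):
    """Convert 0-4 item responses to a 0-100 score (100 = no problems).

    Assumed formula: 100 - mean(item responses) * 100 / 4.
    """
    if not item_responses:
        raise ValueError("at least one item response is required")
    if any(response not in (0, 1, 2, 3, 4) for response in item_responses):
        raise ValueError("item responses must be integers from 0 to 4")
    mean_response = sum(item_responses) / len(item_responses)
    return 100.0 - mean_response * 100.0 / 4.0


# Ten hypothetical pain items, all answered with the middle category.
print(hoos_subscale_score([2] * 10))
```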

2.4 Mapping to EQ-5D

"Mapping" techniques, which transform outcomes from different measures of health or HRQoL to, for example, generic scores, have become more popular and more numerous as the number of such scores continues to grow (Longworth & Rowen, 2013). Longworth and Rowen (2013) use the EQ-5D as an example to provide an overview of how mapping is currently used in health technology assessment and to provide guidance on best practices for mapping with quality of life PROMs. They begin by defining "mapping" in mathematical terms as "the correspondence by which each element of a given set has associated with it one or more elements of a second set", meaning that mapping implies using a model or an algorithm to predict health-related utility or generic score outcomes from data on other measures of health outcomes. Longworth and Rowen define the following elements of mapping: defining the "estimation" data set, model specification, model type, assessing performance, and application. Their procedure is summarized in the following paragraphs, together with considerations and results from other recent studies.

2.4.1 Estimation Sample

The first step in the mapping approach is obtaining an adequate estimation sample. This must be a data set of people who complete the preferred instrument (for example, a generic QoL score such as the EQ-5D) to report their own health, and for whom data is also obtained on the "source" measure. To be generalizable, the estimation sample should be as close as possible, in as many relevant ways as possible, to the sample to which the mapping algorithm will be applied; i.e. the distributions of all covariates used in the mapping function should overlap between the estimation and target samples. It is also important that all variables included within the target measure that are thought likely to have an effect, for example, on EQ-5D values, should be included in the estimation sample (Longworth & Rowen, 2013).

Kontodimopoulos, Aletras and Paliouras (2009), however, found that, when mapping cancer-specific QoL scores such as the EORTC Quality of Life Questionnaire Core-30 (QLQ-C30) onto the EQ-5D, the estimated regressions differed significantly between patient groups. This indicates that such a mapping algorithm may not be generalizable to different cancer conditions or, more generally, to all patients within a specific class of diseases who complete the same disease-specific measure of quality of life. Mapping may therefore not always be possible, even if an adequate estimation sample has been obtained.

2.4.2 Model Specification

The model used to predict a certain outcome measure depends heavily on the type of outcome measure. For a multiattribute utility measure such as the EQ-5D, which describes health state by either a utility index or five Likert-scale dimensions, very different types of models have been used. In a recent review of mapping studies within quality of life, covering 30 studies and 119 mapping models, Brazier et al. (2010) found that the most commonly used target measure was the EQ-5D (15 studies) and that the most common model specification used a preference-based utility index as the dependent variable and dimension or item scores from other instruments as independent variables. The review also concluded that model specifications allowing for nonlinear relationships between target and source scores, such as squared or interaction terms, had a small impact, but that this differs by target and source measure, patient group and patient severity. Additionally, they found that the inclusion of non-health variables such as socio-demographics made some improvement in the accuracy of the mapping function (Brazier, Yang, & Tsuchiya, 2010).

Furthermore, the researchers found that the explanatory power in terms of $R^2$ was often low for models that mapped a condition-specific measure onto a generic preference-based measure. This could be explained by limited conceptual overlap: dimensions that are important for a condition-specific measure may be neglected entirely by a generic measure, or vice versa.


Estimation of the mapping regression relies on statistical dependence between the target measure and the source measure as well as on the avoidance of omitted variables (Whitehead & Shehzad, 2010). Selection of explanatory variables should be based on prior knowledge of relationships between variables, (medical) theory and econometric techniques. Correlation should be used to examine relationships between source and target measures, whereby a poor correlation would indicate that the mapping function will also perform poorly. In addition, general econometric diagnostics such as Akaike’s information criterion (AIC) can be used to assess model specification, as well as misspecification tests such as the Ramsey RESET test, tests for omitted variables and heteroscedasticity (e.g. a Park test) and tests for non-normality of errors (e.g. a Jarque-Bera test) (Brazier et al., 2010).

2.4.3 Model Type

Determining the appropriate model type for mapping to EQ-5D depends on the objective of either predicting the (utility) index value or individual responses to each of the dimensions of health described by the patient, which is known as response mapping.

2.4.3.1 Mapping to EQ-5D Index Values

When mapping to utility index values, Brazier et al. (2010) found that the most common estimation technique was ordinary least squares (OLS), but that linear regressions might not always accurately predict extremely high or low EQ-5D scores. This can be attributed to the fact that the EQ-5D index is a bounded score (using UK tariffs, between -0.594 and 1, 1 indicating perfect health) and that OLS does not restrict predicted values, leading to ceiling and floor effects or even implausible predicted values. The same was concluded by Pinedo-Villanueva et al. (2013) in their study in which they compared different mapping techniques for mapping the Oxford Hip Score on to the EQ-5D Utility Index.
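The ceiling problem described above can be made concrete with a small sketch on synthetic data. The scores and the kinked relationship below are invented purely for illustration and do not come from the thesis dataset; the bounds default to the UK range mentioned in the text.

```python
def fit_simple_ols(x, y):
    """Closed-form OLS for one regressor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def clip_to_index_range(values, lower=-0.594, upper=1.0):
    """Force predictions back into the admissible index range."""
    return [min(upper, max(lower, value)) for value in values]


# Synthetic source scores (0-100) and a utility that hits its ceiling of 1.
source_scores = list(range(0, 101))
true_utilities = [min(1.0, -0.2 + 0.015 * score) for score in source_scores]

intercept, slope = fit_simple_ols(source_scores, true_utilities)
predictions = [intercept + slope * score for score in source_scores]
clipped = clip_to_index_range(predictions)

print(f"largest OLS prediction: {max(predictions):.3f}")
print(f"largest clipped value:  {max(clipped):.3f}")
```

Because the fitted line cannot bend at the ceiling, the highest source scores receive predicted utilities above 1; clipping removes the implausible values but not the underlying misfit, which is one motivation for the bounded models discussed next.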

Various attempts have been made to overcome these theoretical limitations of OLS by using Tobit and censored least absolute deviation (CLAD) regression models. Kontodimopoulos et al. (2009), however, concluded that the overall improvement in prediction from these models compared with OLS is small. The choice and application of alternative models such as the generalized linear model, latent class model, two-step model and random effects censored mixture model is an area of ongoing research, with mixed results (Kontodimopoulos, Aletras, & Paliouras, 2009). Other research concluded that OLS regression was the most accurate model, yet that its accuracy deteriorated in older and less healthy groups, for which the TP/TS models performed better (Chuang & Kind, 2009).

Oppe, Devlin and Black (2011) compared the underlying constructs of another hip-specific PROM, the Oxford Hip Score (OHS), with those of the EQ-5D, as well as the implications for mapping to the EQ-5D. Using principal component analysis, they concluded that the OHS data could be associated with three constructs (pain, mobility and usual activity), and that the EQ-5D is multidimensional, with the same construct being detected by different OHS items. They conclude that because of these and other underlying conceptual differences, mapping is unlikely to provide an appropriate basis for estimating utility index scores with the OHS (Oppe, Devlin, & Black, 2011).

However, Pinedo-Villanueva et al. (2013) concluded the opposite: in their research using different models to map the Oxford Hip Score to EQ-5D utility index values, they found that all models estimated the mean EQ-5D score within 0.005 of an observed health state utility estimate. The models used were three transfer-to-utility (TTU) regressions (continuous OLS, categorical OLS, two-part logit OLS) and response mapping. Continuous OLS was found to be the most accurate in predicting the mean EQ-5D scores of a large group of patients, using their individual total OHS scores. Response mapping was, however, the only technique that was able to predict low or negative utility index values. They also concluded that age, gender and deprivation did not improve the models (Pinedo-Villanueva et al., 2013).

2.4.3.2 Mapping to EQ-5D Dimension Responses

An alternative to mapping to a utility index, which in practice takes a limited number of discrete values, is to map to the descriptive (multidimensional) system of the measure using response mapping. This allows index value sets to be applied separately, so that results better reflect the distribution of values and health states than they would if the index were estimated directly (Longworth & Rowen, 2013).

The most commonly used method for mapping to EQ-5D dimensions is logistic regression. Multinomial logistic models that estimate separate mapping functions to predict the level of each dimension have also been used, but Brazier (2009) and Chuang (2009) both conclude that this approach does not seem to improve prediction. Pinedo-Villanueva et al. (2013), however, demonstrated that multinomial logistic regression was in fact the only adequate response mapping method. Considering that their work focused on mapping the Oxford Hip Score onto the EQ-5D index values, and that their results indicated that response mapping was the only method able to predict poorer health states and thus low utility index values, this approach may be the most useful for the similar mapping of the HOOS to the EQ-5D.

Considering that Whitehead and Shehzad (2010) and Brazier et al. (2010) both emphasize the importance of statistical dependence and correlation between target and source measures, an adequate multivariate correlation analysis may be necessary to assess the merit of estimating a mapping algorithm. Bagozzi, Fornell and Larcker (1981) describe Canonical Correlation Analysis (CCA) and its merit as a general model for most bivariate and multivariate statistical methods, owing to its capability of handling multiple criteria and multiple predictors simultaneously. They also describe its shortcomings, such as the difficulty of determining the statistical significance of individual parameter estimates or of relaxing assumptions of the canonical model that are inconsistent with theory or observed data. Because CCA can handle multiple dependent variables at once, as opposed to general regression techniques such as OLS which estimate a single dependent variable, it may be a very useful tool for assessing the expected prediction performance of mapping algorithms (Bagozzi, Fornell, & Larcker, 1981). In the next section, a short overview of the technique is provided.

2.4.3.3 Canonical Correlation Analysis

Canonical Correlation Analysis is a method used to reduce the dimensionality of multiview data, and was invented by Harold Hotelling in 1935. Specifically, it takes a vector of independent variables and a vector of dependent variables and computes their empirical cross-covariance matrix. It then applies Singular Value Decomposition (SVD) to this matrix to obtain linear projections of these vectors that have maximal correlation (Papasarantopoulos, Jiang, & Cohen, 2016).

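CCA solves the following sequence of optimization problems for $j \in \{1, \ldots, m\}$, where $a_j \in \mathbb{R}^{1 \times d}$ and $b_j \in \mathbb{R}^{1 \times d'}$ (Papasarantopoulos et al., 2016):

$$\underset{a_j,\, b_j}{\arg\max}\ \operatorname{corr}(a_j X^T, b_j Y^T)$$

such that

$$\operatorname{corr}(a_j X^T, a_k X^T) = 0, \quad \operatorname{corr}(b_j Y^T, b_k Y^T) = 0, \quad k < j,$$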

where corr is the function that accepts two vectors and returns the Pearson correlation between the pairwise elements of the two vectors. The problem of CCA can be solved by applying singular value decomposition (SVD) on a cross-covariance matrix between the two random vectors X and Y, normalized by the covariance matrices of X and Y. More specifically, CCA is solved by applying thin singular value decomposition (SVD) on the empirical version of the following matrix:

$$(E[X X^T])^{-\frac{1}{2}}\, E[X Y^T]\, (E[Y Y^T])^{-\frac{1}{2}} \approx U \Sigma V^T$$

where $E[\cdot]$ is the expectation operator and $\Sigma$ is a diagonal matrix of size $m \times m$ for some small $m$ (Papasarantopoulos et al., 2016).
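The SVD route to the canonical correlations can be sketched numerically as follows. This is a minimal illustration on synthetic data, assuming NumPy is available and that rows of the data matrices are observations; the empirical covariances of the centred data stand in for the expectations, and all names are illustrative rather than the procedure used in this thesis.

```python
import numpy as np


def inverse_sqrt(matrix, floor=1e-10):
    """Inverse square root of a symmetric positive definite matrix."""
    eigenvalues, eigenvectors = np.linalg.eigh(matrix)
    eigenvalues = np.maximum(eigenvalues, floor)  # guard against tiny values
    return eigenvectors @ np.diag(eigenvalues ** -0.5) @ eigenvectors.T


def canonical_correlations(X, Y):
    """Singular values of the whitened cross-covariance of X (n x d), Y (n x d')."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    cov_xx = Xc.T @ Xc / (n - 1)
    cov_yy = Yc.T @ Yc / (n - 1)
    cov_xy = Xc.T @ Yc / (n - 1)
    whitened = inverse_sqrt(cov_xx) @ cov_xy @ inverse_sqrt(cov_yy)
    singular_values = np.linalg.svd(whitened, compute_uv=False)
    return np.clip(singular_values, 0.0, 1.0)


# Synthetic example: one column of Y nearly copies a column of X,
# the other column of Y is unrelated noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
Y = np.column_stack([X[:, 0] + 0.1 * rng.normal(size=500),
                     rng.normal(size=500)])
print(canonical_correlations(X, Y))
```

The first canonical correlation should be close to one and the second close to zero, reflecting that only one linear combination is shared between the two synthetic sets.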

2.4.3.4 Regression Techniques for Response Mapping

Considering the ordered nature of Likert scales such as those used in the HOOS and the EQ-5D, an ordinal logistic regression would intuitively seem to be the correct estimation method for mapping. However, in previous response mapping studies, both Grey, Rivero-Arias and Clarke (2006) and Pinedo-Villanueva et al. (2013) used multinomial logistic regressions to predict EQ-5D scores. The reason is that an ordinal logistic regression requires the parallel lines assumption, which states that the slope coefficients in the model are the same across response categories. This can be tested formally using a likelihood ratio test, but it is very unlikely to hold in scales involving such subjective measures as pain or quality of life. Another reason to use multinomial logistic regression is that it relaxes many of the assumptions required by other types of logistic regression.

2.4.3.5 Multinomial Logistic Regression

Multinomial logistic regression is an extension of binary logistic regression in which a categorical dependent variable with more than two outcome categories is explained by a set of independent variables. The primary outcome of such a model is the logarithm of the odds, or relative probability, of a certain outcome category relative to a reference category. In equation form, this gives:

$$z_j = \beta_0 + \sum_{i=1}^{k} \beta_i x_i$$
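How such log-odds translate into category probabilities can be sketched as follows. The sketch assumes the usual convention that each non-reference category has its own intercept and slopes and that the reference category has a linear predictor of zero; the coefficient values are hypothetical and not estimates from this study.

```python
import math


def category_probabilities(x, coefficients):
    """Probabilities for the reference category followed by the others.

    coefficients: one (intercept, slopes) pair per non-reference category.
    """
    linear_predictors = [0.0]  # reference category: z = 0 by convention
    for intercept, slopes in coefficients:
        z = intercept + sum(beta * xi for beta, xi in zip(slopes, x))
        linear_predictors.append(z)
    largest = max(linear_predictors)  # subtract the maximum for stability
    exponentials = [math.exp(z - largest) for z in linear_predictors]
    total = sum(exponentials)
    return [value / total for value in exponentials]


# Hypothetical coefficients for levels 2 and 3 of one EQ-5D dimension,
# with level 1 as reference and two source scores as regressors.
hypothetical_coefficients = [(2.0, [-0.03, -0.01]), (1.0, [-0.05, -0.02])]
print(category_probabilities([60.0, 45.0], hypothetical_coefficients))
```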


Multinomial regression does not assume normality, linearity or homoscedasticity. For sample size, the rule of thumb is to use at least 10 observations per independent variable in the model (Garson, 2016). Classically, multinomial logistic models are estimated using Maximum Likelihood Estimation (MLE). However, some statistical software such as R includes other estimation techniques such as neural networks. Previous research, such as that by Pinedo-Villanueva et al. (2013) and Grey, Rivero-Arias and Clarke (2006), used MLE to find probabilities for all outcome categories, and then used Monte Carlo simulation to obtain health state predictions from the multinomial logistic regression model. The latter used dummy variables for every possible response level as independent variables because, they argue, the scales in dimensions like pain and quality of life cannot be assumed to be proportional. For example, one cannot assume that an improvement on a HOOS item score from response level 4 to response level 3 is equal to an improvement from 2 to 1. For this same reason, ordinal logistic regression was not applicable. The use of dummies greatly increases the complexity of the model, even more so for a model incorporating many different independent variables, such as response levels for all 40 HOOS questions. Furthermore, MLE is highly sensitive to multicollinearity, whereas neural networks used as a prediction technique are sensitive to it to a much lesser extent. This means that more response levels for very similar questions, and thus highly correlated independent variables such as those in disease-specific PROMs, can be retained in the model. However, when reviewing regression outputs, highly correlated independent variables are still likely to lead to high standard errors due to variance inflation.
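The Monte Carlo step mentioned above can be sketched as follows: given predicted level probabilities for each EQ-5D dimension, a health state is simulated by drawing one level per dimension. Drawing the dimensions independently is an assumption of this sketch, and the probabilities are hypothetical; the cited studies may differ in their details.

```python
import random


def simulate_state(probabilities_by_dimension, rng):
    """Draw one level (1, 2 or 3) per dimension and return e.g. '12121'."""
    levels = []
    for probabilities in probabilities_by_dimension:
        level = rng.choices((1, 2, 3), weights=probabilities, k=1)[0]
        levels.append(str(level))
    return "".join(levels)


def simulate_states(probabilities_by_dimension, n_draws=1000, seed=0):
    """Repeat the draw n_draws times with a seeded generator."""
    rng = random.Random(seed)
    return [simulate_state(probabilities_by_dimension, rng)
            for _ in range(n_draws)]


# Hypothetical predicted probabilities for one patient, one row per dimension.
predicted = [
    (0.70, 0.20, 0.10),  # mobility
    (0.90, 0.08, 0.02),  # self-care
    (0.60, 0.30, 0.10),  # usual activities
    (0.30, 0.50, 0.20),  # pain/discomfort
    (0.80, 0.15, 0.05),  # anxiety/depression
]
draws = simulate_states(predicted, n_draws=5000)
share_no_mobility_problems = sum(s[0] == "1" for s in draws) / len(draws)
print(draws[:3], round(share_no_mobility_problems, 2))
```

Applying a value set to each simulated state and averaging the resulting index values then gives a predicted utility for the patient.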

2.4.3.6 Neural Networks

The neural network employed for estimation of multinomial logistic regressions in R is a so-called feed-forward neural network with a single hidden layer, which provides a flexible way to generalize linear regression functions. In this section, a short introduction to the basic structure of this neural network as described by Venables and Ripley (2002) is provided. A schematic overview of such a generic network is provided in figure 1.

Input layer units distribute the inputs to the ’hidden’ units in the second layer. These units sum their inputs, add a constant and take a fixed function $\phi_h$ of the result. Output units apply an output function $\phi_o$.

$$y_k = \phi_o\left(\alpha_k + \sum_{i \to k} w_{ik} x_i + \sum_{j \to k} w_{jk}\, \phi_h\left(\alpha_j + \sum_{i \to j} w_{ij} x_i\right)\right)$$


Figure 1: Structure of a single layer Neural Network. Source: Venables & Ripley (2002), page 244

In the case of a logistic regression, the ’activation function’ $\phi_h$ is simply the logistic function:

$$\ell(z) = \frac{e^z}{1 + e^z}$$

and the output units are linear, logistic or threshold units. The regression function $f$ is parameterized by the set of weights $w_{ij}$, one for every link in the network. The weights are chosen to minimize a fitting criterion. In the multinomial log-linear model with $K$ classes, a neural network with $K$ outputs and the negative conditional log-likelihood yields the following fitting criterion:

$$E = \sum_p \sum_k -t_k^p \log p_k^p \quad \text{with} \quad p_k^p = \frac{e^{y_k^p}}{\sum_{c=1}^{K} e^{y_c^p}}$$
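A forward pass through such a network and the evaluation of the fitting criterion can be sketched in a few lines. The sketch reads $t_k^p$ as the 0/1 indicator that observation $p$ belongs to class $k$, omits skip-layer connections from inputs to outputs for brevity, and uses made-up weights; it illustrates the formulas and is not the estimation routine used in R.

```python
import math


def logistic(z):
    """Numerically stable logistic function e^z / (1 + e^z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))


def softmax(outputs):
    """Turn K linear outputs into K class probabilities."""
    largest = max(outputs)
    exponentials = [math.exp(y - largest) for y in outputs]
    total = sum(exponentials)
    return [value / total for value in exponentials]


def forward(x, hidden_bias, hidden_weights, output_bias, output_weights):
    """hidden_weights[j][i] links input i to hidden unit j;
    output_weights[k][j] links hidden unit j to output k."""
    hidden = [logistic(bias + sum(w * xi for w, xi in zip(weights, x)))
              for bias, weights in zip(hidden_bias, hidden_weights)]
    outputs = [bias + sum(w * h for w, h in zip(weights, hidden))
               for bias, weights in zip(output_bias, output_weights)]
    return softmax(outputs)


def fitting_criterion(targets, probabilities):
    """E = sum over observations p and classes k of -t_k^p * log(p_k^p)."""
    return sum(-t * math.log(p)
               for target_row, probability_row in zip(targets, probabilities)
               for t, p in zip(target_row, probability_row) if t > 0)


# Two inputs, two hidden units, three output classes; weights are made up.
probabilities = forward([0.5, -1.0],
                        hidden_bias=[0.1, -0.2],
                        hidden_weights=[[0.4, 0.3], [-0.6, 0.8]],
                        output_bias=[0.0, 0.2, -0.1],
                        output_weights=[[1.0, -1.0], [0.5, 0.5], [-0.3, 0.9]])
print(probabilities, fitting_criterion([[0, 1, 0]], [probabilities]))
```

Training would adjust the weights to make this criterion as small as possible over all observations.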


2.5 Model and Hypotheses

To conclude the theoretical framework, the most appropriate method to map the HOOS to the EQ-5D is to use response mapping with separate multinomial logistic regressions for each dimension of the EQ-5D and to predict health states using neural networks. Canonical Correlation Analysis can be used to assess the degree of correlation between the two measures, with which the final prediction results can be compared. Authors such as Brazier et al. (2010) have concluded that low correlation would indicate poor mapping performance. By using estimation techniques involving neural networks, the mapping performance of models that treat each Likert scale response as a single linear variable can be compared with that of the extensive models used in previous research, which invoke dummies for every response level. With this approach, the assumption that low correlation between PROMs indicates low predictive performance can be tested by comparing the quality of prediction with the outcomes of canonical correlation analysis. Also, the predictive performance of the simplified model can be compared to that of the dummy models used in previous research. The predictive performance of the models can be compared by means of the mean and range of the fitted and true utility index values, as well as the Mean Absolute Error, McFadden’s $R^2$ and root mean squared error, as has been done in previous similar response mapping studies such as those by Grey, Rivero-Arias and Clarke (2006) and Pinedo-Villanueva et al. (2013).
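The comparison criteria named above can be written down compactly. The sketch below uses the standard definitions of the mean absolute error, the root mean squared error and McFadden’s $R^2$ (one minus the ratio of the fitted to the intercept-only log-likelihood); the numbers in the example are invented and are not results of this study.

```python
import math


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def root_mean_squared_error(observed, predicted):
    """Square root of the average squared prediction error."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def mcfadden_r2(loglik_model, loglik_null):
    """1 - lnL(model) / lnL(intercept-only model)."""
    return 1.0 - loglik_model / loglik_null


# Invented utility index values for three patients.
observed_utilities = [1.0, 0.5, 0.0]
predicted_utilities = [0.8, 0.5, 0.4]
print(mean_absolute_error(observed_utilities, predicted_utilities),
      root_mean_squared_error(observed_utilities, predicted_utilities),
      mcfadden_r2(-50.0, -100.0))
```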


3 Data and Methods

3.1 Dataset

In this chapter, the data and methods used in this study are explicated. First general information about the dataset is given, as well as some relevant statistics. In the second part, Canonical Correlation Analysis is described stage by stage, including which assumptions and diagnostics are necessary, specifically in relation to the dataset used. Finally, the mapping procedure and model, as well as model evaluation and validation criteria are described.

Data were used from an orthopedic surgery clinic in the Netherlands. Over the course of five years, patients with a possible medical indication for total hip replacement surgery were followed. During each interview patients underwent physical and radiologic examinations in addition to a general medical anamnesis. The patient’s Oxford Hip Score (OHS), Hip Osteoarthritis Outcome Score (HOOS) and EuroQol-5D-3L (EQ-5D) score were also registered at each interview. The first interview was conducted on November 5th, 2013 and the last on April 16th, 2018. There were a total of 2265 measurements, of which the first 62 did not yet include HOOS scores. All interviews with missing HOOS or EQ-5D scores were excluded from the dataset used in this study.

All QoL surveys were completed for 2163 observations. This included 1393 t0 measurements, 560 t1 measurements and 210 t2 measurements. The t0 measurement was taken upon intake in the clinic and was used as a baseline to evaluate the indication for undergoing an operation. The t1 measurement was taken 13 weeks after surgery and the t2 measurement 52 weeks after surgery. In the case that baseline characteristics such as gender or age at time of interview were missing, observations were left out of the analysis.

3.2 Methodology

3.2.1 Correlation Analysis

For analysis of the correlation between outcome measures, Canonical Correlation Analysis is used to estimate the redundancy index between the HOOS and EQ-5D scores. A full methodology, including all assumptions, tests and a guide to interpretation, can be found in the appendix. Figure 2 illustrates the Canonical Correlation Analysis for the mapping of HOOS to EQ-5D dimensions. Comparing the outcomes of CCA with the predictive performance of the models allows for testing of the hypothesis presented in chapter two that low correlation between source and target measures indicates low predictive performance of a mapping algorithm.

Figure 2: Schematic overview of CCA of HOOS and EQ-5D scores

3.2.2 Data split

The original dataset is split into two groups: an estimation sample, ‘train’, used to estimate the predictive model, and a validation sample, ‘test’, used to test the predictive performance of the model on other observations. Because no other comparable data was available, external validation was done with the test sample. Both samples are chosen such that differences in their mean EQ-5D utility values, percentage of females, mean age, timing of observations, and range and number of EQ-5D utility scores are statistically insignificant. Also, earlier work by Pinedo-Villanueva (2013) and Grey, Rivero-Arias and Clarke (2006) describes a benefit from pooling all t0, t1 and t2 observations in both estimation and prediction samples. Theoretically there is no reason that the mapping relation should differ before or after receiving treatment, which was confirmed in their research. For this reason, the choice to pool observations is also made in this study.
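The exact procedure used to draw the two samples is not spelled out here, so the following Python sketch is only one plausible way to do it: a seeded random split into halves, followed by a check of the differences in sample means. The record fields (`utility`, `age`, `female`) and all values are hypothetical, and a formal significance test would still be needed on top of the raw differences.

```python
import random
from statistics import mean

def split_train_test(records, seed=0):
    """Shuffle the pooled observations and cut them into two halves."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def balance_report(train, test, keys):
    """Difference in means between the two samples for each listed field."""
    return {k: mean(r[k] for r in train) - mean(r[k] for r in test) for k in keys}

# Hypothetical records; field names and values are illustrative only.
rng = random.Random(1)
records = [
    {
        "utility": rng.uniform(0.0, 1.0),
        "age": rng.gauss(64, 10),
        "female": int(rng.random() < 0.6),
    }
    for _ in range(200)
]
train, test = split_train_test(records, seed=42)
print(len(train), len(test))
print(balance_report(train, test, ["utility", "age", "female"]))
```

Fixing the seed makes the split reproducible, which matters when the same train and test subsets have to be reused for both the continuous and the dummy model.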

3.2.3 Estimation

The Response Mapping technique aims to predict the individual score on each dimension of the EQ-5D for every observation. This is done by estimating multinomial logistic regression models for each of the two model specifications. The estimation is done using a single-layer feed-forward neural network and the observations from the train subset.

3.2.4 Models

Furthermore, two different models are considered. A "continuous" model, which treats Likert scale answers such as those in the HOOS questionnaire as a linear scale, and a "dummy" model, which uses every unique response level to individual questions of the HOOS as a separate predictor.

The following model was used for the final five multinomial logistic regressions in the continuous model (one for each EQ-5D dimension):

$$EQ5D_i = \sum_{k=1}^{5} \sum_{j=1}^{N_k} HOOS_{jk}$$

with $i$ the 5 EQ-5D dimensions, $k$ the 5 HOOS subscales, and $j$ the various questions within each subscale ($N_k \in (5, 10, 17, 4, 4)$ for symptoms, pain, activities, sports/recreation and quality of life, respectively).

For the dummy model, this resulted in fifteen multinomial logistic regressions (one for each response level of each EQ-5D dimension), each including 200 variables, in the following model:

$$EQ5D_{im} = \sum_{k=1}^{5} \sum_{j=1}^{N_k} \sum_{l=1}^{5} D_{jkl}$$

with $D_{jkl}$ a dummy variable, $i$ the 5 EQ-5D dimensions, $m$ the level of each EQ-5D dimension, $k$ the 5 HOOS subscales, $j$ the various questions within each subscale ($N_k \in (5, 10, 17, 4, 4)$ for symptoms, pain, activities, sports/recreation and quality of life, respectively) and $l$ the response level for each question ($l \in (1, 5)$).
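The difference between the two specifications lies in how a patient's HOOS answers are turned into predictors. The sketch below is illustrative only: the function names are hypothetical, and the 0-4 coding of response levels is an assumption taken from the scale description in the appendix.

```python
def continuous_features(responses):
    """Continuous model: each HOOS answer enters as one numeric predictor."""
    return [float(r) for r in responses]

def dummy_features(responses, levels=(0, 1, 2, 3, 4)):
    """Dummy model: one 0/1 indicator per question and response level."""
    row = []
    for r in responses:
        if r not in levels:
            raise ValueError(f"unexpected response level: {r}")
        row.extend(1 if r == level else 0 for level in levels)
    return row

# Hypothetical patient: 40 HOOS answers, each coded 0-4 (coding is an assumption).
answers = [i % 5 for i in range(40)]
print(len(continuous_features(answers)))  # 40 predictors
print(len(dummy_features(answers)))       # 40 questions x 5 levels = 200 predictors
```

The count of 200 indicator columns matches the number of variables stated for the dummy model. In a regression with an intercept, one level per question would normally be dropped as a reference category.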

Because the objective of this study is prediction and estimation is done using a trained neural network as opposed to MLE, multicollinearity or heavily correlated (in)dependent variables do not affect predictive performance of models (Venables & Ripley, 2002). This is checked by dropping combinations of highly correlated variables and considering the difference in predictive performance.


3.2.5 Prediction

In turn, the trained neural network for both models is used to predict every EQ-5D dimension score for each individual observation with R’s nnet package. These predicted individual EQ-5D health states (possibly ranging from 11111 to 33333), as well as the "true" individual health states for each observation from the dataset, are then converted to a utility index using the Dutch EQ-5D-3L Tariff (Lamers, McDonnell, Stalmeier, Krabbe, & Busschbach, 2006).

3.2.6 Validation

Validation of results is done internally and externally. For the internal validation, the true EQ-5D utility values of the train group are compared to the predicted values for each model. For external validation, the neural network trained with the ‘train’ subset is used to predict EQ-5D health states for the ‘test’ sample, which are then in turn also converted into utility index values.

To assess and compare the predictive performance of each model, the predicted mean EQ-5D utility value, the range and number of unique predicted values, Mean Absolute Error, McFadden’s Pseudo-R2, Root Mean Squared Error, percentage of perfectly predicted values and percentage of predicted values within 0.1 of the true utility value are obtained and compared for both models.
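Most of these indicators follow directly from paired lists of true and predicted utility values. The Python sketch below shows one way to compute them; the function name, the rounding used to count unique values and the example numbers are assumptions for illustration, not the code used to produce the results in chapter 4.

```python
import math

def validation_metrics(true_values, predicted_values, tolerance=0.1):
    """Summary indicators comparing predicted with true utility values."""
    n = len(true_values)
    errors = [p - t for t, p in zip(true_values, predicted_values)]
    abs_errors = [abs(e) for e in errors]
    return {
        "mean_predicted": sum(predicted_values) / n,
        "mae": sum(abs_errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        # Utilities come from a finite tariff, so exact matches are meaningful;
        # a small epsilon guards against floating-point noise.
        "pct_perfect": 100.0 * sum(a < 1e-9 for a in abs_errors) / n,
        "pct_within_tol": 100.0 * sum(a <= tolerance + 1e-9 for a in abs_errors) / n,
        "n_unique_predicted": len({round(p, 3) for p in predicted_values}),
    }

# Hypothetical utility values, for illustration only.
true_u = [1.0, 0.8, 0.5, 0.3]
pred_u = [1.0, 0.9, 0.5, 0.7]
print(validation_metrics(true_u, pred_u))
```

Reporting the MAE next to the RMSE is informative because the RMSE penalizes a few large misses more heavily, which is relevant when poor health states are predicted less accurately than good ones.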


4 Results and Analysis

This chapter presents and interprets results. First, the conclusions from the Canonical Correlation Analysis are presented. Then the outcome measure and validation for both predictive models are presented and discussed. Finally, results are compared with previous research and limitations of the study are described.

4.1 Correlation Analysis

The first CCA run on the 1393 t0 observations indicated that the first four canonical variates were statistically significant with p-values reported as 0, found by using Rao’s F Approximation (see Appendix). They yielded canonical roots of sufficient magnitude to be deemed practically significant.

The Aggregate Redundancy Coefficients (see Appendix) indicate that collectively, 23.9 percent of the total variance of HOOS scores is explained by variance in EQ-5D scores, and that 15.4 percent of the variance of EQ-5D scores is explained by variance of HOOS scores.

The second CCA run on the 560 t1 observations, using HOOS row means as independent variables, indicated that the first three canonical variates were statistically significant with p-values of 0.00, and that the fourth was significant at the 0.05 level with a p-value of 0.025, found by using Rao’s F Approximation (see Appendix). They also yielded sufficiently large canonical roots.

The Aggregate Redundancy Coefficients (Appendix table) indicate that collectively, 40.7 percent of the total variance of HOOS scores is explained by variance in EQ-5D scores, and that 26.9 percent of the variance of EQ-5D scores is explained by variance of HOOS scores.

Based on these findings and the findings from earlier research described in chapter 2, the expected predictive performance of any model mapping HOOS scores to EQ-5D health states or utility would be low.

4.2 Model and Prediction

The original dataset was split into two subsets: a ’test’ and a ’train’ sample. Table 1 shows the summary statistics for both groups. Both groups were very similar, with almost exactly corresponding mean ages and standard deviations, as well as percentages of female patients, mean EQ-5D scores and percentages of t0, t1 and t2 observations.


Table 1: Summary Statistics

| Statistic | Train (n = 1099) | Test (n = 1101) |
|---|---|---|
| Mean age at operation (SD) | 64 (9.8) | 64 (9.3) |
| Female (%) | 61 | 61 |
| Mean EQ-5D utility | 0.72 (0.73) | 0.73 (0.73) |
| Range of EQ-5D utility | 0.067 to 1.000 | -0.033 to 1.000 |
| Percentage t0 | 64 | 63 |
| Percentage t1 | 25 | 26 |
| Percentage t2 | 11 | 11 |
| Number of unique utility scores | 53 | 54 |

The models described in chapter 3 were estimated and used for prediction. Regression outputs for the continuous model including coefficients and standard errors can be found in the appendix.

The figures above show histograms of the true and predicted EQ-5D index values for both models. Many individual regressors were not statistically significant. Removing individual regressors, or combinations of regressors, that were statistically insignificant or highly correlated did not improve the performance of either model, so all regressors were kept in both models. Because the focus was on prediction and the estimation method used for the multinomial logistic regression was a trained neural network, multicollinearity did not affect performance, as described in chapter 3. However, the standard errors of many variables in the full model are extremely high, which suggests variance inflation that may be caused by multicollinearity.

4.3 Validation

Both variations of the model were validated internally using the train subset. Table 2 shows summary performance indicators for the internal validation. The mean fitted EQ-5D utility value was estimated within 0.05 for both models. The continuous model encompassed a slightly larger range than the true values of the EQ-5D utility scores, predicting 30 unique utility values. The dummy model encompassed a slightly smaller but nevertheless complete range, predicting 70 unique utility values. The continuous and dummy models predicted individual utility values perfectly in 38% and 42% of the observations respectively, and predicted within 0.1 of the true utility value in 67% and 72% of the observations. The mean absolute error and root mean squared error of the entire set of predictions were slightly better for the dummy model in the internal validation.

Table 2: Models’ Performance and Internal Validation

| Model | Mean fitted EQ-5D | Difference with true mean | Range of fitted values (unique values) | % Perfect prediction | % within 0.1 | MAE | R2 | RMSE |
|---|---|---|---|---|---|---|---|---|
| Continuous Model | 0.77 | -0.042 | -0.033 to 1.000 (30) | 38 | 67 | 0.11 | 0.38 | 0.19 |
| Dummy Model | 0.75 | -0.027 | 0.035 to 1.000 (70) | 42 | 72 | 0.086 | 0.55 | 0.16 |

Table 3 shows summary performance indicators for the external validation, predicting the utility index for observations in the test subset of the data. Both models estimated the mean EQ-5D utility index values within 0.04. The dummy model’s predictions encompassed the complete range of true utility values, whereas the continuous model did not predict the lowest true utility values. The continuous and dummy models predicted 32% and 25% of utility values perfectly, predicting 34 and 88 unique utility index values respectively. Similarly, 62% and 58% of predictions were within 0.1 of the true utility value. The MAE and RMSE of the entire set of predictions were slightly but significantly better for the continuous model than for the dummy model.

Figure 3: MAE per EQ-5D Utility Value Deciles for Continuous Model

Table 3: Models’ Performance and External Validation

| Model | Mean fitted EQ-5D | Difference with true mean | Range of fitted values (unique values) | % Perfect prediction | % within 0.1 | MAE | R2 | RMSE |
|---|---|---|---|---|---|---|---|---|
| Continuous Model | 0.77 | -0.037 | 0.066 to 1.000 (34) | 32 | 62 | 0.12 | 0.29 | 0.21 |
| Dummy Model | 0.76 | -0.027 | -0.060 to 1.000 (88) | 25 | 58 | 0.14 | 0.25 | 0.22 |

For both models, prediction for lower EQ-5D scores indicating lesser health was significantly less accurate than prediction for (near) perfect health. Figures 3 and 4 show the Mean Absolute Error per EQ-5D decile for both the continuous and the dummy model. This lower prediction performance for lesser health states is similar to findings by Pinedo-Villanueva et al. when mapping the OHS to the EQ-5D. In their article, they concluded that the same was found not only in their own study but also in other studies mapping disease-specific and generic PROMs to the EQ-5D. The researchers also found a very similar MAE and very similar differences between deciles of EQ-5D to those found here. This lower predictive performance could in part be due to the fact that, overall, there were far fewer observations with low utility index values and health states than observations indicating good or near perfect health. It may also be the case that this floor effect is due to inadequate differentiability between degrees of lower health states. For example, considering that the HOOS allows for a 5-level response and the EQ-5D-3L for only 3, it may well be that patients differentiate less between their own symptoms when assigning low health scores to either scale.

Figure 4: MAE per EQ-5D Utility Value Deciles for Dummy Model

4.4 Discussion

4.4.1 Comparison with other research

Predictive performance for both models is very similar to the results for the OHS to EQ-5D Mean Utility study by Pinedo-Villanueva et al. (2013). Though the objective of their study was to calculate mean EQ-5D scores for a large group of individual OHS observations, their conclusion that the Response Mapping approach can adequately predict the full range of EQ-5D Utility index values is further verified by the results of this study. Though the mean EQ-5D utility values in this study were predicted less accurately than in the results for the OHS, the prediction of individual EQ-5D health states and utility values appears to be more accurate using a trained neural network as prediction method as opposed to Monte Carlo simulation comparison.

The dummy model did, however, produce slightly better results in the internal validation. It is nevertheless interesting to note that where previous mapping studies, including Pinedo-Villanueva et al. (2013), used a ’dummy’ model when using Likert scales as variables, for the obvious reason that subjective PROM scales are very unlikely to be linear, it appears that treating each question as a linear variable, using an estimation technique invoking a neural network, provides equal or better results in terms of predicting health states in the test sample than the far more complex ’dummy’ model. The question remains whether or not this result also holds for PROMs with fewer (similar) questions and a lower degree of overlap than the HOOS.

In previous research such as the studies by Pinedo-Villanueva et al. and Grey, Rivero-Arias and Clarke (2006), conclusions about the amount of variance of EQ-5D scores that can be explained by the variance in HOOS scores are drawn on the basis of the R2 of the models with their predicted and true utility index values. However, the interpretation of the R2 in multinomial logistic regression is not analogous to that in a linear regression. Better measures are McFadden’s Pseudo-R2 and the percentage of correctly predicted response levels. Table 4 gives, for each of the 5 multinomial logistic regressions in the continuous model, the percentage of correctly predicted response levels for the train and test subset, as well as McFadden’s R2 for the regression.

Table 4: Measures of Assessment of fit of individual regressions in continuous model

| Model for EQ-5D Dimension | % Correct in ’train’ | % Correct in ’test’ | McFadden’s R2 |
|---|---|---|---|
| 1 | 89 | 85 | 0.51 |
| 2 | 82 | 80 | 0.31 |
| 3 | 80 | 81 | 0.40 |
| 4 | 71 | 66 | 0.27 |
| 5 | 82 | 82 | 0.17 |

We see that the models predict well in all dimensions, but that clearly the fourth dimension, Pain and Discomfort, is predicted least accurately. This is curious, because "Pain" is also an important subscale of the HOOS.
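The two fit measures reported for the individual regressions can be sketched as follows. This is a minimal illustration under stated assumptions: McFadden's Pseudo-R2 is taken in its usual form, one minus the ratio of the model log-likelihood to that of a null model predicting only the observed class shares, and the function names and example probabilities are hypothetical.

```python
import math
from collections import Counter

def log_likelihood(probabilities, labels):
    """Sum of log predicted probabilities of the observed response levels."""
    return sum(math.log(p[y]) for p, y in zip(probabilities, labels))

def mcfadden_r2(probabilities, labels):
    """1 - LL(model) / LL(null), with the null model predicting class shares."""
    n = len(labels)
    shares = {level: count / n for level, count in Counter(labels).items()}
    ll_null = sum(math.log(shares[y]) for y in labels)
    return 1.0 - log_likelihood(probabilities, labels) / ll_null

def percent_correct(probabilities, labels):
    """Share of observations whose most probable level equals the observed one."""
    hits = sum(max(p, key=p.get) == y for p, y in zip(probabilities, labels))
    return 100.0 * hits / len(labels)

# Hypothetical predicted probabilities for EQ-5D levels 1-3 on four observations.
probs = [
    {1: 0.7, 2: 0.2, 3: 0.1},
    {1: 0.6, 2: 0.3, 3: 0.1},
    {1: 0.2, 2: 0.7, 3: 0.1},
    {1: 0.5, 2: 0.4, 3: 0.1},
]
observed = [1, 1, 2, 2]
print(mcfadden_r2(probs, observed), percent_correct(probs, observed))
```

The two measures can diverge: a dimension on which most patients report the same level can reach a high percentage of correct predictions while the Pseudo-R2 stays low, because the null model already predicts that dimension well. The sketch does not handle the degenerate case in which all observed levels are identical, where the null log-likelihood is zero.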

4.4.2 Limitations of Study

This study has several limitations. First of all, there was a large difference in the number of t0, t1 and t2 measurements, making it difficult to compare results for the different groups. This study benefited from pooling together scores for both validation samples, because this allowed for a larger range of different responses, considering that patients who had undergone treatment (the t1 and t2 groups) rarely exhibited lesser health states, rendering proper mapping of these states impossible. Furthermore, no external data was used to validate findings, and splitting the data meant a fifty percent reduction in the number of estimation sample observations. A larger number of observations is likely to further improve the prediction performance of the models.

In terms of analysis, the focus of this study was prediction. It was concluded that prediction did not improve or actually change when highly correlated or highly insignificant (combinations of) variables were removed from either model. For this reason, all questions were used for both final models. However, correlation analysis shows that many individual HOOS questions have very high correlations. This probably explains the remarkably high standard errors in the regression output of the continuous model, as multicollinearity causes inflation of variance. An extension of this research would be to find the optimal model in terms of descriptive and predictive power, thus negating multicollinearity whilst maintaining the same or better prediction performance.

Also, in this study the three-level version of the EQ-5D was used. A five-level version is now also available and widely used. This version encompasses a much larger number of possible health states (3125 as opposed to 243), and thus many more possible utility value outcomes. This implies that if a model with predictive performance similar to that found in these results could be found for the five-level version, a much more accurate differentiation between health states and utility values would be possible, also allowing for more exact predictions. It does however also pose the question of how much (added) value a scale with so many small differences has in describing something as complex and subjective as quality of life.


5 Conclusion

This study provides an empirical comparison of two different models for response mapping of quality of life PROMs. It also refines and further validates a procedure for mapping disease-specific PROMs to generic measures such as the EQ-5D.

Mean EQ-5D utility index values can be predicted from Hip Osteoarthritis Outcome Scores with a high degree of accuracy. Individual utility index values in the models used in this study can be predicted from a patient’s HOOS within 0.1 in 58% to 72% of cases, depending on the model and the validation sample. The mean absolute error of predictions made in the external validation in this study was 0.12 for the continuous model and 0.14 for the dummy model.

Correlation Analysis between two outcome measures, used to assess the expected predictive performance of a response mapping algorithm, is of less value than previously stated in literature. Even though the results of Canonical Correlation Analysis in this study appeared detrimental for the predictive performance of a mapping algorithm for the HOOS and the EQ-5D in indicating that only between 15 and 27 percent of the variance of EQ-5D could be explained by variation in HOOS response levels, this was certainly not representative for the predictive performance of the models used.

Previous research done by Pinedo-Villanueva et al. (2013) and Grey, Rivero-Arias and Clarke (2006) in mapping to EQ-5D used response mapping algorithms using multinomial logistic regression, Monte Carlo simulation comparison, and a selection of questions (as opposed to using every single question from the PROM) to prevent multicollinearity. Pinedo-Villanueva et al. also employed a model in which dummies were used for every response level of every subscale question used in the source measure, in their case the Oxford Hip Score. This study shows that, using a trained single-layer feed-forward neural network as an estimation technique, the selection of questions as well as the use of dummies for response levels and Monte Carlo simulation comparison to assign health state predictions may not be necessary. The full continuous model used in this study, in which subjective Likert scale responses like those in the HOOS and EQ-5D were treated as individual (linear) variables, showed better predictive performance in the external validation than the more complex model including many dummy variables. In short, it appears that using multinomial regression with a feed-forward neural network as estimation technique is an adequate procedure for response mapping.


Further research could be done to extend the model to minimize multicollinearity whilst retaining predictive performance, to make regression outputs more interpretable. Also, a larger sample size, a truly external validation sample and the use of the five-level version of the EQ-5D would be valuable in further improving results and the response mapping procedure.

In conclusion, the results and procedure described in this study are of use for future mapping endeavours when it comes to mapping disease-specific to generic quality of life PROMs. These results may also well be of use for the mapping of a range of other questionnaires or test scores, both inside and outside of medical research.


References

Bagozzi, R. P., Fornell, C., & Larcker, D. F. (1981). Canonical correlation analysis as a special case of a structural relations model. Seminars in Nuclear Medicine, 16(4), 437-454.

Brazier, J., Yang, Y., & Tsuchiya, A. (2010). A review of studies mapping (or cross walking) non-preference based measures of health to generic preference-based measures. European Journal of Health Economics, 11(2), 215-225.

Chuang, L. H., & Kind, H. (2009). Converting the SF-12 into the EQ-5D: an empirical comparison of methodologies. Pharmacoeconomics, 27(6), 491-505.

Connor-Spady, B., Marshall, D.A., E., B., Dunbar, M., & Noseworthy, T. (2018). Comparing the validity and responsiveness of the eq-5d-5l to the oxford hip and knee scores and sf-12 in osteoarthritis patients 1 year following total joint replacement. Quality of Life Research, 27(4), 1311-1322. doi: https://doi.org/10.1007/s11136-018-1808-5

Dutch Institute for Healthcare Research. (2017). Prevalance of arthrosis in the netherlands. Dutch Ministry for Public Health, Wellbeing and Sports (VWS).

Garson, G. (2016). Logistic regression: Binomial and multinomial, 2016 edition. Asheboro, North Carolina, US: Statistical Associates Publishers.

Hair Jr, J., Black, W., Babin, B., & Anderson, R. (2010). Multivariate Data Analysis, 7th edition. New York: Pearson.

Kontodimopoulos, N. A., Aletras, V., & Paliouras, D. (2009). Mapping the cancer-specific EORTC QLQ-C30 to the preference-based EQ-5D, SF-6D, and 15d instruments. Value in Health, 12(8), 1151-1157.

Krabbe, P., Weijnen, T., Brooks, R., Rabin, R., & Charro, F. (2003). The measurement and valuation of health status using EQ-5D: A European perspective: Guidelines for analyzing and reporting EQ-5D outcomes. Dordrecht: Springer Netherlands.

Lamers, L., McDonnell, J., Stalmeier, P., Krabbe, P., & Busschbach, J. (2006). The dutch tariff: results and arguments for an effective design for national EQ-5D valuation studies. Health Economics, 15(10), 1121-1132.

Longworth, L., & Rowen, D. (2013). Mapping to obtain EQ-5D utility values for use in NICE health technology assessments. Value in Health, 16(1), 202-210.

LROI. (2016). Annual report of national registrate of orthopedic implants. NOV.


outcome score (HOOS)–validity and responsiveness in total hip replacement. BMC Musculoskeletal Disorders, 30(6), 4-10.

Oppe, M., Devlin, N., & Black, N. (2011). Comparison of the underlying constructs of the EQ-5D and oxford hip score: Implications for mapping. Value in Health, 14(6), 884–891.

Papasarantopoulos, N., Jiang, H., & Cohen, S. (2016). Canonical correlation inference for mapping abstract scenes to text. AAAI, 26(1), 1-10.

Pinedo-Villanueva, R. A., Rafael, A., Turner, D., Judge, A., Raftery, J., & Arden, N. (2013). Mapping the Oxford Hip Score onto the EQ-5D utility index. Quality of Life Research, 22(3), 665–675.

Quintana, J., Bilbao, A., Escobar, A., Azkarate, J., & Goenaga, J. (2005). Appropriateness of total hip joint replacement. International Journal for Quality in Health Care, 17(4), 315-321. doi: http://dx.doi.org/10.1093/intqhc/mzi047

Quintana, J., Bilbao, A., Escobar, A., Azkarate, J., & Goenaga, J. (2009). Decision trees for indication of total hip replacement on patients with osteoarthritis. Rheumatology, 48(11), 1402-1409.

Tan-Torres, T., & Baltussen, R. (2003). Making choices in health: The WHO guide to cost-effectiveness analysis. Geneva: World Health Organization.

Venables, W., & Ripley, B. (2002). Modern applied statistics with S. New York: Springer.

Whitehead, S., & Shehzad, A. (2010). Health outcomes in economic evaluation: the QALY and

A Appendix

A.1 Regression Output for the Continuous Model


Figure 6: Second half of Regression Output


Figure 7: Correlation between HOOS subscale means (t0)


Figure 9: Correlation between HOOS subscale means (t1)

Figure 10: Correlation between EQ-5D subscales (t1)

A.3 CCA Method

In this section the complete methodology of the correlation analysis is explicated, based upon the description in Hair et al. (2010). The assumptions and manipulations of the data that are necessary for Canonical Correlation Analysis are also described, as well as the various stages of the analysis.

A.3.1 Objectives of Canonical Correlation Analysis

The goal of CCA in this study is to determine the magnitude of relationships that may exist between the two sets. To do this, the canonical correlation coefficients, the redundancy index of (in)dependent variables, and the canonical (cross)loadings are the most important research objectives (Hair Jr, Black, Babin, & Anderson, 2010). In figure 11 the procedure is illustrated.

Figure 11: Schematic overview of CCA of HOOS and EQ-5D scores

A.3.2 Design

Considering the fact that CCA is a generalization of multivariate analysis, it shares implementation issues common to all multivariate techniques, such as an appropriate sample size, conceptual linkage of variables and absence of missing data or outliers.

Sample Size

The rule of thumb for sample size in CCA or economic analysis is at least 10 observations for every included independent variable.(cite)

Variables and their Conceptual Linkage

The objective of this study is to find a relationship between HOOS and EQ-5D scores with which one could predict EQ-5D scores from HOOS. This is theoretically feasible, as both are a measure of QoL, and their various domains overlap conceptually. Consider:

Though not corresponding precisely, it may be clear that there is much overlap to be expected between these dimensions, and thus that the score on one survey would correlate with the score on the other. HOOS scores on each of the 40 questions are on a 0-4 scale, and EQ-5D on a 1-3 scale.

Missing Data and Outliers

For a correct implementation of CCA, it is important to correct for any missing data or outliers. For this reason, the first 60 observations of the full dataset were omitted, because of their lack of HOOS entries. The remaining dataset did not contain any outliers or missing values.

A.3.3 Assumptions

Linearity

Because the ranges of the HOOS (5 points) and the EQ-5D score (3 points) on a numeric integer scale were so similar, no transformation of values was deemed necessary. It is theoretically feasible that the relationship between HOOS and EQ-5D is linear.

Normality

Though CCA does not strictly require normality or standardized data, normality is beneficial for validation of results and especially for the statistical significance of the canonical coefficients, crossloadings and redundancy indices found. The most important aspect is that the data is unimodal. The full dataset, comprising ti for i ∈ (0, 1, 2), frequently displayed a bimodal distribution (see histograms), due to the beneficial effects of treatment. As such, t0 scores in both indexes appeared to be much higher than t1 or t2 scores, indicating that surgery had helped alleviate symptoms and pain and had improved the ability to conduct daily activities. For this reason, two separate analyses were deemed necessary: one for t0 and one for t1. The sample size of t2 measurements was too small to conduct a full separate analysis, because the ratio of independent variables to observations is only approximately 1:5, as opposed to the recommended 1:10. Also, the improvement due to surgery is already very clear in t1 (see data summary). Thus, t0 and t1 have unimodal distributions, for which the

Homoscedasticity

Heteroscedastic relationships decrease correlation between variables. By using separate t0 and t1 sets, an important source of heteroscedasticity is negated. However, in datasets concerning treatment effects, it is common to observe floor and ceiling effects. There is a selection bias in the sense that in general, more patients in the t0 group will have much higher or the highest possible HOOS or EQ-5D scores than in the t1 group, in which the contrary may be true: a significant portion of patients may have no symptoms or pain, and thus the lowest possible HOOS or EQ-5D score.

Multicollinearity

Multicollinearity is an issue in multivariate analysis in which two independent variables are strongly correlated. This confounds the ability of various techniques to adequately describe the individual effect of either variable on the dependent variables. To test for multicollinearity, the individual correlations between all HOOS questions and all EQ-5D questions were calculated. Several variables proved to have very high correlations. See figures 7-10.

A.3.4 Deriving Canonical Functions and Assessing Fit

Canonical Correlation Analysis first derives a first canonical function such that its correlation coefficient is maximal. Further canonical functions are then created using the "leftover" variance from each successive previous function. In assessing fit of the analysis, three factors are of importance.

Statistical Significance

First, the canonical correlations must be statistically significant to an appropriate degree. The most widely applied tests for CCA are Rao’s approximation of the F test and Wilks’ lambda.

Magnitude of Canonical Correlations

Also, for meaningful analysis, the correlation coefficient of canonical functions must be of sufficient magnitude. There are no generally accepted guidelines for this magnitude, so interpretation is completely reliant on a theoretical relation between independent and dependent sets of variables, and thus their canonical variates.

Redundancy Index

The last measure is the so-called redundancy index of the entire set of canonical functions. Canonical correlation coefficients describe the amount of shared variance between linear composites of (in)dependent variables, not of the sets of variables themselves. The redundancy index is a measure of the amount of variance in the dependent or independent variable set that is explained by variance in the other set. In the theoretical framework it was described that in general, low correlations between measures such as HOOS and EQ-5D, as expressed by a measure such as the redundancy index, indicate a poor predictive quality of mapping techniques.
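As a worked definition, the redundancy index is commonly written as follows (the Stewart and Love form presented in texts such as Hair et al., 2010). The symbols are introduced here for illustration and do not appear elsewhere in this study: $q$ is the number of variables in the set of interest, $L_{ji}$ the canonical loading of variable $j$ of that set on its own $i$-th canonical variate, $R_{c,i}$ the $i$-th canonical correlation and $m$ the number of canonical functions.

$$Rd_i = \left( \frac{1}{q} \sum_{j=1}^{q} L_{ji}^{2} \right) R_{c,i}^{2}, \qquad Rd_{\text{aggregate}} = \sum_{i=1}^{m} Rd_i$$

The first factor is the share of the set's own variance captured by its $i$-th variate and the second is the variance shared between the two variates of function $i$. A high canonical correlation therefore yields a low redundancy if the variate represents its own variable set poorly.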

A.3.5 Interpreting Canonical Variates

Canonical Weights

Common interpretation of CCA lies in the magnitude and sign of canonical weights, which indicate the relative contributions of variables to canonical functions. Small weights may indicate that variables are irrelevant to the relation constructed by the canonical function, or that their effects have been partialled out because of multicollinearity. Negative signs imply an inverse relation. In general, assessment of canonical weights is difficult because of their instability across different samples or because of multicollinearity.

Canonical Loadings

Canonical loadings convey the linear correlations between variables and their own canonical variate. The same problem arises as with canonical weights: canonical loadings are also relatively unstable across samples, which limits the external validity of findings.

Canonical Crossloadings

The most robust proposed method of interpreting CCA, especially in datasets in which some degree of multicollinearity is present, is using canonical crossloadings, which portray the correlation between (in)dependent variables and the opposite canonical variates.
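The difference between loadings and crossloadings can be sketched as follows, again on synthetic data rather than the thesis data. The helper names are introduced for illustration only. The example also shows a property that follows from the construction of CCA: each crossloading equals the corresponding loading multiplied by the canonical correlation of that function.

```python
import numpy as np

def canonical_variates(X, Y):
    """Canonical correlations and variates via QR + SVD (full column rank assumed)."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    U, rho, Vt = np.linalg.svd(Qx.T @ Qy)
    k = min(X.shape[1], Y.shape[1])
    return rho[:k], Qx @ U[:, :k], Qy @ Vt.T[:, :k]

def column_correlations(A, B):
    """Pearson correlation of every column of A with every column of B."""
    Az = (A - A.mean(axis=0)) / A.std(axis=0)
    Bz = (B - B.mean(axis=0)) / B.std(axis=0)
    return Az.T @ Bz / len(A)

# Synthetic stand-in data (NOT the thesis data): 4 predictors, 2 outcomes.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
Y = X[:, :2] + rng.normal(size=(300, 2))

rho, U, V = canonical_variates(X, Y)
x_loadings = column_correlations(X, U)       # X variables vs. their own variates
x_crossloadings = column_correlations(X, V)  # X variables vs. the opposite variates
print(np.round(x_crossloadings, 3))
```

Rows correspond to variables and columns to canonical functions, which matches the layout of the crossloading tables in A.4.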

A.3.6 Validation and Diagnosis

To validate findings in the CCA, a dataset can be split and the analysis performed on both subsets individually. By comparing canonical variates, redundancy indices and canonical weights/(cross)loadings, one can assess whether the model is adequate and whether the relationship found is robust.
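A split-sample check of this kind can be sketched as below. The example uses synthetic data and a random split into halves, which is an assumption made for illustration; it compares only the canonical correlations, whereas a full validation would also compare redundancy indices and (cross)loadings.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations via QR + SVD (full column rank assumed)."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)

# Synthetic stand-in data (NOT the thesis data): both sets share one latent factor.
rng = np.random.default_rng(3)
n = 1000
latent = rng.normal(size=(n, 1))
X = latent + rng.normal(size=(n, 3))
Y = latent + rng.normal(size=(n, 2))

# Random split into two halves; the analysis is repeated on each half.
order = rng.permutation(n)
half_a, half_b = order[: n // 2], order[n // 2:]
rho_a = canonical_correlations(X[half_a], Y[half_a])
rho_b = canonical_correlations(X[half_b], Y[half_b])
print(np.round(rho_a, 3), np.round(rho_b, 3))
```

Similar coefficients across the two halves suggest that the relationship is not an artefact of one particular subsample.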

In this study, primary interpretation is thus based on the significance of canonical correlation coefficients, their magnitude and redundancy index, and canonical (cross)loadings. The dataset is split naturally into a t0 and a t1 set, for two reasons. First, to remove a bimodal distribution due to the treatment effect (it is clear that after treatment, patients’ quality of life outcome scores improve significantly, for all measures).


A.4 CCA Results

The first CCA, run on the 1393 t0 observations with HOOS row means as independent variables, gave the following results:

Table 5: Canonical Correlation Coefficients (t0)

CV 1     CV 2     CV 3     CV 4     CV 5
0.606    0.187    0.143    0.096    0.021

The first four canonical variates were statistically significant with reported p-values of 0, found using Rao’s F approximation (see appendix). They yield the following canonical roots, which provide an estimate of the shared variance between the canonical variates.

Table 6: Canonical Roots (t0)

CV 1     CV 2     CV 3     CV 4     CV 5
0.367    0.035    0.020    0.009    0.0004
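The step from Table 5 to Table 6 is a squaring of each canonical correlation coefficient. A short check using the published, rounded coefficients:

```python
# Canonical correlation coefficients for the t0 sample (Table 5).
rho_t0 = [0.606, 0.187, 0.143, 0.096, 0.021]

# A canonical root is the squared canonical correlation, i.e. the share of
# variance that the two canonical variates of one function have in common.
roots_t0 = [r ** 2 for r in rho_t0]

for i, root in enumerate(roots_t0, start=1):
    print(f"CV {i}: {root:.4f}")
```

Because the inputs are rounded to three decimals, the last digit may differ slightly from values computed on unrounded coefficients.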

The canonical correlations are of sufficient magnitude to be deemed practically significant. The Aggregate Redundancy Coefficients (Appendix table ..) indicate that collectively, 23.9 percent of the total variance of HOOS scores is explained by variance in EQ-5D scores, and that 15.4 percent of the variance of EQ-5D scores is explained by variance of HOOS scores. Considering that the first canonical function has an R² of 0.367 and the remaining functions have considerably lower values, attention should be focussed on the first canonical function.

Because the mean HOOS subscale scores are highly correlated with one another, canonical weights and loadings are very unstable and thus unfit for interpretation. Only canonical crossloadings are therefore deemed acceptable for interpretation of the CCA results.

All independent variables were strongly correlated with the first dependent canonical variate, with positive correlations of 0.431, 0.546, 0.586, 0.512 and 0.292, respectively. This reflects the high degree of shared variance between the two sets of variables. By squaring these values we find that 18.6, 29.9, 34.3, 26.2 and 8.5 percent of the variance of the dependent canonical variate is explained by each of the HOOS mean subscale scores, respectively.

For the dependent variables, we see a similar pattern: all dependent variables were correlated with the first independent canonical variate, with correlations of 0.420, 0.390, 0.440, 0.400 and 0.210, respectively. From this, it is clear that 17.6, 15.2, 19.4, 16.0 and 4.4 percent of the variance of the EQ-5D subscale scores can be explained by the first canonical variate.
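The percentages of shared variance follow from squaring the first-function crossloadings. The sketch below does this for the published t0 values in Tables 7 and 8. The helper name `pct_shared` is introduced here for illustration only.

```python
# First-function crossloadings for the t0 sample (Tables 7 and 8, column CV 1).
crossloadings_t0 = {
    "hoost0.s.rowmeans": 0.431,
    "hoost0.p.rowmeans": 0.546,
    "hoost0.a.rowmeans": 0.586,
    "hoost0.sp.rowmeans": 0.512,
    "hoost0.q.rowmeans": 0.292,
    "eq5d.1": 0.420,
    "eq5d.2": 0.390,
    "eq5d.3": 0.440,
    "eq5d.4": 0.400,
    "eq5d.5": 0.210,
}

def pct_shared(crossloading):
    """Squared crossloading expressed as a percentage of shared variance."""
    return 100 * crossloading ** 2

for name, value in crossloadings_t0.items():
    print(f"{name}: {pct_shared(value):.1f} percent")
```

Since the tabulated crossloadings are rounded to three decimals, results can differ in the last digit from figures computed on unrounded values.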

Table 7: Canonical Crossloadings (HOOS subscale row means) (t0)

                      CV 1     CV 2     CV 3     CV 4     CV 5
hoost0.s.rowmeans     0.431    0.021   -0.025   -0.004   -0.014
hoost0.p.rowmeans     0.546   -0.078    0.007   -0.009   -0.001
hoost0.a.rowmeans     0.586    0.016    0.031   -0.009    0.0004
hoost0.sp.rowmeans    0.512    0.019   -0.068   -0.005    0.004
hoost0.q.rowmeans     0.292   -0.009    0.004    0.084   -0.00004

Table 8: Canonical Crossloadings (EQ-5D subscales) (t0)

          CV 1     CV 2     CV 3     CV 4     CV 5
eq5d.1    0.420    0.026   -0.087   -0.031   -0.003
eq5d.2    0.390    0.110    0.071   -0.002   -0.00004
eq5d.3    0.440   -0.016   -0.023    0.064    0.002
eq5d.4    0.400   -0.110    0.055   -0.020   -0.004
eq5d.5    0.210   -0.023    0.007   -0.017    0.019

The second CCA, run on the 560 t1 observations with HOOS row means as independent variables, gave the following results:

Table 9: Canonical Correlation Coefficients (t1)

CV 1     CV 2     CV 3     CV 4     CV 5
0.733    0.227    0.147    0.037    0.018

The first three canonical variates were statistically significant with reported p-values of 0, and the fourth was significant at the 0.05 level with a p-value of 0.025, found using Rao’s F approximation (see appendix). They yield the following canonical roots, which provide an estimate of the shared variance between the canonical variates.

Table 10: Canonical Roots (t1)

CV 1     CV 2     CV 3     CV 4     CV 5
0.537    0.052    0.022    0.001    0.0003


percent of the total variance of HOOS scores is explained by variance in EQ-5D scores, and that 26.9 percent of the variance of EQ-5D scores is explained by variance of HOOS scores. Considering that the first canonical function has an R² of 0.537 and the remaining functions have considerably lower values, attention should again be focussed on the first canonical function.

Because the mean HOOS subscale scores are highly correlated with one another, canonical weights and loadings are very unstable and thus unfit for interpretation. Only canonical crossloadings are therefore deemed acceptable for interpretation of the CCA results.

All independent variables were strongly correlated with the first dependent canonical variate, with negative correlations of -0.591, -0.659, -0.692, -0.645 and -0.580, respectively. This reflects the high degree of shared variance between the two sets of variables. By squaring these values we find that 34.8, 43.4, 47.9, 41.6 and 33.6 percent of the variance of the dependent canonical variate is explained by each of the HOOS mean subscale scores, respectively.

For the dependent variables, we see a similar pattern: all dependent variables were correlated with the first independent canonical variate, with correlations of -0.580, -0.360, -0.630, -0.550 and -0.350, respectively. From this, it is clear that 33.6, 13.0, 40.0, 30.0 and 12.3 percent of the variance of the EQ-5D subscale scores can be explained by the first canonical variate.

Table 11: Canonical Crossloadings (HOOS subscale row means) (t1)

                      CV 1     CV 2     CV 3     CV 4     CV 5
hoost1.s.rowmeans    -0.591    0.001    0.043   -0.016   -0.005
hoost1.p.rowmeans    -0.659   -0.065    0.045   -0.001    0.002
hoost1.a.rowmeans    -0.692    0.035    0.041    0.003    0.001
hoost1.sp.rowmeans   -0.645    0.029   -0.024   -0.011    0.005
hoost1.q.rowmeans    -0.580   -0.051   -0.062    0.004   -0.007

Table 12: Canonical Crossloadings (EQ-5D subscales) (t1)

          CV 1     CV 2     CV 3     CV 4     CV 5
eq5d.1   -0.580   -0.034   -0.050   -0.015    0.004
eq5d.2   -0.360    0.150    0.063   -0.012   -0.002
eq5d.3   -0.630    0.055   -0.036    0.014   -0.002
eq5d.4   -0.550   -0.100    0.043   -0.001   -0.007
eq5d.5   -0.350   -0.029    0.080    0.010    0.011
