The construction of health state utilities Osch, S.M.C. van




Citation

Osch, S. M. C. van. (2007, September 6). The construction of health state utilities.

Retrieved from https://hdl.handle.net/1887/12363

Version: Corrected Publisher’s Version

License: Licence agreement concerning inclusion of doctoral thesis in the Institutional Repository of the University of Leiden

Downloaded from: https://hdl.handle.net/1887/12363

Note: To cite this publication please use the final published version (if applicable).

2 Correcting biases in standard gamble and time trade-off utilities

Correcting biases in standard gamble and time trade-off utilities.

S.M.C. van Osch, P.P. Wakker, W.B. van den Hout, A.M. Stiggelbout Medical Decision Making 2004; 24(5): 511-517.


Abstract

The standard gamble (SG) method and the time trade-off (TTO) method are commonly used to measure utilities. However, they are distorted by biases due to loss aversion, scale compatibility, utility curvature for life duration, and probability weighting. This chapter applies corrections for these biases, proposed in the economic literature, and provides new data on these biases and their corrections. The SG and TTO utilities of six rheumatoid arthritis health states were assessed for 45 healthy respondents. Various corrections of utilities were considered. The uncorrected TTO scores and the corrected (for utility curvature) TTO scores provided similar results.

The gains-corrected SG showed the best convergence with TTO scores. It has been suggested that TTO biases neutralize each other (whereas SG biases do not), so that the TTO method provides good estimates of utility. This chapter provides arguments suggesting that the TTO scores are biased upwards, rather than having balanced biases. First, the only downward bias in TTO scores (due to utility curvature of life duration) was small and, probably, cannot offset the upward biases. Second, the TTO scores are higher than the theoretically most preferred correction of the SG, the mixed correction. These findings suggest once more that uncorrected SG scores, which are higher than TTO scores, are too high.


Introduction

Utilities can be used to measure the effects of treatment outcomes, and play an important role in cost effectiveness analyses (8;9). Two methods to measure the utility of health states are the time trade-off (TTO) method and the standard gamble (SG) method (10). Based on normative expected-utility arguments, the SG method has often been considered the gold standard for utility measurement. However, there is much empirical evidence demonstrating that expected utility is not descriptively valid, and that its violations generate upward biases in SG utilities (6;7;13).

Less is known about the effects of biases in the TTO measurements. Some recent papers have suggested that these biases might neutralize each other (6), so that no systematic overall bias results. It would then follow that, on average, TTO utilities are closer to true utilities than SG utilities are. This would entail a theoretical justification for the preference for the TTO method that is indeed observed in practice. Another justification for this preference is based on the higher face validity of TTO results than of SG results. In the latter, respondents have been commonly found to exhibit overly extreme risk aversion (14). This chapter provides new insights into correction methods for the aforementioned biases, advanced in the economic literature, and tests them in the medical domain.

Biases in TTO and SG utilities

Bleichrodt provided an overview of the biases in utility measurement, and their likely effects (6). We discuss these biases below, and summarize them in Table 2.1.


Utility curvature

The TTO assumes that the utility of life duration is linear (10;15). This assumption is, in general, not correct (16). Empirical evidence shows that the utility of life years is concave for most people, with nearby years valued more than remote years (17).

In TTO measurements, respondents are asked to trade future years, which are, thereby, overweighted in the TTO calculations. This leads to a downward bias of the resulting utilities. SG measurements are not distorted by utility curvature for life duration.

Probability weighting

Probability weighting entails that people process probabilities in a nonlinear manner.

The pattern most commonly found is that people tend to overweight small probabilities and underweight large probabilities. The TTO does not use probabilities and, hence, is not affected by the corresponding biases. Probabilities do play a role in SG measurements and, therefore, probability weighting does affect SG utilities.

Empirical studies of probability weighting include Abdellaoui, Bleichrodt & Pinto (18), Bleichrodt & Pinto (19), Gonzalez & Wu (20), and Tversky & Kahneman (11).

Probabilities p > .33 are usually underweighted, so that respondents choose excessively high probabilities to generate indifference in SG questions. This leads to an overestimation of utility in SG measurements. The reverse holds for probabilities p < .33, leading to an underestimation of utility. Because utilities of health states usually exceed .33, probability weighting will usually generate an upward bias for the SG utilities (6).
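The location of the crossover between overweighting and underweighting can be illustrated numerically. The sketch below assumes the one-parameter Tversky and Kahneman weighting function of Appendix 2B with the gains parameter 0.61. The names `tk_weight` and `crossover` and the use of bisection are illustrative choices, not part of the original analysis.

```python
def tk_weight(p, gamma):
    """Tversky-Kahneman probability weight w(p) for 0 <= p <= 1."""
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma) ** (1.0 / gamma)


def crossover(gamma, lo=0.01, hi=0.99, tol=1e-9):
    """Probability at which w(p) = p, found by bisection.

    Assumes an inverse S-shaped function (gamma < 1), so that
    w(p) > p at `lo` and w(p) < p at `hi`.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if tk_weight(mid, gamma) > mid:
            lo = mid  # still overweighted: crossover lies to the right
        else:
            hi = mid  # already underweighted: crossover lies to the left
    return (lo + hi) / 2.0


if __name__ == "__main__":
    gamma = 0.61  # gains parameter reported in Appendix 2B
    print(f"w(0.10) = {tk_weight(0.10, gamma):.3f}")  # small p is overweighted
    print(f"w(0.90) = {tk_weight(0.90, gamma):.3f}")  # large p is underweighted
    print(f"crossover = {crossover(gamma):.3f}")
```

With this parameter value the crossover lies roughly a third of the way along the probability scale, in line with the .33 threshold used in the text.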

Loss aversion

Loss aversion refers to the finding that people are more sensitive to losses than to gains (11). Consequently, losses weigh more heavily in decisions than gains do.


Whether an outcome is perceived as a gain or a loss depends on the reference point, which is often the status quo. The TTO takes an impaired health state as the starting point. This starting point is a natural candidate to serve as the reference point for the respondents. The TTO asks how many life years a person is willing to give up in order to regain optimal health. The person is asked to trade off life years (a loss) for optimal health (a gain). Loss aversion will make people more reluctant to give up life years.

Consequently, loss aversion generates an upward bias for the TTO, thus overestimating the utility of health states.

In the SG, the gambles can be perceived as yielding all losses, all gains, or as mixed (yielding both gains and losses), depending on the perceived reference point. It has been argued that the health state being evaluated is most likely to be perceived as the reference point (7;21), which can be seen as follows. In SG measurements, two options are considered. Option 1 with certainty yields an intermediate outcome, i.e. the health state to be evaluated. Option 2 is a gamble yielding a good outcome with probability p and a bad outcome with probability 1−p. The probability p is varied until indifference results. The certain outcome is not varied and is, therefore, most naturally taken as the reference point (7;21). In option 2, the good outcome is then perceived as a gain and the bad outcome as a loss. Consequently, it has been argued that the gamble as a whole is perceived as mixed. If so, for a person who is loss averse, the gain-probability p must then be extra high to offset the loss-probability 1−p. Loss aversion therefore generates an upward bias in SG utilities.

Scale compatibility

A less well-known bias is scale compatibility. It refers to the finding that the higher the compatibility of a characteristic with the response scale used, the more attention and weight an individual will give to that characteristic (6;13;22;23). For the TTO, the response scale is the number of years in good health. More attention is, therefore, given to duration than to health status. A respondent will be less willing to trade off life years, disregarding the health impact for those years. Thus, higher scores result.

For the SG, the response scale is a probability. Thus, respondents will pay more attention to the probabilities. This may hold for the bad-outcome probability as much as for the good-outcome probability (6). Therefore, no systematic bias for SG utilities can be predicted.

Table 2.1. Summary of biases discussed and their effects per method.

Method   Utility curvature   Probability weighting   Loss aversion   Scale compatibility   Total effect
TTO      Down                Not applicable          Up              Up                    ?
SG       Not applicable      Up (mostly)             Up              Unknown               Up

Corrections for biases

Methods have been proposed to correct TTO utilities of health states for utility curvature for life duration, using the Certainty Equivalent standard gamble (CE) to assess the utility of length of life (14). Although quantitative corrections of TTO utilities for loss aversion and scale compatibility are highly desirable, no such corrections are known at present. We can, therefore, only present a correction of TTO utilities for utility curvature for life duration. The corresponding formula is given in Appendix 2A. For SG utilities, corrections for the biases mentioned have been proposed (4), with the exception of scale compatibility. We consider three possible versions, depending on whether the gamble outcomes are perceived as all gains, all losses, or mixed. Figure 2.1 shows the corrected SG utilities for each possible perception. The corresponding formulas are given in Appendix 2B.

FIGURE 2.1. The inverse S-shaped correction functions of SG utilities per perception: all gains, all losses, and mixed. The uncorrected function is also depicted. (Axes: indifference probability, 0.0 to 1.0, and (corrected) utility, 0.0 to 1.0. Curves: uncorrected, gains, losses, mixed.)

We examine the convergent validity of the various corrections proposed, and the extent to which the biases in TTO measurements neutralize each other. We speculate on which (corrected) measurements yield utilities closest to true utilities.


Methods

Procedure

Forty-five respondents were recruited through newspaper ads and pamphlets. They were paid € 22.50 for participation. Six rheumatoid arthritis health state descriptions were selected from the descriptions given by rheumatic patients in the Rheumatoid Arthritis Patients In Training study (24). Descriptions were taken from the EQ-5D system, a multi-attribute health utility system. The EQ-5D system comprises five dimensions of health (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression). Each dimension comprises three levels (no problems, some/moderate problems, and extreme problems). A unique EQ-5D health state is defined by combining one level from each of the five dimensions. The health states were chosen so as to cover the utility continuum (0−1), using corresponding EQ-5D valuations based on the TTO (25). We used the EQ-5D health state descriptions 21232 (utility of .09), 22322 (utility of .19), 21321 (utility of .36), 21222 (utility of .62), 21211 (utility of .81), and 21111 (utility of .85).
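The five-digit state codes can be read mechanically. The following sketch assumes that the digits follow the order in which the dimensions are listed above and that levels 1 to 3 correspond to the three levels in the order given. The names `decode_state` and `ALL_STATES` are hypothetical helpers introduced only for illustration.

```python
from itertools import product

# Dimension order assumed to match the order listed in the text.
DIMENSIONS = (
    "mobility",
    "self-care",
    "usual activities",
    "pain/discomfort",
    "anxiety/depression",
)
LEVELS = {1: "no problems", 2: "some/moderate problems", 3: "extreme problems"}


def decode_state(code):
    """Map a five-digit EQ-5D code such as '21232' to a level per dimension."""
    if len(code) != len(DIMENSIONS) or any(c not in "123" for c in code):
        raise ValueError(f"not a valid three-level EQ-5D code: {code!r}")
    return {dim: LEVELS[int(c)] for dim, c in zip(DIMENSIONS, code)}


# Every combination of one level per dimension: 3**5 = 243 states.
ALL_STATES = ["".join(str(v) for v in combo) for combo in product((1, 2, 3), repeat=5)]

if __name__ == "__main__":
    for dim, level in decode_state("21232").items():
        print(f"{dim}: {level}")
    print(len(ALL_STATES), "possible states")
```

The six states used in this chapter are a small subset of these combinations, chosen to span the utility range.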

The TTO, SG, and CE were all computerized using the program Ci3 (26). All elicitations were based on the ping-pong search procedure. This procedure leads to fewer inconsistencies in people's preferences than the procedure of direct matching (27).

All respondents performed two sessions with a two-week interval in between. The order of the sessions was randomized. Session A consisted of SG and TTO elicitations. The order of elicitations within this session was randomized per method. Session B was devoted to the CE life-year gambles. Each session took 90 minutes on average to complete, and was preceded by oral and written instructions. At any time during an elicitation, it was possible for respondents to take a break, check earlier answers, and possibly change them. At the end of each elicitation, respondents were requested to verify whether they indeed considered the two options equivalent.

Session A, standard gamble and time trade-off

Session A started with a short explanation of rheumatoid arthritis. In total, six SGs and six TTOs were performed, one elicitation for each rheumatoid arthritis health state. In the SG, two options were given. Option 1 was a rheumatoid arthritis health state for the respondent's remaining life expectancy (LE). Option 2 was a gamble between good health for LE with probability p and death within a week with probability 1−p. Probabilities in the gamble were varied until indifference resulted. LE was based on a respondent’s remaining life expectancy derived from Dutch life tables (28).

For the TTO, respondents were offered the choice between a rheumatoid arthritis health state during LE and a healthy life for period x (x ≤ LE). Period x was varied until indifference resulted.

Session B, certainty equivalent (CE)

Respondents performed seven Certainty Equivalent life-year gambles in good health: CE12.5, CE25, CE37.5, CE50, CE62.5, CE75, and CE87.5. CE is a standard gamble for which probabilities are held constant, in our case at p = .5. The duration of the certain outcome is varied until indifference results. The CE50 is the number of years that a respondent finds equivalent to a 50−50 gamble between LE and death within a week. CE75 is the number of years equivalent to a 50−50 gamble between the LE and CE50. CE25 is the number of years equivalent to a 50−50 gamble between CE50 and death within a week; etc. A detailed discussion of the chained CE measurement method used in this chapter is available in Verhoef et al. (29). As CE measurements were chained, e.g. the CE50 was used to derive the CE75, complete randomisation was not possible. The order of elicitations within this session was randomized as much as possible.
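The chaining of the seven gambles can be sketched as follows. This is an illustration under stated assumptions, not the elicitation software used in the study: death within a week is treated as 0 life years, the simulated respondent is an expected-utility maximizer with power utility over life years (as in Appendix 2A), and the endpoints of the gambles that the text abbreviates with "etc." are assumed to be the two neighbouring values already elicited. The function names are hypothetical.

```python
def certainty_equivalent(low, high, le, r):
    """Years y such that U(y) = 0.5*U(low) + 0.5*U(high), with U(y) = (y/le)**r."""
    expected_utility = 0.5 * (low / le) ** r + 0.5 * (high / le) ** r
    return le * expected_utility ** (1.0 / r)


def chained_ces(le, r):
    """Simulate the chained CE gambles for life expectancy `le` and power r > 0."""
    # Anchors: 0 = death within a week (assumed 0 years), 100 = full LE.
    ce = {0.0: 0.0, 100.0: le}
    # (target, low endpoint, high endpoint); each gamble uses earlier answers.
    chain = [
        (50.0, 0.0, 100.0),
        (25.0, 0.0, 50.0),
        (75.0, 50.0, 100.0),
        (12.5, 0.0, 25.0),   # assumed endpoints
        (37.5, 25.0, 50.0),  # assumed endpoints
        (62.5, 50.0, 75.0),  # assumed endpoints
        (87.5, 75.0, 100.0), # assumed endpoints
    ]
    for target, low, high in chain:
        ce[target] = certainty_equivalent(ce[low], ce[high], le, r)
    return {n: years for n, years in sorted(ce.items()) if 0.0 < n < 100.0}


if __name__ == "__main__":
    for n, years in chained_ces(le=50.0, r=0.8).items():
        print(f"CE{n:g}: {years:.1f} years")
```

For r = 1 the simulated answers equal the expected values of the gambles (risk neutrality); for r below 1 they fall below the expected values, which is the risk-averse pattern.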

The CE values, used to correct the TTO measurements for nonlinearity of utility, were analyzed in the traditional way assuming expected utility. A reanalysis of these data through prospect theory, and the location of a reference point appropriate for such an analysis, is the topic of future research. This chapter focuses on the novelty of the corrected SG measurements, and the comparison of these to traditional measurements.

Data analysis

The formulas used to calculate utilities from the respondents' choices are explained in Appendices 2A and 2B. Discrepancies between methods were assessed for all health states using MANOVA with method as a within-subjects factor, to determine convergent validity between the TTO and the SG, both corrected and uncorrected.

Results

Two of the 45 respondents were excluded from the analysis because they were not able to perform CE life-year gambles appropriately, either because the subjective life expectancy was much higher than the LE used ("My grandmother and grandfather are alive and well and both 90 years of age; the 76 years (LE) you offer is far too short.") or due to religious arguments ("God decides what will happen, not I."). The respondents consisted of 26 females (mean age = 27, s.d. = 12) and 17 males (mean age = 34, s.d. = 14). All respondents had received at least a high-school education. About 50% of the respondents were university students, and 25% of the respondents had children.

Most respondents (65%) exhibited risk aversion in the CE questions, i.e. their CEs were lower than the expected values of the gambles. About 25% of the respondents exhibited risk seeking, and 10% exhibited risk neutrality (a power coefficient r of utility between 0.95 and 1.05). The mean power coefficient r of utility was 1.16 (s.d. = 1.07) and the median power coefficient r was .80. For the TTO, utility-curvature correction using the individual r-values (corrected TTO) led to slightly higher scores than the uncorrected TTO. Figure 2.2 shows the minor and non-significant effect of the correction on the average TTO valuation per health state (p = .29).

Figure 2.2 also presents health state utilities as assessed by the SG, both uncorrected and corrected. It shows that uncorrected SG and losses-corrected SG, leading to very similar utilities, always provide the highest value for a health state, followed by gains-corrected SG. Mixed-corrected SG always provides the lowest utility. This order is in line with the differences shown in Figure 2.1 (see also Appendix 2B). Gains-corrected SG shows the strongest convergence with both the corrected TTO (p = .51) and the uncorrected TTO (p = .74). The losses-corrected SG is relatively high and shows the least convergence with the uncorrected TTO (p < .001) and the corrected TTO (p < .002). Mixed-corrected SG provides scores that are considerably lower than uncorrected TTO scores (p = .05) or corrected TTO scores (p < .01).

FIGURE 2.2. Mean utility for each health state per method and possible corrections. The six health states (I to VI) are ranked on the x-axis according to the corresponding mean utility. (y-axis: utility, 0.3 to 0.9. Series: SG uncorrected, SG losses-corrected, TTO corrected, TTO uncorrected, SG gains-corrected, SG mixed-corrected.)

Discussion

In health economics, the TTO has been developed as an alternative to the SG (10).

Although lacking the theoretical foundations of the SG, the TTO has emerged as the most frequently used method. The main reasons for TTO's wide acceptance are its better feasibility, its higher discriminative power, and its better face validity. The epithet of the SG as gold standard has faded during years of practice. TTO seems to have been accepted as a practical gold standard.


In our data, utility of life years was nearly linear at the aggregate level and, hence, correcting the TTO for utility curvature had only a minor effect. Some other studies found stronger deviations from linearity for the utility of life years (30). Stiggelbout et al. used a time frame of ten years and interviewed disease-free testicular cancer patients who evaluated a good health state; their findings may, therefore, not be comparable to ours (30).

In our data, correcting for utility curvature had no significant effect. Consequently, this correction did not neutralize the upward bias in TTO due to loss aversion and scale compatibility, resulting in an overall upward bias in TTO scores. This suggests that the even higher uncorrected SG and losses-corrected SG scores are far too high.

There is other evidence suggesting that SG scores are too high (7;13). No quantitative estimations are known of the effects of loss aversion and scale compatibility on the TTO scores and, hence, we cannot estimate the degree of overestimation comprised in TTO scores. In Bleichrodt and Pinto (23) and Bleichrodt, Pinto, and Abellan (31), similar high durations were used and no loss aversion was found for such high durations.

The gains-corrected SG showed the strongest convergence with the uncorrected TTO data. However, in our data the TTO seems to be too high and, thus, the gains-corrected SG is probably too high as well. The mixed-corrected SG may provide better approximations of true utility than the gains-corrected SG. A psychological argument in favor of the mixed-corrected SG is that the certain outcome is fixed in the SG (4). The framing of the instructions, where respondents were asked to imagine that the certain health state is their status quo, provides another argument in favor of the mixed correction. Further, immediate death is not a plausible reference point, because it is remote from the actual situation faced by the respondents: it is probably too distant from a healthy person's status quo, which includes life expectancy. This is another reason why it is unlikely that all outcomes in the SG will be perceived as gains. Little is known about the psychology behind the location of the perceived reference point. Qualitative data providing further insights would be desirable.

Conclusions

In our study, utility curvature was absent at the average level and, as a result, correcting TTO scores for utility curvature had little effect at the aggregate level. The loss-correction of the SG also had little effect. The gains-correction of the SG had more effect, leading to lower scores that were close to the TTO scores, and yielding the strongest convergent validity. The mixed-correction of the SG led to considerably lower scores. Besides the convergent validity, Bleichrodt (2002) suggested another argument, based on conjectured neutralizing biases, favoring TTO scores. We have suggested, to the contrary, a net upward bias for TTO scores. There are also theoretical arguments, based on prospect theory, favoring the mixed-correction of the SG.

Because we found that TTO scores were higher than mixed-corrected SG scores, this suggests again that TTO scores are too high, contrary to what has been thought before. This finding suggests once more that the even higher SG scores are much too high.


Appendix 2A. TTO calculations

Estimates for utilities of the six health states were derived from the TTO questions by dividing the number (x) of years in good health by the LE. A power function with parameter r was used to describe utility of life years. Power functions were chosen because there is empirical evidence supporting these functions (16). For each respondent, r was estimated and used to correct the respondent's TTO. Following Pliskin et al. (16), the utility function U(Y,Q) for life years Y in health state Q is $U(Y,Q) = b\,Y^{r} H(Q)$, where H(Q) is a quality adjusted factor, scaled from 0 to 1. The following argument is taken from Miyamoto and Eraker (14):

For $\mathrm{CE}_n$, n = 25, 50, 75:

\[ \frac{n}{100} = \frac{U(\mathrm{CE}_n, Q)}{U(\mathrm{LE}, Q)}. \]

Expanding the right side yields:

\[ \frac{n}{100} = \frac{b\,\mathrm{CE}_n^{\,r}\,H(Q)}{b\,\mathrm{LE}^{\,r}\,H(Q)} = \left(\frac{\mathrm{CE}_n}{\mathrm{LE}}\right)^{r}. \]

Taking logarithms and dividing through yields:

\[ \frac{1}{r}\,\ln\!\left(\frac{n}{100}\right) = \ln\!\left(\frac{\mathrm{CE}_n}{\mathrm{LE}}\right). \]

A least-squares estimate can be obtained for (1 / r).

It can be shown that H(Q), the measure of health quality, is estimated by (x / LE) from the TTO raised to the power r.

If a respondent is indifferent between (LE,Q) and (x,Qmax), then U(LE,Q) = U(x,Qmax):

\[ b\,\mathrm{LE}^{\,r}\,H(Q) = b\,x^{r}\,H(Q_{\max}) = b\,x^{r}, \quad \text{because } H(Q_{\max}) = 1.0. \]

$H(Q) = (x / \mathrm{LE})^{r}$ now follows (14;30).
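The estimation and correction steps of this appendix can be sketched in a few lines. The sketch assumes that the least-squares estimate of 1/r is a regression through the origin of ln(CEn / LE) on ln(n / 100), since the equation above has no intercept; the appendix does not spell out this detail. The function names and the example numbers are hypothetical.

```python
import math


def estimate_r(ce_years, le):
    """Estimate the power coefficient r from chained CE answers.

    ce_years maps n (e.g. 25, 50, 75) to the elicited CE_n in years.
    Fits ln(CE_n / LE) = (1/r) * ln(n / 100) by least squares through
    the origin (an assumption) and returns r.
    """
    xs = [math.log(n / 100.0) for n in ce_years]
    ys = [math.log(ce / le) for ce in ce_years.values()]
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return 1.0 / slope  # slope estimates 1/r


def corrected_tto(x, le, r):
    """Curvature-corrected TTO utility H(Q) = (x / LE) ** r."""
    return (x / le) ** r


if __name__ == "__main__":
    le = 50.0
    answers = {25: 9.0, 50: 21.0, 75: 35.0}  # made-up CE answers in years
    r = estimate_r(answers, le)
    x = 30.0  # made-up TTO answer: 30 healthy years equivalent to LE in state Q
    print(f"estimated r = {r:.2f}")
    print(f"uncorrected TTO = {x / le:.2f}, corrected TTO = {corrected_tto(x, le, r):.2f}")
```

For r below 1 (concave utility of life years) the corrected score exceeds the uncorrected ratio x / LE, which matches the direction of the correction reported in the Results.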


Appendix 2B. SG calculations

The following utility calculations are based on prospect theory, following Bleichrodt, Pinto, & Wakker, 2001 (4). We use the following notation:

p = indifference probability provided by the respondent
U(h) = utility of health state h
$w(p)$ = weight of the probability p
$\gamma$ = parameter in the probability weighting function
$\lambda$ = loss aversion parameter (value = 2.25)

Tversky and Kahneman proposed the following probability weighting function (11):

\[ w(p) = \frac{p^{\gamma}}{\left(p^{\gamma} + (1 - p)^{\gamma}\right)^{1/\gamma}} \]

The formula has been found to be different for losses than for gains, in which $w^{-}(p)$ = weight of probability of a loss, and $w^{+}(p)$ = weight of probability of a gain. If individual estimates of the parameters of the respondent for the relevant outcomes are available, then these values should obviously be used. Such estimations are, however, hard to obtain, and are not commonly available in the health literature. In the absence of such information, it seems natural to use the estimations most commonly accepted in the literature, being those by Tversky and Kahneman (11): $\gamma$ = 0.69 for losses and $\gamma$ = 0.61 for gains. For a detailed discussion of this point see section 4 of Bleichrodt, Pinto, and Wakker (4).

If all outcomes are perceived as gains, then the formula for the SG utility of the health state is:

\[ U(h) = w^{+}(p). \]

If all outcomes are perceived as losses, then the formula for the SG utility of the health state is:

\[ U(h) = 1 - w^{-}(1 - p). \]

For the mixed case, the formula for the SG utility of the health state is:

\[ U(h) = \frac{w^{+}(p)}{w^{+}(p) + \lambda\, w^{-}(1 - p)}. \]
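The three corrections can be computed directly from an indifference probability. The sketch below uses the parameter values stated in this appendix (0.61 for gains, 0.69 for losses, loss aversion 2.25). It assumes that, in the mixed case, the loss weight is multiplied by the loss aversion parameter, following the prospect-theory formulation of Bleichrodt, Pinto, and Wakker (4). Function and constant names are illustrative.

```python
GAMMA_GAINS = 0.61    # weighting parameter for gains (11)
GAMMA_LOSSES = 0.69   # weighting parameter for losses (11)
LOSS_AVERSION = 2.25  # loss aversion parameter


def weight(p, gamma):
    """Tversky-Kahneman probability weighting function."""
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma) ** (1.0 / gamma)


def sg_gains(p):
    """Corrected SG utility if all outcomes are perceived as gains."""
    return weight(p, GAMMA_GAINS)


def sg_losses(p):
    """Corrected SG utility if all outcomes are perceived as losses."""
    return 1.0 - weight(1.0 - p, GAMMA_LOSSES)


def sg_mixed(p):
    """Corrected SG utility if the gamble is perceived as mixed."""
    gain_weight = weight(p, GAMMA_GAINS)
    loss_weight = weight(1.0 - p, GAMMA_LOSSES)
    return gain_weight / (gain_weight + LOSS_AVERSION * loss_weight)


if __name__ == "__main__":
    print("p     uncorrected  gains  losses  mixed")
    for p in (0.2, 0.4, 0.6, 0.8, 0.95):
        print(f"{p:<5} {p:<12.2f} {sg_gains(p):<6.2f} {sg_losses(p):<7.2f} {sg_mixed(p):.2f}")
```

For indifference probabilities in the range typical of health states, the losses correction gives the highest and the mixed correction the lowest of the three corrected utilities, which is the ordering described for Figure 2.1.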
