
Enhancing clinical utility of a working memory task: a reliable change approach


Academic year: 2021


Research Master’s Thesis | University of Amsterdam

Enhancing clinical utility of a working memory task:

a reliable change approach

Author

Name: Renee Kleine Deters
Student number: 5690072

Date: 31-07-2014

Assessors

First assessor: prof. dr. H.M. Geurts
Second assessor: dr. Cedric Koolschijn
Daily supervisor: G.H. Tamminga, MSc


Abstract

In research, cognitive change at the individual level is not frequently assessed, even though group effects may not describe the full extent of change. The aim of this study was to provide reliable change estimates for an N-back task of working memory (WM), which can be used to evaluate individual cognitive change. Reliable change estimates were calculated in typically developing (TD) children and children with ADHD receiving placebo (ADHD-P). Change estimates were then applied to a group of children with ADHD treated with methylphenidate (MPH; ADHD-M). In addition, stability of test performance in children with ADHD and healthy children was compared. Three groups of children aged 10 – 12 were included: 30 TD children, 19 ADHD-P children, and 17 ADHD-M children. The N-back task was administered twice, eight weeks apart, during which the children with ADHD received treatment. The range of the reliable change estimates was found to be very large due to low test-retest reliabilities. Test-retest reliabilities in the TD and ADHD-P groups were similar for the 1-back condition, but not for the 2-back condition. Similar proportions of children were classified as showing reliable change in the ADHD-P and ADHD-M groups. Due to its poor psychometric properties, the N-back task is not suitable for investigating change in WM abilities.


Contents

Enhancing Clinical Utility of a Working Memory Task: a Reliable Change Approach

Methods
    Participants
        TD children
        Children with ADHD
    Procedure
        TD children
        Children with ADHD
        Neuropsychological assessment
    Materials
        Screening
        N-back task
    Data analysis
        Demographics
        Reliable Change Estimates
        Exploratory analyses

Results
    Demographics
    Reliable Change Estimates
    Exploratory analyses

Discussion


Enhancing Clinical Utility of a Working Memory Task: a Reliable Change Approach

An important goal of neuropsychological assessment is to investigate change in cognitive functioning over time (Heilbronner et al., 2010; Temkin, Heaton, Grant, & Dikmen, 1999). Changes in cognitive functioning can be a result of neurodegenerative conditions, traumatic brain injury, and neurosurgical, behavioral and pharmacological interventions (Duff, 2012; Heaton et al., 2001; Temkin et al., 1999). To document changes in cognitive functioning, repeated neuropsychological assessments are conducted.

In clinical practice, test results are interpreted on the individual level. However, in research test performance and changes in performance between multiple assessments are often evaluated at the group level. A problem with this approach is that treatment or clinical condition may differentially affect individuals. Hence, group effects may not describe the full extent of change in cognitive abilities (Heaton et al., 2001). In addition, it might be of interest for individuals to determine whether changes on neuropsychological tests reflect changes in cognitive abilities or normal variability in test performance.

Examining cognitive change at the individual level poses some challenges, as test performance is likely to be variable even in the absence of change in the underlying cognitive abilities. Just like single test scores, an observed change score is likely to be a combination of true change and error (Duff, 2012). Two sources of error relevant to repeated testing are test-retest reliability and practice effects (Duff, 2012; Heilbronner et al., 2010; Temkin et al., 1999). Test-retest reliability may be influenced by several factors, such as the length of the test-retest interval (Duff, 2012). Shorter retest intervals have been shown to lead to higher reliability coefficients (Duff, Beglinger, Moser, & Paulsen, 2010). In addition, clinical conditions might influence change scores, as they can also affect test scores on single neuropsychological assessments (Duff, 2012).


A method for determining whether changes in test scores are reliable was proposed by Jacobson and Truax (1991). Their reliable change index (RCI) estimates the extent to which a test score can be expected to change as a function of measurement error. If the change in an individual’s score exceeds the change that occurs due to measurement error, it is statistically reliable. The observed change is unlikely to be due to the psychometric properties of the measure and likely to reflect a real difference in performance. A problem with the traditional RCI is that it does not account for practice effects, which are common in cognitive testing. Therefore, modifications of the RCI that do account for practice effects have been proposed (RCI + practice effects, RCIPE; Chelune et al., 1993). In addition, Iverson (2001) incorporated variability of the second test administration in the formula. This alternate calculation is now typically used as the denominator in RCIPE (Duff, 2012). To calculate RCI’s, data on test stability and practice effects of cognitively healthy individuals is required. The scarcity of published retest data for neuropsychological tests is a major obstacle to the widespread use of RCI’s. The American Academy of Clinical Neuropsychology (AACN; Heilbronner et al. 2010) states that more empirical research is needed to identify clinical change scores and facilitate their application for most cognitive tests currently in use.

Repeated cognitive testing is very common in the field of Attention Deficit/Hyperactivity Disorder (ADHD) research. Several cognitive deficits have been hypothesized to play a major role in the mechanisms underlying ADHD. Examples are response inhibition (Barkley, 1997), regulation of arousal/activation (Sergeant, Oosterlaan, & van der Meere, 1999), and delay aversion (Sonuga-Barke, 2002; see Castellanos & Tannock, 2002, for a review). Serial assessments of these functions are frequently used to evaluate treatment efficacy (Coghill et al., 2013; Pietrzak, Mollica, Maruff, & Snyder, 2006).

A neuropsychological construct that has recently gained considerable interest in relation to ADHD is working memory (WM). WM is consistently found to be impaired, both in children (Kasper, Alderson & Hudec, 2012; Martinussen, Hayden, Hogg-Johnson, & Tannock, 2005; Willcutt, Doyle, Nigg, Faraone, & Pennington, 2005) and in adults (Alderson, Kasper, Hudec, & Patros, 2013) with ADHD. WM is defined as a limited capacity system for the temporary storage and active manipulation of information used to guide behavior; it is involved in maintaining, controlling, and manipulating goal-relevant information. It enables skills like reasoning, planning, problem solving and goal-directed behavior (Baddeley, 2003, 2010).

Methylphenidate (MPH) is the most common pharmacological treatment for ADHD, and several studies show beneficial effects on aspects of WM in ADHD (Bedard & Tannock, 2008; Bedard, Jain, Johnson, & Tannock, 2007; Kobel et al., 2008; Mehta, Goodyer, & Sahakian, 2004). However, no RCI’s have been calculated in these studies. As research indicates that approximately 70% of patients respond to MPH (Greenhill & Ford, 2002), group effects of treatment are not sufficient to investigate the association between MPH efficacy and change in cognitive abilities. Clinical utility of the findings is limited.

RCI methods should be used to investigate individual change in working memory abilities after MPH administration. As Heaton et al. (2001) pointed out, normative rates of change may not generalize from nonclinical to clinical samples, so data from healthy comparison groups might not be the most appropriate for calculating RCI’s. This may apply specifically to ADHD research, as children with ADHD show greater variability in performance compared with healthy children (Douglas, 1999). Castellanos and Tannock (2002) even argue that this within-subject variability is a cognitive hallmark of ADHD. However, variability in test performance of children with ADHD is seen in particular on speeded reaction time tests (Castellanos & Tannock, 2002), a type of test different from the N-back task. In addition, one study on the subject found stability of cognitive performance to be similar in control children and children with ADHD (Mollica, Maruff & Vance, 2004). Nonetheless, it remains an important issue to consider when calculating RCI’s for an ADHD sample. As there is little evidence that placebos produce change in cognition in ADHD (Waschbusch, Pelham, Waxmonsky & Johnston, 2009), data obtained in a placebo group can be used to investigate test stability.

The aim of this study is to establish reliable change estimates for an N-back task of working memory used in ADHD research. First, test-retest reliabilities for typically developing (TD) children and children with ADHD receiving placebo (ADHD-P) are calculated to investigate whether test stability differs between groups. Subsequently, RCI’s will be established for both groups. Lastly, RCI’s will be applied to a group of children with ADHD receiving methylphenidate (ADHD-M), to investigate whether they are effective in classifying reliable change after MPH administration. It is hypothesized that test stability does not differ between the TD and ADHD-P groups. RCI’s are also expected to be similar between groups. In addition, approximately 70% of children treated with MPH are expected to show reliable change on the N-back task.

Methods

Participants

TD children. The TD group consisted of 33 children aged 10 – 12 years. Exclusion criteria included a score above the 95th percentile on the Disruptive Behavior Disorders rating scale (DBD-RS; Oosterlaan et al., 2008), presence of psychiatric, neurological or medical disorders, having siblings with an ADHD diagnosis, prior stimulant use, and an estimated IQ below 80.

Children with ADHD. The children with ADHD all participated in a multicenter randomized, double-blind, placebo-controlled trial (ePOD study; Bottelier et al., 2014). They were aged 10 – 12 years. The ADHD-P group consisted of 20 children and the ADHD-M group consisted of 18 children. The exclusion criteria included the presence of a co-morbid psychiatric disorder requiring pharmacological treatment, neurological or medical disorders, prior stimulant use, and an estimated IQ below 80.

Procedure

TD children. TD children were recruited through primary schools. Their parents were asked for permission by means of an informed consent form. Parents also filled out the DBD before their children completed the baseline assessment. The subtests of the WISC were included in the baseline assessment. Assessments took place at the children's schools, and test dates were planned in agreement with the schools. The follow-up assessment took place about eight weeks after the baseline assessment. Children received a small present for participating.

Children with ADHD. The children with ADHD were recruited from clinical programs at the Child and Adolescent Psychiatry Center Triversum (Alkmaar), from the department of (Child and Adolescent) Psychiatry of the Bascule/AMC (Amsterdam), and from PsyQ mental health facility in The Hague (Bottelier et al., 2014). Children were included when they met the criteria for ADHD described in the Diagnostic and Statistical Manual of Mental Disorders (4th ed., American Psychiatric Association, 1994), as evaluated by an experienced psychiatrist and confirmed by the National Institute of Mental Health Diagnostic Interview Schedule for Children version IV (NIMH-DISC-IV), authorized Dutch translation (Ferdinand & van der Ende, 1998). The NIMH-DISC-IV and were administered on a screening day prior to the baseline assessment. Parents also filled out the DBD during the screening day. Directly after the baseline assessment, patients were stratified by age and then randomized to either placebo or MPH treatment. Optimal titration was used to determine medication dose. The starting dose was 0.3 mg/kg a day in 1-2 doses and could be increased weekly by 5-10/day to a maximum of 60 mg daily. Clinical dosage depended on the reduction of symptoms, and decisions about dosage modifications were taken only by the treating psychiatrist. The follow-up assessment took place about eight weeks after starting treatment.

Neuropsychological assessment. The N-back task was part of a neuropsychological assessment of about one hour. The tasks were counterbalanced across subjects, with four different sequences of administration, to account for influences of order of administration.

Materials

Screening. The authorized Dutch translation of the DBD-RS (Oosterlaan et al., 2008) was used to screen for symptoms of disruptive behavior disorders (attention deficit, conduct, and oppositional defiant). The internal consistencies of the subscales inattention, hyperactivity, conduct disorder (CD) and oppositional defiant disorder (ODD) are .92, .90, .90 and .72, respectively. Test-retest reliabilities are higher than .80 for the inattention, hyperactivity and ODD scales. Test-retest reliability for the CD scale is .50. Convergent validity ranges between r = .18 and r = .83, and divergent validity ranges between -.16 and -.50. IQ was estimated using the subtests Vocabulary and Block Design from the Wechsler Intelligence Scale for Children, third edition, Dutch translation (WISC-III-NL; Kort et al., 2002). Vocabulary and Block Design show the highest correlations with the total IQ (Kort et al., 2002). In the general population, the combination of these subtests was the most valid when using two subtests to estimate IQ (e.g. Ryan, 1981; Ryan et al., 1988; Silverstein, 1983, 1989, 1990).

In the ADHD group, the NIMH-DISC-IV, authorized Dutch translation (Ferdinand & van der Ende, 1998), was used to verify the diagnosis and to determine ADHD subtype. Test-retest reliability for the parent version in a clinical sample is .79 (Shaffer et al., 2000). There has been no formal validity testing of the NIMH-DISC-IV, but an agreement of .72 between the DISC and a clinician’s ratings was found.


N-back task. The N-back task used in this study was designed by van Leeuwen, van den Berg, Hoekstra, and Boomsma (2007), who reported test-retest reliabilities for children of .65 on the 2-back level and .70 on the 3-back level. The task has been adapted from more traditional N-back tasks (Gevins and Cutillo, 1993; Jansma, Ramsay, Copolla & Kahn, 2000). Children were told a caterpillar would appear in one of four holes of an apple presented on the screen. In order to prevent the apple from being eaten by the caterpillar, children had to indicate in which hole the caterpillar appeared a given number of trials ago. The task consisted of three levels. In the first level children had to indicate in which hole the caterpillar appeared one move back (1-back), in the second level they had to indicate in which hole the caterpillar appeared two moves back (2-back), and in the third level they had to indicate in which hole the caterpillar appeared three moves back (3-back; see Figure 1). Children were instructed to push one of four buttons corresponding to the holes in the apple, using thumbs and index fingers of both hands. There was a 1 s delay between caterpillar moves. Each level consisted of 32 trials. After each level, children received feedback on the amount of apples they saved.

Figure 1. Schematic representation of the N-back task. The black dots represent the caterpillar.
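The response rule of the task can be made concrete with a short sketch. It is illustrative only and not taken from the task software: the function name `score_n_back` and the variables `positions` and `responses` are hypothetical, holes are coded 0 to 3, and skipping the first n trials (which have no n-back target) is an assumption, since the handling of those trials is not described here.

```python
from typing import Sequence


def score_n_back(positions: Sequence[int], responses: Sequence[int], n: int) -> int:
    """Count correct responses for one n-back level.

    positions[t] is the hole (0-3) in which the caterpillar appeared on trial t;
    responses[t] is the button pressed on trial t. A response is correct when it
    names the hole shown n trials earlier. Trials with index below n have no
    target and are skipped (an assumption, not a documented task rule).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(positions) != len(responses):
        raise ValueError("positions and responses must have equal length")
    return sum(
        1
        for t in range(n, len(positions))
        if responses[t] == positions[t - n]
    )


# Made-up 2-back example: six trials, the response on the last trial is wrong.
positions = [0, 3, 1, 2, 2, 0]
responses = [0, 0, 0, 3, 1, 1]  # the first two responses are not scored
print(score_n_back(positions, responses, n=2))
```

Under these assumptions the number of correct responses per level is the dependent measure; the same function covers the 1-back, 2-back and 3-back levels by changing `n`.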


Participants started with a 1-back practice block of 11 trials. They received feedback on the number of apples they saved. When fewer than five apples were saved, participants practiced again until they clearly understood the task. Then the 1-back level was started. After the 1-back level, children continued with a 2-back practice block. When more than half of the apples were saved, participants proceeded with the 2-back level. If not, they practiced again until the task was clear. The task ended when participants made six or more mistakes during the 2-back level. With five mistakes or fewer, participants proceeded to the 3-back level. As the 3-back condition was quite hard, no minimum number of apples needed to be saved in the practice block, as long as participants understood the task. The number of correct answers for each level was used as the dependent measure in the analyses.

Data analysis

Demographics. Means and standard deviations (SD’s) were calculated for the independent variables age, IQ and duration of the test-retest interval. Means and SD’s were also calculated for each level of the N-back task at baseline and follow-up, and for the change scores. Calculations were performed separately for the TD, ADHD-P, and ADHD-M groups. All variables were tested for normality using the Kolmogorov-Smirnov test. If data in all groups were normally distributed, Levene’s tests were done to investigate whether the variances differed between groups. Depending on the normality of the data, ANOVA’s or Kruskal-Wallis tests were used to investigate differences in demographic variables and N-back performance across groups. Significant ANOVA’s and Kruskal-Wallis tests were followed up by post-hoc independent t-tests or Mann-Whitney U tests. Change scores for all three levels of the N-back task were screened for outliers using Tukey’s fences (Hoaglin, 2003). Subsequent analyses were performed both with and without outliers. Analyses without outliers are reported only when they yielded different results. To investigate whether N-back performance at follow-up was significantly better than baseline performance, repeated measures ANOVA’s or Wilcoxon signed-rank tests were done. Unless indicated otherwise, α was set at .05.
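As a minimal sketch of the outlier screening step, the code below applies Tukey's fences to a list of change scores. The multiplier k = 1.5 and the quartile method (`method="inclusive"`) are assumptions, because the exact settings used in the analysis are not stated; the function names and the example data are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def tukey_fences(values: Sequence[float], k: float = 1.5) -> Tuple[float, float]:
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Return the values that fall outside Tukey's fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]


# Made-up change scores (follow-up minus baseline); one extreme decrease.
change_scores = [2, 0, -1, 3, 1, 4, 0, 2, -24, 1]
print(tukey_fences(change_scores))
print(find_outliers(change_scores))
```

Different quartile definitions can shift the fences slightly in small samples, which is one reason to report analyses with and without the flagged cases.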

Reliable Change Estimates. Test-retest reliabilities were calculated using Pearson correlation coefficients for normally distributed data or Spearman’s rho coefficients for non-normally distributed data. Reliabilities were calculated for each level of the N-back task, separately for the TD and ADHD-P groups. Fisher's r-to-z transformation was used to investigate whether test-retest reliabilities differed between groups.
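The comparison of two independent correlations via Fisher's transformation can be sketched as follows. This is an illustration under stated assumptions rather than the analysis script: the function name is hypothetical, the p-value is one-sided, and the standard error uses the usual 1/(n - 3) terms. With the reliabilities and group sizes reported in the Results (for example r = .23, N = 30 versus r = .20, N = 19) it gives values close to the reported Z statistics.

```python
import math
from statistics import NormalDist
from typing import Tuple


def compare_correlations(r1: float, n1: int, r2: float, n2: int) -> Tuple[float, float]:
    """Compare two independent correlations with Fisher's r-to-z transformation.

    Returns the Z statistic and a one-sided p-value (assumed directionality).
    """
    z1 = math.atanh(r1)  # Fisher transformation of r1
    z2 = math.atanh(r2)  # Fisher transformation of r2
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p_one_sided = 1.0 - NormalDist().cdf(abs(z))
    return z, p_one_sided


# 1-back level: TD (r = .23, N = 30) versus ADHD-P (r = .20, N = 19).
z_1back, p_1back = compare_correlations(0.23, 30, 0.20, 19)
# 2-back level without the outlier, assuming N = 18 remains in the ADHD-P group.
z_2back, p_2back = compare_correlations(0.14, 30, 0.64, 18)
print(round(z_1back, 2), round(p_1back, 3))
print(round(z_2back, 2), round(p_2back, 3))
```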

Mean change scores were calculated by subtracting the mean baseline scores from the mean follow-up scores in the TD and ADHD-P groups. The reliable change estimate adjusted for practice effects (Chelune et al., 1993) is calculated with a 90% prediction interval based on the standard error of the difference (SEdiff). Whenever the posttest variance can be estimated and is not assumed to be equal to the pretest variance, the expression

$$SE_{diff} = \sqrt{(SD_x^2 + SD_y^2)(1 - r_{xy})} \qquad (1)$$

is preferred (Duff, 2012; Iverson et al., 2003; Maassen, 2005, 2009), where $SD_x$ is the standard deviation at baseline, $SD_y$ is the standard deviation at follow-up, and $r_{xy}$ is the test-retest reliability.

The 90% prediction interval was calculated by multiplying SEdiff by the corresponding value from the z-distribution (± 1.64). The prediction interval was added to and subtracted from the mean change scores. Reliable change (RC) values were then rounded to whole numbers for ease of interpretation, yielding the minimum change in number of points needed for reliable change. As the reliable change estimates were calculated using a 90% prediction interval, 10% of the children are expected to show reliable change by chance. This was investigated for both the TD and ADHD-P groups. After the RC values were calculated, they were applied to data of the ADHD-M group. In this way the number of children showing reliable change when taking MPH was established.
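The calculation in Equation 1 and the practice-adjusted prediction interval can be sketched in a few lines. The function names are hypothetical, and the sketch compares an individual change score against the unrounded bounds, which is an assumption, as the exact rounding rule applied to the tabled RC values is not spelled out. The example uses the TD 1-back values reported in Tables 2 and 3 (SD at baseline 1.62, SD at follow-up 1.17, r = .23, mean change 0.13).

```python
import math
from typing import Tuple


def se_diff(sd_x: float, sd_y: float, r_xy: float) -> float:
    """Standard error of the difference, Equation 1."""
    return math.sqrt((sd_x ** 2 + sd_y ** 2) * (1.0 - r_xy))


def reliable_change_bounds(mean_change: float, sd_x: float, sd_y: float,
                           r_xy: float, z: float = 1.64) -> Tuple[float, float]:
    """90% prediction interval around the mean practice effect (z = 1.64)."""
    half_width = z * se_diff(sd_x, sd_y, r_xy)
    return mean_change - half_width, mean_change + half_width


def classify_change(change: float, lower: float, upper: float) -> str:
    """Label an individual change score relative to the interval bounds."""
    if change < lower:
        return "reliable decline"
    if change > upper:
        return "reliable improvement"
    return "no reliable change"


# TD group, 1-back level (values taken from Tables 2 and 3).
lower, upper = reliable_change_bounds(mean_change=0.13, sd_x=1.62, sd_y=1.17, r_xy=0.23)
print(round(se_diff(1.62, 1.17, 0.23), 2), round(lower, 2), round(upper, 2))
print(classify_change(4, lower, upper))
```

Because the interval is centered on the mean change rather than on zero, a child must improve by more than the average practice effect plus the error margin before the improvement counts as reliable.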

Exploratory analyses. Given the relatively small sample sizes, large variances and non-normality of the data, a bootstrap analysis was performed with the means and medians of the change scores to evaluate stability of the change scores.
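A bootstrap of this kind can be sketched as below. The percentile method, the fixed seed and the function name `bootstrap_ci` are assumptions made for illustration; the type of bootstrap interval used in the analysis is not specified. The example data are made up.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple


def bootstrap_ci(values: Sequence[float],
                 statistic: Callable[[Sequence[float]], float],
                 n_resamples: int = 1000,
                 confidence: float = 0.95,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for an arbitrary statistic."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    alpha = 1.0 - confidence
    lower_idx = max(0, round((alpha / 2) * n_resamples))
    upper_idx = min(n_resamples - 1, round((1 - alpha / 2) * n_resamples) - 1)
    return estimates[lower_idx], estimates[upper_idx]


# Made-up 2-back change scores for one group.
change_scores = [5, 0, 12, -3, 7, 2, 9, 4, 1, 6, -1, 8]
print(bootstrap_ci(change_scores, statistics.mean))
print(bootstrap_ci(change_scores, statistics.median))
```

Wide intervals from such a resampling procedure indicate that the estimated practice effect would change noticeably with a different sample of children.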

Results

Demographics

In the control group, 3 of 33 participants were excluded. Two participants were excluded due to difficulties with data registration. One participant was excluded because his twin brother was also participating. In the ADHD-P group, 1 of 20 participants was excluded due to difficulties with data registration. In the ADHD-M group, 1 of 18 participants was excluded due to difficulties with data registration.

Demographic characteristics are presented in Table 1. No differences in age, F(2, 63) = .48, p = .63, or IQ, F(2, 60) = 1.96, p = .15, were found across the TD, ADHD-P and ADHD-M groups. Duration of the test-retest interval differed across the TD, ADHD-P and ADHD-M groups, χ²(2) = 13.54, p = .001. Follow-up Mann-Whitney U tests showed that the duration of the test-retest interval was shorter in the TD group compared with the ADHD-P group, U = 128, p = .001, r = .46, and the ADHD-M group, U = 128.5, p = .005, r = .41. Duration of the test-retest interval was similar in both ADHD groups, U = 154, p = .81. The TD, ADHD-P and ADHD-M groups differed on the DBD attention scale, χ²(2) = 48.71, p < .001, hyperactivity scale, χ²(2) = 45.75, p < .001, ODD scale, χ²(2) = 21.42, p <


ADHD-P and ADHD-M groups scored similarly on the attention scale, U = 143.5, p = .566, hyperactivity scale, U = 134.5, p = .391, ODD scale, U = 146.5, p = .829, and the CD scale, U = 129, p = .411. The TD group scored lower than the ADHD-P group on the attention scale, U = 0, p < .001, r = 0.84, hyperactivity scale, U = 10, p = .001, r = 0.81, ODD scale, U = 111.5, p = .001, r = 0.50, and the CD scale, U = 136, p = .001, r = 0.50. The TD group also scored lower than the ADHD-M group on the attention scale, U = 0, p < .001, r = 0.83, hyperactivity scale, U = 7.5, p < .001, r = 0.81, ODD scale, U = 67.5, p < .001, r = 0.62, and the CD scale, U = 138, p = .002, r = 0.46.

Table 1. Demographic characteristics

| | TD (N=30) M (SD) | TD Range | ADHD-P (N=19) M (SD) | ADHD-P Range | ADHD-M (N=17) M (SD) | ADHD-M Range |
|---|---|---|---|---|---|---|
| Age (years) | 11.13 (0.65) | 10.02 – 12.28 | 11.35 (0.96) | 10.09 – 13.08 | 11.30 (0.92) | 10.06 – 13.02 |
| Estimated IQ | 110.37 (19.57) | 81 – 145 | 99.76 (12.62) a | 81 – 129 | 109.06 (20.32) b | 84 – 145 |
| Test interval (weeks) | 7.44 (0.88) | 6 – 8.71 | 8.51 (0.90) | 7.43 – 10.71 | 8.17 (1.39) | 3.57 – 10.00 |
| DBD in | 2.50 (2.37) | 0 – 8 | 23.05 (3.34) | 16 – 27 | 22.47 (3.22) | 17 – 27 |
| DBD H/I | 1.87 (1.87) | 0 – 7 | 16.26 (7.12) | 3 – 25 | 15.18 | 3 – 24 |
| DBD ODD | 1.53 (1.94) | 0 – 7 | 6.17 (5.18) c | 0 – 10 | 6.71 (5.51) | 1 – 20 |
| DBD CD | 0.33 (1.12) | 0 – 6 | 2.28 (2.74) c | 0 – 10 | 1.35 (1.62) | 0 – 6 |
| MPH dose (mg/day) | | | | | 32.7 | 20 – 45 d |

Note: DBD in = DBD inattention scale, DBD H/I = DBD hyperactivity/impulsivity scale, DBD ODD = DBD oppositional defiant disorder scale, DBD CD = DBD conduct disorder scale.

a N = 17 (missing values for 2 participants), b N = 16 (missing value for 1 participant), c N = 18 (missing value for 1 participant), d N = 13 (missing values for 4 participants).

One outlier was identified for the 1-back level in the TD group, and one outlier was identified in the ADHD-P group on the 2-back level, showing a decrease of 24 points between baseline and follow-up. N-back performance is shown in Table 2. At baseline, there was no difference in performance between the TD, ADHD-P and ADHD-M groups on the 1-back level, χ²(2) = 1.99, p = .37, 2-back level, χ²(2) = 4.92, p = .085, and 3-back level, χ²(2) = 1.38, p = .501.

Table 2. Mean number of correct answers on the N-back task

| | Controls (N=30) M (SD) | Controls Range | ADHD-P (N=19) M (SD) | ADHD-P Range | ADHD-M (N=17) M (SD) | ADHD-M Range |
|---|---|---|---|---|---|---|
| Baseline 1-back | 31 (1.62) | 26 – 32 | 29.63 (3.48) | 22 – 32 | 30.12 (4.54) | 13 – 32 |
| Baseline 2-back | 25.4 (6.66) | 8 – 32 | 21.47 (7.81) | 8 – 32 | 20.76 (8.29) | 6 – 32 |
| Baseline 3-back | 21.06 (7.11) a | 9 – 32 | 17.25 (7.29) b | 4 – 28 | 20.80 (7.95) c | 10 – 32 |
| Follow-up 1-back | 31.13 (1.17) | 28 – 32 | 30.89 (1.45) | 27 – 32 | 31.12 (1.32) | 27 – 32 |
| Follow-up 2-back | 30.17 (2.29) | 24 – 32 | 23.68 (7.58) | 8 – 32 | 28.35 (3.73) | 22 – 32 |
| Follow-up 3-back | 23.69 (6.35) a | 10 – 32 | 16.67 (5.01) d | 11 – 22 | 23.40 (6.11) c | 15 – 32 |

Note: For the 3-back level at follow-up, only children who also completed this level at baseline were included.

a N = 16, b N = 8, c N = 5, d N = 6.

Wilcoxon signed-rank tests were done to investigate whether groups scored better at follow-up than at baseline. Because the test was done separately for each level within each group, the Bonferroni correction was applied, yielding an α of 0.05/3 = 0.017. The TD group scored significantly better at follow-up than at baseline on the 2-back level, Z = -3.54, p < .001, r = .6, but not on the 1-back and 3-back levels, Z = -.314, p = .753, and Z = -1.4, p = .162, respectively. Follow-up performance in the ADHD-P group was not better than baseline performance for the 1-back level, Z = .31, p = .178, 2-back level, Z = 1.54, p = .124, and 3-back level, Z = -.105, p = .917. In the ADHD-M group, follow-up performance was better than baseline performance for the 2-back level, Z = -3.31, p = .001, r = .8, but not for the 1-back level, Z = -.31, p = .757, and 3-back level, Z = -1.83, p = .068. At follow-up, no differences in performance were found between the TD, ADHD-P and ADHD-M groups for the 1-back level, χ²(2) = .31, p = .86, and the 3-back level, χ²(2) = 5.485, p = .064. The groups performed differently at the 2-back level at follow-up, χ²(2) = 12.44, p = .002. Pairwise comparisons (with α = 0.05/3 = 0.017) indicated that the TD and ADHD-P groups differed on the 2-back level, U = 126.5, p = .001, r = 0.48. The TD and ADHD-M groups did not differ, U = 173.5, p = .06. In addition, there was no difference in 2-back performance between the ADHD-P and ADHD-M groups, U = 107, p = .087. Mean changes in performance between baseline and follow-up were similar in the TD, ADHD-P and ADHD-M groups for the 1-back level, F(2, 63) = .81, p = .451, 2-back level, F(2, 63) = 2.16, p = .124, and 3-back level, F(2, 63) = .46, p = .64.
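The corrected significance level and the effect sizes accompanying the Wilcoxon tests can be checked with a small sketch. The formula r = |Z| / √N is an assumption: it is a common convention for nonparametric tests and gives values consistent with those reported above, but the formula actually used is not stated. The function names are hypothetical.

```python
import math


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def effect_size_r(z: float, n: int) -> float:
    """Effect size r = |Z| / sqrt(N) (assumed convention)."""
    return abs(z) / math.sqrt(n)


print(round(bonferroni_alpha(0.05, 3), 3))   # three N-back levels per group
print(round(effect_size_r(-3.54, 30), 1))    # TD group, 2-back level
print(round(effect_size_r(-3.31, 17), 1))    # ADHD-M group, 2-back level
```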

Reliable Change Estimates

Test-retest reliabilities are listed in Table 3. In the ADHD-P group, only six children completed the 3-back level on both occasions. This sample size was considered too small to reliably investigate test-retest reliability.

Table 3. RC estimates

| Group | Level | N | r | SEdiff | DS | RC estimate |
|---|---|---|---|---|---|---|
| TD | 1-back | 30 | .23/.30* | 1.75 | 0.13 | -3 ≥ RC ≥ 3 |
| TD | 2-back | 30 | .14 | 6.53 | 4.77 | -6 ≥ RC ≥ 16 |
| TD | 3-back | 16 | .53 | 6.00 | 2.63 | -8 ≥ RC ≥ 13 |
| ADHD-P | 1-back | 19 | .20 | 3.37 | 1.26 | -5 ≥ RC ≥ 7 |
| ADHD-P | 2-back | 19 | .40/.64* | 8.43 | 2.2 | -12 ≥ RC ≥ 16 / -8 ≥ RC ≥ 12 |

Note: r = test-retest reliability, SEdiff = standard error of the difference, DS = difference score, RC estimate = critical values for reliable change, * correlations with and without outliers.


Fisher’s Z transformation indicated that test-retest reliabilities were similar in the TD and ADHD-P groups for the 1-back level, Z = 0.1, p = .460, and the 2-back level, Z = -0.9, p = .184. However, when omitting the outlier for the 2-back level, test-retest reliabilities differed between the TD and ADHD-P groups, Z = -1.92, p = .027.

The results of the application of the RC estimates are shown in Table 4. To evaluate the RC estimates, they were also applied to the groups (TD and ADHD-P) on which the calculations were based. In the TD group, 10 – 19% of the children would be classified as showing reliable change. In the ADHD-P group, 11 – 32% of the children would be classified as showing reliable change.

Table 4. Application of the RC estimates

| | 1-back TD | 1-back ADHD-P | 1-back ADHD-M | 2-back TD | 2-back ADHD-P | 2-back ADHD-M | 3-back TD | 3-back ADHD-P | 3-back ADHD-M |
|---|---|---|---|---|---|---|---|---|---|
| RCTD − (%) | 10 | | 6 | 0 | | 0 | 6 | | 0 |
| RCADHD-P − (%) | | 0 | 0 | | 5 | 0 | | | |
| RCADHD-Po − (%) | | | | | 16 | 0 | | | |
| RCTD + (%) | 7 | | 12 | 10 | | 18 | 13 | | 0 |
| RCADHD-P + (%) | | 11 | 6 | | 5 | 18 | | | |
| RCADHD-Po + (%) | | | | | 16 | 24 | | | |

Note: RC… − = percentage of children showing negative reliable change based on calculations in the group indicated, RC… + = percentage of children showing positive reliable change based on calculations in the group indicated, ADHD-Po = calculations based on the ADHD-P group when the outlier was omitted.

Not only positive change was observed; some children performed worse at follow-up than at baseline. Both TD and ADHD-P estimates were applied to the ADHD-M group, with 24% being the highest percentage of children showing reliable change. Fisher’s exact test indicated that the proportions of individuals in the ADHD-P and ADHD-M groups showing positive reliable change were similar on the 1-back level (p = .543, one-sided) and the 2-back level, both with (p = .260, one-sided) and without the outlier (p = .434, one-sided).
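The one-sided Fisher's exact tests can be sketched with the hypergeometric distribution. The counts used below are inferred, as an assumption, from the rounded percentages in Table 4 and the group sizes (for example 6% of 17 taken as 1 child and 11% of 19 taken as 2 children); the function name is hypothetical. Under these assumptions the sketch gives p-values close to those reported.

```python
from math import comb


def fisher_exact_one_sided(a: int, b: int, c: int, d: int,
                           alternative: str = "greater") -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Row 1 = ADHD-M (a with reliable change, b without),
    row 2 = ADHD-P (c with reliable change, d without).
    "greater" sums P(X >= a), "less" sums P(X <= a), with the margins fixed.
    """
    row1, row2 = a + b, c + d
    col1, total = a + c, a + b + c + d
    smallest = max(0, col1 - row2)   # smallest feasible count in cell a
    largest = min(row1, col1)        # largest feasible count in cell a

    def pmf(k: int) -> float:
        return comb(row1, k) * comb(row2, col1 - k) / comb(total, col1)

    if alternative == "greater":
        ks = range(a, largest + 1)
    elif alternative == "less":
        ks = range(smallest, a + 1)
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    return sum(pmf(k) for k in ks)


# 1-back: 1 of 17 ADHD-M children versus 2 of 19 ADHD-P children.
p_1back = fisher_exact_one_sided(1, 16, 2, 17, alternative="less")
# 2-back with outlier: 3 of 17 ADHD-M versus 1 of 19 ADHD-P.
p_2back = fisher_exact_one_sided(3, 14, 1, 18, alternative="greater")
# 2-back without outlier: 4 of 17 ADHD-M versus 3 of 19 ADHD-P.
p_2back_no_outlier = fisher_exact_one_sided(4, 13, 3, 16, alternative="greater")
print(round(p_1back, 3), round(p_2back, 3), round(p_2back_no_outlier, 3))
```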

Exploratory analyses.

Results of the bootstrap analyses are shown in Table 5. As data of only 6 and 5 children were available for the 3-back level in the ADHD-P and ADHD-M groups, respectively, bootstrap analyses were not considered appropriate for these data. Except for the 1-back level, mean and median change scores are highly variable.

Table 5. 95% confidence intervals of mean and median change scores

| | Controls | Placebo | Methylphenidate |
|---|---|---|---|
| Mean: 1-back change score | -0.45 – 0.77 | -0.12 – 3.00 | -0.71 – 3.80 |
| Mean: 2-back change score | 2.55 – 7.12 | -2.24 – 6.23 | -5.06 – 11.40 |
| Mean: 3-back change score | -0.59 – 6.11 | | |
| Median: 1-back change score | 0 – 0 | -1 – 2 | -1 – 1 |
| Median: 2-back change score | 1 – 5 | 0 – 7 | 1 – 8 |
| Median: 3-back change score | 0 – 4 | | |

Note: Bootstrap results are based on 1000 samples.


Discussion

In the present study, RC estimates for an N-back task of working memory were established. Because this task is used in a large, randomized controlled trial in which efficacy of methylphenidate is evaluated, it is important to investigate whether increased performance at follow-up reflects a real improvement in cognitive abilities. The reliable change index is a method for investigating when change in performance on the individual level exceeds the change that is expected due to task characteristics, such as reliability and practice effects. Calculating reliable change indices could enhance clinical utility of cognitive tasks because they allow for the evaluation of results on the individual level, whereas traditional statistics evaluate performance at the group level.

In this study, the reliable change estimates calculated did not enhance clinical utility of the N-back task. Because of the large variance in both performance and change scores between baseline and follow-up, the range of the reliable change estimates is very large. An increase of many points from baseline to follow-up was needed to be classified as showing reliable change. Hence, the RC estimates were not very sensitive in detecting change, and the proportion of children in the ADHD-M group showing reliable change was similar to the proportion of children showing reliable change in the ADHD-P group. This is not consistent with prior research, as the RCIPE generally yields good results (Heaton et al., 2001; Temkin, Heaton, Grant & Dikmen, 1999; Hinton-Bayre, 2010), and MPH has been found to be effective in improving WM in ADHD (Bedard & Tannock, 2008; Bedard, Jain, Johnson, & Tannock, 2007; Kobel et al., 2008; Mehta, Goodyer, & Sahakian, 2004).

The three elements contributing to the RC estimates are the SDs at both test occasions and the test-retest reliability. Most SDs found in this study are quite large, even in the TD group. More troublesome, however, are the low test-retest reliabilities. It seems that factors other than change in working memory, for example attention or motivation, contributed to N-back performance. Even though it was identified as an outlier, a decrease of 24 points at follow-up (out of a total of 32 points) indicates that factors other than WM ability influence test performance. Test-retest reliabilities are important not only in calculating RC estimates, but also when evaluating medication results on the group level, or even when evaluating results of a single test session. For the 1-back condition, low test-retest reliabilities might be explained by ceiling effects. This might also be the case for the 2-back level in control children, in which ceiling effects seem to be present at follow-up. Only in the ADHD-P group was 2-back reliability modest after removal of the outlier, and comparable with the test-retest reliabilities found by van Leeuwen, van den Berg, Hoekstra, and Boomsma (2007). Hinton-Bayre (2010) states that it could be argued that interpretation of change cannot be justified in settings where test-retest reliability is < 0.70. Hence, this task may not be appropriate for measuring change due to its psychometric properties. The failure to classify children in the ADHD-M group as showing reliable change may be due to task characteristics, not to the RC method employed.
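The dependence of the RC range on the two SDs and the test-retest reliability can be made concrete with a small numerical sketch. The Python code below assumes that the standard error of the difference takes the form sqrt(SEM1^2 + SEM2^2) with SEM = SD * sqrt(1 - r), and that a practice-adjusted interval is built around the mean change in the reference group. The function names, the 90% multiplier of 1.645, and all input values are illustrative assumptions and are not taken from the data of this study.

```python
import math


def se_diff(sd1: float, sd2: float, r: float) -> float:
    """Standard error of the difference, built from both SDs and reliability r."""
    sem1 = sd1 * math.sqrt(1.0 - r)  # standard error of measurement at baseline
    sem2 = sd2 * math.sqrt(1.0 - r)  # standard error of measurement at follow-up
    return math.sqrt(sem1 ** 2 + sem2 ** 2)


def rc_interval(sd1, sd2, r, practice=0.0, z=1.645):
    """Range of change scores that would NOT count as reliable change.

    `practice` is the mean change in the reference group (assumed input).
    """
    half_width = z * se_diff(sd1, sd2, r)
    return practice - half_width, practice + half_width


def shows_reliable_change(x1, x2, sd1, sd2, r, practice=0.0, z=1.645):
    """True if the individual change score falls outside the RC interval."""
    lower, upper = rc_interval(sd1, sd2, r, practice, z)
    change = x2 - x1
    return change < lower or change > upper


if __name__ == "__main__":
    # Hypothetical SDs of 5 points and a hypothetical practice effect of 1 point.
    for r in (0.30, 0.70, 0.90):
        lo, hi = rc_interval(sd1=5.0, sd2=5.0, r=r, practice=1.0)
        print(f"r = {r:.2f}: change must fall outside [{lo:.2f}, {hi:.2f}]")
```

Under these assumptions the interval narrows as r increases, which is why a low test-retest reliability by itself can make an RC criterion insensitive, independent of any true treatment effect.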

Indeed, no group effects of MPH were found either, in contrast with earlier research on MPH effects and WM in ADHD (Bedard & Tannock, 2008; Bedard, Jain, Johnson, & Tannock, 2007; Kobel et al., 2008; Mehta, Goodyer, & Sahakian, 2004). The low test-retest reliabilities might also explain this result. Another explanation might be that the medication dosage in this study was not sufficient to affect cognitive abilities. This is not very likely, as children were closely monitored by a psychiatrist during the study. In addition, optimal titration was used to determine the appropriate dosage, so when no effect of MPH was noticed, the dosage was increased.

Furthermore, test-retest reliabilities of the TD and ADHD-P groups were compared and were not found to differ for the 1-back level. Without removal of the outlier, test-retest stabilities for the 2-back level were not found to differ either. This is similar to results of prior research, in which performance stability was also found to be similar in control children and children with ADHD (Mollica, Maruff, & Vance, 2004). However, in the ADHD-P group, removing one outlier on the 2-back level had a substantial effect on the test-retest reliability. After doing so, test-retest stability was found to be higher in the ADHD-P group. The data suggest a ceiling effect for the TD group at the 2-back level at follow-up, which is not present in the ADHD-P group: at follow-up, the TD group performs significantly better than the ADHD-P group on the 2-back level. One other study investigated test-retest reliabilities of this specific task (van Leeuwen, van den Berg, Hoekstra, & Boomsma, 2007). They found a test-retest reliability of .65 for the 2-back level and of .70 for the 3-back level in a sample of children aged 8 – 11. However, the mean age was 8.7, which is somewhat younger than in the current study. In their sample of adolescents aged 14 – 20 (M = 18.4), test-retest reliabilities were .16 for the 2-back level and .70 for the 3-back level. They attributed the low test-retest reliability for the 2-back level to ceiling effects. As discussed, this might also be the case for the TD group in the present study. The 2-back test-retest reliability coefficient in the ADHD-P group (without outlier) was .64, which is similar to the reliability of .65 found by van Leeuwen et al. (2007) in children aged 8 – 11. The prefrontal cortex, and particularly the DL-PFC, which is involved in working memory, appears to be the last brain region to mature (Casey, Giedd, & Thomas, 2000). These findings suggest that development of the DL-PFC in children with ADHD lags behind that in typically developing children. Indeed, the DL-PFC is frequently implicated in ADHD (Cubillo, Halari, Smith, Taylor, & Rubia, 2012; Dickstein, Bannon, Castellanos, & Milham, 2006; Nigg & Casey, 2005; for development see Krain & Castellanos, 2006).

As the test-retest reliabilities described by van Leeuwen et al. (2007) were highest for the 3-back level, even in children aged 8 – 11, the cut-off criterion employed in the current study is a limitation. However, ethical considerations played a part in this decision. The neuropsychological assessment already takes about an hour. In addition, children generally find the N-back task quite hard, and forcing them to do the 3-back level would cause frustration and fatigue. Van Leeuwen et al. (2007) administered only a 1-back practice block. Doing this, and always administering the 3-back level, would be a good suggestion for future research using the N-back task. The test-retest reliability of the 3-back level in the TD group, however, is lower than that reported by van Leeuwen et al. (2007). This might be explained by the set-up of their task, which differed slightly from that of the current study. In their task, each level contained one practice block and three blocks in which performance was measured, with each block consisting of 20 trials, yielding a maximum score of 60 for each level. Hence, this might be a better way of administering the task. In addition, the cut-off score employed in the current study may have influenced the 3-back test-retest reliability. It should be noted, however, that the test-retest interval in van Leeuwen et al. (2007) was between two and three weeks. As shorter test-retest intervals have been associated with higher reliability (Duff, Beglinger, Moser, & Paulsen, 2010), this may also explain the difference. If the N-back task is to be used in future research, the most appropriate set-up and the ideal test-retest interval should first be rigorously evaluated.

Another limitation of the current study is that sample sizes were relatively small. In addition, a large amount of variance in performance was present. This is illustrated by the fact that a single outlier had a large influence on the results, and that even bootstrap analyses yielded widely dispersed estimates. Larger samples would clarify whether performance on this task remains as variable.
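How strongly a single case can affect a test-retest correlation in a small sample can be illustrated with a percentile bootstrap. The Python sketch below is a generic illustration and is not the bootstrap procedure used in this study; the scores, the function names, the number of resamples, and the seed are all hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation; returns None if either variable has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for r, resampling children (pairs) with replacement."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson_r([x[i] for i in idx], [y[i] for i in idx])
        if r is not None:  # skip degenerate resamples without variance
            estimates.append(r)
    if not estimates:
        raise ValueError("no usable bootstrap resamples")
    estimates.sort()
    lower = estimates[int((alpha / 2) * len(estimates))]
    upper = estimates[min(len(estimates) - 1, int((1 - alpha / 2) * len(estimates)))]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical baseline and follow-up scores; the last child drops sharply.
    baseline = [20, 22, 25, 27, 28, 24, 26, 30, 23, 29, 21, 28]
    followup = [22, 21, 27, 26, 30, 25, 24, 31, 22, 28, 23, 4]
    print("with outlier:   ", pearson_r(baseline, followup), bootstrap_ci(baseline, followup))
    print("without outlier:", pearson_r(baseline[:-1], followup[:-1]),
          bootstrap_ci(baseline[:-1], followup[:-1]))
```

Comparing the two intervals shows how much of the dispersion is driven by one observation, which is the kind of check that larger samples would make less fragile.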

The overall conclusion of this study, with the data currently available, is that the clinical utility of this task is very limited and that the task is therefore not appropriate for measuring change in cognitive abilities. In future research aimed at investigating MPH effects (or any other treatment effects) on cognitive abilities, it is imperative to include valid, objective measures of cognitive function that may be administered repeatedly (Pietrzak, Mollica, Maruff, & Snyder, 2006).

References

Alderson, R. M., Kasper, L. J., Hudec, K. L., & Patros, C. H. G. (2013). Attention-deficit/hyperactivity disorder (ADHD) and working memory in adults: a meta-analytic review. Neuropsychology, 27(3), 287–302. doi:10.1037/a0032371

American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.

Baddeley, A. (2003). Working memory: Looking back and looking forward. Nature Reviews Neuroscience, 4(10), 829–839. doi:10.1038/nrn1201

Baddeley, A. (2010). Working memory. Current Biology, 20(4), R136–R140. doi:10.1016/j.cub.2009.12.014

Barkley, R. A. (1997). Behavioral inhibition, sustained attention, and executive functions: constructing a unifying theory of ADHD. Psychological Bulletin, 121, 65–94.

Bedard, A.-C., Jain, U., Johnson, S. H., & Tannock, R. (2007). Effects of methylphenidate on working memory components: influence of measurement. Journal of Child Psychology and Psychiatry, and Allied Disciplines, 48(9), 872–880.

Bedard, A.-C., & Tannock, R. (2008). Anxiety, methylphenidate response, and working memory in children with ADHD. Journal of Attention Disorders, 11(5), 546–57. doi:10.1177/1087054707311213

Bottelier, M. A., Schouw, M. L. J., Klomp, A., Tamminga, H. G. H., Schrantee, A. G. M., Bouziane, C., . . . Reneman, L. (2014). The effects of Psychotropic drugs On Developing brain (ePOD) study: methods and design. BMC Psychiatry, 14, 48. doi:10.1186/1471-244X-14-48

Casey, B. J., Giedd, J. N., & Thomas, K. M. (2000). Structural and functional brain development and its relation to cognitive development. Biological Psychology, 54, 241−257.

Castellanos, F. X., & Tannock, R. (2002). Neuroscience of attention-deficit/hyperactivity disorder: the search for endophenotypes. Nature Reviews. Neuroscience, 3(8), 617–28. doi:10.1038/nrn896

Chelune, G. J., Naugle, R. I., Luders, H., Sedlak, J. & Awad, I. A. (1993). Individual change after epilepsy surgery: Practice effects and base-rate information. Neuropsychology, 7, 41–52.

Coghill, D. R., Seth, S., Pedroso, S., Usala, T., Currie, J., & Gagliano, A. (2013). Effects of methylphenidate on cognitive functions in children and adolescents with attention-deficit/hyperactivity disorder: evidence from a systematic review and a meta-analysis. Biological psychiatry, Epub ahead of print. doi: 10.1016/j.biopsych.2013.10.005.

Cubillo, A., Halari, R., Smith, A., Taylor, E., & Rubia, K. (2012). A review of fronto-striatal and fronto-cortical brain abnormalities in children and adults with Attention Deficit Hyperactivity Disorder (ADHD) and new evidence for dysfunction in adults with ADHD during motivation and attention. Cortex, 48(2), 194–215. doi:10.1016/j.cortex.2011.04.007

Dickstein, S. G., Bannon, K., Castellanos, F. X., & Milham, M. P. (2006). The neural correlates of attention deficit hyperactivity disorder: an ALE meta-analysis. Journal of Child Psychology and Psychiatry, 47(10), 1051–1062.

De Jonge, P., & De Jonge, P. F. (1996). Working memory, intelligence, and reading ability in children. Personality and Individual Differences, 21, 1007–1020.

Douglas, V. I. (1999). Cognitive control processes in Attention-Deficit Hyperactivity Disorder. In Quay HC, Hogan AE (eds). Handbook of Disruptive Behaviour Disorders (105–138). New York: Plenum Press.

Duff, K. (2012). Evidence-based indicators of neuropsychological change in the individual patient: Relevant concepts and methods. Archives of Clinical Neuropsychology, 27(3), 248-261.

Duff, K., Paulsen, J., Mills, J., Beglinger, L. J., Moser, D. J., Smith, M. M., ... & Harrington, D. L. (2010). Mild cognitive impairment in prediagnosed Huntington disease. Neurology, 75(6), 500–507.

Ferdinand, R.F., Van der Ende, J., & Mesman, J. (1998). Diagnostic Interview Schedule for Children, DISC-IV. Nederlandse vertaling [Dutch translation]. Unpublished manuscript. Sophia Kinderziekenhuis, Rotterdam.

Gevins, A., & Cutillo, B. (1993). Spatiotemporal dynamics of component processes in human working memory. Electroencephalography and clinical neurophysiology, 87(3), 128-143.

Greenhill, L. L., & Ford, R. (2002). Childhood attention deficit hyperactivity disorder: pharmacological treatments. In J. M. Gorman & P. E. Nathan (Eds.), A guide to treatments that work (2nd Ed.). New York: Oxford University Press.

Heaton, R. K., Temkin, N., Dikmen, S., Avitable, N., Taylor, M. J., Marcotte, T. D., & Grant, I. (2001). Detecting change: A comparison of three neuropsychological methods, using normal and clinical samples. Archives of Clinical Neuropsychology : The Official Journal of the National Academy of Neuropsychologists, 16(1), 75–91.

Heilbronner, R. L., Sweet, J. J., Attix, D. K., Krull, K. R., Henry, G. K., & Hart, R. P. (2011). Official position of the American Academy of Clinical Neuropsychology on serial neuropsychological assessments: The utility and challenges of repeat test administrations in clinical and forensic contexts. The Clinical Neuropsychologist, 24(8), 1267–1278.

Hinton-Bayre, A. D. (2010). Deriving Reliable Change statistics from test–retest normative data: Comparison of models and mathematical expressions. Archives of Clinical Neuropsychology, 25(3), 244-256.

Hoaglin, D. C. (2003). John W. Tukey and data analysis. Statistical Science, 18(3), 311-318.

Iverson, G. L. (2001). Interpreting change on the WAIS-III/WMS-III in clinical samples. Archives of Clinical Neuropsychology, 16(2), 183-191.

Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12–19.

Jansma, J. M., Ramsey, N. F., Coppola, R., & Kahn, R. S. (2000). Specific versus nonspecific brain activity in a parametric N-back task. NeuroImage, 12, 688–697.

Kane, M. J., Conway, A. R. A., Miura, T. K., & Colflesh, G. J. H. (2007). Working memory, attention control, and the N-back task: A cautionary tale of construct validity. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 615-622.

Kasper, L. J., Alderson, R. M., & Hudec, K. L. (2012). Moderators of working memory deficits in children with attention-deficit/hyperactivity disorder (ADHD): A meta-analytic review. Clinical psychology review, 32(7), 605-617.

Klingberg, T., Fernell, E., Olesen, P. J., Johnson, M., Gustafsson, P., . . . Forssberg, H. (2005). Computerized training of working memory in children with ADHD – A randomized controlled trial. Journal of the American Academy of Child & Adolescent Psychiatry, 44(2), 177–186.

Kobel, M., Bechtel, N., Weber, P., Specht, K., Klarhöfer, M., Scheffler, K., Penner, I.-K. (2008). Effects of methylphenidate on working memory functioning in children with attention deficit/hyperactivity disorder. European Journal of Paediatric Neurology : EJPN : Official Journal of the European Paediatric Neurology Society, 13(6), 516–23. doi:10.1016/j.ejpn.2008.10.008

Kort, W., Compaan, E. L., Bleichrodt, N., Resing, W. C. N., Schittekatte, M., Vermeir, G., & Verhaeghe, P. (2002). Nederlandse bewerking van: Wechsler, D. (1991). Wechsler Intelligence Scale for Children, 3rd Edition. Amsterdam: NIP Dienstencentrum.

Krain, A.L., & Castellanos, F.X. (2006). Brain development and ADHD. Clinical Psychology Review, 26, 433- 444.

Maassen, G. (2005). Reliable change assessment in sport concussion research: a comment on the proposal and reviews of Collie et al. British Journal of Sports Medicine, 483–489. doi:10.1136/bjsm.2004.015594

Maassen, G. H., Bossema, E., & Brand, N. (2009). Reliable change and practice effects: outcomes of various indices compared. Journal of Clinical and Experimental Neuropsychology, 31, 339–352.

Martinussen, R., Hayden, J., Hogg-Johnson, S., & Tannock, R. (2005). A meta-analysis of working memory impairments in children with attention-deficit/hyperactivity disorder. Journal of the American Academy of Child and Adolescent Psychiatry, 44(4), 377–84. doi:10.1097/01.chi.0000153228.72591.73

McCaffrey, R. J., & Westervelt, H. J. (1995). Issues associated with repeated neuropsychological assessments. Neuropsychology Review, 5, 203–221.

McCaffrey, R.J., Ortega, A., Orsillo, S.M., Nelles, W.B., & Haase, R.F. (1992). Practice effects in repeated neuropsychological assessments. Clinical Neuropsychologist, 6, 32– 42.

Mehta, M. A., Goodyer, I. M., & Sahakian, B. J. (2004). Methylphenidate improves working memory and set-shifting in AD/HD: Relationships to baseline memory capacity. Journal of Child Psychology and Psychiatry, 45, 293–305.

Mollica, C. M., Maruff, P., & Vance, A. (2004). Development of a statistical approach to classifying treatment response in individual children with ADHD. Human Psychopharmacology: Clinical and Experimental, 19, 445–456. doi:10.1002/hup.624

Nigg, J.T., & Casey, B.J. (2005). An integrative theory of attention-deficit/hyperactivity disorder based on the cognitive and affective neurosciences. Development and Psychopathology, 17, 785-806.

Nigg, J. T., Willcutt, E., Doyle, A. E., & Sonuga-Barke, J. S. (2005). Causal heterogeneity in ADHD: do we need neuropsychologically impaired subtypes? Biological Psychiatry, 57, 1224–1230.

Oosterlaan, J., Baeyens, D., Scheres, A., Antrop, I., Roeyers, H. & Sergeant, J.A. (2008). Vragenlijst voor gedragsproblemen bij kinderen 6-16 jaar, Handleiding. Amsterdam: Harcourt Publishers.

Pietrzak, R.H., Mollica, C.M., Maruff, P., & Snyder, P.J. (2006). Cognitive effects of immediate-release methylphenidate in children with attention-deficit/hyperactivity disorder. Neuroscience and Biobehavioral Reviews, 30, 1225 – 1245.

Sergeant, J., Oosterlaan, J. & Van der Meere, J. (1999). Handbook of Disruptive Behavior Disorders (eds Quay, H. C. & Hogan, A. E.) 75–104 Plenum, New York.

Shaffer, D., Fisher, P., Lucas, C. P., Dulcan, M. K., & Schwab-Stone, M. E. (2000). NIMH Diagnostic Interview Schedule for Children Version IV (NIMH DISC-IV): description, differences from previous versions, and reliability of some common diagnoses. Journal of the American Academy of Child & Adolescent Psychiatry, 39(1), 28-38.

Temkin, N. R., Heaton, R. K., Grant, I., & Dikmen, S. S. (1999). Detecting significant change in neuropsychological test performance: a comparison of four models. Journal of the International Neuropsychological Society, 5(4), 357–369.

Tombaugh, T. N. (2005). Test-retest coefficients and 5-year change scores for the MMSE and 3MS. Archives of Clinical Neuropsychology, 20(4), 485–503.

Sonuga-Barke, E. J. (2002). Psychological heterogeneity in AD/HD — a dual pathway model of behaviour and cognition. Behav. Brain Res. 130, 29–36.

Van der Oord, S., Ponsioen, A. J. G. B., Geurts, H. M., Ten Brink, E. L., & Prins, J. M. (2012). A pilot study on the efficacy of a computerized executive function remediation training with game elements for children with ADHD in an outpatient setting: Outcome on parent- and teacher-rated executive functioning and ADHD behavior. Journal of Attention Disorders, in press. doi:10.1177/1087054712453167

Van Leeuwen, M., van den Berg, S. M., Hoekstra, R. A., & Boomsma, D. I. (2007). Endophenotypes for intelligence in children and adolescents. Intelligence, 35(4), 369–380. doi:10.1016/j.intell.2006.09.008

Waschbusch, D. A., Pelham, W. E., Waxmonsky, J., & Johnston, C. (2009). Are there placebo effects in the medication treatment of children with attention-deficit hyperactivity disorder? Journal of Developmental and Behavioral Pediatrics, 30(2), 158–168.

Willcutt, E. G., Doyle, A. E., Nigg, J. T., Faraone, S. V., & Pennington, B. F. (2005). Validity of the executive function theory of attention-deficit/hyperactivity disorder: A meta-analytic review. Biological Psychiatry, 57, 1336-134.
