Academic year: 2021
Internship Paper for Research Master Study

From What Age on can Children Provide Valid Self-reports on The Strengths and Difficulties Questionnaire? Support from The Validity-Index Approach

Yan Liu

Supervisors: dr. J. M. Conijn and dr. N. Smits

Author Note

Yan Liu, Child Development and Education Department, University of Amsterdam; Judith M. Conijn, Section of Educational Sciences, VU University Amsterdam;

Niels Smits, Child Development and Education Department, University of Amsterdam. This research is part of Yan Liu's work in the research master's program at the Child Development and Education Department, University of Amsterdam.


Abstract

The self-report version of the Strengths and Difficulties Questionnaire (SDQ) is widely used across countries for adolescents to report their psychopathological symptoms. The age range suggested by the developers of the SDQ is eleven to sixteen, but previous studies have also implied that the SDQ may generate valid self-reports from children as young as eight. The present study aimed to provide additional psychometric evidence on the quality of SDQ self-reports for children of different ages. We used the statistical validity-index approach (Conijn, Smits, & Hartman, 2019) to address our study aim. In this approach, we adopted the maximum long string index to quantify children’s repetitive response bias for 267 paired child-teacher reports and the Guttman indices to quantify inconsistent responses for 785 children’s self-reports.

The analysis of the maximum long string index indicated that children’s self-reports were biased by repetitive response patterns to a degree comparable to the teacher reports. The analyses of the Guttman indices (𝐺𝑝 and 𝐺𝑛𝑝) showed that the inconsistencies in children’s self-reports decreased with age. From age 10 onwards, the classification rate of inconsistent response patterns in children’s self-reports remained relatively stable.

Our study suggests that children aged ten and above can provide sufficiently valid self-report SDQ data. Inconsistent response bias interferes with the validity of responses of children below ten years on the SDQ, whereas repetitive response bias does not substantially interfere with the response behavior of children aged eight and above. Due to the limitations of the dataset, further studies are needed to reach a firm conclusion.

Keywords: Strengths and Difficulties Questionnaire (SDQ), validity-index approach, long string index, Guttman index, child measure


The Strengths and Difficulties Questionnaire (SDQ) (Goodman, 1997; Goodman, 2001) has been widely used around the world to assess the psychological adjustment of children aged three to 16 (www.sdqinfo.com). The SDQ consists of five subscales, four about difficulties children may experience (i.e., emotional symptoms, conduct problems, hyperactivity-inattention problems, and peer problems) and one about the strengths demonstrated in their behavior (i.e., prosocial behavior). Each subscale includes five items, making the SDQ a 25-item questionnaire. Identical or nearly identical SDQ versions are available for parents, teachers, and children, here denoted the parent version, teacher version, and self-report version of the SDQ, respectively. Because of its simple wording, multiple dimensions, and short length, the SDQ is popular in different fields for measuring the mental health of children. In educational and social-work settings, the SDQ is used to screen for mental disorders in children. In clinical contexts, the SDQ is used to produce informative results about the type and severity of potential disorders and to assess treatment outcomes. In research, the SDQ is used to conduct both developmental and epidemiological studies.

The self-report version of the SDQ was developed for children aged 11 to 16 to report their emotional and behavioral symptoms (www.sdqinfo.com). However, whether 11 years is the minimum age at which children are capable of providing valid self-report data on the SDQ has not been sufficiently supported by existing psychometric evidence. On the one hand, the self-reports of children aged 11 years and above have been reported to show satisfactory properties, such as internal consistency (Cronbach's 𝛼, sometimes excluding the conduct problem and/or peer problem subscales), inter-rater correlation, test-retest reliability, and convergent validity (e.g., Becker, Hagenberg, Roessner, Woerner, & Rothenberger, 2004; Vugteveen, de Bildt, Serra, de Wolff, & Timmerman, 2018; Goodman, Meltzer, & Bailey, 1998); on the other hand, whether the self-reports of children aged 10 years (or younger) are substantially less valid than the self-reports of children aged 11 years (and above) remains unknown. Therefore, more research is needed to collect additional psychometric evidence on the validity of children’s self-reports on the SDQ.

The studies including children younger than 11 produced mixed findings on the psychometric properties of children’s self-reports on the SDQ (Muris, Meesters, & van den Berg, 2003; Muris, Meesters, Eijkelenboom, & Vincken, 2004; Van Roy, Veenstra, & Clench-Aas, 2008). For example, Muris et al. (2003) administered the self-report SDQ among children aged 9 to 15 years (i.e., a 9-to-12-year group and a 13-to-15-year group). The psychometric properties of the data were roughly satisfactory, including inter-rater correlation, 2-month test-retest reliability, concurrent validity, and internal consistency (Cronbach’s 𝛼), except for the conduct problem and peer problem subscales. The low 𝛼 values for these two subscales were attributed to the positively worded, reverse-scored items. The child-parent agreement for the 9-to-12-year group was comparable to that for the 13-to-15-year group. In another study, Muris et al. (2004) included children aged 8 to 13 (an 8-to-10-year group and an 11-to-13-year group). The estimated Cronbach’s 𝛼 for the complete sample was comparable to that for data collected among children aged 11 and above. However, Cronbach’s 𝛼 for the 8-to-10-year group was overall unacceptable. A general issue with these studies is that no psychometric properties were reported for individual age groups; rather, they reported the psychometric properties of data from mixed-age groups. The results derived from mixed-age-group comparisons of Cronbach’s 𝛼 and inter-rater correlations were not sufficiently informative to reach a firm conclusion about the minimum age for using the self-report version of the SDQ.


Age issue for child measures

Determining the age at which children are capable of providing valid self-reports is a crucial issue when researching children (e.g., Landgraf, van Grieken, & Raat, 2018; U.S. Food and Drug Administration, 2009), and must be assessed for every measure. Different questionnaires, depending on their specific properties, demand different levels of cognitive ability, attention, and time to guarantee basic validity. For example, longer questionnaires require respondents to concentrate for a longer time; items about reflection on the self are more cognitively and verbally demanding than items touching upon concrete emotions and behaviors (Damon & Hart, 1988; Eder, 1989); and reporting on emotions is more complicated than reporting on overt behaviors (Eddy, Khastou, Cook, & Amtmann, 2011). When the requirements go beyond the abilities or willingness of children, they cannot fully grasp what the items are about, or they cannot concentrate long enough to work through all the items. As a result, their responses may be biased, either by carelessness or by misunderstanding. Because children’s capability of completing questionnaires is closely related to their developmental stage (Davis-Kean & Sandler, 2001), studying the minimum age for applying a child measure is a fundamental approach to ensuring that the respondents can provide valid responses.

Approaches to studying the minimum age

In the literature, internal consistency (Cronbach’s 𝛼), inter-rater correlation, and test-retest correlation have often been used to assess the quality of children’s self-reports on the SDQ (Goodman et al., 1998; Muris et al., 2004; Muris et al., 2003). However, these indicators do not give a complete or unbiased picture of the validity of children’s self-reports (Conijn, Smits, & Hartman, 2019).


First, Cronbach’s 𝛼 expresses the extent to which the item scores in a (sub)scale correlate with each other (Liu, Wu, & Zumbo, 2010). A higher 𝛼 value indicates higher consistency across items, with .60 commonly taken as the threshold for acceptability. However, Cronbach’s 𝛼 might also be high as the result of an artifact: it may be positively biased when all items are worded in the same direction and the same response option is selected repeatedly (Liu et al., 2010; Vaske, Beaman, & Sponarski, 2017; Peer & Gamliel, 2011). In the SDQ, all items in the prosocial behavior subscale are worded positively, and all items in the emotional problem subscale are worded negatively. As a result of this setup, blindly selecting the same response option for each item in these two subscales would produce a highly consistent response pattern. Therefore, instead of representing high reliability of the subscale, a high 𝛼 might also be a sign of biased response patterns. Second, inter-rater reliability expresses the extent to which multiple raters show consensus. However, parents’ or teachers’ reports cannot be considered the gold standard for screening a child’s psychopathological symptoms, so using agreement with their reports to indicate the quality of children’s self-reports may result in invalid conclusions (Rey, Schrader, & Morris-Yates, 1992). Third, using test-retest reliability to evaluate the quality of children’s self-reports may also be problematic. Because children may develop so quickly that the true scores of their characteristics change within a short period, lower test-retest reliability does not necessarily mean that their responses are unreliable (Roberts & DelVecchio, 2000). Fourth, inter-rater correlations and test-retest correlations are generally derived from the (sub)scale scores, which misses the item-level information in children’s self-reports, such as information on specific response biases.
For instance, the same total score may be derived from two utterly different response patterns: both response patterns 3, 4, 1 and 1, 3, 4 yield a total score of 8. Studying children’s response patterns may therefore provide further valuable information for understanding the quality of children’s self-reports on the SDQ.

This study, therefore, adopted the validity-index approach (Conijn et al., 2019) to study children’s response patterns on the self-report version of the SDQ. Specifically, we studied whether 8- to 12-year-old children’s self-reports on the SDQ were biased, and if so, what kind(s) of bias were at play and to what degree.

Validity-index approach

The validity-index approach makes use of various post hoc validity indices to quantify different types of non-optimal response patterns. For example, the extreme response index detects the tendency to select the extreme options more often than the in-between options. The (dis)agreement response index detects the extent to which the agreement options are endorsed more often than the disagreement options, or the reverse (i.e., agreement or disagreement response bias). The long string index quantifies the length of consecutive selections of the same response option (i.e., repetitive response bias) (Chambers & Johnson, 2002; Davis et al., 2007). The Guttman index detects the extent to which the item scores in a response pattern are inconsistent within a unidimensional scale (Shelton, Frick, & Wootton, 1996). These validity indices are computed for individual respondents, showing to what degree a response pattern is compromised by a specific type of biased response behavior. Higher index values usually indicate more strongly biased response patterns; in other words, a higher validity-index value is associated with lower validity. By pooling the values of a given validity index within groups of respondents (e.g., based on children’s age), we can evaluate to what degree the response patterns of a specific group are affected by a specific type of response bias.


Using the validity-index approach, this study provides complementary evidence for establishing the minimum age for using the self-report version of the SDQ (Conijn et al., 2019). The basic idea is to compare the validity of children’s response patterns at a given age with that of corresponding adult informants and to identify the age from which there is no longer a substantial difference in validity. The validity-index values for children’s self-reports are expected to decrease with age and gradually approach the index values of adults. Generally speaking, the following two questions were answered in this study: (1) From what age on can children provide SDQ response patterns as valid as those of their teachers? (2) What types of response bias interfere with the validity of children’s self-report SDQ data?

Method

Participants and procedure

The data used in this study were collected for a larger study (Zee & Roorda, 2018) among third- to sixth-grade students and their teachers in eight regular primary schools in the Randstad area of the Netherlands. For a detailed description of the data collection, please refer to Zee and Roorda (2018).

Seven hundred and eighty-five children’s self-reports and 267 teacher-reports about a subset of these children were included in the present study. Among the 785 children’s self-reports, 384 (48.9%) were provided by boys. Children were grouped by age: the 8-year group included 109 children (45% boys), the 9-year group 175 children (49.1% boys), the 10-year group 203 children (48.8% boys), the 11-year group 212 children (50.9% boys), and the 12-year group 86 children (47.7% boys).

The 267 teacher reports were provided by 35 teachers (9 men and 26 women) with an average age of 39.74 years. Their mean teaching experience was 13.71 years, ranging from 1 to 43 years. The demographic characteristics of the teachers were comparable to those of the general teacher population in the Netherlands (Zee & Roorda, 2018). Each teacher provided individual ratings for three to 10 students in his or her classroom. Of the 267 students with a teacher report, 130 (47%) were boys. Their ages were 8 years (N = 46), 9 years (N = 59), 10 years (N = 65), 11 years (N = 69), and 12 years (N = 28). The gender distribution was balanced in each age group. Henceforth, we call the 267 self-reports and the paired teacher reports the “paired child-teacher subset” of the data.

Measures and Indicators

The Strengths and Difficulties Questionnaire (SDQ).

The Dutch versions of the SDQ for self-report and teacher report were employed in Zee and Roorda’s (2018) study. The item stems are identical in the self-report and teacher versions except for slight wording differences. For example, item 1 in the self-report version is "I try to be nice to other people," whereas in the teacher version it is worded as "(This student is) considerate of other people's feelings" (www.sdqinfo.com). The SDQ includes five subscales: emotional problems, conduct problems, peer problems, hyperactivity, and prosocial behavior. Each subscale includes five items, and the items of the different subscales are presented in a mixed order. Five of the 25 items are reverse-scored. For the four difficulty subscales (emotional problems, conduct problems, peer problems, hyperactivity), higher subscale scores represent more social-emotional problems. For the prosocial subscale, higher scores indicate more strength in social adjustment.

The 3-point scale of the original SDQ was adapted to a 5-point scale by Zee and Roorda (2018). For the self-report version, the three original options, "Not true," "Somewhat true," and "True," were converted to five options: "No, that is not true," "That is usually not true," "SOMETIMES," "That is usually true," and "Yes, that is true." For the teacher-report version, the three original options, "Disagree," "Somewhat agree," and "Agree," were transformed into five options: "Definitely does not apply," "Not really," "Neutral, not sure," "Applies somewhat," and "Definitely applies." The five options of each item were graded 1 to 5, and the total score of each subscale therefore ranged from 5 to 25.

The adaptation of the response options by Zee and Roorda (2018) may have introduced both advantages and disadvantages. A 5-point scale forces respondents to differentiate more finely the extent to which they agree or disagree with an item than a 3-point scale does. Responses on a 5-point scale may be more informative than on a 3-point scale and may detect smaller differences in feelings and behaviors (Alwin, 1992). However, selecting one option from five may be more cognitively demanding than selecting one from three (Krosnick & Presser, 2009; Royeen, 1985). The adapted 5-point version may therefore be more challenging than the original 3-point version for children whose cognitive development is relatively limited, and may thereby decrease the reliability and validity of their responses. In addition, the middle option of the adapted teacher version, "Neutral, not sure," can be interpreted in two ways: as "neither agree nor disagree" or as "having insufficient information to make a decision." This ambiguous meaning of the middle response option might have interfered with the teachers' response process and resulted in unexpected response patterns. Therefore, a preliminary check of the distribution of response options was conducted to inspect the influence of the scale adaptation on response behavior.

Validity Indicators.

Four validity indices were initially considered to quantify the possible biases in each response pattern: the extreme response index, the agreement response index, the long string index, and the Guttman index (see the introduction for the definition of each index). The properties of the SDQ implied that the extreme response index and the agreement response index were inappropriate. For example, item scores on the SDQ in the general population are known to show a ceiling or floor effect (Goodman, 2001), indicating that the extreme options (those indicating no psychopathological symptoms) are the most popular ones. It is therefore unreasonable to treat the selection of extreme options as biased (Conijn et al., 2019). The agreement response index is inappropriate because the SDQ includes unbalanced numbers of positively and negatively worded items, i.e., 15 negatively worded items and 10 positively worded items. We therefore used only the other two types of indices, the long string index and the Guttman index, to quantify repetitive response bias and inconsistent responding in children’s self-reports on the SDQ, respectively.

Maximum Long string index (𝑳𝒎𝒂𝒙).

Long string indices are useful for detecting repetitive response bias. Repetitive response bias occurs when the respondent consecutively selects a specific option without much regard for the item content (Niessen, Meijer, & Tendeiro, 2016). The index is suitable for questionnaires with multiple dimensions or with positively and negatively scored items presented in mixed order (DeSimone, Harms, & DeSimone, 2015). As mentioned above, the SDQ satisfies all these features (i.e., five subscales, subscale items in mixed order, and 10 positively worded and 15 negatively worded items in mixed order). The repeated consecutive selection of a specific option can therefore be considered a response pattern affected by repetitive response bias.

The maximum long string index (𝐿𝑚𝑎𝑥) was selected for this study because it has been reported to outperform other long string indices (Meade & Craig, 2012; Niessen et al., 2016). 𝐿𝑚𝑎𝑥 quantifies the length of the longest consecutive selection of a single option in a response pattern. For example, if a respondent answers 10 items as 1, 3, 2, 1, 4, 4, 5, 3, 3, 3, the lengths of the repetitive runs are 1, 1, 1, 1, 2, 1, 3, of which the largest is three. Thus, the value of 𝐿𝑚𝑎𝑥 is three for this response pattern. From each response pattern, a single value of 𝐿𝑚𝑎𝑥 is derived. If this value exceeds a pre-specified cutoff, the corresponding response pattern is classified as “invalid.”
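The run-length computation just described is simple enough to sketch directly; the following is an illustrative Python version (the study itself worked in R, and the function name here is ours):

```python
def l_max(responses):
    """Maximum long string index (L_max): the length of the longest run
    of identical consecutive answers in one response pattern."""
    longest = current = 1
    for prev, nxt in zip(responses, responses[1:]):
        # extend the current run if the answer repeats, else restart it
        current = current + 1 if nxt == prev else 1
        longest = max(longest, current)
    return longest

# the worked example from the text: the longest run is 3, 3, 3
l_max([1, 3, 2, 1, 4, 4, 5, 3, 3, 3])  # → 3
```

A response pattern would then be flagged as "invalid" whenever this value exceeds the pre-specified cutoff.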

Guttman indices (𝑮𝒑, 𝑮𝒏𝒑).

The Guttman index (𝐺𝑝) is a count of the Guttman errors in a response pattern (Niessen et al., 2016). For polytomous items, a Guttman error occurs when the respondent takes a less popular item step without taking the more popular one. An item step refers to moving from one response option to the next: one item step is taken when selecting the second response option, two item steps when selecting the third, and so on. The popularity of each item step is computed from the endorsement rates of the response options across the persons in the sample. In a unidimensional questionnaire, respondents are expected to take the more popular item steps before the less popular ones. Each time a respondent endorses a less popular item step without endorsing a more popular one, one Guttman error occurs. The more Guttman errors, the more inconsistent the response pattern.
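The counting procedure above can be sketched as follows. This is an illustrative Python reimplementation, not the R code used in the study (which relied on the "PerFit" package); ties in step popularity are broken arbitrarily here, and at least two observed response categories are assumed:

```python
import numpy as np

def guttman_errors(data):
    """Count Guttman errors (G_p) per respondent for polytomous items.

    data : (n_persons, n_items) array of integer responses 1..m.
    Each item step is the indicator X_j >= k (k = 2..m); steps are
    ordered by sample popularity, and one error is counted whenever a
    less popular step is passed while a more popular one is not.
    """
    data = np.asarray(data)
    m = int(data.max())
    # person-by-step pass matrix: one 0/1 column per item step
    steps = np.concatenate([(data >= k).astype(int)
                            for k in range(2, m + 1)], axis=1)
    order = np.argsort(-steps.mean(axis=0))   # most popular step first
    passed = steps[:, order]
    # for each position, how many more popular steps were failed before it
    fails_before = np.concatenate(
        [np.zeros((data.shape[0], 1), dtype=int),
         np.cumsum(1 - passed, axis=1)[:, :-1]], axis=1)
    # every passed step pairs with every failed, more popular step
    return (passed * fails_before).sum(axis=1)
```

Applied per subscale, the per-subscale counts would then be summed into a single 𝐺𝑝 value per respondent, as described later in the Method.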

𝐺𝑛𝑝 is the normed version of 𝐺𝑝, in which the effect of the total score on the number of Guttman errors is partialed out. Because 𝐺𝑝 is possibly confounded with the total score (Conijn, Franz, Emons, de Beurs, & Carlier, 2019), 𝐺𝑛𝑝 was also considered for this study, even though it is less powerful than 𝐺𝑝 in detecting careless responding (Niessen et al., 2016; Emons, 2008).

If the total scores were comparable across informants or age groups, the Guttman index (𝐺𝑝) would be adopted in the main analysis and the normed Guttman index (𝐺𝑛𝑝) used to replicate the analysis. Otherwise, the normed Guttman index (𝐺𝑛𝑝) would be used in the main analysis and the Guttman index (𝐺𝑝) to replicate the analysis.

The application of the Guttman indices requires the satisfaction of the three assumptions of the nonparametric item response theory model: unidimensionality, local independence, and monotonicity (Emons, 2008). Unidimensionality means that all items in a subscale measure a single dominant latent variable. Local independence means that the responses to different items are statistically independent given the latent variable. Monotonicity implies that the number of item steps taken increases or stays the same (i.e., does not decrease) as the item-rest score increases. These assumptions were examined before applying the Guttman index.

Both Guttman indices are computed for the individual subscales of a measure, and the subscale values are summed to produce the final index value for each response pattern. By pooling the index values of all response patterns, a cutoff value is derived for the sample. Response patterns with an index value surpassing the cutoff are classified as "invalid" (Meijer et al., 2016).

Data analysis

The data analysis comprised three steps: a preliminary analysis, the main analysis, and a sensitivity analysis to replicate the results.

In the preliminary analysis, we first screened the missing values to get a general idea of the data quality. Then, we evaluated the frequency distribution of the response options for each item, separately for the child-report data and the teacher-report data. Generally, we expected unimodal distributions for all items (e.g., bell-shaped, skewed, or showing a ceiling/floor effect), because the extreme options indicating psychopathological symptoms in the original version are usually selected by only a small proportion of respondents in community samples, for example, 10% (Goodman, 2001; Goodman, Ford, Simmons, Gatward, & Meltzer, 2000). A bimodal distribution may indicate an improper setting of the response categories, and the corresponding items were considered for removal. After that, we computed the group mean total scores and compared them across groups; based on this comparison, the choice between the Guttman index and the normed Guttman index was made. Subsequently, we computed the internal consistency (Cronbach’s 𝛼) for the children’s self-reports and the teacher reports, respectively, and the inter-rater correlation between the children’s self-report scores and the teacher-report scores, as was done in previous studies on the SDQ. Finally, we assessed the three assumptions of the Guttman index. Unidimensionality was assessed by computing the variance explained by the first component of each subscale and comparing the first two eigenvalues. A second-to-first eigenvalue ratio of 1/3 or smaller was considered to indicate a single strong dimension (Slocum-Gori & Zumbo, 2011) suitable for applying the Guttman index (Conijn, Emons, & Sijtsma, 2014). The R package “nFactors” (Raiche, 2010) was used for the unidimensionality assessment. We did not assess local independence separately because sufficient unidimensionality of a short subscale was assumed to imply sufficient local independence for computing the Guttman index (Conijn et al., 2019). The assumption of monotonicity was tested using the R package “mokken” (van der Ark, 2007).
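For illustration, the eigenvalue-ratio part of this unidimensionality check can be sketched in Python (the study used the R package “nFactors”; the parallel-analysis step is omitted here, and the function name is ours):

```python
import numpy as np

def eigenvalue_ratio_check(subscale, cutoff=1/3):
    """Eigenvalue-ratio criterion for a single subscale.

    subscale : (n_persons, n_items) array of item scores.
    Returns the first two eigenvalues of the inter-item correlation
    matrix and whether lambda_2 / lambda_1 <= cutoff, i.e., whether the
    first eigenvalue is at least three times the second (by default).
    """
    corr = np.corrcoef(subscale, rowvar=False)
    eig = np.sort(np.linalg.eigvalsh(corr))[::-1]  # descending order
    return eig[0], eig[1], (eig[1] / eig[0]) <= cutoff
```

With five items, a first eigenvalue around 2.5 (half the sum of all eigenvalues) together with a small second eigenvalue would pass this criterion.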

In the main analysis, we computed 𝐿𝑚𝑎𝑥 and 𝐺𝑝 (or 𝐺𝑛𝑝) for each respondent. Missing values were imputed five times when computing the Guttman index. The R package “PerFit” (Tendeiro, Meijer, & Niessen, 2016) was used to compute the Guttman index (𝐺𝑝 or 𝐺𝑛𝑝). The cutoff values for both indices were set at the 90th percentile (Meijer et al., 2016), also taking the break-off point of the scree plot into consideration. Response patterns with validity-index values larger than the cutoff were classified as “invalid.” Then, the validity-index values and classification rates were averaged within each age group, and a 95% confidence interval was computed for the group mean index values and the group classification rates. Finally, we visually compared the group mean validity-index values (with 95% confidence intervals) and classification rates (with 95% confidence intervals) across age groups (Conijn et al., 2019).
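A simplified sketch of this cutoff-and-classification step, in Python for illustration (the study worked in R; the function and variable names are ours, and a normal-approximation interval stands in for whichever 95% interval the study used):

```python
import numpy as np

def classify_by_age(index_values, ages, percentile=90):
    """Flag response patterns whose validity-index value exceeds the
    sample's percentile cutoff and summarize results per age group.

    index_values : 1-D array of validity-index values (e.g., L_max, G_p).
    ages : 1-D array with each child's age, same length.
    Returns {age: (group mean index, invalid rate, 95% CI half-width)}.
    """
    index_values = np.asarray(index_values, dtype=float)
    ages = np.asarray(ages)
    cutoff = np.percentile(index_values, percentile)
    invalid = index_values > cutoff          # "invalid" classification
    summary = {}
    for age in np.unique(ages):
        grp = ages == age
        n = int(grp.sum())
        rate = invalid[grp].mean()
        # normal-approximation 95% CI for the classification rate
        half_width = 1.96 * np.sqrt(rate * (1 - rate) / n)
        summary[int(age)] = (index_values[grp].mean(), rate, half_width)
    return summary
```

Plotting the per-age means and rates returned here, each with its interval, corresponds to the visual comparison across age groups described above.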

In the sensitivity analysis, the results of the Guttman index analysis were replicated using the other Guttman index: for example, if 𝐺𝑝 was used for the main analysis, 𝐺𝑛𝑝 was used for the sensitivity analysis.

Results

Preliminary analysis

The properties of the dataset are presented first, followed by the results of the assessment of the Guttman index assumptions.

Missing values.

For the complete set of child reports (N = 785), 45 cases included missing values (37 cases missed one value, 5 cases two values, 2 cases three values, and 1 case four values). For the paired child reports (N = 267), fifteen cases included one missing value, and one further case missed two values. The teacher reports (N = 267) contained no missing values; thus, all cases were included in the data analysis.

Distribution of response frequency.

The distribution of response options for each item was inspected for the children’s self-reports and the teacher reports, respectively, in the paired dataset (N = 267). As shown in Table 1, for the children’s self-reports, 18 of the 25 items showed a strong ceiling or floor effect. Four items were approximately bell-shaped. Item 11, “I have one good friend or more,” showed a bimodal distribution, with the middle option, “SOMETIMES,” showing the lowest frequency. For the number of good friends, “SOMETIMES” seemed unsuitable because the number of good friends is usually relatively stable within a given period. Thus, we concluded that, except for Item 11, the children’s responses were not strongly affected by the adaptation of the response options.

For the teacher data (N = 267), 18 of the 25 items showed a ceiling or floor effect similar to the child-report data (see the right column of Table 1). However, a bimodal pattern was found for Items 2, 10, and 15: for these items, the middle option was selected least frequently. Items 6, 8, 12, and 16 showed a bumpy pattern, with a lower frequency for the middle option than for the second and fourth options. The bumpy and bimodal patterns implied that the teachers avoided the middle option (“Neutral, not sure”), likely because of its ambiguous meaning, as mentioned in the Method section. For example, most teachers are likely to avoid the middle response option for items such as Item 10 (“I am constantly fidgeting or squirming”) because they have enough knowledge about the child to give a positive or negative answer to that question.

Considering that the teachers’ response behavior on nearly one-third of the items was problematically distorted, we decided to compute the Guttman indices only for the child data and to exclude the teacher data from those analyses. For the analyses of the Guttman indices, we therefore used the full child dataset (N = 785), excluding Item 11. For the 𝐿𝑚𝑎𝑥 index, we used the paired data (N = 267) as planned, because the repetitive response style, which usually occurs regardless of item content (Niessen et al., 2016), was not expected to be affected by the ambiguous meaning of the middle response option in the teacher data.

Group mean scores.

First, we compared the child-reported and teacher-reported mean scores across age groups (N = 267). As shown in Table 2, the mean differences were small to modest for the emotional problems and hyperactivity subscales, with Cohen’s d below or around .30. The mean score differences for the conduct problem subscale ranged from small to large, with Cohen’s d ranging from .23 for the 12-year group to .71 for the 9-year group. For the peer problem and prosocial behavior subscales, the differences were generally small across age groups.

Then, we compared the child-reported mean scores between adjacent age groups (complete child dataset, N = 785), for example, between the 8-year group and the 9-year group. As shown in Table 3, the differences between adjacent age groups were, in most cases, trivial to small, implying that a comparison of the mean Guttman index values across age groups would not be affected by mean score differences. Therefore, we decided to use the Guttman index (𝐺𝑝) for the main analysis and the normed Guttman index (𝐺𝑛𝑝) for the sensitivity analysis.


Internal consistency (Cronbach’s 𝜶).

Table 4 shows that Cronbach’s 𝛼 for the teacher reports was generally satisfactory for all subscales, ranging from .69 for the peer problem subscale to .88 for the hyperactivity subscale. For each age group, Cronbach’s 𝛼 for the teacher reports was acceptable (𝛼 > .60) except for the peer problem subscale in the 12-year group (𝛼 = .53).

Cronbach’s 𝛼 for the children’s self-reports was generally acceptable, except for the peer problem subscale (𝛼 = .33) and the prosocial behavior subscale (𝛼 = .59). No clear age effect was observed from the 8-year group to the 12-year group across the subscales. For example, for the emotional problem subscale, Cronbach’s 𝛼 was acceptable for all age groups; for the hyperactivity subscale, the 𝛼 values for the 8-, 10-, and 12-year groups were acceptable, and those for the 9- and 11-year groups were satisfactory; for the other three subscales (peer problem, conduct problem, and prosocial behavior), the 𝛼 values fluctuated between unacceptable and acceptable across age groups.

Child-teacher correlation.

The child-teacher correlation was lowest for the 8-year group on all four difficulty subscales, especially the emotional problem subscale (r = -.01), the conduct problem subscale (r = .16), and the peer problem subscale (r = .16). From the 8-year group to the 12-year group, the child-teacher correlation fluctuated but showed a roughly increasing pattern for the four difficulty subscales. For the prosocial behavior subscale, the child-teacher correlation ranged from .22 for the 11-year group to .40 for the 10-year group, with no clear age pattern (Table 4, last row).

The assessment of the assumptions of the Guttman index.

We assessed the assumptions of the Guttman index using the complete child dataset (N = 785). Parallel analysis was conducted to examine the unidimensionality assumption. From the scree-plot, each of the five subscales suggested one dominant component (see Appendix A). For the emotional problem, conduct problem, hyperactivity, and prosocial behavior subscales, the eigenvalue of the first component was at least three times as large as that of the second component, accounting for about fifty percent of the sum of all eigenvalues. For the peer problem subscale, the first component explained one-third of the total variance, and its eigenvalue was twice as large as the second. We concluded that the leading dimension of each of the five subscales was strong enough for the application of the Guttman index (Conijn et al., 2019). Regarding the assumption of monotonicity for each item, significant violations were found in five items. One significant violation occurred in each of Item 14, Item 16, Item 20, and Item 22, and seven significant violations occurred in Item 6. We concluded that the ordering of the response options followed latent monotonicity for most items.
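The eigenvalue-based unidimensionality check described above can be illustrated with simulated data in which a single latent trait drives all items. The simulation parameters below are assumptions for illustration only, not properties of the SDQ data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate item scores driven by one latent trait (illustrative only):
# each of the k items equals the trait plus independent noise.
n, k = 500, 5
theta = rng.normal(size=(n, 1))
items = theta + rng.normal(size=(n, k))

R = np.corrcoef(items, rowvar=False)    # inter-item correlation matrix
eigvals = np.linalg.eigvalsh(R)[::-1]   # eigenvalues, largest first

ratio = eigvals[0] / eigvals[1]         # first-to-second eigenvalue ratio
share = eigvals[0] / eigvals.sum()      # proportion of total variance
print(round(ratio, 1), round(share, 2))
```

With one dominant factor, the first eigenvalue is several times larger than the second and explains a large share of the total variance, which is the pattern the text reports for four of the five subscales.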

Main analyses

We computed 𝐿𝑚𝑎𝑥 for the paired child-teacher subset of the data (N = 267) and 𝐺𝑝 for the complete child data (N = 785). For 𝐿𝑚𝑎𝑥, the mean values (and the classification rates of invalid cases) for a specific age group derived from the teacher-report were taken as the benchmark of an acceptable level of ‘validity issues.’ The discrepancies between the 𝐿𝑚𝑎𝑥 values derived from the children’s self-reports and those from the teacher-reports were then compared across age groups. For 𝐺𝑝, comparisons were conducted across age groups without a complementary comparison with the teacher-reports (i.e., due to the problems identified with the application of 𝐺𝑝 to the teacher data). For both 𝐿𝑚𝑎𝑥 and 𝐺𝑝, line graphs were presented to detect a possible age effect visually.
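The maximum long string index is simply the length of the longest run of identical consecutive answers in a response pattern. A minimal sketch:

```python
from itertools import groupby

def max_long_string(responses):
    """Length of the longest run of identical consecutive answers (L_max)."""
    return max(len(list(run)) for _, run in groupby(responses))

# A pattern with four identical answers in a row:
print(max_long_string([1, 3, 3, 3, 3, 2, 5, 4]))  # -> 4
```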


Maximum long string index (𝑳𝒎𝒂𝒙): detecting repetitive responding

Distribution of the index.

Most respondents selected the same option successively two or three times at most (Figure 1). After pooling the 𝐿𝑚𝑎𝑥 values derived from all children’s self-reports and teacher-reports, the cutoff at the 90th percentile was three. This cutoff value coincided with the break-off point detected visually in the two histograms for children’s self-report and teacher-report. The cutoff value of three implied that response patterns with four or more repeated answers were classified as “invalid.” For the children’s self-report, 34 cases (12.7%) were classified as invalid, and for the teacher-report, 26 cases (9.7%) were classified as invalid.
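The cutoff procedure used here (flag the response patterns whose index value exceeds the 90th percentile of the pooled distribution) can be sketched as follows; the input values are illustrative:

```python
import numpy as np

def flag_invalid(index_values, percentile=90):
    """Flag response patterns whose validity-index value exceeds the
    cutoff at the given percentile of the pooled distribution."""
    values = np.asarray(index_values, dtype=float)
    cutoff = np.percentile(values, percentile)
    flags = values > cutoff
    return cutoff, flags, flags.mean()  # cutoff, per-case flags, rate

# Illustrative pooled L_max values for ten respondents:
cutoff, flags, rate = flag_invalid([2, 2, 3, 2, 3, 2, 2, 3, 2, 4])
print(round(cutoff, 1), rate)  # -> 3.1 0.1
```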

Main result.

The left panel of Figure 2 shows the group mean values of 𝐿𝑚𝑎𝑥 for children’s self-report and teacher-report. All group mean values hovered between 2.5 and 3.0. The largest child-teacher discrepancy occurred for the 12-year group. However, the estimated 95% confidence interval of the group mean 𝐿𝑚𝑎𝑥 for children’s self-report overlapped with that for the teacher-report in all age groups. Therefore, no substantial differences between children’s self-report and teacher-report existed for any age group.

The right panel of Figure 2 shows the classification rate of invalid cases for each age group, respectively, for children’s self-report and teacher-report. For the children’s self-report, the classification rates roughly decreased with age: 17.4% for the 8-year group and 7.7% for the 12-year group. For the teacher-report, the classification rates fluctuated from the 8-year group to the 12-year group (between 6.5% and 13.5%). The cross-informant comparisons showed overlapping 95% confidence intervals for all age groups. Therefore, no substantial discrepancy was found between the classification rates of children’s self-report and teacher-report for any age group.

To conclude, no substantial difference was found between children’s self-report and teacher-report in terms of repetitive response bias.

Guttman index (𝑮𝒑): detecting inconsistent responding

Distribution of the index.

Figure 3 shows that the number of Guttman errors in each response pattern ranged from zero to 30. The cutoff at the 90th percentile was 12.32, indicating that responses with 𝐺𝑝 values larger than 12 were classified as "invalid."
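For readers unfamiliar with Guttman errors: for items ordered from most to least popular, an error is counted whenever an easier (more popular) item receives a negative answer while a harder (less popular) item receives a positive one. The study used the polytomous generalization of Emons (2008); the sketch below shows only the simpler dichotomous case for illustration:

```python
def guttman_errors(pattern):
    """Number of Guttman errors in a dichotomous response pattern whose
    items are already ordered from most to least popular: count the item
    pairs in which the easier item is scored 0 and the harder item 1."""
    errors = 0
    for i in range(len(pattern)):
        for j in range(i + 1, len(pattern)):
            if pattern[i] == 0 and pattern[j] == 1:
                errors += 1
    return errors

print(guttman_errors([1, 1, 1, 0, 0]))  # perfect Guttman pattern -> 0
print(guttman_errors([0, 0, 0, 1, 1]))  # fully reversed pattern -> 6
```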

Main result.

The left panel of Figure 4 shows that the group mean value of 𝐺𝑝 decreased with age. Self-reports from the 8-year group contained the most Guttman errors on average (M = 8.46), and those from the 12-year group the fewest (M = 5.08). From the line graph, we noticed that the decrease in the 𝐺𝑝 value leveled off from the 10-year group to the 12-year group. Because the 95% CIs of adjacent groups always overlapped, we concluded that there was no substantial decrease from age to age in the absolute number of Guttman errors. However, the clear decreasing pattern and the substantial decrease from the 8-year group to the 10-year group still suggest that age 10 is a critical point for the ability to provide a consistent response pattern.

The right-hand panel of Figure 4 shows the classification rate of invalid response patterns for each age group in children’s self-report. A decreasing pattern can be observed. The classification rate for the 8-year group was the highest (n = 24, 22%), implying that a larger proportion of response patterns from 8-year-old children were identified as “invalid” than from any other age group (14% for the 9-year group; 6% for the 10-, 11-, and 12-year groups). Similar to the pattern of the 𝐺𝑝 mean values, the decrease in classification rate from year to year was not significant because of the overlapping CIs. However, the decrease from the 8-year group to the 10-year group was significant. From the 10-year group to the 12-year group, the classification rate stabilized.
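Whether two classification rates differ substantially was judged here from overlapping 95% confidence intervals. As an illustration, the sketch below uses a simple Wald interval for a proportion (the study may have used a different interval method); the counts are approximated from the rates and group sizes reported above (24 of 107 for the 8-year group, about 12 of 201 for the 10-year group):

```python
import math

def proportion_ci(k, n, z=1.96):
    """Approximate 95% confidence interval for a classification rate,
    using the normal (Wald) approximation."""
    p = k / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

lo1, hi1 = proportion_ci(24, 107)   # ~22% flagged in the younger group
lo2, hi2 = proportion_ci(12, 201)   # ~6% flagged in the older group
print(lo1 < hi2 and lo2 < hi1)      # do the intervals overlap? -> False
```

Non-overlapping intervals for these two groups are consistent with the significant 8-to-10-year decrease reported above.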

To conclude, older children provided fewer inconsistent responses than younger children did. A larger proportion of response patterns from the younger groups were estimated to be “invalid” compared with response patterns from the older groups. Age 10 seemed to be a critical point for the decrease of inconsistent responding, since it marked the only turning point in both the 𝐺𝑝 mean value and the classification rate.

Sensitivity analysis

In this step, we used the normed Guttman index (𝐺𝑛𝑝) to examine whether the results of the Guttman index (𝐺𝑝) analysis replicated.

As shown in Figure 5, the value of the normed Guttman index (𝐺𝑛𝑝) ranged from zero to 0.7. The cutoff value at the 90th percentile was 0.33. It implied that response patterns with 𝐺𝑛𝑝 larger than 0.33 were classified as "invalid."
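The normed index divides the observed number of Guttman errors by the maximum number possible given the same total score, which removes the dependence of the index on trait level. Again, the sketch below shows only the dichotomous case for illustration, whereas the study used the polytomous version (Emons, 2008):

```python
def normed_guttman(pattern):
    """Normed Guttman index for a dichotomous pattern ordered from most
    to least popular item: Guttman errors divided by the maximum number
    of errors possible given the same total score."""
    n = len(pattern)
    r = sum(pattern)                     # number of positive answers
    errors = sum(1 for i in range(n) for j in range(i + 1, n)
                 if pattern[i] == 0 and pattern[j] == 1)
    max_errors = r * (n - r)             # worst case for this total score
    return errors / max_errors if max_errors else 0.0

print(normed_guttman([1, 1, 0, 0]))   # perfect pattern -> 0.0
print(normed_guttman([0, 0, 1, 1]))   # fully reversed -> 1.0
```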

The group mean value of 𝐺𝑛𝑝 and the classification rate for each age group demonstrated patterns similar to those shown by 𝐺𝑝 in the main analysis (Figure 5). Specifically, the group mean decreased from the 8-year group (0.22), through the 9-year group (0.20), the 10-year group (0.17), and the 11-year group (0.16), to the 12-year group (0.15). Although the decrease from age to age was not substantial because of the overlapping 95% CIs, the decrease from the 8-year group to the 10-year group was significant. From age 10 to age 12, the mean value of 𝐺𝑛𝑝 stayed relatively stable. Therefore, age 10 again seemed to be a critical point on the decreasing line.


The classification rate decreased as age increased, except for the 12-year group, where a larger proportion of response patterns were classified as invalid than in the 11-year group. The sharpest drop occurred from the 9-year group (15%) to the 10-year group (7%), a drop of more than 50%. However, the estimated 95% CIs for the 9-year group and the 10-year group overlapped, implying a non-substantial difference. From the 10-year group to the 12-year group, the classification rate stayed below 10%, fluctuating between 5% (11-year group) and 9% (12-year group). Although the classification rate visually increased from the 11-year group to the 12-year group, the overlapping CIs suggest that the rate can be considered stable from the 10-year group to the 12-year group.

To conclude, the results of the sensitivity analysis (using 𝐺𝑛𝑝) largely resembled the results of the main analysis (using 𝐺𝑝). The way inconsistent response bias interfered with children’s self-report from age 8 to age 12 was maintained after partialling out the influence of the total scores. Generally, age had a negative effect on the values of the Guttman indices (𝐺𝑝 and 𝐺𝑛𝑝) and the corresponding classification rates of invalid response patterns. From the 8-year group to the 12-year group, children’s self-reports on the SDQ showed fewer inconsistent patterns, and smaller proportions of response patterns were classified as invalid for the older groups than for the younger groups. Both the group mean values and the classification rates were relatively stable from the 10-year group to the 12-year group. Thus, ten years was the suggested minimum age for children to provide sufficiently valid self-reports on the SDQ.

Conclusion and Discussion

This research explored the minimum age at which children can generally provide sufficiently valid self-reports on the SDQ. Two types of validity indices, the maximum long string index and the Guttman index, were employed to quantify repetitive response bias and inconsistent responding in children’s self-reports, respectively. Based on the between-informant comparisons (i.e., children’s self-report and teacher-report) of 𝐿𝑚𝑎𝑥 and the corresponding classification rate of invalid response patterns, children’s self-reports seemed to be biased by a repetitive response pattern to a comparable degree as the teacher-reports. The analyses of the Guttman indices (𝐺𝑝 and 𝐺𝑛𝑝) showed that the inconsistencies in children’s self-reports decreased as age increased. From age 10 onwards, the index value and the classification rate of invalid response patterns stayed at a relatively stable level.

These findings suggest that (1) children aged 10 and above can provide sufficiently valid self-reports on the SDQ; and (2) inconsistent response bias interferes with the response behavior of children below age 10 on the SDQ, whereas repetitive response bias does not seem to substantially interfere with the response behavior of children aged eight and above. Nevertheless, the 95% CIs of the classification rate for the Guttman indices (𝐺𝑝 and 𝐺𝑛𝑝) overlapped for the 9-year group and the 10-year group. It therefore remains unclear whether 9-year-old children can provide self-reports of similar quality as 10-year-old children, and more psychometric evidence is needed to address this issue.

This research provides complementary evidence about the quality of children’s self-reports on the SDQ concerning the age effect on response biases, helping to disentangle the inconsistent age patterns revealed by Cronbach’s 𝛼 and interrater correlations in previous studies as well as in the present study. The age pattern of Cronbach’s 𝛼 for children's self-report in the present study was generally comparable to those reported by Muris et al. (2003, 2004). The child-teacher correlations in this study were generally higher than the average informant-child correlations reported in the meta-analysis by Achenbach, McConaughy, and Howell (1987) and comparable to the values reported in previous studies on the SDQ (Muris et al., 2003; Goodman, 2001). However, we did not find a consistent pattern for Cronbach’s 𝛼 or the child-teacher correlation, such as an increase in these values with increasing age. In contrast, using the validity-index approach, we found a significant age pattern in the Guttman indices. The validity-index approach thus seems to provide useful and complementary information to standard psychometric approaches. Future studies on the validation of self-report child measures should consider using the validity-index approach to provide an integrative suggestion about the minimum age of application.

This study, however, had several limitations. Due to the properties of the dataset, our conclusion should be interpreted with great caution. A first limitation was that the SDQ used to collect the data was an adapted version, which converted the original 3-point scale into a 5-point scale. The adapted version was neither validated nor, to the best of our knowledge, used in other research. A 5-point scale is cognitively more demanding than a 3-point scale because children have to select between two options with the same valence (Royeen, 1985). It is therefore reasonable to assume that the minimum age for applying the original SDQ version, using the 3-point scale, may be lower than the age of 10 suggested in this study. Thus, the results of the current study are of limited practical value. To broaden the implications of this study, we repeated the analysis of the Guttman index (𝐺𝑝) after transforming the 5-option data to 3-option data (see Appendix B)1. We obtained the same results as in the main study: the self-reports from children aged ten and above were biased much less by inconsistent responding than the self-reports from younger children. This suggests that the minimum age of 10 years may also apply to data collected with the original SDQ, which has a 3-point response scale. A second limitation was that no appropriate paired teacher-report was available for conducting the between-informant comparison for the Guttman index analyses. Therefore, we cannot claim that self-reports from children aged ten and above were as valid as the corresponding informant reports from adults, such as teachers. The results only imply that from age 10 onwards, there seemed to be a negligible age effect on the consistency of children’s self-reports. Given these limitations, further studies should be conducted using informant-child paired data collected with the original version of the SDQ.

1The analysis of the maximum long string index was not repeated because we assumed that repetitive responding would not be affected by the recoding of the response options.
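The 5-to-3-option transformation mentioned above can be sketched as a simple recoding. The exact mapping used in Appendix B is not specified in the text, so the mapping below is an assumption for illustration: collapse the two lowest and the two highest options while keeping the middle category:

```python
# Assumed mapping from the adapted 5-point scale back to the original
# SDQ 3-point scale (Not true / Somewhat true / Certainly true):
RECODE = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}

def to_three_options(responses):
    """Map 5-option item scores (1-5) onto 3-option scores (1-3)."""
    return [RECODE[x] for x in responses]

print(to_three_options([1, 2, 3, 4, 5]))  # -> [1, 1, 2, 3, 3]
```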

Despite the limitations regarding the practical implications for use of the original SDQ, our study provides exciting results regarding biases in children’s self-report data collected with a 5-point scale. Our conclusion from the Guttman indices that children aged ten and above can provide valid response patterns is consistent with the conclusion of Conijn et al. (2019), who employed the validity-index approach to investigate the minimum age for applying the self-report version of the PedsQL. The PedsQL is a self-report measure of quality of life with 23 items (5 to 8 per subscale) and 5 response options per item, which makes it similar to the adapted SDQ. The combined results of the two studies imply that age 10 might be a milestone for children to provide consistent self-reports on a 5-point Likert scale, given that the item stems are as simple as those of the self-report versions of the SDQ and the PedsQL.

This study can be taken as a second example of using the validity-index approach to study the minimum age at which children can provide a valid self-report. The first example, provided by Conijn et al. (2019), was limited in that only one validity index, the Guttman index, was employed. In the present study, the long string index was used in addition to the Guttman index, because the items from the five subscales were presented in mixed order and the positively worded items were mixed with the negatively worded items. Together, these two examples suggest that the Guttman indices, including 𝐺𝑝 and 𝐺𝑛𝑝, are powerful for detecting inconsistent responding (Niessen et al., 2016). Nevertheless, the analysis of the maximum long string index did not produce reliable results with an apparent age pattern, which might be attributed in part to the small sample size of the paired child-teacher data. Therefore, more research is needed on the value of the long string index.

The current study has some important implications for future applications of the validity-index approach as a method to investigate the minimum age for using a self-report child measure. For example, this study compared the means of the Guttman indices derived from the five subscales and investigated the general pattern of inconsistent responding. In the future, comparisons should also be conducted for the separate subscales. Validity-index analyses per subscale would detect children’s inconsistent response patterns in reporting different aspects, for example, emotional problems, behavior problems, or peer relationships. In previous studies on the self-report version of the SDQ, Cronbach’s 𝛼 for the peer problem subscale and the conduct problem subscale was usually unacceptable (below .60; Goodman, 2001; van Wildenfelt et al., 2003; Muris et al., 2003); in the current study, Cronbach’s 𝛼 for the peer problem subscale and the prosocial behavior subscale was also generally unacceptable. If the relation between respondents' age and the Guttman index can be computed and analyzed per subscale, we may learn to what extent children's self-reports on the different subscales were biased by inconsistent responding. A second idea for future studies is to investigate the statistical relationship between age and the validity index (and the classification rate), for example by regressing the validity index on age or by using contrast codes to estimate the differences between adjacent age groups. The present study detected the changes in the mean validity index and the classification rate visually from line graphs. This approach was useful as a diagnostic tool but not informative enough to draw a statistical conclusion.


To conclude, this study was the first attempt to use the validity-index approach to study the minimum age for children to provide valid self-reports on the SDQ. The maximum long string index and the Guttman indices were employed to quantify repetitive response bias and inconsistent responding. Due to the properties of the dataset, resulting from the adaptation of the questionnaire, we drew our conclusion of a minimum self-report age of 10 years with limited confidence. Future research should assess children’s response biases using data collected with the original version of the SDQ. Also, paired data from adult informants (teachers, parents, or both) and children should be used to obtain more informative results and more reliable conclusions.


References

Achenbach, T. M., McConaughy, S. H., & Howell, C. T. (1987). Child/adolescent behavioral and emotional problems: Implications of cross-informant correlations for situational specificity. Psychological Bulletin, 101, 213–232.

Alwin, D. F. (1992). Information transmission in the survey interview: Number of response categories and the reliability of attitude measurement. Sociological Methodology, 22, 83–118.

Becker, A., Hagenberg, N., Roessner, V., Woerner, W., & Rothenberger, A. (2004). Evaluation of the self-reported SDQ in a clinical setting: Do self-reports tell us more than ratings by adult informants? European Child & Adolescent Psychiatry, 13, ii17-ii24.

Chambers, C. T., & Johnston, C. (2002). Developmental differences in children’s use of rating scales. Journal of Pediatric Psychology, 27, 27-36.

Conijn, J. M., Emons, W. H. M., & Sijtsma, K. (2014). Statistic lz-based person-fit methods for noncognitive multiscale measures. Applied Psychological Measurement, 38, 122–136.

Conijn, J. M., Franz, G., Emons, W. H. M., De Beurs, E., & Carlier, I. V. E. (2019). The assessment and impact of careless responding in routine outcome monitoring within mental health care. Multivariate Behavioral Research, 54(4), 593–611. https://doi.org/10.1080/00273171.2018.1563520

Conijn, J. M., Smits, N., & Hartman, E. E. (2019). Determining at what age children provide sound self-reports: An illustration of the validity-index approach.


Curvis, W., McNulty, S., & Qualter, P. (2014). The validation of the self-report Strengths and Difficulties Questionnaire for use by 6- to 10-year-old children in the U.K. British Journal of Clinical Psychology, 53(1), 131–137. https://doi.org/10.1111/bjc.12025

Damon, W., & Hart, D. (1988). Self-understanding in childhood and adolescence. Cambridge studies in social and emotional development. Cambridge: Cambridge University Press.

Davis, E., Nicolas, C., Waters, E., Cook, K., Gibbs, L., Gosch, A., & Ravens-Sieberer, U. (2007). Parent-proxy and child self-reported health-related quality of life: Using qualitative methods to explain the discordance. Quality of Life Research, 16, 863–871.

Davis-Kean, P. E., & Sandler, H. M. (2001). A meta-analysis of measures of self-esteem for young children: A framework for future measures. Child Development, 72(3), 887–906. https://doi.org/10.1111/1467-8624.00322

Deighton, J., Croudace, T., Fonagy, P., Brown, J., Patalay, P., & Wolpert, M. (2014). Measuring mental health and wellbeing outcomes for children and adolescents to inform practice and policy: a review of child self-report measures. Child and Adolescent Psychiatry and Mental Health 8: 14. DOI: 10.1186/1753-2000-8-14

DeSimone, J. A., Harms, P. D., & DeSimone, A. J. (2014). Best practice recommendations for data screening, J. Organiz. Behav., 36, 171– 181, doi: 10.1002/job.1962

Eddy, L., Khastou, L., Cook, K. F., & Amtmann, D. (2011). Item selection in self-report measures for children and adolescents with disabilities: lessons from cognitive interviews. Journal of Pediatric Nursing, 26, 559-565.


Eder, R. A. (1989). The emergent personologist: The structure and content of 3½-, 5½-, and 7½-year-olds’ concepts of themselves and other persons. Child Development, 60(5), 1218. https://doi.org/10.2307/1130795

Emons, W. H. M. (2008). Nonparametric person-fit analysis of polytomous item scores. Applied Psychological Measurement, 32(3), 224–247. https://doi.org/10.1177/0146621607302479

Goodman, R. (1997). The Strengths and Difficulties Questionnaire: A research note. Journal of Child Psychology and Psychiatry 38: 581-586. doi:10.1111/j.1469-7610.1997.tb01545.x

Goodman, R. (2001). Psychometric properties of the Strengths and Difficulties Questionnaire. Journal of the American Academy of Child and Adolescent Psychiatry 40:1337-1345, https://doi.org/10.1097/00004583-200111000-00015.

Goodman, R., Ford, T., Simmons, H., Gatward, R., & Meltzer, H. (2000). Using the Strengths and Difficulties Questionnaire (SDQ) to screen for child psychiatric disorders in a community sample. British Journal of Psychiatry, 177, 534–539. https://doi.org/10.1192/bjp.177.6.534

Goodman, R., Meltzer, H., & Bailey, V. (1998). The Strengths and Difficulties Questionnaire: a pilot study on the validity of the self-report version. European Child and Adolescent Psychiatry 7:125–130, https://doi.org/10.1007/s007870050057

Krosnick, J. A., & Presser, S. (2009). Question and Questionnaire Design. In Wright, J. D., & Marsden, P. V. (Eds). Handbook of Survey Research (2nd Edition). San Diego, CA: Elsevier.


Landgraf, J. M., van Grieken, A., & Raat, H. (2018). Giving voice to the child perspective: Psychometrics and relative precision findings for the Child Health Questionnaire self-report short form (CHQ-CF45). Quality of Life Research, 27, 2165-2176.

Liu, Y., Wu, A. D., & Zumbo, B. D. (2010). The impact of outliers on Cronbach’s coefficient alpha estimate of reliability: Ordinal/rating scale item responses. Educational and Psychological Measurement, 70(1), 5–21. https://doi.org/10.1177/0013164409344548

Meade, A. W., & Craig, S. B. (2012). Identifying careless responses in survey data. Psychological Methods, 17, 437–455.

Muris, P., Meesters, C., Eijkelenboom, A., & Vincken, M. (2004). The self‐report version of the Strengths and Difficulties Questionnaire: Its psychometric properties in 8‐ to 13‐year‐old non‐clinical children. British Journal of Clinical Psychology, 43, 437–448. https://doi.org/10.1348/0144665042388982

Muris, P., Meesters, C., & van den Berg, F. (2003). The Strengths and Difficulties Questionnaire (SDQ): Further evidence for its reliability and validity in a community sample of Dutch children and adolescents. European Child & Adolescent Psychiatry, 12, 1. https://doi.org/10.1007/s00787-003-0298-2

Niessen, A. S. M., Meijer, R. R., & Tendeiro, J. N. (2016). Detecting careless respondents in web-based questionnaires: Which method to use? Journal of Research in Personality, 63, 1-11. https://doi.org/(...)16/j.jrp.2016.04.010

Peer, E., & Gamliel, E. (2011). Too reliable to be true? Response bias as a potential source of inflation in paper-and-pencil questionnaire reliability. Practical Assessment, Research & Evaluation, 16, 1-8.


Raiche, G. (2010). nFactors: An R package for parallel analysis and non-graphical solutions to the Cattell scree test. Retrieved from https://cran.r-project.org/web/packages/nFactors/nFactors.pdf

Rey, J. M., Schrader, E., & Morris-Yates, A. (1992). Parent-child agreement on children’s behaviors reported by the Child Behavior Checklist (CBCL). Journal of Adolescence, 111, 8-9.

Riley, A. W. (2004). Evidence That School-Age Children Can Self-Report on Their Health, Ambulatory Pediatrics 4:371-376.

Roberts, B. W., & DelVecchio, W. F. (2000). The rank-order consistency of personality traits from childhood to old age: A quantitative review of longitudinal studies. Psychological Bulletin, 126, 3-25.

Royeen, C. B. (1985) Adaption of Likert scaling for use with children. Occupational Therapy Journal of Research, 5(1), 59–69.

Shelton, K. K., Frick, P. J., & Wootton, J. (1996). Assessment of parenting practices in families of elementary school age children. Journal of Clinical Child Psychology, 25, 317–329.

Tendeiro, J., Meijer, R., & Niessen, A. (2016). PerFit: An R package for person-fit analysis in IRT. Journal of Statistical Software, 74(5), 1–27. https://doi.org/10.18637/jss.v074.i05

U.S. Food and Drug Administration. (2009). Guidance for industry: Patient-reported outcome measures: Use in medical product development to support labeling claims. Retrieved from https:// www.fda.gov/downloads/drugs/guidances/ucm193282.pdf

van der Ark, L. (2007). Mokken Scale Analysis in R. Journal of Statistical Software, 20(11), 1-19. doi: http://dx.doi.org/10.18637/jss.v020.i11


van Roy, B., Veenstra, M., & Clench-Aas, J. (2008). Construct validity of the five-factor Strengths and Difficulties Questionnaire (SDQ) in pre-, early, and late adolescence. Journal of Child Psychology and Psychiatry, 49(12), 1304–1312. https://doi.org/10.1111/j.1469-7610.2008.01942.x

Vaske, J. J., Beaman, J., & Sponarski, C. C. (2017). Rethinking internal consistency in Cronbach's alpha. Leisure Sciences, 39(2), 163–173. https://doi.org/10.1080/01490400.2015.1127189

Vugteveen, J., de Bildt, A., Serra, M., de Wolff, M. S., & Timmerman, M. E. (2018). Psychometric properties of the Dutch Strengths and Difficulties Questionnaire (SDQ) in adolescent community and clinical populations. Assessment. https://doi.org/10.1177/1073191118804082

Waters, E., Stewart-Brown, S., & Fitzpatrick, R. (2003). Agreement between adolescent self-report and parent reports of health and well-being: Results of an epidemiological study. Child: Care, Health & Development. https://doi.org/10.1046/j.1365-2214.2003.00370.x

Zee, M., & Roorda, D. L. (2018). Student-teacher relationships in elementary school: The unique role of shyness, anxiety, and emotional problems, Learning and Individual Differences, 67, 156-166, https://doi.org/10.1016/j.lindif.2018.08.006.


Table 1

Distribution of response frequency for child’s self-report and teacher-report

Item Subscale Child’s self-report Teacher’s report

1. I try to be nice to other people. PRO Ceiling effect Ceiling effect

2. I am restless, I cannot stay still for long. HYP Floor effect Bimodal

3. I get a lot of headaches, stomach-aches or sickness. EP Floor effect Floor effect

4. I usually share with others (candy, toys, pencils and so on.) PRO Ceiling effect Ceiling effect

5. I get very angry and often lose my temper. CP Floor effect Floor effect

6. I am usually on my own. I generally play alone or keep to myself. PP Floor effect Bumpy

7. I usually do as I am told. CP Bell-shaped Ceiling effect

8. I worry a lot. EP Bell-shaped Bumpy

9. I am helpful if someone is hurt, upset or feeling ill. PRO Ceiling effect Ceiling effect

10. I am constantly fidgeting or squirming. HYP Floor effect Bimodal

11. I have one good friend or more. PP Bimodal Ceiling effect


13. I am often unhappy, down-hearted or tearful. EP Floor effect Floor effect

14. Other people my age generally like me. PP Ceiling effect Ceiling effect

15. I am easily distracted. HYP Bell-shaped Bimodal

16. I am nervous in new situations. EP Bell-shaped Bumpy

17. I am kind to younger children. PRO Ceiling effect Ceiling effect

18. I am often accused of lying or cheating. CP Floor effect Floor effect

19. My classmates bully me or pick on me. PP Floor effect Floor effect

20. I often volunteer to help others. PRO Ceiling effect Ceiling effect

21. I think before I do things. HYP Ceiling effect Ceiling effect

22. Sometimes, I take things that are not mine. CP Floor effect Floor effect

23. I get on better with adults than with children my age. PP Bell-shaped Floor effect

24. I have many fears, I am easily scared. EP Floor effect Floor effect

25. I finish the work I'm doing. I am able to stay focus. HYP Bell-shaped Ceiling effect

Note. These are the items of the self-report version of the Strength and Difficulty Questionnaire (SDQ). The items of the teacher version replace the subject “I” with “This student.” EP = Emotional Problem, CP = Conduct Problem, HYP = Hyperactivity, PP = Peer Problem, PRO = Prosocial Behavior.


Table 2

Group mean score (M), standard deviation (SD), and child-teacher difference (Cohen’s d) for the SDQ subscale scores by age groups and by different informants

Subscale Informant 8-year group (N = 46) 9-year group (N = 59) 10-year group (N = 65) 11-year group (N = 69) 12-year group (N = 28) Total (N = 267)
EP Child 10.46 (3.82) 10.56 (3.99) 10.94 (3.52) 10.46 (3.67) 9.54 (3.34) 10.5 (3.69)
EP Teacher 8.7 (4.4) 9.37 (4.68) 10.11 (4.32) 9.52 (4.28) 8.21 (4.38) 9.35 (4.42)
EP d 0.30 0.22 0.17 0.19 0.27 0.28
CP Child 9.78 (3.16) 9.88 (3.66) 8.61 (2.83) 9.09 (3.63) 8.29 (3.11) 9.17 (3.35)
CP Teacher 7.85 (3.68) 7.29 (3.28) 7.15 (2.85) 7.17 (3.26) 7.54 (3.64) 7.35 (3.27)
CP d 0.43 0.71 0.55 0.49 0.23 0.55
HP Child 13.38 (4.51) 13.49 (5.13) 13.11 (3.82) 12.37 (4.8) 11.46 (4.39) 12.88 (4.58)
HP Teacher 11.65 (5.64) 12.02 (5.48) 11.95 (5.09) 10.83 (5.54) 10.89 (5.05) 11.51 (5.37)
HP d 0.31 0.26 0.24 0.24 0.14 0.27
PP Child 10.09 (2.83) 9.39 (3.26) 9.45 (2.98) 8.54 (3.19) 9.67 (3.19) 9.34 (3.11)
PP Teacher 8.07 (2.95) 8.97 (3.3) 9.38 (3.86) 9.16 (3.57) 7.82 (3.07) 8.84 (3.46)
PP d 0.54 0.13 0.02 0.17 0.46 0.15
PRO Child 20.37 (3.29) 20.97 (2.85) 21.57 (2.02) 21.35 (3.05) 22 (2.8) 21.21 (2.82)
PRO Teacher 20.54 (3.93) 20.78 (3.79) 20.15 (3.36) 20.61 (3.73) 21.68 (2.86) 20.64 (3.61)
PRO d 0.04 0.05 0.45 0.17 0.13 0.18

Note. EP = Emotional Problem, CP = Conduct Problem, HP = Hyperactivity, PP = Peer Problem, PRO = Prosocial Behavior, d = Cohen’s d.


Table 3

Child-reported subscale scores across age groups (N = 785) and the standardized group mean difference between adjacent age groups

Subscale  8-year group   9-year group   10-year group  11-year group  12-year group
          (N = 107)      (N = 173)      (N = 201)      (N = 211)      (N = 85)
EP        10.88 (3.62)   11.20 (4.03)   11.23 (3.66)   10.70 (3.39)   10.04 (3.46)
  d                        .08¹           .01²           .15³           .20⁴
CP         9.45 (3.07)    9.36 (3.18)    8.85 (3.16)    8.64 (3.17)    8.18 (2.85)
  d                        .03            .16            .06            .15
HP        12.70 (4.34)   13.01 (4.63)   12.70 (4.49)   12.81 (4.53)   11.52 (4.25)
  d                        .09            .09            .02            .29
PP         9.99 (3.19)    9.33 (2.93)    9.35 (2.98)    8.89 (3.10)    9.42 (2.95)
  d                        .22            .01            .15            .17
PRO       20.93 (3.09)   21.18 (2.70)   21.55 (2.43)   21.30 (2.81)   21.98 (2.42)
  d                        .09            .15            .10            .25

Note. The sample size of each age group varies across subscales. EP = Emotional Problem, CP = Conduct Problem, HP = Hyperactivity, PP = Peer Problem, PRO = Prosocial Behavior; d = Cohen's d. ¹9-year group vs. 8-year group; ²10-year group vs. 9-year group; ³11-year group vs. 10-year group; ⁴12-year group vs. 11-year group.


Table 4.

Internal consistency (Cronbach's 𝛼) with 95% confidence intervals and child-teacher correlations across age groups and subscales.

Subscale  Informant            8-year group     9-year group     10-year group    11-year group    12-year group    Total
                               (N = 46)         (N = 59)         (N = 65)         (N = 69)         (N = 28)         (N = 267)
EP        Child                .63 (.46–.74)    .73 (.57–.82)    .62 (.38–.75)    .69 (.54–.78)    .68 (.46–.78)    .67 (.59–.73)
          Teacher              .86 (.75–.92)    .87 (.79–.91)    .85 (.79–.90)    .83 (.73–.89)    .87 (.71–.90)    .85 (.82–.88)
          r (child, teacher)   -.01             .25              .40              .25              .23              .25
CP        Child                .54 (.24–.71)    .65 (.44–.78)    .57 (.17–.76)    .65 (.48–.77)    .55 (.18–.75)    .61 (.50–.68)
          Teacher              .82 (.70–.88)    .79 (.62–.88)    .75 (.57–.84)    .80 (.69–.86)    .79 (.63–.84)    .79 (.74–.83)
          r (child, teacher)   .16              .45              .43              .36              .53              .37
HP        Child                .67 (.45–.80)    .81 (.70–.88)    .65 (.46–.76)    .83 (.73–.89)    .76 (.32–.88)    .77 (.71–.81)
          Teacher              .86 (.77–.92)    .87 (.80–.92)    .89 (.81–.93)    .90 (.85–.94)    .85 (.79–.90)    .88 (.85–.90)
          r (child, teacher)   .33              .44              .44              .34              .64              .42
PP        Child                -.04 (-.75–.35)  .32 (-.07–.55)   .30 (.03–.51)    .53 (.16–.72)    .31 (-.23–.58)   .33 (.18–.45)
          Teacher              .63 (.31–.81)    .63 (.39–.77)    .78 (.64–.87)    .72 (.57–.82)    .53 (.28–.70)    .69 (.61–.75)
          r (child, teacher)   .16              .32              .41              .24              .20              .26
PRO       Child                .62 (.32–.78)    .43 (.02–.64)    .35 (.01–.55)    .77 (.65–.83)    .65 (.07–.82)    .59 (.48–.68)
          Teacher              .84 (.73–.90)    .86 (.76–.92)    .84 (.74–.90)    .88 (.81–.92)    .72 (.55–.81)    .85 (.80–.88)
          r (child, teacher)   .25              .29              .40              .22              .38              .28
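The Cronbach's 𝛼 values in Table 4 follow the standard formula 𝛼 = k/(k−1) · (1 − Σ item variances / total-score variance). A minimal sketch of this computation (function name ours), assuming an n-persons × k-items score matrix:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_persons x k_items) matrix of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of total scores
    return k / (k - 1) * (1 - sum_item_var / total_var)

# Perfectly parallel items yield alpha = 1
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))  # → 1.0
```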


Figure 1. Distribution of 𝐿𝑚𝑎𝑥 values derived from children's self-report (left panel) and teacher-report (right panel).


Figure 2. Mean value of 𝐿𝑚𝑎𝑥 and classification rate for each age group, including 95% confidence intervals, for children's self-report (solid line) and teacher-report (dashed line). For the 8-year group, the lower bound of the 95% confidence interval of the classification rate based on children's self-report coincided with the classification rate based on teacher-report; both were 6.5%.
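The 𝐿𝑚𝑎𝑥 index shown in Figures 1 and 2 is the length of the longest run of identical consecutive responses in a protocol; long runs suggest repetitive (e.g., straight-line) responding. A minimal sketch, assuming responses are stored in administration order (function name ours):

```python
def l_max(responses):
    """Longest-string index: length of the longest run of identical
    consecutive responses in a (non-empty) response vector."""
    longest = current = 1
    for prev, nxt in zip(responses, responses[1:]):
        current = current + 1 if nxt == prev else 1
        longest = max(longest, current)
    return longest

# The run of four identical "2" responses dominates this protocol
print(l_max([1, 2, 2, 2, 2, 0, 1, 2, 2]))  # → 4
```

A protocol is then classified as potentially invalid when its 𝐿𝑚𝑎𝑥 exceeds a chosen cut-off.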


Figure 4. Mean value of 𝐺𝑝 with 95% confidence intervals across age groups (left panel) and classification rate with 95% confidence intervals across age groups (right panel) derived from children’s self-report (N = 785).


Figure 6. Mean value of 𝐺𝑛𝑝 with 95% confidence intervals across age groups (left panel) and classification rate with 95% confidence intervals across age groups (right panel) derived from children’s self-report (N = 785).
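The Guttman indices behind Figures 4 and 6 count response pairs that violate the items' popularity ordering. For dichotomous items, a Guttman error occurs when a respondent fails a more popular item but passes a less popular one; 𝐺𝑛𝑝 norms this count by its maximum given the total score. The dichotomous count can be sketched as follows (function and argument names ours; the polytomous version used for the SDQ operates on item steps rather than whole items):

```python
def guttman_errors(response, popularity_order):
    """Count Guttman errors in a dichotomous (0/1) response pattern.

    popularity_order lists item indices from most to least popular; an
    error is a pair in which the more popular item is failed (0) while
    the less popular item is passed (1).
    """
    ordered = [response[i] for i in popularity_order]
    return sum(
        1
        for i in range(len(ordered))
        for j in range(i + 1, len(ordered))
        if ordered[i] == 0 and ordered[j] == 1
    )

# Failing the most popular item while passing both less popular ones -> 2 errors
print(guttman_errors([0, 1, 1], [0, 1, 2]))  # → 2
```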


Appendix A

Table A1

Results of the dimensionality analysis for the children's self-report SDQ (N = 740) and for the teacher-report SDQ (N = 276)

Subscale  Informant  % variance       % variance       Components based on   λ2/λ1
                     1st component    2nd component    parallel analysis
EP        Child      47               17               1                     0.36
          Teacher    71               10               1                     0.15
CP        Child      46               17               1                     0.38
          Teacher    71               12               1                     0.16
HP        Child      57               16               1                     0.28
          Teacher    74               12               1                     0.16
PP        Child      36               20               1                     0.54
          Teacher    56               17               1                     0.30
PRO       Child      46               15               1                     0.33
          Teacher    68               12               1                     0.17

Note. EP = Emotional Problem, CP = Conduct Problem, HP = Hyperactivity, PP = Peer Problem, PRO = Prosocial Behavior. λ1: the first eigenvalue; λ2: the second eigenvalue.
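The λ2/λ1 ratios in Table A1 are the second-to-first eigenvalue ratios of each subscale's item correlation matrix; values near zero support a single dominant component. A sketch of the computation (function name ours), assuming an n-persons × n-items matrix with items in columns:

```python
import numpy as np

def eigen_ratio(data):
    """lambda2 / lambda1 of the item correlation matrix (columns = items)."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]  # descending order
    return eigvals[1] / eigvals[0]

# Two perfectly correlated items: all variance loads on the first component
r = eigen_ratio([[1, 1], [2, 2], [3, 3]])
print(abs(round(r, 6)))  # → 0.0
```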
