Language Assessment in SLI: Standardized testing versus language sample analysis
Morphosyntactic Assessment in SLI

Standardized testing versus language sample analysis

June 17, 2014

Graduate School for Humanities, MA General Linguistics

Abstract

BACKGROUND: Morphosyntax is an area of language development often affected in children with specific language impairment (SLI). Morphosyntactic development can be assessed either by standardized norm-referenced tests or by language sample analysis, and each method has its own strengths and weaknesses. Comparable studies as well as clinical observations have shown mismatches between the results of the two methods. Yet, a comparison of the results of the two methods has not been conducted for Dutch instruments. AIMS: The central question of this thesis is whether performance on standardized language tests and results of language sample analysis lead to comparable conclusions about the language development of children with SLI. Since morphosyntax is the only language domain assessed by all three diagnostic instruments, this domain is the focus of the current study. METHODS & PROCEDURE: Test files of ten 4;0-8;0 year-old children clinically diagnosed with SLI and tested with either the CELF-4-NL and STAP or the Schlichting Test for Sentence Development and STAP were assessed qualitatively. Group performance, individual performance and problem structures were evaluated. RESULTS: The group-level analysis showed that complexity scores of the STAP are higher than grammaticality scores for nearly all children; that the norm-referenced tests correspond most to the complexity scores of the STAP; and that the STAP scores in general correspond most to the Schlichting and to the CELF subtest Formulated Sentences. Analysis of the individual subjects showed a low correspondence between performances on the two methods: in only three out of ten subjects did both test methods lead to the same conclusions. The methods also differed considerably in the structures they identified as problem structures. CONCLUSION: The current study reveals only a slight correspondence between performance on standardized tests for morphosyntactic assessment and the results of the instrument for language sample analysis. Although in practice both methods have proved to be good clinical tools for the assessment of children with SLI, these results indicate that caution is needed in choosing one method over the other, since the tasks are not equivalent. Until more extensive studies prove otherwise, the current procedure (i.e. the use of spontaneous language in addition to standardized language tests) seems justified.


Table of Contents

Acknowledgements

Abstract

1. Introduction

2. Background

2.1 Diagnosing children with specific language impairment

2.2 Comparison of diagnostic instruments used to assess language disorders

2.3 The current study

3. Linguistic and clinical aspects tested by CELF, SCHLICHTING and STAP

3.1 CELF-4-NL

3.1.1 Materials

3.1.2 Procedure

3.1.3 Scores and interpretation

3.1.4 Psychometric properties

3.2 Schlichting Test for Language Production

3.2.1 Materials

3.2.2 Procedure

3.2.3 Scores and interpretation

3.2.4 Psychometric properties

3.3 STAP

3.3.1 Materials

3.3.2 Procedure

3.3.3 Scores and interpretation

3.3.4 Psychometric properties

3.4 Comparison of the CELF, Schlichting and STAP

4. Methods

4.1 Subjects

4.2 Data collection

4.3 Analysis

5. Results

5.1 Scores

5.2 Case studies

References

Appendix 1 Subtests of the CELF-4-NL used in the four-level assessment process

Appendix 2 Structures assessed by the Schlichting Test for Sentence Development

Appendix 3 Structures assessed by the STAP

Appendix 4 STAP-profile & STAP-summary

Appendix 5 Comparison of structures assessed by the CELF-4-NL, Schlichting Test for Sentence Development and STAP

Appendix 6 Example items of morphosyntactic structures shared by the CELF-4-NL and Schlichting Test for Sentence Development

List of Tables and Figures

Table 2.1 The EpiSLI system

Table 2.2 Strengths and weaknesses of norm-referenced tests and language sample analysis

Table 3.1 Correlations between standard deviations of CELF and Sentence Development of the Schlichting

Table 3.2 Shared morphosyntactic structures assessed by the CELF, Schlichting and STAP

Table 4.1 Characteristics of subjects

Table 5.1 Scores of the children tested by the CELF and STAP

Table 5.2 Scores of the children tested by the Schlichting and STAP

Table 5.3 Comparison of the children tested by the CELF and STAP

Table 5.4 Comparison of the children tested by the Schlichting and STAP

Figure 5.1 Scatterplot of scores of the children tested by the CELF and STAP

1. Introduction

Immediately after birth, children start using their speech apparatus to produce sounds. At first this mainly comes down to crying, but a broader variety of sounds emerges soon after. Children are able to produce adult-like structures by the age of five, and by the time a child reaches their teens, language proficiency has developed so far that only some fine-tuning remains.

The current study focuses on the assessment of morphosyntactic development in 4;0-8;0 year old Dutch children. Four-year-old children are known to be in the penultimate stage of language acquisition, the stage in which they develop rapidly in all areas of language. Although they already possess most adult language structures at this stage, they still need practice to use these structures consistently and correctly. Overregularization occurs frequently and exceptional structures still have to be learned. Examples of developing structures and processes mentioned by Gillis and Schaerlaekens (2000) are reflexive pronouns, derivational morphology and the fluent production of passive, long and compound sentences.

The language development of children with a language deficit deviates from, or is delayed compared to, the pattern displayed by typically developing children. The language domains most commonly affected in children with language impairment are phonology, morphology and syntax, but other domains may be affected as well. For Dutch, Bol and Kuiken (1988) compared the spontaneous language of nineteen 4;1 to 8;2 year old children with specific language impairment (SLI) to a group of typically developing, language-age-matched children (3;6-4;0 years old). They found several significant differences, to the disadvantage of the children with SLI. These differences included: less frequent use of pronouns, possessive nouns and diminutives; less frequent use of first person singular verb forms; and less frequent use of phrases that include articles and prepositions. Furthermore, children with SLI produced more incomplete sentences consisting of only two constituents and fewer sentences consisting of more than four constituents. Coordinations using the conjunctions maar ('but') and want ('for') were used less, as were questions with syntactic inversion consisting of three constituents and structures in which objects and adverbials were combined. Tense, agreement and verb argument structure were assessed by De Jong (1999), who studied Dutch school-aged children with (grammatical) SLI. He found that the inventory of past tense forms was limited and that, instead of using regular past tense forms, the children
preferred using a past tense form of the auxiliary gaan (‘go’) complemented by a verb infinitive. Deficits in subject-verb agreement included omission or substitution of the agreement morpheme and use of the infinitive instead of an inflected verb. Verb argument structure was also affected in children with SLI. Often this meant that the structures were low in complexity, but grammatically correct.

To assess language development in children, various instruments are used. The current study assesses the use of standardized language tests and an instrument for language sample analysis in the diagnosis of Dutch children with SLI. The main question of this thesis is: To what extent do performance on standardized language tests and results of language sample analysis lead to comparable conclusions? This question is relevant because it may reveal whether the two methods are interchangeable in the process of diagnosing children with SLI.

The language tests that will be assessed in this study are the CELF-4-NL (Kort, Schittekatte & Compaan, 2008) and the Schlichting Test for Sentence Development (Schlichting & lutje Spelberg, 2010). The instrument for language sample analysis that will be assessed is the STAP (Van den Dungen & Verbeek, 1994). Since morphosyntax is the only language domain assessed by all three diagnostic instruments, this domain will be the focus of the current study.

Based on clinical observations of the diagnostic instruments and their use, it is expected that correspondences as well as differences will be found when results of the different instruments are compared. Norm-referenced tests as well as language sample analyses provide a general view of a child’s morphosyntactic knowledge. Corresponding within-subject scores on these methods would therefore be expected. However, it is likely that the CELF, Schlichting and STAP measure different morphosyntactic structures or measure the same structures by using different techniques. This presumably lowers the correspondence.

The structure of this thesis is as follows: chapter 2 sheds some light on earlier research on the diagnosis of children with specific language impairment and discusses the two methods for language assessment. Chapter 3 provides an evaluation of the assessment tools used in the current research. Chapter 4 describes the subjects and methods of the current research. Chapter 5 provides an overview of the results, which are discussed in chapter 6.

2. Background

Although language impairments can be caused by a variety of underlying problems, this study focuses on the language development of children with a primary language disorder. Section 2.1 discusses the criteria adopted by various researchers and research instruments for identifying children with SLI. Section 2.2 provides a comparison of two methods for language assessment, and section 2.3 introduces the current study.

2.1 Diagnosing children with specific language impairment

Specific language impairment is a developmental disorder generally defined by exclusion criteria. Although the language of children with SLI deviates from that of typically developing children, their nonverbal IQ is within normal limits and they do not have sensory problems. Furthermore, these children have had normal opportunities for language learning and do not show signs of any other developmental disorders (Bishop 1992, Leonard 2000). Inclusion criteria are used less frequently and less consistently. A number of studies did, however, formulate such criteria. Stark and Tallal (1981) devised a method for selecting children with SLI based on both inclusion and exclusion criteria. For children with a performance IQ score of at least 85 who passed the other exclusion criteria described above, complementary inclusion criteria were applied. These children needed to have: a combined language age (LA) score of at least 12 months below either mental age (MA) or chronological age (CA), whichever was the lower; a receptive LA of at least 6 months below MA or CA; or an expressive LA of at least 12 months below MA or CA. Out of the 132 children aged 4;0-8;6 clinically classified as language impaired, only 39 (29.54%) were selected as having SLI based on the inclusion criteria of Stark and Tallal. The majority of the exclusions were based on a low performance IQ. Plante (1998) argues that, although children with a nonverbal IQ below 70 should be excluded from SLI studies, a cut-off score of 85 might be too high. This argument was based on an earlier study by Swisher, Plante and Lowell (1994), which demonstrated that within-subject performance on IQ tests varied by a mean of 10 points, depending on the IQ measurement method used.
Another objection to the method of Stark and Tallal came from Lahey (1990), who argued that identification of children with SLI should be based not on a comparison of language age to mental age, but to chronological age, which is the most frequently used approach nowadays.
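As an illustration, the Stark and Tallal selection rule described above amounts to a simple decision function. The sketch below assumes ages in months; the function name is illustrative, and reading "below MA or CA" as "below the lower of the two" throughout is an assumption of this sketch:

```python
def meets_stark_tallal_criteria(combined_la, receptive_la, expressive_la,
                                mental_age, chronological_age):
    """Stark & Tallal (1981) inclusion criteria, all ages in months.

    A child qualifies when at least one criterion holds:
    - combined language age (LA) at least 12 months below the lower of MA and CA;
    - receptive LA at least 6 months below MA or CA;
    - expressive LA at least 12 months below MA or CA.
    """
    reference = min(mental_age, chronological_age)  # the lower of MA and CA
    return (combined_la <= reference - 12
            or receptive_la <= reference - 6
            or expressive_la <= reference - 12)
```

Note that this only covers the inclusion side; in the original procedure it applies after the exclusion criteria (performance IQ of at least 85, no sensory problems, etc.) have been passed.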


Stark and Tallal discuss the heterogeneity of the remaining children who were found to have SLI. Although children with severe expressive language deficits and articulation deficits had been excluded, the severity and nature of the deficits found in the remaining 39 children varied greatly. This led them to propose that 'the classificatory term "Specific Language Deficit" may be a misleading one' (Stark & Tallal, 1981: 122), because it does not refer to one single deficit. This proposition agrees with more recent thoughts and findings about the existence of subtypes of SLI (Conti-Ramsden et al., 1997; Van Daal et al., 2004). Some of these proposed subtypes predominantly address morphosyntactic problems; in others the focus is more on phonological, articulation, semantic or pragmatic problems. Rapin and Allen (1987) proposed subtypes of SLI, including a phonologic-syntactic deficit and a lexical-syntactic deficit. The former was characterized by articulation, phonology, morphology and syntax problems; the latter by syntax and morphology problems, word-finding difficulties and expressive problems. These subtypes respectively corresponded to 'cluster 1' and 'cluster 5' found by Conti-Ramsden and her colleagues (1997). The focus of this study will be on subjects with morphosyntactic problems, who possibly fall within one of these subtypes. Although the existence of different subtypes of SLI seems plausible, this debate will not be elaborated further because it exceeds the purpose of the current study.

Another diagnostic system, for identifying kindergarten children with SLI, was designed by Tomblin, Records and Zhang (1996). Their EpiSLI system was designed to aid the conduct of epidemiologic research on SLI. The system consisted of previously existing standardized tasks addressing three domains of language (vocabulary, grammar and narrative) and two modalities (comprehension and production). Vocabulary was tested through a picture identification task and an oral vocabulary task. Grammatical knowledge was tested by a grammatical comprehension task, a sentence imitation task and a grammatical completion task. Both the vocabulary and the grammatical tasks were selected from the TOLD-2:P (Test of Language Development-2 Primary: Newcomer & Hammill, 1988). A narrative production and comprehension screening task (Culatta, Page & Ellis, 1983) was used to assess narrative competence. Composite scores for each domain and each modality were calculated, leading to five scores, as shown in Table 2.1.


Table 2.1 Specific areas of language measured by the EpiSLI system and the composite scores derived from these measures (Tomblin et al. 1996: 1287).

Domain       Comprehension               Expression                                  Composite
Vocabulary   Picture Identification      Oral Vocabulary                             Vocabulary Composite
Grammar      Grammatic Understanding     Grammatic Completion, Sentence Imitation    Grammar Composite
Narrative    Narrative Comprehension     Narrative Recall                            Narrative Composite
             Comprehension Composite     Expression Composite

The diagnostic system and its composite scores as designed by Tomblin and his colleagues concur with the CELF-4-NL described in section 3.1. Tomblin et al. (1996) statistically computed the diagnostic standard in which 'a language impairment is indicated if the child failed two or more of the five composite measures, where failing was a z-score of -1.25 or less' (Tomblin et al. 1996: 1288). A total of 7,019 kindergarten children were sampled and screened for possible language impairment. Subsequently, a subgroup of 1,502 children, with the same proportion of screening failures as in the original group of 7,019, was given the diagnostic battery in Table 2.1. Of these children, 13.58% were labeled as having a language impairment. This rate was consistent with the expectations of clinical standards as well as with the only other large-scale epidemiologic study of language impairment in kindergarten children. However, it was not a good predictor of the prevalence of SLI, because exclusion criteria had not been employed. Further research by Tomblin and his colleagues (1997) provided an estimated prevalence rate of 7.4%, which agrees with the overall estimated prevalence rate of 6% to 10% for school-age children reported in the DSM-IV-TR (American Psychiatric Association, 2000).
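The EpiSLI diagnostic standard quoted above amounts to a simple counting rule over the five composite z-scores; a minimal sketch (the function name is illustrative):

```python
def episli_language_impaired(composite_z_scores):
    """EpiSLI standard (Tomblin et al., 1996): a language impairment is
    indicated when the child fails two or more of the five composite
    measures, where failing means a z-score of -1.25 or less."""
    failures = sum(1 for z in composite_z_scores if z <= -1.25)
    return failures >= 2
```

For example, a child with z-scores of -1.3 and -1.5 on two composites (and scores above the cut-off elsewhere) would meet the standard, while a single low composite would not.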

In the Netherlands, children are labeled as language impaired when they fail at least two separate language tests on two different domains, where failing is defined as scoring -1.5 standard deviations or less. The domains that can be used to identify language impairment are (a) speech; (b) auditory processing; (c) grammar; and (d) lexical/semantic development. Alternatively, children who score -2 standard deviations or less on a general language test will also be diagnosed with SLI. In both cases, cognitive functioning may not be an underlying cause of the language impairment and speech therapy must have proved unsuccessful (Voogd, 2009). Tests frequently used by Dutch speech therapists to demonstrate morphosyntactic impairment are the CELF-4-NL (Kort, Schittekatte & Compaan, 2008) and the Schlichting tests for sentence production (Schlichting & lutje Spelberg, 2010). These tests, along with a Dutch assessment tool for language sample analysis (STAP: van den Dungen & Verbeek, 1994), will therefore be the focus of this study.
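The Dutch criterion can likewise be sketched as a decision rule. This is a simplification that assumes one score per domain and does not model the two exclusionary conditions (cognitive functioning as underlying cause, unsuccessful speech therapy); the function name is illustrative:

```python
def dutch_criterion_met(domain_sd_scores, general_test_sd=None):
    """Dutch criterion for language impairment (after Voogd, 2009):
    scores of -1.5 SD or less on tests in at least two different domains
    (speech, auditory processing, grammar, lexical/semantic development),
    OR a score of -2 SD or less on a general language test."""
    failed_domains = [d for d, sd in domain_sd_scores.items() if sd <= -1.5]
    if len(failed_domains) >= 2:
        return True
    return general_test_sd is not None and general_test_sd <= -2.0
```

For example, scores of -1.6 SD on a grammar test and -1.8 SD on a speech test would satisfy the two-domain route, while a single -1.6 SD score would not.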

2.2 Comparison of diagnostic instruments used to assess language disorders

Speech therapists and speech-language pathologists generally use standardized language tests to assess children with language problems. These tests are often preferred because they are norm-referenced and relatively easy and quick to administer. Ornstein (1993) reports a number of strengths of norm-referenced tests, including: '(a) they assume statistical rigor in that they are reliable and valid; (b) the quality of test items is often high in that they are developed by test experts, pilot tested and have undergone revision prior to publication and use; and (c) administration procedures are standardized and the test items are designed to rank examinees for the purpose of placing them in specific programs or instructional groups' (Ornstein, 1993 in: Ford, 2009).

McCauley and Swisher (1984) agree with the notion that, if properly used, norm-referenced tests are useful for diagnostics. However, they do argue that test administration and processing are prone to error, which is detrimental to test reliability and validity. To support this claim, they discuss four errors commonly made when using norm-referenced tests. The first error arises when age-equivalent scores are used. The authors warn that these scores should be used with caution, because they are psychometrically imprecise and may lead to misinterpretation of the results. They suggest that summarizing test results with standard scores or percentile ranks, as is the case with the Dutch language tests evaluated in this study, serves as a valuable alternative. A second error may occur when interpreting the test profile. When norm-referenced tests are used, the standard error of measurement and the confidence interval are often provided to calculate the range in which the child's true score is expected to be found. However, the scores on a profile are only estimates of the true score that a child would obtain if the scores were perfectly reliable and error-free. The deviation between a pair of scores on a profile may therefore be interpreted as a difference in performance, while this difference might just as well originate from measurement error. This may result in wrong conclusions about an individual's strengths and weaknesses. Moreover, children will sometimes vary in their performance and therefore vary in their test scores, without the existence of a language problem. The third error specified by McCauley and Swisher is the use of repeated testing as a means of assessing progress. Because norm-referenced tests are designed to look at differences between individuals, they often consist of items that cover a broad range of skills. The authors claim that, because of the limited number of items that assess the individual skills, norm-referenced tests are likely to be less sensitive to changes in behavior over time. The last error discussed by the authors is the use of test items in planning goals for therapy. First of all, they state that the number of items on a norm-referenced test is too small to draw conclusions, since not all forms and all developmental levels of the skills tested are covered, and conclusions cannot be drawn from individual errors. Secondly, most test profiles fail to provide enough detailed information to be used when planning goals for therapy.
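The second error above hinges on the standard error of measurement (SEM): an observed score only locates the true score within an interval. A minimal sketch of that interval (the conventional observed ± z × SEM construction, not the procedure of any specific test manual):

```python
def true_score_interval(observed_score, sem, z=1.96):
    """Confidence interval for the true score, given an observed score and
    the test's standard error of measurement (SEM): observed ± z * SEM.
    z = 1.96 yields the conventional 95% interval. Two profile scores whose
    intervals overlap need not reflect a real difference in performance."""
    margin = z * sem
    return observed_score - margin, observed_score + margin
```

For a standard score of 100 on a test with an SEM of 3, the 95% interval runs roughly from 94 to 106, which illustrates why a difference of a few points between two subtests should not automatically be read as a strength or weakness.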

A study by Merrell and Plante (1997) supported the claim that norm-referenced tests are not suitable for planning therapy goals. They examined the extent to which norm-referenced tests are qualified to answer the questions "Is there a language impairment?" and "What are the specific areas of deficit?". By investigating the first question, the authors assessed the tests' sensitivity and specificity: the former addressing the tests' accuracy in identifying children with language impairment, the latter their accuracy in identifying typically developing children. Merrell and Plante tested 40 preschool children (20 with SLI and 20 typically developing children) with the Test for Examining Expressive Morphology (TEEM) and the Patterned Elicitation Syntax Test (PEST). After cut-off scores were statistically computed to discriminate maximally between the two groups of children, the sensitivity and specificity of both tests were still high enough for accurate discrimination. The specific areas of deficit, however, could not easily be identified, because performance on similar structures varied between the two tests. This possibility of variable performance led Merrell and Plante to conclude that individual items could not be used to demonstrate mastery or deficit of specific structures and thus could not be used when planning goals for therapy.

McCauley and Swisher (1984) claim that language sampling is an important alternative to norm-referenced tests when describing children's expressive language. Although this method is time consuming and was therefore infrequently used at the time of their writing, the authors claim it to be 'a fertile source of suitable therapy objectives' (McCauley & Swisher, 1984: 345). Additionally, language sample analysis can be deployed when evaluating progress in a child's language, because, contrary to norm-referenced tests, no learning effect occurs (Van den Dungen & Verboog, 1998).


Besides being a valuable tool for the detailed analysis of specific structures and deficits, the use of language samples has other benefits. First of all, research by Dunn, Flax, Sliwinski and Aram (1996) indicated that measures of spontaneous language might be more sensitive than standardized tests. These authors compared clinical judgement, standardized test performance and measures of spontaneous language of preschool children. They found that children who were clinically diagnosed as language impaired had deficits in spontaneous language even though they did not fail the standardized test. This suggests that the measures of spontaneous language reflect language difficulties that were not assessed by the standardized test. Furthermore, language samples are more ecologically valid than standardized tests, because they naturally reflect the child's language competence. The samples show the grammatical forms and the vocabulary used by the child, but they also demonstrate how the child uses language to share information with a listener. Finally, language sample analysis is also effective when diagnosing children who are difficult to test with standardized tests, for instance because of behavioural problems or high levels of performance anxiety (Costanza-Smith, 2010). Relatedly, Stockman (1996) discusses the value of language sample analysis for linguistic minority children. She claims that this method is more culturally sensitive, valid, accessible and flexible than standardized tests in the process of diagnosing linguistic minority children.

Apart from being time-consuming, language sample analysis has another disadvantage. Eisenberg (1996) points out the inability to draw conclusions about structures that are not produced or are produced infrequently. When language sampling is used as a diagnostic tool, frequency of production serves as evidence that a child has acquired a particular structure. However, if a child's production of a structure does not meet the criterion frequency, it is unclear whether this reflects a lack of knowledge about the structure or other factors, such as a lack of opportunities to use the structure in the discourse situation. Eisenberg therefore suggests that language sample analysis may not be sufficient for studying all aspects of child language, and that some form of elicited production is desirable.
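Eisenberg's caveat can be made concrete: a frequency criterion is only interpretable relative to the number of opportunities the sample offered. The sketch below uses an illustrative criterion of three productions; the function name and threshold are assumptions, not part of any published protocol:

```python
def structure_status(productions, opportunities, criterion=3):
    """Judge a structure from a language sample (after Eisenberg, 1996).
    A low production count is only evidence of a deficit when the sample
    actually offered enough opportunities to use the structure."""
    if productions >= criterion:
        return "acquired"
    if opportunities < criterion:
        # too few chances in the discourse, not necessarily lack of knowledge
        return "insufficient evidence"
    return "possibly not acquired"
```

The middle case is exactly the situation Eisenberg flags: without enough opportunities, absence of the structure licenses no conclusion, which is why she recommends supplementing sampling with elicited production.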

As shown above, the use of both methods for language analysis is subject to debate. An overview of the discussed strengths and weaknesses of both instruments is provided in table 2.2.

Table 2.2 Strengths and weaknesses of norm-referenced tests and language sample analysis

Norm-referenced tests

Strengths:
- High reliability and validity (Ornstein, 1993)
- High quality of test items (Ornstein, 1993)
- Ranking of examinees in specific programs or instructional groups (Ornstein, 1993)
- Standardized procedures (Ornstein, 1993)

Weaknesses:
- Measurement error often interpreted as performance error (McCauley & Swisher, 1984)
- Less suitable for assessing progress and planning goals for therapy (McCauley & Swisher, 1984)
- Age-equivalent scores are prone to error (McCauley & Swisher, 1984)

Language sample analysis

Strengths:
- Detailed analysis of language production (e.g. Merrell & Plante, 1997; Prutting et al., 1975; Blau et al., 1984)
- Suitable for difficult-to-test children and linguistic minority children (Costanza-Smith, 2010; Stockman, 1996)
- High ecological validity (Costanza-Smith, 2010)
- High sensitivity (Dunn et al., 1996)
- No learning effect (Van den Dungen & Verboog, 1998)

Weaknesses:
- Time consuming (Ornstein, 1993)
- Difficult to distinguish lack of knowledge from lack of production opportunities for infrequently produced structures (Eisenberg, 1996)

Some of the strengths and weaknesses shown in Table 2.2 are disputable or differ in relevance depending on the specific tests used for assessment. The implications of Table 2.2 for the Dutch tests are discussed in chapter 6.

In the past decades, several studies have been conducted on the use of norm-referenced tests and language sample analysis in the diagnosis of children with language impairment. Often these studies have shown a mismatch between the results on both test methods.

Prutting, Gallagher and Mulac (1975) investigated the relationship between the syntactic structures produced on the Northwestern Syntax Screening Test (NSST) and the same structures produced in a spontaneous language sample. The NSST elicits syntactic structures using a type of sentence repetition. The examiner showed 20 pairs of pictures with grammatical distinctions to 12 four- and five-year-old children with language delay. The pictures were introduced using pre-determined sentences. Subsequently, the target structures were elicited by the examiner pointing at a picture and asking 'What is this one?' or 'What is that one?'. For a child to have acquired a structure, the produced sentence form had to be identical to the form used by the examiner. Besides administration of the NSST, two language samples were collected, and structures identical to those of the NSST were used for further investigation. Comparison of the two methods demonstrated that 30% of the children failed to produce a grammatical distinction on the NSST but correctly generated this distinction in spontaneous language. The authors concluded that these results indicated that item analysis of the NSST did not accurately represent the children's spontaneous language skills, which made the NSST an instrument merely suitable for screening. A spontaneous language sample could, in their opinion, be used as a diagnostic tool to analyse specific syntactic structures. Although these results seem straightforward, the authors only reported whether the failed items of the NSST were produced correctly in spontaneous language. They did not address NSST performance on the structures produced incorrectly in spontaneous language. The current study attempts to look at error production by using the tests as well as spontaneous speech as a starting point for analysis.

A similar conclusion to that of Prutting et al. (1975) was drawn by Blau, Lahey and Oleksiuk-Velez (1984), who studied whether the Carrow Elicited Language Inventory (CELI) could be used when developing goals for language intervention. The authors tested ten children with language impairment with the CELI and additionally analysed a language sample of each child. As with the findings of Prutting, Gallagher and Mulac, all of the children made fewer errors in their language sample than on the CELI, but correlations between the scores on the CELI and the language sample were high enough to conclude that the CELI could function as a diagnostic tool. However, based on goals that had been determined in an earlier stage of assessment, most errors produced on the CELI were not considered immediate goals for intervention. The language samples, on the other hand, did lead to these specific goals and also provided content and context in which the goals could be taught.

2.3 The current study

The mismatches between norm-referenced tests and language sample analysis found in earlier studies agree with the author's clinical observations of Dutch children who show a mismatch between performance on language tests and performance in spontaneous language. The current study therefore attempts to assess and compare performance on two norm-referenced tests for language production on the one hand, and a language sampling method for Dutch children with SLI on the other. The main question of this thesis is: To what extent do performance on norm-referenced language tests and results of language sample analysis lead to comparable conclusions? To answer this question, multiple case studies will be conducted and the following questions will be answered:

1. To what extent does group performance on the norm-referenced tests correspond to the results of the language sample analysis?

2. To what extent do individual scores on the CELF and Schlichting correspond to the results of the STAP?

3. Are the problem structures identified by the CELF or Schlichting also identified by the STAP and vice versa?

These questions will be discussed in chapters 4 and 5. The next chapter provides a description of the CELF, Schlichting and STAP.


3. Linguistic and clinical aspects tested by CELF, SCHLICHTING and STAP

In this chapter, three diagnostic tools used to assess the language production of children with (suspected) language impairment will be described. Sections 3.1 through 3.3 provide a description of the individual instruments. Each section describes the goal and target group, materials, procedure, methods for scoring and interpretation, and psychometric properties of the diagnostic tool. These psychometric properties were derived from the test manuals and from reports of the Dutch commission for test matters (COTAN: Commissie Testaangelegenheden Nederland, www.cotandocumentatie.nl). Section 3.4 provides a more detailed comparison of the instruments and the structures they assess.

3.1 CELF-4-NL

The CELF-4-NL is a Dutch adaptation of the CELF-4 (Clinical Evaluation of Language Fundamentals, fourth edition) by Semel, Wiig and Secord (2003). The CELF-4-NL (henceforth CELF) by Kort, Schittekatte and Compaan was published in 2008. The test was designed to provide an overview of a child’s general language ability and to assess performance on specific language areas. Norms are available for Dutch-speaking children aged 5;0-15;0 and estimated norm scores are available for the ages 16;0 to 18;0 (Kort et al., 2008). To facilitate diagnosis, the CELF provides a four-level assessment process, consisting of the following levels:

1. Identification of a possible language disorder.

2. Description of the nature of the disorder.

3. Evaluation of possible underlying deficits.

4. Evaluation of language and communication in context.

The first level is assessed by administering four pre-selected tests addressing different language areas. These four tests together provide the Core Language Score, which determines the presence or absence of a language disorder. For the second level, additional tests are administered. These tests provide scores to calculate the Receptive Language Index, Expressive Language Index, Language Content Index and Language Structure Index. The third level explores any possible connection between the language problem and other abilities such as memory and rapid automatic naming. The fourth level assesses how the disorder affects the children’s classroom performance (Kort et al., 2008). An overview of the specific subtests that are used in each level of the four-level assessment process is provided in Appendix 1. Because of their morphosyntactic-productive nature, the main focus will be on the subtests Word Structure, Recalling Sentences and Formulated Sentences.

3.1.1 Materials

The CELF consists of an examiner’s manual, a stimulus book, scoring forms, Observational Rating Scale forms and Pragmatic Profile forms. The manual provides guidelines for testing, scoring and interpreting scores, norm tables and a description of the tests’ purpose, design and development. The flip-over stimulus book provides the examiner with test information and model sentences, while simultaneously presenting the test item to the child.

3.1.2 Procedure

Assessment of a child’s language has to be carried out by a speech therapist familiar with the materials and procedures. The guidelines provided by the CELF have to be followed accurately. Test order is generally based on the four-level assessment process, but other orders are also possible. Although the procedure may vary slightly depending on the subtest, general procedure is as follows:

1. The child is seated opposite the examiner, looking at the picture on the stimulus book.

2. The examiner explains the test and administers practice items to check understanding.

3. Test items are administered using the model sentences on the stimulus book or in the manual. Depending on test or test-item, the examiner can repeat instructions. If relevant, the examiner chooses the correct starting-item based on the child’s age.

4. The examiner writes down the score and other relevant information on the score form.

5. The examiner calculates the child’s scores.

The three subtests most relevant to this study are Word Structure, Recalling Sentences and Formulated Sentences. In Word Structure the child is presented with a picture in the stimulus book. The examiner initiates a sentence and urges the child to complete it, as shown in (1).


(1) The examiner points to the picture and says:

Dit is een jongen en dit is … (‘This is a boy and this is …’)

The child completes the sentence by saying:

… een meisje (‘a girl’)

The subtest Recalling Sentences requires the child to repeat a spoken sentence, as shown in (2).

(2) The examiner reads the following sentence:

De jongen viel en deed zich pijn (‘The boy fell and hurt himself’)

The child is stimulated to repeat the sentence without changing it.

In Formulated Sentences the child looks at a picture on the stimulus book. The examiner provides the child with a word that has to be integrated in a sentence matching the picture.

(3) The examiner shows a picture of children crossing a finish line in a running competition and provides the child with the word lachend (‘laughing’)

The child creates the sentence: Lachend komen de kinderen de finish over (‘The children cross the finish line laughing’)

3.1.3 Scores and interpretation

The CELF uses quotient scores (mean 100, standard deviation 15) and percentiles (0 to 100) as a standard for scoring. The raw data on the score forms can be converted into these scores by using the designated norm tables in the manual. The manual also provides age-equivalent scores, but the authors state that these scores should be used with caution (Kort et al., 2008).

3.1.4 Psychometric properties

Norms of the CELF-4-NL are based on a sample of 1356 Dutch and Flemish children aged 5 to 15. All children had lived in either The Netherlands or Flanders for at least 7 years and none of the children had a mental or physical disability. Norm groups consisted of 77 to 152 subjects, depending on the age group (Kort et al., 2008). In 2010 the COTAN evaluated the CELF and concluded the norms for the 5 to 15 year old children to be satisfactory (scale: good – satisfactory – poor). The reliability and construct validity were also reported satisfactory for the three subtests used in the current study. The criterion validity was not assessed and was therefore rated poor (Egberink et al., 2010a).

3.2 Schlichting Test for Language Production

The Schlichting Test for Language Production was originally designed as part of a test battery to test language development of children up to the age of four years. The authors claim that before development of the battery, which also includes a Dutch version of the Reynell Verbal Comprehension Scale A (Reynell, 1985), no instruments were available for testing young children with sufficient reliability and validity (Schlichting & lutje Spelberg, 2003). The original Test for Language Production consisted of four tests that could be used independently. These were the test for Sentence Development; a picture-based Vocabulary Test; a test for Auditory Memory and a vocabulary checklist. In 2010 a renewed version of the Schlichting test was published (Schlichting Test voor Taalproductie-II: Schlichting & lutje Spelberg, 2010). In this more recent version the vocabulary checklist was omitted and a Narrative Task and a Pseudo-Word task were added. At the same time, the Reynell was replaced by the Schlichting Test voor Woordontwikkeling (Schlichting Test for Word Development).

The test for Sentence Development is the subject of this study. This test assesses grammatical production based on functional imitation of sentences, meaning that the child (partly) imitates utterances produced by the examiner in a functional context. Functional imitation is viewed as a good measure of syntactical knowledge, since it is claimed that young children cannot imitate structures that are not part of their own linguistic system (Schlichting & lutje Spelberg, 2003). The utterances were designed to have a communicative purpose, as shown in example (4) below. The complete list of structures tested can be found in Appendix 2.

(4) The examiner chooses a picture, puts it in a paper frame and says:

Ik denk dat ik de auto neem (‘I think I’ll take the car’)

The child is invited to imitate the examiner’s actions and says:


The goal of the Test for Sentence Development, as described in the manual, is to measure the syntactic productive knowledge of Dutch children (Schlichting & lutje Spelberg, 2010). However, at an earlier stage Schlichting and lutje Spelberg acknowledged that it is impossible to test children’s knowledge of all syntactic structures using only 40 items. They therefore proposed that the test assesses the child’s knowledge of ‘certain structures in certain linguistic (and cognitive) contexts’ (Schlichting & lutje Spelberg, 2003: 249).

The test is designed to assess 2;0-7;0 year old Dutch children with a possible language delay and is furthermore claimed to be suitable for diagnostics and the assessment of progress (Schlichting & lutje Spelberg, 2010).

3.2.1 Materials

The Schlichting Test for Language Production consists of an examiner’s manual, a stimulus book, stimulus materials and scoring forms. The manual provides guidelines for testing, scoring and interpreting scores, norm tables and a description of the tests’ purpose, design and development. The stimulus book provides scenes that, in combination with the stimulus materials, are used in different test items.

3.2.2 Procedure

Assessment of a child’s language has to be carried out by a speech therapist familiar with the materials and procedures and the guidelines provided by the Schlichting Test for Sentence Development have to be followed accurately. General procedure is as follows:

1. The child is seated opposite the examiner (looking at the stimulus book).

2. The examiner lays out the scene on the stimulus book that corresponds to the practice item, explains the test and administers practice items to check understanding.

3. Test items are administered. Answers elicited are: exact imitation; imitation with variation; complementation; and answering. The examiner can repeat instructions and utterances depending on the specific item. If relevant, the examiner chooses the correct starting-item based on the child’s age.

4. The examiner writes down the utterance on the scoring form.

5. The examiner calculates the child’s scores.


3.2.3 Scores and interpretation

After the child’s utterances are written down, the examiner analyses them and scores them as either ‘passed’ or ‘failed’. The score form provides the target responses for all items. The total score can subsequently be converted into quotient scores (mean 100, standard deviation 15) or percentile scores (0 to 100), using the norm table included in the manual. For the test for Sentence Development, the quotient score is called the ‘zinsquotiënt’ (‘sentence quotient’), or ZQ for short.

3.2.4 Psychometric properties

The exact number of children included in the norm group of the test for Sentence Development is unknown. Schlichting and lutje Spelberg (2010) report that the sample used in the norm study for the Test for Language Production-II ranged from 635 to 983 children per subtest, depending on the age group. The twelve age groups included 67 to 101 children. In 2010, COTAN evaluated the Schlichting Test for Language Production-II and concluded the norms to be good (scale: good – satisfactory – poor). Reliability was rated satisfactory and construct validity was rated good. Due to minimal study of the criterion validity by the authors, this criterion was rated poor (Egberink et al., 2010b).

3.3 STAP

Development of a first version of the STAP (Spontane Taal Analyse Procedure: Language Sample Analysis Procedure) was initiated by Margreet van Ierland in 1975. The definitive version of the STAP was realised by Van den Dungen and Verbeek (1994) and was published by the department of Linguistics of the University of Amsterdam. A theoretical motivation by Verbeek, Van den Dungen and Baker was published in 2005.

At the start of the first STAP study, barely any spontaneous language data on Dutch children were available. Two goals were therefore formulated. The first goal was to collect language production data of typically developing Dutch children. The second goal was to develop an instrument that could qualify and quantify the productive language of Dutch children. Data were collected from 240 children aged 4;0-8;0 attending regular schools. This age range was chosen because in 1975 little was known about the language production of Dutch children after the age of four. The current STAP system is therefore also intended to assess 4;0 to 8;0 year old children (Verbeek et al., 2007).


In the formal diagnosis of language impairments, the STAP cannot be the only diagnostic tool used in the assessment process. This method is, however, frequently used to support results of other tests and to justify the conclusions drawn by the clinician.

The STAP assesses multiple language domains. In some of these domains complexity as well as correctness is assessed, in other domains only correctness is assessed. Domains assessed both for complexity and correctness are morphology and syntax. Phonology, semantics and pragmatics are only assessed for correctness. Appendix 3 shows an overview of the variables assessed by the STAP (Verbeek et al., 2007; 111).

3.3.1 Materials

The STAP manual consists of guidelines for recording and transcribing language samples and instructions for the analysis of language samples. It also contains eight STAP forms used for the analysis of language samples, four STAP-profile forms for comparison of a child’s performance on the observed variables to one of the four norm groups, and a STAP-summary form that provides a quick overview of the child’s performance.

3.3.2 Procedure

Assessment of a child’s language using STAP has to be carried out by a clinician familiar with language sampling and language analysis, and the guidelines provided by the STAP manual have to be followed accurately. The procedure consists of the following steps:

1. Engaging in a conversation with the child.

A conversation between clinician and child is recorded on video or audio. No materials are used and at least 50 full utterances have to be collected. Elliptical answers are not included in this count, but are counted separately.

2. Transcribing the conversation following the STAP guidelines.

3. Segmenting the transcript following the STAP guidelines.

4. Analysing the transcript following the STAP guidelines.

5. Filling out the STAP-profile form.

6. Filling out the STAP-summary form.

7. Interpreting the data.

3.3.3 Scores and interpretation

After the transcript is analysed, total scores of each of the variables presented in Appendix 3 can be calculated. Subsequently, the obtained total scores can be drawn on the STAP-profile form (Appendix 4). On this form, all variables are presented, accompanied by the total scores and matching standard deviations (-2, -1, 0, 1, 2) of the norm group of typically developing age-matched children. By drawing the scores on the profile form, interpretation of the scores is facilitated:

- Scores drawn on or to the right of -1 standard deviation are interpreted as being average or above average.

- Scores drawn between -1 and -2 standard deviations are interpreted as being moderately deviant (the child being moderately impaired on the associated morphosyntactic structure).

- Scores drawn on or to the left of -2 standard deviations are interpreted as being severely deviant (the child being severely impaired on the associated morphosyntactic structure).
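These cut-offs can be expressed as a simple decision rule. The sketch below is an illustration only: the STAP profile is a paper form, and the function name is the author's own.

```python
# Classify a STAP profile score (in standard deviations from the norm group)
# according to the three interpretation bands described above.
# Hypothetical helper for illustration; not part of the STAP itself.
def interpret(sd_score):
    if sd_score >= -1:
        # on or to the right of -1 SD
        return "average or above average"
    elif sd_score > -2:
        # between -1 and -2 SD
        return "moderately deviant"
    else:
        # on or to the left of -2 SD
        return "severely deviant"

print(interpret(-0.5))   # average or above average
print(interpret(-1.5))   # moderately deviant
print(interpret(-2.0))   # severely deviant
```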

After calculating total scores and drawing them on the STAP-profile, the STAP-summary (Appendix 4) can be filled out. The summary form facilitates quick assessment of the severity of the language disorder by providing scores of overlapping morphosyntactic categories such as ‘syntactical errors’ or ‘morphological complexity’ (Van den Dungen & Verbeek, 1999). These overlapping categories are composed of specific variables presented on the profile form. As an example, to award a score to the category ‘syntactical errors’ on the summary form, the scores of the variables ‘main verb missing’ and ‘agreement errors’ of the profile form are analyzed. Following the guidelines, whichever of the two variables within a category scores the lowest has to be noted on the summary form. Thus, the highest of the two scores is not taken into account. To illustrate with the example above: performance on the category ‘syntactical errors’ for the participant shown in Appendix 4 is based on the variable ‘agreement errors’, because performance on this variable is worse than performance on the variable ‘main verb missing’.
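The summary rule reduces to taking the minimum over a category's constituent variables. In this sketch the variable names come from the example above, but the SD values are hypothetical illustrations, not data from this study.

```python
# Sketch of the STAP-summary rule described above: a category score is the
# lowest (i.e. worst) of its constituent profile variables, in SD units.
def summary_score(variable_scores):
    """Return the score for an overlapping summary category."""
    return min(variable_scores.values())

# The category 'syntactical errors' combines two profile variables.
syntactical_errors = {
    "main verb missing": -0.5,   # hypothetical SD scores
    "agreement errors": -2.0,
}
print(summary_score(syntactical_errors))  # -2.0 (the worse variable wins)
```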

3.3.4 Psychometric properties

Norms of the STAP are based on a sample of 240 children between the ages of 4;0 and 8;0, divided into four age groups. The STAP was not evaluated by the COTAN, since it is not a diagnostic test. Because analysis is based on a sample of spontaneous language, reliability of the STAP is expected to be low (Van den Dungen & Verboog, 1998). Validity, however, increases when the child’s language in the test situation represents the child’s normal language, as is the case in the STAP (Costanza-Smith, 2010). An assessment of the inter-rater reliability by Verbeek et al. (2007) yielded satisfactory results for most of the variables tested with the STAP.

3.4 Comparison of the CELF, Schlichting and STAP

Schlichting and lutje Spelberg (2010) assessed the correlations between the standard deviations of the Schlichting tasks and the tasks that are used to determine the core score of the CELF. The correlations of the tasks relevant for this thesis are provided in table 3.1 below.

Table 3.1 Correlations between standard deviations of CELF subtests and Sentence Development of the Schlichting (Schlichting, 2010: 34).

CELF subtest    Correlation with Schlichting SD
WS              0,62*
RS              0,48*
FS              0,27

*p<0.01. SD=Sentence Development, WS=Word Structure, RS=Recalling Sentences, FS=Formulated Sentences

Although for Word Structure and Recalling Sentences the correlations with the Schlichting are significant, the correlations in table 3.1 are not very high. Sentence Development shows the highest correlation with the CELF subtest Word Structure. The correlation of Formulated Sentences with the Schlichting is, on the other hand, very low. The authors claim that this might be explained by the CELF’s focus on both structure and content, rather than a full focus on structure: children have to produce a sentence that is both syntactically and semantically correct. In some cases, the sentence provided by the child will be syntactically correct, but the child does not obtain a full score because the sentence does not completely match the event depicted in the picture presented by the examiner. Schlichting and lutje Spelberg (2010) indicate that in the Schlichting Test for Sentence Development, the difficulty level of the items is deliberately kept low and the focus is mainly on structure. The correlation between the Schlichting Test for Sentence Development and Recalling Sentences of the CELF is surprisingly low given that both tests are based on sentence repetition. The authors do not provide any suggestions as to how this low correlation could be explained. When comparing the items on both tests, it seems plausible that differences in performance arise from a difference in the number of items and a difference in morphosyntactic complexity. The children included in the correlation study fell within the age range of 5;7 to 7;3. The CELF subtest Recalling Sentences only provides eight items for this age range, whereas the Schlichting provides thirty. Furthermore, item difficulty increases rapidly on the CELF, whereas the Schlichting shows a gradual increase in difficulty.

A full overview of the morphosyntactic structures assessed by the CELF, Schlichting Test for Sentence Development and STAP is provided in Appendix 5. An overview of the overlapping structures is provided in table 3.2.

Table 3.2 Shared morphosyntactic structures assessed by the CELF, Schlichting and/or STAP, according to their respective manuals.

Morphosyntactic structure    Schlichting / CELF (WS, RS, FS) / STAP
Noun                         x x x
Verb                         x x x
Adjective                    x x x x
Coordination                 x x x x
Subordination                x x x x
Pronoun                      x x x
Past participle              x x x
Adverbial adjunct            x x
Comparative                  x x
Negative                     x x
Passive                      x x
Relative clause              x x
Adverb                       x x

FS=Formulated Sentences, RS=Recalling Sentences, WS=Word Structure

In total, the CELF and STAP assess seven similar structures and the Schlichting and STAP share eight similar structures. This is, however, only the case when the structures reported in the manuals are used as a starting point. An item-level analysis of the target responses reveals differences in labeling, underlying structures, and scoring. This raises the question whether the structures in table 3.2 are indeed comparable.

The first problem with labeling is the fact that some structures are assessed by both tests but are labeled differently. An example is the demonstrative pronoun die (‘that one’), assessed by the CELF subtest Word Structure but labeled ‘subject’ by the Schlichting in, for instance, the sentence die daar (‘that one there’). Another example is lachend (‘laughing’), labeled as an adverb by the CELF, whereas the Schlichting labeled its antonym huilend (‘crying’) as a present participle. Secondly, some complex morphosyntactic structures that seem relevant are not labeled at all, simply because they are not the target item of the sentence. An example is the diminutive vriendinnetje, not labeled as such by the Schlichting in the sentence haar vriendinnetje (‘her girlfriend-DIM’), where the target structure is the pronoun haar (‘her’).

When examining the items with the supposedly shared structures in more detail, it emerges that they often differ in underlying variables. This is shown by the example items provided in Appendix 6. Some of these items (e.g. pronouns and relative clauses) are quite similar, but most of the items are less comparable than initially thought. An example is the structure ‘verb’, assessed by both the CELF and the Schlichting. The target structure of the Schlichting is the infinitive slapen (‘to sleep’), while the CELF elicits a third person past tense gaf (‘gave’). Since both manuals state that the items assess the structure ‘verb’, one would initially expect the items to be comparable. However, they are not, and neither the infinitive nor the finite past tense is assessed by the other test.

A final difference between the two tests is the strictness of scoring the child’s responses. While the CELF subtests have a very strict scoring system that takes even slight deviations into account, the Schlichting manual allows for more variability and often accepts deviant utterances, as long as the specific target structure is produced correctly.

Because of the differences between the CELF and Schlichting described so far, the question arises to what extent the results of these two tests are comparable. This is relevant because the results of both tests have to be combined in order to answer the main research question. Moreover, differences in structure labels and underlying variables make it unfeasible to provide a detailed analysis of performance on comparable items. However, in order to remain true to the tests as they were intended by their authors, the analysis will be based on the structures as reported in the CELF and Schlichting manuals.

The next chapter provides a description of the subjects, data collection and methods of data analysis used in the current study.

4. Methods

The current research is based on document analysis. Data were obtained by analyzing ten files of eight children who visited one of the two participating institutions treating and/or diagnosing children with language difficulties: Pento Centre for Audiology in Amersfoort and the Sophia Children’s Hospital in Rotterdam. This chapter provides an overview of the methods of analysis. The criteria used to select subjects are discussed in section 4.1, followed by the methods of data collection in section 4.2. Section 4.3 describes the methods used for analysis.

4.1 Subjects

The results of this study are based on the data of eight children clinically diagnosed as language impaired. All children were aged 4;0 to 8;0, since they had to be within the age range for which the STAP has norm data. Children with a hearing loss of more than 20 decibels were excluded from the study and all children had a non-verbal IQ above 85. Furthermore, all children were tested within a 6-month period with either the CELF and STAP or the Schlichting and STAP. When no full STAP analysis was available, at least one spontaneous speech sample had to be available for which a STAP analysis could be carried out.

In total, ten files were analyzed. For the subjects C4a and S4a in table 4.1, additional data were available from twelve (C4b) and eight (S4b) months after the initial test date. The availability of these data allows for a comparison over time, but initially the children are treated as separate subjects. C1 and C3 were tested at Pento Centre for Audiology; all other data were collected at the Sophia Children’s Hospital.

An overview of the children whose data were used for this study can be found in table 4.1. As this table shows, there is an age difference between the children tested with the CELF and the children tested with the Schlichting. Children in the latter group are up to 3,5 years younger than the children in the former group. This is not surprising, because the Schlichting is often used for younger children. Finding children who all fell within the same age category proved to be impossible.


Table 4.1 Characteristics of subjects

Participant   Test          Gender   Age at test date   Age STAP   IQ

C1 CELF Female 7;0 7;3 Estimated average

C2 CELF Female 7;7 7;7 Estimated average

C3 CELF Female 7;9 7;10 107 (WNV)

C4a CELF Male 7;0 7;0 106 (WISC)

C4b CELF Male 8;0 8;0 106 (WISC age 7;0)

S1 Schlichting Female 4;7 4;7 121 (SON-R)

S2 Schlichting Female 5;10 5;10 110 (SON-R)

S3 Schlichting Female 6;4 6;4 114 (SON-R)

S4a Schlichting Male 5;4 5;4 107 (WPPSI age 6;0)

S4b Schlichting Male 6;0 6;0 107 (WPPSI)

C4b same participant as C4a, S4b same participant as S4a. WNV=Wechsler Non-Verbal, WISC=Wechsler Intelligence Scale for Children, SON=Snijders-Oomen Non-Verbal Intelligence Test, WPPSI=Wechsler Preschool and Primary Scale of Intelligence.

4.2 Data collection

All data were collected through analysis of existing test reports. The children were either selected by the author herself or pre-selected by the supervising clinician based on age, IQ and hearing status. The CELF data that were available for analysis consisted of completely filled-out score profiles, which included the children’s Core Language Scores and index scores. Filled-out score forms for the subtests Word Structure, Recalling Sentences and Formulated Sentences were also available. The Schlichting data consisted of ZQ scores and filled-out score forms with the children’s responses to the individual items. The STAP data consisted of the original transcripts and the examiners’ analyses of these transcripts. For the subjects C1 and C3 this analysis was conducted by the author; analyses for the remaining children were carried out by the supervising clinician working at the Sophia Children’s Hospital. STAP-profile forms and summary forms were not yet available in the test reports. These forms were therefore filled out by the author following the guidelines described in 3.3.3. Examples of a STAP-profile and STAP-summary are presented in Appendix 4.

4.3 Analysis

In order to assess the extent to which performance on norm-referenced tests and the STAP lead to comparable conclusions, analysis took place at two levels. First, the scores of both methods were compared in a group-level analysis. At the second level, the subjects’ performances on both methods were analyzed individually. Additionally, scores of C4a and C4b and scores of S4a and S4b were compared.


For the group-level analysis, the grammaticality and complexity scores of the STAP were compared with either the CELF scores or the Schlichting scores to determine the relations between the different measures. In order to make the STAP scores comparable to the scores of the CELF or Schlichting, the STAP’s scoring system was slightly adapted at two points.

The first adaptation concerned the calculation of more precise standard deviations for the STAP. While norm-referenced tests allow precise standard deviations to be calculated, the STAP only provides the standard deviations -2, -1, 0, 1, 2 or the intermediate categories between -2 and -1, between -1 and 0, between 0 and 1, and between 1 and 2, as described in section 3.3.3. Through consultation with one of the STAP authors, it became clear that precise standard deviations for the STAP could not be computed statistically. Therefore, scores that fell within the intermediate categories were rounded off to half standard deviations. When, for instance, a variable on the profile form scored ‘between -1 and 0 standard deviations’, this variable was scored as -0,5 standard deviations.
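This adaptation amounts to a fixed mapping from profile categories to numeric SD values, with intermediate categories assigned their midpoint. The category labels below are paraphrases for illustration; the profile form itself only shows the graphic scale.

```python
# Sketch of the first adaptation: map each STAP profile category to a
# numeric SD score, assigning the midpoint (+-0.5, +-1.5) to the
# intermediate categories. Labels are assumptions for illustration.
SD_VALUES = {
    "-2": -2.0,
    "between -2 and -1": -1.5,
    "-1": -1.0,
    "between -1 and 0": -0.5,
    "0": 0.0,
    "between 0 and 1": 0.5,
    "1": 1.0,
    "between 1 and 2": 1.5,
    "2": 2.0,
}

print(SD_VALUES["between -1 and 0"])  # -0.5, as in the example above
```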

A second adaptation was the calculation of mean standard deviations for both the grammaticality and the complexity of the children’s utterances on the STAP. This was done by using the scores linked to the categories that were filled out on the STAP-summary (section 3.3.3 and Appendix 4). The mean scores of the sections Gegevens over ongrammaticaliteit (‘data on ungrammaticalities’) and Gegevens over complexiteit (‘data on complexity’) were calculated. These mean scores could then be compared with the scores obtained by either the CELF or the Schlichting. Due to the small number of subjects per norm-referenced test, no statistical analysis could be conducted.
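As a sketch, the two STAP measures reduce to taking means over the summary categories. The category scores below are hypothetical SD values, not data from this study.

```python
# Sketch of the two group-level STAP measures described above: the mean
# of the summary categories for ungrammaticality and for complexity.
from statistics import mean

grammaticality_scores = [-2.0, -1.5]  # hypothetical 'ungrammaticality' categories
complexity_scores = [-0.5, -1.0]      # hypothetical 'complexity' categories

stap_grammaticality = mean(grammaticality_scores)
stap_complexity = mean(complexity_scores)
print(stap_grammaticality, stap_complexity)  # -1.75 -0.75
```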

For the individual analysis, the questions previously proposed by Merrell and Plante (1997) were discussed: “Does the tool indicate language impairment?” and “What are the specific areas of deficit?”. In order to answer the first question, individual scores on both methods were compared. The criteria used to establish the presence of a language disorder are based on the criteria used in the formal diagnosis of Dutch children with possible language impairment, as described in section 2.2. These criteria were as follows:

CELF: Core Language Score ≤ -2 SD, or two or more subtests ≤ -1,5 SD

Schlichting: ZQ ≤ -1,5 SD
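These criteria can be expressed as a decision rule. The sketch below is an illustration only: the function names are the author's own, and the CELF subtest cut-off is read as 1,5 SD below the mean.

```python
# Sketch of the diagnostic criteria above, with scores in SD units.
def celf_indicates_impairment(core_language_sd, subtest_sds):
    """Core Language Score <= -2 SD, or two or more subtests at or
    below -1.5 SD (cut-off read as 1.5 SD below the mean)."""
    low_subtests = sum(1 for sd in subtest_sds if sd <= -1.5)
    return core_language_sd <= -2 or low_subtests >= 2

def schlichting_indicates_impairment(zq_sd):
    """ZQ <= -1.5 SD."""
    return zq_sd <= -1.5

print(celf_indicates_impairment(-2.2, [-1.7, -2.7, -0.7]))  # True
print(schlichting_indicates_impairment(-1.0))               # False
```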


In order to discuss the question “What are the specific areas of deficit?”, criteria for the identification of problem structures have been defined. For the STAP, morphosyntactic structures presented on the profile form with scores on or below -1,5 standard deviations were interpreted as problem structures, because this cut-off score is most frequently used in formal diagnostics. For the CELF subtests, morphosyntactic structures were considered problem structures when 75% or more of the items testing a given structure were produced erroneously. Items were labeled as produced erroneously when they obtained zero points. Example (5) demonstrates how problem structures were identified following CELF guidelines.

(5) The examiner reads the following sentence:

Het boek werd niet door de leraar naar de bibliotheek teruggebracht.

(‘The book was not returned to the library by the teacher’) The child repeats the sentence as follows:

De boek is niet door leraar naar bibliotheek.

(‘The book is not to library by teacher’)

According to the CELF manual, the intended target structure in example (5) is a passive sentence with negation. Most of the time, the children did indeed produce the intended target structure incorrectly. Example (5) shows, however, that this is not always the case, as the negation niet (‘not’) was present in the child’s utterance. Nevertheless, following the CELF guidelines, the number of errors in the sentence above leads to a score of zero points for this item. By adopting the 75% criterion, only those structures that were produced erroneously above chance were labeled as problem structures.
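The 75% criterion can be sketched as follows. The item scores are hypothetical, and a score of zero marks an erroneously produced item, as described above.

```python
# Sketch of the 75% problem-structure criterion for the CELF subtests.
def is_problem_structure(item_scores, threshold=0.75):
    """True when at least 75% of the items testing a structure
    received zero points."""
    errors = sum(1 for score in item_scores if score == 0)
    return errors / len(item_scores) >= threshold

passive_items = [0, 0, 0, 2]   # 3 of 4 items failed -> problem structure
pronoun_items = [2, 0, 1, 2]   # 1 of 4 items failed -> not a problem structure
print(is_problem_structure(passive_items))  # True
print(is_problem_structure(pronoun_items))  # False
```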

For the Schlichting, all structures produced incorrectly were noted and marked individually as problem structures, because most structures were tested only once. Example (6) demonstrates the identification of problem structures by the Schlichting.

(6) The examiner reads the following sentence:

Nu wil ik deze. (‘Now I want this one’)

The child repeats the sentence as follows:


According to the manual, structures tested in this sentence are the subject, auxiliary verb, object, adverbial adjunct and word-order inversion. In this example the problem structures would be the adverbial adjunct nu (‘now’) and the word-order inversion.


5. Results

This chapter presents the scores obtained by the subjects on either the CELF and STAP or the Schlichting and STAP. Section 5.1 presents an overview of the individual scores, which are compared at group level. Section 5.2 reviews individual performance and describes whether both methods allow the same conclusions to be drawn.

5.1 Scores

Test scores of the CELF and Schlichting were copied from the children’s test files and are presented in table 5.1 and table 5.2. These tables also include the manually computed STAP scores (section 4.3). Table 5.1 shows the scores of the children tested using the CELF and STAP. An accompanying scatter plot is provided in figure 5.1.

Table 5.1 Scores (standard deviations) of the children tested by the CELF and STAP

Participant   CLS    SI     WS     RS     FS     STAPg  STAPc
C1            -2,3   -1,7   -2,1   -2,4   -1,3   -1,4   -1,3
C2            -1,8   -1,8   -1,7   -3,0   -0,7   -1,6   -0,7
C3            -1,5   -0,9   -1,0   -1,0   -1,3   -1,9   -0,1
C4a           -2,2   -1,7   -1,7   -2,7   -0,7    0,7    0,3
C4b           -2,1   -1,9   -1,7   -2,8   -0,7    0,1    0,5

CLS = Core Language Score, SI = Structure Index, WS = Word Structure, RS = Recalling Sentences, FS = Formulated Sentences, STAPg = STAP Grammaticality, STAPc = STAP Complexity

Figure 5.1 Scatterplot of scores of the children tested by the CELF and STAP.

CLS = Core Language Score, SI = Structure Index, WS = Word Structure, RS = Recalling Sentences, FS = Formulated Sentences, STAPg = STAP grammaticality, STAPc = STAP complexity

[Scatterplot: test scores (-3,5 to 1) per participant (C1–C4b) for each measure]
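As a quick check on the group-level pattern in Table 5.1, simple per-measure means can be recomputed from the tabulated scores. The snippet below is an illustration only (not part of the thesis analysis); it uses decimal points where the thesis uses decimal commas.

```python
# Scores from Table 5.1 (participants C1, C2, C3, C4a, C4b), per measure.
table_5_1 = {
    "CLS":   [-2.3, -1.8, -1.5, -2.2, -2.1],
    "SI":    [-1.7, -1.8, -0.9, -1.7, -1.9],
    "WS":    [-2.1, -1.7, -1.0, -1.7, -1.7],
    "RS":    [-2.4, -3.0, -1.0, -2.7, -2.8],
    "FS":    [-1.3, -0.7, -1.3, -0.7, -0.7],
    "STAPg": [-1.4, -1.6, -1.9,  0.7,  0.1],
    "STAPc": [-1.3, -0.7, -0.1,  0.3,  0.5],
}

# Print the group mean (in standard deviations) for each measure.
for measure, values in table_5_1.items():
    print(f"{measure}: mean = {sum(values) / len(values):.2f}")
```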
