
University of Groningen

SEN and the art of teaching

van der Kamp, Antoinette Jacqueline


Publication date:

2018


Citation for published version (APA):
van der Kamp, A. J. (2018). SEN and the art of teaching: The effect of systematic academic instruction on the academic and behavioural problems of students with EBD in special education. Rijksuniversiteit Groningen.



6.1 Introduction

Providing a learning environment which fits the different abilities and needs of all students is a core issue in education. This applies even more to students with emotional and behavioural difficulties (EBD) in special education (SE), who are characterized by behaviour which negatively affects their social and academic development (Bos & Vaughn, 2006; Lane & Menzies, 2010). Research suggests that academic progress for many of these students continually falls short (Mattison, Hooper, & Glassberg, 2002; Siperstein et al., 2011). Since problem behaviour frequently occurs when students are confronted with ill-fitting or maladapted tasks (Umbreit et al., 2007), it is essential to adapt the curriculum to the students’ abilities and needs. For years, most attention was paid to these students’ behavioural problems, but the last ten years have seen growing attention being paid to their academic abilities. This is an essential development, since an adapted curriculum offers the potential not only to improve these students’ academic growth, but also to reduce their problem behaviour in classrooms (Kern & Clemens, 2007; Van der Worp-van der Kamp, 2014).

In chapter 3, this process of continuous improvement and adaptation was defined using Deming’s (1986) PDCA cycle. The third step in this cycle, namely monitoring students' progress towards the defined goals (check), entails, inter alia, the use of regular assessment. After all, regular assessments are of great support in adapting a curriculum to the needs and abilities of these students (Salend, 2009). Shriner, Ardoin, Yell and Carty (2014) even refer to assessment as the keystone of special educational programming. Moreover, the current policy of data-driven teaching requires that all students – including those with disabilities – be included in large-scale standardized assessments (Thurlow, Lazarus, Thompson, & Morse, 2005). These assessments can be very useful indeed, assuming that their outcomes are a reliable and valid reflection of the students’ academic abilities. However, Tindal et al. (2016) indicate that predicting performance on large-scale standardized assessments is problematic for students with disabilities. Likewise, based on the inexplicably wild fluctuations in the outcomes students with EBD achieve in large-scale standardized assessments, presented in chapter 4, we doubt the reliability of these assessments when applied to these students. Major fluctuations in grades within a six-month period call into question whether these tests are suitable for students with EBD. Moreover, McGrew and Evans (2004) found that the vast majority of teachers did not believe that


students in special education should be expected to meet the same academic standards or be included in the same assessments as mainstream students.

The use of large-scale standardized assessments for students with EBD in special education is complicated in at least two ways. The first complication concerns the fluctuations in the students’ academic performance. The problems experienced by students with EBD in special education cannot be explained as being rooted in intellectual, sensory or health deficits (Onderwijsraad, 2010), as such students are believed to be capable of fair academic performance and growth (Vannest et al., 2009; Van der Worp-van der Kamp et al., 2014). However, these students are known for being easily distracted and having weak attention control, short attention spans, weak inhibition and problems with working memory. These characteristics erect barriers which prevent students with EBD from demonstrating their knowledge and abilities accurately, leading to poor school outcomes and weak test performance (Bolt & Thurlow, 2004). Moreover, suddenly emerging emotional problems such as negative emotions, anxiety or mood swings (Valiente, Swanson, & Eisenberg, 2012) often further worsen these outcomes and can also lead to fluctuating performance and varying results over time (Van der Worp-van der Kamp et al., 2016). These problems are further exacerbated, according to teachers, because national testing can be overwhelming, causing stress for students with disabilities (Crawford, Almond, Tindal, & Hollenbeck, 2002; DeBard & Kubow, 2002). Since these assessments are typically only snapshots of academic performance (data captured at specific moments) and students with EBD can be unpredictable in their behaviour, assessment outcomes can vary from moment to moment.

A second complication concerns the choice of assessment. Crawford and Tindal (2006) noted that only half of teachers knew where to obtain specific information to help them decide about appropriate test participation. Most students with EBD in special education have had school careers characterized by failure, high mobility between schools and high suspension and expulsion rates (Wagner, Newman, Cameto, & Levine, 2005). By the time they enter special education, they have often missed important parts of the standard academic curriculum. Since large-scale national assessment systems are based on this curriculum, these omissions complicate the choice of assessment level for these students. In other words, it is difficult to establish the appropriate level for testing students who have not followed the full instructional programme related to the test’s content. This is relevant not only for the first test taken but also for subsequent tests. If we assume that students follow a standard curriculum but have important gaps in their knowledge of


parts of it, it becomes problematic to decide which test to choose. If tests are selected based on where the students should find themselves in the ordinary semester programme, the tests are likely to be too difficult for these students, and anxiety will affect the results even more (Derakshan & Eysenck, 2010). However, a test taken at the students’ actual performance level gives rise to other concerns. Since these large-scale tests measure broad aspects of the curriculum, the students’ discrete progress in certain parts of the curriculum is unlikely to be discernible. In such cases, despite the fact that these students may actually have improved in some specific content areas, their overall scores on the test will remain unsatisfactory. Poor academic outcomes undoubtedly generate feelings of inadequacy in students. Since self-esteem levels are a powerful controlling factor in students’ behaviour and their ability to learn (Margerison, 1996), unsuitable assessment measures can have serious consequences for students with EBD.

To summarize, given the characteristics of students with EBD in special education, it is possible that large-scale assessments do not provide a meaningful and accurate picture of these students’ academic performance and growth, with all that this entails. To date, there has been little research on the large-scale assessment of students with EBD in special education (Tindal et al., 2016). Given that these students’ academic progress is of growing concern and that students with EBD are regularly subjected to such assessments, systematic research into the practice and reliability of large-scale assessment for these students is needed. In light of the obstacles to the successful large-scale assessment of these students described above, this study has two aims. The first aim concerns assessing the stability of performance scores over time per student: in other words, to what extent are test outcomes reasonable predictors of the outcomes of subsequent tests? The second aim concerns the choice and difficulty level of the tests administered to students in special education, considering the tests previously administered to these students, their ability and their performance over time.

6.2 Method

6.2.1 Participants

The study was conducted in five special primary schools for students with severe behavioural problems (RENN4) in the northern Netherlands. Data were collected from all students who attended the schools in 2014 and completed a national standardized assessment from the Dutch student and education monitoring system CITO (Central


Institute for Test Development). The study included 546 students with a mean age of 10.5 (SD 1.62, minimum 6.4, maximum 13.8). All the students met the following school-specific criteria: 1) the students showed severe behavioural or psychiatric problems in terms of the DSM-IV; 2) this behaviour was manifested in education and at home and/or in leisure activities; 3) youth care and/or child psychiatric services were involved; 4) the students’ participation in education was extremely limited, in terms of there being serious shortfalls in academic learning and/or behaviour towards teachers or other students; and 5) the problems persisted for at least six months despite the school’s educational care (WEC-Raad, 2008). According to the admission rules, students with an intellectual impairment are not admitted to RENN4.

6.2.2 Instruments

The state-wide assessment studied here is CITO’s student and education monitoring system: a broad Dutch assessment programme for school-aged students, the most widely used in the Netherlands. This system encompasses standard biannual tests for a variety of academic content, including maths and spelling. It includes tests for six school years (3-8) at twelve different Test Levels (TL): in every school year, a midseason test is taken halfway through (M3-M8) and an end-of-year test is taken at the end of the academic year (E3-E8). CITO has also developed five optional ‘in-between’ tests especially for students in special education for whom the step between two consecutive tests is too great (e.g. from M3 to E3), to replace the more difficult of the two (e.g. M3E3 instead of E3). For the highest school year (8), CITO added an extra test level for the start of that year (B8). These ‘in-between’ tests (from M3E3 to M5E5, plus B8) increase the number of levels to 17.
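To make the resulting scale concrete, the sketch below enumerates the Test Levels as an ordered factor in R (the language used for part of the analyses in this chapter). This is illustrative only: the placement of the in-between levels relative to their standard neighbours is an assumption based on the description above, not CITO documentation.

```r
# Illustrative only: the CITO Test Level scale as described above, coded as
# an ordered factor so that successive levels can be compared. The position
# of the in-between levels (M3E3 ... M5E5, B8) is an assumption.
tl_labels <- c("M3", "M3E3", "E3", "E3M4", "M4", "M4E4", "E4", "E4M5",
               "M5", "M5E5", "E5", "M6", "E6", "M7", "E7", "B8", "M8", "E8")
tl_scale  <- factor(tl_labels, levels = tl_labels, ordered = TRUE)

tl_scale[tl_labels == "M4"] < tl_scale[tl_labels == "E4"]  # TRUE: M4 precedes E4
```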

This study focuses on the assessments for maths and spelling, two crucial problem areas for special education students (Reid et al., 2004). The maths tests measure skills ranging from mental arithmetic to more complex skills such as fractions, percentages and calculations involving time and money. The spelling tests contain dictation tasks measuring active spelling, along with multiple-choice tasks for recognizing spelling errors in a written text (starting with M4). All spelling tests consist of three modules, of which all students complete the first. Depending on the outcome of the first module, students proceed to an easier or a more difficult module. The reliability coefficients (Accuracy of Measurement) for both maths and spelling scores are high to very high, ranging from .91 to .97 and .86 to .95 respectively (Engelen, Hoogstraate, Scheltens, & Verbruggen, 2012; Mols & Kamphuis, 2012).


The CITO registration system is based on a measurement technique which makes scores on subsequent Test Levels (TL) comparable by using an underlying fixed scale, the ‘Ability Scale’. Each topic (maths or spelling) has its own Ability Scale. Using this scale, CITO converts the raw score (number of correct items) of a test into an Ability Score (AS), permitting students’ scores to be compared across different tests at different moments, and across different tests at the same moment. Since all Ability Scores related to the same construct are displayed on one scale, high correlations can be expected between consecutive tests. For the norm group of typical students, CITO found correlations between consecutive tests of between .86 and .96 for maths and between .75 and .96 for spelling (De Wijs, Kamphuis, Kleintjes, & Tomensen, 2011).

In mainstream education students are streamed into groups by level. Based on their Ability Scores, CITO classifies them into five ability levels, from the 20% lowest-scoring students (V) to the 20% highest-scoring students (I). In special education, to prevent students from being continually classed as level V, CITO links every Ability Score to a Performance Level (PL). A Performance Level provides an indication of a student’s grade level, comparable to grades 3 to 8 for mainstream students. For example, a student tested at the end of group 5 (TL = E5) might achieve an Ability Score corresponding to the mean Performance Level at the end of group 4 (PL = E4).

Such a student is therefore demonstrating an approximate delay of one year. If the delay is more than one year, CITO considers the outcome of the test unreliable. These outcomes are represented by CITO using the symbols < or >. Thus, a test taken at level E5 and scored at level M4 is represented as < E4 in the CITO registration system; the same test scored at level M7 is represented as > E6. In such cases, the tests are considered to be too difficult or too easy for the student in question.
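As a hedged illustration of this conversion, the sketch below maps an Ability Score to a Performance Level and derives the </> flag. The cut-off values in `pl_table` are invented for illustration; CITO’s actual norm tables are not reproduced in this chapter.

```r
# Illustrative sketch of the AS -> PL conversion and the </> reliability
# flag. The as_min thresholds are hypothetical, not CITO's norm tables.
pl_table <- data.frame(
  pl     = c("M3", "E3", "M4", "E4", "M5", "E5", "M6", "E6", "M7"),
  as_min = c(0, 15, 30, 45, 60, 75, 90, 105, 120)   # hypothetical AS cut-offs
)

performance_level <- function(as_score) {
  pl_table$pl[max(which(pl_table$as_min <= as_score))]
}
performance_level(50)  # "E4" under these invented thresholds

# A score is flagged when the PL is more than a year (i.e. more than two
# half-year levels) below or above the Test Level at which the test was taken.
flag <- function(test_level, perf_level) {
  t <- match(test_level, pl_table$pl)
  p <- match(perf_level, pl_table$pl)
  if (p - t < -2)     paste0("< ", pl_table$pl[t - 2])
  else if (p - t > 2) paste0("> ", pl_table$pl[t + 2])
  else                perf_level
}

flag("E5", "M4")  # "< E4", as in the example above
flag("E5", "M7")  # "> E6"
```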

6.2.3 Data collection and variables

Data for this study were retrospectively collected from the schools’ administration systems, which store all assessments conducted between 2011 and 2014 for students at school in 2014. The systems contain up to eight measurements for every student (midseason 2011 to end-of-year 2014), depending on the number of years the student had attended special education. The Test Level (TL = M3 to E8) was noted for each assessment moment, as well as the Ability Score and Performance Level as provided by the CITO administration system. Where multiple tests were performed for a student at a given assessment moment, the most recent score was noted. The latter is based on the


assumption that the teachers had performed the second test because the first test had given cause for doubt. Where a student’s Performance Level was more than a year off the Test Level, this was indicated by the symbols < and >; in those cases the score was considered an unreliable estimate of the student’s performance.

Additionally, the students’ age in June 2014 (the final assessment moment) was obtained from the system. Since students in special education are seldom classified in grade groups, they were grouped on the basis of ‘Didactic Age’ (DA) for reasons of comparability. The Didactic Age corresponds to the number of months a student has been in education based on their calendar age, ranging from 0 to 60 in increments of 5 (De Vos, 2014). Assuming that students receive 10 months of education per year, after half a year of education students will have a Didactic Age of 5 months, after one year a Didactic Age of 10 months, after one and a half years a Didactic Age of 15 months, and so on. Using this approach, each assessment moment (twice yearly) was associated with a Didactic Age. Successive Didactic Ages provide a growth path for the development of students over the years, comparable to the growth path of mainstream students.
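Under the assumptions stated above (10 months of education per school year, biannual assessments), the mapping is a single multiplication; a minimal sketch:

```r
# Didactic Age in months, assuming 10 months of education per school year.
didactic_age <- function(years_in_education) 10 * years_in_education

didactic_age(c(0.5, 1, 1.5, 6))  # 5 10 15 60: the biannual assessment moments
```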

6.2.4 Statistical analyses

For the first aim of the study – measuring the stability of performance scores over time per student – descriptive analyses were conducted of Ability Score per Didactic Age for all students. Boxplots were computed for the mean Performance Level per Didactic Age, to provide insight into the associated Performance Levels. Spaghetti plots were computed to provide insight into the development in Ability Score over time per student. Next, given the hierarchically nested data (i.e. multiple measurements per student), a multilevel regression analysis was computed to provide information about the variance structure of outcomes within and between students. The outcomes in Ability Score were estimated as a function of Didactic Age in increments of 5 months. Fixed as well as random effects were considered for these Didactic Ages. In addition, a dummy variable indicating the reliability of the Ability Score (< and >) was included as a fixed effect in the model. The model predictions for Ability Score by Didactic Age were then plotted together with the observed scores. Finally, correlations were estimated between Didactic Ages within students, based on the model results. A p-value < 0.05 was considered statistically significant. All analyses were performed in SPSS, except for the multilevel models, which were performed using the R package nlme (Pinheiro et al., 2017).
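The model description translates roughly into the nlme call sketched below. The data frame and variable names (`d`, `AS`, `DA`, `unreliable`, `student`) are placeholders, and the random-effects structure shown (random intercept and DA slope per student) is one plausible reading of “fixed as well as random effects were considered for these Didactic Ages”, not the authors’ exact specification.

```r
library(nlme)

# d: one row per student x assessment moment, with placeholder columns
#   student    - student id
#   AS         - Ability Score
#   DA         - Didactic Age in months (5, 10, ..., 65)
#   unreliable - dummy: 1 if CITO flagged the score with < or >
m <- lme(AS ~ factor(DA) + unreliable,  # fixed effects for DA and the reliability dummy
         random = ~ 1 + DA | student,   # assumed random intercept and DA slope per student
         data = d, na.action = na.omit)
summary(m)

# Within-student correlations between Didactic Ages (cf. Table 6.3) can be
# derived from the model-implied marginal covariance of the repeated measures:
vc <- getVarCov(m, individuals = 1, type = "marginal")
cov2cor(vc[[1]])
```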


Descriptive statistics were used for the study’s second aim – investigating decision-making regarding the tests set for students in special education. The first step concerns the order of the tests performed by students. The order can be regular, if each successive test is set at the next successive level (TL_DA → TL_DA+5). Alternatively, the order can be irregular, if the successive test is set at the same Test Level (repeated level), at a lower Test Level (TL_DA-5, TL_DA-10) or at a higher (skipped) Test Level (TL_DA+10, TL_DA+15). It is even possible that the next successive test is set at a much lower or much higher level (TL_DA<-10 or TL_DA>+15). To gain insight into the order of tests completed by the students, successive Test Levels were compared per student.

A second step investigated the difficulty level of the tests set for the students. To this end, the Performance Levels (PL) obtained were compared to the levels at which the tests were performed (TL). The Performance Level obtained could fit the Test Level (TL_DA = PL_DA), it could be a half or a whole year lower or higher than the Test Level (PL_DA-5, PL_DA-10, PL_DA+5, PL_DA+10), or it could be much lower or higher than the Test Level (PL_DA<-10, PL_DA>+10). In the latter case, the tests are considered to be too difficult or too easy for the student in question and the outcomes were considered unreliable.

The third step also concerns the difficulty level of the tests, this time by comparing the level of the test set for the students to the student’s ability, represented by the Performance Level obtained on the preceding test (DA-5). Since it is didactically logical that the Test Level set fits the preceding Performance Level, this provides another indication of the difficulty of the Test Level for the student. The Test Level can follow the preceding Performance Level (PL_DA-5 → TL_DA); it can be the same as, or somewhat lower or higher than, the preceding Performance Level (PL_DA, PL_DA+5, PL_DA+10); or it can be much lower or higher than the preceding Performance Level (PL_DA<-10, PL_DA>+10).
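All three comparisons are offsets on the same half-year grid, so they can be expressed with one binning helper. The sketch below assumes TL and PL are coded in months on the DA scale (M3 = 5, E3 = 10, ..., E8 = 60); all variable names and values are placeholders.

```r
# Sketch: the three comparison steps as offsets in months on the DA scale.
shift <- function(x) c(NA, head(x, -1))        # previous value within one student's series

offset_bin <- function(d) {                    # bin an offset into the table categories
  d <- as.integer(d)
  ifelse(d < -10, "<-10",
  ifelse(d >  10, ">+10", sprintf("%+d", d)))  # "-10", "-5", "+0", "+5", "+10"
}

tl <- c(15, 20, 20, 30)  # hypothetical Test Levels: M4, E4, E4 (repeated), M6
pl <- c(15, 10, 15, 20)  # hypothetical Performance Levels obtained

offset_bin(tl - shift(tl))  # step 1, order of TLs:    NA "+5" "+0" "+10"   (Table 6.4a)
offset_bin(pl - tl)         # step 2, PL vs TL:        "+0" "-10" "-5" "-10" (Table 6.4b)
offset_bin(tl - shift(pl))  # step 3, TL vs prior PL:  NA "+5" "+10" ">+10"  (Table 6.4c)
```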

6.3 Results

6.3.1 Descriptive findings

Test results are presented for 547 students with a mean Didactic Age of 32.9 (SD = 14.9) months. Table 6.1 presents an overview of the number of tests per Test Level (TL) used in this study.


Table 6.1. The number of tests per Test Level for maths and spelling

Test Level   Maths N   Spelling N      In-between Test Level   Maths N   Spelling N
E4           297       193             M3E3                    16        32
M5           268       262             E3M4                    11        19
E5           255       227             M4E4                    18        7
M6           233       234             E4M5                    19        9
E6           167       166             M5E5                    7         12
M7           135       155             B8                      10        22
E7           116       135
M8           40        38

Table 6.2 presents the mean Ability Score (SD) per Didactic Age, along with the number of tests completed for maths and spelling. For comparison, the mean Ability Score (SD) of the mainstream norm population is added (Engelen et al., 2012; Mols & Kamphuis, 2012; Scheltens et al., 2011). The outcomes show an increase in Ability Score over time. These students’ Ability Scores clearly lag behind the mean scores of the norm population. Figure 6.1 presents the boxplot of the mean Performance Level per Didactic Age, revealing that most students score under the Performance Level expected for their age. For instance, for the students with a Didactic Age of 15 (comparable to the M4 level), 75% of the scores were below M4 for both maths and spelling. The high standard deviations in Table 6.2 and the wide boxplots in Figure 6.1 show a great range in scores per Didactic Age.

6.3.2 Predictive value/order of outcomes within students

The development in Ability Score per student over time is visualized in spaghetti plots (Figure 6.2). For reasons of clarity, a random selection of about 10% of the students is shown. The figure also contains the mean norm score of typical students, displayed with the dotted line. Many students’ maths scores fluctuated considerably over time, with marked increases or decreases of up to 50 points between two subsequent Didactic Ages. Some students had only one clear dip or peak in their scores, while others had fluctuating outcomes with a number of large increases and decreases in succession. On the whole, the plots show a rising trend. This also applies to spelling, although the fluctuations appear to be less marked than in the maths scores.

Figure 6.3 shows the observed Ability Scores plotted against the individual students’ Didactic Ages. The two lines in Figure 6.3 indicate the predicted scores for test scores considered reliable (light grey) and unreliable (dark grey) by CITO. As


previously mentioned, the latter are cases where the Performance Level differs by over a year from the Test Level. The predictions are based on the multilevel model and show the average expected progression in Ability Scores over time. For maths, most of the lower Ability Scores are considered unreliable, as can be seen from the model prediction and the large number of observed dark grey scores at the lower end of the figure. Moreover, the predictions show that the difference between reliable and unreliable outcomes is initially quite large, at around 20 points at a Didactic Age of 10 months, but is almost non-existent for older children, with an average difference of around 5 points at a Didactic Age of 50. A group of around 30 unreliable scores which were markedly higher between the Didactic Ages of 45 and 50 months can be seen at the top right of the figure. The observed scores at the bottom left and top right of Figure 6.3a might be indicative of a floor and a ceiling effect for these tests.

A similar convergent pattern, with initially unreliable lower scores, can be observed for the spelling scores. However, it should be noted that the initial difference is somewhat smaller in absolute points, and much smaller considering that the spelling scale has a larger range than the maths scale. Similar to the maths plot, a small group of highly unreliable scores can be seen at the top right of Figure 6.3 between the Didactic Ages of 45 and 50 months. Moreover, some extremely low unreliable scores can be observed among the students tested at younger ages.

Concerning the predictive validity of the maths scores, the correlations between Ability Scores per Didactic Age presented in Table 6.3 have estimated values of between .57 and .90. The main diagonal in Table 6.3 for successive Didactic Ages shows correlations from .70 between DA15 and DA20 up to .90 between DA45 and DA50. These outcomes reveal that 49–81% of the variation in Ability Score in maths is explained by the Ability Score of the previous test. Correlations generally increase with increasing Didactic Age. The magnitude of the correlations decreases, though remains moderate to high, with increasing distance between Didactic Ages. The correlations for spelling have comparable estimated values, between .54 and .90. The main diagonal for spelling shows correlations from .76 to .90, revealing that 58–81% of the outcomes are explained by the outcome of the previous test.
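The variance-explained figures quoted here are simply the squared diagonal correlations; a quick check:

```r
# Proportion of variance explained = squared correlation between
# consecutive Ability Scores (main diagonal of Table 6.3).
c(0.70^2, 0.90^2)  # 0.49 0.81 -> 49-81% for maths
c(0.76^2, 0.90^2)  # 0.58 0.81 -> 58-81% for spelling
```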


Table 6.2. Ability Score over time, supplemented by the typical AS as provided by CITO, for maths and spelling respectively

                     Maths                                       Spelling
Moment of    N      Mean AS (SD)     Mean AS (SD)        N      Mean AS (SD)     Mean AS (SD)
testing                              norm population                             norm population
DA 5         80     20.64 (16.64)    26.04 (14.60)       45     103.22 (8.29)    104.8 (10.12)
DA 10        135    26.70 (17.42)    34.80 (14.6)        97     105.99 (11.25)   111.2 (7.3)
DA 15        185    35.61 (21.22)    47.00 (14.68)       148    109.23 (11.25)   117.5 (7.0)
DA 20        222    43.72 (21.23)    56.44 (14.64)       176    113.34 (9.46)    120.8 (6.7)
DA 25        247    52.20 (20.81)    69.00 (14.48)       216    117.44 (9.85)    126.8 (6.7)
DA 30        251    59.22 (22.78)    74.08 (14.52)       214    120.67 (9.76)    130.3 (7.0)
DA 35        242    63.65 (22.60)    82.16 (14.52)       223    123.03 (9.74)    135.5 (7.4)
DA 40        247    71.46 (22.06)    87.16 (12.64)       231    126.13 (9.91)    138.3 (7.4)
DA 45        225    79.82 (22.06)    96.52 (12.64)       227    129.39 (10.21)   138.5 (6.3)
DA 50        193    83.92 (21.34)    100.04 (12.64)      191    130.43 (9.75)    139.7 (6.9)
DA 55        140    84.73 (21.14)    107.52 (12.68)      144    130.22 (10.54)   142.5 (7.5)
DA 60        73     85.37 (21.36)                        74     131.86 (9.63)
DA 65        28     91.07 (23.06)                        26     134.73 (10.36)


Figure 6.1. Boxplot of Performance Level per Test Level, displayed by Didactic Age, for maths and spelling respectively.


Figure 6.2. Graphic representation of the variance in Ability Score over time for a random selection of 10% of the students, for maths and spelling respectively.


Figure 6.3a/b. Plot of observed (points) and predicted (lines) Ability Scores against the observed Didactic Age, split into unreliable and reliable test scores, for maths and spelling respectively.


Table 6.3. Estimated correlation structure for maths and spelling respectively

            DA10  DA15  DA20  DA25  DA30  DA35  DA40  DA45
Maths
  DA15      .78
  DA20      .74   .70
  DA25      .73   .69   .76
  DA30      .65   .57   .71   .83
  DA35      .77   .71   .76   .85   .88
  DA40      .70   .71   .65   .77   .80   .86
  DA45      .63   .65   .72   .76   .79   .84   .90
  DA50      .65   .63   .72   .79   .80   .87   .85   .91
Spelling
  DA15      .76
  DA20      .72   .82
  DA25      .72   .82   .85
  DA30      .65   .81   .74   .84
  DA35      .74   .82   .77   .85   .89
  DA40      .73   .74   .73   .85   .89   .91
  DA45      .54   .77   .72   .83   .83   .86   .88
  DA50      .61   .76   .73   .78   .84   .87   .88   .90


6.3.3 Choice of test

The majority of students were tested at a level which fits their Didactic Age (e.g. most students with a DA of 15 were tested at level M4) or at a lower Test Level. Only a very small proportion of all the tests were in-between tests: 2.9% and 3.8% for maths and spelling respectively.

Table 6.4 reveals the order in which Test Levels were set for the students (6.4a), how students performed on these tests (6.4b) and how the Test Level set related to the students’ performance on the previous test (6.4c). Together, these three tables provide insight into the mutual relationship and the difficulty of the Test Levels set for students. Concerning the order of Test Levels completed by the students each school semester, Table 6.4a reveals that about three-quarters of tests are set at the next successive level. A Test Level was skipped in 6.5 / 8.2% of cases, as students were set a Test Level one year ahead of the anticipated level. In 11.7 / 13.3% of cases the students took a test at the same level as previously (repeated it). A minority of the students completed a lower Test Level than previously. Tables 6.4b and 6.4c provide data concerning the difficulty of the Test Levels completed by the students.

Table 6.4b compares the Test Level of each test per student to its Performance Level (PL). These outcomes reveal that only about one-tenth of the students’ performances matched the test level (e.g. a test at level M5 performed at level M5). Furthermore, it shows that a minority of the assessments were completed at a higher Performance Level (PL_t+5, PL_t+10) but that about half of the assessments were completed at a lower Performance Level (PL_t-5, PL_t-10). Furthermore, 3.1 / 3.2% of students scored over a year ahead of the assessment set, while 15.6 / 21.5% of the students scored over a year lower than the assessment set, meaning that on the basis of CITO principles, 18.7 / 24.7% of the outcomes can be considered unreliable. These tests appeared to be either too easy or – and more often – too difficult for the students.

Finally, Table 6.4c compares each test’s Test Level to the Performance Level of the preceding test (PL_t-5). These outcomes reveal that 22.6 / 16.2% of Test Levels were a half year ahead of the Performance Levels of their preceding tests, 34.4 / 30.4% of Test Levels were a year ahead of the preceding Performance Levels, and 30.1 / 41.5% of Test Levels were over a year ahead of the preceding Performance Levels. Accordingly, the majority of the Test Levels were well ahead (one year or more) of the levels at which the students


Table 6.4a. The Test Level offered related to the preceding Test Level, displayed in percentages (TL_t-5 = lower, TL_t = same/repeated, TL_t+5 = regular ensuing level, TL_t+10 = higher/skipped)

            N      TL_t<-10   TL_t-10   TL_t-5   TL_t    TL_t+5   TL_t+10   TL_t>+10
Maths       1674   0.3%       0.6%      5.6%     11.7%   73.4%    6.5%      1.9%
Spelling    1427   0.7%       0.8%      5.4%     13.3%   69.3%    8.2%      2.3%

Table 6.4b. The students’ outcome (Performance Level) related to the Test Level, displayed in percentages (PL_t = performance fits the Test Level)

            N      PL_t<-10   PL_t-10   PL_t-5   PL_t    PL_t+5   PL_t+10   PL_t>+10
Maths       2270   15.6%      22.7%     30.5%    10.7%   12.5%    5.0%      3.1%
Spelling    2001   21.5%      25.2%     29.0%    9.1%    8.3%     3.4%      3.2%

Table 6.4c. The Test Level compared to the student’s outcome (Performance Level) on the foregoing test, displayed in percentages (offsets give the foregoing PL relative to the current TL; PL_t-5 is the fitting case in which the Test Level follows the preceding Performance Level)

            PL_t<-10   PL_t-10   PL_t-5   PL_t    PL_t+5   PL_t+10   PL_t>+10
Maths       30.1%      34.4%     22.6%    4.9%    5.0%     2.4%      0.7%
Spelling    41.5%      30.4%     16.2%    3.7%    4.5%     3.2%      0.6%

(20)

524629-L-bw-vdWorp 524629-L-bw-vdWorp 524629-L-bw-vdWorp 524629-L-bw-vdWorp Processed on: 3-10-2018 Processed on: 3-10-2018 Processed on: 3-10-2018

Processed on: 3-10-2018 PDF page: 107PDF page: 107PDF page: 107PDF page: 107 107

performed six months previously. In 13 / 12% of cases the Test Levels were behind the preceding Performance Levels.

6.4 Discussion

The main goal of the current study was to gain insight into the practice and outcomes of large-scale standardized assessments of students with EBD in special education. Although these state-wide assessments are known for their good psychometric properties, the question arises whether they provide a reliable indication of the academic ability and performance of students with EBD in special education. The specific behavioural and emotional problems of these students and their often problematic school careers complicate the assessment process, possibly producing wildly variable or unreliable scores over time. The first aim was therefore to gain insight into the trajectory of these students’ academic growth over time. Since the outcomes of these assessments can be affected by the choice of test, and vice versa, understanding the latter formed the second aim of this study.

The descriptive outcomes reveal that the mean Ability Scores of all students increase over time for maths and spelling. The outcomes also reveal that, in line with other studies (Trout et al., 2003; Wagner et al., 2005), the mean Ability Scores of these students for both maths and spelling were noticeably below the Ability Scores of the typical norm population at the same Didactic Age. Converted to Performance Levels, about three-quarters of the students performed below the level which could be expected for their Didactic Age. Regarding the development in Ability Score per student over time, several students show one or more extreme peaks or troughs in their academic development. However, many of these high and especially low scores are considered unreliable by CITO, possibly as a consequence of too easy or – more usually – too difficult tests being set. Based on the multilevel model, the average expected progression in Ability Score over time shows that the difference between reliable and unreliable outcomes appears particularly at the lower Didactic Ages. This can partly be explained by the fact that CITO sets a lower limit for its Performance Levels: every Ability Score with a Performance Level lower than M3 cannot be specified otherwise than <M3 and is therefore considered unreliable. However, logically there can only really be a difference of over a year between


Test Level and the Performance Level from level M4 and above. The reverse applies to the M8 Test Level.

An important issue in this study is whether the fluctuations in scores within students are so excessive that there is little coherence between subsequent scores. If so, low correlations could be expected between the Ability Scores at different Didactic Ages. However, the correlations found in this study, from .70/.76 to .90, are only slightly lower than the correlations for the norm group presented by CITO. The fact that 58 to 81% of test outcomes can be explained by the outcome of the previous test is therefore not unusual. As in the norm group, the predictive value increases with Didactic Age. It can therefore be concluded that the fluctuating outcomes of students at successive Didactic Ages are not so distorted that no trend can be discerned in the development of students with EBD.

However, this does not imply that the results accurately demonstrate what the students actually know and can do. Looking at the choice of test, this study reveals that most of the Test Levels set were a year or more ahead of the Performance Level of the preceding test (64/71%). It appears that, regardless of the outcome of the preceding test, students are simply set the next Test Level in line (about 70%). The results also reveal that of the 75% of Performance Levels which were behind their corresponding Test Levels, about 40% were even a year or more behind the Test Level set. As indicated previously, about 16/22% of the outcomes were so low, based on the difference between Test Level and Performance Level, that they could not be considered reliable and therefore jeopardize validity as well. Since most of the Test Levels set appear to be too difficult for the students, it is plausible that the low results reflect the choice of test rather than the students’ real ability. This sheds a different light on the high correlations between scores on subsequent tests reported above. After all, low scores on tests that are too difficult can predict low scores on a subsequent test that is also too difficult.

Although the above conclusions largely apply to both maths and spelling, there are some differences. The decreasing rate of improvement in Ability Score at higher Didactic Ages applies more to spelling. This difference matches the outcomes of the typical norm group and can be explained by the fact that spelling tests in the higher groups contain repetitions of previous knowledge (Mols & Kamphuis, 2012), while the assessment of maths continues to measure new skills even in the higher groups. The correlation between successive Ability Scores for spelling is somewhat stronger than the


correlations for maths, possibly because of the way the tests are constructed: students are set easier or more difficult follow-up modules depending on their performance in the first test module, leading to gradually more suitable tests. However, we can conclude for both types of assessment that most tests are set at too high a level and that the outcomes fall increasingly short of what can be expected at the Didactic Age in question.

Several limitations should be noted when interpreting this study’s findings. First, the data used were obtained from an existing school administration system (CITO) and, as such, the researchers did not have control over the processes used to complete the administration. The same is true of how the assessments were implemented and of the test conditions. Students with special needs frequently take tests under adjusted conditions (Bolt & Roach, 2009) intended to mitigate the unwanted effects of their disabilities on their scores (Pitoniak & Royer, 2001). However, there seems to be a tendency to provide students with testing adjustments which are not explicitly provided for in the test instructions, some of which are even unwarranted (Byrnes, 2008; Ysseldyke et al., 2001). If so, these adjustments could have a serious impact on the validity of an assessment and its outcomes. A second limitation concerns the fact that this was a retrospective study. Data were collected from existing records; therefore, a cohort of students was not followed over a long period, but only the students who were at school in 2014 were included. Students who left school in that period, because of good progress or a dramatic decline in performance for example, were not included in this study. Finally, this research was conducted with data collected from a single school district in the northern Netherlands. Although there is no reason to assume that the results for this region would differ from the rest of the Netherlands, it is not clear to what extent these findings can be extended to other nations.

6.5 Conclusion

As indicated previously, it is crucial for students with EBD that the curriculum is adapted to their special needs and abilities. This must surely apply to assessments as well. Ysseldyke, Dennison and Nelson (2004) state that when students with disabilities are provided with appropriate instruction, they are able to perform well on state-wide assessments. This, however, is only possible on the condition that they are also set appropriate assessments. Since the outcomes of this study reveal that a majority of


assessments are set at a maladaptive level, it is difficult to evaluate the significance of their outcomes. After all, it remains unclear whether the outcomes reflect the true ability of these students. This is a disturbing idea, since assessments should be the foundation of special educational programming (Crawford & Tindal, 2006; Shriner et al., 2014).

Repeatedly providing students with EBD in special education with maladapted assessments is thus a serious problem, especially when this leads to ongoing poor outcomes. Given the reciprocal relationship between academic self-concept and academic achievement (Marsh & Martin, 2011), this practice cripples their sense of competence and academic self-esteem. Given their past, low self-esteem is often an important characteristic of students with EBD, and it often contributes to problem behaviour and learning difficulties (Martin, Cumming, O’Neill, & Strnadová, 2017). Ongoing low outcomes will affect this even more. This downward spiral can only be broken by setting students appropriate assessments which focus on their progress instead of on their delay. Although the multilevel model in this study shows growth in Ability Score over time, the associated Performance Levels mainly emphasize the students’ delay. As suggested by Tindal et al. (2016), multilevel growth models using ability scores are more sensitive for observing growth than documenting progress through performance levels. Since the majority of the students score under the level at which they were tested, these kinds of state-wide assessment rarely reveal the improvement most students actually achieve. Criterion-referenced tests and curriculum-based assessments are two applicable alternatives. Criterion-referenced tests measure a limited domain and are designed to compare a student's performance on specific learning tasks to a specified criterion. Curriculum-based assessments measure the performance level of a student in terms of the curriculum being used and are often a regular part of the curriculum (Shriner et al., 2014). Both assessment types provide detailed information which is important for instructional decision-making. These kinds of tests are not concerned with comparative rankings but are used to judge a student’s performance against a predetermined standard. Moreover, this kind of testing is more concerned with the degree to which students show progress than with their scores compared to other students. Since this kind of testing is closer to daily teaching practice and is therefore easier for teachers to interpret, it can be of greater value for adapting the curriculum than state-wide testing (Shriner et al., 2014). Furthermore, if the knowledge assessed is at an accessible level in the zone of proximal development, a student’s chances of success will increase. As a result, experiences of success will strengthen a student’s sense of competence and ultimately their self-esteem. In this


manner, setting appropriate assessments leads to adaptive curricula that pave the way for increasing academic growth and decreasing problem behaviour (Fore et al., 2007; Martin et al., 2017).

The strength of this study resides in the fact that it contributes to the scarce knowledge concerning the use and reliability of state-wide assessments for students with EBD in special education. State-wide assessments are certainly not the only tests on which teachers base their teaching programmes; they also use other assessments and their own observations to adapt the curriculum to their students. Nevertheless, it is very important to verify the reliability of these assessments. Special education is required to include students in widely used large-scale assessments and to report on the results. The latter are often used for accountability purposes or for evaluating the effects of interventions. Although these purposes fall outside the scope of this chapter, it is important to note that using maladapted assessments can lead to incorrect assumptions and inadequate decisions. Perhaps even more serious is the fact that at some point students will be judged on these outcomes, for instance at their transition to secondary education. At that very important moment in their lives, poor test results can lead to incorrect decisions. Most disturbing of all, however, is that as long as students with EBD score low on these tests, our expectations of these students will remain low. This neither improves the life chances of students with EBD nor does justice to their abilities, and it makes it difficult for them to develop into the competent individuals we want them to be. Teachers, administrators, policymakers and researchers would do well to rethink the use of state-wide assessments for special education.
