
Faculty of Social and Behavioural Sciences

Graduate School of Child Development and Education

To Question the Question: the Influence of Textual Question Characteristics on Reading Comprehension.

Research Master Child Development and Education

Research Master Thesis

Hannah Boll

Prof. dr. P.F. de Jong

Date: 22-06-2021


Abstract

Reading comprehension is often measured with a comprehension test consisting of several texts followed by questions about the text's content. These questions are deemed to tap the understanding of the preceding text. Surprisingly, it has not yet been investigated how the text characteristics of the questions themselves might influence performance on tests of reading comprehension. When the formulation of a question is more challenging, it is more difficult to understand what is being asked and to answer it correctly. Moreover, the effects of the text characteristics of questions might depend on various reader characteristics, such as being a fast reader versus a slow reader. Data from 70 reading comprehension questions from 9 texts were available for 996 children (51% boys, M age = 9.57 years, SD age = 5.66 months) from primary schools in the Netherlands. The influence of textual question characteristics, other question characteristics, reader characteristics and text characteristics on reading comprehension was examined with multilevel logistic regression modelling. Results showed that the textual characteristics had a small effect on answering a question correctly; these were, in particular, the number of sentences and the number of concrete nouns. Other important predictors were question type, question format, vocabulary and fluent reading speed. The results of this study also suggest that most textual question characteristics are not of great importance when question characteristics like question type and question format are already taken into account. This underlines the importance of teaching children how to answer specific question types and question formats. More research is needed to further understand the influence of the number of sentences and concrete nouns in a question on the measurement of reading comprehension.

Keywords: reading comprehension, question characteristics, primary school


To Question the Question: the Influence of Textual Question Characteristics on Reading Comprehension.

The reading comprehension skills of children in the Netherlands are declining compared to those of children in other countries (Gubbels et al., 2017). The PIRLS results show that only 8% of the Dutch students reach the highest level of comprehension, and 17% of 15-year-olds leave education with low literacy, unable to comprehend texts well enough to acquire knowledge or understand instructions (Gubbels et al., 2017). Not only the PIRLS results but also the latest PISA results and results from the Dutch Inspectorate of Education show that there is cause for concern about the level of reading comprehension among youth in the Netherlands (Gubbels et al., 2017; Gurria, 2015). Being able to comprehend texts well is a skill needed throughout life to succeed in society. Children who are poor at reading comprehension also experience many problems in other school subjects and in daily life (Gubbels et al., 2017).

Reading comprehension is usually measured with a comprehension test consisting of several texts followed by questions about the text's content. These questions are deemed to tap the understanding of the preceding text. Whether a child answers a question correctly is determined by characteristics of the child and the difficulty of the text, but it might also be determined by characteristics of the questions themselves. Before you can answer a question, you need to read it and understand what is being asked. One could argue that a question itself is already a piece of text. To answer the question correctly, you have to match that piece of text with the previously read text to which the question belongs. When you look at a question as a piece of text, the influence of textual question characteristics, that is, the complexity of the formulation, should also be taken into account. When it is harder to read a question, it is also more challenging to understand it.


In the current study, we examined the influence of reader characteristics, text characteristics, and question characteristics on children's reading comprehension. The main focus lies on the textual question characteristics that determine how difficult it is to read and understand a question. This study's results will give more insight into the difficulties children encounter when answering questions from a reading comprehension test. Gaining more knowledge about the influence of the complexity of a question's wording is important for education on reading comprehension.

Reading Comprehension

Reading comprehension is the process of extracting and constructing meaning from written text. The reader has to create a mental representation of the text while reading. It requires the continuous integration of novel information with prior knowledge of the text (Kintsch, 2012). To successfully comprehend a text, the first step is recognising letters and words. Subsequently comes the process of understanding the meaning of those words, integrating them within the context of a sentence and with the text's overall meaning (Best et al., 2008). Thus, the process of comprehending a text is a multidimensional cognitive activity (Catts, 2018). Its outcome depends on different elements: the reader, the purpose for reading and the text features (Snow, 2002).

A core process in comprehending a text is word-to-text integration: the process of integrating novel words that are being read into the current mental model of the text (Perfetti et al., 2008). A reader needs to be able to process words as they are encountered. While reading, the reader relies on processes that anticipate information that is still to come and processes that link what is being read to the preceding text (Stafura et al., 2015). In other words, readers rely on forward and backward processes.

Word-to-text integration processes make it possible to tune and update the understanding of the text (Perfetti & Stafura, 2014). Imagine someone reading a text about fireworks. First, the reader learns that a girl and her father are going to light fireworks together. A few sentences further in the text, the word explosion is encountered. The reader adds this word to his mental model of the text and connects the word explosion to the fireworks that the girl and her father were going to light. To understand what happened, the reader had to update his understanding of the text by adding the word explosion to his mental representation of the text and connecting it to the preceding text.

Skilled comprehenders tend to immediately integrate a word into the context, even when the word has to be integrated across sentence boundaries (Perfetti & Stafura, 2014; Perfetti et al., 2008). Less skilled readers need more time to go from word identification to word integration, suggesting slow processing of words during text reading. The process that links the meaning of a word to the preceding text is slower in less skilled comprehenders and reflects a less effective use of the relations between word meanings (Perfetti et al., 2008). Correctly comprehending a text thus depends on whether one is a skilled reader or not.

Text Characteristics

Text characteristics tend to determine whether a reader understands a text. Characteristics such as word frequency, sentence length, genre and cohesion are known to affect reading comprehension (Francis et al., 2018; Graesser et al., 2011). Kleijn (2018) investigated the effect of linguistic features on reading comprehension in Dutch adolescents from grades 8 through 10. She presents the Utrecht Readability model, a multilevel model that can predict text difficulty for Dutch readers using the predictors word frequency, concrete nouns, maximum syntactic dependency length, content words per clause and adjectival past participles. According to this study, these five linguistic features are the best predictors of text comprehension in Dutch texts. More difficult texts in the Dutch language generally consist of less frequent words, a low proportion of concrete nouns, high maximum syntactic dependency lengths, a high number of content words per clause and a high proportion of adjectival past participles (Kleijn, 2018).

Processing less frequent words increases processing time and makes a text more challenging to read (Kleijn, 2018). The same holds for texts with long sentences (Graesser et al., 2011). Comprehending a text is easier when it has a high proportion of concrete nouns that refer to persons, organisms, places, times, units of measurement and artefacts (Kleijn, 2018). Sadoski and colleagues (2000) explain that it is easier to read a text with concrete language because it promotes referential processing: the child can form pictures in his head while reading, which makes comprehending easier.

Maximum syntactic dependency length, the maximum distance between two syntactically related words in a sentence, negatively influences reading comprehension (Kleijn, 2018). Texts with many words between syntactically related words are more challenging to read than texts with few words between the two related words. Take, for example, the sentence: I saw a cute kitten. The words I and saw are syntactically related (who saw? I). The syntactic dependency length between these two words is only one because there are no other words in between them. For the words saw and kitten, which are also syntactically related, the syntactic dependency length is three. The maximum syntactic dependency length is the longest dependency length observed between two syntactically related words in a sentence.

The number of content words per clause also negatively influences reading comprehension (Kleijn, 2018). Content words carry meaning, and a clause is the smallest grammatical unit that expresses a thought. Texts are more difficult to read when the number of content words per clause is high. Adjectival past participles are also negatively related to reading comprehension (Kleijn, 2018). Children tend to score lower on reading comprehension tasks when the text has a high proportion of adjectival past participles. Adjectives are words that tell us something about nouns; past participles are parts of verbs used to indicate past or completed action or time. Examples are a ruined cake or a broken plate.

Question Characteristics

Besides text characteristics, question characteristics also influence how difficult a reading comprehension test is. Eason and colleagues (2012) investigated the relations among reader characteristics, text types and question types in children aged 10-14 years. Question types were literal and inferential, text types included different genres, and the reader characteristics consisted of different cognitive skills. They found an interaction between question types and cognitive skills: higher-order cognitive skills were needed for more complex text genres and for inferential questions. Inferential questions are thus more difficult to answer because they require higher-order cognitive skills, which could be lacking in weak comprehenders.

Muijselaar and colleagues (2017) also found that inferential questions are more difficult than literal questions. They examined whether differentiation by text and question type is necessary for reading comprehension in Dutch children. The question types they examined were literal, inferential and evaluative. Literal questions focus on information stated explicitly in the text (Basaraba et al., 2013; Eason et al., 2012; Miller et al., 2014). Inferential questions assess the child's ability to draw conclusions and make inferences about information in the text. Evaluative questions focus on integrating information from the text with knowledge the child already has; the child has to evaluate what they are reading. The results indicated that inferential questions required a higher level of reading comprehension ability than literal questions. Other previous studies, among which those of Eason and colleagues (2012) and Basaraba and colleagues (2013), also indicated that literal questions are easier than other question types.

Surprisingly, it has not been investigated how the formulation of the questions might influence reading comprehension. One could look at a question as a piece of text. To answer the question correctly, you have to match and integrate that piece of text from the question and answer options with the text to which the question belongs. To do this correctly, you first need to understand the question and the corresponding answer options. This is where the formulation of the question comes into the picture: when the formulation is more challenging, it is more difficult to understand what is being asked. After you understand the question, you can connect the question and the answer options. The last step is to connect the different answer options to the corresponding text in the reading task. When you do this, you integrate the answer options into your mental model of the text and ask yourself whether each option fits into the model or not. You follow this process until you find the answer that fits the information you have already stored in your mental model of the text.

We previously discussed that text characteristics tend to determine whether a reader understands a text. When we look at a question as a piece of text, textual question characteristics might likewise determine whether a reader understands that question. The formulation of a question could significantly influence how difficult it is to answer. For example, when a question consists of less frequent words, it is more likely to be challenging to answer because less frequent words are harder to understand than frequent words. This is why this study's central question is whether the complexity of the formulation of a question influences the chance of answering it correctly.

Reader Characteristics

Different elements affect reading comprehension, including the characteristics of the reader. One of the best-known models of reading comprehension, the Simple View of Reading, states that reading comprehension can be predicted by the product of a reader's linguistic comprehension and decoding ability (Gough & Tunmer, 1986; Hoover & Gough, 1990). In the Simple View of Reading, decoding concerns the accuracy of single-word reading, and linguistic comprehension refers to understanding the meaning of spoken language. Linguistic comprehension and decoding together account for 45-85% of the variance in reading comprehension (Catts et al., 2005; Hoover & Gough, 1990).
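In formula form, with RC for reading comprehension, D for decoding and LC for linguistic comprehension, the Simple View of Reading reads:

$$RC = D \times LC$$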

Other reader characteristics that can influence reading comprehension are fluent reading and vocabulary (Cain et al., 2004; Perfetti & Adlof, 2012; Perfetti & Stafura, 2014). Fluent and accurate word reading is needed for the processing of words and sentences in a text (Perfetti & Stafura, 2014). Vocabulary, which is part of linguistic comprehension, is a critical component of reading comprehension. Comprehension relies on vocabulary knowledge: a reader cannot read a text without knowing what most of the words mean. A broad vocabulary is needed to derive meaning from sentences and, ultimately, the text (Cain et al., 2004; Perfetti & Adlof, 2012). The estimation is that about 90% of the words need to be in the reader's vocabulary to understand the text (Nagy & Scott, 2000). It can be concluded that it is of great importance that children read fluently and have a sufficient vocabulary. Vocabulary influences the ability to accurately access the meaning of words, and fluency influences the ability to access the words quickly (Cain et al., 2003). If accessing word meanings happens too slowly, it will be too difficult to process the links with other words in the text before the next word is read. The word-to-text integration process is therefore impaired in less-skilled comprehenders.

The influence of reader characteristics can depend on characteristics of the text and the questions. Quite a few studies have focused on which texts are more readable for which children, but only a few studies have investigated how text characteristics and question characteristics interact with different reader skills (Barth et al., 2014; Eason et al., 2012; Francis et al., 2018). Eason and colleagues (2012) investigated the relations among reader characteristics, text type and question type in children aged 10-14 years. The question types investigated were literal and inferential questions. Reader characteristics included semantic and syntactic awareness, study skills and how well children could make inferences. Text characteristics included different genres: narrative, expository and functional texts. Eason et al. found that the influence of having certain well-developed reader characteristics differed between text and question types. Children who were good at making inferences and had good study skills scored higher on expository texts and inferential questions. The expository genre and the inferential question type can be categorised as more complex than the other text genres and question types. Good study skills and making correct inferences thus seemed essential when reading a more complex genre or answering a more complicated question.

Effects of reader and text characteristics on oral reading fluency as a proxy for reading comprehension were investigated by Barth and colleagues (2014) in children in grades 6 to 8. The sample consisted of 1028 typical readers and 704 struggling readers. Oral reading fluency was used to assess reading comprehension. Oral reading fluency is characterised as efficient and automatic word reading and is closely related to reading comprehension. Measuring oral reading fluency also requires minimal time investment compared to measuring reading comprehension. Oral reading fluency was measured as words read correctly in the first 60 seconds of reading. Student characteristics included sight word reading, verbal knowledge, phonological decoding, level of decoding ability, gender and grade. Text characteristics included text length, passage difficulty, language, genre and discourse attributes. Both student-level and text-level characteristics contributed uniquely to oral reading fluency.

The student-level characteristics sight word reading and decoding ability accounted for most of the variability between students. Passage difficulty level was the best text-level predictor of oral reading fluency. The overall ability level, consisting of the two groups of skilled and struggling readers, interacted with the difficulty of the text in the model. Therefore, it can be concluded that the effect of the difficulty of a text varies between skilled and struggling readers.

The effects of reader and text characteristics on oral reading fluency as a proxy for reading comprehension were also investigated by Francis and colleagues (2018) in children in grades 6 to 8. This study used part of the data from the study by Barth and colleagues. The sample consisted of 311 typical readers and 578 struggling readers. The difference from the previous study is that the authors aimed to build a model for reading comprehension that accounts for variation within readers and across texts. The authors suggested that the Simple View of Reading should be extended to change it from a static model into a personalised model of reading based on a developmental perspective of reading comprehension. They examined the changes in the effects of reader characteristics and text characteristics, along with their interactions, on oral reading fluency. Student-level characteristics included silent reading fluency, phonemic decoding, sight word decoding, verbal knowledge and listening comprehension. Text characteristics included word frequency, sentence length, word concreteness, cohesion, syntactic simplicity, narrativity and difficulty level of the passage. Francis and colleagues (2018) called this new model the Complete View of Reading. The student-level characteristics silent reading fluency, phonemic decoding and sight word decoding were the strongest predictors of reading comprehension. The effects of expository text type and difficulty level of the passage differed between reader types, typical versus struggling readers. Good readers reduce their reading speed as a function of text difficulty: they reduce their fluency more as the passage's difficulty increases and when reading an expository text compared to a narrative text. This indicates that readers do not use their cognitive resources in a homogeneous way when solving a reading task. Text features may affect readers differently, and readers use their cognitive abilities differently. These results indicate that it is important to consider the variation within readers and across texts in reading comprehension models.

The results obtained from these studies show that there are interactions between reader, text and question characteristics (Barth et al., 2014; Eason et al., 2012; Francis et al., 2018). There is not only a main effect of these characteristics; the effects also depend on each other. For example, readers differ in what they find difficult while reading texts and in how they deploy their cognitive skills. Reading comprehension thus depends on characteristics of the text, the questions and the cognitive skills of the child. The studies by Barth and colleagues (2014) and Francis and colleagues (2018) used reading fluency as a proxy for reading comprehension. To better understand the interactions between reader characteristics and text characteristics, research should use reading comprehension tasks as the outcome measure. Eason and colleagues (2012) did use a reading comprehension measure as their outcome, but a limitation of that study is the use of a limited set of text, reader, and question characteristics. Other important characteristics, like text length and vocabulary, have been shown to have a significant influence on reading comprehension but were not included in that study. Investigating interactions between more and other text, reader and question characteristics will broaden the knowledge base. Textual question characteristics could also be included in the model to make it even more informative.

The Current Study

In the current study, the influence of reader characteristics, text characteristics and question characteristics on reading comprehension was examined. Reader characteristics included fluent reading speed, vocabulary and age. The text characteristic used in this study is text length. Question characteristics included the question type categories (literally in the text; inference and interpretation; evaluation) and the question format categories (multiple-choice, true/not true and open-ended questions). Textual question characteristics included word frequency, content words per clause, concrete nouns, maximum syntactic dependency length, mean sentence length and the number of sentences. The most important and most prominent part of this study focused on predicting the influence of how questions are formulated on answering the question correctly. Much research has been done on question characteristics, like question type, but not on textual question characteristics, like the frequency of the words in a question.

It is expected that a low word frequency, a high number of content words per clause, a low proportion of concrete nouns, a high maximum syntactic dependency length and a long mean sentence length will make it more challenging to match the question with the previously read text (Kleijn, 2018). When it is harder to match the question to the previously read text, the chance of answering the question correctly will decrease. The reader characteristics fluent reading speed and vocabulary will also influence the chance of answering a question correctly (Cain et al., 2004; Perfetti & Adlof, 2012; Perfetti & Stafura, 2014). A slow reading speed and a small vocabulary will decrease the chance of answering a question correctly. These are the main effects we expect to find.

Regarding interaction effects, we expect to find interactions between reader characteristics and textual question characteristics. When the words in a question and its answer options have a low word frequency, there will not be an effect of fluent reading speed, but when the words have a high word frequency, there will be. This is expected because when words are less well known, it will be harder to read them fluently. The same is expected for word frequency and vocabulary knowledge: when the words are less well known, the child's vocabulary knowledge will be more important than when the words are more frequent and therefore easier. We also expect an interaction between vocabulary and concrete nouns, based on the assumption that vocabulary knowledge is more important when there is a high percentage of concrete nouns in the question and answer options. We expect the same association between vocabulary and content words per clause. Finally, we also expect interactions between fluent reading speed and the other textual question characteristics: number of sentences, mean sentence length, mean syntactic dependency length, concrete nouns and content words per clause.

Method

Participants

For this study, we used the pre-test data from a longitudinal intervention study on reading comprehension (Muijselaar et al., 2017). The sample consisted of 996 children (506 boys and 490 girls) with a mean age of 9.57 years (SD = 5.66 months). Ages ranged from 7.92 to 11.25 years. The children were recruited from 35 elementary schools in the Netherlands. Almost 96% of the children were born in the Netherlands; other countries of birth were Turkey, Morocco, Suriname and other, not specified, countries. About 85% of the children had one or two parents who were born in the Netherlands; other countries of birth of the parents were Turkey, Morocco, Suriname, the Netherlands Antilles and other, not specified, countries. There were no children with sight or hearing problems, and no children received special education services.

Instruments

Five tests were used to measure reading comprehension, fluent reading speed and vocabulary.

Text Characteristics

Reading Comprehension Texts. Two different sets of reading comprehension texts were used. Means and standard deviations of the scores on the reading comprehension texts are shown in Table 1. Two texts came from the adjusted Dutch versions of the PIRLS 2011 reading assessment (Mullis et al., 2009). The PIRLS reading assessment was designed to measure young students' ability to construct meaning from texts by engaging in a range of comprehension processes. The texts 'Reuzentand' and 'Vijandentaart' were used in this study. At the end of each text, the children had to answer questions concerning the text, 16 and 11 questions respectively. Questions 8 and 13 from the text 'Reuzentand' were excluded from the analysis due to their question format: to answer these questions, the children had to fill in a table to complete its information. This format does not correspond with the question formats we focus on in this study, and therefore these questions were excluded. Question formats were open-ended questions, true/false questions and multiple-choice questions. The total score was the number of correctly answered questions. For this sample, the Cronbach's alphas for the two texts were .77 (Reuzentand) and .62 (Vijandentaart); the Cronbach's alpha for the combined texts was .82.

The second set of reading comprehension texts came from the Aarnoutse-Kapinga reading comprehension test (Aarnoutse & Kapinga, 2006). This is a standardised test battery for measuring reading comprehension, which can be used from grades 1 to 6. For this study, the test battery for grades 4 to 6 was used. It consisted of seven short texts, each followed by questions about the text, with a total of 43 multiple-choice/true-or-false questions. The total score was the number of correctly answered questions. For this sample, Cronbach's alpha was .78. In total, the children had to answer 70 reading comprehension questions. The Cronbach's alpha for the PIRLS and Aarnoutse-Kapinga texts combined was .88.

The texts were coded as short or long by the authors of this study, based on the length of the texts, because text length is a significant predictor of reading comprehension (Barth et al., 2014). The PIRLS texts had lengths of 832 and 920 words, whereas the seven texts from the Aarnoutse-Kapinga reading comprehension test ranged from 123 to 314 words. This resulted in coding the PIRLS texts as long texts and the Aarnoutse-Kapinga texts as short texts.

Table 1
Descriptive statistics of the scores on the reading comprehension texts

                   Minimum   Maximum      M       SD
Aarnoutse text 1      0         6        3.33    1.30
Aarnoutse text 2      1         6        5.13    1.11
Aarnoutse text 3      0         6        4.14    1.31
Aarnoutse text 4      0         6        3.22    1.40
Aarnoutse text 5      0         6        3.02    1.42
Aarnoutse text 6      0         6        3.80    1.39
Aarnoutse text 7      0         7        3.78    1.74
PIRLS text 1          0        16        9.89    3.22
PIRLS text 2          0        11        5.39    2.36
Total score          17        65       42.07   10.16

Question Characteristics

Textual Question Characteristics. The textual question characteristics word frequency, mean maximum syntactic dependency length, number of sentences, mean sentence length and content words per clause were calculated with the program T-scan (Pander Maat et al., 2014). Concrete nouns were counted by hand because T-scan only offers a density score, and we wanted to analyse the number of concrete nouns rather than their density.

T-scan is a software tool designed for analysing Dutch texts. It was designed to map characteristics that affect a text's complexity at the word, sentence, paragraph and text level. The sentence-level information was used for the questions. The first analysis focused on the questions by themselves, and the second analysis focused on the questions with their corresponding answer options. T-scan offers an extensive list of characteristics that can be analysed. For this study, we used T-scan to analyse word frequency, mean maximum syntactic dependency length, content words per clause, number of sentences and mean sentence length. The choice for the first three characteristics, and for adding the manually counted number of concrete nouns as a predictor, was based on Kleijn's (2018) results on the readability of Dutch texts. Mean sentence length and number of sentences were added because they are aspects of sentence length, one of the best-known predictors in the literature on reading comprehension (Francis et al., 2018; Graesser et al., 2011).

Word frequency. Calculated as the frequency of all content words based on the SUBTLEX-NL corpus (per billion words, log-transformed), excluding names, with the frequency of compound nouns corrected by taking the frequency of the head morpheme(s). SUBTLEX-NL is a database of word frequencies in the Dutch language based on 44 million words from film and television subtitles. The calculated frequency is a base-10 logarithm.

Concrete nouns. Calculated as the number of concrete nouns in a sentence, including nouns referring to persons, organisms, artefacts, places, times and measures.

Mean maximum syntactic dependency length. Calculated as the number of words between a syntactic head and its dependent (for example, verb-subject) in all sentences divided by the number of sentences.

Content words per clause. Calculated as the total number of content words per clause in a sentence. Adverbs are not included in the calculation.

Mean sentence length. Calculated as the total number of words in the sentences divided by the number of sentences.

Number of sentences. Coded as 0 for questions containing only one sentence and as 1 for questions with more than one sentence.
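To make these operationalisations concrete, the sketch below computes the last two characteristics for a single question string. This is a minimal illustration with naive tokenisation, not T-scan itself, and the example question is hypothetical:

```r
# Minimal sketch (not T-scan): two textual question characteristics for one
# question string, using naive sentence and word splitting.
question <- "Waarom stak het meisje het vuurwerk aan? Leg je antwoord uit."

# Split into sentences on end punctuation, then into words on whitespace.
sentences <- unlist(strsplit(question, "(?<=[.!?])\\s+", perl = TRUE))
words_per_sentence <- lengths(strsplit(sentences, "\\s+"))

mean_sentence_length <- mean(words_per_sentence)           # words per sentence, averaged
number_of_sentences  <- as.integer(length(sentences) > 1)  # 0 = one sentence, 1 = more
```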

Question Types and Question Formats. The coding of the question types and question formats (literal, inferential, evaluative; multiple-choice, true/not true and open-ended) was done by the first and second author of an earlier study into reading comprehension, which used the same pre-test data from the longitudinal intervention study on reading comprehension development (Muijselaar et al., 2017). Both question types and question formats were coded as 0, 1 and 2: for question types, 0 = literal (32.9%), 1 = inferential (44.3%) and 2 = evaluative (22.9%); for question formats, 0 = multiple choice (50.0%), 1 = true/not true (31.4%) and 2 = open-ended (18.6%).

Reader Characteristics

Fluent Reading Speed. To measure fluent reading speed, the Een Minuut Test (One Minute Test) was used (Brus & Voeten, 1999). The test consists of a sheet with 116 words increasing in length. Children were asked to read the words as quickly and accurately as possible. Incorrectly read words were counted and subtracted from the total number of words read, so the score on the test equals the number of words read correctly within one minute. As established by the test developers, Cronbach's alpha is .89 (Brus & Voeten, 1999).

Vocabulary. To measure vocabulary, the Peabody Picture Vocabulary Test-III-NL was used (Dunn & Dunn, 2005; Schlichting, 2005). This is a translated version of the original American Peabody test. The test consists of 204 cards, each displaying four images. For this study, sets 8 to 13 were used, consisting of 72 cards with four images each. The children had to choose the picture that corresponds to the word that is read out loud and underline the correct picture in their booklet. The test was administered in the classroom setting instead of individually and took about 30 minutes. The total score for each child was the number of correct answers. Cronbach's alpha was .69, as established by the authors of the original study.

Procedure

In the original study, the tests were administered in four test sessions. The fluent reading speed test was administered individually. The PIRLS reading comprehension tests, the Aarnoutse-Kapinga reading comprehension test and the Peabody Picture Vocabulary Test were administered in the classroom setting. Trained research assistants from the University administered the tests. Approval from the ethics committee was obtained.

Analysis

A four-level hierarchical multilevel logistic regression model was used to test whether reader characteristics, text characteristics and question characteristics affected children's reading comprehension. Multilevel modelling was chosen because of the nested structure of the data and to simultaneously account for variance at the question-level (level 1), text-level (level 2), student-level (level 3) and school-level (level 4). There was possible dependency within each of the levels: the chance of answering questions correctly might be more similar within one text than across texts, more similar within the same child than across children, and children in the same school might be more similar than children in different schools. The independent variables were fluent reading speed, vocabulary, text difficulty, word frequency, mean sentence length, concrete nouns, content words per clause, number of sentences, mean maximum syntactic dependency length, question format and question type.

Logistic regression was the appropriate approach because the outcome variable in this study was binary and followed a binomial distribution. The children either answered a question correctly or incorrectly, resulting in a score of 0 or 1 per question. This binary reading comprehension score was the dependent variable.
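A minimal sketch of such a model in lme4 is given below, assuming a long-format data frame d with one row per answered question; the object and variable names are illustrative, not the study's actual code:

```r
library(lme4)

# Four-level logistic model: answered questions (rows, level 1) with random
# intercepts for texts, children and schools. 'correct' is the 0/1 outcome.
m_full <- glmer(
  correct ~ fluency + vocabulary + age +
    q_type_nonliteral + q_type_evaluative + q_format_mc + q_format_tf +
    word_freq + n_sentences + concrete_nouns + cwpc + max_dep + mean_sent_len +
    (1 | text) + (1 | child) + (1 | school),
  data = d, family = binomial(link = "logit")
)
```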

Analyses were carried out with the lme4 package in R (Bates et al., 2015). To compare models and test whether a model improved, deviance tests were conducted with α = .05 as the criterion for significance (Agresti & Franklin, 2015). The effect sizes of the deviance tests were interpreted using Cohen's (1988) criteria: <2% is considered negligible, 2-13% a small effect, 13-26% a moderate effect and >26% a large effect. Each new model was also compared to the empty model to determine the proportional reduction in variance. For each model, the intraclass correlation and the proportion of reduction in deviance (PRD) were also calculated. Odds ratios (OR) were interpreted using Chen, Cohen and Chen's (2010) criteria: OR = 1.68, 3.47 and 6.71 are equivalent to Cohen's d = .20 (small), .50 (medium) and .80 (large).
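As an illustration of this testing procedure (continuing the illustrative m_full sketch above), a deviance test between nested models and the PRD can be obtained as follows:

```r
# Drop the six textual question characteristics to obtain the nested model.
m_reduced <- update(m_full, . ~ . - word_freq - n_sentences - concrete_nouns -
                      cwpc - max_dep - mean_sent_len)

anova(m_reduced, m_full)  # likelihood-ratio (deviance) test, alpha = .05

# Proportional reduction in deviance relative to the smaller model.
prd <- (deviance(m_reduced) - deviance(m_full)) / deviance(m_reduced)
```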

The preliminary phase involved preparing and inspecting the data. The continuous predictor variables were examined with histograms and scatterplots to see whether the variables were normally distributed and whether the plots showed homogeneity. Inspection of the scatterplot for the mean sentence length of the questions resulted in the deletion of two questions from the data: question 1 from the second PIRLS text and question 13 from the third Aarnoutse text. These questions were deleted because they were outliers (more than 2 SD from the mean sentence length). The score distributions of word frequency, concrete nouns, vocabulary, fluent reading speed, maximum syntactic dependency length and content words per clause were approximately normal, and the corresponding plots showed homogeneity. The histogram for mean sentence length was skewed to the right, and the plots showed a slight deviation from homogeneity. For mean sentence length, this means that overall there are more short than long questions.


The next step in the preliminary phase was the preparation of the data. Dummies were made to represent the variables gender, question type, question format, text length and number of sentences. Gender, text length and number of sentences were coded as 0 or 1. The question type dummy non-literal was coded as 0 (literal), 1 (inferential), 1 (evaluative), and the dummy evaluative as 0 (literal), 0 (inferential), 1 (evaluative). The question format dummy multiple choice was coded as 1 (multiple choice), 1 (true/not true), 0 (open-ended), and the dummy two-choice as 0 (multiple choice), 1 (true/not true), 0 (open-ended). Finally, to prevent scaling problems, the predictor variables, excluding the dummy variables, were rescaled to have a mean of 0 and a standard deviation of 1. The purpose of rescaling was to put all variables in units relative to the sample's standard deviation.
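A sketch of this coding and rescaling step (the data frame d and its column names are again illustrative):

```r
# Dummy coding as described above: non-literal contrasts inferential and
# evaluative questions against literal; evaluative singles out evaluative.
d$q_type_nonliteral <- as.integer(d$q_type %in% c("inferential", "evaluative"))
d$q_type_evaluative <- as.integer(d$q_type == "evaluative")
d$q_format_mc <- as.integer(d$q_format %in% c("multiple_choice", "true_not_true"))
d$q_format_tf <- as.integer(d$q_format == "true_not_true")

# Rescale the continuous predictors to mean 0 and SD 1.
cont <- c("fluency", "vocabulary", "age", "word_freq", "concrete_nouns",
          "cwpc", "max_dep", "mean_sent_len")
d[cont] <- lapply(d[cont], function(x) as.numeric(scale(x)))
```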

The first step of the analyses was building the empty model: the null model, which did not include any predictor variables, only random intercepts. The empty model was built to assess the variation of the log-odds from one cluster to another. The random part of the empty model was tested by comparing it to a generalised linear model without random parts. By calculating the intraclass correlation coefficient (ICC), the proportion of between-cluster variation in the total variation was estimated. This gives information about the extent to which the odds vary from cluster to cluster. The ICC was estimated for the three different clusters: texts, children and schools. An ICC of 0 indicates that there is no between-cluster variation: the chance of answering a question correctly does not differ from cluster to cluster. An ICC of 1 indicates that there is no within-cluster variation: the chance of answering a question correctly varies from cluster to cluster but not within a cluster.
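A sketch of the empty model and the ICCs follows; for a logistic multilevel model the level-1 (question) variance is conventionally fixed at π²/3 ≈ 3.29 (names again illustrative):

```r
# Empty model: only random intercepts for text, child and school.
m0 <- glmer(correct ~ 1 + (1 | text) + (1 | child) + (1 | school),
            data = d, family = binomial)

vc <- as.data.frame(VarCorr(m0))              # random-intercept variances per cluster
total_var <- sum(vc$vcov) + pi^2 / 3          # add the fixed level-1 logistic variance
icc <- setNames(vc$vcov / total_var, vc$grp)  # one ICC per cluster level
```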

In the second step, the child-level variables vocabulary, fluent reading speed and age were added to the model. As a third step, the text-level variable text length was added. In the fourth model, the question-level question characteristics question format and question type were added. Lastly, the question-level textual question characteristics were added to the model. This fifth model thus included all the lower-level variables and all higher-level variables that significantly contributed to a better model fit. By building this model, the unexplained variation of all lower-level effects was estimated.

The fifth model was expanded by adding the hypothesised cross-level interactions: the interactions between the textual question characteristics and the child characteristics. The last step was building the final model so that the hypotheses could be tested. The random slopes for all textual question characteristics and child characteristics were tested at the text-level, child-level and school-level to arrive at the final model. A principal components analysis (PCA) of the random-effects variance-covariance estimates was used to detect and diagnose overfitting problems in the random-effects model (Bates et al., 2015).
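A sketch of how a single random slope can be tested and how the random-effects PCA diagnoses overfitting (continuing from the illustrative m_full above; rePCA is part of lme4):

```r
# Add a random slope for concrete nouns at the text level and test it.
m_slope <- update(m_full, . ~ . - (1 | text) + (1 + concrete_nouns | text))
anova(m_full, m_slope)   # deviance test for the added random slope

# PCA of the random-effects variance-covariance estimates; components with
# near-zero variance suggest an over-fitted random-effects structure.
summary(rePCA(m_slope))
```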

The final model was then compared to the same model without multilevel modelling to demonstrate that multilevel modelling was needed. The coefficient estimates from the final model were compared to the coefficient estimates obtained without multilevel modelling.
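A sketch of this comparison, fitting the same fixed effects without random parts (with the illustrative m_full again standing in for the final model):

```r
# Single-level logistic regression with the same fixed effects.
m_flat <- glm(formula(m_full, fixed.only = TRUE), data = d, family = binomial)

# Reduction in deviance attributable to the random part of the model.
deviance(m_flat) - deviance(m_full)

# Side-by-side comparison of the coefficient estimates.
round(cbind(single_level = coef(m_flat), multilevel = fixef(m_full)), 3)
```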

After building the final model, odds ratios were inspected to determine whether the data supported the hypotheses. Odds ratios with their corresponding 95% confidence intervals were estimated. The odds ratios and CIs of the main effects were analysed first, followed by those of the interaction effects. Plots were generated for the significant interactions to illustrate them.
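A sketch of how the odds ratios and their Wald confidence intervals can be extracted (m_full again standing in for the final model):

```r
# Odds ratios with 95% Wald confidence intervals for the fixed effects.
or <- exp(fixef(m_full))
ci <- exp(confint(m_full, parm = "beta_", method = "Wald"))
round(cbind(OR = or, ci), 2)
```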

Results

Data from 70 reading comprehension questions were available for 996 children. Not all children completed all 70 questions, due to absence from school or not finishing the questions in time. In total, the data consisted of 68,759 answered questions; the percentage of missing data was 1.5%. The chance of answering a question correctly was higher than the chance of answering incorrectly: in the total sample, questions had a 59% chance of being answered correctly across all texts (M = 0.59, SD = 0.49). Correlations, means, standard deviations, skewness and kurtosis of the textual question characteristics and the question characteristics question type and question format are shown in Table 2.

In general, the correlations between the textual question characteristics and reading comprehension were low to moderate. Significant correlations were observed for number of sentences (-.34) and mean sentence length (.24), while correlations with word frequency (-.04), content words per clause (.04), concrete nouns (.09) and maximum syntactic dependency length (.04) were not significant. For the question characteristics, a significant correlation between question type (-.45) and the chance of answering a question correctly was found, while the correlation with question format (.01) was not significant.

For the reader characteristics, the correlation between the age of the children in months and the reading comprehension score was low, negative and significant (r = -.06, p < .001). Age had a mean of 114.88 months and an SD of 5.66. The correlation between vocabulary and fluent reading was significant and positive, r = .25, p < .001. Vocabulary had a mean of 35.13 and an SD of 6.31; fluent reading speed had a mean of 61.86 and an SD of 13.70. The correlations of vocabulary and fluent reading with reading comprehension were significant and positive, r = .53, p < .001 and r = .40, p < .001, respectively.

Empty Model

The empty model, with only the random intercepts text, student and school included, was built to estimate the variance in the chance of answering a question correctly on the question-level, text-level, child-level and the school-level. The intercept, the grand mean, yields the logits that can be transformed to the probability of answering a question correctly.
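The conversion from the intercept logit $b_0$ to a probability follows the standard inverse-logit form:

$$p = \frac{e^{b_0}}{1 + e^{b_0}}$$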


The model intercept is b = 0.45, z = 2.30, p = .02, which translates into questions having, on average, a 59% chance of being answered correctly across all levels and texts. More details of the empty model can be found in Table 3a.

The intraclass correlation coefficients (ICC) for the chance of answering a question correctly were calculated. The proportion of the total variance explained by differences between texts was 7.8%; differences between students explained 9.6% of the total variance and differences between schools 1.6%. This indicates that 81% of the total variance resides at the first level, the questions. All random intercepts were kept in the model, resulting in a model with four levels. Once the final model was built, the random intercepts were tested to see whether they significantly improved the model and whether multilevel modelling was needed.

Reader Characteristics Model

In the second step, different models were built to assess the extent to which the lower-level effects vary from cluster to cluster. First, the child-level variables reading fluency, vocabulary and age were added to the model (Model 1). The deviance for this model was 85444.3, compared to a deviance of 86178.3 for the empty model. Adding reading fluency, vocabulary and age to the model reduced the variance at the child-level by 43% and at the school-level by 54%; the text-level variance remained the same. The main effects of reading fluency and vocabulary were both significant: being a more fluent reader and having a more extensive vocabulary make it easier to answer reading comprehension questions correctly. The coefficients can be found in Table 3a.

Reader and Text Characteristics Model

Secondly, the text-level variable text length was added to the model (Model 2).

Including the main effect of text length in the model did not improve the model fit compared to the model with only reader characteristics, χ2(1) = 0.58, p = .45, PRD = 0.00%. Adding text length to the model did not reduce any of the remaining deviance; the deviance for this model was the same as the deviance of 85444.3 from the first model. Only the variance at the text-level slightly decreased, by 6%; the variances at the child-level and school-level remained the same. The main effect of text length was not significant. The coefficients can be found in Table 3a. Because the main effect was not significant and the variable did not improve the model fit, text length was left out of the subsequent models to preserve their parsimony and interpretability.

Reader and Question Characteristics Model

Third, the question-level question characteristics were added to the model (Model 3).

Including the main effects of question format and question type in the model significantly improved the model fit compared to the model with only the reader characteristics, χ2(4) = 5050.0, p < .001, PRD = 5.93%. The proportional reduction of the unexplained deviance was small. The deviance for this model was 80174.5, compared to the deviance of 85444.3 from the first model. The variance at the text-level decreased by 29%, the variance at the child-level increased by 21%, and the school-level variance remained almost the same. Inferential and evaluative questions were more difficult to answer than literal questions, and evaluative questions were even more difficult than inferential questions. Multiple-choice questions were significantly less difficult to answer than open-ended questions, and two-option multiple-choice questions were even less difficult than four-option multiple-choice questions. The coefficients can be found in Table 3a.

Reader, Question and Textual Question Characteristics Model

Lastly, the text characteristics of the questions were added to the model (Model 4). By building this model, the unexplained variation of all lower-level effects was estimated.

Including the main effects of word frequency, number of sentences, concrete nouns, content words per clause, maximum syntactic dependency length and mean sentence length significantly improved the model fit compared to the model with only the reader and question characteristics, χ2(6) = 815.01, p < .001, PRD = 1.02%. Adding the textual question characteristics to the model resulted in a small reduction of the unexplained deviance: they did not explain much more deviance over and above the deviance already explained by the other question characteristics. The deviance for this model was 79359.5, compared to the deviance of 80174.5 from the third model. As expected, the variances at the text-level, child-level and school-level remained almost the same; a reduction of variance was expected at the question-level, not at the higher levels. The main effects of number of sentences, content words per clause, mean sentence length and concrete nouns were significant. A question with more than one sentence, more content words per clause and a higher mean sentence length is more difficult to answer, whereas more concrete nouns in the question make it easier to answer. The main effects of word frequency and maximum syntactic dependency length were both not significant. The coefficients can be found in Table 3b.

Model With Interactions

The fourth model was expanded by adding the hypothesised cross-level interactions (Model 5): the interactions between the textual question characteristics and the reader characteristics. Including the interactions significantly improved the model fit compared to the model with only main effects, χ2(9) = 52.28, p < .001, PRD = 0.00%. The proportional reduction in deviance was negligible: the deviance for this model was 79307.2, only slightly lower than the deviance of 79359.5 of the fourth model. The variances remained almost the same.

For vocabulary, only the interaction with content words per clause was significant. For children with a small vocabulary, it does not seem to matter much how many content words per clause a question contains; when a child's vocabulary is more extensive, the effect of the number of content words per clause is larger. The interaction is depicted in Figure 1.

For fluent reading speed, the interactions with number of sentences, content words per clause, concrete nouns, mean sentence length and maximum syntactic dependency length were significant. The coefficients can be found in Table 3b. The interaction between fluent reading speed and number of sentences shows that when a child is a poor fluent reader, it matters less whether a question consists of one sentence or more than one sentence. The interaction is depicted in Figure 2.

The interaction between fluent reading speed and concrete nouns shows that fluent reading speed has almost the same effect on questions with few or many concrete nouns; the effect of reading fluency is slightly higher for questions with more concrete nouns. The interaction is depicted in Figure 3. The interaction between fluent reading speed and content words per clause shows that it does not seem to matter whether a question contains more or fewer content words per clause when a child is a poor fluent reader. When a child is a more fluent reader, it is easier to answer a question with fewer content words per clause correctly than a question with more content words. The interaction is depicted in Figure 4.

The interaction between fluent reading speed and mean sentence length shows that when a child is a poor fluent reader, it matters more if a question contains fewer or more words than when a child is a better fluent reader. The interaction is depicted in Figure 5. The interaction between fluent reading speed and mean syntactic dependency length shows that the effect of a low or high mean syntactic dependency length changes when a reader becomes a better fluent reader. Poor fluent readers seem to find it easier to answer questions with a higher mean syntactic dependency length, and better fluent readers seem to find it easier to answer questions that have a lower mean syntactic dependency length. The interaction is depicted in Figure 6.


Final Model

The last step was building the final model so that the hypotheses could be tested (Model 6). The random slopes for all textual question characteristics and child characteristics were tested at the text-level, child-level and school-level to arrive at the final model. Random slopes that significantly improved the model and contributed to the explained random intercept variance, without the model being over-fitted, stayed in the model (Bates et al., 2015).

At the text-level, the random slopes for mean syntactic dependency length, word frequency, concrete nouns, content words per clause and mean sentence length significantly added to the model's fit when tested one by one, without the other random slopes included in the model. The random slope for concrete nouns significantly improved the model's fit at both the child-level and the school-level when tested without the random slopes at the text-level included in the model.

Adding the slopes one after another led to an overfitting problem when the slopes for word frequency, concrete nouns, content words per clause and mean syntactic dependency length were included at the text-level and the slope for concrete nouns was included at both the student-level and the school-level. A PCA of the random-effects variance-covariance estimates was used to diagnose the overfitting problem. The PCA showed that the slopes for content words per clause and mean syntactic dependency length at the text-level explained almost no extra variance over and above the slopes for word frequency and concrete nouns and the intercept. Excluding these two slopes made the model more parsimonious, so it was decided not to include them. This resulted in a final model with a random slope for concrete nouns at the school-level and student-level, and random slopes for word frequency and concrete nouns at the text-level. The coefficients can be found in Table 3b.


The final model was compared to the same model without multilevel modelling to demonstrate that multilevel modelling was needed. The multilevel model performed better than the model without levels, χ2(12) = 4655.9, p < .01, PRD = 5.62% (small effect). The deviance for the final model was 78220.9, and the deviance of the model without levels was 82877.0. When the estimates of the two models are compared, it stands out that the predictors word frequency, mean syntactic dependency length and the first dummy for question type are significant in the model without the levels text, child and school included, whereas in the multilevel model these effects were not significant. The significance of the interactions is the same in both models, and the values of the estimates and interactions are approximately the same.

The final model explained 21.5% of the text-level variance, 41.3% of the child-level variance and 25.7% of the school-level variance. Some significant major important effects, some minor significant effects and minor non-significant effects have been found. Regarding the main effects, vocabulary and fluent reading speed were the most important predictors at the child-level. Question type, question format, number of sentences and concrete nouns were the most important predictors at the question-level. Only one significant interaction was found concerning vocabulary, namely the small interaction effect with content words per clause.

Several significant interactions with fluent reading speed were found, but only the interaction with number of sentences concerned a substantial effect. The odds ratios of the other significant interactions ranged between 0.90 and 1.10 and were considered negligible. The figures of these interactions, which can be found in the Appendix, also show why they can be considered negligible: within a relevant range of -1.5 to +1.5 SD, the confidence intervals of the effects overlap for quite a large part.

The main effects, expressed as odds ratios, imply that at the mean of the other predictors, an increase of 1 SD from the mean of a predictor variable makes it a certain number of times more or less likely that a question is answered correctly. The odds ratios of the main effects can be found in Table 3b. Regarding the main effects of the textual question characteristics, the most important predictors were number of sentences and concrete nouns. Answering a question correctly is 1.86 times less likely when a question has more than one sentence than when it has only one sentence, indicating that questions containing more than one sentence are more difficult to answer correctly. When a question contains more concrete nouns, it is 1.29 times more likely that the question is answered correctly. This means that it is easier to answer a question correctly when it consists of more concrete nouns.
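These odds ratios are the exponentiated logistic regression coefficients. The sketch below uses hypothetical coefficient values chosen only to reproduce the reported ratios; the actual estimates are in Table 3b.

```python
# From log-odds coefficients to the reported odds ratios (hypothetical values).
import math

b_multi_sentence = -0.62   # dummy: question has more than one sentence
b_concrete_nouns = 0.255   # per 1 SD increase in concrete nouns

or_multi = math.exp(b_multi_sentence)  # ~0.54, i.e. 1/0.54 ≈ 1.86 times less likely
or_nouns = math.exp(b_concrete_nouns)  # ~1.29 times more likely per SD
print(round(1 / or_multi, 2), round(or_nouns, 2))
```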

Other important predictors at the question-level were question type and question format. Answering a question correctly is 2.11 times less likely when the question is an evaluative question than when it is an inferential question, which means that evaluative questions are more difficult to answer than inferential and literal questions. The difference in the chance of answering a question correctly between inferential and literal questions was not significant. When a question is a four-option multiple-choice or true/not true question, answering it correctly is 2.23 times more likely than when it is an open-ended question. When a question is a true/not true question, answering it correctly is 1.81 times more likely than when it is a four-option multiple-choice or open-ended question. This indicates that true/not true questions are the easiest and open-ended questions the most difficult.

At the child-level, the most important predictors were vocabulary and fluent reading speed. A more extensive vocabulary makes it 1.41 times more likely that a question is answered correctly, which means that children with an above-average vocabulary have a higher chance of answering a question correctly. Being a faster fluent reader makes it 1.26 times more likely that a question is answered correctly; children with an above-average fluent reading speed have a higher chance of answering a question correctly.

Regarding the interactions, only one interaction turned out to have a substantial effect: the interaction between fluent reading speed and number of sentences. The interaction shows that for a poor fluent reader, it matters less whether a question consists of one sentence or more than one sentence than for a fast fluent reader. The more fluently a child can read, the more it matters whether a question consists of one or more sentences. Like poor fluent readers, fast fluent readers seem to find it easier to answer questions with one sentence.
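On the probability scale, this pattern can be illustrated with a small sketch. The coefficients below are hypothetical, not the Table 3b estimates; they are merely chosen so that the qualitative pattern matches the description above.

```python
# Illustrative cross-level interaction (hypothetical coefficients).
import math

def p_correct(speed_z, multi_sentence, b0=0.8, b_speed=0.23,
              b_sent=-0.62, b_inter=-0.15):
    """Predicted probability of a correct answer from a logistic model."""
    logit = (b0 + b_speed * speed_z + b_sent * multi_sentence
             + b_inter * speed_z * multi_sentence)
    return 1 / (1 + math.exp(-logit))

for z in (-1.5, 0.0, 1.5):   # poor, average, fast fluent reader (z-scores)
    gap = p_correct(z, 0) - p_correct(z, 1)
    print(f"speed z = {z:+.1f}: gap one vs multi-sentence = {gap:.3f}")
# The gap grows with reading speed: the number of sentences matters more
# for fast fluent readers, as described above.
```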

Discussion

With this study, we added to the knowledge base about question characteristics that influence performance on measures of reading comprehension. Much is already known about text, reader and question characteristics, but surprisingly, it had not yet been investigated how the text characteristics of the questions might influence reading comprehension. The purpose of this study was to explore which textual question characteristics influence the chance of answering a question correctly, to examine whether the textual question characteristics have an influence above and beyond the already well-known reader and question characteristics, and to find out which interactions between reader characteristics and textual question characteristics influence the chance of answering a question correctly. This was accomplished by building six hierarchical multilevel logistic regression models.

Textual question characteristics that predicted the chance of answering a question correctly included concrete nouns, content words per clause, number of sentences and mean sentence length. Question characteristics that predicted the chance of answering a question correctly included question type and question format. The text characteristic text length did not predict the chance of answering a question correctly and was not included in the final model. Reader characteristics that predicted the chance of answering a question correctly included vocabulary, fluent reading speed and age. The model predicted a moderate amount of the item-level variance: it explained 21.5% of the text-level variance, 41.3% of the child-level variance and 25.7% of the school-level variance. The discussion first addresses the findings from the final model: the text characteristics, the question characteristics, the textual question characteristics, and the reader characteristics, followed by the interactions.

Effect of the Text Characteristics

Surprisingly, text length did not seem to influence the chance of answering a question correctly. This finding is not in line with earlier research on the influence of text length on reading comprehension, in which text length was a significant predictor (Barth et al., 2014). One possible explanation could be that the coding of the texts into short and long texts corresponds to the reading comprehension battery the texts came from: the texts from the Aarnoutse-Kapinga reading comprehension battery were coded as short texts, and the texts from the PIRLS battery were coded as long texts. The coding might therefore have captured something other than text length, possibly a difference between the two test batteries. Text length was not included in the final model.

Effects of the Question Characteristics

Results indicate that question type and question format are important predictors of the chance of answering a question correctly. Included in the final model were the question types literal, inferential and evaluative, and the question formats multiple-choice 4-options, true/not true and open-ended. The true/not true format had the highest probability of being answered correctly, followed by the multiple-choice 4-option format. The open-ended format had the lowest probability of being answered correctly. Answering an open-ended question was more than twice as difficult as answering a multiple-choice 4-option or true/not true question. An explanation for these findings could be that for multiple-choice questions, correct answers can also be obtained by guessing (Burton, 2001). This guessing chance makes it more likely that a child answers a question correctly: for a true/not true question, the guessing chance is 50%, and for a four-option multiple-choice question, it is 25%. The guessing chance could be why open-ended questions are so much more difficult than multiple-choice questions.
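The size of this guessing advantage is easy to quantify: if a child knows the answer with probability p and otherwise guesses at random among k options, the expected accuracy is p + (1 - p)/k. A worked illustration, where the value of p is hypothetical:

```python
# Expected accuracy under guessing for the three question formats.
def expected_accuracy(p_known: float, n_options: int) -> float:
    return p_known + (1 - p_known) / n_options

p = 0.40  # hypothetical probability that the child actually knows the answer
print(expected_accuracy(p, 2))  # true/not true: 0.70
print(expected_accuracy(p, 4))  # four-option multiple choice: 0.55
print(p)                        # open-ended, no guessing benefit: 0.40
```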

Consistent with earlier findings, literal questions were the questions with the highest probability of being answered correctly (Basaraba et al., 2013; Eason et al., 2012; Muijselaar et al., 2017). Evaluative questions were the most difficult questions to answer because the child has to evaluate what is being read and integrate the information from the text with knowledge the child already has. After adding the random slopes to the model, the dummy variable that compares literal and inferential questions with evaluative questions was no longer significant.

Effects of the Textual Question Characteristics

In this study, it was suggested that a question could be seen as a piece of text. To be able to answer a question, it first needs to be understood. When the formulation of a question is more challenging, it is more difficult to understand what is being asked and correctly answer it. With this theory in mind, the text difficulty characteristics from Kleijn (2018) were used to investigate the effect of the formulation of the question on the chance of answering a question correctly. The effects found were all in line with the expectations based on the predictors of text difficulty in Dutch texts (Kleijn, 2018).
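As a toy illustration of treating a question as a piece of text, the sketch below computes two of the simpler characteristics (number of sentences and mean sentence length) for an invented question. The thesis itself relied on the full feature set of Kleijn (2018) rather than this naive tokenisation; the function name and example question are purely illustrative.

```python
# Toy sketch: simple textual characteristics of a question (not the
# full pipeline behind the Kleijn, 2018 feature set).
import re

def question_features(question: str) -> dict:
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", question.strip()) if s]
    words_per_sentence = [len(s.split()) for s in sentences]
    return {
        "n_sentences": len(sentences),
        "mean_sentence_length": sum(words_per_sentence) / len(sentences),
    }

print(question_features("Tim loses his ball. Why is he sad?"))
# {'n_sentences': 2, 'mean_sentence_length': 4.0}
```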

Results indicate that the textual characteristics, in general, had only a small influence on answering a question correctly. Two main effects were more prominent than the others: number of sentences and concrete nouns. Concrete nouns was also an important predictor in the study by Kleijn (2018); number of sentences was not included in that study. When a question has more than one sentence, answering the question correctly is almost twice as difficult compared to a question with only one sentence. This is an important new finding. When a question contains more concrete nouns, it is easier to answer the question correctly. This is consistent with the findings of Kleijn (2018) and of Sadoski and colleagues (2000), who explained that children can form mental images while reading concrete nouns, which makes comprehension easier.

Surprisingly, the effects of word frequency and mean maximum syntactic dependency length were not significant. An effect of word frequency was expected because the processing of less frequent words increases processing time and makes a text more challenging to read (Kleijn, 2018). An explanation for the absence of this effect could be that the frequencies of the words in the questions are very similar to the frequencies of the words in the corresponding text. If this is the case, the word frequency effect is already explained by the differences between the texts, which could be why we did not find an effect at the question-level. For mean maximum syntactic dependency length, it was expected that sentences with a high syntactic dependency length are more challenging to read than sentences with a low syntactic dependency length (Kleijn, 2018). An explanation for the absence of this effect could be that some questions are too short to be seen as a piece of text. It might be the case that an effect of the mean maximum syntactic dependency length can only be found in longer pieces of text than the questions that we treated as pieces of text in this study.

Effects of the Reader Characteristics

The effects of vocabulary and fluent reading speed are in line with earlier research on reader characteristics that influence reading comprehension (Cain et al., 2004; Perfetti & Adlof, 2012; Perfetti & Stafura, 2014). A good vocabulary is needed to accurately assess the meaning of words and derive meaning from sentences (Cain et al., 2003; Cain et al., 2004; Perfetti & Adlof, 2012). Fluency is needed to access the words quickly (Cain et al., 2003). When a child is a slow reader and does not read fluently, it is harder to understand what is being read because it takes too much time to process the whole sentence. This finding is also in line with the Simple View of Reading, which states the importance of accurate single-word reading in predicting reading comprehension ability (Gough & Tunmer, 1986; Hoover & Gough, 1990).

Interactions Between Reader and Textual Question Characteristics

Results indicate that the effects of the predictors on reading comprehension only slightly differed depending on the values of other predictors. Vocabulary only seemed to have a significant interaction with content words per clause. An explanation for the absence of the other interactions might be that the Peabody Picture Vocabulary Test was administered in a classroom setting instead of individually, as the test's authors intended. This might have resulted in an underestimation of the effect of vocabulary and of the interactions involving vocabulary. Fluent reading speed had significant interactions with all textual question characteristics except word frequency. That an interaction between fluent reading speed and word frequency was not found might be due to the absence of the main effect for word frequency.

An interaction that stands out is the interaction between fluent reading speed and number of sentences. When a child is a slow fluent reader, it does not seem to matter much whether the question contains one or more sentences; the probability of answering the question correctly is in both cases lower than for a skilled fluent reader. The interesting part of this interaction is that fluent reading speed has a bigger influence on questions with one sentence than on questions with more than one sentence. The difference between poor and good fluent readers is bigger when answering a question with one sentence than when answering a question with more than one sentence. This effect could be explained as follows: when a slow reader reads a question containing one sentence, the reader can easily make a reading error. When this reader needs to read more than one sentence, the reader can still correct an error in the first sentence while reading the other sentences; the reader has time to notice that something was not read correctly and adjust for the error before answering the question. When a reader does not make an error but misses information because the reader is a slow reader, the missed information in the first sentence can sometimes be picked up in the second sentence when the sentences are coherent. In this case, the second sentence can compensate for the missed information in the first sentence. Another finding is that fast fluent readers find it more challenging to answer questions with more than one sentence. It might be the case that questions with more than one sentence are by themselves harder to answer, as seen in the main effect of the number of sentences. Good readers tend to reduce their fluency more as text difficulty increases (Francis et al., 2018), meaning that being able to read fast and fluently is less important for questions with more than one sentence because answering those questions is more complex for other reasons.

To summarise, the most important predictors of performance on a reading comprehension task were question type, question format, number of sentences, concrete nouns, vocabulary and fluent reading speed. Open-ended questions are more than twice as difficult to answer as multiple-choice questions for children aged 7 to 12. Evaluative questions are twice as difficult to answer as inferential and literal questions. Questions with more than one sentence are almost twice as difficult to answer correctly. When a question contains more concrete nouns, it is easier to answer the question correctly. An important interaction was the one between fluent reading speed and number of sentences; it seems to indicate that slow fluent readers might even benefit when a question contains more sentences, because the extra sentences give the reader time to correct a reading error and fully understand the question.
