The Halpern Critical Thinking Assessment: Towards a Dutch Appraisal of Critical Thinking


Critical Thinking

Hannie de Bie & Pascal Wilhelm¹

Faculty of Behavioural Sciences, University of Twente, Enschede, The Netherlands

Abstract

When critical thinking learning objectives are implemented in education, a valid and reliable instrument for assessing the level of critical thinking skills is needed. This study focuses on the psychometric properties of the Dutch version of the Halpern Critical Thinking Assessment (HCTA). In addition, a real-world outcomes inventory (RWO-NL) was developed to measure negative life events. The number of negative life events was hypothesized to be inversely related to critical thinking ability. The HCTA and RWO-NL were administered to university students in communication and psychology (N = 240). The reliability of the HCTA appeared sufficient (α = .75; λ₂ = .77), and factor analysis indicated that combining the constructed response and forced choice formats, each containing the five critical thinking subscales, is an adequate method for assessing critical thinking ability. The total HCTA score and the weighted RWO-NL score were not significantly related, r = -.12, ns. Recommendations for improving the Dutch HCTA are discussed.

Keywords: Halpern Critical Thinking Assessment, validity, reliability

1 Correspondence concerning this article should be addressed to Hannie de Bie or Pascal Wilhelm, University of Twente, Faculty of Behavioural Sciences, De Zul 10, 7522NJ, Enschede, The Netherlands. Phone: 0031 53 489 3562. E-mail: hannie112@hotmail.com or p.wilhelm@utwente.nl


There are many examples of thoughtless actions resulting from blindly accepting information or from carelessness. One example concerns a study by Professor Diederik A. Stapel of Tilburg University, the Netherlands. A press release claiming that meat eaters are more selfish and less social was well received, especially by vegetarians. Many people believed this without asking critical questions or considering how the study was designed. Fortunately, some reporters questioned the study, thoroughly checked its methods and conclusions, and chose not to publish the research (Hupkens, 2011, September 14). Later it became known that Stapel's data were fictitious (Tilburg University, 2011), and eventually all of Stapel's publications were examined for scientific misconduct.

How important is the skill of critical thinking within society? Quite important, judging by the increasing focus in education on ‘21st century skills’ (Ananiadou & Claro, 2009). These skills, which include critical thinking, communication, ICT literacy, social and/or cultural skills, creativity, collaboration, and problem solving (Voogt & Roblin, 2010), are not entirely new (Rotherham & Willingham, 2010). However, due to economic and societal change (from an industrial to a knowledge society) and developments in ICT, success on the labour market often depends on having these skills. This is one of the reasons why these skills should be taught, beginning early in education (Voogt & Roblin, 2010, 2012).

This study focuses on one of the 21st century skills, namely critical thinking.

Despite many conflicting definitions (e.g. Black, 2012; Ennis, 1996; Facione, 1998; Halpern, 1998; Moseley et al., 2005; Sternberg, 1986), Butler (2012) concludes that most researchers agree that critical thinking “involves attempting to achieve a desired outcome by thinking rationally in a goal-oriented fashion” (p. 721). In accordance with this observation, the definition of Halpern (1998) will be used:

The term critical thinking refers to the use of those cognitive skills or strategies that increase the probability of a desirable outcome. […] Critical thinking is purposeful, reasoned, and goal-directed. It is the kind of thinking involved in solving problems, formulating inferences, calculating likelihoods, and making decisions. Critical thinkers use these skills appropriately, without prompting, and usually with conscious intent in a variety of settings. (pp. 450-451)


If critical thinking becomes an educational objective, appropriate assessment is needed. Many critical thinking assessments have been studied thoroughly (e.g. Black, 2012; Hatcher, 2011; Ku, 2009; Spector, Schneider, Vance, & Hezlett, 2000; Stein & Haynes, 2011). This study focuses on the Halpern Critical Thinking Assessment (HCTA), which presents 25 everyday scenarios. These scenarios are drawn from various domains of real life and deal with issues such as the lottery, the death penalty, and slimming programs. For each scenario, respondents first answer an open-ended question, giving a first reaction in their own words, and then a forced choice question, choosing the best response from the alternatives offered. The HCTA is unique in combining these two formats in one assessment. Other advantages of the HCTA relative to other assessments are its computerized grading system and the option of both offline and online administration. These differences and other characteristics of the HCTA are discussed below.

The HCTA covers five categories of critical thinking skills: (a) verbal reasoning skills (e.g. the ability to detect and defend against ubiquitous or deceptive language use), (b) argument analysis skills (e.g. the ability to assess the strength of an argument), (c) skills in thinking as hypothesis testing (e.g. the ability to reason scientifically and to determine whether or not the given information confirms a hypothesis), (d) using likelihood and uncertainty (e.g. the ability to estimate probabilities correctly), and (e) decision making and problem solving skills (e.g. the ability to define the problem, identify goals, and weigh both positive and negative findings) (Halpern, 2012).

The HCTA presents 25 scenarios that respondents can encounter in daily life, each starting with an open-ended question followed by forced choice questions. This is the first distinction between the HCTA and other assessments, which are based on either a constructed response or a forced choice format. Ku (2009) advocated the use of a multiple response format and emphasized that the conceptualization of critical thinking should be taken into account when developing a critical thinking assessment. She stated that critical thinking has two components: a dispositional component (e.g. the intention and motivation to engage in critical thinking) and a cognitive component (e.g. a set of skills, rules of formal logic). Together, these components determine actual critical thinking performance. Taube (1997) used confirmatory factor analysis to investigate whether a two-factor model of critical thinking fitted the data better than a one-factor model. The two factors were (a) a dispositional factor, measured by assessments of tolerance of ambiguity, need for cognition, and intellectual development, and (b) a cognitive factor, measured by SAT scores, grade point average scores, and scores on the Watson-Glaser Critical Thinking Appraisal (WGCTA; Watson & Glaser, 1980). The Ennis-Weir Critical Thinking Essay Test (EWCTET; Ennis & Weir, 1985), which uses a constructed response format, was significantly related to both factors (Taube, 1997). These components appear to be associated with different response formats: the constructed response format reveals more of the dispositional component than the forced choice format does (Ku, 2009). Both response formats are discussed below.

A constructed response format relies on free recall: respondents have to search their memory and select the knowledge needed to construct an answer (Bridgeman & Morgan, 1996; Butler, 2012; Ku, 2009). This format requires more cognitive effort, but it also reveals more of the dispositional component of critical thinking, because respondents have the opportunity to display the appropriate skills on their own. The motivational and intentional aspects of critical thinking become more apparent, because answering open-ended questions shows to what extent respondents are willing and able to engage in critical thinking at the right moment (Ku, 2009). However, compared with the forced choice format, a constructed response format is much more time consuming, and there are concerns about the subjectivity of the scoring. Examples of critical thinking assessments that rely solely on free recall are the EWCTET and the ICAT Critical Thinking Essay Examination (Sonoma State University, 1996).

A forced choice format requires less cognitive effort, because respondents can rely on recognition: they have to identify the applicable response from a list of alternatives (Bridgeman & Morgan, 1996; Butler, 2012; Ku, 2009). The forced choice format makes it possible to measure how well respondents recognize good use of critical thinking skills; it is a quick way to obtain an indication of a person's critical thinking skills and an efficient way to reach many people. However, it may reveal only the cognitive component, and it is doubtful whether forced choice is suitable for measuring the disposition towards critical thinking: responses do not indicate how respondents operate in an unprompted context where they have to generate their own answers (Ku, 2009). Examples of critical thinking assessments that rely solely on recognition are the California Critical Thinking Skills Test (CCTST; Facione, 1990) and the Cornell Critical Thinking Test (CCTT; Ennis, Millman, & Tomko, 1985).


Both response formats have their benefits and limitations; the HCTA combines the two in one assessment (Halpern, 2012). Combining the formats should potentially yield a more complete assessment, since both are needed to measure the dispositional and the cognitive components of critical thinking. There is considerable evidence for the reliability and validity of the HCTA (Halpern, 2012). Internal consistency (Cronbach's Alpha) ranges from .85 to .97, and because the scoring method guides the grader in scoring the constructed responses, inter-rater reliability is high (Halpern, 2012). Because the items are based on the constructs most frequently mentioned in descriptions of critical thinking, content validity is presumably high (Halpern, 2012). Four small studies mentioned in Halpern (2012) reported correlations of .39, .49, .42, and .51 between scores on the constructed response items and the forced choice items. This suggests a reasonable relationship between the two formats, and at the same time provides evidence that free recall and recognition are separable (Halpern, 2012).

The factor structure of the HCTA is evaluated in more detail in Halpern (2012), using a U.S. norm sample of 450 respondents collected by the test author. The author compared three models and concluded that a two-factor model, with one factor for the constructed response format and one for the forced choice format, each containing the five categories of critical thinking skills, fits the data best. Hau et al. (2006, April) and Ku (2009) also evaluated the factor structure of the HCTA with samples of respondents from the U.S. and Hong Kong and concluded that the same model fitted best. The constructed response and forced choice formats are thus related but also have distinct characteristics, which supports their combined use in assessing critical thinking (Halpern, 2012; Hau et al., 2006, April; Ku, 2009).

Regarding construct validity, HCTA scores correlate positively (r = .12 - .59) with level of education and with academic ability measures such as SAT and grade point average scores (Butler, 2012; Halpern, 2012; Ku, 2009), with the correlations with SAT scores at the higher end (r = .50 - .58) (Hau et al., 2006, April; Marin & Halpern, 2011). In addition, scores on need for cognition and conscientiousness scales are moderately related to critical thinking skills (Clifford, Boufal, & Kurtz, 2004; Halpern, 2012; Ku, 2009; Ku & Ho, 2010; Spector et al., 2000).

The HCTA, translated where needed, has been used as a tool to assess critical thinking in several countries, including China, Ireland, Portugal, Belgium, Spain, the United States, and Vietnam (Butler et al., 2012; Halpern, 2012). With respect to the total HCTA score, studies in Spain (n = 355, M = 106, SD = 16.1) and Belgium (n = 173, M = 113.15, SD = 11.5) found results similar to those of the U.S. norm sample (n = 450, M = 109.71, SD = 18.23). These relatively consistent scores across countries are taken as an indication of the quality of the HCTA scenarios. Only a few changes by native translators were needed to make the assessment culturally fair, perhaps because cultural differences were taken into account when the scenarios were developed (Halpern, 2012).

An interesting study on the external validity of the HCTA was conducted by Butler (2012). Instead of the more frequently studied relationship between academic performance and critical thinking, she studied the relationship between real-world outcomes and critical thinking. The real-world outcomes were measured with an inventory, the Decision Outcomes Inventory (DOI), adapted from Bruine de Bruin, Parker, and Fischhoff (2007), that records the frequency of negative life events over the past 6 months in numerous domains, such as finances, education, and relationships. Butler (2012) hypothesized that respondents who score higher on the HCTA would report fewer negative life events than those who score lower. This hypothesis was based on the notion that critical thinkers transfer and exercise their critical thinking skills in multiple domains of life in order to be successful and to avoid negative life events caused by poor decision making. As Halpern (1998) states: “critical thinkers will have more desirable outcomes than ‘noncritical’ thinkers (where ‘desirable’ is defined by the individual, such as making good career choices or wise financial investments)” (p. 450). This prediction was confirmed by Butler (2012): HCTA and DOI scores of 131 respondents showed a modest relationship in the expected direction (r = -.38, p < .001).

Butler (2012) also assumed that the critical thinking assessment scores and real-world outcomes would yield similar results for three qualitatively different groups of respondents (community adults, state university students, and community college students). She reported no differences between the groups in the number of negative life events, and no differences among the groups in the relationship between the number of negative life events and the critical thinking assessment scores. The only difference found was in the critical thinking scores: community college students scored significantly lower (M = 92.31, SD = 17.50) than state university students (M = 105.15, SD = 21.48) and community adults (M = 110.42, SD = 20.43).

A replication of Butler's study was carried out with an Irish sample of 70 respondents who also participated in an online critical thinking course. This study confirmed Butler's results, showing a low but significant correlation between critical thinking scores and real-world outcomes, r = -.28, p = .019 (Dwyer, Hogan, & Stewart, 2012). These outcomes are worth pursuing, because they provide evidence for the external validity of the HCTA. If there is a relationship between critical thinking and decision making, critical thinking instruction may also yield benefits beyond the classroom.

The HCTA was translated into Dutch and published by Schuhfried GmbH. However, few psychometric data are available for the Dutch edition. Therefore, this study first explored the internal structure and reliability of the Dutch HCTA. Second, it attempted to replicate the findings of Butler (2012), whose primary objective was to determine whether HCTA scores are related to a real-world outcomes inventory of everyday life.

In line with Halpern (2012), Hau et al. (2006, April), and Ku (2009), Cronbach's Alpha of the total item set is hypothesized to be .85 or higher, and the reliability of the two format scales is expected to resemble that of the U.S. norm sample (constructed response format α = .84; forced choice format α = .79). Additionally, the factor analysis is expected to show two related but separable latent factors for the constructed response and forced choice formats, each containing the five critical thinking categories. Finally, in line with Butler (2012), the correlation between HCTA scores and scores on a real-world outcomes inventory is predicted to be modest, negative, and significant: respondents who score higher on the HCTA should report fewer negative life events than those who score lower. To check for possible ambiguities in the HCTA, respondents indicated at the end of the test whether and where they had experienced problems understanding the HCTA. The results of these observations are reported.

Method

Participants

The respondents were first- and second-year communication and psychology students at the University of Twente in Enschede, the Netherlands. To earn course credits, these students are required to participate in research studies conducted at the Faculty of Behavioural Sciences.

They signed up voluntarily and earned 1.5 points (out of 10 mandatory credits) for completing both the HCTA and the RWO-NL. A total of 258 respondents completed the Dutch HCTA and the Dutch version of the RWO (RWO-NL), adapted from Bruine de Bruin et al. (2007) and Butler (2012). Respondents were excluded from the analysis when their answers on both tests indicated that they had not taken the assessment seriously or when they had prematurely aborted the tests. Data from 18 respondents were excluded. Of the remaining 240 respondents, 80% were female (n = 191) and 20% were male (n = 49), which reflects the gender distribution of the study programmes concerned. The respondents were aged 18 to 32 (M = 20.53, SD = 2.07). Their ethnic background was distributed as follows: 46% Dutch, 34% German, and 20% another ethnicity or no answer.

Materials

Halpern Critical Thinking Assessment. Critical thinking was measured with the HCTA (Halpern, 2012). A few technical language errors were corrected before use, without affecting the meaning of the items (see Appendix A). Form S1, Version A, was used in this study. Form S1 presents 25 everyday scenarios accompanied by questions in two response formats: first a constructed response (open-ended) and then a forced choice question (e.g. multiple choice, rating of alternatives, or ranking) (see Appendix B for the HCTA used in this study). Form S2 consists only of the forced choice questions and can be used as a short form. Versions A and B are parallel versions of the HCTA, which enables a repeated measures design without possible memory bias for the items.

The everyday scenarios were drawn from fields such as medical research, social policy analysis, and other areas respondents may encounter in daily life. The HCTA measures five categories of critical thinking: (a) verbal reasoning, (b) argument analysis, (c) thinking as hypothesis testing, (d) likelihood and uncertainty, and (e) decision making and problem solving. Each category is examined with five everyday scenarios. An example of an everyday scenario in the category of likelihood and uncertainty is the following situation:

"Ahmed has obtained the highest score on the first of three exams of a group of 120 students."

The respondent is asked what he or she expects Ahmed's score at the end of the semester to be, by checking one of three boxes: (a) "class average", (b) "above class average, but not the top of the class", or (c) "top of the class". The constructed response part then consists of the question: "Explain your answer in one or two sentences". On the next page, the respondent answers the forced choice question by selecting the most likely explanation from five options: (a) "Ahmed was probably lazy and studied not so hard after his first exam", (b) "Other students in his class learned how to study, so they got higher grades", (c) "An extreme score is usually followed by a score that is closer to the average", (d) "The law of averages predicts that all students will score close to average, regardless of their score on a single test", and (e) "Test scores are not independent, so the score of Ahmed at the end of the semester will depend on the performance of other students in the class". Both the constructed response and the forced choice answer must show that the respondent recognizes the principle of regression to the mean and the improbability of an extreme score being followed by another extreme score (correct answer: c). The administration of the HCTA (Form S1, constructed response and recognition items, Version A) took 45 to 80 minutes.
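The Ahmed item targets regression to the mean. The simulation below is only an illustrative sketch and not part of the HCTA: it assumes, hypothetically, that each exam score is a student's stable ability plus independent, normally distributed noise of equal variance. Under these assumptions, the student with the highest score on the first exam tends to score above average, but less extremely, on the next exam, and is usually not the top scorer again.

```python
import random
import statistics


def simulate_top_scorer(n_classes=1000, class_size=120, seed=1):
    """Track the exam-1 top scorer's result on exam 2 in simulated classes.

    Hypothetical model: score = ability + noise, with ability and noise
    both standard normal and the noise independent across exams.
    """
    rng = random.Random(seed)
    first, second, still_top = [], [], 0
    for _ in range(n_classes):
        ability = [rng.gauss(0, 1) for _ in range(class_size)]
        exam1 = [a + rng.gauss(0, 1) for a in ability]
        exam2 = [a + rng.gauss(0, 1) for a in ability]
        top = max(range(class_size), key=exam1.__getitem__)  # highest on exam 1
        first.append(exam1[top])
        second.append(exam2[top])
        still_top += exam2[top] == max(exam2)  # True counts as 1
    return statistics.mean(first), statistics.mean(second), still_top / n_classes


mean_first, mean_second, share_still_top = simulate_top_scorer()
# The expected class average is 0 on both exams: the exam-1 top scorer stays
# above average on exam 2, but by a smaller margin, and rarely stays on top.
print(f"exam 1: {mean_first:.2f}, exam 2: {mean_second:.2f}, still top: {share_still_top:.0%}")
```

Changing the noise variance in this sketch changes how strongly scores regress, but as long as exam scores are not perfectly reliable, the top scorer's next score is expected to lie closer to the class average.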

The HCTA is provided within the Vienna Test System (VTS; Halpern, 2012), but in this study the HCTA and RWO-NL were administered online with Thesistools¹, so that a large group of respondents could be reached. Scoring was based on the VTS, which automatically scores the forced choice responses and guides the scoring of the constructed responses by prompting the grader. For example, for the constructed response to the Ahmed scenario described above, the grading system displays the following question: “Did the respondent recognize the principle of regression to the mean or the improbability of an extreme score followed by another extreme score?” For each such question, the grader determines whether the respondent's answer (a) clearly indicated this, (b) less clearly indicated this, or (c) did not indicate this at all (Halpern, 2012). Standardized, computer-prompted scoring reduces the risk of scoring bias. Halpern (2012) reported a high inter-rater reliability for the constructed responses (r = .83), which suggests that the scoring can be considered sufficiently objective.

RWO-NL. The computerized inventory used to assess real-world outcomes is the RWO-NL, adapted from Bruine de Bruin et al. (2007) and Butler (2012). The items were translated and adapted for Dutch respondents by changing certain expressions, culturally unfamiliar or uncommon terms, and items that did not fit the Dutch economic context (Butler et al., 2012). Respondents indicate whether or not they have experienced a particular event in the past six months by checking ‘yes’ or ‘no’. The inventory contains items from a wide variety of domains, such as finances, education, and relationships. There are two types of items: items with sub-questions and items without sub-questions. Items with sub-questions always start with an opportunity (e.g. "Gone shopping for food or groceries") that makes one or more negative events possible. These negative events are measured by the sub-questions (e.g. "Threw out food or groceries you had bought, because they went bad"). In this way, the inventory takes into account whether a respondent actually had the opportunity to experience a negative life event as a consequence of a prior decision. Nine items have no preceding opportunity (e.g. "Been in a public fight or screaming argument"). See Appendix C for the complete list of 50 presented items.

1 http://www.thesistools.com

The total item set was created in three steps. First, 31 items and a total of 31 sub-questions were adopted from the original inventory published by Bruine de Bruin et al. (2007). Of these, 10 items and 12 sub-questions were modified for the Dutch population. Second, the adjustments made by Butler (2012) to make the items more applicable to university students were included, resulting in nine additional items and 11 additional sub-questions; of these, one item and four sub-questions were modified for the Dutch population. Finally, 10 items and 19 sub-questions were added to make the inventory more suitable and culturally appropriate for the Dutch population. As an example of a modification, an original item contained the words "Used checks". This was changed to "Used a debit card", because checks are rarely used in the Netherlands. In this way, the item was made more suitable for the Dutch population without deviating too much from the original item.

The additional items concern matters that nowadays play a greater role in the lives of Dutch students. For example: (a) "Had a mobile phone", with the subsequent negative events (b) "Lost a mobile phone" and (c) "Had to pay at least three times extra on your phone bills because you went over your call/text/data limit". All additional items fit into the domains used for the RWO. The administration of the RWO-NL took 5 to 15 minutes.

The calculation of the RWO-NL score was adapted from Bruine de Bruin et al. (2007) and Butler (2012). First, an unweighted score is calculated by dividing the number of negative life events by the number of opportunities that made a negative life event possible (Butler, 2012). This score represents the proportion of possible negative life events that the respondent experienced. Second, the negative outcomes are weighted for severity, which makes the RWO-NL score fairer (Bruine de Bruin et al., 2007): some outcomes are not as bad as others (e.g. taking the wrong train or bus vs. being in jail), and more serious outcomes presumably happen to fewer people. To weight for severity, the proportion of respondents who avoided each outcome is computed (1 - the proportion who experienced it). Each respondent's weighted RWO-NL score is the sum of the weights of the outcomes he or she experienced, divided by the total number of experienced opportunities to make bad decisions. This results in a score between 0 and 1, where 0 indicates good decision making and 1 poor decision making. Respondents could decline to answer uncomfortable questions by checking a third option.
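The two scoring steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the scoring script used in the study: each possible negative event (sub-question) is treated as one opportunity, items without a preceding opportunity are assumed to count as an opportunity for every respondent, each weight is computed among the respondents who had the opportunity, and all event identifiers are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical encoding of one respondent: event id -> None (no opportunity),
# False (had the opportunity, avoided the event) or True (experienced it).
# Items without a preceding opportunity are assumed never to be None.
Responses = Dict[str, Optional[bool]]


def proportion_score(resp: Responses) -> float:
    """Unweighted score: experienced negative events / opportunities."""
    answers = [v for v in resp.values() if v is not None]
    return sum(answers) / len(answers) if answers else float("nan")


def severity_weights(sample: List[Responses]) -> Dict[str, float]:
    """Weight per event: share of exposed respondents who avoided it."""
    weights: Dict[str, float] = {}
    for event in {e for resp in sample for e in resp}:
        exposed = [r[event] for r in sample if r.get(event) is not None]
        if exposed:
            weights[event] = 1 - sum(exposed) / len(exposed)
    return weights


def weighted_score(resp: Responses, weights: Dict[str, float]) -> float:
    """Sum of the weights of experienced events / number of opportunities."""
    opportunities = [e for e, v in resp.items() if v is not None]
    if not opportunities:
        return float("nan")
    return sum(weights[e] for e in opportunities if resp[e]) / len(opportunities)


# Toy data with hypothetical event ids (not real RWO-NL items).
sample = [
    {"lost_phone": True, "phone_bill": False, "food_waste": True},
    {"lost_phone": False, "phone_bill": False, "food_waste": True},
    {"lost_phone": None, "phone_bill": None, "food_waste": False},
    {"lost_phone": False, "phone_bill": True, "food_waste": True},
]
weights = severity_weights(sample)  # food_waste: 1 - 3/4 = 0.25 (common, mild)
scores = [weighted_score(r, weights) for r in sample]
```

Because every weight lies between 0 and 1 and each experienced event is counted against its own opportunity, the weighted score stays within the 0 to 1 range described in the text.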

If a respondent did not answer an opportunity question but did answer "yes" to a subsequent negative life event, the response pattern is inconsistent: the opportunity, like driving a car, is necessary to experience the negative life event, like getting a speeding ticket. In these cases, it was assumed that the respondent had forgotten to answer the first question, and the nonresponse was changed to "yes". If the nonresponse concerned a negative life event, the whole item was excluded from the analyses; that is, the opportunity that made the negative life event possible was also excluded from the proportion score.
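The nonresponse rules can be expressed as a small cleaning function. This is a hypothetical sketch: the answer encoding and all names are illustrative, and the case of an unanswered opportunity with no endorsed negative event is not covered by the text, so the sketch simply excludes that item, which is a labelled assumption.

```python
from typing import Dict, Optional

YES, NO = "yes", "no"  # None encodes a nonresponse (the third option)


def clean_item(opportunity: Optional[str],
               events: Dict[str, Optional[str]]) -> Optional[Dict[str, Optional[bool]]]:
    """Apply the nonresponse rules to one RWO-NL item (hypothetical encoding).

    Returns event id -> True (experienced), False (avoided) or None (no
    opportunity), or None when the whole item is excluded. Items without a
    preceding opportunity are assumed to be passed with opportunity=YES.
    """
    # Nonresponse on a negative life event: exclude the whole item, so its
    # opportunity also drops out of the proportion score.
    if any(answer is None for answer in events.values()):
        return None
    if opportunity is None:
        if any(answer == YES for answer in events.values()):
            # A "yes" to the event implies the opportunity: recode to "yes".
            opportunity = YES
        else:
            # ASSUMPTION (not specified in the text): treat as missing, exclude.
            return None
    if opportunity == NO:
        return {event: None for event in events}
    return {event: answer == YES for event, answer in events.items()}


print(clean_item(None, {"speeding_ticket": YES}))  # {'speeding_ticket': True}
```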

Procedure

All respondents were presented with an informed consent statement describing the purpose of the study and assuring confidentiality. All respondents received the same computerized assessment, which they completed through an online questionnaire. After completion, the responses were checked and the respondents received their compensation of 1.5 of the 10 required participation points. After the study was completed, the respondents received a debriefing by email that also included reading suggestions for students interested in learning more about critical thinking.

Results

The mean HCTA score (n = 240) was 108.23 (SD = 13.91) out of a maximum possible score of 194. The mean weighted RWO-NL (wRWO-NL) score was 0.14 (SD = 0.08).

All measures met the criteria for univariate normality (skewness and kurtosis between -1 and 1). Different age groups were compared: first ≤ 19 years versus ≥ 20 years; the age boundary was then raised twice; and finally ≤ 26 years versus ≥ 27 years were compared. As predicted, there were no differences in HCTA scores based on age or gender (all ps > .05). There were also no differences in HCTA scores between Dutch and German students, t(189) = 1.52, p = .13, or between Dutch students and students of other ethnicities, t(115) = 0.58, p = .57. Ethnicity therefore did not appear to affect HCTA scores, which suggests that students with a native language other than Dutch were able to understand the content of the items correctly.
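The normality criterion can be checked with a short sketch. It assumes the bias-adjusted skewness (G1) and excess kurtosis (G2) estimators, which to our understanding are the ones SPSS reports; the study does not state which estimator was used, and the example data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def skew_kurt(xs: Sequence[float]) -> Tuple[float, float]:
    """Bias-adjusted sample skewness (G1) and excess kurtosis (G2).

    Requires at least four observations.
    """
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    g1 = m3 / m2 ** 1.5    # moment-based skewness
    g2 = m4 / m2 ** 2 - 3  # moment-based excess kurtosis
    G1 = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    G2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return G1, G2


def roughly_normal(xs: Sequence[float], bound: float = 1.0) -> bool:
    """Criterion from the text: skewness and kurtosis between -bound and bound."""
    G1, G2 = skew_kurt(xs)
    return -bound <= G1 <= bound and -bound <= G2 <= bound


print(skew_kurt([1, 2, 3, 4, 5]))  # flat distribution: G2 is about -1.2
print(roughly_normal([2, 3, 3, 4, 4, 4, 5, 5, 6]))
```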

The exploration of the factor structure of the Halpern Critical Thinking Assessment

Cronbach's Alpha for all items was α = .75, which seems to support the reliability of the HCTA (Kline, 2000). However, the large number of items and the use of multiple subscales within the HCTA raise concerns about the interpretation of this value (Cortina, 1993; Field, 2009). Therefore, Cronbach's Alpha was also calculated for every subscale of both formats (see Table 1). As can be seen, the subscale scores had very low to poor values for Cronbach's Alpha. Sijtsma (2009) endorses Guttman's Lambda-2 as a better reliability estimate. This analysis resulted in slightly better values, both for the overall scale, λ₂ = .77, and for the subscales.

Table 1

Summary of Cronbach's Alpha and Guttman's Lambda-2 values for the HCTA (sub)scales

                                      Cronbach's Alpha      Guttman's Lambda-2
(Sub)scale                            CR     FC    Total    CR     FC    Total
Critical Thinking                     .61    .64   .75      .63    .67   .77
Verbal Reasoning                      .30    .19   .33      .33    .23   .37
Argument Analysis                     .24    .22   .38      .30    .26   .42
Thinking as Hypothesis Testing        .29    .38   .53      .32    .42   .55
Likelihood and Uncertainty            .31    .18   .39      .33    .20   .42
Decision Making and Problem Solving   .30    .43   .52      .34    .47   .54

Note. CR = constructed response; FC = forced choice.

Table 2

Model fit statistics of the measurement models in the HCTA sample

Model   χ²       df   p        CFI    RMSEA   Δχ²      Δdf   p        ΔCFI
M1      32.050   29   .318     .990   .021    -        -     -        -
M2      38.556   30   .136     .972   .035    6.506    1     .011     0.018
M3      95.614   30   < .001   .782   .096    63.564   1     < .001   0.208

Note. Δ statistics are relative to M1.
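The two coefficients in Table 1 can be computed from an item-score matrix as sketched below, using the standard formulas α = k/(k-1) · (1 - Σσᵢ²/σ²_X) and λ₂ = λ₁ + √(k/(k-1) · Σ_{i≠j} σᵢⱼ²)/σ²_X, with λ₁ = 1 - Σσᵢ²/σ²_X, where σᵢ² are item variances, σᵢⱼ item covariances, and σ²_X the variance of the sum score. This is an illustrative implementation, not the software used in the study, and the toy data are hypothetical. Because λ₂ ≥ α always holds, λ₂ is at least as high as Alpha, consistent with the slightly higher λ₂ values in Table 1.

```python
import math
from typing import List, Sequence, Tuple


def _cov(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample covariance (n - 1 denominator)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def alpha_and_lambda2(rows: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Cronbach's alpha and Guttman's lambda-2 for a respondents x items matrix."""
    k = len(rows[0])
    cols: List[List[float]] = [[row[i] for row in rows] for i in range(k)]
    cov = [[_cov(cols[i], cols[j]) for j in range(k)] for i in range(k)]
    total_var = sum(sum(r) for r in cov)         # variance of the sum score
    item_var = sum(cov[i][i] for i in range(k))  # sum of item variances
    lambda1 = 1 - item_var / total_var
    alpha = k / (k - 1) * lambda1
    off_diag_sq = sum(cov[i][j] ** 2 for i in range(k) for j in range(k) if i != j)
    lambda2 = lambda1 + math.sqrt(k / (k - 1) * off_diag_sq) / total_var
    return alpha, lambda2


# Hypothetical item scores: 5 respondents x 3 items.
items = [[1, 2, 2], [2, 2, 3], [3, 4, 3], [4, 3, 5], [5, 5, 4]]
alpha, lambda2 = alpha_and_lambda2(items)
print(f"alpha = {alpha:.3f}, lambda2 = {lambda2:.3f}")
```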

To further explore the factor structure of the HCTA, a confirmatory factor analysis was carried out with the three measurement models specified in Halpern (2012), evaluating the goodness of fit of each model to the present data. The first model (M1) comprises two latent factors, one for the constructed response format ("critical thinking - free recall") and one for the forced choice format ("critical thinking - recognition"), with the five subscale scores of each format hypothesized to load on the associated factor. The unique errors of corresponding subscale scores in the two formats were allowed to correlate freely, since the items in both formats derive from the same scenarios. The latent factors were allowed to correlate, to test whether the two formats are related while free recall and recognition remain separable. Models M2 and M3 differ from M1 only in that the standardized correlation between the two latent factors was fixed at 1 and 0, respectively, to test whether the two factors are indistinguishable or completely separate.

Figure 1. Standardized factor loadings of measurement model 1. CTF = latent factor critical thinking - free recall, CTR = latent factor critical thinking - recognition, VRF = subscale score verbal reasoning - free recall, AAF = subscale score argument analysis - free recall, HTF = subscale score hypothesis testing - free recall, LUF = subscale score likelihood and uncertainty - free recall, PSF = subscale score decision making and problem solving - free recall, VRR = subscale score verbal reasoning - recognition, AAR = subscale score argument analysis - recognition, HTR = subscale score hypothesis testing - recognition, LUR = subscale score likelihood and uncertainty - recognition, PSR = subscale score decision making and problem solving - recognition.

The factor analyses of the HCTA were carried out with IBM® SPSS® Amos™ 22 (Arbuckle, 2013). Maximum likelihood was used to estimate the model parameters, since the data were normally distributed. The following cut-off values were used to evaluate the goodness of fit of the models: a non-significant χ² test, CFI ≥ .95, and RMSEA < .05 (Jackson, Gillaspy Jr, & Purc-Stephenson, 2009; Marsh, Hau, & Wen, 2004). To test whether models M2 and M3 differ significantly from model M1, a significant Δχ² statistic and ΔCFI ≥ .01 (Cheung & Rensvold, 2002) were used as criteria.
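The cut-offs above can be expressed as a small Python checklist. This is a sketch under stated assumptions. The χ² p-value is taken as input from the SEM output rather than computed. RMSEA uses the common formula √[max(χ² − df, 0)/(df·(N − 1))], although some programs use N instead of N − 1. CFI is computed relative to the independence (baseline) model. The helper names are hypothetical, and the numbers in the usage line are invented rather than taken from Table 2.

```python
# Goodness-of-fit checks against the cut-offs used in this study:
# non-significant chi-square (p > .05), CFI >= .95, RMSEA < .05,
# plus the delta-CFI >= .01 criterion for comparing nested models.
# Assumptions: the chi-square p-value is read from the SEM output; RMSEA
# uses N - 1 (some programs use N); the numbers below are hypothetical.
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative fit index relative to the independence (baseline) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_baseline - df_baseline, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def meets_cutoffs(chi2_p, cfi_value, rmsea_value):
    return {
        "chi-square non-significant": chi2_p > 0.05,
        "CFI >= .95": cfi_value >= 0.95,
        "RMSEA < .05": rmsea_value < 0.05,
    }


def cfi_difference_meaningful(cfi_free, cfi_constrained, threshold=0.01):
    """Delta-CFI criterion (Cheung & Rensvold, 2002) for nested models."""
    return cfi_free - cfi_constrained >= threshold


# Hypothetical model: chi2(30) = 40.0, baseline chi2(45) = 900.0, N = 240.
r = rmsea(40.0, 30, 240)
c = cfi(40.0, 30, 900.0, 45)
print(meets_cutoffs(0.10, c, r))
```

For this hypothetical model, all three criteria are met. Substituting the statistics of M1, M2, and M3 would apply the same checks to the models in this study.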


The fit statistics of the three measurement models are summarized in Table 2. Model M1 fitted significantly better, Δχ²(1, N = 240) = 6.51, p = .01. The latent factors "Critical Thinking - free recall" and "Critical Thinking - recognition" are strongly related (r = .785), yet separable, because M2 (in which the standardized latent correlation of the two factors was fixed at 1) fitted worse than M1.
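Because M2 and M3 differ from M1 by a single fixed parameter, their comparison with M1 is a χ² difference test with one degree of freedom. The following Python sketch shows how such a p-value can be obtained with the standard library alone. For df = 1 the statistic is a squared standard normal variable, so p = P(|Z| > √x) = erfc(√(x/2)). The function names are hypothetical, and the sketch covers only the plain (unscaled) maximum-likelihood difference test.

```python
# Chi-square difference test for nested models that differ by one parameter,
# e.g. a model with the latent correlation fixed versus freely estimated.
# For df = 1 the chi-square variable is a squared standard normal, so the
# upper-tail p-value is P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square(1) statistic."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_difference_df1(chi2_constrained, chi2_free):
    """Delta chi-square and its p-value when the models differ by 1 df."""
    delta = chi2_constrained - chi2_free
    return delta, chi2_sf_df1(delta)


# The reported statistic, delta chi2(1) = 6.51:
print(f"p = {chi2_sf_df1(6.51):.3f}")
```

Applied to the reported Δχ²(1) = 6.51, the function gives p ≈ .011, consistent with the reported p = .01.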

The standardized factor loadings of the constructed-response and forced-choice subscales on their associated latent factor were all significant (all ps < .001). The correlations between the unique errors of corresponding subscales were rather small and reached significance only for "Hypothesis Testing" and "Decision Making and Problem Solving". This lack of significance supports the fit of the model, since ideally the latent factors, rather than correlated errors, account for the variance the subscales share. The structural relations and standardized factor loadings are depicted in Figure 1.

The Halpern Critical Thinking Assessment in relation to the Real-World Outcomes inventory

Contrary to expectations, there was no significant relationship between the total HCTA score and the RWO-NL score, r = -.10, ns. Nor was the total HCTA score significantly related to the weighted RWO-NL score, r = -.12, ns.

Item nonresponse was excluded from these calculations. However, a respondent who declines to answer an uncomfortable question may well have experienced the outcome but found it too embarrassing to admit in the inventory. To find out how the relationship between the HCTA and RWO-NL changes when this possibility is taken into account, all unanswered questions were recoded as 'yes' (outcome experienced). Surprisingly, the relationship between the total HCTA score and the RWO-NL score then appeared to be significant, r = -.16, p = .01. The weighted RWO-NL score had a slightly stronger relationship with the HCTA, r = -.17, p = .007. Both correlations are small, however. This finding seems to support the hypothesis that a higher score on critical thinking is related to experiencing fewer negative events in daily life. But these results should be interpreted with caution, because they rest on an alternative treatment of item nonresponse. See Appendix C for the complete list of the 50 presented item sets and the corresponding response frequencies.
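The recoding step above can be sketched in Python as a sensitivity analysis that scores the RWO-NL twice and correlates each version with the HCTA totals. Several assumptions apply. Responses are coded 1 (experienced), 0 (not experienced), or None (skipped). Skipped items contribute nothing to the count in the baseline version. The per-item `weights` are hypothetical, since the actual RWO-NL weighting is not specified here. The p-value uses the Fisher-z normal approximation rather than the exact t-test, and all function names are illustrative.

```python
# Sensitivity check for RWO-NL item nonresponse: score skipped items as not
# counted (excluded) or recode them as "yes", then correlate each version
# with the HCTA totals.
# Assumptions: 1 = experienced, 0 = not experienced, None = skipped;
# `weights` is a hypothetical per-item weighting; p uses the Fisher-z
# normal approximation, not the exact t-test.
import math


def rwo_score(responses, weights=None, missing_as_yes=False):
    weights = weights or [1.0] * len(responses)
    total = 0.0
    for answer, w in zip(responses, weights):
        if answer is None:
            answer = 1 if missing_as_yes else 0
        total += w * answer
    return total


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_p(r, n):
    """Approximate two-sided p-value for H0: rho = 0."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def nonresponse_sensitivity(hcta_totals, rwo_responses, weights=None):
    excluded = [rwo_score(r, weights) for r in rwo_responses]
    recoded = [rwo_score(r, weights, missing_as_yes=True) for r in rwo_responses]
    return pearson_r(hcta_totals, excluded), pearson_r(hcta_totals, recoded)


# Reported value after recoding: r = -.16 with N = 240.
print(f"approx. p = {fisher_z_p(-0.16, 240):.3f}")
```

For the reported r = -.16 with N = 240, the approximation gives p ≈ .013, in line with the reported p = .01 after rounding.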

Feedback of the respondents

Fifty-five respondents (23% of the total sample) used the option to provide feedback on the HCTA. Of these, 37 stated that complex and incorrect syntax and the use of scientific language made the scenarios and questions of the HCTA difficult to comprehend. In addition, seven respondents commented on the excessive length of the test; some said they spent three hours completing the assessment. To maintain concentration and motivation, the sentences should be kept short and easy to understand.

Although few respondents gave feedback about specific items, their comments can still indicate where the bottlenecks lie. Useful comments about ambiguities were made regarding items 5, 7, 8, 14, 19, and 21, each of which is discussed below (see Appendix B, The Dutch HCTA, for the items).

The feedback of one respondent about item 5 summarizes that of the other nine respondents. This respondent cites the following sentence from scenario 5: "After a year it was found that the average result of the at risk students was .2 higher than at risk student of the previous year." She comments: "This sentence is not clear. Is it about the average result, the number of current students that increased, or the students who have successfully completed the first year? Also, it was not clear to me what .2 exactly means. Is it that the grades increased by 0.2? Or did 20% of the students achieve the first year? Or has the result of the at risk students increased by 20%?"

Seven respondents thought scenario 7 was too difficult. They felt they lacked the knowledge needed for this question and would have liked more explanation of terms such as "diagnostic category". Another respondent thought the last question was too long and complex.

The four respondents who commented on scenario 8 were not very clear about why they found it difficult to understand; they used terms such as "vague" to describe this item.

Item 14 received feedback from five respondents, four of whom even said that the correct answer was not among the forced-choice response options. They probably did not understand that the response options reflected an underlying line of reasoning and instead took them literally.

Four respondents did not understand what was meant by "deficiencies" in scenario 19. One respondent asked: "Certain shortcomings of food?" This indicates a misinterpretation of the question: respondents were supposed to identify the shortcomings in the reasoning of the news article.

Finally, nine respondents directed their feedback at scenario 21, mainly at the question asking them to formulate the problem in two ways. In addition, the question in Part B, which asked respondents to assess the quality of several problems, was often not properly understood.

Conclusions and Discussion

The results of the present study partly replicate those reported by Halpern (2012).

First, the confirmatory factor analysis revealed that the model with two correlated latent factors (the constructed-response and forced-choice formats), each containing the five subscales of the HCTA, best fits the factorial structure of the Dutch HCTA. This supports the findings of Halpern (2012), Hau et al. (2006, April), and Ku (2009), in which the same model fitted the data of U.S. and Chinese samples. Most importantly, the analysis confirms that the latent factors of the constructed-response and forced-choice formats are closely related, yet separate in their properties. The separability could reflect the difference between measuring more of the dispositional component with the constructed-response format and measuring the cognitive component with the forced-choice format. This indicates that using both response formats in the HCTA is a valid way to obtain an accurate indication of critical thinking ability. However, the estimated Cronbach's Alpha did not confirm the hypothesis of α ≥ .85, nor did the subscale values reach the expected levels (constructed-response format: α = .61 rather than α = .84; forced-choice format: α = .64 rather than α = .79). Still, the overall values of Cronbach's Alpha (α = .75) and Guttman's Lambda (λ₂ = .77) indicate sufficient reliability of the Dutch HCTA. Taken together, these results support not only the quality of the Dutch translation but also the cross-cultural generalizability of the two-factor model with five subscales per factor. This justifies the use of the Dutch version of the HCTA in the Netherlands, even though respondents still reported ambiguities in the scenarios. Once these issues have been resolved, further research on the Dutch HCTA will probably yield more reliable results.

Finally, the third aim was to assess the relationship between HCTA and RWO-NL scores. The analysis showed a non-significant relationship, whereas Butler (2012) found a modest relationship, r(131) = -.38, p < .001. The relation between the HCTA and RWO-NL may have been compromised by the use of self-report. For example, respondents with higher HCTA scores might give more socially desirable answers on the RWO-NL. Also, including only university students could have influenced the RWO-NL scores, because university students may be more likely than students at other educational levels to present themselves as good decision-makers. As in Butler (2012), respondents were allowed to skip any uncomfortable questions during administration of the RWO-NL to lessen this concern. When item nonresponse was replaced with an affirmative answer (i.e., that the respondent actually experienced the outcome), a significant negative relationship between the HCTA and the weighted RWO-NL appeared, r = -.17. This relationship is still relatively weak compared with the findings of Butler (2012) and Dwyer et al. (2012), who found correlations of r = -.38 and r = -.28, respectively. Nevertheless, if skipping a question reflects embarrassment about admitting the outcome, the hypothesis that a higher score on critical thinking is related to experiencing fewer negative events in daily life is supported.

However, this assumption cannot be tested in the present study. A closer look at item nonresponse revealed a significant negative relationship between HCTA scores and the number of unanswered questions on the RWO-NL, r(239) = -.20, p = .002: respondents with higher HCTA scores left fewer questions unanswered. This contradicts the proposition that respondents with higher critical thinking scores gave more socially desirable answers and therefore produced more item nonresponse on the RWO-NL. Perhaps respondents with lower HCTA scores were less motivated to complete the RWO-NL seriously and therefore did not answer all questions. Alternatively, respondents with higher HCTA scores may have been more aware of the anonymity of the RWO-NL and therefore answered more questions even when they felt embarrassed. In either case, further development and validation of the RWO-NL could yield a more reliable tool for measuring negative life events.

This study also has its limitations. First, the sample consists only of students from a single university who chose to participate in this study. There was no random selection, although the sample is rather large and is likely to be representative at least of students from this faculty. Second, in line with previous research, it is assumed that the constructed-response format measures more of the dispositional aspect of critical thinking. The argument makes sense: the constructed-response format gives respondents the opportunity to come up with self-generated solutions, to think critically in an unprompted context, and to show the extent to which they will think critically. But evidence for the assertion that it measures the dispositional aspect of critical thinking is scarce. Bridgeman and Morgan (1996) indicate that constructed-response and forced-choice questions measure separate cognitive abilities, and Martinez (1999) adds that the range of cognitions needed to answer constructed-response questions is larger than that needed for forced-choice questions. Further research should investigate this issue more precisely for critical thinking. For instance, other measures of the disposition toward critical thinking, such as the California Critical Thinking Disposition Inventory (CCTDI; Facione, 2000), could provide further evidence. The CCTDI distinguishes seven elements of the overall disposition (inquisitiveness, systematicity, analyticity, open-mindedness, maturity of judgment, truth-seeking, and self-confidence, together with their negative poles), which are measured with 75 Likert-type items. Comparing scores on the CCTDI with scores on the constructed-response format of the HCTA may shed some light on whether the disposition is revealed in the constructed responses. Third, the online administration of the test was unproctored. Findings on this mode of administration within the cognitive domain are mixed: proctored and unproctored conditions may be equivalent (Lievens & Burke, 2011), or the unproctored condition may yield higher test scores than the proctored condition because of the absence of a proctor (Carstairs & Myors, 2009). This effect does not seem to occur in the present study, because no inflation of the unproctored test scores is observed compared with the mean of the U.S. norm sample (M = 109.71, SD = 18.23; Halpern, 2012). On the contrary, a slightly lower mean was found (M = 108.23, SD = 13.91). The U.S. norm sample has a wider spread in age (M = 29, SD = 12.53), which could mean that these respondents had more years of education. More years of education is related to a higher level of critical thinking (Butler, 2012), which could explain the lower mean in comparison with the U.S. norm sample. But, as a fourth limitation, the lower mean may also be caused by the translation of the HCTA, where cultural differences could have influenced the test score. The Netherlands is a welfare state, and differing opinions about politics, diets, drugs, and alcohol could distort the scores on the corresponding items. For example, respondents may react more leniently to the scenario about alcohol abuse because the minimum age at which alcohol may be consumed is lower in the Netherlands, making alcohol more socially accepted. This leniency may make Dutch respondents less inclined to report the alcohol abuse to an authority figure in that scenario, possibly lowering their score on this item. Scores on the HCTA may also have been lowered by incorrect syntax. Respondents could give feedback on the HCTA after completing the assessment. Based on this feedback, we infer that the complex and incorrect sentence structure and the scientific wording of the questions could have affected the comprehensibility of some of the items. We suggest that these issues be addressed before further research is done.

This study has established a solid foundation for future validation research on the HCTA in the Netherlands. The hypothesized factor structure was confirmed, which supports the use of the Dutch version of the HCTA. Also, recommendations have emerged from the respondents' feedback and a thorough review by the authors. Because our sample had a narrow range of age and educational level, future studies should include a larger and more diverse sample of respondents. The supervision of a proctor could give more control over the behaviour of the respondents during completion of the tests. In addition, another method for measuring negative life events could be introduced. For instance, an unobtrusive experiment could be developed to observe actual behaviour that indicates good or bad decision making: the respondent is approached by an aggressive salesperson, and the researcher scores whether the respondent makes a good or bad decision. Although this is a very laborious method of data collection, it can circumvent the social desirability issue. Given these considerations, we recommend re-examining the relationship between HCTA and RWO-NL scores.

Finally, a validated Dutch critical thinking assessment instrument would make it possible to examine the effectiveness of critical thinking instruction. Gains in critical thinking can be measured as an increase in HCTA scores from pretest to posttest. A repeated-measures design requires version B of the HCTA, which in turn should also be the subject of validation studies in the Dutch population.

Still, a thoroughly researched HCTA is a promising tool to assess the 21st century skill of critical thinking among Dutch learners.

Acknowledgements

This research was conducted for the master's thesis of Hannie de Bie, supervised by Pascal Wilhelm and Hans van der Meij. The authors would like to thank Diane F. Halpern and Heather A. Butler for their knowledge and helpful suggestions, and Schuhfried GmbH and Science Plus Group BV for their practical support.


References

Ananiadou, K., & Claro, M. (2009). 21st century skills and competences for new millennium learners in OECD countries. OECD Education Working Papers, 41. doi: 10.1787/218525261154

Arbuckle, J. L. (Ed.). (2013). Amos 22 Reference Guide. Crawfordville, FL: Amos Development Corporation.

Black, B. (2012). An overview of a programme of research to support the assessment of Critical Thinking. Thinking Skills and Creativity, 7(2), 122-133. doi: 10.1016/j.tsc.2012.04.003

Bridgeman, B., & Morgan, R. (1996). Success in college for students with discrepancies between performance on multiple-choice and essay tests. Journal of Educational Psychology, 88(2), 333. doi: 10.1037/0022-0663.88.2.333

Bruine de Bruin, W., Parker, A. M., & Fischhoff, B. (2007). Individual differences in adult decision-making competence. Journal of Personality and Social Psychology, 92(5), 938. doi: 10.1037/0022-3514.92.5.938

Butler, H. A. (2012). Halpern Critical Thinking Assessment Predicts Real-World Outcomes of Critical Thinking. Applied Cognitive Psychology, 26(5), 721-729. doi: 10.1002/acp.2851

Butler, H. A., Dwyer, C. P., Hogan, M. J., Franco, A., Rivas, S. F., Saiz, C., & Almeida, L. S. (2012). The Halpern Critical Thinking Assessment and real-world outcomes: Cross-national applications. Thinking Skills and Creativity, 7(2), 112-121. doi: 10.1016/j.tsc.2012.04.001

Carstairs, J., & Myors, B. (2009). Internet testing: A natural experiment reveals test score inflation on a high-stakes, unproctored cognitive test. Computers in Human Behavior, 25(3), 738-742. doi: 10.1016/j.chb.2009.01.011

Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9(2), 233-255. doi: 10.1207/S15328007SEM0902_5

Clifford, J. S., Boufal, M. M., & Kurtz, J. E. (2004). Personality Traits and Critical Thinking Skills in College Students: Empirical Tests of a Two-Factor Theory. Assessment, 11(2), 169-176. doi: 10.1177/1073191104263250

Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78(1), 98-104. doi: 10.1037/0021-9010.78.1.98

Dwyer, C. P., Hogan, M. J., & Stewart, I. (2012). An evaluation of argument mapping as a method of enhancing critical thinking performance in e-learning environments. Metacognition and Learning, 7(3), 1-26. doi: 10.1007/s11409-012-9092-1

Ennis, R. H. (1996). Critical thinking. Upper Saddle River, NJ: Prentice Hall.

Ennis, R. H., Millman, J., & Tomko, T. N. (1985). Cornell Critical Thinking Tests Level X & Level Z: Manual. Boise, ID: Midwest Publications.

Ennis, R. H., & Weir, E. E. (1985). The Ennis-Weir Critical Thinking Essay Test: An Instrument for Teaching and Testing. Boise, ID: Midwest Publications.

Facione, P. A. (1990). The California Critical Thinking Skills Test: College Level. Millbrae, CA: California Academic Press.

Facione, P. A. (1998). Critical thinking: What it is and why it counts. Millbrae, CA: California Academic Press.

Facione, P. A. (2000). The disposition toward critical thinking: Its character, measurement, and relationship to critical thinking skill. Informal Logic, 20(1), 61-84.

Field, A. (2009). Discovering statistics using SPSS. Thousand Oaks, CA: Sage Publications.

Halpern, D. F. (1998). Teaching critical thinking for transfer across domains: Disposition, skills, structure training, and metacognitive monitoring. American Psychologist, 53(4), 449-455. doi: 10.1037/0003-066X.53.4.449

Halpern, D. F. (2012). Halpern Critical Thinking Assessment: Test Manual. Mödling, Austria: Schuhfried GmbH.

Hatcher, D. L. (2011). Which test? Whose scores? Comparing standardized critical thinking tests. New Directions for Institutional Research, 2011(149), 29-39. doi: 10.1002/ir.378

Hau, K. T., Halpern, D. F., Marin-Burkhart, L., Ho, I. T., Ku, K. Y. L., & Chan, N. M. (2006, April). Chinese and United States students’ critical thinking: Cross-cultural construct validation of a critical thinking assessment. Paper presented at the American Educational Research Association Annual Meeting, San Francisco, CA.

Hupkens, J. (2011, September 14). Veel media trappen in vals wetenschapsnieuws [Media are misled by false science news], nrc.next. Retrieved from http://www.nrcnext.nl

Jackson, D. L., Gillaspy Jr, J. A., & Purc-Stephenson, R. (2009). Reporting practices in confirmatory factor analysis: An overview and some recommendations. Psychological Methods, 14(1), 6-23. doi: 10.1037/a0014694

Kline, P. (2000). The handbook of psychological testing. London: Routledge.

Ku, K. Y. L. (2009). Assessing students’ critical thinking performance: Urging for measurements using multi-response format. Thinking Skills and Creativity, 4(1), 70-76. doi: 10.1016/j.tsc.2009.02.001

Ku, K. Y. L., & Ho, I. T. (2010). Dispositional factors predicting Chinese students’ critical thinking performance. Personality and Individual Differences, 48(1), 54-58. doi: 10.1016/j.paid.2009.08.015

Lievens, F., & Burke, E. (2011). Dealing with the threats inherent in unproctored Internet testing of cognitive ability: Results from a large-scale operational test program. Journal of Occupational & Organizational Psychology, 84(4), 817-824. doi: 10.1348/096317910x522672

Marin, L. M., & Halpern, D. F. (2011). Pedagogy for developing critical thinking in adolescents: Explicit instruction produces greatest gains. Thinking Skills and Creativity, 6(1), 1-13. doi: 10.1016/j.tsc.2010.08.002

Marsh, H. W., Hau, K.-T., & Wen, Z. (2004). In search of golden rules: Comment on hypothesis-testing approaches to setting cutoff values for fit indexes and dangers in overgeneralizing Hu and Bentler's (1999) findings. Structural Equation Modeling, 11(3), 320-341. doi: 10.1207/s15328007sem1103_2

Martinez, M. E. (1999). Cognition and the question of test item format. Educational Psychologist, 34(4), 207-218.

Moseley, D., Baumfield, V., Elliott, J., Higgins, S., Miller, J., Newton, D. P., & Gregson, M. (2005). Frameworks for thinking: A handbook for teaching and learning. Cambridge, UK: Cambridge University Press.

Rotherham, A. J., & Willingham, D. T. (2010). 21st century skills: The challenges ahead. Educational Leadership, 67(1), 16-21.

Sijtsma, K. (2009). On the Use, the Misuse, and the Very Limited Usefulness of Cronbach’s Alpha. Psychometrika, 74(1), 107-120. doi: 10.1007/s11336-008-9101-0

Sonoma State University. (1996). ICAT Critical Thinking Essay Test. Rohnert Park, CA: Sonoma State University.

Spector, P. E., Schneider, J. R., Vance, C. A., & Hezlett, S. A. (2000). The relation of cognitive ability and personality traits to assessment center performance. Journal of Applied Social Psychology, 30(7), 1474-1491. doi: 10.1111/j.1559-1816.2000.tb02531.x

Stein, B., & Haynes, A. (2011). Engaging faculty in the assessment and improvement of students' critical thinking using the critical thinking assessment test. Change: The Magazine of Higher Learning, 43(2), 44-49. doi: 10.1080/00091383.2011.550254

Sternberg, R. J. (1986). Critical Thinking: Its Nature, Measurement, and Improvement. Washington, DC: National Inst. of Education.

Taube, K. T. (1997). Critical thinking ability and disposition as factors of performance on a written critical thinking test. The Journal of General Education, 46(2), 129-164.

Tilburg University. (2011). Interim Report Regarding the Breach of Scientific Integrity Committed by Prof. D. A. Stapel. Retrieved from http://www.tilburguniversity.edu/nl/nieuws-en-agenda/commissie-levelt/interim-report.pdf

Voogt, J., & Roblin, N. P. (2010). 21st Century Skills: Discussion paper. University of Twente. Retrieved from http://onderzoek.kennisnet.nl/onderzoeken-totaal/21stecentury

Voogt, J., & Roblin, N. P. (2012). A comparative analysis of international frameworks for 21st century competences: Implications for national curriculum policies. Journal of Curriculum Studies, 44(3), 299-321. doi: 10.1080/00220272.2012.668938

Watson, G., & Glaser, E. M. (1980). Manual for the Watson Glaser critical thinking appraisal. Cleveland, OH: Psychological Corporation.

Appendix A

Corrections of the Dutch HCTA before administration

Location Incorrect text Corrected text

Instructie We willen graag begrijpen hoe je denkt over complexe dagdagelijkse situaties.

We willen graag begrijpen hoe je denkt over complexe dagelijkse situaties.

Probleem 1: Deel B Je baserend op deze informatie, welk van onderstaande stellingen is het meest plausibel?

Je baserend op deze informatie, welk van onderstaande stellingen is het meest aannemelijk?

Probleem 1: Deel B Schoolresultaten zullen waarschijnlijk verbeteren als we adolescenten verhinderen van te roken, ... (zo ook in optie 2 en 4)

Schoolresultaten zullen waarschijnlijk verbeteren als we adolescenten verhinderen te roken, ...

Situatie en probleem 4 Beide programma's kosten evenveel.

Beide programma's kosten evenveel geld.

Probleem 4: Deel B Welk percentage van de deelnemers weegt binnen het jaar terug evenveel als zijn begingewicht?

Welk percentage van de deelnemers weegt binnen een jaar weer evenveel als zijn of haar begingewicht?

Situatie en probleem 5 "Zoals men kan afleiden uit de verhoging in gemiddeld resultaat bij de studenten, was dit programma was een gigantisch succes."

"Zoals men kan afleiden uit de verhoging in gemiddeld resultaat bij de studenten, was dit programma een gigantisch succes."

Situatie en probleem 6 Als leerlingen in je tekenlessen dezelfde tekeningen maken als ze zouden gemaakt hebben als ze thuis waren gebleven of niet begeleid werden, ...

Als leerlingen in je tekenlessen dezelfde tekeningen maken als ze gemaakt zouden hebben als ze thuis waren gebleven of niet begeleid werden, ...

Probleem 6: Deel B Leerkrachten zitten vaak reeds voor het schooljaar ten einde is doorheen hun materiaalvoorraad voor tekenlessen.

Leerkrachten zitten vaak door hun materiaalvoorraad voor tekenlessen heen voor het schooljaar ten einde is.


Probleem 7: Deel B Het gebruik van deze term suggereert dat het slachtoffer van mishandeling er in zekere zin zelf voor verantwoordelijk is dat hij mishandeld wordt.

Het gebruik van deze term suggereert dat het slachtoffer van mishandeling er in zekere zin zelf voor verantwoordelijk is dat zij mishandeld wordt.

Situatie 8: Deel A Beoordeel de redenering van de minister-president over deze kwestie, gebruik maken van een 7-puntenschaal waarin.

Beoordeel de redenering van de minister-president over deze kwestie, gebruik makend van een 7-puntenschaal.

Probleem 8: Deel B Welke veronderstelt de baas wanneer hij deze analogie maakt?

Welke veronderstelling maakt de minister-president wanneer hij deze analogie maakt?

Probleem 9: Deel B Als de ouders erin slagen in hun opzet en hun voorstel wordt een nieuwe regel in de schoolgemeenschap, wat is dan waarschijnlijk het grootste probleem waarmee ze zullen geconfronteerd worden?

Als de ouders slagen in hun opzet en hun voorstel wordt een nieuwe regel in de schoolgemeenschap, wat is dan waarschijnlijk het grootste probleem waar ze mee geconfronteerd zullen worden?

Probleem 9: Deel B Sommige ouders zijn nalatig en leren hun kinderen niet van vriendelijk te zijn tegen anderen.

Sommige ouders zijn nalatig en leren hun kinderen niet vriendelijk te zijn tegen anderen.

Situatie en probleem 10 Een politicus werd gevraagd om zijn standpunt uit te leggen over het wetsvoorstel dat voorziet dat de staat propere naalden zou geven aan drugsverslaafden, om de verspreiding van ziekten als aids tegen te gaan. Hij antwoordde dat hij zich verzet tegen een ‘propere naalden’-programma omdat dit verkeerd is.

Een politicus werd gevraagd om zijn standpunt uit te leggen over het wetsvoorstel dat voorziet dat de staat schone naalden zou geven aan drugsverslaafden, om de verspreiding van ziekten als aids tegen te gaan. Hij antwoordde dat hij zich verzet tegen een ‘schone naalden’-programma omdat dit verkeerd is.


Probleem 10: Deel B Hij heeft niet duidelijk gemaakt of hij voor of tegen een ‘propere naalden’-programma is.

Hij heeft niet duidelijk gemaakt of hij voor of tegen een ‘schone naalden’-programma is.

Situatie en probleem 12 Natuurlijk is het geen goede keuze als je schrik hebt voor wiskunde of graag buiten werkt.

Natuurlijk is het geen goede keuze als je angst hebt voor wiskunde of graag buiten werkt.

Probleem 12: Deel B Computerwetenschappen is geen goede keuze als je schrik hebt voor wiskunde

Computerwetenschappen is geen goede keuze als je angst hebt voor wiskunde

Situatie 14: Deel A Beschrijf het redeneerwijze van de Immigratiedienst.

Beschrijf de redeneerwijze van de Immigratiedienst.

Situatie en probleem 16 Ann Marie, een Amerikaanse vrouw, wilt naar Hollywood verhuizen ...

Ann Marie, een Amerikaanse vrouw, wil naar Hollywood verhuizen ...

Probleem 16: Deel B De kans dat ze om het even welke at random geselecteerde vrouw een succesvolle actrice zal worden

De kans dat een willekeurig geselecteerde vrouw een succesvolle actrice zal worden

Probleem 18: Deel B Welke van de volgende stellingen over de kans dat om het even welke zes nummers de winnende getallen van de Lotto zijn, is waar?

Welke van de volgende stellingen over de winkans van een getallenreeks van de Lotto, zijn waar?

Probleem 21: Deel B De vriendin zou van school kunnen gestuurd worden als ze zich zo vaak blijft bezatten.

De vriendin zou van school gestuurd kunnen worden als ze zich zo vaak blijft bezatten.

Situatie en probleem 23 Je maakt een toets in de les natuurkunde en je stoot op een probleem waarvoor je geen antwoord kan bedenken.

Je maakt een toets in de les natuurkunde en je stuit op een probleem waarvoor je geen antwoord kan bedenken.


Probleem 23: Deel B Schrijf in een gemene nota aan de leerkracht omdat hij zo'n moeilijk probleem gebruikt.

Schrijf een brief aan de leerkracht waarin je aangeeft dat hij te moeilijke vragen of sommen gebruikt.

Probleem 24: Deel B Laat de pil op de grond liggen en kijk of de hond hem opeet.

Laat de pil op de grond liggen en kijk of de hond hem op eet.


Appendix B The Dutch HCTA

With respect to copyright, only the first five questions are shown.


Instructie

We willen graag begrijpen hoe je denkt over complexe dagelijkse situaties. Alle vragen starten met een korte situatieschets. Nadat je de situatieschets hebt gelezen, krijg je hierover verschillende vragen. Bij sommige vragen moet je zelf een kort antwoord formuleren. Bij andere vragen moet je tussen een aantal alternatieven kiezen.

Hier is een voorbeeldvraag waar je zelf een kort antwoord moet formuleren.

Voorbeeld van een situatieschets:

Na afloop van een televisiedebat over de doodstraf, werden de kijkers aangemoedigd om naar de website van de zender te surfen en online te stemmen of ze "voor" of "tegen" de doodstraf zijn.

Binnen het eerste uur "stemden" bijna 1000 mensen op de website, waarbij ongeveer de helft "voor" en de helft "tegen" de doodstraf stemde. Het nieuwsanker van deze zender maakte de resultaten de volgende dag bekend. Hij concludeerde dat de inwoners van het land evenredig verdeeld waren over het al dan niet toepassen van de doodstraf.

Hier is een voorbeeldvraag waar je zelf een kort antwoord moet formuleren:

Ben je het op basis van deze gegevens eens met de conclusie van het nieuwsanker?

ja nee

Geef twee suggesties om deze studie te verbeteren:

Eerste suggestie:

Tweede suggestie:

Het antwoord dat bij deze voorbeeldvraag verwacht wordt:

Ben je het op basis van deze gegevens eens met de conclusie van het nieuwsanker?

ja nee

Geef twee suggesties om deze studie te verbeteren:

Eerste suggestie: Ik zou een steekproef proberen samen te stellen die meer representatief is voor het land - niet alleen mensen die het internet kunnen gebruiken om vragen te beantwoorden.