A holistic evaluation of a museum project for history and art education


Faculty of Social and Behavioural Sciences

Graduate School of Child Development and Education

A Holistic Evaluation of a Museum Project for History and Art Education

Research Master Child Development and Education

Research Master Thesis

Sophia Braumann

Carla van Boxtel, Mark Schep

September 3rd, 2018


Preface

The co-author Mark Schep was responsible for contacting the selected schools, as well as for most parts of the data collection.

The first supervisor Carla van Boxtel developed the Dutch questionnaires employed in this study and also collected parts of the data.

The research master student Sophia Braumann wrote this article representing her master thesis. She was further responsible for maintaining the contact with the project coordinators of the Rijksmuseum and she was involved in choosing the outcomes of interest for this study and


Abstract

The aim of this study was to evaluate an educational museum project of the Rijksmuseum in Amsterdam. The study extends previous research on museum learning to address several shortcomings of the existing literature. The outcomes of interest were theoretically grounded and based on a close collaboration with the museum. A mixed-method approach enabled the bridging between scientific research on museum learning and an efficient evaluation study with practical significance for the museum. By employing an answer-frequency analysis, our results showed that children who participated in the project Jij en de Gouden Eeuw (i.e., You and the Golden Age) had substantial knowledge about the Dutch Golden Age and reported a positive museum and role-taking experience that was facilitated through various learning activities. Furthermore, we found that the participating children displayed a positive perception of the value related to the presented history and its traces. Structural equation modelling further enabled the analysis of construct relations, facilitating a holistic approach to investigate museum learning. These relations differed across theme topics, but not between time groups (i.e., assessment 0-6 weeks and 24-30 weeks after the museum visit). These results indicate that the developed methods captured important aspects of museum learning. The qualitative teacher feedback resulted in practical suggestions for improving the museum project. It is proposed that a set of diverse learning activities and contexts, as provided by the evaluated project for art and history education, can be used by teachers to facilitate experience- and inquiry-based learning for their pupils.


A Holistic Evaluation of a Museum Project for History and Art Education

Field trips to museums are a relatively common way for schools to exchange the formal learning setting of a classroom for that of a museum in order to enrich the curriculum. In the context of learning in museums (or museum learning), two types of pedagogic approaches are often targeted, namely experience-based learning (Hooper-Greenhill, 2004) and inquiry-based learning (see, e.g., Andre, Durksen, and Volman (2017) for a summary of studies). In this study, we aimed to evaluate the effectiveness of a museum project that includes activities based upon the pedagogic approaches of experience-based and inquiry-based learning to teach about art and history.

In the experience-based (or experiential) learning framework by Kolb (1985), learning is defined as a process of building knowledge by grasping and transforming experiences (happening in a given learning environment). A theoretical framework by Ip and Naidu (2001) further distinguishes between first-hand and third-hand experience-based designs. First-hand experience-based designs are realized by authentic learning environments that provide a setting where the individual learner has the room to think, reflect, make mistakes and ultimately learn from their own experiences. A museum as a learning environment, which allows for touching, interacting or dynamically choosing learning input, or for participating in role plays, facilitates first-hand experience-based learning. Third-hand experience-based designs are realized through an extensive use of (real-life) stories throughout the teaching process. The presented stories are assumed to hold the most authentic repository of knowledge and consequently to function as an effective motivator of learning.

In educational projects related to museums, first- and third-hand experience-based pedagogic approaches very often go hand in hand, for example, by embedding exhibits of a museum into stories, or conducting role plays of stories that are part of a museum’s theme.


Research on museum learning provides support for the effectiveness of the combination of such learning activities. For example, it was often shown that well-prepared museum visits - where an extensive context (a story) was previously provided for embedding the later visited exhibits - rendered a field trip more fruitful and memorable for visiting children (Eshach, 2007; Falk & Balling, 1982; Falk & Dierking, 2013; Orion & Hofstein, 1994).

Inquiry-based learning refers to the process of learning by employing methods and practices as used by professional scientists to construct knowledge (Keselman, 2003; Pedaste et al., 2015). The individual learner, therefore, needs to actively participate in the process and has the responsibility to discover new knowledge by posing questions and conducting research activities.

In educational projects related to museums, inquiry-based learning is often tapped through inquiry-based activities that are integrated in, for example, hands-on activities, conversations at the exhibits, or even mobile guide systems (see Andre et al. (2017) for a summary). All these inquiry-based activities were reported to be popular tools for conveying knowledge and making meaning of the past (Melber, 2003; Sung, Hou, Liu, & Chang, 2010; Tenenbaum, Prior, Dowling, & Frost, 2010). In summary, both types of learning (i.e., experience-based and inquiry-based learning) are always embedded in a certain type of learning activity, and many studies have investigated the application of single learning activities within the context of museum education (see, e.g., Andre et al., 2017).

However, despite the wide application of activities reflecting experience-based and inquiry-based learning in museum education, not much is known about the learning outcomes that are facilitated through sets of different learning activities. Most museums offer a variety of activities within one educational program (e.g., a combination of a guided tour, short films, and plays), but research evaluating the effectiveness of such programs is scarce (Andre et al., 2017). This ultimately means that the outcomes of museum learning are mostly unknown. The present study hence aimed to focus on the assessment of potential learning outcomes of an educational museum project that combines different learning activities within one program. In the following, literature on potential learning outcomes of museum education is discussed in light of a framework provided by Hooper-Greenhill (2004), which embeds five generic learning outcomes for measuring the learning impact in museums. These five learning outcomes are: (1) knowledge and understanding, (2) enjoyment, inspiration and creativity, (3) action, behavior and progression, (4) attitudes and values, and (5) skills.

Andre et al. (2017) listed several studies that targeted knowledge and understanding as an outcome of interest for research on children’s learning in museums. It is important to note that most of these studies were conducted in science museums. Only one study by Wickens (2012) investigated improved knowledge about an artist’s life, music, and lifestyle through qualitative interviews in a longitudinal case study. One other study was identified that assessed substantial knowledge acquisition in an art museum through a quantitative approach with a larger sample size. This study aimed to evaluate the role of an art museum for museum learning by posing specific questions related to themes on displayed paintings (Greene, Kisida, & Bowen, 2014).

Schep, van Boxtel and Noordegraaf (in Schep & Kintz, 2017) further elaborated more specific aspects related to knowledge and understanding as a learning outcome in art and history museums. In collaboration with different art and history museums in Amsterdam, the following aspects were specified: a) the development of awareness that there is evidence of historical events, b) the acquisition of knowledge of historical facts, concepts, people, developments, and events, and c) the acquisition of insights into the ways in which people in the present address the past.

Enjoyment (as an outcome related to the next dimension of generic learning) in science, history and art museums was investigated by, for example, Anderson, Piscitelli, Weier, Everett, and Tayler (2002). In this study, the authors interviewed 99 preschool children. Another study by Anderson, Piscitelli, and Everett (2008) assessed the museum experience of participating children, which can also be attributed to this generic outcome dimension. The authors employed an exploratory approach to analyze field notes, video observations and audio recordings of children’s conversations. Aligning with this dimension of generic learning outcomes, Schep, van Boxtel and Noordegraaf (2017) specified a pleasurable experience during guided tours as an (affective) learning outcome of interest in history and art museums.

Action and behavior as a generic outcome dimension were targeted through studies investigating, for example, the exploratory behavior of children during exhibits (Van Schijndel, Franse, & Raijmakers, 2010), or the attention and interactions of children with peers and adults in museums (Sung, Hou, Liu, & Chang, 2010). Another study also found that drama (or role play) can be used as a tool for activities such as exploration, imagining and interacting during history learning (Edmiston & Wilhelm, 1998).

Attitudes and values were also addressed by a number of studies. Cheng, Annetta, Folta and Holmes (2011), for example, analyzed children’s adopted attitude towards the impact of methamphetamine abuse on the brain after an interactive museum exhibition. Using interviews and questionnaires, Savenije, van Boxtel and Grever (2014) also investigated students’ attribution of significance to the history of slavery and its remnants while engaged in a museum project. For art and history museums, Schep, van Boxtel and Noordegraaf (2017) further specified the development of curiosity and interest in history, and the development of tolerance towards other perspectives, cultures and times.

Skills (the last dimension of the generic learning outcomes) were addressed by studies assessing factors such as the critical thinking skills of children following a visit to an art museum (Luke, Stein, Foutz, & Adams, 2007). Schep, van Boxtel and Noordegraaf (2017) further specified the development of historical empathy, learning to critically analyze representations and stories of the past, learning to ask historical questions, learning to place objects and events in a historical context, and learning to connect the past, the present and the future as possible skills that could be acquired in art and history museums.

Although not all studies are reported here, based on the review by Andre et al. (2017), several gaps in the literature become apparent. First of all, it can be concluded that previous research on museum learning mainly targeted one particular type of learning activity at a time, neglecting the potential of multiple factors (activities) that might influence learning. A more holistic view on learning in museums is hence still scarce. Another gap is the limited time dimension in investigations of museum learning. Many studies that investigated longitudinal effects only considered a few weeks for their longer-term effects (Greene, Kisida, & Bowen, 2014; Anderson, Piscitelli, Weier, Everett, & Tayler, 2002) or did not target the effectiveness of a museum learning intervention (Anderson et al., 2008). Yet another gap that has emerged from the review of Andre et al. (2017) is the scarcity of evaluation research based in art and history museums. Most published evaluation research on the effectiveness of museum learning is based on learning in science museums.

Lastly, in an earlier review on museum learning, Hooper-Greenhill and Moussouri (2000) stressed (among other things) a need for research that is multi-method, aims at developing proper and transparent research methods, is based on collaborations with targeted institutions and theoretically grounded, and examines both short-term and long-term outcomes. Considering the reviewed literature on museum learning from the last decades, it is assumed that such a bundled approach for evaluation is still missing.

The aim of the present study is to address the previously mentioned gaps in the literature by adopting a holistic approach to museum learning in a museum with a collection related to art and history, while employing a mixed-method design that includes a time dimension that extends over 30 weeks. By specifying different (generic) learning outcomes, the ultimate goal is to bridge a project-specific evaluation (with practical significance to the museum) and research that is, on the one hand, capable of contributing to the understanding of museum learning and, on the other hand, provides an example of a methodological approach that is applicable beyond this specific museum and program.

The Present Study: Evaluation of the Museum Project You and the Golden Age

The Rijksmuseum in Amsterdam is a museum with an art-historical collection on the so-called Golden Age, a period of economic and cultural prosperity in Dutch history that lasted for about 100 years around the 17th century. The educational department of the museum developed a project called Jij en de Gouden Eeuw (i.e., You and the Golden Age) as a learning intervention for (mainly) primary school children. The main aim of the project is to provide an unforgettable experience by facilitating a fresh dive into the Golden Age that elicits the curiosity to explore more about [this] history (from the project guide for the teachers).

The project consists of three phases, namely a preparation phase, a museum visit, and a wrap-up/follow-up phase. Conducting the preparation and wrap-up phases is the responsibility of the teachers of a participating class, while the visit to the Rijksmuseum is conducted by professional actors. All phases include different activities such as watching short films, research tasks, discussions and presentations. Additionally, the children play a ranking game in the preparation phase in which they assign their classmates to a character from the Golden Age. The characters are part of one of three theme groups (i.e., either the Rembrandt group, the Hugo Grotius group or the Nova Zembla group).

The Rembrandt theme comprises learning about Rembrandt’s most famous painting, The Night Watch, and related topics such as how difficult life could be during these times. The Hugo Grotius theme comprises learning about Hugo Grotius’ role as a pioneer of modern international law (with questions such as who owns the sea?). The Nova Zembla theme comprises learning about the Dutch East India Company (VOC) as a trading power, with a special focus on one discovery mission that tried to explore new trading routes (and unsuccessfully ended on the island Nova Zembla). The three themes are well-known in the Netherlands and listed in the Dutch Canon as obligatory teaching content during primary school (in the case of Nova Zembla only indirectly through the VOC).

Throughout the project, the children are expected to get familiar with their character and to dive into this assigned role through research activities that are described in guidelines given to the teachers. At the museum, the children participate in a guided tour, watch a short movie, and practice and perform a role play according to the theme group they were assigned to (i.e., three different tours, films and role plays are performed). The three role plays are ultimately performed in front of the entire class. The underlying aim of all activities is to establish a role-taking experience with a character of the past to enable an understanding of life in the Golden Age. Content that is normally addressed through textbook chapters should thereby be conveyed through the different activities.

Embedding this project description into the previously presented theoretical frame, it can be summarized that all learning activities are embedded in either the experience-based learning approach (e.g., through the strong narrative focus in the preparation phase and the museum visit (third-hand experience-based learning), and the guided tours and role plays (first-hand experience-based learning)) or the inquiry-based learning approach (e.g., through the strong focus on research activities in the preparation phase).

Choosing the outcomes of interest for our evaluation study was a process that included going through all project materials (i.e., project descriptions, guidelines for teachers, and learning materials), attending the museum visits, and closely collaborating with the project coordinators. Based on this process and accounting for the reviewed literature, we specified four major outcomes of interest (i.e., (substantial) content knowledge, museum experience, role-taking experience, and historical value perception) along the lines of the generic learning outcome (GLO) dimensions.

The first outcome of interest is the acquired historical knowledge about Rembrandt, Hugo Grotius, Nova Zembla and more general aspects of the economic and cultural facets of the Golden Age. Regarding the Hugo Grotius theme, for example, the following learning aim was stated by the museum: [The children come to] know who Hugo Grotius was, what his ideas were, why he was imprisoned and how he escaped Slot Loevestein (from the project guide for the teachers; similar aims were formulated for the other two themes). The second outcome of interest we defined was the museum experience as perceived by each child (i.e., enjoyment as GLO). Previous research has shown that each child’s memory, enjoyment, and learning during a museum visit is highly individual, even while undergoing the same program as their peers (Anderson et al., 2002). Therefore, this outcome was, on the one hand, important to the Rijksmuseum (as a positive museum experience can already be considered a desired outcome, also because it might contribute to a return visit), and on the other hand important for the scientific evaluation of museum learning, to be able to account for a factor that might contribute to the substantial knowledge acquisition (i.e., assuming that a positive museum experience has a positive effect on substantial knowledge acquisition).

The next outcome of interest is connected to children’s experiences with activities of narrative and play (i.e., action and behavior as GLO). During the project, the children individually explored stories and facts related to their character and ultimately performed their assigned role during the theatre play at the museum visit. We were interested in how much the children enjoyed taking on the role of a character from the Golden Age. A study by Otten, Stigler, Woodward, and Staley (2004) found that students who participated in drama activities displayed higher history knowledge as well as higher levels of enjoyment of history than students who did not participate in the drama activities. Consequently, we hypothesized that a positive role-taking experience would positively predict the knowledge scores. We further predicted that these scores would be positively correlated with the museum experience outcomes.

For the last construct of interest, we considered that museums generally aim to promote reflection on the significance of a particular heritage (i.e., attitudes and values as GLO). Accordingly, Hooper-Greenhill (1999) proposed that museums are able to mediate many of society’s basic values through their expositions. Consequently, we were interested in how the participating children in the project Jij en de Gouden Eeuw perceived the significance of the objects and stories related to the Golden Age as displayed in the Rijksmuseum. We predicted that this construct is positively correlated with the knowledge scores, the museum experience, and the role-taking experience.

Note that the appropriation of skills was not one of the main outcomes of interest to the museum. The project was assumed to contribute to students’ inquiry skills, but the duration of the project was considered to be too short to properly facilitate the appropriation of those skills.

Next to the museum experience, role-taking experience, and historical value scores, we included some other factors that might have had an impact on children’s knowledge scores. For instance, we assessed potential visits to the Rijksmuseum either preceding (pre-visit) or following (post-visit) the museum project. Due to the splitting of each class into three different theme groups (including different preparation tasks, tours in the museum, and role plays), we further accounted for potential theme group differences related to the different outcome scores. Lastly, research has shown that interest and excitement can elicit a long-term recall in children about visited exhibitions (Fivush, Hudson, & Nelson, 1984), so we decided to include children who participated in the museum project 24 to 30 weeks before data assessment.

Research Questions

1. Did the children who participated in Jij en de Gouden Eeuw have knowledge about the key content of the project? Further, did children learn the key content equally well across the three theme groups, and does their knowledge differ across the two time groups? Lastly, does a pre- or post-visit to the museum predict students’ knowledge of the key concepts of the project?


2. Did the children report a positive museum experience related to the museum visit? Does this experience differ across the theme or time groups and does it predict students’ knowledge about the key concepts of the project?

3. Did the children report a positive role-taking experience and does this perception differ across the theme groups or time? Further, does the role-taking experience predict the knowledge scores?  

4. Did the children report a positive historical value perception (based on the museum visit) and does this value perception differ across the theme or time groups? Further, does the historical value perception positively correlate with the knowledge outcomes, museum experience, and role-taking experience?

It was further checked for effects of pre- and post-visits on the latent outcomes. Lastly, we had one guiding question regarding the qualitative aspect of our study:

5. How did the teachers evaluate the project?

Figure 1


Methods

Sample

The sample of this study consisted of N = 387 children, of whom 200 were girls, 186 were boys, and 1 student declined to report their sex. Data were collected in 16 classes, with T = 16 teachers, from K = 10 schools. Most children were in fifth grade (i.e., groep 7; n = 263). Additionally, 43 children were in fourth grade (groep 6), and 81 children were in sixth grade (groep 8). Accordingly, the age of the children varied between 9 and 13 (M = 10.75, SD = 0.83). Of all children, 223 (57.62%, 3 missing) visited the Rijksmuseum prior to the museum project, and 34 (8.79%, 3 missing) of the children went to the Rijksmuseum after participating in the project. Of the sampled children, 130 (33.59%) participated in the Rembrandt theme group, 142 (36.69%) participated in the Hugo Grotius theme group, and 115 (29.72%) participated in the Nova Zembla theme group. Furthermore, 267 (68.99%) children were in the long time group (i.e., questionnaire completion 24-30 weeks after the museum visit), and 120 children (31.01%) were in the short time group (i.e., questionnaire completion 0-6 weeks after the museum visit).
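The reported group shares can be re-derived from the counts given above. The following is only a minimal sketch in Python for checking the arithmetic; the dictionary labels and the helper name `percentage` are illustrative assumptions and not part of the original analysis.

```python
# Counts as reported in the Sample section; labels are illustrative only.
N = 387  # total number of children

counts = {
    "pre-visit": 223,
    "post-visit": 34,
    "Rembrandt": 130,
    "Hugo Grotius": 142,
    "Nova Zembla": 115,
    "long time group": 267,
    "short time group": 120,
}


def percentage(count: int, total: int = N) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# Share of the full sample for every reported subgroup.
shares = {label: percentage(count) for label, count in counts.items()}

# The three theme groups and the two time groups each partition the sample.
theme_total = counts["Rembrandt"] + counts["Hugo Grotius"] + counts["Nova Zembla"]
time_total = counts["long time group"] + counts["short time group"]
```

Under these counts, the theme groups and the time groups each add up to the full sample of 387 children.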

Procedure

We addressed the research questions through a cross-sectional approach, whereby the interval between the museum visit and the questionnaire completion was used to distinguish two different groups, i.e., museum visit preceding the questionnaire completion by 24-30 weeks (long time group) or by 0-6 weeks (short time group). A pilot study including one class with 30 children and professional feedback was conducted to validate the constructed scales for this study. For the data collection, schools were initially informed about the study through email and subsequently called by the researchers. The questionnaires were printed and completed at the participating schools under the supervision of the researchers. The children completed the questionnaires within 15-30 minutes. All teachers were asked to obtain the passive consent of parents before letting their students complete the questionnaires.

Variables

The constructs that are featured in the research questions were measured through two questionnaires, one for the students and one for the teachers. The student questionnaire yielded the quantitative data for this study, and the teachers’ feedback yielded the qualitative data. The student questionnaire assessed the demographic variables age, gender, and class level of each individual child, as well as the (within-class) theme group in which the child participated (either the Rembrandt, Nova Zembla, or Hugo Grotius group). Two more questions checked for pre- and post-visits to the Rijksmuseum.

Content knowledge. The children’s content knowledge about the project was measured through a test of ten multiple-choice questions and five open questions (15 in total, with five questions related to each of the three themes). The open questions were designed to only require short answers and were displayed with a picture, asking for example “What is the name of Rembrandt’s most famous painting?” (see Appendix A1 for all items). The questions were based on the materials that were provided to the teachers, the content of the guided tours and the role-play scripts. The resulting first draft of the questionnaire was subsequently discussed with the educators of the Rijksmuseum and slightly adapted based on their feedback.

Museum experience. The construct tapping the perceived museum experience was subdivided into two subscales. These subscales mainly reflected a conceptual distinction between questions formulated in a third-hand-experience or first-hand-experience fashion (see Table 1 for examples). This was mainly done to simplify the later model fitting, as the inclusion of many factors and indicators can easily generate problems otherwise. The items to measure first-hand experience were based upon the Self-Report Measure of Intrinsic Motivation (SRIM) of Isen and Reeve (2005). The aim was to measure how interesting and enjoyable students found the tasks that were used in this study. Furthermore, we included third-hand-experience items that were based on the behavioral intentions items developed by Baker and Crompton (2000). Their items are indications of whether a visitor to a program or facility will return, for example: “I would encourage friends and relatives to go to […]” and “I attend […] again next year or the year after”. Each subscale was measured through three items on a five-point Likert scale, with possible answers ranging from 1 (strongly disagree) to 5 (strongly agree). The items were phrased in such a way that they specifically addressed the project content, such as in “I would go again to the Rijksmuseum”.

Role-taking experience. The items about the role-taking experience were phrased in a similar way as the museum experience items. The role-taking experience was conceptually sub-divided into questions that addressed the research activities in the preparation phase (to get acquainted with the assigned character), and the role-play experience during the museum visit (see Table 1 for examples). Analogously to the museum experience, the questions were measured on a five-point Likert scale and were phrased in such a way that they specifically addressed project content (e.g. “It was nice to play a role in the role play”).

Historical value perception. This construct was based upon a questionnaire developed by Savenije, van Boxtel and Grever (2014) to measure students’ attribution of significance to the history of slavery and its historical traces. It was conceptually subdivided into questions asking for children’s perceived value of keeping historical items and the value of learning about historical items. In contrast to the previous two outcomes of interest, the questions of this construct were phrased in a more general tone, such as in “It is important that there are museums where you can see paintings from the Golden Age”. We therefore added five additional questions (with the same five-point Likert scale) to the respective first questions, asking more specifically whether the reason for this answer was due to the museum visit (i.e., “I think that because of the visit to the Rijksmuseum” accompanying the previous item example). The second question was subsequently used as a correction factor for the score of the first question by adding half of the score for the second question to the score of the first question. So, if a child responded, for example, with a 5 on the question “It is important that there are museums where you can see paintings from the Golden Age”, and with a 1 on the question “I think that because of the visit to the Rijksmuseum”, the coded final score on the perceived historical value construct would be 5 + 1/2 = 5.5. This means that the possible corrected value perception scores ranged from 1.5 to 7.5. See Table 1 for a complete overview of all items and latent factors.
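The correction rule for the value perception items can be written out as a short calculation. The sketch below is only an illustration of the described arithmetic (first score plus half of the second score); the function name `corrected_value_score` and the argument names are assumptions made for this example and do not come from the original analysis scripts.

```python
def corrected_value_score(value_item: int, attribution_item: int) -> float:
    """Add half of the attribution item score to the value item score.

    Both arguments are assumed to be answers on the five-point Likert
    scale (1 = strongly disagree, 5 = strongly agree).
    """
    for score in (value_item, attribution_item):
        if not 1 <= score <= 5:
            raise ValueError("Likert scores must lie between 1 and 5")
    return value_item + attribution_item / 2


# Worked example from the text: a 5 on the value item, a 1 on the follow-up.
example = corrected_value_score(5, 1)

# Lower and upper bounds of the corrected score.
lowest = corrected_value_score(1, 1)
highest = corrected_value_score(5, 5)
```

With these definitions, `example` reproduces the 5.5 from the worked example, and `lowest` and `highest` reproduce the reported range of 1.5 to 7.5.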

Teacher feedback. The teacher questionnaire was carried out to obtain qualitative feedback on four aspects, targeting 1) the preparation and wrap-up activities conducted by the teachers, 2) satisfactory elements (as in the Pros of Pros and Cons) and perceived benefits of the project, 3) less satisfactory elements (Cons) of the project and improvement suggestions, and targeting 4) the perceived educational value and textbook-replacing function of the project, as well as future motivations to participate again with another class.

Some questions were fixed-choice questions (i.e., whether preparation or wrap-up activities were conducted, the perceived satisfaction about the educative value as judged from a scale from 1 (very unsatisfied) to 5 (very satisfied), whether the project fulfilled a text-book-replacing function, and whether there was motivation to participate again in the future). The other questions were open questions, asking for qualitative feedback.


Table 1

Overview of all items and latent factors

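| Item | Second-Order Latent Construct | First-Order Latent Sub-Scale | Question |
| --- | --- | --- | --- |
| 1.1 | Museum Experience | Third-Hand Experience | I would like to go again to the Rijksmuseum |
| 1.2 | Museum Experience | Third-Hand Experience | I would recommend other people to also go to the Rijksmuseum |
| 1.3 | Museum Experience | Third-Hand Experience | I told other people (e.g. parents, friends, neighbors) about my visit to the Rijksmuseum |
| 1.4 | Museum Experience | First-Hand Experience | I liked the visit to the Rijksmuseum |
| 1.5 | Museum Experience | First-Hand Experience | I am happy we did the trip to the Rijksmuseum |
| 1.6 | Museum Experience | First-Hand Experience | I will not forget the visit to the Rijksmuseum any time soon |
| 2.1 | Role-Taking Experience | Role-Taking Preparation | It was nice to prepare the role play together |
| 2.2 | Role-Taking Experience | Role-Taking Preparation | It was nice to do research about life during the Golden Age |
| 2.3 | Role-Taking Experience | Role-Taking Preparation | The preparation at school made me curious about the museum visit |
| 2.4 | Role-Taking Experience | Role Play Performance | I liked to learn about history through role play |
| 2.5 | Role-Taking Experience | Role Play Performance | I liked to play a role in the role play |
| 2.6 | Role-Taking Experience | Role Play Performance | I liked to project myself into a person living in the Golden Age |
| 3.1 | Value Perception | Value Keeping | It is important that there are museums where you can see paintings from the Golden Age |
| 3.2 | Value Perception | Value Keeping | Things from the Golden Age, such as vases, ships and weapons should be preserved |
| 3.3 | Value Perception | Value Keeping | I think that diaries and travel reports from the |
| 3.4 | Value Perception | Value Learning | It is important that stories about the Golden Age are told at school |
| 3.5 | Value Perception | Value Learning | All children in the Netherlands should go to the Rijksmuseum once |
| 3.6 | Value Perception | Value Learning | It is important to know something about historical persons such as Rembrandt and Hugo Grotius |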

Note. See Figure 1 for a clear overview of the model.

Data Analysis

Quantitative analyses. Descriptive analyses were performed in two ways to answer the first sub-questions of all research questions (i.e., did the children acquire substantial knowledge, did they display a positive museum or role-taking experience, and did they display a positive value perception). First, we counted correct answer frequencies in the knowledge test to obtain test performances. Second, we counted positive answer frequencies for the three second-order latent constructs museum experience, role-taking experience and historical value perception. Inferential analyses were then employed through a confirmatory factor analysis with structural equation modelling (SEM) to answer the second parts of the research questions (related to theme and time group mean differences across all constructs, as well as to the general construct relations).

To assess whether children acquired substantial knowledge, we assessed the correct answer frequencies of all items in the knowledge test. A total test performance of at least 70%, i.e., an average of at least 10.5 correct answers per test (out of a maximum score of 15), was assumed to sufficiently reflect the acquisition of substantial knowledge of the key content of the project.
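As a hedged illustration of this criterion, the sketch below converts a mean sum score into a percentage of correct answers and compares it with the 70% threshold. The function names are hypothetical; the maximum score of 15 and the mean score of 11.17 are taken from this study.

```python
MAX_SCORE = 15     # maximum score of the knowledge test
CRITERION = 0.70   # required share of correct answers


def percent_correct(mean_score, max_score=MAX_SCORE):
    """Express a (mean) sum score as a percentage of the maximum score."""
    return 100.0 * mean_score / max_score


def meets_knowledge_criterion(mean_score, max_score=MAX_SCORE):
    """Return True if the share of correct answers reaches the criterion."""
    return mean_score / max_score >= CRITERION


overall_percent = percent_correct(11.17)            # overall mean sum score
overall_passed = meets_knowledge_criterion(11.17)
print(round(overall_percent, 2), overall_passed)
```

Applied to the overall mean of the observed knowledge sum scores, the sketch yields the percentage reported in the results and indicates that the criterion is met.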

Because analyzing and meaningfully interpreting means of latent factors is very complicated, a positive museum experience and a positive role-taking experience of the children were assumed if, on average, at least 70% of the children answered with either a 4 (agree) or a 5 (strongly agree) on the five-point Likert scale of the items belonging to the given underlying construct. The same method was applied to judge the historical value perception of the children. Because these scores reflected a combined score of two items (where the second item was used as a correction factor on the first), the criterion for a positive value perception was set to a score of at least 4.5 (reflecting the minimum score of a combination in which a 4 was checked on the first question). Scores of 4.5 or higher resulting from answer combinations in which the first question was checked with a 2 or 3 and the correction factor was correspondingly high were not considered to have passed the criterion. The group mean differences and construct relations were analyzed through path analysis, based on the specified model as depicted in Figure 1.
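The two descriptive criteria can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the item counts are the museum experience frequencies from Table 4, and the sample size of 387 is not stated explicitly in this section but is inferred from the counts and percentages in that table (e.g., 280 children corresponding to 72.35%).

```python
N_CHILDREN = 387  # assumption: inferred from counts and percentages in Table 4


def average_positive_share(positive_counts, n_children=N_CHILDREN):
    """Average number and percentage of children answering 4 or 5 per item."""
    mean_count = sum(positive_counts) / len(positive_counts)
    return mean_count, 100.0 * mean_count / n_children


def value_perception_is_positive(first_answer, second_answer):
    """Value criterion: first item at least 4 and combined score at least 4.5."""
    combined = first_answer + second_answer / 2
    return first_answer >= 4 and combined >= 4.5


# Positive answers (4 or 5) on museum experience items 1.1 to 1.6.
museum_counts = [280, 290, 292, 329, 335, 266]
mean_count, mean_percent = average_positive_share(museum_counts)
museum_experience_positive = mean_percent >= 70.0
print(round(mean_count, 2), round(mean_percent, 2), museum_experience_positive)
```

Note that a combined value score can reach 4.5 or more even when the first item was answered with a 3 (for example, 3 + 5/2 = 5.5), which is why the sketch checks the first answer separately.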

To fit the hypothesized model, both the covariance structure and the mean structure of the data needed to be identified. See the description of model identification in the Appendix for an explanation of how this was done. We did not rely on the χ² difference test for accepting model fit, because this test is known to reject models when violations are minor in a large sample (Chen, 2007). Instead, we used the Comparative Fit Index (CFI), the Root Mean Square Error of Approximation (RMSEA) and its confidence interval, as well as the Standardized Root Mean Square Residual (SRMR) to assess the model fit. Accordingly, we considered a model fit to be acceptable when 1) the CFI value was at least .90 (Bentler & Bonett, 1980), 2) the RMSEA value was smaller than .08, while the upper bound of the computed confidence interval was not larger than .10 (Browne & Cudeck, 1992), and 3) the SRMR was lower than .10 (Bentler, 1995). Most observed scores showed evidence against a normal distribution, so robust maximum likelihood (MLR) estimators were used for parameter estimation, and full information maximum likelihood estimation was chosen to handle missing data.
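The three cut-off rules can be written down as a small check. The following Python sketch is only an illustration of the decision rule; the function name `acceptable_fit` is hypothetical, and the example values are the fit indices reported for the final theme group and time group models in the results section.

```python
def acceptable_fit(cfi, rmsea, rmsea_ci_upper, srmr):
    """Apply the cut-offs from the text: CFI >= .90, RMSEA < .08 with an
    upper confidence bound <= .10, and SRMR < .10."""
    return (
        cfi >= 0.90
        and rmsea < 0.08
        and rmsea_ci_upper <= 0.10
        and srmr < 0.10
    )


# Fit indices as reported for the final models of this study.
theme_model_ok = acceptable_fit(cfi=0.901, rmsea=0.067, rmsea_ci_upper=0.075, srmr=0.079)
time_model_ok = acceptable_fit(cfi=0.901, rmsea=0.066, rmsea_ci_upper=0.074, srmr=0.066)
print(theme_model_ok, time_model_ok)
```

Both reported models satisfy all three rules, although the CFI values lie only just above the lower bound.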

To investigate potential mean differences between the three theme groups and between the two time groups, we conducted a multiple group analysis and checked whether the fitted model would allow a comparison of the latent construct means, standard deviations and correlations across groups (i.e., testing for measurement equivalence). See the description of testing measurement invariance in the Appendix for a complete elaboration of how this was done. Once measurement invariance was established for the first-order latent factors (i.e., the subscales), any combination of these invariant factors was treated as indicative of the more general higher-order factor (Rudnev, Lytkina, Davidov, Schmidt, & Zick, 2018). Given that measurement equivalence could be established, we report the following results on the three second-order latent factors (next to the knowledge scores), reflecting the general outcomes for museum experience, role-taking experience and value perception.

The means of the second-order common factors (denoted with κ) were reported with Wald z-scores, 95% confidence intervals (CI), and the respective p-values. For the construct relations, unstandardized parameter estimates (β) were reported, next to the previously mentioned statistics, with standardized estimates based on Cohen’s d (with respective interpretations; Cohen, 1988). Standard errors for the parameters were estimated according to the robust Huber-White procedure.
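How these statistics relate to each other can be sketched with a short Python example. It assumes the usual normal approximation (z = estimate / SE, a 95% interval of ± 1.96 SE, and a two-sided p-value); the function name `wald_summary` is hypothetical. The input values are the rounded estimate and standard error for museum experience in the Rembrandt group from Table 5, so small deviations from the reported statistics (which are based on unrounded values) are to be expected.

```python
import math


def wald_summary(estimate, se, z_crit=1.96):
    """Wald z-score, 95% confidence interval and two-sided p-value
    under a normal approximation (assumption for illustration)."""
    z = estimate / se
    ci = (estimate - z_crit * se, estimate + z_crit * se)
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, ci, p


# Rounded values from Table 5 (Rembrandt group, museum experience).
z, (lower, upper), p = wald_summary(estimate=3.39, se=1.52)
print(f"z = {z:.3f}, 95% CI = [{lower:.2f}; {upper:.2f}], p = {p:.3f}")
```

The result is close to the reported values (z = 2.232, CI = [0.41; 6.36], p = 0.026), which suggests that the reported intervals are symmetric Wald intervals around the estimate.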

Additionally, individual correlation residuals were checked to investigate potential local sources of model misspecification (which can be masked by acceptable SRMR values; Jak, Verdam, Jorgensen, & Oort, 2017). Correlation residuals larger than .100 were regarded as problematic, but due to the large number of items and groups, only recurring patterns across groups (i.e., the same problematic values across groups) or very large values (i.e., > .180) were reported. Reliabilities were calculated for the three latent factors and provided in omega with unequal weights (ω), a measure whose values can be interpreted in the same way as Cronbach’s alpha but whose estimates are theoretically more appropriate (Bacon, Sauer & Young, 1995). Factor scores were computed to check the distributions and intraclass correlations (ICCs) of these latent constructs. If observations from the same class displayed ICCs larger than 0.100, a correction for an underlying multilevel structure of the data would be warranted.

The analyses were conducted in R, using the lavaan (Rosseel, 2012) and semTools (semTools Contributors, 2016) packages for the SEM-related analyses. The analyses were further based on the Dutch versions of the questionnaires.

Qualitative analysis. The qualitative data stemming from the teachers were analyzed by calculating answer frequencies for the fixed-choice questions (e.g. asking whether specific preparation activities were conducted) and by exploring the content of the open questions. The answers to the open questions were summarized based on overlapping content and subsequently also counted to provide a frequency for every answer type. Because the study was conducted in Dutch, the answers were translated into English.

Results

(Quantitative) Student Outcomes

Outcome distributions. All distributions (i.e., of the knowledge scores, museum experience, role-taking experience, and value perception) showed a negative skew. This is especially true for the museum experience and role-taking scores, while the skewness seemed only slightly negative for the knowledge and the value perception scores. Based on this evidence against normality, a robust maximum likelihood estimator was chosen for the SEM analyses.

Measurement invariance and fit of the final model. Measurement equivalence was tested based on the measurement model (i.e., a model in which the structure between all latent constructs is saturated). All fitted measurement models displayed acceptable values on the fit indices (i.e., CFI > .90, RMSEA < .075, and SRMR < .079 across all models), and both metric and scalar invariance held for the first-order factors as well as for the first- and second-order factors combined, across the three theme groups and across the two time groups. These results indicate that means and covariances may be compared across the different groups in the following.

The final model with all hypothesized relations (see Figure 1) included the constraints of the scalar invariance model. For parameter estimation, 344 observations were used for the theme group model, with 13 missing cases in the Rembrandt group, 16 in the Hugo Grotius group, and 14 in the Nova Zembla group. For the time group model, 354 observations were used, with 20 missing cases in the long time group and 13 in the short time group. The model fit for the theme groups (χ²(576) = 912.90, p < .001; CFI = .901; RMSEA = .067, 95% CI [.059; .075]; SRMR = .079) and for the time groups (χ²(373) = 691.59, p < .001; CFI = .901; RMSEA = .066, 95% CI [.059; .074]; SRMR = .066) was acceptable.

All factor loadings of the indicator variables on the first-order latent factors, as well as the loadings of the first-order factors (subscales) on the second-order factors, were significant across all group models (p < .002) and displayed medium to very large effect sizes.

Across all theme and time groups there were some problematic correlation residuals (i.e., > .100) related to items 3.2, 3.4, 3.5, 2.1, 2.2, and 2.3 (all >.180). This indicates local sources of model misspecification but since the overall model-fit was acceptable and there were no sufficient justifications for model modifications, the results of this study are reported based on the hypothesized model.

An analysis of the ICCs displayed critical values for the knowledge scores (ICC = .146), the museum experience (ICC = .112), and the pre-visit (ICC = .195) and post-visit (ICC = .135) outcomes, but not for the role-taking experience (ICC = .078) and the value perception (ICC = .080) scores. This indicates that a multilevel correction of the model outcomes would have been appropriate (Snijders and Bosker, 2012). However, in the current version of R it is not yet possible to handle missing data without list-wise deletion in combination with this correction. It was therefore decided to handle missing data more appropriately with full information maximum likelihood estimation and to ignore the underlying nested structure of the data. The results are hence reported without the multilevel correction.
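The decision rule for the ICCs can be summarized in a few lines of Python. This is a minimal sketch of the threshold comparison only (it does not estimate ICCs); the dictionary keys are illustrative labels, and the values are the ICCs reported above.

```python
ICC_THRESHOLD = 0.100  # cut-off above which a multilevel correction is indicated

# ICCs as reported in the text.
reported_iccs = {
    "knowledge": 0.146,
    "museum experience": 0.112,
    "pre-visit": 0.195,
    "post-visit": 0.135,
    "role-taking experience": 0.078,
    "value perception": 0.080,
}

# Outcomes whose class-level clustering exceeds the threshold.
critical_outcomes = sorted(
    name for name, icc in reported_iccs.items() if icc > ICC_THRESHOLD
)
print(critical_outcomes)
```

Four of the six outcomes exceed the threshold, which is why the nested structure of the data is relevant when interpreting the standard errors.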

The estimated reliabilities for the second-order factors all displayed acceptable to excellent values (see Table 2).

Table 2

Reliabilities of the second-order latent factors

| Group | Museum Experience | Role-Taking Experience | Value Perception |
| --- | --- | --- | --- |
| Theme Group | | | |
| Rembrandt | 0.92 | 0.83 | 0.83 |
| Hugo Grotius | 0.86 | 0.83 | 0.83 |
| Nova Zembla | 0.89 | 0.80 | 0.80 |
| Time Group | | | |
| Short (0 – 6 weeks) | 0.83 | 0.83 | 0.71 |
| Long (24 – 30 weeks) | 0.83 | 0.83 | 0.85 |

Note. These scores reflect the proportion of the respective second-order factor explaining the total score of the respective latent outcome.

Content knowledge. The overall mean of the (observed) knowledge sum scores was M = 11.17 (SD = 2.67), which corresponds to an average test performance of 74.47% correct answers. See Table 3 for the means and standard deviations of all theme and time groups. There were no significant differences between the knowledge score means across the three theme groups, or the two time groups. Analyzing pre- and post-visits to the Rijksmuseum as predictors for the knowledge scores, we found that a visit to the museum before the museum project (pre-visit) positively predicted the knowledge scores in the long time group (see Tables 5 and 6 for all test statistics and effect sizes), but not in any other time or theme group. A visit to the Rijksmuseum after the museum project (post-visit) did not significantly predict the knowledge scores across any theme or time groups.

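Table 3

| Group | M | SD | % |
| --- | --- | --- | --- |
| Total | 11.17 | 2.67 | 74.47 |
| Rembrandt | 11.22 | 2.30 | 74.82 |
| Hugo Grotius | 10.97 | 3.04 | 73.15 |
| Nova Zembla | 11.35 | 2.57 | 75.65 |
| Long | 11.09 | 2.55 | 73.91 |
| Short | 11.35 | 2.92 | 75.67 |

| Item | Total n | Total % | Rembrandt n | Rembrandt % | Hugo Grotius n | Hugo Grotius % | Nova Zembla n | Nova Zembla % | Long n | Long % | Short n | Short % |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1* | 372 | 96.12 | 129 | 99.23 | 134 | 94.37 | 109 | 94.78 | 259 | 97.00 | 113 | 94.17 |
| 2* | 313 | 80.88 | 111 | 85.38 | 106 | 74.65 | 96 | 83.48 | 214 | 80.15 | 99 | 82.50 |
| 3* | 346 | 89.41 | 122 | 93.85 | 124 | 87.32 | 100 | 86.96 | 234 | 87.64 | 112 | 93.33 |
| 4 | 329 | 85.01 | 116 | 89.23 | 113 | 79.58 | 100 | 86.96 | 224 | 83.90 | 105 | 87.50 |
| 5 | 252 | 65.12 | 84 | 64.62 | 95 | 66.90 | 73 | 63.48 | 168 | 62.92 | 84 | 70.00 |
| 6* | 277 | 71.58 | 90 | 69.23 | 90 | 63.38 | 97 | 84.35 | 189 | 70.79 | 88 | 73.33 |
| 7 | 222 | 57.36 | 75 | 57.69 | 74 | 52.11 | 73 | 63.48 | 155 | 58.05 | 67 | 55.83 |
| 8* | 303 | 78.29 | 102 | 78.46 | 112 | 78.87 | 89 | 77.39 | 211 | 79.03 | 92 | 76.67 |
| 9 | 340 | 87.86 | 117 | 90.00 | 118 | 83.10 | 105 | 91.30 | 243 | 91.01 | 97 | 80.83 |
| 10 | 290 | 74.94 | 97 | 74.62 | 109 | 76.76 | 84 | 73.04 | 202 | 75.66 | 88 | 73.33 |
| 11 | 171 | 44.19 | 51 | 39.23 | 77 | 54.23 | 43 | 37.39 | 116 | 43.45 | 55 | 45.83 |
| 12* | 353 | 91.21 | 119 | 91.54 | 131 | 92.25 | 103 | 89.57 | 245 | 91.76 | 108 | 90.00 |
| 13* | 295 | 76.23 | 91 | 70.00 | 116 | 81.69 | 88 | 76.52 | 198 | 74.16 | 97 | 80.83 |
| 14 | 199 | 51.42 | 72 | 55.38 | 66 | 46.48 | 61 | 53.04 | 132 | 49.44 | 67 | 55.83 |
| 15 | 260 | 67.18 | 83 | 63.85 | 93 | 65.49 | 84 | 73.04 | 170 | 63.67 | 90 | 75.00 |

Note. *This question was treated in the role play or tour; red and orange numbers reflect test performances below 50% and 60%, respectively.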

Museum experience. Adding up the number of children that answered with a 4 or 5 (i.e., agree and strongly agree) on the six questions related to the museum experience, we found that on average 298.67 (77.17%) of the children responded positively to these questionnaire items. Hence, the majority of the children reported a positive museum experience. See Table 4 for all answer frequencies across items. (The mean differences between the groups yielded by the following SEM analysis depend on how indicative a specific item is for the higher-order latent factor; hence, the frequencies might suggest different outcomes than those reported below.)

The analysis of mean differences between the three theme groups showed a significant difference in the means of the latent factor museum experience between the Rembrandt group and the Nova Zembla group, indicating that children of the Rembrandt group reported a more positive museum experience than children from the Nova Zembla group (κ = .904, z = 3.218, CI = [0.369; 1.518], p = .001), which was a very large effect (d = 1.276). There was no significant difference in museum experience outcomes between the other theme groups. The analysis for the two time groups showed that there was also no significant difference in reported museum experience outcomes across time.

Regarding the hypothesized relations, it was found that the second-order latent factor museum experience positively predicted the knowledge scores in the Rembrandt theme group (see Table 5) but not in any other theme or time group (see Table 5 and 6). Pre-visits to the museum were a significant predictor for the museum experience outcomes in the Nova Zembla theme group (b = 0.32, SE = 0.158, z = 2.034, p = .042, CI = [0.012; 0.630], d = 0.245) but not in any other theme or time group. Post-visits were a significant predictor in the Rembrandt group (b = 0.258, SE = 0.106, z = 2.439, p = .015, CI = [0.051; 0.466], d = 0.124), the Nova Zembla group (b = 0.39, SE = 0.14, z = 2.793, p = .005, CI = [0.117; 0.667], d = 0.174), and the long time group (b = 0.45, SE = 0.18, z = 2.458, p = .014, CI = [0.091; 0.810],


Role-taking experience. Regarding the third research question we found that across the six items related to the role-taking construct, 316 children (81.65%) on average responded with a 4 or 5 on the questionnaire items. The majority hence reported a positive role-taking experience.

Regarding the mean differences in the latent role-taking experience scores, it was found that children in the Hugo Grotius group reported a more positive role-taking experience than the children in the Nova Zembla group (κ = 1.664, z = 3.812, CI = [0.809; 2.520], p < .001), which was a huge effect (d = 2.759). Similarly, children in the Rembrandt group also reported a more positive role-taking experience than children in the Nova Zembla group (κ = 1.390, z = 3.220, CI = [0.544; 2.236], p = .001), which was a huge effect (d = 2.024). There was no significant mean difference in role-taking experience scores between the Rembrandt group and the Hugo Grotius group. Furthermore, there were no significant differences in these role-taking scores across the two time groups.

The second-order latent factor of the role-taking experience was not a significant predictor for the knowledge scores in any theme or time group. However, in Table 3 we marked the questions that were treated in the role plays, and it seems that these questions were answered very well across the groups. Only in the Nova Zembla group did pre-visits to the museum positively predict the role-taking outcomes (b = 0.42, SE = 0.16, z = 2.632, p = .008, CI = [0.107; 0.733], d = 0.325). Post-visits to the museum were a significant predictor for the role-taking experience in the Rembrandt group (b = 0.40, SE = 0.10, z = 3.974, p < .001, CI = [0.204; 0.602], d = 0.183), the Nova Zembla group (b = 0.24, SE = 0.11, z = 2.137, p = .033, CI = [0.020; 0.456], d = 0.107), and the long time group (b = 0.317, SE = 0.103, z = 3.080, p = .002, CI = [0.115; 0.518], d = 0.161) but not in the short time group or the Hugo Grotius group.

Historical value perception. Regarding the fourth research question, we found that on average 279.33 (72.18%) of the children fulfilled the criterion for a positive historical value perception as measured through the questionnaire.

There was a significant mean difference only between the Rembrandt and the Nova Zembla theme groups, with the children of the Rembrandt group reporting more positive value scores than the children of the Nova Zembla group (κ = 1.217, z = 3.479, CI = [0.532; 1.903], p = .001), which was a large effect (d = 1.262). There were no other significant mean differences between the theme or time groups.

The covariance between the knowledge outcomes and the historical value scores did not reach significance in any of the theme or time groups. The value scores were significantly predicted by the post-visit scores in the Rembrandt group (b = 0.480, SE = 0.190, z = 2.532, p = .011, CI = [0.108; 0.851], d = 0.174), the Nova Zembla group (b = 0.59, SE = 0.24, z = 2.475, p = .013, CI = [0.122; 1.055], d = 0.191), and the long time group (b = 0.45, SE = 0.18, z = 2.458, p = .014, CI = [0.091; 0.810], d = 0.154), but not in the short time group or the Hugo Grotius group. Pre-visits to the museum did not significantly predict the value scores in any theme or time group.

Further relations. The covariances between all other latent constructs (i.e., museum experience, role-taking experience and value perception) were positive and significant in all groups (p < .001 with medium to large effect sizes).

Table 4

Latent construct frequencies

| Item | Total n | Total % | Rembrandt n | Rembrandt % | Hugo Grotius n | Hugo Grotius % | Nova Zembla n | Nova Zembla % | Long Time n | Long Time % | Short Time n | Short Time % |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Museum Experience | | | | | | | | | | | | |
| 1.1 | 280 | 72.35 | 99 | 76.15 | 106 | 74.65 | 75 | 65.22 | 194 | 72.66 | 86 | 71.67 |
| 1.2 | 290 | 74.94 | 107 | 82.31 | 105 | 73.94 | 78 | 67.83 | 201 | 75.28 | 89 | 74.17 |
| 1.3 | 292 | 75.45 | 106 | 81.54 | 111 | 78.17 | 75 | 65.22 | 206 | 77.15 | 86 | 71.67 |
| 1.4 | 329 | 85.01 | 118 | 90.77 | 123 | 86.62 | 88 | 76.52 | 231 | 86.52 | 98 | 81.67 |
| 1.5 | 335 | 86.56 | 117 | 90.00 | 125 | 88.03 | 93 | 80.87 | 236 | 88.39 | 99 | 82.50 |
| 1.6 | 266 | 68.73 | 97 | 74.62 | 94 | 66.20 | 75 | 65.22 | 182 | 68.16 | 84 | 70.00 |
| Total | 298.67 | 77.17 | 107.33 | 82.56 | 110.67 | 77.93 | 80.67 | 70.14 | 208.33 | 78.03 | 90.33 | 75.28 |
| Role-Taking Experience | | | | | | | | | | | | |
| 2.1 | 323 | 83.46 | 112 | 86.15 | 121 | 85.21 | 90 | 78.26 | 221 | 82.77 | 102 | 85.00 |
| 2.2 | 322 | 83.20 | 108 | 83.08 | 122 | 85.92 | 92 | 80.00 | 227 | 85.02 | 95 | 79.17 |
| 2.3 | 346 | 89.41 | 114 | 87.69 | 132 | 92.96 | 100 | 86.96 | 243 | 91.01 | 103 | 85.83 |
| 2.4 | 343 | 88.63 | 112 | 86.15 | 130 | 91.55 | 101 | 87.83 | 237 | 88.76 | 106 | 88.33 |
| 2.5 | 314 | 81.14 | 109 | 83.85 | 116 | 81.69 | 89 | 77.39 | 221 | 82.77 | 93 | 77.50 |
| 2.6 | 248 | 64.08 | 86 | 66.15 | 90 | 63.38 | 72 | 62.61 | 174 | 65.17 | 74 | 61.67 |
| Total | 316.00 | 81.65 | 106.83 | 82.18 | 118.5 | 83.45 | 90.67 | 78.84 | 220.50 | 82.58 | 95.50 | 79.58 |
| Hist. Value Perception | | | | | | | | | | | | |
| 3.1 | 260 | 67.18 | 88 | 67.69 | 97 | 68.31 | 75 | 65.22 | 185 | 69.29 | 86 | 71.67 |
| 3.2 | 354 | 91.47 | 120 | 92.31 | 132 | 92.96 | 102 | 88.70 | 251 | 94.01 | 89 | 74.17 |
| 3.3 | 327 | 84.50 | 111 | 85.38 | 122 | 85.92 | 94 | 81.74 | 226 | 84.64 | 86 | 71.67 |
| 3.4 | 229 | 59.17 | 80 | 61.54 | 80 | 56.34 | 69 | 60.00 | 167 | 62.55 | 98 | 81.67 |
| 3.5 | 223 | 57.62 | 74 | 56.92 | 84 | 59.15 | 65 | 56.52 | 158 | 59.18 | 99 | 82.50 |
| 3.6 | 283 | 73.13 | 101 | 77.69 | 98 | 69.01 | 84 | 73.04 | 198 | 74.16 | 84 | 70.00 |
| Total | 279.33 | 72.18 | 95.67 | 73.59 | 102.17 | 71.95 | 81.50 | 70.87 | 197.50 | 73.97 | 90.33 | 75.28 |

Table 5

Outcome relations with knowledge scores per theme group

| Group | Knowledge ~ | β | SE | z | p | CI | d |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Rembrandt | Museum Exp. | 3.39 | 1.52 | 2.232 | 0.026 | [0.41; 6.36] | 0.94 |
| Rembrandt | Role-Taking Exp. | -2.35 | 1.48 | -1.586 | 0.113 | [-5.25; 0.55] | -0.69 |
| Rembrandt | Pre-visit | 0.63 | 0.37 | 1.686 | 0.092 | [-0.10; 1.36] | 0.14 |
| Rembrandt | Post-visit | -0.08 | 0.50 | -0.164 | 0.870 | [-1.05; 0.89] | -0.01 |
| Hugo Grotius | Museum Exp. | 1.96 | 1.30 | 1.501 | 0.133 | [-0.60; 4.51] | 0.39 |
| Hugo Grotius | Role-Taking Exp. | -0.62 | 1.34 | -0.459 | 0.647 | [-3.24; 2.01] | -0.12 |
| Hugo Grotius | Pre-visit | 0.39 | 0.46 | 0.846 | 0.398 | [-0.51; 1.29] | 0.06 |
| Hugo Grotius | Post-visit | -0.64 | 1.11 | -0.578 | 0.563 | [-2.82; 1.53] | -0.06 |
| Nova Zembla | Museum Exp. | 7.36 | 12.72 | 0.579 | 0.563 | [-17.57; 32.28] | 1.79 |
| Nova Zembla | Role-Taking Exp. | -6.67 | 13.10 | -0.509 | 0.611 | [-32.31; 19.00] | -1.60 |
| Nova Zembla | Pre-visit | 1.79 | 1.54 | 1.166 | 0.244 | [-1.22; 4.80] | 0.33 |

Table 6

Outcome relations with knowledge scores per time group

| Group | Knowledge ~ | β | SE | z | p | CI | d |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Long Time | Museum Exp. | 1.84 | 1.15 | 1.594 | 0.111 | [-0.42; 4.10] | 0.43 |
| Long Time | Role-Taking Exp. | -0.88 | 1.11 | -0.795 | 0.427 | [-3.06; 1.29] | -0.22 |
| Long Time | Pre-visit | 0.66 | 0.30 | 2.226 | 0.026 | [0.08; 1.23] | 0.12 |
| Long Time | Post-visit | -0.26 | 0.46 | -0.566 | 0.572 | [-1.17; 0.65] | -0.03 |
| Short Time | Museum Exp. | 4.57 | 3.16 | 1.447 | 0.148 | [-1.62; 10.75] | 1.00 |
| Short Time | Role-Taking Exp. | -2.91 | 2.92 | -0.996 | 0.319 | [-8.62; 2.81] | -0.69 |
| Short Time | Pre-visit | 0.85 | 0.52 | 1.612 | 0.107 | [-0.18; 1.87] | 0.15 |
| Short Time | Post-visit | -0.94 | 1.05 | -0.889 | 0.374 | [-3.00; 1.13] | -0.06 |


Qualitative Analysis

Preparation and wrap-up activities. Analyzing our qualitative data from the 16 teachers we found that 15 teachers (93.75%) conducted the introduction lesson as suggested by the project guide for teachers. 15 (93.75%) of the teachers let the children play the ranking game for the character assignments for the three role play groups, and 14 (87.5%) conducted a session for reading and practicing the role play script with the entire class. Lastly, 10 (62.5%) of the teachers conducted some extra activities to prepare their class for the museum trip. Such extra activities were for example practicing the role plays (and songs) (e.g. in small groups), additionally watching films, reading books, decorating the class, conducting presentations based on the tasks of the project guide (for teachers), or generally conducting more lessons about the Golden Age.

Regarding the wrap-up activities after the museum visit, we found that 13 (81.25%) of the teachers performed activities to round off the museum trip. Such wrap-up activities were, for example, follow-up talks about the museum trip (n = 6), performances of the role plays for the parents of the children (n = 3), writing a small essay about the museum trip (n = 2), watching films (n = 2), drawing pictures (n = 1), giving presentations (n = 1), writing a letter to the museum guides (n = 1), or additional teaching about the Golden Age from the school textbook (n = 1).

Satisfactory elements and perceived benefits. When the teachers were asked to name satisfactory aspects of the project, most answers (n = 8) targeted the provided learning materials and the diversity of the (gameful) activities and background stories throughout the project. One teacher, for example, reported: “Nice varying of activities, good films, great power-point presentation.” It was also reported that the preparatory lessons aligned well with the normal curriculum agenda. Many were furthermore very satisfied with the museum visit itself and the tours that were provided for the children (n = 4), for example: Warm welcoming


children dived into the history of the Golden Age and the way in which theory was turned into practice throughout the project, for example: “The children participate in the story and thereby it becomes their own history.”

14 teachers answered the question about their perceived benefits of the project. Most answers were thereby related to the way in which children learned throughout the project, for example by experiencing the history while diving into the topics. One reported, for example: “Learning about the Golden Age through playing. Walking differently through the museum due to [background] stories, fun/pleasure in history.”

A (connected) benefit reported by many other teachers was that the children seemed to have remembered a lot. One teacher, for example, reported: “The content knowledge is remembered well through the variety of learning activities/teaching material. Especially the bigger picture they can recall, less the details.”

Similarly, some teachers (n = 3) perceived the transparent and clear learning goals, as well as the curriculum deepening activities for embedding knowledge as benefits.

Less satisfying elements and improvement suggestions. Seven teachers (43.75%) answered our question on things that they found less satisfying. Of these teachers, most (n = 6) remarked on things related to the preparation activities. One reported that the answers to the research questions for the students were hard to look up on the internet, another reported that information went missing due to broken links in the teacher’s guide. Two teachers had remarks regarding the amount of preparation, one saying that the teacher’s guide was not clear enough on which preparation was necessary and which was voluntary, and the other reporting that the proposed preparation lessons would take up too much time. Yet another teacher found that he did not have enough time to practice the role-play script and song texts, as they were handed out too shortly before the museum trip. Furthermore, one teacher remarked that the song for the Hugo Grotius theme was much easier to learn and nicer than the songs of the other two groups. Only one teacher had a remark related to the museum visit, saying that it was unsatisfying that the children saw different things in the museum based on the different theme group tours: “The guided tours are nice but different things are seen/visited through the different groups.”

Our question regarding suggestions for improvements was answered 11 times. Analogous (and related) to our question regarding less satisfying things, most answers were connected to the preparation phase of the project. Five issues could be related to the teacher’s guide of the project. The concerns were thereby (as mentioned before) that the research questions for the students were hard to answer and that the role-play script and song texts should be handed out earlier and placed on one site together with the PowerPoint presentations so that it would be easier to use them. One teacher wished for two new songs for the Rembrandt and Nova Zembla groups, another teacher wished for a better introduction to the ranking game, and yet another wished for a shorter duration/smaller extent of the preparation phase. One teacher also wished for more follow-up and evaluation tasks that could be performed after the museum visit. Only one teacher had a remark regarding the museum visit, saying that the time in the museum was very short: “They should be allowed to walk around more in the museum. The time was relatively short.”

Two teachers used the space for more general positive remarks, one wishing for more projects of this kind and the other writing: “This is the nicest field trip that we do with our school. After the visit many children think the same.”

Educative value, textbook-replacing function and future motivation. We found that all teachers were either very satisfied, i.e., answering with a 5 (n = 10; 62.5%), or mostly satisfied, i.e., answering with a 4 (n = 6; 37.5%), with the educative value of the project.

We further wanted to know whether the teachers thought of the project Jij en de Gouden Eeuw as an education project that can adequately replace the textbook. Most teachers (n = 12; 75%)


curriculum teaching. Only three teachers (18.75%) regarded the project as entirely textbook-replacing, and one teacher could not answer the question due to not being the teacher of the class.

When asked whether they would participate again in the project Jij en de Gouden Eeuw, almost all teachers (n = 15; 93.75%) answered positively (one was not sure yet).

Discussion

The aim of this study was to adopt a holistic approach to museum learning in an art/history museum, while employing a mixed-method approach. We thereby aimed to quantitatively analyze the outcomes for the knowledge test, the reported museum and role-taking experiences, as well as the perceived historical value scores of the participating children. For the quantitative data, we further attempted to develop a working analysis method that could capture aspects of museum learning by modelling construct relations and group differences. Additional qualitative teacher responses provided practical feedback for the museum, so that ultimately generalizable outcomes for research on museum learning, as well as practical outcomes for the museum, could be reported.

Regarding the substantial knowledge acquisition, we found that, on average, the children learned the key-content that was aimed to be conveyed throughout the museum project. This was true for all single knowledge questions, so the type of question was not an issue in our sample. This finding opposes results of studies that found a significant difference between open and closed questions (e.g., Wilde and Urhahne, 2008) during research on museum learning. There were also no significant differences between the knowledge score means across the three theme groups, or the two time groups.

The lack of mean differences across the theme groups is a (positive) practical outcome for the Rijksmuseum, indicating that the set-up of splitting each class into three groups throughout the project does not affect children’s substantial knowledge acquisition. Not finding significant differences between the two time groups further implies that the learned key content is (on average) remembered for a period of at least 6 months after the museum visit. In line with the study of Greene, Kisida, and Bowen (2014), we ultimately propose that a simple knowledge test is a methodologically sound tool for assessing substantial knowledge as an outcome of museum learning.

Looking at the knowledge test questions based on whether they were treated in the role play or not, it seems that children performed especially well on questions that were addressed in the role play or tour during the museum visit. This indication would be in line with the findings by Anderson et al. (2002), who found that children could better recall things from an exhibition when they were embedded in activities such as theatre-play or story-telling. Follow-up studies could consider designs that specifically target learning differences between role-play-enhanced and not-role-play-enhanced substantial knowledge questions.

Next, we found that, on average, the children reported a positive museum experience, role-taking experience, and historical value perception, and we did not find a significant difference between the time groups. This is very positive feedback for the museum project and shows that the general aims of the program (i.e., conveying substantial knowledge, enabling a positive museum experience, facilitating the role-taking experience, and conveying a certain importance of historical values) were achieved and persisted over the 6 months following the museum visit.

Regarding the construct relations, we found that the museum experience was a positive predictor of the knowledge test outcomes in the Rembrandt theme group. This result suggests that the museum experience can (at least partly) contribute to enhanced knowledge performance. For each outcome of interest, there was also at least one group in which the respective outcome was positively related to pre- or post-visits. This is in line with the findings of Anderson et al. (2002), who suggested that a familiar context (as arguably established through a pre- or post-visit to the museum) can positively mediate enjoyment, memory, and learning.

Not finding a relation between knowledge performance and the role-taking experience, the value perception, or the museum experience in the other theme or time groups could be regarded as support for the independence of the latent learning outcomes. Especially because we conducted the knowledge test as is usual in schools (with the same serious testing situation), it could be that some children were naturally more comfortable with such an assessment (regardless of what they reported on the other latent constructs). In combination with the finding that most children were highly positive on the latent constructs (as reflected by the negatively skewed distributions of the outcomes), such cases make it difficult to detect relations with knowledge test performance. The relations between the museum experience, role-taking experience, and value perception were positive, which could be regarded as reflecting a generally positive attitude towards the museum project.

It is important to name some limitations in relation to the results of this study. First of all, we did not include a control group in our study design. This means that we cannot discuss how the substantial knowledge outcomes might differ between classroom learning settings and our museum learning setting (which combines classroom and museum learning). We are therefore also not able to judge whether the museum project is more effective for conveying knowledge about the Golden Age than normal classroom teaching. However, we do have a slight indication that the role plays enhanced knowledge test performance for questions that were treated in the plays, and this would reflect an advantage of museum learning over classroom learning alone. Future applications of this study should plan a classroom learning curriculum so that results of a comparable control group can be included to further investigate differences between classroom learning and museum learning.

Nevertheless, we believe that outcome scores for the museum experience, role-taking experience, and value perception can plausibly be interpreted without a control group, also in terms of effect sizes. First, the construct items were either posed very specifically, targeting project components (museum and role-taking constructs), or were corrected for whether children gave a response due to their visit to the Rijksmuseum (value construct). Secondly, considering the young age of the children in our study, it can be assumed that most children probably did not have extensive experience with other museums. We therefore conclude that outcomes of a control group for these three latent constructs would not have significantly contributed to our results.

The second limitation we would like to note is that we could not consider the multilevel nature of the sample. This limitation arose because we lacked a way to properly handle missing observations in combination with a multilevel SEM analysis in R. The absence of this correction might have masked correlations between observations from the same class or school. This could especially be true for the pre- and post-visit variables (in relation to the knowledge scores). For instance, one school might have been located closer to the Rijksmuseum or have hosted more children from families with a higher socioeconomic status than other schools (both cases might be reflected in higher numbers of pre- and post-visits to the museum). Most schools in this sample were located relatively close to the Rijksmuseum, which could also explain the relatively high number of pre-visits in this sample. Future studies should therefore attempt to take the nested structure of data from different schools and classes into account.
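
One way to gauge how much such nesting matters is the intraclass correlation (ICC), the share of score variance that lies between classes rather than within them. The following is a minimal sketch under stated assumptions, not part of the analysis reported in this study: it estimates ICC(1) from a one-way ANOVA decomposition and derives the corresponding design effect. The function names and the idea of grouping knowledge scores by class are hypothetical illustrations.

```python
from statistics import mean


def icc_oneway(groups):
    """Estimate ICC(1) from a one-way random-effects ANOVA.

    `groups` is a list of lists; each inner list holds the scores of one
    cluster (hypothetically, the knowledge scores of one school class).
    """
    k = len(groups)                              # number of clusters
    n_total = sum(len(g) for g in groups)        # number of children
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Sums of squares between and within clusters.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Adjusted average cluster size, which allows for unequal class sizes.
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)

    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


def design_effect(icc, avg_cluster_size):
    """Variance inflation caused by clustering: 1 + (m - 1) * ICC."""
    return 1 + (avg_cluster_size - 1) * icc
```

An ICC close to zero would suggest that ignoring the class level has little effect on standard errors, whereas a clearly positive ICC combined with large classes yields a design effect well above 1, which would indicate that single-level standard errors are too optimistic.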

Another limitation we faced is a strong sampling bias. Based on the very positive outcomes of the teacher feedback, it is evident that only very well prepared and enthusiastic teachers volunteered to participate in this study (with their classes). We had some cases where one teacher of a school signed up for the study and brought along more teachers from the same school, but even those teachers completed all preparation components, conducted one or more follow-up activities, and displayed a positive attitude towards the project. Based on the enthusiastic participation of the teachers, many of whom also conducted extra activities with their classes, it could be noted that the positive knowledge outcomes are more likely bound to the (longer) preparation and wrap-up phases of the project (that take