
Social robots for (second) language learning with (migrant) primary school children


Academic year: 2021


Social robots for (second) language learning with (migrant) primary school children

Veerle L.N.F. Hobbelink University of Amsterdam, Bachelorproject Psychobiology

Supervisor: prof. dr. Elly Konijn Student number: 11604689


Abstract

The aim of this study is to test the potential advantage of a social robot over a tablet in (second) language learning, in terms of learning outcomes, engagement and enjoyment. Shortages in primary education call for new solutions, such as using technologies for learning tasks. The current study differs from previous studies in studying direct interaction with a robot, without a tablet in between, which is often needed to compensate for limitations in the robot’s technology. In a field experiment, primary school children (N = 63, age = 4-6) participated in 3 story-telling exercises with either a semi-autonomous robot (without tablet, using WOz) or a tablet. Results showed that children training with a social robot improved significantly more over time in comparison to those who trained with a tablet. However, no difference in improvement could be found on immediate learning outcomes. Children who trained with a robot were more engaged in the story-telling task, enjoyed it more and overall perceived the robot to be more human. Behavioral style of the robot made little difference overall; however, a differential effect of behavior was found for children of high and low educational abilities. While more steps need to be taken before social robots can be implemented in primary education, this study shows promising results for using social robot tutors in (second) language learning.

Keywords: Social robots, robot tutor, second language learning, primary school,


Preface

First and foremost, I would like to thank my supervisor, Elly Konijn, for trusting me fully in writing this article, performing data-analysis and encouraging me through this project, even though it was challenging for me at times. I would also like to express my gratitude toward Daniel Preciado-Vanegas for putting up with all my statistics enquiries and being excited about the obtained results. All data in this study were acquired by Brechtje Jansen (Jansen, 2019) and Victoria Mondaca-Bustos (Mondaca Bustos, 2019). The study design was based on previous research in which dr. Paul Vogt (Tilburg University) played an additional supervisory role.

As I was not able to do any data acquisition, I mostly did data-processing and analysis. The data-analysis was therefore quite extensive. To keep this report as concise as possible, some of the data-analysis and figures have been moved to the Appendix. Personally, I feel those parts are quite exciting as well, so I encourage the reader to take a look. This report contains a reduced and concise version of the original methods and introduction.

Introduction

Primary education in Europe is facing a rising shortage. Budget cuts and a shortage of personnel result in growing class sizes, while classrooms are also becoming increasingly diverse. The global Covid pandemic has made the shortage of personnel in primary education even more prominent. As teachers fall sick, under-qualified staff are faced with teaching their classes (Bijl, 2020). The need for innovation in digital education has become greater than ever (van Baars, 2020). Qualified teachers are scarce, in particular for special-needs students, and nearly half of schools report a shortage of teachers for these students

student population where some students require more attention than others (Inspectie van het Onderwijs, 2019). Primary school classes today are very diverse, consisting of children with different backgrounds and educational levels. As of today, almost one-third of children starting primary school in the Netherlands have a migration background (Inspectie van het Onderwijs, 2019). Children with a migration background start primary school with a disadvantage, as their educational achievement falls behind even at the age of three (Leseman, 2000; Scheele et al., 2010). Unequal levels of knowledge further complicate compiling a curriculum for preschool teachers that is relevant to all children in their class (Kory-Westlund, Dickens, Jeong, Harris, Desteno, et al., 2015). This is especially relevant as children profit most from education adjusted to their level of knowledge (Vygotsky, 1980).

To cope with these problems, technologies could offer opportunities for improvement. Tablets are accessible tools, and primary schools have started using them for tailored learning tasks. Learning with a tablet seems to affect learning outcomes positively; however, a number of difficulties have emerged for implementing tablets as learning tools, such as the devices’ distracting and addictive nature (Haßler et al., 2016; Otterborn et al., 2019; Tamim et al., 2015). A recent study did not find an advantage of using tablets in comparison to conventional teaching of a foreign language (Kayapinar et al., 2019).

An alternative option worth exploring is a social robot. A great advantage of social robots over tablets is their embodiment and anthropomorphism (van den Berghe et al., 2018; van den Berghe et al., 2020). The social robot’s embodiment enables natural interaction, which is beneficial for (second) language learning (van den Berghe et al., 2018). Natural interaction is the way young children learn language, for example through conversation or the usage of language in their environment (Biemiller, 2012; Cartmill et al., 2013; Foster et al., 2005; Valli, 2008). Embodied systems such as the social robot are therefore more stimulating for language development than devices lacking humanlike features, thereby creating an opportunity for a social robot to take up a role in language education (Randall, 2019; van den Berghe et al., 2018).

More importantly, social robots are shown to be effective tutors, enhancing concentration and academic performance, and are used in language development in young children (Belpaeme et al., 2018). Social robots are able to relieve a teacher’s workload, as they offer the opportunity for one-on-one tutoring and exercises tailored to the student’s level. In contrast to their human counterparts, robot tutors will never get tired or annoyed and are unbiased toward students. In addition, language learning with a robot tutor can be experienced as less intimidating in comparison to a peer or teacher (Golonka et al., 2014). However, studies also show mixed results, in particular in the area of second language learning, and suffer from methodological problems such as small sample sizes (van den Berghe et al., 2018). Clarity around the true effect of a social robot tutor is needed in order for robot tutors to be implemented, especially when working with a young target audience.

An effective learning process ideally focuses on more than achieving high learning outcomes. Studies have shown that factors such as engagement and enjoyment during exercise positively affect learning gains (Gomez et al., 2010; Konishi et al., 2014). This also seems true for exercises with social robots; studies showed that exercises with a robot had a positive effect on learning gains, enjoyment and engagement (Belpaeme et al., 2018). Children who engaged more in story-telling with a robot also showed higher learning outcomes (Park et al., 2019; Schodde et al., 2019). Furthermore, language learning with a social robot boosts enjoyment, which facilitates a better learning experience, causing the student to immerse themselves in the learning process (Ali et al., 2019). However, van den Berghe et al. (2018) point out high engagement might be due to the novelty effect of interacting with the robot. Thus, while engagement and enjoyment possibly influence the learning process positively, confounding effects need to be accounted for.

Another effect that needs to be accounted for, is the effect of the teachers’


differential factor (Wayne & Youngs, 2003). Teachers can communicate in a more neutral or personalized and socially supportive way, and for human teachers the latter is usually more successful (Farrel, 2010; Skinner & Belmont, 1993; van Lehn, 2011). For robot tutors’ behavior, results are mixed (Belpaeme et al., 2018). A number of studies report different effects, or lack thereof, on learning outcomes when robots express different gestures, gazes and behavioral expressions (Castellano et al., 2013; de Wit et al., 2020; Kennedy et al., 2015; Kory-Westlund & Breazeal, 2014; Leyzberg et al., 2014). There is a fine line between the robot being social enough to sustain children’s interest and the robot distracting or even intimidating children by being too social (van den Berghe et al., 2018). The influence of different behavioral styles of robots in language learning remains unclear, and there is knowledge to be gained in this area.

The current study crucially differs from previous studies in studying direct robot-child interaction, without using a tablet in between (Kennedy et al., 2015; Kory-Westlund, Dickens, Jeong, Harris, Desteno, et al., 2015; Leyzberg et al., 2014). Recent studies have not yet found a difference in learning outcomes for (second) language learning with a social robot compared to a tablet (Kory-Westlund, Dickens, Jeong, Harris, Desteno, et al., 2015; van den Berghe et al., 2018; Vogt et al., 2019). At present, use of robot tutors is often mediated by a tablet due to limitations in speech technology and object recognition (Konijn et al., 2020; van den Berghe et al., 2018). A mediating device, such as a tablet, may undermine the advantage of the robot being a physical conversational partner in direct interaction with the pupil (Konijn et al., 2020). This is particularly problematic in language education, where conversation is key to natural interaction.

The aim of this study is to test the effectiveness of a social robot without an additional tablet for (second) language learning, guided by the following main research question: "What is the effect of interactive storytelling with a social robot on (second) language learning, enjoyment, engagement, and perceived humanness compared to a tablet?". To answer this question, we focused on 4- to 6-year-old children learning Dutch as a first or second language with a socially or neutrally behaving robot compared to a tablet. In a field experiment, the robot or tablet read an interactive story with the children, most of whom have (parents from) a (non-western) migration background and had just entered primary school in the Netherlands. Interactive storytelling makes use of natural interaction, and has proven to be effective in situations where a tutor or teacher read a story (Biemiller, 2012; Marulis & Neuman, 2010; Mol et al., 2008).

This study is two-fold, as it focuses on the effectiveness of the social robot compared to a tablet, as well as effectiveness of different behavioral styles of the robot. Due to the mixed results with respect to the behavioral style of the robot, this study programmed the social behavior of the robot without any gestures or movements during the task and only focuses on social behavior in dialogue. Learning outcomes, engagement and enjoyment of language learning exercises with a social and neutral robot will be compared to the same exercises with a tablet. We hypothesize that an embodied robot will lead to higher learning outcomes, enjoyment and engagement in language learning exercises than a tablet.

Perceived humanness is included in this study as potential mediator for the effect between embodiment of the device and learning outcomes, engagement and enjoyment. As an additional exploration, the effect of behavioral style of the robot on learning outcomes of children with different initial language levels is analyzed.

Method

Participants and design

Participants

All 63 participants were Dutch primary-school children aged between 4 and 6 (N = 63, M_age = 5.45 years, SD = 0.68, 39 boys). Active consent was obtained from each child’s parents. A high proportion of the children had parents with a migration background (84.1%) and 57.1% were bi- or multilingual. Initially, 68 children started this experiment, but 1 child did not complete the three weeks due to absence. Four children were omitted from the data due to extremely low or high scores, which were not representative of the task. To avoid arbitrary data elimination, data were omitted only if a participant scored in the lowest or highest 5th percentile in all categories: the baseline measurement, the immediate post-test and the delayed post-test.
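The exclusion rule described above can be illustrated with a minimal sketch. It assumes that "lowest or highest 5th percentile" means strictly below the 5th or strictly above the 95th percentile of each test; the function name, the data layout and the percentile method are illustrative assumptions, not the procedure actually used in the study.

```python
from statistics import quantiles


def extreme_in_all_tests(scores):
    """Return the ids of children who are extreme on every test.

    scores: dict mapping a child id to a tuple (T0, T1, T2).
    Assumption: "extreme" means strictly below the 5th percentile or
    strictly above the 95th percentile of the respective test.
    """
    columns = list(zip(*scores.values()))
    cutoffs = []
    for column in columns:
        # n=20 yields cut points at 5%, 10%, ..., 95%.
        cuts = quantiles(column, n=20, method="inclusive")
        cutoffs.append((cuts[0], cuts[-1]))

    excluded = set()
    for child, row in scores.items():
        all_low = all(v < low for v, (low, _) in zip(row, cutoffs))
        all_high = all(v > high for v, (_, high) in zip(row, cutoffs))
        if all_low or all_high:
            excluded.add(child)
    return excluded


# Hypothetical data: 21 children with identical scores on all tests.
demo = {i: (i, i, i) for i in range(21)}
print(extreme_in_all_tests(demo))
```

Requiring an extreme score on all three tests, rather than on any single one, keeps children who merely had one unusual session in the sample.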

Design

Children were randomly assigned to a device (social robot: N = 23; neutral robot: N = 20; tablet: N = 20). A mixed factorial design was used with device as between-subjects factor and learning outcomes (T0: baseline, T1: immediate post-test, T2: delayed post-test) as within-subjects factor. The tablet or robot (socially or neutrally behaving) read three stories during interactive exercises (indicated with E) with individual participants (E1: first, E2: second and E3: third tutoring session). Testing was two-fold: both the difference between devices (i.e., robot, both behaviors included, vs. tablet) and the difference between the behavioral styles of the robot were tested. Learning outcomes, engagement and enjoyment were included as dependent variables. Perceived humanness was included as a potential mediator of the effect of the independent variable on the dependent variables. The baseline measurement (T0) was taken one week before the first tutoring exercise, to control for prior knowledge. The delayed post-test was taken two weeks after the final tutoring exercise. Over the course of three weeks, each child had three tutoring sessions, one per week, with either the robot or the tablet. For a visual representation of this design, see Appendix Figure A3.

Materials and procedures

Robot versus tablet conditions

The robot used was a SoftBank NAO robot, a 58 cm tall humanoid (Gouaillier et al., 2009), with Choregraphe 2.4.1 software. An iPad 2 with a custom-built web application was used in the tablet condition. Both conditions were made as similar as possible, using the same voice. Feedback options and images in the tasks were kept equal in both conditions as well.

To minimize novelty effects and allow children to get comfortable with the robot, children were introduced to the robot beforehand (van den Berghe et al., 2018). Individual meetings (i.e., tutoring sessions) took place in a separate room at school. An experimenter was always present in the room, placed behind the child so as not to interfere (Fig 1).

Figure 1

Note. Left: Schematic overview of the setting during exercises in the robot condition. Right: Schematic overview of the setting during exercises in the tablet condition. Figure by Jansen, 2019, Mondaca Bustos, 2019.

Robot behaving socially or neutrally

In this research, both the social and neutral robot were programmed to be task-oriented. Neither robot was programmed in a personalized way (i.e., neither adapted to the participant’s skill set). The social behavior included the robot welcoming the child by introducing itself and asking for the child’s name, which it used throughout. Other social cues included gestures such as waving and looking at the participant. Upon finishing the interaction, the social robot thanked the child for reading a story together, and said "Goodbye". The neutrally behaving robot was solely task-oriented and did not use gestures or social cues.

The task for the participant was to match images to target words used in the story. In the robot condition, the images were presented on plasticized sheets. Each sheet


The other three images contained images similar to the target word. Upon choosing an image, the robot could provide feedback (correct, false, try again). As Automatic Speech Recognition (ASR) used by the NAO robot is not yet sufficient for young children’s speech in direct interaction (Belpaeme et al., 2018), a Wizard of Oz (WOz) technique was used. Thus, the experimenter controlled the responses of the robot without the child knowing.

Tablet condition

Procedures for the tablet condition were kept as similar as possible to the robot condition. Although the tablet was not able to introduce itself, it asked for the child’s name. As the images were presented on screen, the tablet asked the child to click on them instead of pointing at the image on the sheet. Exercises with both devices were equal.

Children in the tablet condition were given the opportunity to read a short story with the robot after all exercises were completed in order to prevent possible disappointment. All children in the tablet condition knew how to operate one beforehand.

Word learning through storytelling

Stories

Three stories were selected based on the children’s age (4-6) and average language level, after consultation with pedagogues and experts in language development of children. The original stories were adjusted to fit the target words, and the duration was, on average, 10 minutes and 31 seconds.

Target words

The target words and receptive vocabulary task are based on the Peabody Picture Vocabulary Test (Dunn and Dunn, 2007, Fourth edition). At fixed points during the story-telling exercise, the child had to choose from four pictures. One depicted the target word; the other three were similar images (i.e., from the same category). In total there were twenty target words, divided over three stories. Target words of previous exercises were tested again to help children remember them. Each target word was introduced in the context of the story first, after which the robot or tablet would repeat the word and ask the child to point to the matching image. When the child picked the correct image, the device would compliment them and give a description of the target word. When the incorrect image was chosen, the device gave the child the opportunity to try again. For a second correct guess, the device would compliment the child. If not, the device would describe the right picture and ask the child to repeat the word.
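The feedback procedure for a single target word can be summarized as a small decision function. This is only a sketch of the flow described above; the function name and the action labels are hypothetical and do not correspond to the actual robot or tablet software.

```python
def device_feedback(first_correct, second_correct=False):
    """Return the device's actions for one target word, in order.

    first_correct: whether the child's first choice was the target image.
    second_correct: whether the second choice was correct (only used
    when the first choice was wrong).
    """
    if first_correct:
        return ["compliment", "describe target word"]

    # A wrong first choice always leads to a second attempt.
    actions = ["offer second attempt"]
    if second_correct:
        actions.append("compliment")
    else:
        actions += ["describe correct picture", "ask child to repeat word"]
    return actions


print(device_feedback(False, False))
```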

Measures

Learning performance

Learning outcomes were determined based on productive vocabulary tests, which were based on CELF-tests (Paslawaski, 2005). The child was asked to match images with the target words used in the story. Similarly to the active vocabulary task, children had up to two attempts to answer correctly. Researchers encouraged children with a maximum of two further questions, but never gave the right answer, not even when the child failed. Children were granted two points if they got the word correct on the first attempt, one point if they got it correct on the second attempt and zero points if they did not get it at all.

At the first storytelling exercise (E1), the task contained seven target words, giving children the opportunity to score between 0 and 14 points, based on the scoring system described above. Each session tested new words, as well as repeating the words exercised at the previous session(s). The second storytelling exercise (E2) thus contained six additional target words and scores could be between 0 and 26 points. The last exercise (E3) added seven target words, giving children the opportunity to score between 0 and 40 points. Thus, at the baseline test (T0), immediate post-test (T1) and delayed post-test (T2), children could score a minimum of 0 points if they knew none of the twenty target words, and a maximum of 40 if they knew all words (see Appendix, Fig A4).
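As a worked check of the scoring arithmetic, the sketch below reproduces the score ranges from the number of new target words per session (7, 6 and 7) and the 2/1/0 point rule. The function and variable names are illustrative only.

```python
# New target words introduced at each storytelling exercise.
NEW_WORDS = {"E1": 7, "E2": 6, "E3": 7}


def word_points(correct_on_attempt):
    """2 points for a correct first attempt, 1 for a correct second
    attempt, 0 otherwise (pass None when the word was never correct)."""
    return {1: 2, 2: 1}.get(correct_on_attempt, 0)


def max_score(session):
    """Maximum score at a session: earlier words are tested again,
    so the number of tested words accumulates over sessions."""
    tested = 0
    for name, new_words in NEW_WORDS.items():
        tested += new_words
        if name == session:
            return 2 * tested
    raise ValueError(f"unknown session: {session}")


print([max_score(s) for s in ("E1", "E2", "E3")])  # [14, 26, 40]
```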


words children knew beforehand. The delayed post-test (T2) indicated how many words children still remembered after time. The variable ’Immediate post-test’ (T1) was calculated by the sum of points a child received during the test at the last storytelling exercise (E3), which included all target words used in the previous exercises as well as new ones. Therefore, the dependent variable learning outcomes held three different time points with scores, and is thus used as a within variable in repeated measures ANOVA.

Enjoyment and engagement

In addition to measuring learning outcomes, engagement and enjoyment were monitored during the interactive storytelling exercises at all three sessions. These variables were comprised of self-report items as well as observational items. The observation scheme was filled out by researchers during the exercise.

Engagement was originally scored on a 12-item observation scheme during E1, E2 and E3, but to improve consistency and reliability of the scheme, one item was removed from the variable (N = 11; Cronbach’s α at E1 = .82, at E2 = .83, at E3 = .80). This scale contained items based on prior related child-robot interaction observation schemes (Baxter et al., 2017; Laevers et al., 2005; Pulido et al., 2017), and focused on, for example, the concentration of the child in the task or focus on the robot. Observations were represented on a 4-point Likert scale (1 = not at all, 4 = very much).
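For readers unfamiliar with the reliability measure reported here, the following sketch computes Cronbach’s α with the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the summed scale). The ratings in the example are made up for illustration and are not data from this study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of k lists, one per item, each holding the ratings of
    the same participants in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sum score of each participant across all items.
    totals = [sum(ratings) for ratings in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical 4-point ratings of five children on three items.
ratings = [
    [4, 3, 2, 4, 1],
    [4, 2, 2, 3, 1],
    [3, 3, 1, 4, 2],
]
print(round(cronbach_alpha(ratings), 2))
```

Removing an item, as was done for this scale, changes both the number of items k and the variance terms, which is why α is recomputed for the reduced scale.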

Enjoyment was scored on a 7-item scheme during E1, E2 and E3, with both observational and self-report items (N = 7; Cronbach’s α at E1 = .76, at E2 = .81, at E3 = .78). Likewise, these items were based on literature and scales for enjoyment (Davis et al., 1992; Gomez et al., 2007; Moore et al., 2009; Pulido et al., 2017). Self-report items in children of this age are hard to measure because of the so-called yes-bias (Moriguchi et al., 2008). Therefore, observational items were included for enjoyment, and all self-report items were asked as two-part questions, first asking children if they enjoyed something, after which they could indicate how much they (dis)liked it. Both the observational and self-report


Perceived humanness

Perceived humanness of the device was measured at the last tutoring session (E3) through an oral questionnaire tailored for this age group. By then, the children had interacted with the tablet or robot three times and had had time to become familiar with the device and form an impression of it. The questionnaire measured to what extent the children perceived the device they used during the storybook reading exercise as humanlike, on a 4-point Likert scale (1 = not at all, 4 = very much). It was based on prior literature and scales measuring anthropomorphism and perceived humanness (Bartneck & Forlizzi, 2004; Duffy, 2003; Konijn & Hoorn, 2017a). The original questionnaire was a 7-item scheme with both observational and self-report items that tested for, among other things, anthropomorphic scores (Duffy, 2003) and items related to the theory of affective bonding (Konijn & Hoorn, 2017a); to improve reliability, one item was removed (N = 6, Cronbach’s α = .81).

Control variables

Demographic and background variables that needed to be controlled for (e.g., gender, age, background) were provided by the children’s teachers. In addition, teachers scored the receptive (i.e., passive) and productive (i.e., active) language level of each child. The variable ’language level’ is the mean of these two scores. This way, it could be checked whether the distribution of language levels was equal across all conditions. Moreover, the baseline measurement (T0) assessed whether a child’s initial knowledge was above or below average.

Analysis plan

To test for significant differences between robot vs. tablet and socially behaving vs. neutrally behaving groups in learning outcomes, engagement and enjoyment, repeated measures ANOVA and MANOVA were used. First, the effect of the independent variable (i.e., robot vs. tablet) on perceived humanness was tested using a one-way ANOVA. Then, a regression analysis was used to determine whether perceived humanness could be a mediator of the effect of embodiment on learning outcomes, engagement and enjoyment. For the additional analysis, a repeated measures ANOVA was used with different advancement groups. To determine the location of possible differences found in ANOVA, t-tests or Wilcoxon tests were used. Levene’s test revealed the assumption of equality of variances was not violated. Shapiro’s test indicated that the data were not normally distributed, but ANOVA is robust to this violation (Schmider et al., 2010). When the assumption of sphericity was violated, a Greenhouse-Geisser correction was applied.

Results

Descriptive statistics and correlations

Out of 63 participating children, 39 were boys (61.9%) and 24 girls (38.1%). Participants were aged between 4 and 6, with a mean age of 5.45 years (M_age in months = 65.44, SD = 8.17; full overview in Appendix Table A1). In total, 53 children had parents from a non-western migration background (84.1%) and 10 had parents with a Dutch background (15.9%). There were 36 bilingual or multilingual children (57.1%) and 27 monolingual children (42.9%).

A Kruskal-Wallis rank-sum test was conducted for different variables (see Appendix, Table A2) to check whether random assignment had produced comparable experimental conditions. This showed no significant differences for any of the demographic variables or for initial language level. Moreover, there was no significant difference in baseline scores between the tablet and robot (both behaviors included, t(61) = 0.82, p = .42). This indicates that the randomly assigned conditions were comparable at the start of the experiment.

Correlations between all demographic variables, initial learning level, perceived humanness and the dependent variables in this study were calculated (see Appendix, Table A3). Language level, as scored by the children’s teachers, and the baseline measure at T0 both indicate the initial language level and correlate significantly (r(63) = .51, p < .001). Furthermore, there is a relatively low but significant negative correlation between background and baseline (T0) scores (r(63) = -.30, p = .02). Perceived humanness correlates strongly with enjoyment (r(63) = .71, p < .001). There is also a significant correlation between engagement and enjoyment (r(63) = .52, p < .001).

Testing hypotheses

Testing hypothesis 1: effect of embodiment on learning outcomes

To test hypothesis 1a, that children who read stories with the robot remember more words covered in the stories, compared to children who used a tablet, a 2 (device) x 3 (time of testing: T0 baseline, T1 immediate post-test and T2 delayed post-test) repeated measures ANOVA was used.

The effect of time on learning outcomes was significant (F (2, 122) = 289.30, p < .001, ηp² = .23), indicating the learning task was successful (Fig 2). There was no significant main effect of device (F (2, 61) = 2.81, p = .14, ηp² = .03); however, there was a significant interaction effect between time of testing and device (F (2, 122) = 4.05, p = .03, ηp² = .004). Children learned more with the robot than with the tablet from the baseline to the delayed post-test, thus partly supporting hypothesis 1a (see Appendix for an in-depth analysis).

To test for differences between the robot types (hypothesis 1b, i.e., a robot behaving socially, N = 23, versus neutrally, N = 20), a 2 (device) x 3 (time of testing: T0 baseline, T1 immediate learning outcomes and T2 delayed learning outcomes) ANOVA was conducted. It yielded no significant interaction effect of time and device (F (2, 82) = 0.15, p = .79, ηp² < .001), but the main effect of time was significant (F (2, 82) = 204.67, p < .001, ηp² = .27). Children did not learn significantly more with either the socially or neutrally behaving robot (see Appendix, Fig A1). Thus, hypothesis 1b is refuted as no differences between behavioral styles of the robot were found.

Figure 2

Note. Children’s mean scores on the language test, with error bars (95% confidence interval) over time, organized by the device they exercised with (blue: robot; red: tablet). Children who trained with a robot improved more in scores from the baseline to the delayed post-test in comparison to children who trained with a tablet. * indicates a significant difference at the .05 level.

Testing hypothesis 2: effect of embodiment on engagement and enjoyment

To test if engagement and enjoyment were higher with a robot than a tablet, a MANOVA was conducted due to the relatively strong correlation between these two dependent variables (r(63) = .52). For this, engagement and enjoyment with a device (robot vs. tablet) during interactive storytelling at E1, E2 and E3 (Fig 3, Fig 4) were tested. There was a significant main effect of device (F (1, 61) = 36.15, p < .001, η² = .37). Analysis showed that engagement and enjoyment were higher with the robot at all times, thus this hypothesis (2a) is supported. Furthermore, engagement was a significant predictor of immediate learning outcomes (see Appendix).

Figure 3

Note. Children’s mean engagement during exercising, with error bars (95% confidence interval) over time, organized by the device they exercised with (blue: robot; red: tablet). Children were significantly more engaged with the robot at all times. * indicates a significant difference at the .05 level, *** at the .001 level.

To test for differences between the two behavioral types of the robot (i.e., a robot behaving socially versus neutrally), a MANOVA was conducted with engagement and enjoyment as dependent variables, measured over time (E1, E2 and E3), and robot type as independent variable. The MANOVA did not show a significant main effect of time or an interaction effect between time and device. It did not show a main effect of device either. This means children did not engage more in exercising with a social robot in comparison to a neutral robot, and did not enjoy exercising with one more than the other; rather, they were engaged with and enjoyed exercising with the robot regardless of how it behaved. Thus, hypothesis 2b is refuted as no difference between behavioral styles emerged.

Figure 4

Note. Children’s mean enjoyment during exercising, with error bars (95% confidence interval) over time, organized by the device they exercised with (blue: robot; red: tablet). Children enjoyed exercising with the robot more at all times. * indicates a significant difference at the .05 level, *** at the .001 level.

Testing hypothesis 3: Effect of embodiment on perceived humanness

To test whether perceived humanness was higher for the robot than for the tablet, a one-way ANOVA was conducted with device (social robot, neutral robot and tablet) as independent variable and perceived humanness, as rated using the questionnaire, as dependent variable. There was a significant difference in scores of perceived humanness per device (F (2, 60) = 19.43, p < .001, ηp² = .39). The robot, regardless of behavior, was perceived as more human than the tablet (see Appendix). Social and neutral robot scores on perceived humanness did not significantly differ. Furthermore, perceived humanness was tested as a potential mediator between device and learning outcomes, and results showed it to be a partial mediator of the effect of embodiment on enjoyment (see Appendix). Thus, hypothesis 3 is accepted.

Exploratory analysis

An additional analysis was conducted to examine whether the behavioral style of the robot would affect learning outcomes depending on individual differences in children’s educational ability. Educational ability was included in this analysis as independent variable, as well as the robot’s behavioral style (social or neutral; Fig A6 in Appendix). To determine the initial educational ability, baseline scores were used. Two groups were created for ’advancement’: ∆M < above average < ∆M + 1SD (N = 22) and ∆M > below average > ∆M − 1SD (N = 21). Based on these two groups, a 2 (device) x 2 (advancement) x 2 (time: T0 baseline, T1 immediate post-test) ANOVA was conducted. Besides a significant effect of time (F (1, 39) = 265.57, p < .001, η² = .87), a significant three-way interaction effect of time * device * advancement was found (F (1, 39) = 8.12, p = .007, η² = .17). Furthermore, the two-way interaction effect of device * advancement showed a trend that did not reach significance (F (1, 39) = 2.74, p = .11, η² = .07). Post-hoc testing (see Appendix) revealed that children with below average educational ability (i.e., low advancement), exercising with the social robot, advanced more in (second) language learning than children with above average educational ability exercising with the social robot.

Discussion

In this study, children were taught new words with a digital tutor, either a robot or a tablet. We wanted to test whether robot tutoring had a more positive effect on learning outcomes in (second) language learning than tablet tutoring. Both devices proved to be adequate tutors, as all children improved significantly over time and learned more words. Children who had trained with the robot showed a greater increase in learning outcomes from the baseline test (T0) to the delayed post-test (T2), obtained two weeks after the story-telling exercise. Secondly, we hypothesized that children who interacted with the robot during the story-telling would be more engaged than children who interacted with the tablet and would enjoy the exercise more, which they did. Learning outcomes were also predicted by engagement scores: higher engagement scores were associated with higher learning outcomes. Moreover, children perceived more humanness in the robot than in the tablet. In addition, we tested how the behavioral style of the robot, either more social or neutral (i.e., purely task-oriented), would affect the results. The behavioral style of the robot did not affect language learning overall: children learned a similar number of words with both communicative styles of the robot. However, the behavioral style of the robot did affect children differently depending on their individual learning ability. Children whose knowledge was below average at the start of the experiment had more to gain from the social behavior of the robot, considerably more than children whose skills were above average and who trained with the same socially behaving robot.

To our knowledge, this is the first study in (second) language learning that compares a semi-autonomous embodied robot interacting directly with the learner, without a tablet, to a tablet-only condition. This is a valuable addition to current studies, as we provide insight into the effectiveness of the social robot in comparison to a different technical tutor, the tablet. Effectiveness of the social robot was found over time; however, training with a robot rather than a tablet did not significantly increase learning outcomes from the baseline (T0) to the immediate post-test (T1), obtained right after the last tutoring session. Our findings are partially in line with previous research, where the effectiveness of a social robot in second language learning has already been shown to be positive (Belpaeme et al., 2018), though no differences were found in comparison to a tablet (van den Berghe et al., 2018). The lack of differences thus far was attributed to the robot interacting with the learner via a tablet rather than in direct one-on-one interaction. Studying direct interaction is therefore an important contribution of our research, resulting in stronger effects of language tutoring by a robot than by a tablet. Our results suggest that, over time, children indeed learn more with a robot than with a tablet.

Furthermore, the robot significantly increased engagement and enjoyment, and engagement positively affected learning outcomes. Konishi et al. (2014) already stated that interactive environments in which children are engaged create opportunities for second language learning, and Belpaeme et al. (2018) argue that exercising with a robot does just that. In line with previous studies, children who trained with a robot were more engaged in exercising and enjoyed it more (Ali et al., 2019; Park et al., 2019; Schodde et al., 2019). The distracting nature of the tablet when combined with the robot for interaction (as in van den Berghe et al., 2018) might lead to low engagement in the task. Moreover, training with a robot might be more captivating and lively than training with a tablet, which can neither move nor look at the child and could therefore be perceived as more monotonous and less interesting and enjoyable.

The humanlike features of the robot elicit anthropomorphization (DiSalvo et al., 2002), and children perceived it as more human than a tablet. In this study, perceived humanness did not predict learning outcomes. This is in contrast to the findings of van den Berghe et al. (2020). However, perceived humanness was a positive predictor of engagement and enjoyment, which is in line with previous research (Randall, 2019). While perceived humanness and embodiment (i.e., the use of a robot vs. tablet) both predicted engagement and enjoyment, perceived humanness did not mediate the effect of embodiment on engagement, but did partially mediate the effect of embodiment on enjoyment. Thus, in this study, high enjoyment in a task can be partially explained by a greater perceived humanness of the device used for the storytelling exercise, which adds to previous research by explaining why enjoyment is higher with a robot (Pereira et al., 2008; Sinoo et al., 2018). Seemingly, the degree to which children perceived the robot or tablet to be human affected enjoyment, and effectiveness of the robot can be partially explained by its perceived humanness. The high perceived humanness of the robot, alongside its high engagement and enjoyment, thereby creates potential to take up the role of tutor for learning tasks.

The robot was programmed to behave in either a more social or a neutral way. Behavioral style of the robot did not affect how many words a child learned. Furthermore, whether the robot behaved socially or neutrally did not affect how engaged a child was in working with the robot and how much they enjoyed it. It also did not affect how humanlike they perceived it to be. The lack of difference in effects of the social versus neutral robot is not a novel finding; mixed results were reported earlier for differently behaving robots in (second) language learning (Belpaeme et al., 2018). Using initial educational ability as a possible explanation for the differential effectiveness of behaviors, Konijn and Hoorn (2017b) found that in math tutoring the neutral robot was more effective overall, as the social behavior of the robot was perhaps too distracting during such a high-concentration mathematical task. Contrary to those findings, our findings are partially in line with those of Hein and Nathan-Roberts (2018), as they support the idea that a socially behaving robot is more successful for language learning, but primarily for less advanced children. Perhaps social behavior matters most for under-achieving students as they require more support in their learning process.

A marginally significant result showed that children who had above average educational abilities had more to gain from the neutrally behaving robot than from the socially behaving robot. Possibly, above-average children are more task-focused and the social robot distracts them. Furthermore, this study shows that individual differences matter in choosing the behavioral style of the robot. Perhaps the mixed results for the effect of the robot's behavioral style on learning outcomes (Belpaeme et al., 2018) can be explained by the fact that some studies did not account for individual differences, and by differences in learning tasks. There seems to be a mechanism by which, for social tasks like learning a language, below-average children can benefit from a social style of the robot while above-average children cannot, whereas for more task-focused learning processes, like math, this mechanism is reversed (Konijn & Hoorn, 2017b). Future research involving more socially or neutrally behaving robots should therefore take initial knowledge level into account and compare different tasks, which might clarify the differential effects of the behavior of a social robot on learning outcomes.

Interestingly, a trend emerged for a decrease in learning outcomes from the immediate (T1) to the delayed (T2) post-test. Children who trained with the tablet seemed to have forgotten more words in the time between the last tutoring session (T1) and the delayed post-test (T2) than children who trained with the robot. These results suggest that training with a robot possibly enhances retention of knowledge. While this trend was not significant, it nonetheless provides an interesting angle for future research.

Furthermore, a negative correlation showed that children raised by parents with a (non-western) migration background scored lower on the baseline test than children who only speak Dutch. This is in line with the literature and with concerns about bilingual children and children whose parents have a (non-western) migration background lagging behind in Dutch language development (Inspectie van het Onderwijs, 2019; Leseman, 2000; Leseman et al., 2019; Scheele et al., 2010). In light of the growing diversity in educational levels, cultural backgrounds, and socioeconomic status in classrooms, the number of children who start primary school with a disadvantage will only increase. These children in particular stand to benefit from one-on-one tutoring. Embodied robots are good candidates to provide such individual tutoring, something that is currently not achievable for human teachers. Thus, embodied robots carry potential to reduce individual differences and disadvantages.

Limitations and considerations

For this type of research, the sample size is quite large. However, the group sizes in the statistical analyses are relatively small, which limits the study's power. Moreover, the variance in the test scores appears rather large, which makes it difficult to obtain significance through frequentist statistical analysis. Larger sample sizes are hard to obtain in field research involving participants of such a young age, especially when it requires one-to-one interaction. To reach solid conclusions about whether an embodied robot is more suitable for (second) language learning than a tablet, and which social behavior works better, additional data collection is needed. For field studies with relatively small sample sizes and large variance in test scores, Bayesian statistics might be a better approach to analyze this kind of data (Konijn et al., 2015).

Most participants in this study were bi- or multilingual children. For these children, Dutch was taught as a second language. Though tutoring with the robot proved more effective over time in comparison to the tablet, it might not point to an absolute positive effect of the digital tutor used. Most likely, these children hear Dutch in their day-to-day environment, used by teachers and on the television or radio, and learn from this as well. To find the true effect of the robot as language tutor, training with a foreign language can be considered in future research.

Due to time constraints, children practiced with the robot or tablet three times and could not rehearse over a longer time period. Randall (2019) recommends that studies of the effects of robot language learning be at least eight sessions long. Any possible differences will then emerge more clearly and make the results more conclusive. Furthermore, in order to implement a fully autonomous social robot in a classroom, the robot should have more technical sophistication. This will also be important to draw a clear distinction between a more social (e.g., speech supportive) and a neutral robot.

Furthermore, there were a few outliers. Because the children were of different ages, some of whom were still really young, their language level may simply not be sufficient to fully comprehend the task and stories. The children with remarkably low scores had a mean age of 4.5 years. The child with the highest score was aged 6.3 years. The task was possibly below level for this child, as this child was in a further stage of language


language level as scored by their teachers.

In general, robots seem to be effective teaching tools to support human teachers. However, when a robot is actually implemented in educational practice, there will be pedagogical and societal implications involved. Reich-Stiebert and Eyssel (2016) captured teachers' concerns and attitudes towards social robots in education. Furthermore, there is still a great deal to unravel before implementing robots in the classroom. Which behavioral style and teaching role (tutor, tutee, peer; Chen et al., 2020) works best for which task? This research shows that robots can be better tutors than tablets for (second) language learning over time; however, the added value of a robot's physical embodiment in comparison to other agents needs further study before conclusions can be drawn. Importantly, the disadvantages of using robots in education should not be overlooked, especially for such a young target audience. Tolksdorf et al. (2020) mapped ethical concerns of applying robots in kindergarten settings and thereby emphasized, among other aspects, the vulnerability of children as a group and the role of stakeholders (i.e., teachers and caregivers). Smakman and Konijn (2019) systematically analyzed the moral and ethical considerations that come with the implementation of robot tutors in education, for various stakeholder groups. They reviewed concerns related to the values of friendship and attachment, human contact, privacy, safety, and so on. These concerns need to be extensively researched if robots are to be implemented in (second) language learning.

In this research, the effectiveness of a robot over a tablet was shown over time, but not on immediate learning outcomes. Though both robot and tablet increased word knowledge significantly, robots proved to be more successful than tablets in tutoring over a longer period of time. This study provides a valuable addition to research by studying a robot acting without the use of an additional tablet, helping to unravel the true effect of social robots in education. While more steps need to be taken before social robots can be implemented in primary education, this study shows promising results for using social robot tutors in (second) language learning.


References

Ali, H., Bhansali, S., Köksal, I., Möller, M., Pekarek-Rosin, T., Sharma, S., Thebille, A.-K., Tobergte, J., Hübner, S., Logacjov, A., Özdemir, O., Rodriguez Parra, J., Sanchez, M., Shruti Surendrakumar, N., Alpay, T., Griffiths, S., Heinrich, S., Strahl, E., Weber, C., & Wermter, S. (2019). Virtual or physical? Social robots teaching a fictional language through a role-playing game inspired by Game of Thrones. Social Robotics. ICSR 2019, 11876, 358–367. https://doi.org/10.1007/978-3-030-35888-4_33

American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders: DSM-5 (5th ed.). Author.

Bartneck, C., & Forlizzi, J. (2004). A design-centred framework for social human-robot interaction. RO-MAN 2004. 13th IEEE International Workshop on Robot and Human Interactive Communication (IEEE Catalog No.04TH8759), 591–594. https://doi.org/10.1109/ROMAN.2004.1374827

Baxter, P., Ashurst, E., Read, R., Kennedy, J., & Belpaeme, T. (2017). Robot education peers in a situated primary school study: Personalisation promotes child learning. PLOS ONE, 12. https://doi.org/10.1371/journal.pone.0178126

Belpaeme, T., Kennedy, J., Ramachandran, A., Scassellati, B., & Tanaka, F. (2018). Social robots for education: A review. Science Robotics, 3, 1–9. https://doi.org/10.1126/scirobotics.aat5954

Biemiller, A. (2012). Teaching vocabulary in the primary grades: Vocabulary instruction needed (2nd ed.). Guilford Press.

Bijl, H. (2020). Lerarentekort en coronacrisis, dus staat hier een onbevoegd persoon voor de klas. Het Parool. Retrieved November 12, 2020, from https://www.parool.nl/amsterdam/lerarentekort-en-coronacrisis-dus-staat-hier-een-onbevoegd-persoon-voor-de-klas~b1eef877/


Cartmill, E. A., Armstrong, B. F., Gleitman, L. R., Goldin-Meadow, S., Medina, T. N., & Trueswell, J. C. (2013). Quality of early parent input predicts child vocabulary 3 years later. Proceedings of the National Academy of Sciences, 110, 11278–11283. https://doi.org/10.1073/pnas.1309518110

Castellano, G., Paiva, A., Kappas, A., Aylett, R., Hastie, H., Barendregt, W., Nabais, F., & Bull, S. (2013). Towards empathic virtual and robotic tutors. Artificial intelligence in education, 7926, 733–736. https://doi.org/10.1007/978-3-642-39112-5_100

Chen, H., Park, H. W., & Breazeal, C. (2020). Teaching and learning with children: Impact of reciprocal peer learning with a social robot on children’s learning and emotive engagement. Computers & Education, 150. https://doi.org/10.1016/j.compedu.2020.103836

Davis, F. D., Bagozzi, R. P., & Warshaw, P. R. (1992). Extrinsic and intrinsic motivation to use computers in the workplace. Journal of Applied Social Psychology, 22. https://doi.org/10.1111/j.1559-1816.1992.tb00945.x

de Wit, J., Brandse, A., Krahmer, E., & Vogt, P. (2020). Varied human-like gestures for social robots: Investigating the effects on children’s engagement and language learning. Proceedings of the 2020 ACM/IEEE International Conference on Human-Robot Interaction, 359–367. https://doi.org/10.1145/3319502.3374815

DiSalvo, C. F., Gemperle, F., Forlizzi, J., & Kiesler, S. (2002). All robots are not created equal: The design and perception of humanoid robot heads [ACM]. Proceedings of the 4th Conference on Designing Interactive Systems: Processes, Practices, Methods, and Techniques, 321–326.

Duffy, B. R. (2003). Anthropomorphism and the social robot. Robotics and Autonomous Systems, 42, 177–190.


Dunn, L. M., & Dunn, D. M. (2007). Peabody picture vocabulary test–4th edition. Bloomington: NCS Pearson.

Farrell, P. (2010). School psychology: Learning lessons from history and moving forward. School Psychology International, 31, 581–598. https://doi.org/10.1177/0143034310386533

Foster, M. A., Lambert, R., Abbott-Shim, M., McCarty, F., & Franze, S. (2005). A model of home learning environment and social risk factors in relation to children’s emergent literacy and social outcomes. Early Childhood Research Quarterly, 20, 13–36. https://doi.org/10.1016/j.ecresq.2005.01.006

Golonka, E. M., Bowles, A. R., Frank, V. M., Richardson, D. L., & Freynik, S. (2014). Technologies for foreign language learning: A review of technology types and their effectiveness. Computer Assisted Language Learning, 27. https://doi.org/10.1080/09588221.2012.700315

Gomez, E. A., Wu, D., & Passerini, K. (2010). Computer-supported team-based learning: The impact of motivation, enjoyment and team contributions on learning outcomes. Computers & Education, 55, 378–390. https://doi.org/10.1016/j.compedu.2010.02.003

Gomez, E. A., Wu, D., Passerini, K., & Bieber, M. (2007). Utilizing web tools for computer-mediated communication to enhance team-based learning. International Journal of Web-Based Learning and Teaching Technologies (IJWLTT), 2, 21–37. https://doi.org/10.4018/jwltt.2007040102

Gouaillier, D., Hugel, V., Blazevic, P., Kilner, C., Monceaux, J. O., Lafourcade, P., & Maisonnier, B. (2009). Mechatronic design of NAO humanoid. 2009 IEEE International Conference on Robotics and Automation, 769–774.


Haßler, B., Major, L., & Hennessy, S. (2016). Tablet use in schools: A critical review of the evidence for learning outcomes. Journal of Computer Assisted Learning, 32, 139–156. https://doi.org/10.1111/jcal.12123

Hein, M., & Nathan-Roberts, D. (2018). Socially interactive robots can teach young students language skills; a systematic review. Proceedings of the Human Factors and Ergonomics Society - Annual Meeting, 62, 1083–1087. https://doi.org/10.1177/1541931218621249

Inspectie van het Onderwijs. (2019). De staat van het primair onderwijs 2019. https://www.onderwijsinspectie.nl/documenten/rapporten/2019/04/10/deelrapport-primair-onderwijs

Jansen, B. (2019). Social robots for (second) language learning in Dutch primary schools.

Katsarova, I. (2020). Teaching careers in the EU: Why boys do not want to be teachers. https://www.europarl.europa.eu/RegData/etudes/BRIE/2019/642220/EPRS_BRI(2019)642220_EN.pdf

Kayapinar, U., Erkir, S., & Ko, N. (2019). The effect of tablet use on students’ success in English as a foreign language (EFL) grammar classroom. Educational Research and Reviews, 14, 178–189. https://doi.org/10.5897/ERR2018.3670

Kennedy, J., Baxter, P., & Belpaeme, T. (2015). The robot who tried too hard: Social behaviour of a robot tutor can negatively affect child learning. Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction - HRI ’15, 67–74. https://doi.org/10.1145/2696454.2696457

Konijn, E. A., & Hoorn, J. F. (2017a). Parasocial interaction and beyond: Media personae and affective bonding. In P. Rössler, C. A. Hoffner, & L. van Zoonen (Eds.), The International Encyclopedia of Media Effects, 1–15.


Konijn, E. A., & Hoorn, J. F. (2017b). Robot tutor and pupils’ educational ability: Teaching the times tables. Computers & Education, 159. https://doi.org/10.1016/j.compedu.2020.103970

Konijn, E. A., Smakman, M., & van den Berghe, R. (2020). Use of robots in education. The International Encyclopedia of Media Psychology, 1.

Konijn, E. A., van de Schoot, R., Winter, S. D., & Ferguson, C. J. (2015). Possible solution to publication bias through Bayesian statistics, including proper null hypothesis testing. Communication Methods and Measures, 280–302. https://doi.org/10.1080/19312458.2015.1096332

Konishi, H., Kanero, J., Freeman, M. R., Golinkoff, R. M., & Hirsh-Pasek, K. (2014). Six principles of language development: Implications for second language learners. Developmental Neuropsychology, 39, 404–420. https://doi.org/10.1080/87565641.2014.931961

Kory-Westlund, J. M., & Breazeal, C. L. (2014). Storytelling with robots: Learning companions for preschool children’s language development. 23rd IEEE International Symposium on Robot and Human Interactive Communication, 2014, 643–648. https://doi.org/10.1109/roman.2014.6926325

Kory-Westlund, J. M., Dickens, L., Jeong, S., Harris, P., DeSteno, D., & Breazeal, C. (2015). A comparison of children learning new words from robots, tablets, people. Conference Proceedings of New Friends: The 1st International Conference on Social Robots in Therapy and Education, 7–8.

Kory-Westlund, J. M., Dickens, L., Jeong, S., Harris, P., DeSteno, D., & Breazeal, C. (2015). The interplay of robot language level with children’s language learning during storytelling. Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction Extended Abstracts, 65–66.


Laevers, F., Daems, M., De Bruyckere, G., Declercq, B., Silkens, K., Snoeck, G., & van Kessel, M. (2005). Well-being and involvement in care: A process-oriented self-evaluation instrument for care settings (SICS). Kind & Gezin and Research Centre for Experiential Education.

Leseman, P. P. M. (2000). Bilingual vocabulary development of Turkish preschoolers in the Netherlands. Journal of Multilingual and Multicultural Development, 21, 93–112. https://doi.org/10.1080/01434630008666396

Leseman, P. P. M., Henrichs, L. F., Blom, E., & Verhagen, J. (2019). Young monolingual and bilingual children’s exposure to academic language as related to language development and school achievement. Cambridge: Cambridge University Press.

Leyzberg, D., Spaulding, S., Toneva, M., & Scassellati, B. (2014). Personalizing robot tutors to individuals’ learning differences. Proceedings of the 2014 ACM/IEEE International Conference on Human-Robot Interaction - HRI ’14, 423–430. https://doi.org/10.1145/2559636.2559671

Marulis, L. M., & Neuman, S. B. (2010). The effects of vocabulary intervention on young children’s word learning: A meta-analysis. Review of Educational Research, 80, 300–335. https://doi.org/10.3102/0034654310377087

Mol, S. E., Bus, A. G., de Jong, M. T., & Smeets, D. J. H. (2008). Added value of dialogic parent–child book readings: A meta-analysis. Early Education and Development, 19, 7–26. https://doi.org/10.1080/10409280701838603

Mondaca Bustos, V. (2019). Storytelling with a robot tutor.

Moore, J. B., Yin, Z., Hanes, J., Duda, J., Gutin, B., & Barbeau, P. (2009). Measuring enjoyment of physical activity in children: Validation of the physical activity enjoyment scale. Journal of Applied Sport Psychology, 21, s116–s129.


Moriguchi, Y., Okanda, M., & Itakura, S. (2008). Young children’s yes bias: How does it relate to verbal ability, inhibitory control, and theory of mind? First Language, 28, 431–442. https://doi.org/10.1177/0142723708092413

Otterborn, A., Schönborn, K., & Hultén, M. (2019). Surveying preschool teachers’ use of digital tablets: General and technology education related findings. International Journal of Technology and Design Education, 717–737. https://doi.org/10.1007/s10798-018-9469-9

Park, H. W., Grover, I., Spaulding, S., Gomez, L., & Breazeal, C. (2019). A model-free affective reinforcement learning approach to personalization of an autonomous social robot companion for early literacy education. The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19), 33. https://doi.org/10.1609/aaai.v33i01.3301687

Paslawski, T. (2005). The clinical evaluation of language fundamentals, fourth edition (CELF-4): A review. Canadian Journal of School Psychology, 20, 129–134. https://doi.org/10.1177/0829573506295465

Pereira, A., Martinho, C., Leite, I., & Paiva, A. (2008). iCat, the chess player: The influence of embodiment in the enjoyment of a game. In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems, 3, 1253–1256.

Pulido, J. C., González, J. C., Suárez-Mejías, C., Bandera, A., Bustos, P., & Fernández, F. (2017). Evaluating the child–robot interaction of the NAOTherapist platform in pediatric rehabilitation. International Journal of Social Robotics, 9, 343–358. https://doi.org/10.1007/s12369-017-0402-2

Randall, N. (2019). A survey of robot-assisted language learning (RALL). ACM Transactions on Human-Robot Interaction, 9, 36. https://doi.org/10.1145/3345506

Scheele, A. F., Leseman, P. P. M., & Mayo, A. Y. (2010). The home language environment of monolingual and bilingual children and their language proficiency. Applied Psycholinguistics, 31, 117–140. https://doi.org/10.1017/S0142716409990191

Schmider, E., Ziegler, M., Danay, E., Beyer, L., & Bühner, M. (2010). Is it really robust? Methodology, 6, 147–151. https://doi.org/10.1027/1614-2241/a000016

Schodde, T., Hoffmann, L., Stange, S., & Kopp, S. (2019). Adapt, explain, engage: A study on how social robots can scaffold second-language learning of children. 9 (1). https://doi.org/10.1145/3366422

Sinoo, C., van der Pal, S., Blanson Henkemans, O. A., Keizer, A., Bierman, B. P. B., Looije, R., & Neerincx, M. A. (2018). Friendship with a robot: Children’s perception of similarity between a robot’s physical and virtual embodiment that supports diabetes self-management. Patient Education and Counseling, 101, 1248–1255. https://doi.org/10.1016/j.pec.2018.02.008

Skinner, E. A., & Belmont, M. J. (1993). Motivation in the classroom: Reciprocal effects of teacher behavior and student engagement across the school year. Journal of Educational Psychology, 85, 571. https://doi.org/10.1037/0022-0663.85.4.571

Tamim, R. M., Borokhovski, E., Pickup, D., Bernard, R. M., & El Saadi, L. (2015). Tablets for teaching and learning: A systematic review and meta-analysis [Report]. Retrieved from Commonwealth of Learning (COL) website: http://oasis.col.org/handle/11599/1012

Traag, T. (2018). Leerkrachten in het basisonderwijs.

Valli, A. (2008). The design of natural interaction. Multimedia Tools and Applications, 38, 295–305. https://doi.org/10.1007/s11042-007-0190-z

van den Berghe, R., Verhagen, J., Oudgenoeg-Paz, O., van der Ven, S., & Leseman, P. P. M. (2018). Social robots for language learning: A review. Review of Educational


van den Berghe, R., de Haas, M., Oudgenoeg-Paz, O., Krahmer, E., Verhagen, J., Vogt, P., Willemsen, B., de Wit, J., & Leseman, P. (2020). A toy or a friend? Children’s anthropomorphic beliefs about robots and how these relate to second-language word learning. Journal of Computer Assisted Learning, 1–15. https://doi.org/10.1111/jcal.12497

van Baars, L. (2020). De groeiende greep van big tech op het digitale onderwijs. Trouw. Retrieved November 12, 2020, from https://www.trouw.nl/onderwijs/de-groeiende-greep-van-big-tech-op-het-digitale-onderwijs~b4cbc7be/

van Lehn, K. (2011). The relative effectiveness of human tutoring, intelligent tutoring systems, and other tutoring systems. Educational Psychologist, 46, 197–221. https://doi.org/10.1080/00461520.2011.611369

Vogt, P., van den Berghe, R., de Haas, M., Hoffmann, L., Kanero, J., Mamus, E., & Pandey, A. K. (2019). Second language tutoring using social robots: A large-scale study. IEEE/ACM Int. Conf. on Human-Robot Interaction (HRI 2019). https://pub.uni-bielefeld.de/record/2933181

Vygotsky, L. S. (1980). Mind in society: The development of higher psychological processes. Harvard University Press.

Wayne, A. J., & Youngs, P. (2003). Teacher characteristics and student achievement gains: A review. Review of Educational Research, 73, 89–122.


Appendix

Descriptive statistics, as in Table A1

Teachers scored productive and receptive vocabulary levels for each child (1 = below average, 2 = average, 3 = above average), with a mean of 2.6 (SD = 0.6) for receptive, and a mean of 2.2 (SD = 0.7) for productive vocabulary levels. These two variables combined produced the language level, with a mean of 2.4 (SD = 0.6). Baseline (T0) was measured as an indication of the initial language level and produced a mean of 16.4 (SD = 8.7). The first exercise (E1) tested seven target words (M = 7.03, SD = 4.8) and the second exercise (E2) included six new words, as well as the target words from the previous exercise (M = 15.1, SD = 6.4). The variable immediate post-test (T1), obtained during the third exercise (E3) included seven new words, as well as words tested in the first and second exercise and had a mean of 27.1 (SD = 9.0). Delayed post-test scores were 26.1 (SD = 9.6) on average.

Engagement and enjoyment were obtained through questionnaire self-report and observation by the experimenters, scored on a scale of 1-4 (1 = not at all engaged in/enjoying the task, 4 = totally engaged/enjoying the task). Engagement scored on average over all three measurement times (E1, E2, E3) 3.0 (SD = 0.4), enjoyment scored 3.0 (SD = 0.5) over time as well. Humanness and liking of the story were scored on a scale of 1-4 (1 = not human at all/didn’t like the story at all, 4 = very human-like/liked the story very much) by participating children. For perceived humanness, this produced a mean of 2.8 (SD = 0.7). On average, children scored the story 3.4 (SD = 0.6).


Additional figures and tables

Table A1

Variable                    M      SD
Age^a                       65.44  8.17
Male^e                      61.9   0.5
Class                       2.2    1.2
Background                  2.7    0.7
Bilingual^e                 57.1   0.5
Receptive vocabulary^b      2.6    0.6
Productive vocabulary^b     2.2    0.7
Language level^b            2.4    0.6
Baseline (T0)^c             16.4   8.7
Immediate post-test (T1)^c  27.1   9.0
Delayed post-test (T2)^c    26.1   9.6
Engagement^d                3.0    0.4
Enjoyment^d                 3.0    0.5
Perceived humanness^d       2.8    0.7
Like the story^d            3.4    0.6

N = 63. Note. Means and standard deviations of demographic variables, learning outcomes, engagement, enjoyment, perceived humanness and general liking. ^a Measured in months. ^b Measured on scale 1 – 3. ^c


Table A2

| Variable | Social Robot M | Social Robot SD | Neutral Robot M | Neutral Robot SD | Tablet M | Tablet SD | Kruskal-Wallis χ² | Kruskal-Wallis p |
|---|---|---|---|---|---|---|---|---|
| Ageᵃ | 67.04 | 8.10 | 65.50 | 8.26 | 63.55 | 8.15 | 1.86 | .40 |
| Male | 1.35 | 0.49 | 1.30 | 0.47 | 1.50 | 0.51 | 1.84 | .40 |
| Class | 2.30 | 1.22 | 2.00 | 1.12 | 2.15 | 1.18 | 0.70 | .71 |
| Background | 2.65 | 0.78 | 2.70 | 0.73 | 2.70 | 0.73 | 0.06 | .97 |
| Bilingual | 1.52 | 0.51 | 1.60 | 0.50 | 1.60 | 0.50 | 0.36 | .84 |
| Receptive | 2.74 | 0.45 | 2.50 | 0.76 | 2.60 | 0.60 | 0.90 | .64 |
| Productive | 2.39 | 0.66 | 2.00 | 0.79 | 2.15 | 0.67 | 3.12 | .21 |
| Language level | 2.57 | 0.51 | 2.25 | 0.72 | 2.38 | 0.56 | 2.46 | .29 |
| Baseline (T0) | 17.74 | 8.46 | 16.10 | 8.42 | 15.05 | 9.38 | 1.15 | .56 |

Note. Kruskal-Wallis test with demographic variables and initial language level. N = 23, N = 20 and N = 20, respectively.
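As an illustration of the test behind Table A2, the following is a minimal sketch of the Kruskal-Wallis H statistic in plain Python. The function names (`average_ranks`, `kruskal_wallis_h`, `p_value_three_groups`) and the example ages are hypothetical and not taken from the study, and the software actually used for the analysis is not stated here. With three groups, the statistic is referred to a χ² distribution with 2 degrees of freedom, whose upper-tail probability has the closed form exp(-H/2).

```python
import math
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)

    weighted_rank_sums = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_rank_sums += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * weighted_rank_sums - 3 * (n + 1)

    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n), t = size of each tie group.
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    correction = 1 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


def p_value_three_groups(h):
    """Upper-tail chi-square probability for df = 2 (three groups only)."""
    return math.exp(-h / 2)


# Hypothetical ages in months for three small groups (not study data).
social = [70, 66, 72, 61]
neutral = [65, 68, 59, 63]
tablet = [60, 64, 58, 67]
h_stat = kruskal_wallis_h(social, neutral, tablet)
print(round(h_stat, 2), round(p_value_three_groups(h_stat), 2))
```

For instance, `p_value_three_groups(3.12)` evaluates to roughly .21, in line with the value reported for productive vocabulary in Table A2.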


Table A3

| | Age | Male | Class | Background | Bilingual | Rec. | Prod. | Language level | T0 | T1 | T2 | PH | Enjoy. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Age | 1 | | | | | | | | | | | | |
| Male | -.13 | 1 | | | | | | | | | | | |
| Class | -.04 | -.21 | 1 | | | | | | | | | | |
| Background | .04 | .07 | -.12 | 1 | | | | | | | | | |
| Bilingual | .08 | .09 | .10 | .41∗∗∗ | 1 | | | | | | | | |
| Receptive voc. | .31∗ | .01 | -.08 | -.12 | .01 | 1 | | | | | | | |
| Productive voc. | .15 | .04 | .06 | -.15 | -.08 | .63∗∗∗ | 1 | | | | | | |
| Language level | .24 | -.00 | .03 | -.17 | -.06 | .83∗∗∗ | .95 | 1 | | | | | |
| Baseline (T0) | .37∗∗ | -.41∗∗∗ | .29∗ | -.30∗ | -.23 | .39∗∗∗ | .47∗∗∗ | .51∗∗∗ | 1 | | | | |
| Immediate post-test (T1) | .27∗ | -.47∗∗∗ | .40∗ | -.31∗ | -.20 | .30∗ | .33∗∗ | .38∗∗ | .89∗∗∗ | 1 | | | |
| Delayed post-test (T2) | .29∗ | -.47∗∗∗ | .39∗∗ | -.38∗∗ | -.23 | .37∗∗ | .37∗∗ | .43∗∗∗ | .905∗∗∗ | .968∗∗∗ | 1 | | |
| Perceived humanness | .03 | .02 | -.09 | -.09 | .03 | .02 | .03 | -.01 | -.14 | .07 | -.07 | 1 | |
| Enjoyment | -.03 | .04 | .02 | -.06 | -.08 | -.07 | .02 | .02 | -.04 | -.06 | .07 | .71∗∗∗ | 1 |
| Engagement | .26∗ | -.25∗ | .19 | -.07 | .04 | .07 | .21 | .19 | .38∗∗ | .39∗∗ | .44∗∗∗ | .37∗∗ | .52∗∗ |

Note. Correlations of demographic variables, learning outcomes, perceived humanness, enjoyment and engagement. N = 63.

∗∗∗ Correlation is significant at the 0.001 level (2-tailed).

∗∗ Correlation is significant at the 0.01 level (2-tailed).

∗ Correlation is significant at the 0.05 level (2-tailed).

a. Language level is the mean score of receptive and productive vocabulary, as indicated by the teacher.

Abbreviations: rec. = receptive vocabulary; prod. = productive vocabulary; T0 = baseline test; T1 = immediate post-test ; T2 = delayed post-test; PH = perceived humanness; enjoy. = enjoyment (mean).


Figure A1

Note. Children’s mean scores on the language test with error bars (95% confidence interval) over time, organized by the behavior of the robot they exercised with (green: socially behaving; grey: neutrally behaving). No significant effect of the robot’s behavior was found.


Figure A2

Note. Example of a sheet with four pictures, one containing the target word and the other three


Figure A3

Note. A flowchart of the design of this study.

Figure A4

Note. A flowchart of the times of testing. At every storytelling exercise, target words of previous exercises were incorporated. To create the variable ‘Immediate learning outcomes’, the scores of the productive vocabulary task at the last storytelling exercise, containing all 20 target words,


Extended analyses

Extended analysis hypothesis 1: effect of embodiment on learning outcomes

To analyse at which measurement times the groups differed, Wilcoxon rank-sum tests were used to compare difference scores (T2 - T0, T1 - T0 and T2 - T1) between the robot group (N = 43) and the tablet group (N = 20). The increase in test scores between the baseline and the delayed post-test was significantly higher for the robot group (M_increase = 10.67, SD = 4.48) than for the tablet group (M_increase = 7.70, SD = 4.90), W = 588.5, p = .019. The increase between the baseline and the immediate post-test did not differ significantly between groups (W = 522, p = .18). Interestingly, the change between the immediate and the delayed post-test differed marginally between groups (robot: M_increase = -0.72, SD = 2.58; tablet: M_increase = -1.75, SD = 2.57; W = 558.5, p = .05): scores declined slightly in both groups, with a smaller decline in the robot group.
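The comparison of difference scores described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `gains` and `rank_sum_w` and all scores are hypothetical, and W is assumed to follow the convention used by R's `wilcox.test` (the number of robot/tablet pairs in which the robot child's gain is larger, with ties counted as one half). Under that convention W lies between 0 and the product of the group sizes, and half-integer values arise when gains are tied.

```python
def gains(pre_scores, post_scores):
    """Per-child difference scores (post minus pre)."""
    return [post - pre for pre, post in zip(pre_scores, post_scores)]


def rank_sum_w(x, y):
    """Rank-sum statistic W for group x versus group y.

    Counts the (x, y) pairs with x > y, adding 0.5 for each tied pair.
    This equals the rank sum of x minus n_x * (n_x + 1) / 2.
    """
    w = 0.0
    for a in x:
        for b in y:
            if a > b:
                w += 1.0
            elif a == b:
                w += 0.5
    return w


# Hypothetical baseline (T0) and delayed post-test (T2) scores, not study data.
robot_t0, robot_t2 = [10, 15, 20, 12], [22, 24, 31, 20]
tablet_t0, tablet_t2 = [11, 14, 18], [17, 23, 25]

robot_gain = gains(robot_t0, robot_t2)
tablet_gain = gains(tablet_t0, tablet_t2)
w = rank_sum_w(robot_gain, tablet_gain)
print(robot_gain, tablet_gain, w, len(robot_gain) * len(tablet_gain))
```

A W close to the maximum (here the product of the two group sizes) means that almost every child in the first group gained more than almost every child in the second; turning W into a p-value requires its null distribution or a normal approximation, which this sketch leaves out.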

Extended analysis hypothesis 2: effect of embodiment on engagement and enjoyment

There was a significant main effect of device (F(1, 61) = 36.15, p < .001, η² = 0.37). Wilcoxon tests with Bonferroni correction showed that engagement was higher in the robot condition (N = 43) than in the tablet condition (N = 20) at the first (M_robot = 3.08, SD = 0.43; M_tablet = 2.67, SD = 0.54, W = 433.5, p = .013), second (M_robot = 3.15, SD = 0.35; M_tablet = 2.68, SD = 0.51, W = 397.0, p = .001) and third (M_robot = 3.29, SD = 0.25; M_tablet = 2.64, SD = 0.40, W = 284.5, p < .001) tutoring session.
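The effect size of the main effect of device can be related to its F statistic. The sketch below assumes that the reported η² is a partial eta squared, which the text does not state explicitly; the function name `partial_eta_squared` is a hypothetical helper. Under that assumption, the effect size equals F · df_effect / (F · df_effect + df_error).

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    explained = f_value * df_effect
    return explained / (explained + df_error)


# Main effect of device as reported in the text: F(1, 61) = 36.15.
print(round(partial_eta_squared(36.15, 1, 61), 2))
```

With F(1, 61) = 36.15 this gives about 0.37, matching the value reported for the main effect of device.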

Enjoyment of the task was likewise higher with the robot at the first (M_robot = 3.18, SD = 0.36; M_tablet = 2.65, SD = 0.54, W = 374.5, p < .001), second (M_robot = 3.07, SD = 0.45; M_tablet = 2.55, SD = 0.65, W = 437.5, p = .02) and third (M_robot = 3.11, SD = 0.46; M_tablet = 2.59, SD = 0.51, W = 391.0, p = .001) tutoring session, indicating that children were more engaged in the exercise with the robot and enjoyed it more.
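The Bonferroni correction applied to the session-wise comparisons can be sketched as follows. The p-values in the example are hypothetical, and whether the p-values reported above are the corrected or the uncorrected ones is not stated, so the sketch only shows the mechanics: each p-value is multiplied by the number of comparisons (capped at 1) before it is compared with α. The function name `bonferroni` is an assumption for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p-value, significant?) for each raw p-value."""
    m = len(p_values)
    results = []
    for p in p_values:
        adjusted = min(1.0, p * m)  # multiply by the number of comparisons
        results.append((adjusted, adjusted < alpha))
    return results


# Hypothetical raw p-values for three tutoring sessions (not study data).
for session, (p_adj, significant) in enumerate(bonferroni([0.004, 0.02, 0.0003]), start=1):
    print(f"session {session}: adjusted p = {p_adj:.4f}, significant = {significant}")
```

An equivalent formulation leaves the p-values untouched and compares each of them with α divided by the number of comparisons, which for three sessions and α = .05 is about .0167.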

There was a significant interaction between time and dependent variable (engagement and enjoyment) (F(2, 60) = 3.74, p = .027, η² = 0.06). There was no significant main effect of time, indicating that children did not become more engaged in the story, or enjoy it more, over time. Furthermore, there was no significant interaction between time and device, meaning children did
