Declaration of Authenticity
MA Applied Linguistics - 2018/2019
MA-thesis
Student name:_______________________________________________________________
Student number:______________________________________________________________
PLAGIARISM is the presentation by a student of an assignment or piece of work which has in fact been copied in whole, in part, or in paraphrase from another student's work, or from any other source (e.g. published books or periodicals or material from Internet sites), without due acknowledgement in the text.
TEAMWORK: Students are encouraged to work with each other to develop their generic skills and increase their knowledge and understanding of the curriculum. Such teamwork includes general discussion and sharing of ideas on the curriculum. All written work must however (without specific authorization to the contrary) be done by individual students. Students are neither permitted to copy any part of another student’s work nor permitted to allow their own work to be copied by other students.
DECLARATION
• I declare that all work submitted for assessment of this MA-thesis is my own work and does not involve plagiarism or teamwork other than that authorised in the general terms above or that authorised and documented for any particular piece of work.
Signed_____________________________________________________________________
Date_______________________________________________________________________
Alberto Vazquez
S3825809
Measuring Fluency in L2 Russian Students
Studying Abroad with Extempore
Alberto Vazquez S3825809
MA Thesis
Department of Applied Linguistics
Faculty of Arts
Rijksuniversiteit Groningen
Supervisors:
dr. S. (Sake) Jager
dr. M.C. (Marije) Michel
21 June 2019

List of Abbreviations

ACMC        asynchronous computer-mediated communication
CALL        computer-assisted language learning
CDST        Complex Dynamic Systems Theory
CEFR        Common European Framework of Reference
CLT         communicative language teaching
CMC         computer-mediated communication
IELTS       International English Language Testing System
L1          first language; native language
L2          second language
MALL        mobile-assisted language learning
SCMC        synchronous computer-mediated communication
TBLT        Task-Based Language Teaching
TOEFL iBT   Test of English as a Foreign Language, internet-based test
TORFL       Test of Russian as a Foreign Language
ZPD         Zone of Proximal Development
Table of Contents

Abstract
Introduction
Literature Review
Oral Fluency
Task Design
Individual Speaking Styles
Statement of Purpose
Method
Participants
Materials
Procedure
Design & Analyses
Results
Research Question 1a
Research Question 1b
Research Question 2
Research Question 3
Discussion
Implications & Limitations
Conclusion
References
Appendix A
Appendix B
Appendix C
Appendix D
Abstract

The current study investigated the gains in oral fluency for a group of L2 Russian learners studying abroad in Saint Petersburg, Russia. Data were gathered in the form of audio recordings using Extempore, an online platform on which the participants were instructed to complete weekly oral tasks for one month. Their gains were analyzed with a pre- and posttest using the following measures: speech rate, pruned speech rate, articulation rate, phonation time ratio, and number of disfluencies. The collected data were then compared to audio samples of their English speech to evaluate their individual speaking styles using the same measures. Feedback was also obtained from the participants in the form of a post-survey questionnaire to inspect their digital literacy, anxiety, and learning strategies. A case-study analysis of the findings reinforced the existing literature, which holds that language learning is a dynamic process and that L2 disfluencies match L1 disfluencies, indicating a need to update current assessment practices. Group gains were found in pruned speech rate, articulation rate, and number of disfluencies. Portfolios of each participant were created to analyze individual gains and individual differences, providing a comprehensive outlook on each participant’s personal L2 development. In addition, learners reported less anxiety when performing tasks on Extempore than when speaking in the classroom.

Keywords: CALL, MALL, technology-mediated TBLT, DST, fluency, disfluency, oral proficiency, Extempore, speech rate, pruned speech rate, articulation rate, phonation time ratio, assessment, individual differences, speaking styles, foreign/second language acquisition, Russian, study abroad, online tasks.
Introduction

Learning a second language (L2), from a student’s point of view, can often seem like a daunting task, requiring many hours of study, dedication, motivation, and what feels like a miracle to achieve a high level of proficiency in a given language, especially in regard to oral fluency. Several factors, of which the student may not be aware, influence the outcome of that process and the level of fluency ultimately attained. These factors, from a teacher’s point of view, are constantly being juggled and taken into account, as the role of the language instructor is to adapt the curriculum to incorporate more authentic material that replicates real-life scenarios and better prepares students to use the language outside of the classroom in a meaningful way, all the while catering to the needs of all the students in a particular classroom in a particular context (whether abroad or in one’s own home country). This process of reciprocity is, of course, continuous and complex, for one does not “arrive” at mastery in a second language (much less concerning the ability to speak fluently). While measuring results is significant, it is also important to track the learner’s progress toward gaining proficiency. One way to improve one’s aptitude is to study abroad, immersed in the target language. Participation in a study abroad program for language learners is both an exciting and harrowing experience. It is commonly believed that studying abroad in the country where the target language is spoken helps students improve their oral proficiency skills, especially concerning languages less accessible to English speakers, such as Russian; in fact, some universities have proposed making it a requirement for language students to go abroad (Brecht,
Davidson, & Ginsberg, 1995). Students often struggle to develop a high level of fluency in a traditional classroom environment, given that classroom activities do not supplant communicative practice with a native speaker (Brecht et al., 1995). Nevertheless, this belief does not provide insight into whether gains actually occur during a study-abroad session, and it disregards the fact that this responsibility remains entirely at the students’ discretion, as they are presumed to be interacting with native speakers outside of their officially enrolled courses. Moreover, class time is limited, and not every student may get the same opportunity to speak during the lesson, putting more pressure on students to find their own interlocutors outside of class, a task that can prove burdensome for some personality types. In addition to acclimating to the everyday customs of an entirely new culture, the students have to battle culture shock, maneuver their way through the intricate cultural system in effect and, on top of that, develop a strong sense of communicative competence. This process can feel overwhelming, produce anxiety, and demotivate students of lower proficiency levels who lack the skills necessary to communicate successfully in the target language with native speakers (Baker-Smemoe, Dewey, Bown, & Martinsen, 2014). A plausible way to augment in-class activities, therefore, is to utilize technology and the affordances it offers to assign online speaking tasks, pitched at the appropriate level, for the students studying abroad to complete outside of class. In fact, research has shown that learners who complete additional oral tasks outside of the classroom using their computers or mobile devices usually outperform learners in control groups in oral proficiency measures (Al-Jarf, 2012; Anaraki, 2009; Hsu, Wang, & Comac, 2008; Lee, 2019).
Technology has been advancing rapidly, offering additional tools for communication and practice that can be applied to L2 learning and teaching. These new technologies have paved the way for new kinds of tasks (González-Lloret & Ortega, 2014) within the field of computer-assisted language learning (CALL), which is still a relatively fledgling discipline. There is considerable need to critically evaluate CALL resources in terms of their relevance and implementation into the L2 curriculum. Unlike those of a textbook, the content and operation of an online L2 resource cannot be easily gleaned, often due to an instructor’s lack of technological competence, their inexperience with the resource from a learner’s mindset, and their inability to properly discern its objectives, rendering evaluation a challenge (Hubbard, 2006). One subdivision of CALL to emerge is known as tutorial CALL, which consists of activities that resemble the grammar exercises that teachers know under the “drill and kill” moniker (Blake, 2011). Despite their inauspicious connotations, these exercises serve a role in the L2 curriculum (Hubbard & Siskin, 2004) and should be analyzed with regard to what Levy (1997) refers to as the tutor-tool distinction. In other words, a resource in the “tutor” role aids in language learning, whereas one utilized simply as a “tool” does not in and of itself foster learning (Levy, 1997). Online tasks, consequently, must pertain to these concepts and profit from the affordances provided by this distinction. Despite these advancements in technology, oral tasks are still often difficult to implement within a CALL environment. Furthermore, as users shift from computers to mobile devices, the demand for platforms available on mobile devices, known as mobile-assisted language learning (MALL), is also quickly developing, and along with it, a younger generation with a stronger
sense of digital literacy. Moreover, these technologies are available now, albeit with some limitations, and can be included in L2 classrooms, assuming the teacher has knowledge of these existing options. One such tutorial CALL (or MALL) resource is the online platform Extempore, which offers students supplemental speaking practice with the guidance of an instructor. It is not a form of self-study, as are some of the popular applications available (e.g. Duolingo, Mango, or Memrise). It can be considered “teacher-centered” in the sense that the students work alone and the tasks are created and assigned by the teacher. Furthermore, the tasks that can be designed on the platform focus on students’ oral production (individual) skills rather than their oral interaction (collaborative) skills. The Extempore platform is compatible with both CALL and MALL learning environments, allowing students the choice of completing tasks either on their personal computer or on a mobile device, such as a tablet or smartphone. Extempore is tailored for use in a blended-learning environment, as students cannot simply enroll in a class without a link to join. It is a form of human-computer interactive communication that gives each student the opportunity to practice his or her speaking skills independently by recording a spontaneous response with the computer. As such, the students have time to process the material in their own way, developing a sense of autonomy. Additionally, it can also be viewed as a form of asynchronous computer-mediated communication (ACMC), since the students’ responses are generated for and listened to by the instructor, who can also provide oral feedback. This type of practice allows the teacher to assess the progress of the students, notice common problem areas in the students’ language use, and adjust the lesson plan accordingly.
The aim of the current study is to investigate the gains in oral fluency for an intact group of L2 Russian learners studying abroad in Saint Petersburg, Russia by gauging the efficacy of online tasks designed with Task-Based Language Teaching (TBLT) and CALL/MALL approaches and principles. The group’s oral fluency was analyzed and compared with the use of pre- and posttests to determine whether any gains were made by the participants, who completed two tasks on Extempore once a week for four weeks. In addition, the participants’ L2 fluency was evaluated alongside their fluency in English, their first language (L1). Lastly, a post-survey questionnaire was administered to gain insight into the participants’ attitudes toward digital tools, learning strategies and preferences, anxiety, confidence, and self-assessment. In the following section, a definition of fluency and a description of the fluency measures are presented before an explanation of the task design procedures, followed by an overview of the appropriate literature to contextualize and provide a theoretical framework for the study.

Literature Review

The ultimate goal of most language learners is to be able to speak their L2; this implies developing communicative competence (Hymes, 1972; Canale & Swain, 1980) that allows learners to use the target language adequately and appropriately in any given real-world setting (Lee, 2019). This task appears the most daunting for students and seems to be the cause of much of the anxiety surrounding L2 learning. Speech is creative and spontaneous by nature, and it is the belief of the researchers that practice in the classroom should have a practical component, one that emulates, prepares, and equips learners with the necessary tools for successful
interactions outside of the classroom. Despite the fact that pedagogy has indeed shifted in recent years to a more communicative language teaching (CLT) approach, the time spent in the classroom alone is still insufficient for students to build their oral fluency skills. In fact, recent research suggests that fluency might actually be neglected in the classroom due to teachers’ much broader interpretation of the term “fluency” (Tavakoli & Hunter, 2018). There is a need to distinguish not only between fluency as it is commonly understood by the general public and as it is used by researchers in the field of Applied Linguistics, but also between teachers’ definition of fluency and its application in the classroom (Tavakoli & Hunter, 2018).

Oral Fluency

What exactly is meant by fluency? In everyday usage, the word serves as an indicator of overall oral proficiency (Luoma, 2004), one that refers to the fluidity of speech (Kormos, 2006), whereas researchers consider the term to represent a single construct within the triad (along with complexity and accuracy) that embodies the notion of oral proficiency (Housen & Kuiken, 2009). This distinction between oral fluency and oral proficiency can also be referred to as the narrow and broad sense of fluency, respectively (Lennon, 1997). Understanding the relationship between these two concepts has led to significant implications for L2 pedagogy (Baker-Smemoe et al., 2014; de Jong, Groenhout, Schoonen, & Hulstijn, 2015; Segalowitz, 2010). Although an examination of complexity and accuracy as measures is a critical part of proficiency, an analysis of these constructs goes beyond the scope of the current study. Instead, an investigation of the development of fluency in the narrow sense was conducted using Lennon’s (2000) definition, which identifies fluency as “the rapid, smooth, accurate, lucid, and
efficient translation of thought or communicative intention under the temporal constraints of online processing” (p. 26). Providing study abroad students with the opportunity to speak develops a strong sense of oral fluency that strengthens the repertoire of skills necessary for them to successfully survive in the host country. Having defined fluency in this narrow sense, the measures used to characterize the term can be further explained in different ways. According to Segalowitz (2010), fluency can be divided into three subcategories: cognitive fluency, which describes the speaker’s ability to produce L2 language; perceived fluency, the listener’s subjective opinion of the speaker’s speech; and utterance fluency, which consists of “the features of utterances that reflect the speaker’s cognitive fluency” (p. 165). This study focused on L2 utterance fluency and some of its many measures. In their previous investigations of fluency measures, both Kormos (2006) and de Jong (2018) report that the best predictors of fluency were speech rate, the total number of syllables uttered in a speech sample per minute; articulation rate, the total number of syllables generated relative to the speaking time; and phonation time ratio, the speaking time as a percentage of the total time taken to produce a speech sample (i.e. speaking time plus silent pauses). Pruned speech rate, which is the total number of syllables minus the disfluent syllables generated in a sample per minute, can also be used as an alternative fluency measure to speech rate (de Jong, 2018). In regard to the number of disfluencies, which include filled pauses, repetitions, and repairs & restarts, research findings reveal mixed results (Kormos, 2006; de Jong, 2018). These disfluent aspects of speech are generally viewed as detractors of eloquent speech (de Jong, 2018), but determining whether disfluencies are accurate measures of oral
fluency can be problematic. Many language tests include disfluencies in their rubrics even though they do not present any comprehension issues for the listener (de Jong, 2018). This notion will be expanded upon in the section below on individual speaking styles. Although many other measures have been used to examine fluency (e.g. mean length of runs, number of silent pauses per minute, and mean duration of silent pauses), the current study will concentrate on the aforementioned measures based on the data collected. Several studies that measure fluency in a CALL/MALL environment deal with CMC, which incorporates Vygotsky’s (1962) notion of the Zone of Proximal Development (ZPD). This approach to language learning asserts that two or more learners working together leads to the negotiation of meaning (Long & Robinson, 1998; Varonis & Gass, 1985). CALL activities have attempted to transfer this type of face-to-face interaction into a virtual setting. In order for this type of learning to take place, opportunities must be created for learners to notice gaps in their own knowledge (Schmidt, 1990). Teachers, however, have remained hesitant or uninformed with respect to the application of technology to teach L2 speaking and develop fluency (Blake, 2017). In fact, many of them adhere to the broad sense of the term fluency, thereby influencing their selection of speaking materials to incorporate in their
classrooms, as well as their grasp on fluency assessment (Tavakoli & Hunter, 2018). Therefore, the efficacy of any task relies heavily on the decision an instructor makes when choosing appropriate CALL activities for the classroom (Blake, 2017). Nevertheless, studies have shown that CMC yields positive results in L2 learning overall, in both synchronous and asynchronous forms (Payne & Whitney, 2002; Hampel & Hauck, 2006; Abrams, 2003; Blake, 2011; Blake, 2017; Sykes, 2005).
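As an illustration only, the four rate-based utterance fluency measures defined earlier in this section could be computed along the lines of the following minimal Python sketch. The SpeechSample structure, its field names, and the example counts are hypothetical; they are not taken from the study's data or from Extempore. Articulation rate is assumed here to be expressed in syllables per minute of speaking time, a unit that the definition above leaves open.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SpeechSample:
    """Hand-annotated counts and timings for one recording (hypothetical structure)."""
    syllables: int             # all syllables uttered, disfluent ones included
    disfluent_syllables: int   # syllables in filled pauses, repetitions, repairs and restarts
    total_time_s: float        # speaking time plus silent pauses, in seconds
    speaking_time_s: float     # total time minus silent pauses, in seconds


def fluency_measures(sample: SpeechSample) -> Dict[str, float]:
    """Return the four measures as defined in the text (rates in syllables per minute)."""
    if sample.total_time_s <= 0 or sample.speaking_time_s <= 0:
        raise ValueError("durations must be positive")
    if sample.speaking_time_s > sample.total_time_s:
        raise ValueError("speaking time cannot exceed total time")

    total_min = sample.total_time_s / 60
    speaking_min = sample.speaking_time_s / 60  # assumed unit: minutes of speaking time

    return {
        # syllables per minute of total time
        "speech_rate": sample.syllables / total_min,
        # as speech rate, but with disfluent syllables removed first
        "pruned_speech_rate": (sample.syllables - sample.disfluent_syllables) / total_min,
        # syllables per minute of speaking time (silent pauses excluded)
        "articulation_rate": sample.syllables / speaking_min,
        # speaking time as a percentage of total time
        "phonation_time_ratio": 100 * sample.speaking_time_s / sample.total_time_s,
    }


if __name__ == "__main__":
    # Invented example: 150 syllables, 12 of them disfluent, in a 60-second
    # recording that contains 45 seconds of actual speech.
    example = SpeechSample(syllables=150, disfluent_syllables=12,
                           total_time_s=60.0, speaking_time_s=45.0)
    for name, value in fluency_measures(example).items():
        print(f"{name}: {value:.1f}")
```

For the invented sample above, the sketch yields a speech rate of 150 and a pruned speech rate of 138 syllables per minute, an articulation rate of 200 syllables per minute of speaking time, and a phonation time ratio of 75%. The number of disfluencies is a simple count and is therefore not included in the function.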
In one study, Lee (2019) examined the attitudes of beginning Spanish students toward VoiceThread, an online CMC Web 2.0 tool that can be used for L2 learning, through online surveys and post-interviews. The learners were divided into small groups of 6-7 students and instructed to complete tasks on VoiceThread after being provided with the proper scaffolding and guidance to encourage individual contributions and collaboration. After a five-week session, the learners completed a post-survey questionnaire using a five-point Likert scale to evaluate their attitudes toward the assigned tasks. The students reported having positive attitudes toward the platform. However, the students’ gains in oral fluency were not analyzed, which was mentioned as a limitation owing to the lack of a control group. In fact, few studies on fluency have tracked gains within speakers over time (de Jong, 2018). While it is important for learners to develop their oral interaction skills, developing oral production skills is just as important. In one study, Al-Jarf (2012) examined the benefits of assigning learners exercises using mobile technology outside of the classroom to improve their oral production skills. The two groups of L2 English learners were both exposed to the same in-class instruction and completed the same in-class exercises and tests; the experimental group, however, was also required to follow a self-study listening and speaking program on their mobile devices. In order to track the groups’ gains, pre- and posttests were administered at the beginning and end of the semester. The results revealed that the experimental group showed higher gains in “listening comprehension, oral expression, fluency, pronunciation correctness, and vocabulary knowledge” (Al-Jarf, 2012, p. 106). Similar results were found in Hsu et al.’s study (2008), which investigated the use of audioblogs to improve oral production skills in L2 English students. Whether or not the results of both studies were due to the extra practice the
experimental group received, the findings point to the benefits of a blended learning environment, aiding learners who supplement class time with additional speaking and listening exercises. Anaraki’s research (2009) confirms this sentiment, illustrating that students have positive associations with using their mobile devices for L2 learning and describe the optimal MALL setting as being “a hybrid model” (p. 35). Therefore, the design and implementation of new online tasks are necessary to ensure their efficacy in this modern context and to ensure that these kinds of tasks accurately reflect empirical findings and harness the affordances of MALL.

Task Design

The tasks for the current study were designed with TBLT principles in mind (Ellis, 2003; Doughty & Long, 2003; Norris, 2009; Samuda & Bygate, 2008; Van den Branden, 2006; Willis, 1996) and made use of relevant features offered on the Extempore platform. The creation of new tasks arose out of a necessity to reflect the affordances now available on digital platforms like these (González-Lloret & Ortega, 2014). Teachers often merely transfer tasks without adapting them to a digital context (Hampel, 2006). Svensson (2004) even warns of the pitfalls of what he calls the “do what you did before” approach. Therefore, it is important to broaden the scope of task theory to include an online environment. The principles of TBLT can be integrated with digital technology in a framework known as technology-mediated TBLT (González-Lloret & Ortega, 2014). While there are several definitions of a task, the current study will be using Willis’s (1996) version: “A goal-oriented activity in which learners use language to achieve a real outcome. In other words, learners use whatever target language resources they have in order to solve a problem, do a puzzle, play a
game or share and compare experiences” (p. 53). Tasks, in other words, are a means to an end. “They are justified by the fact that they serve an overall educational purpose” (González-Lloret & Ortega, 2014, p. 7). This notion of a task illustrates the pragmatic elements that study-abroad students require and aptly complements the previously discussed fluency theory. According to Skehan (1998), there are five main characteristics that comprise a task: a primary focus on meaning, which emphasizes an implicit focus on language; learner-centeredness, which addresses the learners’ individual needs and allows for flexible responses; goal orientation, which can include a communicative purpose and outcome; a pragmatic application, which offers real-world use of language; and reflective learning, which encourages learners to reflect on the learning process. In this way, tasks can be applied to digital technologies and allow learners to develop their L2 skills, including oral fluency. The two main characteristics driving task creation for this study were a primary focus on meaning and a pragmatic application. The speaking tasks were also designed with reference to some of the examples described by Luoma (2004). These include the following: description tasks, narrative tasks, comparing and contrasting tasks, explaining and predicting tasks, reacting in situations tasks, and decision tasks. All of these tasks were selected because they complement a technology-mediated TBLT approach and serve as a good model for oral production skills. The prompts could also be adjusted for oral interaction skills; however, given the focus on oral production skills on Extempore, this was not imperative. In order to create tasks in Russian, the Test of Russian as a Foreign Language (TORFL, or in Russian, Тест по русскому как иностранному) was referenced. The TORFL was
developed by the Lomonosov State Moscow University, Pushkin State Russian Language Institute, the Peoples' Friendship University of Russia, and the Saint Petersburg State University in order to assess the communicative competence of Russian foreign language learners (Dolzhikova, Kurilenko, Ivanova, Pomortseva, & Kulikova, 2015). The test is also used by teachers to comply with Russian foreign language teaching practices (Andryushina, 2009). It is composed of 6 levels, which roughly correspond with the 6 levels of the Common European Framework of Reference (CEFR) (see Table 1 for more information). The overall goal of Level 1 is to provide the necessary basis for successful communication in a linguistic environment using a limited set of language tools (Andryushina, 2009).

Table 1
Comparison of Tests in Russian

TORFL              CEFR user category    CEFR level
Elementary         Basic User            A1 Breakthrough
Basic              Basic User            A2 Waystage
1st Certificate    Independent User      B1 Threshold
2nd Certificate    Independent User      B2 Vantage
3rd Certificate    Proficient User       C1 Effective Operational Proficiency
4th Certificate    Proficient User       C2 Mastery

Individual Speaking Styles

Everyone has their own style of speaking, using different registers and adapting their speech to correspond to any given situation; some people are more eloquent than others in their
native language. Following this logic, it would make sense that an individual’s speaking style in their L1 carries over into their style of speaking in their L2, including the speaker’s disfluencies. Indeed, recent studies have confirmed this tendency (de Jong et al., 2015; Baker-Smemoe et al., 2014; Riazantseva, 2001). In other words, someone with a predilection for filler words (such as “um” and “like”) in their L1 is likely to use such fillers in their L2 speech as well. This can also apply to other measures of fluency and disfluency patterns such as pauses, hesitations, and restarts & repairs. As a result, these observations call into question the evaluation methods of oral fluency used in language testing. In its rubric for speaking, the International English Language Testing System (IELTS) mentions “hesitations” and “pauses” for the lower level bands and “repetitions” and “self-repairs” for the higher level bands (IELTS, n.d.). The Test of English as a Foreign Language, internet-based test (TOEFL iBT) also mentions “pauses and hesitations” at the lower levels, whereas at the higher levels “pace” and “automaticity” are specified but nothing is said about disfluencies (Educational Testing Service, 2004). In the CEFR’s criteria, on the other hand, the phrase “smooth flow” consistently permeates the rubric for spoken language use at the higher levels; at the lower levels, the rubric indicates a number of disfluencies: “pauses,” “repair communication,” and “false starts and reformulation” (Council of Europe, 2001). The TORFL rubric for the speaking section only mentions that the learner should be able to produce “coherent and logical statements” and focuses more on the content and its relevance to the prompt of the oral production tasks (Andryushina, 2009). This is a description of the overall assessment of the speaking test. More task-specific evaluations are provided with scoring
measures that do indicate fluency more explicitly as following the “norms” of the Russian language (Andryushina, 2009). Could this imply that fluency measures are determined by language? In a study by de Jong et al. (2015), the fluency measures of two typologically different L1 language groups (Turkish and English) with Dutch as L2 were analyzed, and the results show that L2 fluency behavior can be predicted based on L1 speech patterns in both groups. However, they also mention that there were differences observed between the L1 fluency groups. Although the L1 did not have an effect on the groups’ L2 fluency in Dutch, the cross-linguistic variance that was discovered was attributed to syllable duration and word length. Another study, by Baker-Smemoe et al. (2014), examined the measures of L2 fluency across several different languages (including Russian) and proficiency levels, and its results suggest that “L2 utterance fluency measures may be language specific rather than universal” (p. 725). Their study also revealed that these measures may not correspond with the test scores of lower-level L2 learners. L2 instructors and testers should be aware of these differences in order to better support and assess the development of their learners’ fluency abilities. Implementing this change and sampling both languages of a learner’s pair for assessment purposes could be challenging, which is something that de Jong et al. (2015) acknowledge. Nevertheless, it could provide learners with a more precise assessment of their L2 skills. Riazantseva (2001) conducted a cross-linguistic study comparing Russian and English native speakers learning the other language of the pair. In her study, the pausing patterns of L1 Russian speakers differed from the pausing patterns of L1 English speakers. In other words, the pause durations in L1 Russian were found to be longer than those in L1 English, suggesting that transferring this pausing
behavior into L2 English or Russian may be interpreted as non-nativelike. While the differences in de Jong et al. (2015) were explained as typological differences, Riazantseva (2001) attributes the differences in her study to cultural differences. Further cross-linguistic research is necessary to investigate the influence of culture and typology on fluency measures. Moreover, most studies on fluency have assessed objective measures of utterance fluency, while fewer studies have tracked gains within speakers over time (de Jong, 2018). The current study will attempt to explore these gaps. Even with the recommended adjustments for L2 language proficiency testing, as well as the ongoing changes transforming current curricula with the implementation of online L2 learning tools, noticing the individual differences that each learner exhibits while in the classroom remains a critical component of L2 development. Individual differences have been shown to have an impact on L2 learning (Dörnyei, 2009), yet one of the challenges that remains is developing a lesson plan that takes into account each student’s individual differences, especially as class sizes continue to grow. This can be problematic with regard to oral fluency. As mentioned above, time spent in the classroom is limited, which invariably leads to some students practicing more than others. Even in a best-case scenario, this would typically amount to an hour a day, two or three times a week. One possible solution, already explored above, is to assign oral fluency tasks for students to complete outside the classroom. These types of tasks allow each student not only to develop their fluency skills but also to reduce their anxiety about speaking in the target language. In Kessler’s (2010) study, students preferred recording themselves on their mobile devices to recording in an audio laboratory, which implies that letting students record these tasks in an environment of their choice is an important factor in reducing
the students’ level of anxiety. Moreover, if lower anxiety leads to better performance, then the teacher can more accurately monitor and assess the students’ capabilities both in and out of the classroom. Further research examining anxiety when completing fluency tasks in a MALL environment would be beneficial. Anxiety is but one of many individual differences. Other factors that could play a role in developing or hindering L2 skills include learning strategies, confidence, and attitudes toward digital tools. To further complicate the process, these are not static but fluctuate at any given moment. Complex Dynamic Systems Theory (CDST) acknowledges these other individual differences that influence L2 development, describing the process as dynamic, nonlinear, and highly variable, fluctuating at any given point in time within an array of interconnected subsystems that self-organize (Larsen-Freeman, 1997; de Bot, Lowie, & Verspoor, 2005). This more complex perspective on second language learning recognizes that language consists of several subsystems that are interconnected and interact, thereby ensuring that no two learners’ language-learning journeys are ever the same, as their life trajectories, behaviors, attitudes, backgrounds, cognitive abilities, and many more factors dictate their tortuous learning paths, with various periods of improvement, stagnation, and attrition. This approach renders language-learning development unpredictable, variable, and individualized, which makes its application in large L2 classrooms quite a demanding task (Lowie, 2013). From this perspective, the fruit of one’s language-learning labor, years of hard work enshrined on a language certificate, may not accurately reflect the realities of one’s L2 skills; rather, it is merely a reflection of one point in time across one’s L2 development. While tests are an important form of assessment, used by governments for citizenship, universities for student
enrollment purposes, or companies for professional contexts, a potential alternative to demonstrate a learner’s skill sets and level of proficiency would be a language portfolio, one that includes a detailed, longitudinal history of one’s progress in the language (Lowie, 2013). Subsequently, oral fluency will be examined in this context as dynamic action, affected by internal and external forces and established by language cues, nonlinearity, and continual growth.

Statement of Purpose

To summarize the above description of the literature and explain the relevant factors that guided the operationalization of the current study, students studying abroad could benefit from a blended learning environment by completing speaking tasks, which give students an extra opportunity to speak beyond the classroom and help teachers track their students’ progress (Al-Jarf, 2012; Anaraki, 2009; Hsu et al., 2008; Lee, 2019). Teachers also need to narrow their definition of fluency to better support the development of oral fluency skills and communicative competence (Hymes, 1972; Canale & Swain, 1980; Tavakoli & Hunter, 2018). Previous studies have found that speech rate, phonation time ratio, and articulation rate are good indicators of fluency (Kormos, 2006; de Jong, 2018). Disfluencies and pruned speech rate will also be examined (de Jong, 2018; de Jong et al., 2015). Pauses, on the other hand, may not be a good measure for L2 Russian learners (Riazantseva, 2001) and will not be analyzed in the current study. Tasks were designed in accordance with technology-mediated TBLT principles (Willis, 1996; Skehan, 1998; González-Lloret & Ortega, 2014) and specific speaking activities (Luoma, 2004). In order to track gains in oral fluency (de Jong, 2018), the prompts for the pre- and posttests were translated from two sample TORFL speaking tests. A sample of their English speech will also be collected
in order to compare their disfluencies to those in Russian (de Jong et al., 2015). A post-survey questionnaire will also be administered to obtain more information on the learners’ individual differences (Lee, 2019; Tanaka & Ellis, 2003). In addition, these individual differences will be taken into consideration and investigated through the lens of a CDST perspective, one that looks at language as a dynamic, nonlinear, and self-organized series of interconnected subsystems (Larsen-Freeman, 1997; de Bot et al., 2005; Lowie, 2013). The following research questions were formulated based on the aforementioned criteria and literature discussion:

(1a) What gains in oral fluency can be observed in L2 Russian students studying abroad through the evaluation of pre- and posttests after using Web 2.0 tools to complete oral tasks?
(1b) What signs of individual development can be revealed after such a relatively short testing period?
(2) Do the participants’ disfluencies in L2 Russian reflect the same types of disfluencies in their L1 English speech?
(3) How do the students’ individual differences affect their progress and performance?

Method

The experiment followed a quantitative approach, measuring fluency in terms of speech rate, pruned speech rate, articulation rate, phonation time ratio, and the number of disfluencies across several oral tasks in Russian, assessed before and after with pre- and posttests, as well as one oral task in English. A case-study design was used in order to investigate the performance of each individual participant and provide a more in-depth analysis.
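The fluency measures named above can be illustrated with a brief sketch. It assumes the commonly used syllable-based definitions of these measures; the thesis does not spell out its formulas at this point, so the definitions, the `Recording` class, the function name, and the example values are all hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass
class Recording:
    """Hypothetical container for the counts taken from one recording."""
    syllables: int            # all syllables produced, disfluent ones included
    disfluent_syllables: int  # syllables in filled pauses, repetitions, repairs
    total_time: float         # seconds, silent pauses included
    speaking_time: float      # seconds, silent pauses removed


def fluency_measures(rec: Recording) -> dict:
    """Compute four measures under the assumed (commonly used) definitions."""
    return {
        # syllables per second of total response time
        "speech_rate": rec.syllables / rec.total_time,
        # as above, but with disfluent syllables removed
        "pruned_speech_rate": (rec.syllables - rec.disfluent_syllables) / rec.total_time,
        # syllables per second of actual speaking time
        "articulation_rate": rec.syllables / rec.speaking_time,
        # share of the total time that is filled with speech
        "phonation_time_ratio": rec.speaking_time / rec.total_time,
    }


# Invented example values, not data from the study.
example = Recording(syllables=120, disfluent_syllables=20,
                    total_time=60.0, speaking_time=40.0)
measures = fluency_measures(example)
```

Under these assumed definitions, a lower phonation time ratio corresponds to more time spent in silent pauses, and the pruned speech rate can never exceed the speech rate.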
Participants

The participants were a self-selected sample composed of 3 American students (2 males and 1 female), all 21 years of age, who were studying abroad in Saint Petersburg as part of a Russian Language and Area Studies Program. They were all L1 English speakers studying Russian as an L2, each with a level ranging between A2 and B1 on the CEFR scale. They began the program in late January and finished mid-May. The study took place toward the end of their program during the last month of the semester. As part of their study abroad program, the students attended face-to-face classes 5 days a week for 6 hours a day without any online platform implemented into the curriculum. Their schedule of courses included grammar, phonetics, speech practice, and politics. These students indicated having struggled with developing their fluency skills in Russian and agreed to join the study with the expectation of gaining speaking practice and obtaining more specific feedback that measured their fluency and accuracy (although their mistakes will not be analyzed in the current study). In addition to the feedback provided, gift certificates were offered as a form of compensation and as a means of motivation to complete the designed tasks. It should also be noted that 3 of their classmates dropped out of the study.

Materials

The study consisted of a pre- and posttest of 15 minutes each (10 minutes of preparation and 5 minutes of oral response), as well as 8 tasks (30 seconds of preparation and a 1-minute oral response), two tasks per week for four weeks, and a post-survey questionnaire with 20 questions in order to gauge the participants’ technical competence, learning strategies and preferences,
anxiety, confidence, and self-assessment. The tests and tasks were completed on Extempore, while the post-survey questionnaire was completed on Google Forms. The following section provides a brief overview of the Extempore platform, the design process for the pre- and posttests and tasks devised for the study, and the post-survey questionnaire, which were the materials used to collect the data.

The Extempore platform was the main website used in creating the tasks. The learning environment has a mobile option available for iOS and Android devices, which presents students with the option of working on their mobile devices (MALL), as well as the option to complete the tasks online through a web browser at a computer (CALL). While there are other similar platforms available to educators (Voki and VoiceThread, to name a few), Extempore was selected based on the affordances offered to instructors and its dedication to creating learning tools based on research. While the platform allows students to record both video and audio responses, the current study only examined audio responses. A major limitation of the platform is that it does not accommodate interaction, only the production of oral skills, so the tasks were devised accordingly. Interaction is to be featured in a future update, but at the time of this study the feature was in beta and not yet available to the public. One of the primary reasons for selecting the Extempore platform was its ability to control time. To ensure spontaneity, the preparation time and response time for each particular task were limited, which ensured that all students had the same amount of time to review the prompt before formulating a recorded response. The prompts for the tasks supported the use of text, pictures, audio and video files, or a combination of these.
In order to track gains in oral fluency for the participants over time, the pre- and posttests were designed and administered based on their level in Russian using the TORFL. While the full test consists of sections in Grammar, Reading, Writing, Listening, and Speaking, only the Speaking section tasks and rubrics were examined when planning the tasks. The section of the TORFL for the 1st Certificate (also referred to as Level 1) contains three tasks, while the section of the TORFL for the 2nd Certificate (Level 2) contains 3 parts with 15 tasks. The questions for the tasks were chosen based on their appropriateness for use within the Extempore platform. For the pretest, one prompt was adapted from a sample test in Level 1, and the question for the posttest was derived from another sample test in Level 2 (Andryushina, 2009). Both are similar description tasks but differ in terms of their complexity, commensurate with their respective test level. For each test, the participants were given a prompt followed by a series of questions they were required to answer. In order to replicate the TORFL testing environment, the participants were given 10 minutes of preparation time to review the prompt and 5 minutes to respond to the questions orally. During a face-to-face administration of the exam, the examiners are required to give the test takers the allotted time; however, on the Extempore platform, the participants could choose to record a response much sooner. As a result, while the test was set up to emulate an online version of the TORFL, the preparation time could vary per participant, but for the purpose of this study, the five-minute response time was more significant.
This is a feature that could be changed in a future update of Extempore. Such a change would not only allow instructors more control over the amount of time allocated to their students for their tasks but could also prove useful in the development of an automated grading system, addressing some of the issues assessment tests have when determining fluency.
The TORFL speaking rubric indicates that the specific task used in this study assesses the participant’s ability to produce a monologic utterance with the appropriate language and speech facilities (Andryushina, 2009). For this task, a five-point scale is used to evaluate the participant’s speech: a mark of 5 is given when the quality of the response completely adheres to the prompt’s parameters and demonstrates a command of the norms of the Russian language, a 4 is the same as a 5 but contains mistakes that do not detract from the prompt, a mark of 3 is given when the errors in speech do not show an understanding of the prompt, and a 2 and 1 are given when understanding is impaired due to the number of errors. Although this five-point scale was not used to rate the participants of this study, it is nevertheless critical to understand the objectives of the task in its original testing context.

Procedure

The participants were instructed to log onto the Extempore website from their computer or mobile device twice a week for four weeks to complete the tasks. The tasks were scheduled to appear on their list of tasks automatically. They were designed based on the TBLT principles mentioned in the background literature and serve as the treatment to investigate the participants’ gains in oral fluency. There were a total of 8 tasks, but it should be mentioned that 1 task was ungraded, which will be explained in detail below. A practice assignment was created in order for the participants to familiarize themselves with how the platform works and to introduce them to how to record and submit the tasks. This task was not analyzed, and participants were encouraged to respond in their L1 to gain a better understanding of the technology. There were no time limitations, and students were
allowed to resubmit a response if necessary. Since no questions or technological issues arose after the practice task, the participants were then instructed to complete the pretest and record the first week’s tasks in Russian (see Appendix A for the full list of tasks).

The two tasks assigned for the first week of the treatment were designed based on responses to visual stimuli. The first task was a description task (Luoma, 2004) and asked participants to describe a picture and infer what is going on based on their observations, while the second task was a narrative task (Luoma, 2004) and consisted of a series of three images that the participants had to link to tell a cohesive story. These tasks align with TBLT concepts, especially in terms of learner-centeredness and pragmatic application, as the prompts generate unique responses from the participants.

The goal of the second week’s tasks was to focus on developing listening comprehension skills. The first task included an audio file with the transcript of the instructions written down as an additional form of scaffolding for the students, as well as an image for further support. The second task also contained an audio recording with instructions; however, the transcript of the audio was not provided, and the only given text was the prompt “Listen to the instructions” in Russian. Without the scaffolding, the task increases in complexity and forces the participants to listen to the prompt carefully. Due to this increase in complexity, the task focused on the participants’ prior knowledge: the city of St. Petersburg. The students were asked where they would take a friend who is visiting and to explain why. In general, it should be noted that the tasks were not designed to be complex so as not to demotivate the students. These tasks are examples of reacting in situations tasks and explaining and decision tasks (Luoma, 2004), respectively.
They are goal-oriented and reflective, encouraging the participants to draw upon their personal experiences.

For the third week, the participants were told to interact in Russian with another student online and to pay attention to the mistakes of the interlocutor. Afterward, the participant was to record a reflection on their experience on Extempore. Since Extempore only focuses on oral production and not oral interaction, this task was designed to encourage the students to interact, to notice the language being produced, and to reflect on their experiences speaking Russian in a digital context. Consequently, the first task was ungraded. Given the amount of literature on interaction, including the ability for students to interact with their peers in an online environment would enhance the Extempore experience.

The fourth week’s tasks focused on comparing and contrasting tasks and decision tasks that offered advice (Luoma, 2004). The first task showed two images of children performing various activities, and the participants were instructed to compare and contrast the two photos. The second task asked for advice on learning a foreign language, another task that draws on the students’ prior knowledge and experience, as they reflect on their own language-learning process and provide practical advice to future L2 learners.

In order to compare the participants’ fluency in Russian with that in their native language, one English task was assigned. The task was a translation of a question from a TORFL sample test (Andryushina, 2009). The same parameters were set for this task as for the pre- and posttests: namely, the participants were given 10 minutes to prepare and 5 minutes to respond to the prompt, as is required on the official TORFL test.
At the end of the four weeks, the participants were asked to complete a posttest. The two tests were similar in design, although not the same, in order to avoid testing bias. Once the last test was submitted, participants were emailed the post-survey questionnaire. The questionnaire was completed by the participants upon successful completion of all the aforementioned tasks and tests. The survey contained 5 background questions and 15 statements for which the participants had to indicate to what degree they agreed or disagreed using a five-point Likert scale. The questions were adapted from the post-surveys used in Lee (2019) and Tanaka & Ellis (2003).

Design & Analyses

Once the recordings were gathered and the post-survey questionnaire was completed, the speech was transcribed and the disfluencies (filled pauses, repetitions, and restarts/repairs) were annotated. The silent pauses were removed from the recordings to generate the speaking time. For the purpose of calculating the number of disfluencies, the filled pauses included obvious words such as “um” and “uh,” which are typical filler words in English and not Russian, but the total number also includes a couple of Russian words that the participants used as fillers even though they do not function as fillers in Russian. Mainly, participants uttered the word “но” or “but,” which is supposed to be used as a conjunction, but it may have been confused with the Russian word “ну” or “well,” which is used as a filler word; the word “как” or “like,” which is only used literally in Russian as a preposition or conjunction and not as a filler word, was also included as a disfluency. Words like “и” or “and,” and “да” or “yes” were evaluated on a case-by-case basis, depending on whether these enhanced, emphasized, or served a grammatical function or meaning. The total also included words in English that were uttered as a form of self-correction (“no yeah”), as
well as English words that were plugged in to replace unknown Russian words (“charity,” “harp,” and “links”). Russian filler words, which were used less frequently (if at all), were not considered disfluencies and were omitted from the total. Repetitions of phrases were counted as one occurrence if the entire phrase was repeated (e.g. “в моем городе, в моем городе” or “in my city, in my city”). However, each syllable was taken into account to calculate the total number of disfluent syllables. The same approach was applied to repairs and restarts: each repair was tallied as one instance, with the syllables subsequently added to the number of disfluent syllables. These were counted manually since the duration of the recordings was under a minute. The syllables for the longer recordings (the pre- and posttests and the task in English) were calculated using an online syllable counter.

Results

The current study investigated the following research questions:

(1a) What gains in oral fluency can be observed in L2 Russian students studying abroad through the evaluation of pre- and posttests after using Web 2.0 tools to complete oral tasks?
(1b) What signs of individual development can be revealed after such a relatively short testing period?
(2) Do the participants’ disfluencies in L2 Russian reflect the same types of disfluencies in their L1 English speech?
(3) How do the students’ individual differences affect their progress and performance?
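The syllable counting described in the Design & Analyses section can be illustrated for Russian text with a short sketch. It relies on the fact that every Russian syllable contains exactly one vowel letter, so counting vowel letters gives the syllable count. This is not the online syllable counter used in the study; the function names are hypothetical, and the treatment of a repeated phrase (one instance, with the syllables of one copy counted as disfluent) is an assumption about how the tallying described above was applied.

```python
# The ten Russian vowel letters; each one forms the nucleus of one syllable.
RUSSIAN_VOWELS = set("аеёиоуыэюя")


def count_syllables(text: str) -> int:
    """Count syllables in Russian text by counting its vowel letters."""
    return sum(1 for ch in text.lower() if ch in RUSSIAN_VOWELS)


def tally_repetition(phrase: str) -> dict:
    """Tally one repeated phrase: a single instance plus its syllables.

    Assumption: only the redundant copy of the phrase is disfluent.
    """
    return {"instances": 1, "disfluent_syllables": count_syllables(phrase)}


# The repetition example from the text: "в моем городе, в моем городе".
repeated = tally_repetition("в моем городе")
```

A counter of this kind only works for Cyrillic text; English insertions such as “um” would need to be handled separately.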
The variables analyzed to measure gains were speech rate, pruned speech rate (speech rate minus the disfluencies), articulation rate, phonation time ratio, and the number of disfluencies.

Research Question 1a

Upon initial inspection of the data, the statistics revealed that as a group (N = 3) the participants performed the same on both the pretest (M = 1.94, SD = 0.53) and the posttest (M = 1.94, SD = 0.62) in terms of their speech rate. Since one group was tested on two separate occasions, a paired samples t test was used. It showed that the difference was not significant (t(2) = 0.03, p = 0.98), and the effect size was very small (r = 0.02). For the group’s pruned speech rate, however, the participants performed marginally better on the posttest (M = 1.66, SD = 0.48) than on the pretest (M = 1.60, SD = 0.56), which reflects a slight decrease in their disfluencies on the posttest (M = 0.20, SD = 0.07) relative to the pretest (M = 0.25, SD = 0.05). Paired samples t tests revealed that neither the difference in pruned speech rate (t(2) = 1.25, p = 0.34), which had a large effect size (r = 0.66), nor the difference in the number of disfluencies (t(2) = 0.81, p = 0.40), which also had a large effect size, was significant. The number of silent pauses for the group increased, as indicated by the lower phonation time ratio on the posttest (M = 0.52, SD = 0.17) compared to the pretest (M = 0.65, SD = 0.2). This difference was also not significant (t(2) = 1.06, p = 0.40), despite a large effect size (r = 0.60). The group performed better in terms of articulation rate on the posttest (M = 3.77, SD = 0.27) than on the pretest (M = 3.19, SD = 1.28). Once again, a paired samples t test revealed that the difference was not significant (t(2) =
0.97, p = 0.44) and the effect size was large (r = 0.56). See Table 2 below for a summary of the group’s descriptive statistics for each measure, and Table 3 for an overview of the participants’ oral fluency scores.

Table 2
Descriptive statistics for the group’s pre- and posttests per measure

Measure                 Pretest          Posttest
                        M       SD       M       SD
Speech Rate             1.94    0.53     1.94    0.62
Pruned Speech Rate      1.60    0.56     1.66    0.48
Articulation Rate       3.19    1.28     3.77    0.27
Phonation Time Ratio    0.65    0.2      0.52    0.17
Disfluencies            0.25    0.05     0.20    0.07

Given the small sample size of the group (N = 3), results that are not significant are to be expected. Consequently, the current study also examined the development of each participant individually from the point of view of a CDST approach and included this as a subset of this research question (RQ 1b). The goal of this approach is to shed more light on the learning process for each participant and interpret the results in a much more dynamic manner.
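The paired samples t tests and effect sizes reported for Research Question 1a can be illustrated with a short sketch. This is not the study's analysis script: the function names are hypothetical, and the effect size formula r = sqrt(t² / (t² + df)) is an assumption, although it is consistent with several of the reported pairs (for example, t(2) = 1.25 with r = 0.66). The scores in the usage lines are invented placeholders, not the participants' data.

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Return (t, df) for a paired samples t test on two equal-length lists.

    Raises ZeroDivisionError if all pairwise differences are identical.
    """
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    # t = mean difference divided by its standard error
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def effect_size_r(t, df):
    """Convert a t statistic into the effect size r (assumed formula)."""
    return math.sqrt(t ** 2 / (t ** 2 + df))


# Invented placeholder scores for three speakers, NOT data from the study.
pre_scores = [1.0, 2.0, 3.0]
post_scores = [1.5, 2.0, 4.0]
t_value, df = paired_t(pre_scores, post_scores)
r_value = effect_size_r(t_value, df)
```

With only three participants, df = 2, so even large effect sizes will rarely reach significance, which matches the pattern of results reported above.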
Research Question 1b

The results of the group were examined per participant and visualized accordingly. Each participant’s results from the pre- and posttests will be investigated, followed by an examination of the results per task, with the pre- and posttests serving as guideposts.

Several observations can be gleaned from the gains the participants made between the pre- and posttests, as showcased in Table 3. The speech rate for each participant varied between the pre- and posttests. Mary (Participant 1) appears to have decreased in speech rate, John (Participant 2) shows an increase in speech rate, and Daniel (Participant 3) demonstrates no change in speech rate. However, the number of disfluencies decreased for Mary and John but increased for Daniel. As a result, Mary shows an increase in her pruned speech rate, and John (who showed an increase in speech rate) actually exhibits a slight decrease in his pruned speech rate, while Daniel displays a slight increase in pruned speech rate. Mary and John both increased their articulation rate, producing more syllables in their speaking time, but they also increased the total number of silent pauses, as indicated by a decrease in the phonation time ratio. Daniel, on the other hand, displayed a decrease in his articulation rate, but he minimally decreased the number of silent pauses, as reflected in his phonation time ratio.

To summarize, Mary reduced the number of disfluencies while increasing the number of syllables and silent pauses in her speech, producing a higher result for her pruned speech rate. John slightly improved his speech rate and articulation rate, while remaining fairly consistent in terms of disfluencies, pauses, and pruned speech rate. Daniel improved his pruned speech rate, reducing the number of silent pauses in his speech but slightly increasing the number of disfluencies produced; his speech rate remained the same,
while his articulation rate decreased. (See Appendix B for a column chart of each participant’s progress per measure.)

Some trends were also noticed when analyzing each participant’s progress across the tasks. As seen in Figure 1, John’s pruned speech rate was consistently higher than that of the other participants, except in Task 3, in which Mary’s pruned speech rate surpassed John’s. It must also be noted that Daniel did not complete all of the tasks, and, correspondingly, his scores weakened and appear to have fluctuated the most.

Figure 1. Pruned speech rate of all the participants across all tasks and tests.

Another notable observation concerns the number of disfluencies that the participants produced. Based on the data in Figure 2, the number of disfluencies fluctuated considerably, especially in the listening tasks in the second week of the intervention (Tasks 3 & 4). Broken down per type of disfluency (see Appendix C for Figures 9-11), each participant seemed to struggle with different forms of disfluency in their speech. It can be seen in Figure 9
that Mary overuses filled pauses (mostly “um”), while rarely producing repetitions or repairs in her speech. This shows that Mary uses filled pauses to think about her next utterance, instead of incorrectly starting and backtracking. As shown in Figures 10 and 11, respectively, John and Daniel also overuse filler words, but they tended to generate more repetitions and repairs/restarts as well. In Task 2, John had the most disfluencies with repairs and repetitions but had more repetitions in Tasks 3 & 5. Daniel had the highest number of repetitions in Task 7, with filled pauses being his most frequent type of disfluency.

Figure 2. The total number of disfluencies (filled pauses, repetitions, repairs/restarts) of all the participants across all tasks and tests.

The participants’ phonation time ratio offers insight into the total duration of silent pauses in their speech relative to their speaking time. John did not have as many silent pauses as Mary or Daniel, instead filling the silence with disfluent syllables. Mary seemed to have increased the number of silent pauses in her speech; John decreased his silent pauses across Tasks 2-4 and increased them in Task 5 before gradually decreasing them through the posttest; Daniel displays a tendency to be silent for around half of the total time.

Figure 3. The phonation time ratio of all the participants across all tasks and tests.

Figure 4. The articulation rate of all the participants across all tasks and tests.
Research Question 2

In order to analyze the participants’ disfluencies in their L1 and L2, the means of their pre- and posttests were calculated and compared to their English results. Initial review of the data in Figure 5 revealed that Mary had considerably fewer disfluencies in English than in Russian, while John and Daniel had approximately the same number of disfluencies in English and Russian. In fact, Daniel actually had more disfluencies in English than in Russian. Descriptive statistics showed that, unsurprisingly, the group exhibited more disfluencies in Russian (M = 0.22, SD = 0.01) than in English (M = 0.18, SD = 0.08). A paired samples t test found that this difference was not significant (t(2) = 1.21, p = 0.35), with a large effect size (r = 0.65). The fact that the participants’ number of disfluencies in Russian seemed to parallel that of their L1 will be discussed in greater detail below.

Figure 5. Total number of disfluencies across all participants between L1 & L2.
Research Question 3

In order to gain insight into the participants’ background and individual differences, the results of the post-survey questionnaire were analyzed (see Appendix for notable results). The participants’ responses in the background section confirmed their age and L1 and also showed that two of the three learners completed all of the tasks on a mobile device, while one learner used a computer (Figure 12 in Appendix D). Two learners indicated that they spoke Russian outside of the classroom “frequently,” and one learner acknowledged speaking Russian “not very often” (Figure 14). On a five-point Likert scale, the learners also assessed the difficulty of the lessons, with one representing “not difficult” (or “very easy”) and five being “very difficult.” The outcome revealed that two participants found the tasks “difficult” (or a 4) and one participant deemed them “easy” (or a 2), as seen in Figure 13.

In the second part of the questionnaire, the participants indicated to what degree they agreed or disagreed with 15 statements, which were also rated on a five-point Likert scale, with 1 meaning “strongly disagree” and 5 meaning “strongly agree.” For the purpose of this study, 7 statements on the following issues generated noteworthy responses: using Extempore, receiving immediate feedback, levels of anxiety, and learning strategies. For the statement “Overall, I had a positive experience using Extempore,” the participants generally agreed, with two learners agreeing and one strongly agreeing (Figure 15). For the statement on receiving immediate feedback, 1 student strongly disagreed, 1 agreed, and 1 strongly agreed. Three statements were meant to gauge the students’ level of anxiety speaking Russian in general, in class, and using