The Influence of Input Strategies on Advanced-level L2 learners’ Knowledge of Collocations

Academic year: 2021

The Influence of Input Strategies on

Advanced-level L2 learners’ Knowledge of Collocations

Thomas James Wigham

S3349179

MA in Applied Linguistics

Faculty of Liberal Arts

University of Groningen

Supervisors:

Professor Wander Lowie

Professor Marjolijn Verspoor

Date: 2nd July 2018

Declaration of authenticity

MA Applied Linguistics - 2017/2018

MA thesis

Student name: Thomas James Wigham

Student number: s3349179

PLAGIARISM is the presentation by a student of an assignment or piece of work which has in fact been copied in whole, in part, or in paraphrase from another student's work, or from any other source (e.g. published books or periodicals or material from Internet sites), without due acknowledgement in the text.

TEAMWORK: Students are encouraged to work with each other to develop their generic skills and increase their knowledge and understanding of the curriculum. Such teamwork includes general discussion and sharing of ideas on the curriculum. All written work must however (without specific authorization to the contrary) be done by individual students. Students are neither permitted to copy any part of another student’s work nor permitted to allow their own work to be copied by other students.

DECLARATION

I declare that all work submitted for assessment of this MA thesis is my own work and does not involve plagiarism or teamwork other than that authorised in the general terms above or that authorised and documented for any particular piece of work.

Acknowledgements

I am extremely grateful to Professor Wander Lowie for his continued support as primary supervisor for this thesis. I would also like to thank my colleagues at the British Language Training Centre for their help in piloting and validating the test measures and tasks for this study.

Contents

Abstract

Introduction

Background

A Definition of Collocation

Advanced-level Learners’ Productive Knowledge of Collocations

Issues in the Acquisition of Collocations

Elaboration, Engagement and Levels of Processing

Inter- and Intra-lexical Factors

Aim and Research Questions

Method

Global Design

Participants

Target Collocations

Materials

Procedure

Scoring

Analysis

Results

RQ1. The Relative Development of Productive and Receptive Knowledge

RQ2. The Differential Effect of Example Generation and Copying on Collocational Knowledge

RQ3. The Influence of Intra-lexical Factors on Productive and Receptive Collocational Knowledge

Discussion

The Relation between Productive and Receptive Knowledge

Effects of Learning Conditions on Collocational Knowledge

Inter-lexical and Intra-lexical Factors

Conclusion

Appendix A: Target items

Appendix B: Pre- and Post-tests

Appendix C: Sample Treatment Task

Appendix D: Intra-lexical Rating Sheet

Appendix E: Background Questionnaire

Abstract

Developing target-like knowledge of collocations presents a considerable challenge to language learners, even those with an advanced command of a second language. This study investigated the relative effectiveness of two input strategies as a means of developing collocational knowledge, with a particular focus on target-like production. As a secondary focus, it examined the effects of two intra-lexical factors, transparency and imageability, on collocation retention. Dutch university students (N = 15) saw collocations, each presented with a gloss and an example sentence, in two learning conditions. In the first condition, they copied the example sentence; in the second, they generated their own example sentence and wrote it down. Retention was measured at three levels of sensitivity: free form recall, cued form recall, and a multiple-choice test. The analyses showed a significant benefit of context generation in immediate and delayed post-tests, but this knowledge proved no more durable over time than in the copy condition. Imageability was not found to have an effect on any of the measures, while an effect of transparency was found only in a post-hoc analysis for receptive knowledge. The results are considered with particular regard to precise semantic elaboration and the Involvement Load Hypothesis, and the pedagogical implications of the study are discussed.

The Influence of Input Strategies on Advanced-level L2 learners’ Knowledge of Collocations

Introduction

Learning vocabulary is arguably the most substantial challenge that language learners face. Even once learners have developed a substantial vocabulary of individual words, producing target-like formulaic language continues to present considerable difficulties at advanced levels. Formulaic sequences allow for faster processing of language than non-formulaic strings (Conklin & Schmitt, 2008) and lead to impressions of native-like language use (Boers, Eyckmans, Kappel, Stengers, & Demecheleer, 2006). They also perform a substantial social function (Schmitt & Carter, 2004; Wray, 2002) and help to denote group membership, making them integral to membership of the academic community, for example (Jones & Haywood, 2004). Collocations comprise a substantial part of a native speaker’s repertoire of formulaic language, but at higher levels, target-like productive use of collocations continues to prove challenging (Laufer & Waldman, 2011). As such, pedagogical treatments which address accurate productive use are desirable.

One of the most commonly employed vocabulary-learning strategies is to write down the vocabulary item to aid retention (Schmitt, 1997). However, a recent study by Stengers, Deconinck, Boers & Eyckmans (2016) found no benefit for copying, and it remains unclear whether writing in general has a beneficial effect on retention (Barcroft, 2002, 2006). An alternative strategy advocated by teacher trainers and researchers is writing the new item in a novel context (Thornbury, 2002; Webb, 2009). Theoretical support for this proposal can be found in Levels of Processing theory (Craik & Lockhart, 1972; Craik & Tulving, 1975) and the need for engagement with a vocabulary item when first encountering it (Schmitt, 2008). However, those studies which have investigated context generation have hitherto focussed on word learning rather than collocation and have sometimes provided negligible support for the effectiveness of this strategy (Hulstijn & Trompetter, 1998), or only focussed on receptive gains (Hulstijn & Laufer, 2001). Given that higher-level learners’ difficulties are most clearly manifest in accurate production of collocations, there is a clear need to assess whether these tasks might foster accurate productive knowledge.

This study therefore examines the effectiveness of two input strategies for learning collocations in order to evaluate their effectiveness as pedagogical or self-study tasks, with a particular focus on developing productive knowledge. A secondary focus of this study is to consider the influence of two intra-lexical factors which have previously been shown to influence learning of multi-word units, namely transparency and imageability (Steinel, Hulstijn, & Steinel, 2007), to determine whether these may influence the effectiveness of either strategy. This study is first situated within the existing literature on L2 collocational knowledge, productive vocabulary development, elaboration, and intra-lexical factors. Research questions are then formulated, and the method of the experiment is explained. Next, the results are presented and discussed in the context of the research questions and the background literature. Finally, the pedagogical implications of the study are explored, and suggestions are made for future research.

Background

A Definition of Collocation

In a statistical sense, collocations are word pairs which appear together within a given span with greater frequency than is predicted by chance (Hoey, 1991, p.7). As such, the appearance of one collocate in text increases the probability of finding its co-collocate (Durrant & Schmitt, 2010, p.164). While this statistical definition is broadly accepted by corpus linguists (Hunston, 2002; Sinclair, 1991), recent studies have also considered collocation as a psychological phenomenon. It has been found that collocates produce priming effects, both in L1 (Durrant & Doherty, 2010) and L2 (Wolter & Gyllstad, 2011; Yamashita & Jiang, 2010), depending on their frequency of occurrence. Hoey (2005) claims that collocations result from repeated encounters with a word in similar situations, noting that a word “becomes cumulatively loaded with the contexts and co-texts in which it is encountered” (p.17). From a usage-based perspective, therefore, collocations can be seen as word combinations of varying strength of association, and they therefore fall under the broader umbrella of formulaic language (Wray, 2002).

Adopting a statistical definition entails including idioms as collocations (see e.g. Webb, Newton, & Chang, 2013). This potential disadvantage is partially offset in this study by the inclusion of a measure of semantic transparency within the analyses. This approach can also be considered ecologically valid in so far as learners will encounter collocations of varying degrees of transparency within a single text or often a single classroom exercise.
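As a rough illustration of the statistical definition given at the start of this section, the following Python sketch counts how often a candidate collocate falls within a fixed span of a node word and compares that count with the number expected by chance. The function name, the toy sentence, and the simple independence-based estimate of the expected count are assumptions made for illustration only; they are not taken from the studies cited, and corpus tools typically rely on more refined association measures.

```python
from collections import Counter


def cooccurrence_stats(tokens, node, collocate, span=4):
    """Return (observed, expected, ratio) for `collocate` occurring
    within `span` tokens on either side of `node`."""
    n = len(tokens)
    freq = Counter(tokens)

    # Observed: count the collocate in the window around every node token.
    observed = 0
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        observed += window.count(collocate)

    # Expected (assumed chance model): each of the 2 * span window slots
    # around each node token holds the collocate with probability
    # freq[collocate] / n. Windows cut short at the text edges are ignored.
    expected = freq[node] * 2 * span * freq[collocate] / n

    ratio = observed / expected if expected else float("nan")
    return observed, expected, ratio


# Toy text, invented for this example.
toy_tokens = "heavy rain fell then heavy rain stopped and the sun came out".split()
observed, expected, ratio = cooccurrence_stats(toy_tokens, "heavy", "rain", span=1)
print(observed, round(expected, 2), round(ratio, 2))
```

A ratio above 1 indicates that the pair co-occurs more often than the chance model predicts, which is the sense in which a pair counts as a collocation under the statistical definition. On a text this small the figure is not meaningful; the sketch only shows the shape of the calculation.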

Advanced-level Learners’ Productive Knowledge of Collocations

At higher levels of L2 proficiency, learners have difficulty developing a native-like command of collocational restriction. Bahns & Eldaw (1993) tested advanced German learners of English on a set of collocations using elicitation tasks such as L1-L2 translation. Verbal collocates made up less than a quarter of the lexical words in the sentences, yet these accounted for nearly half (48.2%) of all errors found. A later study by Granger (1998) compared essays by French-speaking learners of English with texts written by English native speakers. The study found that the learners’ weaker sense of collocation led to problems of both underuse and overuse. Modifying adverbs such as highly tended to be underused by comparison with native speakers, whereas completely and totally were overused. Learners also tended towards the use of congruent collocations, that is, combinations where there is a word-for-word overlap between the L1 and L2 collocations.

The issue of congruence was also addressed by Nesselhauf (2003). In a similar corpus study focusing on German university-level learners of English, it was found that 56% of collocational errors across a variety of syntactic categories could plausibly have been caused by transfer from L1. A further analysis found that the ratio of correct uses of collocations to incorrect uses was much more favourable in congruent collocations (8:1) than in non-congruent collocations (almost 1:1), providing a strong indication that the L1 exerts considerable influence on L2 production of collocations even at higher levels.

More recently, Laufer & Waldman (2011) used corpus data to examine collocation use at three different proficiency levels. Not only did learners at all levels use fewer collocations than their native-speaker counterparts, but non-canonical collocations also persisted even at advanced levels, with around a third of collocations being used incorrectly. In fact, in absolute terms, the number of errors made at more advanced levels was greater than that at lower levels, leading the authors to conclude that the number of collocational errors actually increases as a function of proficiency (p.665). Again, non-congruent collocations appeared to cause the largest number of errors. Taken together, the above studies demonstrate that the challenge posed by collocations does not decrease at advanced levels, and that the influence of L1 continues to pervade production.

Issues in the Acquisition of Collocations

Input and attention.

Why do collocations present such a challenge to high-level learners? One explanation is a lack of exposure. From a usage-based perspective (e.g. N. Ellis, 2002), L2 use will become more target-like as a function of increasing L2 input. In an experimental study, Durrant & Schmitt (2010) were able to demonstrate that even one encounter with a novel collocation leaves an associative memory trace, providing convincing evidence that learners do develop collocational knowledge through exposure. In classroom conditions, Webb et al. (2013) had learners read and simultaneously listen to graded readers over the course of an hour. The readers were manipulated to contain the target collocations either 1, 5, 10, or 15 times. Scores were substantially higher in the 15-encounter condition than for the less frequent encounters, supporting the claim that collocations can be learned by extensive input. If this is the case, then the collocational problems reviewed above may be a product of low exposure to the L2.

However, if collocations are encountered at a lower density than those in Webb et al. (2013), they may be relatively non-salient and therefore difficult to attend to. Wray (2002) claims that L2 learners tend to be focused to a greater extent on single words than multi-word units. Certainly, literate learners may see words as the primary unit of analysis since in written language they are delineated by white space. Collocations not only lack formal boundaries but may be more difficult to notice because they are often comprised of common words, which are more likely to be known and which will be given less attention than novel words (Boers, Lindstromberg, & Eyckmans, 2014a). They may also be less likely to attract attention where their meaning is relatively semantically transparent (Peters, 2012). Classroom studies have shown that awareness-raising activities can make learners explicitly aware of the syntagmatic relations between words (Jones & Haywood, 2004); nevertheless, students may continue to identify word strings as formulaic which are not considered formulaic by native speakers (Eyckmans, Boers, & Stengers, 2007). This indicates that even if learners are explicitly aware of collocations on a conceptual level, they may still be unable to identify them. Thus, if attention is a prerequisite for learning (Schmidt, 2001), L2 learners may lag behind in their collocational knowledge because they do not notice collocations. It is worth noting that in the Webb et al. (2013) study, collocations were also processed aurally, meaning that they may have been more easily perceived as multi-word units in a continuous stream of speech than if they were only read on a page. Furthermore, 15 encounters in a short period of time was still insufficient to achieve ceiling scores in immediate receptive post-tests, and a single encounter with a collocation resulted in scores no higher than a control group who did not read the text. Without enrichment techniques such as this, the lower density of collocations in natural text will mean they are acquired only slowly.

Receptive and productive learning.

Supposing learners do attend to collocations in the input, this may still not translate automatically into productive knowledge. Studies have consistently shown that receptive learning tasks lead to lower productive test scores than tasks which require productive recall (Griffin & Harley, 1996; Mondria & Wiersma, 2004). Nor does greater L2 exposure necessarily lead steadily to greater productive knowledge. For instance, Laufer (1998) found that over the course of one year, Hebrew-speaking learners of English added substantially to their receptive vocabulary size (1,600 word families) and to their controlled productive vocabularies (850 word families), as measured by a vocabulary size test. However, their written work showed no increase at all in vocabulary size. Caspi (2010) conducted a longitudinal case study of L2 learners into the so-called “receptive-productive gap” (p.46) and similarly found that receptive knowledge did not lead automatically to production. For collocations, the implication is that receptive learning may be insufficient to develop target-like productive knowledge in the L2.

While reception and production are often referred to in dichotomous terms, the reality is that they both refer to strata of “graded knowledge” (Caspi, 2010, p.52), with the ability to understand and recognize vocabulary receptively requiring relatively less information to be present in the mental lexicon. This “amount of knowledge” explanation (Nation, 2013, p.51) holds that output, whether spoken or written, requires more precise knowledge of word forms. This accounts for the fact that productive knowledge is always smaller than receptive knowledge, including in L1 (Caspi, 2010), and appears to decay at a quicker rate than receptive knowledge. For example, Waring (1997) used a paired-associate learning (PAL) paradigm to test Japanese learners on their retention of English pseudo-words. Immediate post-tests showed results in all conditions to be near ceiling, but 3 months later almost no productive knowledge was retained, while receptive knowledge remained around 50% of the immediate post-test level. As knowledge decays, the more elaborate output patterns will appear to be affected first, particularly where scoring is strict (Barcroft, 2006), whereas correct answers on receptive tests are relatively more achievable even when less information is retained in the mental lexicon.

A complementary explanation which takes into account the influence of the L1 explains the difference in terms of competitive access (e.g. N. Ellis & Beaton, 1993). Assuming, for example, an interactive activation model of the lexicon (Kroll & Dijkstra, 2002), L1 items have lower activation thresholds and are likely to inhibit L2 forms from being selected. Applying this explanation to the syntagmatic associations between words, more active L1 patterns of association may inhibit L2 collocates from being activated, resulting in the high density of errors with non-congruent collocations reviewed above.

While the input hypothesis tends to assume that massive exposure is sufficient for vocabulary development (Krashen, 1989; Nagy, 1997), this may neglect the vast amount of production native speakers engage in (Laufer, 2003). Indeed, studies indicate that productive knowledge is associated with increased mental effort which exposure alone may not provide. Laufer & Paribakht (1998) studied the receptive and productive vocabularies of comparable groups of learners of English as a second language (ESL) in Canada and learners of English as a foreign language (EFL) in Israel.¹ The ESL learners had greater receptive vocabulary sizes overall, which was attributed to their more frequent contact with the L2. Despite this, the EFL learners were found to have larger controlled and free productive vocabulary sizes, which the researchers attributed to greater deliberate learning in that context.

¹ ESL refers to situations where English is learnt and taught as the dominant language of communication, whereas EFL refers to situations where English is not widely used for communication or instruction (Carter & Nunan, 2001, p.2). While some researchers such as Carter and Nunan (2001) find the distinction problematic, it is useful here to illustrate the relative amounts of L2 exposure a learner may receive.

Studies of classroom learning would support this conclusion. In a PAL paradigm, Griffin & Harley (1996) found that productive test scores were higher when preceded by activities which required production of the L2 form from the L1 cue, rather than receptive knowledge by supplying the L1 word from the L2 cue. Mondria & Wiersma (2004) similarly found that the L1-L2 direction was most effective for developing productive knowledge. They also noted that a combination of receptive and productive learning was no more effective than productive learning alone, suggesting that receptive knowledge represents a partial component of productive knowledge, rather than a distinct form of mental information. With respect to collocation specifically, Webb & Kagimoto (2009) tested learning in two conditions, either by reading collocations in example sentences or by writing the collocations into gapped sentences. For higher-level learners, the productive writing condition proved to be better for retention on both productive and receptive measures than the reading condition.

To summarize: the findings from both experimental studies and studies of the receptive-productive gap point to the need for productive activities to foster target-like use of collocations. If collocations are learnt only receptively, L2 learners may struggle to overcome the influence of highly activated L1 output patterns. Indeed, even after daily use of English for several years, ESL learners continue to exhibit the influence of the L1 in response to non-congruent collocations (Yamashita & Jiang, 2010). Thus, where the goal of learning is to foster productive knowledge, productive learning may be most effective (Nation, 2013). Based on the above findings, the following hypotheses are formulated:

1a. The productive input tasks will result in a substantial amount of productive learning, but receptive levels of knowledge will be highest overall.

1b. Productive knowledge will decay at a faster rate than receptive knowledge.

Elaboration, Engagement and Levels of Processing

Assuming that productive knowledge is prone to more rapid decay, the most desirable input tasks will be those which result in the highest levels of long-term retention. There is a body of evidence which indicates that the quality of the “engagement” (Schmitt, 2008, p.338) with a new vocabulary item will influence how well it is retained. This engagement, also termed elaboration, may involve attending to either the formal properties of a word (structural elaboration) or its semantic properties, termed semantic elaboration (Barcroft, 2002). In an influential paper, Craik & Lockhart (1972) proposed that there are “levels” at which an item can be processed; semantic processing was posited to be deeper than [...] been criticized on the basis that quality of processing is also related to the task outcome (Morris, Bransford, & Franks, 1977) and that “levels of processing” (LOP) lacks a consistent operationalization (Hulstijn, 2001). Nevertheless, there is agreement among psychologists that the quality of processing is a determinant of retention rates (Hulstijn, 2001).

One attempt to operationalize elaboration for instruction was made by Laufer & Hulstijn (2001) in the form of the Involvement Load Hypothesis (ILH). The ILH comprises three task factors: need, search, and evaluation (Laufer & Hulstijn, 2001, p.14). Need encompasses the motivational aspect of the requirement to find a vocabulary item; search involves consulting an authority or reference material; and evaluation involves comparing a new vocabulary item to other items to make a decision about its meaning, such as by generating a context or translating the item. Greater involvement is posited to result in better retention. The construct of evaluation is of particular relevance to the current study since it concerns the manner of engagement once the meaning of a collocation is known to the learner, either by attending to its formal properties, semantic properties, or both, to determine how well it fits the context. Laufer & Hulstijn (2001) cited a number of previous studies in support of the ILH; however, not all of these provided unequivocal support. Hulstijn & Trompetter (1998), for example, found no significant benefit for the theoretically more elaborative task of writing vocabulary items into a composition.
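The three components can be made concrete with a small sketch. It assumes a coding in which each component is rated from 0 (absent) to 2 (strong) and the ratings are summed into a single involvement index. That coding, the class name, and the ratings given to the two example tasks are illustrative assumptions, not values reported in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskInvolvement:
    """Ratings of one learning task on the three ILH components.

    Assumed coding: 0 = absent, 1 = moderate, 2 = strong.
    """
    name: str
    need: int
    search: int
    evaluation: int

    def __post_init__(self):
        # Reject ratings outside the assumed 0-2 scale.
        for label in ("need", "search", "evaluation"):
            value = getattr(self, label)
            if value not in (0, 1, 2):
                raise ValueError(f"{label} must be 0, 1 or 2, got {value!r}")

    @property
    def index(self) -> int:
        # Involvement index: the simple sum of the three ratings.
        return self.need + self.search + self.evaluation


# Hypothetical ratings: both tasks are imposed by the teacher (moderate need)
# and supply the meaning (no search); they differ only in evaluation.
copy_task = TaskInvolvement("copy an example sentence", need=1, search=0, evaluation=0)
generate_task = TaskInvolvement("generate a new sentence", need=1, search=0, evaluation=2)

for task in (copy_task, generate_task):
    print(f"{task.name}: involvement index {task.index}")
```

Under these assumed ratings, the generation task has the higher index, so the hypothesis would predict better retention for it. Whether that prediction holds is an empirical question, which is what the studies reviewed here address.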

What kind of elaboration is most effective? Several studies have suggested that the most useful elaborative techniques are those that involve precise elaboration (Stein et al., 1982; Verspoor & Lowie, 2003). Stein et al. (1982) defined precise elaborations as those which make clear the relevance or significance of new information (p.399). In that study, children who were able to provide elaborations to clarify the purpose of an action or to illustrate an activity were more likely to recall their sentences later. In terms of second language learning, studies from a cognitive semantic perspective (Lakoff, 1987) have also demonstrated the benefits of precise elaboration. Verspoor & Lowie (2003) found that inferring the figurative meanings of polysemous words can be made more effective when learners are given cues to understand a word’s core meaning. It was argued that connecting the literal and figurative meanings would ensure that newly created retrieval paths led towards the word’s meaning, whereas guessing at random could create erroneous connections leading away from the vocabulary item (p.551). Similarly, Boers, Demecheleer and Eyckmans (2004) demonstrated that idioms were retained more effectively by learners who were informed of the original, literal meaning of the idioms than those who were only informed of the idiomatic meaning. The researchers termed this etymological elaboration (EE) as a subcategory of semantic elaboration. This approach has much in common with that of Verspoor & Lowie (2003) in that connecting the idiom’s meaning to its origin would create an additional path to retrieval. It is contended here that precise elaboration and EE are essentially comparable as forms of semantic elaboration.

Precise elaboration can be situated within Laufer & Hulstijn’s (2001) construct of task-induced evaluation. Drawing on the studies reviewed above, the more precisely a word or collocation is compared against related forms and meanings, the more likely it is to be retained. The following studies are therefore considered with a particular focus on precise elaboration and the task-induced construct of evaluation.

Semantic elaboration and context generation.

Several studies have considered the effects of context generation with single words. R. Ellis & He (1999) found that target words were retained significantly better when learners had to use them in negotiated output, as opposed to listening to them in either premodified or negotiated input. Scores on the productive tests were lower than the receptive tests but nevertheless substantial, ranging from 52% to 85%. The requirement to define the target words, for example connecting an item of furniture such as rocker to a superordinate term chair and its related functions, seems to have aided retention by linking the target word to related concepts.

In another study of oral generation, Joe (1998) had Asian and Samoan learners of English read a text and subsequently retell its contents, either with or without the text to hand. Both groups significantly outperformed a control group, though no difference was found between the treatment groups themselves. A further analysis determined the extent to which learners had elaborated on the meaning and associations of the target words. Gains were generally higher for target words which had been more extensively elaborated during recall, for example by connecting the target word to its associated concepts and other L2 words, or by using examples (p.364). Again, more extensive elaboration meant integrating the target word more precisely within a learner’s existing knowledge. However, the post-tests only assessed receptive knowledge, meaning no conclusions can be drawn about the task’s effectiveness in developing productive knowledge.

With respect to written context generation, Hulstijn & Trompetter (1998) carried out a study which directly investigated the relative effects of reading and writing on vocabulary retention in a computer-assisted language learning (CALL) setting. Dutch-speaking students of French either read a weather report in French or wrote a weather report in French using a Dutch model text. Participants in the reading condition could look up words from L2 to L1, while those in the writing condition could look up words from L1 to L2. Look-ups were monitored digitally, and these formed the basis of a translation test the next day. It was hypothesized that the writing condition would require greater evaluation of the target words in context and therefore show higher retention scores. Only 64% of the participants looked up the 10 or more words needed for a complete post-test. No significant differences were found between the two conditions in the group which looked up 10 words or more. The only significant advantage for writing was found when the entire cohort was measured together, and partially correct responses were included. Contrary to Laufer & Hulstijn (2001), this study provides little support for the claim that writing a composition requires more elaboration of new vocabulary items.

A further study by Hulstijn & Laufer (2001) was operationalized in a slightly different manner. Participants in the Netherlands and Israel read an English text and completed one of three follow-up activities: reading comprehension, reading comprehension plus gap fill, or writing a composition using the target words. Participants in the writing condition had access to example sentences and glosses in both L1 and L2 throughout the treatment and were instructed to use all ten words in their compositions. Receptive post-tests showed significantly better retention for the writing task than either of the other conditions, including after a delay of one or two weeks. It seems likely that the instruction to include all ten target items required participants to engage in some amount of task planning prior to writing. This likely resulted in participants considering more carefully how the meaning of the words could be adapted to the context, thereby requiring more extensive evaluation than in Hulstijn & Trompetter (1998). These considerations are important from a pedagogical perspective, since relatively similar tasks may have vastly different outcomes depending on the requirements imposed on learners. Unfortunately, since the more difficult translation test was not used, no conclusions can be drawn about the effectiveness of the task in developing productive knowledge.

As far as the author is aware, no studies have directly addressed context generation with respect to collocation. In a study by Peters (2012), participants read a text containing formulaic sequences and single words and were instructed to produce a summary of the text in order to check comprehension. Of the items used in the summaries, 63% were subsequently recalled in the immediate post-test. It was claimed that incorporating the items into a written summary may have aided retention. However, overall use of target items in the summaries was low (around 10%). Furthermore, it seems quite possible that the items were used in the summary precisely because they had been better retained while reading the text. It is therefore unclear whether the inclusion of the target items in the summaries was the cause of retention, or the result of it.

Structural elaboration.

Retention may be aided not only by semantic elaboration but also by attending to the formal properties of a word, termed structural elaboration (e.g. Barcroft, 2002). For example, Lindstromberg & Boers (2008a, 2008b) found that attending to sound patterns in idioms, such as alliteration and assonance, can improve learners’ retention of them. However, the type of elaboration is again likely to be influential. Stengers et al. (2016) investigated the potential benefits of copying as an addition to a computer-assisted language learning (CALL) application to learn idioms. The rationale for this exercise was that learners often substituted synonyms to make non-canonical idioms (e.g. play second fiddle was incorrectly produced as play second violin). Consistent with transfer-appropriate processing (TAP) theory (Morris et al., 1977), it was hypothesized that attention to the formal properties of the idioms would consequently aid retention. However, immediate and delayed post-tests found no benefit of copying over the control condition.

It is interesting to consider how this task differs from the successful application of the mnemonic devices employed by Lindstromberg and Boers (2008a, 2008b). In the study by Stengers et al. (2016), a large number of word forms were probably already known to the participants, at least receptively. The researchers suggested that for this reason, copying may have been a relatively unengaging task. By contrast, attending to alliteration or assonance may involve noticing an aspect of form which has hitherto gone unnoticed, as such patterns are apparently difficult to recognize without the aid of a teacher (Boers, Lindstromberg, & Eyckmans, 2014b). In motivational terms, then, the copying activity may have proven rather shallow. Indeed, Morris et al. (1977) contend that the “meaningfulness” of a task must not be seen in either structural or semantic terms, but rather in relation to the learning outcomes (p.519). Where tasks require higher-level learners to perform relatively basic activities such as copying, therefore, learners may be less likely to engage with the target vocabulary.

To summarize: tasks that engage learners in precise elaboration are most likely to aid retention. These can involve directing learners to make meaningful links between vocabulary items (Boers et al., 2004; Verspoor & Lowie, 2003), requiring the item to be defined (R. Ellis & He, 1999; Joe, 1998), or, depending on the task requirements, using them in a composition (Hulstijn & Laufer, 2001). With regard to the present study, it is anticipated that generating a novel sentence will require learners to elaborate precisely on the semantic properties of the item in a similar way to defining or explaining the meaning. Copying, by contrast, may require only a superficial amount of semantic and structural engagement with the item, and with regard to the ILH, will require no evaluation, whereas sentence generation requires strong evaluation (Laufer & Hulstijn, 2001, p.17). Therefore, the following hypotheses are formulated:

2a. Gains will be greater as a result of the sentence generation task than the copying task.

2b. The rate of decay will be lower in the sentence generation task than the copying task.


Inter- and Intra-lexical Factors

While precise elaborative tasks may aid retention, the effect is unlikely to be uniform across collocations. Aside from the effects of congruence, collocation learning can be affected by factors including word length, the node-collocate relationship (Peters, 2016) and, when learned as a set, inter-lexical factors such as synonymy of collocates (Webb & Kagimoto, 2011). At higher language levels, learners may find it relatively easier to add new collocations to their lexicon: Peters (2016) found a beneficial effect of vocabulary size on the ability to recall a collocation, an effect mirrored by findings in studies of single words (Zahar, Cobb, & Spada, 2001). To date, Peters (2016) appears to have been the only study to examine the effect of vocabulary size on the ability to learn collocations.

In the present study, two intra-lexical factors are discussed which have previously been shown to have an influence on the learning of multi-word units, namely transparency and imageability.

Transparency.

Transparency concerns the relationship between a collocation’s individual words and its meaning as a whole unit (Steinel et al., 2007). This is often associated with compositionality (Boers et al., 2014a) or decomposability (Gibbs, Nayak, & Cutting, 1989). Compositional, or decomposable, multi-word units have a global meaning which follows closely from the meaning of the individual words. For instance, to trade insults can be considered compositional in that the meaning of its parts is closely related to the meaning of the collocation as a whole. By contrast, to lose one’s rag has no clear relationship to the meaning ‘become extremely angry’ and can therefore be considered non-compositional. However, second language learners’ judgments of what is compositional will likely differ from those of native speakers as a result of their prior knowledge (Boers et al., 2014a). As such, transparency is taken here to be the relationship between the meaning of the collocation’s parts and the whole, as perceived by the language learner.

As noted earlier, transparent items may be less easy to identify in text (Boers et al., 2014a; Peters, 2012). However, studies have previously shown facilitative effects for transparency when learned out of context. Irujo (1986) tested Spanish-speaking advanced-level learners of English on idioms which were either congruent or non-congruent. In both cases, it was observed that transparent items were both produced and comprehended more easily than less transparent items. However, transparency was not formally defined or operationalized in that study. In a study discussed earlier, Boers et al. (2004) found that etymological elaboration was more effective for retention of transparent idioms, but that for opaque idioms the effect was negligible. Seemingly, where participants could derive the original meaning from the constituent parts, this had a facilitative effect on retention. However, the operationalization of transparency was rudimentary: items were classified as transparent or opaque depending on the number of clicks it took students to identify the target item’s source origin. Steinel et al. (2007) investigated transparency in a PAL task of idiom learning, and used a rating scale of 1 to 7 to obtain transparency ratings both from participants and a comparable group; no effect was found on productive knowledge, but comprehension was higher for more transparent idioms. The results of Steinel et al. (2007) are logical in so far as learners are more easily able to identify the meaning of a multi-word unit where the meaning of its individual parts has a close relation to the overall meaning.

In line with these studies, it is hypothesized that transparent collocations are likely to be recognized more readily than less transparent items. Since the more rigorous operationalization in Steinel et al. (2007) found no effect in productive measures, no firm predictions are made with regard to production.


Imageability.

Imageability refers to the “image-evoking potential” (Steinel et al., 2007, p.154) of a given lexical item. Some studies have tended to treat this as synonymous with concreteness, since ratings of the two tend to correlate very strongly, at r > .90 (de Groot & Keijzer, 2000, p.9). However, Steinel et al. (2007) contend that concreteness may be more accurately applied to individual words, and that imageability is more appropriately applied to multi-word units such as idioms (p.455). This study follows Steinel et al. (2007) in referring to imageability for the findings below. Just as for transparency, imageability is assumed to be a subjective measure from the perspective of the language learner.

Studies have shown facilitative effects for more imageable lexical items. De Groot & Poot (1997) found that more imageable words were more easily learnt than less imageable ones, and that this effect held true at both higher and lower levels of proficiency. This facilitative effect was also found by De Groot & Keijzer (2000). In their study of English idiom learning, Steinel et al. (2007) again used student ratings to operationalize imageability and found highly imageable idioms to have a positive influence on retention, with a stronger effect in the receptive (L2-L1) than productive direction.

An explanation for the imageability effect may lie in dual coding theory (Paivio, 1986). Information is encoded either visually or verbally, meaning that the addition of a mental image provides an extra route to recall. This account supports the findings discussed earlier of Boers et al. (2004) in their cognitive semantic approach to idiom learning, and Verspoor & Lowie (2003) in their approach to inferring the meaning of polysemous words. An alternative explanation is also offered by De Groot & Keijzer (2000) in the form of context availability theory (Schwanenflugel, 1991). Contextual information supports comprehension, and since it is usually easier to supply a context for concrete words and imageable multi-word units, this will result in highly imageable items being more easily learnt. Consistent with the findings in the above studies, the following hypotheses are presented:

3a. More transparent items will be retained better in a receptive test.

3b. More imageable items will be retained better, both in receptive and productive tests, with a greater influence on receptive than productive knowledge.

Aim and Research Questions

The primary aim of the present study is to investigate the relative effects of two tasks which can be employed as input strategies when learning collocations, namely copying and example generation. To date, studies which have investigated context generation have found mixed results for its usefulness as a task (Hulstijn & Trompetter, 1998) and those which were found to be successful have not specifically addressed the effect on productive knowledge (Hulstijn & Laufer, 2001). Nor have studies focussed on context generation with respect to collocation specifically. The study therefore also aims to understand the relative effectiveness of these two tasks as a way to develop productive and receptive knowledge of collocations. A secondary focus is to understand whether intra-lexical and inter-lexical factors interact with the effectiveness of these two task types. Based on the hypotheses formulated above, the following research questions are presented to guide this study:

1. What is the relationship between productive and receptive knowledge as a result of two productive input strategies?

2. What is the differential effect on retention of two input strategies, context generation and copying?

3. To what extent do two intra-lexical factors (transparency and imageability) influence the development of productive and receptive collocational knowledge?


Method

Global Design

The primary purpose of the current experiment was to assess the relative benefits of two input strategies, namely context generation and copying. The experiment was designed to resemble a classroom or self-study context in which a language learner has initially encountered a collocation in reference material and subsequently records it in an effort to aid retention.

To do this, the study employed a pre-test, treatment, post-test design, with learning condition as a within-subjects factor. In both conditions, participants were provided with the target item, a definition and an example sentence. In the first learning condition, participants copied the example sentence without any changes. In the second condition, participants generated a novel sentence illustrating the meaning of the target item, and wrote this down. Despite the requirement to focus on each collocation in turn, participants were not instructed to memorize the collocations, nor informed that a post-test would follow. As such, the treatment is best characterized as incidental (Hulstijn, 2001).

To control for order and intra-lexical factors, four versions of the test were created. The target items were divided into two sets, A and B. Half of the participants copied the examples with set A and generated sentences with set B; half copied set B and generated sentences with set A. Half of the participants completed the copying task first and the example generation task second; half of the participants completed the example generation first and subsequently completed the copying task. This yielded a Latin Square design with four treatments, as shown below in Figure 1.


Session                                            Week   Tests
1. Pre-test session                                1      Pre-tests 1-3
2. Treatment session                               2      Learning tasks, then immediate post-tests 1-3
3. Delayed post-tests and intra-lexical ratings    4      Delayed post-tests 1-3

Treatment   n   First task (item set)   Second task (item set)
1           4   Copying (A)             Example (B)
2           4   Copying (B)             Example (A)
3           3   Example (A)             Copying (B)
4           4   Example (B)             Copying (A)

Figure 1. Global experimental design (numbers 1-3 refer to test types; letters A and B refer to item subsets).
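The counterbalancing described above can be illustrated with a short sketch. It is not part of the original materials: the function name and data layout are hypothetical, and it simply crosses the two task orders with the two item-set orders to reproduce the four treatments.

```python
from itertools import product

# The two counterbalanced factors described in the text:
#   task order: copying first vs. example generation first
#   set order:  which item set (A or B) is paired with the first task
TASK_ORDERS = [("copying", "example"), ("example", "copying")]
SET_ORDERS = [("A", "B"), ("B", "A")]


def build_treatments():
    """Cross task order with item-set order to obtain the four treatments.

    Returns a dict mapping a treatment number to a list of
    (task, item_set) pairs in the order they are completed.
    """
    treatments = {}
    combos = product(TASK_ORDERS, SET_ORDERS)
    for number, (tasks, sets) in enumerate(combos, start=1):
        treatments[number] = list(zip(tasks, sets))
    return treatments


treatments = build_treatments()
for number, sequence in treatments.items():
    print(number, sequence)
```

Every treatment contains both tasks and both item sets, so each participant meets all 24 items exactly once, while task order and item set are balanced across the group as far as the uneven group sizes allow.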

Participants

Participants were first-year Bachelor students of the English Language and Culture programme at the University of Groningen (12 female, 3 male). Their native language was Dutch and their ages ranged between 18 and 25 years old. Dutch university students have typically had 6 to 7 years’ previous instruction in English, and the entry requirement for the BA English Language and Culture is equal to B2 in the Common European Framework of Reference, meaning that the participants could be considered at least upper-intermediate level learners of English prior to the experiment. During the first session all participants were administered the 14-level version of the Vocabulary Size Test (VST) (Nation & Beglar, 2007). This revealed that all participants had a receptive knowledge exceeding 10,000 word families in English (M = 11,649, SD = 907). There is therefore a good case for considering these participants to be advanced-level learners. Participants registered for the experiment voluntarily and were paid 8 euros for their involvement. Upon completion of the pre-tests, participants were anonymously distributed between the four treatments so that vocabulary sizes remained as balanced as possible between the conditions. Descriptive statistics for the participants’ age and vocabulary scores are provided in Table 1.

Table 1. Descriptive statistics for participants’ age and vocabulary size.

Variable M SD Min Max

Age 20.27 2.15 18 25

Vocabulary size 11,649 907 10,100 13,200

Note. Vocabulary size is measured by the Vocabulary Size Test (Nation & Beglar 2007).

Target Collocations

The target items were a list of 24 low-frequency English collocations, with a range of node-collocate relationships. This range was considered ecologically valid in that language learners encounter collocations across a range of syntactic categories while learning vocabulary incidentally.

As noted earlier, congruence influences the ability to learn L2 phrases (Nesselhauf, 2003; Wolter & Gyllstad, 2011); additionally, congruent items may be subject to so-called homoiophobia effects (Kellerman, 1978). Therefore, all congruent items were excluded. This further ensured that L2 knowledge and L1 transfer would not be confounded. Congruence was operationalized as the ability to translate a collocation word-for-word from the L1 to the L2 (Nesselhauf, 2003). For example, blind panic (‘blinde paniek’) is translatable word for word and was therefore considered an unacceptable candidate. By contrast, a withering look is translatable by the Dutch een vernietigende blik (‘a destroying look’), which yields no direct L2 translation and was therefore considered non-congruent. Congruence was considered only at the level of the collocation: that is, other cognate factors were not controlled for.

To select appropriate items, a list of candidates was compiled from a variety of sources, including the Cambridge Advanced Learner’s Dictionary (2008), O’Dell & McCarthy (2017), and Hill & Lewis (1997). Over 100 items were originally compiled. These were analysed using the Lextutor vocabulary profiler (Cobb, n.d.), and collocations containing words above the 7,000 band in the BNC-COCA lists (Nation, 2012) were discarded. On the basis of the VST results, it could therefore be assumed that the participants would be familiar with the form of all of the collocates within the study. To check for congruency, two native Dutch speakers studying Applied Linguistics at the University of Groningen were asked to translate the collocations with a word-for-word translation, after which congruent items were discarded. Pilot testing with a similar group of Dutch native speakers (N = 9) indicated that a number of items were already well-known among similar groups, and these were also removed. A final list of 44 items was used for pre-testing, from which the 24 target items were later selected.

The 24 target collocations are reported in Appendix A with their mutual information (MI) scores (Church & Hanks, 1990) and log-likelihood values. MI scores are calculated by comparison between the expected number of occurrences (the number of times the word pair would be expected to co-occur within a given span if the words were completely independent of each other), with the actual number of occurrences of the word pair within the corpus. As such, MI scores assess “the strength of association between two words” (Clear, 1993, pp.279-280, original italics). MI scores are more effective than t-scores at identifying strong but low frequency relationships (Clear, 1993), hence the preference for MI score in this study. Hunston (2002, p.71) claims that strong collocations can be identified by an arbitrary MI score of greater than 3. With one exception, all of the collocations used met this criterion. The exception, get someone’s goat, was accepted on the basis of its occurrence in both the Cambridge Advanced Learner’s Dictionary (2008) and the COBUILD idioms dictionary (2012).
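To make the comparison of observed and expected co-occurrence concrete, the sketch below computes an MI score as the base-2 logarithm of their ratio, following the general formulation in Church & Hanks (1990). The function name, the way the span is folded into the expected frequency, and the counts in the usage line are assumptions for illustration; corpus tools differ in how they adjust for span, so values may not match those reported in Appendix A.

```python
import math


def mutual_information(pair_count, node_count, collocate_count,
                       corpus_size, span=1):
    """Return log2(observed / expected) for a node-collocate pair.

    expected is the number of co-occurrences predicted if the two
    words were independent: f(node) * f(collocate) * span / N.
    The span adjustment is one common convention, not the only one.
    """
    if min(pair_count, node_count, collocate_count, corpus_size) <= 0:
        raise ValueError("all counts must be positive")
    expected = node_count * collocate_count * span / corpus_size
    return math.log2(pair_count / expected)


# Invented counts, chosen only so the arithmetic is easy to follow:
# expected = 128 * 64 / 1,048,576 = 0.0078125; observed = 8.
score = mutual_information(pair_count=8, node_count=128,
                           collocate_count=64, corpus_size=2 ** 20)
print(round(score, 2), score > 3)  # compare against the MI > 3 criterion
```

With these invented counts the pair occurs 1,024 times more often than chance would predict, which is why the score comfortably exceeds the threshold of 3.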

Materials

The following materials were used: (a) dependent measures of form recall and form recognition; (b) a vocabulary size test; (c) treatment task sheets; (d) intra-lexical rating sheets; (e) a background questionnaire.

Dependent measures.

To assess gains in knowledge, three measures were devised as pre-tests and post-tests. Using multiple measures is more likely to provide an accurate picture of vocabulary development (Webb, 2005) and the strata of productive and receptive knowledge. Since this study was particularly concerned with target-like production of word sequences, as opposed to plausible but non-canonical collocations, all three tests focused on participants’ knowledge of syntagmatic association. Of these, two tests assessed productive knowledge (recall), while the third tested receptive knowledge (recognition). The recall tests were designed to provide a complementary picture of the participants’ productive knowledge at two levels of sensitivity. The first required participants to supply a collocate; the second also required a collocate to be provided, but with the addition of a cue to act as a more sensitive measure.


All three tests focused on form. No tests of meaning were employed since the majority of the target items were relatively transparent, and the main focus of the study was to test higher-level learners’ ability to employ the appropriate form for the meaning they attempted to express. All three pre-tests included a total of 44 items, 24 of which became the target items, and the remaining 20 of which were then discarded. This had the dual purpose of preventing participants from identifying the eventual target items, while also allowing the researcher to select target items which were least familiar to the participants. The dependent measures are described below.

Free form recall. Productive knowledge of form was measured using two tests. In the free form recall test, participants saw a collocation with a single word missing and were asked to supply the missing word. This is illustrated below in Example 1.

Example 1

to __________ profusely

[Correct answer: apologise]

This test was intended to measure knowledge of syntagmatic association by requiring participants to supply collocates from a node. Sufficient syntactic information was provided so that only the correct part of speech could be supplied. In each case, the gap could only be filled by a small range of collocates or a single collocate. For example, the node word profusely yields only three verb collocates in the British National Corpus (BNC), namely sweat, bleed and apologise. Participants were therefore instructed to write down all the options they thought possible; however, only the target collocate was marked as correct. No context was provided since this may have allowed participants to guess the correct collocate from the provision of meaning, at which point the test could not be considered only to be a test of syntagmatic association. A complete sample test is included in Appendix B.

Cued form recall. In the second test participants were again required to supply the missing collocate; however, the first letter or two letters of the missing word were provided as a cue. This served two purposes: firstly, it limited the possible response to only a single collocate, and secondly, it allowed the test to act as a more sensitive measure of productive knowledge than the free form recall test. This allowed a composite picture to be gained of participants’ productive knowledge. An example of the test format is given in Example 2.

Example 2

to a__________ profusely

[Correct answer: apologise]

In all cases, the cue was designed to allow only one answer which occurred frequently within the BNC, and which could not reasonably be replaced by another word beginning with the same letter. As such, the cue “to a__________ profusely” could only be answered correctly with apologise. Appendix B contains a full sample test.

Multiple choice. In order to test receptive knowledge, a multiple-choice test was developed. Participants were presented with the node word and had to choose from a list of five candidate collocates. A sixth option, I don’t know, was included to discourage random guessing. All options had the same initial letter to prevent the cue in the previous test from revealing the correct answer. Example 3 provides a sample item:


Example 3

to __________ profusely

a) advance b) accept c) apologise d) annoy e) agree f) I don’t know

[Correct answer: c) apologise]

Distracters were chosen from the Cambridge Advanced Learner’s Dictionary (2008) on the basis of being plausible but non-canonical combinations. None of the distracters were found to have MI scores above 3 in the BNC, or to collocate frequently with the node word. All distracters were entered into the Compleat vocabulary profiler (Cobb, n.d.) and were found to be within the 7,000 most frequent words on both the BNC and BNC-COCA frequency lists (Nation, 2012). It was therefore reasonable to assume that participants would have a receptive knowledge of the overwhelming majority of the distracters.

Post-tests. Immediate and delayed post-tests contained the same target items as the pre-tests. The pre-tests differed only in containing 20 extra distracters. Each of the six post-tests (3 immediate, 3 delayed) presented the items in a different random order so as to reduce sequence effects. This order was created by the random function in Microsoft Excel.
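The randomization itself was carried out in Microsoft Excel. For readers who wish to reproduce the procedure in a scripted form, the following sketch generates six distinct orderings of the same 24 items. It is an equivalent under stated assumptions rather than the procedure actually used: the function name, the item labels and the seed are all hypothetical.

```python
import math
import random


def make_test_orders(items, n_tests=6, seed=None):
    """Return n_tests distinct random orderings of the same items."""
    if math.factorial(len(items)) < n_tests:
        raise ValueError("too few items for that many distinct orders")
    rng = random.Random(seed)  # a fixed seed makes the orders reproducible
    orders = []
    while len(orders) < n_tests:
        order = rng.sample(items, k=len(items))
        if order not in orders:  # redraw in the unlikely event of a repeat
            orders.append(order)
    return orders


# Placeholder labels; the actual target items are listed in Appendix A.
items = [f"item_{i:02d}" for i in range(1, 25)]
orders = make_test_orders(items, n_tests=6, seed=2018)
for test_number, order in enumerate(orders, start=1):
    print(test_number, order[:3], "...")
```

Each ordering contains every item exactly once, so the six post-tests differ only in sequence.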

Pilot testing. Pilot testing was conducted to ensure that the syntagmatic associations being tested also existed for native speakers. For the free recall test, native speakers of English (N = 20) participated in a pilot test via Google Forms. Test items were discarded if the target collocate was supplied in fewer than 20% of the responses. For the cued recall test, a separate group of native speakers (N = 50) participated via Google Forms. In this case, test items were discarded if fewer than 60% of the responses included the target collocate. In some cases, other canonical collocates turned out to be possible, in which case an extra letter was added to the cue to limit the response further. If this proved impossible, the item was discarded. The multiple-choice test was conducted with three native speakers and only items which were scored correctly by all three participants were retained.
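The retention rule for the two recall pilots can be summarized in a few lines. The sketch below is illustrative only: the data structure, the function names and the responses are invented, and only the 20% and 60% thresholds come from the procedure described above.

```python
FREE_RECALL_THRESHOLD = 0.20   # free recall pilot
CUED_RECALL_THRESHOLD = 0.60   # cued recall pilot


def supply_rate(target, responses):
    """Proportion of respondents whose answers include the target collocate."""
    hits = sum(1 for answers in responses if target in answers)
    return hits / len(responses)


def retained_items(pilot_data, threshold):
    """Keep items whose target was supplied by at least `threshold` of respondents."""
    return [cue for cue, (target, responses) in pilot_data.items()
            if supply_rate(target, responses) >= threshold]


# Invented responses from five hypothetical respondents per item.
pilot_data = {
    "to ___ profusely": ("apologise", [{"apologise", "sweat"}, {"sweat"},
                                       {"bleed"}, {"apologise"}, {"thank"}]),
    "a ___ point": ("sticking", [{"sticking"}, {"sticking"}, {"sticking"},
                                 {"sticking"}, {"turning"}]),
}

print(retained_items(pilot_data, FREE_RECALL_THRESHOLD))
print(retained_items(pilot_data, CUED_RECALL_THRESHOLD))
```

In this invented data the first item is supplied by 40% of respondents and the second by 80%, so both survive the free recall criterion but only the second survives the stricter cued recall criterion.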

In order to determine the length of time required to complete the tests, further pilot tests were conducted with non-native students from the Applied Linguistics Master’s programme at the same university. These revealed that the time required for each test would be as follows: free recall 5-12 minutes; cued recall 5-11 minutes; multiple choice 3-7 minutes. All tests could therefore be conducted within a 1-hour session.

Vocabulary size test.

The Vocabulary Size Test (Nation & Beglar, 2007) comprises 140 four-item multiple-choice questions and is intended as an estimate of the number of word families known receptively in English, to the nearest hundred. Unlike the Vocabulary Levels Test (Schmitt, Schmitt, & Clapham, 2001), the VST does not provide a stratified profile, but rather a single figure which approximates a participant’s receptive vocabulary size in word families. This test was chosen for two reasons: firstly, it measures vocabulary size at a level appropriate to the participants in this study, namely in excess of 10,000 word families; secondly, a single number per participant could be entered into the analyses as an interval predictor variable.

Learning conditions.

Copying. In both learning conditions, participants were supplied with the target item, its definition, and an example sentence. In the first condition, participants were instructed to read the item and definition, and copy down the example sentence word for word in a space given underneath. A sample item is provided in Example 4; a full task is available in Appendix C.

Example 4

A sticking point = a point in a discussion on which it is not possible to reach an agreement

The main sticking point of the peace talks is exactly how the land should be divided up.

Copy the sentence:

_____________________________________________________________________

Sentence generation. In the sentence generation task, participants were asked to read the target item, the definition and the example sentence. However, rather than copying the sentence, participants were instructed to generate an original sentence which both incorporated the target item and clearly illustrated its meaning. A sample item is provided in Example 5; Appendix C contains a full task.

Example 5

to strenuously deny = to deny something with a lot of energy

He strenuously denies all the allegations against him.

Write your own example:


The instruction to illustrate the meaning of the target collocation is consistent with precise elaboration as defined by Stein et al. (1982), namely making clear the significance of new information. Definitions and examples were taken from the Cambridge Advanced Learner’s Dictionary (2008). Where definitions were absent, or insufficient to illustrate the meaning, two native speakers of English were consulted to provide a definition based on examples from the BNC. Where examples were missing or not sufficiently clear to illustrate the meaning, an example was taken from the BNC and modified by the same two native speakers.

Intra-lexical rating sheets.

To obtain transparency and imageability ratings, a rating system was employed closely based on Steinel et al. (2007). Participants selected the extent to which they agreed or disagreed with two statements, A and B. For transparency, statement A read “The meaning of this collocation as a whole has a lot in common with the literal meaning of the words.” For imageability, statement B read “I could easily visualize this collocation.” Ratings were made on a 7-point Likert scale, from 1 (“completely disagree”) to 7 (“fully agree”). To ensure equal attention to the target items, each collocation was presented in turn via a PowerPoint presentation. Appendix D contains a sample rating sheet.

Background questionnaire.

A questionnaire was included to gather background information about the participants. The aim was to establish the participants’ age and language history, their attitude to English and to foreign languages, and to check they had not engaged in deliberate learning of the target items between tests. The questionnaire was adapted from the Language Background Questionnaire (LBQ) by Gullberg & Indefrey (2003). The full background questionnaire is provided in Appendix E.

Procedure

Participants attended a pre-test session in which the tests were administered in the following order: free recall, cued recall, multiple choice, Vocabulary Size Test. This order ensured minimum contamination between tests. All participants were informed that the study involved vocabulary research, but the purpose of the tests was not revealed. To ensure that participants did not study the target items at home, the researcher asked the participants to refrain from looking up the vocabulary or discussing the items outside the test sessions. Participants agreed verbally to avoid discussing the tests.

Prior to the free recall test, participants were given a brief instruction about collocations in general. On starting the test, participants were encouraged to write down as many possible answers as they could. After collection, the cued recall test was administered. In this case, instruction was given that only one option was possible, and that participants should not make guesses if they did not know the correct answer. This instruction was provided to preclude guessing. Participants then completed the multiple-choice test. Next, participants were directed to the Lextutor website (Cobb, n.d.) and completed the VST via their mobile phones or laptops. Participants provided their VST score to the researcher and were free to leave. Completing the VST at this stage also served as an intermediary activity, to counteract any rehearsal of items from the recall and receptive tests.

The following week, all participants attended the same treatment session. It was briefly explained that two strategies for recording and learning new vocabulary involve either writing down the vocabulary item or generating one’s own sentence and writing that down instead. Participants were given a brief example and generated their own example sentence in plenary. It was emphasized that the novel sentence should exemplify the meaning of the collocation and that the researcher would subsequently check whether this instruction had been followed. This was done to ensure that the target items were elaborated precisely (Stein et al., 1982).

Participants were then given their task sheets, according to one of the four treatments assigned previously (Figure 1). Participants were instructed to work through the task in the order presented to ensure that order effects remained balanced across the conditions. They were also told to record the exact times they started and finished the two tasks. The participants then worked through the task at their own pace, after which the worksheets were collected in. To prevent short-term memory rehearsal of the target items, the participants were then directed to a website to perform an unrelated exercise. After 5 minutes working on the intervening exercise, participants were administered post-tests in the same order as previously: free recall, cued recall, multiple choice. Each test was collected in before a participant could start the next test. Once all materials had been collected in, participants could leave, and were reminded that they should continue not to discuss the items or the study outside the class.

Two weeks later, participants were administered the delayed post-tests in the same order as the previous sessions: free recall, cued recall, multiple choice. No prior warning of these tests was given. Next, the intra-lexical rating sheet was distributed. Participants saw each collocation on a PowerPoint and made their judgments on the rating sheet. Finally, the participants completed the language background questionnaire. On completion, participants were thanked for participation, received their payment and were free to leave.

Scoring

Sentence generation.

To ensure that semantic elaboration had been successfully operationalized, it was necessary to check whether participants had generated novel sentences, and whether these sentences made sense semantically. Three English teachers, each with at least three years’ experience, were asked to rate each example sentence on a scale from 1 to 4 (see Table 2 below). This scale was an adapted version of Joe’s (1995, p.151) levels of generativeness scale.

Table 2. Levels of generativeness rubric for sentence raters.

Rating  Descriptor
1       No generation: No demonstrable effort to integrate the meaning of the collocation. The example sentence has (largely) been copied word for word.
2       Low generation: Very little effort to integrate the collocation within the sentence. Minimal changes made to the provided example sentence.
3       Reasonable generation: Some effort to exemplify the target collocation within the sentence. Substantial changes made to the provided example sentence.
4       High generation: High effort to exemplify the target collocation within the generated sentence. Generated sentence is completely distinct from the provided example.

Note. The above scale is based on Joe (1995, p.151).

Descriptive statistics indicated that the level of generativeness was high (M = 3.74, SD = 0.50), with no sentences scored 1 by any of the raters. The example learning condition was therefore considered to have been operationalized successfully. Table 3 shows descriptive statistics for the teachers’ ratings.

Table 3. Descriptive statistics for rater scores, levels of generativeness per sentence.

Rater M SD Min Max

1 3.94 0.27 2 4

2 3.41 0.62 2 4

3 3.86 0.36 2 4

Note. No inter-rater agreement was calculated, as the aim was only to check for high generation across participants.

Dependent measures.

Scoring on the pre-tests and post-tests was conducted on a strictly binary basis: ‘1’ for a correct answer, ‘0’ for an incorrect answer. Credit was given where there were only minor spelling errors and where the word form could not be confused with any other form. For example, *apoligise in apologise profusely was marked correct. If it was possible to confound the word form with any other form, the score was marked zero. Thus, *loose for lose one’s rag was marked incorrect.
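The scoring rule above can be sketched in code. This is only an illustration under stated assumptions: in the study the responses were judged by hand, and the function name `score_response`, the similarity threshold of 0.8, and the set of confusable forms are all hypothetical choices made for this example.

```python
from difflib import SequenceMatcher


def score_response(response, target, confusable_forms, min_similarity=0.8):
    """Return 1 (correct) or 0 (incorrect) for a single written response.

    confusable_forms: real word forms that must never receive credit
    (hypothetical list; the study made this judgment manually).
    min_similarity: assumed cut-off for a "minor" spelling error.
    """
    response = response.strip().lower()
    target = target.strip().lower()
    if response == target:
        return 1
    # A misspelling that coincides with another word form is marked wrong.
    if response in confusable_forms:
        return 0
    similarity = SequenceMatcher(None, response, target).ratio()
    return 1 if similarity >= min_similarity else 0


confusable = {"loose"}  # hypothetical: 'loose' is a different word from 'lose'
apologise_score = score_response("apoligise", "apologise", confusable)
lose_score = score_response("loose", "lose", confusable)
```

The explicit set of confusable forms is needed because a similarity measure alone would treat *loose as a close match for lose.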

Intra-lexical ratings.

Transparency and imageability scores were obtained by calculating the mean of all the participants’ ratings. The mean ratings and their standard deviations can be found in Appendix A.

Analysis

To test whether the treatment as a whole resulted in higher scores, three one-way repeated-measures analyses of variance (ANOVAs) were conducted, one for each test type. The dependent variable was test score, on an interval scale; the independent variable was time, a categorical variable with three levels (pre-test, immediate post-test, and delayed post-test). Post-hoc analyses using a Bonferroni correction were used to determine whether there were significant differences between pre-tests and immediate post-tests, pre-tests and delayed post-tests, and immediate and delayed post-tests. The alpha level was set at .05.
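As a rough illustration of the procedure described above, the following sketch computes the F statistic of a one-way repeated-measures ANOVA and a Bonferroni-adjusted alpha for three pairwise comparisons. It is not the analysis script used in the study (which was run in R); the function name `rm_anova_f` and the example scores are hypothetical.

```python
from statistics import mean


def rm_anova_f(scores):
    """One-way repeated-measures ANOVA.

    scores: one list per participant, with one score per time point.
    Returns (F, df_time, df_error).
    """
    n = len(scores)      # number of participants
    k = len(scores[0])   # number of time points
    grand = mean(x for row in scores for x in row)
    time_means = [mean(row[j] for row in scores) for j in range(k)]
    subject_means = [mean(row) for row in scores]

    ss_time = n * sum((m - grand) ** 2 for m in time_means)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    # Variance left over once time and participant effects are removed.
    ss_error = ss_total - ss_time - ss_subjects

    df_time = k - 1
    df_error = (k - 1) * (n - 1)
    f_stat = (ss_time / df_time) / (ss_error / df_error)
    return f_stat, df_time, df_error


# Hypothetical scores (pre-test, immediate, delayed); NOT the study's data.
example_scores = [
    [1, 5, 4],
    [2, 6, 4],
    [0, 4, 2],
    [1, 5, 2],
]
f_stat, df_time, df_error = rm_anova_f(example_scores)

# Bonferroni correction: divide alpha by the number of pairwise comparisons.
alpha = 0.05
n_comparisons = 3
bonferroni_alpha = alpha / n_comparisons
```

With three time points there are three pairwise comparisons, so each post-hoc test is evaluated against an adjusted alpha of roughly .017.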

To understand the differential effects of the two learning conditions, this study employed mixed effects logistic regression models. Mixed effects models allow the inclusion of participants and items as random effects (Baayen, 2008). Thus, individual variation from participants, as well as differential performance on test items, can be used to explain a certain amount of the variance in the model. Unlike linear regression models, which model an interval dependent variable, logistic regression models are used to calculate probabilities of a binary variable, dependent on categorical variables (factors) and interval variables (covariates) (Peters, 2016). In this case, therefore, each observation specifies the binary score for a single item completed by a single participant. The process begins with a null model in which only random intercepts are specified; predictor variables are then entered into the model incrementally, and each iteration is tested against the previous one using likelihood ratio tests evaluated against a chi-square distribution. Once a provisional best fit model is established, the predictor variables are then removed one by one in order to check whether any are redundant and can be excluded. For this analysis, the alpha level was also set at .05.
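The model comparison step can be sketched as follows, assuming two nested models that differ by exactly one parameter. The log-likelihood values are hypothetical, the function name is an assumption made for this example, and in the study itself the comparisons were carried out in R rather than with this code.

```python
import math


def likelihood_ratio_test_df1(loglik_simple, loglik_complex):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns (chi_square, p_value). The p-value uses the chi-square
    distribution with 1 degree of freedom, whose upper tail probability
    is erfc(sqrt(x / 2)).
    """
    chi_square = max(2.0 * (loglik_complex - loglik_simple), 0.0)
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Hypothetical log-likelihoods: a null model and a model with one added predictor.
chi_square, p_value = likelihood_ratio_test_df1(-210.0, -205.0)
keep_predictor = p_value < 0.05
```

A predictor is retained only if the richer model fits significantly better than the previous one; comparisons involving more than one added parameter would need the chi-square distribution with the corresponding degrees of freedom.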

One model was created for each of the three test measures. The dependent variable was item gain, calculated by subtracting the pre-test score from the immediate and delayed post-test scores. The relevant categorical predictor variables (factors) were: time (immediate or delayed) and learning condition (copy or example). Relevant interval predictor variables (covariates) entered into the model were: vocabulary size, order of presentation, transparency, and imageability. All interval predictor variables were mean-centered prior to analysis, as recommended by Baayen (2008). Predictor variables were entered in a stepwise fashion as follows: condition, time, then a condition*time interaction. Random slopes were then added for participants as a factor of condition. Next, inter-lexical factors were added, firstly order of presentation, then vocabulary size. Finally, intra-lexical factors were added, first transparency, then a transparency*condition interaction, followed by imageability and an imageability*condition interaction. Once an interim model had been established, significant predictor variables were removed to check for redundancy. The resulting best fit models are presented in the results section.
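Two of the preparation steps described above, computing item gain and mean-centering a covariate, can be illustrated with a minimal sketch. The function names and the vocabulary size values are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean


def item_gain(pre_score, post_score):
    """Gain on one item: post-test score minus pre-test score."""
    return post_score - pre_score


def mean_center(values):
    """Subtract the mean so that the centered covariate averages zero."""
    centre = mean(values)
    return [v - centre for v in values]


# Hypothetical vocabulary size estimates for three participants.
vocabulary_sizes = [6000, 7500, 9000]
centered_sizes = mean_center(vocabulary_sizes)

# An item answered incorrectly at pre-test and correctly at post-test.
gain = item_gain(0, 1)
```

After centering, a value of zero corresponds to a participant or item with an average score on that covariate, which makes the model intercept easier to interpret.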

Data was collated in Microsoft Excel, which was also used to produce descriptive statistics and graphs. All inferential statistics and their relevant graphs were produced in RStudio version 1.1.383 (R version 3.4.0).

Results

RQ 1. The Relative Development of Productive and Receptive Knowledge

Descriptive statistics.

Table 4 presents the pre-test, immediate and delayed performance scores for the treatment as a whole. It is apparent that gains were made in all three measures, with the highest levels of retention achieved in the multiple-choice tests and the lowest scores in the free recall tests. Minimum scores are higher in all three delayed post-tests than immediate post-tests, indicating that some learning may have occurred as a result of taking the immediate post-tests.

Table 4. Mean pre-test and post-test results on three tests of collocational knowledge.

Test        Free recall               Cued recall               Multiple choice
            M      SD    Min  Max     M      SD    Min  Max     M      SD    Min  Max
Pre-test    1.53   1.51  0    4       3.47   2.72  0    8       10.87  4.10  4    19
Immediate   14.13  4.81  3    22      16.53  4.36  6    23      21.33  2.06  17   24
Delayed     11.80  4.60  5    20      15.33  3.92  7    21      20.80  1.82  18   24

Note. Maximum score = 24.

Inferential statistics.

Productive knowledge. To determine whether there was a significant difference in productive knowledge between pre-tests and post-tests, a one-way repeated-measures ANOVA was conducted for each recall test. For the free recall test, Mauchly's test was non-significant, p = 0.058, and therefore sphericity was considered not to have been violated; however, because the p-value was only marginally non-significant, the Greenhouse-Geisser corrected test is also reported below in Table 5. The ANOVA revealed that time had a
