Language in Use: Variability in L1 and L2 language processing


Faculteit der Letteren

A dynamic systems theory perspective on real-time language use

Rika Plat


The completion of this thesis also concludes many years of studying at the University of Groningen. Therefore, thanks are due not only to people who have assisted and supported my work on this thesis, but also to friends and fellow students. I want to thank my brother, my sisters and all my friends for having been available for coffees, dinners, conversations, working out and going to movies or concerts. There are three fellow students I want to mention in particular, whose friendship, stimulating conversation and cooperation on multiple projects have made my years as a student much more enjoyable: Bregtje Seton, Margreet van Koert and Ruggero Montalto, thank you!

This thesis would never have been written without the guidance, support and patience of my supervisor, Wander Lowie. Not only did he suggest this topic and point me in the direction of pink noise, there is also no other teacher who has taught me more during my years at the University of Groningen. At times when I hardly dared contact him on the topic of my thesis because so much time had passed between appointments, he always encouraged me to pick up the project where I had left off. If it were not for his infinite patience and confidence that I would be up to this task, I doubt I would ever have finished it.

I am also indebted to my second supervisor and loyal participant in the experiment over the course of three years, Kees de Bot. A more motivated, cooperative and enthusiastic participant would hardly be imaginable. Kees has gone to some lengths, sometimes even choosing his language carefully in his personal life, all in the name of science. Without him, this rather unique ‘Kees-study’ would not exist. His enthusiasm and curiosity for looking beyond the beaten path have been an inspiration.


Abstract

In this paper, a radically innovative view is taken on L1 and L2 lexical knowledge. Instead of treating lexical knowledge and the mechanisms needed to retrieve it as essentially static and linear, this paper explores a view of language use as dynamic and non-linear. It aims to complement what traditional linear statistical methods can tell us about language processing with what non-linear statistical methods can add. Variability in L1 and L2 language production is examined on multiple time scales: seconds, hours, days, weeks, months and years.

A longitudinal single-subject study is presented that took place over a two-year time span. The experiment is a repeated word naming task in L1 Dutch and L2 English. The response latencies obtained from this task are analysed using spectral analysis. This technique focuses on dynamic frequency patterns of variability that have thus far been regarded as irrelevant ‘noise’. The patterns found within the noise fall within the spectrum of ‘pink noise’, a variability pattern that is associated with the degree of automaticity and control in behaviour. Automatic behaviour, which falls within the spectrum of random or white noise, shows a significantly different pattern of variability from more controlled behaviour, which falls within the spectrum of pink noise. These techniques have already been applied to language processing in monolingual speakers, but are here applied for the first time to the language processing patterns of an advanced bilingual speaker. The results indicate that in the multilingual mind, there is a strong interaction between languages. Spectral analysis shows that the variability of lexical productions in L1 and L2 changes as a function of the recent context of episodic use of the other language. The variability patterns after a 7-day period of exclusively using the L1 were reliably different from the patterns found after a 7-day period of exclusively using the L2. The language used most recently showed a much clearer fractal pattern (‘pink noise’) after being used exclusively for 7 days, while at the same time, the language that had not been used had changed toward the more random pattern, reflecting perturbations from the most recently used language. Recent episodic use of either language was thus found to directly influence


1 Introduction

Amongst the many cognitive functions a person has, language has always held a special place. For one, language has often been considered one of the most amazing of all cognitive abilities a human being possesses; it allows us to communicate with one another, even about the most abstract and complicated matters. Therefore, it is often thought to be the defining trait of humanness, the most important difference between humans and all other animals.

Even so, until the beginning of the 20th century language research was restricted to the history of language development rather than the language as currently spoken. Ferdinand de Saussure is generally thought to be the founder of modern linguistics. This famous linguist helped make linguistics a subject fit for scientific study by making a clear separation between language use (parole) and the underlying structure of language (langue), as well as a separation between the mental concept of a word (signifié) and the sound pattern that expresses it (signifiant). Language was thus effectively cut off from the world and from outside influence, and language research became increasingly structuralist. The merits of studying language like this have undeniably been enormous, and language research has made huge leaps over the last century. Taking something outside of time and influence has obvious advantages; it is a bit like cutting a slice out of a moving, developmental process and being able to look at the details at a certain point in time.

However, the costs of isolating and categorising linguistic features are starting to become apparent as well. For how realistic are the boundaries with which we categorise, and might these boundaries not have been placed rather arbitrarily? Try to think of the mental concept of just one word, say a word such as ‘dog’. A simple word, part of anyone’s vocabulary; presumably, this word has the same meaning to all of us as well. Now imagine what this meaning, the underlying concept of the word, consists of. We all know a dog when we see one, so surely there must be overlap, but would the word ‘dog’ really have the same meaning for a woman with a favourite pet poodle and for a blind man with a seeing-eye dog? For that matter, is the meaning of the word fixed for one individual, or might it change slightly with your ever-changing experience, say after you have been bitten by a dog? Do categories have any real basis, and how random are their boundaries?


influenced by preceding events. Language, in essence, is interaction. It is a reaction to and accumulation of the world and everything in it. Therefore, language should be looked at as inseparable from the time and context in which it takes place. Instead of isolating language from the world and awarding it a special place among the cognitive functions, it is time we put the world back into language. This paper tries to determine what the continuity of language processing would mean for the way in which we look at language processing in general and bilingualism in particular. A word naming task was devised, and one single participant took part in the experiment for a period of 2 years. Linear and non-linear statistical methods are used to examine the variability in L1 and L2 language production on the time scale of seconds, hours, days, weeks, months and years.

2 Background

2.1 Dynamic Systems Theory

Dynamic Systems Theory is by no means a new approach to looking at the complexity of change. It is a theory that originated in physics and mathematics, in order to account for the processes within complex systems. A system can be described as a set of components that interact; a complex system is a system in which more than two components interact, causing a non-linear development whose outcome and different stages are more diverse and much harder to predict (van Geert, 2008). Our world abounds in complex systems, ranging from the human body to an entire culture, and from the ecosystem within a pond to the global weather. In a complex system such as the weather, many components mutually affect one another; in this case, things such as temperature, moisture and air circulation, whose interaction determines the weather on any given day. The state of any of the components in the system, and thus also the way in which it influences the other components in the system, is dependent on its previous state plus its change over time. At any point in time, a complex system such as the weather can change into a number of other states, depending on all these different, interacting components. Looking at how all the components interact and the number of possible states they can evolve into, it is really no wonder the weather forecast is not always correct!
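The dependence of each state on the previous state can be illustrated with a toy iteration. The sketch below is not taken from this thesis; it assumes the logistic map, a standard textbook example of a non-linear system, with an arbitrarily chosen growth parameter `r = 3.9`. Two trajectories that start almost identically drift apart, which hints at why long-range forecasts of such systems are so hard.

```python
def logistic_trajectory(x0, r=3.9, steps=100):
    """Iterate x[t+1] = r * x[t] * (1 - x[t]); each state feeds the next."""
    states = [x0]
    for _ in range(steps):
        states.append(r * states[-1] * (1 - states[-1]))
    return states

# Two hypothetical starting states that differ by one part in ten million.
a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2000001)

gaps = [abs(x - y) for x, y in zip(a, b)]
first_gap = gaps[1]        # difference after a single step
largest_gap = max(gaps)    # largest difference over the whole run
print(f"gap after 1 step: {first_gap:.2e}, largest gap: {largest_gap:.2f}")
```

The starting values and the parameter are illustrative assumptions only; the point is the qualitative divergence, not the particular numbers.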


plenty of clues to successfully deduce the underlying structure and grammar of the language. Chomsky, however, stated that the language input does not provide enough clues to enable children to deduce all of the underlying structure and grammar solely from that input. This led Chomsky to propose that children are born with a blueprint for language, the Universal Grammar. Without going into the discussion of whether or not language is innate, Dynamic Systems Theory does provide an alternative account of the fact that children are very well able to construct a fully developed and complex grammar, even if the language input they receive is quite poor. An important feature of a dynamic system is that it is self-organising; constant change, growth and development are inherent to the dynamic system. An example of the self-organising quality of language is the existence of pidgin languages; these often become increasingly complex and evolve into languages with a complete and intricate grammar system that far exceeds the complexity of the input received by the speakers of a pidgin language (Bickerton, 1991).

Another very important reason why cognitive processes such as language acquisition can be considered complex dynamic systems is their change over time. That active second language learning means constant change may not sound very surprising, since a language learner is by definition adding new words and acquiring new grammatical patterns all the time. However, the idea that even one’s L1 is in constant flux is often implicitly denied in the way language research is conducted. When measuring L2 performance, L2 speakers are almost always compared to a control group of L1 speakers, the assumption being that the performance of L1 speakers provides a static baseline. Even though the L1 is usually quite entrenched in a speaker’s mind, it is still developing over time. A large body of attrition research shows that the L1 is not immune to loss when it is not used over an extended period of time. The area most susceptible to loss in the L1 is lexical access; grammatical knowledge seems to be quite stable, since even when L1 attrition is quite severe, people long retain grammatical knowledge surpassing that of all but the most advanced L2 speakers (Schmid 2009). However, it is possible to lose the L1 completely, as shown by Pallier et al. (2003), who tested Koreans who had been adopted in France as children. Some of their subjects had used their native language for as much as 8 years; however, when presented with Korean words and sentences in their 20s, they could not distinguish these any better than a group of French control subjects could. Also without a drastic change in circumstances such as


second language often had limited linguistic coding skills in their L1. On the basis of this finding, they formulated their Linguistic Coding Differences Hypothesis (LCDH), which states that difficulties with the rule system of the first language correspond directly to related problems in learning a second language.

Even though L2 acquisition is understood to be a process that changes constantly whenever new knowledge is added, the amount of variability in the L2 acquisition process is still often underestimated. The bulk of research on L2 acquisition seems to assume that language acquisition is linear, an assumption that shows in the number of quantitative or cross-sectional studies aimed at determining the typical developmental path. However, language acquisition, by its nature constantly open to outside influence and thus constantly affected by it, is a nonlinear process. Studies of language acquisition that look at means over a large group of learners offer a lot of useful information, but a closer look at individual variation shows that a great deal of valuable information concerning actual, real-time development is filtered out and overlooked in quantitative studies. L2 acquisition is a very individual process, since it depends on so many factors leading to constant variation. This variation stems both from a constantly changing language environment and from self-organisation within the system.

The outside influence, and thus the context in which a language is learned, can in itself also be seen as a dynamic system that is constantly changing. In this view, context can never be reduced to a mere backdrop against which language is learned; it is in constant interaction with the language learner and vice versa, and cannot possibly be separated from the learner. The context in which language acquisition takes place includes many components that will have different effects on different learners. There is for instance the cultural context that includes the role of student and teacher in a particular cultural environment. There is the social context, including the relationship with the teacher and other learners, and the


Apart from the context that causes variation among individual language learners’ developmental paths, every learner also has internal variation inherent to the developmental process. In a self-organising system, variation is necessary for the system to develop. An increased amount of variation is in fact often a precursor to a jump in development to a higher level of performance. This was also found in the L2 writing performance of an advanced student of English whose writing over the course of three years was analysed for sentence complexity and vocabulary use. Measurements of average word length showed a relatively stable period that was followed by a period showing many fluctuations in performance (Verspoor, Lowie & van Dijk, 2008). After this period of variation, performance stabilised again at a higher level. The high amount of variability therefore seems to be a consequence of a necessary re-organisation of the system in order to enable a big step in development.

The language acquisition process can be characterised as the state of the system plus its change over time. The outcome is then the input for the next phase. For different individuals learning a second language, developmental paths have been found to take very different routes. Still, a complex system cannot develop into an unlimited number of states. Usually, there are constraints that govern its development. In this respect, language also complies with the properties of a dynamic system; learning a new language is definitely limited by certain constraints. The learning of new words is for example constrained by the total number of words a language comprises, but also by the amount of time one can spend on learning new words, the amount and type of contact with the target language, one’s


2.2 Single Subject Studies

Since the developmental process depends on so many different factors and can be quite different between individuals, single subject studies are especially suitable when looking at variation in detail. After all, studies with many subjects usually look at means across groups, averaging out all variation. In order to really discover how one state is transformed into another and by what mechanism this occurs, single subject studies and dense observations are the best way of looking at these processes in detail (Van Geert and Steenbeek, 2005; Larsen-Freeman, 2008). Single subject studies are by no means a new phenomenon; J.M. Cattell conducted a study assessing response time variability as early as 1886. Because the means of recording response times were more primitive, his results cannot be compared directly to reaction times obtained today. However, Cattell already took remarkable steps in looking at response time variability and word recognition in the L1 and L2.


and L2 speakers. For L1 speakers, processing is probably more efficient and automatised than for L2 speakers. They posit that speed is probably not the only measure needed to assess this, since it is possible for the controlled processes to gain speed by practice, without there being an actual change in the ‘blend of underlying mechanisms’. A measure that will give more insight into whether a change takes place at this level is the coefficient of variation (CV), defined as the standard deviation (SD) of the RTs divided by the mean RT. If the controlled processes get faster without getting more efficient, the variation overall will stay the same in proportion to the RT, resulting in an unchanged CV. If, however, there is a change in the underlying mechanisms, there will be less variation in the RTs in proportion to the mean RT, resulting in a lower CV score, even if the RTs are faster. A single subject with moderate to high level reading skill in English was tested four times over a period of three weeks on a set of 150 words taken from four frequency bands and 150 nonwords. Over the 4 sessions, the RTs of the participant became faster; more importantly though, the CV data showed a significant decrease in variability. This means the SDs in the later sessions were lower, also in proportion to the lower RTs. When looking at the words from the different frequency bands separately, this decrease in variability was found to be stronger for the lower frequency words than for the higher frequency words. Segalowitz et al. conclude that the blend of mechanisms underlying the word recognition process has changed in nature, becoming less controlled and more automatic. This conclusion seems a bit premature; after all, it seems only natural that lower frequency words would benefit more from training than higher frequency words, and the decrease in variability may be due to practice rather than a change in the underlying processes of word recognition. However, the CV is a simple and effective measurement when looking at the amount of variation in reaction time data.
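The logic of the CV can be made concrete with a small calculation. The sketch below uses invented reaction times (in milliseconds), not data from Segalowitz et al. or from this thesis, and assumes the CV is computed as the sample standard deviation divided by the mean. A uniform speed-up leaves the CV unchanged, whereas a session that is both faster and proportionally less variable yields a lower CV.

```python
import statistics

def coefficient_of_variation(rts):
    """CV = standard deviation of the RTs divided by their mean."""
    return statistics.stdev(rts) / statistics.mean(rts)

# Hypothetical reaction times in ms for one early session.
session_1 = [620, 540, 710, 580, 650, 600]

# Same processes, simply 20% faster: SD shrinks in proportion to the mean.
sped_up = [rt * 0.8 for rt in session_1]

# Faster and proportionally less variable: the pattern read as restructuring.
restructured = [500, 470, 520, 480, 510, 490]

cv_first = coefficient_of_variation(session_1)
cv_sped_up = coefficient_of_variation(sped_up)
cv_restructured = coefficient_of_variation(restructured)
print(f"{cv_first:.3f} {cv_sped_up:.3f} {cv_restructured:.3f}")
```

Whether a lower CV really signals a change in underlying mechanisms rather than practice is exactly the interpretive question raised above; the calculation only shows what the measure is sensitive to.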

2.3 Stability in the Multilingual Mind


even across groups of people. An example of the first presumption can still be found in many attrition studies, where the level of attrition is usually determined by administering translation tasks; the number of words a subject does not remember is taken to correspond directly to the percentage of vocabulary the subject has supposedly lost (Meara, 2002). Also, the often used paradigm of lexical priming implicitly denies the variability of lexical representations, in assuming that the use of a prime has a similar and fixed effect on reaction times across many individuals.

One way of theorising about how language is organised in the brain that is consistent with language as a complex dynamic system is connectionism. Connectionism seeks to explain cognitive processes by using computer simulations of neural networks. Recent work in this area that tries to incorporate a more dynamic view of the lexicon into attrition research has been conducted by Paul Meara (2004). Meara moves away from the strong focus on the individual word or lexical entry, and uses a Boolean network to find out what the implications for attrition research are when thinking of the lexicon as a structured network. Assuming, as Meara does, that words are connected, and that the activation of one lexical item influences the activation level of other lexical items it is connected with, attrition is by no means a simple and linear process in which words get removed from the lexicon one by one. His computer simulation of a simplified lexicon shows that enough attrition events can cause a ‘ripple’ throughout the system; once enough words are deactivated, entire parts of the network can become deactivated. The loss of every particular lexical item, though not necessarily causing one such ‘ripple’, will thus weaken the structure of the network. Also, every one of his simulations takes a different course, which could suggest that attrition is a different process for every individual.
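To give a feel for how a Boolean network of this kind behaves, the following is a minimal sketch of a generic random Boolean network. It is not a reconstruction of Meara's model: the network size, the choice of two random inputs per word, the AND/OR update rules and the number of words switched off are all assumptions made for illustration. Each word is either active or inactive, the network is updated until it revisits a state (an attractor), and a handful of active words are then deactivated to see where the system settles afterwards.

```python
import random

def make_network(n_words, rng):
    """Give every word two random input words and a random rule (AND or OR)."""
    return [(rng.randrange(n_words), rng.randrange(n_words),
             rng.choice(("and", "or"))) for _ in range(n_words)]

def step(state, network):
    """Update all words at once from the current activation of their inputs."""
    return tuple((state[a] and state[b]) if rule == "and" else (state[a] or state[b])
                 for a, b, rule in network)

def find_attractor(state, network, max_steps=10000):
    """Iterate until a state repeats; return a state on the cycle and its length."""
    seen = {}
    for t in range(max_steps):
        if state in seen:
            return state, t - seen[state]
        seen[state] = t
        state = step(state, network)
    raise RuntimeError("no attractor found within max_steps")

rng = random.Random(1)
n_words = 50                                   # hypothetical lexicon size
network = make_network(n_words, rng)
start = tuple(rng.random() < 0.5 for _ in range(n_words))
attractor, cycle_length = find_attractor(start, network)

# Simulated attrition events: switch off up to five currently active words.
lost = [i for i, on in enumerate(attractor) if on][:5]
damaged = tuple(False if i in lost else on for i, on in enumerate(attractor))
new_attractor, new_cycle_length = find_attractor(damaged, network)

print("active words before:", sum(attractor), "after:", sum(new_attractor))
```

Running the sketch with different seeds gives different courses, which is the qualitative point; the specific counts carry no empirical weight.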

The above-mentioned results all pertain to simulations of a single language lexicon. Meara (2006) has, however, also tried to test the difference in stability of a lexicon that contains an L1 and an L2. Another Boolean network consisting of 1,000 L1 and 1,000 L2 words where only 2 percent of the words are ‘entangled’ (linked to and thus receiving activation from both an L1 and an L2 word) quickly settles into an attractor state: two thirds of the L1 words being active versus only 10 percent of the L2 words being active. ‘Forcing’ the L2 to become more active by activating 15 words at each iteration of the model leads to activation of almost the entire L2 lexicon. Interestingly, this automatically suppresses


L2, and the interesting feature of networks that Meara uncovers here is that even though a system may not be explicitly programmed to do so, the activation of one subset can inhibit the activation of another one.

Meara’s work discussed above deals mostly with the global properties of the lexicon, and his computer simulations of a lexicon are hugely simplified in order to find out the basic features a lexicon should have. Elman (2004, 2009) has taken a closer look at what specific information should actually reside in the lexicon and how a lexical representation should be defined. Traditionally, the mental lexicon is thought to be a passive data structure that resides in long term memory, in which at the very least the meaning of a word is stored. The lexicon is often assumed to contain ‘types’, abstract representations of what is known about a certain word. All instances of the same word are then ‘tokens’ of this word. The meaning of each ‘token’ that we come across can be interpreted by way of this ‘type’ stored in our mental lexicon. Elman (2004) argues however that the meaning of a word is almost always context dependent, and therefore hardly ever means the exact same thing. Rather than having abstract representations of words stored in a mental lexicon, Elman proposes to think of words as having a similar effect as other kinds of sensory stimuli: as acting directly on mental states. Elman sums up this view by claiming that “words do not have meaning, they are cues to meaning” (2004, 306).

In this view, there is no need for a lexicon in the traditional sense, and there are no ‘types’ stored in our brain, since we never encounter an actual ‘type’. Instead, as Elman puts it, “lexical knowledge is implicit in the effects that words have on internal states”. The effect the word will have on a mental state is always going to be a little bit different, since its effect is unavoidably influenced by the ever changing context. This does not just go for ambiguous words of which the meaning is resolved by context; it also goes for words that we would consider quite unambiguous and straightforward in meaning. Take for example the word ‘run’, a word that seems to have a straightforward meaning. Still, the word ‘run’ does not mean the exact same thing in (a) and (b) below:

(a) The boss runs the company alone.

(b) The child runs around the playground.


(c) The athlete runs towards the finish line.

(d) The jaguar runs across the open plain.

Below, the use of ‘runs’ with different agents has been visualised in state space:

Figure 1. Figure 4 from Elman (2004). The verb ‘run’ preceded by different nouns, resulting in a different location of the verb in state space.

Figure 1 shows how some senses of ‘run’ are more closely related than others, but none of them are exactly the same. This poses a problem for the mental lexicon in the way it is traditionally understood; obviously, the word ‘runs’ will not have a separate entry for its use with all separate agents. We have not yet even touched upon the different tenses of a verb, the use of different patients with the verb, location, the filler of the instrument role, or the


different instances of the same type, by envisaging every lexical representation as inhabiting a bounded region of state space. Every occurrence of one word will then, even if it never produces the exact same state ever again, produce a state within this same bounded region, and will thus be correctly identified.

Spivey and Dale (2004) also oppose the idea of a mental lexicon containing stable, discrete, symbolic representations. The most important features of the representations in the human mind that make them unsuitable for comparison with computer-like symbolic representations are their continuity in time and their continuity in space. The continuous, temporal dynamics of cognitive processing in the brain would make it impossible to distinguish between discrete mental representations. In fact, according to Spivey and Dale, these “graded mental states appear to be more than just temporary transitions between discrete mental representations, but instead may be the modus operandi of the mind” (2004, 91). They do not want to argue that pure mental states do not exist as attractor positions in state space, but rather that they hardly ever occur. In real time, all clues to categorise and thus recognise something are used, leading to a period of competition between different possibilities, up until the point where the clues leave only one option; that is, when the activation reaches a ‘basin of attraction’ that surrounds the attractor point that is the ‘pure’ mental state.

This continuity of mind theory is illustrated with examples from speech perception, word recognition and sentence recognition, since real-time language comprehension is an excellent example of a process in which continuous sensory input produces continuous cognitive processing. For the scope of this paper, the example about word recognition is most interesting. In order to show that different lexical items initially compete until the input uniquely specifies one lexical candidate, Spivey and Dale conducted an eye movement experiment in which subjects had to manipulate real objects. On the table were three objects, two of which were thought to compete initially because of their phonetic similarity (e.g. candy and candle, which can only be told apart at the very end of the word). Eye


2.4 Pink Noise

This temporal continuity of language processing described by Elman and by Spivey and Dale is one feature of a complex, dynamic system in which time is a factor that cannot be taken out of the equation. In complex dynamic systems, each step in time and each step in a process will influence the next. One by-product of the continuous processing on multiple time scales can be found in the variation or noise signal that every experiment produces. Noise is usually discarded on the assumption that it is random. However, noise is increasingly the subject of investigation and speculation, since in processes that rely on the interaction of many components it is often found not to be random at all. In self-organising systems, noise is often found to show a pattern and to correlate over shorter and longer time scales. Both short-range correlations, indicating that the response on one item influences the response on the following item, and long-range correlations are found in a wide variety of cognitive tasks (Van Orden, Holden & Turvey, 2003; Holden, 2005; Thornton and Gilden, 2005).

Correlations that are both short and long range indicate that components within the system interact on multiple time scales. This is a sign of self-organisation taking place on different levels within the system. Correlated noise can be observed when visually inspecting a trial series, since it will show “a progression of nested, similarly shaped arcs or patterns of fluctuation” (Holden 2005, 289). A time series that is thus composed of smaller copies of itself possesses a so-called fractal structure. In response time experiments, data with a fractal structure are said to contain pink noise; the term refers to measurements that are statistically dependent upon one another: long-term fluctuations nest within themselves increasingly smaller, proportionately scaled fluctuations, which in turn nest within themselves even smaller patterns of fluctuation, and so on.


fractal structure is referred to as pink noise. A spectral plot made of pink noise will show that changes of different sizes do not occur with the same frequency. Figure 2 below shows the typical scaling relation of pink noise.

Figure 2. Figure 1 from Kloos and Van Orden (2010). A spectral plot that shows the typical scaling of pink noise. Upper right: reaction times of one subject. Lower right: spectral plot of reaction times with an average slope of -0.94 and four marked points referring to sinusoidal components displayed on the left.

The upper right plot displays the reaction times of one subject. The lower right shows a spectral plot constructed from these data; relative amplitude, or size of change, can be found on the vertical axis. The horizontal axis shows the frequency of change. The regression slope shows the scaling relation between the two, in this case -0.94, consistent with the scaling exponent of pink noise, which is α ≈ 1. To the left of Figure 2, the spectral plot is decomposed into sine waves of different amplitudes. The sine wave on the bottom left shows relatively small changes in the data that occur very frequently; these can be found in the many clustered dots in the bottom right of the spectral plot. On the top left, the sine wave shows the big changes; these do not occur very often. This relates to the top left of the spectral plot, where only a few dots show the few changes of this size that occur (Kloos and Van Orden, 2010). The third type of noise is referred to as brown noise. Brown noise displayed in a spectral plot will show a steeper slope than pink noise, of α ≈ 2. Where white noise is


associated with random behaviour and does not show correlation between measurements, brown noise is associated with over-regular behaviour, and shows very strong dependence between measurements. Pink noise can be found in between white noise and brown noise (Kloos and Van Orden, 2010).
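How a scaling exponent is read off a spectral plot can be sketched in a few lines. The code below is an illustration under stated assumptions, not the analysis pipeline used in this thesis: it generates synthetic white noise and brown noise (a running sum of white noise), computes a raw power spectrum with a plain discrete Fourier transform, and fits a straight line to log power against log frequency over all available frequencies. The function names, series length and random seed are hypothetical choices.

```python
import cmath
import math
import random

def power_spectrum(series):
    """Return (frequency, power) pairs from a plain discrete Fourier transform."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    pairs = []
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centred))
        pairs.append((k / n, abs(coeff) ** 2 / n))
    return pairs

def spectral_slope(series):
    """Least-squares slope of log10(power) against log10(frequency)."""
    points = [(math.log10(f), math.log10(p))
              for f, p in power_spectrum(series) if p > 0]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

rng = random.Random(42)
white = [rng.gauss(0, 1) for _ in range(512)]   # uncorrelated measurements

brown = []                                      # each value depends on the last
total = 0.0
for increment in white:
    total += increment
    brown.append(total)

white_slope = spectral_slope(white)
brown_slope = spectral_slope(brown)
print(f"white: {white_slope:.2f}, brown: {brown_slope:.2f}")
```

On this convention the fitted slope is roughly the negative of the scaling exponent α, so white noise should give a slope near 0 and brown noise a slope near -2, with pink noise expected in between, near -1. Estimates from series this short are noisy, and published analyses typically add steps such as detrending and fitting only the lower frequencies.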

Pink noise then, is found between random behaviour (white noise) and over-regular behaviour (brown noise). Pink noise can be observed when there is a balance between the two, and thus allows for both regular and random behaviour. Rigid, over-regular control only works in a very predictable environment, but fails when the environment becomes less predictable; over-random behaviour allows for flexible behaviour, but cannot take advantage of the predictable features of the environment. Behaviour that allows for both options thus offers an “optimal combination between stability and flexibility in control” (30). Kloos and van Orden have devised the following parameter of human performance variation:

Human performance variation: $\dfrac{\text{involuntary control}}{\text{voluntary control}} \approx \dfrac{\text{uncontrollable DOF}}{\text{controllable DOF}}$

This control parameter explains all scaling-exponent findings known to the authors. This parameter explains intentions as constraints on behaviour; as temporary dynamical structures that are soft assembled to contribute constraints to control parameters. In case of exogenous control, voluntary control is restrained and vice versa. If voluntary control is reduced, scaling exponents will approach white noise; if there is a lot more voluntary control than involuntary control, scaling exponents will approach brown noise.

Behaviour that allows two opposing options constitutes an example of a critical state. In order for this balance to be maintained, critical states function as an attractor state. There is evidence that pink noise is a by-product of attractor states, since development and training have been found to change behaviour that elicits random white noise to more pink noise. This was found in a precision aiming task where participants had to draw lines as fast as possible between two dots that were 24 cm. apart using their non-dominant hand (Wijnants et al., 2009). The idea was that forcing participants to use their non-dominant hand would induce relatively unstable and uncoordinated behaviour that would leave plenty of room for

exponent of pink noise. This trend toward pink noise can be seen as attraction toward pink noise (Wijnants et al., 2009; Kloos and Van Orden, 2010).

There is some discussion as to whether pink noise is ubiquitous in human performance. This discussion is outside the scope of this paper; however, pink noise has at the very least become an often reported phenomenon that can be found in a wide range of tasks, such as simple reaction time tasks (Van Orden et al., 2003; Wagenmakers et al., 2004), tapping, human gait, skilled motor performance, repeated judgements of elapsed time, simple classifications, signal detection and discrimination, visual search and mental rotation (Van Orden et al., 2003; Thornton and Gilden, 2005). The question of whether pink noise can be found in psychophysical data is therefore hardly an interesting one anymore; the real question now, as justly posed by Wagenmakers et al. (2009), is: what does it mean? At present, pink noise is interpreted as a necessary consequence and by-product of the functioning of complex systems. Interaction dominant dynamics, so dynamics dependent on large numbers of


2.5 Statement of purpose

Understanding language processing as taking place continuously over time, during which more time is spent in unstable regions of state space than in relatively stable regions of state space, calls for a rethinking of experimental tools and linear analyses. After all, thinking of language processing as continuous and dynamic means that instead of discrete responses, there is a continuous, intertwined flow of activity. Individual items in for instance a word naming task may alter or redirect this flow of activity; however, it is impossible to partition the flow among individual response trials (Van Orden, 2003; Spivey and Dale, 2004).

Nevertheless, this is exactly what linear methods rely on: the assumption that the response to a single item is generated by that item alone and is thus independent from all previous trials and responses. Trial series are usually randomised to level out this effect; however, the underlying assumptions in conducting experiments in behavioural sciences would have to be revised radically if variability in response times scales with sample size. With mounting evidence of interdependent and continuous processing, it is about time to look at what non-linear analyses can contribute to our knowledge of real-time language processing. The present article tries to determine how realistic it is to attribute a single response time to the single trial that elicited it, and to make the ensuing assumption that this signals a direct cause and effect relationship. A word naming task was conducted in a subject’s L1 Dutch and L2 English, lacking any internal conditions in order to try and capture the basic variation in language processing. The usual linear analyses such as ANOVAs were carried out to analyse the results between sessions and languages. Additionally, non-linear analyses including time series analysis and spectral analysis were used to analyse the data for any within-session effects, to see if these can complement the linear analyses and give a more complete picture of the continuity of language processing.

I. Language processing is expected to show variability in both languages; however, performance is more stable in the L1 than in the L2.

II. Language processing becomes more stable after use of the particular language. This effect will be greater for the L2 than for the L1, which is already more stable.

III. The assumption is that language processing is dynamic, continuous and self-organising. The noise is therefore expected to show a fractal pattern for both languages.

In order to test these hypotheses and really look at the baseline variability in language processing, it is important to limit external sources of variability. External sources of variability can stem from the conditions used in an experiment, so a word naming task was devised that required the participant to only read words aloud from a screen. A word naming task was preferred to other language experiments such as lexical decision experiments, since it allows for testing language performance without introducing any additional conditions, such as the use of nonwords. A word naming task merely requires a participant to produce language, and can thus be used to establish a baseline variability. Response times were recorded and subsequently analysed. To really look at variability in detail, the results of only one participant who did the experiment repeatedly over two years are analysed. Variability may change greatly across different individuals; moreover, looking at means across individuals averages out much of the variability.

Different analyses will be used to look at the level of stability, or rather, variability, in the data gathered from the word naming task. These different ways actually correspond to measuring the variability on different timescales: one can look at how the variability changes between different sessions, and at patterns of variability within one session. Both hypotheses will be tested by looking at both types of variability in an attempt to give as complete a picture as possible. In order to test the first hypothesis that states that language processing is more stable in the L1 than in the L2, we will first compare the reaction times (RTs) and standard deviations (SDs) on each language version of the experiment. The standard deviation is a well known measure of variability; however, since RTs are probably higher in L2

shows how high the SD is in relation to the mean and allows for a more fair comparison of the difference in variability of the two languages. When the CV is compared across sessions and between languages, it will show whether the variability in language processing in the L1 and L2 changes over the course of hours, days, weeks, months or even years.

In order to look at the difference in variability in L1 and L2 language processing on the scale of seconds, non-linear analyses will be used that look at the variation within one session. Since the assumption here is that the flow of activity that follows one trial item and its response cannot be partitioned between items, the response on one item will affect a number of successive responses. Autocorrelations will be used to determine how long the influence of a single trial persists over successive trials. A language that is more entrenched and more stable is expected to show correlations that are less strong and that last for a shorter period of time (and thus a lower number of trials). In this case, the L1 should show autocorrelations that are less strong and that last for fewer successive trials than the L2.

A second analysis that will be used to look at correlations within the experiment is spectral analysis. This method will show, by the steepness of the slope in a spectral plot, whether the RT data exhibit pink noise, indicating a fractal structure. If a fractal structure is found, this means that long term correlations in the data nest within themselves shorter term correlations, that nest within themselves even shorter term correlations, etc. Pink noise is associated with a critical state that balances random behaviour with controlled behaviour. Since the prediction is that the L1 is more entrenched and stable than the L2, the L1 is expected to show a pinker signal that translates to a steeper slope in the spectral plot. Since the L2 is probably less stable, this will probably whiten the noise signal and thus result in slopes in the spectral plots that are less steep.

expected to be the same, only less strong as for the L2, since the L1 is normally used daily and thus thought to be quite stable. A period of non-use is therefore also not expected to exhibit a very strong effect.

3 Method

3.1 Pilot

The goal of the experiment was to look at variability inherent to language processing. Therefore, the variability caused by different word properties had to be reduced to a minimum. For this purpose orthographically similar, four and five letter, one-syllable words were selected. Two versions of the experiment were made: a Dutch and an English version. Half of the subjects started with the Dutch version, the other half with the English version. All words were frequent words in the target language, and easy to pronounce. For both languages, 305 words were selected. Five of these words were meant to let participants become acquainted with the procedure and get used to the microphone. The other 300 words were presented to them in three separate random blocks, with a minute break in between. A pilot was conducted in order to select the 200 words that showed least variation between individuals. Six participants took part in the experiment. All were native speakers of Dutch and advanced learners of English as a second language; all were students of English at the University of Groningen. The results of the pilot are shown in Table 1 below:

Table 1. Mean RTs on both language versions of the pilot

Participant   Mean RTs Dutch   Mean RTs English
P. 1          458              493
P. 2          462              482
P. 3          450              487
P. 4          442              495
P. 5          453              442
P. 6          467              461


3.2 Single subject study

The goal of the experiment was to look at the performance of one subject over time. One participant took part in the experiment repeatedly over a period of 2 years. The subject always started with the English version. The experiment consisted of the 200 four and five letter words selected from the pilot. The words would be presented to the subject in a fixed order. The subject was instructed to read these words aloud into a microphone. The reaction times and responses were recorded.

3.3 Participant

The participant was (at the onset of testing) a 57-year-old male professor of Linguistics at the University of Groningen. The participant was a native speaker of Dutch and an advanced learner of English.

3.4 Materials

For both the English and the Dutch version of the experiment, the words used were selected from the CELEX/Cobuild lexical database. The goal was to reduce variation based on the choice of items to a minimum. Therefore, all words had a CV onset, consisted of one syllable, and were easy to pronounce. Both the pilot and the actual experiment were run on the computer programme E-Prime 1.2 (Psychology Software Tools, 2001). The words that appeared on the screen had to be pronounced into a microphone attached to a PST Serial Response Box. The microphone was tested and its sensitivity optimised. The experiment was conducted in the same way as the pre-test, with the difference that there were 2 lists of 100 words which were presented in a fixed order.

3.5 Procedure

enough time to finish pronouncing the previous word. The procedure of the experiment is visually represented in Figure 1.

Figure 1. Procedure of the presentation of targets

The response times were measured up to the point where the participant started to pronounce the word on the screen, and were recorded using the computer programme E-Prime. The actual responses were recorded using a portable voice-recorder, to ensure wrong responses could be filtered out afterwards, and to be able to check later on that the response times were not based on any sounds produced by the participant before pronunciation of the word, such as breathing or swallowing noises. Before the experiment started, the participants were presented with an instruction slide instructing them to pronounce the words appearing on the screen as quickly as possible.

[Figure 1 elements: +, target, 1000 ms, sound]

The participant was tested 14 times on 7 days over a period of two years. On each test day, one session would take place early in the day, the other late in the afternoon. The participant was first tested on April 17th 2007, and again the next day, to test variation between two consecutive days. The third session took place a month later to look at the variation across weeks. Another three months later, the participant took part for the fourth time, this time after a week during which he had only used his L2 English. Two weeks later the fifth session took place, this time after a week during which he had only used his L1 Dutch, and had refrained completely from using his L2 English. The sixth session took place almost a year after the fifth one, and the seventh another year after that, to look at the variation over years. Table 2 below shows a complete overview of dates and times of all 14 sessions.

Table 2. Complete overview of dates and times of all test sessions

Session Date and time of testing

1.1       17/4/2007, 1 p.m.
1.2       17/4/2007, 6 p.m.
2.1       18/4/2007, 1 p.m.
2.2       18/4/2007, 6 p.m.
3.1       23/5/2007, 9 a.m.
3.2       23/5/2007, 4 p.m.
4.1       9/8/2007, 10 a.m.
4.2       9/8/2007, 2 p.m.
5.1       20/8/2007, 10 a.m.
5.2       20/8/2007, 1 p.m.
6.1       17/6/2008, 10 a.m.
6.2       17/6/2008, 1 p.m.
7.1       17/6/2009, 10 a.m.
7.2       17/6/2009, 5 p.m.

Throughout the rest of the article, the different test sessions will be referred to by the numbers given in Table 2 above. The sessions are numbered in chronological order of testing. The first number refers to the day of testing, the number after the dot refers to the time of testing; the ones numbered .1 being the morning sessions, the ones numbered .2 being the afternoon sessions.

3.6 Analyses

Before any of the analyses were conducted, extreme outliers were removed from the data. All data points with values that were more than 3 SDs from the mean were removed from the trial series. This affected less than 1% of the data. The data per session makes up one time series; for the autocorrelations analysis, lagged variables of each time series were created, which is the same time series shifted down one position. Correlations with the first lag correlate all trials in the time series to the next trial, the second lag correlates all trials to the second trial following it, etc. Autocorrelations were calculated up to the 16th lag, which was the lag up to which the effect was still clearly observable on especially the data from the English language sessions.
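The two preparation steps described above (trimming values beyond 3 SDs and correlating a series with lagged copies of itself) can be sketched roughly as follows. This is a minimal illustration in Python with NumPy, not the procedure actually used for the thesis; the function names and the simulated RT series are hypothetical.

```python
import numpy as np

def remove_outliers(rts, n_sd=3.0):
    """Drop values that lie more than n_sd standard deviations from the mean."""
    rts = np.asarray(rts, dtype=float)
    keep = np.abs(rts - rts.mean()) <= n_sd * rts.std(ddof=1)
    return rts[keep]

def lagged_autocorrelations(rts, max_lag=16):
    """Pearson correlation between the series and itself shifted by 1..max_lag trials."""
    rts = np.asarray(rts, dtype=float)
    return np.array([np.corrcoef(rts[:-lag], rts[lag:])[0, 1]
                     for lag in range(1, max_lag + 1)])

# Hypothetical RT series in ms (not thesis data): slow drift plus trial noise.
rng = np.random.default_rng(0)
demo_rts = 480 + np.cumsum(rng.normal(0, 5, 200)) + rng.normal(0, 20, 200)
demo_rts[50] = 2000.0  # one artificial extreme value to be trimmed

clean_rts = remove_outliers(demo_rts)
acf = lagged_autocorrelations(clean_rts, max_lag=16)
```

Here `acf[0]` holds the correlation at the first lag and `acf[15]` the correlation at the 16th lag; removing a value simply shortens the series, so later trials move up one position.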

Spectral analysis transforms the data from the time domain into the frequency domain by means of a fast Fourier transform. The best-fitting sum of sine and cosine waves are

scales. The slope of the fit line in this graph is the statistic of interest; a slope of ≈ 0 shows the structure of the signal to be random (white noise), while a steeper slope of ≈ -1 indicates a fractal structure that is associated with a balance between over-random and over-regular tendencies. Spectral analysis does not allow for missing values. The outliers that had been removed left some gaps in the data, and in order to leave the time series as intact as possible, these were not substituted by other values but were simply closed by moving the data following this gap up one position.
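As a rough illustration of how such a slope can be obtained, the sketch below computes a power spectrum with NumPy's FFT and fits a straight line in log-log coordinates. It assumes power (squared amplitude) on the vertical axis and uses simulated white and brown noise rather than RT data; the function name `spectral_slope` is hypothetical, and the thesis analysis may differ in detail (for instance in detrending or in the range of frequencies fitted).

```python
import numpy as np

def spectral_slope(series):
    """Slope of log10(power) against log10(frequency) for a time series."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                      # remove the mean (zero-frequency component)
    power = np.abs(np.fft.rfft(x)) ** 2   # power at each frequency
    freqs = np.fft.rfftfreq(x.size)       # frequencies in cycles per trial
    # Skip the zero frequency, whose logarithm is undefined.
    slope, _intercept = np.polyfit(np.log10(freqs[1:]), np.log10(power[1:]), 1)
    return slope

# Simulated signals (assumptions for illustration only).
rng = np.random.default_rng(1)
white = rng.normal(size=4096)   # uncorrelated values
brown = np.cumsum(white)        # running sum of white noise

white_slope = spectral_slope(white)
brown_slope = spectral_slope(brown)
```

On these simulated signals the white-noise slope should lie close to 0 and the brown-noise slope should be clearly steeper (more negative); a pink signal would fall in between.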

4 Results

4.1 Linear Analyses

4.1.1 Visual Inspection, ANOVAS, CVs

The data collected from the subject span a long period of time, and were taken under varying circumstances. Data sets 4.1 and 4.2 were collected after a week during which the subject was using his L2 English only, both privately and professionally. Data sets 5.1 and 5.2 were collected after 7 days of refraining from the use of his L2 English completely. The other data sets were not controlled for context; since the subject is a university professor, he used English quite frequently. Since sessions 4 and 5 have been controlled for context, these two will be treated separately in some of the analyses below where the context effect might come into play. Data sets 1 and 2 were collected on two consecutive days, data set 3 was collected 3 weeks later. Data set 6 was collected the following year, and data set 7 another year later. Since there is no difference in language context for sessions 1, 2, 3, 6 and 7, these will first be looked at together, to see if there is any pattern or learning effect. Table 2 below shows the mean response times (RTs) for sessions 1, 2, 3, 6 and 7. These RTs have been visualised in Figure 2.

Table 2. Mean RTs in ms. for English and Dutch sessions 1, 2, 3, 6 and 7

1.1 1.2 2.1 2.2 3.1 3.2 6.1 6.2 7.1 7.2

Dutch 454 471 465 470 464 481 459 472 480 483


Figure 2. Mean RTs in ms. with linear fit line for English and Dutch sessions 1, 2, 3, 6 and 7.

First of all, the mean RTs of the Dutch sessions of the experiment are consistently lower than the mean RTs of the English sessions. A repeated measures ANOVA shows this to be significant, F(1,169) = 354, p<0.01. What is interesting to note about Figure 2 is that the results on the Dutch sessions appear to be more stable and show less variability; in fact, the difference between the lowest mean RT (454, session 1.1) and the highest mean RT (483, session 7.2) is only 29 ms. The difference between the lowest mean RT (479, session 2.2) and the highest mean RT (529, session 1.1) on the English sessions is 50 ms. One last point of interest that can be observed in Figure 2 is that there seems to be a relation between the mean RTs on the Dutch and English sessions; when the RTs on the English sessions increase, the RTs on the Dutch sessions seem to decrease and vice versa. This also holds when looking at the trends overall; RTs on the Dutch sessions increase slightly over the period of testing, while the RTs on the English sessions decrease at about the same pace. A paired samples T-test showed the correlation of -.533 to not be significant, with p=0.11.

One way of comparing the variation between the different test sessions and languages, is looking at the standard deviations (SDs) of the test sessions. Figure 2 below shows the Dutch mean RTs and the mean SDs of session 1,2, 3, 6 and 7. Figure 3 shows the mean RTs and corresponding SDs of the same English sessions.


Figure 2. Mean RTs and SDs for Dutch sessions 1,2,3,6 and 7.

Figure 3. Mean RTs and SDs for English sessions 1,2,3,6 and 7.

One difference in comparing figure 2 and 3, is that the Dutch RTs are lower than the English RTs, as we have seen earlier. More interesting to note when looking at figures 2 and 3, is the relation between the SDs and the RTs. Figure 2 shows the mean RTs and SDs for the Dutch sessions, and visual inspection of figure 2 shows that the mean SDs of the Dutch sessions seem to follow the same pattern as the mean RTs; higher RTs seem to correspond consistently to higher SDs, and lower RTs to lower SDs. Figure 3 shows the mean RTs and SDs for the English sessions. The relationship between the mean RTs and mean SDs for the English sessions seems to be a lot more diffuse and a lot more chaotic.


Table 3. Mean RTs, SDs and CVs for Dutch sessions 1 to 3.

Session Dutch Mean RT SD CV

1.1   454   30   0.07
1.2   470   43   0.09
2.1   465   38   0.08
2.2   470   38   0.08
3.1   464   36   0.08
3.2   481   41   0.09

Table 4. Mean RTs, SDs and CVs for English sessions 1 to 3.

Session English Mean RT SD CV

1.1   528   50   0.09
1.2   522   53   0.10
2.1   513   52   0.10
2.2   479   38   0.08
3.1   516   41   0.08
3.2   489   44   0.09

The absolute scores for the SDs show that the SDs on the English sessions were higher than the SDs on the Dutch sessions. This is to be expected, since the mean RTs on the English sessions are also significantly higher. The CVs show the relation between the Mean RTs and the SDs. The CVs on the first three English sessions are indeed higher, which means that the SDs are not only higher for the English sessions in an absolute sense, but they are also higher in relation to the RTs. For the last three sessions, the CV scores are the same for the English and the Dutch sessions.
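The relation between the three columns can be checked with a few lines of Python: the CV is the SD divided by the mean RT. The sketch below recomputes the CVs from the mean RTs and SDs listed in Tables 3 and 4; the variable and function names are illustrative only.

```python
# Mean RT and SD (both in ms) per session, copied from Tables 3 and 4.
dutch = {"1.1": (454, 30), "1.2": (470, 43), "2.1": (465, 38),
         "2.2": (470, 38), "3.1": (464, 36), "3.2": (481, 41)}
english = {"1.1": (528, 50), "1.2": (522, 53), "2.1": (513, 52),
           "2.2": (479, 38), "3.1": (516, 41), "3.2": (489, 44)}

def coefficient_of_variation(mean_rt, sd):
    """SD expressed as a proportion of the mean."""
    return sd / mean_rt

# Round to two decimals, as in the tables.
dutch_cv = {session: round(coefficient_of_variation(mean_rt, sd), 2)
            for session, (mean_rt, sd) in dutch.items()}
english_cv = {session: round(coefficient_of_variation(mean_rt, sd), 2)
              for session, (mean_rt, sd) in english.items()}
```

For example, Dutch session 1.1 gives 30 / 454 ≈ 0.066, which rounds to the 0.07 reported in Table 3.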

Tables 5 and 6 show the Mean RTs, SDs and CVs for the Dutch and English sessions 6 and 7.

Table 5. The mean RTs, SDs and CVs for Dutch sessions 6 and 7.

Session Dutch Mean RT SD CV

6.1 459 32 0.07

6.2 472 34 0.07

7.1 480 36 0.07

7.2 483 49 0.10

Table 6. The mean RTs, SDs and CVs for English sessions 6 and 7.

Session English Mean RT SD CV

6.1 510 37 0.07

6.2 520 42 0.08

7.1 509 36 0.07

The mean RTs of the Dutch and English sessions 6 and 7 do not differ much from those found in sessions 1 to 3. However, the SDs for the English sessions do appear to be lower than for sessions 1 to 3. This is confirmed when looking at the CVs for the English sessions; these do not seem to differ much from the CVs for the Dutch sessions. In other words, even though the absolute RTs and SDs on the English sessions are higher, which is typical for an L2 speaker, the SDs on sessions 6 and 7 are not higher than those on the Dutch sessions in proportion to the mean RTs. In fact, the last Dutch afternoon session 7.2 seems to be an outlier. Even though the mean RT is not much above what we found for the other Dutch sessions, the SD and thus the corresponding CV is much higher. This might be an effect brought on by fatigue, although the English session administered the same afternoon shows no such effect.

The mean RTs, SDs and CVs for the Dutch and English sessions 4 and 5 are shown in tables 7 and 8:

Session Dutch Mean SD CV

4.1 492 34 0.07

4.2 490 33 0.07

5.1 455 30 0.07

5.2 481 36 0.07

Session English Mean SD CV

4.1 547 44 0.08

4.2 528 43 0.08

5.1 503 36 0.07

5.2 498 36 0.07

Sessions 4 and 5 are treated separately since these were controlled for context. Session 4 was conducted after 7 days of only using English, and session 5 after 7 days of only using Dutch. Comparing the RTs for both languages shows that the mean RTs on the English sessions are all higher than the mean RTs on the Dutch sessions. In this respect then, sessions 4 and 5 were not different from the other sessions. The RTs are higher on session 4, so after 7 days of only using English, for both languages. However, when looking at the absolute scores, it is hard to tell whether language context has affected performance. The CVs are quite constant, but show a bit more variability in the English language session 4. A repeated measures ANOVA

context effect. The analysis confirmed the performance on the Dutch sessions to be faster than on the English sessions, F(1,174) = 333.8, p<0.01. The main effect of context was also found to be significant, F(1,174) = 334, p<0.01, with slower response times on both language versions after the ‘All English’ period. The main effect of Time was not significant, which means there was no significant effect found for testing in the morning or the afternoon. The interaction between Context and Language was found to be significant, F(1,174) = 18.6, p<0.01. Figure 4 shows the interaction to be strongest in the ‘All English’ context. The interaction between Context and Time was also significant, F(1,174) = 29.1, p<0.01, the strongest context effect occurring in the morning sessions. This interaction is illustrated in Figure 5.

Figure 4. Mean RTs for the interaction between Context and Language.


Figure 5. Mean RTs for the interaction between Time and Context.

Since both languages were always tested in the mornings and the afternoons, a repeated measures ANOVA analysis including all 14 sessions was conducted to test whether time of testing affected RTs. The main effect of time was again found not to be significant. The interaction between time of testing and language however was found to be significant, F(1,12) = 7.8, p=0.02. Figure 6 below shows the interaction; performance on the Dutch sessions is faster in the mornings and slower in the afternoons. The English sessions show the reverse effect; performance on these is slower in the mornings and faster in the afternoons.


Figure 6. Mean RTs for the interaction between Language and Time of testing.

4.1.2 Discussion Linear Analyses

Figure 2, showing the mean RTs of the Dutch and English sessions 1, 2, 3, 6 and 7, shows how two languages in the bilingual mind might interact. When performance on the task in the L1 was very fast, performance on the L2 seemed quite slow and vice versa: when performance on the L2 was very fast, performance on the L1 seemed slower. This seems to show that both languages in the bilingual mind are closely interconnected. Of course, the relation between the two languages as observed in Figure 2 was not found to be significant; however, since the correlation was calculated over only 7 sessions per language, the correlation of -.533 with p=0.11 could be interpreted as a trend towards correlation that does not reach significance due to the small number of data points. This pattern would be consistent with both languages being situated in interconnected networks, in which activation of one language leads to inhibition of the other language(s) in the system, as proposed by Meara (2006).

The difference in variability of the two languages was first judged by looking at the SDs. On visual inspection, the SDs on the Dutch sessions corresponded neatly to the RTs. The SDs consistently increased and decreased in tandem with the RTs. This was in contrast with the SDs on the English sessions, in which no pattern can be seen; the relation between the RTs and SDs appeared a lot more random. To get more insight into the difference in variability between the languages, CVs were calculated for all sessions after the example of Segalowitz et al. (1995), who found that practice in the L2 led to lower SDs in relation to the RTs, indicating that what Segalowitz et al. referred to as the ‘blend of underlying mechanisms’ had become more efficient and automatized. In this case, CVs would be expected to be lower for the Dutch sessions to start with, since one’s native language should be optimally efficient and automatized. CVs should not decrease much over time, since there should not be much room for improvement in the L1. The English sessions might show decreasing CVs over time, due to practice and increased efficiency and automatization in the ‘blend of underlying mechanisms’. This expectation is in part confirmed by the data; the later sessions in English, 6 and 7, indeed show lower CVs than sessions 1, 2 and 3, and thus show less variability. This, however, also seems to apply to the Dutch sessions; sessions 6 and 7 also show lower CVs than sessions 1, 2 and 3, even though the difference is a bit less prominent. This can be explained by the design of the experiment; even though the variability decreases a bit more for the sessions in the L2, for an advanced learner of the L2, possibilities for improvement in the L2 are probably limited on a simple word naming task.

The CVs were also calculated for sessions 4 and 5. Since these sessions were administered in a different context - session 4 after 7 days of only using English and session 5 after 7 days of only using Dutch - it is of particular interest to see if there is any context effect here. The CV scores did not show an unambiguous context effect; only English session 4 showed slightly higher CVs, which is counter-intuitive, since after 7 days of only using English, one would expect the reverse effect. However, this is not inconsistent with dynamic systems theory; use of the L2 may have stimulated the L2 system to develop, which could very well lead to slightly higher variability. The otherwise rather similar CVs on the English sessions as compared to the Dutch ones from session 4 onwards again may indicate that performance may simply already be at ceiling as far as variability between sessions is concerned, due to the simplicity of the word naming task.

Even though the context effect did not show in dramatically increased or decreased SDs overall, a repeated measures ANOVA analysis did confirm a context effect. Not very surprisingly, the analysis showed RTs to be consistently lower on the Dutch sessions.

was found for time of testing; however, the interaction between time of testing and context was found to be significant. In the ‘All Dutch’ context, performance on all sessions was found to be faster in the mornings and slower in the afternoons. In the ‘All English’ context, it was the other way around; performance was found to be slower in the mornings and faster in the afternoons. When looking at the effect of time of testing for all 14 sessions, there is also a significant interaction between time of testing and language. Performance on the Dutch sessions is faster in the morning and slower in the afternoon, while performance on the English sessions is slower in the morning and faster in the afternoon. Performance on the Dutch sessions being fast in the morning could be due to the language being so entrenched that it does not need much ‘practice’ before being able to perform fast on a language task. The L2 of this advanced speaker will be less entrenched, which is probably why performance is faster in the afternoons; usage of the L2 activates the system. Whether performance on the Dutch sessions being slower in the afternoon is a consequence of fatigue or inhibition due to more activation in the L2 is hard to determine. Cattell (1886) claims there was hardly an effect of fatigue after his gruelling, all-day testing; however, his RTs are not reliable enough by today’s standards. It is plausible that with the techniques available now, which enable us to measure RTs very accurately in ms., we are measuring a slight effect of fatigue in the afternoon sessions. If that is the case here, that would mean the effect of usage of the L2 on RTs would be strong enough to override a general tendency to perform slower in the afternoons. Whether this is the case, or whether usage of the L2 leads to almost immediate inhibition of the L1 resulting in higher RTs, might become more clear in the next section.
The mean RTs and SDs on the different language sessions have now been compared between sessions; we will next zoom in on effects that take place between individual items within a test session.

4.2 Non-linear Analyses

4.2.1 Autocorrelations


Table 9. Correlations between the RTs on individual items in Dutch and English sessions 1 to 3.

English above the diagonal, Dutch below the diagonal

        1.1      1.2      2.1      2.2      3.1      3.2
1.1     -        0.45**   0.10     0.32**   0.17*    0.30**
1.2     -0.16*   -        0.25**   0.22**   0.22**   0.24**
2.1     0.09     0.19**   -        0.15*    0.01     0.14*
2.2     0.10     0.09     0.08     -        0.19**   0.13
3.1     0.03     0.16*    0.18*    0.16     -        0.21**
3.2     -0.03    0.12     0.13     -0.13    0.02     -

Table 9 shows that the correlations are generally quite low. The correlations on the Dutch items are especially low; the correlations for the English items are a bit higher. Considering that these are the exact same words tested in the exact same order, correlations on the English sessions are also very low, with the exception of the correlation between the English morning and afternoon session on the first day of testing.

The correlations of the items across sessions have also been calculated for the other sessions. Table 10 shows the correlations for sessions 4 and 5, and Table 11 shows the correlations for sessions 6 and 7:

Table 10. Correlations between the RTs on individual items in Dutch and English sessions 4 and 5.

Table 11. Correlations between the RTs on individual items in Dutch and English sessions 6 and 7.

Again, correlations between items across sessions are low. The exact same items tested in the exact same order render different RTs when tested at different times.
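To illustrate how a correlation between the RTs on the same items in two sessions can be calculated, the following Python sketch computes a correlation coefficient for two lists of RTs. It assumes a Pearson product-moment correlation; the function name pearson_r and the RT values are hypothetical and are not taken from the data of this study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of RTs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the two session means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical RTs (ms) for the same five items, in the same order,
# in two different sessions. These are NOT values from the study.
session_a = [512.0, 634.0, 587.0, 701.0, 559.0]
session_b = [530.0, 610.0, 640.0, 655.0, 600.0]
r = pearson_r(session_a, session_b)
```

A value of r close to 1 would indicate that items that were slow in one session were also slow in the other; values close to 0, as in most cells of the tables above, indicate that item RTs in one session say little about item RTs in another.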

Autocorrelations can be calculated to discover whether there is correlation within a session; in other words, whether the effect of an item spills over into the items that follow it. Figure 7 shows the autocorrelations up to 16 lags, which was found to be the point after which the effect of an item on subsequent items had tapered off completely. Figure 7 A, B and C show the autocorrelations for the Dutch data; Figure 7 D, E and F show the autocorrelations for the English data. Figure 7 A and D show the autocorrelations for the first session in both languages, Figure 7 B and E show the autocorrelations for the means of all sessions, and Figure 7 C and F show the autocorrelations of the same data as used in B and E after randomisation.

Figure 7. Figures A-C show the autocorrelations of the Dutch data. Figures D-F show the autocorrelations of the English data. Figures A and D show the autocorrelations of the first sessions. Figures B and E show the autocorrelations of the means of all 14 sessions. Figures C and F show the autocorrelations for the same data after it has been randomised.
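The autocorrelation at lag k expresses how strongly the RT on an item correlates with the RT on the item k positions later in the same series. The Python sketch below shows one common way of computing it, assuming the standard sample estimator (the sum of lagged cross-products of deviations from the series mean, divided by the total sum of squared deviations). The function name autocorrelation and the example series are hypothetical; the sketch is not meant to reproduce the analysis behind Figure 7.

```python
def autocorrelation(series, max_lag=16):
    """Sample autocorrelations of `series` for lags 1..max_lag."""
    n = len(series)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the series length")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    result = []
    for lag in range(1, max_lag + 1):
        # Pair every item with the item `lag` positions later.
        num = sum(dev[t] * dev[t + lag] for t in range(n - lag))
        result.append(num / denom)
    return result


# Hypothetical, strictly alternating series (NOT data from the study):
# neighbouring items are opposite, items two positions apart are identical.
example = [1.0, -1.0] * 20
acf = autocorrelation(example, max_lag=16)
```

In this artificial series the lag-1 autocorrelation is strongly negative and the lag-2 autocorrelation strongly positive. In RT data such as those in Figure 7, positive values that decline gradually over the lags are what indicate that the effect of an item carries over to the items that follow it.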

Figure 7 A shows the autocorrelations up to lag 16 for the first Dutch morning session, 1.1. The graph shows moderately strong correlations between an item and the items that follow it up to lag 6. An item therefore shows a gradually diminishing effect, moderately strong for the first 6 items following it and lasting until up to 16 items later. The effect of an item is strongest in the item immediately following it, with r = 0.28, and gradually declines until it is absent 16 items later, with r = 0.004. All correlations shown in Figure 7 A are significant, p < 0.01. Figure 7 D shows the autocorrelations for the first English morning session. These are stronger than those for the Dutch session; the item immediately following any item shows a strong correlation of r = 0.4. After 16 successive items the effect has again gradually diminished, to r = 0.17. For the English session, the effect is again significant for all 16 lags, p < 0.01.

Figure 7 B shows the autocorrelations of the means of all 14 Dutch sessions. Performance on the Dutch items again correlates with the 16 items following. Figure 7 E shows the autocorrelations up to 16 lags for the means of all 14 English sessions. Performance on the English items therefore did not only correlate more strongly for the first session, as seen when comparing Figure 7 A and D; correlations are also stronger when calculated over all sessions. The autocorrelations found for the means of all 14 English sessions started at r = 0.37 for the first lag and gradually diminished to r = 0.07 on the 16th lag. The autocorrelations found for the means of all 14 Dutch sessions started at r = 0.16 for the first lag and did not diminish regularly; r = 0.18 for the 16th lag. The autocorrelations calculated for the means of both languages were found to be significant, p < 0.01.

Figures 7 C and F show the autocorrelations calculated for the same means of all 14 Dutch sessions and all 14 English sessions respectively. However, the order of the original time series has been randomised for these analyses. The correlations found are clearly much lower. For the Dutch sessions, p-values for all 16 lags varied between 0.6 and 0.8. Even though some of the correlations for the English means seen in Figure 7 F seem quite high, none of these were significant either, with p-values ranging from 0.5 to 0.9.
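The logic of the randomisation check can be sketched as follows: if the autocorrelations reflect the order of the items, they should disappear when the same values are placed in random order. The Python sketch below uses a hypothetical, artificially generated series in which each value depends on the previous one; none of the values come from this study. The band of 1.96 divided by the square root of the series length is the usual large-sample approximation for an unordered series and is an assumption here, not necessarily the significance test that was used for Figure 7.

```python
import math
import random


def lag_autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a single positive lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    return sum(dev[t] * dev[t + lag] for t in range(n - lag)) / denom


rng = random.Random(42)  # fixed seed so that the sketch is repeatable

# Hypothetical RT-like series (ms): each value carries over part of the
# deviation of the previous value, plus random noise.
series = [600.0]
for _ in range(999):
    series.append(600.0 + 0.6 * (series[-1] - 600.0) + rng.gauss(0.0, 50.0))

# Surrogate series: exactly the same values, in randomised order.
shuffled = series[:]
rng.shuffle(shuffled)

original_r1 = lag_autocorrelation(series, 1)
shuffled_r1 = lag_autocorrelation(shuffled, 1)

# Approximate 95% band for autocorrelations of an unordered series.
band = 1.96 / math.sqrt(len(series))
```

Because shuffling leaves the mean and the SD of the series untouched, any difference between original_r1 and shuffled_r1 can only be due to the order of the items, which is the point of comparing panels B and E with panels C and F.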

4.2.2 Discussion Autocorrelations

Looking at the performance on the individual items within sessions instead of at the means across sessions makes performance on both languages seem more variable.

Correlations were calculated to gauge the stability of the same items across sessions.