
A Complexity Analysis of L2 English

Academic Writing

John O'Leary

S3803775

MA in Applied Linguistics

Faculty of Liberal Arts

Rijksuniversiteit Groningen

Supervisors:

Dr. R.G.A. Steinkrauss

Dr. M.C. Michel


Declaration of Authenticity

MA Applied Linguistics - 2016/2017 MA-thesis

Student name: John O'Leary Student number: S3803775

PLAGIARISM is the presentation by a student of an assignment or piece of work which has in fact been copied in whole, in part, or in paraphrase from another student's work, or from any other source (e.g. published books or periodicals or material from Internet sites), without due acknowledgement in the text.

TEAMWORK: Students are encouraged to work with each other to develop their generic skills and increase their knowledge and understanding of the curriculum. Such teamwork includes general discussion and sharing of ideas on the curriculum. All written work must however (without specific authorization to the contrary) be done by individual students. Students are neither permitted to copy any part of another student’s work nor permitted to allow their own work to be copied by other students.

DECLARATION

I declare that all work submitted for assessment of this MA-thesis is my own work and does not involve plagiarism or teamwork other than that authorised in the general terms above or that authorised and documented for any particular piece of work.

Signed________John O'Leary_____________________________________________________


LIST OF ABBREVIATIONS

ACL Average Clause Length

ASL-m Average Sentence Length (in morphemes)

ASL-w Average Sentence Length (in words)

AWL Academic Word List

C/S Clauses per Sentence

CLAN Computerised Language Analysis Program

CN/S Complex Nominals per Sentence

DC/C Dependent Clauses per Clause

DST Dynamic Systems Theory

L2 Second Language

L2SCA L2 Syntactic Complexity Analyzer

LFP Lexical Frequency Profile


CONTENTS

ABSTRACT ... 6

1 INTRODUCTION ... 7

1.1 Aims ... 8

1.2 Overview ... 8

2 BACKGROUND ... 11

2.1 Language Development ... 11

2.1.1 Dynamic Systems Theory ... 11

2.1.2 Development ... 13

2.1.3 Research Implications ... 15

2.2 Defining Complexity ... 16

2.3 Operationalising Complexity ... 19

2.3.1 Operationalising Lexical Complexity ... 19

2.3.2 Operationalising Syntactic Complexity ... 26

2.4 Research Questions ... 31

3 METHOD ... 34

3.1 Participants and Essays ... 34

3.2 Tools ... 35

3.3 Sample Selection ... 36

3.4 Text Preparation ... 39

3.5 Taking the Measures ... 40

3.5.1 Lexical Sophistication ... 41

3.5.2 Lexical Diversity ... 42

3.5.3 Lexical Density ... 42

3.5.4 Overall Syntactic Complexity ... 43

3.5.5 Clausal Complexity and Phrasal Complexity ... 44

3.6 The Pairwise Test ... 45

3.6.1 Background ... 45

3.6.2 Participants ... 47

3.6.3 Design and Materials ... 48

3.6.4 Procedure ... 48

3.7 Data Analysis ... 49

3.7.1 Research Question 1 ... 49

3.7.2 Research Question 2 ... 49


4 RESULTS AND DISCUSSION ... 51

4.1 The Complexity Measures ... 51

4.2 Research Question 1 ... 52

4.3 Research Question 2 ... 57

4.4 Research Question 3 ... 60

5 CONCLUSION ... 71

5.1 Summary of Findings ... 71

5.2 Limitations and Recommendations ... 72

5.3 Practical and Theoretical Implications ... 74

REFERENCES ... 76

APPENDICES ... 81


ABSTRACT

According to the Dynamic Systems Theory framework, language is a dynamic system and language development is a dynamic process. This implies that language development is an individually owned process and that it is better to analyse development in individual cases before exploring common patterns across learners. This study focussed on the development of linguistic complexity in the English academic writing of three Dutch university students over a period of three to four years using 12 complexity measures. A correlation analysis using pairwise ratings revealed that the lexical frequency profile and the number of complex nominals per sentence were the two measures which best predicted academic quality. A correlation analysis using the order in which the texts were written revealed that no single measure emerged as a reliable predictor of development over time for all three students. This was unsurprising given the dynamic nature of language development. In the final part of the study, potential interactions between the lexical and syntactic subsystems of the three students during development were explored by analysing correlations between measures associated with each subsystem. A number of negative correlations emerged between the lexical and syntactic measures, the strongest being those between the measures of lexical diversity and phrasal complexity. This would suggest a competitive relationship between the subsystems associated with these two measures.


1 INTRODUCTION

The Dynamic Systems Theory (DST) view of language holds that language is a dynamic system and that language development is a dynamic process. The theory aims to describe and explain how a learner’s language system develops owing to its interaction with the environment and principles of self-organisation (Verspoor, Lowie and van Dijk, 2008, p.214). It sees language development as non-linear and as involving chaotic variation over time (de Bot and Larsen-Freeman, 2011). A consequence of this view is that the researcher is forced to see language development as “an individually owned process” and, therefore, should not assume that generalisation beyond the individual is possible (Verspoor, Lowie, Chan and Vahtrick, 2017, p.26). With this in mind, an analysis of language development should begin by tracing development in individual cases before exploring common patterns across learners.

The present study focuses on the development of linguistic complexity in the academic writing of three advanced L2 learners of English over a period of three to four years. L2 English academic writing is an interesting area of focus not only because it is a highly valued skill necessary for the attainment of many academic qualifications across the world, but also because it is a highly challenging form of language production which tests a learner to the limit of their abilities. As such, analysing academic writing provides the researcher with an insight into the sort of linguistic development taking place in advanced level learners. Linguistic complexity is itself a useful construct to focus on in academic writing since it can be regarded as a reliable index of language development and progress (Verspoor et al., 2017, p.1). For the present study, 12 complexity measures will be traced in texts from three Dutch university students over a period of three to four years and emerging patterns will be explored.


1.1 Aims

The main aim of the study is to trace the development of linguistic complexity in the writing of these three advanced L2 learners of English and to interpret the findings from a DST perspective. In particular, the study is interested in which complexity measures best capture the overall quality of academic writing, which measures best capture development over time and, finally, what kind of interaction exists between the lexical and syntactic subsystems of the three advanced L2 learners. To a certain extent, the present study mirrors that of Verspoor et al. (2017), which also analysed which complexity measures captured both the quality of academic writing and development over time. However, this study differs from Verspoor et al. in several respects, including its choice of complexity measures and its focus on the interaction between subsystems in the final part of the study.

As well as answering its research questions, the study aims to be as rigorous as possible, and a number of techniques and tools will be adopted to ensure the best possible quality of results. These include a more systematic method of sample selection than that used in previous studies, implemented in a program written in Python specifically for this purpose. Several other, shorter programs written to aid text preparation will also be used and described as part of the study. Along with these programs, a number of other freely available tools will be used, which will enable the future replication of this study.

1.2 Overview

In order to contextualise the present study, key aspects of DST will be elucidated in section 2.1, with a particular focus on how a DST view of language development influences the interpretation of data from a language development study. This includes a discussion of the notion of a ‘carrying capacity’ and how the existence of finite resources often leads to a competitive relationship between different sub-systems of a learner's language system. Following this, the notion of ‘complexity’ itself will be analysed in section 2.2 and a definition will be offered. This is in response to Bulté and Housen (2012), who complain that complexity is often poorly defined in L2 studies. Once complexity has been defined, section 2.3 will describe in detail how complexity is to be operationalised in the present study. This involves the division of linguistic complexity into its lexical and syntactic dimensions, with each dimension further divided into three separate subconstructs. In total, 12 measures will be identified as suitable means of tapping into these subconstructs of lexical and syntactic complexity. The research questions will be defined in section 2.4.

Broadly speaking, the research section of the study consists of three parts. The first part involves using a pairwise comparison test to produce quality ratings for all the texts written by one of the learners. The texts will be rated by a group of 56 university students attending an English for academic purposes course, each of whom is required to select the better text from a given pair. The ratings produced by the pairwise comparison test will then be correlated with the 12 complexity measures in order to determine which of these measures best captures the overall quality of this learner’s academic writing.
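To illustrate the principle of deriving quality ratings from pairwise choices, the sketch below scores each text by the proportion of comparisons it wins. This win-count scoring is merely illustrative and is not necessarily the scaling procedure used in the study (described in section 3.6); the text labels and judgements are invented.

```python
from collections import Counter

def win_count_ratings(judgements):
    """Turn pairwise 'which text is better?' judgements into a crude rating:
    the proportion of comparisons each text won."""
    wins, appearances = Counter(), Counter()
    for winner, loser in judgements:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    return {text: wins[text] / appearances[text] for text in appearances}

# Hypothetical (winner, loser) judgements over three texts:
judgements = [("t1", "t2"), ("t1", "t3"), ("t2", "t3"), ("t1", "t2")]
print(win_count_ratings(judgements))
```

More sophisticated scalings (e.g. Bradley–Terry or Thurstone models) would also account for which opponents each text faced, but the win proportion already yields a rank order that can be correlated with complexity measures.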

The second part of the study involves testing for correlations between the 12 measures taken and the order in which the texts were written. In this part of the study, the measures will be taken from the texts produced by all three of the learners. The aim here is to explore which complexity measures best capture development over time for each of the three learners.


The final part of the study involves a closer analysis of all the measures found to correlate with the text ratings and with the order in which the texts were written, followed by an interpretation of the developmental trajectories from a DST perspective. A further analysis will then explore any potential interactions between syntactic complexity and lexical complexity using a combination of statistical and visualisation techniques.


2 BACKGROUND

2.1 Language Development

2.1.1 Dynamic Systems Theory

Dynamic systems theory (DST) began as a branch of theoretical mathematics designed to model the development of complex systems in the physical world (de Bot and Larsen-Freeman, 2011, p. 9). It has found application in the field of language studies as a useful tool for understanding language development, itself seen as a kind of complex system. The characteristics of a complex system according to this view are numerous, but those germane to the present study are that language development is non-linear, completely interconnected, subject to chaotic variation over time and in constant change during which the system settles into ‘attractor states’ (de Bot and Larsen-Freeman, 2011).

Language development is seen as non-linear in the sense that it is difficult to identify a proportionate effect for a given cause (de Bot and Larsen-Freeman, 2011). Investing twice the time learning vocabulary, for example, does not necessarily lead to the acquisition of twice the number of words. This is related to the second characteristic, the complete interconnectedness of the system. In a dynamic system, all parts are connected to all other parts and so changes in one part of the system will have an impact on other parts. It is difficult, therefore, to trace a given effect to any particular cause. A consequence of the non-linearity and interconnectedness of a dynamic system is that language development often involves chaotic variation over time. A language learner may pass through phases of fluency and disfluency for reasons which are difficult to identify.


It is important to note that describing a language system as ‘chaotic’ or ‘random’ should not be taken as an ontological claim. For the present study, language development will not be considered random in any real sense, only random in appearance given the complex nature of the system. Language development will be considered a predictable process in principle, though it may often be unpredictable in practice. This distinction is not always made clear in the literature, but it is an important one. If language development were in any real sense unpredictable, investigating its properties in a study like this would be of little value.

The final property of language development according to DST which has relevance here, is that it involves constant change in which the system settles into ‘attractor states’. The idea is that as systems interact with their environments, they reorganise themselves as a result of internal changes (de Bot and Larsen-Freeman, 2011). However, whilst these internal changes are always ongoing, the system may settle into states of relative stability. These states, known as attractor states, are ‘preferred’ states, that is, the system tends toward this state at a particular point in time. A useful analogy provided by de Bot, Lowie and Verspoor (2007, p.8) concerns the different ways a horse may run. It can trot or it can gallop, but there is no in-between way of running. This analogy is useful since it captures the dynamic nature of an attractor state. Such a state is not static, but merely stable. It remains a dynamic system subject to continual change.

A further consequence of a DST view of language development is how it affects the interpretation of any variability which may be found as a complexity measure is traced over time. It is tempting to view such variability as ‘noise’ and employ various statistical methods to smooth out the variability in order to reveal general trends. However, according to a DST approach, variability should not be considered noise by default, but looked upon as a source of potentially useful information. Variability within a system or sub-system is seen as “an inherent property of a changing system” (de Bot et al., 2007, p. 14) and can be seen as a result of a system’s “flexibility and adaptability to the environment” (van Dijk, 2003, p.129). It may be, for example, that when a system exhibits increased variability, it is a sign of progression from one attractor state to another.

2.1.2 Development

Van Geert (1991, p.3) provides a general definition of cognitive growth as “an autocatalytic quantitative increase in a growth variable”. Growth is autocatalytic in the sense that it is intrinsic to a system and not merely an addition from an outside source, while it is quantitative in that it is a property of a variable which can be measured. Following van Geert, development shall be considered here as both autocatalytic and quantitative, but not necessarily as an ‘increase’ since development may be associated with periods of variability involving the decrease of growth variables, at least in the short term. Development will, therefore, be defined as an autocatalytic quantitative increase (or decrease) in a development variable.

In order for development to take place, there must be resources to keep the process going (de Bot et al., 2007, p. 11). These include internal resources such as the capacity to learn, conceptual knowledge and motivational resources, and external resources related to the environment outside the individual (p. 11). The development itself is proportional to (amongst other things) the available resources (van Geert, 2008, p. 190) and, since these resources are finite, they have to be distributed amongst different subsystems, often resulting in competition between those sub-systems (Verspoor et al., 2017, p.2). The concept of a ‘carrying capacity’ is relevant here, which refers to “the state of knowledge that can be attained in a given participant’s interlinked structure of resources” (Verspoor et al., 2008, p.223). The state of knowledge a learner may achieve is dependent on available resources and their interlinked structure.

Two studies which have investigated competition between subsystems (between lexical and syntactic complexity) in L2 writing are Verspoor et al. (2008) and Caspi (2010). Verspoor et al., a case study of an advanced L2 learner of English, found an interesting relationship between the type-token ratio, a lexical diversity measure, and average sentence length, a syntactic measure. When the two measures were traced over 18 assignments written by the learner, a competitive interaction emerged: when the type-token ratio increased, average sentence length decreased, and vice versa. When represented graphically, an oscillating pattern emerged as the two measures interacted, producing ‘waves’ which appeared to alternate almost perfectly (Verspoor et al., 2008, p.223). When the correlation coefficient between the two measures was tracked using a 5-point moving window, it also became clear that the two measures did not correlate negatively during the entire term, with periods of positive correlation during earlier and later stages. This implies that the lexical and syntactic subsystems of this learner were in competition at certain times, but that this relationship was complex and changeable. The Caspi study, which traced the writing development of four L2 learners of English (of differing nationalities) over a year, also found negative correlations between lexical and syntactic measures. For example, lexical accuracy (operationalised as correct lexical use) correlated negatively with syntactic complexity (operationalised as a subordination-to-clause ratio) in most of the texts analysed, although this too proved to be a complex relationship with periods of positive correlation (Caspi, 2010, p.142). These studies are interesting since they provide support for the view that finite resources are shared across an interconnected system, with competition between subsystems.
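The 5-point moving-window correlation technique described above can be sketched as follows. The per-assignment type-token ratio and average sentence length values below are invented for illustration (deliberately anti-phase); only the windowing logic reflects the technique itself.

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def moving_window_correlations(xs, ys, window=5):
    """Correlate two developmental series within a sliding window,
    exposing phases of positive and negative association over time."""
    return [pearson(xs[i:i + window], ys[i:i + window])
            for i in range(len(xs) - window + 1)]

# Invented measures for ten consecutive assignments:
ttr = [0.42, 0.45, 0.40, 0.48, 0.44, 0.50, 0.46, 0.52, 0.49, 0.55]
asl = [14.1, 13.2, 15.0, 12.8, 14.6, 12.5, 14.0, 12.2, 13.5, 11.9]
print(moving_window_correlations(ttr, asl))
```

Because a separate coefficient is computed for each window, a relationship that flips between competition and support over the term shows up as a sequence of changing signs rather than being averaged away in a single whole-series correlation.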


2.1.3 Research Implications

Adopting a DST view of language development presents some significant challenges for the researcher. If language development is non-linear and interconnected, involves chaotic variation, and exhibits variability which cannot be dismissed as ‘noise’, discerning patterns in the data and attempting to draw definitive conclusions may seem a fruitless endeavour. However, by adopting an approach to research which is sensitive to the demands of a DST viewpoint, it is possible to mitigate many of these difficulties.

A prerequisite of such an approach is to focus on the development of linguistic complexity in the language of individual learners. According to the DST view, language development is an individually owned process and it should not be assumed a priori that generalisation beyond the individual learner is possible (Verspoor et al., 2017, p. 26). On this view, a complexity analysis should proceed by tracing development in individual cases before exploring common patterns across learners.

Whilst it is true that certain measures provide a useful index of overall development across proficiency levels (Verspoor, Schmid and Xu, 2012), one should not always expect to see a linear increase in a measure over time. Rather, development will often be uneven, with periods of increase, stable states and even decreases. The development path of a given variable may also vary from learner to learner, as finite resources are shared between subsystems. As a development path changes, one may encounter variability as the system changes and adapts to its environment. Finally, one might also expect to find evidence of competitive relationships between variables. As the lexical system develops, for example, resources may be redirected from the syntactic system and vice versa.


2.2 Defining Complexity

The notion of ‘complexity’ (both lexical and syntactic) was alluded to several times in the previous section without proper explication of its meaning. Indeed, Bulté and Housen (2012) complain that many L2 studies define this term inadequately or fail to define it altogether, resulting in mixed and sometimes contradictory results. Verspoor et al. (2017), a study similar to the present one, attempts such a definition, defining linguistic complexity as “a quantitative property of language units” and adding that “the greater the number of components a construction has and the more levels of embedding it contains, the more complex it is” (p. 1).

Two problems are immediately apparent with this definition. Firstly, it contains no reference to variety. Presumably, Verspoor et al. would not consider a construction which contained 15 identical words to be as complex as a construction which contained 15 unique words. Secondly, the reference to ‘levels of embedding’ suggests a form of hierarchical complexity which may not fully capture the true complexity of a linguistic structure. Whilst we may speak of words and phrases as embedded in clauses and sentences and so on, it is not clear how elements linked by conjunctions (such as ‘and’, ‘if’, ‘but’) are ‘embedded’. Rather, the complexity here derives from a coordinate relationship between the conjoined linguistic elements rather than a hierarchical one. For the present study, a definition of complexity is needed which both includes the notion of ‘variety’ and captures the complexity of a linguistic structure without couching it solely in terms of hierarchy1.


Before a definition can be formed, it is necessary to deal with a further problem which becomes apparent when we consider how one dimension of linguistic complexity, lexical complexity, is to be understood in light of a general definition of complexity. It is not obvious why a word frequency measure, such as the Academic Word List (AWL) measure utilised by Verspoor et al. (2017), should be considered a measure of complexity at all. Why, for example, should a verb like ‘cease’, which appears on the list (Coxhead, 2000), be considered more complex than the verbs ‘end’, ‘finish’ or ‘stop’, which do not? It does not differ from these alternatives in terms of the number of components it has nor the levels of embedding it contains, but only in its frequency of use2 in academic texts.

To help shed light on this problem, it is useful to consider a distinction made by Bulté and Housen (2012, p. 26) between complexity on “an abstract theoretical level as a property of a (cognitive) system” and complexity on “a more concrete, observational level of language performance”. On this view, complexity exists in the language system of an individual learner (the theoretical level) and this complexity is made manifest in the language behaviour (either spoken or written) of this individual (the observational level). Given this distinction, it is necessary to be clear when we define linguistic complexity whether we are referring to complexity on the theoretical or the observational level. Indeed, this distinction is not made clear by Verspoor et al. (2017) which refers to complexity as “an index of language development and progress” (observational level) but subsequently claims that the study “will focus on the development of linguistic complexity in three advanced L2 learners” (theoretical level3).

2 In fact, to appear on the Academic Word List, a word must not only be used with great frequency in a broad range of academic texts, but must also not appear in the General Service List (West, 1953) of the most frequently occurring 2000 or so words (‘end’, ‘finish’ and ‘stop’ all do). (ref)

3 It is possible Verspoor et al. meant to say ‘in the writing of three advanced L2 learners’ (in this case it would


Since the aim of the present study is to trace the development of an L2 in three advanced learners, the form of complexity of greatest import is that which is located at the theoretical level, in the L2 system of the learner. At this level, lexical complexity can be considered the “elaboration, size, range, breadth of repertoire of L2 lexical items and collocations” (Bulté and Housen, 2012, p. 28). A word frequency measure can, on this view, be considered a complexity measure since it is an index of the vocabulary size (and perhaps more) of the L2 learner, even if word frequency itself (at the observational level) cannot be considered ‘complex’ or otherwise. The matter is more straightforward when grammatical complexity is considered, since a syntactic or morphological structure can both be considered complex at the observational level (in written or spoken form) and serve as an index of grammatical complexity at the theoretical level (in the L2 system of the learner).

With these problems in mind, an alternative definition of complexity to that provided by Verspoor et al. (2017), adapted from Rescher4 (1998, p. 1), will be used. Linguistic complexity (at the theoretical level) will be defined as the quantity and variety of a (theoretical) linguistic item’s constituent elements and the elaborateness of their interrelational structure. This definition serves both to include the notion of ‘variety’ and to capture the complexity of a linguistic structure, couched not in terms of hierarchy but in more general terms which describe a structure’s organisational complexity. Complexity may also be found at the observational level, in the writing or speech of an L2 learner, and this will be considered an index of complexity at the theoretical level. Lexical features at the observational level (such as word frequency), though not complex in themselves, will also be considered indices of complexity at the theoretical level (and can, therefore, when operationalised, be considered ‘complexity measures’).

4 Rescher, a philosopher, was interested in complexity as a more general construct beyond the scope of language. For him, the complexity of a system “is a matter of the quantity and variety of its constituent elements and of the interrelational elaborateness of their organizational and operational make-up” (Rescher, 1998, p. 1).

Two major components of linguistic complexity can be identified: grammatical complexity and lexical complexity, with grammatical complexity further analysed into syntactic and morphological complexity (Bulté and Housen, 2012). Since morphological development is most evident in the early stages of L2 development, the present study of advanced learners will focus only on the categories of lexical complexity and syntactic complexity.

2.3 Operationalising Complexity

In the previous section, a distinction was made between complexity on the theoretical level, in the L2 system of the learner, and its surface manifestation on the observational level, in L2 performance. But a third level, the “operational level of statistical constructs” (Bulté and Housen, 2012, p. 27) now needs to be considered. This is the level of various instruments and analytic measures which are designed to quantify complexity manifested at the observational level. Various length, frequency and diversity measures may be used at this level and if well chosen, will serve as an index of complexity manifested at the observational level and ultimately of complexity at the theoretical level. The operationalisation of both lexical and syntactic complexity will now be considered.

2.3.1 Operationalising Lexical Complexity

Following Bulté and Housen (2012), density, diversity and sophistication will be considered as the three primary aspects of lexical complexity at the observational level. In the following section, each measure used to tap into the three constructs will be outlined, complete with details of its calculation, the rationale behind its selection, limitations regarding its use, and examples of its use in previous research. A summary is provided in figure 1.

2.3.1.1 Lexical Sophistication

• Lexical Frequency Profile (LFP)
• P_Lex

Both the LFP and P_Lex share the same underlying assumption that people with more proficient vocabularies use a larger proportion of infrequent words and that, therefore, a valid measure of lexical production is a measure of the quantity of infrequent words in a text (Laufer, 2012).

The LFP was proposed by Laufer and Nation (1995) and involves taking a text as raw input and describing its lexical content in terms of frequency bands. This is typically performed by a computer program which matches vocabulary frequency lists with the text under analysis.


The ‘classic’ version of the LFP analysed texts into four bands: the first 1000 most frequent words, the second 1000 most frequent words, the Academic Word List (the 570 most frequent academic words (Coxhead, 2000)) and finally ‘off-list’ words which do not occur in any of the lists (Laufer, 2012, p.1). This produces a lexical profile of four percentage values which correspond to the four bands. For example, a lexical profile of 70%–15%–10%–5% indicates that 70% of the words in the text belong to the first 1000 words, 15% to the second 1000, 10% to the Academic Word List and 5% to off-list words.
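The banding procedure can be sketched as follows. The word sets here are tiny invented stand-ins for the real first-1000 (K1), second-1000 (K2) and AWL lists, so the output illustrates only the mechanics of the classification.

```python
def lexical_frequency_profile(tokens, k1, k2, awl):
    """Classify each token into K1 / K2 / AWL / off-list and return
    the percentage of tokens falling in each band."""
    bands = {"K1": 0, "K2": 0, "AWL": 0, "off-list": 0}
    for tok in tokens:
        word = tok.lower()
        if word in k1:
            bands["K1"] += 1
        elif word in k2:
            bands["K2"] += 1
        elif word in awl:
            bands["AWL"] += 1
        else:
            bands["off-list"] += 1
    n = len(tokens)
    return {band: 100 * count / n for band, count in bands.items()}

# Toy stand-ins for the real frequency lists:
k1 = {"the", "of", "a", "is", "and", "to", "in"}
k2 = {"approximately", "rarely"}
awl = {"analyse", "data", "method"}
text = "the method is to analyse the data in a corpus".split()
print(lexical_frequency_profile(text, k1, k2, awl))
```

Note that the bands are checked in order, so a word appearing in more than one list is counted only in the most frequent band, mirroring the way profiling programs match a text against the lists.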

Laufer and Nation (1995) have suggested that the LFP could be used as a measure of the quality of lexis in the writing of second language learners and found that the LFP correlated well with an independent measure of vocabulary size. This has been criticised by Meara (2005), who claims that his Monte Carlo simulations of learner text production5 demonstrate that the LFP is not sensitive enough to pick up modest changes in vocabulary size. Laufer (2005, p.583) has responded by pointing out that the LFP measures vocabulary use (and ignores a learner’s passive vocabulary) and that, given the complex nature of lexical competence, one should not expect a small increase in vocabulary size to produce a significant difference in the vocabulary profile. Given these problems, the LFP will be considered here as an index of lexical sophistication in a general sense, and not tied to a simplistic notion of ‘vocabulary size’.

To deal with the LFP statistically, Laufer (1995), one of the original authors, recommended the Beyond 2000, a condensed profile of the LFP calculated from the percentage of words in the text which are not within the first 2000 words. Laufer argues that this provides a reliable and valid measure which has the advantage of producing a single figure which can be compared easily to other measures of development. The measure is far from ideal, however, since it clearly results in an information loss compared with the classical LFP measure (it effectively measures only two bands rather than four). This makes the measure insensitive to academic word usage, since the ratio of academic words to off-list words will not impact on the Beyond 2000 profile. Another limitation of the condensed profile is that it precludes the use of word families rather than simply tokens as the unit of measurement. This is because it is only possible to establish which word family a particular item belongs to on the basis of its classification in one of the lists. Since the percentage calculated for Beyond 2000 includes off-list words, a token measurement must be taken from all four bands to make the calculation possible. An additional problem, highlighted by Meara and Bell (2001), is that the percentage of words falling outside the first two categories is typically very small (rarely exceeding 10%), which necessitates a longer sample length to achieve stable measures.

5 Using logarithmic weighting, a computer program simulates the kinds of text produced by imaginary learners with given vocabulary sizes. The LFP is applied to these imaginary texts to determine if it is sensitive to differences in vocabulary size.
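The condensation itself is simple: given a four-band profile such as the 70%–15%–10%–5% example above, Beyond 2000 is everything outside the first two bands. The sketch below (using hypothetical profile values) also demonstrates the information loss just described: shifting percentage points between the AWL and off-list bands leaves the figure unchanged.

```python
def beyond_2000(profile):
    """Condense a four-band LFP (percentages per band) into the Beyond 2000
    figure: the share of tokens outside the first two 1000-word bands."""
    return 100.0 - profile["K1"] - profile["K2"]

print(beyond_2000({"K1": 70.0, "K2": 15.0, "AWL": 10.0, "off-list": 5.0}))
# Swapping AWL and off-list percentages yields the same figure,
# so academic word usage is invisible to the condensed measure:
print(beyond_2000({"K1": 70.0, "K2": 15.0, "AWL": 5.0, "off-list": 10.0}))
```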

To address these shortcomings, Meara and Bell (2001) proposed the P_Lex measure. P_Lex works by looking at the distribution of difficult words in a text and calculating an index which reflects the likelihood of these words occurring. This involves dividing the text into 10-word segments and counting the number of difficult words which occur within each segment. A difficult word is considered to be one which falls outside the 1000 most frequent words (though proper nouns, numbers and geographical derivatives are not considered difficult). A ‘P_Lex profile’ is formed from the distribution of these difficult words throughout every 10-word segment in the text and, from this, a single figure known as a lambda is derived. The process by which this is done is complex and will not be described in the present study, but it involves fitting a theoretical curve to the distribution data produced as part of the P_Lex profile. The final lambda figure, typically ranging from 0 to 4.5, has the advantage of being easy to manipulate mathematically as well as being, according to the authors, less sensitive to text length than the LFP (Meara and Bell, 2001).
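The segmentation step described above can be sketched in a few lines of Python. This is an illustration only, not Meara and Bell's implementation: the real P_Lex fits a theoretical curve to the segment counts, whereas here, on the assumption that the counts are roughly Poisson distributed, the mean number of difficult words per segment serves as a crude stand-in for lambda. The `easy` word set and sample text are invented for the example.

```python
# A minimal sketch of the P_Lex segmentation step (not the published
# implementation). Under a Poisson assumption, the mean number of
# difficult words per 10-word segment gives a rough lambda estimate.

def plex_lambda(words, easy_words):
    """words: list of tokens; easy_words: the 1000 most frequent words
    (plus proper nouns, numbers and geographical derivatives)."""
    # complete 10-word segments only, discarding any short remainder
    segments = [words[i:i + 10]
                for i in range(0, len(words) - len(words) % 10, 10)]
    counts = [sum(1 for w in seg if w.lower() not in easy_words)
              for seg in segments]
    return sum(counts) / len(counts)  # mean difficult words per segment

# invented toy frequency list and text
easy = {"the", "a", "of", "words", "ten", "this", "is", "in", "and", "to",
        "segment", "has", "no", "one", "two"}
text = ("the unprecedented proliferation of this segment has no difficult "
        "words in and to one two a the of").split()
print(round(plex_lambda(text, easy), 2))
```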

Studies which have made use of the LFP in relation to academic writing include Lemmouh (2008), which found a relationship between teacher grades and lexical richness measures (including the LFP) of essays written by Swedish advanced learners of English at university level, and Douglas (2013), which found that words from the General Service List (West, 1953) and Academic Word List cover an average 94% of a typical paper produced by an entry level (native English speaking) university student at a Canadian university. Meara and Bell (2001) tested their P_Lex on texts produced by English as a foreign language students at university level and demonstrated that P_Lex was stable across pairs of essays produced by the same participants. Both Beyond 2000 and P_Lex will be used as measures of lexical sophistication in the present study.

2.3.1.2 Lexical Diversity

 Guiraud
 Vocd-D

A common measure used to test lexical diversity, and one from which the two measurements used here are derived, is the type-token ratio (TTR) (Templin, 1957). Calculating TTR involves dividing the number of types (the total number of different words) in a text by the number of tokens (the total number of words), thereby determining the degree of lexical variation in the text. TTR has been subject to much criticism, however, since it is dependent on sample size. Although TTR involves calculating a proportion (and so is not inherently sensitive to sample size), the problem lies in the fact that as the number of tokens increases, the available pool of possible new types diminishes. The more tokens used in a given sample, the greater the probability of repetitions (Malvern and Richards, 1997, p.60). Given that the present study uses samples of roughly the same size (300 words), this particular shortcoming should not impact critically on the results. However, in order to produce data which has validity beyond the present study, it is useful to use measures which are not sensitive to sample size.

Two measures designed to achieve this are the Guiraud (Guiraud, 1954) and Vocd-D (McKee, Malvern, and Richards, 2000) (based on D (Malvern and Richards, 1997)), both of which use mathematical transformations of the TTR to compensate for the effect of sample size. The Guiraud attempts to achieve this by dividing the number of types by the square root of the number of tokens. This measure has the advantage of being easy to calculate, although some, including Tweedie and Baayen (1998), have claimed that it is still sensitive to sample size, albeit less so than TTR. The Vocd-D measure uses a random sampling technique to calculate an average TTR score for the whole sample. This is done using specially designed software which is available within the Computerised Language Analysis (CLAN) suite of programs developed by MacWhinney & Snow (1990) as part of the Child Language Data Exchange System (CHILDES) project. This is a popular measure and has been treated as an industry standard by some (McCarthy and Jarvis, 2007, p.476), although in the same paper the authors claim that Vocd-D remains susceptible to sample size, at least for texts of certain lengths. Both measures will be used in the present study and whichever appears to best capture development will be used for further analysis.
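The two transformations can be made concrete with a short sketch; the sample sentence below is invented for illustration.

```python
from math import sqrt

def ttr(tokens):
    # type-token ratio: distinct word forms over total word count
    return len(set(tokens)) / len(tokens)

def guiraud(tokens):
    # Guiraud (1954): types divided by the square root of tokens,
    # a transformation intended to dampen the effect of sample size
    return len(set(tokens)) / sqrt(len(tokens))

# invented 13-token sample containing 8 types
sample = "the cat sat on the mat and the dog sat on the rug".split()
print(round(ttr(sample), 3), round(guiraud(sample), 3))
```

Note that a longer text with the same proportion of repetitions yields a lower TTR but a more stable Guiraud value, which is the motivation given above for the square-root transformation.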

Both Vocd-D and the Guiraud have been used in previous research to investigate academic writing. For example, Gebril and Plakans (2016) used Vocd-D to investigate the effect of reading on academic writing in English, while the Guiraud was deployed by Yixin and Daller (2014) to investigate the academic writing ability in English of Chinese students in the UK. Gebril and Plakans found that integrated reading and writing tasks significantly affect lexical diversity values since students borrow vocabulary from the source texts. Yixin and Daller found a strong correlation between the Guiraud index of Chinese students' academic writing and their overall grade point average achieved on their academic programme.

2.3.1.3 Lexical Density

 Ure’s Method
 Halliday’s Method

The term ‘lexical density’ was first coined by Ure (1971) to describe the number of words with lexical properties (as opposed to grammatical properties) as a percentage of the total number of words in a text (O’Laughlin, 1995, p.221). Halliday (1985) proposed a revision to Ure’s method by recommending that the number of lexical items be calculated as a percentage not of the total number of words, but rather of the total number of clauses. He argued that if we are interested in density, we are concerned with how closely packed together the information in a text is. But given that words are not packed into other words, but rather into larger grammatical units such as sentences and their constituent parts, Halliday proposed that the clause serve as the denominator in the lexical density formula. For the purposes of the present study, both methods will be explored to determine which one yields the most reliable measure.
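The two formulas can be sketched as follows. This assumes the lexical/grammatical classification of each token has already been made (by hand here; via CLAN's morphosyntactic analysis in the present study), and the example sentence and its tagging are invented for illustration.

```python
# Lexical density under Ure's and Halliday's formulations, given a
# lexical/grammatical classification of each token (1 = lexical).

def density_ure(tags):
    """Ure (1971): lexical items as a percentage of all words."""
    return 100 * sum(tags) / len(tags)

def density_halliday(tags, n_clauses):
    """Halliday (1985): lexical items per clause (a ratio)."""
    return sum(tags) / n_clauses

# "The old dog slept in the garden" -- one clause;
# 'old', 'dog', 'slept' and 'garden' count as lexical
tags = [0, 1, 1, 1, 0, 0, 1]
print(round(density_ure(tags), 1))   # percentage of lexical words
print(density_halliday(tags, 1))     # lexical words per clause
```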

Further work by Halliday (1985) sought to improve Ure’s method by providing a framework for making the distinction between lexical and grammatical words in a text. For him, a grammatical word (Halliday prefers ‘item’) is one which enters a ‘closed system’. For example, a pronoun like him contrasts with he and his on one dimension and with me, you, her, it, us, them and one on another, but with nothing else. A lexical word, on the other hand, enters into an ‘open set’ which is indefinitely extendable. A word like door, for example, contrasts with gate and screen, but also with window, wall, knob, handle and so on (Halliday, 1985, p.63). In practice, this means that determiners, pronouns, most prepositions, conjunctions and some classes of adverbs qualify as grammatical items. O’Laughlin (1995, p.228) provided a more detailed taxonomy based on Halliday’s distinction which will be adopted for use with both Ure’s and Halliday’s measures in the present study.

The focus of both Ure and Halliday’s work was the difference between spoken and written texts (in most cases they found the latter to have greater lexical density). However, some authors such as O’Laughlin (1995, p.222) have suggested the lexical density measure may have application beyond identifying contrasts between written and spoken language. Ishikawa (2007), for example, used both measures while investigating the effect of task complexity on L2 English written narratives, and To, Fan and Thomas (2013) used both while investigating the readability of English textbooks. The utility of using lexical density measures for academic texts specifically has not been properly established. It will be interesting, therefore, to see whether these measures of lexical density are able to capture development in the writing of the three learners in the present study.

2.3.2 Operationalising Syntactic Complexity

This study follows Norris and Ortega (2009, p.561), who identify three measurable subconstructs in syntactic complexity: overall or general complexity, complexity via subordination, and subclausal complexity. For the purposes of this study, these three constructs shall be referred to as sentential complexity, clausal complexity and phrasal complexity respectively. Measures will be selected based on the recommendations made by both Norris and Ortega and by Bulté and Housen (2012). The section which follows will proceed in much the same way as the last, with the provision of the most important details regarding each measure along with a rationale for their selection. Figure 2 below provides a summary of the measures selected.

2.3.2.1 Sentential Complexity

 Average Sentence Length (in words) (ASL-w)
 Average Sentence Length (in morphemes) (ASL-m)

Sentential complexity will be operationalised as both the average number of words and the average number of morphemes per sentence. This measure is easily applied to academic writing since it invariably involves the use of clearly demarcated sentences beginning with a capital letter and ending with a full stop. Following Norris and Ortega (2009, p.561), ASL in both forms will be considered a “global or generic metric of linguistic complexity” since it is impossible to establish on the basis of this measure whether any increase (or decrease) in sentence length is due to a change in clause or phrase length, or a combination of the two. Changes at this level will be established using alternative measures outlined below.

When used in combination with clausal and phrasal lengths, it is important to bear in mind that ASL-w and ASL-m, as global measures, will always partially overlap with anything measured at a lower level. A change in clause length, for example, will always correspond to a change in sentence length (all else being equal). As a consequence, it would make little sense to investigate correlations between related measures such as these. However, ASL-w and ASL-m remain useful to the present study since they can still be compared to unrelated measures such as those associated with lexical development.

Studies which have used ASL-w (sometimes termed ‘mean length of T-unit’) include Ishikawa (2007) and Storch and Wigglesworth (2007). Ishikawa showed that ASL in written narrative discourse in English increased with task complexity while Storch and Wigglesworth found that when L2 learners of English engaged in collaborative writing activities, there was no noticeable effect on ASL-w. ASL-w and ASL-m have often been found to correlate strongly (for example, Parker and Brorson (2005)), but both will be used as part of the present study as an exploratory exercise.


2.3.2.2 Clausal Complexity

 Clauses per Sentence (C/S)

 Dependent Clauses per Clause (DC/C)

Following Norris and Ortega (2009, p.561), clausal complexity will be considered as measurable by “any metric with clause (or subordinate or dependent clause) in the numerator”. A clause will be considered as “a structure with a subject and a finite verb (a verb with a tense marker)” (Hunt, 1965, p.15). Bulté and Housen (2012, p.30) consider both C/S and DC/C measures as suitable measures of clausal complexity and, in their own survey of complexity measurement used in the academic literature, identified the two measures as the most frequently used to explore this aspect of syntactic complexity. Amongst those who adopted the two measures were Ishikawa (2007), Kuiken and Vedder (2007) and Sercu, de Wachter, Peters, Kuiken and Vedder (2006). All three studies investigated the effects of task complexity on L2 written discourse and while Ishikawa found that both C/S and DC/C measures increased with task complexity, both Kuiken and Vedder and Sercu et al. found no significant increase.

Of particular interest is the Kuiken and Vedder study into the effect of task complexity in French L2 writing. Whereas Ishikawa (2007) and Sercu et al. (2006) (who looked at L2 English writing) take both C/S and DC/C as simply measures of ‘structural’ or ‘syntactic’ complexity, thereby suggesting some overlap, Kuiken and Vedder (2007, p.124) treat DC/C differently as a measure of the “degree of syntactic embedding per clause”. The suggestion that the two measures index different constructs provides motivation for using them both in the present study. However, care must be taken when comparing the two measures during analysis, since they are structurally interdependent. For example, the use of a dependent clause in a sentence necessitates the use of an independent clause thereby increasing the total number of clauses per sentence.


2.3.2.3 Phrasal Complexity

 Average Clause Length (ACL)

 Complex Nominals per Sentence (CN/S)

While sentence length must be considered a global metric of linguistic complexity for the reasons outlined above, clause length is unaffected by the amount of subordination in production and so taps into complexification sub-clausally, at the phrasal level (Norris and Ortega, 2009, p.561). An increase in clause length can only result from phrasal elaboration (via adjectives, adverbs, prepositional phrases, or non-finite clauses) or the use of nominalisations (p.561), and so ACL can be considered an index of phrasal complexity. ACL was used by Ishikawa (2007) (defined there as s-nodes per clause), who demonstrated that the measure increased with task complexity.

Bulté and Housen (2012, p.29) observe that phrasal complexity has, to a large extent, been neglected in the literature as a sub-component of linguistic complexity. It is difficult, therefore, to identify a tried and tested measure to tap into this construct. One option is to investigate the frequency of the complex nominal (a grouping of words which together function as a noun) which may increase as academic writing develops. Ravid and Zilberbuch (2003) found that texts produced in Hebrew by both school age and adult writers produced more complex and diverse nominals the older and more experienced the writers were. It is conceivable that some of this development takes place during the years of university study. A complex nominal related measure will, therefore, be adopted in the present study as an exploratory exercise. The measure will be formulated as the number of complex nominals per sentence.


2.4 Research Questions

The main research questions of this study concern the interpretation of complexity measures, both lexical and syntactic, from a DST perspective. With regard to this, three research questions were formulated:

(1) Which measures of linguistic complexity best capture the overall quality of the academic writing of an advanced L2 learner of English?

(2) Which measures of linguistic complexity best capture development over time in the academic writing of advanced L2 learners of English?

(3) Does a competitive relationship exist between the lexical and syntactic subsystems of the three learners?

The first question will be operationalised as which measures of linguistic complexity taken from the academic essays of a single learner correlate most strongly with ratings of these essays provided by a group of human judges. The second question will be operationalised as which measures taken from the academic essays of all three learners correlate most strongly with the order in which the essays were written.

The basic operationalisation of the third question will be whether or not a negative correlation exists between the lexical and syntactic measures taken from each learner. The idea is that a negative correlation signals that two measures are competing for resources. In practice, however, this will require some interpretation since correlations may not exist between each lexical and syntactic measure, making it difficult to establish whether a competitive relationship exists overall or not. The results from research questions 1 and 2 may be useful here since they will indicate which measures best represent quality and development overall and so will aid the interpretation of the third research question.

Since language development is considered an individually owned process from a DST perspective, forming general hypotheses with regards to these questions is fraught with difficulty. In view of this, only tentative hypotheses shall be put forward at this stage.

Verspoor et al. (2012), which analysed written texts across levels of proficiency, found that average sentence length and the Guiraud index were effective means to distinguish proficiency levels. Verspoor et al. (2017), which analysed academic writing in advanced L2 learners specifically, found that average sentence length and the number of words from the Academic Word List correlated strongly with both text ratings and text order. However, in this case the Guiraud index did not correlate well with text ratings or text order. On this basis, the answer to the first research question is predicted to be that ASL-w and ASL-m (the two average sentence length measures) and Beyond 2000 (which is sensitive to the use of less frequent words) will correlate strongly with both text order and text ratings.

Verspoor et al. (2012) also found that almost all specific constructions showed non-linear development and variation from learner to learner. In light of this, measures of clausal and phrasal syntactic complexity are not expected to correlate strongly with text ratings or text order, at least not across all three learners. Rather, non-linear development is expected which will vary amongst the three learners.


On the basis of the studies by Verspoor et al. (2008) and Caspi (2010) described earlier, it is predicted that a competitive relationship will be found between many of the lexical measures and syntactic measures. It is difficult, however, to predict exactly which measures will interact and what the nature of these interactions will be. In view of DST, variation across the three learners is anticipated, but common patterns may also emerge which could form the basis of future research.


3 METHOD

3.1 Participants and Essays

All the texts analysed as part of the study were taken from three Dutch female university students. They all completed English Language and Culture degrees at Dutch universities (students A and B at the University of Groningen, Student C at Radboud University) and were aged 18 to 21 during the time the texts were written. They can all be considered advanced level students since both programmes stipulated an IELTS score of 6.0 or more (6.5 in the case of Groningen) or an equivalent in order to gain admission. The essays were all written during the course of their study and were well distributed over this time. The essays were all written on topics related to the degree programme and focussed on either literature or linguistics. All the essays were written without a time limit with free access to dictionaries and other resources.

In total, the three students offered 56 essays for analysis of which 42 were deemed suitable. Reasons for rejection included that the essays were too short for analysis, had multiple authors, were written in a question and answer format which may have influenced writing style, or contained significant overlap with a previous essay (some redrafts were submitted). The average word count of the final 42 essays was 1,723 with the shortest being 449 words and longest 10,740 words. Table 1 below provides details of the final 42 essays selected for analysis.


Table 1

A summary of the essays selected for analysis

                          Student A    Student B    Student C
Number of essays          13           17           12
Date of first essay       Jan 2012     Oct 2014     Oct 2014
Date of last essay        Dec 2015     Jun 2017     Aug 2018
Total words               11730        31753        28897
Average length (words)    902          1868         2408

3.2 Tools

The CLAN programs were used to calculate several measures as well as to serve a number of other functions which have been detailed in the sections below. CLAN consists of two parts: an editor which allows text files to be edited in the CHAT format (MacWhinney, 2000), and a set of data analysis programs run from a separate command window which includes MOR and vocd (McKee et al., 2000). The editor was used to prepare texts for analysis in CLAN, MOR provided the morphosyntactic analysis used for the lexical density and ASL-m measures, and vocd provided the Vocd-D measure as well as the TTR score from which the Guiraud was calculated.

The L2 Syntactic Complexity Analyzer (L2SCA) developed by Lu (2010) was used to measure both clausal and phrasal complexity. This program is included as part of the TAASSC package developed by Kyle (2016). The program reads language samples in plain text format and generates a report containing several complexity measures. More precise details of its functionality are provided below.


In addition, two freely available programs, AntWordProfiler (Anthony, 2014) and P_Lex (Meara, 2018), were used to measure lexical sophistication. Details of their functionality and use are provided in the relevant section below.

Several programs were written in the Python programming language using IDLE, a freely available integrated development environment, to perform certain tasks necessary to the study. These include a program to extract a representative 300-word sample from each essay, a program to generate the scripts needed to perform the pairwise analysis, and several short programs for cleaning and preparing the texts for analysis. Since the programs were written specifically for use in the present study, they were not designed to be user-friendly and as such, are unsuitable for use by other researchers. However, their key functionality will be described in the method section below, making possible their replication in future studies.

3.3 Sample Selection

Once the 42 texts had been chosen, it was necessary to select a sample from each text using a method reliable enough to ensure it was representative of the whole text. In line with Verspoor et al. (2017), introductions and abstracts were removed along with any direct quotes or references. Whilst Verspoor et al. opted for a 200-word sample, a larger 300-word sample was selected in this case to improve reliability. It was also felt that Verspoor et al.’s method of randomly selecting a sample was insufficient given the uneven nature of a typical piece of writing and that, as a consequence, an improved method was required.

The method designed to achieve this involved extracting from each text every possible 300-word sample (whilst preserving sentence integrity) and then comparing each extracted sample with the overall text to find the closest match. This involved measuring both the average sentence length in words (ASL) and average word length in letters (AWL) in both the original text and each extracted sample and then performing two Z-tests6 to establish the difference between a given sample and the original. For example, for a given sample x, Z-tests were performed to calculate the difference between the ASL of x and the ASL of the original, and then between the AWL of x and the AWL of the original. These Z-test scores were added to produce a single overall value for each sample. All the samples were then ranked according to this value and the sample with the smallest overall difference from the original text was selected as the sample to be analysed. The rationale behind using ASL and AWL was that, firstly, the two measures are simple to calculate and require minimal preparation (this was important given the combined size of the 42 texts of approximately 72,000 words) and, secondly, they both have a long history as reliable measures of text difficulty (the Flesch–Kincaid readability tests use both measures, for example (Flesch, 1948)).

Both ASL and AWL were measured using the CLAN software while the sample extraction was performed using a program written in Python specially designed for this purpose. The program works by taking the first sentence in the original text as a starting point and then iterating through subsequent sentences, taking a cumulative word count as it proceeds. When it reaches 300 words (or as close as possible without violating sentence integrity), the sentences processed are copied and written to an external text file. This file is considered the first sample. The process is then repeated using the second sentence of the original text as a starting point to produce the second sample. The program then proceeds through the remainder of the original text in the same manner until it nears the end of the text and can no longer produce 300-word samples. This process typically produces a very large number of samples. For example, text 16 from student B, which consisted of 6,036 words in its original form, produced a total of 206 samples. The ASL and AWL for each sample were measured using CLAN and the Z-tests were performed using Excel. An example of the Excel output for text 16 is provided in table 2 below.

6 A Z-test was chosen in preference to a t-test since the population variance was known (both average
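The selection procedure can be approximated in Python as follows. This is a sketch rather than the study's own script: sentence tokenisation is assumed to have been done already, a tiny `target` is used so the toy example runs, and the standard deviations used for the Z-scores are taken from the full text's sentence and word lengths (the thesis does not spell out the variance term, so that detail is an assumption here).

```python
from statistics import mean, pstdev

def asl(sentences):
    # average sentence length in words
    return mean(len(s) for s in sentences)

def awl(sentences):
    # average word length in letters
    return mean(len(w) for s in sentences for w in s)

def best_window(sentences, target=300):
    """Return (score, start index, sentences) of the window whose ASL
    and AWL sit closest to the whole text's, by summed z-scores."""
    sd_asl = pstdev(len(s) for s in sentences) or 1.0
    sd_awl = pstdev(len(w) for s in sentences for w in s) or 1.0
    t_asl, t_awl = asl(sentences), awl(sentences)
    best = None
    for i in range(len(sentences)):
        win, count = [], 0
        for s in sentences[i:]:
            win.append(s)
            count += len(s)
            if count >= target:        # sentence integrity preserved
                break
        if count < target:             # remaining tail too short
            break
        score = (abs(asl(win) - t_asl) / sd_asl
                 + abs(awl(win) - t_awl) / sd_awl)
        if best is None or score < best[0]:
            best = (score, i, win)
    return best

# toy text: five pre-tokenised sentences, tiny target for demonstration
sents = [["alpha", "beta", "gamma"], ["delta", "epsilon"],
         ["zeta", "eta", "theta", "iota"], ["kappa", "lam", "mu"],
         ["nu", "xi"]]
score, start, window = best_window(sents, target=6)
print(start, sum(len(s) for s in window))
```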

Table 2

Samples Extracted from Text 16, Student B

Rank   Sample Number   Words   AWL    Z-test (AWL)   ASL     Z-test (ASL)   Combined Z-tests
-      Original        6036    5.16   -              28.34   -              -
1      *119            311     5.15   0.0051         28.27   0.0003         0.0054
2      117             303     5.17   0.0015         27.55   0.0485         0.0499
3      113             278     5.19   0.0383         27.80   0.0203         0.0586
4      50              290     5.19   0.0375         29.00   0.0307         0.0682
…
203    92              306     5.49   2.7597         38.25   5.5086         9.9402
204    149             309     4.72   2.4501         23.77   1.9020         10.2132
205    91              311     5.51   2.7592         38.88   6.2252         11.1505
206    93              291     5.45   2.7508         41.57   8.5910         11.7728

*In this case, sample number 119 was considered the most representative sample since its combined Z-test score was lower than any other sample.

It is clear from the measurements taken from the lower ranked samples displayed in table 2 that this method reduces the possibility of extracting an unrepresentative sample. The ASL measured in sample number 93, for example, was almost 50% higher than the ASL of the original text and so is likely to produce syntactic measures unrepresentative of the student at this stage of development.

Note that only the top 4 and bottom 4 ranked samples have been included in table 2.


3.4 Text Preparation

To prepare the texts for analysis, various adjustments were made to ensure that the texts were compliant with the requirements of the tools used. Titles and subheadings were removed to ensure only fully formed sentences remained. Hyphens were removed to ensure that the word count was taken properly (a phrase such as ‘text-based conversation’ was rendered as ‘text based conversation’ to ensure three words were counted rather than two) and abbreviations such as ‘e.g.’ and ‘i.e.’ were removed since the full stop is used by both CLAN and L2SCA to determine the end point of a sentence. After the pairwise test had been administered, the placeholders ‘NAM’ and ‘NUM’ were used to replace names and numbers, including years and decades (it was felt they should be retained during the pairwise test to maintain readability). It was important to retain placeholders in the text since this ensured the syntactic structure of each sentence was preserved during the syntactic analysis. Proper names were dealt with differently depending on the measure being used. This is outlined in the section below. Finally, note that direct quotes and references had already been removed by this stage as part of the sample selection process described above.

In order to prepare the texts as efficiently and as accurately as possible, several short programs were written in Python to aid the process. For example, one program was written to inspect each text and compile a report on any potentially troublesome punctuation or unwanted words such as numerals or years. Note that these programs did not utilise a ‘replace’ function to deal with issues, but served only to identify potential problems. Every decision to modify a text in any way was taken manually.
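A report-only inspection pass of the kind described can be sketched as follows. The patterns are illustrative, not those of the study's own scripts, and, as in the study, nothing is replaced automatically: the report simply flags items for manual review.

```python
import re

# Illustrative patterns for the kinds of item flagged for manual review:
# hyphenated words, 'e.g.'/'i.e.' abbreviations, and bare numerals.
PATTERNS = {
    "hyphenated word": r"\b\w+-\w+\b",
    "abbreviation":    r"\b(?:e\.g\.|i\.e\.)",
    "numeral":         r"\b\d+\b",
}

def inspect_text(text):
    """Return a report mapping each issue label to its matches."""
    report = {}
    for label, pat in PATTERNS.items():
        hits = re.findall(pat, text)
        if hits:
            report[label] = hits
    return report

print(inspect_text("Text-based chat grew, e.g. by 40 percent in 2006."))
```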


3.5 Taking the Measures

Table 3

Measures and Calculation

Measure                                          Calculation
Lexical Frequency Profile (LFP) (Beyond 2000)    The percentage of words which are beyond the first 2000 most frequent words.
P_Lex                                            Calculates the distribution of words beyond the first 1000 most frequent words and transforms this data into a lambda score.
Guiraud                                          The number of types divided by the square root of the number of tokens.
Vocd-D                                           Utilises a random sampling technique to calculate an average TTR score for the whole sample.
Lexical Density (Ure’s Method)                   The number of lexical items as a percentage of the total number of words.
Lexical Density (Halliday’s Method)              The number of lexical items as a ratio of the total number of clauses.
Average Sentence Length in Words (ASL-w)         The average number of words per sentence.
Average Sentence Length in Morphemes (ASL-m)     The average number of morphemes per sentence.
Clauses per Sentence (C/S)                       The average number of clauses per sentence.
Dependent Clauses per Clause (DC/C)              The average number of dependent clauses per clause.
Average Clause Length (ACL)                      The average length of a clause in words.
Complex Nominals per Sentence (CN/S)             The average number of complex nominals per sentence.


Table 3 above provides a summary of all 12 measures used during the study along with their calculation. The following section describes the methodology used to take each measure. It has been divided according to the six constructs associated with lexical and syntactic complexity.

3.5.1 Lexical Sophistication

 Lexical Frequency Profile (LFP)
 P_Lex

AntWordProfiler was used to measure the LFP. The package includes the New General Service List (Browne, Culligan and Phillips, 2013), which serves to represent the first and second bands of most frequent words, along with the Academic Word List compiled by Coxhead (2000). The software is ideal since it allows the user to view the sample text with colour coding to show which band each word belongs to. Proper nouns were removed before the off-list was calculated, in line with Laufer and Nation (1995). The program produces a full LFP profile consisting of four percentage values corresponding to the four bands. From this, a Beyond 2000 score was calculated manually from the sum of the Academic Word List and the off-list percentages.
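The arithmetic for that final step is straightforward and can be sketched as follows; the band percentages below are invented for illustration.

```python
# Beyond 2000 condensed from a four-band LFP, as described above: the
# Academic Word List percentage and the off-list percentage are summed.
# The figures passed in are hypothetical.

def beyond_2000(first_1000, second_1000, academic, off_list):
    bands = (first_1000, second_1000, academic, off_list)
    assert abs(sum(bands) - 100) < 1e-6, "band percentages should total 100"
    return academic + off_list

print(round(beyond_2000(72.4, 12.1, 9.8, 5.7), 1))
```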

P_Lex was calculated using online software provided by Meara (2018) which uses a vocabulary list developed by Nation (1984). The program allows the user to enter a text from which all the ‘difficult’ words are identified and then displayed on screen. The user is then given the option to manually reclassify any of these words as ‘easy’. In line with Meara and Bell (2001), proper nouns, numbers and geographical derivatives were reclassified as easy at this point. The program then produces a lambda value for each text which will be used for analysis.


3.5.2 Lexical Diversity

 Guiraud
 Vocd-D

The Vocd-D scores were calculated using the built-in function vocd provided by CLAN. In summary, the function works by plotting a TTR versus tokens curve as it progresses through the text, with values derived using words randomly selected from the text up to that point. The function then finds the best fit between the derived curve and ideal theoretical curves based on a parameter D. The value of this parameter at best fit is considered the index of lexical diversity. A high value for D reflects a high degree of lexical diversity while a low value reflects less diversity (Malvern, Richards, Chipere and Durán, 2004, p.55).

The Guiraud score was calculated by first obtaining a TTR measurement using vocd. The number of types recorded was then divided by the square root of the number of tokens recorded to obtain the final Guiraud score.

3.5.3 Lexical Density

• Ure’s Method
• Halliday’s Method

The total number of clauses was calculated using L2SCA (the program provides an average clauses-per-sentence figure, which can be multiplied by the number of sentences to produce this total), while the number of lexical words was calculated using CLAN. Following O’Laughlin’s (1995) taxonomy, all nouns (including proper nouns), adjectives and verbs (including copulas and participles) were counted as lexical items, while all determiners, conjunctions, auxiliary verbs (including modals), pronouns, prepositions and infinitive markers were counted as non-lexical items. Adverbs were dealt with separately and only those which could be considered “adverbs of time, manner and place” (O’Laughlin, 1995, p. 228) were counted as lexical. This distinction was made manually based on a list of all adverbs produced by CLAN from the texts. The only deviation from O’Laughlin’s taxonomy was in the treatment of the verbs ‘to be’ and ‘to have’. While O’Laughlin recommended treating all instances of these verbs as grammatical items, they were instead treated as lexical except when they occurred as auxiliary verbs. This is in line with the common practice of classifying non-auxiliary verbs as lexical (for example, Huddleston and Pullum, 2005, p. 18).

Ure's formulation of lexical density was calculated in the manner she recommended, as the number of lexical items expressed as a percentage of the total number of words. Halliday's formulation, the number of lexical items per clause, was instead calculated as a simple ratio (without the ‘100’ multiplier), since the numerator and denominator here refer to different unit types (words and clauses) which are not readily comparable as a percentage.
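The two formulations can be sketched side by side, using hypothetical counts of lexical items, words and clauses:

```python
# Sketch of the two lexical density formulations described above.
# The counts are hypothetical example figures.

def ure_density(lexical_items, total_words):
    """Ure: lexical items as a percentage of total words."""
    return 100 * lexical_items / total_words

def halliday_density(lexical_items, total_clauses):
    """Halliday: lexical items per clause, as a plain ratio."""
    return lexical_items / total_clauses

print(ure_density(120, 200))      # 60.0
print(halliday_density(120, 30))  # 4.0
```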

3.5.4 Overall Syntactic Complexity

• Average Sentence Length (in words) (ASL-w)
• Average Sentence Length (in morphemes) (ASL-m)

Both the ASL-w and ASL-m measures were calculated using the CLAN package. ASL-m is computed with the morphosyntactic analysis tool MOR, packaged with CLAN, which analyses each word of the text into its component morphemes; these are then counted and averaged per sentence to produce the ASL-m measure. ASL-w is simply the average number of words per sentence.
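Both measures are per-sentence averages of the relevant unit counts. A trivial sketch with hypothetical per-sentence counts:

```python
# Sketch of the two overall complexity measures described above.
# The per-sentence word and morpheme counts are hypothetical examples.

def asl_w(words_per_sentence):
    """Average sentence length in words."""
    return sum(words_per_sentence) / len(words_per_sentence)

def asl_m(morphemes_per_sentence):
    """Average sentence length in morphemes."""
    return sum(morphemes_per_sentence) / len(morphemes_per_sentence)

print(asl_w([12, 18, 15]))  # 15.0
print(asl_m([16, 24, 20]))  # 20.0
```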


3.5.5 Clausal Complexity and Phrasal Complexity

• Clauses per Sentence (C/S)
• Dependent Clauses per Clause (DC/C)
• Average Clause Length (ACL)
• Complex Nominals per Sentence (CN/S)

Measures of both clausal and phrasal complexity were obtained using L2SCA. The program takes written English language samples in plain text format as input and outputs various indices of syntactic complexity (Lu, 2010). During the pre-processing stage, the Stanford parser (Klein & Manning, 2003) analyses the syntactic structure of the sentences in the sample, producing a sequence of parse trees. The system segments the text into individual sentences (without the need for manually entered line breaks) and provides part-of-speech tagging. During the syntactic complexity analysis stage, the parsed sample is analysed and nine production units are counted (Lu, 2010). The units relevant to the present study are words, sentences, clauses, dependent clauses and complex nominals. Lu (2010, p. 481) details how these production units are measured by the program, which is summarised here:

Words. A word is any token which is not a punctuation mark.

Sentences. A sentence is a group of words delimited by an appropriate punctuation mark (full stop, question mark or exclamation mark). Care was taken to ensure the 42 samples were correctly punctuated in this regard prior to analysis (in fact, no changes were needed).

Clauses. A clause is a structure with a subject and a finite verb.

Dependent clauses. A dependent clause is defined as a finite adjective, adverbial, or nominal clause (Lu, 2010).
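The four indices listed at the start of this section are simple ratios over these production-unit counts. A sketch with hypothetical counts:

```python
# Sketch of the clausal and phrasal complexity indices as ratios of the
# L2SCA production-unit counts. The counts below are hypothetical.

def complexity_indices(words, sentences, clauses, dependent_clauses,
                       complex_nominals):
    return {
        "C/S":  clauses / sentences,           # clauses per sentence
        "DC/C": dependent_clauses / clauses,   # dependent clauses per clause
        "ACL":  words / clauses,               # average clause length
        "CN/S": complex_nominals / sentences,  # complex nominals per sentence
    }

print(complexity_indices(400, 20, 40, 10, 30))
```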
