
The Role of Etymological Origin for Recognizing English-German Cognates


Academic year: 2021


(1)

Rijksuniversiteit Groningen Department of Linguistics

MA Programme: European Linguistics

The Role of Etymological Origin for Recognizing English-German Cognates

Testing native English speakers’ ability to translate German words, comparing inherited Germanic cognates with shared Romance & Latin loanwords

Master's Thesis submitted by Merle Schumann July 2017

(2)

Contents

1 Introduction
1.1 Studying the language pair English-German p. 3-4
1.2 Research question, outline & hypothesis p. 4-5

2 Background
2.1 Historical-linguistic background p. 5-6
2.1.1 Cognates & loanwords: Definition & behaviour p. 6-8
2.1.2 Common origins & divergence of English & German p. 8-10
2.1.3 Latin and the Romance languages p. 10-11
2.1.4 Influence on the lexica of English and German p. 11-15
2.2 Theoretical background: Linguistic distance & intelligibility p. 15
2.2.1 Measuring linguistic distance p. 15-18
2.2.2 Methods of intelligibility testing p. 18-19
2.2.3 Previous research on the pair German-English p. 19-20

3 Methodology
3.1 Test type p. 21
3.2 Material & selection of test items p. 21-24
3.3 Calculating orthographic distance p. 24
3.4 Calculating phonetic distance p. 24-25
3.5 Study design p. 25-27
3.6 Participants p. 27-28

4 Results
4.1 Linguistic distance of the test items p. 29-30
4.2 Results functional test p. 30-33

5 Discussion & Conclusion p. 33-35

(3)

1 INTRODUCTION

1.1 Studying the language pair English-German

Is it possible for someone to perform well on a vocabulary test of a language they have never studied or even had extensive contact with? The intuitive answer would be 'no', but under the right circumstances they might be able to translate a substantial number of items correctly. This paper explores this phenomenon for the language pair of English and German. More precisely, it is concerned with the diachronic development of the two languages within the cultural framework of European languages and how these language histories affect intelligibility today. European language policy, multilingualism and mutual intelligibility have been a blooming field of linguistic research in the recent past (see, for example, Gooskens 2007, Gooskens & van Heuven 2017, Francia & Riis 2016 and Horner, De Saint-Georges & Weber 2014). This paper places itself within this framework, since it seeks to contribute to the understanding of European cultural history through linguistic research.

(4)

of special interest for intelligibility research since they have common roots but were also influenced drastically by other European languages, most notably Latin and, later on, its Romance descendants such as French and Italian. Note that there is “no real dividing line between Latin and Romance” (Elcock 1975 [1960], p. 223; see also Banniard 2013), but in accordance with historical-linguistic convention, the two terms will be treated separately, with Latin being the language of the Roman empire and Romance being the large and varied language family descending from it.

To return to the opening question of whether someone can perform well on a vocabulary test of an unfamiliar language: It is possible if the listener's own language and the target language share a number of cognates due to either their common origin or their adoption of the same loanwords from a third language. It is also possible if the listener recognizes a corresponding cognate from a third language they know. Swarte, Schüppert & Gooskens (2013) show that knowledge of a third, related language can, in fact, help speakers understand an unknown but typologically related language, at least in the case of Dutch speakers whose knowledge of German helped them understand some Danish words. Kürschner (2013) shows how speakers of German use their knowledge of English to translate Dutch words. This paper, therefore, seeks to explore the role of cognate origin, i.e. inherited word or loanword, for the performance of native English speakers on a vocabulary test of German.

1.2 Research question, hypothesis and outline

This paper looks at two questions, one concerned with calculating linguistic distance and one concerned with its implications for intelligibility. a) How do inherited Germanic words and Latin/Romance loanwords which are cognates in English and German behave in terms of linguistic distance: Does the etymological origin of the cognate predict orthographic and phonetic distance? b) How does this affect intelligibility: Do non-speakers of German find one type of cognate easier to translate than the other? To take into account both orthographic and phonetic distance, written and spoken items were tested by means of a word translation task, which was administered as an online survey to native speakers of English with little or no previous knowledge of German.

(5)

inherited words (Gooskens, Kürschner & van Bezooijen, 2012). Therefore, speakers are predicted to score higher on more recent loanwords than on inherited words. Factors such as average word length and knowledge of other foreign languages also play into the results, but this will be described in more detail below.

This paper is structured the following way: The second section covers the theoretical background for this study: First, I will take a look at the historical connections between the German and the English language and their contact with other European languages. After that, methods of testing intelligibility and calculating linguistic distance will be considered. Next, the methodology followed in this study will be described: How was the survey compiled and conducted, and how was the linguistic distance calculated for the particular items in question? After that, the results for both the linguistic distance calculation and the practical intelligibility test will be presented. Finally, I will discuss the outcome of these questions for the overall context of European intelligibility research.

2 THEORETICAL BACKGROUND

2.1 Historical-linguistic background

As briefly outlined in the introduction, the scope of this research encompasses two broad fields of linguistics: Historical linguistics and intelligibility research, and the bridge which connects the two: How languages diverge and converge as a result of cultural change, i.e. increasing or decreasing the linguistic distance between them – thereby, of course, directly affecting mutual intelligibility. Therefore, section two of this paper will discuss the mutual history of English and German in the context of European linguistic and cultural history, before turning to the theoretical methods of measuring linguistic distance and testing intelligibility.

Historical linguistics as a modern research field using quantifiable methods came to prominence in the eighteenth and nineteenth centuries when “the disciplines of historical and comparative linguistics emerged which focused on reconstructing the proto-language Indo-European [and classifying its branches]” (Trips 2015, p. 48). The interest in comparing and classifying languages according to their history has not subsided since then, but methods and theories have changed drastically: Modern historical-linguistic research makes use of corpus methods and computational models of systematic language change (Trips 2015, pp. 49-63, see also Campbell 2013 [1998], pp. 107-195). The results of this field of research are valuable not only for the field of linguistics, but also for the overall understanding of cultural and political history (for the cultural and political dimensions which shaped the face of modern English, see e.g. Crowley 2003 [1989]). While it would be oversimplifying to say that linguistic distance is a direct function of the languages' shared history, many insights on linguistic distance are gained from historical linguistics.

(6)

After that, the cultural importance of Latin and the Romance languages (with a focus on Latin and French) will be sketched out briefly. The concluding part of this section will describe how these languages affected English and German, with a focus on the area of vocabulary.

2.1.1 Cognates & loanwords: Definition and behaviour

This section will introduce a central concept for this study, that of shared vocabulary between two (or more) languages. For this purpose, the term 'cognate' will be defined before turning to how cognates are used in linguistic research. The difference between inherited words and shared loanwords will be considered, before describing the mechanisms by which loanwords and cognates find their way into a language and then develop.

Crystal (2009 [1980]) gives two definitions of the term, one lexical and one grammatical: It can describe either “[a] language or a linguistic form which is historically derived from the same source as another language/form” or “some kinds of syntactic relations” (pp. 83-84). Note that his first definition can refer to either a language as a whole, or an individual lexical item. While German and English are clearly cognate languages by this definition (see section 2.1.2 on their common origin), it is the individual lexical item which will be the focus here. Other definitions of the term 'cognate' focus solely on this interpretation, such as Pyles (1971 [1964]): “Words [my emphasis] of similar structure and similar, related, and in many instances identical meanings in the various languages of the Indo-European group may be recognized […] as cognate – that is, of common origin” (p. 91) or Trips (2015), putting it more briefly: “words which are historically derived from the same source” (p. 41). Note that these definitions take a historical-linguistic approach to cognates, but modern computational methods of linguistic research have brought about definitions which focus on synchronic relationships between the terms, such as cognates being “translation equivalents with high orthographic overlap” (Schepens, Dijkstra & Grootjen 2012, p. 157).

(7)

up, it should be kept in mind that it is not always possible to undoubtedly identify a loanword in terms of origin.

What role do cognates play in different areas of linguistic research? Within the framework of historical linguistics, cognates are a main source of information regarding phonetic factors such as stress: For example, Modern French saule and Spanish sauce derive from the Latin salicem, allowing the conclusion that the Latin item carries its stress on the first syllable (Elcock 1975 [1960], p. 21). In synchronic linguistic research, cognates are studied, for example, from a cognitive-linguistic point of view (for example Dijkstra, Grainger & van Heuven 1999) and in research on bilingualism (e.g. Bartolotti & Marian 2016; Bartolotti & Marian 2017). Apart from that, cognates play a central role in the study of linguistic distance and mutual intelligibility. The lexical distance between languages can be measured by relating the number of cognates to the number of non-cognates in the lexica of two languages (e.g. Heeringa, Golubović, Gooskens, Schüppert, Swarte & Voigt 2013, where lexical distance is defined as “the percentage of non-cognates in the language of the reader compared to the stimulus language”, p. 107). How important the lexical difference is for intelligibility depends on the varieties in question. For example, an assessment of lexical and phonetic distance of seventeen Scandinavian dialects has shown that phonetic distance is a more important predictor of intelligibility than lexical distance (Gooskens, Heeringa & Beijering 2009). In a similar vein, an intelligibility study between the main Scandinavian languages as well as Dutch, Frisian and Afrikaans has determined phonetic distance to be a more important factor than lexical distance (Gooskens 2007). This does not diminish the important role of cognates in linguistic research. Möller (2011) reports on different recognition strategies for cognates present in a variety of Germanic languages.
Schepens, Dijkstra & Grootjen (2012) sum up the importance of cognate research for varying fields such as, of course, historical linguistics, but also psycholinguistics, especially in the context of bilingualism studies. In addition, the paper notes that “[s]tudies involving only small lists of cognates have already proved to be successful in the prediction of historical relations between language combinations” (p. 158). In an investigation of Chinese dialects, Tang & van Heuven (2015) found the percentage of shared cognates between the dialects to be the most important factor in predicting intelligibility. Cognates are also of particular importance in second language acquisition (for an overview see the introduction in Friel & Kennison 2001).
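The definition of lexical distance as a percentage of non-cognates can be illustrated with a minimal sketch. The function name, the data layout and the four sample pairs below are assumptions made purely for illustration; they are not taken from Heeringa et al. (2013) or from the test material of this study.

```python
def lexical_distance(word_pairs):
    """Return the percentage of non-cognate pairs.

    word_pairs: list of (stimulus_word, reader_word, is_cognate) tuples.
    This layout is a hypothetical choice for the example.
    """
    if not word_pairs:
        raise ValueError("need at least one word pair")
    non_cognates = sum(1 for _, _, is_cognate in word_pairs if not is_cognate)
    return 100.0 * non_cognates / len(word_pairs)


# Illustrative German-English pairs (not the study's test items).
sample = [
    ("Haus", "house", True),      # inherited Germanic cognate
    ("Wasser", "water", True),    # inherited Germanic cognate
    ("Familie", "family", True),  # shared Latin/Romance loanword
    ("Baum", "tree", False),      # translation equivalents, not cognates
]

print(lexical_distance(sample))  # 25.0
```

In this toy sample one of four pairs is a non-cognate, so the lexical distance comes out at 25 percent. A real measurement would depend on how the word list is sampled and on how cognate status is decided for each pair.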

(8)

of pronunciation (and, resulting from that, in orthography; semantic changes occur as well, but these will not be the focus of this research). Therefore, if a word has been part of the English and German lexicon since before the divergence of these Germanic dialects (see below), its present forms in both languages will differ from each other considerably. In contrast, loanwords which entered the two languages independently of each other only at a later point in time would be more similar, since they have participated in fewer structural changes. Another central mechanism which can be demonstrated across languages is that so-called core vocabulary is rarely made up of loanwords, but usually consists of inherited words. These mechanisms are well-documented and form a major basic concept in the field of historical linguistics (see Haspelmath 2009, pp. 36-43; Banta 1981, p. 130). The distinction between inherited words and loanwords is of particular importance when considering the language pair of German and English, since they share their historical roots, but have also both been influenced by other languages, meaning that English-German cognate pairs represent both inherited words and loanwords from different periods of time. This will be explored in more detail below.

2.1.2 Common origins & divergence of English & German

The history of the Germanic branch as a part of the Indo-European language family has been the subject of long-standing and thorough research. Particularly for the Germanic language family, a substantial amount of information is available on development and family connections as well as on reconstructed forms (Schrijver 2014, p. 2; see also, for example, Harbert 2007 and König & Auwera (ed.) 1994 for extensive overviews of the history of Germanic languages). Concerning the role of Germanic on the Indo-European time line: “It is likely that Proto-Germanic began to diversify 2500 years or more before the present day (with Northwest Germanic splitting off soon after that and with West Germanic diversifying less than 2000 years ago)” (Grant 2009, p. 360).

(9)

To begin with, the status of English as a part of the West Germanic languages has not gone unchallenged: While the common historical roots are undisputed, the subsequent development of English, especially in the field of vocabulary, has set it apart from other Germanic languages, leading to the view that English can be seen as neither a part of the West Germanic nor the Romance dialect continuum (Chambers & Trudgill 1998 [1980], p. 6). This view is partly built on the fact that modern English contains a large number of French loanwords compared to other West Germanic languages. The picture is further complicated by the discussion whether English – if it is to be considered a Germanic language at all – is a part of the North- or the West Germanic language continuum due to the considerable influence of Old Norse on Middle English. The fact that English borrowed some of its pronouns from Old Norse (such as they, them, their) is well-documented (e.g. Grant 2009, p. 375). This is remarkable since pronouns and other function words are generally considered to be extraordinarily resistant to changes through borrowing (Tadmor, Haspelmath & Taylor 2010). This is just one example of the strong influence of Old Norse on Middle English. Other areas of similarity are syntax and morphology, and these overlaps have recently led to the theory that English should be considered a North Germanic language (Emmonds 2011). This discussion illustrates the point that historical-linguistic research and language family trees should not be considered clear-cut indicators of modern-day linguistic distance, highlighting the importance of synchronic computational methods which form the basis of this research. In the following section I will make no attempts to solve this question but simply describe which historical developments took place to create this intricate language situation.

(10)

this is not to say that these other changes bear no relevance for intelligibility testing. In the area of phonetics, systematic sound changes in both languages caused English and German to diverge. One of the most important of these sound changes was the High German consonant shift, which took place at around the same time Old English began to separate from other Germanic dialects. It therefore affected Old High German, but not Old English (Schrijver 2014, pp. 97-107). In terms of lexical development, arguably the biggest factor contributing towards the divergence of the languages was the influence of Latin and its descendants. The following sections will focus on the role of these languages in Europe and how they affected English and German separately.

2.1.3 Latin and the Romance languages

The Latin language and its descendants are a prime example of how cultural factors influence language development on a large scale. This section will briefly outline the development of Latin and its descendants, the Romance language family, and their cultural importance within Europe. Extensive overviews can be found, for example, in Elcock (1975 [1960]) and in Stolova (2015), who gives an overview of scholarly work and main objectives of research on the Romance languages. The focus of this section lies on the sociolinguistic and cultural dimensions which led to the overarching importance of Latin and the Romance languages within the framework of European language history. This is to provide the background necessary for understanding why these particular languages have had such a drastic influence on the lexica of English and German.

(11)

Given that the focus of this research lies on the impact of language contact on the lexicon, the following question is of importance: How did the Latin lexicon develop during the emergence of the Romance languages? Stefenelli (1992, pp. 12-14; quoted in Stolova 2013, pp. 54-55) proposes a five-way distinction for the possible paths individual lexical items could take in the course of this development:

1. Pan-Romance continuity (referring to a lexeme which is shared by all Romance languages)

2. Inter-Romance continuity (referring to lexemes found in most, but not all Romance languages)

3. Regional Romance continuity (referring to lexemes found in some or even just one Romance language)

4. Sporadic continuity (lexeme is present in only one Romance language)

5. Zero continuity (the lexical item is not found in any modern Romance languages)

This serves to illustrate the complexity of language change and development in the area of the lexicon in general. More precisely, it shows that some items survived throughout the course of development while new innovations and items borrowed from other languages as well as items present in the pre-latinised languages of the respective regions made up the lexicon of the Romance languages.

To sum up, the Latin language had a massive effect on the linguistic development of Europe during the height of its speakers' political power, since military and economic operations facilitated language contact between Latin and other languages on a large scale. Even after the decline of Latin as a spoken language it continued to be a high-prestige variety of culture and learning, and therefore a rich source of loanwords.

2.1.4 Influence on the lexica of English and German

Latin and some Romance languages have fulfilled central functions within European language history, not due to any inherent merits of their respective language systems but due to the cultural and political power connected with their speakers. This section will give an overview of how the English and German lexica developed, with a special focus on Latin and French (the latter being the most important source language for loanwords among the Romance languages). The Germanic landscape was influenced by a large number of neighbouring and intermixing varieties, but due to “Roman military and political might” (Salmons 2012, p. 91), Latin exerted the biggest influence on Germanic compared to other language families with which speakers of German were in contact, such as the Celtic languages.

(12)

walk' << Frankish *markôn 'to make or imprint a sign' < Frankish *marka 'sign that marks the border'” (Stolova 2013, p. 74), or French “jardin < garto, guerre < *werra” (Salmons 2012, p. 170). But overall “one might be justified in concluding that Germanic influence [on Latin] during the time of the western Empire was negligible” (Elcock 1975 [1960] p. 218), illustrating the power and prestige asymmetry between the varieties.

Another caveat: German and English contain a number of Latin loans which most likely stem from the Roman Imperial age, i.e. from before Germanic tribes settled Britain, which resulted in the separate development of English (Freeman 1998 [1992], p. 71). Examples are Cellarum (German Keller, English cellar) and Ceresa (German Kirsche, English cherry; Elcock 1975 [1960], p. 46). These loanwords are predominantly culturally motivated (i.e. they gave Germanic speakers terms for items they were not necessarily familiar with before the contact). In total, there are around 175 loanwords of this type (Williams 1975, p. 57). They have undergone the respective sound changes in both English and German (Pyles 1971 [1964], p. 314). Items of this type represent a special case: They have been part of both the English and German vocabulary since before the two languages began to diverge, but they are also culturally motivated Latin loanwords constituting the first 'wave' of Latin loans. While they are undoubtedly interesting to investigate, they are excluded from this study.

First, I will examine the influence of Latin and French loans on English, before turning to German. In Britain during the period of Roman occupation, Latin was more the language of the educated administrative class than the general population, although it was understood and probably spoken by those who were in direct contact with the Roman administration (Freeman 1998 [1992], p. 9). With the decline of the empire the local vernaculars gained importance over Latin, but some loanwords from Latin prevailed from this period (Elcock 1975 [1960], p. 181). When the Roman occupation of Britain ended, the occupiers took their language and writing system with them, leaving relatively few traces of Latin behind (referring to the local Celtic dialects spoken on the island before the settlement of the Germanic tribes whose language would later develop into English). The departure of the Roman occupiers was followed by (and to a certain degree caused by) Germanic tribes settling on the British island, bringing with them a runic writing system which was slightly modernized under the influence of Irish Christian literacy, which the Germanic speakers came in contact with in Britain (Pyles 1971 [1964], p. 58). During this Old English period, Latin loanwords were mostly taken from the semantic field of religion and scholarly life. On a side note: For some loanwords from this time period it is unclear whether the item in question was borrowed from Latin or from Greek, since there was extensive borrowing between these two languages (cf. Green 2015 [1990]). All in all, around 500 Latin words were incorporated into the lexicon of English during the Old English period, meaning before the Norman conquest of 1066 AD (Pyles 1971 [1964], pp. 315-317).

(13)

restricted to speakers of lower social classes. This sociolinguistic phenomenon resulted in a large number of French loanwords (Freeman 1998 [1992], p. 96). More borrowings stem from the Middle English period. Especially for this period, a methodological complication arises since it can be “impossible to tell whether a word is from French, or from Latin, for instance complex, miserable, register, rubric, and social, which might be from either language, judging by form alone” (Pyles 1971 [1964], p. 318). As shown in the previous section, these kinds of borrowing were mostly due to a perceived cultural superiority of the source languages: “A great deal of 'Latinate' vocabulary came into English from the 16th century onwards, during the Renaissance […] when both Latin and Greek were generally considered to be languages superior to English” (Freeman 1998 [1992], p. 71). As a reaction to this development, there was a movement of linguistic purism which attempted to replace some of the new foreign items with native expressions, for example “crossed instead of crucified, wiseards instead of magi, waite on instead of servant, biwordes instead of parables, hundreder instead of centurion” (Geers 2005, p. 101). Despite such attempts, most Latin loans entered in the Early Modern English period from 1500 onwards (Pyles 1971 [1964], p. 319). This is not to say that no borrowing took place in later periods. English continued to absorb loanwords from Latin, French and other languages, but not again to such a massive extent as in the time periods described above. For example, the industrial revolution with its wealth of new semantic concepts brought about a large number of newly coined loanwords of Greek origin (Geers 2005, p. 103). In the context of this study, these loans are not relevant since they are not a big part of high-frequency vocabulary due to their technical nature. To sum up: Latin (first as an active, 'living' language, and later as a written language) and French were the main source languages for loanwords into English, although other languages such as Greek also supplied a smaller number of loanwords. The highest number of French and Latin loanwords entered the language during the Renaissance period.

Next, I will turn to German and how its lexicon was shaped by loanwords from Latin and French. To begin: as in the case of English, Latin and the Germanic varieties share a number of cognates, forming a first group of lexical overlaps between the languages. Salmons (2012, p. 56) names as examples: Latin piscis and Gothic, Latin super and Old High German ubir, Latin edo and Old High German ezzan. This is to show that there is not always a clear dividing line between Germanic inherited (or native) words and Latin loanwords. In the present study, words of this type have been grouped as inherited Germanic despite the cognates available in Romance languages and Latin, since these words have been a part of the German lexicon for such a considerable time that they took part in most sound changes the language underwent.

The root of the modern word Deutsch ('German') emerged during the Old High German period around the 8th century, in order to distinguish its speakers from Latin speakers (Salmons

(14)

towards languages with many loanwords, and that comparing loanword density of two temporally separated languages bears its own methodological concerns.

The majority of these few loanwords present in Old High German are of Latin origin, covering mostly the semantic fields of religion, education, technology and farming (Salmons 2012, p. 169). Early New High German (starting in the 17th century) saw a development not unlike that of the English lexicon in the sense that a large number of French loanwords entered the lexicon, for similar culturally motivated reasons. French-German bilingualism became prevalent among upper-class Germans, resulting in “a notable amount of lexical borrowing” (Polenz 1994, pp. 1-2). While Latin remained the main language of the state, religion and science, French also became popular in politics and high society, indirectly enriching the German language: “Latin was the language of science and literature, French was the means of communication of the upper classes and German (in the form of its various dialects) was the medium of the poor” (Geers 2005, p. 100). Texts from that period range from containing 3.1% French loanwords to up to 33% French loanwords (Salmons 2012, p. 278). Polenz (1994) notes for 17th century German: Latin loanwords decline from around 50% to 28% at the end of the 18th century. French loanwords account for around 40% to 60% at the end of the 18th century. Italian loanwords are less important, but nevertheless present: 20% at the start of the 17th century to 6 to 9% in the middle of the 17th century (p. 75). Around the same period, attempts to standardize German were made, with grammarians prescribing 'correct' grammar use (Lange 2005, p. 63). One element of this standardisation and purism movement was the rejection of foreign loanwords (Lange 2005, pp. 71-72). Alternatives were proposed to replace Latin and other Romance loanwords with native Germanic items, such as “Bescheidenheit instead of Discretion, Ausrede instead of Elocutio, beobachten instead of observieren, Geschmack instead of Gusto” (Geers 2005, p. 102. Note the existence of parallel tendencies in English, see above). Some of these puristic innovations made it into general language use, but a large number of loanwords from this period remain in the German lexicon.

So much for the historical-linguistic circumstances which affected borrowing in English and German. All these different influences shaped the languages' lexica into the form they have today. The World Loanword Database gives an overview of the most common origin languages for loanwords in many different languages, including English. The results are displayed in table 1. The numbers are estimates based on a sample of 1085 selected loanwords, so they should be seen as rough estimates rather than precise figures. Accurate figures are hard to come by: As noted in section 2.1.1, it is not always clear where a cognate originated, and estimates as to the total numbers of loanwords in both English and German vary drastically from source to source. Note that nouns, adjectives and verbs are the word classes which are borrowed most frequently. French and Latin are, by far, the most common source languages for loanwords.

(15)

Table 1: Estimates for the percentage of loanwords in English from different source languages. Grant (2009, p. 370).

2.2 Theoretical background: Linguistic distance and intelligibility testing

The previous section showed that German and English are languages with a common origin, which diverged from each other over the course of their development, both adopting words from other, typologically less closely related languages. Gooskens (2007, p. 446) lists the following factors relevant for successful semi-communication between speakers of two closely-related languages: The listener's attitude (i.e. whether they regard the language as pleasant-sounding, prestigious etc.), previous language contact and experience, and the linguistic distance between the languages. Only the third factor is purely linguistic; the correlation between the non-linguistic factors and intelligibility is generally low. It is therefore crucial to use accurate measurements of linguistic distance in order to predict intelligibility. The overarching question of this research is concerned with the consequences of these historical developments for modern-day language users. The following section will focus on three questions: First, how is linguistic distance (which can be seen, in a way, as the direct consequence of the historical development of a set of languages) defined and measured? Second, how does one test mutual intelligibility between two languages? And third, what are the results of past research on both linguistic distance and mutual intelligibility for the language pair English-German?

2 2 1 Measuring Linguistic Distance

Linguistic distance can be seen as the structural (phonetic, semantic, morphological, syntactic and pragmatic) differences between two varieties. The concept can be applied to the differences between dialects as well as between typologically unrelated languages and a variety of methods has been used in the past to measure linguistic distance. Borin (2013) additionally introduces the concept of 'linguistic differences': If linguistic distance is a summarized measure for a specific area of language (for example, lexicon or morphology), linguistic differences are all the different features which contribute to linguistic distance.

divisions, the isogloss method, which collects pronunciation differences of a number of individual items, and the structure geographic method, which takes into account the phoneme inventory of different regions (pp. 9-12). Perceptual methods use as data the judgments of dialect speakers and language experts; in experimental perceptual methods, subjects listen to dialect samples and then judge the distance to their own dialect (pp. 12-14; note the methodological overlap with opinion and functional intelligibility testing, which will be described in more detail below). Computational methods (pp. 14-24) have the advantage that they can make use of large amounts of data and do not rely on speaker intuition. (The Levenshtein method, which is used in this study, is an example of such a computational method; see below for a more detailed description). Kondrak (2003, pp. 274-276) gives an overview of, and evaluates, past methods used to compare the phonetic similarity of strings, including the Levenshtein distance. The emergence of quantifiable computational methods is a relatively recent development: Chiswick & Miller (2004) still claim that the “prevailing view is that [linguistic distance] cannot be measured. That is, no scalar measure can be developed for linguistic distance” (p. 1). Their proposed method of measuring linguistic distance consists of monitoring the language learning progress of immigrants in the United States of America; their speed of acquisition is interpreted as a measure of linguistic distance between English and the immigrants' respective language: “For the same number of weeks of instruction a lower score represents less language facility, and it is assumed that this means a greater distance between the language and English [...] On the basis of the assumption of linguistic symmetry, this provides a measure of the linguistic distance between English and a variety of other languages” (p. 7, my emphasis).
This method is doubtful at best: It is not based on linguistic measurements and the results can be influenced easily by extralinguistic factors, such as the learner's attitude towards English or the degree of cultural integration of the different speaker groups. In conclusion, a vast array of methods for measuring linguistic distance is available today, and depending on the particular aspect of linguistic distance one seeks to capture, different methods might be chosen.

suitable to analyse large data sets. For the present study, the variant approach is used. Next, I will turn to the algorithm used to compare the data.

The Levenshtein index is an algorithm to calculate the distance between two data strings. Using it in the field of comparative linguistics is a relatively new approach, but it has been applied successfully as an intelligibility predictor for small sets of languages (Gooskens 2007, p. 446). This measure was originally developed in computer science, but has been used to measure linguistic distance between Irish Gaelic dialects in Kessler (1995) and explored in more detail in Heeringa (2004). The method is used to calculate the distance between two strings of linguistic material (this can be a morpheme, an individual word, or a phonological transcription of an item). The strings are aligned so that consonants correspond to consonants and vowels correspond to vowels, whenever possible (see Kondrak 2003, p. 279 for the influence of the alignment method on the results of the calculation). For each insertion, deletion or substitution which is needed to turn one string into the other, a pre-determined value is attributed. The sum of these operations forms the Levenshtein distance between the two strings. The following example demonstrates how to calculate the difference between a northern and a southern American English pronunciation of the word afternoon:

Table 2: Example calculation for the Levenshtein distance between two alternative pronunciations of the item afternoon. Nerbonne (2005, p. 12).

To compare data like two word lists containing items of different lengths, an additional step might need to be taken to account for different word lengths. Otherwise, two relatively long words would potentially show a higher distance than two short words, distorting the results (cf. Schepens et al 2012). To normalize the results for word length, the Levenshtein distance is divided by the total length of the alignment.
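The calculation and its normalization can be sketched in a few lines of Python. This is a minimal illustration and not the procedure used in this study (where distances were calculated manually): it assumes unit costs for all operations, ignores the vowel/consonant alignment constraint, and, where several least-cost alignments exist, normalizes by the longest one. The function names are illustrative only.

```python
def levenshtein_alignment(source, target):
    """Return (distance, alignment_length) using unit costs (an assumption)."""
    rows, cols = len(source) + 1, len(target) + 1
    # Each cell holds (cost, length of the longest least-cost alignment).
    table = [[(0, 0)] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = (i, i)  # only deletions
    for j in range(1, cols):
        table[0][j] = (j, j)  # only insertions
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = 0 if source[i - 1] == target[j - 1] else 1
            candidates = [
                (table[i - 1][j][0] + 1, table[i - 1][j][1] + 1),  # deletion
                (table[i][j - 1][0] + 1, table[i][j - 1][1] + 1),  # insertion
                (table[i - 1][j - 1][0] + substitution,
                 table[i - 1][j - 1][1] + 1),                      # match/substitution
            ]
            # Prefer the lowest cost; among ties, prefer the longest alignment.
            table[i][j] = min(candidates, key=lambda c: (c[0], -c[1]))
    return table[rows - 1][cols - 1]


def normalized_distance(source, target):
    """Levenshtein distance divided by the alignment length."""
    distance, length = levenshtein_alignment(source, target)
    return distance / length if length else 0.0


# Two item pairs from the word list, compared in lower case.
print(levenshtein_alignment("buch", "book"), normalized_distance("buch", "book"))
print(levenshtein_alignment("ding", "thing"), normalized_distance("ding", "thing"))
```

Dividing by the alignment length expresses each distance as a proportion between 0 and 1, so that long and short item pairs can be compared on the same scale.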

1000 native speakers of American English were asked to judge the speech samples for their 'foreign-accentedness'. In total, speech samples from 99 different L1 backgrounds were transcribed and analysed in this way. The study shows a high correspondence of Levenshtein distance with native speaker judgement, concluding that “the LD-based method is not very different from a human rater, [...] we claim that the automatically obtained LDs are a valid means to assess foreign accent strength in pronunciation” (p. 264). This shows that the method is flexible and can be used to analyse a number of different linguistic phenomena with relative reliability. In addition, being a computerized approach, it has the advantage that it can be used for large amounts of data.

2 2 2 Methods of intelligibility testing

Measuring linguistic distance by itself yields valuable information on the relationship between languages and dialects. For example, this information can be used to create dialect maps or to understand the historical development of dialects; another fruitful use of linguistic distance measurements is to analyse the relationship between linguistic distance and intelligibility since “mutual intelligibility depends on the degree of cross-language similarity” (Schepens, Dijkstra, Grootjen & van Heuven 2013, p. 1). There are a number of ways to test intelligibility: One can either ask the speakers how well they understand another variety (opinion testing), or one can test directly how well they actually understand it (also called functional testing, Gooskens 2013). Opinion testing can be executed with or without using speech samples: Gooskens & van Heuven (2017, p. 26) differentiate between estimated intelligibility and perceived intelligibility. In the first case, participants are simply asked about their opinion of a language variety; more precisely, how well they believe they could understand this variety and how well they think speakers of the test variety could understand their own. For this approach, participants need to be familiar with the test language at least to some extent, which bears the danger that attitudinal factors might influence the results; on the other hand, participants are not influenced by judgements on the test material (e.g. how friendly they judge the speaker of a spoken recording). When testing perceived intelligibility, where participants are presented with a language sample and asked to evaluate intelligibility, these kinds of judgements might play a role and should be taken into account. Testing perceived intelligibility has also been described as a sort of in-between method between opinion testing and functional testing since it works with actual speech samples (Tang & van Heuven 2009, p. 710).

the advantage that it reflects real language use arguably better than a word translation task, since in real-life communicative situations it might be more useful to understand the overall context of a piece of information than individual words. A disadvantage of this method is that it is less suitable to test the effect of particular features of language on intelligibility.

While functional testing arguably leads to more reliable results (cf. Swarte 2016, p. 46), it is methodologically more complicated (for example when the participants' answers need to be transcribed phonetically before they can be evaluated), so researchers often resort to opinion testing as a more feasible approach (Tang & van Heuven 2015, p. 288). Tang & van Heuven (2009) claim that opinion testing and functional testing correlate to such an extent that both measures can reliably be used to determine intelligibility and can be used to draw conclusions regarding linguistic distance, at least for the number of Chinese dialects for which they applied both functional and opinion tests. They also claim that the results of intelligibility tests can be taken as a measure of linguistic distance, which is methodologically similar to Chiswick & Miller (2004, see above). In contrast, Tang & van Heuven (2015) choose the approach that a mathematically calculated distance between two languages can predict their mutual intelligibility. This is a different approach to the one taken here in that linguistic distance is seen as a predictor for intelligibility, not the other way around. Golubović & Gooskens (2015) use a cloze test, a word translation task and a picture task to determine the mutual intelligibility between West and South Slavic languages. The advantage of such a mixed approach is that different levels of language understanding are covered, from individual words to the overall context of a written piece of information.

Finally, the influence of word length on intelligibility requires clarification. Kürschner (2013) reports that, in theory, longer words should be recognized and translated more easily than short words since “there are few words that are auditorily similar to elephant or hippopotamus. Also, the longer a word is, the more likely it is that in the word recognition process a ‘uniqueness point’ is reached before the end of the sound chain” (p. 169). This is connected with the concept of neighbouring words, which are “word forms that are similar to the stimulus word and may therefore serve as competing responses, hindering communication” (Gooskens et al. 2015, p. 257). The more neighbours a word has, the harder it is to translate it correctly since it can easily be confused with one of its neighbours (Gooskens 2013, p. 3). However, the intelligibility test in Kürschner's study points in the opposite direction: the “negative correlation between word length and intelligibility score in our data shows that intelligibility decreases with growing word length” (ibid.). This shows that the connection between word length and intelligibility is not straightforward, but depends on additional factors such as the specific language pair in question.

information one aims to gather: Word list translation tasks on the one hand and whole text translation tasks on the other provide different challenges for the participant and potentially yield different results.

2 2 3 Previous research on linguistic distance and mutual intelligibility between English and German

In section 2.3, some remarks were made on the specific linguistic distance between the language pair German/English, such as the different influences of other languages, with the result that “[a]t the lexical level we find English distinct from the other Germanic languages” (Heeringa et al 2013, p. 113). The question remains what conclusions have been drawn regarding the linguistic distance and mutual intelligibility of the languages. Different studies have dealt with these questions. For example, Chiswick & Miller (2004) attribute the score of 2.25 to the pair English-German, with a range from 1.0 (very different, e.g. Korean) to 3.0 (very similar, e.g. Swedish) using the method described above (note the methodological concerns connected with their approach of measuring linguistic distance by measuring speed of acquisition for immigrants). While the very small linguistic distance established for the pair Swedish-English is in line with the view that English is typologically closer to the North Germanic languages (see section 2.3.1), this result could also be due to extra-linguistic factors such as cultural similarities. The following data stems from a comparison (Swarte 2016) between five Germanic languages including German and English:

Table 3: Lexical distances between Germanic languages. Swarte 2016, p. 119.

Table 5: Phonetic distance between Germanic languages. Swarte 2016, p. 126.

The lexical distance (i.e. the percentage of non-cognates between the language pairs) is relatively high for the pair English-German compared to the lexical distances between the other pairs. This reflects the high percentage of French loanwords, which are not present to that extent in the other languages analysed here. Note also the slight asymmetry between English and German. I will return to the orthographic and phonetic distances below, when I compare them to the distances calculated for the test items.

3 METHODOLOGY

3 1 Test type

The test employed in this study is a functional test, meaning it investigates the actual language skills of the participant and not their own judgements. While this was the main part of the study, participants were also asked to provide their foreign language knowledge, their age and educational status. In section two, different approaches to intelligibility testing were discussed: In this study, a single word list translation task is used, in which half of the items are presented in written form and half as spoken items, which were recorded and played to the participant during the test. Since the goal of this study was to compare the translating difficulties between two etymological groups, this test is well-suited: It is possible to determine for each word individually how well participants can translate it (in contrast with, for example, a text translation task, where participants could guess unfamiliar words from context; see also Kürschner, Gooskens & van Bezooijen 2008, p. 84).

3 2 Material & selection of test items

The study is designed to be unidirectional: The input language is German and the participants are native speakers of English (an additional comparison on how well native German speakers perform on translating the respective English cognates would give additional insight into potential asymmetries in intelligibility, but was not feasible in the framework of this study). Therefore, the test material required was a list of German words with a corresponding cognate translation in English. The word list was created as follows:

spontaneous conversation and television material, and 3 million words taken from written sources such as fiction, newspapers, instructional texts and academic writing (Jones & Tschirner 2006, p. 2). The corpus is designed to be representative of actual language use, balancing “genre, register, style, geography, and age group” (ibid.). Additionally, information on word class and a suggested translation into English are provided. This makes it an ideal source material for this study, since the aim is ultimately to get a realistic picture on the communicative opportunities speakers can encounter on a day-to-day basis. Jones & Tschirner's list encompasses 4034 items in total, but for the present study the first 1000 most commonly used words were used as a basis for the test word lists. This brought about two methodological concerns: First, as mentioned above, high-frequency words in any language, or the core vocabulary, tend to be part of a language's lexicon for a long period of time, and are therefore more likely to be inherited words than loanwords. This was balanced by selecting an equal number of loanwords and inherited words, but when working with frequency lists in connection with borrowing and cognate behaviour, it should be kept in mind that high-frequency words tend to be more resistant to borrowing than low-frequency words. A second point of concern is the fact that even participants who report no previous knowledge of German might be familiar with individual high-frequency items simply because they are so common. This might result in a ceiling effect for certain items; I will return to this point in the discussion section below.

Taking these 1000 most frequent items as a basis, the following steps were undertaken to compile the word lists used in the study: First, only content words (nouns, verbs, adjectives & adverbs) were included. Function words are not appropriate test material for two reasons: First, they tend to be very short, which yields distorted results when calculating orthographic and phonetic distance. Recall also that short words tend to have a higher number of neighbouring words (i.e. words which are very similar to each other, whether in terms of spelling, pronunciation, or both), distorting intelligibility results (cf. Kürschner et al. 2008, pp. 88f.). Second, providing accurate one-word translations of function words is exceedingly difficult since their exact meaning is highly dependent on context and syntactic structure (consider that an online dictionary (dict.cc) yields 15 different German translations for the English item 'on', including adverbial translations). For these reasons, only nouns, adjectives, verbs and adverbs were included in this study.

of language development, whereas others are a development of the Renaissance. Nevertheless, they were grouped together to allow for an overall comparison with inherited Germanic items. The items in the frequency list were analysed one by one using this method to determine whether they were suitable testing material for the purposes of this study. This procedure was applied to the 1000 most frequent items in Jones & Tschirner (2006) until two word lists were established for the two categories G and L/F (these abbreviations will be used from now on to refer to inherited Germanic items and Latin and French loanwords, respectively). In total, 46 G-items and 48 L/F items were used in the study. The average word length of the two lists was also calculated by counting the average number of letters per item. The resulting word lists were sorted in alphabetical order and then used as the basis for calculating the linguistic distance between the items and, ultimately, designing the survey. Table 6 shows the first six items, respectively, of the two word lists for inherited Germanic items and Latin and French loanwords, including the cognate translation and indication of word class.

G                                          L/F
German item   Translation   Word class     German item   Translation   Word class
Beginnen      To start      Verb           Absolut       Absolute      Adjective
Besser        Better        Adjective      Artikel       Article       Noun
Bringen       To bring      Verb           Aspekt        Aspect        Noun
Buch          Book          Noun           Direkt        Direct        Adjective
Ding          Thing         Noun           Diskutieren   To discuss    Verb
Ende          End           Noun           Familie       Family        Noun

Table 6: Exemplary excerpt of the compiled word list, alphabetized.

In addition, six dummy items were selected from the same frequency band as the test items. The relevant selection criterion was that, unlike for the actual test items, no adequate cognate translation was available. The dummy items were chosen to represent the same word classes as the test items (i.e. no function words, see above) and were judged to be easy enough to translate for even a beginning learner of German. Since the test was designed for speakers with little or no previous knowledge of German, these items served to identify participants with too much previous knowledge, on the assumption that a speaker with no previous knowledge of German would be able to guess correctly when translating a cognate but would have no way of correctly translating the dummy items. Participants were also asked to report how well they speak German; the dummy items were an additional measure for excluding participants with too much proficiency in German.

German item   Translation     Word class
Schwierig     Difficult       Adjective
Sofort        Immediately     Adverb
Geschäft      Shop/Business   Noun
Benutzen      To use          Verb
Zählen        To count        Verb

Table 7: Dummy items. Note that there is an ever so slight semantic overlap between German zählen and its cognate translation to tell (both derive from Proto-Germanic *taljana to count), which can be seen, for example, in the compound bank teller. I judge this to be too far apart for an untrained test participant to make this connection, which justifies the addition of this word as a dummy item.

As a final step in compiling the test material, all test items including dummy items were recorded by a native German speaker using a Zoom mic active Handy H2 voice recorder. The individual files containing one item each were saved as .mp3 files to be inserted into the spoken translation part of the test. To sum up: The test material was compiled by selecting cognates between English and German from a high-frequency band of German words. Items were selected for their etymological origin: Inherited Germanic words and French & Latin loanwords were included. These word lists formed the basis of calculation of orthographic and phonetic distance as well as the functional test.

3 3 Calculating orthographic distance

Tang & van Heuven (2015, pp. 309-310) point out that more accurate results can be achieved when one measures distance between the test items actually used (in this case, the word lists) instead of using pre-existing data which is valid for the whole lexicon. Therefore, the linguistic distance between the test items was to be calculated using the Levenshtein index as described above. The first step was to determine the orthographic distance between the test items. The Levenshtein distance was calculated manually using the following criteria: Insertions, deletions and substitutions were each awarded 1 point; a change involving only a diacritic (e.g. between 'ü' and 'u') was awarded ½ point, but a full point when the underlying vowel was also different. In the following example (comparing German persönlich with English personal), four insertions or deletions and one diacritic change were counted, resulting in a Levenshtein distance of 4.5. With a total alignment length of 11, the orthographic distance between the items normalized for length is 40.9%.

German    p   e   r   s   ö     n   -   l   i   c   h
English   p   e   r   s   o     n   a   l   -   -   -
Cost      0   0   0   0   0.5   0   1   0   1   1   1

Table 8: Example calculation of orthographic distance.
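The scoring criteria for orthographic distance can be checked against this example with a short Python sketch. It is an illustration only: the alignment is entered by hand as in Table 8, the gap marker `-` and all function names are assumptions made for this sketch, and the diacritic rule is approximated by comparing letters after removing combining marks.

```python
import unicodedata

GAP = "-"  # assumed marker for an empty alignment slot


def base_letter(ch):
    """Strip diacritics, e.g. 'ö' -> 'o' (approximation of the diacritic rule)."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def column_cost(a, b):
    """Cost of one alignment column under the criteria described above."""
    if a == b:
        return 0.0
    if a == GAP or b == GAP:
        return 1.0  # insertion or deletion
    if base_letter(a) == base_letter(b):
        return 0.5  # letters differ only by a diacritic
    return 1.0      # full substitution


def orthographic_distance(row_a, row_b):
    """Return (raw distance, distance normalized by alignment length)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = sum(column_cost(a, b) for a, b in zip(row_a, row_b))
    return total, total / len(row_a)


german = list("pers\u00f6n-lich")   # p e r s ö n - l i c h
english = list("personal---")       # p e r s o n a l - - -
raw, normalized = orthographic_distance(german, english)
print(raw, round(normalized * 100, 1))
```

For the aligned pair persönlich/personal, the sketch reproduces the values given in the text (a distance of 4.5 over 11 alignment positions, i.e. 40.9%).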

3 4 Calculating phonetic distance

point, a substitution of a vowel by a vowel or of a consonant by a consonant was awarded 0.5 points. Van Heuven (2008, p. 56) points out the importance of stress patterns for the recognition of familiar and unfamiliar words. Accordingly, this factor was taken into account when calculating the phonetic distance between items: 0.5 points were awarded when the main stress fell on different syllable positions (not applied when one of the compared items was monosyllabic). Diphthongs were counted as two separate segments. Even though the results would be less precise, the calculation was based on a rough transcription to keep the workload feasible (cf. Wieling et al 2014, p. 261). This means that, for example, individual segment length was not taken into account. The following example illustrates how the distance was calculated for the German and English pronunciation of the item pair persönlich/personal. Three substitutions of a vowel by a different vowel or of a consonant by a different consonant were awarded 0.5 points each, and four insertions or deletions were awarded one point each. Additionally, 0.5 points were awarded because the main stress falls on the second syllable in the German item, but on the first syllable in the English one. Normalized for a total alignment length of 10, the resulting phonetic distance between the items amounts to 60%. Table 9 illustrates this exemplary calculation.

German    p   ɛ     ʁ   z     øː    n   -   l   ɪ   ç
English   p   ɜː    -   s     ə     n   ə   l   -   -
Cost      0   0.5   1   0.5   0.5   0   1   0   1   1

Table 9: Example calculation of phonetic distance.
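Written out as a single expression, the calculation for persönlich/personal reads as follows (the symbol d_phon is used here only as shorthand for the normalized phonetic distance and does not appear elsewhere in this study):

```latex
\[
d_{\text{phon}}
  = \frac{\overbrace{3 \times 0.5}^{\text{substitutions}}
        + \overbrace{4 \times 1}^{\text{insertions/deletions}}
        + \overbrace{0.5}^{\text{stress}}}{10}
  = \frac{6.0}{10}
  = 0.60
\]
```

The segment costs in Table 9 sum to 5.5; the stress penalty of 0.5 is added before dividing by the alignment length.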

3 5 Study design

The next step was to create an online-based application to test the actual intelligibility scores of the created word lists, using Qualtrics survey software. The study was created as an online application to facilitate collecting results from a broad variety of participants without restriction to a geographic area.

          Test A                     Test B
Spoken    50 items                   50 items
          G: 24, L/F: 23, D: 3       G: 24, L/F: 23, D: 3
Written   50 items                   50 items
          G: 22, L/F: 25, D: 3       G: 22, L/F: 25, D: 3
Total     100 items                  100 items

Table 10: Test versions A and B with indication of the number of items for each group.

A short introductory text informed participants that they were about to take part in a vocabulary test of German aimed at native English speakers, and that the test was directed at speakers with little or no previous knowledge of German. This might have caused confusion (participants might have, rightfully so, asked themselves the same question I asked in the introduction: How could I possibly be able to perform well on a vocabulary test in a language I do not speak?), so they were also informed that some words might be similar to words they already know and that they should attempt the test even if they judge their German knowledge to be non-existent. In this way, participants were made aware of the presence of cognates in the test without the phenomenon being named explicitly. This might have influenced the test results, since participants might have been more aware of the presence of cognates than they would be when encountering speech in a natural speech situation, which should ultimately be the goal of intelligibility research. Nevertheless, the decision was made to include this statement in the introductory text to avoid participants being discouraged from taking part in the test.

English speakers, due to the geographic proximity, might have more language contact with German than, for example, American or Australian speakers of English.

3 6 Participants

In this section I will argue why it was not feasible to analyse the results separately by language background of the participants, as originally planned (i.e. to break up the results by speakers with knowledge of other Romance or Germanic languages, and by speakers who speak no other language besides English).

When the test was designed, two different versions were produced in order to compare the written and spoken data of each word without presenting one participant with the same word twice (see section 3.5). Unfortunately, this approach had to be modified due to a scarcity of complete replies, especially for test version A. The total number of results is displayed in the table below: ‘Invalid’ refers to responses in which no translation was attempted, and to the answers of participants who were excluded because of their language background (i.e. not being native speakers of English) or because they showed too high a proficiency in German (correct translation of more than three out of six dummy items, or self-reported). This category also includes responses where participants did fill out the questionnaire, but did not continue to the functional test afterwards. ‘Partial’ refers to those respondents who discontinued the test before finishing. The high number of partial responses might in part be due to technical problems which were not anticipated: Some participants reported that the first part of the test, which contained the spoken data, was inaccessible on certain types of mobile phone browser. For other mobile phone browsers, participants reported that the sound file took a long time to load, resulting in the time running out before the participant was able to enter their translation. In addition, the test was relatively time-consuming (it took participants between 15 and 30 minutes to complete the test). These factors might have contributed to a large number of participants not completing the test. Additionally, this created an asymmetry in the number of answers for spoken and for written test items. Three data sets were excluded from the analysis: One answer set was excluded from test A because the participant reported being a native German speaker. Two answer sets were excluded from test B because the participants translated more than three of the dummy items correctly. The following table gives an overview of the numbers of responses for both test versions.

        Test version A   Test version B   Total   Data points
Total   52               55               107     Spoken: 1838, Written: 1199, Total: 3037

Table 11: Number of complete, partially complete, and invalid responses for both test types. The three columns to the left refer to the number of responses. The column to the right refers to the number of data points collected from these responses, i.e. the number of individual translations for written and spoken question types.

Table 11 shows that more spoken than written data was collected, since the test began with the spoken section. There are no partial responses for the written section, indicating that all those who did not complete the test stopped during the first, spoken, section. This is problematic for the analysis of the results insofar as less data is available overall, but mostly because it creates a bias towards the test items which were (by coincidence) asked at the start of the test. Another potential issue is that those candidates who felt confident with the test and had an interest in linguistic questions (and therefore might achieve higher scores) might have been more likely to complete the test than participants who did not. This could be tested by analysing whether the participants' scores for the first part of the test correlate with their continuing to the end or not, but such an analysis was not feasible here. In the analysis of the results in section 4.3 below, this is taken into account by including only complete responses in the analysis of the highest- and lowest-scoring items. The overall results include complete as well as partial responses. While this means that the results can be distorted due to the aforementioned bias, this was done to include more of the collected data.

The following data includes those answers which were complete or partially complete. The average age of the participants was 28.5 years with a standard deviation of 8.4 and a range from 18 to 58 years. 68.4% reported their educational background to be at university level, and 19.3% reported having completed tertiary education. The remaining 12.3% reported secondary education or lower. These factors were not taken into account in the analysis of the results, but it should be kept in mind that the educational status of the participants might influence their test performance.

4 RESULTS

4 1 Linguistic distance of the test items

Using the Levenshtein distance as explained in sections 3.3 and 3.4, the following results were obtained for the orthographic and phonetic distance between the test items: In terms of orthographic distance, the Germanic items showed a higher distance (41.9) than the French and Latin items (28.1). The phonetic distance is slightly lower (42.5) for Germanic items compared to the Latin and French items (45.2).

Figure 1: Orthographic distance normalized for length of the test items, separated by etymological origin (G: inherited Germanic items; L/F: French and Latin loanwords).

Figure 2: Phonetic distance normalized for length of the test items, separated by etymological origin (G: inherited Germanic items; L/F: French and Latin loanwords).

Germanic items had an average length of 5.1 for the German word, and 5.0 for the English translation. The French & Latin items had an average length of 7.7 (German) and 7.2 (English). Arguably, this reflects the written part of the test only, since word length is measured in the number of letters, not the number of phonemes or the average duration of the spoken word. While this would give valuable additional information, this was not feasible in the span of this study.

4 2 Results functional test

To evaluate the results of the functional test, the collected responses were checked manually. A correctly translated item was awarded one point. Answers containing spelling mistakes were counted as correct, unless the spelling mistake resulted in another word. If a participant correctly identified the semantic properties of the test item, but not the word class, 0.5 points were awarded. This was the case for German interessieren (verb), which was frequently translated as interesting (adjective). Only one participant did not recognize the meaning of the word. The remaining participants (excluding those who did not encounter the test item because they did not complete the test) all translated the meaning correctly, but a majority (82.4% for spoken data and 71.4% for written data) did not recognize the correct word class. Blank gaps and incorrect answers were both awarded 0 points.

In the following overview of the results, the data from incomplete responses was included. This distorts the results slightly, because more responses were collected for some items than for others, and it is possible that, by coincidence, items which were more difficult to translate were grouped at the start or the end of the test. This approach was nevertheless chosen due to the high number of incomplete responses. It was possible because results were calculated not for individual items but for each group (Germanic and French/Latin items). The sum of all answers (each scored as '0', '0.5' or '1') was calculated and then divided by the overall number of answers. Figure 3 shows the combined results for tests A and B. The y-axis refers to the average score across participants, with '0' meaning no correct answers and '1' meaning that every participant gave the right answer. For Germanic items, the score across participants was 0.54 for written and 0.39 for spoken items. For French and Latin loanwords, participants scored an average of 0.87 for written and 0.53 for spoken items.
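The group-level averaging described above can be sketched as follows. This is a minimal illustration with invented responses, not the thesis data. The variable and function names are hypothetical. The scores follow the rubric given earlier: 1 for a correct translation, 0.5 for the right meaning with the wrong word class, and 0 for an incorrect or blank answer.

```python
from collections import defaultdict

# Hypothetical scored responses as (item group, score) pairs; not real data.
responses = [
    ("G", 1.0), ("G", 0.0), ("G", 0.5), ("G", 0.0),
    ("L/F", 1.0), ("L/F", 1.0), ("L/F", 0.5), ("L/F", 0.0),
]


def group_means(scored):
    """Sum all scores per group and divide by the number of answers in that group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, score in scored:
        totals[group] += score
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


means = group_means(responses)
print(means)  # {'G': 0.375, 'L/F': 0.625}
```

Because the answers are pooled per group, an item that was answered by more participants contributes more to the group mean. This is the source of the slight distortion mentioned above when partial responses are included.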


While the results in figure 3 show the average performance across items, the following tables show the items with the highest and the lowest scores across participants, respectively. For this part of the analysis, only the data from completed tests was included, so that items which happened to be shown towards the end of the test could also be meaningfully included. The ten columns on the left refer to the highest-scoring items, and the ten columns on the right represent the lowest-scoring items. The graph shows the etymological origin of the items in question; precisely which words scored the highest and lowest is shown in figure 4 below.

Figure 4: Highest- and lowest-scoring items (written) with indication of item type (G: inherited Germanic items; L/F: French and Latin loanwords). Note the ceiling effect for the highest-scoring items: in the written section, each of these items was translated correctly by all participants.


While figures 4 and 5 already give some indication as to which type of item was the easiest to translate (given that the highest-scoring items all belong to the group L/F, see the discussion section below), it is interesting to see which items in particular received the lowest and highest scores. Tables 12 and 13 give an overview of the highest- and lowest-scoring items, noting the corresponding orthographic and phonetic distance for written and spoken items, respectively.

Written                                   Spoken
Item         Score  Type  Orth. dist.     Item         Score  Type  Phon. dist.
Meter        1.00   L/F   0.0             Absolut      0.94   L/F   36.0
Moment       1.00   L/F   0.0             Moment       0.94   L/F   36.0
Präsident    1.00   L/F   10.0            Aspekt       0.88   L/F   17.0
Programm     1.00   L/F   12.5            Programm     0.88   L/F   31.0
Situation    1.00   L/F   0.0             Studieren    0.88   L/F   63.0
Familie      1.00   L/F   28.6            Information  0.86   L/F   50.0
Information  1.00   L/F   0.0             Musik        0.86   L/F   33.3
Papier       1.00   L/F   16.7            Papier       0.86   L/F   16.7
Problem      1.00   L/F   0.0             Problem      0.86   L/F   21.0
Sozial       1.00   L/F   16.7            Universität  0.86   L/F   50.0

Table 12: Ten highest-scoring items in the written and spoken test. 'Score' indicates the average score for the item in question across all participants who completed the test.

Written                                   Spoken
Item         Score  Type  Orth. dist.     Item         Score  Type  Phon. dist.
Folgen       0.05   G     50.0            Kapitel      0.00   L/F   57.0
Tief         0.00   G     75.0            Tief         0.00   G     33.0
Kapitel      0.00   L/F   50.0            Rund         0.00   L/F   17.0

Table 13: Ten lowest-scoring items in the written and spoken test. 'Score' indicates the average score for the item in question across all participants who completed the test.

It should also be noted that the highest-scoring items (referring to the written section only) have a higher average word length than the lowest-scoring items (7.4 vs. 5.7). This mirrors the average word length of the L/F and G groups overall, although of the four L/F words among the lowest-scoring items, two (Monat and Zelle) are comparatively short.

5 Discussion & Conclusion

The following section will discuss to what extent the results answer the research question and what this means in the context of European languages influencing each other over the course of their historical development.
