Academic year: 2021


Dialects across time and space

Computational modeling of dialects in the Netherlandic language area

September 2020

Author

Raoul Buurke

Supervisor

prof. dr. Martijn Wieling


Contents

Abstract
Acknowledgments

1 Introduction

2 Background
2.1 Approach to dialect research
2.2 Approach to language change
2.3 Transcriber differences
2.4 The Netherlandic language area
2.4.1 Low Saxon
2.4.2 Low Franconian
2.4.3 Frisian
2.5 Research questions

3 Data
3.1 Reeks Nederlandse Dialectatlassen (RND)
3.2 Goeman-Taeldeman-Van Reenen Project (GTRP)
3.3 From Dialect to Regiolect project (DiaReg)

4 Methods
4.1 Levenshtein distance
4.2 Phonetic inventory reduction
4.3 Quantifying diachronic change
4.4 Assessing Low Saxon change
4.5 Assessing the direction of change
4.6 DiaReg validation

5 Results
5.1 Influence of recording years
5.2 Phonetic inventory reduction
5.3 Assessing Low Saxon change
5.4 Assessing the direction of change
5.5 DiaReg validation

6.1 Discussion
6.2 Conclusion
6.2.1 Future research
6.2.2 Summary

Appendix
1 Full list of characters occurring in RND and GTRP transcriptions

List of Tables

2.1 A summary of properties associated with the language varieties.
3.1 Summary of the RND recording regions and years (in chronological order).
3.2 Overlapping words between RND and GTRP.
3.3 Overlapping words between GTRP and DiaReg.
4.1 Example Levenshtein alignment between dialectal variations of Dutch ‘straat’.
4.2 VC-penalized alignment of ‘straat’ variations.
4.3 A segment distances matrix based on PMI values between phonetic symbols.
5.1 List of IPA symbols that are absent in at least one of the subsets.
5.2 The 28 phonetic symbols of the reduced phonetic inventory with the replaced symbols.
5.3 Wilcoxon rank sum tests between pronunciation change in Low Saxon areas and in non-Low Saxon areas in the Netherlands.
1 (Appendix) Occurrence of all phonetic symbols in each subset.

List of Figures

2.1 Convergence and divergence based on phonetic patterns between neighboring dialects. Blue lines indicate divergence. Red lines indicate convergence.
3.1 The RND locations.
3.3 The GTRP locations.
3.4 Boxplot of the difference in years between the RND and GTRP recordings.
3.5 The overlapping locations between the RND and GTRP.
3.6 The overlapping selected locations between the DiaReg and GTRP.
4.1 The two transcribers who transcribed most of the GTRP transcriptions with corresponding recording locations.
4.2 An example of a contour plot of a generalized additive model predicting phonetic distances based on geography.
4.3 The manually selected (southern) Low Saxon border in Gelderland.
4.4 MDS map based on GTRP data.
5.1 Pronunciation change predicted on the basis of geography.
5.2 The first three multidimensional scaling dimensions mapped onto RGB color space. Left plot: the RND data. Right plot: the GTRP data.
5.3 The first multidimensional scaling dimension.
5.4 The second multidimensional scaling dimension.
5.5 Pronunciation change predicted on the basis of geography for both


Abstract

In this study diachronic pronunciation change across the Netherlandic area is investigated using two large-scale phonetic corpora, as well as existing and novel dialectometric methods. The Low Saxon varieties in the Netherlands are analyzed in depth due to unexpected recently emerging patterns, such as exhibiting change away from the standard language as opposed to towards it. This raised questions as to whether it was possible that Low Saxon varieties are exhibiting the main signs of regiolect formation.

The phonetic data originated from two different data sources during different time periods in the 20th century, which also introduced systematic variation between transcribers. These differences were minimized using a pointwise mutual information (PMI) based method for merging the different phonetic inventories to a common one, which enabled the direct comparison of these data.

Further analyses showed that (1) regiolect formation may be taking place for the Low Saxon varieties, (2) relatively vital language groups in the area resist change better than the less vital ones, and (3) overall phonetic change is both slow and towards the standard language during the 20th century. These findings were partially evaluated with a third dataset containing more recent phonetic data. Our findings are in line with research on Dutch dialects and minority languages, but several refinements are still necessary to fully explain the patterns that we find, especially when it comes to the preprocessing methods for phonetic transcriptions proposed here.

Word count:

Section         Word count
Introduction           923
Background            5602
Data                  1868
Methods               3846
Results               2679
Discussion            2041
Conclusion            2331
Total                19290


Acknowledgments

As with any research project there are fewer names on the front than there should be. I was obliged with the least pleasant part of this work, i.e. the write-up of all the work during a period where everything is affected by a pandemic, so my name is on the front. This work showcases in one way or another the collective effort of our Speech Lab Groningen, however. A few names deserve explicit mention regarding the work exercised here, but I want to thank my fellow lab members for their advice on related efforts during the thesis period as well. One of these related activities was the PhD application procedure in which I succeeded and which allows me to refine the work of this thesis in the years to come, and which I am very much looking forward to.

Three of my colleagues deserve credit for both the thesis and the successful PhD procedure. The first is Martijn Wieling, who always proved ready to answer any of my many questions and pushed me to put in the effort to ensure my work is always of the highest possible quality. Together with Remco Knooihuizen he will supervise me during my doctoral project, and I expect to enjoy every part of it. Next is Martijn Bartelds, who supervised my research internship, and who like Martijn (senior, as I typically distinguish them) was always ready to give me quick and concise advice on analyses. Last but far from least is Hedwig Sekeres, whose transcriptions are used for part of the analyses in this thesis and without whose meticulous feedback this thesis would have been considerably less readable. Next to this, she brightens my daily life, endures the more and less fortunate moments with me, and fills me with love.

In particular when it comes to the PhD application I feel I should explicitly thank Teja Rebernik. She has also dedicated a considerable amount of her time and good will to preparing me for all the required steps. Without her plain instruction to stop rambling on in conversation I would never have done well in the interview. I also thank Jelle Brouwer, who I think was very much looking forward to having me as a colleague given the frequent inquiries whether I had heard back from the committee yet. I also thank those who participated in the mock interviews and whom I have not named yet: Antonio Toral Ruiz and Floor van den Berg.

Lastly, I want to thank Hanneke Loerts, Simone Sprenger, Petra Hendriks, Jack Hoeksema, and the many others who I collaborated with over the past few years as a student in my beloved hometown. I am glad to say I will be able to stick around for a few more years and hope to see you all again soon.


1 Introduction

In this thesis we focus on pronunciation change in the Netherlands and Flanders (in Belgium). An umbrella term for this area is the Netherlandic language area, which correctly implies that (Standard) Dutch is the main language variety spoken in the region, although many other language varieties are also spoken here. We focus on these language varieties as opposed to the standardized Dutch variety, and in particular on the Low Saxon varieties spoken in the northeastern parts of the Netherlands. Like the other (Dutch) dialects, these varieties are in decline due to an aging and decreasing speaker population, little intergenerational transmission, as well as the longstanding political power and linguistic influence of Standard Dutch. These simultaneously occurring processes are worth exploring, because the complex interplay between them is of great interest to language policy makers, linguists, and language enthusiasts.

One may wonder why the Low Saxon varieties are of particular interest to us, as there are clearly many varieties in the Netherlandic language area. Importantly, Low Saxon dialects are not dialects of Dutch but of Low German, which makes them different from all other varieties (except Frisian varieties in Friesland province). They are also spoken in at least five out of the twelve Dutch provinces, and they therefore have a substantial speaker population. Despite these characteristics, which are typically seen as properties of vital languages, Low Saxon varieties are in steep decline. The paradox of this situation is even greater in light of striking internal language change patterns (elaborated upon in the background section), which makes the Low Saxon varieties a highly interesting linguistic research topic. Of particular interest is (1) whether these varieties change at a similar rate as the other language varieties, and (2) whether ongoing pronunciation change is toward Standard Dutch or away from it.

We are not the first to investigate pronunciation patterns of the Netherlandic language area. In fact, we tread in the footsteps of many researchers before us at the University of Groningen, whose studies are referred back to frequently throughout this work. Most of these studies have either dealt with the contemporary language variation of the area, or they investigated language change using data from different contemporary generations of speakers (the so-called apparent-time approach). For the first time, we will investigate pronunciation change in this area from a diachronic perspective instead. We use large-scale language samples from different points in time (a real-time approach), which is arguably the most direct approach to investigate pronunciation change of the respective language varieties. In order to execute such a real-time analysis, we rely on existing phonetically transcribed datasets from different sources. The first dataset comprises a part of the Reeks Nederlandse Dialectatlassen (Blancquaert & Pée, 1930; collected between 1923 and 1982), and our second dataset derives from the Goeman-Taeldeman-Van Reenen Project (Taeldeman & Goeman, 1996; collected between 1980 and 1995). These are henceforth abbreviated to RND and GTRP, respectively, and they can be found freely online through Gabmap[1] (Nerbonne et al., 2011; Leinonen et al., 2016). These large-scale datasets contain dialect data across the Netherlandic language area, and by comparing these datasets we are able to quantify pronunciation change.

Many different transcribers contributed to these datasets, but transcribers are known to have different preferences and transcription practices. This made a direct comparison of these datasets impossible up until now, but we address this problem here by ensuring that any detrimental effects due to transcriber differences are minimized. All phonetic transcriptions are adjusted in such a way that for all transcriptions the same phonetic symbol inventory is used, which leaves the general pronunciation patterns intact, and this is done in such a way that only a minimal amount of phonetic information is lost.

When the phonetic transcriptions have been made comparable, we quantify pronunciation change between the RND and GTRP using dialectometric methods. We model pronunciation change across the Netherlandic language area, and we take into account the role of geography in these changes by means of generalized additive modeling. We explicitly model the contrast in pronunciation change between Low Saxon and other dialect groups in the Netherlandic language area as a part of this analysis. Moreover, we assess whether the pronunciation change is towards Standard Dutch or away from it by applying multidimensional scaling to our data. Lastly, we evaluate our findings partially by analyzing change of Frisian and Low Saxon areas between the GTRP and phonetic data from the From Dialect to Regiolect (henceforth DiaReg) project. This newer phonetically transcribed dataset (from around the 2010s) covers the same geographical area as the RND and GTRP. This final analysis shows that our findings are reliable, because we find similar patterns here.

The rest of this thesis is structured as follows. In the background we explain the approaches we take to investigating pronunciation change, and we explain how we deal with transcriber differences. We then provide an overview of the languages in the Netherlandic language area and their relevant characteristics for this analysis, which leads to our research questions and hypotheses. In the data section we describe key characteristics of each phonetically transcribed dataset, and we explain how and for what purpose these data were obtained. In the methods section we describe the dialectometric methods we apply here and how they are adjusted for the current purposes. The results section follows the same order as the methods section, and we summarize the results of the phonetic reduction procedure here, together with the (overall) pronunciation pattern analyses. In the discussion section we synthesize our results to form a conclusion about the current state of Low Saxon varieties. We suggest several points of improvement in the conclusion.


2 Background

In this section we first describe the line of research to which this study belongs. Afterwards, we clarify our choice for a real-time analysis and these particular data collections. We acknowledge some known shortcomings of our datasets and specify how we deal with these problems in the analyses. A considerable amount of space is then dedicated to providing background information about the varieties in the Netherlandic language area, which greatly aids interpretation of any patterns we find in our analyses. Our research questions and hypotheses then follow from this overview and the changing speech patterns in the Low Saxon area that are reported in the literature.

2.1 Approach to dialect research

Early efforts in dialect geography by Wenker & Wrede (1889) and Gilliéron & Edmont (1902) resulted in linguistic atlases covering hundreds of locations in Germany and France, which have inspired many linguists to follow in their footsteps since. These works contained the phonetic transcriptions of speech exhibited by local language speakers, and they therefore described much of the variation present in the area. Similar atlases have been constructed that deal, for example, with the language varieties on the Iberian peninsula (Alvar, 1974, 1985), in the United Kingdom (Orton et al., 1998), and also in the Netherlands and Belgium (Blancquaert & Pée, 1930; Taeldeman & Goeman, 1996).

Such collections have often been used in the field of dialectometry, which was pioneered by Séguy (1971, 1973) and Goebl (1982). They were among the first to quantify the similarities or differences between dialect transcriptions and to use this information to infer linguistic variation between dialects. The next crucial innovation came from Kessler (1995), who pioneered the use of the Levenshtein distance (Levenshtein, 1966) for the same purposes, which allowed for a greater level of detail than the binary distinctions that were used before. This string comparison algorithm is used to compare phonetic strings and quantifies how different they are from each other (by counting how many edit operations, i.e. insertions, deletions, and substitutions, are necessary to turn one into the other). Kessler (1995) applied this approach to dialects of Irish Gaelic by computing the differences between all the phonetic strings and then clustering the distances he obtained, which yielded reliable dialect groupings.
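To make the counting of edit operations concrete, the following is a minimal sketch of the plain (unweighted) Levenshtein distance, not the adjusted variant that is described in the methods section. It assumes that every Unicode character counts as one phonetic segment, which only holds for transcriptions without diacritics or length marks; the function name `levenshtein` is our own choice for this illustration. The example transcriptions are taken from elsewhere in this thesis.

```python
def levenshtein(source, target):
    """Return the minimum number of insertions, deletions and
    substitutions needed to turn `source` into `target`."""
    # previous[j] holds the distance between the processed prefix of
    # `source` and the first j segments of `target`.
    previous = list(range(len(target) + 1))
    for i, source_segment in enumerate(source, start=1):
        current = [i]  # turning i segments into an empty string costs i deletions
        for j, target_segment in enumerate(target, start=1):
            substitution_cost = 0 if source_segment == target_segment else 1
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + substitution_cost,   # substitution or match
            ))
        previous = current
    return previous[-1]


# Standard Dutch 'bloeden' versus a Groningen realization:
# one insertion ([ɑ]) and one deletion ([ə]).
print(levenshtein("bludən", "blɑudn"))  # 2
# Two border-dialect transcriptions of 'blonde': one substitution.
print(levenshtein("blɔnt", "bɬɔnt"))    # 1
```

In practice the raw count is usually normalized and aggregated over many words per location, so a single word pair such as the ones above carries little weight on its own.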

The Levenshtein distance has been used many times in the field since then, such as in Heeringa's (2004) influential PhD dissertation. Analyses within this framework have often been executed at the University of Groningen, leading some to identify the ‘Groningen School of Dialectometry’ as a separate research school (Szmrecsanyi, 2011, 45). Notable and relevant achievements are the detection of the traditional dialect groupings (Nerbonne et al., 1996), the remarkably successful perceptual evaluation of the Levenshtein distance (Gooskens & Heeringa, 2004; Wieling et al., 2014), and the fine-tuning of the dialectometric procedure in general by evaluating the many computational techniques in use, such as multidimensional scaling (Embleton, 1993; Wieling, 2007) and various clustering techniques (Wieling et al., 2007; Nerbonne et al., 2008; Heeringa & Hinskens, 2019). Due to the success of these studies, we have also opted here for the Levenshtein distance as a method for quantifying pronunciation change. The variations that we apply specifically in the analysis are elaborated upon in the methods section.

2.2 Approach to language change

It is necessary to address the general approach to researching language change taken here, before we continue with the specifics. The most important distinction is arguably that of real-time versus apparent-time methods. This is a classical dilemma within the field of sociolinguistics and may influence results greatly. Many researchers use the apparent-time hypothesis (ATH) when they investigate language change (see Sankoff, 2006 for a methodological overview). The key assumption of the ATH is that individual language systems of speakers do not change anymore after adulthood is reached, which means that older speakers of a language reflect an earlier form of the language, while younger ones use the most innovative linguistic forms.

The validity of the ATH has been contested, however, as there is clear evidence of speakers’ language systems still changing considerably well into adulthood (Blondeau, 2001; Ashby, 2001; Sankoff, 2004; Sankoff & Blondeau, 2007). Unless this individual lifespan change is somehow accounted for while analyzing the language varieties, one will therefore inherently underestimate the rate of change at the community level, because individuals have changed along with the community-level change during their lifetime. When combined with other methods the results from an apparent-time study can be reasonably validated (cf. Sankoff, 2006), but this crucial shortcoming cannot inherently be overcome and leaves it undesirable to apply here[2].

Real-time studies, on the other hand, sample the same language variety over time, either by tracking the same speakers over time or by tracking the same area, which is an intuitive approach to language change. Any real-time study comes with a set of difficult problems (Tillery & Bailey, 2003), however, such as high expenses and the difficulty of finding reliable participants. This explains why these large-scale data collections are rare and not always pursued by dialectologists. In our case the data is already there, but the aforementioned problems are still reflected in the quality of the data collection. Using already collected data has an important advantage, however: it has already been used for dialectometric analysis. We will therefore be aware of at least a subset of its shortcomings, and we will be able to compare our analyses to those in the literature.

[2] My upcoming doctoral project addresses these problems more directly by quantifying the uncertainty of lifespan change directly in a combination of a real-time and an apparent-time analysis, when diachronic data is available.

2.3 Transcriber differences

We are aware of shortcomings in the data collections, because they have been subjected to meticulous investigation before. Hinskens & Oostendorp (2006) analyzed the GTRP data and found that there were transcriber effects, specifically when looking at the variation in how /nd/- and /nt/-clusters were transcribed in the data. It should be noted that the authors did not analyze a large number of phonological predictors, but focused instead on these particular variations based on suggestions in prior literature. This makes sense, but it also means that other structural issues in the data remain a black box, and it certainly suggests that we should be wary of transcriber effects.

Similarly, Wieling (2007) noted that there is considerable variation between the RND and GTRP in how many phonetic symbols transcribers of different geographical regions used for their transcriptions. For example, for the Dutch GTRP transcriptions almost twice as many phonetic symbols were used as for the RND transcriptions, which indicates that in all likelihood some or most GTRP transcribers in the Netherlands preferred ‘narrower’ transcriptions.

Differences between Dutch and Belgian transcriptions become especially clear when we look at two border dialects (Goirle in the Netherlands and Poppel in Belgium) that are approximately 10 kilometers apart. We can see that transcriptions for the same words differ notably, even though only minor differences are expected given their locations on the dialect continuum. Example transcriptions respectively include [bɬɔnt] vs [blɔnt] (‘blonde’), [kɒməʁs] vs [kɔmərs] (‘rooms’), and [tʀɒlis] vs [trɑlis] (‘bars’). The Dutch transcriptions clearly show a structurally greater variation in realizations of /r/, which is unlikely to be relevant in accurately distinguishing these two closely-related varieties. It is possible that Dutch transcriptions made by Dutch transcribers exhibit such patterns, because the well-attested and substantial /r/-variation in the Netherlands (Van de Velde & Van Hout, 1999; Van Bezooijen, 2005) is likely to be known to experienced transcribers, and therefore these differences are likely to be more salient to them. As a consequence, Dutch transcribers pay extra attention to these differences and use a greater set of symbols.
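The difference in symbol use can be made explicit by comparing the symbol inventories of the two sets of example transcriptions. The snippet below is only an illustration on the three word pairs quoted above, again under the simplifying assumption that one Unicode character equals one phonetic symbol; the helper name `inventory` is ours.

```python
# Example transcriptions quoted above for 'blonde', 'rooms' and 'bars'.
goirle = ["bɬɔnt", "kɒməʁs", "tʀɒlis"]   # the Netherlands
poppel = ["blɔnt", "kɔmərs", "trɑlis"]   # Belgium


def inventory(transcriptions):
    """Return the set of distinct symbols used in a list of transcriptions."""
    return set("".join(transcriptions))


only_goirle = inventory(goirle) - inventory(poppel)
only_poppel = inventory(poppel) - inventory(goirle)

print(len(inventory(goirle)), len(inventory(poppel)))  # 14 12
print(sorted(only_goirle))  # the four symbols ɬ, ɒ, ʁ and ʀ
print(sorted(only_poppel))  # the two symbols r and ɑ
```

Even in this tiny sample the Dutch transcriptions use two different symbols ([ʁ] and [ʀ]) where the Belgian ones consistently use [r].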

It is not surprising that transcriber differences occur, especially when there are many transcribers with different (linguistic, but also social) backgrounds. After all, speech perception is subjective, and physiological hearing capacity also differs from person to person. Moreover, it is a well-known fact that in general inter-transcriber reliability is relatively low (rarely higher than 80%) even among experienced transcribers (Amorosa et al., 1985), and this effect is amplified when narrow transcriptions are made as opposed to broad ones (Shriberg & Lof, 1991). We attempt to address this problem in our data by reducing the number of phonetic symbols that are used across the datasets, which we do by replacing phonetic symbols with their nearest equivalent in phonetic space. This phonetic space is induced on the basis of co-occurrence patterns in the complete set of transcriptions, which is explained in detail in the methods section. What is important to mention here, however, is that we are able to apply this procedure proportionately. The Dutch GTRP symbol inventory, which contains the most “excess” phonetic symbols, is reduced the most and the other symbol inventories the least, which avoids losing too much phonetic information. The goal is to have a single phonetic inventory that is used for all transcriptions in the end.
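The general idea of replacing a symbol with its nearest equivalent can be sketched as follows. This is a simplified illustration and not the procedure of the methods section: it assumes that counts of aligned symbol pairs are already available, scores every pair with pointwise mutual information (PMI), and maps each excess symbol to the retained symbol with the highest PMI. All counts and all function names below are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def pair_key(a, b):
    """Order-independent key for a pair of symbols."""
    return tuple(sorted((a, b)))


def pmi_scores(pair_counts):
    """Compute PMI for every aligned symbol pair.

    `pair_counts` maps (symbol, symbol) to an alignment count; each
    unordered pair is assumed to occur only once as a key.
    """
    total_pairs = sum(pair_counts.values())
    symbol_counts = Counter()
    for (a, b), count in pair_counts.items():
        symbol_counts[a] += count
        symbol_counts[b] += count
    total_symbols = sum(symbol_counts.values())
    scores = {}
    for (a, b), count in pair_counts.items():
        p_ab = count / total_pairs
        p_a = symbol_counts[a] / total_symbols
        p_b = symbol_counts[b] / total_symbols
        scores[pair_key(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


def replacement_map(excess, kept, scores):
    """Map each excess symbol to the kept symbol with the highest PMI.

    Pairs that were never aligned get a score of minus infinity.
    """
    return {
        symbol: max(kept, key=lambda k: scores.get(pair_key(symbol, k), -math.inf))
        for symbol in excess
    }


def reduce_transcription(transcription, mapping):
    """Replace every excess symbol in a transcription."""
    return "".join(mapping.get(symbol, symbol) for symbol in transcription)


# Hypothetical alignment counts (invented for illustration only).
counts = {
    ("r", "r"): 50, ("l", "l"): 40,
    ("r", "ʀ"): 20, ("r", "ʁ"): 15, ("l", "ɬ"): 10,
    ("l", "ʀ"): 1, ("r", "ɬ"): 1,
}
mapping = replacement_map(["ʀ", "ʁ", "ɬ"], ["r", "l"], pmi_scores(counts))
print(reduce_transcription("bɬɔnt", mapping))  # blɔnt
```

With these invented counts the two uvular /r/ symbols are mapped to [r] and [ɬ] is mapped to [l], which mirrors the kind of merging the reduction is meant to achieve.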

2.4 The Netherlandic language area

We take the Netherlandic language area into view, which is artificially bound by the geographical borders of the Netherlands and Flanders (the historical “Low Countries”), although language areas are typically not restricted by political borders due to cross-border language contact. Across this geographical area varieties from three main language families are spoken: Low Saxon, Low Franconian, and Frisian. These West Germanic languages branch out into a large number of more local varieties. Numerous cities in the Netherlands even have their own urban dialect, which suggests the presence of many highly localized varieties in an already small geographical area.

The abundant inter-language influence of these varieties makes it extremely difficult to present an accurate picture of what uniquely defines these languages and how to distinguish them, even if we restrict ourselves to the level of phonetics. In fact, we will resist the temptation to do so to a large degree. Instead, general impressions are provided below about the phonetic similarity to Standard Dutch for the main varieties (in a historical sense), and the information is limited to phonological processes that are likely to influence our results (i.e. that cause differences in broad as opposed to narrow phonetic transcription). Moreover, the current position of these varieties in the overall language community is briefly discussed, as well as the attitudes and make-up of their own language community. These facts together constitute the relevant information about these language varieties before the first data collection (i.e. the RND) and the likely phonetic trajectories they followed afterwards. For a detailed and systematically structured investigation of each variety on other linguistic levels, we refer to the excellent work of Hinskens & Taeldeman (2013).

The general language policy in the Netherlands deserves some attention here. Frisian, Limburgish, and Low Saxon have all been supported under the European Charter for Regional or Minority Languages (ECRML) of 1992, regardless of differences in speaker population size (e.g. there are many more Low Saxon speakers than Limburgish ones). The charter provides different levels of support and protection under parts II and III, however, and this again potentially causes differences. Frisian is protected to the fullest extent, but Low Saxon and Limburgish are only supported under part II. Under part II Low Saxon and Limburgish have officially been recognized as regional languages, and there are funds to promote them. Under part III the protection is more encompassing, as signatories are then required to select a subset of concrete undertakings proposed by the Council of Europe to actively expand the use of the relevant language to new social domains. For Frisian this means i.a. that it is being taught in schools, and that its use can be demanded in court.

It is worth noting here that the Belgian government never signed the ECRML. This was due to the NTU, the Dutch Language Union, advising against recognizing Limburgish in Belgium, because the consequential political re-balancing would cause Flemish to be politically outweighed by French-speaking Wallonia (Swanenberg, 2013). This in turn influenced the decision to not include Zeelandic and Brabantish varieties in Flanders under the ECRML, which are therefore left unprotected despite considerable lobbying from their speaker populations (Taeldeman, 2013b).

The general prevalence of the dialects across their respective speaker populations is as follows. Driessen (2005) reports that around 2000 about 75% of the population in Friesland and Limburg spoke their respective regional variant, while only about 60% of the Low Saxon population uses their language. The same level is reported for the much smaller Zeelandic population. Driessen was unable to provide information about Brabantish, but Swanenberg & Van Hout (2013) report that dialect use is relatively high in North Brabant. De Tier et al. (2008), however, compared Zeelandic, Brabantish, and Limburgish in a single study based on an internet questionnaire and found slightly different patterns. The questionnaire was posted in general newspapers instead of forums for dialect and language enthusiasts, which avoids overrepresentation of prolific dialect speakers. They found that Limburgish was in a comparatively strong position, but they also noted a surprising prevalence of regiolect[3] use in Zeeland and North Brabant. This is corroborated by findings that regiolects are even used by younger generations (Swanenberg & Meulepas, 2011; Wilting et al., 2014), who have been using traditional dialects less and less over time compared to older generations.

Driessen (2012) reports the findings from a large cohort study (data collections in 1995, 2001, and 2011) investigating self-reported language use among dialect speakers. The findings indicate that the balance between the language varieties has largely remained unchanged over time: Frisian is spoken the most by its population, followed by Limburgish, Zeelandic, and then Low Saxon and Brabantish. The use of all varieties has decreased over time, although for Frisian it is mostly stable and for Low Saxon there has been a 10% decrease at each successive data collection. The findings about dialect use indicate that in all likelihood Frisian and Limburgish speaker populations have the strongest connection with their language, and therefore try to uphold its use, while this is not (or to a lesser degree) the case for the other varieties. It should be clear in any case that the exact numbers of dialect use differ considerably from study to study depending on the approach that is taken, but the overall picture remains the same.

What follows below is a summary for all relatively major dialect groups in the Netherlandic language area, including their historical development in broad terms, key phonological differences with Standard Dutch, and what is known in general about the speaker populations. This information serves as a calibration for interpreting the results we find later in the thesis.

[3] A discussion of the concept of regiolect would take up too much space. It suffices here to accept regiolectal features as “dialect features with a wide geographical distribution” (Vandekerckhove, 2009, p. 80).

2.4.1 Low Saxon

Low Saxon dialects in the Netherlands are spoken in the provinces of Groningen, Drenthe, Overijssel, parts of Gelderland, and to a lesser degree in Friesland (and possibly Flevoland if the Urk dialect is classified as Low Saxon). These language varieties are relatively distinct from Standard Dutch due to their origins lying in Low German as opposed to Low Franconian, although they are colloquially and incorrectly referred to as Dutch dialects even by their native speakers. Historically the Low Saxon varieties derive from the same branch of the West Germanic language family as Frisian varieties, i.e. the Ingvaeonic (or: the North Sea Germanic) branch as opposed to the Istvaeonic branch to which Low Franconian belongs. In the Middle Ages Old Frisian and Old Saxon split into separate languages, with Old Saxon developing into Middle Low German and subsequently Low Saxon, although language contact between Frisian and Low Saxon was intense and a Frisian substrate remained present.

Together these dialects are part of a dialect continuum stretching far across the border into Germany that is defined chiefly by its common phonological features, which are absent in Frisian and Low Franconian. Bloemhoff et al. (2013a) note that key characteristics are for example the use of short and long vowels where Standard Dutch and Frisian exhibit different patterns[4]. Moreover, many varieties have historically developed broken diphthongs (i.e. relatively short diphthongs with a schwa as the second component) in a large area due to Westphalian breaking (cf. Bloemhoff & van der Kooi, 2008; still observed in Low Saxon (Twents) [i:əzəl], as opposed to Standard Dutch [ezəl], for ‘donkey’). These are relatively rare in Low Saxon varieties nowadays, but the processes that have taken place for these particular diphthongs since then (e.g. mergers with more closed vowels) have resulted in much of the vowel variation between the dialects nowadays (cf. Bloemhoff et al., 2013a). The pervasiveness of syllabic consonants also stands out compared to neighboring areas (e.g. Standard Dutch [-ən] being realized as [-n]/[-ŋ]/[-m̩] in different phonological contexts)[5].

Recent dialectometric studies corroborate the specification based on phonetic patterns (although morphological and syntactic differences also exist), as it is a reliable finding that these local varieties cluster together on the aggregate phonetic level in a manner that separates them from neighboring language areas (e.g. Wieling, 2007; Wieling et al., 2011). At the same time phonetic variation also seems to be the main source of linguistic variation within the continuum, as evidenced by the numerous examples in our datasets of Standard Dutch [bludən] (“to bleed”) being pronounced as [blɑudn] in a large part of Groningen and as [blujə(n)] in Gelderland localities. These micro-variations are unlikely to significantly impact aggregate classification, however, and still clearly leave Low Saxon a unified group of dialects.

[4] An example of vowel shortening is the development of [i:] in the contemporary pronunciations of ‘grey’: Low Saxon (Gronings) [ɣris], Frisian [ɡri:s], and Standard Dutch [ɣrɛis]. An example of lengthening is the development of [ɑ] in the contemporary pronunciations of ‘land’: Low Saxon (Gronings) [lɑ:nt], Frisian [lɔ:n], and Standard Dutch [lɑnt]. Note, however, that each of these current forms developed from a different ancestral language, so the forms also had different sources.

[5] Although not the topic of this investigation, it should also be noted that, in contrast with the Low Franconian dialects, Low Saxon varieties also have a lexicon that is much more distinct from Standard Dutch and that has historically borrowed much from Frisian and German rather than Dutch.

Low Saxon varieties are spoken by a large number of speakers across five provinces (a minor part of Flevoland can be included if the Urk dialect is counted as Low Saxon, yielding a total of six provinces), but the population is aging and decreasing. This is the case to a greater or lesser degree for all dialectal varieties that follow below, but perhaps especially so for Low Saxon. Intergenerational transmission for Low Saxon is low, and speakers tend to use their local variety almost exclusively in close interpersonal communication, which suggests Low Saxon exists in a diglossic language situation with Standard Dutch. This is unlikely to be true for most speakers, however, because there is evidence that over time fewer proficient speakers have maintained their language and some have even stopped using the language at home (Bloemhoff et al., 2013b).

These facts are somewhat puzzling in light of language attitude research among Low Saxon speakers, because most Low Saxon participants in Ter Denge’s (2012) study expressed that they do in fact find their language beautiful and that they are affectionate towards it. Despite (as of yet still relatively) widespread usage among older generations, there is little effort to promote and preserve the language. Especially the latter fact sets it apart from the more prolific Frisian and Limburgish varieties, which suggests that the Low Saxon varieties might be more prone to external linguistic influence from the standard language.

2.4.2 Low Franconian

Low Franconian dialects are spoken in a large part of the Netherlandic area, i.e. the provinces of North Holland, South Holland, Utrecht, North Brabant, Zeeland, Limburg, and the non-Low Saxon parts of Gelderland in the Netherlands. The Flemish varieties in Belgium are also members of the Low Franconian language family, which means that most of the language area under investigation belongs to the Low Franconian language group. Even the Standard Dutch language is a variant of Low Franconian, i.e. Hollandic, and it is therefore essentially spoken everywhere. Old Low Franconian developed into the Low Franconian language group in the Middle Ages, which itself branched out into Western and Eastern varieties. Middle Dutch developed within the Western branch, albeit with much mutual influence with Brabantish, which was at that time still relatively influential (as well as more similar to Zeelandic than to Flemish). Around the same time Old Limburgish developed within the Eastern Low Franconian branch, and it has since followed a separate trajectory into the modern-day Limburgish variants.

own regional dialects that overflow the provincial borders: Zeelandic, Brabantish, and Limburgish. The first two have been heavily influenced by the standard language for hundreds of years and had therefore already lost many distinctive dialect features by the time of the first data collection. Stability of these dialects correlates roughly with the distance to the cultural and economic center of the Netherlands (the ‘Randstad’), which, together with its early split from the other Low Franconian dialects, explains the relative distinctiveness of Limburgish from Standard Dutch.

Zeelandic and Brabantish

Zeelandic is still relatively close to Flemish nowadays in terms of pronunciation, although it can be clearly distinguished from West Flemish in Belgium (cf. Heeringa, 2004). The most noticeable differences from Standard Dutch for Zeelandic are in the vowel system, such as the shortening of Middle Dutch [i:] and [y:] to [i] and [y] where modern Dutch developed the diphthongs [ɛi] and [œy] (e.g. [kikən] vs Standard Dutch [kɛikən] and [ys] vs Standard Dutch [hœys]). The most significant difference in the consonant system is the loss of initial [h], which is a relatively minor difference. These dialects seem overall to be the least distant from Standard Dutch, which is to be expected given their low geographical distance to the Hollandic center of the Netherlands.

Brabantish variants are also relatively close to Standard Dutch, although there has been much mutual influence between Brabantish and Standard Dutch over the ages. For Brabantish (across the Dutch and Belgian borders) the common phonological features are notably difficult to summarize and the overall picture seems scattered, with many highly localized features (Taeldeman, 2013a). In our analyses these small individual patterns will likely end up as noise. These varieties are therefore unlikely to be highly distinctive from Standard Dutch at the aggregate level, and the effect is presumably strengthened by the influx of the aforementioned regiolectal features in the area, which makes the dialects even more similar to Standard Dutch overall[6]. The consonantal system is mostly identical to that of Standard Dutch, which sets these varieties apart from Low Saxon and Frisian, which have more salient differences in consonantal structure and inventories.

Limburgish

Limburgish varieties are derived from (Middle) Dutch, although some Limburgish dialects also draw much influence from the neighboring German area, e.g. from Colognian and Standard German. Due to the prolonged language contact with different neighboring language centers, there are many variants in a relatively small space, and they have a relatively distinct lexicon. The Limburgish language area in the Netherlands can nonetheless be clearly distinguished as a separate group of varieties from the neighboring German area, although there is also a noticeable difference between the northern and southern varieties (Bakker & van Hout, 2017). A few key phonological characteristics that separate Limburgish from neighboring varieties are vowel lengthening phenomena and diphthongization, but many varieties also exhibit phonological properties that are rare (or perhaps even unique) for the Netherlandic language area, such as tonal accents and sandhi voicing (i.e. voicing across word boundaries). We refer to Hermans (2013) for a more extensive overview of these features, as they can differ quite notably from dialect area to dialect area.

[6] More noticeable differences are to be found in the lexicon or morphosyntactic patterns (

Limburgish was granted the status of a regional language and remains relatively prolific even in urban areas. This is mostly due to its population, which values the language as regional cultural heritage to a greater degree than the Brabantish and Zeelandic (and Low Saxon) speakers seem to do. This is reflected by its standardization efforts, which are rare, and which also enable its prominent use on modern social media (Jongbloed-Faber et al., 2017). As is the case for Zeelandic and Brabantish, there is an influx of regiolectal features, although these are currently restricted to the southern part of Dutch Limburg and most of Belgian Limburg (Cornips, 2013). In conclusion, Zeelandic and Brabantish are in considerable decline, while Limburgish is also losing ground to external influences, although it is likely to resist change much better due to the language attitudes, historical development, and relative periphery from Standard Dutch.

Flemish

Another large and consequential Low Franconian dialect group is Flemish, which is spoken mainly in (the western half of) Belgium, neighboring the Brabantish and Limburgish areas. Note that the term Flemish is itself ambiguous, because it is simultaneously used to refer to the Standard Dutch ‘accent’ in Flanders as well as the regiolect and traditional local varieties, but we focus on the latter two.

Flemish dialects followed the same pattern as Low Franconian dialects well into the development of Middle Dutch, although at that time the similarity to Zeelandic was much larger. Relatively little is known about these dialects between the Middle Ages and when Belgium became independent in 1830 (due to extremely low prestige during that period), but we can assume that it was during this time that Flemish split off from the formerly Zeelandic-Flemish group. By the time Flemish was politically strong again, i.e. the late 19th and early 20th century, there was so much enthusiasm for the standardization of Dutch by the Flemish Movement to counter the influence of the French language that Flemish dialects presumably started to level out, which is indeed observed nowadays (Vandekerckhove, 2009), as it is for Brabantish and Zeelandic under the influence of Dutch. The ‘end’ of this tumultuous history approximately coincides with the beginning of our data collection, i.e. the language laws of 1930 that made Dutch the only official language of Flanders.

Nowadays there is a more or less clear east–west divide in the Flemish dialect area, with the former being more similar to Brabantish and the latter more to Zeelandic. Taeldeman & Hinskens (2013) maintain that East Flemish is mostly an intermediate form between West Flemish and Southern Brabantish and therefore hard to define by unique features, if there are any. For both Flemish groups distinctive phonological features are monophthongization (e.g. Middle Dutch [i:s] and [hy:s] becoming [is] and [ys], compared to Standard Dutch [ɛis] and [hœys]) and the preservation of historical word endings from Middle Dutch (e.g. [-ə] and [-ən], which seem to have lost most of their grammatical function).

2.4.3 Frisian

Lastly, there is the (West) Frisian language, which is spoken in the province of Friesland. This is likely the language group that is the most different from Standard Dutch, because it historically followed a distinct and unique language trajectory that made it closer to English than to Dutch, and it was spoken along the entire North Sea coast (as Old Frisian). This area has reduced significantly nowadays, although there are still numerous varieties within the three main dialect groups: Clay Frisian in the northwest, Wood Frisian in the southeast, and Southwestern Frisian (cf. Hoekstra et al., 2003). Differences between these will mainly be salient to native speakers, because in dialectometric analysis these are typically clustered together (e.g. Nerbonne et al., 1999), and mutual intelligibility is high. Prolonged contact with the Dutch language varieties has resulted in some reduction of unique language features, but Frisian is still clearly distinct on every linguistic level. In fact, Van Bezooijen & Gooskens (2005) found that mutual intelligibility between Dutch and Afrikaans speakers was higher than between Dutch and Frisian speakers, even though Afrikaans is not spoken in the same language area and language contact is presumably virtually absent. Dutch speakers also find it much easier to understand Low Saxon than Frisian (Van Bezooijen & Van den Berg, 1999; they respectively translated 94% and 58% of the words correctly), which moreover shows Frisian’s relative distinctiveness in the language area. We omit the phonological comparison to Standard Dutch here, since this would require a separate study (but see e.g. Sipma, 1913).

Again, it is important to take into account the role the language plays for its speakers. It can be argued that Frisian exists in a more diglossic than diaglossic language situation (cf. Auer, 2005), meaning that the language can be the language of preference in specific social situations instead of Standard Dutch. Ytsma (2006) reports that the majority of students even exclusively spoke Frisian in the home with their parents. Frisian therefore occupies a fundamentally different role in the everyday life of its speakers compared to the other dialects. This suggests that Frisian may be less susceptible to Standard Dutch influence than other varieties are.

Hudson (2002) questions to what degree the linguistic situation is fully diglossic, however, because in a diglossic situation the language varieties are in a complementary functional distribution, but Frisian and Standard Dutch seem to be competing for the same domains of use in some cases. On the other hand, most dialects in the Dutch area are used exclusively in personal or otherwise specific domains of use, while Frisian is also taught in schools and used by official public institutions in standardized form. The latter has mainly become possible due to active and intense campaigning of the Frisian population in the 20th century.


All in all, it is likely that Frisian nowadays increasingly occupies a typical dialect role as opposed to the native language role for many of its speakers, but its history and the strong language attitudes of its speaker population suggest that Frisian is currently able to resist Standard Dutch influence relatively well. This is likely to be especially visible in our data considering that we are dealing with data from the 20th century, as it seems that intergenerational transmission has diminished only recently and only slightly (cf. Ytsma, 2006; Driessen, 2012).

This section about the language area has supplied much information, which may benefit from some synthesis in order to be clear, so the information is summarized in Table 2.1.

Table 2.1

A summary of properties associated with the language varieties.

Language group  Support from speaker population  Periphery from Standard Dutch  Phonological differences  Regiolect formation
Low Saxon       Weak                             Far                            Many                      Possibly
Zeelandic       Moderate                         Close                          Few                       Yes
Brabantish      Moderate                         Close                          Few                       Yes
Limburgish      Strong                           Far                            Many                      Yes
Flemish         Strong                           Relatively far                 Moderate                  Yes
Frisian         Strong                           Far                            Many                      No

2.5 Research questions

Recall from the beginning of this thesis that the original focus was to be more narrow, i.e. only focusing on the Low Saxon language area. One may wonder what makes this area a more interesting topic than the other varieties in the Netherlandic language area. Some indications have already been mentioned above. These varieties are spoken across a large part of the language area, i.e. across at least five out of twelve Dutch provinces, but their position in the language community seems to be relatively weak. Intergenerational transmission is low, and although language attitudes are positive, there are relatively few cultural preservation efforts given the large speaker population. This leaves the language group vulnerable to external influence from Standard Dutch, which is an interesting process to quantify using dialectometric analysis.

Low Saxon varieties exhibit a few other peculiarities that other language varieties in the Netherlandic language area do not, such as undergoing phonetic change that makes the language more dissimilar from Standard Dutch. An interesting finding is that in the local dialect of Winterswijk (in the province of Gelderland) there is a set of ongoing changes, such as local vowel inventories becoming more similar to the neighboring northern and western varieties, which makes neighboring local varieties more similar to each other (but not to Standard Dutch). These inter-varietal convergence effects are even found across the German border (Smits, 2005, 2011).


In the province of Groningen, we observe yet another pattern: there is an ongoing change from the diphthong [aɪ] to [ɔɪ] (Bloemhoff & van der Kooi, 2008, p. 162). The local Low Saxon word for ‘one’ (written as either ‘ain’ or ‘aine’) is increasingly pronounced as [ɔɪn(ə)], even by older generations of speakers. It is interesting that the spelling has not caught up with this ongoing change, which is an indication of its relative novelty. This change is divergent both from Standard Dutch and from surrounding varieties (to our knowledge), which is even more surprising than finding convergence between dialects.


Figure 2.1

Convergence and divergence based on phonetic patterns between neighboring dialects. Blue lines indicate divergence. Red lines indicate convergence.

In light of these different patterns, it is interesting to turn our attention to a recent apparent-time based study of language change for the Netherlandic language area (Heeringa & Hinskens, 2015), of which the results are partially reprinted in Figure 2.1. In this paper phonetic dialect data in the Netherlands from around the 2010s are analyzed. We observe both convergence and divergence between dialects in the Low Saxon area during this time period, but upon closer inspection of the results there are seemingly more prominent divergent patterns than convergent ones. The main conclusion of Heeringa & Hinskens (2015) is, however, that the dialects converge towards Standard Dutch on the aggregate level, and that the overall change is small (i.e. the average change across all dialects is 13.3%).

The patterns that we find so far are confusing. On the one hand the main pattern seems to be one of convergence to Standard Dutch, but on the other hand we find patterns away from the standard variety (or even from neighboring varieties and therefore likely to be internal). A possible explanation is that regiolect formation is taking place for the Low Saxon varieties, which would be in line with the patterns that are reported in the literature for Limburgish, Zeelandic, Brabantish, and Flemish. However, Auer (2018) states that regiolect formation (when a prestigious standard variety is present in the language area) is typically a combination of convergence between dialects on the one hand and convergence to the standard variety on the other. Heeringa & Hinskens (2015) found that divergence from Standard Dutch is typically paired with divergence between dialects, and that asymmetric convergence–divergence patterns are almost never found.

We hypothesize here that the balance between the convergent and divergent processes is unequal and that it is this imbalance that is the main determinant of pronunciation change. While we do observe noticeable patterns away from Standard Dutch for the Low Saxon varieties, we assume that these are relatively minor compared to the overall convergence towards Standard Dutch. It is likely that the convergence patterns between dialects are more common than the dialect-internal ones (perhaps to the point that we should treat the latter group of patterns as noise), so we are then left with the typical situation of regiolect formation. We (partially) test these assumptions in this thesis by investigating whether the effect on an aggregate level is one of convergence to Standard Dutch.

Assuming that most pronunciation change is towards Standard Dutch, we can also expect that the rate of pronunciation change we observe in our aggregate-level analyses mainly reflects the rate of change of these convergence patterns (as opposed to divergence patterns). In other words, we expect Low Saxon varieties to show a possibly higher rate of change than the other language varieties in the Netherlandic language area, because both the convergence between dialects and the convergence towards Standard Dutch accelerate the overall regiolect formation. This is admittedly a strong assumption, because the regiolect formation for other dialect groups is also relatively novel. It is therefore also possible that Low Saxon areas show similar rates of change to the areas for which regiolect formation is taking place. In either case, we do not expect the Low Saxon area to show a lower rate of change than Friesland, because we have no evidence of regiolect formation for the Frisian varieties (and we do not expect this to happen based on the Frisian speaker population and its relative political power).

In sum, our research questions and corresponding hypotheses are these:

1. Do the Low Saxon varieties show a different rate of pronunciation change than other language areas? Our hypothesis is that they show a higher rate of change, but the possibility of a similar rate of change to language areas for which regiolect formation is known to be taking place is not ruled out.

2. Do the Low Saxon varieties mainly converge towards Standard Dutch or diverge from it? Our hypothesis is that the Low Saxon varieties mainly converge towards Standard Dutch, as is suggested by the existing literature.
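As a rough illustration of how the second question could be made operational, the sketch below compares a locality's phonetic distance to Standard Dutch at an earlier and a later recording and labels the direction of change. This is a minimal sketch under stated assumptions, not the procedure used in this thesis: the function names, the `tolerance` parameter, and the toy distances are all hypothetical.

```python
def direction_of_change(dist_old, dist_new, tolerance=0.0):
    """Label the change in distance to Standard Dutch for one locality.

    dist_old / dist_new are assumed (hypothetical) phonetic distances to
    Standard Dutch at the earlier and the later recording.
    """
    change = dist_new - dist_old
    if change < -tolerance:
        return "convergence"   # the dialect moved closer to the standard
    if change > tolerance:
        return "divergence"    # the dialect moved away from the standard
    return "stable"


def mean_change(pairs):
    """Average change over (dist_old, dist_new) pairs; negative = net convergence."""
    if not pairs:
        raise ValueError("need at least one locality")
    return sum(new - old for old, new in pairs) / len(pairs)


# Hypothetical toy distances for three localities (not taken from the datasets).
toy_pairs = [(0.5, 0.25), (0.25, 0.5), (1.0, 0.5)]
labels = [direction_of_change(old, new) for old, new in toy_pairs]
aggregate = mean_change(toy_pairs)
```

With such a measure, individual localities may diverge while the aggregate value is still negative, which is the kind of imbalance between convergent and divergent patterns hypothesized above.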


3 Data

In this section we summarize how, when, and with which intentions the phonetically transcribed datasets were collected. We also note possible shortcomings of these datasets, and how they differ in this regard.

3.1 Reeks Nederlandse Dialectatlassen (RND)

The earliest data collection of interest was initiated in 1923, when Edgard Blancquaert of Ghent University started collecting phonetic data in his birth region (Klein-Brabant). He constructed 141 short sentences in Standard Dutch, which were translated to local dialect by dialect speakers. His intention was to construct a dialect atlas to enable dialect geographical research, and the first atlas was published in 1925. The same approach was then replicated over the years for different geographical areas with many colleagues (most notably Willem Pée), and by the end of the data collection (in 1982) the full set consisted of recordings from 1956 localities and over 250,000 translated sentences. Only a subset is available in digitized form, however, so we work with data of 166 word types from 347 locations, which still yields a useful sample of the dialect continuum of interest (see Figure 3.1).

Figure 3.1

The RND locations.

Figure 3.2

The RND recording regions visualized.

Table 3.1

Summary of the RND recording regions and years (in chronological order).

Region                                                             Region number  Recording year
Klein-Brabant                                                      1              1925
Zuid-Oost-Vlaanderen                                               2              1930
Noord-Oost-Vlaanderen en Zeeuwsch-Vlaanderen                       3              1935
Zeeuwsche eilanden                                                 5              1939
Vlaamsch Brabant                                                   4              1940
West-Vlaanderen en Fransch-Vlaanderen                              6              1946
Noord-Brabant                                                      9              1952
Friesland                                                          15             1955
Antwerpen                                                          7              1958
Belgisch-Limburg en Zuid-Nederlands-Limburg                        8              1962
Oost-Noord-Brabant, de Rivierenstreek en Noord-Nederlands-Limburg  10             1966
Groningen en Noord-Drenthe                                         16             1967
Zuid-Holland en Utrecht                                            11             1968
Noord-Holland                                                      13             1969
Gelderland en Zuid-Overijssel                                      12             1973
Zuid-Drenthe en Noord-Overijssel                                   14             1982

One important shortcoming of the RND is that the total data collection spanned 59 years. The recording regions are visualized in Figure 3.2, and the corresponding years of publication are summarized in Table 3.1. There is a clear division between the recording years for the Netherlands and those for Flanders, with the latter regions being covered almost exclusively earlier than the former (with the exception of Zeeland, Brabant, and Friesland). This is potentially problematic, because Flemish regions may then inherently show more pronunciation change than Dutch regions, as more time has passed between their first and second recording years (allowing more pronunciation change to occur). This is unlikely to be the case here, because a difference of one or two decades is unlikely to significantly influence the results between regions (unless the rate of phonetic change is very high, but recall that Heeringa & Hinskens, 2015 found that phonetic change was low in the area). In order to account for this possibility, however, we will compute a correlation between the obtained phonetic distances and the recording year of the RND as a first step in the analyses. We do so before applying the phonetic inventory reduction method, in order to evaluate this possible effect as directly as possible.
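The correlation check between recording year and phonetic distance can be illustrated with a plain Pearson coefficient. The sketch below is only an assumption about how such a check might look: the text does not specify the type of correlation, and the function name and toy values are hypothetical rather than taken from the RND or GTRP.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: one RND recording year and one phonetic distance
# (earlier vs. later recording) per locality.
rnd_years = [1925, 1940, 1955, 1967, 1982]
phonetic_distances = [0.31, 0.28, 0.30, 0.27, 0.29]
r = pearson_r(rnd_years, phonetic_distances)
```

A coefficient close to zero would suggest that the recording year has little bearing on the measured distances; a significance test would still be needed before drawing conclusions.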

One further shortcoming pertains to how many dialect speakers were recruited for the data collections that we use here. In each of our datasets there is only phonetic data from one dialect speaker per locality. The phonetic data come from older male dialect speakers, who are commonly seen as exhibiting the most conservative forms of the language, especially when they are from the countryside (Chambers & Trudgill, 1980). This alleviates the problem to a certain degree, but it is still possible that the contributing dialect speaker is not representative of the local dialect, especially in localities with much language contact. The addition of a second speaker already makes it considerably easier to detect such anomalies, but this is admittedly difficult to achieve. Specifically for the RND, for example, it means that for 1956 locations a second speaker would have had to be found, which is only possible with a considerable amount of extra resources.

3.2 Goeman-Taeldeman-Van Reenen Project (GTRP)

Some specifically linguistic shortcomings of the RND were noted by dialectologists, such as the absence of morphologically varied forms on the list of items, and thus the GTRP (collected between 1980 and 1995) was constructed to improve upon the RND. The project was founded by Ton Goeman (Meertens Institute[7]), Johan Taeldeman (Ghent University), and Piet van Reenen (Vrije Universiteit Amsterdam), although many researchers from other universities and organizations were involved in the work[8]. Their main responsibilities were respectively the direction of the Dutch field work recordings, the direction of the Flemish field work recordings, and the setup of the database. The resulting data are “at the core of phonological research at the Meertens Institute” (Oostendorp, 2014, p. 1), so they are still highly valued and much in use to this day. They were mostly collected from older rural males, as was the case for the RND.

[7] A linguistic research institute founded by the Dutch Academy of Arts and Sciences. The main focus of the institute is on the Dutch language and culture, but much effort is invested in dialectology.

[8] See https://www.meertens.knaw.nl/projecten/mand/GTRPdatata.html for the complete list of contributors.


For the GTRP the same methodology was applied as for the RND, and the intended use of the data was again dialect geography, but a considerably longer questionnaire of 1876 items was constructed (note that a large part of these were single words, as opposed to sentences). Participants were recruited from 613 locations across the Netherlands and Flanders (see Figure 3.3), which means the full set consisted of over a million items (and even twice as many when articles and morphological variations are counted separately). Another improvement is that the audio recordings of the dialect speaker interviews were saved for future studies, which allows researchers to make their own transcriptions if they desire to do so.

Figure 3.3

The GTRP locations.

Again, however, only a part is available in digitized form, although considerably more than for the RND. The dataset on Gabmap[9] consists of transcriptions from the original 613 locations, but “only” a selection of 562 words. This is because we re-use the dataset from Wieling (2007), which leaves out (1) diminutive and singular nouns, (2) comparative forms of adjectives, and (3) non-1PL conjugated forms of verbs. This was done to ensure that the measured phonetic variation was not structurally obscured by the (largely phonetically similar) inflected forms of verbs and nouns, and articles or pronouns preceding nouns and verbs. This leaves out a significant amount of data, as the aforementioned intention of the GTRP was to include as much morphological variation as possible.

Recall that the RND data were collected between 1923 and 1982, which means there is a partial overlap with the GTRP recording years. Importantly, for none of the overlapping localities did the GTRP recording come before the RND recording, although in some cases the difference in recording years was very small (see the distribution in Figure 3.4), which means that little phonetic change (if any) is expected for these locations. The median time span between the RND and GTRP is 25 years, but the range for about 75% of the locations is between 15 and 45 years, so there is much variation. In the extreme cases, it may even be the difference between a time span of 3 years and 67 years. We compute a correlation between this difference in years and the pronunciation change (without application of the phonetic inventory reduction method) to assess whether this range in recording year differences significantly influences the pronunciation change values we obtain.

[9]The complete dataset is available through https://www.meertens.knaw.nl/mand/database/.

Figure 3.4

Boxplot of the difference in years between the RND and GTRP recordings.
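The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are invented toy values rather than thesis data, the function names are hypothetical, and the text does not say which correlation coefficient was used, so the Pearson coefficient is assumed here.

```python
from math import sqrt
from statistics import mean, median

# Hypothetical toy records: (location, RND year, GTRP year, pronunciation change).
# All values are invented for illustration; they are not thesis data.
records = [
    ("A", 1925, 1992, 0.30),
    ("B", 1950, 1985, 0.22),
    ("C", 1960, 1985, 0.18),
    ("D", 1975, 1990, 0.15),
    ("E", 1980, 1983, 0.05),
]

def year_differences(rows):
    """Return the GTRP year minus the RND year for every location."""
    return [gtrp - rnd for _, rnd, gtrp, _ in rows]

def pearson_r(xs, ys):
    """Pearson correlation of two equally long sequences (assumed measure)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

diffs = year_differences(records)
changes = [change for *_, change in records]

# Sanity check from the text: no GTRP recording precedes its RND recording.
assert all(d >= 0 for d in diffs)

median_diff = median(diffs)    # central tendency of the time spans
r = pearson_r(diffs, changes)  # association between time span and change
```

A coefficient close to zero would suggest that the spread in recording-year differences has little influence on the measured pronunciation change. A rank-based coefficient could be substituted if the relation is not expected to be linear.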

Figure 3.5


Note that in order to do a valid diachronic analysis (as opposed to a synchronic analysis), it is necessary to select words (as well as locations) that overlap between the datasets. Without this restriction it is impossible to delineate the effect that different word types (or specific locations) have on the patterns of change we may find, as it is known that speech change may progress differently from word to word depending on lexical characteristics (such as word frequency; see e.g. Bybee, 2002). We can therefore only use a relatively small subset of the available words in each dataset. The 61 words that overlap between the datasets are shown in Table 3.2. Fortunately, this is more than 50 words, which is a reliable threshold for finding distinct dialect areas[10]. Similarly, the locations should overlap in order to properly investigate and control for the effect of geography, so only the 192 overlapping locations are used (see Figure 3.5).

Table 3.2

Overlapping words between RND and GTRP.

bakken    dun       kaas      op        ver
bier      duwen     komen     potten    vier
binden    eieren    koud      rijp      vijf
blauw     flauw     krijgen   saus      voor
brengen   gaan      krom      sneeuw    vuur
buigen    geld      laten     spannen   weg
doen      geweest   licht     springen  wijn
dopen     goed      maart     stenen    zee
dorsen    gras      melk      tegen     zes
dorst     groen     moe       twee      zijn
drie      hebben    nog       vader     zuur
drinken   hooi      ook       veel      zwemmen
droog
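The restriction to overlapping words and locations described above amounts to a pair of set intersections. The following Python sketch shows one way to do it. The dictionary layout, the function names, and the miniature data are all assumptions made for illustration, and requiring a word to be attested at every shared location is a simplification that the thesis does not state.

```python
# Hypothetical miniature datasets: {location: {word: transcription}}.
# Locations, words and transcriptions are placeholders, not thesis data.
rnd = {
    "Groningen": {"melk": "mɛlk", "bier": "biːr", "huis": "hyːs"},
    "Gent": {"melk": "mɛlək", "bier": "biːr"},
    "Urk": {"melk": "mɑlk"},
}
gtrp = {
    "Groningen": {"melk": "mɛlk", "bier": "bir", "kaas": "kaːs"},
    "Gent": {"melk": "mɛlk", "bier": "bir", "kaas": "kɔːs"},
    "Delft": {"melk": "mɛlk"},
}

def shared_locations(a, b):
    """Locations present in both datasets, in alphabetical order."""
    return sorted(a.keys() & b.keys())

def shared_words(a, b, locations):
    """Words attested in both datasets at every shared location (assumption)."""
    words = None
    for loc in locations:
        here = a[loc].keys() & b[loc].keys()
        words = here if words is None else words & here
    return sorted(words or [])

def restrict(dataset, locations, words):
    """Keep only the selected locations and words of one dataset."""
    return {loc: {w: dataset[loc][w] for w in words} for loc in locations}

locations = shared_locations(rnd, gtrp)
words = shared_words(rnd, gtrp, locations)
rnd_subset = restrict(rnd, locations, words)
gtrp_subset = restrict(gtrp, locations, words)
```

After this step both subsets contain exactly the same location and word keys, so any per-location comparison between them is not confounded by differences in which words were sampled.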

3.3 From Dialect to Regiolect project (DiaReg)

A third dataset, the From Dialect to Regiolect project (Heeringa & Hinskens, 2015), covers the Netherlandic language area at a later point in time. We had not initially planned to include this dataset in the analysis, but including it allowed us to evaluate our RND–GTRP comparison findings under settings that are in some respects more optimal (what counts as more optimal is explained further below). To do so, we compare the data from a subset of GTRP locations to the DiaReg data for these locations.

Note that the intentions and methodology for this data collection were notably different from those for the RND and GTRP. The principal investigators, Wilbert Heeringa (University of Groningen) and Frans Hinskens (Meertens Instituut), constructed these data specifically to investigate whether there was evidence for regiolect formation across the Netherlandic area, both in speech production and perception, across linguistic levels of representation[11] (e.g. the lexical, phonological, and phonetic level). We only investigate possible regiolect formation on the phonetic level here, however.

[10]We recently tested this within Speech Lab Groningen, but more work is necessary to determine optimal “conditions” for this type of analysis, and therefore these results are not yet published. In terms of sheer size, however, 50 words can be taken as a safe baseline. When fewer words are used, results may be similar to those obtained with more than 50 words (as was the case during testing), but they become more sensitive to outliers and noise.

Participants in the DiaReg study took part in small groups of (at least) two. Each group was presented with stills of a short silent movie as well as a transcript of the movie in Standard Dutch. These steps ensured that the results per group were relatively comparable. Each group was then required to discuss and describe what happened in the movie and to construct a transcript in the local dialect. Each member read the transcript aloud and the “best one” (by consensus) was transcribed for the final dataset, yielding 90 different word types across 86 localities.

An important improvement over the RND and also the GTRP is that these data were collected in a much shorter time period, i.e. between 2008 and 2011. The RND data in particular were collected over a much greater time span, which has the potential to cause problems in the analysis (as mentioned in the previous sections). For the GTRP the range in recording years is smaller than for the RND, as the vast majority was collected between 1980 and 1995, but this is still five times larger than the DiaReg range in recording years. By applying the same methods in the GTRP–DiaReg comparison, we are able to (partially) verify whether our results from the RND–GTRP comparison are sensible, and whether our methodology is useful in both scenarios.

Figure 3.6

The overlapping selected locations between the DiaReg and GTRP.

Again, it is required that the words overlap, but this time between the GTRP and DiaReg. It would be even more ideal to have overlap between all

[11]See https://www.nwo.nl/en/research-and-results/research-projects/i/13/3213.html
