Research internship report Hedwig Sekeres

Academic year: 2021

Report of Internship at Centrum Groninger Taal en Cultuur

February 2020 – June 2020

Hedwig Sekeres

S2992604
h.g.sekeres@student.rug.nl

Supervisors:
RuG: prof. dr. Martijn Wieling
CGTC: prof. dr. Goffe Jensma


Table of contents

1 Introduction
2 Description of placement providing organisation
2.1 CGTC
2.2 Speech Lab Groningen
3 Position / description of tasks
3.1 Preceding the placement
3.2 Student’s position in organisation
3.3 Tasks
3.3.1 Literature study
3.3.2 Method design
3.3.3 CETO approval
3.3.4 Data collection
3.3.5 Data processing
3.3.6 Programming
3.3.7 Results & analysis
3.3.8 Outreach
3.4 Learning outcomes
4 Evaluation of placement
4.1 Place in programme
4.2 New knowledge and skills
4.2.1 Knowledge
4.2.2 Hard skills
4.2.3 Soft skills
4.3 Learning outcomes
4.4 Supervision
4.5 Future career
4.6 What comes next
5 Conclusion


1 Introduction

With this report, I conclude my research internship at the Centrum Groninger Taal & Cultuur (CGTC) and reflect on my experiences. The internship is a compulsory component of the Language and Cognition track of the Research Master in Linguistics in Groningen and has a workload of either 20 ECTS (560 hours) or 25 ECTS (700 hours). I chose the latter option because my initial internship plan also included a rather substantial data collection, which would take up a lot of time. Furthermore, in order to keep a good balance between the internship, my student assistantship with Martijn and a healthy life outside of work, I decided to work 35 hours a week over a longer period instead of the usual 40 hours a week.

I expect that the COVID-19 pandemic will play a substantial role in many of the reports submitted this semester, and this report is no different. After 1.5 months of working on my initial project, I received an email from the Speech Lab advising students doing an internship or research training that involved interaction with participants to consider switching projects, as they expected that data collection would soon become impossible. A few days later, the faculty proved them right, and I had to start working on a project using existing data. Luckily, Martijn immediately proposed a project involving two datasets that were easily obtainable. As I was very excited about my initial project, having to switch came as a bit of a disappointment, but I am still very happy with the results and can safely say that I have learned a lot during the past months. This internship has given me insights that I could not have gained from any of the regular ReMa courses, as it gave me a better impression of the reality of doing research in academia: both the good and the bad.

The first project described in this report concerns the dialects of Groningen and Drenthe. Its goal was to investigate the relationship between the actual phonetic differences between these dialects and the phonetic differences perceived by their speakers, and to test whether spelling influences pronunciation. The second project concerns the Low Saxon and Frisian dialects spoken in the Netherlands. Its goal was to investigate dialect change in these areas in both real time and apparent time, using datasets that were collected at different times.

This report consists of a brief introduction to the CGTC, a description of my tasks as an intern for both research projects I worked on, and an assessment of the learning outcomes. The most substantial part of the report is devoted to the evaluation of the internship, in which I reflect on the skills and knowledge I have gained and what this will mean for my future career.


2 Description of placement providing organisation

2.1 CGTC

The Centrum Groninger Taal & Cultuur (CGTC) was founded in 2018 as a merger of the former Bureau Groninger Taal en Cultuur and the Huis van de Groninger Cultuur. The Bureau was the RuG’s centre of expertise on the dialects of Groningen and the cultural expressions connected to these dialects. It focused not only on research, but also on communicating research results to the broader public (Bloemhoff et al., 2008). The Huis organised events and provided information on Groningen’s cultural heritage, with less focus on research and linguistics.

The current CGTC has two departments: one for public activities (CGTC Publieksactiviteiten) and one for science and research (CGTC Wetenschap & Onderzoek). To the broader public, the CGTC is probably best known for the events it organises, most importantly the Dag van de Grunneger Toal (DvdGT, Day of the Groningen Language). At this annual event, musicians, authors, theatre makers, linguists and others present their recent creations and findings related to the language (and culture) of Groningen to the public. Other events organised by the CGTC include the Dag Groninger Geschiedenis (Day of Groningen History) and a writing contest.

The research projects conducted at the CGTC are of interest to speakers of Gronings and often make use of citizen science and a bottom-up approach. The goal of these projects is usually not only to collect data, but also to help preserve the Groningen language and to inspire speakers to think about the language or use it more frequently. One of the current research projects, Stemmen van Grunnen (Voices of Groningen), consists of an app that asks participants to pronounce a few Dutch words in their dialect, after which the app ‘guesses’ where the speaker comes from. The data collected through this project are stored in Woordwaark, a publicly accessible online database of spoken and written Gronings. Another project run by the CGTC is Van Old noar Jong (From Old to Young), in which older speakers of Gronings are asked which words and phrases they think young inhabitants of the province should learn. These words and phrases are then integrated into a few short stories, which are incorporated into a serious gaming app for primary schools. The CGTC pays a lot of attention to computational techniques and novel methods in dialect research and preservation, such as automatic speech recognition and the development of text-to-speech systems. All of these projects have clear societal relevance in addition to their scientific importance.

Some of the prominent researchers at the CGTC are prof. dr. Martijn Wieling, prof. dr. Goffe Jensma and dr. Wilbert Heeringa. Martijn is professor by special appointment of Low Saxon / Groningen Language and Culture and an expert on the use of computational techniques in language research. He is also my RuG supervisor. Goffe is professor of Frisian Language and Literature and an expert on cultural expressions in minority languages. He is my supervisor from the placement providing organisation. Wilbert is an expert on dialectometry and phonetics and works both for the CGTC and the Fryske Akademy.

2.2 Speech Lab Groningen

The Speech Lab Groningen (SLG) is one of the many labs that are part of the Center for Language and Cognition Groningen (CLCG). The lab is headed by prof. dr. Martijn Wieling and further consists of several PhD and ReMa students at the RuG. Research at the Speech Lab is mostly concerned with two disciplines: articulatory research (for example using electromagnetic articulography or ultrasound) and dialect research (using innovative computational techniques). Every other week, the lab members meet to give updates on their work and sometimes present their projects. During these meetings, lab members can ask questions or give feedback. Since May, there has also been a reading group, in which articles related to the members’ subdisciplines are discussed so that members can expand their knowledge of topics they are less familiar with.


3 Position / description of tasks

3.1 Preceding the placement

When I started looking for a position, I already knew that I wanted to do my placement in Groningen, as I am interested in the Low Saxon dialects and would like to continue doing research in that direction. I had also enjoyed working with the electromagnetic articulography (EMA) device before, so I planned to use it during the internship as well. A few days after I started looking for projects, Martijn emailed me with an idea for a project that he and Goffe had: I would collect spelling and pronunciation differences in the Low Saxon language area and then use the EMA to assess whether spelling influenced pronunciation. In the weeks leading up to the internship, the project evolved. Because it would probably be hard to get dialect speakers from many different areas to come to Groningen (the EMA is not very mobile), we decided that I would not be working with the EMA after all. I also wanted to put less focus on spelling, so that in the end I would look at phonetic differences, the influence of a few spelling contrasts and, most importantly, the relationship between the actual phonetic differences between dialects and the differences perceived by their speakers.

3.2 Student’s position in organisation

For research internships, there are usually two options: either working on your own project or joining a larger existing project. I chose the former, so I mostly worked independently on my own project. Although I very much enjoyed the freedom to take my research in whichever direction I preferred, I do think it is a pity that, as a result, I did not have much contact with other researchers at the CGTC. I did attend regular Speech Lab meetings, in which lab members gave feedback on each other’s ideas. Outside of these meetings I was also in touch with SLG members, and we frequently helped each other with questions.

3.3 Tasks

Below, I summarise the tasks I worked on during this internship. The first two subsections cover both projects, the third and fourth concern only the first project, and the fifth, sixth and seventh concern the second project.

3.3.1 Literature study

For the first project, I started by reading up on methods used in dialectology (Chambers & Trudgill, 1980), and in perceptual dialectology in particular (Preston, 1989, 2018). Perceptual dialectology is a subfield of dialectology that is concerned with the ways in which non-linguists perceive dialects. Questions asked in perceptual dialectology often relate to the dialect differences that speakers consider relevant, which dialects they feel are similar to or different from their own, and whether they can identify these dialects accurately.

One effective (but relatively uncommon) method of eliciting dialect differences that are salient to speakers is the imitation task. In imitation tasks, participants are asked to imitate the language (usually the dialect) of a speaker group they do not belong to. The nature of these imitations can tell us which sounds or words the speaker considers characteristic of that language (Palander & Riionheimo, 2018; Preston, 1989). A subsequent comparison of these imitations to both the speaker’s native language or dialect and the language or dialect they are imitating can then tell us how accurate these linguistic stereotypes are.

Another method used in perceptual dialectology is dialect identification. In her study on Norwegian dialects, Gooskens (2005) asked participants to listen to recordings of dialect speakers, indicate on a map where they thought each speaker came from, and rate how similar or dissimilar the dialect was to their own. Studies like these show which dialect boundaries are perceptually important to speakers, and their results can be compared to dialect boundaries drawn on the basis of acoustic information to see whether the speakers’ intuitions are correct.

Finally, I started looking into the specific situation of dialect boundaries and sound differences in Drenthe and Groningen. Based on the literature (e.g. Bloemhoff et al., 2008, 2013, 2019; Kloeke, 1955; Veldman, 1992), I selected several phonological and phonetic features of interest for the area under investigation. These included the absence or presence of final schwa, the absence or presence of voiced fricatives in the onset, aspiration of plosives in the onset, and the vowels in the dialect translations of the Dutch words boek, water, groen and zout, which represent important historical sound changes.

For the second project, I fortunately already had much of the required background knowledge about the Low Saxon language area. I did read up on the differences between the Frisian and Low Saxon speaker populations and found that many more people in Frisia speak the local language than in the Low Saxon language area. Furthermore, Frisian speakers have a more positive attitude towards their language than Low Saxon speakers (e.g. Goeman & Jongenburger, 2009; Hilton & Gooskens, 2013). I did not read up on specific phonological features of Frisian, because time was limited and because individual features are less relevant for dialectometric studies.

As the second project made use of dialectometric techniques, I also had to read up on appropriate methods for investigating large datasets, how to calculate distances and which statistical tests to use (e.g. Heeringa & Nerbonne, 2013; Wieling et al., 2011). Particular attention was paid to the calculation of Levenshtein distances for dialectometric research. The Levenshtein distance is a string distance metric: it counts the insertions, deletions and substitutions needed to transform one word into another, taking the least costly sequence of operations (Heeringa, 2004).
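To make the definition concrete, here is a minimal Python sketch of the unweighted Levenshtein distance, computed with the usual dynamic-programming table kept one row at a time. It treats every character as one segment and every operation as costing 1; it is an illustration of the metric, not the implementation used in this project.

```python
def levenshtein(a: str, b: str) -> int:
    """Smallest number of insertions, deletions and substitutions turning a into b."""
    # prev[j] holds the distance between the current prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute x by y (free if identical)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("boek", "buuk"))  # 2
```

Keeping only the previous row means memory grows with the length of the second word only, which is enough because each cell depends only on its left, upper and upper-left neighbours.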

The Levenshtein distance can be modified by assigning weights to each operation, so that replacing a sound with a more similar sound (such as [o] with [ɔ]) results in a lower cost than replacing it with a more different sound (such as [o] with [i]). One way to calculate these weights is with pointwise mutual information (PMI), an association measure from information theory. Sounds that frequently correspond to each other in a corpus are considered associated and are assigned a lower cost than sounds that are less associated (Wieling, Margaretha, et al., 2011).
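The idea of PMI-based weights can be sketched in a few lines of Python. This is a simplified, single-pass illustration under stated assumptions: it starts from an already aligned list of sound pairs (the published procedure alternates between aligning and re-estimating PMI), and the rescaling of PMI into costs between 0 and 1 is an illustrative choice. The function names `pmi_costs` and `weighted_levenshtein` are hypothetical.

```python
import math
from collections import Counter


def pmi_costs(aligned_pairs):
    """Map each aligned (sound_a, sound_b) pair to a substitution cost in [0, 1].

    PMI(x, y) = log2(p(x, y) / (p(x) * p(y))). Strongly associated pairs (high
    PMI) get low costs; the rescaling to [0, 1] is an illustrative assumption.
    """
    pair_counts = Counter(aligned_pairs)
    sound_counts = Counter()
    for a, b in aligned_pairs:
        sound_counts[a] += 1
        sound_counts[b] += 1
    n_pairs = len(aligned_pairs)
    n_sounds = 2 * n_pairs
    pmi = {}
    for (a, b), count in pair_counts.items():
        p_xy = count / n_pairs
        p_x = sound_counts[a] / n_sounds
        p_y = sound_counts[b] / n_sounds
        pmi[(a, b)] = math.log2(p_xy / (p_x * p_y))
    lo, hi = min(pmi.values()), max(pmi.values())
    span = (hi - lo) or 1.0  # avoid dividing by zero if all PMI values are equal
    return {pair: (hi - value) / span for pair, value in pmi.items()}


def weighted_levenshtein(a, b, costs, indel=1.0):
    """Levenshtein distance with sound-specific substitution costs."""
    def sub(x, y):
        if x == y:
            return 0.0
        # Pairs never seen in the alignments are treated as maximally different.
        return costs.get((x, y), costs.get((y, x), 1.0))

    prev = [j * indel for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * indel]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + indel,            # delete x
                            curr[j - 1] + indel,        # insert y
                            prev[j - 1] + sub(x, y)))   # substitute x by y
        prev = curr
    return prev[-1]


# Toy alignments, not project data: [o] often corresponds to [ɔ], rarely to [i].
pairs = [("o", "ɔ")] * 3 + [("o", "i")] + [("i", "e")] * 3 + [("a", "a")] * 3
costs = pmi_costs(pairs)
print(weighted_levenshtein("bok", "bɔk", costs))  # 0.0
print(weighted_levenshtein("bok", "bik", costs))  # 1.0
```

With these toy counts, [o]:[ɔ] is the most strongly associated pair and therefore costs nothing to substitute, whereas [o]:[i] is the least associated and costs as much as an insertion or deletion.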

3.3.2 Method design

Based on the literature mentioned in the previous section, I devised a methodology for the first project. To investigate the actual phonetic differences between the dialects of Drenthe and Groningen, participants would first perform a picture naming task, so that any influence of spelling would be avoided. They would also read aloud a short text containing most of the sounds of the Low Saxon dialects, similar to the Please Call Stella text from the Speech Accent Archive (Weinberger, n.d.). Next, the influence of spelling on pronunciation would be tested by presenting participants with word lists containing words with the most salient spelling difference between the two varieties (‘oa’ in Gronings and ‘ao’ in Drents for the /ɔː/ sound), as well as filler words. After these pronunciation tasks, a semi-structured interview would be conducted, in which participants would be asked several questions about their own dialect and the surrounding dialects they were familiar with. The goal of this interview was to find out to what degree participants believed surrounding dialects differ from their own, in what ways they differ and which dialects are most different. This would be done by giving them a map showing only municipal borders and some of the main villages (see figure 1), on which they could indicate the borders of their own dialect and of the surrounding dialects.

Fig. 1: Empty map for map task

Subsequently, participants would be asked how exactly the dialects they distinguished differ from their own. This would be done both directly and by asking participants to imitate what some of the words from the previous tasks would sound like in the other dialects. In a later session (possibly online), participants would be presented with recordings of other participants and asked to rate on a scale how similar or dissimilar these dialects are to their own. They would also indicate on a map where they thought each speaker came from.

For the second project, the method was more straightforward. I compared an older dataset to a newer one by transcribing the items that were present in both datasets. The more recent dataset also contained apparent time data, in the form of two speaker groups (older men and younger women). I calculated the Levenshtein distances of the words between the old dataset and the older men in the new dataset (a real time comparison) and between the older men and the younger women in the new dataset (an apparent time comparison). Whereas the first project consisted of several experiments and therefore needed a more elaborate methodology before data collection could start, the data for the second project had already been collected. This meant that I was not as ‘prepared’ for these data as I would have been for data I had collected myself, and that many decisions about data processing and analysis were made as problems arose, rather than in advance. For more details on these decisions, see sections 3.3.5, 3.3.6 and 3.3.7.


3.3.3 CETO approval

As I would be working with participants for the first project, I needed approval for my research from the Commissie Ethische Toetsing Onderzoek (research ethics committee, CETO). For this, I had to provide a few example stimuli, an information brochure, an informed consent form and the debriefing that would be given to participants after the experiment. I also had to find out how data could be stored safely and in accordance with the GDPR. Teja Rebernik provided many useful comments on the forms I handed in, for which I am very grateful.

3.3.4 Data collection

In order to find participants in Drenthe for the first project, I contacted Huus van de Taol (HvdT), an organisation akin to the CGTC but for the local dialects in Drenthe rather than Groningen. HvdT has an extensive network of dialect speakers, and through them I hoped to reach potential participants for the project. In return, I planned to write a popular science article about the results of the study, which could be published on the HvdT and CGTC websites. Unfortunately, at this point it became clear that data collection was no longer an option and I abandoned the project.

3.3.5 Data processing

For the new project, I worked with two large dialect datasets. The first was from the Goeman-Taeldeman-Van Reenen project (GTRP) (A. Goeman & Taeldeman, 1996; van den Berg, 2003) and the second from the DiaReg project (Heeringa & Hinskens, 2014). The GTRP consists of data collected between 1980 and 1995, mostly among elderly speakers, at 822 locations in the Netherlands and Belgium. The DiaReg data were collected between 2008 and 2011 at 86 locations in the Netherlands and Belgium and come from two speaker groups: older men and younger women. In order to make a fair comparison between the datasets, I selected all locations within the Low Saxon and Frisian language areas in the Netherlands that were present in both datasets (n = 27, see figure 2) and all words that occurred in both (n = 47). Some of these words were excluded because they were missing too often in either dataset, because they were not interesting from a phonetic perspective (e.g. ‘de’), or because their phonetic context caused too much reduction, leaving 36 words.
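The selection of overlapping locations and words can be sketched as set intersections followed by a missing-data filter. This is a hypothetical illustration: the nested-dictionary layout, the function name `select_overlap` and the `max_missing` threshold are assumptions (the report does not state a threshold), and exclusions made on phonetic grounds are modelled as a hand-made list.

```python
def select_overlap(gtrp, diareg, max_missing=0.2, manual_exclusions=frozenset()):
    """Return the shared locations and the shared words that are rarely missing.

    Both datasets are assumed to be dicts of the form
    {location: {word: transcription or None}}.
    """
    locations = sorted(set(gtrp) & set(diareg))
    if not locations:
        return [], []
    vocab_gtrp = set().union(*(gtrp[loc].keys() for loc in locations))
    vocab_diareg = set().union(*(diareg[loc].keys() for loc in locations))
    kept = []
    for word in sorted(vocab_gtrp & vocab_diareg):
        if word in manual_exclusions:  # e.g. phonetically uninteresting items
            continue
        # Count locations where the word lacks a transcription in either dataset.
        missing = sum(
            1 for loc in locations
            if not gtrp[loc].get(word) or not diareg[loc].get(word)
        )
        if missing / len(locations) <= max_missing:
            kept.append(word)
    return locations, kept
```

For example, with toy data in which ‘water’ lacks a GTRP transcription at one of two shared locations, `select_overlap(gtrp, diareg, max_missing=0.0, manual_exclusions={"de"})` keeps only the fully attested words other than ‘de’.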


Fig. 2: Overlapping locations between GTRP and DiaReg in the Frisian and Low Saxon language areas

To limit the influence of having different transcribers (see Hinskens & van Oostendorp, 2006), I made new transcriptions of all of these words. Following Heeringa and Hinskens (2014), I transcribed per word instead of per recording, in order to increase consistency per item. For this, I had to collect the timestamp at which each selected word occurred, after which a script written by Raoul Buurke automatically extracted three seconds of audio from that point onwards. Collecting the timestamps was a lot of work, because the order in which words occur and the amount of irrelevant speech in between are very inconsistent in the GTRP. This part of the internship was very unrewarding and at some points made me doubt my love for doing research. When making the transcriptions, I usually listened to each instance of a word and then immediately transcribed it. When I was not entirely sure which symbol best represented a sound, I skipped that instance and returned to it later, but always while I was still transcribing the same item.
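The script by Raoul Buurke is not reproduced here. As a rough, hypothetical illustration of the same idea, the following sketch uses Python’s standard `wave` module to copy three seconds of audio, starting at a given timestamp, into a new file. It assumes uncompressed WAV recordings, which may not match the actual format of the GTRP and DiaReg files, and the name `extract_clip` is invented for this example.

```python
import wave


def extract_clip(src_path, dst_path, start_s, duration_s=3.0):
    """Copy duration_s seconds of audio starting at start_s into a new WAV file."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        # Clamp the start position so that a timestamp past the end does not fail.
        start = min(int(start_s * rate), src.getnframes())
        src.setpos(start)
        # readframes returns fewer frames if the recording ends within the window.
        frames = src.readframes(int(duration_s * rate))
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # same channels, sample width and frame rate
        dst.writeframes(frames)  # the frame count in the header is corrected on write
```

Working per timestamp like this makes it easy to gather all instances of one word into a single folder, which is what per-word transcription requires.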

The transcription system that I used was mostly based on the system used in the Reeks Nederlandse Dialectatlassen (Blanquaert & Pée, 1925), so that a later comparison to these data would also be possible.

3.3.6 Programming

With the transcriptions I made, I planned to calculate the Levenshtein distances between the datasets in R. However, I could not find an R package that allows character-specific weights to be added to Levenshtein distances. This meant that I had to use the weighted-levenshtein library (Su, 2016) in Python, which I was not yet very familiar with. To do so, I had to make some adaptations to my data, for which I used the pandas (McKinney, 2010) and NumPy (Oliphant, 2006) libraries.

The first step was to convert transcriptions containing the IPA length mark (e.g. /baː/) into transcriptions containing ‘double’ sounds (e.g. /baa/). I then made a dictionary to replace all IPA symbols with ASCII characters in both the transcriptions and the PMI data, as the Levenshtein function cannot process non-ASCII characters. As a last step before I could calculate the distances, I had to set the weights by transforming the PMI data into a NumPy array of np.float64 of length 128 for the insertion and deletion costs and a 2D NumPy array of np.float64 with dimensions (128, 128) for the substitution costs. This final step in particular took me a long time, as it involved some nested loops that I found conceptually hard to understand. Finally, I calculated the Levenshtein distances between the datasets and made a distance matrix for each possible combination. An example of the code I wrote can be found in figure 3.

Fig. 3: Code to transform transcriptions containing the length mark into transcriptions with repeated symbols indicating a long sound
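The following is a hypothetical sketch of the preprocessing steps described above; it is not the code shown in figure 3. It doubles long sounds, assigns each non-ASCII symbol an unused printable ASCII character, and fills cost tables indexed by character code (length 128, and 128 × 128). To run without extra packages it uses plain Python lists and a hand-written distance function; with the weighted-levenshtein library, the same tables would instead be NumPy float64 arrays passed to its `lev` function through the `insert_costs`, `delete_costs` and `substitute_costs` arguments. The regular expression assumes that the length mark directly follows the symbol (plus any combining diacritics) it lengthens, and all function names are invented for illustration.

```python
import re

# A symbol plus any combining diacritics (U+0300 to U+036F), followed by the IPA length mark.
LONG_SOUND = re.compile(r"(\S[\u0300-\u036f]*)ː")


def double_long_sounds(transcription):
    """Rewrite 'baː' as 'baa', so that length counts as an extra segment."""
    return LONG_SOUND.sub(r"\1\1", transcription)


def build_ascii_map(transcriptions):
    """Assign every non-ASCII symbol a printable ASCII character not yet in use."""
    symbols = sorted({ch for t in transcriptions for ch in t if ord(ch) >= 128})
    used = {ch for t in transcriptions for ch in t if ord(ch) < 128}
    free = [chr(code) for code in range(33, 127) if chr(code) not in used]
    if len(symbols) > len(free):
        raise ValueError("more IPA symbols than free ASCII characters")
    return dict(zip(symbols, free))


def encode(transcription, ascii_map):
    return "".join(ascii_map.get(ch, ch) for ch in transcription)


def build_cost_tables(sub_costs, indel_cost=1.0):
    """Return insertion and deletion tables (length 128) and a 128 x 128 substitution table.

    sub_costs maps pairs of ASCII characters to costs, e.g. PMI-derived ones.
    Unlisted substitutions cost 1; substituting a character for itself costs 0.
    """
    ins = [indel_cost] * 128
    sub = [[0.0 if i == j else 1.0 for j in range(128)] for i in range(128)]
    for (a, b), cost in sub_costs.items():
        sub[ord(a)][ord(b)] = cost
        sub[ord(b)][ord(a)] = cost  # keep the table symmetric
    return ins, ins[:], sub


def table_levenshtein(s, t, ins, dele, sub):
    """Levenshtein distance whose costs are looked up by character code."""
    prev = [0.0]
    for ch in t:
        prev.append(prev[-1] + ins[ord(ch)])
    for a in s:
        curr = [prev[0] + dele[ord(a)]]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + dele[ord(a)],             # delete a
                curr[j - 1] + ins[ord(b)],          # insert b
                prev[j - 1] + sub[ord(a)][ord(b)],  # substitute a by b
            ))
        prev = curr
    return prev[-1]


old = double_long_sounds("wɑːtər")  # 'wɑɑtər'
new = "watər"
ascii_map = build_ascii_map([old, new])
old_a, new_a = encode(old, ascii_map), encode(new, ascii_map)
ins, dele, sub = build_cost_tables({(ascii_map["ɑ"], "a"): 0.2})  # toy cost
print(round(table_levenshtein(old_a, new_a, ins, dele, sub), 2))  # 1.2
```

In this toy example, one [ɑ] is substituted by [a] at the assumed cost of 0.2 and the extra [ɑ] from the doubled long vowel is deleted at cost 1. This illustrates that doubling makes vowel length contribute to the distance.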

3.3.7 Results & analysis

Because the second project was only started after 1.5 months, there was unfortunately very little time left to analyse the results. More time could have been spent on the analysis, but I chose to spend more time on programming because I felt I would learn more from that. What follows are therefore preliminary analyses: after the summer, I plan to do a more thorough analysis and publish an article on the findings.

To assess the influence of each variable on language change, I wanted to build a model. First, I calculated the annual change for each word for both data pairs (DiaReg old – DiaReg young, and GTRP – DiaReg old) by dividing the Levenshtein distance by the difference in years (in birth year for the former and in recording year for the latter). I initially used mixed-effects regression, but eventually turned to generalised additive models (GAMs). The first GAM contained only the annual change as the dependent variable and a single smooth over longitude and latitude as the predictor. From that point, more variables were added in different combinations in order to arrive at the final, best-fitting model. The predictors considered were type (a factor with two levels: apparent time and real time), location, longitude, latitude, area (a factor for language area with two levels: Frisian and Low Saxon), word, and apparent time correction (a numeric variable indicating the apparent time difference between speakers in the real time data, i.e. the difference in their birth years).
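The annual-change measure is simple arithmetic: the Levenshtein distance divided by the number of years separating the two data points. A minimal sketch follows, with hypothetical function names and invented toy numbers rather than project data. The GAMs themselves are not reproduced here.

```python
from statistics import mean


def annual_change(distance, year_a, year_b):
    """Levenshtein distance per year separating the two data points."""
    gap = abs(year_b - year_a)
    if gap == 0:
        raise ValueError("the two data points must differ in year")
    return distance / gap


def mean_change_by_type(rows):
    """Average annual change per comparison type.

    rows: (comparison_type, distance, year_a, year_b) tuples, where the years are
    recording years for real time pairs and birth years for apparent time pairs.
    """
    by_type = {}
    for kind, distance, year_a, year_b in rows:
        by_type.setdefault(kind, []).append(annual_change(distance, year_a, year_b))
    return {kind: mean(values) for kind, values in by_type.items()}


# Toy numbers, not project data.
rows = [
    ("real", 0.50, 1985, 2010),
    ("real", 0.25, 1990, 2010),
    ("apparent", 0.40, 1950, 1990),
]
print(mean_change_by_type(rows))
```

Dividing by the time gap puts the real time and apparent time comparisons on the same per-year scale, even though their gaps are measured in different kinds of years.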

The final model contained a smooth over longitude and latitude, a random smooth for word and a fixed effect for type. The effect of type indicated that the real time comparison was associated with more language change than the apparent time comparison, which supports the idea that elderly speakers can also adopt newer forms and that apparent time data collection therefore underestimates language change (e.g. Boberg, 2004). Visual inspection of separate plots for the real time and apparent time data indicated that the patterns of real time and apparent time change were very similar, but less pronounced in the apparent time data. Unfortunately, the model explained only 13.4% of the variation, which is rather low. A visualisation of this model can be found in figure 4.

Fig. 4: Sound change as a function of longitude and latitude. Red colours indicate a larger sound change, green colours a smaller sound change. The two overlapping locations on the bottom left are IJsselmuiden and Kampen.

Figure 4 shows that the dialects appear to be most stable in Frisia and in the easternmost parts of the Low Saxon language area, i.e. the east of Groningen, Drenthe and Overijssel. Although these data relate only to sound change in general and not to sound change towards Dutch, the smaller amount of change in the Frisian language area is in line with the more positive language attitudes of the Frisians and the larger number of speakers. As for the eastern parts of the Low Saxon language area, I hypothesise that their conservativeness might be related to their relative isolation, which I will investigate further in my master’s thesis. The area with the most language change appears to lie around the city of Groningen, which is in line with the fact that the city attracts people from different linguistic backgrounds who settle in the surrounding villages and commute to Groningen.

Because of the low percentage of variation explained by the model and the many possible predictors that were not included in this study (e.g. population size, population income, isolation of locations), a further analysis with more variables is required. An ongoing study also suggests that separating the language change from the number of years between the recordings (or birth years) results in a better model (Buurke, in preparation).

3.3.8 Outreach

Outreach events are an important (and often overlooked) part of academic research, and I was glad to be asked by Anna Pot to help make promotional material for the annual Dag van de Grunneger Toal. I interviewed several people who are interested in or important to the dialects of Groningen, such as musicians, authors and researchers. I also edited these interviews, after which they were placed on the CGTC website. The event itself was unfortunately postponed until September 2020.

3.4 Learning outcomes

Below are the learning outcomes as proposed in the Form for Approval of Placement Work Plan (form 3). For each learning outcome, I specify how it has been attained or, where it has not been attained, why not.

1.2 have a thorough knowledge of at least one theoretical and methodological approach within linguistics

During the internship, I learned a lot about current theories and methods in dialectology. Initially, the focus was on perceptual dialectology, but after the change of projects it shifted to dialectometry. For both lines of research, I learned about the general assumptions underlying the theories and the most common methods associated with them.

2.1 be able to formulate an academic problem independently, and in so doing, to select, apply and where necessary adapt an adequate theoretical framework and one or more relevant research methods

For both projects, a niche was first found within research on the Low Saxon dialects spoken in the Netherlands. Then, an appropriate research method was constructed by combining and/or adapting methods from perceptual dialectology, psycholinguistics and phonetics for the first project, and methods from dialectometry for the second project.

2.2 be able to make an original contribution to knowledge in at least one subdiscipline in linguistics

The first project, if finished, would have contributed to the discipline of dialectology, and more specifically to perceptual dialectology. To my knowledge, no studies have investigated the relationship between actual and perceived pronunciation differences between the dialects of Groningen and Drenthe. The second project contributed to the field of dialectometry by investigating both apparent time and real time language change in the Low Saxon and Frisian dialects spoken in the Netherlands.

3.1 to make use of the research results of others and evaluate these critically

Results from previous studies were used to inform the method. Especially for the first project, pronunciation differences between the dialects of Drenthe and Groningen that had been found previously were evaluated in order to see which ones were still worthwhile investigating.


4.1 be able to participate actively in a research group working on an academic project

4.2 be able to work with other students and lecturers on an academic project

As I worked on my own project rather than joining an existing one, I did not work much with other students or lecturers. I did join the Speech Lab meetings held every other week, in which members gave feedback on each other’s progress, ideas and presentations. I also incorporated feedback on my projects from both of my supervisors, dr. Charlotte Gooskens, Martijn Bartelds, Teja Rebernik and Raoul Buurke. Within the lab, brainstorming about projects outside of meetings was also common, and I took part in this as well.

4.3 be able to participate in international academic debate in the chosen area of specialization and to present an academic problem convincingly in appropriate English, both orally and in writing.

I presented the background and methodology of the second study to the Speech Lab Groningen, although there were no results yet at the time. An abstract for a poster presentation was also accepted for TABU Dag 2020, but unfortunately this conference was cancelled because of the pandemic. Instead, I will present the findings of this study at TABU Dag 2021. In terms of writing, I am currently preparing a publication based on the findings of this study, which I hope to finish after the summer.

5.2 be able to reflect on the implications of one’s work for the development of linguistic theories

The results of the second project have implications for the use of apparent time and real time constructs in linguistic research. They indicated that the rate of change differed between the apparent time data and the real time data, but that the general geographical patterns of the two types of data did not. After further, more elaborate analysis, the results can inform methods of data collection and analysis. Because of the limited time that could be devoted to the project after the switch, it was unfortunately not possible to perform an elaborate analysis of the results.


4 Evaluation of placement

4.1 Place in programme

I think that one of the best things about the Language and Cognition programme is that students have a lot of freedom to work on projects outside of the courses that are offered. As a researcher, my main interests lie in sociolinguistics and dialectology. Although there are no ReMa courses geared specifically towards these topics, I was still able to engage with them through the other opportunities the ReMa offers, such as tutorials, summer school courses and research assistantships. The placement in particular gives students a great opportunity to work in the field they are most interested in and to experience research first-hand instead of learning about it in a course. Although none of the ReMa courses I took were directly related to dialectology, I list below the courses that were in some way connected to my internship and helped me prepare for it.

New Sounds: This course provided me with tools to measure phonetic differences (especially between accents) and to visualise them. This would have been especially useful for the first project.

Methodology and Statistics for Linguistic Research: During this course, I first learned about GAMs and other more complex statistical models. Without this course, I don’t think I would have felt confident enough to fit a GAM for my internship. I also used Levenshtein distances for the first time in the final paper for this course.

Inleiding programmeren I (Introduction to Programming I): During this course, I learned the basics of programming in Python, which were essential for the second project of this internship. N.B. As this is a BA course, I did not receive credits for it and it does not appear on my transcript.

Summer school courses: Several summer school courses have reinforced my interest in dialectology and in the use of large quantitative datasets to research language variation.

BA courses: The BA course English Language Variation first introduced me to concepts such as dialect levelling and sparked my interest in dialectology. The course Europese talen: Veeltaligheid en meertaligheid gave me valuable insights into speaker populations and language policy.

4.2 New knowledge and skills

Below, I will provide a reflection on the most important new knowledge and skills that I have gained through this internship. Rather than listing all skills that have improved, I will only mention the most significant improvements and provide a short analysis of how this skill has improved and, if applicable, how I intend to keep improving it.

4.2.1 Knowledge

Before the internship, I already had quite extensive knowledge of the Low Saxon language (and Gronings in particular) because of my research assistantship with Martijn and personal interest. The literature research I did for the initial internship project has deepened this knowledge, especially regarding the differences between the traditional Low Saxon dialects and how these differences came to be. Additionally, I have learned a lot about the dialects spoken in the Frisian language area and how its speaker population differs from the Low Saxon speaker population.

I also learned a lot about computational techniques such as the Levenshtein distance and pointwise mutual information (PMI). When I worked with Levenshtein distances in my final paper for the Methodology and Statistics course, I only computed simple Levenshtein distances in which all operations have the same cost and sound differences are not taken into account. During this internship, I learned about the considerations involved in working with Levenshtein distances, such as forced alignment, operation costs, weighting and normalisation. I also learned how to work with PMI, how it is calculated and in which situations it is useful. Knowledge of these computational techniques will be useful for my thesis as well, and I hope to learn more about them then.
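To make the Levenshtein considerations above more concrete, here is a minimal Python sketch of a weighted, normalised Levenshtein distance. It is not necessarily the code used during the internship (the bibliography lists the weighted-levenshtein library; Su, 2016). The vowel set, the 0.5 cost for vowel-vowel or consonant-consonant substitutions and the example forms "hoes"/"huus" are hypothetical choices for illustration only, and dividing by the length of the longer transcription is just one normalisation option (dividing by the alignment length is another). Each character is treated as one segment, so transcriptions with combining diacritics would first need to be split into lists of segments.

```python
from typing import Callable, Sequence

# Hypothetical vowel inventory, for illustration only.
VOWELS = set("aeiouyɛɔəøœæɑɪʏʊ")


def unit_cost(x: str, y: str) -> float:
    """Plain Levenshtein: every substitution of different segments costs 1."""
    return 0.0 if x == y else 1.0


def vowel_aware_cost(x: str, y: str) -> float:
    """Hypothetical weighting: vowel-for-vowel or consonant-for-consonant
    substitutions cost 0.5; vowel-for-consonant substitutions cost 1."""
    if x == y:
        return 0.0
    if (x in VOWELS) == (y in VOWELS):
        return 0.5
    return 1.0


def weighted_levenshtein(
    a: Sequence[str],
    b: Sequence[str],
    sub_cost: Callable[[str, str], float] = unit_cost,
    indel_cost: float = 1.0,
) -> float:
    """Minimum total cost of turning segment sequence a into b."""
    n, m = len(a), len(b)
    # dp[i][j] holds the cheapest way to turn a[:i] into b[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * indel_cost  # delete every segment of a[:i]
    for j in range(1, m + 1):
        dp[0][j] = j * indel_cost  # insert every segment of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j] + indel_cost,  # deletion
                dp[i][j - 1] + indel_cost,  # insertion
                dp[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),  # match or substitution
            )
    return dp[n][m]


def normalised_distance(a: Sequence[str], b: Sequence[str], **kwargs) -> float:
    """Divide the raw cost by the length of the longer sequence (one option of several)."""
    longest = max(len(a), len(b))
    return weighted_levenshtein(a, b, **kwargs) / longest if longest else 0.0


print(normalised_distance("hoes", "huus"))                             # 0.5
print(normalised_distance("hoes", "huus", sub_cost=vowel_aware_cost))  # 0.25
```

Passing the cost function as a parameter keeps the dynamic-programming core unchanged while the weighting varies. In principle, segment distances induced with PMI (as in Wieling, Margaretha & Nerbonne, 2011) could be supplied through the same `sub_cost` parameter, and a stricter alignment restriction, such as discouraging vowel-consonant substitutions, could be approximated by giving those substitutions a very high cost.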

Finally, I learned a lot about privacy and data storage from my CETO application. The Research Data Office had many useful tools, checklists and information on the GDPR that will also come in useful for future data collection.

4.2.2 Hard skills

Perhaps the most obvious skill to have improved during the internship is transcription, as I have done a lot of it. I had made phonetic transcriptions before, during my assistantship and during an unrelated project I did with Remco Knooijhuizen in my BA, but the amount and nature of the data used for the internship made this a very different experience. I can confidently say that I have become not only a more accurate but also a faster transcriber, partly because I realised that some sounds are simply ‘in between’ and will therefore never have a perfect transcription.

The second and perhaps most significant hard skill that I have gained through this internship is Python programming. At first, I did not feel that I knew enough about programming to write code for my own project, but I am very glad that I took on the challenge anyway. Although converting the transcriptions with lengthening symbols into transcriptions without them would probably have gone faster if I had done it manually, I think it was a good decision to write the code for this myself, as it taught me a lot. Before this internship, I had never worked with libraries or dictionaries and had only written some very basic loops and if/else statements during an introductory course. This experience has not only improved my programming skills in Python; when I opened R to do the statistical analyses, I noticed that some aspects of the R language also made more sense to me than they had before. Writing the code for my own research project has been a very satisfying experience, and I intend to keep improving my programming skills during the thesis and after.
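As an illustration of the kind of task described above, the sketch below removes lengthening symbols from a dictionary of transcriptions. It assumes that the transcriptions mark length with the IPA length mark ː (U+02D0) and half-length mark ˑ (U+02D1), and that they are stored in a dictionary keyed by a word or location label; both assumptions and the example forms are hypothetical and not taken from the internship data.

```python
from typing import Dict

# Assumed lengthening symbols: IPA long (U+02D0) and half-long (U+02D1).
LENGTH_MARKS = "\u02d0\u02d1"

# Translation table that maps each length mark to None, i.e. deletes it.
_STRIP_TABLE = str.maketrans("", "", LENGTH_MARKS)


def strip_length_marks(transcription: str) -> str:
    """Return the transcription with all length marks removed."""
    return transcription.translate(_STRIP_TABLE)


def strip_all(transcriptions: Dict[str, str]) -> Dict[str, str]:
    """Apply strip_length_marks to every value in a {label: transcription} dict."""
    return {label: strip_length_marks(t) for label, t in transcriptions.items()}


# Hypothetical example forms.
print(strip_all({"maan": "maːn", "boom": "boˑm"}))
# {'maan': 'man', 'boom': 'bom'}
```

Using `str.translate` with a deletion table handles every mark in a single pass over the string. If length had been typed with another character (for example a plain colon), that character would simply be added to `LENGTH_MARKS`.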

The final major skill improvement that I would like to mention is fitting GAMs. During the Methodology and Statistics course we also fitted GAMs, but as that was only an introduction to the technique, I could not say that I was confident in using it. When I started the second project, I was still somewhat apprehensive about fitting a GAM, as I often have trouble ‘just accepting’ that something works when I don’t understand how or why it works. After reading up on GAMs more (Wieling, 2018; Wood, 2006), I felt I understood them well enough to know what to do. Although I still made a lot of beginner’s mistakes, such as having my data in wide rather than long format and forgetting to convert character variables to factors, I now feel more confident in using this technique and would like to keep improving my understanding of it.
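The difference between wide and long format mentioned above can be shown with a small, dependency-free Python sketch; in practice a library function such as pandas' `melt` does the same job. The column names (`location`, `word`, `distance`) and values are hypothetical. In the long result each row is one observation, which is the layout that regression and GAM fitting functions in R generally expect; the label columns would then still need to be converted to factors in R, for instance with `factor()`.

```python
from typing import Dict, List


def wide_to_long(
    rows: List[Dict[str, object]],
    id_cols: List[str],
    var_name: str = "word",
    value_name: str = "distance",
) -> List[Dict[str, object]]:
    """Turn one-row-per-unit data into one-row-per-observation data.

    Every column not listed in id_cols is treated as a measurement: it becomes
    its own row, with the column name stored under var_name and the cell value
    under value_name.
    """
    long_rows = []
    for row in rows:
        ids = {col: row[col] for col in id_cols}
        for col, value in row.items():
            if col not in id_cols:
                long_rows.append({**ids, var_name: col, value_name: value})
    return long_rows


# Hypothetical wide data: one row per location, one column per word.
wide = [
    {"location": "loc1", "huis": 0.2, "maan": 0.4},
    {"location": "loc2", "huis": 0.1, "maan": 0.3},
]
for r in wide_to_long(wide, id_cols=["location"]):
    print(r)
# {'location': 'loc1', 'word': 'huis', 'distance': 0.2}
# {'location': 'loc1', 'word': 'maan', 'distance': 0.4}
# {'location': 'loc2', 'word': 'huis', 'distance': 0.1}
# {'location': 'loc2', 'word': 'maan', 'distance': 0.3}
```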

4.2.3 Soft skills

Although acquiring new knowledge and concrete skills has been a great benefit of doing this internship, I think one of the most important things I have learned is which ‘soft skills’ I still need to improve. One thing that has been an issue for me throughout my academic career so far is that I find it very difficult to ask for help. For courses and my BA thesis, this was not a large problem, as I was always in more or less familiar waters and knew enough about the field or the programs I was using to figure things out myself. During this internship, however, I worked with many techniques and programs that were mostly unfamiliar to me, such as GAMs, PMI and Python programming. In hindsight, I feel that on some occasions I spent too much time trying to solve problems that were beyond my level of knowledge and skill, and that it would have been better to ask for help earlier in these cases. I strive to improve on this during my MA thesis and afterwards. When solving a problem, it would be useful for me to assess earlier on whether the benefits of doing it myself (e.g. independence, learning from the experience) outweigh the costs (e.g. spending too much time on one thing, frustration if it doesn’t work).

Another problem that I encountered is that I continually underestimated the amount of work some of the tasks would take. This applies especially to the more repetitive, unrewarding tasks, such as finding the timestamps in the recordings and making transcriptions. I would often estimate how long something would take based on how long I wanted it to take, rather than timing myself on a small section of the work and multiplying that by the amount of work that remained. To some degree, I also underestimated how long writing the necessary functions in Python would take, as I was not very experienced with this. In the future, I aim to make more realistic estimates of how long each task will take, based on timing small sections of work and building in extra time for tasks I am not yet familiar with.

A final useful soft skill that I have learned during this internship is the ability to work from home. Normally, I prefer to work at the university or at a library so that I can separate my work or study life from my home life and am less easily distracted. During the first project, I mostly worked in the flex office in the ‘Frisian hallway’, which I very much enjoyed. At the start of the pandemic, it took me a long time to adjust to working at home, and I had difficulty concentrating on my work. Although I still strongly prefer working at the university, it is good to know that working from home (to some degree) is an option for me as well.

4.3 Learning outcomes

As can be seen in section 3.4, I did not meet all of the learning goals that were set at the beginning of the internship. This is partly due to the pandemic and the resulting project change: I not only had to abandon my first project but also had less time for the second project than is available in a regular internship. This especially affected learning goal 5.2 (“be able to reflect on the implications of one’s work for the development of linguistic theories”), as there was little time left at the end of the internship for a thorough analysis of the results. In the limited time that remained, I chose to spend more time on writing code, as I thought this would be a more useful experience for me.

4.4 Supervision

At the start of the internship, I had irregular meetings with Goffe. I indicated that I would like to meet every week, but he preferred to avoid ‘meeting for the sake of meeting’, so we agreed that I would email him or drop by his office whenever I had a question. As I usually worked in the flex office around the corner, this worked out well enough. When I switched projects, I started meeting with Martijn instead of Goffe, as the new topic was more closely related to his area of expertise. These meetings were weekly, which worked better for me, as they allowed me to ask questions that I felt were too small or trivial to justify a separate email or visit. I will try to keep this in mind for future projects as well.

4.5 Future career

Before starting the internship, I wanted to use the experience as a way of seeing whether I would be interested in applying for a PhD position after the ReMa, because certain aspects of academia had made me reconsider whether this would be the best career choice for me. During the first part of the internship, I was very motivated and felt that creating and carrying out my own research project was something I would always want to do. After the project change, however, I became less motivated and often wondered whether research would be the right career choice after all. Especially when I was collecting the timestamps and making transcriptions, I felt bored and frustrated because it did not feel like doing ‘actual research’. I realise that research (almost) always involves pre-processing of data, so this made me wonder whether I was cut out for research at all.

In the end, I am still not sure whether I want to pursue a career in academia. On the one hand, there were aspects of my internship that I did not like at all; on the other hand, I think that the circumstances under which I did the second project were not ideal and not representative of what a research environment is usually like. My frustration was also reinforced by having to work from home every day, which made me feel isolated and disconnected from the university. I hope that during the thesis I will be able to find out more about what the best choice for me would be.

4.6 What comes next

In September, I will start on my thesis. At the start of the pandemic, I planned to carry out my initial internship project as my thesis instead, but since data collection will probably still be possible only to a limited degree, I have decided on a different project. Because I really enjoyed working with dialectometric techniques, I want to study dialects in the Netherlands (or in the Low Saxon language area in particular, if the scope of the thesis demands it) from an ‘accessibility’ point of view. This means that I will look at the degree to which locations in the Netherlands are accessible or isolated and how this correlates with dialect differences or dialect change. To do so, I have requested access to a large dataset from the Dutch government on accessibility by public transport.

Since I am not sure yet whether I want to apply for a PhD position at some point, I think it would be best for me to apply for jobs outside of academia after the thesis. This would hopefully help me to not accidentally rush into something I am not entirely sure about and give me some time to see whether I miss doing research enough to come back to academia.


5 Conclusion

During this internship, I have learned new things about Low Saxon and Frisian dialects, programming and statistics. The most important thing, however, is that I have gained experience in the practice of doing research and now have a more complete view of what being a researcher involves. I am very grateful for this experience and hope that, combined with my thesis, it will allow me to make a well-considered decision about what I want to do after the research master.

Both before and during the internship I have been involved in the CGTC and the Speech Lab Groningen, and I certainly plan to stay involved after. I would like to thank the other lab members for their creativity, knowledge, support and, perhaps most importantly, the great atmosphere they provided. I would also like to thank the members of the CGTC for the important work they are doing to preserve and promote the dialects of Groningen: their efforts create awareness and a breeding ground for cultural expressions that are essential for a


Bibliography

Blanquaert, E., & Pée, W. (1925). Reeks Nederlandse Dialectatlassen. De Sikkel.

Bloemhoff, H., Bloemhoff - de Bruijn, P., Nijen Twilhaar, J., Nijkeuter, H., & Scholtmeijer, H. (2019). Nedersaksisch in een notendop. Van Gorcum.

Bloemhoff, H., Niebaum, H., Nijen Twilhaar, J., & Scholtmeijer, H. (2013). Low Saxon phonology. In F. Hinskens & J. Taeldeman (Eds.), Language and Space: Dutch (pp. 454–475). De Gruyter Mouton.

Bloemhoff, H., van der Kooi, J., Niebaum, H., & Reker, S. (2008). Handboek Nedersaksische taal- en letterkunde. Van Gorcum.

Boberg, C. (2004). Real and apparent time in language change: Late adoption of changes in Montreal English. American Speech, 79(3), 250–269. https://doi.org/10.1215/00031283-79-3-250

Buurke, R. (2020). Dialects across time and space: Computational modeling of dialects in the Netherlands [Master's thesis]. University of Groningen. Manuscript in preparation.

Chambers, J. K., & Trudgill, P. (1980). Dialectology (2nd ed.). Cambridge University Press.

Goeman, A., & Taeldeman, J. (1996). Fonologie en morfologie van de Nederlandse dialecten. Een nieuwe materiaalverzameling en twee nieuwe atlasprojecten. Taal en Tongval, 48, 38–59. http://hdl.handle.net/1854/LU-258419

Goeman, T., & Jongenburger, W. (2009). Dimensions and determinants of dialect use in the Netherlands at the individual and regional levels at the end of the twentieth century. International Journal of the Sociology of Language, 196–197, 31–72. https://doi.org/10.1515/ijsl.2009.016

Gooskens, C. (2005). How well can Norwegians identify their dialects? Nordic Journal of Linguistics, 28(1), 37–60. https://doi.org/10.1017/s0332586505001319

Heeringa, W. (2004). Measuring dialect pronunciation differences using Levenshtein distance [Doctoral thesis]. University of Groningen.

Heeringa, W., & Hinskens, F. (2014). Convergence between dialect varieties and dialect groups in the Dutch language area. In B. Szmrecsanyi & B. Wälchli (Eds.), Aggregating dialectology, typology, and register analysis: Linguistic variation in text and speech (pp. 26–52). De Gruyter.

Heeringa, W., & Nerbonne, J. (2013). Dialectometry. In F. Hinskens & J. Taeldeman (Eds.), Language and Space: Dutch (pp. 624–646). De Gruyter Mouton.

Hilton, N. H., & Gooskens, C. (2013). Language policies and attitudes towards Frisian in the Netherlands. In C. Gooskens & R. van Bezooijen (Eds.), Phonetics in Europe: Perception and production (pp. 139–157). Peter Lang.

Hinskens, F., & van Oostendorp, M. (2006). De palatalisering en velarisering van coronale nasaal-plosief clusters in GTR. Talige, dialectografische en onderzoekerseffecten. Taal en Tongval, 58(1), 103–122.


Noord-Hollandsche Uitgevers Maatschappij.

McKinney, W. (2010). Data Structures for Statistical Computing in Python. In S. van der Walt & J. Millman (Eds.), Proceedings of the 9th Python in Science Conference (Vol. 445, pp. 56–61). https://doi.org/10.25080/Majora-92bf1922-00a

Oliphant, T. E. (2006). A guide to NumPy. Trelgol Publishing USA.

Palander, M., & Riionheimo, H. (2018). Imitating Karelian: How is Karelian recalled and imitated by Finns with Border Karelian roots? 85. In On the border of language and dialect. The Finnish Literature Society.

Preston, D. R. (1989). Perceptual dialectology: Nonlinguists’ views of areal linguistics. Foris Publications.

Preston, D. R. (2018). What’s old and what’s new in perceptual dialectology. In On the border of language and dialect (pp. 16–37). The Finnish Literature Society.

Su, D. (2016). weighted-levenshtein (Version 0.2.1) [Python library]. https://github.com/infoscout/weighted-levenshtein

van den Berg, B. L. (2003). Phonology & Morphology of Dutch & Frisian Dialects in 1.1 million transcriptions. Goeman-Taeldeman-Van Reenen project 1980-1995. In Meertens Instituut Electronic Publications in Linguistics 3. Meertens Instituut, CD-ROM.

Veldman, F. (1992). De taal van Westerwolde: Patronen en structuren in een Gronings dialect [Doctoral thesis]. Rijksuniversiteit Groningen.

Weinberger, S. H. (n.d.). The Speech Accent Archive. Retrieved February 20, 2020, from http://accent.gmu.edu/

Wieling, M. (2018). Analyzing dynamic phonetic data using generalized additive mixed modeling: A tutorial focusing on articulatory differences between L1 and L2 speakers of English. Journal of Phonetics, 70, 86–116. https://doi.org/10.1016/j.wocn.2018.03.002

Wieling, M., Margaretha, E., & Nerbonne, J. (2011). Inducing phonetic distances from dialect variation. Computational Linguistics in the Netherlands Journal, 1, 109–118.

Wieling, M., Nerbonne, J., & Baayen, R. H. (2011). Quantitative social dialectology: Explaining linguistic variation geographically and socially. PLoS ONE, 6(9). https://doi.org/10.1371/journal.pone.0023613

Wood, S. (2006). Generalized Additive Models: An Introduction With R. Chapman & Hall. https://doi.org/10.1198/tech.2007.s505
