Pronunciation of English by Dutch native speakers: Is there a correlation between human judgements and acoustic-phonetic measurements?

Academic year: 2021

Pronunciation of English by Dutch native speakers: Is there a correlation between human judgements and acoustic-phonetic measurements?

Karen Désirée Johanna Maria de Bot 1605291

MA thesis

Table of Contents

Acknowledgements

Abstract

Chapter 1: Introduction

Chapter 2: Human judgements in pronunciation
- Elicitation of speech
- Findings of previous research on pronunciation in controlled situations
- Findings of previous research on pronunciation in spontaneous speech

Chapter 3: Common errors in the pronunciation of Dutch speakers and English speakers
- Segmental features
  Voice onset time
  Vowels
  Final (de)voicing
  Dental fricatives
- Suprasegmental features
  Coarticulation
  Intonation
  Stress
  Rhythm

Chapter 4: Studies using acoustic-phonetic measurements

Chapter 5: Method
- Design
- Expectations
- Participants
- Materials
- Procedure

Chapter 6: Results
- Manipulations
- Quantitative analysis

Chapter 7: Discussion

Chapter 8: Conclusion
- Methodological pitfalls

References

Appendices
- Catching a Thief

Acknowledgements

Abstract

Chapter 1: Introduction

In second language pronunciation research, native speakers are often used as judges of second language pronunciation. This thesis investigates how the pronunciation of English by native speakers of Dutch is assessed. Accents are judged by native speakers of the language in question. Research into second language pronunciation has covered both the production and the perception of sounds. Golestani and Zatorre (2009), for example, investigated individual differences in non-native speech sound learning, and Flege, Bohn and Jang (1997) studied the production and perception of English vowels by non-native speakers. Both studies, like many others, made use of judges who gave their opinion about the pronunciation of the speakers. These judges judge some errors more strictly than others. In his study of the pronunciation of Dutch native speakers speaking English, Van den Doel (2006) showed that errors that influence intelligibility or lead to annoyance or amusement are judged most strictly. This is confirmed by Koet (2007), who found that misplacements of primary stress are the most serious errors to native speakers of English, followed by fortis/lenis neutralisation and some substitutions of the dental fricatives /θ, ð/ by /t, d/.

(Nooteboom & Cohen, 1995). These differences in vowel length can be measured the same way VOT is measured.

With the opinion of the human judges on one side and the acoustic-phonetic measurements on the other, the question is which method is more valid. Therefore, two research questions were developed on the basis of the literature. The first question is whether there is a correlation between human judgements and acoustic-phonetic measurements: does the opinion of the human judges correspond to the acoustic measurements? The second question is which acoustic-phonetic features can influence a judgement about the quality of the pronunciation, and whether the judges make use of the same acoustic-phonetic cues in their assessment. To answer these research questions I developed the following experiment.

Several groups of participants were used in the experiment to look for differences in the assessment of pronunciation between different levels of L2 proficiency. For the real speakers, whose spontaneous speech was to be judged, three different groups were used, with six participants per group. This means that 18 speakers took part in the experiment. The manipulations were done in a 3x2x2 design, with a change in VOT and a change in vowel length in every subgroup. The speakers performed a task that elicits spontaneous speech. It is based on the study by Mobärg (1989), who asked his speakers to describe a cartoon, as is also done in this thesis.

Judges from the United Kingdom were chosen because of a study by Koet (2007), who found that native speakers of British English notice more errors than American speakers do. American judges also judge female speakers more positively than male speakers. Based on these findings, Koet suggested that further research should use British native speakers. Thirty native speakers were asked to fill in a questionnaire. For every speech sample, both original and manipulated, the judges gave their opinion about the quality of the pronunciation and indicated how certain they were about their opinion. The judges were then asked how they came to their opinion. The goal of the experiment is to find out whether British native speakers are able to hear whether or not the speaker is a native speaker of English. For more nuance, a five-point Likert scale (Likert, 1932) was developed, ranging from “The pronunciation influences the intelligibility so badly you cannot understand what is said” to “Native or native like accent”. The manipulated speech samples were included in the questionnaire to find out whether the judges would change their opinion when, for example, the VOT was manipulated. If so, this would indicate that VOT is an important feature of the pronunciation of English.

Chapter 2: Human judgements in pronunciation

When Dutch learners of English ask native speakers of British English what they think of their pronunciation, the native speakers will always have a judgement. The question is how reliable such a judgement is. One might also ask what the most important characteristic of a fair or good pronunciation is, and how a judgement of pronunciation can be validated. In this chapter I will describe different ways of eliciting speech and discuss their advantages and disadvantages. I will also provide an overview of the studies that have dealt with judgements of pronunciation and link those studies to my research questions. The research question addressed in this chapter is whether there is a correlation between human judgements and acoustic-phonetic measurements, that is, whether the opinion of the human judges corresponds to the measurements of the computer. Answering this question will hopefully provide insight into the validity of both the judges and the acoustic measurements. However, before a pronunciation study can be conducted, speech samples have to be collected. The way of eliciting speech should be chosen carefully, because it may affect the assessment of pronunciation. Therefore I will first discuss different ways of eliciting speech before elaborating on pronunciation studies that made use of human judgements.

Elicitation of speech

that the controllability of the experiment is very high: the participants do exactly as the researcher says. However, the participants learn to focus on their pronunciation, so their pronunciation is likely to differ from their everyday pronunciation, because attentional control may affect the quality of the pronunciation. For this reason, it would be better to use pronunciation in spontaneous speech.

Pronunciation in spontaneous speech is the pronunciation a person would use in everyday life. Spontaneous speech differs from, for example, an echoing task in that people do not have to repeat a word or sentence. An experiment involving spontaneous speech does not involve tests or sound repetition. Mobärg (1989) suggested a way to record spontaneous speech. His speakers were asked to describe a cartoon. They were allowed to say anything about the cartoon within approximately one and a half minutes. This way the participants could say whatever they wanted, using fairly spontaneous speech, while all participants talked about the same subject, which makes the speech easier to compare. Also, if participants are asked to talk about a certain cartoon, they focus on that cartoon and not on their pronunciation. Their pronunciation is therefore more natural and closer to how they would speak every day. For these reasons, spontaneous speech seems the best choice for answering the research questions posed in this thesis: in order to judge pronunciation as fairly as possible, the pronunciation samples should be as natural as possible. In the next section of this chapter I will discuss findings from studies focusing on learned pronunciation as well as findings from studies that made use of spontaneous speech.

Findings of previous research on pronunciation in controlled situations

As described in the previous section, there are different ways of studying pronunciation: in controlled situations and in spontaneous speech. A study by Flege et al. (1997) investigated the production and perception of English vowels by non-native speakers. Two groups were compared: all the participants were non-native speakers, but one group was exposed intensively to English whereas the other group was not. The group that was intensively exposed was better at identifying vowels than the group that was not. However, the samples were only judged by the researchers, not by independent judges. Also, the participants were exposed to an amount of input that was controlled by the researchers. A controlled situation like the one in the study by Flege et al., where the input is controlled or where the participants perform a learning task, for example a task in which they have to listen to and repeat certain sounds, is the most common situation in studies with judges.

researchers were also the judges; there were no independent judges involved in the experiment. This is not the case in my thesis, since I focus on the judgements made by external judges. Also, the focus is on spontaneous speech because, as discussed earlier, this is more natural than controlled speech. Therefore, I will now elaborate on studies focusing on spontaneous speech, since they may provide a better background for this subject.

Findings of previous research on pronunciation in spontaneous speech

The scales used by Koet are just one way to judge pronunciation. Another way of judging pronunciation was used by Van den Doel (2006). Van den Doel developed two hierarchies of error, based on his investigation. The first hierarchy of error was for British English (RP, Received Pronunciation), the second for American English (GA, General American). These hierarchies of error contain the most common errors in the pronunciation of English by Dutch speakers. Van den Doel developed the hierarchies of error by asking 500 native speakers of English, who spoke different varieties, to listen to English pronounced by Dutch speakers and to rate the errors they heard on a scale based on the Likert scale (Likert, 1932) for British English and American English. Van den Doel used a 5-point Likert scale, so the judgements ranged from 1 to 5, 1 being the highest and 5 being the lowest or worst. He decided that the errors that were seen as worst scored over 3.5 and the least serious errors scored less than 0.4. The reason for choosing such precise numbers for a rather global assessment is not very clear. After analysis, a difference between the British and American speakers was found, although in both groups errors that influenced intelligibility were named as the most serious errors. This gives good insight into the goals of second language pronunciation: whether to aim for native-like pronunciation or whether intelligibility is enough. Furthermore, the British judges thought that stress errors are the most serious errors of all (>3.5). Stress errors are misplacements of primary stress in words like advertisement or perfect. Generally, Dutch people tend to stress the syllables tise and fect, as is done in American English (Van den Doel, 2006). One might wonder, however, whether it is right to call these errors, since this stress pattern is common in another English dialect.

These ‘stress errors’ were followed in terms of seriousness by fortis/lenis neutralisation, some substitutions of the dental fricatives /θ, ð/ by /t, d/, and the misuse of sounds, for example the use of the uvular-r, unaspirated [t], et cetera. The American judges in this investigation were also stricter than the British judges, just as in the study by Koet. They thought that stress and stress-related errors, fortis/lenis neutralisation and the use of the uvular-r were all very serious errors (>3.5). Van den Doel also mentioned that British and Australian speakers observed more errors, but judged them more leniently than the American speakers. So again, evidence was found that in future studies with judges, it is better to use British judges than American judges. This is not only because American judges judge more strictly, but mainly because British judges observe more errors.

more likely to judge the pronunciation more leniently, even though the specific word was not pronounced that way in British English or American English. The judges had heard a native speaker pronounce it like ‘fillem’, so some of them did not mark it as an error. The effect of accent similarity was investigated, but no significant difference between words like ‘film’ and other words without accent similarity was found. This means that not all judges based their opinion on a certain variety of English. A final important conclusion by Van den Doel was that native speakers judged strictly not only pronunciation that influences intelligibility, but also pronunciation that leads to annoyance or amusement. Examples are the uvular-r in red and phoneme exchanges like f<->v, t<->d, v<->w and æ<->e, which make very sound like ferry and bed like bet. The judges understood the speaker perfectly well, but still marked these as serious errors. The question then is whether it is right to mark characteristics as errors when they do not influence intelligibility, as ‘error’ is a very serious word.

Chapter 3: Common errors in the pronunciation of Dutch speakers and English speakers

Introduction

In this chapter I will elaborate on the differences between Dutch and English and the difficulties Dutch speakers may have with the pronunciation of English. These differences can not only be heard, but are also measurable and can therefore be analysed acoustically using a computer. After describing each feature and discussing the differences between Dutch and English, I will explain how these measurements and acoustic-phonetic analyses can be done according to the literature and with the help of the computer programme Praat by Boersma and Weenink (2009). I will not only discuss the differences in the pronunciation of sounds, but will also focus on some other parts of pronunciation, such as coarticulation. An important thing to remember is that pronunciation is not just about words. Some phenomena only occur when words or sounds are combined, so I will also discuss these phenomena. I will start with the segmental features, which are features that deal with segments like vowels, consonants and other sounds. Then I will continue with the suprasegmental features. Suprasegmental features are not just about a single sound; they can be described as phenomena between or above words, like stress, rhythm or intonation (Kooij and Van Oostendorp, 2003). Then I will explain how this knowledge can be used to answer the research questions. From these features a selection will be made of phenomena that can reliably be used in the experiment in this thesis.

Not all the characteristics that will be described in this chapter have been used in this thesis. This is because it would not be feasible to include every characteristic in the study, as there are too many. I have focussed on the ones that have been used in previous studies about pronunciation assessment, but to provide a fair overview, other characteristics will also be mentioned in this chapter.

Segmental features

Segments are consonant and vowel speech sounds that can be represented by a phonetic alphabet like IPA, the International Phonetic Alphabet (Collins and Mees, 2003). The acoustic characteristics described below are the most important ones because they highlight the differences between the pronunciation of Dutch and English.

Voice onset time (VOT)

According to Collins and Mees (2003), aspiration is heard in the English voiceless plosives /p, t, k/, mostly when these sounds occur in stressed syllables. The aspiration is heard as a short period of voicelessness after the release of the burst. The difference between English and Dutch is that in English voiceless plosives can be either aspirated or unaspirated, whereas in Dutch plosives are either voiceless and unaspirated or voiced and prevoiced. This means that voiceless plosives in Dutch are never aspirated; immediately after the release of the burst, the rest of the sound follows. In Dutch voiced plosives, voicing usually starts before the release of the stop, which is called prevoicing. The table below by Lisker and Abramson (1964) compares the voiceless plosives of English and Dutch with their VOT in ms. The table shows that plosives pronounced in English have a longer VOT than plosives pronounced in Dutch. The raised [h] indicates the aspiration in phonetic transcriptions.

Table 1: VOT in ms. Dutch is unaspirated, English is aspirated (Lisker and Abramson, 1964)

The following scale makes the differences between the VOTs of English and Dutch clearer. VOT values can be divided into three ranges. As stated before, when the VOT is less than 0 ms, so when voicing starts before the release of the burst, it is called prevoicing, also known as lead. A VOT between 0 and 35 ms is called short-lag, and a VOT above 35 ms is called long-lag.

            Lead/prevoiced    Short-lag    Long-lag
VOT (ms)    < 0               0-35         > 35

Table 2: VOT scale
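
The scale in Table 2 can be written down as a small function. The following is only a minimal sketch: the function name `classify_vot` and the example values are hypothetical, and it assumes that the boundary values 0 ms and 35 ms themselves count as short-lag, which follows the wording "between 0 and 35 ms" above.

```python
def classify_vot(vot_ms: float) -> str:
    """Return the Table 2 category for a VOT value given in milliseconds."""
    if vot_ms < 0:
        return "lead"        # voicing starts before the release (prevoicing)
    if vot_ms <= 35:
        return "short-lag"   # 0-35 ms (boundaries included by assumption)
    return "long-lag"        # above 35 ms


# Hypothetical example values, not measurements from this study
for value in (-80.0, 12.0, 60.0):
    print(value, classify_vot(value))
```

A function like this only places a measured value on the scale; it says nothing about how a judge perceives that value.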

Table 3: Difference between Dutch and English for /d/ and /t/

In Dutch, the VOT is normally in the prevoicing or short-lag region. Therefore Dutch is called a ‘prevoicing language’. The VOT in English is much longer, between short-lag and long-lag. English is therefore called an ‘aspirating language’.

The difference in VOT can be measured using computer software. The best way to do this is by using the computer programme Praat. In Praat, speech is displayed graphically as an oscillogram or a spectrogram. That way, sounds can be distinguished from each other and the length of certain sounds can be measured. Since VOT is the time from the release of the plosive to the onset of voicing in the following vowel, one simply measures that interval to get the speaker's VOT for that sound. If the VOT of a Dutch speaker is compared with the VOT of a native speaker of English for the same sound, the VOT of the Dutch speaker is shorter. However, since VOT is measured in milliseconds, the difference between Dutch and English is very small, and the question is whether judges would say that the pronunciation of the Dutch speaker is bad because the VOT of his plosives is shorter. This could be investigated by lengthening the VOT to see whether the judges' opinion changes, even though they might not be able to give a clear reason why. The figures below show the VOT for /p/ in ms, pronounced by a Dutch woman and a British man. The selected pink parts mark the VOT.

Figure 1: the VOT for /p/ in ms, pronounced by a Dutch woman

Figure 2: the VOT for /p/ in ms, pronounced by a British man
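
The measurement described above comes down to subtracting two time points. The sketch below assumes that the moment of the burst release and the moment at which voicing starts have been read off the oscillogram or spectrogram by hand and noted in seconds. The function name `vot_ms` and all time points are hypothetical illustration values, not measurements from this study.

```python
def vot_ms(burst_release_s: float, voicing_onset_s: float) -> float:
    """Return VOT in milliseconds from two time points given in seconds.

    A negative result means that voicing started before the release
    (prevoicing).
    """
    return (voicing_onset_s - burst_release_s) * 1000.0


# Hypothetical time points read off an annotated recording (in seconds);
# these are illustration values, not measurements from this study.
dutch_p = vot_ms(burst_release_s=1.204, voicing_onset_s=1.216)
british_p = vot_ms(burst_release_s=0.873, voicing_onset_s=0.931)

print(f"Dutch speaker:   {dutch_p:.0f} ms")
print(f"British speaker: {british_p:.0f} ms")
print(f"Difference:      {british_p - dutch_p:.0f} ms")
```

Because the result is signed, the same function also covers prevoiced plosives, for which the voicing onset precedes the release.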

There are also differences between the vowels in Dutch and English. For example, Dutch learners of English have trouble distinguishing between /ε/ and /æ/, because Dutch does not make this distinction (Broersma, 2005). Because Dutch learners cannot distinguish between the two, it is more difficult for them to produce them as separate vowels. This is also the case for other vowels, but for the explanation and the measurements I use the example of /ε/ and /æ/.

To find out whether or not a Dutch speaker produces the two different vowels /ε/ and /æ/ in English, it is necessary to record speech and analyse it. As touched upon earlier, the analysis can be done using Praat, because Praat can show a spectrogram with formants. A formant in a spectrogram is a peak of energy, which results from the vocal tract acting like a filter (Roach, 2001). The mouth and pharynx are shaped differently for every vowel by the configuration of the tongue and lips, so that the air resonates differently for each individual vowel. There are multiple formants, but the first two formants help us characterise most vowel sounds (Catford, 2001). The table below shows the first two formants of /ε/ and /æ/ for a male voice.

       F1     F2
[ε]    610    1900
[æ]    820    1530

Table 4: F1 and F2 for /ε/ and /æ/, in hertz (Hz).

As shown in the table, there is a difference between the two vowels. With a computer one can measure the formants of certain vowels to find out whether a Dutch speaker is really pronouncing [æ] instead of [ε]. This can also be done for other vowels.
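
One simple way to use the values in Table 4 is to check which of the two reference vowels a measured vowel is closest to. The sketch below is an illustration under clear assumptions: the function name `nearest_vowel`, the measured values and the use of plain Euclidean distance in hertz are my own choices, not a method taken from the literature cited here. The reference values hold for a male voice only, so other speakers would need other reference values.

```python
import math

# Reference values from Table 4 (male voice), as (F1, F2) in hertz
REFERENCE_FORMANTS_HZ = {
    "ε": (610.0, 1900.0),
    "æ": (820.0, 1530.0),
}


def nearest_vowel(f1_hz: float, f2_hz: float) -> str:
    """Return the reference vowel whose (F1, F2) pair is closest.

    Closeness is plain Euclidean distance in hertz, which is an
    assumption made for this sketch.
    """
    measured = (f1_hz, f2_hz)
    return min(
        REFERENCE_FORMANTS_HZ,
        key=lambda vowel: math.dist(measured, REFERENCE_FORMANTS_HZ[vowel]),
    )


# Hypothetical measurements, not data from this study
print(nearest_vowel(600.0, 1850.0))
print(nearest_vowel(800.0, 1600.0))
```

If a speaker's intended [æ] vowels keep landing closer to the [ε] reference, that would suggest that the two vowels are not kept apart in production.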

Final (de)voicing

fricative are longer than vowels before a voiceless plosive or fricative, as described in the section above. A computer programme can also recognise the voiced or voiceless sound itself. In her master’s thesis, Otten (2003) found that Dutch advanced learners of English who had achieved a near-native level in the pronunciation of English were able to master final voicing through practice. Dutch learners of English who did not study English or any other language did not learn how to use final voicing. Learners noticed differences in the vowel duration before the voiced or voiceless sound, but were not yet able to reproduce them. More practice can solve this problem, as Otten shows with the advanced learners of English. The difference in final voicing between advanced and intermediate learners of English can thus be measured in Praat. However, the question remains whether human judges find it important for good pronunciation. Therefore, this issue will be investigated in this thesis.

Final devoicing is, as stated in the paragraph above, connected to vowel length. Vowels in English are significantly longer if they are pronounced before a voiced sound than if they are pronounced before a voiceless sound. This means that if final devoicing takes place because a Dutch speaker is unfamiliar with final voicing, the speaker automatically also shortens the length of the vowel (Nooteboom & Cohen, 1995). So, vowel length is a very important cue of final voicing, as Otten (2003) describes.

Dental fricatives

It is a well-known fact that Dutch speakers have difficulties with pronouncing the dental fricatives in English (voiceless /θ/ and voiced /ð/). They substitute these sounds with /t/, /d/, /f/, /z/ or /s/, which are either dental sounds or fricatives. The replacement of /ð/ by /d/ is one of the most common Dutch errors (Collins and Mees, 2003). According to Wester, Gilbers and Lowie (2007), this substitution problem is phonetically based. They also found that very few learners of English could produce the dental fricatives correctly. The differences between the sounds can be detected because the sounds that Dutch speakers use instead of the dental fricatives all exist in English. If for example ‘three’ [θri:] is pronounced as ‘tree’ [tri:], the meaning of the word also changes. It is also possible to show the differences in Praat. The /t/ and the /d/ are plosives and therefore different from the fricatives /θ/ and /ð/, because in an oscillogram there is nothing until the burst of the plosive sound. A fricative sound is more continuous. The /f/ and /s/ are both voiceless and therefore automatically different from the voiced /ð/. Furthermore, the /s/ is easy to recognise, because in a spectrogram its intensity is very strong around 5000 hertz.

pronounce it, the students will not make a difference between the two sounds. They will most likely produce /θ/, like the ‘th’ in think, since that is the less difficult of the two dental fricatives (Collins and Mees, 2003). However, in words like ‘these’, ‘rather’ and ‘bathe’, /ð/ should be used. It would be interesting to see in Praat whether the Dutch speakers who do produce a dental fricative also make a difference between the two. In this thesis dental fricatives will not be used, because judges may use them as an accent cue. Also, it would not be feasible to include every characteristic in the study.

Coarticulation between words

Coarticulation is a characteristic of speech production that occurs in many languages. The last sound of the first word and the first sound of the second word can be coarticulated, which means that they are not pronounced entirely in their own place of articulation, but that they ‘meet halfway’. The speech system already moves towards the next sound (Nooteboom & Cohen, 1995). In coarticulation, speech is realised with the highest possible speech rate, but it does not negatively influence the intelligibility (Otten, 2003). A speaker who is more fluent is more likely to speak faster and will therefore coarticulate more. In a spectrogram, coarticulation can be recognised because the formants change at the end of the first sound towards the next sound. Coarticulation also takes place within the word, but unlike coarticulation between words, this cannot be avoided.

Suprasegmental features

Suprasegmental features contrast with segmental features. They cover more than just individual consonant or vowel sounds, because they extend over the individual segments (Collins and Mees, 2003). To give a correct overview of the different features and characteristics, intonation, stress and rhythm will be discussed, although they will not be used any further in this thesis since they are not relevant for the research questions. They are mentioned for a correct overview of the differences between Dutch and English and for possible use in further research.

Intonation

Intonation is the use of pitch variation to convey meaning (Roach, 2001) or the pitch pattern of connected speech (Collins and Mees, 2003). Intonation patterns are more frequently properties of longer stretches of speech, and intonation can be easily heard. It is possible to distinguish an English speaker from a Dutch speaker based on their intonation only. In Dutch and in English the pitch goes down at the end of a sentence. This is called declination. However, the level of declination, so the height of the pitch, is different in Dutch and English (Nooteboom & Cohen, 1995).

A syllable is stressed if it is made more noticeable or prominent than other syllables (Roach, 2001). This is done by a combination of loudness, pitch, vowel length and vowel quality (Collins and Mees, 2003). Not only syllables can be stressed; the stress can also be on a particular word in a sentence. According to Trommelen and Zonneveld (1990), the stress patterns of English and of Dutch are highly similar, but there are some differences, mostly when the words are (almost) the same. Examples can be found in the table below.

English       Dutch
Módern        Modérn
Président     Presidént
Cartóonist    Cartooníst
Apócalypse    Apocalýps
Lábyrinth     Labyrínt

Table 5: Stress patterns in English and Dutch of the same words. The acute accent on the vowel marks the stressed syllable.

In stress, a difference must be made between primary stress patterns and secondary stress patterns. An example of a secondary stress pattern is the word pronunciation. The primary stress is on the /a/, so it is pronounced pronunciátion. However, there is also stress on the /u/, because pronunciation is derived from the verb to pronóunce, in which the stress is on the /ou/. This syllable keeps a bit of its stress, so the stress pattern of pronunciation must be pronúnciátion. Lowie, Gilbers and Bos (2005) found that Dutch learners of English were able to perceive the English secondary stress pattern that was pronounced by English native speakers, but could not produce it themselves.

English. If the stress in applepie for example falls on the first syllable (so ‘applepie), it is very noticeable for a judge. Even though stress is a salient characteristic, it will not be used in this study. Previous studies that have used acoustic-phonetic measurements for pronunciation studies also have not used it, and the focus in this study is on the ones that have been used. This does not mean that stress is not a good characteristic for pronunciation assessment.

Rhythm

In rhythm, a distinction must be made between syllable-timed and stress-timed rhythm. Syllable-timed means that the rhythm is based on syllables which occupy roughly equal amounts of time. Examples of syllable-timed languages are French and Spanish. Stress-timed means that the rhythm is based on the recurrence of stressed syllables. These syllables occur at approximately the same intervals of time. Stress-timed languages are languages such as English, Dutch and German (Collins and Mees, 2003). Since English and Dutch have the same type of rhythm, their rhythmic patterns are comparable. Therefore I will not discuss rhythm any further.

Chapter 4: Studies using acoustic-phonetic measurements

In the first chapter of this thesis several studies that made use of judges were explained. In this thesis not only judges are used, but they are compared to acoustic-phonetic measurements. Therefore studies using acoustic-phonetic measurements will be discussed in this chapter.

Acoustic-phonetic measurements can be applied to investigate small changes in the pronunciation of the L2 of a participant. Simon (2007), for example, set up a longitudinal case study to find out whether the Dutch phonetic system changes when a child acquires English, or whether the child develops two different phonetic systems. She followed a three-year-old native speaker of Dutch who moved to Massachusetts (USA), where he was exposed to English intensively. Simon discovered that the child’s voiceless stops all became aspirated, both in English and in Dutch. The aspirated Dutch voiceless stops are not native-like, since Dutch does not have aspirated voiceless stops. However, the VOTs in English were longer than the VOTs in Dutch, which means that there still is a small contrast between the voiceless stops of Dutch and English. The Dutch voiced stops also moved towards the English voiced stops in the same way that the voiceless stops did. In a study like this, only the VOT is measured. Simon measured the length of the onset of the words, thus the VOT, in both English and Dutch. This was done in Praat.

Like Simon, many others have studied VOT in a second or foreign language. An example is the work of Flege, who has been mentioned earlier in this thesis. Kehoe, Lleó and Rakow (2004) also studied VOT. They measured the VOT of four German-Spanish bilingual children and compared these VOTs to those of three monolingual German children. Differences between the VOT patterns were found: the bilingual children did not have a pure German VOT, but one that moved towards a Spanish VOT. For testing, the children were audio-recorded, after which their VOT was measured using the programme Soundscope. Deuchar and Clark (1996) also investigated Spanish, but in combination with English. They measured the VOT of utterance-initial stops of English and Spanish for a young child living in England. In this study, too, differences in VOT patterns were found.

vowel length, it could be interesting to investigate it in order to find its importance in pronunciation. Therefore final (de)voicing and vowel length will be investigated in this thesis in the same way as VOT.


Chapter 5: Method

Design

This study is a quantitative group study in which 30 participants judge 30 speech samples of spoken English. Eighteen samples are unmanipulated: six come from native speakers of Dutch who are advanced speakers of English, six from native speakers of Dutch who are intermediate speakers of English and six from native speakers of British English. With three groups, two sexes and three speakers per cell, this is a 3x2x3 design. The other 12 speech samples are manipulated by computer, following a 3x2x2 design. How the speech samples were manipulated is described in the Procedure section. The judgements of the native speakers are compared to acoustic-phonetic measurements of the same speech samples to find out if there is a correlation between the human judgements and the acoustic-phonetic measurements.
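The sample counts implied by the two designs can be checked with a short sketch. The group and sex labels follow the Design section; the function and variable names are illustrative assumptions and are not part of the study.

```python
from itertools import product

# Labels follow the Design section; the names themselves are illustrative.
GROUPS = ["intermediate", "advanced", "native"]
SEXES = ["male", "female"]


def count_samples(speakers_per_cell: int) -> int:
    """Number of samples when every group x sex cell holds the same number of speakers."""
    cells = list(product(GROUPS, SEXES))
    return len(cells) * speakers_per_cell


original = count_samples(3)      # 3 x 2 x 3 design
manipulated = count_samples(2)   # 3 x 2 x 2 design
print(original, manipulated, original + manipulated)
```

The two counts add up to the thirty samples that each judge rates.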

Expectations

As the literature describes, judges have explicit opinions about foreign pronunciation (Van den Doel, 2006; Koet, 2007). Studies that use acoustic-phonetic features such as VOT and final devoicing to show differences between Dutch and English were also discussed (e.g. Simon, 2007). Based on the literature, two research questions were developed. The first question is whether there is a correlation between human judgements and acoustic-phonetic measurements: does the opinion of the human judges correspond to an acoustic analysis? The second question is which acoustic-phonetic features are important in pronouncing English. This will be tested by comparing the judgements of the non-manipulated speech and the manipulated speech of the same speaker. The reliability of the judges is tested by using three groups of speakers with different levels of English proficiency, to find out whether the judges can hear differences between the pronunciation of the three groups. The details of the experiment are discussed in the next sections.

Participants

In the experiment, there are two groups of participants: thirty judges and thirty speakers. I will first describe the judges, and then all four subgroups of the speakers.


The speakers are the participants whose speech was recorded. There are four subgroups of speakers. Multiple subgroups were chosen to test the differences between the groups and to check the reliability of the judges. The first group contains native speakers of Dutch who are intermediate speakers of English. Three participants are male and three are female. All the participants are between twenty and thirty years old. The intermediate speakers have had six years of English in a Dutch secondary school, but chose a different major at university. This is the difference between the intermediate speakers and the advanced speakers: the advanced speakers study English at university level or have a bachelor’s or master’s degree in English. The group of advanced speakers also contains three female and three male speakers. This is also the case for the third subgroup, in which the speakers are all native speakers of British English. All the speakers in the third group still live in England. Due to organisational problems, only three male students of English could participate. To facilitate the statistical analyses, three speakers of each sex were therefore included in every group.

The fourth group of speakers is technically not a group of participants, although it is presented to the judges as a fourth group of speakers. It contains twelve samples from the other subgroups in which sounds are manipulated. From the group of Dutch intermediate speakers, four samples were selected for the manipulations, two male speakers and two female speakers. This is also the case for the other two groups. How these manipulations have been done will be described in the procedure section. The table below shows all participants.

| Participant | Nationality | Sex | Manipulation | Number of participants |
|---|---|---|---|---|
| Student | Dutch | Male | No | 3 |
| Student | Dutch | Female | No | 3 |
| Student of English | Dutch | Male | No | 3 |
| Student of English | Dutch | Female | No | 3 |
| Native speaker | British | Male | No | 3 |
| Native speaker | British | Female | No | 3 |
| Student | Dutch | Male | Yes | 2 |
| Student | Dutch | Female | Yes | 2 |
| Student of English | Dutch | Male | Yes | 2 |
| Student of English | Dutch | Female | Yes | 2 |
| Native speaker | British | Male | Yes | 2 |
| Native speaker | British | Female | Yes | 2 |


Materials

To provide speech samples, the speech of the participants needed to be recorded. This was done with an Olympus WS 650S Audio Digital voice recorder. The participants were asked to describe what they saw in a four-picture cartoon called Catching a Thief (Heaton, 1975) that was given to them. The cartoon can be found in the appendix and was chosen because there is a lot to see in the pictures, which provides enough subjects for discussion. Also, De Jong and Vercellotti (2011) found that pictures like those in the book by Heaton work best in a picture elicitation task.

For the selection of the recordings and the manipulation of the speech samples, the computer programme Adobe Audition (Adobe Systems) was used. In Adobe Audition speech can be represented as a waveform, which makes it very useful for selecting certain pieces of recorded speech. The selected parts can be isolated in another file for further analyses. These further analyses have been done in Praat, which has already been described. In this thesis Praat was used for the acoustic-phonetic measurements of the thirty speech samples.

The next material used in this study is the questionnaire that was given to the judges. The questionnaire was developed for non-linguists, so it does not specifically focus on features like VOT or final devoicing. In the questionnaire three questions were used:

1. What is the quality of the pronunciation?
2. How certain are you about your judgement?
3. Why did you make this decision?

The first question was answered on a five-point Likert scale. The five answers were as follows:

1: The pronunciation influences the intelligibility so badly you cannot understand what is said
2: The pronunciation does not influence the intelligibility but it distracts from what is said
3: The pronunciation does not influence the intelligibility, but you can easily hear a foreign accent (without getting distracted by it)
4: The pronunciation is very good and only if you listen carefully you can hear a foreign accent
5: Native or native like accent

The second question, on certainty, was answered on a three-point Likert scale:

1: Uncertain
2: Neutral
3: Certain


quality of the pronunciation, is designed this way because of the study by Van den Doel (2006), in which intelligibility errors were rated as the most serious errors. Also, because the judges in his study rated errors that led to annoyance as serious, the option of a distracting accent was added to the answers. Furthermore, it is interesting to see how certain the judges are of their judgements and whether they are more certain about the extreme groups, that is, the native speakers and the clearly non-native speakers, than about the group in the middle. Therefore the certainty question was added. Last of all, SPSS was used for the statistical analyses.

Procedure

The experiment contained two parts. In this section I will describe the sequence of what the participants, both speakers and judges, did, how the speech samples were selected and manipulated and how the samples were measured and compared to the answers of the judges.

First of all, the speakers were asked to describe a cartoon in approximately 1.5 minutes. This is based on the study by Mobärg (1989), who also asked his participants to describe a cartoon in order to record spontaneous speech. The cartoon Catching a Thief (Heaton, 1975) was chosen because a lot is happening in it, so there is a lot to talk about, which should facilitate the elicitation of spontaneous speech. The speakers first looked at the cartoon in silence and had the chance to think about how they would start talking. This was done to prevent long pauses in the speech. The speech was recorded and uploaded to a computer. Once this was done, a part of every speech sample was selected for further analysis. Analysing the whole sample was not feasible, so the part where the speech of the participant was the most spontaneous was chosen. If everything was equally fluent and spontaneous, the part was chosen randomly. The selected part included either a word with an initial plosive, to measure the VOT, or a word whose final sound can be voiced or devoiced, to measure the vowel. The pronunciation of twelve speakers was manipulated in the computer programme Adobe Audition, according to the following schedule.

| Group | Male/female | Manipulation |
|---|---|---|
| Intermediate student | Male | VOT |
| Intermediate student | Male | Vowel |
| Intermediate student | Female | VOT |
| Intermediate student | Female | Vowel |
| Student of English | Male | VOT |
| Student of English | Male | Vowel |
| Native speaker | Male | VOT |
| Native speaker | Male | Vowel |
| Native speaker | Female | VOT |
| Native speaker | Female | Vowel |

Table 7: Classification of the manipulated speech samples

In Adobe Audition the VOT can be stretched or shortened, as can the vowels within a word. This was done to find out whether the judges would change their opinion about the same samples of speech. The answers they gave for the original sample were compared to their judgements about the manipulated version of the same sample.

The thirty selected speech samples were sent to the judges, together with the questionnaire that can be found in the appendix. The judges rated the quality of the pronunciation on a five-point Likert scale and indicated how certain they were about their judgement on a three-point Likert scale. There was also space to note other things they noticed about the pronunciation of the speakers and to explain why they made their decision. These questions were asked for every speech sample, so the judges gave their opinion thirty times. Once the answers from the judges were received, their questionnaires were compared to the acoustic-phonetic measurements that had already been made.

Analyses

For the computer analyses Praat was used. I looked at voice onset time (VOT) and the length of the vowels before final voicing or final devoicing. An example is the VOT of /t/. An English /t/ is pronounced with a VOT of more than 35 ms. In Dutch, the VOT of /t/ is shorter, between 0 and 35 ms. For every speaker I checked the average VOT for voiceless plosives to find out if it was more like Dutch or more like English. The same was done for the average length of the vowels. A vowel before a voiced ending of a word is longer than a vowel before a voiceless ending. Since Dutch speakers tend to devoice word-final sounds, the vowels of British native speakers in such words should be longer than the vowels of Dutch native speakers.
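The comparison of a speaker's average VOT with the Dutch and English ranges can be illustrated with a minimal sketch. It assumes the 35 ms boundary named above as the only criterion; the function name and the example measurements are hypothetical and do not come from the thesis data.

```python
from statistics import mean

# Boundary taken from the text: English /t/ above 35 ms,
# Dutch /t/ between 0 and 35 ms.
VOT_BOUNDARY_MS = 35.0


def classify_voiceless_vot(vots_ms):
    """Return the mean VOT and a coarse label for a speaker's voiceless plosives."""
    if not vots_ms:
        raise ValueError("at least one VOT measurement is required")
    average = mean(vots_ms)
    label = "English-like" if average > VOT_BOUNDARY_MS else "Dutch-like"
    return average, label


# Hypothetical measurements in ms, not values from the thesis.
print(classify_voiceless_vot([28.0, 31.0, 40.0]))
print(classify_voiceless_vot([62.0, 85.0]))
```

A single boundary is a simplification: it ignores differences between places of articulation and treats the average as representative of all tokens.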

Chapter 6: Results

[Figure: Students. Judgement (Likert scale, 1-5) against number of judgements.]

In this thesis two research questions were developed. The first question is whether there is a correlation between human judgements and acoustic-phonetic measurements: does the opinion of the human judges correspond to the measurements that were made using a computer? The second question derives from the first one: which acoustic-phonetic features can influence a judgement about the quality of the pronunciation? In this chapter the results of the experiment are presented, starting with the first research question.

To answer the first research question the three groups, namely the Dutch students, the students of English and the British native speakers have been analysed separately. The graphs below, made in Excel, show the judgements per group including the certainty level of the judges. The levels on the Likert scale were as follows:

Judgements:

1: The pronunciation influences the intelligibility so badly you cannot understand what is said
2: The pronunciation does not influence the intelligibility but it distracts from what is said
3: The pronunciation does not influence the intelligibility, but you can easily hear a foreign accent (without getting distracted by it)
4: The pronunciation is very good and only if you listen carefully you can hear a foreign accent
5: Native or native like accent

[Figure panels, one pair per group: judgement (Likert scale, 1-5) and certainty (Likert scale, 1-3) against number of judgements.]

Figures 3 and 4: Judgements and certainty level of the pronunciation of the Student group

Figures 5 and 6: Judgements and certainty level of the pronunciation of the Student of English group

Figures 7 and 8: Judgements and certainty level of the pronunciation of the Native speakers group

As the graphs above show, there is a difference in judgement between the three groups. The first group, the group of students, has a peak at number three, which means that the pronunciation of the group members mostly did not influence the intelligibility, but that a foreign accent could easily be heard. The same peak also occurs in the group of students of English, but numbers four and five also occur frequently. This means that the pronunciation of the students of English was often judged to be very good and that the judges could only hear a foreign accent if they listened very carefully. In some cases the judges even thought that the speaker was a native speaker, as they gave a score of five. Lastly, the native speakers graph shows a peak at number five, meaning that most judges recognised a native accent. The three graphs on the right show the certainty level of the judges. All three graphs show some answers at level one and some more at level two, but the peaks are at level three, meaning that most judges were very certain about the answer they gave about the pronunciation of the speakers. To check the correlation between the acoustic measurements and the judges, the following tables were made.

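| Speaker | 1 | 2 | 3 | 4 | 5 | Total score | Mean | SD |
|---|---|---|---|---|---|---|---|---|
| CK | 1 | 1 | 24 | 3 | 1 | 92 | 18.4 | 30.3 |
| DW | 0 | 2 | 16 | 8 | 4 | 104 | 28.8 | 20.5 |
| AH | 1 | 7 | 20 | 2 | 0 | 83 | 20.8 | 26.7 |
| MS | 1 | 0 | 27 | 2 | 0 | 90 | 30 | 44.3 |
| AB | 4 | 6 | 19 | 1 | 0 | 77 | 18.5 | 26.1 |
| RR | 1 | 2 | 20 | 8 | 0 | 97 | 24.3 | 27.6 |

Table 8: Scores per speaker (students) together with the aggregated score (the maximum score is 150), mean and standard deviation.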
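The aggregated score in these tables can be read as the sum of each Likert level multiplied by the number of judges who chose it, which gives a maximum of 150 for thirty judges. The sketch below shows that calculation for one row of Table 8. The function names are illustrative, and the per-judgement mean it computes is an added quantity that should not be confused with the Mean column of the tables, whose derivation is not stated.

```python
LEVELS = (1, 2, 3, 4, 5)


def total_score(counts):
    """Sum of Likert level times the number of judges who chose that level."""
    if len(counts) != len(LEVELS):
        raise ValueError("expected one count per Likert level")
    return sum(level * n for level, n in zip(LEVELS, counts))


def mean_rating(counts):
    """Average rating per judgement on the 1-5 scale (not the tables' Mean column)."""
    return total_score(counts) / sum(counts)


ck = (1, 1, 24, 3, 1)             # row CK in Table 8
print(total_score(ck))            # 92, matching the Total score column
print(round(mean_rating(ck), 2))  # 3.07
```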

| Speaker | 1 | 2 | 3 | 4 | 5 | Total score | Mean | SD |
|---|---|---|---|---|---|---|---|---|
| KV | 0 | 1 | 0 | 7 | 22 | 140 | 46.7 | 56.4 |
| MD | 3 | 0 | 15 | 11 | 1 | 107 | 24.3 | 23.4 |
| MK | 0 | 3 | 17 | 10 | 0 | 97 | 32.3 | 23.5 |
| TM | 1 | 3 | 12 | 6 | 7 | 102 | 20.4 | 16.3 |
| ZN | 1 | 2 | 13 | 8 | 6 | 105 | 21.2 | 17.4 |

Table 9: Scores per speaker (students of English) together with the aggregated score (the maximum score is 150), mean and standard deviation.

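| Speaker | 1 | 2 | 3 | 4 | 5 | Total score | Mean | SD |
|---|---|---|---|---|---|---|---|---|
| KS | 0 | 0 | 0 | 4 | 26 | 146 | 73 | 80.6 |
| EF | 0 | 0 | 2 | 7 | 21 | 139 | 46.3 | 51.9 |
| NO | 0 | 2 | 1 | 9 | 18 | 133 | 33.2 | 40.8 |
| NN | 1 | 0 | 0 | 0 | 29 | 146 | 73 | 101.8 |
| OM | 1 | 2 | 0 | 0 | 27 | 140 | 46.3 | 51.9 |
| OW | 0 | 0 | 10 | 7 | 13 | 123 | 41 | 20.9 |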

Of course there is some overlap, but the judges seem to be very consistent in their judgements: the group of students has the lowest scores, the group of students of English stands in the middle and the native speakers are judged to be native or native-like. As both the graphs and the tables show, there is a difference between the three groups. However, a statistical analysis is needed to find out whether this difference is significant. A correlation analysis between the three groups shows the following. The judgements of the student group and the native speaker group show a weak negative correlation (r = -.304). The group of students of English stands between the students and the native speakers: the correlation between the students of English and the native speakers is r = .279. None of these correlations, however, is significant (p = .181, p = .619 and p = .649). The judges were very certain about their opinions, as the results of the certainty judgements show. There is a high correlation between the scores of the speakers in all three groups: students and students of English (r = .996; p = .05), students and native speakers (r = .999; p = .02) and students of English and native speakers (r = .999; p = .03).

To compare the judges to acoustic measurements, the VOT of the speakers has been measured. The first table below shows the VOT of voiceless plosives per speaker. The group is not mentioned, because the comparison is only between the VOT and the score. The second table shows the VOT and mean judge scores for speakers using voiced plosives.

| Participant | Mean VOT | Mean judge score | SD of judge scores |
|---|---|---|---|
| AH | K: 32 ms | 20.8 | 26.7 |
| AB | K: 23 ms | 18.5 | 26.1 |
| MK | P: 10 ms | 32.3 | 23.5 |
| TM | K: 85 ms | 20.4 | 16.2 |
| OM | P: 41 ms | 46.7 | 76.5 |

Table 11: Voiceless VOTs of the participants and their mean judge score and standard deviation

| Participant | Normal VOT | Mean judge score | SD |
|---|---|---|---|
| KV | B: -78 ms | 46.7 | 56.4 |
| MD | B: -31 ms | 24.3 | 23.4 |
| EF | B: 22 ms | 46.3 | 51.9 |
| OW | B: 19 ms | 41 | 20.9 |
| KS | B: 17 ms | 73 | 80.6 |
| NN | B: 24 ms | 73 | 101.8 |
| NO | B: 37 ms | 33.3 | 40.8 |

Table 12: Voiced VOTs of the participants and their mean judge score and standard deviation

The mean judge scores show a wide range. An example is KV, who prevoiced the B (-78 ms) but still received a high mean score of 46.7. A correlation analysis for the first table, voiceless plosives, showed a correlation of -.17. This correlation, however, was not significant (p = .7, α = .05). For the voiced VOTs, a positive correlation of .39 was found. This correlation was also not significant (p = .18, α = .05).
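As an illustration of how such a coefficient is obtained, the sketch below computes a Pearson correlation between the VOTs and the mean judge scores listed in Table 12. This is an assumption-laden example: the text does not state which coefficient or which data points entered the reported analysis, so the printed value need not reproduce the figure given above. The function name is illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Per-speaker values copied from Table 12 (voiced plosives).
vot_ms = [-78, -31, 22, 19, 17, 24, 37]
mean_score = [46.7, 24.3, 46.3, 41, 73, 73, 33.3]
print(round(pearson_r(vot_ms, mean_score), 2))
```

With only seven speakers, a coefficient of this kind is very sensitive to single data points such as the strongly prevoiced sample.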

Manipulations

The second research question was which acoustic-phonetic features can influence a judgement on the quality of the pronunciation. To answer this research question, some of the samples were manipulated and presented to the judges as another speaker whose pronunciation needed to be judged. The table below shows the speakers that were selected, described in the first column. The second column shows their normal VOTs as found in the selected speech samples, for every initial plosive they pronounced. The VOTs of the plosives of the British native speakers are longer than those of the Dutch native speakers. The third column shows the manipulations that have been made. For the Dutch native speakers, that is, the male and female students and students of English, the VOTs have been made longer. For the male and female native speakers of English, the VOTs have been made shorter. The differences between the normal VOTs and the manipulated VOTs are given in the last column of the table. The consonant before every VOT measurement is the initial plosive.

| Participant | Normal VOT | Manipulated VOT | Difference |
|---|---|---|---|
| Male native speaker | B: 28 ms; B: 18 ms | B: 20 ms; B: 10 ms | B: -8 ms; B: -8 ms |
| Female native speaker | B: 37 ms | B: 12 ms | B: -25 ms |

Table 13: VOTs of the selected speakers and their manipulations, in ms.

The second part of the manipulations consisted of manipulating the vowels before the last plosive or fricative of a word. The length of the vowel influences whether this final sound is heard as voiced or voiceless. Dutch has final devoicing, meaning that a voiced sound at the end of a word is pronounced voiceless. English does the opposite and keeps final sounds voiced. The table below is built up in the same way as the table of the VOT manipulations. The first column shows the selected speakers, the second column their original vowel length in ms, the third column the manipulations and the last column the differences between the original samples and the manipulations. The vowel symbol before each length is the vowel that is pronounced before the final consonant. Note that the male student of English and the male British native speaker both pronounce an /I/, but with a great difference in length. This is because the word with the /I/ that the student of English pronounces is stressed within the sentence, whereas the word with the /I/ from the native speaker is not.

| Participant | Normal length | Manipulated length | Difference |
|---|---|---|---|
| Male student | U: 14 ms | U: 199 ms | U: +185 ms |
| Female student | O: 60 ms | O: 107 ms | O: +47 ms |
| Male student of English | I: 113 ms | I: 155 ms | I: +42 ms |
| Female student of English | Au: 137 ms | Au: 200 ms | Au: +63 ms |
| Male native speaker | I: 75 ms | I: 20 ms | I: -55 ms |
| Female native speaker | O: 111 ms | O: 61 ms | O: -50 ms |

Table 14: Vowel lengths of the selected speakers and their manipulations, in ms.
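The Difference column is the manipulated length minus the normal length, so a positive value means the vowel was lengthened and a negative value means it was shortened. A minimal sketch of that arithmetic, using the values from the table above, is given here; the dictionary and function names are illustrative.

```python
# (normal, manipulated) vowel lengths in ms, copied from the table above.
VOWEL_LENGTHS_MS = {
    "Male student": (14, 199),
    "Female student": (60, 107),
    "Male student of English": (113, 155),
    "Female student of English": (137, 200),
    "Male native speaker": (75, 20),
    "Female native speaker": (111, 61),
}


def manipulation_difference(normal_ms, manipulated_ms):
    """Signed change in ms: positive means lengthened, negative means shortened."""
    return manipulated_ms - normal_ms


for speaker, (normal, manipulated) in VOWEL_LENGTHS_MS.items():
    diff = manipulation_difference(normal, manipulated)
    direction = "lengthened" if diff > 0 else "shortened"
    print(f"{speaker}: {diff:+d} ms ({direction})")
```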

The manipulations as described above, the original samples of all the groups and some extra samples that have not been manipulated were sent to the British judges. The outcome of all the questionnaires, so the score that every judge gave to every speech sample plus his or her certainty about the judgement, can be found in the appendix, listed per group. The results of the questionnaires will be discussed in the rest of this chapter.

[Figure panels: Students VOT. Judgement (Likert scale, 1-5) and certainty (Likert scale, 1-3) against number of judgements.]

were added up, so the VOT manipulated samples and the Vowel manipulated samples both have three graphs, one per group, for judgement, and three for certainty. First the VOT graphs will be discussed.

Figures 9 and 10: Judgements and certainty level of the manipulated VOT pronunciation of the group of students

The graph of the students’ manipulated VOT shows, as already seen in the original samples, a peak at the third level of the Likert scale, meaning that the pronunciation did not influence the intelligibility, but the foreign accent could still easily be heard. The judges were also very certain about their judgement, as the second graph shows. This certainty level is the same for the judgements about the students of English, but the judgements graph is different, as seen below. In the judgement graph of the students of English it is clear that levels three and four scored the highest. This is different from the graph of the original samples, in which the pronunciation also often scored a five. Since the groups are not of equal size (the group with the original samples is larger than the group of manipulated VOT samples), this difference cannot be tested for significance.

[Figure panels: Students of English VOT. Judgement (Likert scale, 1-5) and certainty (Likert scale, 1-3) against number of judgements.]

Figures 10 and 11: Judgements and certainty level of the manipulated VOT pronunciation of the group of students of English

[Figure panels: Native speakers VOT. Judgement (Likert scale, 1-5) and certainty (Likert scale, 1-3) against number of judgements.]

Figures 12 and 13: Judgements and certainty level of the manipulated VOT pronunciation of the group of native speakers of English

As the judgement graph of the native speakers shows, they were still recognised as native speakers. Also there is hardly any doubt about that, as is made clear from the certainty graph.

Again, for every manipulated sample the scores on the Likert scale and the total score have been calculated. The scores can be found in the table below, organised per group. The abbreviation s between brackets stands for student, se for student of English and ns for native speaker.

| Speaker | 1 | 2 | 3 | 4 | 5 | Total score | Mean | SD |
|---|---|---|---|---|---|---|---|---|
| RRV (s) | 0 | 2 | 16 | 9 | 3 | 103 | 25.8 | 19.9 |
| AHV (s) | 2 | 11 | 17 | 0 | 0 | 75 | 25 | 24.6 |
| MKV (se) | 1 | 2 | 12 | 15 | 0 | 101 | 25.3 | 28.1 |
| ZNV (se) | 2 | 1 | 14 | 9 | 4 | 102 | 20.4 | 18.6 |
| NNV (ns) | 1 | 0 | 0 | 0 | 29 | 146 | 73 | 101.8 |
| NOV (ns) | 0 | 0 | 4 | 11 | 15 | 131 | 43.7 | 31.5 |

Table 15: Scores per manipulated sample together with the aggregated score (the maximum score is 150), mean and standard deviation.

[Figure: Students Vowel. Judgement (Likert scale, 1-5) against number of judgements.]

original groups and the manipulated groups. For every group of speakers a Spearman’s rho correlation analysis between their scores on the original samples, manipulated samples and differences in the manipulation of the VOT has been done. For the group of students, the scores on the original samples and the manipulated samples are highly correlated: they have a significant correlation of 1 (p < .01). The same holds for the group of students of English and the native speakers. When looking at the differences between the three groups again, the correlations appear to have changed. Between the students and the students of English, the correlation decreased (r = .694), as it did for the students and native speakers (r = -.348) and the students of English and native speakers (r = -.198). Especially the last pair is interesting, since the correlation between the students of English and the native speakers switched from positive to negative. None of the correlations, however, was significant.

To look for differences between the manipulated vowel groups, the following six graphs have been made. Again, the three graphs on the left are the judgements about the pronunciation, the three graphs on the right are the certainty levels of the judges.

[Figure: Students Vowel. Certainty (Likert scale, 1-3) against number of judgements.]

[Figure panels: Students of English Vowel. Judgement (Likert scale, 1-5) and certainty (Likert scale, 1-3) against number of judgements.]

Figures 16 and 17: Judgements and certainty level of the manipulated vowel pronunciation of the group of students of English

[Figure: Native speakers Vowel. Certainty (Likert scale, 1-3) against number of judgements.]

Figures 18 and 19: Judgements and certainty level of the manipulated vowel pronunciation of the group of native speakers of English

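[Figure: Native speakers Vowel. Judgement (Likert scale, 1-5) against number of judgements.]

| Speaker | 1 | 2 | 3 | 4 | 5 | Total score | Mean | SD |
|---|---|---|---|---|---|---|---|---|
| ABK (s) | 3 | 10 | 15 | 2 | 0 | 76 | 19 | 18.7 |
| DWK (s) | 0 | 0 | 16 | 11 | 3 | 107 | 35.7 | 18.1 |
| TMK (se) | 0 | 0 | 13 | 9 | 8 | 115 | 38.3 | 2.1 |
| KVK (se) | 0 | 1 | 2 | 16 | 11 | 128 | 31.8 | 32.3 |
| OMK (ns) | 0 | 3 | 0 | 2 | 25 | 139 | 46.3 | 68.1 |
| KSK (ns) | 0 | 0 | 2 | 4 | 24 | 142 | 47.3 | 63.1 |

Table 16: Scores per manipulated sample together with the aggregated score (the maximum score is 150), mean and standard deviation.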

When comparing the Vowel graphs to the graphs of the original speakers, it is noticeable that for the students the scores at level five are higher in the graph of the manipulated samples than in the graph of the original samples. Also, for the students of English the scores are higher in the graph of the manipulated vowel samples than in the original samples. Lastly, in the graph of the native speakers it looks like not much has changed. As for the VOT graphs, these differences cannot be tested for significance since the groups are too small.

When comparing the certainty graphs per group, there seems to be a small difference between the certainty of the VOT scores and the Vowel scores. When looking at the differences between the three groups again, the correlation between the three groups has changed, although again none of the correlations was significant. Between the students and the students of English, the correlation decreased (r = .269), as it did for the students and native speakers (r = -.422), but not for the students of English and native speakers (r = .427). In addition, for every group of speakers a Spearman’s rho correlation analysis between their scores on the original samples, manipulated samples and differences in the manipulation of the VOT has been done. For the group of students, the scores on the original samples and the manipulated samples are highly correlated: they have a significant correlation of 1 (p < .01). The same holds for the group of students of English and the native speakers.

Quantitative analyses


Chapter 7: Discussion

In this chapter the results are discussed in relation to the two research questions. The first research question was whether there is a correlation between the acoustic-phonetic features and human judgements. The VOT tables in the results chapter suggest that the more Dutch-like the VOT is, the lower the score of the judges is. A positive correlation of .39 was found for the voiced plosives, implying that less prevoicing leads to a higher rating. This correlation, however, was not significant. For the voiceless plosives, a negative correlation of -.17 was found. Unfortunately, the voiceless table contains too few data points and too much variability for a meaningful analysis.

The British native speakers did not prevoice the voiced initial plosives, but the Dutch native speakers did. The scores for the pronunciation of the British native speakers were higher than the scores for the Dutch native speakers, so this could mean there is a link between human judgements and VOT. However, even though all Dutch native speakers produced very Dutch-like VOTs according to the standards described by Lisker and Abramson (1964), there was a clear difference between the Dutch students of English and the Dutch students who did not study English. Also, there were some exceptions within the judgements: some students produced a very Dutch-like VOT but still received a very high score from the judges. Thus, it is not clear whether the judges also base their judgement of the pronunciation on VOT.


For the vowel analysis, correlations have been compared between students and students of English, students and native speakers, and students of English and native speakers. These correlations were compared to the correlations of the original samples of the same groups. Between the students and the students of English, the correlation decreased (r = .269), as it did for the students and native speakers (r = -.422), but not for the students of English and native speakers (r = .427). The last pair is interesting, since the correlation between the students of English and the native speakers is positive, whereas it was negative in the VOT analyses. This means that the differences in judgements between the students of English and the native speakers were smaller when the vowel was manipulated than when the VOT was manipulated.

Quantitative analyses

In the questionnaire, the participants were also asked to write down why they judged the pronunciation the way they did. The answers were similar to the answers that were found in the studies by Van den Doel (2006) and Koet (2007). The participants often said that the intonation of the sentence was not native-like or that the stress patterns distracted them from what was said. Differences in vowels were also named very often. As described in the background section of this thesis, there are some vowel distinctions within, for example, the /e/, as the table below shows.

| Vowel | F1 | F2 |
|---|---|---|
| [ε] | 610 | 1900 |
| [æ] | 820 | 1530 |

Table 17: F1 and F2 for /ε/ and /æ/, in hertz (Hz).

Dutch does not have the [æ], so Dutch speakers often replace it with [ε]. The British judges heard this difference in vowel, for example in the word catch, which is supposed to be pronounced with an [æ] sound instead of an [ε] sound, and named it as a noticeable pronunciation error. The judges explicitly mentioned this vowel contrast, so it may be worth investigating further.
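One way to follow up on this contrast acoustically would be to compare a measured vowel token with the two reference values in Table 17. The sketch below labels a token by its nearest reference point in the F1/F2 plane. This is only an illustration: plain Euclidean distance in Hz is a simplifying assumption, the function name is hypothetical, and the example token is invented rather than measured.

```python
from math import hypot

# Reference formants in Hz, copied from Table 17.
REFERENCE_FORMANTS = {"ε": (610, 1900), "æ": (820, 1530)}


def nearest_vowel(f1_hz, f2_hz):
    """Return the reference vowel closest to the token in the F1/F2 plane."""
    return min(
        REFERENCE_FORMANTS,
        key=lambda vowel: hypot(
            f1_hz - REFERENCE_FORMANTS[vowel][0],
            f2_hz - REFERENCE_FORMANTS[vowel][1],
        ),
    )


# Hypothetical token, not a measurement from the thesis.
print(nearest_vowel(650, 1850))
```

A token from the word catch that is labelled [ε] by such a comparison would be consistent with the substitution the judges reported.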


Chapter 8: Conclusion

This thesis reported on the correlation between human judgements and acoustic-phonetic measurements of the pronunciation of English by Dutch native speakers. The question was whether VOT and vowel length, or final (de)voicing, would influence the opinion of the human judges, who were native speakers of British English. Even though it looks like changes in vowel length before final (de)voicing influence the opinions of the judges more than changes in VOT, no significant differences have been found. The research questions in this thesis asked whether the opinion of the human judges corresponds to the measurements of the computer, and which acoustic-phonetic features can influence a judgement about the quality of the pronunciation. Even though the lengths of the vowels and the VOTs were manipulated by computer to resemble native English vowels and VOTs, the judges did not hear the difference, so the second research question must be answered negatively. The judges did, however, rate the native speakers highest, the students lowest and the students of English in the middle, just as expected. This indicates that studies with judges, like the studies by Flege et al. (1997) and Golestani and Zatorre (2009), are reliable. Furthermore, the quantitative analysis of what judges hear and what leads them to their judgements shows the same result as found in Koet (2007) and Van den Doel (2006). Intelligibility was the most important issue, but the participants also noticed wrong intonation and misplaced emphasis on words as being non-native. These characteristics were noticed because they were annoying to the judges, as Van den Doel discovered.

VOT is a standard tool for investigating L1 and L2 pronunciation. Deuchar and Clark (1996), Kehoe, Lleó and Rakow (2004) and Simon (2007) all used it in their studies as the only tool for the assessment of pronunciation. However, the findings in this thesis show that judges who proved to be reliable did not hear a difference between the VOTs of the L1 and the L2. This could be because they paid attention to the rest of the sentence, in which case they might hear a difference if the original sample and the manipulated sample were isolated words. It is also possible, however, that VOT is not as good a measure as is often claimed, and certainly not as the only tool for the assessment of pronunciation.

Methodological pitfalls

the word ‘boys’, it would be ideal to have two or three different words that every speaker uses to make the comparison as good as possible.

A second point is that it may be better to use more participants. Since it was very hard to find male students of English, only three people participated in each group, so that the groups were of equal size. The group of speakers could be much larger for a better analysis. As noted, the judges recognised a speaker they had heard before when the same speaker appeared again later in the questionnaire. This might influence how they listen, so larger groups with more different speakers could help solve this problem.

Another issue that was mentioned frequently is that the judges did not notice a difference between an original sample and a manipulated sample. They asked whether I was checking if a repeated judgement would change, or they simply typed something like ‘I already heard this one’ or ‘these are repeats?’. This can influence the results. If the judges hear, for example, a very obvious British native speaker, they pay less attention to the pronunciation the second time they hear it, because they remember the other sample. This issue should be addressed in further research to obtain cleaner results.

Further recommendations

With the pitfalls described above in mind, several improvements are possible. For example, two questionnaires could be designed with different samples and questions, covering a wider range of speakers. In these two questionnaires it would also be possible to include more samples of the same speakers, to find out whether the judgements really concern the pronunciation of that particular speaker and not just the pronunciation within one sentence. With more samples and more speakers, it is of course also better to use more manipulated samples. In this thesis only six manipulated VOTs were used, but it would be better to have more of these manipulated samples, for example from three male students, three female students, three male native speakers, et cetera. A conclusion drawn from a larger group of samples is more reliable than one drawn from a smaller group.
