Acoustic and Perceptual Correlates of Vowel Articulation of A Bilingual Speaker with Parkinson's Disease


Ruiqi Wang (S4098994)
MA in Multilingualism, Faculty of Arts, University of Groningen
Supervisors: Charlotte Gooskens, Vass Verkhodanova
25th July, 2020


Acknowledgment. I would like to thank the following people, without whom I would not have been able to complete this research, and without whom I would not have made it through my master's degree.

Firstly, I wish to express my gratitude to Hanneke Loerts, who gave vivid and exciting lectures in the Multilingual Mind course. Through this course, I developed a great interest in neurolinguistics and the idea of writing a thesis in this field. Hanneke Loerts also recommended Vass Verkhodanova as a supervisor of this research, to whom I owe special thanks for her insightful suggestions on narrowing down the thesis topic. I whole-heartedly appreciate that Vass provided data, tools, and guidance along the way. I could not have finished this thesis without her detailed and patient explanations of phonetics, stimuli selection, RStudio, and JATOS scripts. I would also like to thank Vass for guiding me to her important publications and for her stimulating questions on vowel measurements and speech intelligibility.

Further, I wish to express my sincere appreciation to my supervisors Charlotte Gooskens and Vass Verkhodanova. They kindly permitted me to postpone the deadline of this work and provided helpful comments on draft revisions.

I also thank all the participants who took the time to fill in the questionnaires and volunteered to participate in our experiment. I would particularly like to thank Monica and Sibrecht, who kindly offered me their laptops when I had technical problems with mine.


Finally, I wish to acknowledge the support and great love of my family: my girlfriend, Jiayun, and my mother, Yuqin. They kept me going, and this work would not have been possible without their encouragement.


ABSTRACT

Parkinson's Disease (PD) is one of the most common neurodegenerative disorders. Patients with PD usually suffer varying degrees of speech impairment. Speech therapy has been shown to benefit articulation and speech intelligibility. This study is designed to find out whether a person with PD who received speech therapy in his L1 also experienced changes in his L2, whether listeners could perceive changes in his L2 speech before, during, and after therapy, and how intelligible his L2 was in each of these three time periods. The subject is a male native Dutch speaker who speaks English fluently and uses both languages daily. He participated in 24 recording sessions over two years and completed four speech tasks during the sessions: spontaneous speech, picture description, video description, and reading. To obtain a comprehensive assessment of the changes in the subject's articulation and speech intelligibility, two evaluations were conducted. One was an acoustic analysis using speech processing programs and scripts; the other was a perceptual experiment with a group of listeners, who rated speech healthiness and intelligibility levels. After measuring the subject's vowel space area (VSA), vowel articulation index (VAI), F2 ratio, and F1 and F2 variability values, we observed that the subject's vowel articulation and tongue advancement improved over time. However, his steadiness in achieving vowel targets remained the same. The perceptual experiment results showed that the healthiness rating scores increased as the sessions progressed, and listeners were more confident when they rated the recordings after speech therapy. The healthiness rating score of the reading task was higher than those of the other three tasks: free speech, picture description, and video description. The accuracy scores for correctly recognized and understood words and non-words also increased over time. To sum up, speech therapy received in L1 was beneficial at both production and perception levels in L2 speech.


Index

Acknowledgment
Abstract
1. Introduction
2. Theoretical Background
   2.1. Speech disorder in PD patients
   2.2. Speech therapy
   2.3. Bilingual PD and speech therapy
3. Methodology
   3.1. Speech intelligibility as a subjective assessment
   3.2. Acoustic analysis as an objective assessment
   3.3. Acoustic measurement of vowels
      3.3.1. Definition of formant
      3.3.2. Definition of Vowel Space Area
      3.3.3. Definition of Vowel Articulation Index
   3.4. Speech Data collection
   3.5. Perceptual experiment
      3.5.1. Design of the perceptual experiment
      3.5.2. Stimuli
      3.5.3. Participants and Procedure for Perceptual Experiment
4. Results
      4.1.1. Calculation of vowel measurements
      4.1.2. Results of vowel measurements
   4.2. Results of perceptual experiment
      4.2.1. Healthiness rating of phrases
      4.2.2. Accuracy scores of words and non-words
      4.2.3. Intelligibility rate of words and non-words
5. Summary and Discussion
6. Conclusion and limitations
7. Future Work
References
Appendix
   1. Questionnaire


1. INTRODUCTION

Parkinson's Disease (PD) has been documented since the early nineteenth century. It is one of the most common neurodegenerative disorders today ("Parkinson's Disease Statistics," 2018) and the fastest growing one (Dorsey et al., 2020).

According to the Parkinson's Foundation ("Statistics," 2020), "more than 10 million people across the globe are living with PD." About one in a thousand people worldwide is affected by PD, and its symptoms typically appear after the age of 50 in both men and women (Marsden, 1996). The diagnosis of PD is based on symptoms including tremor, rigid muscles, slowed movement (bradykinesia), loss or impairment of the power of voluntary movement (akinesia), and postural abnormalities (Manyam, 1997; Marsden, 1996; Tanner & Goldman, 1996).

Patients with PD suffer from a progressive neurological disorder that affects movement to varying degrees, with symptoms such as slowed movement or stiffness in the limbs, tremor, and shuffling gait. The motor symptoms vary greatly from individual to individual in their intensity and progression. However, up to 90% of people with PD develop a speech disorder, hypokinetic dysarthria (Logemann et al., 1978; Hartelius & Svensson, 1994; Ho et al., 1999). It can cause difficulties in PD patients' speech, affecting intonation, pitch, loudness, pausing, and lexical stress (Brabenec et al., 2017; Darley et al., 1969a, b; Mekyska et al., 2011). Listeners' perception is affected as a result, which leads to poor intelligibility and impaired social interactions.


As one of the treatments to improve the speech intelligibility of people with PD, speech therapy has been reported to be the most significant therapeutic approach available for ameliorating voice and speech function, especially for those who are not suitable candidates for surgery or medication (e.g., Johnson and Pring, 1990; Kalf et al., 2012; Ramig et al., 1995; Sapir et al., 2011a; Schulz and Grant, 2000).

Previous studies of speech disorders in bilingual PD patients (e.g., Johari et al., 2013; Paradis, 2004; Yazawa et al., 2003; Zanini et al., 2004a) suggest that more significant impairments appear more often in the L1 than in languages learned subsequently. Paradis (2004, p.218) indicated that psychotic symptoms like auditory hallucinations or conceptual disorganization have often been reported in at least one of a patient's languages. However, he also suggested that more studies are needed to investigate the reasons behind this and, more importantly, to determine which language should be treated if only one language of a bilingual patient is to receive treatment. Zanini et al. (2004a) and Yazawa et al. (2003) reported that bilingual PD patients had fewer impairments in their second language than in their native one. In this thesis, we therefore present the results of a longitudinal case study of a bilingual PD speaker (L1 Dutch, L2 English) to investigate the changes in his L2 vowel articulation after he received speech therapy in his L1. Moreover, we are interested in how healthy listeners perceive his speech. This study was designed to address the following questions: 1) Are there changes in vowel articulation in L2 (English) after speech therapy in L1 (Dutch) in a bilingual patient with PD? 2) Can listeners perceive changes in the L2 speech before, during, and after speech therapy in general? 3) Are they able to recognize the target vowels in L2 in these three time periods? The hypotheses are that speech therapy in L1 has a positive impact on vowel articulation in L2, and that listeners will recognize the changes in L2 speech throughout the recording sessions.

To address the research questions comprehensively, two types of assessment of speech intelligibility were conducted. One was an acoustic analysis of audio recordings collected from the subject, performed by means of speech processing programs and scripts. The other was a perceptual experiment with a group of listeners, who rated speech healthiness and intelligibility levels. The results are compared and discussed in the results section of this thesis.

Since the speech disorder caused by PD compromises intelligibility, communication, and, in turn, patients' social lives, this study was meant to explore the relationship between speech therapy in L1 and the untreated L2. The findings of phonetic profiling are considered useful for identifying treatment goals and provide support for the efficiency of speech therapy.


2. THEORETICAL BACKGROUND

This part introduces the typical symptoms of PD and its negative impact on speech abilities. We then give a brief history of speech therapy and its development, along with its effects on PD patients. Some earlier studies of bilingual PD patients are reviewed to provide a basis for the hypotheses of this research. To give a clearer understanding of the analyses used in the present study, various vowel measurement terms, methods, and their mechanisms are explained, in addition to their phonetic principles.

2.1. Speech disorder in PD patients

PD has a negative impact on speech output. PD causes impairments in muscle control, which show not only in faciokinesis (i.e., facial expression) but, more importantly, in sound articulation. Thus, it affects all aspects of speech, such as the patient's intonation, speech rate, loudness, pausing, and lexical stress (Brabenec et al., 2017; Darley et al., 1969a, b; Mekyska et al., 2011). Darley et al. (1969a) describe the motor speech disorder in PD patients as hypokinetic dysarthria (HD). Swigert (1997) pointed out that HD may also affect the speech production subsystems: respiration, phonation, and articulation. The degree of impairment of each of these subsystems directly affects how PD patients express themselves during a conversation. In a recent study by Schalling et al. (2017), 92.5% of 188 participants reported at least one symptom associated with expression; during conversation, they often had problems finding the right words or had a weak voice. Sometimes participants experienced inaccuracy in articulation and strayed from the point in a conversation. Between a quarter and a third of the participants suffered from constrained expression due to speech and communication problems. The pathophysiological mechanisms of HD are still not completely understood. However, HD has a negative influence on PD patients' communication and daily social life (Miller et al., 2006). Darley et al. (1969a, 1969b) characterize speech affected by HD as "monopitch, inappropriate pitch level, reduced stress, mono loudness, inappropriate silences, short rushes of speech, variable speech rate, harsh and breathy voice qualities, and reduced intelligibility." The symptoms of HD in PD patients differ considerably from one individual to another. Factors like the severity of HD/PD, coexistent conditions, or specific neurological impairment contribute to dissimilar development in PD (Mekyska et al., 2011; Paradis, 2004).

PD affects the speech subsystems. For example, patients experience impaired breathing when speaking. Solomon and Hixon (1993) indicate that PD patients' respiratory system is weakened because reduced rib cage volumes and increased abdominal volumes result in an insufficient quantity of air reaching the vocal tract during speech. In earlier studies (e.g., Boshes, 1966; Canter, 1965; Mueller, 1971), the capability to maintain extended voicing was evaluated and reported to be impaired. Furthermore, Boshes (1966) and Canter (1963) argue that PD patients differ in speaking rate from healthy individuals: they are either faster or slower than control speakers (Canter, 1963; Skodda et al., 2011a).

As a result of impaired muscle control, the range of articulatory movements in PD patients is reduced (Skodda et al., 2011b), leading to blurred vowel articulation. When vowels are produced less distinctly from each other and more centralized, they sound schwa-like and thus hinder listeners' perception. Furthermore, PD affects patients' phonatory system. Many patients report a lowered voice in speech, and sometimes they experience difficulty in producing speech with variable vocal loudness (Boshes, 1966; Canter, 1965). During speech and voice tasks, the vocal sound pressure level (SPL) of PD patients was shown to be significantly lower (by 2.0–4.0 dB SPL) than that of healthy control subjects (Fox & Ramig, 1997). Yorkston et al. (1984) suggested that early detection of changes in voice is "clinically meaningful"; Skodda (2012) indicates that if early changes in speech can be identified, they can serve as a useful indicator of improvement or worsening during treatment.

2.2. Speech therapy

Before the 1980s, the attitude towards PD speech treatment was pessimistic. Due to the progressive and irreversible nature of PD, some believed that PD patients did not improve with treatment (Sarno, 1968). Many assumed that PD patients would need continuous treatment because their speech ability kept deteriorating (Allan, 1970). Later, in the 1980s and 1990s, wearable devices such as intensity biofeedback devices and a masking device came into use. These devices helped to increase clarity and loudness, thereby improving PD patients' speech rate and intensity, reducing their anxiety level, and raising their self-monitoring of speech intensity and rate (Schulz & Grant, 2000, p. 66).


[...] the main treatments for PD patients. Many acoustic studies (Johnson and Pring, 1990; Kalf et al., 2012; Pawlas and Countryman, 1995; Sapir et al., 2011; Schulz and Grant, 2000) demonstrate that, among these treatments, speech therapy is the most significant therapeutic approach available for ameliorating voice and speech function. For those who suffer from speech impairment caused by PD, customized therapy can be helpful. For example, the Lee Silverman Voice Treatment (LSVT) has been shown to be an efficient therapeutic method for improving phonatory and respiratory functions (e.g., Spielman et al., 2007; Ramig et al., 2014). However, the importance of pharmacological treatment cannot be ignored: the respondents in LSVT studies are typically optimally medicated with dopamine replacement medications (levodopa) (Schulz & Grant, 2000, p. 67).

Furthermore, the scope of speech therapy has been extended to swallowing, micrographia (i.e., a common disorder that features abnormally small, cramped handwriting), and cognitive disorders (Rolland-Monnoury, 2013). Such treatment is essential for those who do not respond well to medication or surgery. As a long-term rehabilitation, speech therapy also adapts to the evolution of the disease. However, although many PD patients experience speech disorders and impaired articulation, Trail et al. (2005) indicate that merely 3–4% of them receive speech therapy. Hence, a large number of new studies are needed.

2.3. Bilingual PD and speech therapy

Speech impairments in bilingual or multilingual patients with PD are yet to be widely investigated. Early studies of language therapy suggest that when bilingual or multilingual speakers receive speech therapy in one of their languages, the other, untreated languages can improve as well (e.g., Fredman, 1975; Voinescu et al., 1977). Zanini et al. (2004) argue that bilingual Parkinsonian patients show more severe syntactic damage in their native language than in their second language. Among PD patients, the selective or greater impairment is attributed to damage to procedural memory, which predominantly affects the native language(s) (Paradis, 2008, p.224). Additionally, a bilingual Japanese-English patient in a case study by Yazawa et al. (2003) showed worse micrographia in his written Japanese (L1) than in his written English (L2).

In a later study (Zanini et al., 2010), more mistakes were observed in PD patients' L1 than in their L2, whether phonological, morphological, or syntactic. In a spontaneous language production task, Zanini et al. (2010) compared the performance of two groups of participants who had acquired their L1, Friulian, at home and later learned their L2, Italian, at school at the age of six. One group consisted of nine early non-demented patients with PD; the other group consisted of nine healthy controls. They assumed that the age of acquisition of the L1 might play an essential role in language use, considering that only the L1 was impaired in the PD patients. Zanini et al. (2010) pointed out that when PD patients had used their L2 daily for many years, this later-learned language (in this case, Italian) might have become automatized to a certain extent. Moreover, the L2 was still supported by the same mechanisms that sustain the L1. The key variable would be the level of L2 proficiency. Since the PD subjects in Zanini et al.'s (2010) study were not simultaneous bilinguals (i.e., those who learn both languages before the age of three, normally from family members), their knowledge of the L2, such as phonology, lexicon, morphology, and syntax, was acquired explicitly at school. This metalinguistic knowledge remained accessible to the bilingual speakers, even though they were of very high or native-like proficiency. These findings suggest that, among PD patients, the language learned implicitly at home at a very early age is more impaired than the one learned explicitly later at school.

However, the dissimilarities between the two languages should also be carefully considered in the treatment of bilingual patients. Lee and McCann (2009) compared the speech intelligibility of a group of bilingual speakers (L1 Mandarin, L2 English) with HD before and after speech therapy. They suggested that the therapy was more efficient in improving intelligibility for Mandarin (a tonal language) than for English (a non-tonal language). Nevertheless, only a limited, albeit growing, number of empirical studies have examined the integrity of language abilities in bilingual individuals with PD. More research is required before anything conclusive can be said about the speech of bilingual patients with PD.


3. METHODOLOGY

As a general rule, speech analysis can be divided into two types: subjective and objective (Brabenec et al., 2017, p.304). The former is a perceptual analysis by listeners, who can be speech therapists or people without related experience, while the latter is an acoustic analysis of speech waveforms.

3.1. Speech intelligibility as a subjective assessment

Tjaden and Wilding (2011, p.155) defined intelligibility as the degree to which a listener understands a speaker's acoustic signal. Intelligibility is an important aspect of both research and the evaluation of the severity of speech disorders (Yorkston & Beukelman, 1981; Kent et al., 1989). Tjaden and Wilding (2011, p.155) also suggest that once quantitative measures of intelligibility are documented, they can be used to set up a baseline before speech therapy and to track improvement or deterioration related to disease progression. Duffy (2019) further reports that speech intelligibility can serve as an objective evaluation of the severity of HD, so that it can be used for clinical, legal, or research purposes. Walshe et al. (2008) believe that improved intelligibility is a primary goal of most speech therapies.

Different measurements are used to assess speech intelligibility. The three most common approaches are:

1) Speech and language pathologists give their assessment of speech intelligibility for clinical, legal, or research purposes (Pinto et al., 2017, p.162). Various tools are applied to evaluate intelligibility as an indicator of HD severity. For instance, the Assessment of Intelligibility of Dysarthric Speech (AIDS, i.e., a tool for quantifying single-word intelligibility, sentence intelligibility, and speaking rate of adult and adolescent speakers with HD) by Yorkston et al. (1984) and A Set of Unpredictable Sentences for Intelligibility Testing by McHenry & Parle (2006) have been widely used to estimate intelligibility in English. Another widespread test is the Frenchay Dysarthria Assessment - Second Edition (FDA-2; Enderby & Palmer, 2008), which is designed to rate motor speech disorders through a number of simple performance tasks related to speech function. Speech and language pathologists often adapt these tools for different languages. For example, the FDA-2 was adapted into French by Auzou and Rolland-Monnoury (version 1, 2006) and then into Portuguese by Cardoso et al. (version 2, 2017). However, many factors influence this approach, such as the types of stimuli, the patients, and, in particular, the knowledge and experience in intelligibility assessment of each individual speech and language pathologist (Pinto et al., 2017, p.162).

2) Many researchers (e.g., Tjaden & Wilding, 2011; Van Wijngaarden, 2001; Yorkston & Beukelman, 1978) prefer an estimate in a multiple-choice format to a written evaluation. Such tests are built on sets of words or phonemes. One such method, the Single Word Intelligibility Test, was developed by Kent et al. (1989). They identified 19 contrast errors at the segmental level connected with HD (e.g., high versus low vowels). Single words are arranged in minimal pair sets, such as "bit" and "bat," because these target contrasts are likely to sound schwa-like in HD patients' speech (Blaney & Hewlett, 2007, p.22). However, this approach only reveals articulation errors from a phonetic perspective. Thus, it accurately estimates neither the severity of HD nor its influence on communication (Pinto et al., 2017, p.162).

3) The last method, which is also the one we used in this study, combines and extends the first and second approaches. The words, non-words, and sentences of speakers with speech disorders are assessed by a group of listeners functioning as "an auditory jury" (Pinto et al., 2017, p.162) instead of by one or a few speech and language pathologists. The ideal auditory jury should consist of native speakers of the language spoken by the patients, with no deficit in auditory perception, inexperienced in the field of neurodegenerative disease, and unfamiliar with the purpose of the intelligibility experiment (Pinto et al., 2017, p.163). During the experiment, listeners are sometimes asked to do a multiple-choice word/non-word test (e.g., Namasivayam et al., 2013; Sussman & Tjaden, 2012; Van Wijngaarden, 2001), and sometimes to transcribe what they have heard (e.g., Hustad, 2006b; Kempler & Van Lancker, 2002; Tjaden & Wilding, 2011). At the end of the experiment, the percentage of correctly understood words or sentences is taken as a measure of speech intelligibility (Pinto et al., 2017, p.162).

However, some studies (Yorkston et al., 1994; Walshe, 2003) demonstrate that changes are noticeable even if the person has not been diagnosed with HD, while a patient diagnosed with severe HD may sometimes be relatively unaffected in speech intelligibility. In some situations, speakers do not seem to be aware of the severity of their HD (Yorkston et al., 1994; Antonius et al., 1996; Fox & Ramig, 1997). They may even show different degrees of speech intelligibility across speech tasks. In a case study of a single PD patient, Kempler and Van Lancker (2002) used five speech production tasks: spontaneous speech, repetition, reading, repeated singing, and spontaneous singing. Their results show inconsistent intelligibility across tasks: when the participant was asked to read aloud a printed transcript of a conversational speech sample, 78% of items were correctly recognized, compared with 29% in spontaneous speech.

3.2. Acoustic analysis as an objective assessment

To analyze voice and speech objectively, audio recordings are usually processed by means of speech processing programs and scripts (Brabenec et al., 2017, p.304). Brabenec et al. point out that the process normally consists of "speech parameterization, consequent statistical analysis, or mathematical modeling" (Brabenec et al., 2017, p.304). A broad range of speech tasks is necessary to describe speech pathology quantitatively, such as syllable repetition tasks (Sapir et al., 2010), reading tasks (Kempler & Van Lancker, 2002; Skodda et al., 2011), and free speech (Tjaden & Wilding, 2011; Zanini et al., 2010).

In this research, we look at the cardinal vowels (or, less theoretically, corner vowels), a set of reference vowels used by phoneticians to describe the sounds of languages. A cardinal vowel is a vowel sound produced when the tongue is in an extreme position, either front or back, high or low.

Annotation. From the collected recordings, the three corner vowels /a, i, u/ were manually labeled using a two-tier text grid in Praat (i.e., a program for speech analysis and synthesis; Boersma & Weenink, 2020). A Praat script was used to automatically extract all F1/F2 pairs corresponding to the labeled segments. The vowels were annotated based on visual observation of the waveform and the wideband spectrogram in Praat (Boersma & Weenink, 2020). Given the characteristics of continuous speech, the vowels were selected according to the following criteria, based on an earlier study by Strinzel et al. (2017, p.59):

"1. Only vowels occurring in intelligible, phonated words were annotated;

2. Only vowels with a stable part of at least 40 ms were selected. This stable part was the central part of each vowel, starting at least one period after the vowel onset and ending one period before vowel offset;

3. Vowels preceded by a voiced sound were only selected if that sound matched the respective vowel’s place of articulation, to ensure that formant transitions and co-articulation did not affect the vowel;

4. Vowels immediately following nasals, glides, or other vowels were not selected."

3.3. Acoustic measurement of vowels

In this part, we introduce the definitions of formant, vowel space area (VSA), vowel articulation index (VAI), F2 ratio, and F1 and F2 variability. Then we make some adjustments to these vowel measurements so that they fit this study.

3.3.1. Definition of formant

The production of vowels is mainly shaped by movements of the tongue, the degree of mouth opening, and the flow of air passing from the larynx to the lips. These articulatory components work together to create oropharyngeal resonating cavities (pharynx and oral cavity) and play a fundamental role in amplifying certain frequency bands of the voice spectrum (Skodda et al., 2011). These amplified bands are known as "formants." A formant can be defined as a peak, or local maximum, in the spectrum (Jeans, 1968). Formants delineate the individual vowels by their specific peaks in the acoustic energy of the voice. They occur at intervals of approximately 1000 Hz. Each formant corresponds to a resonance in the vocal tract (Wood, 2005). Formant frequencies refer to the acoustic resonances of the human vocal tract. Fry (1979, p.76) states that "formants are strictly the resonant frequencies of the driven system (i.e., the vocal tract)," "but since a formant must give rise to a peak in the spectrum of the sound produced, the term formant is quite commonly applied to the frequency at which this peak occurs." The various definitions of formants relate closely to the precise use of the term. However, there is universal agreement on the numbering convention used to describe the different resonances or spectral peaks (Harrison, 2013, p.30). The formant with the lowest frequency is called the first formant (F1), the second-lowest the second formant (F2), and the third-lowest the third formant (F3). Most often, the first two formants, F1 and F2, are sufficient to identify the vowel. A wideband spectrogram can display formants (see Fig. 1). The darker parts of the spectrogram indicate higher energy densities, i.e., more distinct and audible components.


Figure 1. A demonstration from Praat (Boersma & Weenink, 2020): the darker a formant is reproduced in the spectrogram, the more distinct and more audible it is. The yellow line indicates F1, and the blue one indicates F2.

To assess speech intelligibility, the F2 ratio and the F1 and F2 variabilities are instrumental as well. The F2 ratio is expressed as F2i/F2u, i.e., the F2 of /i/ divided by the F2 of /u/. This ratio is calculated to index a reduction in the extent of anterior-posterior articulatory movements of the tongue. The ratio becomes smaller when the numerator decreases and the denominator increases. In other words, the smaller the ratio, the more reduced the tongue advancement, and vice versa. In comparison, F1/F2 variability is a measurement of relative acoustic stability in achieving vowel targets (Kim et al., 2011). It is expressed as the mean standard deviation of the first and second formant frequencies of each vowel. The higher the variability of the formant frequencies, the less stable the speaker is in reaching a given vowel.
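As an illustration of these two measures, the following minimal Python sketch computes an F2 ratio and a formant variability value. It is not the script used in this study: the function names and the formant values are hypothetical, and it assumes that variability is the mean of the per-vowel sample standard deviations, which is one possible reading of the definition above.

```python
from statistics import mean, stdev

def f2_ratio(f2_i, f2_u):
    """Mean F2 of /i/ divided by mean F2 of /u/ (values in Hz)."""
    return mean(f2_i) / mean(f2_u)

def formant_variability(tokens_by_vowel):
    """Mean of the per-vowel sample standard deviations of one formant.

    tokens_by_vowel maps a vowel label to a list of formant values (Hz).
    Averaging the per-vowel standard deviations is an assumption here.
    """
    return mean([stdev(values) for values in tokens_by_vowel.values()])

# Hypothetical F2 measurements (Hz), invented for illustration only.
f2_tokens = {
    "i": [2200.0, 2300.0, 2250.0],
    "u": [900.0, 1000.0, 950.0],
    "a": [1300.0, 1250.0, 1350.0],
}

ratio = f2_ratio(f2_tokens["i"], f2_tokens["u"])
f2_variability = formant_variability(f2_tokens)
print(f"F2 ratio: {ratio:.2f}, F2 variability: {f2_variability:.1f} Hz")
```

The same `formant_variability` function could be applied to F1 values to obtain the F1 variability.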


3.3.2. Definition of Vowel Space Area

Vowel Space Area (VSA), or Acoustic Vowel Space (AVS), is based on a two-dimensional graph with lines connecting the plotted values of the first and second formants (F1 and F2) of the vowels /i/, /u/, and /ɑ/ (Fant, 1973). Calculating the VSA is important for studying speech development, the speaking style of individuals, and vowel production (Sandoval et al., 2013). The vowels /i/, /u/, and /ɑ/ are also known as the cardinal vowels (Stevens, 2000); see Fig. 2. Fant (1973) suggested that assumptions about vowel articulation can be made based on the close relationship between the values of F1 and F2 and tongue position. F1 is associated with the size or shape of the cavities created by jaw opening, while F2 is related to tongue placement, i.e., a high F2 value is the outcome of an advanced tongue position. The mean F1/F2 values for each of the three corner vowels are then used to compute the triangular area formed by the corner vowels (Sandoval et al., 2013, p.477). The formula for VSA is:

VSA = 0.5 × |F1i × (F2a - F2u) + F1a × (F2u - F2i) + F1u × (F2i - F2a)| (Liu et al., 2005).
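The formula can be sketched in Python as follows. This is only an illustration: the function name and the mean formant values are hypothetical and are not taken from the subject's recordings.

```python
def vowel_space_area(f1_i, f2_i, f1_a, f2_a, f1_u, f2_u):
    """Triangular vowel space area (Hz^2) from the mean F1/F2 of /i/, /a/, /u/."""
    return 0.5 * abs(
        f1_i * (f2_a - f2_u)
        + f1_a * (f2_u - f2_i)
        + f1_u * (f2_i - f2_a)
    )

# Hypothetical mean formant values (Hz), invented for illustration only.
area = vowel_space_area(
    f1_i=300, f2_i=2300,
    f1_a=750, f2_a=1300,
    f1_u=350, f2_u=900,
)
print(area)  # 290000.0
```

If the three vowels were fully centralized, i.e., produced with identical F1 and F2 values, the triangle would collapse to a point and the area would be zero.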

Figure 2. The three cardinal vowels, /a/, /i/, and /u/, constitute a triangle (based on the formants in the pronunciations of the IPA; Wells, 2001).

VSA can be used as a measurement of vowel articulation and movement range (Liu et al., 2005). Bradlow and Bent (2002) argue that larger VSAs indicate clearer and more intelligible speech than smaller VSAs. Thus, VSA is calculated as a meaningful, predictive measurement of overall intelligibility for patients with HD and multiple sclerosis (between 6 and 13 %) (McRae et al., 2002; Tjaden and Wilding, 2004).

3.3.3. Definition of Vowel Articulation Index

Roy et al. (2009) and Sapir et al. (2010) developed another useful vowel measurement to quantify acoustic cues of HD in PD speech, which is called the "vowel articulation index (VAI)." The formula for VAI calculation is:

VAI = | F1a + F2i | / |F1i + F1u + F2a + F2u| (Roy et al., 2009)
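The VAI formula can be sketched in the same way. The function name and all formant values below are hypothetical; because formant frequencies are positive, the absolute-value bars of the formula are omitted in the code.

```python
def vowel_articulation_index(f1_i, f2_i, f1_a, f2_a, f1_u, f2_u):
    """VAI = (F2i + F1a) / (F1i + F1u + F2u + F2a); dimensionless."""
    return (f2_i + f1_a) / (f1_i + f1_u + f2_u + f2_a)


# Hypothetical well-separated corner vowels (Hz), for illustration only
vai_distinct = vowel_articulation_index(
    f1_i=300, f2_i=2300, f1_a=750, f2_a=1300, f1_u=350, f2_u=900
)

# Hypothetical centralized corner vowels (Hz): /i/ and /a/ drift toward the
# centre, so the numerator shrinks and the denominator grows
vai_centralized = vowel_articulation_index(
    f1_i=350, f2_i=2000, f1_a=650, f2_a=1350, f1_u=400, f2_u=1100
)
```

In this toy example `vai_centralized` is smaller than `vai_distinct`, which is the direction of change the index is designed to detect.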

Although VSA can detect changes in articulatory function, some studies (Ansel & Kent, 1992; Bunton & Weismer, 2001; Sapir et al., 2007; Weismer et al., 2001) show that VSA was not a successful measure to differentiate between healthy speakers and speakers with HD, even though HD participants in these studies have proven to have poor speech intelligibility or weak articulation. There is no clear explanation of this issue. However, Roy et al. (2009, p. 128) assume that one reason could be the different vowel formant measurements, which lead to a large inter-speaker variability. Thus VAI is designed to minimize inter-speaker variability and maximize sensitivity to formant centralization (Roy et al., 2009).

3.4. Speech Data collection

Subject. Speech recordings were collected from a male native speaker of Dutch who speaks English fluently and uses both languages on a daily basis. The participant was 66 years old when the recordings started in August 2017. His definite diagnosis of PD had been made six years before the first recording. He has not been diagnosed with HD, but he speaks with a slight stutter (i.e., a chronic speech disorder that is characterized by repeated speech movements and fixed articulatory postures). The patient started speech therapy in his L1, Dutch, on 25 September 2018. When this research was conducted, there were twenty-four recordings in total: twelve collected before his speech therapy and twelve collected during or after it.

Speech Task and Recording Procedure. Each recording session consisted of four speech tasks: 1) answering an open question, normally with a monologue about a memorable event from the past weeks; 2) describing one of Heaton's serial picture stories (1966); 3) describing a video clip from one of Charlie Chaplin's films; 4) reading The North Wind and the Sun passage from Aesop's Fables. All tasks were conducted first in English and subsequently in Dutch. The recordings were collected every month to the extent possible (mean interval 39.3 days, i.e., 5.6 weeks, SD = 12.56), from one to three hours after medication intake. The recording sessions took place in quiet rooms at the university with a Zoom H2 recorder placed at a distance of about 40 cm. There were 24 sessions in total. Only the English data were examined, as these are the data relevant to the hypotheses of the current study. The collection and analysis of the material were approved by the Medical Ethics Committee of the University Medical Center Groningen.

3.5. Perceptual experiment

3.5.1. Design of the perceptual experiment

Two types of speech intelligibility experiments were conducted: a sentence intelligibility test and a phoneme intelligibility test based on words and non-words. In both sections of the experiment, the subject’s recording sessions were divided into three parts according to speech therapy stages: before therapy - "before," i.e., session 1 - 12, during therapy - "during," i.e., session 13 - 21, and after therapy - "after," i.e., session 22 - 24.

The first part investigated whether listeners can perceive general changes in the L2 speech before, during, and after speech therapy. Participants listened to each excerpt of the PD patient's recordings only once and then rated it on a 7-point Likert scale according to their perception of speech healthiness (from "very healthy" to "very unhealthy").

The second part examined whether the corner vowels (/i/, /u/ and /a/) of our speaker were confused with other vowels that were more central. For a central vowel, the defining characteristic is that the tongue is positioned halfway between a front vowel and a back vowel (International Phonetic Association & International Phonetic Association Staff, 1999). Furthermore, for PD patients, front vowels and back vowels tend to move to the center because it is difficult to control their precise tongue position (Skodda et al., 2011). We wanted to find out if the listeners could recognize the corner vowels in L2 speech before, during, and after speech therapy. The recordings, consisting of words and non-words, were played twice with silence in between. Listeners were asked to choose an answer, which was the syllable or word they thought they just heard, from three choices. They could also choose "unclear" if the recordings were unintelligible to them. This multiple choice answer sheet can be seen in the Appendix (page 72).

Because of the social distancing required during the COVID-19 pandemic, the experiment had to be completed online. We therefore created an online version of the experiment using "Just Another Tool for Online Studies" (JATOS, www.jatos.org; Lange et al., 2015), which is an open-source, cross-platform web application with a graphical user interface (GUI). JATOS allowed us to run the experiment on mobile phones, tablets, desktops, and laptops. More importantly, it gave us complete control over who could access the result data, which allowed us to comply with the ethical requirements. The experiment script was adapted from a JavaScript program originally written by Verkhodanova et al. (2019) for their study. We also used a JavaScript library called jsPsych (de Leeuw, 2015) for determining which trial to run next, storing data, and randomization. After writing the experiment's content, we employed the OSWeb extension for OpenSesame (Mathôt et al., 2012), which is a program that enabled us to test experiments in a browser.

3.5.2. Stimuli

The twenty-four sessions spanned two years, with twelve sessions in each year. For the first part of the experiment, we chose 20 short phrases from year one (sessions 1 - 12) and year two (sessions 13 - 24). The phrases are fragments taken from the different speech tasks: ten from free speech recordings (5 from year one, 5 from year two), two from the reading task (1 from year one, 1 from year two), four from picture description (2 from year one, 2 from year two), and four from video description (2 from year one, 2 from year two).

For the second part of the experiment, we selected 80 stimuli (40 from year one, 40 from year two). The answer choices for these stimuli are minimal pairs, i.e., pairs of words that differ in only a single sound (like bad and bed). To obtain less biased responses from participants, most stimuli are non-words, such as "dlas", "dlis", "dlos", or "haf", "hof", "hef" and "herf". Among the 80 stimuli, 60 are words and non-words with the target vowels /i/, /u/, and /a/; 20 are distractors with other vowels, for instance, /ə/, /æ/, /ɒ/, and /ʌ/. There are no diphthongs or triphthongs in either the stimuli or the answer options.

3.5.3. Participants and Procedure for Perceptual Experiment

Participants. A total of 44 listeners judged intelligibility, 29 females and 15 males (mean age 26.4, SD 5.5 years). All listeners reported normal hearing and high proficiency in English. Thirty of them were recruited from the University of Groningen. The other 14 were university students or residents in English-speaking countries such as Canada, the USA, the UK, and Ireland. Listeners reported minimal exposure to motor speech disorders, had not taken a course in motor speech disorders, and reported no history of speech-language disorders.

Procedure. All perceptual tasks were completed on the individual listeners’ digital devices (computer, laptop, tablet, or mobile phone). Listeners first filled out a survey through Qualtrics Online Survey Software (Qualtrics, 2014). To protect the patient’s personal information, listeners were required to sign a consent form in accordance with the patient data protection policy of the University Medical Center Groningen (UMCG). After listeners finished the survey, Qualtrics redirected them to the perceptual experiment hosted on the server of the University of Groningen. Prior to presentation to listeners, stimuli were normalized to a peak intensity of 70 dB using Praat (Boersma & Weenink, 2020). Listeners heard the stimuli in randomized order. In the first part, listeners were asked to rate healthiness on a 7-point Likert scale according to their perception; in the second part, listeners chose the answer they perceived from 5 choices, e.g., "rock", "reck", "rack", "ruck", and "unclear". After choosing an answer for each audio clip, listeners also reported their confidence in the answer by answering the question, "On a scale from 1 to 7, how intelligible did you find the word?" (from "Not intelligible at all" to "Very intelligible").

4. RESULTS

This chapter presents the results of both the objective and the subjective assessment of the PD patient’s L2. In the acoustic analysis, the four vowel measurements (VSA, VAI, F2 ratio, and F1 and F2 variabilities) are listed and compared. In the analysis of the perceptual experiment, we review the listeners’ ratings of L2 healthiness and intelligibility.

4.1. Results of acoustic analysis

We calculated four vowel measurements in R (R Core Team, 2013) using the 1406 mean formant values extracted from 703 vowels (223 /a/, 240 /i/, and 240 /u/) across the 24 sessions. Table 1 summarizes the results of the vowel measurements for three groups divided by time: before therapy - "before," i.e., sessions 1 - 12; during therapy - "during," i.e., sessions 13 - 21; and after therapy - "after," i.e., sessions 22 - 24.

Table 1: Summary of vowel measurements before, during, and after speech therapy, where F2 ratio is the ratio of /i/ and /u/ second formants, F1-var and F2-var are F1 and F2 variabilities (mean(sd)).

Vowel measurements

Time     Value     VSA        VAI    F2 ratio   F1-var   F2-var
Before   Mean      137646.2   0.96   2.03       29.45    125.85
         Maximum   256783.7   1.30   3.21       37.85    174.91
         Minimum   35544.0    0.64   1.17       20.57    95.02
During   Mean      132858.7   0.95   2.00       29.15    127.39
         Maximum   285332.0   1.22   2.83       40.52    167.47
         Minimum   43824.4    0.69   1.36       21.77    109.29
         SD        35111.7    0.08   0.27       6.04     18.79
After    Mean      150323.8   0.98   2.04       33.75    121.28
         Maximum   306205.7   1.27   2.98       38.32    145.04
         Minimum   32881.1    0.72   1.20       28.91    94.04
         SD        42281.0    0.09   0.32       3.96     22.41

First, we plotted the VSA for each recording session (see Fig. 3). Each triangle was plotted using the F1 and F2 values of the cardinal vowels /i/, /u/, and /a/, so that its area visualizes the VSA. Comparing the first VSA plot with the last one, the shape and area differ distinctly.

From the graph, we could see that the triangles are different from one to another. However, we needed further comparison statistically to verify the results.

Then we plotted the VSAs of three time periods (see Fig. 4), from which we could not observe a noticeable difference in patterns. Hence further statistical comparisons were required.

Figure 4. VSA before, during, and after speech therapy. (The VSA with green lines reflects the results from sessions 1 to 12; the VSA with blue lines reflects the results from sessions 13 to 20; the VSA with pink lines reflects the results from sessions 21 to 24.) The “after” VSA has the most distinct points compared with the other two.

4.1.1. Calculation of vowel measurements

The purpose of the acoustic analysis was to see whether the vowel measurements changed over the three time periods. Since we had only a single subject and 24 sessions, we calculated all possible combinations of his formant values instead of averaging everything per session; measures based on means alone would also be less sensitive in general. Using every vowel sample with every other vowel sample gave us many data points for each time point and thus a distribution for every session. As mentioned in the Methodology section, the values of VSA and VAI are calculated with the following formulas:

VSA = 0.5 × |F1i × (F2a - F2u) + F1a × (F2u - F2i) + F1u × (F2i - F2a)| (Liu et al., 2005).

VAI = | F1a + F2i | / |F1i + F1u + F2a + F2u| (Roy et al., 2009).

However, to work with many data points instead of a single measurement per session, we used every individual formant value of each vowel. In other words, if we had 10 /i/, 10 /u/, and 10 /a/ in one session, then we would have 10 F1i (F1i1, F1i2, F1i3 ... F1i10), 10 F2i (F2i1, F2i2, F2i3 ... F2i10), 10 F1u, 10 F2u, 10 F1a, and 10 F2a. Instead of using each value only once, we combined the values as follows:

VSA1 = 0.5 × |F1i1 × (F2a1 - F2u1) + F1a1 × (F2u1 - F2i1) + F1u1 × (F2i1 - F2a1)|,
VSA2 = 0.5 × |F1i1 × (F2a1 - F2u2) + F1a1 × (F2u2 - F2i1) + F1u2 × (F2i1 - F2a1)|,
VSA3 = 0.5 × |F1i1 × (F2a1 - F2u3) + F1a1 × (F2u3 - F2i1) + F1u3 × (F2i1 - F2a1)|,
VSA4 = 0.5 × |F1i1 × (F2a2 - F2u1) + F1a2 × (F2u1 - F2i1) + F1u1 × (F2i1 - F2a2)|,
VSA5 = 0.5 × |F1i1 × (F2a2 - F2u2) + F1a2 × (F2u2 - F2i1) + F1u2 × (F2i1 - F2a2)|,
VSA6 = 0.5 × |F1i1 × (F2a2 - F2u3) + F1a2 × (F2u3 - F2i1) + F1u3 × (F2i1 - F2a2)|,

... etc., until every possible combination of formant values had been used within one session. This procedure not only gave us a distribution for every session but also compensated for the smaller number of /a/ measurements (17 fewer than for /i/ and /u/). The same rule applies to VAI, i.e.,

VAI1 = | F2i1 + F1a1 | / | F1i1 + F1u1 + F2u1 + F2a1 |,
VAI2 = | F2i1 + F1a1 | / | F1i1 + F1u2 + F2u2 + F2a1 |,
VAI3 = | F2i1 + F1a1 | / | F1i1 + F1u3 + F2u3 + F2a1 |,
... etc.

When calculating F2-ratio (formula: F2i/F2u), instead of having one mean ratio (F2 (i1+i2+...+i10) / F2 (u1 + u2 + u3+...+u10) ), we would use 100 ratios for each session: F2i1/F2u1, F2i1/F2u2, F2i1/F2u3, etc.
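The combination procedure described in this section can be sketched in Python with `itertools.product`. This is a minimal illustration under stated assumptions, not the R script used in this study: each vowel token is assumed to be stored as an (F1, F2) pair in Hz, and the function names and formant values are hypothetical.

```python
from itertools import product


def vsa(i, a, u):
    """Triangle area from three (F1, F2) tokens of /i/, /a/, and /u/."""
    (f1_i, f2_i), (f1_a, f2_a), (f1_u, f2_u) = i, a, u
    return 0.5 * abs(
        f1_i * (f2_a - f2_u) + f1_a * (f2_u - f2_i) + f1_u * (f2_i - f2_a)
    )


def session_distributions(i_tokens, a_tokens, u_tokens):
    """Return every VSA and every F2 ratio obtainable within one session."""
    vsa_values = [
        vsa(i, a, u) for i, a, u in product(i_tokens, a_tokens, u_tokens)
    ]
    f2_ratios = [i[1] / u[1] for i, u in product(i_tokens, u_tokens)]
    return vsa_values, f2_ratios


# Hypothetical (F1, F2) tokens in Hz for one session, for illustration only
i_tokens = [(300, 2300), (310, 2250)]
a_tokens = [(750, 1300), (720, 1350)]
u_tokens = [(350, 900), (340, 950), (360, 920)]

vsa_values, f2_ratios = session_distributions(i_tokens, a_tokens, u_tokens)
# 2 x 2 x 3 = 12 VSA values and 2 x 3 = 6 F2 ratios for this session
```

With 10 tokens of /i/ and 10 tokens of /u/, the same function yields the 100 F2 ratios per session mentioned above.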

4.1.2. Results of vowel measurements

To assess whether the measurements showed a trend over the different time periods, we fitted a simple linear regression model in R. Our null hypothesis was that there was no relationship with, or effect of, the time of sessions (“before,” “during,” and “after”); the alternative hypothesis was that there was such a relationship or effect.

VSA & VAI. The results showed that the models were significant (VSA: F (1, 22298) = 68.5, p < 0.001; VAI: F (1, 22298) = 71.14, p < 0.001) and explained 3% of the variance in the data (see Fig. 5 and 6). Thus we can reject H0 and assume that time had an effect on the values of VSA and VAI. Regression coefficients are shown in Tables 2 and 3. Table 2 shows a positive coefficient for session, suggesting that the value of VSA increased as the sessions progressed; Table 3 likewise shows a positive coefficient for session, suggesting that the value of VAI also increased as the sessions progressed (see Fig. 7).

Table 2. The output of the simple linear regression of VSA value

Coefficients
              Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)   134258     508.62       263.963   <2e-16 ***
session       284.21     34.34        8.277     <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The Normal Q-Q plot indicates that the data follow a straight line, so it approximates the normal distribution. This Scale-Location plot shows that the residuals are spread equally along with the ranges of predictors. Scatter plot Residuals vs. Leverage indicates that there are some non-influential cases.

Table 3. The output of the simple linear regression of VAI value

Coefficients
              Estimate    Std. Error   t value   Pr(>|t|)
(Intercept)   9.508e-01   1.140e-03    833.814   <2e-16 ***
session       6.493e-04   7.698e-05    8.434     <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Figure 6. Four scatter plots of VAI: Residuals vs. Fitted showing the residuals and their deviations from the fitted values. There is a visible pattern in the relationship. The Normal Q-Q plot indicates that the data follow a straight line, so they approximate the normal distribution. The Scale-Location plot shows that the residuals are spread equally along the range of the predictor. The Residuals vs. Leverage plot indicates that the 2934th case is not influential.

Figure 7. Box plots showing the dispersion in the VSA (left) and VAI (right) values over three time periods (before, during, and after speech therapy).

We also conducted a two-sample Kolmogorov-Smirnov test to compare the form of the distributions. The null hypothesis is that two distributions are the same, and the alternative hypothesis is that two distributions are different. The results showed that we can reject H0 (“before” and “during”: D = 0.15, p < 0.001; “before” and “after”: D = 0.07, p = 0.01; “during” and “after”: D = 0.09, p = 0.004).
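For readers unfamiliar with the test, the D statistic reported above is the largest vertical distance between the empirical cumulative distribution functions of the two samples. The following pure-Python sketch computes only this statistic; it does not compute the p-value, for which an established routine such as R's `ks.test` would normally be used. The function name and the sample values are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: maximum distance between the ECDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # Both ECDFs are step functions, so the maximum distance is reached
    # at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical samples, for illustration only
d_same = ks_statistic([1, 2, 3], [1, 2, 3])          # identical samples
d_shift = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])   # partially overlapping
d_apart = ks_statistic([1, 2, 3], [4, 5, 6])         # no overlap
```

D ranges from 0 (identical empirical distributions) to 1 (no overlap), so the reported values of 0.07 to 0.15 indicate distributions that differ only modestly in form.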

From these results, we may conclude that the subject showed a significant improvement in the VSA and VAI measures after speech therapy, although the values dropped slightly during therapy. His VSA values increased from 137646.2 before speech therapy to 150323.8 after therapy. This indicates that his vowel articulation became less centralized over time, i.e., the production of the most distinct vowels /i/, /u/,

of three triangles in Fig. 4. In the meantime, the VAI values became larger, from 0.96 before speech therapy to 0.98 after therapy, i.e., the larger the ratio, the smaller the centralization.

F2 Ratio. The output of the one-way ANOVA on the F2 ratio across the 24 sessions can be seen in Table 4. It shows the differences in means between time periods (diff) and the lower (lwr) and upper (upr) bounds of the 95% confidence interval. The last column contains the adjusted p-value for each pairwise comparison. The one-way ANOVA showed a significant effect of time on the value of the F2 ratio, F (2, 2397) = 3.25, p < 0.05, η2 = .003. A Tukey post hoc analysis revealed that the F2 ratio in the “during” period was the lowest (M = 2.00, SD = 0.27) compared with both “before” (M = 2.03, SD = 0.33), p = 0.05, 95% CI [-0.066, 0.0003] and “after” (M = 2.04, SD = 0.32), p = 0.12, 95% CI [-0.007, 0.082]. The periods before and after therapy did not differ, p = 0.96, 95% CI [-0.037, 0.047]. The effect size is very small and has been visualized in Fig. 8.
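To make the reported quantities concrete, the sketch below computes the F statistic, its degrees of freedom, and η2 for a one-way ANOVA in pure Python. It is an illustration with hypothetical data, not the R analysis of this study. With three groups and N observations, the degrees of freedom are (2, N - 3), which is consistent with the reported F (2, 2397) if 2400 ratios were analysed.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for a list of groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    ss_between = sum(
        len(group) * (mean(group) - grand_mean) ** 2 for group in groups
    )
    ss_within = sum((x - mean(group)) ** 2 for group in groups for x in group)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_value, df_between, df_within, eta_squared


# Hypothetical values for three time periods, for illustration only
before, during, after = [1, 2, 3], [2, 3, 4], [3, 4, 5]
f_value, df_b, df_w, eta_sq = one_way_anova([before, during, after])
```

An η2 of .003, as reported above, means that the time period accounts for about 0.3% of the total variation in the F2 ratio.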

In conclusion, the patient’s tongue advancement had improved slightly after speech therapy. The F2 ratio values increased from 2.03 before speech therapy to 2.04 after therapy. Although the mean F2 ratio in the “during” period dropped to 2.00, the values were less scattered (see Fig. 8).

Table 4. Tukey's Test for Post-Hoc Analysis
95% CI for Mean Difference

session          diff.   lwr.     upr.    p adj.
After - Before   0.005   -0.037   0.047   0.96
After - During   0.037   -0.007   0.082   0.12

Figure 8. Boxplot showing the dispersion in F2 ratio value in three time periods (before, during, and after speech therapy).

F1 and F2 variabilities indicate the stability of achieving vowel targets (Kim et al., 2011). The one-way ANOVA showed no significant effect of the time period on the F1 and F2 variabilities (F1 variability: F (2, 21) = 1.37, p = 0.3, η2 = .3; F2 variability: F (2, 21) = 0.1, p = 0.81, η2 = .09). A Tukey post hoc analysis (see Table 5) revealed no significant difference between the three time periods in either F1 or F2 variability (all p-values > 0.05).

In conclusion, the subject’s stability in achieving vowel targets did not change significantly, which means that speech therapy had little impact on his F1 and F2 variabilities.

Table 5. Tukey's Test for Post-Hoc Analysis
95% CI for Mean Difference

F1 Variability
session           diff.   lwr.     upr.    p adj
During - Before   -0.29   -6.47    5.89    0.99
After - Before    4.31    -3.51    12.12   0.36
After - During    4.6     -3.69    12.89   0.36

F2 Variability
session           diff.   lwr.     upr.    p adj
During - Before   1.54    -24.96   28.05   0.99
After - Before    -4.48   -38.1    28.95   0.94
After - During    -6.12   -41.68   29.44   0.9

4.2. Results of the perceptual experiment

4.2.1. Healthiness rating of phrases

To assess the rating patterns of the participants, we conducted a one-way ANOVA and then fitted a simple linear regression model in R. Listeners rated healthiness on a scale from 1 to 7 (“Very unhealthy” to “Very healthy”). Table 6 summarizes the healthiness rating scores of the phrases in the three time periods.

One-way ANOVA test. Our null hypothesis was that there was no difference in healthiness ratings between the time groups. The alternative hypothesis was that there was a difference in healthiness ratings depending on the different periods: at least one group was different from the others.

Table 6. Descriptives of Healthiness rating scores for the three time periods.

Time      Before   During   After
mean      4.98     4.94     5.08
minimum   1        2        2
maximum   7        7        7
SD        1.51     1.49     1.42

There was no significant effect of time on the healthiness rating score, F (2, 877) = 0.41, p = 0.66, η2 = 0.0009. Thus we cannot reject H0. Table 7 shows the results of a Tukey post hoc analysis: the healthiness rating for recordings made after speech therapy was the highest (M = 5.08, SD = 1.42) compared with both the “before” group (M = 4.98, SD = 1.51), p = 0.80, 95% CI [-0.25, 0.44] and the “during” group (M = 4.94, SD = 1.49), p = 0.63, 95% CI [-0.22, 0.5]. The “before” group also scored slightly higher than the “during” group, p = 0.91, 95% CI [-0.31, 0.21] (see also Fig. 9).

Table 7. Tukey's Test for Post-Hoc Analysis

session           diff.   lwr.    upr.   p adj
During - Before   -0.05   -0.31   0.21   0.91
After - Before    0.09    -0.25   0.44   0.80

Figure 9. Boxplot with the dispersion of rating scores for 3 time periods: before, during, and after speech therapy. The lowest rate score appears in the “before” group. It also revealed that people were less sure about their rating decisions while listening to the “before” recordings.

In addition, listeners were more confident when rating audio clips from the “after” group (85.2% of listeners chose “rather sure”); confidence was lower for the “during” recordings (83.8%) and lowest for the “before” recordings (81.6%). Moreover, Table 8 shows that listeners gave different rating scores for each speech task (see Fig. 10). Among the four tasks, the healthiness rating scores of the reading task were the highest (M = 5.39), followed by the picture description task (M = 5.18) and the video description task (M = 4.91); the free speech task had the lowest rating score (M = 4.87).

Table 8. Means of 4 speech tasks (Fsp = free speech, Pic = picture description, Read = reading, and Vid = video description) in three time periods.

Time       Fsp.   Pic.   Read.   Vid.
Before     4.9    5.14   5.72    4.67
During     4.81   5.09   5.14    5.05
After      4.89   5.31   5.31    5
All Time   4.87   5.18   5.39    4.91

Figure 10. Bar chart of the mean rating score for each of the speech tasks in three time periods.

Simple linear regression. We fitted a linear model of healthiness rating scores as a function of the session number. This model was not significant (F (1, 878) = 0.13, p = 0.72) and explained 1.2% of the variance in the data (see Fig. 11 and 12). Regression coefficients are shown in Table 9. There was a slightly positive coefficient for the session stage, suggesting that the healthiness rating scores increased slightly as the sessions progressed, although this trend was not significant.

Figure 11. Scatterplot showing the residuals and their deviations from the fitted values. There is an obvious pattern in the relationship, so that we can assume heteroscedasticity.

Figure 12. The Q-Q plot of the residuals. The Q-Q plot indicates that the data do not follow a straight line, so they do not approximate the normal distribution. A Shapiro-Wilk test on the residuals confirms the same outcome (p < 0.001).

Table 9. Output of the simple linear regression

Coefficients
              Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)   4.93796    0.12442      39.688    <2e-16 ***
session       0.02520    0.06902      0.365     0.715
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

In conclusion, listeners’ perception was not significantly influenced by the time period of the recordings. The healthiness rating scores rose only slightly, from 4.98 “before” to 5.08 “after”. However, listeners were more certain when they rated the “after” recordings. Moreover, recordings of the reading task received the highest healthiness rating score.

4.2.2. Accuracy scores of words and non-words

We fitted a simple linear regression to find the relationship between the percentage of correctly recognized words and non-words and the time of sessions. Our null hypothesis was that the time of sessions (before, during, and after therapy) did not affect the percentage of correctly recognized words and non-words; alternative hypothesis 1 was that the time of sessions positively influenced the percentage; alternative hypothesis 2 was that the time of sessions negatively influenced the percentage.

Figure 13. Scatterplot showing the relationship between session (y-axis) and the percentage of accuracy scores (x-axis).

This model was significant (F (1, 8) = 21.67, p < 0.005) and explained 73% of the variance in the data (multiple R-squared). Regression coefficients are shown in Table 10. Thus we could accept H1. There was a positive coefficient for the continuous session variable, suggesting that the percentage of correctly recognized words and non-words increased as the sessions progressed (see Fig. 13, 14, and 15).

Table 10. The output of the simple linear regression

Coefficients
              Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)   74.73753   1.12553      66.402    2.94e-12 ***
session       0.33507    0.07198      4.655     0.00163 **

Figure 14. Box plot showing the dispersion in the percentage of correctly recognized words and non-words in 3 time periods (before, during, and after speech therapy).

Figure 15. The Q-Q plot of the residuals. The Q-Q plot indicates that the data follow a straight line, so it approximates the normal distribution. The result from a Shapiro-Wilk test on the residuals also confirms it (p > 0.05).

4.2.3. Intelligibility rate of words and non-words

Regarding speech intelligibility, we conducted a one-way ANOVA and then fitted a simple linear regression model in R. Listeners rated the intelligibility of the words and non-words they heard on a scale from 1 to 7 (“Not intelligible at all” to “Very intelligible”), and Table 11 summarizes the descriptives of the intelligibility rating scores. Our null hypothesis was that there was no difference in intelligibility ratings between the time groups. The alternative hypothesis was that there was a difference in intelligibility ratings depending on the different periods of time: at least one of the groups was different from the others.

Table 11. Descriptives of intelligibility rating scores over the three time periods.

Time      Before   During   After
mean      4.81     4.67     4.65
minimum   1        1        1
maximum   7        7        7
SD        1.99     2.01     2.02

The one-way ANOVA showed no significant effect of time on the intelligibility rating, F (2, 3517) = 2.4, p = 0.09, η2 = 0.001. Thus we cannot reject H0. Table 12 shows the results of a Tukey post hoc analysis: the intelligibility rating scores for the recordings made before speech therapy were the highest (M = 4.81, SD = 1.99) compared with both the “during” group (M = 4.67, SD = 2.01), p = 0.17, 95% CI [-0.32, 0.04] and the “after” group (M = 4.65, SD = 2.02), p = 0.18, 95% CI [-0.37, 0.05]. The “during” group also scored slightly higher than the “after” group, p = 0.98, 95% CI [-0.25, 0.21] (see also Fig. 16).

Table 12. Tukey's Test for Post-Hoc Analysis

session           diff.   lwr.    upr.   p adj
During - Before   -0.14   -0.32   0.04   0.17
After - Before    -0.16   -0.37   0.05   0.18
After - During    -0.02   -0.25   0.21   0.98

Figure 16. Bar chart showing the mean intelligibility rating scores before, during, and after speech therapy in 4 different task items (a, i, u, filler).

Simple linear regression. We fitted a linear model of intelligibility rating scores as a function of the session number. The null hypothesis was that there was no difference in intelligibility ratings between the time groups. The alternative hypothesis was that there was a difference in intelligibility ratings depending on the different periods of time: at least one of the groups was different from the others.

The results showed that this model was significant (F (1, 3518) = 4.15, p = 0.04) and explained 3% of the variance in the data (see Fig. 17 and 18). Thus we can reject H0. Regression coefficients are shown in Table 13. It can be seen that there was a slightly negative coefficient for the session stage, suggesting that when the sessions progressed, the intelligibility rating scores decreased slightly.

Figure 17: Scatterplot showing the residuals and their deviations from the fitted values. There is an obvious pattern in the relationship, so that we can assume heteroscedasticity.

Figure 18. The Q-Q plot of the residuals. The Q-Q plot indicates that the data does not follow a straight line, so it does not approximate the normal distribution. A Shapiro-Wilk on the residuals also confirms the same outcome (p < 0.001).

Table 13. The output of the simple linear regression

Coefficients
              Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)   4.88264    0.08071      60.498    <2e-16 ***
session       -0.08788   0.04314      -2.037    0.0417 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

In conclusion, the intelligibility rating scores showed a descending trend over time. However, the result could also be influenced by the varied understanding of different listeners.

5. SUMMARY AND DISCUSSION

This case study had two purposes: 1) to establish whether a person with PD who received speech therapy in his L1 also experienced changes in his L2; and 2) to establish whether listeners could perceive changes in his L2 before, during, and after therapy, and how intelligible his L2 was in each of these three time periods.

The results of the acoustic analysis showed significant changes for VSA, VAI, and F2 ratio. Kent and Kim (2003) pointed out that VSA is related to the distances between the F1 and F2 coordinates of the three distinctive vowels /a/, /i/, and /u/. A reduced VSA indicates centralized cardinal vowels, which point to potential problems associated with a speech intelligibility deficit (Liu et al., 2005). We can claim that the subject's vowel articulation improved over time. However, the improvement was not a steady upward progression. This can be explained by the fact that the “during” measures included the first months of speech therapy, in which the subject showed worse vowel articulation than in the last months. Table 1 shows that the value of VSA dropped during therapy (from 137646.2 to 132858.7) and then rose noticeably after therapy (150323.8). The VAI values of our subject moved from 0.96 (before therapy) to 0.95 (during) and then reached 0.98 after therapy, which is close to a normal value. Sapir et al. (2011b) argue that a normal VAI value is supposed to be close to 1.0 because the "numerator in the VAI is likely to decrease, and the denominator is likely to increase with vowel centralization" (Sapir et al., 2011b, p. 174). The same pattern could be observed with the F2 ratio, which indexes reduction in the extent of articulatory anterior-posterior movements of the tongue. The subject’s F2 ratio rose slightly after speech therapy, from 2.03 “before” to 2.04 “after”, which means that his tongue advancement extended marginally. In addition, there were no significant changes in F1 and F2 variabilities, which reflect a speaker’s steadiness in achieving vowel targets (Kim et al., 2011).

Kim et al. (2011, p. 191) observed that lower intelligibility was related to four aspects: 1) increased overlap among vowels; 2) increased F1 variability; 3) reduced VSA; and 4) reduced mean distance between vowels. Our findings showed increased VSA and VAI, yet increased F1 variability as well. Therefore, the listeners’ rating scores on both healthiness and intelligibility would function as markers of the patient’s speech intelligibility.

The results of the perceptual experiment were mixed. There were four main findings: 1) the healthiness ratings from listeners followed the same pattern as the VSA and VAI values, i.e., listeners rated the audio clips extracted before speech therapy at 4.98, the score dropped to 4.94 during therapy, and it increased to 5.08 after therapy; 2) among the four speech tasks, the healthiness rating of the reading task was clearly higher than that of the other three tasks (free speech, picture description, and video description); 3) the accuracy scores for correctly understood words and non-words increased over time; and yet, strangely, 4) the intelligibility ratings showed a trend in the opposite direction.

The first finding of the perceptual experiment matched the results of the vowel measurements. Since the patient has not been diagnosed with HD, and his vowel articulation in L2 was not centralized, the listeners detected few changes across the three time periods. However, the overall healthiness rating was relatively low (mean = 5). Two factors might have influenced the listeners’ judgment. Firstly, the patient spoke English with a Dutch accent. It is known that speech intelligibility depends strongly on the experience that listeners as well as speakers have with the target language (e.g., Flege, 1992; Strange, 1995). Dagenais et al. (1998, 1999, 2006) stated that the variables affecting listeners’ perception are not yet clearly understood. Furthermore, they observed that listeners understand intelligibility differently and are influenced by other speech dimensions (e.g., acceptability and naturalness). In our recordings, components such as the pronunciation, intonation, stress, and rhythm of Dutch speech can be related to accentedness. Generally, listeners are sensitive to the presence of a foreign accent (Munro & Derwing, 2011, p. 477). Flege (1988, 1995) argued that accented speech generates a perceptual bias in listeners (especially for L2 users). Speaking with a foreign accent can lead to a number of possible consequences, such as negative evaluation and reduced intelligibility. When listeners heard the speech, whether a sentence, a word, or even a single phone, they probably did not recognize it as non-native but connected it with a variety of English they were familiar with (Flege, 1988). Derwing and Munro (1995) also pointed out that a strong accent requires listeners to spend extra processing time to understand what was said, which may result in a lower intelligibility rating. As Hustad (2006a, p. 268) suggested, production-related variables are dependent on the speaker, while perception-related variables are connected with the listener. Another influential element may be the lack of native English-speaking participants. Derwing and Munro (1998) emphasized that there was no significant difference in intelligibility ratings between two groups of listeners, one consisting of native English listeners and the other of highly proficient non-native listeners from mixed L1 backgrounds. Van Wijngaarden et al. (2002), on the other hand, conducted research on quantifying the intelligibility of speech in noise for non-native listeners. They administered a speech reception threshold (SRT) test to people from different linguistic backgrounds. SRT is an adaptive method that measures the speech-to-noise ratio at which 50% of the tested sentences are perceived correctly (Van Wijngaarden et al., 2002, p. 1907). The results showed that when Dutch participants listened to German and English speakers, they needed a speech-to-noise ratio (SNR, i.e., the level of the speech relative to the level of the background noise) that was 1 to 7 dB better to achieve the same perception as native listeners. Since all audio clips in our experiment had the same sound volume, i.e., were set to a peak intensity of 70 dB, native speakers may have a better chance of successfully understanding what was said.
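The idea behind an adaptive SRT measurement can be illustrated with a generic 1-up/1-down staircase, which converges on the SNR at which about 50% of the sentences are understood. The sketch below is not the exact procedure of Van Wijngaarden et al. (2002): the step size, the number of trials, the averaging window, and the simulated listeners with fixed thresholds are all hypothetical assumptions chosen for illustration.

```python
from typing import Callable, List


def estimate_srt(
    respond: Callable[[float], bool],
    start_snr_db: float = 0.0,
    step_db: float = 2.0,
    n_trials: int = 20,
    n_average: int = 10,
) -> float:
    """Generic 1-up/1-down staircase.

    The SNR is lowered after a correct response and raised after an error,
    so the track oscillates around the 50%-correct point. The estimate is
    the mean SNR of the last `n_average` presented trials.
    """
    snr = start_snr_db
    presented: List[float] = []
    for _ in range(n_trials):
        presented.append(snr)
        if respond(snr):
            snr -= step_db  # correct: make the next sentence harder
        else:
            snr += step_db  # incorrect: make the next sentence easier
    tail = presented[-n_average:]
    return sum(tail) / len(tail)


def make_listener(threshold_db: float) -> Callable[[float], bool]:
    """HYPOTHETICAL deterministic listener: understands a sentence whenever
    the SNR is at or above a fixed threshold."""
    return lambda snr_db: snr_db >= threshold_db


# Assumed thresholds: the non-native listener needs a 4 dB better SNR.
native_srt = estimate_srt(make_listener(-6.0))
non_native_srt = estimate_srt(make_listener(-2.0))
print(native_srt, non_native_srt, non_native_srt - native_srt)
```

In this simplified setting, the difference between the two estimated thresholds recovers the assumed 4 dB disadvantage of the non-native listener, which is the kind of SNR difference discussed above.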

The second finding, that the reading task received the highest rating on healthiness, is in line with a report from Kempler & Van Lancker (2002) on a case study of one HD speaker. What distinguished it from the other three speech tasks (spontaneous speech, picture description, and video description) was that our subject had read the same text, “The North Wind and the Sun”, time and time again over two years. He was familiar with this piece of text and proceeded at a moderate speech rate, more