Perception of dysarthric speech: An exploratory cross-linguistic study of Parkinson’s disease identification from a Dutch voice Dominika Trčková S4053648


Academic year: 2021


Perception of dysarthric speech:

An exploratory cross-linguistic study of Parkinson’s

disease identification from a Dutch voice

Dominika Trčková

S4053648

Supervisor prof. dr. W. M. Lowie

Second reader prof. dr. M. C. J. Keijzer

Date of completion 22/06/2020

Word count 17 988

MA Thesis

Department of Applied Linguistics

Faculty of Arts

Declaration of Authenticity

MA Applied Linguistics – 2019/2020

MA-thesis

Student name: Dominika Trčková

Student number: S4053648

PLAGIARISM is the presentation by a student of an assignment or piece of work which has in fact been copied in whole, in part, or in paraphrase from another student's work, or from any other source (e.g. published books or periodicals or material from Internet sites), without due acknowledgement in the text.

TEAMWORK: Students are encouraged to work with each other to develop their generic skills and increase their knowledge and understanding of the curriculum. Such teamwork includes general discussion and sharing of ideas on the curriculum. All written work must however (without specific authorization to the contrary) be done by individual students. Students are neither permitted to copy any part of another student’s work nor permitted to allow their own work to be copied by other students.

DECLARATION

• I declare that all work submitted for assessment of this MA-thesis is my own work and does not involve plagiarism or teamwork other than that authorised in the general terms above or that authorised and documented for any particular piece of work.

Signed

Acknowledgments

I would like to express my gratitude to my supervisor, Wander Lowie, for his time, helpful feedback, and support in the face of the strenuous time pressure we were all under. I would also like to thank him for offering us, students of Applied Linguistics, an opportunity to find a work placement within research internships, without which this study would not have been possible. I am most grateful for the chance to collaborate and become friends with Vass Verkhodanova, my placement supervisor, as her guidance, bright solutions, and kind words of support were essential to my writing process. I would also like to thank my friends and professors, who helped me look for participants when finding listeners seemed impossible. Thank you!

A list of abbreviations

F0 fundamental frequency
F1 first formant frequency
F2 second formant frequency
HC healthy control
HD hypokinetic dysarthria
L1 first language
L2 second language
PD Parkinson’s disease

Table of contents

1. Introduction ...7

2. Literature ...10

Parkinson’s disease ...10

Hypokinetic dysarthria ...13

Speech analysis as a diagnostic tool in PD ...16

Listener effect ...19

Recognizing PD from multilingual voice ...22

2.1 Statement of purpose ...24

3. Method ...27

3.1 Participants ...28

3.2 Materials ...29

Data selection ...29

Preparation of auditory stimuli ...33

Preparation and design of perception test ...34

Consent form and questionnaire ...37

3.3 Procedures ...38

3.4 Analyses and statistics ...39

4. Results ...40

4.1 Initial visualizations ...40

4.2 Welch’s F test and Tukey post-hoc tests ...44

Unhealthiness-recognition-accuracy scores ...44

Sentence-type-recognition-accuracy scores in the PD group ...49

Sentence-type-recognition-accuracy scores in the HC group ...54

5. Discussion ...58

5.1 Distinguishing PD patients from healthy speakers ...59

5.2 Recognizing sentence type in a parkinsonian voice ...61

5.3 Recognizing sentence type in a healthy voice ...63

5.4 The main research question ...65

6. Conclusion ...66

6.1 Future research ...68

6.2 Limitations ...69

6.3 Practical implications ...70

References ...71

Appendices ...76

Appendix A ...76

Appendix B ...78

Abstract

This study investigates the effect of listeners’ familiarity with the language of speakers (Czech listeners, Dutch listeners) and the effect of expertise (naïve listeners, phoneticians) on their perception of speech produced by Dutch Parkinson’s disease (PD) patients diagnosed with hypokinetic dysarthria (HD). A perception experiment was designed to collect healthiness assessment scores and identification accuracy of naïve and phonetically trained Czech and Dutch listeners in two listening tasks: rating of speech healthiness and recognition of sentence type intonation. Speech samples from 30 Dutch speakers diagnosed with PD and HD and from 30 Dutch healthy controls were used to create auditory stimuli for the listening test, drawing on short phrases from intonation tasks and prompted free speech interviews. Forty people, 20 phoneticians and 20 naïve listeners, participated in the online experiment on perception of Dutch dysarthric speech of PD patients. The results indicate that there must be prominent acoustic cues in the healthiness assessment task that trigger recognition of PD speech as unhealthy in naïve listeners, who were significantly more accurate than phoneticians, whose expertise did not prove advantageous in this task. In sentence type recognition, both Czech and Dutch phoneticians outperformed both naïve groups, indicating the added value of phonetic experience for the prosodic task. The effect of expertise is therefore dependent on task type. Overall, Dutch listeners were more accurate than Czech listeners, suggesting the presence of language-specific cues in dysarthric PD speech.

Perception of dysarthric speech:

An exploratory cross-linguistic study of Parkinson’s disease identification from a Dutch voice

1. Introduction

Parkinson’s disease (PD) is a progressive neurological disorder characterized by both motor and non-motor features. Patients suffering from this disease are most noticeably affected by rest tremor, rigidity, bradykinesia, and postural instability (Mayo Clinic; Jankovic, 2007; Hayes, 2019). Hypokinetic dysarthria (HD) is a speech disorder and one of the most frequent symptoms of PD, affecting up to 90% of patients (Ho, Iansek, Marigliani, Bradshaw, & Gates, 1999). The disorder arises from “disturbances in muscular control over the speech mechanism due to damage of the central or the peripheral nervous system” (Darley et al., 1969a: 246), which in turn mirror dopaminergic pathology of the basal ganglia (Pinto et al., 2004). In the speech of patients affected by this disorder, these muscular disturbances translate into noticeable speech flaws: increased acoustic noise, reduced intensity of voice, harsh and breathy voice quality, increased voice nasality, monopitch, monoloudness, or speech rate disturbances (see Brabenec et al., 2017 for a full review of the symptoms and their presence in contemporary research).

HD may also considerably affect patients’ mental wellbeing: the numerous speech flaws present in their oral delivery can negatively impact not only their social-linguistic competence in interactions with others (Pell, Cheang, & Leonard, 2006), but also their concept of self-identity and roles (Miller, Noble, Jones, & Burn, 2006; Shadden, Hagstrom, & Koski, 2008). Studies researching the capacity of PD patients to communicate emotional qualities have found that listeners judge people suffering from hypokinetic dysarthria as sad, devoid of emotion, less interested, less involved or less friendly than speakers from a healthy population (Pell et al., 2006; Jaywant & Pell, 2010). Given these negative descriptors, it is perhaps not surprising that many patients feel incompetent and choose to withdraw from interactions (Miller et al., 2006). In a qualitative overview of the communication challenges, a university professor diagnosed with PD talks about language, a tool he previously mastered, as losing its reliability and thus changing his sense of role and identity as a capable university professor: “I have to go through this thing of whether we ignore or whatever when I can’t talk, so I know I’m not as, I know I’m not as good as I was. And it is so hard to go from knowing you are so damn good at what you do, wondering what the hell did I do?” (Shadden et al., 2008: 123).

Given that HD may obstruct patients’ ability to communicate successfully to the point of a personal crisis such as the one experienced by the university professor, it is of high interest to research the contexts in which dysarthric speech is detected, and to what extent. No cure has yet been discovered, but symptoms such as HD may in some cases be alleviated with medication, procedures, or therapy (Pinto et al., 2004; Dexter & Jenner, 2013; Hayes, 2019). Early and accurate diagnosis is thus essential. Although the knowledge of perceptual and acoustic parameters of PD speech has grown and automated technology to aid diagnosis is in development (Hazan et al., 2012; Orozco-Arroyave et al., 2015; 2016), many issues are still unresolved (Mekyska et al., 2016) and the diagnosis still lies in the hands of clinicians. The present study is the product of a research internship led by Vass Verkhodanova and forms part of her PhD project on the recognition of PD from multilingual voice. The aim of that project is to translate the implicit knowledge of experts into explicit knowledge in order to facilitate PD diagnosis, allowing for early treatment and effective rehabilitation.

Despite the widespread acceptance of Darley’s dysarthria classification (Darley et al., 1969a; 1969b), the question of who perceives parkinsonian speech, and how, is still largely unresolved, even though it might hold a clue to which acoustic cues are prominent in the diagnosis of HD, and consequently of PD. A varied level of PD identification skills among clinicians, trainees, and speech language therapists was reported by Fonville et al. (2008) and Van der Graaf et al. (2009); their results indicate that listeners’ profession and perceptive skills need to be investigated to determine who is capable of detecting a dysarthric voice, and why. Although there are some studies which discuss either professional or naïve listeners’ perception of dysphonic speech, very few compare the two expertise groups and their rating accuracy of dysarthric speech. Studies by Walshe, Miller, Leahy & Murray (2008) and by Smith et al. (2019) found no difference in intelligibility rating between speech therapists and naïve listeners. Van Borsel & Eeckhout (2008) did find a difference, although the focus of their study was on stuttering. Other studies (Wolfe, Martin, & Palmer, 2000; Kreiman, Gerratt, & Precoda, 1990) explain the nature of professional rating and conclude that clinicians do indeed have a separate set of perceptual abilities, suggesting that the effect of expertise might be real.

Researching the effect of familiarity with the speakers’ language also holds potential, as very little is known about whether dysarthric speech is language-specific or language-independent. Classifying HD, or at least some of its acoustic characteristics, as one or the other could become another tool to aid diagnosis of PD. Unfortunately, even fewer studies are dedicated to cross-linguistic perception of dysarthria in PD patients than to the role of expertise. What we know so far comes from research in machine learning (Hazan et al., 2012; Orozco-Arroyave et al., 2015; 2016). The results of these studies suggest language dependency, but they are difficult to extend to human perception. Research which does not pertain to dysphonic speech is the other source of knowledge in this area: Zhu (2013) demonstrated that familiarity with the speakers’ language does not give listeners an advantage.

Comparative research on the effect of expertise and language familiarity on HD detection is thus scarce, despite the potential of listener analysis to aid diagnosis. The present exploratory study will attempt to help fill that gap by comparing the perceptive abilities of different listeners. In a perception experiment, the ability to accurately identify a health issue (HD in PD) and sentence type (statement, question) from a Dutch voice will be compared across four groups differentiated on the basis of the two underlying variables: language familiarity and expertise. To test language familiarity, it is important to use relatively unrelated languages: Czech and Dutch are therefore an ideal choice. To test expertise, it is important that the listeners’ education and expertise differ: university-trained listeners and phonetically naïve (untrained) listeners satisfy that criterion. The following groups thus emerge: Czech phoneticians, Czech naïve listeners, Dutch phoneticians, and Dutch naïve listeners. The aim of the study is to find out whether expertise level and familiarity with the speakers’ language play a significant role in the detection of HD.

The thesis is divided into several conventional parts. The following section discusses relevant literature on PD and HD, on speech analysis as a diagnostic tool, and on research pertaining to various listening groups; the PhD project to which the present study contributes is discussed as well. After the theoretical framework is introduced and the research questions and hypotheses are stated, the design and preparation of the material for the perceptual test are explained. What follows is the documentation of the results of the experiment, and then a discussion in which these results are linked back to the literature and the research questions are answered. Recommendations for future research, limitations of the study, and its practical implications are discussed in the conclusion of the thesis.

2. Literature

Parkinson’s disease

Neurodegenerative diseases occur, as the name suggests, when neurons in the central or the peripheral nervous system gradually lose their function until they ultimately die, causing irreparable damage to the nervous tissue and the bodily functions under its control. PD is the second most frequently occurring of such diseases, with only Alzheimer’s disease being more prevalent (Hayes, 2019). The disease is estimated to affect some 4 million people worldwide, and its prevalence in industrialized countries is approximately 0.3% (Dexter & Jenner, 2013). The incidence generally increases with age, although it may stabilize in the 80+ age group, in which reduced heterogeneity across genders is usually noticed (Hirsch, Jette, Frolkis, Steeves, & Pringsheim, 2016) and in which the incidence rate is calculated at 3% (Dexter & Jenner, 2013). As the systematic review by Hirsch et al. (2016) shows, most studies indicate that males have a higher incidence of PD in all age groups, but this difference is significant only in the age groups 60-69 and 70-79.

Unfortunately, the causes of PD are still largely unknown, although genetics, epigenetics, and environmental triggers are suspected to play a key role in the onset and the progress of the disease (Lill, 2016; Ball, Teo, Chandra & Chapman, 2019). Etiologically, there are two types of PD: genetic, inherited in either an autosomal dominant or recessive manner and accounting for 10-15% of all patients, and sporadic (also called idiopathic), assumed to develop from gene-environment interactions (Ball et al., 2019; Lill, 2016). Despite the lack of specific knowledge about the origins of the disease, recent decades have brought much-needed information about the nature of PD after its onset. The disease can be neuropathologically described by the loss of dopamine in a part of the brain called the substantia nigra pars compacta, which consists of dopaminergic neurons and is located in the basal ganglia in the midbrain (Lill, 2016; Kucinski & Sarter, 2016). Communication in this part of the brain is predominantly induced by the neurotransmitter dopamine. Thus, the loss of this type of cell results in the diminished functioning of the basal ganglia due to reduced innervation of excitatory and inhibitory pathways; this in turn leads to a reduction of projections to the motor cortex and the brain stem (Kucinski & Sarter, 2016). This process of dopamine loss corresponds to the clinical motor hallmarks of PD (Lill, 2016).

PD also exhibits various non-motor symptoms. However, research on non-motor symptoms is still scarce and under-appreciated, even though their impact can be as debilitating to the patients’ quality of life as that of motor symptoms. Non-motor symptoms may include cognitive decline, depression and anxiety, autonomic dysfunctions, loss of smell, or sleep disturbances (Hayes, 2019; Jankovic, 2007; Dexter & Jenner, 2013). The Mayo Clinic review describes more concrete instances of these dysfunctions, such as bladder problems, constipation, or blood pressure changes (Mayo Clinic).

Unlike non-motor symptoms, the four major motor symptoms have been linked to the pathology of the basal ganglia (Dexter & Jenner, 2013) and can be usefully summarized using the acronym TRAP: tremor at rest, rigidity, akinesia (or bradykinesia), and postural instability (Jankovic, 2007; Hayes, 2019; Dexter & Jenner, 2013). Tremor is usually unilateral, as the shaking movement is typically seen in only one extremity at first. It is most prominent in a posture of repose and can be suppressed with movement; rest tremor is slower than a classic essential tremor, with a frequency of 4-6 Hz, and apart from the hands it may also affect the lips, chin, jaw and legs (Hayes, 2019; Jankovic, 2007). Rigidity, or muscle stiffness, is characterized by increased muscle resistance in proximal and/or distal movement; it is often associated with pain, and a painful shoulder especially is a frequent initial manifestation of PD (Jankovic, 2007). Bradykinesia describes slowness of movement and simplification of complex motor tasks, and it is the most characteristic clinical feature of the disease; this symptom encompasses trouble with planning, initiation and execution of movement, and it includes loss of spontaneous movements and gesturing (Hayes, 2019; Jankovic, 2007; Dexter & Jenner, 2013). Linked to bradykinesia, postural instability manifests due to loss of postural reflexes and leads to the onset of falls; instability, along with postural deformities characterized by abnormal flexed positions, appears in the later stages of PD, usually after the onset of other clinical features (Jankovic, 2007). These postural difficulties are in some cases also accompanied by freezing, an intermittent arrest of motor functions (Hayes, 2019; Jankovic, 2007).

Both motor and non-motor symptoms come together in a distinctive speech disorder called hypokinetic dysarthria (HD), developed by up to 90% of PD patients (Ho, Iansek, Marigliani, Bradshaw, & Gates, 1999). The disorder is often attributed to weakness and slowness of musculature, rigidity and rest tremor, motor features which mirror the dopaminergic neurodegeneration discussed earlier (Pinto et al., 2004). The fusion of the two types of symptoms within HD can be exemplified as follows. The inability of PD patients to exercise functional muscle control, as in bradykinesia, may turn even the most benign daily tasks such as getting out of a chair, brushing teeth, or even talking into impossible challenges. This in turn may lead to psychological decline, a frequent non-motor symptom (Shadden, Hagstrom, & Koski, 2008), since all these difficulties often negatively affect patients’ interaction with others and even their self-concept (Miller, Noble, Jones, & Burn, 2006). As HD is a clinical feature central to the experiment of this thesis, the following section discusses the disorder in further detail.

Hypokinetic dysarthria

Dysarthria, in general, is a collective name for a group of speech disorders caused by weakness in the muscles used for speech production and by difficulty in controlling these muscles. A broad definition presented by the Mayo Clinic characterizes the condition as “slurred or slow speech that can be difficult to understand” (Mayo Clinic). The causes of dysarthria stem from cerebral damage at the level of brainstem nuclei, supranuclear brain dysfunction or neuromuscular impairment (Pinto et al., 2004), which in turn may arise from head and brain injury, stroke, brain tumor, and various neurodegenerative conditions (Mayo Clinic). As the disorder designates issues with oral communication due to paralysis, dysarthria must be distinguished from disorders of higher centers related to the defective programming of movements and sequences of movements (like apraxia of speech), and to the ineffective processing of linguistic units (like aphasia) (Darley, Aronson, & Brown, 1969a). The conditions that may result in dysarthria include ALS, Huntington’s disease, multiple sclerosis, and, of course, Parkinson’s disease, whose own dysarthria type, hypokinetic dysarthria, is the focus of the present study.

The seminal framework for understanding dysarthrias and their classification was established by Darley et al. (1969a; 1969b). Their classification of the different types of dysarthria is still widely used for both research and clinical purposes and is referred to as the Mayo system, after the clinic with which the researchers were affiliated. The first of the two companion papers identifies the link between speech pathology and perceptual speech characteristics and recognizes the fact that speech indeed follows neuroanatomy and neurophysiology; this, in practical terms, implies that there are various patterns of dysarthria, each reflecting a different abnormality of motor functioning and each sounding different. Darley et al. thus identified five types of dysarthria: flaccid dysarthria (in bulbar palsy), spastic dysarthria (in pseudobulbar palsy), ataxic dysarthria (in cerebellar disorders), hyperkinetic dysarthria (in dystonia and chorea), and, of course, hypokinetic dysarthria (in parkinsonism) (Darley et al., 1969a).1 In a second, complementary study, Darley et al. (1969b) used correlation matrices to determine distinctive clusters of dysfunction in seven neurologic disorders, including PD. The results yielded cues to the co-occurrence of neuromuscular defects and led to an identification of the likely neuromuscular bases of individual speech dimensions.

1 An additional type, mixed dysarthria, was identified as well, combining elements of flaccid and spastic dysarthrias.

In summary, HD according to Darley et al. (1969b) is “manifested outstandingly by paucity of movement and by tremor” which occurs “chiefly at rest and is abated by movement;” it is not mirrored in speech, as “there is no true paralysis but there is marked reduction in the automatic aspects of movement” (Darley et al., 1969b: 469). Speech flow is characterized by slow single movements but abnormally fast repetitive movements of very reduced range, with the resulting speech being comparable to shivering.

The dysfunction dimensions linked to parkinsonism are illustrated in Figure 1, which shows that the cluster most suitable to characterize HD is prosodic insufficiency. Physiologically, this cluster of deviant speech dimensions is due to a “reduction in the peaks of accent of train of repetitive movements,” with the neuromuscular defect being the “reduction of range of movements” (Darley et al., 1969b: 480). The symptoms typical for this type of dysarthria and correlated to prosodic insufficiency are monopitch, reduced stress, monoloudness, imprecise consonants, short rushes of speech, and variable rate; uncorrelated dimensions include inappropriate silences (related to the difficulty initiating movements), breathy voice, harsh voice, and low pitch (all three related to the rigidity of laryngeal musculature).

Figure 1

Cluster of deviant speech dimensions in hypokinetic dysarthria, taken from Darley et al. (1969b: 487)

Further research focusing on HD and its manifestations in the parkinsonian voice confirms Darley’s results: more than 40 years after the Mayo system was introduced, phonation, articulation, prosody and speech fluency are still areas affected by dysarthria (Mekyska et al., 2011; Rusz, Cmejla, Ruzikova, & Ruzicka, 2011), with pauses and impaired prosody at the phonetic/acoustic level being the most marked features (Boschi et al., 2017). Contemporary studies show that monopitch and smaller pitch range, impaired speech, reduced vowel space or harsh voice quality are still among the leading cues in detecting HD from a voice sample (Rusz et al., 2011; Mekyska et al., 2011; Galaz et al., 2015; Hsu et al., 2017; Strinzel, Verkhodanova, Jalvingh, Jonkers, & Coler, 2017; Verkhodanova & Coler, 2018; Verkhodanova, Coler, Jonkers, de Bot, & Lowie, 2019b; Gomez et al., 2019). Of course, not all studies bring forward identical results. Hsu et al. (2017) did not report a significantly altered speech rate, most probably due to language-specific features of Mandarin, in which the study was conducted; neither did Skodda & Schlegel (2008), whose study of speech rate found no significant differences for speech rate and rhythm. Speech rate is thus one of the less consistent features. Galaz et al. (2015) did not report significant monoloudness, but this study was marked by a small patient cohort and further research is thus required. Rusz et al. (2008) did not detect a significant difference in pitch variation between patients and healthy controls, but the study’s limitations stress the unique nature of its cohort. Still, despite these marginal differences, the general trends speak in favour of the classification and the cluster introduced by Darley et al. (1969a; 1969b).

Speech analysis as a diagnostic tool in PD

However, despite the wide use of the Mayo system and its applicability to the individual speech defects described in the previous section, there are some doubts about its usefulness in clinical practice. As Verkhodanova et al. (2019a) pointed out, two independent studies tested the classification accuracy among neurologists and neurology trainees (Fonville et al., 2008) and among neurologists, residents in neurology and speech therapists (Van der Graaf et al., 2009), both of which reported accuracy between 35% and 40%. Perceptual judgement based on the Darley classification alone is thus not reliable enough, and other resources need to be considered for clinical purposes. This need motivated numerous studies of dysarthric speech focusing on the dimension of articulation, such as vowel space, F1 and F2 variability or the vowel articulation index (Rusz et al., 2011; Strinzel et al., 2017; Hsu et al., 2017; Verkhodanova & Coler, 2018), and on the dimension of prosody, such as speech rate or F0 trajectory and variance, which correspond to what is perceived as pitch (intonation) (Rusz et al., 2011; Galaz et al., 2015; Verkhodanova & Coler, 2018).2

While different patient cohorts are usually considered unique, it is generally believed that automatic measurements of these acoustic and perceptual correlates should be useful in most countries across the world, as their mathematical nature is independent of language (Rusz et al., 2011), although there are exceptions, as demonstrated for Mandarin in Hsu et al. (2017). Mekyska et al. (2016) demonstrated that perceptual features, although clinically often inexplicable, can be good markers of PD as they correlate highly with clinical features. The assumption that speech analysis is a useful tool in clinical practice also appears to be supported by machine learning classification based on the acoustic and perceptual features. Exploratory results showed that this technology can be used to train and test different data sets (Hazan, Hilu, Manevitz, Ramig, & Sapir, 2012; Orozco-Arroyave et al., 2015; 2016) and that acoustic analysis of phonation in combination with machine learning modeling can predict the progress of PD (Galaz et al., 2018); the performance of automatic speech recognition technology can also be predicted from perceptual disturbances in dysarthric speech, most precisely from articulatory precision followed by prosody (Tu, Wissler, Berisha, & Liss, 2016).

2 The measures presented here are only a selected few, as these particular ones appear most frequently in studies researching speech correlates of hypokinetic dysarthria; however, there are many more. Rusz et al. (2011) offer a useful overview of all these measures in Table 3 on page 354.

Perceptual and acoustic evaluations of dysarthric speech in PD patients are traditionally conducted using a multitude of speech and vocal tasks which allow for controlled quantification of the various features of HD (for an overview of their typology and use in research, see Brabenec, Mekyska, Galaz, & Rektorova, 2017); it has been shown that the characteristics of dysarthria differ depending on task type, and that some tasks are more representative of PD speech difficulties than others (Kempler & Van Lancker, 2002; Brabenec et al., 2017; Verkhodanova et al., 2019a). Kempler & Van Lancker (2002) assessed five production tasks (spontaneous speech, repetition, reading, repeated singing, spontaneous singing) and presented clear results: spontaneous speech showed critical differences from non-spontaneous tasks in its acoustic features of dysarthria. Furthermore, spontaneous speech tasks come the closest to naturally occurring speech and are thus representative of the daily speech difficulties of patients diagnosed with dysarthria. This trend was confirmed in Orozco-Arroyave et al. (2015), where the monologue task had the highest identification success rate at 98%. Martens et al. (2011) assessed reading and imitation tasks (boundary marking task, focus task, sentence typing task, emotional prosody task) and observed that significant differences between patients and healthy controls (HC) arose in imitation tasks, not in reading tasks; focus and sentence typing tasks proved to be the most problematic for the speakers and thus representative of the dysarthric struggle, while emotional prosody appeared to be too vague to be considered a useful task in the identification of Parkinson’s disease patients. The recent results of Verkhodanova et al. (2019a) offered an even narrower assessment of task effect: the sentence typing task was evaluated as the clearest task and the only one correlating with prosodic measures of F0 trajectories.

Listener effect

While the exploratory results of automated speech recognition, such as those discussed by Hazan et al. (2012), are promising, diagnosis of HD and PD is still in the hands of clinicians. However, their perceptive abilities are varied (Fonville et al., 2008; Van der Graaf et al., 2009), which suggests that the identity of the listener is an important factor to be considered as well. To that end, various perception experiments have been carried out in the past decades to determine who is able to perceive dysarthric PD speech, and to what extent. To the best of the author’s knowledge, studies on the perception of dysarthric speech are scarce, especially those which use expertise as a differentiating factor between groups; the following section thus aims to give an overview of the contemporary knowledge in this area.

Perhaps the study most relevant to this thesis is that of Walshe, Miller, Leahy & Murray (2008), whose extensive experiment compared, among other things, the perception and rating of (Irish) English dysarthric speech from the point of view of dysarthric speakers, speech language therapists, and naïve listeners. The results were mixed: there were no significant differences between the three groups. However, intra-rater reliability was lower, though significant, for the trained group of listeners, suggesting that “some change in construing the intelligibility of the repeated speech sample occurred as the task progressed” (Walshe et al., 2008: 645) and that training might influence speech perception, as clinicians listened to different aspects of the speech sample. This result confirmed the model proposed by Kreiman, Gerratt & Precoda (1990), which claimed that naïve listeners employed comparable strategies when listening to a dysphonic population, while clinicians differed on an individual basis. A similar conclusion regarding clinicians’ varying strategies was also reached by Wolfe, Martin & Palmer (2000), who found that clinicians (the experiment analyzed impressions of students in an introductory course) can become progressively perceptually aware of “high frequency noise components as dimensions of dysphonic voice quality” (Wolfe et al., 2000: 703). Smith et al. (2019) did not find any significant differences between naïve listeners and speech language therapists in training either when comparing their intelligibility ratings of dysarthric speech. Van Borsel & Eeckhout (2008) also studied the impressions of three groups (naïve listeners, speech pathologists, and speakers who stutter), but the phenomenon studied was not hypokinetic dysarthria but stuttering under delayed auditory feedback; the study is thus less relevant to the topic of this thesis than others. Still, unlike Walshe et al. (2008), the authors described significant differences between groups: naïve listeners were more severe in their rating than speech pathologists, who in turn were more severe than the stuttering speakers were in their own ratings. It will be interesting to see the results derived from data collected by Verkhodanova et al. (2019a), who also plan to compare the speech rating of naïve listeners and speech language therapists; the results reported so far suggest that naïve listeners were sensitive to prosodic changes in a single dysarthric speaker over time.

An important factor to consider is task type. The stimuli used in Walshe et al. (2008) were extracted from reading tasks; since then, elicitation and free speech tasks have been assessed as more representative of the difficulties typical of dysarthric PD speech (Kempler & Van Lacker, 2002; Martens et al., 2011). The study of Martens et al. (2011) analyzed the impressions of speech language pathologists who evaluated various speaking tasks, with the results demonstrating high interrater agreement for elicitation tasks and low interrater agreement for an emotional prosody task, which was found too obscure even for the professional ear. The study thus demonstrated that clinicians’ tendency to use different strategies (Kreiman et al., 1990; Walshe et al., 2008) depended on task type. Even though emotional prosody is difficult to map clinically, it is still a popular task type when assessing social-linguistic competence as judged by naïve listeners. Dysarthric PD speakers tend to be less successful at communicating stress distinctions and emotional qualities, and as a result, naïve listeners are less able to detect the intended emotion (similar to the professionals in Martens et al., 2011), and they characterize PD patients as sad or entirely devoid of emotion (Pell, Cheang, & Leonard, 2006) and as less interested, less involved, or less friendly than the healthy population (Jaywant & Pell, 2010). The sound of perceived otherness shifts personality evaluations by naïve listeners towards negative traits in general, as demonstrated in accent research such as the study of Volín, Poesová & Skarnitzl (2014), which showed the negative impact of rhythmic distortions in speech on personality assessment. These conclusions are useful as they supplement qualitative publications about life with the communicative challenges typical of PD (such as Miller et al., 2006, or Shadden et al., 2008). However, clinically they are less likely to aid diagnosis (Martens et al., 2011).

The listener’s linguistic background, and in particular their familiarity with the language of the speaker, is yet another aspect which plays a role in the perception of dysarthric PD speech. Zhu (2013) presents perhaps the most relevant study to this thesis regarding language familiarity, as he compared the perception of Chinese emotional prosody by Chinese native speakers, naïve Dutch listeners, and Dutch L2 learners of Chinese. Although Zhu analyzed healthy speakers in a task previously discussed as clinically less important, the study concluded that there was indeed a significant difference in speech perception between the three language groups. Chinese native listeners did not show an advantage in identifying emotions accurately and confidently; naïve listeners unfamiliar with Chinese were as successful as native listeners, and advanced Dutch learners of Chinese reported even better results than native Chinese listeners. The remaining knowledge on multilingual recognition arose from research on machine learning and automatic classification of dysarthric PD speech. Hazan et al. (2012) demonstrated for English and German that while both the training and testing phases of the machine learning process from one language could be reused in another linguistic setting, different feature sets were necessary for optimal results. Orozco-Arroyave et al. (2015, 2016) published an interesting pair of studies in which several cross-language experiments with Czech, German, and Spanish datasets were carried out to test the capacity of automated detection in multinational contexts. The classifiers were trained in one language context and then tested in another, resulting in various recommendations about the percentages of recordings to be added from the testing to the training data to boost accuracy up to even 100% (Orozco-Arroyave et al., 2016). While the results showed the great potential of this methodology, they also demonstrated the need for target-language data during training. As naïve human listeners usually do not receive such training, it can be assumed that listeners who are not familiar with the language of the speakers might be less successful than familiar, native listeners. The results of the technology-based studies are thus contrary to those conducted with human listeners; this difference may be due to the fact that the automated recognition studies focus on segmental features (Hazan et al., 2012; Orozco-Arroyave et al., 2015, 2016) while Zhu (2013) discusses prosody.

Recognizing PD from multilingual voice

As this thesis is a product of a research internship undertaken as a part of the MA Applied Linguistics program, it is necessary to address the current research of Vass Verkhodanova, MA, a PhD candidate at the University of Groningen, whose own dissertation project, titled “Recognizing Parkinson’s disease from multilingual voice,” provided the seminal framework and procedural guidance for this study. Supervised by prof. dr. Wander Lowie, dr. Matt Coler and dr. Roel Jonkers, this PhD project aims to analyze acoustic cues that characterize PD speech and to determine which of these cues experts rely on during diagnosis. The following discussion sheds light on the major trends in dysarthric speech detected in the project so far.

Three studies linked to the PhD research contributed to the understanding of the perception of dysarthric voice in PD patients. The first study demonstrated the adequacy of acoustic measurements of prosody and of vowel articulation for differentiating dysarthric speech in Dutch speakers from that of their non-pathological counterparts (Verkhodanova & Coler, 2018). The study thus confirmed the conclusions of many previous studies conducted in different languages, namely that monopitch was a significant tendency for patients with hypokinetic dysarthria (Mekyska et al., 2011; Galaz et al., 2015; Hsu et al., 2017), as well as imprecise vowel articulation with a centralization trend within the vowel space and the speakers’ steadiness in achieving vowel targets (F1 variability) (Rusz et al., 2011; Strinzel et al., 2017; Hsu et al., 2017). Unlike previous research (Skodda & Schlegel, 2008; Hsu et al., 2017), the study also suggested that speech rate distribution in dysarthric speech was significantly different from that of healthy controls; this inconsistency was considered likely due to in-group variation and differences in methodology and thus requires further investigation. Narrowing in focus from both acoustic and prosodic aspects to prosody only, and shifting in methodology from a cross-sectional to a longitudinal design, the second study analyzed PD speech from the perspective of both production and perception (Verkhodanova et al., 2019a). A series of monthly recordings of the same PD patient, who had not yet received an HD diagnosis, was made for over a year; these recordings were then used to perform acoustic analyses and a perceptual experiment, in which listeners rated the healthiness of stimuli extracted from the recordings in a randomized order. Unlike in the previous study, the results showed significant changes only in F0 and jitter, both of which are related to pitch; this result was in line with previous research as it again reported monopitch as one of the earlier symptoms of HD. The results of the perceptual experiment validated this conclusion, as later recordings were rated as less healthy. The study’s experiment also explored differences in rating between speech therapists and naïve listeners, but those are not yet published. The third study, however, brought attention to the area of perception as it focused on the impact of dysarthric prosody on naïve listeners’ recognition (Verkhodanova et al., 2019b). Listeners were asked to rate intended intonation and stress patterns produced by speakers suffering from different types of dysarthria. The results showed that different diseases affected the intelligibility of prosody differently, with stimuli produced by PD patients yielding the poorest performance, showing that recognition of HD is especially troublesome for the untrained ear. The study further explored the reliability of different tasks for assessing prosody deficiency, and the results recommended sentence type recognition as the clearest task and as the only task showing a correlation with F0 trajectory estimation (a result which confirms the sentence typing assessment in Martens et al., 2011).

So far, Verkhodanova’s research shows that acoustic analysis is a useful tool in dysarthria recognition. The most consistent finding is that F0 trajectory appears to be a meaningful cue for differentiating between healthy and dysarthric speech. However, it cannot act as a single predictor, and further research targeting other acoustic cues is necessary (Verkhodanova et al., 2019b). Secondly, the research also points out the potential importance of listeners’ perception as guidance for determining which perceptual cues are diagnostically reliable. In conclusion, it appears that acoustic analysis and listeners’ perception are both potentially informative resources for further exploration of the link between acoustic and perceptual cues in dysarthria research. The present study undertakes to offer further insight into the perception of dysarthric speech, in line with the procedures and assumptions so far outlined by Verkhodanova’s project; the study’s concrete aims and hypotheses are discussed below.

2.1 Statement of purpose

While the number of studies researching segmental and prosodic features of PD patients suffering from HD is growing and bringing more understanding to the classification system first introduced by Darley et al. (1969a; 1969b), there is still a large gap in research pertaining to the perception of dysarthric speech characterized by those features. Thus, to advance the knowledge of diagnosis arising from perceptual analysis of parkinsonian speech, further research into who can detect HD, and to what extent, is necessary. The present study thus undertakes to contribute to this field by collecting health rating responses and sentence type identification responses during a perception experiment and by analyzing these responses with regard to the effect of language familiarity and expertise, to see to what extent experience with speech sounds and comprehension of the verbal content play a role in the recognition of HD. The four groups of listeners who took the listening test were Czech naïve listeners who are not familiar with the Dutch language, Czech phoneticians who are not familiar with the Dutch language, Dutch naïve listeners, and Dutch phoneticians. The study thus aims to answer the following main research question:

Do listeners perceive dysarthric speech of PD patients differently on the basis of their familiarity with the spoken language and their expertise?

Several secondary research questions concerning the four subgroups and the different listening tasks the participants completed will need to be answered as well to provide a well-rounded answer to the primary research question:

1. Is there a significant difference in the listeners’ ability to accurately distinguish PD speakers from HC speakers on the basis of perceived healthiness rating between Dutch phoneticians, Czech phoneticians, Dutch naïve listeners, and Czech naïve listeners?

2. Is there a significant difference in the listeners’ ability to correctly recognize sentence types from the voice of PD speakers between Dutch phoneticians, Czech phoneticians, Dutch naïve listeners, and Czech naïve listeners?

3. Is there a significant difference in the listeners’ ability to correctly recognize sentence types from the voice of HC speakers between Dutch phoneticians, Czech phoneticians, Dutch naïve listeners, and Czech naïve listeners?

Past research has demonstrated mixed results for the effect of language familiarity and expertise on the perception of the speech of PD patients by human listeners (as opposed to automated identification and machine learning); furthermore, not all of the studies pertained to the phenomenon of HD. Still, based on the literature, we hypothesize that:

1. Phoneticians will be less accurate in identifying an underlying health issue (HD) in the speech sample than phonetically naïve listeners, following Walshe et al. (2008) and Smith et al. (2019), and contrary to Van Borsel & Eeckhout (2008). Even though the members of the cohort of phoneticians have teaching experience in the field of phonetics and phonology or have at least obtained a BA title in Phonetics & Phonology, their trained perceptual abilities will not be enough to distinguish between healthy and dysarthric voice. In a listening task in which the participants will be asked to describe a recorded phrase as either a statement or a question, opposite results are expected, given that phoneticians are likely to have more sensitive perceptive abilities than untrained listeners do (Wolfe et al., 2000). Phoneticians are thus expected to identify sentence type in both healthy and dysarthric voices more accurately than naïve listeners.

2. There will not be a significant difference in the accuracy with which Dutch and Czech listeners identify dysarthric PD speech and with which they identify sentence type. Although the Czech listeners in the experiment do not comprehend Dutch, this language difference will not prove disadvantageous, following Zhu (2013). Although his study did not focus on dysarthria, the results concerned the effect of language familiarity on the perception of human listeners and they were linked to the dimension of prosody; since the present study also uses larger units of speech as the material for its perception test, rather than isolated segmental information, Zhu’s study is perhaps more likely to correspond with the current study. This is contrary to Hazan et al. (2012) and Orozco-Arroyave et al. (2015, 2016), based on whose results a difference between language groups could be expected, but whose research focused on technology-based recognition and on the segmental dimension and is thus not as relevant to the present study, whose focus is human perception of dysarthric prosody. Familiarity with the language will thus not have an effect.

3. The effect of expertise will be stronger than the effect of language familiarity, of which not much is yet known. In comparisons between different nationalities within the same expertise group, the results will be similar to those outlined in the second hypothesis, namely that there will not be a significant difference in the accuracy between Czech experts and Dutch experts, and between Czech naïve listeners and Dutch naïve listeners, as similar conclusions were reached in Zhu (2013). This is expected in both listening tasks in the perception test.

3. Method

To investigate whether expertise level and language familiarity affect the ability to distinguish a PD patient from a healthy speaker and the ability to correctly identify sentence type from a voice only, a perception test was administered to four different groups of participants. Their results were subjected to cross-comparisons in subsequent analyses. Czech phoneticians, Czech phonetically naïve people, Dutch phoneticians, and Dutch phonetically naïve people all took an online listening test; the study thus had a between-subjects design. The test had two parts. In the first part, the participants were asked to assess the healthiness of the voice played to them (“Did the voice sound healthy to you?”); in the second part, they were asked to assess its intonation (“Was this sentence a question or a statement?”). The short Dutch phrases played in the test were recorded from healthy controls and PD patients with HD. The independent variables were the familiarity of the participants with Dutch (Czech and Dutch listeners) and the speech expertise of the participants (phoneticians and phonetically naïve listeners); the dependent variables were their ability to recognize dysarthric voice in PD patients, operationalized as the unhealthiness-recognition-accuracy score for data obtained in the first listening task, and their ability to correctly identify sentence type, operationalized as the sentence-type-recognition-accuracy score in the second task.
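The two accuracy scores described above could be computed along the following lines. This is only a minimal sketch: the function names and the toy responses are hypothetical, and the decision to count an “I don’t know” response as incorrect is an assumption, since the scoring of such responses is not specified at this point in the text.

```python
from typing import Iterable, List, Tuple

# Task 1: a PD speaker should be judged "no" (not healthy), an HC speaker "yes".
EXPECTED_HEALTH = {"PD": "no", "HC": "yes"}


def accuracy(trials: Iterable[Tuple[str, str]]) -> float:
    """Proportion of (expected, response) pairs in which the response is correct.

    Assumption: any response other than the expected one, including
    "I don't know", is scored as incorrect.
    """
    trials = list(trials)
    if not trials:
        raise ValueError("no trials to score")
    correct = sum(1 for expected, response in trials if response == expected)
    return correct / len(trials)


def health_trials(raw: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Map (speaker_group, response) pairs to (expected, response) pairs."""
    return [(EXPECTED_HEALTH[group], response) for group, response in raw]


# Hypothetical responses of one listener, for illustration only.
task1 = health_trials(
    [("PD", "no"), ("PD", "yes"), ("HC", "yes"), ("HC", "I don't know")]
)
task2 = [
    ("question", "question"),
    ("statement", "question"),
    ("statement", "statement"),
]

print(accuracy(task1))  # 0.5
print(accuracy(task2))
```

Scoring both tasks with the same function keeps the two dependent variables directly comparable; a different treatment of “I don’t know” responses would only require changing the comparison inside `accuracy`.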

3.1 Participants

There were 40 participants in the experiment. These were found through convenience sampling, as native Dutch and native Czech listeners with a specific educational background were sought for the experiment; besides a specific L1 and familiarity with Dutch, education in speech sciences such as phonetics and phonology, or the lack of it, was a key criterion. Czech and Dutch naïve listeners were found through common acquaintances and so were, in most cases, Czech phoneticians; Dutch phoneticians, and a few Czech phoneticians, were found by searching the linguistics departments of various universities and by contacting professors and assistants from the field of phonetics via email. Group membership was determined by the researcher’s selection; additionally, each participant was asked about their linguistic and phonetic expertise in a questionnaire.

The participants were divided into groups based on their L1 and expertise: Czech naïve listeners, Dutch naïve listeners, Czech phoneticians, and Dutch phoneticians. In the whole sample (n = 40), the number of males and females was balanced (n(f) = 21, n(m) = 19) and the age ranged from 19 to 60 (M = 33.3, SD = 11.4). The demographics of the groups were relatively similar, except for the Dutch experts, who had a higher mean age (M = 40.9, SD = 14.2) and a higher proportion of male participants than the other three groups. The characteristics of each group are summarized in Table 1.

Table 1

Subjects’ group and total characteristics

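| | Czech expert | Czech naïve | Dutch naïve | Dutch expert | All together |
|---|---|---|---|---|---|
| n | 10 | 10 | 10 | 10 | 40 |
| n females | 6 | 6 | 6 | 3 | 21 |
| n males | 4 | 4 | 4 | 7 | 19 |
| M age | 30.7 | 31.7 | 29.7 | 40.9 | 33.3 |
| age range | 53 – 19 | 52 – 23 | 52 – 19 | 60 – 19 | 60 – 19 |
| SD age | 10.3 | 9.4 | 11.5 | 14.2 | 11.4 |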
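As a quick consistency check on Table 1, the whole-sample counts and mean age can be recomputed from the group-level values. The sketch below assumes only the figures reported in the table; the dictionary layout and variable names are illustrative.

```python
# Group-level values copied from Table 1; the data structure is illustrative.
groups = {
    "Czech expert": {"n": 10, "females": 6, "males": 4, "mean_age": 30.7},
    "Czech naive": {"n": 10, "females": 6, "males": 4, "mean_age": 31.7},
    "Dutch naive": {"n": 10, "females": 6, "males": 4, "mean_age": 29.7},
    "Dutch expert": {"n": 10, "females": 3, "males": 7, "mean_age": 40.9},
}

total_n = sum(g["n"] for g in groups.values())
total_females = sum(g["females"] for g in groups.values())
total_males = sum(g["males"] for g in groups.values())

# The whole-sample mean is the size-weighted mean of the group means.
overall_mean_age = sum(g["n"] * g["mean_age"] for g in groups.values()) / total_n

print(total_n, total_females, total_males)  # 40 21 19
print(f"{overall_mean_age:.2f}")
```

The recomputed mean is about 33.25, which agrees with the reported M = 33.3 up to rounding of the group means.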

The expert groups held a profession in the field of speech sciences (all Dutch phoneticians) or had at least obtained an undergraduate degree in Phonetics & Phonology (the case of a few Czech phoneticians). The naïve groups held degrees and professions in various fields other than speech sciences. None of the Czech speakers spoke Dutch as a foreign language. All but one subject reported healthy hearing; one naïve participant reported a minor hearing disability, but as their answers did not significantly differ from those of the other subjects in the same group, the subject’s responses were kept to preserve equal group sizes.

3.2 Materials

Data selection

The first step in the project was to select appropriate speakers, whose speech recordings would be used as stimuli in the perception test. The researchers behind the ongoing PhD project about the perception of dysarthric speech in multilingual contexts compiled a bank of Dutch speech samples; these samples were collected in cooperation between the Faculty of Arts and Campus Fryslân (both parts of the University of Groningen) and the University Medical Center Groningen. A precise protocol, devised by den Hollander, Coler, Verkhodanova and Timmermans (den Hollander et al., 2017) for the purposes of the project, was followed to collect various speaking tasks from healthy controls and from patients suffering from various neurodegenerative diseases, including PD. The procedure by which the data were obtained is described in Verkhodanova et al. (2019a), a study which used the same compilation of recordings for its own perception test:

dysarthric symptoms due to neurological disorder according to the neurologist. Speakers reported (corrected-to) normal vision and hearing and signed informed consent. Exclusion criteria for patients were cognitive problems assessed by Minimal Mental State Examination (MMSE < 26), brain damage caused by stroke that inflicted aphasia and/or apraxia of speech, and language and/or (motor) speech disorders other than dysarthria. Exclusion criteria for healthy controls were cognitive problems (MMSE < 26), brain damage, language and/or (motor) speech disorders. The recording sessions took place in quiet rooms at the University Medical Centre Groningen or at participants’ homes with the TASCAMDR100 recorder and an external Senheiser e865 microphone placed at around a 40 cm distance from a participant. (p. 513)

All speakers signed a consent form allowing the collection and management of their recordings. The collection and subsequent analysis of the material were approved by the Medical Ethics Committee of the University Medical Center Groningen. The data were processed anonymously, using code numbers such as “PT01” instead of names, and were managed carefully with the sensitivity of such medical data in mind; to gain access to this databank, signing a non-disclosure agreement was required.

After signing the non-disclosure agreement, it was possible to gain access to the databank to choose those speakers whose recordings would be used in the test. Uniformity was a key selection criterion: the selected speakers thus reported Dutch as their L1, as opposed to Frisian, which is common in the area where the data were collected. They did not have a noticeably strong regional accent (this factor was controlled for by listening inspection) and their age was within a similar range (outliers who were significantly younger were not used). After careful consideration, 30 healthy controls (hereafter the HC group) and 30 patients officially diagnosed with idiopathic Parkinson’s disease (hereafter the PD group) were chosen. The mean age of the HC group was 66.7 (SD = 7.52, range = 79 – 45) and none of these speakers exhibited cognitive impairments as assessed by the Mini-Mental State Examination (MMSE, M = 28.9, SD = 1.2). The mean age of the PD group was 62.7 (SD = 9.7, range = 81 – 4) and none of these speakers exhibited cognitive impairments either, with an average MMSE score of 28.1 (SD = 1.3, range = 30 – 26). All these patients were described by their neurologist as showing symptoms of HD. The type and timing of medication treating PD symptoms were not controlled for, as the aims of the study were independent of medication type and, furthermore, as findings on the effects of medication on dysarthric speech in PD are still inconsistent (Pinto et al., 2004; Dexter & Jenner, 2013; Brabenec et al., 2017). Given the total number of speakers in the bank and the unbalanced number of males and females, gender as a factor could not be controlled for and matched between groups. The group characteristics are summarized in Table 2.

Table 2

Speaker group characteristics

| Group | Gender | Age in years, M (SD) | Duration of complaints in years, M (SD, range) | Duration of disease in years, M (SD, range) | MMSE score, M (SD) |
|---|---|---|---|---|---|
| PD group (n = 30) | male (n = 14) | 65.7 (8.82) | 12.0 (7.46, 32 – 1) | 11.0 (7.83, 32 – 1) | 28.0 (1.27) |
| | female (n = 16) | 58.1 (10.49) | 12.0 (6.17, 22 – 1) | 10.4 (6.16, 22 – 1) | 28.3 (1.49) |
| HC group (n = 30) | male (n = 22) | 67.9 (7.47) | - | - | 28.9 (1.07) |
| | female (n = 8) | 66.4 (7.71) | - | - | 29.2 (1.13) |
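The speaker selection criteria described above (Dutch as L1, no noticeably strong regional accent, and no cognitive impairment, i.e. MMSE of at least 26) can be expressed as a simple filter. The following is a sketch under stated assumptions: the record fields and all speaker records are hypothetical, and only the code format “PT01” and the MMSE threshold are taken from the text. The age criterion is omitted because no numeric cut-off is reported.

```python
from dataclasses import dataclass


@dataclass
class Speaker:
    code: str            # anonymized code number, e.g. "PT01"
    group: str           # "PD" or "HC"
    l1: str              # reported first language
    mmse: int            # MMSE score
    strong_accent: bool  # outcome of the listening inspection


def eligible(speaker: Speaker) -> bool:
    """Apply the L1, accent, and MMSE criteria (MMSE < 26 is excluded)."""
    return (
        speaker.l1 == "Dutch"
        and speaker.mmse >= 26
        and not speaker.strong_accent
    )


# Hypothetical databank entries, for illustration only.
bank = [
    Speaker("PT01", "PD", "Dutch", 28, False),
    Speaker("PT02", "PD", "Frisian", 29, False),
    Speaker("HC01", "HC", "Dutch", 25, False),
    Speaker("HC02", "HC", "Dutch", 30, True),
    Speaker("HC03", "HC", "Dutch", 29, False),
]

selected = [s.code for s in bank if eligible(s)]
print(selected)  # ['PT01', 'HC03']
```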

The last step in data selection was choosing which speaking tasks would serve as a source of phrases to be used in the perception test. After careful consideration of the time constraints of the test and the validity of the tasks (in being representative of the issues with which PD patients typically struggle), two tasks were selected as appropriate for the purposes of the present study: a free speech interview (Kempler & Van Lacker, 2002) for the first listening task, and a prosody elicitation task focusing on sentence typing (Martens et al., 2011; Verkhodanova et al., 2019a) for the second task. The free speech interview task was selected as the best possible representative of naturally occurring speech and was elicited by asking the question “Kunt u mij iets vertellen over uw eerste baan?” (“Can you tell me about your first job?”) (den Hollander et al., 2017: 8). Phrases taken from recordings obtained in this task were later used in the first listening task of the perception test. An exception had to be made in the case of one speaker, as their free speech interview task was unavailable for technical reasons; a free speech picture description task was used in its place, elicited by asking the question “Kunt u beschrijven wat u op deze afbeelding ziet?” (“Can you describe what you see in the picture?”) (den Hollander et al., 2017: 7). The second task was a prosody elicitation task focusing on utterances intended as either statements or questions (sentence typing). It was selected because intonation at the end of a sentence and its marked variations (F0 trajectories), such as a fall in statements and a rise in questions, are usually problematic in the prosody of dysarthric PD speech (Martens et al., 2011). These speech samples were elicited by asking the speakers to produce a question and a statement based on the example sentence provided by the interviewer (den Hollander et al., 2017: 8); the list of sentences produced by the speakers in this task can be found in Table 3. After the selection, 60 audio files for further preparation of the perception test were obtained from the databank.

Table 3

Sentences produced during prosody elicitation task focusing on sentence typing (den Hollander et al., 2017: 8)

| Sentence type | Sentence produced |
|---|---|
| Statement | Zij heeft de schietschijf geraakt. |
| Question | Heeft hij de kater gevoerd? |
| Statement | Hij heeft de kater gevoerd. |
| Question | Heeft hij trompet gespeeld? |
| Statement | Hij heeft trompet gespeeld. |
| Question | Heeft zij op haar vriendinnen gewacht? |
| Statement | Zij heeft op haar vriendinnen gewacht. |
| Question | Wie heeft het vervallen huis gekocht? |
| Statement | Piet heeft het vervallen huis gekocht. |

Preparation of auditory stimuli

Stimuli for the first part of the test were prepared from the recordings of the free speech interview task. Each of the 60 sound files contained the full answer to the prompt and thus could not be used in its full length. Using the View and edit function in Praat, version 6.1.14 (Boersma & Weenink, 2020), a short phrase between 4 and 6 seconds in length was extracted from each sound file. To keep the phrases uniform in nature, each stimulus was extracted from the beginning of the interview where possible; in a few exceptions, the phrase was cut out from a later part of the recording due to excessive coughing or stuttering. The phrases respected natural pauses and were extracted at zero crossings of the waveform. In total, 60 short phrases were obtained using this procedure.
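The idea of cutting at zero crossings, which avoids audible clicks at the phrase boundaries, can be illustrated as follows. This is a simplified, sample-based sketch and not Praat's own procedure (the extraction in the study was done manually in Praat); the function names and the toy signal are hypothetical.

```python
from typing import List, Sequence


def zero_crossings(samples: Sequence[float]) -> List[int]:
    """Indices i where the signal is zero or changes sign between i-1 and i."""
    indices = []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if cur == 0 or (prev < 0 < cur) or (prev > 0 > cur):
            indices.append(i)
    return indices


def snap_to_zero_crossing(samples: Sequence[float], index: int) -> int:
    """Move a cut point to the nearest zero crossing (unchanged if none exists)."""
    crossings = zero_crossings(samples)
    if not crossings:
        return index
    return min(crossings, key=lambda c: abs(c - index))


def extract_phrase(samples: Sequence[float], start: int, end: int) -> List[float]:
    """Extract samples[start:end] after snapping both boundaries."""
    a = snap_to_zero_crossing(samples, start)
    b = snap_to_zero_crossing(samples, end)
    return list(samples[a:b])


# Toy signal with sign changes at indices 2, 5 and 8.
signal = [0.5, 0.2, -0.1, -0.4, -0.2, 0.1, 0.3, 0.1, -0.2, -0.3]
print(zero_crossings(signal))        # [2, 5, 8]
print(extract_phrase(signal, 3, 7))  # [-0.1, -0.4, -0.2, 0.1, 0.3, 0.1]
```

In practice the requested boundaries would come from the natural pauses identified by the researcher, converted from seconds to sample indices using the sampling rate of the recording.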

Stimuli for the second part of the test were prepared from the recordings collected in the sentence typing task, in which the speakers were asked to utter phrases as either statements or questions. Each sound file contained all five sets of sentences prompted by the interviewer; only one set was randomly assigned to each speaker. In Praat, two short phrases were extracted for each speaker, one question and one statement. As the Dutch interrogative sentence is marked by inverted word order, the stimuli had to be cut out without the pronoun and the auxiliary verb at the beginning. The sentences “Heeft hij trompet gespeeld?” (“Did he play the trumpet?”) and “Hij heeft trompet gespeeld.” (“He played the trumpet.”) thus both became the phrase “trompet gespeeld” (“played the trumpet”) so that Dutch listeners would not be able to identify the sentence type based on the content of the phrase. The phrases were again extracted at zero crossings of the waveform. In total, 120 phrases, 60 statements and 60 questions, were obtained using this procedure.

Once all 180 Dutch phrases were extracted, they were normalized to ensure identical intensity. This was done in Praat, using the function Scale intensity; the new average intensity was set to 70 dB SPL. The last step in stimuli preparation was format conversion: all the files were converted from .wav to .mp3. This was performed using the Convert/Play function in VLC Media Player, version 3.0.8 (VideoLAN, 2019). As practice sessions for each of the listening tasks in the test were to be included in the test, 4 pilot items were created as well, using the same techniques as before, the only difference being that they were extracted from speakers not included in the test.
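The intensity normalization step can be sketched numerically. The example below assumes that average intensity is defined as the mean squared amplitude relative to the reference pressure of 2 × 10⁻⁵ Pa, expressed in dB; this is how dB SPL is conventionally defined, but the code is an illustration with hypothetical function names, not Praat's implementation.

```python
import math
from typing import List, Sequence

P_REF = 2e-5  # reference sound pressure in Pa for dB SPL


def intensity_db(samples: Sequence[float]) -> float:
    """Average intensity in dB re 2e-5 Pa, based on the mean squared amplitude."""
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10 * math.log10(mean_square / P_REF ** 2)


def scale_intensity(samples: Sequence[float], target_db: float = 70.0) -> List[float]:
    """Multiply all samples by one factor so the average intensity equals target_db."""
    mean_square = sum(x * x for x in samples) / len(samples)
    if mean_square == 0:
        raise ValueError("cannot scale a silent signal")
    factor = math.sqrt(10 ** (target_db / 10) * P_REF ** 2 / mean_square)
    return [x * factor for x in samples]


# Hypothetical quiet sine tone, for illustration only.
tone = [0.01 * math.sin(2 * math.pi * i / 50) for i in range(1000)]
scaled = scale_intensity(tone, 70.0)
print(round(intensity_db(scaled), 6))  # 70.0
```

Because a single multiplicative factor is applied to the whole phrase, the relative amplitude contour within each stimulus is preserved; only the overall level is equalized across stimuli.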

Preparation and design of perception test

Finally, to create the perception test, two separate HTML files were written in Visual Studio Code, version 1.44.2 (Visual Studio Code, 2020), using the JavaScript library jsPsych (de Leeuw, 2015), which employs a timeline format to describe the experiment. One version of the code was written with Czech instructions for Czech listeners and the other with Dutch instructions for Dutch listeners; otherwise, the two versions were identical. The code was written with the help of the main researcher of the PhD project about HD, who used the same technique in her ongoing PhD research. In order to convert the code into an online study, the software JATOS, version 3.5.3 (Lange et al., 2015), was used to create a local server with the study, which was then exported to create hyperlinks to the online experiments. These enabled online distribution of the perception tests among the participants.

First, the participants were asked to enter the participant ID which they were assigned in the consent form. A short set of instructions followed, explaining the structure of the experiment, and advising the listeners to rely on their first impressions (Figure 2).

There were two listening tasks to complete. Both tasks were preceded by a short set of instructions (Figures 3 and 5) and a practice session. In the first task, the listeners were

Figure 2

Test instructions for the first listening task included in the Dutch version of the experiment

Figure 3

(36)

required to answer the question “Did the voice sound healthy to you?” upon listening to a short phrase with a choice from “yes,” “no,” and “I don’t know” answers (Figure 4). Their

answer was followed by a second question: “How sure were you of your answer?” with answers “rather sure” and “rather unsure” to choose from (Figure 7). The stimuli in the 60 items in this task were played in a randomized order. In the second task, the listeners were required to answer the question “Was this sentence a question or a statement?” again upon listening to a recorded phrase with answers “question,” “statement,” and “I don’t know” to choose from (Figure 6). There were 120 items in this task, with the stimuli played in a

Figure 4

The first listening task included in the Dutch version of the experiment

Figure 5


randomized order. Again, their answer was followed by a second question: “How sure were you of your answer?” with “rather sure” and “rather unsure” as answers.
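The trial sequence described above can be sketched in a few lines of Python. This is only an illustration of the structure (a randomized stimulus order, with each judgment followed by a confidence question); it is not the jsPsych code used in the study. The function name, the dictionary keys, and the stimulus file names are hypothetical.

```python
import random

CONFIDENCE_QUESTION = "How sure were you of your answer?"
CONFIDENCE_OPTIONS = ["rather sure", "rather unsure"]


def build_trials(stimuli, question, options, seed=None):
    """Return a flat trial list: each stimulus judgment is followed by a confidence trial."""
    rng = random.Random(seed)
    order = list(stimuli)
    rng.shuffle(order)  # randomized presentation order
    trials = []
    for stimulus in order:
        trials.append({"stimulus": stimulus, "question": question, "options": options})
        trials.append({"stimulus": None, "question": CONFIDENCE_QUESTION,
                       "options": CONFIDENCE_OPTIONS})
    return trials


# Hypothetical file names standing in for the 60 items of the first task.
task1_stimuli = [f"item_{i:03d}.wav" for i in range(1, 61)]
task1_trials = build_trials(
    task1_stimuli,
    "Did the voice sound healthy to you?",
    ["yes", "no", "I don't know"],
    seed=1,
)
```

With 60 items, the sketch yields 120 trials for the first task: 60 judgments, each followed by one confidence question.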

Consent form and questionnaire

An informed consent form with an option to sign electronically using signature certificates was also created so that the data could be analyzed without ethical repercussions. The form was prepared using the software TexMaker, version 5.0.4 (TexMaker,

Figure 6

The second listening task included in the Dutch version of the experiment

Figure 7

A question determining the confidence behind the answers in both tasks included in the Dutch version of the experiment


2020), along with the TeX/LaTeX distribution MiKTeX, version 2.9.7417 (MiKTeX, 2020). The consent form was accompanied by a short questionnaire, which enquired about the demographic background of the participants and about their language and expertise backgrounds to make sure that Czech listeners were not familiar with Dutch and that the listeners were truly either trained or untrained in phonetics. Additional information about musical skills and about the regions from which the participants came was collected as well, in case future research is interested in studying these effects. A Czech and a Dutch version of the consent form and the questionnaire were created; the consent forms are provided in Appendix A and the questionnaires in Appendix B.

3.3 Procedures

The participants were sent two emails, each with a set of instructions for one of the two key steps of the experimental procedure: the consent form and the perception test. The first email included the document with both the consent form and the questionnaire. Each document was assigned a unique identification number to be used in the online test to provide a connection between the questionnaires and the results. The participants were asked to return the document filled out and signed before being given access to the online perception test. Once this document was successfully returned, the participants received a second email, containing the hyperlink to the test and a set of instructions concerning the conditions in which it ought to be taken. An effort was made to aid the respondents in creating shared testing conditions: the instructions included advice concerning problematic Internet browsers, the use of headphones, and the choice of a silent environment. Czech participants were administered the Czech version of the test and Dutch participants were administered the Dutch version of the test. The attrition rate was 13%: out of the 46 people who submitted the consent forms, 6 did not finish the test and were not included among the participants. Within the test, the


answers were assigned numerical values based on whether the first, the second, or the third option was selected. The answers and scores in the first task were “healthy” (scored 0), “unhealthy” (1), and “I don’t know” (2); the answers and scores in the second task were “question” (0), “statement” (1), and “I don’t know” (2). Confidence answers followed the same paradigm, with “rather sure” (0) and “rather unsure” (1).
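The coding scheme can be written down as a small lookup, sketched here in Python. The constant and function names are hypothetical; the only thing taken from the text is that an answer's numerical value corresponds to its position among the options.

```python
# Options in the order stated in the text; the list index is the numerical value.
TASK1_OPTIONS = ["healthy", "unhealthy", "I don't know"]
TASK2_OPTIONS = ["question", "statement", "I don't know"]
CONFIDENCE_OPTIONS = ["rather sure", "rather unsure"]


def code_response(selected, options):
    """Return the numerical value of the selected option (its position in the list)."""
    return options.index(selected)


coded = [
    code_response("unhealthy", TASK1_OPTIONS),
    code_response("I don't know", TASK2_OPTIONS),
    code_response("rather sure", CONFIDENCE_OPTIONS),
]
```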

3.4 Analyses and statistics

First, relative-unhealthiness scores, statement-recognition scores, and question-recognition scores were calculated for each speaker, based on the numerical values which represented the answers from both listening tasks. The relative-unhealthiness score was calculated by dividing the number of answers identifying a speaker as unhealthy by the total number of answers for that speaker; statement-recognition scores and question-recognition scores were calculated by dividing the number of correctly identified sentences of the given type for a speaker by the total number of answers for that speaker. These scores were then plotted to visualize the comparison between the HC and PD groups.
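A minimal Python sketch of these per-speaker scores follows. It rests on two assumptions that the text does not settle: that “I don’t know” answers stay in the denominator at this visualization stage, and that the denominator of a recognition score is the number of answers given to that speaker's sentences of the relevant type. The speaker IDs and function names are hypothetical.

```python
from collections import defaultdict

UNHEALTHY = 1   # numerical value of the "unhealthy" answer in the first task
QUESTION = 0    # numerical value of "question" in the second task
STATEMENT = 1   # numerical value of "statement" in the second task


def relative_unhealthiness(responses):
    """responses: (speaker_id, answer_code) pairs from the first task."""
    unhealthy, total = defaultdict(int), defaultdict(int)
    for speaker, answer in responses:
        total[speaker] += 1
        if answer == UNHEALTHY:
            unhealthy[speaker] += 1
    return {s: unhealthy[s] / total[s] for s in total}


def recognition_score(responses, sentence_type):
    """responses: (speaker_id, true_type_code, answer_code) triples from the second task."""
    correct, total = defaultdict(int), defaultdict(int)
    for speaker, true_type, answer in responses:
        if true_type != sentence_type:
            continue  # assumption: only items of the requested type are counted
        total[speaker] += 1
        if answer == true_type:
            correct[speaker] += 1
    return {s: correct[s] / total[s] for s in total}


# Made-up answers for two hypothetical speakers.
task1 = [("PD01", 1), ("PD01", 1), ("PD01", 0), ("PD01", 2),
         ("HC01", 0), ("HC01", 0), ("HC01", 0), ("HC01", 1)]
task2 = [("PD01", 0, 0), ("PD01", 0, 1), ("PD01", 1, 1), ("PD01", 1, 1)]

unhealthiness = relative_unhealthiness(task1)
question_scores = recognition_score(task2, QUESTION)
statement_scores = recognition_score(task2, STATEMENT)
```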

The following steps were taken for the purposes of statistical analysis. The first dependent variable, the participants’ ability to distinguish a PD patient from an HC in the first listening task, was operationalized as the unhealthiness-recognition-accuracy score, which was calculated by comparing the differences in means of the answers for the HC group and for the PD group. The second dependent variable, the participants’ ability to correctly recognize sentence type in the second listening task, was operationalized as the sentence-type-recognition-accuracy score, which was calculated by comparing the differences in means of the answers which correctly identified sentence type. Before score calculation, “I don’t know” answers (with a numerical value of 2) were removed; the number of items on which the scores were based therefore varied slightly among participants. The two accuracy scores were then processed separately with regard to the independent variables, according to whose levels the participants were grouped


for the purposes of comparison. Language familiarity as the first independent variable had two levels: Czech and Dutch. Expertise as the second independent variable also had two levels: Naïve and Expert. Four groups were thus compared based on these levels: Czech naïve listeners, Dutch naïve listeners, Czech expert listeners, and Dutch expert listeners. All the variables were thus between-subjects. Due to the non-normal distribution of responses in all groups in both tasks, all four samples were resampled through bootstrapping (R = 1000) to estimate the accuracy of classification between the PD and HC groups, as well as between sentence types, expressed by statistics comparing the distributions of answers. The unhealthiness-recognition-accuracy scores and sentence-type-recognition-accuracy scores were then independently compared between the four groups using Welch’s F test, an alternative to one-way ANOVA that does not assume equal variances. The choice of Welch’s F test as the test statistic was motivated by the significantly different variance of data in the four bootstrapped groups and by the presence of two nominal independent variables with two levels each; the alpha decision level was set at 5% (α = 0.05). The test was first performed with the scores only. It was performed a second time with scores weighted by the confidence levels reported with each answer; the interaction between answer and confidence was calculated using the formula [0.5 + (answer - 0.5) * (1 - conf/2)]. All analyses were performed in RStudio Desktop, version 1.2.5042 (RStudio Team, 2020).
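The three computational steps named above (confidence weighting, bootstrap resampling, and Welch's F) can be illustrated with a short Python sketch. It is not the R code used for the analyses: the function names are hypothetical, the bootstrap simply resamples means with replacement, and Welch's F is computed from its standard textbook formula.

```python
import random
from statistics import mean, variance


def weighted_answer(answer, conf):
    """Apply 0.5 + (answer - 0.5) * (1 - conf / 2).

    answer: 0 or 1; conf: 0 ("rather sure") or 1 ("rather unsure").
    """
    return 0.5 + (answer - 0.5) * (1 - conf / 2)


def bootstrap_means(sample, r=1000, seed=None):
    """Means of r resamples drawn with replacement from sample."""
    rng = random.Random(seed)
    n = len(sample)
    return [mean(rng.choices(sample, k=n)) for _ in range(r)]


def welch_f(groups):
    """Welch's F statistic and its degrees of freedom (df1, df2).

    Each group needs at least two values and a non-zero variance.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    total_w = sum(weights)
    grand = sum(w * m for w, m in zip(weights, means)) / total_w
    between = sum(w * (m - grand) ** 2 for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / total_w) ** 2 / (n - 1) for w, n in zip(weights, sizes))
    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    return f_stat, k - 1, (k ** 2 - 1) / (3 * lam)


resampled = bootstrap_means([0, 1, 1, 0, 1], r=1000, seed=1)
f_stat, df1, df2 = welch_f([[1, 2, 3], [2, 4, 6]])
```

Under this formula a “rather unsure” answer is pulled halfway towards 0.5: `weighted_answer(1, 1)` gives 0.75 and `weighted_answer(0, 1)` gives 0.25, while “rather sure” answers keep their values of 0 and 1. The sketch returns only the statistic and its degrees of freedom; a p-value would come from the F distribution, for example through R's `oneway.test` with `var.equal = FALSE`.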

4. Results

4.1 Initial visualizations

First, relative-unhealthiness scores, statement-recognition scores, and question-recognition scores were calculated for each speaker in order to visualize the distribution of perceived unhealthiness and sentence type intelligibility across all the speakers in the PD group and the HC group. Figure 8 shows the extent to which each speaker in the first listening task was perceived in each group of subjects, with healthy speakers represented by orange
