
Universiteit van Amsterdam

Research Project 2 - RM Brain and Cognitive Sciences

Voice Patterns in Context: Investigating Speech Features in Schizophrenia during Social Interactions

Author: Michelle Kühn
Supervisor: Alban Voppel
May 11, 2021

Abstract

Atypicalities in voice and speech have been identified and investigated in schizophrenia since the disorder was first characterized (Kraepelin, 1906; Bleuler, 1950). Recently, the diagnostic value of speech parameters has been explored in studies using multivariate classifiers (Elvevåg et al., 2010; Corcoran et al., 2018; Bedi et al., 2015). Whereas these approaches deliver promising results in distinguishing patients from healthy controls (De Boer, Brederoo, et al., 2020), they currently focus on the individual rather than studying dialogue in social interactions (Fusaroli, Raczaszek-Leonardi, et al., 2014). The current study investigated linguistic accommodation, the tendency of conversational partners to adjust their behavior and speaking style to each other, in individuals with schizophrenia spectrum disorder (n = 80) and healthy controls (n = 91). Based on previous analyses of vocal expression (Parola et al., 2020), the accommodation of acoustic speech parameters such as pitch, loudness, speech rate and speech timing was explored in semi-structured interviews using both univariate analyses and logistic classifiers to distinguish between the groups. Additionally, the relationship between accommodation and negative symptom severity, as well as the change of accommodation over time, was investigated. Results show that both healthy controls and patients with schizophrenia accommodate to their interview partner in some of the acoustic dimensions, yet a significant difference between the groups was not found. Furthermore, no significant difference was found between patients with a high severity of negative symptoms and patients with a low severity. The degree of accommodation did not significantly increase or decrease across the interviews. The current findings suggest that accommodation in schizophrenia is heterogeneous between individuals and multimodal in nature. Further research as well as future directions in methodology are discussed.

1 Introduction

Language disturbances are one of the core symptoms and diagnostic criteria for individuals with schizophrenia spectrum disorder. These disturbances range from morphosyntactic impairments on the word and sentence level to pragmatic and discourse anomalies. Specifically, typical impairments include a reduced verbal fluency (Covington et al.,2005) and diminished syntactic complexity (DeLisi,2001), as well as disordered comprehension of metaphors and sarcasm (Mitchell and Crow, 2005) and an increased derailment and tangentiality in discourse (Ditman and Kuperberg,2010; Kuperberg,2010).

The assessment of language disturbances is based on clinical rating scales (for a review see Hart and Lewine, 2017). However, recent studies have explored computational linguistic features to investigate speech in schizophrenia. The methods vary per approach, with different assumptions about the underlying symptoms. Semantic space models have been used to assess speech coherence (Elvevåg et al., 2010), and measures derived from Part-of-Speech tagging or speech graphs have been used to reflect syntactic complexity (Bedi et al., 2015; Mota et al., 2017). Other models employ acoustic measures, such as pause patterns or prosody features (Cohen et al., 2016). Statistical semantic features have been successfully used to diagnose psychosis patients and to predict the development of schizophrenia in young adults (Elvevåg et al., 2010; Corcoran et al., 2018; Bedi et al., 2015). Importantly, while subjective evaluations using rating scales find profound differences between patients and healthy controls, computational methods suggest these effects are far more heterogeneous and dependent on context and demographics (Cohen et al., 2016; Parola et al., 2020). Moreover, because previous research focused on language features within the individual, social interactions have not been taken into account and disturbances are rarely studied in conversation. Historically, disturbances in schizophrenia were attributed to an impairment originating in thought (Bleuler, 1911), but more recent approaches have highlighted the integration of language and cognition (e.g., Covington et al., 2005; Hinzen and Rosselló, 2015) and the importance of studying language within the context of social interaction (Bell, 2013; Parola et al., 2020). More specifically, there has been an increase in interest in studying language use in dialogue (Pickering and Garrod, 2004). Since interacting with another person places a higher cognitive load on executive functions, impairments may become more pronounced than in an experimental setting with monological productions (Parola et al., 2020). This interest extends to studying social communication. Cognitive impairments in schizophrenia also affect social cognition, such as face recognition (Kohler et al., 2010), processing of emotional prosody (Bach et al., 2009) and theory of mind (Brüne, 2005; Parola et al., 2018). In light of these findings, some researchers suggest studying the influence of socio-cognitive mechanisms on speech atypicalities (Parola et al., 2020).

One phenomenon of social communication is the accommodation of a conversational partner's behavior to their interlocutor. This behavior occurs in both content and timing (Delaherche et al., 2012) and extends to speech as well. Linguistic alignment, also simply referred to as alignment, entrainment or synchrony, describes the accommodation of speech in multiple dimensions, such as the choice of words and expressions (e.g., Levelt and Kelter, 1982), paralinguistic factors such as pauses (Edlund et al., 2009), and acoustic-prosodic features such as pitch and voice intensity (Levitan and Hirschberg, 2011; De Looze et al., 2014). The basis for studying this behavior is Giles's Communication Accommodation Theory (CAT) (Giles, 1979). This framework encompasses both inter-group and inter-individual dimensions and describes the way participants in a social interaction either increase or decrease the distance between them (Giles, 2016). These two principles, convergence and divergence, build on the principles of similarity attraction (Byrne, 1971) and Social Identity Theory (Tajfel, 1982). The idea behind them is that if speakers accommodate their behavior to be more similar, they do so to increase approval from the other participant. Conversely, if the behavior is accommodated to diverge, it is to preserve their identity and distinguish themselves from the other party (Giles, 2016). A practical consequence of this assumption is that accommodation cannot be expected to increase linearly over the course of a conversation; rather, it is a dynamic process of similarity and dissimilarity (De Looze et al., 2014).

Few studies have investigated linguistic accommodation within schizophrenia. Dwyer et al. (2020) investigated syntactic alignment in a series of cooperative tasks and found a decreased alignment in individuals with schizophrenia spectrum disorder, with a significantly larger effect in patients diagnosed with formal thought disorder. These impairments were attributed to an impaired processing of semantic context and a decreased syntactic priming. Dombrowski et al. (2014) conducted interviews asking participants to describe affective life experiences that were perceived either as stressful or pleasant. The authors investigated acoustic pitch accommodation during these interviews and found that pitch accommodation decreased with increasing severity of positive and negative symptoms in patients. Although no healthy control group was included, the authors established a link between symptom severity and speech accommodation. Le (2018) investigated vocal accommodation in structured clinical interviews. Using a protocol of acoustic measures such as utterance length, prosody and speech volume (Cohen et al., 2008; Cohen et al., 2015), the author assessed accommodation during the interview and found that patients with schizophrenia accommodated less than healthy controls, although the difference was not significant. Furthermore, the accommodation scores did not predict clinical ratings of speech disturbances as discussed above.

These mixed findings highlight the need for further investigation. Studying speech accommodation can add another perspective to the existing research on voice disturbances in schizophrenia and explore the link between negative symptoms and vocal expression (Tahir et al., 2019). Furthermore, it can provide a theory-driven approach to study social cognitive impairments within the context of conversation. In order to develop an acoustic biomarker for schizophrenia, analyzing conversational dynamics can introduce novel measures and may point towards suitable contexts in which discriminability is maximized. Lastly, a growing body of research has been dedicated to studying synchronous behavior in psychotherapy (e.g., Reich et al., 2014; Koole and Tschacher, 2016; Koole et al., 2020). Thus, speech accommodation may shed light on the dynamics and rapport of the therapeutic relationship between patient and caregivers.

In this study, a large sample of semi-structured interviews between a member of the research group and either a healthy control or a patient with schizophrenia spectrum disorder was investigated for linguistic accommodation of five acoustic speech features. The goal was to assess whether patients accommodate more or less than healthy controls, whether there is a change in accommodation as the interview progresses, and whether there is a correlation between speech accommodation and the severity of negative symptoms.

2 Methods

2.1 Participants

Participants were selected as part of the PRAAT study in De Boer, Brederoo, et al. (2020) and De Boer, van Hoogdalem, et al. (2020). A total of 171 participants, 91 healthy controls and 80 patients with schizophrenia spectrum disorder, were included through the University Medical Center Utrecht (UMC) between 2016 and 2021 if they met the following criteria: 1) at least 18 years of age and 2) native speakers of Dutch.

Patients were selected if they met the criteria for one of the following DSM-IV diagnoses: 295.x (schizophrenia, schizophreniform disorder or schizoaffective disorder) or 298.9 (psychotic disorder, not otherwise specified), as diagnosed by their treating psychiatrist. This diagnosis was confirmed by a trained researcher using the Comprehensive Assessment of Symptoms and History (CASH) interview (Andreasen et al., 1992). The severity of symptoms in the patient group was assessed using the Positive and Negative Syndrome Scale (PANSS; Kay et al., 1987), which is administered in a clinical interview. The patient is rated from a minimum score of 1 to a maximum score of 7 on a series of positive, negative and general psychopathological symptoms. Healthy controls were included if they had no current mental illness or history of such, as assessed by a trained researcher using the CASH. Furthermore, controls were excluded if they reported a family history of psychotic diagnosis. No participant reported uncorrected hearing disabilities or speech deficits. Ethical approval was granted by the ethical review board of the UMC and all participants provided written informed consent.

2.2 Interviews

Spontaneous speech was elicited in semi-structured interviews between the participant and a trained researcher. To keep the topics of interviews comparable, a standard set of questions concerning general life circumstances was used; a full list of questions is available in the supplementary material of De Boer, van Hoogdalem, et al. (2020). Interlocutors were recorded using two head-worn AKG-C544l cardioid microphones and a Tascam DR40 solid state recording device at a sampling rate of 44.1 kHz with 16-bit quantization. The interviews lasted around 15 minutes (M = 14; SD = 6). Since each speaker was recorded on a separate microphone, the audio channels were split between the interviewer and participant into separate files using the PRAAT software (Boersma and Weenink, 2020). Speaker turns were annotated automatically in PRAAT based on the audio track of the interviewer. A speaking turn was defined as a portion of the audio signal lasting at least 100 ms, with a minimal pitch of 100 Hz and an intensity above a threshold of -25 dB relative to the maximum intensity. Acoustic features were only considered from speaking intervals, to eliminate the possibility of cross-talk between the speakers. An example of the interview structure and turn-taking segmentation is shown in Fig. 1a.

2.3 Acoustic Speech Features

Previous studies have shown that interlocutors accommodate their pitch and intonation (Gregory Jr and Webster, 1996; De Looze et al., 2011; Levitan et al., 2011) as well as voice intensity (Coulston et al., 2002; De Looze et al., 2011), speech rate, and speech timing (Weidman et al., 2016; Edlund et al., 2009). Thus, the current study focused on five acoustic features to study accommodation: pitch and pitch variability, voice loudness, syllable rate and pause duration, as described in Table 1. Pitch refers to how the fundamental frequency (F0) is perceived by the human ear, i.e. how low or high a voice sounds. In schizophrenia, pitch variability, the change of pitch over time, is suggested to be moderately reduced (Cohen et al., 2014; Parola et al., 2020). The convergence of pitch across a dialogue is thought to be indicative of rapport between speakers (Babel and Bulatov, 2012). Voice intensity refers to the perceived amplitude, or loudness, of the speech signal. Variability in voice intensity appears to be distinct in schizophrenia patients (Alpert et al., 2000), although the effect is small (Cohen et al., 2014; Cohen et al., 2016; Parola et al., 2020). Both pitch and intensity variability have been identified as potential indicators of cognitive load in schizophrenia patients (Cohen et al., 2012) as well as in healthy controls (Yin et al., 2007). Lastly, speech rate and timing are captured by the measures of syllable rate and pause duration. The syllable rate reflects the rate of articulation and thus the speed of speech production, which is thought to be impaired in schizophrenia and predictive of group status between patients and healthy controls (De Boer, van Hoogdalem, et al., 2020). Pauses in speech reflect time to think and plan ahead, and may thus indicate processing speed and cognitive load (Yin and Chen, 2007; Khawaja et al., 2008). Pause duration has emerged as a robust feature correlating with clinical ratings of speech impairments in schizophrenia (Alpert et al., 1995; Cohen et al., 2014; Parola et al., 2020).

Speech is not a stationary signal, as acoustic features dynamically change across utterances. Several studies have investigated to what extent accommodation between speakers increases or decreases across a conversation. If interlocutors use speech accommodation to increase rapport, the degree of synchrony can be expected to increase as the conversation progresses and then reach a level of maintenance (Burgoon et al., 1995). On the other hand, as both participants engage in dialogue, cognitive load increases, which can lead to an overall decline in accommodation. Whereas an increase in accommodation was found for acoustic speech features in human-computer interactions (Suzuki and Katagiri, 2007), authors generally argue that speech accommodation fluctuates more locally (Levitan and Hirschberg, 2011) or that it can be both dynamically changing and increasing over time (De Looze et al., 2014). Thus, in addition to studying accommodation across the entire conversation, investigating the degree of alignment within a dialogue can give insight into the dynamics of accommodation and the development of rapport.

Therefore, the goal of this analysis was twofold. The first aim was to investigate whether individuals with schizophrenia spectrum disorder show different patterns of accommodation in loudness, pitch or speech timing compared to healthy controls, and whether this accommodation behaviour is modulated by the severity of negative symptoms. Secondly, the degree of accommodation was compared between several time points during the interview to assess trends in accommodation across the dialogue.

2.4 Feature Extraction and Accommodation Procedure

Acoustic features were extracted using the openSMILE software, an open-source feature extraction toolkit for audio data (Eyben et al.,2010). All features were extracted using the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS), available publicly within the openSMILE package (Eyben et al.,2015). This approach was chosen to promote a standardized and transparent method of feature extraction for clinical speech analysis. A conceptual description of the acoustic features is shown in Table 1.

Whereas accommodation is a thoroughly studied phenomenon, there is currently no consensus on how it should be modeled, and few automatic systems are available (e.g., De Looze et al., 2014). To approximate the dynamics of the dialogue between interviewer and participant, the interviews were studied on a turn-by-turn basis, following previous approaches studying accommodation (Levitan and Hirschberg, 2011; Babel and Bulatov, 2012; Reich et al., 2014). The audio data was split into smaller utterances based on the turn of each speaker (Fig. 1a) using the PRAAT software. For each of these turns, the speech features were calculated using openSMILE and averaged, resulting in one mean value per feature and turn, as shown in Fig. 1b.

Feature                  Description
Pitch                    F0 values are logarithmically scaled from Hz to semitone frequency to facilitate gender comparisons.
Pitch variability        The standard deviation of the pitch, normalized by the arithmetic mean.
Loudness                 Perceptual signal energy. Instead of the amplitude of a signal, a non-linear auditory spectrum is computed that closely represents the logarithmic human perception of loudness.
Syllable rate            The number of continuously voiced segments per second.
Average pause duration   The mean length of unvoiced segments where F0 = 0.

Table 1: Acoustic features and their conceptual description. Features are extracted for speech segments and summarized (averaged) for one speech turn using openSMILE. For details on how these features are computed see Eyben et al., 2015.
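The two pitch descriptions in Table 1 can be illustrated with a minimal sketch. It is not the openSMILE implementation: the reference frequency of 27.5 Hz, the function names and the example values are assumptions made here for illustration only.

```python
import math

REF_HZ = 27.5  # assumed reference frequency for the semitone scale


def hz_to_semitones(f0_hz, ref_hz=REF_HZ):
    """Convert an F0 value in Hz to semitones above ref_hz."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def coefficient_of_variation(values):
    """Population standard deviation divided by the arithmetic mean."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance) / mean


# Two hypothetical voices one octave apart differ by 12 semitones,
# regardless of where on the Hz scale they sit.
low_voice_hz, high_voice_hz = 110.0, 220.0
octave_gap = hz_to_semitones(high_voice_hz) - hz_to_semitones(low_voice_hz)
```

On a semitone scale, equal frequency ratios map to equal distances, which is why the scaling makes voices in different registers easier to compare than raw Hz values.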

To assess whether participants accommodate to their interlocutor, the mean scores of all turns within an interview were correlated using Spearman's rank correlation, r_s, leading to one correlation score per acoustic feature between interviewer and participant for each dialogue. This non-parametric approach was chosen over the more traditional Pearson correlation due to the non-normal distribution of the data.
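A per-interview accommodation score of this kind might be computed as in the following sketch. It assumes that each interviewer turn is paired with the participant turn that follows it and that unpaired trailing turns are dropped; the text does not specify the pairing, and the function name and pitch values are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr


def accommodation_score(interviewer_turns, participant_turns):
    """Spearman's r_s between paired turn-level means of one acoustic feature."""
    # Assumption: turn i of the interviewer is paired with turn i of the participant.
    n = min(len(interviewer_turns), len(participant_turns))
    x = np.asarray(interviewer_turns[:n], dtype=float)
    y = np.asarray(participant_turns[:n], dtype=float)
    rho, p_value = spearmanr(x, y)
    return float(rho), float(p_value)


# Made-up turn-level mean pitch values (semitones) for one interview.
interviewer_pitch = [24.1, 25.3, 23.8, 26.0, 24.9, 25.5]
participant_pitch = [30.2, 31.0, 29.9, 31.8, 30.6, 31.1]
rho, p = accommodation_score(interviewer_pitch, participant_pitch)
```

In this toy example the participant's pitch rises and falls in the same rank order as the interviewer's, so the score is 1 even though the two speakers differ in absolute pitch. A rank correlation captures co-variation, not similarity in level.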

Based on these correlation scores, group comparisons between patients and healthy controls were performed using two-sided Welch's t-tests. Prior to performing group comparisons, Spearman's correlation scores were normalized using Fisher's z transform as described in Myers and Sirois (2014). All data analysis was performed in Python using the SciPy and NumPy libraries (Virtanen et al., 2020; Van Der Walt et al., 2011). The analysis code is publicly available¹. To assess whether the groups accommodate in their speech, the degree of accommodation was tested against chance (r_s = 0). Prior to investigating vocal accommodation, the groups were compared on demographic variables (age, gender and years of education) to identify possible confounds.
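The testing steps described above could look roughly like this. The sketch uses the standard Fisher transform z = arctanh(r) as an assumption; the exact variant from Myers and Sirois (2014) is not reproduced here. The correlation scores are randomly generated placeholders, not study data.

```python
import numpy as np
from scipy import stats


def fisher_z(r):
    """Standard Fisher z transform; clipping avoids infinities at |r| = 1."""
    r = np.clip(np.asarray(r, dtype=float), -0.999999, 0.999999)
    return np.arctanh(r)


# Placeholder r_s values for one acoustic feature (synthetic, not study data).
rng = np.random.default_rng(0)
controls_rs = rng.uniform(-0.2, 0.5, size=91)
patients_rs = rng.uniform(-0.2, 0.5, size=80)

controls_z = fisher_z(controls_rs)
patients_z = fisher_z(patients_rs)

# Within-group test against chance (mean z = 0 corresponds to r_s = 0).
within = stats.ttest_1samp(controls_z, popmean=0.0)

# Between-group comparison; equal_var=False selects Welch's t-test.
between = stats.ttest_ind(controls_z, patients_z, equal_var=False)
```

Welch's variant does not assume equal variances in the two groups, which suits groups of unequal size such as the 91 controls and 80 patients here.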

Previous research has indicated gender differences in linguistic accommodation. Particularly in shadowing tasks, where a portion of speech is repeated by participants after listening to a model, it has been shown that female speakers accommodate more than male speakers, and more strongly towards male models (Namy et al., 2002). While these findings could not be replicated using a larger sample size (Pardo et al., 2017), a within-group analysis of the participants' self-reported gender (male vs. female) and interview pairing (same vs. different gender) was performed.

To assess the possible modulation by negative symptom severity, the patient group was split into a high severity group, with a score of four or higher on any of the negative symptoms listed in the PANSS (n = 22), and a low severity group with scores below four (n = 36), as a score of four is associated with a moderate interference in daily activity². On each of these negative symptoms, patients are rated on a scale from one to seven based on the severity of the present symptoms. A score of four is associated with a moderate level of psychopathology, whereas lower scores are rated as 'mild', 'minimal' or 'absent' in decreasing order (Kay et al., 1987).
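The grouping rule can be written down as a short sketch. The item keys N1 to N7, the function names and the two example patients are hypothetical and serve only to show the rule of "four or higher on any negative item".

```python
NEGATIVE_ITEMS = ("N1", "N2", "N3", "N4", "N5", "N6", "N7")


def severity_group(ratings, threshold=4):
    """Return 'high' if any negative item is rated at or above threshold, else 'low'."""
    for item in NEGATIVE_ITEMS:
        if not 1 <= ratings[item] <= 7:
            raise ValueError(f"{item} must be rated between 1 and 7")
    if any(ratings[item] >= threshold for item in NEGATIVE_ITEMS):
        return "high"
    return "low"


def negative_total(ratings):
    """Sum of the seven negative item ratings (possible range 7 to 49)."""
    return sum(ratings[item] for item in NEGATIVE_ITEMS)


# Two hypothetical patients.
patient_a = {"N1": 2, "N2": 1, "N3": 1, "N4": 3, "N5": 2, "N6": 1, "N7": 1}
patient_b = {"N1": 4, "N2": 1, "N3": 1, "N4": 1, "N5": 1, "N6": 1, "N7": 1}
```

Because the rule looks at single items, a patient with one moderate rating (patient_b, total 10) falls into the high severity group, while a patient with a higher total but no item at four or above (patient_a, total 11) does not.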

Following previous attempts in assessing accommodation across time (De Looze et al.,2014), correlation scores were compared between the first and second half of each interview, as well as between conversation thirds using two-sided paired t-tests.
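A possible way to set up this comparison is sketched below. It assumes that turns are split into consecutive, roughly equal segments by turn count and that the paired test is run on the raw r_s values; neither detail is stated in the text. The interviews are simulated.

```python
import numpy as np
from scipy import stats


def segment_scores(interviewer, participant, n_segments):
    """Spearman's r_s within each consecutive segment of paired turns."""
    x_parts = np.array_split(np.asarray(interviewer, dtype=float), n_segments)
    y_parts = np.array_split(np.asarray(participant, dtype=float), n_segments)
    scores = []
    for x, y in zip(x_parts, y_parts):
        rho, _ = stats.spearmanr(x, y)
        scores.append(float(rho))
    return scores


# Simulated turn-level pitch means for 30 interviews of 24 paired turns each.
rng = np.random.default_rng(1)
first_half, second_half = [], []
for _ in range(30):
    interviewer = rng.normal(25.0, 2.0, size=24)
    participant = 0.3 * interviewer + rng.normal(22.0, 2.0, size=24)
    h1, h2 = segment_scores(interviewer, participant, n_segments=2)
    first_half.append(h1)
    second_half.append(h2)

# Two-sided paired t-test: does accommodation differ between the halves?
halves_test = stats.ttest_rel(first_half, second_half)

# The same helper yields one score per conversation third.
thirds = segment_scores(interviewer, participant, n_segments=3)
```

Splitting by turn count is only one option; splitting by elapsed time would give different segment boundaries when turn lengths vary.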

2.5 Predicting Group Status

Whereas the previously described procedure tests the accommodation between groups in the different acoustic dimensions, an additional analysis was performed to investigate whether the combination of acoustic accommodation scores differs between the groups. To test this, similarly to Le (2018), a logistic regression model was employed using the Scikit-learn library (Pedregosa et al., 2011) to examine the relationship between speech accommodation and group status while considering possible confounding variables. The model used all acoustic accommodation features (r_s values of all interviews) and the years of education (YOE) of all participants as predictors for group status (healthy control vs. patient).

¹ https://github.com/mkuehn11/MBCS RP2 PRAAT

² Negative symptoms are: blunted affect, emotional withdrawal, poor rapport, passive/apathetic social withdrawal, difficulty in abstract thinking, lack of spontaneity and flow of conversation, stereotyped thinking (Kay, 1991, Chapter 4).

Figure 1: Process of feature extraction. a) Speaker turns - During the 1:30 minute clip the interviewer (upper panel) and participant (lower panel) take turns speaking. The turn boundaries were automatically annotated using PRAAT based on the interviewer channel (turns shown in red). Whenever silences were detected in the interviewer channel, the turn was marked as belonging to the participant (turns shown in blue). b) Pitch and loudness features - Showing the average pitch (teal) and loudness (orange) for each speaker across an excerpt of the clip, as an example. c) Syllable rate (= articulation rate) and pause durations - Instead of using the low-level openSMILE features, the summary output for each speaker turn was computed, resulting in mean pitch (teal), loudness (orange) (Fig. 1b), syllable rate (purple) and pause duration (pink) (Fig. 1c) features per turn across the interview. These scores were then correlated using Spearman correlation.

To train the model, the full set of accommodation scores was split into a training set (80%) and a test set (20%). To assess the performance of the logistic regression model, a stratified 5-fold cross-validation was employed. Prior to training the final model, the variance inflation factor was used to confirm the absence of multicollinearity among the predictors. The performance was evaluated using the area under the receiver operating characteristic curve (ROC-AUC), due to the slight imbalance between the groups (91 controls vs. 80 patients). The AUC scores were obtained for every fit of the model and averaged across folds, leading to a single, cross-validated AUC score. The final model was evaluated on the test set.
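The training procedure could be sketched as follows with scikit-learn. Several choices are assumptions made for illustration: the features are random placeholders (five accommodation scores plus years of education), cross-validation is run on the training portion only, predictors are standardized before fitting, and the variance inflation factor is computed with a small hand-written NumPy helper, not the author's code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data: 5 accommodation scores + years of education per interview.
rng = np.random.default_rng(42)
n_controls, n_patients = 91, 80
X = rng.normal(size=(n_controls + n_patients, 6))
y = np.array([0] * n_controls + [1] * n_patients)  # 0 = control, 1 = patient


def variance_inflation_factors(features):
    """VIF per column: 1 / (1 - R^2) from regressing it on all other columns."""
    vifs = []
    for j in range(features.shape[1]):
        target = features[:, j]
        others = np.delete(features, j, axis=1)
        design = np.column_stack([np.ones(len(others)), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        r_squared = 1.0 - residuals.var() / target.var()
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)


vifs = variance_inflation_factors(X)  # values near 1 indicate little collinearity

# Stratified 80/20 split keeps the group proportions in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_auc = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc")
mean_cv_auc = fold_auc.mean()

# Final model: fit on the full training set, evaluate once on the held-out test set.
model.fit(X_train, y_train)
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
```

With random placeholder features the AUC values hover around 0.5; the point of the sketch is the order of operations (collinearity check, stratified split, cross-validation, single final evaluation), not the numbers.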

3 Results

3.1 Demographics

The results of demographic comparisons are shown in Table 2. The patient and control groups did not significantly differ in age, gender or parental education, but significant differences were found in years of education (p < 0.01).

                                       Controls   Patients   Statistic    p
                                       (n = 91)   (n = 80)
Gender (% female)                      66%        61%        χ² = 0.40    0.52
Age, years (mean)                      37         33         T = 1.7      0.087
Years of education, parental (mean)    13         12         MWU          0.41
Years of education (mean)              15         13         MWU          < 0.01**

Table 2: Demographic variables and statistical evaluation. MWU = Mann-Whitney U test. ** = significant p-value.

3.2 Speech Accommodation

To assess whether patient and control groups accommodate in their speech, the Spearman correlation coefficients of the mean acoustic features were compared against baseline (r_s > 0). For all tests within and between groups, significance was set at α = 0.01 to account for the repeated tests for all five acoustic variables. Control participants were found to accommodate both in their loudness (T = -2.8, p < 0.01) and in their average pause duration (T = 5.58, p < 0.01), but not in their pitch (T = -0.8, p = 0.42), articulation rate (T = 1.3, p = 0.18) or pitch variability (T = 0.7, p = 0.44). In the patient group, participants accommodated in their pitch (T = -2.62, p = 0.01), their articulation rate (T = 3.7, p < 0.01) and average pause duration (T = 3.0, p < 0.01), but not in loudness (T = -1.9, p = 0.06) or pitch variability (T = -0.6, p = 0.52). These findings support previous studies showing that individuals with schizophrenia spectrum disorder exhibit accommodation in mean F0 with their conversational partners (Le, 2018) or an indication of such (Dombrowski et al., 2014). Yet, they contradict research establishing that healthy controls accommodate in their pitch (e.g., De Looze et al., 2011; Levitan et al., 2011).

Accommodation measure     T       p
Pitch                     1.6     0.12
Loudness                  -0.17   0.86
Articulation rate         -2.0    0.046
Average pause duration    1.1     0.26
Pitch variability         0.99    0.33

Table 3: Two-sided t-tests between the control and patient groups (controls vs. patients) for all speech accommodation scores (r_z).

After establishing a baseline in speech accommodation by testing correlation coefficients against zero, comparisons between groups were performed. Two-sided t-tests between the patient and control groups are shown in Table 3. Controls and patients did not significantly differ in acoustic accommodation parameters. The only difference that approached significance was in articulation rate (T = -2.0, p = 0.05), but it does not hold under the corrected significance threshold.

Based on previous findings linking speech atypicalities to negative symptoms (e.g., Parola et al., 2020), speech accommodation features were compared within the patient group based on the severity of their symptoms. The group was split based on their PANSS scores. The high symptom severity group consisted of individuals with a score of four or higher in any of the seven negative symptoms in the PANSS. The distribution of negative symptoms within the patient group is shown in Fig. 2. The majority of patients scored low on the negative symptoms of the PANSS (M = 12.3 out of a maximum of 49), with few individuals rated as five or higher and none rated the maximum of seven. Out of the 59 participants for whom a PANSS interview was available, 29 were reported as experiencing an active psychosis, with a mean negative score of 14.65, whereas 30 were in remission (M = 9.93). Testing the accommodation of acoustic features between the groups, patients with a high severity in negative symptoms (n = 22) did not significantly differ from patients with a low severity (n = 36) in any of the acoustic features; the results are shown in App. 6.1. These results suggest that if there is a modulation of accommodation, symptoms would have to be quite severe to affect the behavior.

Figure 2: Distribution of negative symptoms in the patient group. Showing a histogram for each of the seven negative symptoms (N1 - blunted affect, N2 - emotional withdrawal, N3 - poor rapport, N4 - passive/apathetic social withdrawal, N5 - difficulty in abstract thinking, N6 - lack of spontaneity and flow of conversation, N7 - stereotyped thinking) with the number of participants (count, y-axis) for each PANSS score (x-axis).

In addition to comparisons between healthy controls and schizophrenia patients, within group tests between male and female participants were performed. The results indicate that no significant differences were found between male and female controls, as well as between male and female participants with schizophrenia. Furthermore, speech accommodation features were compared within the groups between interviews with a different gender pairing of conversational partners (i.e., male - female) and the same gender (i.e., female - female). Results show no significant differences for any of the speech measures in either the healthy control or patient group. Group sizes and test results are reported in App. 6.1.

Lastly, to investigate whether accommodation would change across the dialogue, conversation halves and conversation thirds were compared within each interview. Fig. 3 shows an example of how the speech features change over the course of the interview. The figure shows the average speech features of the conversation halves and thirds for all conversation partners in each group. Whereas some evidence of divergence between interviewer and participant is visible for, e.g., pitch variability, the results show that there is barely a trend in accommodation between the speakers, both for healthy controls and for patients. These findings were confirmed by statistical analyses of the correlation coefficients, which are reported in App. 6.1. No significant differences were found between the first and the second half of the interview or between the conversation thirds, suggesting that changes in accommodation may happen more locally.

Figure 3: Average speech features across the interviews. Comparison of mean speech features across different time points during the interviews. The plot shows the group mean values of all participants in pitch and pitch variability in the first vs. second half (left panels) and between conversation thirds (right panels). Healthy control participants are shown in dashed blue, schizophrenia patients are shown with their conversation partner in red. The average values of the interviewers are marked with full circles in lighter shade. The full figure with all five speech features is shown in Appendix 6.2.

Figure 4: (a) Cross-validation performance across 5-fold cross-validation, showing the ROC curve (true positive rate against false positive rate) for every split and the mean ROC curve as well as the mean AUC across all splits. Per-fold AUC: 0.59, 0.55, 0.68, 0.54 and 0.64; mean AUC = 0.60 ± 0.05 (± 1 std. dev.). (b) The confusion matrix of the final model showing the real labels for the interview on the x-axis and the predicted labels on the y-axis, along with the number of observations that were classified in each category (cell counts in reading order: 17, 2, 8, 8).

3.3 Logistic Regression Model

Based on the discrepancy between previous research and the present study, a logistic regression model was employed to examine the relationship between speech accommodation and group status. The model's performance was assessed using the AUC score in a 5-fold stratified cross-validation. The performance across all folds is shown in Fig. 4a. The model yielded a cross-validated AUC score of 0.60, which is above chance (0.5). The confusion matrix in Fig. 4b shows the evaluation of the final classifier on a test set of interviews. The matrix shows the true and the predicted labels on the x- and y-axes and the number of instances that were classified in each quadrant. The main diagonal shows the correctly classified observations. While the model is very accurate in identifying healthy controls based on their speech accommodation scores, it identifies patients at chance level.
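The evaluation scheme described above can be sketched with scikit-learn (Pedregosa et al., 2011). The example below is an illustration under stated assumptions, not the study's pipeline: the data are simulated, the six predictors stand in for the five accommodation scores plus years of education, and the 80/20 train/test split, the standardization step and the random seeds are all hypothetical choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     train_test_split)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Simulated data: 91 controls (label 0) and 80 patients (label 1)
rng = np.random.default_rng(42)
y = np.concatenate([np.zeros(91, dtype=int), np.ones(80, dtype=int)])
X = rng.normal(size=(y.size, 6))
X[y == 1, 0] += 0.5  # hypothetical small group difference in one predictor

# Hold out a stratified test set for the final confusion matrix (assumed 20%)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold stratified cross-validation, scored with the area under the ROC curve
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_auc = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc")
print("AUC per fold:", np.round(fold_auc, 2))
print("mean AUC: %.2f +/- %.2f" % (fold_auc.mean(), fold_auc.std()))

# Final model evaluated on the held-out interviews
model.fit(X_train, y_train)
cm = confusion_matrix(y_test, model.predict(X_test))
print(cm)  # rows: true label, columns: predicted label
```

Note that `confusion_matrix` places the true labels on the rows, whereas Fig. 4b places them on the x-axis, so the two layouts are transposed relative to each other.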

This discrepancy was further investigated to assess whether there is an underlying pattern in the misclassification of patients as controls. Previous research has suggested that deficits in speech correlate with the severity of negative symptoms. Although this hypothesis could not be supported in the univariate analysis, the negative symptom scores of all patients in the evaluation set were compared. Due to the total sample size of 171 interviews, the split into a train and evaluation set and the incomplete PANSS scores at the time of analysis, the scores of only ten patients in the test set could be evaluated. Since this sample size was deemed too small for inferential analysis, the difference in scores can only be assessed visually. An overview of the negative symptom scores of patients in the test set is shown in Appendix 6.3. Visual inspection of the different negative symptoms shows that, surprisingly, incorrectly identified patients have higher scores in negative symptoms than correctly classified patients. Based on these results and the lack of inferential statistical methods, no definite claims can be made about whether patients with a low severity of negative symptoms are more often misclassified as healthy controls than patients with a high severity. The limited results suggest that this is not the case; rather, a higher symptom severity appears to coincide with misclassification. Further studies with larger sample sizes, more heterogeneous groups and better classification models are needed to study this effect.

4 Discussion

Disturbances in language have emerged as one of the key diagnostic features in schizophrenia. Although speech atypicalities have been described since the earliest characterizations of the disorder, only recently have studies aimed to quantify these differences using computational acoustic analyses (e.g., Cohen et al., 2016; De Boer, van Hoogdalem, et al., 2020). Results from these studies suggest much more moderate deficits in vocal expression than previous subjective ratings did (Cohen et al., 2016; Parola et al., 2020). Few studies have investigated language disturbances specifically within dyadic interactions to explore these deficits in social contexts. Studying the effect of these interactions on vocal production in patients with schizophrenia could help identify situations where the effect size of language deficits is maximized due to increased cognitive load. Furthermore, this line of work could shed light on the social cognitive impairments underlying the disorder.

In this study, speech accommodation of several acoustic features was compared between patients with schizophrenia and healthy controls on a turn-by-turn basis using semi-structured interviews. Results show that while healthy controls and schizophrenia patients do accommodate to their interlocutor in some acoustic dimensions, a significant difference was only found for accommodation in mean articulation rate. A within-group comparison of patients with a high degree of negative symptoms and patients with a low degree revealed no significant differences between the groups. A multivariate analysis using the five accommodation variables as well as the years of education as predictors for group status could reliably identify healthy controls, but struggled to correctly classify patients. A more detailed analysis of these instances was limited due to the small number of observations and revealed no clear trend that distinguishes correctly identified patients from incorrectly classified ones.

These findings add to a sparse body of research dedicated to this phenomenon. The present results fail to establish a robust difference between controls and patient groups in the majority of acoustic variables, which is in line with previous findings (Le, 2018). Surprisingly, this study found few significant effects for the baseline accommodation of either group. Specifically, mean pitch accommodation was not significant in the control group, contradicting previous results in healthy populations (e.g., De Looze et al., 2011; Levitan et al., 2011). This discrepancy could have several causes. First, interviews were conducted at the UMC hospital with a member of the research group, which is hardly a natural environment or a spontaneous conversation. Secondly, the interviewers themselves may have exhibited different accommodation behavior towards control participants and patients. This effect can be examined using cross-correlation techniques to determine leader-follower relationships between interviewers and participants (Reich et al., 2014; Coco and Dale, 2014) and to compare them between the groups. Lastly, conversations took place before the recorded interview, as the participants communicated with the researchers, signed the informed consent and were encouraged to ask questions about the task. In this time period, initial accommodation might have already occurred. In fact, similarly to divergence in conversation, maintenance is also a conversational strategy in which a speaker might retain their speaking style (Giles and Baker, 2008). This would result in no considerable changes across the interview.
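The leader-follower idea behind such cross-correlation techniques can be sketched as follows. This is a simplified illustration with simulated data, not an analysis performed in this study: the function `lagged_correlation`, the use of Pearson correlation and the two-turn delay are assumptions made for the example.

```python
import numpy as np


def lagged_correlation(leader, follower, max_lag):
    """Pearson correlation of leader[t] with follower[t + lag] for each lag.

    A peak at a positive lag means that `follower` tends to echo `leader`
    with that delay; a peak at a negative lag suggests the reverse roles.
    """
    leader = np.asarray(leader, dtype=float)
    follower = np.asarray(follower, dtype=float)
    if leader.shape != follower.shape:
        raise ValueError("both series need the same number of turns")
    n = len(leader)
    profile = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = leader[: n - lag], follower[lag:]
        else:
            a, b = leader[-lag:], follower[: n + lag]
        profile[lag] = float(np.corrcoef(a, b)[0, 1])
    return profile


# Simulated turns: the participant echoes the interviewer two turns later
rng = np.random.default_rng(1)
interviewer = rng.normal(size=60)
participant = np.roll(interviewer, 2) + 0.1 * rng.normal(size=60)

profile = lagged_correlation(interviewer, participant, max_lag=5)
best_lag = max(profile, key=profile.get)
print("lag with the highest correlation:", best_lag)
```

In this simulated case the correlation peaks at a lag of two turns, which would be read as the interviewer leading. Comparing such lag profiles between the control and patient interviews is one way to check whether interviewers behaved differently across groups.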

Although the univariate analysis of accommodation features did not reveal significant differences between the patient and control group (or none that remained significant after multiple testing correction), the logistic regression model could still distinguish the participants above chance, with a high accuracy in identifying controls. This suggests that there is a difference in accommodation patterns that distinguishes controls from patients. This difference could be due to the combination of different accommodation features, i.e., it is the combination of controls and patients accommodating in different dimensions that drives the distinction between the two. These findings could also be a result of different patterns of accommodation by the interviewers themselves, as mentioned before. Leader-follower analyses could investigate this question further.

Previous findings have shown that speech disturbances in schizophrenia correlate with negative symptoms (Parola et al., 2020). These results could not be supported with regard to speech accommodation in either the univariate or the multivariate analysis. This contrast between the literature and the current findings can have multiple reasons. First, the majority of patients showed low clinical ratings of negative symptoms, often scoring at the minimum. Furthermore, at the time of data analysis some of the PANSS scores were missing or the interview had not yet been administered. It is possible that an effect of symptom severity on accommodation would emerge in a more heterogeneous sample. Secondly, whereas the regression model classified above chance, it was operating on a limited set of observations, since only one accommodation score per acoustic feature was computed for each interview. This leads to a limited number of samples that could be evaluated in the test set and to rather low generalizability. Future attempts should utilize accommodation scores at different time points during the interview to generate more observations and to study at which time points during a conversation the classification accuracy is maximized. Additionally, further research should consider more than just negative symptoms, also taking general symptoms or cognitive impairments into account. Efforts in this area may provide further insight into different clinical characterizations of schizophrenia and may build up to cross-diagnostic approaches that distinguish between different disorders.

4.1 Limitations

The current study is one of the very few attempts at studying speech accommodation in schizophrenia using a standardized and transparent approach to obtaining the acoustic speech features. However, several limitations can be raised. First, whereas the overall sample size was rather large given the clinical background, the dataset was not created to study this research question and speech accommodation in particular. This impeded further analyses of negative symptoms, the inclusion of lexical or syntactic speech features and the analysis of the dynamics of different interviewers, since this information was either not collected or not completely collected beforehand. Additionally, the interview structure is meant to elicit spontaneous speech from the participant. This results in relatively little speaking time for the interviewer, with short, fast utterances and few pauses. This structure also limits the investigation of dynamic speech accommodation over time, since there are only a few turns within the interview. Lastly, the interviews were conducted by members of the research group. This means that multiple interviews, regardless of the participants, were recorded with the same interviewer and are thus not fully independent. An exploratory analysis was carried out using the interviewer as an additional predictor in the logistic regression model. The overall performance of the model was not influenced by the inclusion of this additional predictor, suggesting that this confound does not influence the distinction between the groups. However, since several interviews were missing this information, the regression could not be performed on the entire dataset, leading to limited conclusions about the influence of this factor. Future investigations should examine the task that is used to elicit speech in terms of achieving a balanced amount of speaking time and should employ a design in which the interviewer is blinded to the research question.

4.2 Future Directions

As communication and communication accommodation are not limited to language, a large body of research is dedicated to non-verbal synchrony of movement and gestures. It has been shown that individuals with schizophrenia spectrum disorder use fewer gestures than healthy controls (Lavelle et al., 2013), show impairments in perceiving and imitating gestures (Toomey et al., 2002; Bucci et al., 2008; Matthews et al., 2013) and show differences in timing synchronous motor movements (Wilquin et al., 2018). Similarly, linguistic accommodation can extend to all levels of language, not just acoustic features. A recent framework of interpersonal synergy even incorporates neuroimaging methods to observe inter-brain synchronous activity (Hasson et al., 2012; Koole and Tschacher, 2016). These examples show that dialogue and interaction cannot be understood solely by investigating language in itself. Rather, a multidimensional approach can include all these levels of interaction to draw conclusions about social behavior (Rasenberg et al., 2020). Whereas approaches exist to integrate multiple dimensions of accommodation (e.g., Solanki, 2017), another important concern is the methodology within the disciplines. This study has employed a traditional, straightforward way of assessing synchrony. However, these "aggregating" approaches, which assess accommodation across an entire time span, have been criticized for a lack of continuous temporal integration (Coco and Dale, 2014), and multiple other techniques have been proposed instead (see Delaherche et al., 2012 for a review). In particular, cross-recurrence quantification analysis is highlighted as an approach that uses the theory of dynamical systems to model the behaviour of agents in a conversation (Fusaroli, Konvalinka, et al., 2014). This approach has even been used to assess recurrence within speech patterns in schizophrenia patients and to distinguish them from healthy controls (Fusaroli et al., 2013). Furthermore, techniques such as temporal generalization matrices are used in neuroimaging research to study the stability of mental representations over time using multivariate classifiers (King and Dehaene, 2014; Fyshe, 2020). Such approaches could be adapted to study the degree of synchrony across an interview as well. The future of studying interpersonal behavior lies in multidimensional and interdisciplinary research, which is as challenging as it is exciting.
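As a rough illustration of the cross-recurrence idea mentioned above, the sketch below builds a cross-recurrence matrix for two continuous series and summarizes it along its diagonals. It is a deliberately simplified version under several assumptions: no time-delay embedding is applied, both series are z-scored, the radius of 0.2 is an arbitrary choice, and the sinusoidal "speech features" are simulated. It is not the procedure of Coco and Dale (2014) or Fusaroli et al. (2013).

```python
import numpy as np


def cross_recurrence_matrix(x, y, radius):
    """Binary matrix with entry (i, j) = 1 if z-scored x[i] and y[j] are close."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    return (np.abs(x[:, None] - y[None, :]) <= radius).astype(int)


def diagonal_recurrence_profile(crp, max_lag):
    """Recurrence rate along each diagonal, i.e. x[i] compared with y[i + lag]."""
    return {lag: float(np.diagonal(crp, offset=lag).mean())
            for lag in range(-max_lag, max_lag + 1)}


# Simulated feature trajectories: y repeats x three turns later
t = np.arange(80)
x = np.sin(0.3 * t)
y = np.sin(0.3 * (t - 3))

crp = cross_recurrence_matrix(x, y, radius=0.2)
profile = diagonal_recurrence_profile(crp, max_lag=5)
print("overall recurrence rate:", crp.mean())
print("lag with the highest recurrence:", max(profile, key=profile.get))
```

Unlike a single aggregate correlation, the diagonal profile shows at which delay the two speakers revisit similar states, which is the kind of temporal information the aggregating approach discards.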

5 Conclusion

This research project has shown that the development of quantitative computational markers of speech and language disturbances during social interactions is an emergent field with a promising outlook on future applications. The current study investigated computational features of vocal expression within the context of social interactions. Here, linguistic accommodation in five acoustic dimensions (pitch, pitch variability, loudness, syllable rate and pause duration) was compared between individuals with schizophrenia spectrum disorder and healthy controls. Analyses included univariate and multivariate investigations of accommodation across time and with regard to negative symptom severity. Although the groups did not differ significantly in accommodation, this study is a stepping stone in applying novel methods to study communication in social interaction. Further research into individual differences has the potential to establish a framework of interpersonal coordination using computational speech features. This has possible applications in giving insight into social competence and in aiding the development of a linguistic biomarker for schizophrenia.


References

Alpert, M., Pouget, E. R., & Silva, R. (1995). Cues to the assessment of affects and moods: Speech fluency and pausing. Psychopharmacology bulletin.

Alpert, M., Rosenberg, S. D., Pouget, E. R., & Shaw, R. J. (2000). Prosody and lexical accuracy in flat affect schizophrenia. Psychiatry research, 97(2-3), 107–118.

Andreasen, N. C., Flaum, M., & Arndt, S. (1992). The comprehensive assessment of symptoms and history (cash): An instrument for assessing diagnosis and psychopathology. Archives of general psychiatry, 49(8), 615–623.

Babel, M., & Bulatov, D. (2012). The role of fundamental frequency in phonetic accommodation. Language and speech, 55(2), 231–248.

Bach, D., Buxtorf, K., Grandjean, D., & Strik, W. (2009). The influence of emotion clarity on emotional prosody identification in paranoid schizophrenia. Psychological medicine, 39(6), 927–38.

Bedi, G., Carrillo, F., Cecchi, G. A., Slezak, D. F., Sigman, M., Mota, N. B., Ribeiro, S., Javitt, D. C., Copelli, M., & Corcoran, C. M. (2015). Automated analysis of free speech predicts psychosis onset in high-risk youths. npj Schizophrenia, 1(1), 1–7.

Bell, V. (2013). A community of one: Social cognition and auditory verbal hallucinations. PLoS Biol, 11(12), e1001723.

Bleuler, E. (1911). Dementia praecox oder gruppe der schizophrenien (Vol. 12). Deuticke.

Bleuler, E. (1950). Dementia praecox or the group of schizophrenias.

Boersma, P., & Weenink, D. (2020). Praat: Doing phonetics by computer version 6.1.16. Amsterdam: Institute of Phonetic Sciences.

Brüne, M. (2005). "Theory of mind" in schizophrenia: A review of the literature. Schizophrenia Bulletin, 31(1), 21–42.

Bucci, S., Startup, M., Wynn, P., Baker, A., & Lewin, T. J. (2008). Referential delusions of communication and interpretations of gestures. Psychiatry Research, 158(1), 27–34.

Burgoon, J. K., Stern, L. A., & Dillman, L. (1995). Reconceptualizing interaction adaptation patterns. Interpersonal adaptation: Dyadic interaction patterns (pp. 115–131). Cambridge University Press.

Byrne, D. (1971). The attraction paradigm.

Coco, M. I., & Dale, R. (2014). Cross-recurrence quantification analysis of categorical and continuous time series: An r package. Frontiers in psychology, 5, 510.

Cohen, A. S., Alpert, M., Nienow, T. M., Dinzeo, T. J., & Docherty, N. M. (2008). Computerized measurement of negative symptoms in schizophrenia. Journal of psychiatric research, 42(10), 827–836.

Cohen, A. S., Mitchell, K. R., Docherty, N. M., & Horan, W. P. (2016). Vocal expression in schizophrenia: Less than meets the ear. Journal of abnormal psychology, 125(2), 299.

Cohen, A. S., Mitchell, K. R., & Elvevåg, B. (2014). What do we really know about blunted vocal affect and alogia? a meta-analysis of objective assessments. Schizophrenia research, 159(2-3), 533–538.


Cohen, A. S., Morrison, S. C., Brown, L. A., & Minor, K. S. (2012). Towards a cognitive resource limitations model of diminished expression in schizotypy. Journal of abnormal psychology, 121(1), 109.

Cohen, A. S., Renshaw, T. L., Mitchell, K. R., & Kim, Y. (2015). A psychometric investigation of "macroscopic" speech measures for clinical and psychological science. Behavior Research Methods, 48(2), 475–486.

Corcoran, C. M., Carrillo, F., Fernandez-Slezak, D., Bedi, G., Klim, C., Javitt, D. C., Bearden, C. E., & Cecchi, G. A. (2018). Prediction of psychosis across protocols and risk cohorts using automated language analysis. World Psychiatry, 17(1), 67–75.

Coulston, R., Oviatt, S., & Darves, C. (2002). Amplitude convergence in children's conversational speech with animated personas. Seventh International Conference on Spoken Language Processing.

Covington, M. A., He, C., Brown, C., Naci, L., McClain, J. T., Fjordbak, B. S., Semple, J., & Brown, J. (2005). Schizophrenia and the structure of language: The linguist’s view. Schizophrenia research, 77(1), 85–98.

De Boer, J. N., Brederoo, S. G., Voppel, A. E., & Sommer, I. E. (2020). Anomalies in language as a biomarker for schizophrenia. Current opinion in psychiatry, 33(3), 212–218.

De Boer, J. N., van Hoogdalem, M., Mandl, R., Brummelman, J., Voppel, A., Begemann, M., van Dellen, E., Wijnen, F., & Sommer, I. (2020). Language in schizophrenia: Relation with diagnosis, symptomatology and white matter tracts. npj Schizophrenia, 6(1), 1–10.

De Looze, C., Oertel, C., Rauzy, S., & Campbell, N. (2011). Measuring dynamics of mimicry by means of prosodic cues in conversational speech. ICPhS 2011.

De Looze, C., Scherer, S., Vaughan, B., & Campbell, N. (2014). Investigating automatic measurements of prosodic accommodation and its dynamics in social interaction. Speech Communication, 58, 11–34.

Delaherche, E., Chetouani, M., Mahdhaoui, A., Saint-Georges, C., Viaux, S., & Cohen, D. (2012). Interpersonal synchrony: A survey of evaluation methods across disciplines. IEEE Transactions on Affective Computing, 3(3), 349–365.

DeLisi, L. E. (2001). Speech disorder in schizophrenia: Review of the literature and exploration of its relation to the uniquely human capacity for language. Schizophrenia bulletin, 27(3), 481–496.

Ditman, T., & Kuperberg, G. R. (2010). Building coherence: A framework for exploring the breakdown of links across clause boundaries in schizophrenia. Journal of neurolinguistics, 23(3), 254–269.

Dombrowski, M., McCleery, A., Gregory Jr, S. W., & Docherty, N. M. (2014). Stress reactivity of emotional and verbal speech content in schizophrenia. The Journal of nervous and mental disease, 202(8), 608–612.

Dwyer, K., David, A. S., McCarthy, R., McKenna, P., & Peters, E. (2020). Linguistic alignment and theory of mind impairments in schizophrenia patients’ dialogic interactions. Psychological medicine, 50(13), 2194–2202.


Elvevåg, B., Foltz, P. W., Rosenstein, M., & DeLisi, L. E. (2010). An automated method to analyze language use in patients with schizophrenia and their first-degree relatives. Journal of neurolinguistics, 23(3), 270–284.

Eyben, F., Scherer, K. R., Schuller, B. W., Sundberg, J., André, E., Busso, C., Devillers, L. Y., Epps, J., Laukka, P., Narayanan, S. S., et al. (2015). The geneva minimalistic acoustic parameter set (gemaps) for voice research and affective computing. IEEE transactions on affective computing, 7(2), 190–202.

Eyben, F., Wöllmer, M., & Schuller, B. (2010). Opensmile: The munich versatile and fast open-source audio feature extractor. Proceedings of the 18th ACM international conference on Multimedia, 1459–1462.

Fusaroli, R., Konvalinka, I., & Wallot, S. (2014). Analyzing social interactions: The promises and challenges of using cross recurrence quantification analysis. Translational recurrences (pp. 137– 155). Springer.

Fusaroli, R., Raczaszek-Leonardi, J., & Tylén, K. (2014). Dialog as interpersonal synergy. New Ideas in Psychology, 32, 147–157.

Fusaroli, R., Weed, E., Simonsen, A., & Bliksted, V. (2013). Non-linear dynamics of speech and voice in schizophrenia. Development, 15(1), 1–16.

Fyshe, A. (2020). Studying language in context using the temporal generalization method. Philosophical Transactions of the Royal Society B, 375(1791), 20180531.

Giles, H. (1979). Accommodation theory: Optimal levels of convergence. Language and social psychology, 45–65.

Giles, H. (2016). Communication accommodation theory. The international encyclopedia of communication theory and philosophy, 1–7.

Giles, H., & Baker, S. C. (2008). Communication accommodation theory. The international encyclopedia of communication.

Gregory Jr, S. W., & Webster, S. (1996). A nonverbal signal in voices of interview partners effectively predicts communication accommodation and social status perceptions. Journal of personality and social psychology, 70(6), 1231.

Hart, M., & Lewine, R. R. (2017). Rethinking thought disorder.

Hasson, U., Ghazanfar, A. A., Galantucci, B., Garrod, S., & Keysers, C. (2012). Brain-to-brain coupling: A mechanism for creating and sharing a social world. Trends in cognitive sciences, 16(2), 114– 121.

Hinzen, W., & Rosselló, J. (2015). The linguistics of schizophrenia: Thought disturbance as language pathology across positive symptoms. Frontiers in psychology, 6, 971.

Kay, S. R. (1991). Positive and negative syndromes in schizophrenia: Assessment and research (Vol. 5). Psychology Press.

Kay, S. R., Fiszbein, A., & Opler, L. A. (1987). The positive and negative syndrome scale (panss) for schizophrenia. Schizophrenia bulletin, 13(2), 261–276.

Khawaja, M. A., Ruiz, N., & Chen, F. (2008). Think before you talk: An empirical study of relationship between speech pauses and cognitive load. Proceedings of the 20th Australasian conference on computer-human interaction: Designing for habitus and habitat, 335–338.


King, J.-R., & Dehaene, S. (2014). Characterizing the dynamics of mental representations: The temporal generalization method. Trends in cognitive sciences, 18(4), 203–210.

Kohler, C. G., Walker, J. B., Martin, E. A., Healey, K. M., & Moberg, P. J. (2010). Facial emotion perception in schizophrenia: A meta-analytic review. Schizophrenia bulletin, 36(5), 1009–1019.

Koole, S. L., Atzil-Slonim, D., Butler, E., Dikker, S., Tschacher, W., & Wilderjans, T. (2020). In sync with your shrink. In J. P. Forgas, W. Crano, & K. Fiedler (Eds.), Applications of social psychology: How social psychology can contribute to the solution of real-world problems (pp. 161–184).

Koole, S. L., & Tschacher, W. (2016). Synchrony in psychotherapy: A review and an integrative framework for the therapeutic alliance. Frontiers in psychology, 7, 862.

Kraepelin, E. (1906). Über sprachstörungen im traume. Engelmann.

Kuperberg, G. R. (2010). Language in schizophrenia part 1: An introduction. Language and linguistics compass, 4(8), 576–589.

Lavelle, M., Healey, P. G., & McCabe, R. (2013). Is nonverbal communication disrupted in interactions involving patients with schizophrenia? Schizophrenia bulletin, 39(5), 1150–1158.

Le, T. P. (2018). Vocal expression in schizophrenia: Examining the role of vocal accommodation in clinical ratings of speech.

Levelt, W. J., & Kelter, S. (1982). Surface form and memory in question answering. Cognitive psychology, 14(1), 78–106.

Levitan, R., Gravano, A., & Hirschberg, J. B. (2011). Entrainment in speech preceding backchannels.

Levitan, R., & Hirschberg, J. (2011). Measuring acoustic-prosodic entrainment with respect to multiple levels and dimensions. Twelfth Annual Conference of the International Speech Communication Association.

Matthews, N., Gold, B. J., Sekuler, R., & Park, S. (2013). Gesture imitation in schizophrenia. Schizophrenia bulletin, 39(1), 94–101.

Mitchell, R. L., & Crow, T. J. (2005). Right hemisphere language functions and schizophrenia: The forgotten hemisphere? Brain, 128(5), 963–978.

Mota, N. B., Copelli, M., & Ribeiro, S. (2017). Thought disorder measured as random speech structure classifies negative symptoms and schizophrenia diagnosis 6 months in advance. npj Schizophrenia, 3(1), 1–10.

Myers, L., & Sirois, M. J. (2014). Spearman correlation coefficients, differences between. Wiley statsref: Statistics reference online. American Cancer Society.

Namy, L. L., Nygaard, L. C., & Sauerteig, D. (2002). Gender differences in vocal accommodation: The role of perception. Journal of Language and Social Psychology, 21(4), 422–432.

Pardo, J. S., Urmanche, A., Wilman, S., & Wiener, J. (2017). Phonetic convergence across multiple measures and model talkers. Attention, Perception, & Psychophysics, 79(2), 637–659.

Parola, A., Berardinelli, L., & Bosco, F. M. (2018). Cognitive abilities and theory of mind in explaining communicative-pragmatic disorders in patients with schizophrenia. Psychiatry research, 260, 144–151.

Parola, A., Simonsen, A., Bliksted, V., & Fusaroli, R. (2020). Voice patterns in schizophrenia: A systematic review and bayesian meta-analysis. Schizophrenia Research, 216, 24–40.


Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al. (2011). Scikit-learn: Machine learning in python. the Journal of machine Learning research, 12, 2825–2830.

Pickering, M. J., & Garrod, S. (2004). Toward a mechanistic psychology of dialogue. Behavioral and brain sciences, 27(2), 169–190.

Rasenberg, M., Özyürek, A., & Dingemanse, M. (2020). Alignment in multimodal interaction: An integrative framework. Cognitive Science, 44(11), e12911.

Reich, C. M., Berman, J. S., Dale, R., & Levitt, H. M. (2014). Vocal synchrony in psychotherapy. Journal of Social and Clinical Psychology, 33(5), 481–494.

Solanki, V. J. (2017). Brains in dialogue: Investigating accommodation in live conversational speech for both speech and eeg data (Doctoral dissertation). University of Glasgow.

Suzuki, N., & Katagiri, Y. (2007). Prosodic alignment in human–computer interaction. Connection Science, 19(2), 131–141.

Tahir, Y., Yang, Z., Chakraborty, D., Thalmann, N., Thalmann, D., Maniam, Y., binte Abdul Rashid, N. A., Tan, B.-L., Lee Chee Keong, J., & Dauwels, J. (2019). Non-verbal speech cues as objective measures for negative symptoms in patients with schizophrenia. PloS one, 14(4), e0214314.

Tajfel, H. (1982). Social psychology of intergroup relations. Annual review of psychology, 33(1), 1–39.

Toomey, R., Schuldberg, D., Corrigan, P., & Green, M. F. (2002). Nonverbal social perception and symptomatology in schizophrenia. Schizophrenia research, 53(1-2), 83–91.

Van Der Walt, S., Colbert, S. C., & Varoquaux, G. (2011). The numpy array: A structure for efficient numerical computation. Computing in science & engineering, 13(2), 22–30.

Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., et al. (2020). Scipy 1.0: Fundamental algorithms for scientific computing in python. Nature methods, 17(3), 261–272.

Weidman, S., Breen, M., & Haydon, K. C. (2016). Prosodic speech entrainment in romantic relationships. proceedings of Speech Prosody, 508–512.

Wilquin, H., Delevoye-Turrell, Y., Dione, M., & Giersch, A. (2018). Motor synchronization in patients with schizophrenia: Preserved time representation with abnormalities in predictive timing. Frontiers in human neuroscience, 12, 193.

Yin, B., & Chen, F. (2007). Towards automatic cognitive load measurement from speech analysis. International Conference on Human-Computer Interaction, 1011–1020.

Yin, B., Ruiz, N., Chen, F., & Khawaja, M. A. (2007). Automatic cognitive load detection from speech features. Proceedings of the 19th Australasian conference on computer-human interaction: Entertaining user interfaces, 249–255.


6 Appendix

6.1 Appendix 1

Patients: high severity (n = 22) vs. low severity (n = 36)

Accommodation measure      T        p
Mean F0                    0.77     0.45
Mean loudness              1.2      0.25
Syllable rate              −0.12    0.90
Average pause duration     −0.86    0.39
Pitch variability          1.4      0.17

Table 4: Two-sided t-test for schizophrenia patients with a high severity in negative symptoms (any PANSS negative symptom >= 4) and schizophrenia patients with a low severity (all PANSS negative symptoms < 4) for accommodation measures.
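The kind of comparison reported in Table 4 can be sketched with SciPy (Virtanen et al., 2020). The accommodation scores below are simulated with hypothetical means and spreads, and the equal-variance Student's t-test is an assumption, since the exact test variant is not stated here; only the group sizes are taken from the table.

```python
import numpy as np
from scipy.stats import ttest_ind

# Simulated accommodation scores (hypothetical values), one per interview
rng = np.random.default_rng(7)
high_severity = rng.normal(0.10, 0.30, 22)  # n = 22 as in Table 4
low_severity = rng.normal(0.05, 0.30, 36)   # n = 36 as in Table 4

# Independent-samples t-test; two-sided by default, equal variances assumed
t_stat, p_value = ttest_ind(high_severity, low_severity)
print("T = %.2f, p = %.2f" % (t_stat, p_value))
```

Passing `equal_var=False` would give Welch's variant, which may be preferable when the two groups differ in size and variance.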


Controls: male (n = 60) vs. female (n = 31)

Accommodation measure      T         p
Mean F0                    −1.6      0.11
Mean loudness              −0.85     0.40
Articulation rate          −0.087    0.93
Average pause duration     0.41      0.68
Pitch variability          0.49      0.62

Table 5: Two-sided t-test for male and female control participants for all accommodation measures.

Patients: male (n = 49) vs. female (n = 31)

Accommodation measure      T        p
Mean F0                    −0.21    0.83
Mean loudness              −0.31    0.76
Articulation rate          1.3      0.20
Average pause duration     −0.62    0.53
Pitch variability          −0.68    0.50


Controls: same pairing (n = 22) vs. different pairing (n = 29)

Accommodation measure      T        p
Mean F0                    1.2      0.23
Mean loudness              1.5      0.15
Articulation rate          −0.72    0.48
Average pause duration     −1.0     0.30
Pitch variability          0.71     0.48

Table 7: Two-sided t-test for gender pairings for all accommodation measures. Participants and interviewer were either of the same gender (self-reported: male/male or female/female) or of different genders (male/female or female/male).


Patients: same pairing (n = 33) vs. different pairing (n = 32)

Accommodation measure      T        p
Mean F0                    −0.30    0.76
Mean loudness              −0.76    0.45
Articulation rate          1.1      0.28
Average pause duration     −0.48    0.63
Pitch variability          0.16     0.87

Table 8: Two-sided t-test for male/female pairings for all accommodation measures. Participants and interviewer were either of the same gender (self-reported: male/male or female/female) or of different genders (male/female or female/male).


6.2 Appendix 2

Figure 5: Comparison of mean speech features across different time points during the interviews. The plot shows the mean values of all speech features in the first vs. second half (left panels) and between conversation thirds (right panels). Healthy control participants are shown in dashed blue, schizophrenia patients are shown with their conversation partner in red. The average values of the interviewers are marked with full circles in lighter shade.


6.3 Appendix 3

[Figure 6 plot: PANSS Scores of Negative Symptoms. Items: Blunted Affect, Emotional Withdrawal, Poor Rapport, Social Withdrawal, Difficulty Abstract Thinking, Flow of conversation, Stereotyped Thinking. Axis: PANSS Score (1 to 6). Series: Correctly Identified Patients, Incorrectly Identified Patients.]

Figure 6: Plot showing the individual and mean PANSS scores for the patients in the test set that were correctly identified as patients (blue/triangle) and the patients that were incorrectly classified as healthy controls (orange/diamond).
