
Disfluency in Dialogue: Analyzing the effect of group size on the level of disfluencies and speaker convergence in informal dialogues


Academic year: 2021


Layout: typeset by the author using LATEX.


Ghislaine L. van den Boogerd 10996087

Bachelor thesis Credits: 18 EC

Bachelor Kunstmatige Intelligentie

University of Amsterdam
Faculty of Science
Science Park 904
1098 XH Amsterdam

Supervisor
Dr. A.J. Sinclair

Institute for Logic, Language and Computation
Faculty of Science
University of Amsterdam
Science Park 907
1098 XG Amsterdam


Abstract

Informal dialogues are not always perfectly fluent. Speakers have been found to use disfluencies over the course of dialogues as a social signal to their listener that they have difficulty forming an utterance, as opposed to being simple transmission errors. Evidence of alignment between speakers has been shown to occur at many levels of communication, such as lexical, gesture, and gaze. It is therefore an open question whether alignment is also a factor in this social use of disfluencies. Previous work in this area typically reported on two-speaker dialogues, but whether the number of speakers in a dialogue can be considered a factor has not been extensively researched.

This thesis analyzes the effect of group size, convergence and priming on the level of disfluencies in informal dialogues taken from the British National Corpus. A generalized linear model is used to model the effects. The results show a significantly higher rate of disfluencies in dialogues with a small number of participants than in larger groups of speakers. The thesis finds no significant effect of priming on the level of disfluencies for this specific corpus. The experiments also show no significant change in the differences in disfluency rate between speakers over the course of a dialogue. Furthermore, no significant changes in the individual disfluency rate over time were found. The disfluency rate in the corpus is very low, and this thesis leaves room for future work to explore whether these results are reproducible in other settings.


Contents

1 Introduction
  1.1 Research questions
  1.2 Hypotheses
  1.3 Thesis structure
2 Literary Background
  2.1 Disfluencies
    2.1.1 Filled Pauses
    2.1.2 Discourse markers
    2.1.3 Silent Pauses
    2.1.4 Other disfluency categories
  2.2 Alignment
  2.3 Generalized Linear Model
3 Experimental Setup
  3.1 Data
  3.2 Annotation
    3.2.1 Descriptive statistics
  3.3 Calculating Alignment
  3.4 Modelling
  3.5 Baseline
  3.6 Experiments
    3.6.1 Group size
    3.6.2 Local effect
    3.6.3 Change over time
4 Results
  4.1 Group size
    4.1.1 Combined
    4.1.2 Categorized by disfluency type
  4.2 Local Effect
    4.2.1 Filled pauses
    4.2.2 Discourse markers
    4.2.3 Silent pauses
  4.3 Change over time
    4.3.1 Differences in disfluency rate
    4.3.2 Individual speaker rates
5 Analysis
  5.1 Group size
  5.2 Local effect
  5.3 Change over time
6 Conclusion
7 Discussion
A Annotation
B T-test results
C GLM result summaries
  C.1 Local effect
    C.1.1 Comparison to baseline: two speaker dialogues
    C.1.2 Comparison to baseline: three speaker dialogues
    C.1.3 Comparison to baseline: four speaker dialogues
    C.1.4 Comparison to baseline: five or more speaker dialogues
  C.2 Change over time: difference between speakers

1 Introduction

Disfluencies are interruptions of speech which yield no additional grammatical information (Tree, 1995). They are a characteristic feature of everyday spoken conversations. There are six main types of disfluencies: filled pauses, such as uh and um; discourse markers; silent pauses; prolongations; repairs; and repetitions (Bortfeld et al., 2001; Tree, 1995). Speakers can use specific kinds of these disfluencies to inform the listener (Tree, 1995). This would suggest that disfluencies are used like other conversational signals, such as gestures, gaze, or specific lexical repetitions (Branigan, Pickering, and Cleland, 2000; Finlayson and Corley, 2012; Jokinen et al., 2010; Kopp and Bergmann, 2013). Alignment, in other words a synchronisation between speakers in a dialogue, occurs at the level of these signals. Alignment is therefore expected at the level of disfluencies as well.

Research by Finlayson and Corley (2012) examined whether speakers would align on the level of disfluency. Although they reported alignment on a lexical level, there was no alignment found on a level of disfluencies. The main element of discussion of their research suggested that participants act differently in test conditions than they would outside of them. The focus of this thesis will be on analysing disfluency alignment in transcripts of spoken dialogues. By using natural language processing on transcripts of informal conversations, this research can try to give a more accurate representation of the alignment and level of disfluencies in free form dialogues.

This thesis will use the transcripts of the British National Corpus to analyze disfluencies. This corpus contains over 1250 different dialogues (Consortium, 2007). It is therefore worthwhile to research the level of disfluencies, because the scale is much larger than in previously reported research analysing disfluencies.

Besides analysing whether alignment can be found on the level of disfluencies, this thesis will also evaluate the influence of group size on the overall level of disfluency. The main goal of this research is to evaluate the use of disfluencies in a dialogue and analyse how the level of disfluency is affected by different factors.

1.1 Research questions

This thesis will try to answer the following research question:

To what extent does disfluency rate differ between speakers in dialogues with a varying number of participants, and is there evidence of convergence in its use at either a local or a global level?

The thesis is structured around three sub-questions that together address the research question above.

RQ1: How does the number of speakers in a dialogue affect the disfluency rate?

RQ2: To what degree is there evidence of a local repetition effect on disfluency rates?

RQ3: To what extent does the relationship between speakers and their disfluency rate change over the course of the dialogue?

1.2 Hypotheses

Alignment has been shown to occur on different levels in dialogues (e.g. lexical, gesture, gaze) (Branigan, Pickering, and Cleland, 2000; Finlayson and Corley, 2012; Jokinen et al., 2010; Kopp and Bergmann, 2013). It is proposed to be used as a subconscious manner of grounding, mimicking, or communication (Clark and Brennan, 1991; Garrod and Pickering, 2009). Discourse markers (e.g. “like”, “you know”) are a category of disfluency on a lexical level, so the same effect of alignment is expected for them. This leads to the expectation that alignment occurs at other verbal and non-verbal levels of disfluency as well.

In order to answer our research questions, we make the following hypotheses:

(RQ1) Group size: The overall disfluency rate will be higher in dialogues with a smaller group size compared to larger group sizes. Research reports that the rate of disfluency is higher in more difficult conversations (Bortfeld et al., 2001). It is expected that the level of difficulty is higher in dialogues with smaller group sizes. This leads to the expectation that smaller groups have a higher level of disfluencies.

(RQ2) Local effect: Repetitions of disfluencies show a local effect. If priming exists on the level of disfluencies, it would show up at a local level. Alignment is expected as an effect of priming, and it is therefore expected to show up at a local level.

(RQ3) Change over time: The relationship between speakers and their disfluency rates changes over the course of the dialogue. Convergence between speakers is expected. This would also affect their individual disfluency rates.

1.3 Thesis structure

This thesis will address the research question “to what extent does disfluency rate differ between speakers in dialogues with a varying number of participants, and is there evidence of convergence in its use at either a local or a global level?” in the following manner. Section 2 describes the necessary literature background. This section will discuss related work, alignment, a breakdown of different kinds of disfluencies, and propose a definition of disfluency that will be used throughout this thesis. Section 3 will describe the experimental set-up. This section supports the decisions made in annotating the data and choosing the disfluency categories to examine. Furthermore, this section explains the model and the experiments which will be conducted.

Section 4 shows the findings of the experiments. These findings are analysed in section 5 to find answers to the research questions in section 1.1.

Lastly, the research question is answered in a conclusion in section 6. The thesis is evaluated in section 7. This last section also provides ideas for future research on this topic.

2 Literary Background

2.1 Disfluencies

Human speech is rarely perfect: speakers must conceptualize, formulate, and articulate utterances in such a way that their thoughts and ideas are clear to another person. This contributes to imperfect speech which is not produced in an optimal and fluent manner. Speakers will use filled pauses (e.g. um and uh), silent pauses, repetitions, prolongations, and discourse markers throughout their speech. These disfluencies add no meaning to the utterance itself; however, they can be used as markers for the listener.

Fox Tree describes disfluencies as “phenomena that interrupt the flow of speech and do not add propositional content to an utterance” (Tree, 1995). This is also the definition which will be used throughout this thesis.

Clark (1996) suggests that disfluencies are a signal between interlocutors, used by the current speaker to notify the listener that they are experiencing difficulty producing speech. Such a signal allows speakers to account for their use of time.

Disfluencies occur throughout every demographic group. Research by Bortfeld et al. (2001) found that older speakers were slightly more disfluent than other speakers. Furthermore, the research discusses that disfluencies are more common in mentally challenging conversations (e.g. discussing abstract figures). This confirms that the use of disfluencies can be linked to an increase in planning difficulty.

In the following sections a more detailed description will be given for the different kinds of disfluencies.

2.1.1 Filled Pauses

Filled pauses are often seen as the stereotypical example of disfluencies. A filled pause is, together with discourse markers, categorized as a filler word.

A filled pause occurs in dialogue when the speaker breaks off speech while continuing to articulate, and this articulation is neither a word nor part of a word (Finlayson, 2014). In spoken communication this specific kind of disfluency usually indicates that the speaker needs a pause to collect their thoughts (Tree and Tomlinson Jr, 2007). It can also be used to prevent listeners from taking the turn (Maclay and Osgood, 1959).

Research done on the subject of filled pauses by Bortfeld et al. (2001) shows that this specific disfluency is often associated with complex tasks.

The literature on filled pauses often refers to um and uh as classic examples. Although um and uh are the most commonly used filled pauses in the spoken English language (Strassel, 2004), there are many other variations (Tree, 1995). Research done by Finlayson (2014) proposes eight additional filler sounds which can be categorized as a filled pause, namely eh, ehm, er, erm, hmm, huh, mm, and nah.

2.1.2 Discourse markers

Discourse markers are a type of filler word, just like filled pauses. However, unlike filled pauses, discourse markers are short phrases as opposed to nasal sounds. Discourse markers do not contain any grammatical information but occur frequently in speech (Laserna, Seih, and Pennebaker, 2014; Tree and Schrock, 2002). While they do not provide any relevant grammatical information, research shows that discourse markers are purposefully used (Tree and Tomlinson Jr, 2007). Generally, discourse markers are used to connect different sections in dialogue (Clark, 1996). However, there are different types of discourse markers, each used in specific situations. For example, the discourse marker “I mean” is often used as an indicator that a speaker plans to correct their utterance. Another example is the discourse marker “you know”, which is used either to check whether the listener is still following the conversation (Erman, 2001) or to ask the listener to conclude the thought in their own head (Tree and Tomlinson Jr, 2007).

“Like” is different than most discourse markers. While the two examples mentioned above convey a specific meaning in the dialogue, the use of “like” is ambiguous. Research done by Sharifian and Malcolm (2003) proposes the idea that “like” is used when a speaker is unsure what to say. More recent studies suggested that “like” appears to have a unique function and pattern in dialogues (Laserna, Seih, and Pennebaker, 2014; Liu and Tree, 2012).

2.1.3 Silent Pauses

Silent pauses are breaks in continuous speech, where the speaker ceases to produce any sounds. Although silent pauses often come paired with other disfluencies, they are viewed as their own kind of disfluency.

The literature makes a distinction between planning-based pauses and timing-based pauses (Schiller, Ferreira, and Alario, 2007). Where planning-based pauses are associated with the lexical item that follows the pause, timing-based pauses are associated with the preceding lexical item. The latter are usually not associated with difficulty in speech or with the subject of the dialogue. Timing-based pauses often follow the natural flow of speech and are viewed as a logical contribution. They mostly occur at the end of a phrase or at a natural break point. Moreover, timing-based pauses do not disturb or change the flow of continuous speech and therefore do not meet the given definition of a disfluency.

Planning-based pauses, however, do interrupt the natural flow of speech and are therefore categorized as a disfluency following the given definition. These pauses can be found anywhere in the speech where the speaker encounters difficulty. Silent pauses are often explained as a way to process information (Siegman, 1978).

2.1.4 Other disfluency categories

Besides the disfluencies mentioned above (discourse markers, filled pauses, and silent pauses), the literature also distinguishes repetitions, repairs, and prolongations as different kinds of disfluencies. This thesis will not focus on these categories (see section 3.2). However, to give a complete overview and to give the reader a feel for the different kinds of disfluencies that exist, they will be briefly discussed.

Repetitions are disfluencies where the speaker repeats, but does not correct, a sentence or a part of the sentence. They often occur when a speaker encounters difficulty in speech. The speaker uses the repetition to collect their thoughts without further delaying their speech (Finlayson and Corley, 2012). Research by Clark and Wasow (1998) shows that repetitions are often produced with function words. Per one thousand mentions, function words were more frequently repeated than content words.

Repairs occur in speech when the speaker notices an error or ambiguity in the preceding utterance. A repair commonly consists of three parts: the original utterance, an editing phrase, and the repair utterance (Levelt and Cutler, 1983). The original utterance contains the reparandum, in other words the part of the utterance which contains the error or ambiguity. The editing phrase allows the speaker time to come up with the right reformulation of the original utterance. The editing phrase often comes paired with either a silent or a filled pause. Lastly, the repair utterance consists of part of the original phrase in which the error or ambiguity is fixed.

Prolongations are speech segments which are stretched longer than expected by normal standards. They are hard to detect because everyone has a naturally different flow of speech. The determination of speech segments as a prolongation is for that reason highly subjective (Finlayson, 2014).

2.2 Alignment

Alignment can occur at many different levels. Research shows alignment occurs on the level of gestures during dialogues (Kopp and Bergmann, 2013), on a lexical level (Branigan, Pickering, and Cleland, 2000; Finlayson and Corley, 2012), and even on a level of gaze (Jokinen et al., 2010). The question arises why alignment occurs during conversational speech.

Research by Clark and Brennan (1991) suggests that one reason for alignment is grounding. Grounding is essential in communication: it describes the process of ensuring that all participants in the dialogue understand the message in the way it was originally sent by the speaker. The research suggests that, to ensure the message is understood well enough by all parties, speakers repeat specific primed short phrases to check the message that their interlocutor tried to convey.

Another possible reason why alignment can be found is given in the research by Garrod and Pickering (2009). This research proposes the idea that the goal for interlocutors is to have the same mental representation. They argue that with a highly aligned mental representation interlocutors have a higher level of communication and understanding. They suggest that this can be achieved by both linguistic decisions and non-linguistic processes. In other words, choice of words, gestures, and even rate of speech all influence the alignment of interlocutors’ mental representations. This suggests that alignment on both the lexical and non-lexical level contributes to better communication in dialogues.

Alignment can be seen as a subconscious manner of grounding, mimicking, or communication. In the field of disfluencies, research suggests that disfluencies are used as signals from speaker to listener (Clark and Tree, 2002; Smith and Clark, 1993). Therefore, it is of interest to see if alignment occurs on these signals.

2.3 Generalized Linear Model

The generalized linear model (GLM) is an extension of normal linear regression. It was developed as a way to unify various statistical models (Nelder and Wedderburn, 1972). While ordinary linear regression assumes homogeneous variance and a normally distributed error, generalized linear models give the opportunity to choose assumptions which better match the actual distributions of the data, instead of forcing the data to better match the model.

A GLM consists of three components:

1. An exponential family of probability distributions
2. A linear predictor
3. A link function

In a linear regression model there are n observations of a dependent variable $Y_i$, which are explained by p explanatory variables $x_{ij}$. In a GLM the vector of dependent variables $Y$ can be modelled with an exponential family of probability distributions (e.g. binomial, gamma, or Poisson) instead of only a normal distribution. This is also the main advantage of using a GLM, because real-world data is not always normally distributed.

The linear predictor, also called the score function, is denoted by $\eta$ and is expressed as a linear combination of the unknown regression parameters $\beta$ and the independent variables $X$. This is shown in equation (1), where $X$ is a matrix and $\beta$ a vector of regression parameters. The linear predictor itself can be either linear or logistic.

$\eta = X\beta$ (1)

The last component of a GLM is the link function. This function $g$ links the mean $\mu_i$ of the distribution to the linear predictor. This relation is expressed in equation (2).

$E(Y \mid X) = \mu = g^{-1}(\eta)$ (2)

The choice of link function depends on the data. For binary and proportional data, a binomial distribution is used. For count data it is most common to use a Poisson distribution. This thesis uses a binomial distribution because the data for this research is proportional. More details on this selection can be found in section 3.4.
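As a minimal sketch of how the three components fit together, the snippet below evaluates the linear predictor of equation (1) row by row and applies an inverse link function as in equation (2). It assumes the logit link, which is the canonical link for the binomial family; the text above does not state which link was used. The design matrix and the values of beta are made up for illustration and are not estimates from this thesis.

```python
import math

def linear_predictor(x_row, beta):
    """One row of eta = X beta: the sum over j of x_ij * beta_j."""
    return sum(x * b for x, b in zip(x_row, beta))

def inverse_logit(eta):
    """Inverse of the logit link g(mu) = log(mu / (1 - mu))."""
    return 1.0 / (1.0 + math.exp(-eta))

def logit(mu):
    """The logit link itself, mapping a proportion to the real line."""
    return math.log(mu / (1.0 - mu))

# Hypothetical design matrix: an intercept column plus one predictor.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
beta = [-2.0, 0.5]  # illustrative parameter values only

eta = [linear_predictor(row, beta) for row in X]  # linear predictor
mu = [inverse_logit(e) for e in eta]              # expected proportions
```

Because the inverse logit maps any real number into the interval (0, 1), the fitted means can always be read as proportions, which is why this link suits proportional data.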

3 Experimental Setup

3.1 Data

The corpus used for this research is the Spoken British National Corpus 2014 (BNC) (Consortium, 2007)¹. The corpus consists of spoken British English, gathered in informal contexts using public participation in scientific research (PPSR). The BNC contains 1251 dialogue transcripts of normal day-to-day conversations. The BNC was selected for its size and modernity. Other informal corpora, such as the Switchboard corpus (Godfrey, Holliman, and McDaniel, 1992) and the AMI meeting corpus, were considered but rejected. The Switchboard corpus was recorded over the telephone, which removes a level of naturalness from the speech, and it is limited to two-speaker conversations. Although the AMI meeting corpus does contain multi-participant dialogues, its recordings are restricted to dialogues in a work setting, which constrains the speakers through their work hierarchy.

The BNC mostly contains dialogues between native English speakers, however there are 56 dialogues where one or more speakers do not have English as their first language. These dialogues are discarded to ensure that the disfluencies that occur in the dialogue are not influenced by an insufficient grasp on the English language.

After discarding the dialogues with non-native speakers, we are left with 1195 dialogues. The group size of the dialogues varies between 2 and 12 speakers. A summary of the distribution of the dialogues by number of speakers can be found in table 1.

Number of speakers   Dialogue count   Average dialogue length (in utterances)
2                    606              736.18
3                    316              1079.80
4                    185              1426.07
5 or more            88               1521.95

Table 1: Number of dialogues and the average dialogue length per number of speakers

All the transcripts in the BNC are annotated by hand. Transcribers followed a corpus design proposed by Atkins, Clear, and Ostler (1992). This research states that classifying non-functional sounds (such as disfluencies) relies on inference by the transcriber. The research therefore suggests that a large set of orthographic representations is created. In other words, the transcriber transcribes only on the basis of a set of non-lexical sounds created beforehand and does not interpret the sound itself. This method ensures a certain level of consistency throughout the dialogue.

The corpus also annotated other non-lexical events, including but not limited to: laughter; whistling; coughing; sneezing; and miscellaneous background noise. All of these sounds are annotated with the use of specific tags. The BNC also included pauses in dialogue with use of pause-tags. Pauses are categorized as either short or long. This difference is made clear in the tag itself. All tags are cleaned to process the data (see section 3.2).

¹ Spoken British National Corpus 2014, XML edition

3.2 Annotation

The data is cleaned with regular expressions that replace tags with tokens which can be parsed (a complete list of replacement tokens can be found in appendix A). Sometimes there is a brief contribution to the dialogue by someone who is not identified as a speaker in the corpus, for example a member of staff in a restaurant or a random bystander. These utterances do not contribute to the main dialogue and are removed. Removing such interjections can leave two utterances by the same speaker adjacent to each other. Since this does not match the turn-taking format of the corpus, two subsequent utterances by the same speaker are merged. Throughout this thesis, an utterance refers to such a merged turn.
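The cleaning steps can be sketched as follows. This is an illustration under stated assumptions, not the code used for this thesis: the tag format `<pause dur="..."/>`, the token name `long_pause`, the speaker label `UNKNOWN`, and the example utterances are all hypothetical, and the actual tags and replacement tokens are the ones listed in appendix A.

```python
import re

# Assumed tag format; the real tags and tokens are listed in appendix A.
PAUSE_TAG = re.compile(r'<pause dur="(short|long)"\s*/>')

def replace_pause_tags(text):
    """Replace pause tags by parseable tokens such as short_pause."""
    replaced = PAUSE_TAG.sub(lambda m: f" {m.group(1)}_pause ", text)
    return " ".join(replaced.split())  # normalise whitespace

def clean_dialogue(utterances, known_speakers):
    """Drop unidentified speakers, replace tags, merge adjacent same-speaker turns."""
    cleaned = []
    for speaker, text in utterances:
        if speaker not in known_speakers:
            continue  # e.g. a bystander who is not part of the main dialogue
        text = replace_pause_tags(text)
        if cleaned and cleaned[-1][0] == speaker:
            # The removal above made two turns of one speaker adjacent: merge them.
            cleaned[-1] = (speaker, cleaned[-1][1] + " " + text)
        else:
            cleaned.append((speaker, text))
    return cleaned

# Made-up input for illustration.
raw = [
    ("A", "all those erm songs"),
    ("UNKNOWN", "anything to drink?"),
    ("A", '<pause dur="short"/> you\'re distracting me'),
    ("B", "sorry"),
]
cleaned = clean_dialogue(raw, known_speakers={"A", "B"})
```

In this sketch the merge is done in the same pass as the removal, so an interjection between two turns of the same speaker never leaves that speaker with two consecutive utterances.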

After cleaning the corpus data, each token is categorized. A total of 10,029,363 tokens from the corpus are analyzed. Disfluent tokens are counted, together with the total number of tokens in the utterance. The results are stored in a Pandas dataframe.

The disfluencies found in the corpus are split into three different categories: filled pauses, silent pauses, and discourse markers. Exploring other categories of disfluencies, such as repetitions, prolongations, and hesitations, is outside the scope of this thesis. To include these kinds of disfluencies, more extensive and expert annotation would be required. Furthermore, the disfluency categories which are analyzed provide a contrasting set of disfluencies which pertain to the research question without introducing additional factors. For example, repairs could be indicative of something more complex, which is not the focus of this research.

3.2.1 Descriptive statistics

Table 2 provides an overview of the three different disfluency categories and how often these are found in the corpus.

              Filled pauses   Silent pauses   Discourse markers
Token count   123,552         200,167         145,115
Ratio         0.01231902764   0.01995809704   0.01446901463

Table 2: Total token count and ratio of the disfluency categories in the corpus

As shown in table 2, silent pauses are the most common disfluency in the dialogue, followed by discourse markers, and lastly filled pauses.
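As a worked example of how the ratios in table 2 relate to the counts, each ratio is the category count divided by the total number of analysed tokens given in section 3.2. The short sketch below reproduces this arithmetic; the variable names are illustrative.

```python
TOTAL_TOKENS = 10_029_363  # total number of analysed tokens (section 3.2)

# Category counts as reported in table 2.
counts = {
    "filled pauses": 123_552,
    "silent pauses": 200_167,
    "discourse markers": 145_115,
}

# Ratio of each disfluency category over all tokens in the corpus.
ratios = {category: count / TOTAL_TOKENS for category, count in counts.items()}

# The category with the highest ratio is the most common disfluency.
most_common = max(ratios, key=ratios.get)
```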

A more detailed breakdown of the different categories and their corresponding tokens is shown in tables 3, 4, and 5. These tables also show the total count of occurrences and the ratio of each token in the corpus.

              Eh         Er        Erm       Hmm         Mm        Uh          Um
Token count   1,976      47,813    52,644    341         10,190    734         9,854
Ratio         0.000197   0.00477   0.00525   0.0000340   0.00102   0.0000732   0.000983

Table 3: Filled pauses: Token count and ratio in the BNC

The category filled pauses can be broken down into many different kinds of non-verbal disfluency, as shown in table 3. This selection is based on the annotation information of the BNC and on the research by Finlayson (2014).


The tokens categorized as a filled pause were always counted in the corpus, except for hmm and mm. These were only counted as a filled pause when they were directly preceded or followed by a pause tag. Because hmm and mm are also used in the BNC transcripts to transcribe vocal sounds of agreement, this restriction separates the two uses.
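The counting rule for hmm and mm can be sketched as below. The sketch assumes lower-cased tokens and pause tokens named `short_pause` and `long_pause`; only `short_pause` appears in the text of this thesis, so `long_pause` is a hypothetical name, and the function is an illustration rather than the code that was used.

```python
# Filled pauses from table 3 that are always counted.
ALWAYS_FILLED = {"eh", "er", "erm", "uh", "um"}
# Tokens that can also signal agreement, counted only next to a pause.
CONDITIONAL_FILLED = {"hmm", "mm"}
# Assumed names of the pause tokens after cleaning.
PAUSE_TOKENS = {"short_pause", "long_pause"}

def count_filled_pauses(tokens):
    """Count filled pauses in one utterance, given as a list of tokens."""
    count = 0
    for i, token in enumerate(tokens):
        if token in ALWAYS_FILLED:
            count += 1
        elif token in CONDITIONAL_FILLED:
            pause_before = i > 0 and tokens[i - 1] in PAUSE_TOKENS
            pause_after = i + 1 < len(tokens) and tokens[i + 1] in PAUSE_TOKENS
            if pause_before or pause_after:
                count += 1
    return count
```

Under this rule an utterance that consists only of mm is treated as agreement and contributes no filled pause, whereas mm directly next to a pause token is counted.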

              Short           Long
Token count   195,762         4,005
Ratio         0.01951888669   0.0003993274548

Table 4: Silent pauses: Token count and ratio in the BNC

              Like
Token count   145,115
Ratio         0.01446901463

Table 5: Discourse markers: Token count and ratio in the BNC

Silent pauses are split into two different tokens, short or long pauses. The separation between short and long pauses is made based on the original tags in the BNC.

In this thesis, “like” is used as an example of a discourse marker. It is chosen for its simplicity as a single-word token, unlike “you know” or “I mean”. While it may still be confused with its use as a comparative, in a dialogue setting it is less likely to be used as a comparative and more likely to be used as a discourse marker (Sharifian and Malcolm, 2003).

3.3 Calculating Alignment

This thesis uses the alignment measure introduced by Reitter, Moore, and Keller (2010). This method models the probability of repetition between speakers as a function of the distance between the prime (the base utterance from which the comparison starts) and the target (the repetition itself). They model the sampling probability that a prime is present in the nth utterance before the target occurs (3).

$p(\mathit{prime} \mid \mathit{target}, n)$ (3)

If there is no priming, this probability does not depend on the distance n, as described below (4).

$p(\mathit{prime} \mid \mathit{target}, n) = p(\mathit{prime} \mid \mathit{target})$ (4)

Each repetition of a disfluency token at distance d counts as priming. If a disfluency occurs without having been primed d utterances earlier in the dialogue, it is counted as non-priming.

A   erm what resolution you want
B   I downloaded all those videos
A   mm
B   all those erm songs short_pause you’re distracting me

Table 6: Prime-target example

An example snippet of a two-person dialogue from the BNC is shown in Table 6. The token erm, which occurs in utterances by both speakers, is an example of cross-speaker disfluency use. We set our prime utterance to be the first utterance in this example. The utterance contains a disfluency: erm. We then select our target, in this case the second utterance of the example, and check for a repetition of the disfluency. No repetition is found in the target. Our total number of comparisons is one, and the total number of repetitions is zero. The target then shifts to the third utterance; we make no comparison because the interest of this thesis is in between-speaker repetition. The target moves to the last utterance in the example, where a repetition of the prime is found. Two comparisons are made in total, with only one repetition of the prime.

The distance d is measured in utterances. Within-utterance priming is not counted, as this research focuses on cross-speaker disfluency repetition. A window of d = 25 utterances is selected. The choice of this window is based on its use by Reitter, Moore, and Keller (2010). Priming usually also decays quickly, so a window of 25 should be large enough to analyse the effect (Branigan, Pickering, and Cleland, 1999).

If there is no priming in the corpus, it would be expected to see an equal amount of cases of repetition, no matter the distance between prime and target. This thesis attempts to reject this null hypothesis to show an effect of priming on the level of disfluencies.
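The counting procedure of this section can be sketched as follows, using the four utterances of the prime-target example. This is one possible reading of the procedure, not the code used for this thesis: it assumes that each distinct disfluency token in a prime yields one comparison per cross-speaker target, and the set of disfluency tokens passed in is an illustrative subset.

```python
from collections import defaultdict

def count_repetitions(utterances, disfluencies, window=25):
    """Count cross-speaker comparisons and repetitions per utterance distance d."""
    comparisons = defaultdict(int)
    repetitions = defaultdict(int)
    for i, (prime_speaker, prime_tokens) in enumerate(utterances):
        prime_types = set(prime_tokens) & disfluencies
        for d in range(1, window + 1):
            j = i + d
            if j >= len(utterances):
                break
            target_speaker, target_tokens = utterances[j]
            if target_speaker == prime_speaker:
                continue  # only between-speaker repetition is of interest
            target_types = set(target_tokens)
            for token in prime_types:
                comparisons[d] += 1
                if token in target_types:
                    repetitions[d] += 1
    return dict(repetitions), dict(comparisons)

dialogue = [
    ("A", "erm what resolution you want".split()),
    ("B", "i downloaded all those videos".split()),
    ("A", ["mm"]),
    ("B", "all those erm songs short_pause you're distracting me".split()),
]
repetitions, comparisons = count_repetitions(dialogue, {"erm", "short_pause"})
```

For this snippet the sketch makes two comparisons, at distances 1 and 3, and finds one repetition at distance 3, in line with the walk-through above. Dividing repetitions by comparisons per distance gives the repetition rate whose dependence on d is tested.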

3.4 Modelling

We use a GLM with a binomial distribution to model the data. As explained in section 2.3, the choice of distribution and link function for the GLM depends on the data. For this thesis a binomial GLM is used because the data is proportional. This is a typical choice of exponential family distribution for data of this form (Zuur et al., 2009).

The GLM implementation used for this research comes from the Python module statsmodels. statsmodels builds on the SciPy ecosystem and provides functions for different statistical models and tests. Its results are tested against existing statistical packages to ensure correctness. The version of statsmodels used in this thesis is 0.11.1, running on Python 3.7.7.

3.5 Baseline

In order to confirm that any distance effects are not due to corpus effects, all experiments are performed on the original corpus and a scrambled version.

The corpus is scrambled on the level of individual dialogue with the order of speaker turn-taking maintained. In other words, utterances are randomly shuffled within the same speaker, but never swapped with utterances from other speakers. The order of speaker turn-taking should matter if there are local repetition effects, which can be seen as possible evidence of priming. The baseline shows that any local effects are removed, resulting in the global disfluency repetition rate.

The baseline is used to ensure that the findings of the experiments are based on the researched variables and are not random artefacts of the corpus.
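A minimal sketch of this scrambling, assuming a dialogue is represented as a list of (speaker, utterance) pairs, is given below. The function name and the fixed seed are illustrative choices and not part of the original set-up.

```python
import random

def scramble_within_speaker(utterances, seed=0):
    """Shuffle each speaker's utterances while keeping the order of turns."""
    rng = random.Random(seed)
    by_speaker = {}
    for speaker, text in utterances:
        by_speaker.setdefault(speaker, []).append(text)
    for texts in by_speaker.values():
        rng.shuffle(texts)  # shuffle only within the same speaker
    remaining = {speaker: iter(texts) for speaker, texts in by_speaker.items()}
    # Walk the original turn order and hand out each speaker's shuffled utterances.
    return [(speaker, next(remaining[speaker])) for speaker, _ in utterances]

dialogue = [("A", "a1"), ("B", "b1"), ("A", "a2"), ("B", "b2"), ("A", "a3")]
scrambled = scramble_within_speaker(dialogue, seed=1)
```

The sequence of speakers is unchanged and every speaker keeps exactly their own utterances, so each speaker's overall disfluency rate is preserved while any local ordering effect is destroyed.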

3.6 Experiments

This thesis will conduct three different experiments to answer the sub-questions. Total combined disfluency refers to a grouping of all three disfluent categories.

3.6.1 Group size

The first question focuses on how disfluencies are influenced by group size. The dialogues are separated into four classes based on the number of speakers in the dialogue: two, three, four, and five or more.

To test the influence of group size on the total combined disfluency rate, this thesis calculates the individual combined disfluency rate, categorized by the group size of the dialogue. Because the sample sizes differ between group sizes, a two-tailed Welch's t-test is conducted to test for significant differences (Ruxton, 2006).

Furthermore, this thesis will calculate the individual disfluency rates for every disfluent category, categorized by group size.
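As an illustration of the test statistic, the sketch below computes Welch's t, the Welch-Satterthwaite degrees of freedom, and Cohen's d for two samples in plain Python. The function names and the toy rates are assumptions, and the pooled standard deviation used for Cohen's d is one common convention that may differ from the one used for the reported values.

```python
import math
from statistics import mean, variance


def welch_t_test(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb   # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


def cohens_d(a, b):
    """Cohen's d with a sample-size-weighted pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = math.sqrt(
        ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    )
    return (mean(a) - mean(b)) / pooled


# Toy disfluency rates (invented) for two groups of unequal size.
two_speakers = [0.06, 0.05, 0.07, 0.04]
three_speakers = [0.03, 0.04, 0.035, 0.045, 0.04, 0.03]
t, df = welch_t_test(two_speakers, three_speakers)
d = cohens_d(two_speakers, three_speakers)
print(f"t={t:.3f}, df={df:.2f}, d={d:.3f}")
```

The two-tailed p-value follows from the t distribution with `df` degrees of freedom; `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and returns the p-value directly.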

3.6.2 Local effect

To analyse the second question of this thesis, the local effect of disfluency is researched in the corpus. To see if there is a local effect, a pandas DataFrame is created which contains, for each utterance distance d in a dialogue, the ratio of true comparisons (the count of repetitions from prime to target) to the total number of comparisons at that distance (see section 3.3). The data is split by disfluency category. A GLM is applied to each part of the data to model and analyse the effect of alignment. If there is alignment on the level of disfluency, we expect the modeled repetition ratio to decrease as the distance d increases.
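The repetition ratio per distance can be sketched as below. The exact definition is given in section 3.3 and is not reproduced here, so this example rests on stated assumptions: a prime is an utterance containing the disfluency category of interest, a target is the utterance d positions later by a different speaker, and a comparison counts as true when the target also contains that category. The function name and input format are hypothetical.

```python
from collections import defaultdict


def repetition_rate_by_distance(utterances, max_distance=10):
    """Ratio of true comparisons to all comparisons per distance d.

    utterances: list of (speaker, has_disfluency) pairs in dialogue order
    (an assumed representation for one disfluency category).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for i, (prime_speaker, prime_has) in enumerate(utterances):
        if not prime_has:
            continue                      # only disfluent utterances prime
        for d in range(1, max_distance + 1):
            j = i + d
            if j >= len(utterances):
                break
            target_speaker, target_has = utterances[j]
            if target_speaker == prime_speaker:
                continue                  # cross-speaker repetition only
            totals[d] += 1
            hits[d] += int(target_has)
    return {d: hits[d] / totals[d] for d in sorted(totals)}


toy = [("A", True), ("B", True), ("A", False), ("B", False)]
rates = repetition_rate_by_distance(toy, max_distance=3)
print(rates)
```

The resulting ratios, together with their comparison counts, form the proportional response that the binomial GLM models as a function of d.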

3.6.3 Change over time

The last research question explores whether there is a change in disfluency rate over the course of the dialogue and whether speakers tend towards convergence, divergence, or a reduction or increase in rate.

The dialogues are split into ten sections, where the first section represents the first 10% of the dialogue, the second section represents the dialogue from 10% to 20%, and so on.

For each section of the dialogue the combined disfluency ratio is calculated for each individual speaker. The difference is then calculated by taking the absolute difference between the speakers' disfluency ratios. For multi-speaker dialogues, the difference for a specific speaker is calculated by taking all differences between that speaker and the other speakers in the dialogue and averaging them.
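The sectioning and difference computation described above can be sketched as follows. The helper names, the input format of `(speaker, n_tokens, n_disfluencies)` triples, and the definition of the ratio as disfluencies per token are assumptions made for this illustration.

```python
def split_into_sections(utterances, n_sections=10):
    """Split a dialogue into n_sections consecutive, near-equal parts."""
    n = len(utterances)
    return [
        utterances[i * n // n_sections:(i + 1) * n // n_sections]
        for i in range(n_sections)
    ]


def speaker_rates(section):
    """Disfluency ratio per speaker within one section.

    section: list of (speaker, n_tokens, n_disfluencies) triples
    (an assumed representation).
    """
    counts = {}
    for speaker, n_tokens, n_disfluencies in section:
        tokens, disfluencies = counts.get(speaker, (0, 0))
        counts[speaker] = (tokens + n_tokens, disfluencies + n_disfluencies)
    return {s: dis / tok for s, (tok, dis) in counts.items() if tok > 0}


def mean_abs_difference(rates):
    """Per speaker: average absolute difference to every other speaker."""
    result = {}
    for speaker, rate in rates.items():
        others = [abs(rate - r) for s, r in rates.items() if s != speaker]
        if others:
            result[speaker] = sum(others) / len(others)
    return result


sections = split_into_sections(list(range(25)))
rates = speaker_rates([("A", 10, 1), ("B", 20, 1), ("A", 10, 0)])
diffs = mean_abs_difference({"A": 0.10, "B": 0.04, "C": 0.07})
print(len(sections), rates, diffs)
```

For a two speaker dialogue the averaged difference reduces to the single absolute difference between the two ratios. Dialogues with fewer utterances than sections yield some empty sections, which would need to be handled before modelling.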

The results are modeled and analysed with a GLM. If convergence exists it is expected to see an inverse relationship between the dialogue position and ratio. However, it is possible that although asymmetry exists between speakers, both speakers increase or decrease their level of disfluency simultaneously. In this case a GLM will not show an inverse or a positive relationship between dialogue position and ratio, but it would be an interesting finding. Therefore, another GLM is modeled on the individual combined disfluency ratio per section of the dialogue to analyse if this is the case.


4 Results

This section is organised to address the three research questions explored in this thesis. Accordingly, section 4.1, Group size, reports the effect of group size on the disfluency ratio. Section 4.2, Local effect, reports the results on alignment over the course of the dialogue and models whether there is a local effect. The last section, Change over time, reports the differences in disfluency ratio between speakers over the course of the dialogue.

4.1 Group size

In order to address the first research question, to discover whether the number of speakers in a dialogue has an effect on the rate of disfluencies, the following statistical analyses are conducted. Firstly, the combined disfluency rates are measured across group sizes grouping our three categories together (section 4.1.1). Secondly, the individual categories of disfluency are compared (section 4.1.2).

4.1.1 Combined

Figure 1 shows the overall disfluency ratio across different sizes of speaker groups in the corpus. An independent-samples t-test with unequal variance was conducted pairwise across the different group sizes in order to discover whether there were significant differences in the rates between larger and smaller group sizes.

Figure 1: Combined disfluency ratio of individual speakers split over the number of speakers per dialogue


An independent-samples t-test with unequal variance was conducted to evaluate the hypothesis that two speaker dialogues and three speaker dialogues differ significantly in their combined disfluency ratio. The mean disfluency ratio of two speaker dialogues (M=0.055, sd=0.028) was statistically significantly different (t=14.943, df=2136, two-tailed p<0.001) from that of three speaker dialogues (M=0.038, sd=0.024). The effect size d=0.637 implies a medium effect (Burns and Burns, 2008).

The same independent-samples t-test with unequal variance was conducted to evaluate the hypothesis that two speaker dialogues and four speaker dialogues differ significantly in their combined disfluency ratio. The mean disfluency ratio of two speaker dialogues was statistically significantly different (t=8.036, df=1559.2, two-tailed p<0.001) from that of four speaker dialogues (M=0.044, sd=0.028). The effect size d=0.375 implies a small effect.

Lastly, an independent-samples t-test with unequal variance was conducted to evaluate the hypothesis that two speaker dialogues and five or more speaker dialogues differ significantly in their combined disfluency ratio. The mean disfluency ratio of two speaker dialogues was statistically significantly different (t=12.411, df=1165.66, two-tailed p<0.001) from that of five or more speaker dialogues (M=0.039, sd=0.023). The effect size d=0.602 implies a medium effect.

The results of the t-tests confirm the differences visible in figure 1: the largest differences are between two speaker dialogues and the other group sizes. While there are some significant differences between the larger group sizes, these are small. There is a significant difference (p<0.001) between three speaker dialogues and four speaker dialogues, with a small effect size (d=-0.243). We also see a significant difference (p<0.001) between four speaker dialogues and dialogues with five or more speakers, again with a small effect size (d=0.210). No significant difference is found (p=0.514) between three and five or more speaker dialogues.

All the results of the conducted t-tests are summarized in table 7. The results of the t-test scores for the three different categories individually can be found in appendix B (see section 4.1.2).

Combination (Number of speakers)      2-3        2-4        2-5        3-4        3-5        4-5
Mean difference                     0.017      0.011      0.016     -0.006     -0.001      0.006
t                                  14.943      8.036     12.411     -4.860     -0.654      3.798
Std. Error                          0.001      0.001      0.001      0.001      0.001      0.001
df                                   2136     1559.2    1165.66    1463.15    1099.67    1214.81
P-value (two-tailed)                0.000      0.000      0.000      0.000      0.514      0.000
Mean 1                              0.055      0.055      0.055      0.038      0.038      0.044
Mean 2                              0.038      0.044      0.039      0.044      0.039      0.039
Std. Dev 1                          0.028      0.028      0.028      0.024      0.024      0.028
Std. Dev 2                          0.024      0.028      0.023      0.028      0.023      0.023
Cohen’s D                           0.637      0.375      0.602     -0.243     -0.035      0.210

Table 7: Combined: T-Test results

4.1.2 Categorized by disfluency type

The higher disfluency ratio in two speaker dialogues is consistent across the three disfluency types examined, with one exception: on the level of discourse markers, the difference between two speaker dialogues and four speaker dialogues is not significant (p=0.097).

Apart from this exception, all three disfluency categories show a significant difference between two speaker dialogues and dialogues with more speakers. On the level of filled pauses, two speaker dialogues show a statistically significantly higher disfluency rate compared to three (p<0.001, d=0.321), four (p<0.001, d=0.410) and five or more speaker dialogues (p<0.001, d=0.559).


Figure 2: Disfluency ratio of individual speakers split over the number of speakers per dialogue and disfluency category

Two speaker dialogues also show a significantly higher rate of disfluency on the level of silent pauses compared to three (p<0.001, d=0.346), four (p<0.001, d=0.213), and five or more speaker dialogues (p<0.001, d=0.307). All show a small effect size.

Lastly, two speaker dialogues show a significantly higher rate of disfluencies with a small effect size on the level of discourse markers compared to three (p<0.001, d=0.473) and five or more speaker dialogues (p<0.001, d=0.239). There is no significantly higher disfluency rate found in two speaker dialogues compared to four speaker dialogues (p=0.097).

Small but significant differences were also found while comparing the other group sizes. For a complete overview of all t-test result values, see appendix B.


4.2 Local Effect

The different categories of disfluency are considered separately, since significant differences are found in their disfluency rates. Section 4.2.1 shows the local effect on filled pauses. Section 4.2.2 shows the local effect on discourse markers. Lastly, section 4.2.3 shows the local effect on silent pauses.

4.2.1 Filled pauses

Figure 3: Filled pause: Side by side comparison of priming effect on original corpus and baseline.

When exploring the second research question, whether there are local effects on cross speaker disfluency repetition, the GLM shows no significant effect (p=0.849). Figure 3 shows the repetition ratio as a function of the distance from the prime for filled pause disfluencies. While no significant effect of the distance from the prime on the repetition ratio was observed, the GLM did report a significant effect of the number of speakers in the dialogue on the repetition ratio (p<0.05), with a small negative coefficient (coeff=-0.4649). This indicates that the repetition rate is higher in dialogues with a smaller group size than in larger groups.

A GLM was also fitted to test whether the local effects of the original differ significantly from the scrambled baseline. We compared baseline and original for group sizes two, three, four and five or more, finding no significant effects (p=0.573, p=0.499, p=0.668, p=0.897, respectively). This is unsurprising since no significant effect was found in the original, therefore none is to be expected in the baseline.


4.2.2 Discourse markers

Figure 4: Discourse markers: Side by side comparison of priming effect on original corpus and baseline.

Secondly, the local effect on the level of discourse markers was researched. Figure 4 shows the findings of the experiment. A binomial GLM was modeled to see if there was an effect of the distance from the prime on the ratio of repetition. The GLM showed no significant effect of the distance from the prime on the ratio of repetition (p=0.906). The GLM did show a small (coeff=-0.3226) significant effect (p<0.05) of the number of speakers per dialogue on the ratio of repetition. This suggests that the repetition rate is lower when there are more speakers in the dialogue.

A GLM was also fitted to test whether the local effects of the original differ significantly from the scrambled baseline. We compared baseline and original for group sizes two, three, four and five or more, finding no significant effects (p=0.777, p=0.889, p=0.834, p=0.995, respectively).

4.2.3 Silent pauses

The hypothesis that the distance from the prime has an effect on the repetition ratio is also evaluated on the level of silent pauses. The GLM showed no significant effect (p=0.921). The GLM did show a significant effect of the number of speakers in the dialogue on the repetition ratio (p<0.001). However, this effect is very small (coeff=-0.3622). The coefficient shows a negative relationship between the number of speakers and the repetition rate. In other words, the rate of repetition is lower in larger groups.

A GLM was also fitted to test whether the local effects of the original differ significantly from the scrambled baseline. We compared baseline and original for group sizes two, three, four and five or more, finding no significant effects (p=0.623, p=0.555, p=0.635, p=0.834, respectively).

All the complete summaries for the models described in this results section can be found in the appendix.
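The coefficients reported above are on the scale of the linear predictor. Assuming the default logit link of a binomial GLM (an assumption, since the link is not stated explicitly), exponentiating a coefficient gives the multiplicative change in the odds of repetition per unit increase in the number of speakers. The sketch below performs this conversion for the three reported coefficients; the helper name `odds_ratio` is illustrative.

```python
import math


def odds_ratio(coefficient):
    """Convert a logit-scale coefficient into an odds ratio."""
    return math.exp(coefficient)


# N_speakers coefficients as reported in this section.
coefficients = {
    "filled pauses": -0.4649,
    "discourse markers": -0.3226,
    "silent pauses": -0.3622,
}
odds_ratios = {name: odds_ratio(c) for name, c in coefficients.items()}
for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio per unit of N_speakers = {ratio:.3f}")
```

Under this assumption the odds of repetition are multiplied by roughly 0.63, 0.72 and 0.70, respectively, for each unit increase in the number of speakers. Because the models in appendix C include an interaction between distance and number of speakers, these values strictly describe the effect at distance zero, although the reported interaction coefficients are close to zero.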


Figure 5: Silent pause: Side by side comparison of priming effect on original corpus and baseline.

4.3 Change over time

This section reports the differences in disfluency ratio between speakers and the individual disfluency ratio over the course of the dialogue.

4.3.1 Differences in disfluency rate

Figure 6: Difference in disfluency ratio between speakers

In order to explore whether there are significant changes in disfluency ratio over the course of a dialogue, a binomial GLM was modeled to evaluate the hypothesis that the difference in disfluency ratio is affected by the dialogue section. For filled pauses the GLM showed no significant effect (p=0.628). The GLM also shows no significant effect of the dialogue section on the difference in disfluency ratio for the category of silent pauses (p=0.987). Lastly, there is also no effect of the dialogue section on the difference in disfluency ratio for discourse markers (p=0.636). A total overview of the results of the GLM can be found in the appendix.

Figure 6 shows the result of the differences in disfluency ratio between speakers over the course of the dialogue. The plots report the differences for a specific disfluency type.

Due to the trend observed in the category of filled pauses for dialogues with two speakers, another GLM was modeled on these dialogues alone to evaluate the hypothesis that the difference in disfluency ratio is affected by the dialogue section. However, this model showed no significant effect (p=0.523).

4.3.2 Individual speaker rates

Besides the differences in disfluency ratio, the individual disfluency ratio of speakers over the course of the dialogue is calculated. The results are shown in figure 7. The individual disfluency ratios are split by disfluency category.

Figure 7: Individual disfluency ratio

A binomial GLM is fitted to every category individually to evaluate the hypothesis that the individual disfluency ratio changes over the course of the dialogue. Filled pauses show no effect of the dialogue section (p=0.526). Silent pauses also show no effect (p=0.859). Lastly, discourse markers show no effect of the dialogue section on the individual disfluency ratio either (p=0.864).

In summary, there is no significant effect reported over all three disfluency categories.

The GLM did show a significant effect of the number of speakers in the dialogue on the individual disfluency ratio on the level of filled pauses (p=0.022). The model shows a very small negative relationship between the ratio and the number of speakers in a dialogue (coeff=-0.2356). This again demonstrates that the individual disfluency ratio is higher in dialogues with smaller groups.


5 Analysis

In this section the results for every experiment are analyzed. Section 5.1 analyzes the effect of group size on disfluencies. Section 5.2 analyzes the findings on a local level. The last analysis, section 5.3, analyzes the differences and individual disfluency ratios over the course of the dialogue.

5.1 Group size

The results for the experiment on the effect of group size on disfluency ratios showed a significant difference between the different group sizes in the dialogue (section 4.1).

Two speaker dialogues were shown to have a significantly higher disfluency ratio compared to multi-speaker dialogues. This disfluency ratio is consistently statistically significantly higher for every disfluent category individually, except for the disfluency ratio between two and four speaker dialogues on the level of discourse markers. Discourse markers are the only verbal disfluency examined in this thesis. This could be the cause of this difference.

A possible reason for this effect has been suggested by Fay, Garrod, and Carletta (2000). This research reports that in smaller groups there tends to be a more interactive dialogue than in larger groups. The dialogue style in larger groups tends to shift to a more monologue style, due to certain speakers taking the lead. Therefore, participants are more likely to only be influenced by the most dominant speaker.

Research also shows that speakers are more disfluent when they have mentally challenging conversations or when they discuss high level tasks (Bortfeld et al., 2001). The higher disfluency ratio in dialogues with a smaller number of participants could be explained by this. While this was outwith the scope of this thesis, an interesting future area to explore could be whether the level of linguistic complexity within the dialogues is a significant factor in the disfluency differences between different group sizes.

5.2 Local effect

The results show no significant differences in local effect between the original and the scrambled baseline (section 4.2).

Even though the GLM models no significant effect, the original corpus does seem to show, by eye at least, a very slight upward trend compared to the baseline. This is the opposite of our alignment hypothesis: in the case of priming, a negative relationship between repetition and priming distance is expected, and in the case of no priming on a local level, no trend at all is expected. The slight upward trend could be an indication of misalignment on the level of disfluency. Divergence, as opposed to alignment, has been reported in dialogue before. Healey, Purver, and Howes (2014) found that speakers systematically diverge from each other in their choice of syntactic constructions. They suggest that the priming effects described in the literature are a result of the need to actively participate in and contribute to the conversation, for example the need for backchannels. This could be the reason why the slight upward trend is noticeable: the BNC data consists solely of informal, face-to-face conversations in which no active participation or contribution is required, unlike the task-based corpora used in these other studies.

The disfluencies examined in this thesis were very sparse in the data. Therefore, all the differences reported are at a very small scale. This could be a reason why there are no significant effects modeled by the GLM.


Interesting findings from experiments two and three were the significant effects of the number of speakers on the proportion of repetition (section 4.2) and on the difference in disfluency rate between speakers and the individual disfluency rates (section 4.3). These effects can be explained by the overall higher individual disfluency ratio in smaller groups: because the disfluency ratio is higher overall, there are more opportunities for repetition in the dialogue.

5.3 Change over time

In the last experiment, both the difference in disfluency ratio between speakers and the individual disfluency ratio are analyzed over the course of the dialogue.

For all disfluency categories the GLM reported no effect of the dialogue section on the difference in disfluency ratio. Filled pauses in two speaker dialogues do show a slight inverse relationship between the difference in disfluency ratio and the position in the dialogue. Although the GLM again did not report this as a significant effect, this could be due to the sparse distribution of disfluencies throughout the data.

The GLM also reported no effect of the individual disfluency ratio over the course of the dialogue. It could be the case that there is an equal amount of dialogues where the individual disfluency rate either rises or drops over the course of the dialogue, and the model therefore almost shows an average of this ratio. However, because there is also no significant difference found on the difference of disfluency ratios between speakers, this is unlikely.

The only significant effect reported by the GLM is an inverse effect of the number of speakers in a dialogue on the individual disfluency ratio of filled pauses. This indicates that if there are more speakers in a dialogue, the individual disfluency ratio of filled pauses is lower over the course of the dialogue. This result could be explained by the overall higher disfluency rate in smaller groups.

This thesis also explored the average lengths of dialogues and reported that two speaker dialogues were on average shorter (in number of utterances) than dialogues with larger group sizes (section 3.1). The results of the third experiment do not show a significant effect of individual disfluency rate over the course of the dialogue for larger group sizes. Therefore, it can be expected that the length of the dialogue is not a factor on the level of disfluency.


6 Conclusion

This thesis evaluated the effect of group size and priming on the level of disfluencies in informal dialogue transcripts from the BNC. The results indicate that the disfluency ratio is affected by the group size of a dialogue but no significant evidence of alignment, nor speaker convergence or change over time was found on either a local or global level. While there is no support for the second and third research questions, this could be due to the low rate of disfluencies present in this corpus, resulting in the data being too sparse for significant effects to be found on a local level.

Firstly, this thesis hypothesized that disfluency ratios are influenced by group size (RQ1). A higher disfluency ratio was expected in dialogues with smaller group sizes compared to dialogues with larger group sizes. The results show that this is the case in the BNC.

Two speaker dialogues have an overall significantly higher disfluency level for filled and silent pauses compared to dialogues with more participants, with no consistently significant higher disfluency level for discourse markers. The second and third experiments further support this finding, reporting that the repetition rate and the disfluency level over the course of the dialogue, respectively, are significantly influenced by the number of speakers. Both experiments showed that in smaller groups the repetition rate and the disfluency ratio over the course of the dialogue are higher than in larger groups.

Secondly, this thesis explored whether cross speaker disfluency repetition is present (RQ2). The repetition rate at a distance d in utterance was calculated and then modeled using a binomial GLM. The experiment was done twice for every disfluency category, once to calculate the results of the original corpus and once to calculate the baseline based on a scrambled version of the corpus. No significant local repetition effects were found (section 4.2). While a hypothesis of this thesis was that alignment would be visible at the level of disfluencies, the results do not support this hypothesis. In the analysis, this thesis suggests this could be due to the low disfluency rates in general and a discussion to this effect is provided in the following section.

Lastly, the relationship between speakers is analysed to research whether speakers converge and whether their disfluency ratios changed over the course of the dialogue. This thesis hypothesized that if speakers converge, a reduction in this difference over time will indicate speaker adaptation. For the case that speakers simply change over time with no convergence, the individual speaker rate is modeled in the same manner. No significant change is found over the course of a dialogue in either case (section 4.3).

Overall, the main finding of this thesis is that there is an effect of group size on the ratio of disfluency in the BNC. This thesis reports no effect of any priming on a local level. It also reports no effect of convergence between speakers over the course of the dialogue. While these results do not prove the absence of these phenomena, they indicate that there are more factors to be considered in future work, and provide interesting insights into speaker dynamics in a multi-party setting.


7 Discussion

One limitation of this work is that the analyzed disfluencies form only a very small part of the BNC. Because of this, the effects reported are on a very small scale, which could have influenced the modeling and analysis. Furthermore, this thesis only focused on a select group of disfluency categories. The inclusion of all category types might uncover a more general pattern. Future research could show whether the inclusion of more disfluency categories leads to more conclusive results.

Another limitation is that this thesis considers less data for larger groups, because the BNC has an uneven distribution of the number of participants per dialogue. This is also the case on a more general level: there are few corpora with multiple participants in a dialogue. Overall, the BNC contained almost twice as many dialogues with two speakers as with three speakers, and even three times as many as with four speakers (see section 3.1). This could affect the results for the group sizes for which there was less data. For example, it could have influenced the much lower disfluency rates in the multi-speaker dialogues. Future research could explore whether the same findings are reported in more equally distributed corpora.

Since previous work has found linguistic complexity and disfluency to correlate, future work will include linguistic complexity as a factor in similar experiments, in order to explore whether an interaction with linguistic complexity reveals local effects.


References

Atkins, Sue, Jeremy Clear, and Nicholas Ostler (1992). “Corpus design criteria”. In: Literary and linguistic computing 7.1, pp. 1–16.

Bortfeld, Heather et al. (2001). “Disfluency rates in conversation: Effects of age, relationship, topic, role, and gender”. In: Language and speech 44.2, pp. 123–147.

Branigan, Holly P, Martin J Pickering, and Alexandra A Cleland (1999). “Syntactic priming in written production: Evidence for rapid decay”. In: Psychonomic Bulletin & Review 6.4, pp. 635–640.

— (2000). “Syntactic co-ordination in dialogue”. In: Cognition 75.2, B13–B25.

Burns, Robert P and Richard Burns (2008). Business research methods and statistics using SPSS. Sage.

Clark, Herbert H (1996). Using language. Cambridge university press.

Clark, Herbert H and Susan E Brennan (1991). “Grounding in communication.” In:

Clark, Herbert H and Jean E Fox Tree (2002). “Using uh and um in spontaneous speaking”. In: Cognition 84.1, pp. 73–111.

Clark, Herbert H and Thomas Wasow (1998). “Repeating words in spontaneous speech”. In: Cognitive psychology 37.3, pp. 201–242.

Consortium, BNC (2007). British National Corpus, XML edition. Oxford Text Archive. url: http://hdl.handle.net/20.500.12024/2554.

Erman, Britt (2001). “Pragmatic markers revisited with a focus on you know in adult and adolescent talk”. In: Journal of pragmatics 33.9, pp. 1337–1359.

Fay, Nicolas, Simon Garrod, and Jean Carletta (2000). “Group discussion as interactive dialogue or as serial monologue: The influence of group size”. In: Psychological science 11.6, pp. 481–486.

Finlayson, Ian R (2014). “Testing the roles of disfluency and rate of speech in the coordination of conversation”. PhD thesis. Queen Margaret University, Edinburgh.

Finlayson, Ian R and Martin Corley (2012). “Disfluency in dialogue: An intentional signal from the speaker?” In: Psychonomic bulletin & review 19.5, pp. 921–928.

Garrod, Simon and Martin J Pickering (2009). “Joint action, interactive alignment, and dialog”. In: Topics in Cognitive Science 1.2, pp. 292–304.

Godfrey, John J, Edward C Holliman, and Jane McDaniel (1992). “SWITCHBOARD: Telephone speech corpus for research and development”. In: Acoustics, Speech, and Signal Processing, IEEE International Conference on. Vol. 1. IEEE Computer Society, pp. 517–520.

Healey, Patrick GT, Matthew Purver, and Christine Howes (2014). “Divergence in dialogue”. In: PloS one 9.6, e98598.

Jokinen, Kristiina et al. (2010). “Turn-alignment using eye-gaze and speech in conversational interaction”. In: Eleventh Annual Conference of the International Speech Communication Association.

Kopp, Stefan and Kirsten Bergmann (2013). “Automatic and strategic alignment of co-verbal gestures in dialogue”. In: Alignment in communication: Towards a new theory of communication, pp. 87–107.

Laserna, Charlyn M, Yi-Tai Seih, and James W Pennebaker (2014). “Um... who like says you know: Filler word use as a function of age, gender, and personality”. In: Journal of Language and Social Psychology 33.3, pp. 328–338.

Levelt, Willem JM and Anne Cutler (1983). “Prosodic marking in speech repair”. In: Journal of semantics 2.2, pp. 205–218.


Liu, Kris and Jean E Fox Tree (2012). “Hedges enhance memory but inhibit retelling”. In: Psychonomic bulletin & review 19.5, pp. 892–898.

Maclay, Howard and Charles E Osgood (1959). “Hesitation phenomena in spontaneous English speech”. In: Word 15.1, pp. 19–44.

Nelder, John Ashworth and Robert WM Wedderburn (1972). “Generalized linear models”. In: Journal of the Royal Statistical Society: Series A (General) 135.3, pp. 370–384.

Reitter, David, Johanna D Moore, and Frank Keller (2010). “Priming of syntactic rules in task-oriented dialogue and spontaneous conversation”. In:

Ruxton, Graeme D (2006). “The unequal variance t-test is an underused alternative to Student’s t-test and the Mann–Whitney U test”. In: Behavioral Ecology 17.4, pp. 688–690.

Schiller, Niels O, Victor S Ferreira, and F-Xavier Alario (2007). “Words, pauses, and gestures: New directions in language production research”. In: Language and Cognitive Processes 22.8, pp. 1145–1150.

Sharifian, Farzad and Ian G Malcolm (2003). “The pragmatic marker like in English teen talk: Australian Aboriginal usage”. In: Pragmatics & cognition 11.2, pp. 327–344.

Siegman, Aron Wolfe (1978). “The meaning of silent pauses in the initial interview.” In: The Journal of nervous and mental disease 166.9, pp. 642–654.

Smith, Vicki L and Herbert H Clark (1993). “On the course of answering questions”. In: Journal of memory and language 32.1, pp. 25–38.

Strassel, Stephanie (2004). Simple metadata annotation specification V6.2.

Tree, Jean E Fox (1995). “The effects of false starts and repetitions on the processing of subsequent words in spontaneous speech”. In: Journal of memory and language 34.6, pp. 709–738.

Tree, Jean E Fox and Josef C Schrock (2002). “Basic meanings of you know and I mean”. In: Journal of Pragmatics 34.6, pp. 727–747.

Tree, Jean E Fox and John M Tomlinson Jr (2007). “The rise of like in spontaneous quotations”. In: Discourse Processes 45.1, pp. 85–102.

Zuur, Alain et al. (2009). Mixed effects models and extensions in ecology with R. Springer Science & Business Media.


A Annotation

Original tag in BNC                   Replacement token   Remarks
<anon type="name" nameType="g"/>      NAME                'g' could either be f or m.
<anon type="place"/>                  PLACE
<anon type="telephoneNumber"/>        TEL
<anon type="address"/>                ADDRESS
<anon type="email"/>                  EMAIL
<anon type="financialDetails"/>       FINANCE
<anon type="socialMediaName"/>        SOCIAL
<anon type="dateOfBirth"/>            DoBIRTH
<anon type="miscPersonalInfo"/>       MISC
<vocal desc="x"/>                     None                'x' could represent all kinds of vocal descriptive sounds (e.g. laughter, sigh, gasp, etc.)
<pause dur="short"/>                  short_pause
<pause dur="long"/>                   long_pause
<event desc="x"/>                     None                'x' could represent all kinds of events (e.g. someone leaving, tap running, background noise)


B T-test results

Combination (Number of speakers)      2-3        2-4        2-5        3-4        3-5        4-5
Mean difference                     0.004      0.005      0.006      0.001      0.003      0.002
t                                   7.488      8.949     11.407      2.031      4.794      2.682
Std. Error                          0.001      0.001      0.001      0.001      0.001      0.001
df                                2114.41    1654.63    1136.48    1568.51    1114.44    1162.46
P-value (two-tailed)                0.000      0.000      0.000      0.042      0.000      0.007
Mean 1                              0.017      0.017      0.017      0.013      0.013      0.012
Mean 2                              0.013      0.012      0.010      0.012      0.010      0.010
Std. Dev 1                          0.012      0.012      0.012      0.011      0.011      0.011
Std. Dev 2                          0.011      0.011      0.010      0.011      0.010      0.010
Cohen’s D                           0.321      0.410      0.559      0.100      0.257      0.151

Table 9: Filled pause: T-Test results

Combination (Number of speakers)         2-3       2-4       2-5       3-4       3-5       4-5
Mean difference                        0.007     0.005     0.006    -0.002    -0.001     0.002
t                                      8.113     4.525     6.368    -2.314    -0.821     1.402
Std. Error                             0.001     0.001     0.001     0.001     0.001     0.001
df                                   2131.16   1507.86   1181.11   1433.22   1122.35   1233.19
P-value (two-tailed)                   0.000     0.000     0.000     0.021     0.412     0.161
Mean 1                                 0.022     0.022     0.022     0.015     0.015     0.018
Mean 2                                 0.015     0.018     0.016     0.018     0.016     0.016
Std. Dev 1                             0.021     0.021     0.021     0.019     0.019     0.022
Std. Dev 2                             0.019     0.022     0.017     0.022     0.017     0.017
Cohen’s D                              0.346     0.213     0.307    -0.116    -0.044     0.077

Table 10: Silent pause: T-Test results

Combination (Number of speakers)         2-3       2-4       2-5       3-4       3-5       4-5
Mean difference                        0.006     0.001     0.003    -0.005    -0.003     0.002
t                                     11.559     1.658     5.092    -8.392    -4.608     3.165
Std. Error                             0.001     0.000     0.001     0.001     0.001     0.001
df                                   2023.12   1663.57    1260.2    1199.9   857.807   1212.44
P-value (two-tailed)                   0.000     0.097     0.000     0.000     0.000     0.002
Mean 1                                 0.016     0.016     0.016     0.010     0.010     0.015
Mean 2                                 0.010     0.015     0.012     0.015     0.012     0.012
Std. Dev 1                             0.015     0.015     0.015     0.009     0.009     0.014
Std. Dev 2                             0.009     0.014     0.012     0.014     0.012     0.012
Cohen’s D                              0.473     0.076     0.239    -0.433    -0.271     0.176
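The quantities reported in the T-test tables (t, df, Cohen's D) can be computed along the following lines. This is a sketch under assumptions, not the thesis code: the fractional df values suggest an unequal-variance (Welch) test, but this appendix does not state which variant was used, and the pooled-standard-deviation form of Cohen's d is only one common definition. The function names and the toy data are made up for illustration.

```python
import math
from statistics import mean, stdev


def welch_t_test(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    n1, n2 = len(a), len(b)
    # Squared standard errors of the two sample means.
    v1, v2 = stdev(a) ** 2 / n1, stdev(b) ** 2 / n2
    t = (mean(a) - mean(b)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation; generally not an integer.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation (assumed definition)."""
    n1, n2 = len(a), len(b)
    pooled = math.sqrt(
        ((n1 - 1) * stdev(a) ** 2 + (n2 - 1) * stdev(b) ** 2) / (n1 + n2 - 2)
    )
    return (mean(a) - mean(b)) / pooled


# Made-up per-dialogue disfluency ratios for two group sizes (not corpus data).
two_speakers = [0.02, 0.01, 0.03, 0.02]
three_speakers = [0.01, 0.01, 0.02, 0.00]
t, df = welch_t_test(two_speakers, three_speakers)
print(t, df, cohens_d(two_speakers, three_speakers))
```

A positive t and d here mean the first group has the higher mean ratio, matching the sign convention of "Mean 1" minus "Mean 2" in the tables.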


C GLM result summaries

C.1 Local effect

The summaries below are the modeled results from the original corpus.

Filled pause ratio ∼ utt ∗ N_speakers

==================================================================================
Filled pauses          coef    std err          z      P>|z|     [0.025     0.975]
----------------------------------------------------------------------------------
Intercept           -3.4292      0.540     -6.349      0.000     -4.488     -2.371
utt                  0.0067      0.035      0.190      0.849     -0.062      0.076
N_speakers          -0.4649      0.199     -2.334      0.020     -0.855     -0.074
utt:N_speakers       0.0003      0.013      0.024      0.981     -0.025      0.026
==================================================================================

Discourse marker ratio ∼ utt ∗ N_speakers

==================================================================================
Discourse marker       coef    std err          z      P>|z|     [0.025     0.975]
----------------------------------------------------------------------------------
Intercept           -3.2856      0.413     -7.950      0.000     -4.096     -2.476
utt                  0.0032      0.027      0.118      0.906     -0.050      0.057
N_speakers          -0.3226      0.143     -2.259      0.024     -0.603     -0.043
utt:N_speakers    4.726e-05      0.009      0.005      0.996     -0.019      0.019
==================================================================================

Silent pause ratio ∼ utt ∗ N_speakers

==================================================================================
Silent pauses          coef    std err          z      P>|z|     [0.025     0.975]
----------------------------------------------------------------------------------
Intercept           -2.3963      0.297     -8.080      0.000     -2.978     -1.815
utt                  0.0019      0.020      0.099      0.921     -0.037      0.040
N_speakers          -0.3622      0.105     -3.457      0.001     -0.568     -0.157
utt:N_speakers       0.0008      0.007      0.119      0.905     -0.013      0.014
==================================================================================
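To read these summaries, it helps to see how the columns relate to each other. The sketch below assumes the columns are standard Wald statistics (z = coef / std err, a two-sided p-value from the standard normal distribution, and a normal-based 95% interval), which is what this column layout usually denotes. The function name `wald_summary` is hypothetical, and small deviations from the printed values are expected because the standard errors are rounded to three decimals.

```python
from statistics import NormalDist


def wald_summary(coef: float, std_err: float, level: float = 0.95):
    """Recompute z, the two-sided p-value and the confidence interval."""
    z = coef / std_err
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    # Critical value of the standard normal, about 1.96 for a 95% interval.
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z, p, (coef - crit * std_err, coef + crit * std_err)


# N_speakers row of the filled-pause model: coef -0.4649, std err 0.199.
z, p, (low, high) = wald_summary(-0.4649, 0.199)
print(z, p, low, high)
```

For this row the recomputed values agree with the printed z, P>|z| and interval bounds up to rounding, so an interval that excludes zero corresponds to p < 0.05.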


C.1.1 Comparison to baseline: two-speaker dialogues

The summaries below show the modeled results for the original corpus and the baseline, restricted to dialogues with two speakers.

Filled pauses ratio ∼ utt ∗ Version

===========================================================================================
Filled pause N=2                coef    std err          z      P>|z|     [0.025     0.975]
-------------------------------------------------------------------------------------------
Intercept                    -4.1717      0.195    -21.445      0.000     -4.553     -3.790
Version[T.Original]          -0.1606      0.285     -0.564      0.573     -0.719      0.397
utt                           0.0028      0.013      0.221      0.825     -0.022      0.028
utt:Version[T.Original]       0.0045      0.019      0.242      0.809     -0.032      0.041
===========================================================================================

Discourse marker ratio ∼ utt ∗ Version

===========================================================================================
Discourse marker N=2            coef    std err          z      P>|z|     [0.025     0.975]
-------------------------------------------------------------------------------------------
Intercept                    -3.7237      0.165    -22.609      0.000     -4.047     -3.401
Version[T.Original]          -0.0668      0.236     -0.283      0.777     -0.529      0.396
utt                           0.0020      0.011      0.187      0.852     -0.019      0.023
utt:Version[T.Original]       0.0015      0.016      0.097      0.923     -0.029      0.032
===========================================================================================

Silent pauses ratio ∼ utt ∗ Version

===========================================================================================
Silent pause N=2                coef    std err          z      P>|z|     [0.025     0.975]
-------------------------------------------------------------------------------------------
Intercept                    -2.9892      0.121    -24.727      0.000     -3.226     -2.752
Version[T.Original]          -0.0845      0.172     -0.491      0.623     -0.422      0.253
utt                           0.0014      0.008      0.168      0.866     -0.014      0.017
utt:Version[T.Original]       0.0022      0.011      0.192      0.848     -0.020      0.025
===========================================================================================
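The `Version[T.Original]` terms follow treatment (dummy) coding: the baseline is the reference level, and the `Original` rows give the offset of the original corpus relative to it. The sketch below shows how the four coefficients combine into the linear predictor for each version. It stops at the linear predictor because the link function is not stated in this appendix; the function name and the choice of `utt = 10` are illustrative assumptions.

```python
def linear_predictor(coefs: dict, utt: float, original: bool) -> float:
    """Linear predictor for `ratio ~ utt * Version` with treatment coding.

    The Version dummy is 1 for the original corpus and 0 for the baseline.
    """
    v = 1.0 if original else 0.0
    return (
        coefs["Intercept"]
        + coefs["Version[T.Original]"] * v
        + coefs["utt"] * utt
        + coefs["utt:Version[T.Original]"] * utt * v
    )


# Coefficients of the filled-pause model for two-speaker dialogues.
filled_n2 = {
    "Intercept": -4.1717,
    "Version[T.Original]": -0.1606,
    "utt": 0.0028,
    "utt:Version[T.Original]": 0.0045,
}

baseline = linear_predictor(filled_n2, utt=10, original=False)
original = linear_predictor(filled_n2, utt=10, original=True)
# The gap between versions is the main effect plus utt times the interaction.
print(baseline, original, original - baseline)
```

Read this way, the interaction term is the difference in the `utt` slope between the original corpus and the baseline, which is the quantity these comparisons test.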

C.1.2 Comparison to baseline: three-speaker dialogues

The summaries below show the modeled results for the original corpus and the baseline, restricted to dialogues with three speakers.

Filled pauses ratio ∼ utt ∗ Version

===========================================================================================
Filled pause N=3                coef    std err          z      P>|z|     [0.025     0.975]
-------------------------------------------------------------------------------------------
Intercept                    -4.6144      0.277    -16.647      0.000     -5.158     -4.071
Version[T.Original]          -0.2851      0.422     -0.676      0.499     -1.111      0.541
utt                          -0.0017      0.019     -0.090      0.928     -0.038      0.035
utt:Version[T.Original]       0.0092      0.028      0.332      0.740     -0.045      0.064
===========================================================================================

Discourse markers ratio ∼ utt ∗ Version

===========================================================================================
Discourse marker N=3            coef    std err          z      P>|z|     [0.025     0.975]
-------------------------------------------------------------------------------------------
Intercept                    -4.7771      0.319    -14.987      0.000     -5.402     -4.152
Version[T.Original]          -0.0634      0.456     -0.139      0.889     -0.957      0.830
utt                           0.0004      0.021      0.018      0.986     -0.042      0.042
utt:Version[T.Original]       0.0018      0.030      0.059      0.953     -0.058      0.061
===========================================================================================

Silent pauses ratio ∼ utt ∗ Version

===========================================================================================
Silent pause N=3                coef    std err          z      P>|z|     [0.025     0.975]
-------------------------------------------------------------------------------------------
Intercept                    -3.4856      0.182    -19.158      0.000     -3.842     -3.129
Version[T.Original]          -0.1563      0.265     -0.590      0.555     -0.675      0.363
utt                           0.0021      0.012      0.176      0.860     -0.022      0.026
utt:Version[T.Original]       0.0029      0.018      0.166      0.868     -0.032      0.037
===========================================================================================

C.1.3 Comparison to baseline: four-speaker dialogues

The summaries below show the modeled results for the original corpus and the baseline, restricted to dialogues with four speakers.

Filled pauses ratio ∼ utt ∗ Version

===========================================================================================
Filled pause N=4                coef    std err          z      P>|z|     [0.025     0.975]
-------------------------------------------------------------------------------------------
Intercept                    -5.0486      0.444    -11.376      0.000     -5.918     -4.179
Version[T.Original]          -0.2873      0.671     -0.428      0.668     -1.602      1.027
utt                           0.0011      0.029      0.037      0.970     -0.057      0.059
utt:Version[T.Original]       0.0080      0.044      0.183      0.855     -0.078      0.094
===========================================================================================

Discourse markers ratio ∼ utt ∗ Version

===========================================================================================
Discourse marker N=4            coef    std err          z      P>|z|     [0.025     0.975]
-------------------------------------------------------------------------------------------
Intercept                    -4.3112      0.303    -14.239      0.000     -4.905     -3.718
Version[T.Original]          -0.0903      0.430     -0.210      0.834     -0.934      0.753
utt                          -0.0023      0.020     -0.113      0.910     -0.042      0.038
utt:Version[T.Original]       0.0074      0.029      0.257      0.797     -0.049      0.064
===========================================================================================


Silent pauses ratio ∼ utt ∗ Version

===========================================================================================
Silent pause N=4                coef    std err          z      P>|z|     [0.025     0.975]
-------------------------------------------------------------------------------------------
Intercept                    -3.6695      0.232    -15.845      0.000     -4.123     -3.216
Version[T.Original]          -0.1595      0.337     -0.474      0.635     -0.819      0.500
utt                           0.0022      0.015      0.142      0.887     -0.028      0.033
utt:Version[T.Original]       0.0034      0.022      0.154      0.878     -0.040      0.047
===========================================================================================

C.1.4 Comparison to baseline: dialogues with five or more speakers

The summaries below show the modeled results for the original corpus and the baseline, restricted to dialogues with five or more speakers.

Filled pauses ratio ∼ utt ∗ Version

===========================================================================================
Filled pause N=5+               coef    std err          z      P>|z|     [0.025     0.975]
-------------------------------------------------------------------------------------------
Intercept                    -5.3538      0.750     -7.141      0.000     -6.823     -3.884
Version[T.Original]          -0.1400      1.083     -0.129      0.897     -2.263      1.983
utt                          -0.0003      0.050     -0.007      0.995     -0.099      0.098
utt:Version[T.Original]       0.0078      0.072      0.108      0.914     -0.133      0.149
===========================================================================================

Discourse markers ratio ∼ utt ∗ Version

===========================================================================================
Discourse marker N=5+           coef    std err          z      P>|z|     [0.025     0.975]
-------------------------------------------------------------------------------------------
Intercept                    -4.5651      0.477     -9.577      0.000     -5.499     -3.631
Version[T.Original]          -0.0043      0.667     -0.006      0.995     -1.312      1.304
utt                           0.0013      0.032      0.039      0.969     -0.062      0.064
utt:Version[T.Original]       0.0019      0.045      0.043      0.966     -0.086      0.090
===========================================================================================

Silent pauses ratio ∼ utt ∗ Version

===========================================================================================
Silent pause N=5+               coef    std err          z      P>|z|     [0.025     0.975]
-------------------------------------------------------------------------------------------
Intercept                    -3.8742      0.377    -10.287      0.000     -4.612     -3.136
Version[T.Original]          -0.1125      0.538     -0.209      0.834     -1.167      0.942
utt                           0.0008      0.025      0.031      0.975     -0.049      0.051
utt:Version[T.Original]       0.0039      0.036      0.109      0.913     -0.067      0.074
===========================================================================================
