Speech Rhythm as an Interaction of Measurements on Durational Variability and Vowel Quality



Lili Szabó

Faculty of Arts

Rijksuniversiteit Groningen

Research Master Linguistics

Erasmus Mundus European Masters Program in Language and Communication Technologies (LCT)

November 2013

Advisor

Bistra Andreeva


Acknowledgements

I would like to thank Bistra Andreeva for her guidance throughout the project, and Çağrı Çöltekin for his sharp comments on the statistical side of things and on the drafts.

I am eternally grateful to Edit Kakuszi, Anna Szabó and Károly Szabó for their voices. I am also grateful to the speakers of all the corpora, whom I know only by their monograms, and to the anonymous manual annotators, without whose work this thesis would not exist.

Finally, I would like to thank Pedro Mercado for his native language expertise and help in understanding mathematical

Abstract

Speech rhythm is a perceptual pattern of regularly occurring prominent elements, arising from the phonological structure of the language. Prominent elements are phonological units such as syllables, feet or morae. The repetition of these elements is perceived as similar.

Recent work has established a set of quantitative measurements that aim to discriminate or group languages by placing them on a continuum bounded by prototypical languages at both ends. These metrics can also be used in second language learning, to assess a speaker’s command of a given second language by quantifying how far, acoustically, the produced speech is from the target language.

This thesis investigates the rhythmic properties of Hungarian, by comparing it to Bulgarian and German.

We also examine if Hungarian sentences with varying intonational patterns would result in differing rhythmic patterns.


Contents

1 Introduction
1.1 Purpose
1.2 Related Work
1.3 Overview of the Thesis

2 Theoretical Foundations
2.1 Historical Overview
2.2 The Dauer Model
2.3 Criticism of the Dauer Model
2.4 Metrics
2.5 Rhythm Metrics and Second Language Learning
2.6 Previous Measurements in the Literature
2.7 Predictions

3 Languages
3.1 Hungarian
3.2 Bulgarian
3.3 German

4 Method
4.1 Material
4.2 Statistical Analysis

5 Implementation
5.1 Pre-processing
5.2 Marking of Rhythmic Units
5.3 Metrics and Statistics on Corpora

6 Results
6.1 The Dauer-Model
6.2 Story Data - Durational Metrics
6.3 Story Data - Durational PVI Measures
6.4 Story Data - Measuring Vowel Quality: Dispersion from Centroid (DC)
6.6 Hungarian Sentences - Durational Metrics
6.7 L2 Measurements - Durational Metrics
6.8 Discussion

7 Other Lines of Research and Applications
7.1 Speech Pathology
7.2 Language Identification

8 Conclusion

A Orthographic Prompting Text for Hungarian Own Recordings
A.1 Sentences

B F1 and F2 Values in Stressed and Unstressed Positions
B.1 Miscellaneous


Chapter 1

Introduction

The classical definition of speech rhythm is “the grouping of elements into larger units” [Dauer, 1987], where the elements can be syllables or the intervals between stressed syllables (feet). Rhythm is “the property of all languages”, whereas stress is not. (By stress we refer to lexical stress.) Rhythm of speech emerges from the interaction of a number of phonological components, such as syllable structure and prosody.

The categorical distinction between stress-timed, syllable-timed and mora-timed languages is based on the (simplistic) assumption that rhythm is timing, meaning that rhythm is the isochrony (equal duration) of the elements. According to this theory, languages fall into one of these three categories:

1. stress-timing (‘Morse code rhythm’): speakers compress syllables where necessary to yield isochronous feet, so that stresses recur at regular intervals (English, German, Russian, Arabic, Persian)

2. syllable-timing (‘machine-gun rhythm’): speakers make syllables the same length, so that syllables recur at regular time intervals (French, Spanish, Italian, Telugu)

3. mora-timing: morae have approximately the same duration (Japanese, Tamil)


1.1 Purpose

According to [Ramus, 2002], in order to obtain a full-fledged rhythm typology, the following actions have to be taken:

1. enlargement of corpora on which the measurements are carried out
2. extension of corpora with more languages (non-prototypical ones)
3. extension of corpora with speakers, samples, speech rates, speech registers

The aim of this thesis is to comply with 1., 2. and partially with 3. by working on a large, 56-speaker corpus of Hungarian, a language that, to our knowledge, has not been studied extensively with regard to rhythm until now. With two control languages, Bulgarian and German, we carry out our experiments on a total corpus of approximately 300 minutes of speech; for details see Table 4.1. Working with this amount of data is not common practice in rhythm research (corpora generally range from 1.5 to 9 minutes per language [Arvaniti, 2012]). From this we expect to obtain more stable and robust results.

The purpose of the thesis is three-fold:

Firstly, we repeat the most important measurements in the literature for this new setting of languages, based on [Dauer, 1987], [F. Ramus, 1999], [Ling et al., 2000], [Grabe & Low, 2002], [Nolan & Asu, 2009] and [Arvaniti, 2012] (each work is described in detail in Section 1.2).

Secondly, we investigate rhythmic patterns across different sentence types in Hungarian, to see if there is intra-language variation due to the different intonation patterns.

Lastly, we measure the importance of rhythm in second language learning. Our question is whether the rhythm of the target language differs substantially depending on the level of proficiency of those who speak it (native speakers or second language learners of different levels). More concretely, is the rhythm of German spoken by Bulgarian native speakers closer to Bulgarian or to German?

1.2 Related Work

The following works, presented in chronological order, summarize the rise of rhythmic classification and rhythm measures and the debates around them.

[Dauer, 1987]

to classify languages into stress-timed or non-stress-timed categories. The criteria/parameters for the grouping are: duration of stressed syllables, syllable structure, vowel length contrast, intonation, tone, vowel reduction, consonant quality and the position of lexical stress. If the differences between stressed and unstressed syllables are maximized, stress is the principle behind rhythmic grouping.

A detailed description of the model is in Section 2.2.

[F. Ramus, 1999]

This work rejects the isochrony theory by citing a large body of research that failed to find feet or syllables of equal duration experimentally. It supports the idea of ‘rhythm-based language discrimination’, since behavioral experiments, i.e. language discrimination tasks performed by newborns, had corroborated it.

The work also claims that the two most important of the phonological criteria presented by [Dauer, 1987] are syllable structure and vowel reduction, which are mainly responsible for the perception of different rhythm types. It also favors conceptualizing rhythm as a continuum with the prototypical languages (English as stress-timed and Spanish as syllable-timed) at the two ends, and most languages at an intermediate position somewhere in between (e.g. Catalan and Polish), while conceding that there could be more than two rhythm classes that differentiate between languages.

The main purpose of their investigation is to implement and extend the phonological model proposed by Dauer, since it fails to explain why infants can still differentiate between intermediate languages, or how they perceive and acquire the speech rhythm of their languages.

They investigate eight languages (English, Dutch, Polish, categorized as stress-timed; French, Spanish, Italian, Catalan, categorized as syllable-timed; Japanese, categorized as mora-timed). Their corpus comprises four native speakers reading five sentences for each language.

They introduce three metrics that they expect to successfully discriminate between the classical rhythm classes, and correlate with the results of language discrimination tasks:

1. %V (proportion of utterance comprised of vocalic intervals)

2. the standard deviation of the duration of vocalic intervals within each sentence, noted as ∆V

3. the standard deviation of the duration of intervocalic intervals within each sentence, noted as ∆C


[Ling et al., 2000]

British and Singapore English are compared with regard to their rhythm type. A strong criticism of the standard deviation (∆) is presented: ∆ is blind to the sequence of syllables. In an idealized scenario, two languages can score the same while having completely different durational patterns: one language has only short consonants in the first half of the utterance and only long ones in the second half, whereas the other alternates short and long consonants throughout the utterance.

A new metric is proposed, to overcome the weaknesses of ∆, by taking into account successive interval duration. The normalized Pairwise Variability Index (nPVI) is the mean of the differences between successive intervals divided by the sum of the same intervals, the latter step included to control for speech rate variation (the formula is presented in Section 2.4).
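The criticism can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the two duration sequences (in milliseconds) are invented, the function name `npvi` is ours, and ∆ is computed as the population standard deviation, which may differ from the estimator used in the cited studies.

```python
from statistics import pstdev

def npvi(durations):
    """Normalized PVI: mean of |a - b| / ((a + b) / 2) over successive pairs, times 100."""
    pairs = list(zip(durations, durations[1:]))
    terms = [abs(a - b) / ((a + b) / 2) for a, b in pairs]
    return 100 * sum(terms) / len(terms)

# Hypothetical consonantal interval durations (ms), same values in a different order.
blocked = [50, 50, 50, 50, 150, 150, 150, 150]      # short ones first, long ones last
alternating = [50, 150, 50, 150, 50, 150, 50, 150]  # short and long alternate

delta_blocked = pstdev(blocked)
delta_alternating = pstdev(alternating)
npvi_blocked = npvi(blocked)
npvi_alternating = npvi(alternating)

print(delta_blocked, delta_alternating)   # identical: the standard deviation ignores order
print(npvi_blocked, npvi_alternating)     # differ: nPVI reflects the sequencing
```

Both sequences contain the same durations, so ∆ is identical, whereas nPVI is low for the blocked sequence (only one pair differs) and high for the alternating one.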

In the second part of the study, vowel quality (vowel reduction) is measured. It is hypothesized that the less variable the vowel durations, and the less reduction in the F1/F2 values, the more likely it is that the language is syllable-timed.

To investigate spectral patterns in vowels they measure the location of F1 and F2 in a Full Vowel Set vs. a Reduced Vowel Set of both varieties. They use a quantification measure to calculate the degree of vowel centralization. A vowel is a vector, represented by its two coordinates (F1, F2) in the vowel plane. The distance from the mean (centroid) is then calculated. The dispersion (the total variance of a vowel system from the centroid) is the mean sum of squares of the distances. Their results are as expected: a significant distinction between the full and reduced set in British English, and a non-significant one in Singapore English.

[Grabe & Low, 2002]

In a subsequent study Grabe and Low found that Singapore English is only slightly more syllable-timed than British English, when including more languages in the investigation. Consonantal PVI was suggested as useful for cross-linguistic comparisons, especially when languages exhibit phonological properties of both stress-timing and syllable-timing. It is suggested that speech rate normalization is not appropriate for consonantal intervals, as differences between languages in consonantal interval durations are a function of their phonotactics, and not of speech rate.

distinction. Mora-timed Japanese did not occupy a distinct area in the PVI plane, as opposed to the clear separation of Japanese from stress-timed and syllable-timed languages on the %V-∆C mapping in [F. Ramus, 1999]. Some languages were clearly distinguished by rPVI-C, with Polish and Estonian having extreme high and low values, respectively.

The results were compared to the scores in [F. Ramus, 1999], some being consistent, but some clearly different. They questioned the utility of %V, as it should have shown a negative correlation with vocalic nPVI, but in fact it did not.

The paper concluded that rhythm classes were gradient rather than categorical, since the rhythm-class hypothesis could not be defended, and instead there are degrees of stress- and syllable-timing. They called the phenomenon of certain speech continua being perceived categorically by naïve subjects weak categorical.

[Nolan & Asu, 2009]

They review PVI in its historical context and ask whether PVI has advanced our understanding of the nature of speech rhythm, arguing that it is only indirectly related to perceived rhythm. They hypothesize that stress-timing and syllable-timing are not points at either end of a continuum but orthogonal dimensions, so that a language can be both stress-timed and syllable-timed. They acknowledge that the advantage of PVI is that it accounts for local variability (as opposed to the standard deviation). In raw PVI tempo changes are not factored in, but in normalized PVI they are. They propose to measure syllable and foot PVI values to test the notion of coexisting rhythms. They describe problems with syllabification and footing: it is not a straightforward task to identify prominent syllables in different languages. Language-specific definitions of the foot are provided. The studied languages are Estonian, English, Mexican and Castilian Spanish.

They use normalized syllable and foot PVI to make the values comparable across units of different magnitude such as the syllable and the foot.

Their findings show that even a syllable-timed language can manifest foot isochrony. They conclude that rhythm is rather a two-dimensional space than a continuum.

[Arvaniti, 2012]

This is a study testing the stability of the metrics most commonly used in the literature (∆V, ∆C, VarcoV, VarcoC, rPVI-C, nPVI-V). It systematically examines the consistency of, and correlation between, metrics and combinations of metrics across different conditions such as elicitation method (read vs. spontaneous speech) and speech controlled for containing only non-reduced vowels vs. a maximal proportion of reduced vowels.

The results reveal inconsistencies at every level, finding intra-language variation sometimes greater than variation between languages. The article concludes that rhythmic classification based on the metrics widely used in the literature is risky.

1.3 Overview of the Thesis


Chapter 2

Theoretical Foundations

In this chapter we review the history of research on rhythm and the rationale for the metrics, or, as they were called in the early days, the acoustic correlates of rhythm.

2.1 Historical Overview

The idea of rhythmic typology was first mentioned by [James, 1940], describing English or Arabic as languages of the ‘Morse code’ rhythm type, and referring to French and Telugu as languages of the ‘machine-gun’ rhythm type. Soon after, [Pike, 1945] established the terms “stress-timed” and “syllable-timed” to contrast English and Spanish. The first reference to “mora-timing” in Japanese is from [T.Shibata, 1980].

A relevant claim from Abercrombie [Abercombie, 1967] is that every language has a rhythm, amended by [Dauer, 1987] with the observation that not every language has phonological stress.

Despite all the attempts, experimental work had not led to evidence of isochrony. On the one hand, it was shown that foot duration is proportional to the number of syllables it contains (in English) [F. Ramus, 1999]. On the other hand, no evidence was found that syllable durations are constant in Spanish [F. Ramus, 1999].


rhythm is rather a continuous variable than a dichotomous one, as many of the languages are not so easy to categorize.

The dubious state of these languages called for the quantification of the new notion of rhythm. It was argued that the quantification had to be based on acoustic cues extracted from the speech signal, since it is our perception system that discriminates between languages.

Previous research showed that only acoustic cues guide infants’ perception when acquiring and discriminating different rhythm types. Infants can only segment speech into vocalic intervals and consonantal intervals, and they cannot identify complex phonological concepts such as the syllable or the foot [F. Ramus, 1999]. Thus, in order to measure rhythm, speech had to be segmented into vocalic and consonantal intervals so that their variability could be measured independently of each other. The main assumption with respect to the rhythm categories was that, since stress-timed languages have more complex syllable structure and stress-induced vowel reduction accompanied by durational shortening, the variability of vocalic and consonantal intervals would be higher in stress-timed languages.

Yet another line of interest turned to measuring the variability between adjacent feet and syllables directly [Nolan & Asu, 2009], instead of measuring the variability of smaller (vocalic and intervocalic) units. It was also proposed to conceptualize rhythm as a two-dimensional space instead of a scale.

2.2 The Dauer Model

The phonological components from whose interaction rhythm emerges in the model proposed by Dauer [Dauer, 1987] are listed below. We will apply this model to Hungarian, and compare the results with other languages.

It has to be noted that there is a difference in the usage of terminology between [Dauer, 1987] and the present study, namely in the distinction between stress and accent. In the present study we use the term stress to refer to the phonological feature that gives prominence to particular syllables as opposed to others, and that is present at the lexical level, whereas the term accent refers to the phonetic realization of stress. This is the complete opposite of how Dauer uses the terms.

[Figure: “Ideal Scores for Prototypical Languages”; axes: vocalic metric, consonantal metric; clusters: stress-timed, syllable-timed, mora-timed]

Figure 2.1: Ideal rhythm clusters, the prototypical languages, e.g. English as stress-timed, Spanish as syllable-timed and Japanese as mora-timed.

Length

Duration

Whether stressed syllables are longer or shorter than unstressed syllables. The language gets a plus if the vowels in stressed syllables (like CCVCC) are at least 1.5 times as long as the vowels in unstressed syllables, zero if they are slightly longer, and minus if they are equal or there is no lexical stress in the language.

mentioned contexts, we only differentiate between stressed and unstressed positions.

Syllable structure

The language gets a plus if there are a variety of complex syllables and heavy syllables tend to be stressed, minus if the possible syllable types are limited in number, and mostly the types CV or CVC.

Quantity

The presence or lack of vowel length contrast in stressed or unstressed syllables. The language gets a plus if the duration of the vowels is phonologically contrastive only in stressed syllables, zero if some of them are contrastive in unstressed position as well and minus if vowel length contrast is fully preserved in unstressed positions.

Pitch

Intonation

The language gets a plus if stressed syllables are turning points in intonation contour, and minus if intonation and tone are independent.

Tone

The language gets a plus for pitch-accent languages, zero if tones are present on all stressed syllables, but neutralized/modified on unstressed ones, and minus if the tone system is fully developed regardless of stress.

Quality

Vowels

This component of the model (vowel reduction) contains two statements at the same time, but only allows three variations of the theoretically four. It examines a) if there is a centralization process in unstressed position, and b) if the size of the vowel inventory is smaller in unstressed position.

1. centralization [+], size [-], (and the language gets a plus)
2. centralization [-], size [-], (and the language gets a zero)
3. centralization [-], size [+], (and the language gets a minus)


Consonants

The language gets a plus if consonants are more precisely articulated in stressed syllables, and minus if they have the same articulation in both stressed and unstressed syllables.

Function of stress

The language gets a plus if stress can occur in different positions in the word, and moving it would result in different meaning, zero if the stress is fixed, and minus if there is no phonological stress on the word level.

Assessment

The evaluation/assessment of the category values is as follows: the more pluses a language has, the more likely it is to be stress-timed, since it has “strong stress”: the difference between stressed and unstressed syllables is considerable, and it is easy to identify the stressed syllables in continuous speech. In a language characterized mainly by minuses the principle of grouping is different (segmenting speech into meaningful units); it can be the syllable or vowel length. Languages of this type can also exhibit phonological stress, but it is less prominent and not as easily identifiable.

2.3 Criticism of the Dauer Model

We start exploring the rhythm of our languages with the Dauer model (despite its shortcomings described below) because it established the phonological account of rhythm, and because four of the eight criteria that make up the model have remained frequently investigated with respect to their relation to rhythm: syllable structure, vowel reduction, vowel length contrast and lexical stress. The description of syllable structure and vowel reduction is disputable:

[Dimitrova, 1997] drew attention to the lack of an intermediate category in the ‘Syllable structure’ component for languages that allow complex syllables but in most cases employ simple ones (CV, CVC).

As for the concept of vowel reduction, it needs further clarification. [Mooshammer & Geng, 2008] mentions that stress-induced vowel reduction (the one excluding other

Although stress-induced vowel reduction and the shortening of the reduced vowels are frequently associated, they do not necessarily occur together. The model does not treat this case separately, which is a weakness, since it is not clear which setup Hungarian should be assigned to. Since ‘Duration’ (the proportion of duration in stressed vs. unstressed vowels) is already a separate criterion in the model, we only take into account the direction of the reduction: whether it is a centralization process, where the vowel space is shrunk and shifted towards the center, or a raising process, where the vowels neutralize for their openness feature.

Lastly, for two of the components (‘Quantity’ and ‘Tone’) languages cannot be exhaustively assigned, since the option ‘feature does not exist for the given language’ is not provided.

2.4 Metrics

As mentioned in Section 2.1, the categorization of languages into stress- and syllable-timed classes is based on perceptual judgements only, and therefore no established quantitative evaluation exists to measure its accuracy.

The rhythm metrics, a set of variables derived from the acoustic duration of consonantal and vocalic intervals, can be grouped into two groups:

• interval measures (%V, ∆V, ∆C) by [F. Ramus, 1999], and their variants by [Dellwo, 2006] (VarcoV, VarcoC)

• pairwise variability indices by [Ling et al., 2000] (rPVI-C, nPVI-V ), and their variants by [Asu & Nolan, 2006] (nFootPVI, nPVI-CV )

Theoretical Grounding

• isochrony is a purely perceptual phenomenon, therefore measuring the absolute durational differences of syllables or stress feet does not yield the expected results

• young infants can distinguish between vocalic and consonantal intervals, but initially lack the concept of syllabification, therefore the separation of these intervals in the metric provides a developmentally plausible account of rhythm

• another reason for the separation of vocalic and consonantal intervals is that the variation of these intervals can be independent for some languages (e.g. high consonantal, but low vocalic variation in Polish [Grabe & Low, 2002])

• vowel duration is the key to syllable-timing, since it is generally assumed that if the duration of vowels is constant throughout the syllables, the consonants do not change that equilibrium

• departing from the purely acoustic notion of measures, [Nolan & Asu, 2009] argue that splitting up the syllable into its parts cannot be a good choice, because the syllable has a central role in the phonological structure

• durational PVIs for stress feet and (pseudo-)syllables measure variability or constancy in the phonological units making up utterances

• measuring the variability of syllables and feet directly addresses the concept of perceptual isochrony of these units

The metrics below can be applied to different properties of successive elements, such as duration, pitch or intensity. They can be used not only to differentiate between languages, but also between different accents, including the accents of language learners, if the native language of the learner is rhythmically distinct from the target language.

The resulting value for each metric ranges between 0 and 100, so that the metrics are comparable.

Interval Measures

Vowel Ratio

• %V measures the proportion of vocalic intervals in the utterance, with pauses and other noises included

• %V2 measures the proportion of vocalic intervals in the utterance, without pauses and other noises included

The higher the vowel ratio, the more syllable-timed a language is. For more on the relationship between the metrics and the rhythmic categories, see Section 2.4.
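The difference between the two variants can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the thesis: the labels `"V"`, `"C"` and `"P"` (pause or other noise), the function name `vowel_ratio` and the duration values are assumptions made for the example.

```python
def vowel_ratio(intervals, include_pauses=True):
    """Percentage of total duration taken up by vocalic intervals.

    intervals: list of (label, duration) pairs; labels are assumed to be
    'V' (vocalic), 'C' (consonantal) or 'P' (pause or other noise).
    """
    vocalic = sum(d for label, d in intervals if label == "V")
    if include_pauses:
        total = sum(d for _, d in intervals)
    else:
        total = sum(d for label, d in intervals if label != "P")
    return 100 * vocalic / total

# Hypothetical utterance, durations in milliseconds.
utterance = [("C", 80), ("V", 120), ("C", 60), ("P", 200), ("V", 90), ("C", 50)]

pct_v = vowel_ratio(utterance, include_pauses=True)    # %V: pauses count towards the total
pct_v2 = vowel_ratio(utterance, include_pauses=False)  # %V2: pauses are excluded
print(pct_v, pct_v2)
```

With pauses included the denominator grows, so %V is never larger than %V2 for the same utterance.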

Standard Deviation - Delta (∆)

Introduced as an acoustic correlate in [F. Ramus, 1999].

Statistical standard deviation of vowels and consonants (∆V and ∆C) in an utterance reflects the variability in the duration of the vocalic and intervocalic intervals.

It can be demonstrated that ∆V and ∆C are insensitive to the adjacency of the syllables: the values could be the same for an extreme stress-timed and an extreme syllable-timed language (see Section 2.1).

The other shortcoming of ∆C is that it shows an inverse correlation with speech rate [Dellwo, 2006].

Varcos

Introduced in [Dellwo, 2006]. Varco is the coefficient of variation, a measure of data dispersion relative to the mean. It is a normalized version of the standard deviation (and thus normalizes for tempo changes), obtained by multiplying the standard deviation by 100 and dividing it by the mean.

\[ \mathrm{Varco}\Delta C = \frac{\Delta C \times 100}{\mathrm{mean}_C} \tag{1} \]
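The effect of the normalization can be illustrated with a short Python sketch. It assumes invented consonantal durations (in milliseconds), models a slower speech rate simply as a uniform doubling of all durations, and computes ∆ as the population standard deviation; the function names `delta` and `varco` are ours.

```python
from statistics import mean, pstdev

def delta(durations):
    """Delta: standard deviation of interval durations (population SD assumed here)."""
    return pstdev(durations)

def varco(durations):
    """Varco: standard deviation times 100, divided by the mean duration."""
    return 100 * delta(durations) / mean(durations)

# Hypothetical consonantal interval durations (ms).
consonantal = [60, 90, 120, 75, 105]
# The same utterance spoken at half the rate, modelled as uniformly doubled durations.
slow = [2 * d for d in consonantal]

print(delta(consonantal), delta(slow))   # Delta doubles with the slower tempo
print(varco(consonantal), varco(slow))   # Varco stays the same
```

Under this simplified model of tempo change, ∆C scales with the durations while Varco does not, which is the motivation for the normalization.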

Pairwise Variability Indices

Presented in [Grabe & Low, 2002], based on ideas first exposed in [Ling et al., 2000]. Pairwise Variability Index (PVI) is a quantitative measure of how phonological units differ from their neighbours. In its initial form, it was developed as an alternative for ∆C. It also measures variability, but sequential variability of successive consonantal or vocalic intervals. Sequential patterning of durations could be seen as underlying any auditory impression of rhythm.

It is a pairwise difference in value v for a property p (e.g. duration) between the members of each successive pair of phonological units (e.g. vocalic or consonantal intervals). The larger the mean of the differences, the less regularity the language exhibits from unit to unit, and the closer it is to the stress-timed prototype, and vice versa. In other words, PVI captures how different (the duration of) the prominent syllables are compared to the non-prominent ones; naturally, in syllable-timed languages this difference is smaller than in stress-timed languages, since the prominent syllables (vowels) are still expected to be similar in length to the non-prominent syllables.

As PVI takes the local variation into account (as opposed to ∆), it reflects the temporal sequencing of the units. The vocalic and intervocalic PVI are based more on acoustic than phonological principles, they are blind to word and syllable boundaries.

• raw PVI (rPVI): k stands for the index of the current interval, m is the number of intervals in the utterance, therefore m-1 is the number of successive pairs in the utterance.

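\[ rPVI = \left[ \sum_{k=1}^{m-1} \left| v_k - v_{k-1} \right| \right] / (m-1) \tag{2} \]

resultant value behaves like a percentage.

\[ nPVI = 100 \times \left[ \sum_{k=1}^{m-1} \left| \frac{v_k - v_{k-1}}{(v_k + v_{k-1})/2} \right| / (m-1) \right] \tag{3} \]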
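The raw and normalized indices can be sketched in Python as follows. This is an illustration under stated assumptions rather than the thesis implementation: the function names `rpvi` and `npvi` and the vocalic durations (in milliseconds) are invented, each interval is paired with its immediate neighbour, and a slower tempo is modelled as a uniform doubling of all durations.

```python
def rpvi(values):
    """Raw PVI: mean absolute difference between successive values."""
    pairs = list(zip(values, values[1:]))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

def npvi(values):
    """Normalized PVI: each difference is divided by the mean of the pair, times 100."""
    pairs = list(zip(values, values[1:]))
    return 100 * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)

# Hypothetical vocalic interval durations (ms).
vocalic = [80, 140, 60, 120, 70]
# Same utterance at half the speech rate, modelled as doubled durations.
slow = [2 * d for d in vocalic]

print(rpvi(vocalic), rpvi(slow))   # raw PVI scales with tempo
print(npvi(vocalic), npvi(slow))   # normalized PVI is unaffected
```

The number of pairs is one less than the number of intervals, matching the m - 1 in the formulas.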

PVI can be applied to all kinds of properties that are associated with phonological units, such as duration, intensity or pitch.

PVI can also be used on many types of phonological-morphological units, e.g. vowels, consonants, syllables, stress feet. It is recommended to use normalized PVI when comparing syllable and foot durations in a given language, since the absolute differences between consecutive feet are greater than those between syllables or syllable parts (phones), which, being the building parts of feet, are shorter units.

The relationship of PVIs to the rhythmic categories is similar to that of the Deltas and Varcos: the higher the variability, the more likely it is that the language is stress-timed.

Dispersion from Centroid (DC)

DC is a quantitative measure of the degree of vowel centralization suggested in [Ling et al., 2000]. It is a more direct measurement of vowel reduction, one that does not take durational patterns (e.g. vowel shortening) into account. It measures how prevalent the movement towards the mid-central point of the vowel quadrilateral is in stressed vs. unstressed positions. It determines the distance (dispersion) of the F1/F2 values from the mean (centroid).

A vowel is represented as a vector $V = (F1, F2)$, where the set of vowels is represented as $V_i = (V_i^1, V_i^2)$. The mean vector (centroid) is defined as

\[ S = \frac{1}{n} \sum_{i=1}^{n} V_i \]

and the dispersion coefficient is defined as follows:

\[ D = \frac{1}{n} \sum_{i=1}^{n} (V_i - S)^2 \tag{4} \]
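A minimal Python sketch of this computation is given below. It assumes that the squared term is the squared Euclidean distance between a vowel and the centroid in the F1/F2 plane; the function names and the formant values (in Hz) are invented for illustration, with the hypothetical reduced set placed halfway between each full vowel and the centroid.

```python
def centroid(vowels):
    """Mean (F1, F2) vector of a set of vowels."""
    n = len(vowels)
    return (sum(f1 for f1, _ in vowels) / n, sum(f2 for _, f2 in vowels) / n)

def dispersion(vowels):
    """Mean squared Euclidean distance of the vowels from their centroid."""
    s1, s2 = centroid(vowels)
    return sum((f1 - s1) ** 2 + (f2 - s2) ** 2 for f1, f2 in vowels) / len(vowels)

# Hypothetical (F1, F2) values in Hz for a full vowel set ...
full = [(300, 2200), (700, 1600), (700, 1200), (300, 800)]
# ... and a centralized set, each vowel moved halfway towards the centroid.
reduced = [(400, 1825), (600, 1525), (600, 1325), (400, 1125)]

print(centroid(full), centroid(reduced))
print(dispersion(full), dispersion(reduced))
```

Because every vowel in the second set lies at half its original distance from the shared centroid, its dispersion is a quarter of that of the full set; a smaller value indicates stronger centralization.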

[Figure: two schematic vowel planes (axes F2 and F1), “Prototype Stress-timed” and “Prototype Syllable-timed”, each showing full vowels and potentially reduced vowels]

Figure 2.2: Full and reduced vowel sets illustrated schematically. After [Ling et al., 2000].

Metric Scores and Language Types

It has been widely discussed in the literature how rhythm metrics predict the type of a language, based on their relation to one or more of the phonological properties whose interaction results in the perception of a rhythmic class. There is no consensus about any of the claims, but there are widely accepted ones that most authors adhere to.

The rationale for the segmentation of speech into vocalic and consonantal intervals is based on their audible separability as explained in Section 2.1.

Two main criteria from Dauer’s model are frequently cited when determining the rhythm class of a language:

1. syllable complexity:

The more complex the syllable structure, the more likely the language is to be stress-timed.

2. vowel reduction:

The greater the vowel reduction, the more stress-timed the language tends to be.

Based on these two criteria a number of metrics were developed, in order to measure consonant and vowel duration variability.

1. The more complex the syllable structure, the greater the variability in the duration of consonantal intervals, and the more likely it is that the language is stress-timed. Metrics developed to measure the complexity of syllable structure: ∆V, ∆C, VarcoV, VarcoC, rPVI-C, nPVI-V.

2. Vowel reduction causes more variability in the duration of vocalic intervals, and these also take up a smaller percentage of the utterance in stress-timed languages. Metrics: %V, ∆V, nPVI.

∆V reflects phonological characteristics directly related to vowel length, such as contrastive vowel length, unstressed vowel reduction, and non-phonemic vowel lengthening, and one could expect higher values for languages exhibiting these properties. Unfortunately the other vocalic variability metrics normalize the contrast away.

Other than that vowel duration is held as the key to syllable-timing, thus a high score of vowel ratio (%V) is associated with syllable-timing.


Relationship between Rhythm Metrics

Correlation between the metrics was shown in several previous studies on many language families [Shelece Easterday & Maddieson, 2011].

%V and ∆C show a strong negative correlation. The precise nature of the relationship between %V and ∆C differs according to syllable structure.

∆V showed a similar pattern to the nPVI-V in [Ling et al., 2000].

Negative correlation could be assumed between %V and vocalic nPVI, but [Grabe & Low, 2002] found no empirical evidence of it.

∆V and ∆C show a negative correlation with speech rate.

Reliability and Validity


there is significant variation of the scores due to: speech rate, inter-speaker variation, elicitation method, sentence type [Arvaniti, 2012].

The main criticism against vowel ratio (%V) and ∆C is that they disregard other aspects of rhythm such as successive variability. These metrics therefore measure individual phonological factors, such as syllable structure or vowel reduction, rather than rhythm as an interaction of these.

There is some controversy about what is more perceptually plausible to measure: the rhythmic units (syllable and foot) directly, or their parts (vocalic and consonantal intervals) [Nolan & Asu, 2009]. [F. Ramus, 1999] claims that in early language acquisition newborns can only segment vowels from consonants (by recognizing vowels and treating consonants as noise), while [Nolan & Asu, 2009] concede that the reason why researchers chose to investigate the variability of vocalic and consonantal intervals is that syllabification in English is a controversial and tedious task.

Standard Error (SE) of the Mean Metric Scores

In addition to the mean metric scores, which are calculated as the mean over all speakers, the standard error (SE) is also reported. It is an indicator of how representative the mean value is of the population it is drawn from. It is calculated as the standard deviation of the scores divided by the square root of the sample size, and is thus a version of the standard deviation normalized for sample size.
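As a minimal sketch of this calculation, the standard error of the mean can be computed as the sample standard deviation divided by the square root of the number of scores. The function name and the example scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def standard_error(scores):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(scores) / sqrt(len(scores))

# Hypothetical %V scores, one per speaker.
scores = [36.0, 38.0, 37.0, 39.0]
print(mean(scores), standard_error(scores))
```

With more scores the SE shrinks even if the spread of the scores stays the same, which is the main difference from reporting SD.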

It has to be noted that it is not common practice in rhythm research to report SE; most works report standard deviation (SD) values along with the means. However, [White & Mattys, 2007] report SE instead of SD.

2.5

Rhythm Metrics and Second Language Learning

No matter how controversial the rhythm metrics are when it comes to rhythmic classification, they do discriminate between languages across the above mentioned phonological properties. Foreign accent in a second language (L2) is a popular research topic today. It is widely assumed that the accuracy with which nonnative speakers pronounce an L2 is, at least to some extent, dependent on their first language (L1).

Rhythmic metrics can capture differences in phonological properties, e.g. vowel reduction in unstressed syllables, measured by dispersion from centroid.

2.6

Previous Measurements in the Literature


behind these pairwise comparisons is that the combined vocalic and consonantal metrics characterize the rhythm of a language better [Arvaniti, 2012].

Language      %V2    ∆IV    VarcoV  VarcoC  VnPVI  rCPVI
English*      44     54     50      51      59     63
English**     41.1   56.7   -       -       57.2   64.1
English***    40.1   53.5   -       -       -      -
Estonian**    44.5   31.9   -       -       45.4   40.0
German*       38     62     51      53      54     66
German**      46.4   52.6   -       -       59.7   55.3
Greek*        47     37     56      46      52     47
Greek**       44.1   52.7   -       -       48.7   59.6
Italian*      50     43     53      49      46     49
Italian***    45.2   48.1   -       -       -      -
Japanese**    45.5   55.8   -       -       40.9   62.5
Japanese***   53.1   35.6   -       -       -      -
Spanish*      49     45     47      47      47     54
Spanish**     50.8   47.5   -       -       29.7   57.7
Spanish***    43.8   47.4   -       -       -      -


[Figure: two scatter plots of the languages EN, DE, SP, IT, JP and ET, marked as stress-timed, syllable-timed, mora-timed or mixed. One panel plots %V against ∆C, the other vocalic normalized PVI against consonantal raw PVI.]

Figure 2.3: Metric scores in previous research. The language codes are: DE=German, EN=English, ET=Estonian, IT=Italian, JP=Japanese, SP=Spanish. All values are from [Arvaniti, 2012], except for Japanese and Estonian, which are from [Grabe and Low, 2002]. Right: vocalic ratio and consonantal ∆, left: vocalic normalized PVI and consonantal raw PVI.

2.7

Predictions

Hungarian was expected to occupy an intermediate position on the rhythm continuum, between German and Bulgarian.

We expect slight differences in the metric scores of the sentence types in Hungarian.


Caveats


Chapter 3

Languages

The choice of the two control languages of the investigation was partly based on corpora availability. The categorical classification treats German as a prototypical stress-timed language [Kohler, 1982] and Bulgarian as a mixed-syllable-timed language [Dimitrova, 1997]. Hungarian is a language not readily categorized. The following sections describe the three languages according to the four main phonological/prosodic criteria that are considered to contribute the most to perceptual stress- or syllable-timing.

3.1

Hungarian

Duration

Vowel Length Contrast

The language exhibits vowel length contrast, which means that there is a phonological difference between long and short vowels. Both long and short vowels can occur in both stressed and unstressed syllables. Table 3.1 shows the durational difference on a phoneme-by-phoneme basis, measured on the Hungarian Babel corpus described in Section 4.1.

Phoneme pair   Short (ms.)     Long (ms.)
[i]-[i:]       72.64 (1350)    92.60 (221)
[y]-[y:]       73.06 (220)     99.57 (86)
[u]-[u:]       73.07 (382)     111.79 (151)
[ø]-[ø:]       88.09 (388)     119.00 (237)
[o]-[o:]       84.39 (1391)    111.07 (266)
[E]-[e:]*      89.80 (3079)    107.87 (945)
[6]-[a:]**     82.83 (2964)    130.45 (1213)


One issue arises: the normalized metric (nPVI) undesirably neutralizes this contrast, and we do not know how that influences the final scores.

Vowel Reduction

Vowel reduction in Hungarian is not widely discussed, but a recent study [Szeredi, 2012] and our own results show that a centralization process does take place in unstressed vowels.

Primary Word Stress

In Hungarian the primary stress of content words in their lexical form is fixed: it falls on the first syllable. The primary stresses are prominent intonationally: they are pitch-accented, i.e. they initiate a falling pitch contour, which is accompanied by higher amplitude and/or longer duration.

The only exception to the fixed stress rule is the class of function words, which do not bear stress at all.

Syllable Complexity

Hungarian allows complex onsets and codas (up to three consonants), but only in word-initial and word-final position respectively. In addition, the majority of syllables in the language are CV, CVC or VC [Törkenczy, 2004].

3.2

Bulgarian

Duration

According to [Dimitrova, 1997] stressed vowels are of equal duration to unstressed vowels. In contrast, she cites authors who found that the ratio of the duration of stressed vs. unstressed vowels is 1.5.

Our measurements on the overall Bulgarian Babel corpus (Section 4.1) resulted in a stressed-to-unstressed vowel duration ratio of 1.56. We did not normalize for other factors influencing the duration of segments, such as phrase-final lengthening, the length of the containing word or syllable, the context of the vowel, etc.
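A ratio of this kind can be sketched as follows. We assume here that it is the mean duration of stressed vowels divided by the mean duration of unstressed vowels, with no further normalization; the function name and the example durations are hypothetical.

```python
from statistics import mean

def stress_duration_ratio(vowels):
    """Mean stressed vowel duration divided by mean unstressed vowel duration.

    `vowels` is a list of (duration_ms, is_stressed) pairs.
    """
    stressed = [d for d, is_stressed in vowels if is_stressed]
    unstressed = [d for d, is_stressed in vowels if not is_stressed]
    return mean(stressed) / mean(unstressed)

# Hypothetical vowel tokens: (duration in ms, stressed?)
vowels = [(120, True), (80, False), (90, True), (60, False)]
ratio = stress_duration_ratio(vowels)
print(ratio)
```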

Vowel Length Contrast


Vowel Reduction

As opposed to phonetic vowel reduction (e.g. in German), where the maximal vowel system still exists in unstressed syllables, phonological vowel reduction in Bulgarian is a raising process.

The full Bulgarian vocalic system (consisting of six phonemes) is only present in stressed position. In unstressed position the high-low distinction is neutralized: there is a reduction process during which the unstressed low /a/ and /O/ move towards the higher [@] and [u] respectively, described as raising [Dimitrova, 1997]. The difference from the well-studied phonological reduction in English is that in English vowels in unstressed position undergo centralization (vowels become schwa), while in Bulgarian the openness feature of the vowels neutralizes in unstressed position, resulting in a shift of the vowel system to the upper region of the vowel quadrilateral.

Syllable Complexity

Some extremely complex syllables are allowed in Bulgarian (such as CCVC, CCCVC, CCVCCC), but are not frequent. 80% of the syllables are open (CV, CCV), and another 12% are of the CVC type [Dimitrova, 1997].

Primary Word Stress

Lexical stress in Bulgarian is determined by the weight of the syllable.

3.3

German

Duration

We measured a stressed-to-unstressed vowel duration ratio of 1.52.

Vowel Length Contrast

Vowel length opposition does exist in German, and is neutralized in unstressed syllables (e.g. unstressed long /a:/ and short /a/ are neutralized) [Jessen, 1999].

Vowel Reduction

German exhibits phonetic vowel reduction in unstressed syllables. It has a minimally reduced system. Even though there is no phonological centralization of unstressed vowels, there is a considerable loss of timbre phonetically.

Primary Word Stress


Syllable Complexity


Chapter 4

Method

Since the metrics are quite sensitive to methodological choices, such as number of speakers in the corpus, elicitation method (e.g. reading or spontaneous speech), sentence type, (and the non-normalized ones to speech rate), we intended to use data maximally comparable in this cross-language experiment.

In a nutshell, our methodology consists of the following steps: 1) taking the duration, intensity and spectral definition of segments, 2) employing most measurements that had been introduced so far in the literature, in order to provide direct comparability to the existing work, which helps to place Hungarian on the continuum or two-dimensional space of rhythm, 3) running statistical analysis on the metric scores.

4.1

Material

We used five independent speech corpora for our experiments. Three of them (Hungarian Babel, Bulgarian Babel and German Kiel) were collected while the informants were reading passages; we refer to these as ‘story data’. The other two (Own Recordings and L2 Corpus) were collected while the informants were reading short separate sentences (‘sentence data’).

We only compare data that are from the same elicitation method (story vs. sentence data).

Each dataset is described in the following sections. The gender distribution is balanced in all of them.

Own Recordings - Hungarian Sentences


The sentence triples are designed so that they only differ in the position of the focus, but the word order is identical. This was done to see if otherwise identical sentences, with different intonation patterns (different focus conditions), would still manifest the same rhythm type. Or, putting it another way, how stable the metrics are across different types of sentences.

1. the first sentence has neutral word order and intonation, e.g. John went to the cinema., answering the question What happened?;

2. in the second sentence, the word preceding the verb is in focus position, e.g. John_f went to the cinema, answering the question Who went to the cinema?;

3. the third sentence is interrogative: John went to the cinema, in the meaning of John went to the cinema or Mary? or John really went to the cinema?

Prominence in all languages can be signaled by duration, fundamental frequency, intensity and spectral definition. We want to find out whether duration is one of the main means of marking prominence in Hungarian, i.e. whether a change of intonation pattern is accompanied by different durational patterns.

In Hungarian, focused elements occur in certain syntactic positions only, since it is a discourse configurational language, which means that structural positions are available not just for syntactic but also for discourse functions (e.g. topic, focus). Syntactic constituents can move relatively freely to the designated focused positions in the sentence, and therefore prominence marking by prosody/intonation is less salient than in other languages.

The full glossed orthographic prompting text can be found in Appendix A.1. The average length of the recordings is 150 sec per speaker.

Segmentation

The first round of segmentation was done by the MAUS [Schiel F, 2011] software, followed by manual adjustments where needed. Our approach to segmentation was based on acoustic, and not on phonological criteria.

The segmentation into consonantal and vocalic intervals was done by simultaneous inspection of spectrograms and oscillograms (using Praat), following the standard segmentation criteria. We relied on phonetic rather than phonological cues, since metric scores are used to “validate the idea that infants determine the rhythm class of their language at an early stage of acquisition that precedes the acquisition of syllable structure details” [Arvaniti, 2012].


Hungarian BABEL Corpus

The Hungarian Babel Corpus is a spoken language corpus, annotated for phone segment boundaries, and provided in TextGrid format [Roach, 1996].

It comprises 40 short passages (stories), each containing 5 thematically connected sentences, all of them read 3 times by 55 speakers, balanced for gender and aged 19-69.

Lexical stress is not marked.

Bulgarian BABEL Corpus

The corpus has 20 speakers and is annotated for segment boundaries and lexical stress. It consists of read stories [Roach, 1996].

German - KIEL

The corpus has 30 speakers reading the passages Nordwind und Sonne and the Buttergeschichte. It is annotated (among others) for phone segment boundaries and lexical stress [Kohler, n.d.].

L2 Corpus - Bulgarian-German

We used three subcorpora of the overall dataset [Bistra Andreeva, 2010]:

1. 6 (3-3 female and male) German native speakers reading German sentences

2. 6 Bulgarian speakers reading Bulgarian sentences

3. 3 Bulgarian speakers reading the same German sentences (GermanBulgarian-Sents)

There are certain differences from the story data: glottal stops are counted as part of the vowel in the German-German data, as they are in the Bulgarian-German data, so q is part of the vowel there too, unlike in the read passages. Therefore, the results are not comparable with those for the story data.


Corpus                  Nr. of speakers   Length (min.)   Glottal stops   Off-glides   Closing phase
Hungarian-Story         56                44              no              no           no
Hungarian-Sents         2                 3               yes             no           yes
Bulgarian-Story         20                22              no              yes          yes
Bulgarian-Sents         6                 81              no              yes          no
German-Story            30                34              yes             yes          yes
German-Sents            6                 76              no              yes          no
GermanBulgarian-Sents   3                 40              no              yes          no

Table 4.1: Corpora summary. All consist of recordings of read speech, passages (Story) or isolated sentences (Sents). By ‘closing phase’ we refer to utterance-initial closing phase for stops, whether they are included or not in the analysis.

4.2

Statistical Analysis

ANOVA

Each durational metric is subjected to analysis of variance (ANOVA), with language as independent factor variable (three levels: Bulgarian, German, Hungarian).
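For illustration, the F statistic of a one-way ANOVA with language as the factor can be computed by hand as the ratio of between-group to within-group mean squares. The sketch below is written in plain Python with hypothetical scores; it is not the statistical software used for the analyses in this thesis.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a dict of group -> list of scores."""
    all_values = [x for g in groups.values() for x in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: group means against the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Within-group sum of squares: scores against their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical %V scores per language (three observations each).
scores = {
    "Bulgarian": [28.0, 29.0, 27.0],
    "German": [27.0, 28.0, 29.0],
    "Hungarian": [37.0, 38.0, 36.0],
}
f_value, df_between, df_within = one_way_anova(scores)
print(f"F({df_between},{df_within}) = {f_value:.2f}")
```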

MANOVA

The standard practice for vocalic and consonantal metric scores is to place them in a plane determined by the pair of metrics. Multivariate ANOVAs are run on the pairs of vocalic and consonantal metrics most frequently used together in the literature (%V and ∆C, nPVI-V and rPVI-C, VarcoV and VarcoC, %V and VarcoC) as the two dependent variables, with language as the predictor. The goal of these analyses is to see whether combining a consonantal and a vocalic metric (a metric pair) enhances the language effect [Arvaniti, 2012].

Post hoc tests

Pairwise comparisons of main effects and significant interactions are examined by means of Tukey HSD post hoc tests, to see which languages differ significantly from one another. Post hoc pairwise comparisons are commonly performed after a significant effect has been found for a factor with three or more levels: the ANOVA shows that the means of the response variable differ significantly across the factor, but not which pairs of factor levels differ from each other.

Effect size


(for the metrics and metric pairs) can be compared to each other within the frame of the present study.

Euclidean distances

For each above pair of metrics, most commonly used as coordinates when determining the rhythm class and positioning the language in the space defined by the two metrics, we calculate Euclidean distances between the languages. The Euclidean distances provide a quantification of how far the languages are from each other in the rhythm space.

Non-parametric tests


Chapter 5

Implementation

The datasets came in different formats and first had to be converted to .TextGrid and .wav files. The labeling conventions also differed between the corpora described in Section 4.1. In the Hungarian Babel corpus lexical stress was not marked, so it had to be added in as automated a way as possible.

The overall workflow between the initial input of pairs of TextGrid and wav files and the very final output, results of metrics and statistical tests, is described in the following sections.

5.1

Pre-processing

• Following the practice established in [Grabe & Low, 2002], silent pauses in the utterances are excluded from the measured intervals (except for %V, see Section 2.4). The other measurements are taken on inter-pause stretches (IPS), meaning that the given metric is run on each inter-pause stretch separately, and the mean of these values is taken as the final result (as opposed to running the metric on all vocalic and inter-vocalic segments of the utterance at once). E.g. if an utterance is as follows: VCVC#CVC#VCV, the final score is the mean of 3 values.

• Initial and final pauses are discarded for all measurements.

• Merging consecutive vocalic or consonantal clusters, e.g. the sequence CCVCVVCCV becomes CVCVCV.

• Extracting duration of segments.

• Extracting mean F0, F1, F2 and mean intensity of vowels (via Praat-script)
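The pause handling and cluster merging described in the steps above can be sketched as follows. This is a minimal illustration, assuming segments are given as parallel lists of labels (V, C, or # for a pause) and durations; the function names and the example durations are hypothetical, and ∆C (as a population standard deviation) is used as the example metric.

```python
from itertools import groupby
from statistics import mean, pstdev

def split_ips(labels, durations, pause="#"):
    """Split parallel label/duration lists into inter-pause stretches (IPS).

    Pauses are dropped, so initial and final pauses are discarded as well.
    """
    stretches, current = [], []
    for label, duration in zip(labels, durations):
        if label == pause:
            if current:
                stretches.append(current)
            current = []
        else:
            current.append((label, duration))
    if current:
        stretches.append(current)
    return stretches

def merge_clusters(stretch):
    """Merge runs of the same label into one interval, summing durations."""
    return [(label, sum(d for _, d in run))
            for label, run in groupby(stretch, key=lambda seg: seg[0])]

def mean_over_ips(labels, durations, metric, kind):
    """Apply `metric` to the `kind` intervals of each IPS and average the results."""
    values = []
    for stretch in split_ips(labels, durations):
        intervals = [d for label, d in merge_clusters(stretch) if label == kind]
        values.append(metric(intervals))
    return mean(values)

# The utterance VCVC#CVC#VCV from the text, with hypothetical durations in ms.
labels = list("VCVC#CVC#VCV")
durations = [80, 60, 100, 90, 300, 70, 110, 50, 250, 90, 120, 60]
delta_c = mean_over_ips(labels, durations, pstdev, "C")
merged = "".join(label for label, _ in merge_clusters([(c, 1) for c in "CCVCVVCCV"]))
print(len(split_ips(labels, durations)), merged, delta_c)
```

The example utterance yields three inter-pause stretches, so the final score is the mean of three per-stretch values.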


As the phonetic processes in spoken language vary widely (e.g. deletion, insertion, assimilation), the two kinds of transcriptions are far from being identical, which makes the task non-straightforward. In addition, the labeling conventions in the Babel corpus are not always consistent in the representation of deleted or inserted sounds, which leads to further complications.

• Cutting out values higher than 900 Hz for the first formant for all vowels, and higher than 450 Hz for the back-closed /u/, in all languages.

• Marking (lexical) stress for Hungarian: the first vowel of the word was tagged. Function words do not carry stress, and were therefore not tagged.
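The formant cut-off step above can be sketched as a simple filter. The thresholds are the ones stated in the text; the function name, the vowel labels and the example measurements are hypothetical.

```python
F1_MAX_HZ = 900     # upper limit on F1 for all vowels
F1_MAX_U_HZ = 450   # stricter upper limit on F1 for the back closed /u/

def keep_vowel(vowel, f1_hz):
    """Return True if the F1 measurement is within the accepted range."""
    limit = F1_MAX_U_HZ if vowel == "u" else F1_MAX_HZ
    return f1_hz <= limit

# Hypothetical (vowel label, mean F1 in Hz) measurements.
measurements = [("a", 750.0), ("a", 950.0), ("u", 320.0), ("u", 480.0)]
kept = [(v, f1) for v, f1 in measurements if keep_vowel(v, f1)]
print(kept)
```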

5.2

Marking of Rhythmic Units

Foot and pseudo-syllable boundaries had to be marked for the Foot-PVI and PVI-CV measures respectively.

As syllable and stress lack a general phonetic definition [Dauer, 1987], they can be expected to be different for every language. For the present study, we chose to treat them identically for all languages in the dataset, as described in the following two sections.

Footing

Every stressed syllable initiates a foot. Interstress intervals, which we measure, are the stretches starting with a stressed vowel and ending right before the next stressed vowel within an inter-pause stretch (IPS). The disregarded interval between the utterance onset (or a pause) and the first stressed vowel of the IPS is called anacrusis.

The following IPS, # CCVCCCV' CCVCVVCCV' CCCV' CCV #, where # stands for a pause and V' for a stressed vowel, is segmented into 3 interstress intervals as follows: V'CCVCVVCC - V'CCC - V'CCV.
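The footing rule can be sketched as below. In this illustration a stressed vowel is written as S in the input string (a notation introduced here only for the example), each segment gets a dummy duration, and the anacrusis before the first stressed vowel is dropped; the function name is hypothetical.

```python
def interstress_intervals(segments):
    """Group the segments of one IPS into interstress intervals (feet).

    `segments` is a list of (label, duration, stressed) triples. Each foot
    starts at a stressed vowel; material before the first stressed vowel
    (the anacrusis) is discarded.
    """
    feet, current = [], None
    for label, duration, stressed in segments:
        if label == "V" and stressed:
            if current is not None:
                feet.append(current)
            current = []
        if current is not None:
            current.append((label, duration))
    if current is not None:
        feet.append(current)
    return feet

# The IPS from the text; S marks a stressed vowel (hypothetical notation).
ips = "CCVCCCS" "CCVCVVCCS" "CCCS" "CCV"
segments = [("V" if ch in "VS" else "C", 1.0, ch == "S") for ch in ips]
feet = interstress_intervals(segments)
patterns = ["".join(label for label, _ in foot) for foot in feet]
print(patterns)
```

The duration of each foot is then the sum of the durations of its segments, which is the input to the Foot-PVI measure.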

Pseudo-Syllabification

The pseudo-syllable units were introduced in [Farinas & Pellegrino, 2001], and are derived from the most common syllable type in the world’s languages: the CV structure. A pseudo-syllable has the form CnV, where n can be any positive integer or zero (thus V can be a syllable too). Consecutive vowel segments are merged, and the stranded consonantal intervals in pre-pausal position are discarded, in order to comply with the CnV type of syllable structure (no coda). A string such as CCVCCCVVCVCC is segmented as CCV.CCCV.V.CV, no matter where the original word boundaries and linguistic syllable boundaries are.


segmenting speech into traditional phonological syllables is a highly language-dependent task, and is therefore not easy to automate when carrying out cross-linguistic studies. In contrast, pseudo-syllabification can be implemented quickly, with only the few rules above to take into account.

5.3

Metrics and Statistics on Corpora

This section provides a detailed description of which analysis methods are applied on which corpora.

Story Data - Hungarian, Bulgarian, German

The first experimental setup aims to investigate and compare the metric scores on the Hungarian, Bulgarian and German story data.

1. durations of vocalic and consonantal intervals are calculated

2. durations of interstress intervals (feet) and of pseudo-syllables are calculated

3. durational metrics V%, ∆V and ∆C, VarcoV and VarcoC, rPVI-C and nPVI-V within inter-pause stretches (IPS) are calculated

4. ANOVA on metrics, and MANOVA on metric pairs (%V and ∆C, VarcoV and VarcoC, rPVI-C and nPVI-V, %V and VarcoC), with language as an independent factor

5. normalized PVI on durational pseudo-syllable and foot

6. ANOVA on

7. TukeyHSD post hoc test on

8. normalized intensity PVI (on vowels only)

9. intensity and durational differences in full and reduced vowel sets

10. F1/F2 comparison in full and reduced vowel sets (dispersion from centroid)

Hungarian Sentences

The second experimental setup aims to measure the stability of metrics on Hungarian sentences with different focus conditions. Since

1. durations of vocalic and consonantal intervals are calculated

2. durational rhythm measures (V%, ∆C, VarcoV, VarcoC, rPVI-C and nPVI-V)


L2 Dataset - German, Bulgarian

1. taking the durational metrics (V%, ∆C, VarcoV, VarcoC, rPVI-C and nPVI-V) on each subcorpus

2. comparing the results by means of a non-parametric version of ANOVA, to see if, and to what extent, the GermanBulgarian corpus is different from the BulgarianBulgarian and GermanGerman corpora


Chapter 6

Results

6.1

The Dauer-Model

The output of the Dauer model is summarized in Table 6.1. The main phonological properties of the languages (duration, syllable complexity, vowel length contrast, vowel reduction and lexical stress) are described in Chapter 3. The score assignments for German and Hungarian are justified there and below. The score assignments for English are from [Dauer, 1987]. Assignments for Bulgarian are informed by [Dimitrova, 1997], but not in full agreement with it (as justified in Section 3.2).

The final scores are normalized for the number of components that apply for a given language: score / nr. of (active) components. The scores can range from -1 (as the prototypical syllable-timed language) to +1 (as the prototypical stress-timed language).
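The normalization can be sketched as follows, assuming that plus, zero and minus are coded as +1, 0 and -1 and that components marked NA are excluded from the count. The function name and the numeric coding are assumptions; the component values are the ones assigned to Hungarian and English in this section.

```python
def dauer_score(components):
    """Sum of component scores divided by the number of applicable components.

    Scores are +1 (plus), 0 (zero) or -1 (minus); None marks a component
    that does not apply (NA) and is left out of the normalization.
    """
    active = [score for score in components.values() if score is not None]
    return sum(active) / len(active)

hungarian = {
    "Duration": 0, "Syllable structure": 0, "Quantity": -1, "Intonation": 1,
    "Tone": None, "Vowels": 1, "Consonants": 1, "Function of stress": 0,
}
english = {
    "Duration": 1, "Syllable structure": 1, "Quantity": None, "Intonation": 1,
    "Tone": None, "Vowels": 1, "Consonants": 1, "Function of stress": 1,
}
print(dauer_score(hungarian), dauer_score(english))
```

For Hungarian this gives 2/7 and for English 6/6, the values reported for these two languages.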

Duration

German stressed vowels are 1.52 times longer than unstressed ones - score: plus.

Bulgarian stressed vowels are 1.55 times longer than unstressed ones - score: plus.

Hungarian stressed vowels are 1.06 times longer than unstressed ones - score: zero.

For discussion on Bulgarian see Section 3.2

Syllable structure

Bulgarian syllable structure allows complexity, but the majority of syllables are CV or CVC - score: zero.

German syllable structure is complex - score: plus.

Hungarian syllable structure allows complexity, but the majority of syllables are CV, CVC or VC - score: zero.


Quantity

In Bulgarian quantity distinctions of vowels do not exist - score: NA.

In German quantity distinctions are only allowed in stressed syllables - score: plus.

In Hungarian quantity distinctions are permitted in both stressed and unstressed syllables - score: minus.

Intonation

In Bulgarian in the large majority of cases accent and pitch correlate - score: plus.

In German stressed syllables and intonation contour are highly interrelated - score: plus.

In Hungarian stressed syllables and intonation contour are highly interrelated, as shown in Figure 6.1 - score: plus.

Tone

The phonological use of pitch is restricted in Bulgarian, German and Hungarian to the level of the phrase and the sentence; tone does not differentiate units at the word or syllable level. The ‘Tone’ component is therefore not applicable to any of the languages in the investigation.

Vowels

Hungarian: centralization in unstressed syllables - score: plus.

Bulgarian: raising in unstressed syllables - score: zero.

German: centralization in unstressed syllables - score: plus.

For discussion see Section 2.2.

Consonants

Bulgarian consonantal allophones are determined by the phonological context rather than stress - score: minus

German consonants are modified in unstressed syllables - score: plus

Hungarian consonants are modified in unstressed syllables, e.g. voiced plosives become approximants - score: plus.

Function of stress

Lexical stress placement for the languages: Bulgarian: free - score: plus.


Language             English              German                           Bulgarian      Hungarian
Duration             +                    1.52 (+)                         1.55 (+)       1.06 (0)
Syllable structure   complex (+)          complex (+)                      moderate (0)   moderate (0)
Quantity             NA                   only in stressed positions (+)   NA             in all syllables (-)
Intonation           +                    +                                +              +
Tone                 NA                   NA                               NA             NA
Vowels               centralization (+)   centralization (+)               raising (0)    centralization (+)
Consonants           +                    +                                -              +
Function of Stress   free (+)             free (+)                         free (+)       fixed (0)
Assessment           6/6 = 1              6/7 = 0.85                       1/6 = 0.16     2/7 = 0.28

Table 6.1: Comparison of English, German, Bulgarian and Hungarian by the Dauer criteria. The zero for the syllable structure component was introduced in [Dimitrova, 1997].

Figure 6.1: Illustration of the correlation between intonation and stress in Hungarian. The blue line represents the pitch contour of this small segment of speech, showing that the F0 value is the highest at the first stressed syllable.

The final scores order the languages from the syllable-timing pole (-1) to the stress-timing pole (+1) as follows: Bulgarian (0.14), Hungarian (0.28), German (0.85), English (1).

6.2

Story Data - Durational Metrics


Language         %V            %V2           ∆V            ∆IV           VarcoV        VarcoC        VnPVI         rCPVI
English*         -             44            -             54            50            51            59            63
English**        -             41.1          -             56.7          -             -             57.2          64.1
English***       -             40.1          -             53.5          -             -             -             -
Estonian**       -             44.5          -             31.9          -             -             45.4          40.0
German*          -             38            -             62            51            53            54            66
German**         -             46.4          -             52.6          -             -             59.7          55.3
Greek*           -             47            -             37            56            46            52            47
Greek**          -             44.1          -             52.7          -             -             48.7          59.6
Italian*         -             50            -             43            53            49            46            49
Italian***       -             45.2          -             48.1          -             -             -             -
Japanese**       -             45.5          -             55.8          -             -             40.9          62.5
Japanese***      -             53.1          -             35.6          -             -             -             -
Spanish*         -             49            -             45            47            47            47            54
Spanish**        -             50.8          -             47.5          -             -             29.7          57.7
Spanish***       -             43.8          -             47.4          -             -             -             -
Hungarian-Story  37.48 (0.45)  50.84 (0.23)  23.05 (0.34)  29.68 (0.39)  23.51 (0.27)  33.49 (0.26)  37.98 (0.44)  50.45 (0.73)
Bulgarian-Story  28.22 (0.77)  44.55 (0.41)  22.27 (0.68)  29.28 (0.61)  28.07 (0.50)  30.65 (0.42)  45.24 (0.76)  49.68 (1.13)
German-Story     27.81 (0.28)  42.03 (0.23)  31.79 (0.41)  44.61 (0.54)  34.40 (0.30)  37.82 (0.21)  54.07 (0.47)  70.63 (0.92)

Table 6.2: Mean metric scores of the classical measurements and SE in brackets for Hungarian, Bulgarian and German story data. The other measurements are from *: [Arvaniti, 2012], **: [Grabe & Low, 2002], and ***: [F. Ramus, 1999].

The illustration of the (negative and positive) correlation of the metric pairs (%V and ∆C, VnPVI and CrPVI, VarcoV and VarcoC), is in Figure 6.2. The exact values of the scores are summarized in Table 6.2.


[Figure: three panels titled "∆C and Vratio", "VnPVI and CrPVI" and "VarcoV and VarcoC", each plotting the score (y-axis) of the two metrics for bg, de and hu.]

Figure 6.2: Mean score values and SE for the traditional metric pairs (%V2 and ∆C, VnPVI and CrPVI, VarcoV and VarcoC). Bulgarian (bg), German (de) and Hungarian (hu).

In Figure 6.3 the languages are placed in a rhythm space defined by vocalic and consonantal metric pairs: %V2 and ∆C, VnPVI and CrPVI, VarcoV and VarcoC, %V2 and VarcoC. For more on the motivation behind this choice of representation see Section 2.6.

We can see that Hungarian has the highest vowel ratio (A and D), but at the same time the duration of the vowels is the most invariant in Hungarian among the three languages. This might be due to the normalization applied in the normalized vocalic PVI (nPVI-V).

[Figure: four rhythm-space panels showing BG, HU and DE. A: %V against ∆C; B: nPVI-V against rPVI-C; C: VarcoV against VarcoC; D: %V against VarcoC.]

[Figure: four rhythm-space panels showing BG, HU and DE together with values from the literature (EN, SP, JP, DE; * and ** mark the source). A: %V2 against ∆C; B: nPVI-V against rPVI-C; C: VarcoV against VarcoC; D: %V2 against VarcoC.]

ANOVA

Since the data is normally distributed, analysis of variance (ANOVA) is applicable to see how the independent language factor, with three levels: Bulgarian, German and Hungarian, affects the metric scores. All metrics showed a statistically significant main effect for language. Therefore TukeyHSD post hoc tests were also run, to see pairwise differences.

%V

F(2,323)=173.3, p < 0.0001. The TukeyHSD post hoc test reveals that Hungarian is significantly higher than Bulgarian and German at a p < 0.0001 level. There is no significant difference between German and Bulgarian.

∆C

F(2,323)=290, p < 0.0001. The TukeyHSD post hoc test reveals that German is significantly higher than Bulgarian and Hungarian at a p < 0.0001 level. There is no significant difference between Hungarian and Bulgarian.

nPVI-V

F(2,323)=293.2, p < 0.0001. The TukeyHSD post hoc test reveals that German is significantly higher than Bulgarian and Hungarian at a p < 0.0001 level. Hungarian is significantly lower than Bulgarian (p < 0.0001).

rPVI-C

F(2,323)=173.9, p < 0.0001. The TukeyHSD post hoc test reveals that German is significantly higher than Bulgarian and Hungarian at a p < 0.0001 level. There is no significant difference between Hungarian and Bulgarian.

VarcoV

F(2,323)=332.7, p < 0.0001. The TukeyHSD post hoc test reveals that German is significantly higher than Bulgarian and Hungarian, and that Bulgarian is significantly higher than Hungarian at a p < 0.0001 level.

VarcoC


Metric    Order of Languages
%V        Hungarian > Bulgarian, German
VnPVI     German > Bulgarian > Hungarian
VarcoV    German > Bulgarian > Hungarian
∆C        German > Bulgarian, Hungarian
CrPVI     German > Bulgarian, Hungarian
VarcoC    German > Hungarian > Bulgarian

Table 6.3: The results of pairwise comparisons between languages for vocalic and consonantal metrics. The significant difference between languages is marked with the ‘greater than’ symbol. With the exception of %V, it holds for all metric scores that the higher values indicate stress-timing

Table 6.3 summarizes which language pairs are significantly different from each other by metric type. Except for %V, the higher the score, the more likely the language is to be stress-timed. We can see that the scores for the consonantal metrics are not entirely consistent. The vocalic metrics show an expected pattern: Hungarian has the least variability and the highest proportion of vowel duration, followed by Bulgarian and then German. Although Bulgarian is closer to syllable-timed languages, its %V is likely to be lower because of the small size of the vowel inventory. This is a weakness of the vowel ratio metric (discussed in detail in Section 2.4): it is affected by phonological factors, such as the size of the vowel inventory, that are not directly correlated with rhythm. The results for the other vocalic metrics are consistent, ordering German as the closest to stress-timing, Hungarian as the closest to syllable-timing, and Bulgarian in an intermediate position. The low variability of the vocalic intervals in Hungarian may, however, be an effect of normalizing away the vowel length contrast, which (unlike in German) is preserved in unstressed syllables as well.

MANOVA

A multivariate analysis of variance (MANOVA) was run for each metric pair, with the following significant results:

1. for %V-∆C, F(2,323)=99.39, p < 0.0001

2. for VnPVI-CrPVI, F(2,323)=46.38, p < 0.0001

3. for VarcoV-VarcoC, F(2,323)=67.19, p < 0.0001

4. for %V-VarcoC, F(2,323)=107.83, p < 0.0001

All metric pairs show a significant effect of language.

Effect size


%V ∆C nPVI-V rPVI-C VarcoV VarcoC

ANOVA - Language 0.51 0.64 0.64 0.51 0.67 0.49

%V-∆C VnPVI-CrPVI VarcoV-VarcoC %V-VarcoC

MANOVA - Language 0.38 0.22 0.29 0.40

Table 6.4: Partial eta squared for language as a factor (independent variable), for single metrics (ANOVA) and metric pairs (MANOVA).

The effect size varies somewhat across the metrics: it is largest for VarcoV, vocalic nPVI and ∆C.

As for the metric pairs, the language effect is weakest for the PVI pair, and the effects for all pairs are generally low compared to those for the single metrics.
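For the single metrics, partial eta squared can be recovered from an F statistic and its degrees of freedom. The sketch below assumes the usual univariate relation, partial eta squared = F × df_effect / (F × df_effect + df_error); the thesis does not state how the values in Table 6.4 were computed, so this is an assumption, and the function name `partial_eta_squared` is hypothetical. The relation is not applied to the MANOVA rows, where the multivariate test statistic is handled differently.

```python
def partial_eta_squared(f_stat, df_effect, df_error):
    """Partial eta squared from a univariate F statistic (assumed relation)."""
    return (f_stat * df_effect) / (f_stat * df_effect + df_error)


# F statistic reported for VarcoV in the text: F(2,323) = 332.7
eta = partial_eta_squared(332.7, 2, 323)
print(round(eta, 2))  # 0.67
```

Under this assumption, the value obtained for VarcoV agrees with the 0.67 listed in Table 6.4.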

Euclidean distance

Another way to quantify the magnitude of the differences between languages in the space defined by a metric pair is the Euclidean distance. The distance between two two-dimensional vectors p and q is calculated as follows:

$$d(p, q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2} \qquad (5)$$

Table 6.5 shows the distances between all languages. The coordinates for the individual languages are defined by the metric pair scores.

%V-∆C VnPVI-CrPVI VarcoV-VarcoC %V-VarcoC

Hungarian-Bulgarian 9.26 7.29 5.36 9.68

Hungarian-German 17.78 25.80 11.71 10.59

Bulgarian-German 15.33 22.73 9.56 7.17

Table 6.5: Euclidean distances (Euclidean norm) between languages, per metric pairs.

The results indicate that the distance between Hungarian and German is larger than between Hungarian and Bulgarian in the spaces defined by the metric pairs: %V-∆C, VnPVI-CrPVI and VarcoV-VarcoC. The distance between Bulgarian and German is smaller than between Hungarian and German. The exception is the %V-VarcoC pair, where the distances between the three languages are the smallest and nearly equal. The Euclidean distances show more or less consistent behavior in our data for the combined vocalic and consonantal metrics.
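The distance computation used for Table 6.5 can be sketched as follows, treating each language as a point whose two coordinates are its scores on a metric pair. The coordinates below are hypothetical placeholders, not the scores underlying Table 6.5, and the function name `euclidean_distance` is an assumption made for illustration.

```python
import math
from itertools import combinations


def euclidean_distance(p, q):
    """Euclidean distance between two two-dimensional points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


# Hypothetical (vocalic score, consonantal score) coordinates per language
points = {
    "Hungarian": (40.0, 50.0),
    "Bulgarian": (45.0, 55.0),
    "German": (55.0, 65.0),
}

# Pairwise distances between all languages
for (name_a, a), (name_b, b) in combinations(points.items(), 2):
    print(f"{name_a}-{name_b}: {euclidean_distance(a, b):.2f}")
```

Note that the distance is sensitive to the scale of each metric, so distances are only comparable within the same metric pair.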

6.3 Story Data - Durational PVI Measures

reference, the scores are taken from [Asu & Nolan, 2006]. The Estonian corpus contained recordings of 5 female informants reading a story.

The normalized version of the PVI is computed over interstress intervals (nFootPVI), pseudo-syllables (nPVI-CV), vocalic intervals (VnPVI) and consonantal intervals (CnPVI).
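As a point of reference, the normalized PVI is commonly defined as the mean absolute difference between successive interval durations, each difference divided by the mean of the two durations, multiplied by 100. The sketch below follows that common definition; whether the scores in this section were computed in exactly this way is an assumption, and the function name `npvi` and the example durations are hypothetical.

```python
def npvi(durations):
    """Normalized Pairwise Variability Index over successive durations."""
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two intervals")
    # Each term is the absolute difference of a successive pair,
    # normalized by the mean duration of that pair
    terms = [
        abs(a - b) / ((a + b) / 2)
        for a, b in zip(durations, durations[1:])
    ]
    return 100 * sum(terms) / len(terms)


# Hypothetical vocalic interval durations in milliseconds
example_durations = [120, 80, 150, 90]
print(f"nPVI = {npvi(example_durations):.2f}")
```

The same function can be applied to any of the interval types named above (feet, pseudo-syllables, vocalic or consonantal intervals), since only the segmentation of the input differs.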

Metric    Hungarian Mean (SE)    Estonian Mean (SE)    Bulgarian Mean (SE)    German Mean (SE)

nPVI-CV    42.88 (0.55)    37.5 (n/a)    49.90 (0.66)    53.01 (0.49)

nFootPVI    36.14 (0.69)    33.5 (n/a)    24.82 (1.04)    24.81 (1.29)

VnPVI    37.98 (0.44)    48.3 (n/a)    43.33 (0.76)    54.21 (0.67)

CnPVI    59.65 (0.56)    52.0 (n/a)    52.99 (0.88)    60.14 (0.52)

Table 6.6: Mean and SE of the PVIs. The Estonian scores are taken from [Asu & Nolan, 2006], with which our results are comparable (except for the syllable-PVI values). The SE values for Estonian were not available.

In Figure 6.3 the PVI scores are represented in histograms (the same data as in Table 6.6). The pseudo-syllable durations in Bulgarian and German are more variable than in Hungarian and Estonian, whereas the variability of the feet is lower in Bulgarian and German. These results place Hungarian (and Estonian) closer to the syllable-timed languages, and Bulgarian and German closer to the stress-timed ones.

[Figure 6.3: PVI scores (nPVI-CV, nFootPVI, VnPVI, CnPVI) for German (de), Hungarian (hu), Estonian (et) and Bulgarian (bg); value axis from 0 to 70.]

[Figure 6.6 panels: Bulgarian, Hungarian and German; axes: Pseudo-Syllable nPVI and Foot nPVI.]

Figure 6.6: Plot of the two-dimensional rhythm concept for Bulgarian, Hungarian and German, which comprises the normalized syllable (here pseudo-syllable) and foot PVI values. The red square symbols represent the means of each language.

nPVI-CV
