
Musicality as a Component of Engaging Speech


Abigail M. Golec
University of Amsterdam

Table of Contents

Abstract
1. Introduction
   1.1 Overview
   1.2 Background
   1.3 Basis for the Current Study
   1.4 Related Studies
   1.5 Motivation
2. Methods
   2.1 Overview
   2.2 Recording of Voiced Speech
   2.3 Recording Procedure
   2.4 Data Analysis Procedure
3. Results
   3.1 Overview
   3.2 Descriptive Statistics
   3.3 Matching Formant Ratios to Musical Intervals
   3.4 Effects of Intonation
4. Discussion
   4.1 Overview
   4.2 Musical Interval Ratio Matching and Chi-Square Test
   4.3 Evaluation of Intonation Effects
   4.4 Implications for Further Research
   4.5 Study Design
Conclusion


Abstract

Do engaging public speakers have more musical qualities in their voice when delivering a speech? A sizable body of research examines the links between music and language, and supports the claim that humans can comprehend vast amounts of information from the acoustic signals of both music and speech. Various research has been done on the intricacies of emotional arousal in music, as well as on emotional speech processing. However, very little research exists at the intersection of these subjects, even though they are arguably closely related. The current study aims to expand upon and bring together these closely related research streams. It draws upon the seminal research of Bowling (2010), which supports a connection between the formant frequency ratios of emotional speech and the interval ratios of traditional Western music, as well as the research on the acoustic-prosodic characteristics of charismatic speech conducted by Niebuhr (2016), in order to facilitate a more nuanced discussion of the inherent musical qualities of speech, and of performative speech in particular. Through the collection and analysis of recordings of various types of emotional speech, the current study explores and supports the hypothesis that intentionally engaging, performative speech elicits more musical qualities than neutral speech.

1. Introduction

1.1 Overview

Much like a well-delivered speech, music moves us. It captures our attention and has the power to induce a multitude of responses from us, be they emotional, physiological, or something else entirely. Scientists have researched the components of music that cause such reactions, and have identified a variety of measures that influence human responses to music (Rozin 2008; Elika 2013). However, what is it about a speech that evokes responses from us? Is it just rhetoric and language comprehension, or does the power to evoke a response also lie in the speaker's vocal intonation and other acoustic-prosodic qualities?

Most people are familiar with the experience of sitting through a boring speech. The speaker may use incredibly eloquent language, but if their voice is lifeless and dull, they lose the attention of their audience. Perhaps in some cases the rhetoric of the speaker even ceases to matter, as long as their voice is able to capture attention. Public speaking is, of course, a more performative mode of communication than speaking in a neutral reading tone, but what exactly changes in the voice when a speaker switches from a neutral voice to a performative voice? Is the resulting performative speech more musical than speech produced in a more spontaneous manner? The intersection of music and language in this respect offers a thought-provoking and productive area for cognitive research.

1.2 Background

The approaches used to evaluate and compare the acoustic properties of music and speech are by no means universal, and can often be subjective and inconsistent. This is especially the case when reviewing research conducted in separate subject domains, such as speech acoustemology and cognitive musicology. The two main papers motivating this study, Bowling (2010) and Niebuhr (2016), are examples of this. With this in mind, some definitions must be established before the current study is expounded upon.

I will begin by delineating definitions used predominantly in the field of music studies. Although there are a number of different ways to define musicality, the definition of Honing (2018) best suits the present study. Honing (2018) proposes that musicality refers to a disposition towards musical behaviors, defined by the development of a variety of abilities that allow for particular interactions with specific types of acoustic signals, constrained by cognitive ability and underlying biology. The ability to recognize a melody is one example of human musicality. Tonality refers to aesthetic modes of tone patterns in music theory; the major scale of the classical Western music tradition is one such example. An interval is the frequency difference between two tones. Pitch refers to a tone's frequency.

I will continue now with definitions used when discussing components of speech. Speech prosody refers to the properties of larger segments of speech, including elements such as intonation, stress, tone, and rhythm. Speech intonation refers to the variance in tone, or vocal pitch, while speaking. Speech rhythm refers to the variation in stress, or vocal accent, while speaking. Speech intensity refers to the loudness of speech. Dynamic speech is speech that varies in speech intensity. A formant, or formant frequency, is a prominent frequency band present in a voiced vowel syllable, which determines phonetic quality. Multiple formants are present in vowel sounds, and these formants help listeners distinguish one vowel sound from another.
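To illustrate with approximate, typical adult values (not figures from this study): the vowel /i/ as in "beat" tends to have a low first formant (roughly 300 Hz) and a high second formant (roughly 2300 Hz), while /ɑ/ as in "father" tends to have a high first formant (roughly 700 Hz) and a low second formant (roughly 1100 Hz). It is largely these formant differences that allow listeners to tell the two vowels apart.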

Now I will continue with the definitions more unique to the context of this study. Performative Speech refers to engaging and charismatic speech, often characterized by its delivery to, and in the presence of, an audience. Charismatic speech itself is characteristically problematic to define, as charisma is a subjective quality. However, speech of this nature has been defined in a number of other studies, and through a synthesis of these definitions I define charismatic speech, and by extension Performative Speech, as speech which aims to appeal to an audience in the hopes of emotional arousal, inspiration, and impassioned belief and understanding (Rosenberg & Hirschberg 2009; Signorello 2012; Niebuhr 2016). Performative Speech also confers a level of pathos, benevolence, competence, dominance, and aesthetic emotional arousal effects on the speaker (Signorello 2012). It has also been observed that Performative Speech, in one particular instance, tended to have heightened pitch variance, a higher fundamental frequency, a varying speech rate, and varying vocal intensity (Niebuhr 2016).

In addition to Performative Speech, the current study defines certain intonations in the following ways: Neutral Speech refers to speech spoken in a neutral reading tone; Excited Speech refers to speech spoken with positively heightened valence; and Subdued Speech refers to speech spoken with negatively repressed valence (Bowling 2010). These other intonations are used in the comparative analysis in order to isolate the characteristics that distinguish Performative Speech.

1.3 Basis for the Current Study

The current study seeks to support a connection between Performative Speech and human musicality. The study examines the differences in the acoustic-prosodic characteristics of certain samples of text spoken in a number of specific intonations, in order to ascertain and support the defining characteristics of Performative Speech. These findings will be connected to musical qualities, and supported by other literature, in an attempt to bridge the gap between music and language studies.

Past research has established that music and language are connected: through neurological processes, through functional behavioral research, through psychoacoustic perception (Besson 1999; Alcorta 2008; Deutsch 2008; Large 2016; Bhatara 2013; Ettlinger 2011; Roncoglia-Dennison 2016; Tierney 2018), and as comparatively similar subject domains (Weninger et al. 2013; Levine 1978; Tsai 2018; Lima 2011; Ross 2007; Weidema & Honing 2016; Falk 2014). Even at the level of definition, both domains are rule-based systems which require specific knowledge to understand, and both express themselves through acoustic and/or textual material as sequential events in time (Besson 1999). Although music and language remain distinct areas of study, studying the connection between the two fields furthers the understanding of the intricacies implicated in human processing of acoustic information (Bowling 2013; Tierney, Patel, & Breen 2018; Tsang 2017; Ravignani & Honing 2017). Nevertheless, there are relatively few published studies examining vocal intonation in Performative Speech. Traditionally, Performative Speech is analyzed through the evaluation of rhetoric and linguistic choices (Biria & Mohammadi 2012), but some studies indicate that investigating acoustic qualities of the voice, such as speech intonation, speech dynamic, and speech rhythm, offers insight into the inner workings of emotional processing and the comprehension of the spoken word (Faber 2014; Filippi et al. 2016; Waaramaa & Leisiö 2013; Fritz & Koelsch 2008; Filippi et al. 2017; Coutinho & Dibben 2013). These insights are useful in a variety of ways; for example, they can be used explicitly to support teaching methods for public speaking (Cohen 2010). Pertinent findings related to the current study are examined in the following section.

1.4 Related Studies

There are a number of studies which have examined the relationship between music and language, and between music and speech in particular. In the following section I detail work related to the current study. First, I start with work that supports a neurological connection between music and language, then move on to work that supports a functional connection, and then detail work which examines the perceptual connections of music and language. After this, I present some past studies which examined the acoustic qualities of Performative Speech. Lastly, I go into more depth on the two particular studies that directly motivate the current study.

Music and language are connected neurologically, through similar processing mechanisms (Besson 1999; Janata 2003; Groussard et al. 2010; Large 2016; Zhao 2016). By utilizing neuroimaging technologies such as fMRI, EEG, and MEG, research has deepened the understanding of human processing mechanisms for music and language, and examined how the underlying neural networks overlap. Groussard et al. (2010) used fMRI to explicitly show the areas of the brain that are active during music-related tasks as well as language-related tasks. Through conjunction analysis of their results, they found overlap in the networks active in these processes. Besson (1999) used electroencephalography (EEG) to analyze the brain's activity while participants focused on language-based or musical material. The results showed that semantic processing consistently occurred before harmonic processing. This indicates a similarity in the processing of music and language-based material, and also suggests that the brain perceives music as semantic material.

Music and language are also related functionally (Cooper 1990; Juslin 2008; Podlubny 2016; Ettlinger 2011; Tsang 2017). For example, studies of the attention-holding properties of musical speech versus monotonous speech illustrate a human preference for musical speech (Cooper 1990; Corbeil 2013; Tsang 2017). Cooper (1990) examined the behavioral preferences of young infants in their attending to infant-directed speech (IDS) over adult-directed speech (ADS). IDS is speech directed towards an infant or child that is pronounced with higher energy and more melodicity than ADS. The attention task was carried out using a visual fixation-based auditory preference procedure, examining the length of time an infant would attend to a visual stimulus that produced IDS or ADS, respectively. The infants looked at the stimulus producing IDS for a significantly longer amount of time, suggesting a preference for the more musical speech. Tsang (2017) expanded on this research, finding that infants prefer infant-directed song over infant-directed speech. This functional research allows for the analysis of human preferential behavior towards music and different types of speech. Although this attention research has focused predominantly on the behavior of infants, extending it to include adults may offer interesting insights.

Music and language are connected psychoacoustically (Deutsch 2008; Gordon et al. 2010; Cummins 2012; Kempe et al. 2015; Boll-Avetisyan 2017; Tierney, Patel, & Breen 2018; Tierney 2018; Simpson 2008; Slater 2015; Arnoud 2018; Juslin 2001; Coutinho & Dibben 2013). Perhaps the most widely known example of research supporting this claim is the speech-to-song illusion from Deutsch's 2008 study. In this study, certain segments of speech were presented to listeners and consecutively repeated a number of times. After listening, participants indicated whether their perception of these segments transformed from speech to song. The results showed that a significant number of participants experienced this transformation for particular segments of text. This study offers insight into the significance of repetition in musical perception, and furthers understanding of human pitch perception in speech. Following the release of this study, further research on this phenomenon has shed light on a listener's implicit ability to rapidly determine musical structure in speech via the slight manipulation of particular acoustic features of the transforming segments (Tierney, Patel, & Breen 2018; Tierney 2018).

The super-expressive voice theory (Juslin 2001) pertains to human perception of music and voice, and essentially postulates that humans perceive melodic lines as super-expressive voices. This is based on the knowledge that the frequency ranges of many instruments roughly overlap with that of the human voice, as well as on the theory that musical pitch and speech pitch are processed by the brain in similar ways. Emotion is thought to be evoked in a listener as they process the combination of pitches and intervals produced by the voice. Musical instruments mimic the emotional quality of the voice, and this is thought to be one reason for human emotional arousal while listening to instrumental music. Although this is a novel theory, it has received critical responses suggesting a need for further research into the topic (Simpson 2008).

Research evaluating the interconnectedness of music and language has also been conducted regarding rhythm perception in the voice. Boll-Avetisyan (2017) tested native German speakers’ abilities to group language rhythmically, predicting that participants with musical experience would be better at the task. The results support the hypothesis, showing that musical expertise can aid in language perception abilities.

Even closer to the subject at hand, the research by Coutinho & Dibben (2013) on the psychoacoustic features of emotional speech and music yielded seven characteristics for analysis: loudness, tempo/speech rate, melody/prosody contour, spectral centroid, spectral flux, sharpness, and roughness. In this study, participants rated musical and vocal speech samples on a two-dimensional grid of valence and arousal. The stimuli presented varied within the seven psychoacoustic parameters. Another part of this research involved computational modeling, which supported the behavioral results, indicating that modulation of these seven psychoacoustic features of auditory stimuli influences emotional perception.

Various studies have also evaluated the emotional connotations in speech prosody (D'Errico 2013; Bowling 2010; Niebuhr 2016; Goerlich 2011; Han et al. 2011; Bowling 2018; Faber 2014; Bowling 2012; Okada 2012; Signorello 2012; Signorello 2013; Paulmann 2013; Yanushevskaya 2013; Rosenberg & Hirschberg 2009; Ross 2007). Two studies in particular stand out from this group, Bowling (2010) and Niebuhr (2016), on which the current study is modeled.

The studies by Bowling relate musical tonality to acoustic vocal characteristics in order to establish a theory of a vocal basis for human musicality (Bowling 2010; Bowling 2012; Bowling 2018), and are supported by the findings of other research (Ross 2007; Filippi et al. 2017; Faber 2014). Bowling (2010) compares samples of excited and subdued speech to musical melodies in the major and minor tonalities. This was done through the collection of a number of speech samples that included a single word condition as well as a monologue condition. The spectral comparisons were based on fundamental frequency and frequency ratios, two important acoustic features which play critical roles in the perception of voice and tone quality. In speech, the fundamental frequency can communicate information such as the sex, age, and emotional intonation of the speaker, while the frequency ratio between the first and second formants differentiates vowel sounds, allowing for speech comprehension. In music, the fundamental frequency conveys the melody, and the frequency ratios between notes convey the music's tonality.
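To make the ratio comparison concrete, consider an illustrative vowel (hypothetical values, not taken from Bowling's data) with a mean first formant of 500 Hz and a mean second formant of 1500 Hz. Its formant ratio is 1500/500 = 3.0, which, collapsed into a single octave (the procedure described in Section 2.4), becomes 3/2, the frequency ratio of the just-intonation perfect fifth.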

The voiced vowel data from the speech samples was extracted and compared with musical data taken from a database of traditional Western major and minor melodies and interval ratios. The results showed a parallel between excited speech and major melodies, and between subdued speech and minor melodies, as the formant ratios observed in emotional speech matched the interval ratios associated with the major and minor tonalities. Bowling's research suggests that the development of major and minor tonality may be rooted in emotional voice tone. This particular study was chosen as a model for the current study because of its direct relevance to emotional voice and musicality, in addition to its promising results, which have yet to be replicated by another researcher.


The other significant research motivating the current study comes from Niebuhr (2016). This study specifically analyzes the qualities of performative speech, using the speeches of Steve Jobs as a case study to define charismatic speech. The authors use conventions established in previous research (D'Errico 2013; Rosenberg & Hirschberg 2009; Signorello 2012; Signorello 2013) to evaluate Jobs' intonation, loudness, speaking rate, and fluency. Comparing Jobs' speech to a reference sample, they found that Jobs spoke with an exceptionally high pitch level and utilized a large pitch range. He also spoke with varying degrees of loudness, although his loudness range was generally within the reference range. His syllable lengths were much shorter than those of the reference sample, and he spoke fairly fast, at variable rates, although this did not stand out to the same extent as the comparisons in pitch and loudness.

The results of Niebuhr (2016) indicate that it is possible to qualify performative speech in comparatively quantitative ways. The results of Bowling (2010) indicate that musical elements of tonality may be present in emotional voices. By synthesizing these results, the current study aims to support the existence of musical elements in performative voice, and further establish the implications of the inherent musical qualities of the voice.

1.5 Motivation

The current study combines parts of the approaches of Bowling (2010) and Niebuhr (2016) in order to evaluate certain musical qualities of the voice. Similar to Bowling (2010), the current study follows a particular recording procedure in order to obtain analogous stimuli for analysis. The procedure involves the recording of words, statements, and monologues in various emotional intonations. By including excited, subdued, neutral, performative, and sung intonations, the current study evaluates the acoustical properties of each type of speech, and relates these acoustical properties to musical properties in order to better understand the defining features of each intonation.

Also similar to the study of Bowling (2010), the formant ratios of the voiced syllable data present in the recordings of the current study are analyzed for musical interval matching, and analyzed for effect per intonation. The results of Bowling (2010) suggest that the formant ratios of excited speech match with most intervals associated with major tonality, particularly the major third, and that formant ratios of subdued speech match with most intervals associated with minor tonality. The current study hopes to replicate these results regarding Excited and Subdued Speech, as well as observe the matching patterns of Neutral Speech, Performative Speech, and Sung Speech.

Similar to Niebuhr (2016), the current study evaluates recorded speech for a number of acoustical properties. These properties include Word Intensity, Word Duration, Syllable Duration, Range of Syllable Duration, Variance of Syllable Duration, Mean Pitch, Range of Mean Pitch, and Variance of Mean Pitch. All these properties play an important role in evaluating speech prosody, as they can help define a particular speech sample’s speech rhythm and melodicity. The evaluation of Word Intensity, Word Duration, Syllable Duration, Variance of Syllable Duration, and Range of Syllable Duration help define how loud each intonation is, the speaking rate of each intonation, and how much these speaking rates change during articulation. The evaluation of Mean Pitch, Range of Mean Pitch, and Variance of Mean Pitch help determine the frequencies these intonations are spoken at, and how much these frequencies change throughout articulation. The assessment of these properties together allows the current study to cross compare each intonation with each other in order to determine if there are specific acoustical properties that distinguish one intonation from another. By understanding what makes a particular intonation unique, one can gain a better understanding of speech in general, and the acoustical and musical properties pertaining to it.


2. Methods

2.1 Overview

The data for this research was collected from a number of recordings conducted by the researcher. The recordings contain examples of speech spoken with neutral, excited, subdued, and performative intonation, as well as one body of samples consisting of unaccompanied singing. These samples were obtained by recording single words, short fragment sentences, short lyrical samples, and monologues, each recorded multiple times with particular combinations of the intonations mentioned above. The extracted data was analyzed in Praat, an acoustic signal analysis program, and pertains only to the voiced vowel segments of the speech.

2.2 Recording of Voiced Speech

A total of ten participants (four male), ages 19-27, were involved in the recording process. All ten participants were native English speakers; nine were of American nationality, and one was of Canadian nationality. Participants were constrained to native American-English speakers in order to control for an effect of accent as much as possible. Although there is some variance in regional American accents, the selected participants did not speak with heavy regional accents, and most were from the same region: seven participants were from Connecticut, one from Iowa, one from Washington, D.C., and one from Toronto, Canada. Five of the participants had received training as actors in their university studies, and all participants were familiar with public speaking as a component of their studies and/or professions. The participants contributed to the study voluntarily and gave informed consent, according to the guidelines approved by the University of Amsterdam Ethics Committee.


The speech samples consisted of recordings of four separate conditions: a Single Word Condition, a Short Fragment Condition, a Song Condition, and a Monologue Condition. The short fragments and monologues were taken from Barack Obama's 2016 State of the Union address and Elizabeth Warren's 2018 Foreign Policy Speech at American University. These two speeches were selected because they are examples of politically charismatic speech delivered by a male and a female professional speaker, respectively, allowing for more enriching comparisons in the data analysis.

2.3 Recording Procedure

The recordings were conducted in two locations: Mansfield, Connecticut, United States of America, and Amsterdam, the Netherlands. Multiple recording locations were necessary due to the locations of the participants. In Mansfield, speech was recorded in a sound-attenuating chamber, using a Shure KSM32 condenser vocal microphone, through a Focusrite Scarlett 2i2 interface, into a computer running Adobe Audition. In Amsterdam, speech was recorded in a silent room, using an MXL V67G large-diaphragm cardioid condenser vocal microphone, through a Digidesign Mbox 2 interface, into a computer running Adobe Audition. The change in location and equipment had no significant effect on the quality of the recordings for the purpose of acoustic speech analysis.

The recording procedure began with a thorough explanation of the purpose of the data collection and matters pertaining to participant privacy, allowing the participant to give informed consent as stipulated by the University of Amsterdam Ethics Committee. The participant was then instructed to sit next to the microphone, which was adjusted to suit the participant's height. They were then allowed to put on headphones and adjust the volume level to their satisfaction. The microphone was then turned on and recording began. The participant was asked to state their name and what they studied, in order to test microphone levels.

Once the microphone volume level was adequately adjusted, the participant was instructed to read each single word seven times consecutively, per intonation. Participants repeated the words seven times so that the central five utterances could be analyzed, in order to ensure the absence of onset and/or offset effects. For the Single Word Condition, the text recorded was as follows:

1. World
2. Challenges
3. Science
4. Future
5. Democracy

These words were spoken with a neutral intonation, excited intonation, subdued intonation, and performative intonation. The subdued intonation was only included with the single word condition in order to reinforce the findings of Bowling (2010). These particular words were chosen because they recurred in other conditions, allowing for more cross comparison in the data analysis.

After the participant completed this section, they were given three short fragment sentences to read. They read each short fragment seven times consecutively per intonation. Participants repeated the fragments seven times so the central five utterances could be analyzed in order to ensure the absence of onset and/or offset effects. For the Short Fragment Condition, the text recorded was as follows:

1. A better politics doesn't mean we have to agree on everything.
2. Rules that work for all Americans, not just for wealthy elites.
3. The world exists beyond the next quarterly report.


These sentences were spoken with neutral intonation, excited intonation, and performative intonation. These particular sentences were taken from Barack Obama’s 2016 State of the Union Address (fragment 1), and Elizabeth Warren’s 2018 Foreign Policy Speech at American University (fragments 2 and 3).

After this section was recorded, participants were instructed to read the two longer monologues. Similar to the Short Fragment Condition, these were taken from Barack Obama's 2016 State of the Union Address (monologue 2) and Elizabeth Warren's 2018 Foreign Policy Speech at American University (monologue 1), and read with neutral intonation, excited intonation, and performative intonation. Participants repeated each monologue five times in each intonation, so that the central three utterances could be analyzed, in order to ensure the absence of onset and/or offset effects. The text recorded was as follows:

1. Fifty-five years ago, when President John F. Kennedy spoke right here at American University, he said that, “our problems are manmade–and therefore, they can be solved by man.” The same is true today. Okay, I’d add that they can also be solved by women, as well. But I believe in us. Americans are an adaptive, resilient people, and we have met hard challenges head on before. We can work together, as we have before, to strengthen democracy at home and abroad. We can build a foreign policy that works for all Americans, not just for the wealthy elites.

2. Sixty years ago, when the Russians beat us into space, we didn't deny Sputnik was up there. We didn't argue about the science, or shrink our research and development budget. We built a space program almost overnight, and twelve years later, we were walking on the moon. Now, that spirit of discovery is in our DNA. America is Thomas Edison and the Wright Brothers and George Washington Carver. America is Grace Hopper and Katherine Johnson and Sally Ride. America is every immigrant and entrepreneur from Boston to Austin to Silicon Valley racing to shape a better future. That's who we are, and over the past seven years, we've nurtured that spirit.

Similar to the short fragments, these longer monologues were taken from the speeches to allow for more cross comparison between participants and professional speakers. Unfortunately, due to time constraints regarding the scope of the research, the recordings from the Monologue Condition had to be excluded from the data analysis.

The final section of the recording procedure involved the Song Condition. The popular nursery rhyme "Row, Row, Row Your Boat" was chosen as the lyrical example, due to its universality among participants. The full lyric is as follows:

1. Row, row, row your boat, gently down the stream, merrily, merrily, merrily, merrily, life is but a dream.

All participants had previous knowledge of the tune, and were instructed to read the lyric in a neutral intonation, then in a performative intonation, and then to sing it, three times consecutively per intonation. Participants repeated the lyric three times so that the central utterance could be analyzed, in order to ensure the absence of onset and/or offset effects.

2.4 Data Analysis Procedure

Upon completion of the recording process, the researcher selected the best speech examples per condition per participant, sorted them into new sound files separated by condition, and created a database for the samples. Each sample in each condition was analyzed separately, and all samples were analyzed within similar parameters.

Each sample was imported into Praat (Boersma & Weenink 2019), an acoustic signal analysis program, and the boundaries of every voiced vowel sound were marked manually on a TextGrid within the program. Additional tiers marked the duration of the word or fragment as well as the particular intonation (i.e. Excited Speech, Subdued Speech, etc.) in which the word or fragment was spoken. A Praat script then extracted the following information per vowel sound: the identified vowel, its duration, the sample it came from, the intonation it was spoken or sung with, the median pitch, the mean pitch, five pitch readings spread equally across the duration of the voiced vowel, the average first formant, the average second formant, and five readings each of the first and second formants spread equally across the duration of the voiced vowel. Additionally, the duration of each single word and each fragment was noted, as well as the word and/or fragment's intensity.
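As an illustration of this extraction step, the sketch below uses Parselmouth, a Python interface to Praat. It is not the script used in the study (which was a native Praat script); the file name, vowel boundaries, and sampling scheme are hypothetical.

```python
import parselmouth

# Load one recording; the file name and the vowel boundaries below are
# illustrative, not the study's actual data.
snd = parselmouth.Sound("participant01_short_fragment.wav")
pitch = snd.to_pitch()
formants = snd.to_formant_burg()

def vowel_measures(t_start, t_end, n_points=5):
    """Sample pitch, F1, and F2 at n_points spread across a voiced vowel."""
    step = (t_end - t_start) / n_points
    times = [t_start + (i + 0.5) * step for i in range(n_points)]
    return {
        "duration": t_end - t_start,
        "pitch_hz": [pitch.get_value_at_time(t) for t in times],
        "f1_hz": [formants.get_value_at_time(1, t) for t in times],
        "f2_hz": [formants.get_value_at_time(2, t) for t in times],
    }

# e.g. a vowel marked on the TextGrid from 0.31 s to 0.43 s
print(vowel_measures(0.31, 0.43))
```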

The data was exported to a CSV file and imported into Excel, where it was separated by speech condition (i.e. Short Fragment, Single Word, Song) into three separate charts. Inconclusive and outlier data was then removed. The formant ratio of each vowel sound was calculated by dividing its mean second formant (F2) by its mean first formant (F1). These ratios were collapsed into a single octave, such that they range from 1 to 2, and compared to the interval ratios common to Western music, as found in Bowling (2010) and in Table 4 of this paper, with a formant ratio counted as a match when it fell within 1% of an interval ratio value. A Chi-Square test was used to evaluate the likelihood that the observed data distribution was due to chance.
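A minimal sketch of this matching step is given below. The interval ratios are common just-intonation values standing in for Table 4 (which may use slightly different conventions, e.g. for the tritone), and the counts passed to the Chi-Square call are the Short Fragment N values from Table 5, used purely to illustrate the test call; the study ran separate tests per interval and intonation.

```python
from scipy.stats import chisquare

# Just-intonation interval ratios within one octave (assumed stand-in for Table 4).
INTERVALS = {
    "m2": 16/15, "M2": 9/8, "m3": 6/5, "M3": 5/4, "P4": 4/3, "tt": 45/32,
    "P5": 3/2, "m6": 8/5, "M6": 5/3, "m7": 16/9, "M7": 15/8, "P8": 2/1,
}

def collapse_to_octave(ratio):
    """Fold an F2/F1 ratio into the range (1, 2] by octave equivalence."""
    while ratio > 2:
        ratio /= 2
    while ratio <= 1:
        ratio *= 2
    return ratio

def match_interval(f1, f2, tolerance=0.01):
    """Return the musical interval within 1% of the collapsed F2/F1 ratio, if any."""
    r = collapse_to_octave(f2 / f1)
    for name, target in INTERVALS.items():
        if abs(r - target) / target <= tolerance:
            return name
    return None

# A vowel with F1 = 520 Hz and F2 = 1560 Hz has ratio 3.0, which collapses
# to 1.5 and matches the perfect fifth (3/2).
print(match_interval(520, 1560))  # -> "P5"

# Illustrative Chi-Square call against a uniform expectation, using the
# Short Fragment match counts per interval from Table 5.
observed = [56, 41, 42, 38, 42, 66, 41, 37, 30, 50, 45, 38]
print(chisquare(observed))
```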


The data extracted by the Praat script was then imported into JASP version 0.9.2.0. The effect of intonation on each of the following measures was evaluated using a Repeated Measures ANOVA: Word Intensity, Word Duration, Syllable Duration, Variance of Syllable Duration, Range of Syllable Duration, Mean Pitch, Range of Mean Pitch, and Variance of Mean Pitch. These measures were chosen because they reflect speech intensity, speech rhythm, and speech melodicity.
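For readers who prefer a scriptable alternative to the JASP workflow actually used, a sketch of one such analysis is shown below; the CSV file and column names are hypothetical, assuming a long-format table with one aggregated value per participant per intonation.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format table: one row per participant per intonation,
# with one column per aggregated measure.
df = pd.read_csv("short_fragment_measures.csv")

# Repeated Measures ANOVA for one dependent variable; repeat for each of
# the eight measures (Word Intensity, Word Duration, ..., Range of Mean Pitch).
result = AnovaRM(
    data=df,
    depvar="mean_pitch",
    subject="participant",
    within=["intonation"],  # Excited / Neutral / Performative
).fit()
print(result)
```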

3. Results

3.1 Overview

The following section details the descriptive statistics pertaining to the research, as well as the statistical significance of the results. The first subsection details the descriptive statistics, the second expounds upon the rates at which syllable formant ratios matched musical interval ratios, and the final subsection discusses the statistical significance of the effects distinguishing each intonation.

3.2 Descriptive Statistics

Table 1 shows the descriptive statistics of Word Intensity, Word Duration, Syllable Duration, and Mean Pitch per intonation for the Short Fragment Condition. On average, Excited Speech had a higher Word Intensity and Mean Pitch, as well as a shorter Word Duration and Syllable Duration as compared to the other two intonations. Neutral Speech had the lowest Word Intensity and Mean Pitch, as well as the longest Syllable Duration. Performative Speech fell in between Excited and Neutral on Word Intensity, Syllable Duration, and Mean Pitch, but had the longest Word Duration.


Table 2 shows the descriptive statistics for the Song Condition. On average, Neutral Speech had a lower Word Intensity and Mean Pitch, and a shorter Word Duration and Syllable Duration, than Performative Speech and Sung Speech. Performative Speech had a higher Word Intensity and longer Word Duration than Neutral and Sung Speech, and fell in the middle of the other two intonations with regard to Syllable Duration and Mean Pitch. Sung Speech had a higher Mean Pitch and longer Syllable Duration than Neutral and Performative Speech, and fell in the middle of the other two intonations with regard to Word Intensity and Word Duration.


Table 3 shows the descriptive statistics for the Single Word Condition. On average, Excited Speech had the highest Word Intensity and Mean Pitch, as compared to the other three intonations. Neutral Speech had the lowest Mean Pitch, as well as the shortest Word Duration and Syllable Duration. Subdued Speech had the lowest Word Intensity and longest Word Duration and Syllable Duration as compared to the other intonations.

Synthesizing the results from all conditions, the observed tendencies for the measured parameters were as follows: for Word Intensity, the intonations from loudest to softest were Excited, Performative, Sung, Neutral, Subdued. For Word Duration, the intonations from shortest to longest were Excited, Neutral, Sung, Performative, Subdued. For Syllable Duration, the intonations from shortest to longest were Excited, Neutral, Performative, Subdued, Sung. For Mean Pitch, the intonations from highest to lowest were Excited, Sung, Performative, Subdued, Neutral.

3.3 Matching Formant Ratios to Musical Intervals

Table 4 displays the abbreviations, semitones, and ratio values of the musical frequency ratios. The speech formant ratios were matched to these values through comparative analysis. The results of the formant ratio comparative analysis are visually represented in Figures 1-3, which display the results for the Short Fragment (SF) Condition, Single Word (SW) Condition, and Song (SG) Condition, respectively.

Table 5 displays the matching rates of syllable formant ratios to musical interval ratios between intonations and per condition. Chi-Square tests were conducted in order to evaluate the likelihood that the data distribution observed was due to chance. The tests indicated that intonation had no defining effect on musical interval ratio matching rates.


Formant Ratios to Musical Interval Matching Rates per Condition

Fig. 1: Number of syllable formant ratios matched to musical intervals for the Short Fragment (SF) Condition.
Fig. 2: Number of syllable formant ratios matched to musical intervals for the Single Word (SW) Condition.
Fig. 3: Number of syllable formant ratios matched to musical intervals for the Song (SG) Condition.

[Figures 1-3 are bar charts plotting frequency of occurrence (%) against the intervals m2 through P8, with one bar series per intonation. Formant ratios not matched to musical intervals are not pictured.]


3.4 Effects of Intonation

Table 6 displays the results of the Repeated Measures ANOVAs conducted on the data of the three separate speech conditions. This analysis determines whether there was a statistically significant effect of intonation on the parameters analyzed, namely Word Intensity, Word Duration, Syllable Duration, Variance of Syllable Duration, Range of Syllable Duration, Mean Pitch, Variance of Mean Pitch, and Range of Mean Pitch, per speech condition. Figures 4, 5, and 6 display the results of the Post Hoc tests. The alpha level of all Post Hoc tests was set to 0.05.

3.4.1 Word Intensity

There was a significant effect of intonation on Word Intensity in all three conditions. For the SF Condition, Post Hoc tests showed that Excited Speech was louder than Neutral Speech, and Neutral Speech was softer than Performative Speech. For the SG Condition, Sung Speech was significantly louder than Neutral Speech, and Performative Speech was louder than Neutral Speech (cf. Figure 6A). For the SW Condition, Excited Speech was louder than Neutral Speech and Subdued Speech, and Performative Speech was louder than Neutral Speech and Subdued Speech. No significant difference was observed between Excited Speech and Performative Speech in either of the conditions in which both were present.

3.4.2 Word Duration

There was a significant effect of intonation on Word Duration in the SF Condition. Post Hoc tests showed that Performative Speech had longer word durations than both Excited Speech and Neutral Speech. The p-value associated with the comparison between Performative Speech and Neutral Speech was 0.081, slightly above the conventional threshold, but still an interesting trend to address.


Table 5: Interval Ratio Matching Rate per Intonation – shows the frequency of occurrence of speech syllable formant ratios that matched music interval ratios, per condition. N = number of matched syllables; percentages are % of total; the Chi-Square p-value indicates the significance of the distribution per interval.

Short Fragment Condition
Interval  N    Excited  Neutral  Performative  Chi-Square p
m2        56   1.09%    1.30%    1.44%         0.153
M2        41   0.89%    1.23%    0.68%         0.253
m3        42   1.23%    0.82%    0.82%         0.245
M3        38   0.82%    1.03%    0.75%         0.280
P4        42   1.09%    1.09%    0.68%         0.244
tt        66   1.16%    1.98%    1.37%         0.109
P5        41   0.96%    0.82%    1.03%         0.253
m6        37   0.75%    0.82%    0.96%         0.289
M6        30   0.55%    0.89%    0.62%         0.366
m7        50   1.30%    1.16%    0.96%         0.187
M7        45   1.16%    1.09%    0.82%         0.221
P8        38   0.89%    0.82%    0.89%         0.280

Song Condition
Interval  N    Neutral  Performative  Sung   Chi-Square p
m2        17   0.99%    0.49%         0.62%  0.358
M2        27   0.99%    1.11%         1.23%  0.195
m3        29   1.11%    0.99%         1.48%  0.173
M3        36   2.10%    1.23%         1.11%  0.113
P4        34   1.48%    1.48%         1.23%  0.128
tt        43   2.10%    2.10%         1.11%  0.074
P5        28   1.11%    0.74%         1.60%  0.184
m6        17   0.99%    0.49%         0.62%  0.358
M6        18   1.11%    0.62%         0.49%  0.337
m7        13   0.74%    0.74%         0.12%  0.455
M7        13   0.37%    0.99%         0.25%  0.455
P8        17   0.74%    0.49%         0.86%  0.358

Single Word Condition
Interval  N    Excited  Neutral  Performative  Subdued  Chi-Square p
m2        14   1.05%    1.05%    0.42%         0.42%    0.412
M2        10   0.21%    1.26%    0.21%         0.42%    0.562
m3        10   0.42%    0.21%    0.42%         1.05%    0.562
M3        10   0.63%    0.21%    0.63%         0.63%    0.562
P4        5    0.42%    0.21%    0.42%         0.00%    0.795
tt        16   1.05%    0.84%    1.05%         0.42%    0.350
P5        14   1.46%    0.42%    0.63%         0.42%    0.412
m6        12   0.84%    0.42%    0.63%         0.63%    0.483
M6        14   0.63%    1.05%    0.21%         1.05%    0.412
m7        19   0.63%    1.67%    0.84%         0.84%    0.273
M7        14   0.21%    1.05%    0.84%         0.84%    0.412
P8        21   1.46%    0.84%    1.05%         1.05%    0.230


3.4.3 Syllable Duration

There was a significant effect of intonation on Syllable Duration in the SG Condition and the SW Condition. In Post Hoc tests, the results of the SG Condition showed that Neutral Speech had shorter Syllable Durations than both Sung Speech and Performative Speech. In the SW Condition, results showed that Neutral Speech had shorter durations than Subdued Speech; the p-value associated with this finding, 0.053, is slightly above the conventional threshold, but the effect is still interesting to address.

3.4.4 Variance Syllable Duration

There was a significant effect of intonation on Variance of Syllable Duration in the SG Condition. However, upon Post Hoc evaluation, the effect violated Mauchly's test of sphericity even after corrections were implemented, rendering the effect null.

3.4.5 Range Syllable Duration

There was a significant effect of intonation on Range of Syllable Duration observed in the SG Condition. Post Hoc tests showed that Sung Speech had a significantly broader Range of Syllable Duration as compared to Neutral and Performative Speech.

3.4.6 Mean Pitch

There was a significant effect of intonation on Mean Pitch observed in all three conditions. For the SF Condition, Post Hoc test results show that Excited Speech had a higher Mean Pitch than both Performative Speech and Neutral Speech, and that Performative Speech had a higher Mean Pitch than Neutral Speech. For the SG Condition, Post Hoc test results showed that Sung Speech had a higher Mean Pitch than both Performative Speech and Neutral Speech, and Performative Speech had a higher Mean Pitch than Neutral Speech. For the SW Condition, Post Hoc test results show that Excited Speech had a higher Mean Pitch than Neutral Speech, Performative Speech, and Subdued Speech.

3.4.7 Variance Mean Pitch

There was a significant effect of intonation on Variance of Mean Pitch observed in the SG Condition. Post Hoc test results showed that Sung Speech had a higher Variance of Mean Pitch than both Neutral Speech and Performative Speech, and Performative Speech had a higher Variance of Mean Pitch than Neutral Speech.

3.4.8 Range Mean Pitch

There was a significant effect of intonation on Range of Mean Pitch observed in the SG Condition. Post Hoc test results showed that Sung Speech had a higher Range of Mean Pitch than Neutral Speech and Performative Speech, and Performative Speech had a higher Range of Mean Pitch than Neutral Speech.


Table 6 – Summary of Repeated Measures ANOVAs: SF = Short Fragment Condition; SG = Song Condition; SW = Single Word Condition

Measure                     SF ANOVA                      SG ANOVA                      SW ANOVA
Word Intensity              F(2, 18) = 30.84, p <.001***  F(2, 18) = 30.84, p <.001***  F(3, 27) = 34.22, p <.001***
Word Duration               F(2, 18) = 7.884, p = 0.003** F(2, 18) = 3.478, p = 0.053*  F(3, 27) = 1.877, p = 0.157
Syllable Duration           F(2, 18) = 0.956, p = 0.403   F(2, 18) = 12.44, p <.001***  F(3, 27) = 3.154, p = 0.041*
Variance Syllable Duration  F(2, 18) = 1.000, p = 0.387   F(2, 18) = 19.25, p <.001***  F(3, 27) = 2.036, p = 0.132
Range Syllable Duration     F(2, 18) = 1.007, p = 0.385   F(2, 18) = 10.97, p <.001***  F(3, 27) = 1.043, p = 0.389
Mean Pitch                  F(2, 18) = 23.32, p <.001***  F(2, 18) = 41.72, p <.001***  F(3, 27) = 22.68, p <.001***
Variance Mean Pitch         F(2, 18) = 2.502, p = 0.110   F(2, 18) = 18.73, p <.001***  F(3, 27) = 0.840, p = 0.484
Range Mean Pitch            F(2, 18) = 0.063, p = 0.939   F(2, 18) = 14.58, p <.001***  F(3, 27) = 0.991, p = 0.412


Figure 4 – Effects of Intonation in Short Fragment Condition
A. Effect between Excited/Neutral and Neutral/Performative for Word Intensity (dB).
B. Effect between Excited/Performative and Neutral/Performative for Word Duration (ms).
C. Effect between Excited/Performative, Excited/Neutral, and Neutral/Performative for Mean Pitch (Hz).

Figure 5 – Effects of Intonation in Single Word Condition
A. Effect between Excited/Subdued, Excited/Neutral, Neutral/Performative, and Performative/Subdued for Word Intensity (dB).
B. Effect between Neutral/Subdued for Syllable Duration (ms).
C. Effect between Excited/Subdued, Excited/Performative, and Excited/Neutral for Mean Pitch (Hz).

[Each panel is a bar chart comparing the intonations; asterisks mark significant pairwise differences.]


Figure 6 – Effects of Intonation in Song Condition
A. Effect between Neutral/Sung and Neutral/Performative for Word Intensity (dB).
B. Effect between Neutral/Sung and Neutral/Performative for Syllable Duration (ms).
C. Effect between Neutral/Sung and Performative/Sung for Range of Syllable Duration (ms).
D. Effect between Neutral/Sung, Neutral/Performative, and Performative/Sung for Mean Pitch (Hz).
E. Effect between Neutral/Sung, Neutral/Performative, and Performative/Sung for Variance of Mean Pitch (Hz).
F. Effect between Neutral/Sung, Neutral/Performative, and Performative/Sung for Range of Mean Pitch (Hz).

[Each panel is a bar chart comparing the intonations; asterisks mark significant pairwise differences.]


4. Discussion

4.1 Overview

The intent of the current study is to evaluate the musical qualities of speech by focusing on the defining features of particular types of emotional speech, in hopes of supporting the hypothesis that Performative Speech elicits more musical qualities than Neutral Speech. This study uses emotional speech data from recordings conducted by the researcher in order to measure how these defining features change with emotional intonation. The study also evaluates the rate at which speech formant ratios match musical interval ratios, in the hope of replicating the results of Bowling (2010), which suggest a relationship between Excited Speech and the traditional Western major mode, and between Subdued Speech and the traditional Western minor mode. Replicating these results could offer further evidence for the connection between the emotional processing mechanisms of music and speech, as well as give insight into the evolution of musical traditions.

The current study assesses speech on the parameters of Word Intensity, Word Duration, Syllable Duration, Variance of Syllable Duration, Range of Syllable Duration, Mean Pitch, Variance of Mean Pitch, and Range of Mean Pitch in order to gain an understanding of how these features change in speech with varying intonation and intent. These features were chosen because they are indicators of speech rhythm, speech dynamic, and speech melodicity. The study focuses on Excited Speech, Neutral Speech, Performative Speech, Subdued Speech, and Sung Speech, and finds multiple statistically significant effects of intonation on the selected speech parameters, suggesting that the intonations can be defined by these parameters.


4.2 Musical Interval Ratio Matching and Chi-Square Test

The current study was unable to replicate the results of Bowling (2010), which suggest a relationship between musical interval ratios and speech formant ratios, as no significant effects were found within the distribution of matching rates of formant ratios to musical intervals. A substantial share of formant ratios did match musical interval ratios (36% in the SF Condition, 36% in the SG Condition, and 33% in the SW Condition), and such a relatively high matching rate could indicate a potential link between speech and musicality. However, there is no evidence of a significant relationship between the intonations incorporated in the present study and the interval ratio matching rate (whether a particular intonation's formant ratios match musical interval ratios more frequently than others), nor between intonation and the frequency of particular musical interval ratios (whether an intonation has a higher occurrence of formant ratios matching a particular musical interval ratio). The results of the Chi-Square tests confirmed that there were no significant effects in the occurrence of interval matching, indicating that the observed data distribution could be due to chance.

While this study did not reinforce Bowling's results, it does not indicate that those results are invalid. Rather, the difference in results could be due to the time constraints of the present study. The procedure of the current study was closely modelled on the procedure of Bowling (2010) in certain ways: there were ten total participants, and the procedure of the Single Word Condition and Monologue Condition was very similar to that of Bowling (2010), as it included the repetition of single words and short passages of text in excited and subdued intonations. However, in the present study, the speech data of the Monologue Condition could not be included in the data analysis, because the manual demarcation of the speech data was too time-intensive for the scope of the current study. If the data from the Monologue Condition had been included in the analysis, perhaps the results of the present study would have reached significance. Additionally, Bowling (2010) only included Excited and Subdued Speech, while the current study also considered Performative Speech, Neutral Speech, and Sung Speech; perhaps the addition of these other intonations affected the comparative significance of the results. Moreover, some intonations (i.e. Sung Speech, Subdued Speech) had smaller corpora of speech data than others (i.e. Performative Speech, Neutral Speech). For these reasons, it would be interesting to repeat this procedure with a larger and more equalized sample of speech data from each intonation, to see if significant effects manifest.

The continuation of the present research remains an interesting area to pursue, as replication of Bowling's results would be an advantageous addition to the corpus of research on this topic. Continued work could also inspire research in other areas, such as the origins and evolution of human musicality. A number of existing theories on the origins of musicality (Darwin 1871; Brown 2000; Pinker 1998) are grounded in a particular understanding of human speech, language, and communication abilities. The results of Bowling (2010) suggest a vocal basis of human musicality, citing the relationship of emotional voices with musical intervals. By understanding and increasing awareness of the musical mechanics of speech, one can potentially acquire a more nuanced understanding of how human musicality evolved, and of musicality's role in language today.

4.3 Evaluation of Intonation Effects

Excited Speech proved to be different from Neutral Speech, Performative Speech, and Subdued Speech in both conditions in which it was present, the Short Fragment (SF) Condition and the Single Word (SW) Condition. Excited Speech had a higher Word Intensity than Neutral Speech and Subdued Speech, and a higher Mean Pitch than Neutral Speech, Performative Speech, and Subdued Speech. These effects were consistently observed in both the SF and SW Conditions. Excited Speech also had a shorter Word Duration than Performative Speech in the SF Condition. These results suggest that Excited Speech is characterized by high pitch, loud intensity, and fast articulation.

On a purely aural understanding of Excited Speech, these results make sense, as an excited person generally produces loud, higher-pitched speech at an elevated rate. Although results from other studies support the claim that Excited Speech is characterized by comparatively high pitch and loudness (Bowling 2010; Bowling 2012; Paulmann 2013; Yanushevskaya 2013), there is a surprising lack of literature analyzing speech duration in Excited Speech. This suggests that the current study may be among the first to offer explicit results indicating that Excited Speech is also characterized by quick articulation.

Performative Speech proved to be different from Excited Speech, Neutral Speech, Sung Speech, and Subdued Speech in all the conditions, although some of its defining factors varied from one condition to another. This result makes the defining features of Performative Speech difficult to succinctly summarize. However, because the intent of the current study is to explore the differences between Neutral Speech and Performative Speech, the results can be summarized within this focus. Synthesizing the results, Performative Speech was defined by a higher Word Intensity, a higher Mean Pitch, a higher Range of Mean Pitch, and a higher Variance of Mean Pitch as compared to Neutral Speech.

These results have interesting implications, as they align with the findings of previous studies (Niebuhr 2016; Signorello 2012; Signorello 2013; D'Errico 2012), which also show Performative Speech expressing the qualities mentioned above. These results also make sense with regard to an aural understanding of Performative Speech, as a charismatic speaker generally speaks with high intensity and varying vocal pitch. The high Mean Pitch of Performative Speech is a bit puzzling; past research has also observed this characteristic, and researchers have commented on the unexpectedness of the finding, as the stereotype of a charismatic voice suggests a deeper pitch (Niebuhr 2016). These results offer more evidence of the ways in which Performative Speech can be qualified, and a good foundation for future research on the topic. For example, the qualities of Performative Speech seem to align in some ways with Infant-Directed Speech, and further research comparing these two voice types could aid in understanding how emotional and musical voices are processed and attended to.

Neutral Speech differed from Excited Speech, Performative Speech, Sung Speech, and Subdued Speech in all conditions, although some of its defining features also varied from one condition to another, making them difficult to summarize succinctly. Broadly, however, Neutral Speech was characterized by low volume, low pitch, quick articulation, and monotonous delivery, especially in comparison to Excited, Sung, and Performative Speech.

An aural understanding of Neutral Speech corroborates these findings, as a person speaking in a neutral reading tone tends to speak monotonously and at a low, steady volume. Additionally, these qualities of Neutral Speech are comparable to those of Adult Directed Speech as defined by past research (Cooper 1990; Corbeil 2013). However, no past research has specifically defined the qualities of Neutral Speech, suggesting that the current study may be the first to present such results.

Sung Speech differed from Neutral Speech and Performative Speech in the only condition in which it was present, the Song (SG) Condition. Sung Speech had longer Syllable Durations than Neutral Speech, as well as a higher Range of Syllable Duration, a higher Mean Pitch, a higher Variance of Mean Pitch, and a higher Range of Mean Pitch than both Neutral Speech and Performative Speech. This suggests that Sung Speech can be characterized by a high, varying pitch, as well as an overall slow but varying articulation.

These findings make sense, as they describe the rhythm and melody of “Row, Row, Row Your Boat.” It would be interesting to repeat this procedure with a different song, as these qualities may change with the material being sung. The current study may be one of the first to acoustically analyze Sung Speech for these qualities; further analysis of Sung Speech should be conducted in order to better understand its acoustic-prosodic characteristics.

Subdued Speech differed from Excited Speech, Neutral Speech, and Performative Speech in the only condition in which it was present, the SW Condition. It had a lower Word Intensity than both Excited Speech and Performative Speech, a longer Syllable Duration than Neutral Speech, and a lower Mean Pitch than Excited Speech. This indicates that Subdued Speech was characterized by low volume, low pitch, and slow articulation.

These results make sense, as subdued speakers tend to speak softly, slowly, and in a lower register. They are reinforced by past research (Bowling 2010; Bowling 2012; Filippi et al. 2017). The results of the current study add to the limited body of research on the topic and lend credibility to the present understanding of the acoustic-prosodic characteristics of Subdued Speech; in the future, these findings can aid further research on similar topics.

4.4 Implications for Further Research

The results of the present study show a number of significant effects of intonation on the chosen speech parameters, supporting the idea that intonation can be quantitatively measured and distinguished by these parameters. These results are a positive step toward understanding Performative Speech and the mechanisms behind its agency, such as listener engagement and emotional arousal. They also support the hypothesis that Performative Speech is more musical than Neutral Speech in various ways, and they largely align with the findings of Niebuhr (2016).
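As a concrete illustration of this kind of quantitative measurement, the sketch below extracts several of the parameters discussed here from a single recording with Praat, via the Parselmouth library. It is a minimal sketch rather than the study’s actual analysis pipeline: the file name is hypothetical, and a full analysis would first segment the recording into words and syllables in order to compute duration-based parameters per unit.

```python
# Minimal sketch: extracting pitch- and intensity-based parameters from one
# recording with Praat via Parselmouth. The file name is hypothetical.
import numpy as np
import parselmouth

def describe_utterance(wav_path):
    snd = parselmouth.Sound(wav_path)

    # Pitch track in Hz; unvoiced frames come back as 0 and are discarded.
    f0 = snd.to_pitch().selected_array['frequency']
    f0 = f0[f0 > 0]

    # Intensity contour in dB.
    db = snd.to_intensity().values.flatten()

    return {
        'mean_pitch_hz': float(np.mean(f0)),
        'pitch_range_hz': float(np.max(f0) - np.min(f0)),
        'pitch_variance': float(np.var(f0)),
        'mean_intensity_db': float(np.mean(db)),
        'duration_s': snd.get_total_duration(),
    }

# Comparing the same fragment spoken neutrally and performatively would then
# amount to comparing these dictionaries.
print(describe_utterance('performative_sample.wav'))
```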

Niebuhr (2016) reports that the Performative Speech of Steve Jobs had a higher Mean Pitch, higher Word Intensity, shorter Syllable Duration, and higher Variance of Syllable Duration. The present study found that Performative Speech tended to have a higher Mean Pitch, higher Word Intensity, higher Range and Variance of Mean Pitch, and longer Syllable Duration. These findings overlap, reinforcing the claim that Performative Speech is characterized by a higher Mean Pitch and a higher Word Intensity, although the two sets of results differ on the characterizing effect of Syllable Duration.

This difference in the characterizing effect of Syllable Duration would be interesting to analyze further. Speculatively, it may be due to the direction given to the participants in the current study: the participants were directed toward a more political understanding of Performative Speech, while Jobs’s Performative Speech reflects a more entrepreneurial, business-oriented context. This is significant because the expected audience most likely differs between political and business speech, and this difference in audience expectations could alter a speaker’s delivery. The current study sought to compare participant data against the speech data of Barack Obama and Elizabeth Warren, similar to the way Niebuhr (2016) compared the speech data of Steve Jobs to a reference sample. Had the scope of the current study allowed for this analysis, it might have produced interesting results. Additionally, comparing speech data from professional political speakers to that of professional business speakers could offer insight into the characterizing effects of Performative Speech, and how these effects may change between subject areas.

In another vein, the inclusion of English spoken with different or more pronounced accents, or of participants speaking another language, may yield interesting results in a further study on this topic, as accents affect speech rhythm and melodicity. Additionally, although there was no targeted age range of participants in the current study, it would be meaningful to include older participants in the recording process as well, since the voice changes as it ages, presenting another area for further research. The inclusion of additional emotional intonations, such as anger or fear, in the recording process could also help produce more significant results.

Understanding the mechanics behind speech intonation and the musical qualities inherent to it allows for a better understanding of the factors that affect human perception of speech. Performative Speech is particularly interesting to study further, as its affective properties can be quite significant. Some research delineates characteristics of charismatic speakers (D’Errico 2013; Rosenberg & Hirschberg 2009; Signorello 2012; Signorello 2013), but it is still unknown why human listeners perceive these speakers as charismatic. Understanding what makes speech charismatic would have far-reaching consequences for those learning to be professional speakers, such as politicians, entrepreneurs, performers, business executives, and teachers. It could also help artificial intelligence engineers looking to improve speech recognition or speech synthesis technology, as a better understanding of organic speech could allow for improvements to speech technologies.

4.5 Study Design

The procedure of the recording process underwent a number of revisions before it was enacted, and a few challenges were encountered throughout its development, as touched upon in previous sections. Here, however, I would like to reflect on one specific challenge encountered while designing the study. In order to ascertain the musical qualities of Performative Speech, I wanted to compare the speech samples directly with Sung Speech; this was the purpose of the Song Condition. To achieve this, my original intent was to analyze the participants’ Performative Speech samples from the Short Fragment and Monologue Conditions, in hopes of finding sections that would transform in the speech-to-song phenomenon. I would then take these sections, present them back to the participants ten times over, similar to Deutsch (2008), and record the participants singing the fragment.

From this, I would have been able to show a direct comparison between Neutral Speech, Performative Speech, and Sung Speech, which would ideally have illustrated a genesis of melodic formation, with pitch contour and other acoustic elements increasingly developed from one vocal style to the next. Unfortunately, this could not be achieved within the time restrictions of the research, as my time with the participants in the United States, in particular, was limited.
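For illustration, a stimulus for this intended procedure could be constructed as in the sketch below: a spoken fragment repeated ten times with a short silence between repetitions, following the presentation style of Deutsch (2008). This is a hypothetical sketch, not material used in the study; the file names and the 500 ms pause are assumptions, and a mono recording is assumed.

```python
# Hypothetical sketch of a speech-to-song stimulus: one spoken fragment
# repeated ten times, each repetition followed by 500 ms of silence.
# Assumes a mono WAV file; file names are illustrative.
import numpy as np
import soundfile as sf

fragment, sr = sf.read("spoken_fragment.wav")   # samples, sample rate
pause = np.zeros(int(0.5 * sr))                 # 500 ms of silence

stimulus = np.concatenate([fragment, pause] * 10)
sf.write("looped_stimulus.wav", stimulus, sr)
```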

Conclusion

Music can be defined by a number of factors, including pitch, rhythm, melody, and dynamics. In a similar way, speech is defined by tone, volume, and speed of articulation. By comparing these particular qualities across different types of emotional speech, we can evaluate certain musical qualities of speech, and by comparing how these qualities manifest between different intonations, we can gain insight into the ways in which emotional speech is processed and how different types of speech are distinguished from one another. The current study reports on the relationships between the defining musical qualities of various intonations, which point to Performative Speech eliciting a variety of musical qualities: increased loudness, slower articulation, higher pitch, increased range of pitch, and increased variance of pitch. This, in turn, provides a plausible basis for understanding the affective qualities of Performative Speech as more musical than the qualities defining other emotional intonations such as Neutral Speech, thus presenting opportunities for further research.
