
The effect of the speech-to-song illusion on the perception of language-related information

Master Brain and Cognitive Sciences Research Project 1

Eylül Turan

Supervisor: Makiko Sadakata
Second examiner: Yiya Chen


Abstract

The question of whether music and language are two distinct categories or overlapping ones has long been investigated. A recent discovery, the speech-to-song illusion (STS), suggests that the two are acoustically overlapping but perceptually distinct categories. In the STS, certain spoken phrases are perceived as highly song-like when isolated from context and repeated. The transformation from speech to song benefits the processing of music-specific information, as illustrated by listeners’ increased precision in discriminating music-relevant acoustic information (e.g. pitch). Here we asked whether the same increase in sensitivity would be observed when manipulating vowels, acoustic information that is relevant for language processing. We investigated this by administering a discrimination task both when the excerpt was initially perceived as speech and when it was perceived as song after repetition. Although there was an increase from initial to final discrimination, this did not reach significance. Interestingly, however, representing the excerpts as a song from the very start benefited sensitivity towards vowel manipulations in general, regardless of whether these took place initially or later on in the trial. These findings suggest that although the perceptual shift from speech to song affects judgments about speech-relevant information, the ability to represent information in a musical way early on has a larger impact on the processing of language-relevant stimuli.


The effect of the speech-to-song illusion on the perception of language-related information

Music and language are two domains that are of significant importance to human communication. Not only language but also the ability to acquire knowledge about music, and to understand and enjoy it, is common to all human societies (Brown, 1991; Koelsch & Siebel, 2005; Zatorre, 2002). Both speech and song are produced with the same instrument and have a similar structure. For music, rhythm and the temporal ratios that segment a piece of music along the time continuum are important structural elements. Additionally, the division of the sound continuum into discrete pitches is common to all cultures, a structural characteristic that is useful for forming musical scales. The same characteristics can be found in language as well. Language also involves sequential events that unfold in time. Further, it has a specific rhythm and segmental information such as phonemes and prosody. Similar to the sound continuum, the speech continuum is divided into discrete phonemes, which constitute the basic phonological units. There are also functional similarities between music and language. Both involve memory processes: we can easily recognize and reproduce learned melodies as well as words. Both are closely tied to intentionality: music is an “act of creation that actualizes an intention” (Arom, 2000), and the same principle holds for language. Because of such intentionality, both require a theory of mind ability (Besson & Schön, 2001). Further, in terms of their acquisition, both musical and linguistic rules are acquired by young children in a similar and effortless way. Very early on, children can create novel musical and verbal utterances by applying a rule system that they abstract from their environment without conscious intention. In fact, during the first phase of language acquisition, infants rely heavily on prosodic cues (Jusczyk, 1999). For Stanley (1945), a song is a “magnification of speech, which is wedded to the language of music” (p. 268). Despite these similarities, speech and music are generally regarded as acoustically distinct categories, and researchers have tried to develop automated methods to distinguish the two based on their acoustic characteristics (Saitou et al., 2007; Schlüter & Sonnleitner, 2012). However, Deutsch, Henthorn and Lapidis (2011) recently discovered a unique phenomenon, the speech-to-song (STS) illusion, in which certain speech utterances, when removed from their surrounding context and repeated several times, can be perceived as song, suggesting an acoustic overlap between the two categories. Specifically, the illusion shows that music perception is a mode of listening that can be applied to speech stimuli that are not heard as musical initially (Tierney, Patel & Breen, 2018). Hence the two categories, speech and song, are perceptually distinct ones that can arise from the exact same physical stimulus (i.e. identical verbal phrases). The listener’s task is therefore to resolve this perceptual ambiguity when judging the broader class (i.e. speech or song) of a given stimulus.

After the illusion was discovered, several studies focused on the stimulus characteristics that might play a role in the transformation. Tierney, Dick, Deutsch, and Sereno (2013) replicated the STS on a larger sample of illusion stimuli. They also identified a set of control stimuli, matched to the illusion ones, that did not show the transformation effect. Their corpus of stimuli was replicated with non-musician participants as well (Vanden Bosch der Nederlanden, Hannon, & Snyder, 2015a). In another study, Margulis, Simchy-Gross, and Black (2015) showed that speech utterances taken from languages that are more difficult to pronounce lead to a stronger transformation effect. Falk, Rathcke, and Dalla Bella (2014) manipulated stimulus characteristics such as pitch contour and rhythmic regularity and found that these can modulate the speed of the STS transformation. Finally, Vanden Bosch der Nederlanden, Hannon and Snyder (2015b) tested the effect of the STS on pitch discrimination performance, and revealed that listeners can make use of music-specific representations of pitch structure while listening to speech segments. Specifically, when perceiving the excerpts as speech, listeners’ sensitivity to pitch changes that conformed or did not conform to the Western musical scale was similar; however, once the transformation occurred they became sensitive to pitch changes that violated their musical expectations, whereas they missed the ones that were in line with their expectations.

This research validates the finding that speech can be perceived as song, and points to the acoustic and musical elements that influence the strength of the transformation to music. Drawing from these studies, it seems that once the necessary preconditions are met, the music mode of listening could be applied to other types of stimuli than music, such as speech. Moreover, once the transformation occurs, listeners become more sensitive to the musical elements within the utterances such as discriminating acoustic features (e.g. pitch) that are relevant for music processing. However, none of these studies investigated how the transformation from speech to song affects perception of acoustic information related to language processing. Does listeners’ increased sensitivity to musical elements result in a decrease in sensitivity to language-related elements? The current paper explores this issue by manipulating vowel information in speech utterances that are subject to the transformation, and by examining how accuracy in detecting these vowel manipulations is affected once the transformation occurs.

An interesting aspect of the STS is that although listeners may apply a musical mode of listening to non-musical information, this often requires repetition, suggesting that the musical elements are not readily available to the listener. Intuitively, one would expect music and speech to be different enough to allow correct classification regardless of how frequently one is exposed to the stimulus (Tierney, Patel & Breen, 2018). Two views explain why repetition disturbs the classification of a sound as speech or music. The first suggests that during speech processing the listener operates in a so-called ‘speech perception mode’ in which speech-related information becomes more salient over time and the saliency of music-related information decreases. Repeated exposure to the stimulus satiates the resources necessary for speech perception, which in turn frees up space for music-related acoustic information such as pitch and rhythm (i.e. information that is less relevant in speech perception). Hence the listener shifts from a ‘speech perception mode’ to a ‘music perception mode’ (Margulis, 2013). As an example, the fact that pitch perception is disrupted by timbral variations (Allen & Oxenham, 2014; Caruso & Balaban, 2014; Warrier & Zatorre, 2002) suggests that elements that are more important for speech perception (such as spectral information) and those that are more important for song perception (such as pitch processing) have an inverse relationship. Also, the saliency of speech-specific information seems partly determined by the listener’s familiarity with the language, as the song illusion is experienced more strongly for unfamiliar languages that are hard for the listener to pronounce compared to more pronounceable unfamiliar languages (Margulis, Simchy-Gross, & Black, 2015).

The alternative view (i.e. the musical structure account) argues that listeners are always sensitive to musical aspects in any given information; however, it takes time, and therefore repetition, to extract the musical structure. For example, after a single repetition listeners cannot judge the exact intervals of a tone sequence, but with repeated exposure their performance increases (Deutsch, 1979). It could be that repetition allows gaining knowledge about the exact intervals, and hence helps to assign a certain tonal schema to the pitch sequences. Such an application of a tonal schema can even alter the way a pitch sequence is perceived by moving it in the direction of the perceived key. That would explain listeners’ ease in detecting pitch changes within the illusion that move away from the perceived key (Vanden Bosch der Nederlanden et al., 2015b). The musical structure account further posits that the illusion does not resemble the verbal transformation effect, in which massed repetition of a single word leads to the perception of various semantically and phonologically related words or non-words (Bashford, Warren, & Lenz, 2006; Bashford, Warren, & Lenz, 2008); rather, it might be analogous to the perceptual transformation that happens when, for example, listeners hear a sequence of vowel sounds (Warren, Bashford, & Gardner, 1990; Warren, Healy, & Chalikia, 1996). When exposed to a rapid sequence of vowel sounds, listeners apply their top-down knowledge, which leads them to perceive these acoustic sounds as verbal sequences conforming to the phonotactic rules of English.

In sum, the speech-perception mode account assumes a shift from the speech mode of listening to the music mode of listening, so it predicts that listeners’ initial judgments about whether an excerpt is a song should be minimal and should not be related to the stimulus’ acoustic characteristics. On the other hand, the musical structure account argues that listeners are always sensitive to the musical features of a stimulus, but that it takes time to extract its musical structure. Hence it predicts that even initially listeners can make judgments about the musical elements in a stimulus, and that with repetition their perceived musicality of the stimulus would increase. The study by Tierney and colleagues (2018) confirmed the musical structure account over the speech-perception mode account.

Accordingly, subjects’ initial musicality ratings of the excerpts correlated with their final musicality ratings and with their judgments about the acoustic characteristics (pitch and rhythm) of the stimuli, suggesting that the observed increase in song perception resulted from an increased precision in analyzing the musical characteristics of the stimulus.

Although listeners’ ability to evaluate musicality seems to increase, it remains an open question whether this increase affects their capability to evaluate speech-related information. In the current study we compared these two explanations by manipulating acoustic information that is relevant for speech processing, namely vowel sounds. We focused specifically on vowels because their role in speech processing is less specific than that of consonants (e.g. Bonatti, Peña, Nespor, & Mehler, 2005; Mehler, Peña, Nespor, & Bonatti, 2006). Their processing also differs from that of consonants in music perception: whereas consonants are more separable from melodic processing, vowels interact with intervals during song processing (Kolinsky et al., 2009). Therefore, it has been argued that their linguistic function is similar to the function of pitch in music processing. Because vowels are relevant to speech processing but are also heavily involved in music processing (more than consonants are), they are an ideal candidate for studying the speech-to-song illusion. Their relevance to both speech and music led us to ask how their perception would be altered depending on the listener’s percept of the larger context (i.e. speech or song). On the one hand, the speech-perception mode account posits a trade-off between speech resources and music resources; hence it would predict that detecting deviations in speech-related information should initially be accurate, because the listener would operate in a speech-perception mode, but that once the listener operates in the music-perception mode, i.e. once the transformation occurs, they should become less sensitive to speech-related information. On the other hand, based on the musical structure account it could be argued that, just as listeners have the capacity to extract the musical structure within any given information and get better at doing so with repetition, they can do the same for speech-related information. Hence, extracting the musical structure could benefit other processes that are not directly linked to music processing. In short, this account would predict that listeners will already start with high accuracy in detecting vowel manipulations, and that once the transformation occurs they will get even better at doing so.


Method

Participants

We tested 44 participants (31 females), who were mostly graduate students (age: M = 25.61, SD = 7.87). Exclusion criteria were a) having any kind of hearing problem, including temporary hearing disturbances, and b) having low scores on the language questionnaires that measure proficiency in English (see Table 2 for descriptives). None of our participants met the exclusion criteria.

Materials

Experimental stimuli

All stimuli were taken from online, open-access audiobook recordings featuring three male speakers (Tierney et al., 2012). In our experiment we used two of these speakers (Speaker 1 and Speaker 3), since the manipulation did not work for Speaker 2 because of his fast speech rate. We used twelve natural speech excerpts taken from a large corpus. Eight of them had previously been shown to transform from speech to song, hereafter called the transforming stimuli. The remaining four were control stimuli taken from the same corpus that had been shown to still be perceived as speech after repetition. The transforming stimuli differed significantly from the control ones in fundamental frequency stability and had marginally greater rhythmicity (i.e. the regularity of the intervals between stressed syllables) (Tierney et al., 2012).

Each of the transforming and control stimuli, which we will refer to as the “target stimuli”, was part of a larger context. These contexts were 3 or 4 sentences long and were also taken from the same corpus (Tierney et al., 2012). One vowel of each target stimulus was altered using Praat, and the syllable that was manipulated varied between stimuli. Vowels were exchanged with ones that were close to them in the vowel space. After defining the vowels we could manipulate (i.e. a, e, ı, i, o, ö, u, ü), we marked the potential replacement vowels in each context. For example, if the target word was “here” and the target vowel was “e”, we searched for a) the vowel “a”, and b) the vowel “a” surrounded by the sounds “h” and “r”. After identifying several candidates, we cut out the target vowel and replaced it with the new vowel. The pitch contour of the new vowel was matched to that of the old one in order to preserve the naturalness of the speech. The naturalness of this splicing manipulation was evaluated by three raters. Among the 17 stimuli that were created, the eight best ones were chosen (see Table 1 for the details of this vowel manipulation).

Table 1. Stimuli that were used in the experiment

Original sentences                         Manipulated sentences
… and his two sisters…                     … and his two sist(ɘ)rs…
… for this was the only service…           … for this w(ə)s the only service…
… gave the houses…                         … g(ɑ)ve the houses…
… nothing but a scurvy faintness…          … nothing but a sc(ɔ)rvy faintness…
… it had its compensations…                … it had its compens(e)tions…
… yet somehow I can get…                   … yet somehow I can g(ɘ)t…
… snags and sandbars…                      … snags and sandb(ɯ)rs…
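The splicing itself was carried out in Praat. Purely as an illustration, the cut-and-replace step could be sketched with generic Python audio tools as below; the file names, time spans and cross-fade length are hypothetical placeholders, and the pitch-contour matching applied in the actual manipulation is not reproduced here.

import numpy as np
import soundfile as sf

def replace_vowel(target_wav, donor_wav, target_span, donor_span, fade_ms=5):
    """Splice a donor vowel into the target utterance.

    target_span and donor_span are (start, end) times in seconds. A short
    linear cross-fade at the splice onset reduces clicks; the offset is a
    plain cut in this simplified sketch.
    """
    x, sr = sf.read(target_wav)
    y, sr_donor = sf.read(donor_wav)
    assert sr == sr_donor, "both recordings must share a sample rate"
    if x.ndim > 1:                                    # work in mono for simplicity
        x = x.mean(axis=1)
    if y.ndim > 1:
        y = y.mean(axis=1)

    t0, t1 = (int(t * sr) for t in target_span)
    d0, d1 = (int(t * sr) for t in donor_span)
    donor = y[d0:d1]

    n = int(sr * fade_ms / 1000)
    ramp = np.linspace(0.0, 1.0, n)
    spliced = np.concatenate([
        x[:t0],                                       # context before the vowel
        x[t0:t0 + n] * (1 - ramp) + donor[:n] * ramp, # cross-fade into the donor vowel
        donor[n:],                                    # remainder of the donor vowel
        x[t1:],                                       # context after the vowel
    ])
    return spliced, sr

# Hypothetical usage with placeholder file names and time spans.
out, sr = replace_vowel("target_excerpt.wav", "context.wav",
                        target_span=(0.42, 0.55), donor_span=(2.10, 2.23))
sf.write("manipulated_excerpt.wav", out, sr)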


Language and musical sophistication tasks

We used a self-report language questionnaire (SRLQ) that measured participants’ English proficiency level (Roncaglia-Denissen et al., 2013). It also asked at what age they started to learn English and how competent they consider themselves to be (see Table 2 for descriptives). Additionally, participants completed the Lexical Test for Advanced Learners of English (LexTale) (Lemhöfer & Broersma, 2012), which measures vocabulary knowledge for medium to highly proficient speakers of English as a second language. The LexTale consists of 60 items and asks whether these are actual English words or not. It takes about 3.5 minutes to complete.

To measure musical experience, we administered the Goldsmiths Musical Sophistication Index (Gold-MSI) (Müllensiefen, Gingras, Musil & Stewart, 2014). This is a self-report questionnaire that measures individual differences in musical sophistication and is sensitive to differences among non-musicians. The Gold-MSI consists of 5 subscales, namely a) active musical engagement, which asks for example how much time and money one spends on musical activities, b) perceptual abilities, which measures how accurate one’s musical listening skills are, c) musical training, which focuses on formal musical training history, d) singing abilities, and e) emotional engagement with music, which refers to the ability to identify the emotions a piece of music aims to convey.

Table 2. Mean language skills (%) of participants measured by the self-report language questionnaire (SRLQ)

Skill           Mean    SD
Reading         82.1    .90
Writing         75.0    1.40
Understanding   81.6    1.05
Speaking        74.4    1.46
Grammar         77.3    1.48
Independence    79.4    1.38

Procedure

After providing informed consent, participants filled out the SRLQ, the LexTale and the Gold-MSI. This took approximately 15 minutes in total. Participants were then taken into a soundproof room, where they were seated in front of a screen. The experimenter stayed outside and communicated with them through their headphones. All instructions were presented on the screen. Participants were instructed that they would listen to voice fragments that often sounded like speech but that some might sound like a song. They were informed that after hearing a fragment for the first time they would be asked to rate how speech-like or song-like it sounded (i.e. the Rating task). Following this, they would hear the fragment two more times consecutively and had to decide whether these two were the same or different (i.e. the Discrimination task). Once they had done the discrimination task, they would see a cross indicating that they would hear the same excerpt five times in a row, and at the end they would rate the last excerpt once again on how speech- or song-like it sounded. Finally, they would do the discrimination task one more time. Hence, they completed 2 rating and 2 discrimination tasks per trial. The instructions included the order of the tasks and a visual display of how a single trial would look (see Figure 1).


For the rating, participants used the keys 1 (“exactly like speech”) to 5 (“exactly like song”). For the discrimination task, they pressed “S” if they perceived the excerpts as the same, and “K” if they perceived them as different. “Same” pairs were those in which the first fragment (i.e. the standard stimulus) and the following one (i.e. the comparison stimulus) were identical, whereas in “different” pairs the comparison stimulus differed from the standard by one vowel. Half of the pairs in each discrimination task (i.e. initial and final) were “same” pairs, whereas the other half were “different” pairs.

Each trial consisted of 10 repetitions in total, of which six belonged to the rating tasks and four (2 pairs) were used in the discrimination tasks. Administering the discrimination task twice, first at the beginning and then at the end of a trial, allowed us to test sensitivity to vowel changes when the excerpt was perceived as speech and then as song. Before starting the experiment, participants completed two practice trials and were asked whether they had any questions. At the end of each practice trial they were given feedback about whether the two pairs were the same or different.

Figure 1. A representation of the order of the tasks within a single trial. Each trial consisted of 10 repetitions of the same speech fragment. The rating task took place twice in order to capture whether the participant perceived the excerpt as speech or song. The discrimination task also took place twice in order to capture the participant’s response when the speech fragment is perceived as speech and then as song.
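The order of events within a trial can also be written out as a short sketch. This is not the actual experiment code; the helpers for playback and response collection (play_fragment, get_rating, get_same_different) are hypothetical placeholders.

def run_trial(fragment, initial_comparison, final_comparison):
    """One trial: 10 presentations of the same fragment, as in Figure 1."""
    responses = {}

    play_fragment(fragment)                             # presentation 1
    responses["initial_rating"] = get_rating()          # 1 = exactly like speech ... 5 = exactly like song

    play_fragment(fragment)                             # presentations 2-3: standard + comparison
    play_fragment(initial_comparison)                   # identical ("same") or vowel-changed ("different")
    responses["initial_discrimination"] = get_same_different()   # "S" = same, "K" = different

    for _ in range(5):                                  # presentations 4-8: massed repetition
        play_fragment(fragment)
    responses["final_rating"] = get_rating()            # rate the last repetition

    play_fragment(fragment)                             # presentations 9-10: second discrimination pair
    play_fragment(final_comparison)
    responses["final_discrimination"] = get_same_different()

    return responses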

Results

For each participant and each vowel change we calculated performance scores using the traditional correct response rate. Correct responses in the discrimination task (i.e. “same” responses on same trials, and “different” responses on different trials) were divided by the total number of responses and expressed as percentages. Not all excerpts showed the transformation effect for every participant. We therefore divided the data into three categories: a) trials that were initially heard as speech and continued to be heard as speech (i.e. Stable speech), b) trials that were initially heard as song and continued to be heard as song (i.e. Stable song), and c) trials that were initially heard as speech and then heard as song (i.e. Transforming).

Transforming trials were those rated 1 or 2 in the initial rating task and 3, 4, or 5 in the final rating task following the repetition (Vanden Bosch der Nederlanden, Hannon & Snyder, 2015b). Stable speech trials were those rated 1 or 2, whereas Stable song trials were those rated 3, 4 or 5, at both the initial and the final rating task. 38.67% of the stimuli were perceived as Transforming, 45.64% as Stable Speech, and 31.94% as Stable Song. Because the discrimination task took place twice in each trial, for each of these three conditions we obtained two performance scores, one for the initial discrimination task (i.e. Initial discrimination) and the other for the final one (i.e. Final discrimination). A chi-square analysis of the number of times each excerpt was categorized as stable or transforming was not significant. Thus, none of the speech excerpts was more likely to be heard as either stable (speech or song) or transforming across participants. Further, participants’ musical sophistication (as measured by the Gold-MSI) did not influence the number of times they experienced the illusion, r = -.02, p > .05. Thus, the transformation was experienced regardless of the participant’s musical abilities.
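A minimal sketch of this scoring and trial classification, assuming a hypothetical long-format table with one row per discrimination response (all column and file names are illustrative):

import pandas as pd

# Hypothetical data: one row per discrimination response, with columns
# participant, excerpt, phase ("initial"/"final"), is_same_pair (bool),
# response ("same"/"different"), initial_rating, final_rating.
df = pd.read_csv("discrimination_responses.csv")

# Correct response rate: "same" on same pairs, "different" on different pairs.
df["correct"] = ((df["is_same_pair"] & (df["response"] == "same")) |
                 (~df["is_same_pair"] & (df["response"] == "different")))

def classify(row):
    """Assign a condition from the two musicality ratings (1-2 = speech-like, 3-5 = song-like)."""
    initial_song = row["initial_rating"] >= 3
    final_song = row["final_rating"] >= 3
    if not initial_song and final_song:
        return "Transforming"
    if initial_song and final_song:
        return "Stable song"
    if not initial_song and not final_song:
        return "Stable speech"
    return "Other"   # song-to-speech reversals fall outside the three conditions

df["condition"] = df.apply(classify, axis=1)

# Percentage correct per participant, condition and discrimination phase.
scores = (df.groupby(["participant", "condition", "phase"])["correct"]
            .mean().mul(100).reset_index(name="pct_correct"))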

Because we were interested in how vowel discrimination sensitivity changes when an excerpt is first perceived as speech and then as song, our initial analysis included only trials that were subject to the speech-to-song illusion. A paired-samples t-test indicated no significant difference between the Initial discrimination task and the Final discrimination task for the transforming stimuli. Importantly, although not significant, we observed an increase in performance from the Initial to the Final discrimination task.

In the following analysis we compared discrimination performance on the Transforming trials to that on the Stable speech and Stable song trials. We conducted a 2 x 3 repeated measures ANOVA and found a main effect of Condition, F(2, 46) = 3.98, p = .02, η2 = .14. Post hoc analysis using Bonferroni correction revealed that for both discrimination tasks performance was significantly higher for Stable Song trials (M = .95, SD = .02) than for Transforming trials (M = .86, SD = .02), p = .01 (see Figure 2). Stable speech trials did not differ significantly from either the Transforming or the Stable Song trials: performance scores for Stable Speech trials were higher than for Transforming trials and lower than for Stable Song trials, although these differences were not significant (see Figure 3).
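The comparisons reported here could be reproduced roughly as follows. This is only a sketch that reuses the hypothetical scores table from the previous snippet; scipy and statsmodels are used for illustration and were not necessarily the tools employed in the study.

from scipy import stats
from statsmodels.stats.anova import AnovaRM

# Paired t-test: Initial vs. Final discrimination for Transforming trials only.
wide = scores.pivot_table(index="participant", columns=["condition", "phase"],
                          values="pct_correct")
t_stat, p_val = stats.ttest_rel(wide[("Transforming", "initial")],
                                wide[("Transforming", "final")],
                                nan_policy="omit")

# 2 (phase) x 3 (condition) repeated measures ANOVA on correct response rates.
# Note: AnovaRM assumes complete data, i.e. every participant contributes to every cell.
three = scores[scores["condition"].isin(["Transforming", "Stable speech", "Stable song"])]
anova = AnovaRM(data=three, depvar="pct_correct",
                subject="participant", within=["condition", "phase"]).fit()
print(anova)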

Figure 2. Participants’ mean correct response rates for each condition indicate a main effect of condition. Discrimination performance for Stable Song trials was significantly higher than that of Transforming trials both for the initial and final discrimination task, p = .01. Error bars are within-subject standard error (Cousineau, 2005).


Figure 3. Mean performance scores for each of the 3 conditions in the Initial and Final discrimination task. An increase from the initial to the final discrimination task was observed for Stable Song and Transforming trials, whereas for Stable Speech trials a decrease was observed. Importantly none of these differences were significant. Error bars are within-subject standard error (Cousineau, 2005).
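The within-subject error bars mentioned in the figure captions follow Cousineau (2005): between-participant variability is removed before the standard error is computed. A minimal sketch, again assuming the hypothetical scores table from above:

import numpy as np

# Cousineau (2005) normalization: subtract each participant's own mean across
# conditions, add back the grand mean, then take the standard error of the
# normalized scores within each condition and phase.
grand_mean = scores["pct_correct"].mean()
participant_mean = scores.groupby("participant")["pct_correct"].transform("mean")
scores["normalized"] = scores["pct_correct"] - participant_mean + grand_mean

within_subject_se = (scores.groupby(["condition", "phase"])["normalized"]
                           .agg(lambda s: s.std(ddof=1) / np.sqrt(s.count())))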

We further analyzed the relationship between Stable song and Transforming trials by conducting paired-samples t-tests. We found a significant difference between Initial discrimination for Stable song trials and Initial discrimination for Transforming trials (t(25) = 2.24, p < .03), whereas Final discrimination for Stable song and Final discrimination for Transforming trials did not differ from each other (see Figure 4).

Lastly, we investigated whether Control trials differed from Transforming trials by conducting a 2 x 2 repeated measures ANOVA. There was no main effect of Condition or of Discrimination.

Figure 4. Mean performance scores for Transforming and Stable Song conditions in the Initial and Final discrimination tasks. For the Initial discrimination task, performance on Stable Song trials was better than Transforming, whereas for the Final discrimination, in which the excerpts were perceived as song in both conditions, there was no difference in discrimination performance.


Discussion

The current study investigated how the speech-to-song illusion affects sensitivity to speech-related information. Specifically, we asked whether listeners’ performance in discriminating vowel manipulations would increase once they perceived the excerpt as a song, compared to their initial percept of it as speech. Our results showed that listeners’ discrimination performance did not differ between when the excerpt was perceived as speech and when it was perceived as song. Although there was an increase in performance from the initial to the final discrimination for the transforming stimuli, contrary to our expectations it did not reach significance.

Interestingly, discrimination performance in general, regardless of whether the preceding excerpt was rated as speech or song, was best when the utterance was initially perceived as song and continued to be perceived as song, and it was worst for utterances that transformed from speech to song. Further, when we compared performance in the initial discrimination task, we found that participants’ sensitivity to vowel changes was higher when they initially rated the utterance as a song than when they rated it as speech. Importantly, this difference in performance disappeared in the final discrimination task, in which both types of excerpt were now perceived as song.

These findings provide an opportunity to test the two competing views that explain the underlying mechanism of the illusion, by placing the focus on elements that are more relevant to language processing than to musical processing. The “speech-perception mode” account suggests that by default listeners operate in a perceptual mode that prioritizes speech-related information and de-emphasizes music-related information, but that through repeated exposure speech representations become satiated, which in turn allows the processing of musically important elements. This account would predict that, initially, because speech-related characteristics are by default more salient, sensitivity to vowels should be high. However, once the transformation occurs, due to a switch to the music mode of listening in which musical elements are more pronounced, sensitivity to vowel differences should decrease. On the other hand, the “musical structure account” suggests that listeners are always capable of detecting musical elements within any information, but that this process requires some time, hence repetition, for the tonal structures to be fine-tuned. Drawing on Tierney et al.’s (2018) study, which showed evidence in favor of the “musical structure account”, we asked whether the same principle could apply to language-related acoustic information. Specifically, we hypothesized that just as repetition benefits the fine-tuning of musical acoustic elements such as pitch, it should also benefit linguistic acoustic elements such as vowels; we therefore expected that with repeated exposure sensitivity to vowel changes should increase. Although not significant, we found an increase in discrimination accuracy once the transformation occurred, which points more towards the second account.

An intriguing finding, however, was that performance in discriminating vowels in general, regardless of whether it was in the initial or final discrimination task, was lower for trials that transformed than for trials that were perceived as a song from the start. This suggests that the ability to perceive something as musical and extract musical elements early on benefits vowel representations, a process that is not central to music but more related to language. Interestingly, when listeners initially perceived the excerpt as speech (Transforming condition) their vowel discrimination performance was poorer than in cases in which their initial percept of the excerpt was a song (Stable Song condition). However, once they perceived the stimulus as a song in the final discrimination task, their performance increased to the point that it no longer differed from that for excerpts that had been heard as song all along. This finding strengthens the idea that being able to extract musical elements from a given stimulus increases sensitivity to other information, in this case speech-related information, that is not directly related to music processing. Indeed, research shows that musical abilities can benefit other processes. For example, attending to the rhythmic information of a stimulus can aid not only language learning but also memory and recall processes (Fonseca-Mora, Toscano-Fuentes & Wermke, 2011). Intonation is another musical element that can benefit the perception of word stress and the recognition of sentence structure. Starting from the first year of life, infants make use of melodic information during L1 acquisition (Wermke & Mende, 2006; 2009; Wermke, 2002). The beneficial effects of music on language processing are also present in second language learning, as music helps memorization of the new language (Schellenberg et al., 2007), develops auditory perception (Slevc and Miyake, 2006) and reduces language anxiety (Fonseca-Mora & García, 2010).

It should be noted, however, that although vowel sounds are acoustic information relevant for language processing, their function in speech processing seems to differ from that of consonants, which have a specific lexical function (Bonatti, Peña, Nespor, & Mehler, 2005; Mehler, Peña, Nespor, & Bonatti, 2006). In fact, this lexical function of consonants emerges very early in human life, as demonstrated by the fact that 20-month-old infants can easily learn two words that differ in one consonant (e.g. [pize] vs. [tize]), but cannot do the same when these differ by one vowel (e.g. [pize] vs. [paze]) (Nazzi, 2005; Nazzi & New, 2007). Whereas consonants have a lexical function, vowels seem to be helpful for extracting structural generalizations in artificial languages and are hence heavily involved in syntactic computations (Toro, Nespor, Mehler, & Bonatti, 2008). Additionally, they play a more important role in grammar and prosody than consonants do (Nespor, Peña, & Mehler, 2003; Toro et al., 2008). These functional differences are further supported by studies with non-human primates. For example, contrary to humans, monkeys can only extract statistical regularities based on vowels (Newport, Hauser, Spaepen, & Aslin, 2004). Interestingly, they can produce harmonic sounds that resemble vowels (Owren, Seyfarth, & Cheney, 1997; Rendall, Rodman, & Emond, 1996), but only humans have the ability to articulate consonants, which allows them to make rich meaningful contrasts (MacNeilage & Davis, 2000).

In sum, research provides evidence that vowels and consonants carry different linguistic functions. Whereas the processing of consonants is important for word identification, vowels’ major contribution is to grammar and prosody in the English language. Further, studies comparing human and animal language suggest that vowels may have a less specific role for speech than consonants do (Kolinsky et al., 2009). These differences led to the hypothesis that the distinction between vowels and consonants could also be present in other human auditory processes such as music perception (Kolinsky et al., 2009). In a series of experiments, Kolinsky and colleagues compared material with varying vowels and varying consonants. Participants had to classify bi-syllabic nonwords that were sung on two-tone melodic intervals. Overall, their results indicated that in songs consonants were processed more independently from melodic information than vowels were. They concluded that vowels merge or interact with intervals during song processing, suggesting that the linguistic function of vowels might resemble the function of pitch in music processing.

These characteristics of vowels might explain why perceiving an excerpt as a song facilitated performance in the discrimination task we used. The fact that there is “musicality” in vowels might make their detection easier especially when the context in which they are perceived is a song rather than a speech utterance.

Future research should compare acoustic information that is central and secondary to music processing (i.e. pitch and vowels) to understand whether the illusion affects the two to a similar extent or whether it benefits one over the other. Further, in order to better understand how language processing is affected by the illusion, studies could test stimuli that are language-specific. One candidate is semantic processing. It has been shown that repetition of a single word can disrupt one’s judgments of whether words are part of the same semantic category (Smith, 1984; Smith & Klein, 1990; Pilotti, Antrobus, & Duff, 1997; Pilotti & Khurshid, 2004). Future work could investigate how the illusion affects semantic relatedness judgments of the words that are subject to the transformation.

To our knowledge, the present study is the first to investigate language-related processes in the speech-to-song illusion. We asked whether the transformation from speech to song would alter sensitivity to vowel changes within the transforming excerpts. We hypothesized that, just as is the case for musically relevant acoustic elements such as pitch, repetition should benefit the discrimination of varying vowel sounds, i.e. acoustic elements that are more related to language processing. Our findings suggest that although the transformation from speech to song has an impact on vowel discrimination, being able to represent information as a song from the very beginning is more crucial for making judgments about speech-related stimuli.

Acknowledgments

I would like to thank Makiko Sadakata for the supervision, Kanthida van Welzen and Maud Zweers for their support and collaboration, and Adam Tierney for the use of the corpus with auditory illusions.

References

Allen, E. J., & Oxenham, A. J. (2014). Symmetric interactions and interference between pitch and timbre. The Journal of the Acoustical Society of America, 135(3), 1371-1379.

Arom, S. (2000). Prolegomena to a biomusicology. In N. L. Wallin, B. Merker, & S. Brown (Eds.), The Origins of Music (Ch. 2).

Bashford, J., Warren, R., & Lenz, P. (2006). Polling the effective neighborhoods of spoken words with the verbal transformation effect. JASA, 119, EL55.

Bashford, J., Warren, R., & Lenz, P. (2008). Evoking biphone neighborhoods with verbal transformations: Illusory changes demonstrate both lexical competition and inhibition. JASA, 123, EL32.

Besson, M., & Schön, D. (2001). Comparison between language and music. Annals of the New York Academy of Sciences, 930(1), 232-258.

Bonatti, L. L., Peña, M., Nespor, M., & Mehler, J. (2005). Linguistic constraints on statistical computations: The role of consonants and vowels in continuous speech processing. Psychological Science, 16(6), 451-459.

Brown, D. (1991). Human Universals. McGraw-Hill.

Caruso, V. C., & Balaban, E. (2014). Pitch and timbre interfere when both are parametrically varied. PloS one, 9(1), e87065.

Deutsch, D. (1979). Octave generalization and the consolidation of melodic information. Canadian Journal of Psychology, 33, 201-205.

Deutsch, D., Henthorn, T., & Lapidis, R. (2011). Illusory transformation from speech to song. The Journal of the Acoustical Society of America, 129(4), 2245-2252.

Falk, S., Rathcke, T., & Dalla Bella, S. (2014). When speech sounds like music. Journal of Experimental Psychology: Human Perception and Performance, 40(4), 1491.

Fonseca-Mora, C., Toscano-Fuentes, C., & Wermke, K. (2011). Melodies that help: The relation between language aptitude and musical intelligence. International Journal of English Studies, 22(1), 101-118.

Jusczyk, P. W. (1999). How infants begin to extract words from speech. Trends in cognitive sciences, 3(9), 323-328.


Koelsch, S., & Siebel, W. A. (2005). Towards a neural basis of music perception. Trends in cognitive sciences, 9(12), 578-584.

Kolinsky, R., Lidji, P., Peretz, I., Besson, M., & Morais, J. (2009). Processing interactions between phonology and melody: Vowels sing but consonants speak. Cognition, 112(1), 1-20.

Lemhöfer, K., & Broersma, M. (2012). Introducing LexTALE: A quick and valid lexical test for advanced learners of English. Behavior research methods, 44(2), 325-343.

MacNeilage, P. F., & Davis, B. L. (2000). On the origin of internal structure of word forms. Science, 288(5465), 527-531.

Margulis, E. (2013). On Repeat: How Music Plays the Mind. New York, NY: Oxford University Press.

Margulis, E. H., Simchy-Gross, R., & Black, J. L. (2015). Pronunciation difficulty, temporal regularity, and the speech-to-song illusion. Frontiers in psychology, 6, 48.

Mehler, J., Peña, M., Nespor, M., & Bonatti, L. (2006). The “soul” of language does not use statistics: Reflections on vowels and consonants. Cortex, 42(6), 846-854.

Mora, M. C. F., & Barroso, L. G. (2010). Learn Spanish in the USA: The media as a social motivation. Comunicar: Ibero-American Scientific Journal of Communication and Education, (34), 145-153.

Müllensiefen, D., Gingras, B., Musil, J., & Stewart, L. (2014). The musicality of non-musicians: An index for assessing musical sophistication in the general population. PloS one, 9(2), e89642.

Nazzi, T. (2005). Use of phonetic specificity during the acquisition of new words: Differences between consonants and vowels. Cognition, 98, 13–30.

Nazzi, T., & New, B. (2007). Beyond stop consonants: Consonantal specificity in early lexical decision. Cognitive Development, 22, 271–279.

Nespor, M., Peña, M., & Mehler, J. (2003). On the different roles of vowels and consonants in speech processing and language acquisition. Lingue e linguaggio, 2(2), 203-230.

Newport, E. L., Hauser, M. D., Spaepen, G., & Aslin, R. N. (2004). Learning at a distance II. Statistical learning of non-adjacent dependencies in a non-human primate. Cognitive psychology, 49(2), 85-117.

Owren, M. J., Seyfarth, R. M., & Cheney, D. L. (1997). The acoustic features of vowel-like grunt calls in chacma baboons (Papio cyncephalus ursinus): Implications for production processes and functions. The Journal of the Acoustical Society of America, 101(5), 2951-2963.

Pilotti, M., Antrobus, J., & Duff, M. (1997). The effect of presemantic acoustic adaptation on semantic “satiation”. Memory and Cognition, 25, 305-312.

Pilotti, M., & Khurshid, A. (2004). Semantic satiation effect in young and older adults. Perceptual and Motor Skills, 98, 999-1016.

Rendall, D., Rodman, P. S., & Emond, R. E. (1996). Vocal recognition of individuals and kin in free-ranging rhesus monkeys. Animal behaviour, 51(5), 1007-1015.

Roncaglia-Denissen, M. P., Schmidt-Kassow, M., Heine, A., Vuust, P., & Kotz, S. A. (2013). Enhanced musical rhythmic perception in Turkish early and late learners of German. Frontiers in psychology, 4, 645.

Saitou, T., Goto, M., Unoki, M., & Akagi, M. (2007, October). Speech-to-singing synthesis: Converting speaking voices to singing voices by controlling acoustic features unique to singing voices. In 2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (pp. 215-218). IEEE.

Schellenberg, E. G., Nakata, T., Hunter, P. G., & Tamoto, S. (2007). Exposure to music and cognitive performance: Tests of children and adults. Psychology of music, 35(1), 5-19.

Schlüter, J., & Sonnleitner, R. (2012, September). Unsupervised feature learning for speech and music detection in radio broadcasts. In Proceedings of the 15th International Conference on Digital Audio Effects.

Slevc, L. R., & Miyake, A. (2006). Individual differences in second-language proficiency: Does musical ability matter?. Psychological Science, 17(8), 675-681.

Smith, L. (1984). Semantic satiation affects category membership decision time but not lexical priming. Memory and Cognition, 12, 483-488.


Smith, L, & Klein, R. (1990). Evidence for semantic satiation: repeating a category slows subsequent semantic processing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 852- 861.

Stanley, D. (1945). Your Voice: Applied Science of Vocal Art, Singing and Speaking. Pitman Publishing Corporation: New York, United States

Tierney, A., Dick, F., Deutsch, D., & Sereno, M. (2012). Speech versus song: multiple pitch-sensitive areas revealed by a naturally occurring musical illusion. Cerebral Cortex, 23(2), 249-254.

Tierney, A., Patel, A. D., & Breen, M. (2018). Acoustic foundations of the speech-to-song illusion. Journal of Experimental Psychology: General, 147(6), 888.

Toro, J. M., Nespor, M., Mehler, J., & Bonatti, L. L. (2008). Finding words and rules in a speech stream: Functional differences between vowels and consonants. Psychological Science, 19(2), 137-144.

Vanden Bosch der Nederlanden, C., Hannon, E., & Snyder, J. (2015a). Everyday musical experience is sufficient to perceive the speech-to-song illusion. Journal of Experimental Psychology: General, 2, e43-e49.

Vanden Bosch der Nederlanden, C., Hannon, E., & Snyder, J. (2015b). Finding the music of speech: musical knowledge influences pitch processing in speech. Cognition, 143, 135-140.

Wermke, K. (2002). Investigation of melody development in the infant cry of monozygotic twins in the first 5 months of life.

Wermke, K., & Mende, W. (2006). Melody as a primordial legacy from early roots of language. Behavioral and Brain Sciences, 29(3), 300-300.

Wermke, K., & Mende, W. (2009). Musical elements in human infants’ cries: in the beginning is the melody. Musicae Scientiae, 13(2_suppl), 151-175.

Warren, R., Bashford, J., & Gardner, D. (1990). Tweaking the lexicon: organization of vowel sequences into words. Perception & Psychophysics, 47, 423-432.

Warren, R., Healy, E., & Chalikia, M. (1996). The vowel-sequence illusion: intrasubject stability and intersubject agreement of syllabic forms. JASA, 100, 2452-2461.

Warrier, C., & Zatorre, R. (2002). Influence of tonal context and timbral variation on perception of pitch. Perception and Psychophysics, 64, 198-207.

Zatorre, R. J., Belin, P., & Penhune, V. B. (2002). Structure and function of auditory cortex: music and speech. Trends in cognitive sciences, 6(1), 37-46.
