
Anticipatory coarticulation in children: comparing

coarticulation patterns in a picture naming and a

shadowing task

Lisanne de Jong

Master thesis

for the requirements of the program Research Master Language and Cognition

Word count: 20081
Student number: S3528863
Email: l.m.de.jong.4@student.rug.nl
Supervisor: Prof. Dr. Martijn Wieling
31.08.2020


Abstract

In spoken language, speech segments overlap with each other, as a speech sound is anticipated already before its acoustic and articulatory onset. This process, known as anticipatory coarticulation (Daniloff & Hammarberg, 1973; Fowler, 1980; Kent & Read, 2002; Repp, 1986), has been widely studied in adults, yet descriptions of the maturation of coarticulatory processes are inconsistent across studies. Moreover, most studies on patterns of anticipatory coarticulation in children make use of a shadowing paradigm to elicit speech, while little attention seems to be paid to the large number of studies suggesting that phonetic convergence to a model speaker is commonly found in shadowed speech (e.g. Babel, 2012; Pardo, Urmanche, Wilman & Wiener, 2017; Shockley, Sabadini & Fowler, 2004). The aim of the present study was to investigate intersyllabic anticipatory coarticulation in Dutch children of varying ages and to compare whether coarticulatory patterns differed between speech elicited in a picture naming task and speech elicited in a shadowing task. Ultrasound tongue imaging data was collected at the NEMO Science Museum in Amsterdam as part of the NEMO Science Live program, and the data of 61 participants aged 5-15 was included in the analysis. Using Generalized Additive Modeling, we did not find significant differences between the anticipatory coarticulation patterns in the picture naming and the shadowing task when taking into account the full dataset. However, an exploratory analysis conducted on the basis of trends observed in the terrain plots of the different age groups suggested that although task choice did not affect coarticulatory patterns in the speech of participants aged 9-15, it did play a role in the speech of participants aged 5-8, albeit to a different extent depending on the nature of the upcoming vowel. The trends observed in the study highlight the importance of discussing the role of task choice and point to new areas that could be explored in future studies.

Keywords: anticipatory coarticulation; long-distance vowel anticipation; speech motor control; shadowed speech; phonetic convergence


Acknowledgments

The completion of this thesis took a little longer than expected, but I am happy that the moment is finally here. All that is left is for me to thank a few people.

First of all, thank you Martijn, not only for being my supervisor for this thesis, but also for all the experiences and opportunities that you have created in the past 2.5 years in what over time has become the Speech Lab Groningen. Your feedback in this process has been incredibly helpful and I hope this final product lives up to your expectations. That being said, I don’t think I want to see another terrain plot again anytime soon ;)

Next, the other members of the SLG that have helped me at some point during this project. Thank you Teja, Jidde, Aude, Stefanie, Nora, Hedwig and Martijn B., for all your feedback, for being my guinea pigs and for creating scripts for my analysis.

Thank you, Shiv, Bregje, Annika, Jelle and Vicky for helping me with the data collection. I couldn’t have done it without you.

A massive thank you to my parents and my friends, especially Gina and Birgit, who have always been there for me throughout the process, despite probably not understanding much of what is discussed in this thesis.

And of course, thank you Elsard, for always being there for me and for helping out with my projects, whether it’s recruiting participants or agreeing to having your face put on a poster for thousands of Lowlands visitors to see. But yes, I will also forever complain about how unfair it is that you managed to get a MSc degree with the most basic level of statistics, and that I do not get one ;)

Lastly, I would like to thank the NEMO Science Museum and specifically Ludo and Mart for all your help and the very nice opportunity to collect data at the museum.


Table of contents

Abstract
Acknowledgments
List of appendices
1. Introduction ... 6
2. Theoretical background ... 9

2.1. The importance of studying anticipatory coarticulation ... 9

2.2. The role of task choice ... 12

2.3. Measuring anticipatory coarticulation ... 17

3. Research aims and hypotheses ... 20

4. Method ... 22
4.1. Participants ... 22
4.2. Materials ... 23
4.3. Experimental equipment ... 26
4.4. Experimental procedure ... 27
5. Analysis ... 32
5.1. Preprocessing ... 32
5.2. Statistical analysis ... 37
6. Results ... 43
6.1. Descriptive statistics ... 43
6.2. Visualizing GAMs ... 44

6.3. Hypothesis testing: the effect of task ... 46

6.4. Exploratory analysis: the effect of age ... 48

7. Discussion ... 62

Limitations of the study ... 65

Conclusion ... 69

References ... 70


List of appendices

A1. CETO approval ... 76

A2. Pictures used in picture naming task ... 77

A3. Information letter for participants and parent(s)/guardian(s) ... 79

A4. Consent form ... 81

A5. Questionnaire ... 83

A6. Debriefing leaflet ... 85

A7. Statistical analysis ... 86


1. Introduction

Coarticulation is a widely studied topic in research on speech development. In spoken language, speech segments overlap with each other. A speech sound is anticipated already before its acoustic and articulatory onset (anticipatory coarticulation), but an influence of a speech sound can also be found during the production of sounds that follow (carryover, or perseverative coarticulation). Figure 1 below visualizes the process of both types of coarticulation for three speech gestures (A, B, C).

Figure 1. Coarticulation of speech gestures (adapted from Fowler & Saltzman, 1993).

The focus of this thesis is on anticipatory coarticulation. Anticipation gives us insights into motor programming in speech, as it reflects the interaction between planning processes (the selection of phonemic units to be pronounced, along with their corresponding motor schemes) and the physical execution that eventually leads to the intended utterance (Noiray et al., 2019).

Similar to other developmental processes, the speech of (typically developing) children changes as their age increases, especially when it comes to the temporal organization of speech gestures. Several studies have shown that coarticulation patterns in children differ from those in adults (Rubertus & Noiray, 2018; Noiray et al., 2019; Zharkova, Hewlett & Hardcastle, 2011). However, the exact way in which these patterns eventually reach the ‘adult’ stage remains unclear.

A commonly used method to study coarticulation in children involves a so-called shadowing task, in which participants have to repeat sequences that are presented to them auditorily. By presenting stimuli in the auditory domain, it is possible to control for variation related to other cognitive abilities, such as reading, and study coarticulation processes in different age groups (Noiray et al., 2019). Moreover, a shadowing paradigm is often chosen when stimuli include pseudowords that are controlled for various factors, such as the types of speech sounds and their associated degree of coarticulatory resistance.

Although such studies have provided us with valuable insights into coarticulation patterns, the question remains to what extent we can extrapolate these findings from very controlled speech sequences to more ‘natural’ speech scenarios, such as in real words. In addition, it is important to consider how coarticulation patterns found in studies might be influenced by the task that is used to elicit speech data. Many studies have found that merely listening to and repeating isolated words can trigger phonetic convergence, a process in which a speaker adjusts their speech to that of the person they interact with, or in this case, the person they listen to (Goldinger, 1998; Pardo, Urmanche, Wilman & Wiener, 2017; Shockley, Sabadini & Fowler, 2004). Considering the presumed difference in coarticulation patterns between children and adults, it seems important to investigate whether phonetic convergence might play a role when studying coarticulation using a shadowing task.

The present study looks at anticipatory coarticulation processes in the speech of Dutch-speaking children of different ages. By using the method of ultrasound tongue imaging, we were able to track tongue movements and therefore investigate anticipatory coarticulation from an articulatory point of view. The first aim of this study was to investigate the effect of age on anticipatory coarticulation processes in order to provide further insights into the maturation of speech production processes. Secondly, we were particularly interested in whether these processes differ depending on the nature of the experimental task. Therefore, we compared utterances produced in a shadowing task to those in a more ‘natural’ setting, namely a picture naming task.

To answer our research questions, we collected data from children visiting the NEMO Science Museum in July 2019. With the consent of their parent(s)/guardian(s), children participated in a 20-minute experiment in which they took part in a picture naming and a shadowing task. During the experiment, we collected acoustic and articulatory data using an ultrasound tongue imaging device (Articulate Instruments, Ltd). The data collected in this project from children without speech problems will also be used as control data in a project that investigates speech articulation in children suffering from Duchenne muscular dystrophy (DMD). As the data for the latter group will be collected at a later point, the stimuli design and experimental setup for our study were created in collaboration with the people involved in the DMD project (Prof. Dr. Wieling, Dr. Noiray and Nora Jamoulle).

Structure of this thesis

The following chapter will lay out the theoretical dimensions of this research project (Chapter 2). It will provide an overview of research that has been conducted on anticipatory coarticulation thus far and why it is an important topic of study. It will also discuss different factors that are at play, such as developmental patterns. Moreover, Chapter 2 will provide an overview of the literature regarding the role of task choice in speech research. The chapter will be concluded with a section that introduces the measures and the analysis that were used in this study. Subsequently, Chapter 3 outlines the research aims and hypotheses of the project and is followed by a chapter that presents the methodology, including information on the participants, the stimuli design, the equipment used and the experimental procedure (Chapter 4). Following this, the steps taken for the data (pre-)processing and analysis are described in Chapter 5, after which the results of this study are presented in Chapter 6. In the final chapter, we discuss our findings by reflecting on the connections to previous research, on potential shortcomings of our study, and on what our findings could mean for future research on anticipatory coarticulation.


2. Theoretical background

This chapter contains an overview of the literature that formed the foundation for our study. First, we will focus on research that has been conducted on the topic of anticipatory coarticulation. This is followed by a discussion on different tasks that are used in speech research and how task choices might affect speech output. Finally, we will discuss some methodological concerns and introduce the measures and the type of analysis that we used for our study.

2.1. The importance of studying anticipatory coarticulation

Similar to many aspects of human movement dynamics, speech production involves a complex succession of processes that require motor patterns that an individual needs to learn. One important characteristic of such motor programming is the phenomenon of anticipation (Nadin, 2005). Anticipation refers to the ability of individuals to utilize past experiences to predict future events and to subsequently plan motor responses necessary to achieve a particular goal. In a linguistic context, we know that individuals are able to predict upcoming material and plan a response accordingly (Pickering & Garrod, 2007). In other words, prediction occurs after comprehension of the linguistic information that is provided to speakers, after which activation on the speech-motor level enables them to model their upcoming speech. As such, anticipation forms a fundamental characteristic of articulatory dynamics.

Although isolated speech sounds can be described and classified on the basis of a set of distinctive acoustic and articulatory features, communicative speech is much more complex. As individual sounds are combined to form syllables, words and phrases, the articulatory movements needed to produce these utterances are more than just the sum of the individual movements associated with particular sounds. As in other motor activities, speech production involves the interplay of different planning processes (namely, selecting the sound units as well as their corresponding motor schemes) and the physical execution that binds them together into well-constructed, meaningful utterances. The process of speech segments overlapping with those adjacent to them is known as coarticulation (Daniloff & Hammarberg, 1973; Fowler, 1980; Kent & Read, 2002; Repp, 1986). Anticipatory coarticulation, then, refers to the process whereby a speech sound is anticipated before its articulatory onset, such that the characteristics of preceding segments are affected by those of the upcoming sound. Anticipatory coarticulation, therefore, is not only related to speech production, but also to speech perception.

A number of studies have shown that anticipatory coarticulatory cues can play a facilitating role in earlier and faster word perception (Ali et al., 1971; Beddor et al., 2013; Salverda et al., 2014). Misleading cues, on the other hand, can cause word identification to occur more slowly and with a higher error rate (Whalen, 1984; Fowler, 1984; Fowler, 2005).

Following the similarity of speech anticipation with other forms of motor programming, the more experience an individual has with certain speech goals in certain phonetic environments, the more efficient and automatized these anticipations can become (Butz, Sigaud, & Gérard, 2003; Noiray et al., 2019). Therefore, an important field of study concerns how anticipatory coarticulation develops with age. Many studies report on deficiencies in the temporal organization of speech motor movements as indicators of developmental disorders such as apraxia of speech (e.g. Maas & Mailend, 2017; McNeil et al., 2017) and stuttering (e.g. Lenoci, 2018; Walsh, Mettel & Smith, 2015). Increasing our understanding of how anticipatory coarticulation develops across the lifespan in typically developing children could therefore provide valuable contributions to developmental theories of speech production. This knowledge could eventually be applied in clinical settings.

Coarticulation patterns in adults are well-documented and generally provide consistent findings (see for instance Katz, Kripke, & Tallal, 1991; Kent & Read, 2002; Nittrouer, Studdert-Kennedy, & McGowan, 1989). Over the past two decades, a growing number of studies have started to investigate anticipatory coarticulation patterns in typically developing children. Although many of these studies agree that coarticulatory patterns in children differ from those in adults, the direction of this effect has been inconsistent across studies and has yielded different hypotheses regarding the maturation of coarticulatory patterns into adulthood. In the earlier literature, some studies reported greater levels of coarticulation in children (e.g. Nittrouer, Studdert-Kennedy & McGowan, 1989; Nittrouer & Whalen, 1989; Repp, 1986), whereas others suggested that the degree of coarticulation is initially smaller (e.g. Hodge, 1989; Sereno & Lieberman, 1987). Other studies suggested that similar levels of coarticulation can be found in both children and adults, but that its patterns show more variability in children (e.g. Katz, Kripke & Tallal, 1991; Sereno, Baum, Marean & Lieberman, 1987).

The different directions of the degree of coarticulation in children in comparison to adults have mainly been discussed in the context of two theories: one that describes child speech patterns as more segmental than those of adults (e.g. Gibson & Ohde, 2007; Katz, Kripke, & Tallal, 1991; Kent, 1983) and one that describes them as more holistic (e.g. Goodell & Studdert-Kennedy, 1993; Nittrouer et al., 1996; Nittrouer & Whalen, 1989). The segmental approach is used to explain results that show less coarticulation in the patterns of children. It suggests that children learn patterns of individual components of a consonant-vowel segment, because of which the consonant is initially less affected by the following vowel, and that over time, as children master the articulatory patterns associated with sound patterns, the degree of coarticulation increases until it eventually reaches an adult-like level (Kent, 1983). The holistic theory, on the other hand, proposes that child speech initially exhibits a greater level of consonant-vowel coarticulation than adult speech. In this theory, syllable strings are seen as holistic patterns that are initially lexically driven, meaning that they are limited to segment combinations that exist in words that have already been acquired (Keren-Portnoy, Majorano, & Vihman, 2009). Over time, as the lexical repertoire of children gradually expands, it is argued that the consonant and vowel interdependencies weaken and children develop more precision in coarticulatory coordination, both for familiar and new speech segment combinations (Goodell & Studdert-Kennedy, 1993; Nittrouer et al., 1996).

The current project is largely based on a more recent study by Noiray et al. (2019). This study focused on intersyllabic coarticulation (also referred to as long-distance coarticulation) in vowel-consonant-vowel (more specifically, schwa-C-V) sequences across a word boundary. The reason for this is that unlike anticipatory processes in the local domain (i.e. coarticulation within a syllable), long-distance coarticulation has received relatively little attention in the field. However, a previous study by Rubertus and Noiray (2018) provided interesting insights into both local and long-distance coarticulation patterns. Findings of the study, which involved the pronunciation of short schwa-C-V sequences by German children (aged 3, 4, 5 and 7) as well as adults, showed evidence of both local and long-distance anticipation for all children, but with a greater degree of coarticulation for the children aged 3-5 (kindergarten age) in comparison to the 7-year-old children (who were in the first year of primary school) and the adults. The authors argued that the difference between the children in kindergarten and those in the first year of primary school showed that phonological and lexical development are linked to developmental changes in coarticulation. The study by Noiray et al. (2019) largely replicated these findings, as 3-, 4- and 5-year-olds anticipated upcoming vowels at an earlier timepoint than adults and, to some extent, the 7-year-olds. Additionally, the younger children in the 2018 study exhibited more variation in the long-distance anticipation of the vowel as a function of the preceding consonant. For some children, a significant effect of the nature of the consonant on vocalic anticipation was found, whereas for other children the consonant did not influence vocalic anticipation patterns. In short, although some trends could be observed between age cohorts, the findings of Noiray et al. (2019) emphasize the importance of looking at developmental differences when studying anticipatory coarticulation. Rather than treating the degree of coarticulation as linearly increasing or decreasing over time, it might be better to study it in a non-linear fashion, a point we will return to in Section 2.3. Relating their work to previous studies, Noiray et al. (2019) argue that the inconsistency in findings regarding anticipatory coarticulation patterns over time might also partly be explained by methodological differences between studies, for instance in the experimental designs, stimuli and analyses that have been used over the years. In line with this, Repp (1986) noted that for some phonetic contexts, a high degree of coarticulation might indicate advanced speech production, whereas in others it might be evidence for articulatory immaturity, emphasizing the necessity of taking the experimental design into account when comparing the findings of different studies. Moreover, Noiray et al. (2019) argue that studies focusing on developmental patterns are often limited in sample size as well as in the age range that is investigated, and that extrapolation of findings beyond the investigated age range might be another reason for the variation in conclusions drawn about the maturation of coarticulatory patterns.
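To make the non-linear perspective concrete, the following toy simulation (with entirely hypothetical, simulated data; the thesis itself uses Generalized Additive Models fitted to real ultrasound data) shows why a straight line can misrepresent a developmental trajectory: a flexible fit, here a simple quadratic basis standing in for a GAM smooth, captures a U-shaped age pattern that a linear fit misses.

```python
import numpy as np

# Simulate a hypothetical U-shaped relation between age (5-15) and a
# "coarticulation degree" measure (arbitrary units), plus noise.
rng = np.random.default_rng(42)
age = rng.uniform(5, 15, 200)
degree = 0.08 * (age - 10) ** 2 + 1.0 + rng.normal(0, 0.1, age.size)

def fit_sse(X, y):
    """Ordinary least squares; return the sum of squared residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

X_lin = np.column_stack([np.ones_like(age), age])             # linear model
X_quad = np.column_stack([np.ones_like(age), age, age ** 2])  # non-linear model

sse_lin = fit_sse(X_lin, degree)
sse_quad = fit_sse(X_quad, degree)
print(sse_lin > sse_quad)  # prints True: the flexible fit captures the curve
```

A real GAM replaces the quadratic term with penalized smooths whose wiggliness is estimated from the data, but the underlying point is the same: the shape of the age effect is estimated rather than assumed.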

2.2. The role of task choice

When reviewing the literature on any linguistic phenomenon, it is important to consider the task(s) used to elicit data. Speech output can be affected by many factors, both linguistic and non-linguistic. One example of how stimuli can affect the speech produced by participants can be found in the role of lexical frequency. For instance, results from a study by Herrmann, Whiteside and Cunningham (2008) revealed greater degrees of coarticulation for high-frequency syllables than for syllables with a lower frequency. Another study by Mousikou and Rastle (2015) found that low-frequency and high-frequency items generated different response latencies, with low-frequency items being named more slowly than high-frequency items. Interestingly, the authors found that this frequency effect was stronger for a picture naming task than for a task in which participants were asked to read out words. This suggests that beyond linguistic aspects of stimuli (such as lexical frequency), task choice might offer another layer of influence on the speech produced. In this section, we will briefly discuss three tasks that are commonly used in speech research and discuss some advantages and disadvantages of these methods when it comes to studying (anticipatory) coarticulation.

2.2.1 Reading

Perhaps one of the most commonly used methods to elicit speech is having participants read texts that are presented to them. The benefit of using reading tasks is clear: they are relatively easy to administer and, in contrast to tasks such as picture naming or shadowing, which rely on images or audio, there is no interference of additional visual or auditory stimuli. Moreover, using texts allows researchers to control the speech that is produced. For instance, when studying anticipatory coarticulation, target sequences can easily be embedded into larger sequences, as was done in the study by Sussman, Byrd and Guitar (2011). A reading paradigm can also be used to investigate coarticulation in pseudowords, as for example in the study by Whalen (1990). However, there are also downsides to administering tasks that require participants to read text. Most notably, these types of tasks exclude participants who have not yet acquired the competence to read. As there has recently been a shift towards studying coarticulation across the lifespan, choosing a reading task would exclude participants from a population that is as yet understudied. Another relevant aspect is that speakers are affected by the phonological characteristics of both the utterances they produce and the auditory input they receive (Saletta, Goffman & Hogan, 2016). Literacy competences, on top of that, play a role in that literate speakers are influenced not only by these phonological characteristics of stimuli, but also by their orthographic characteristics. It has even been suggested that this interference at the orthographic level is not only present in tasks that involve reading or spelling directly, but might also appear when a task simply requires any form of listening or speaking (Rastle, McCormick, Bayliss, & Davis, 2011). In other words, reading and writing competences are important aspects to consider.

2.2.2 Shadowing

In order to remove the reading competence barrier, recent studies on coarticulation have often favored a method that uses repetition to elicit speech data (see for instance Maas & Mailend, 2017; Nijland et al., 2002; Noiray et al., 2018; Noiray et al., 2019; Rubertus & Noiray, 2018). In this so-called shadowing paradigm, participants are asked to repeat sequences that are presented to them auditorily, using pre-recorded speech of a model speaker. Besides the fact that no reading competence is required to perform this task, a shadowing task can be used for stimuli consisting of real words as well as pseudowords. A preference for the latter stimulus type can be seen in the literature, as it allows researchers to control the phonetic environment of the target sequences. However, there may be downsides to using this method as well, especially when it comes to studying anticipatory coarticulation.

For instance, a very important aspect of shadowing is the potential role of phonetic convergence. A large body of research has shown that in conversation, humans tend to imitate aspects of the speech of their conversational partner (e.g. Goldinger, 1998). This phenomenon, referred to as phonetic convergence or imitation, has been shown to manifest in several phonetic features, such as speech rate and intonation (Putman & Street, 1984; Giles, Coupland & Coupland, 1991). It has also been found to affect more specific features, such as spectral characteristics of both vowels and consonants (Delvaux & Soquet, 2007; Honorof, Weihing & Fowler, 2011) and vowel durations (Delvaux & Soquet, 2007). Even though speakers tend to converge more to speech in a situation where they can see and hear their partner (Dias & Rosenblum, 2016), phonetic convergence also extends beyond interactive contexts. In several studies using a shadowing paradigm, results show that the speech of a participant becomes more similar to that of the model speaker after only brief exposure to it (e.g. Babel, 2012; Pardo, Urmanche, Wilman & Wiener, 2017; Shockley, Sabadini & Fowler, 2004).

Moreover, a study by Dufour & Nguyen (2013) suggests that phonetic convergence is a semi-automatic process that speakers are generally unaware of. They compared utterances produced in a typical shadowing task to those produced in a task where participants were given explicit instructions to imitate the auditory stimuli they were exposed to. Even though the convergence effect was greater for the explicit imitation task, a post-exposure task in which participants had to read aloud words from a screen revealed similar convergence effects for both groups, indicating that shadowing and imitation seem to share a general mechanism.

Additionally, several stimulus properties have been found to mediate the results of shadowing tasks, such as the lexical frequency of the stimuli, the number of exposures to a given stimulus (Goldinger, 1998) and the participant’s degree of attention and engagement with the task during target exposure (Goldinger, 2013). Here, low lexical frequency, a greater level of attention and more target exposure are associated with stronger imitation/convergence patterns. To explain these results, Goldinger (1998) proposed that the degree of convergence can be linked to the weight of new instances in a speaker’s lexicon. According to this theory, it is assumed ‘that every stimulus leaves a unique memory trace, which includes rich phonetic details and is associated with a category label, and that phonological categories are represented in memory as clouds of such traces’ (Nielsen, 2014, p. 2066). As such, phonetic convergence is seen as a natural result of speech perception, reflecting changes in phonological representations. Low-frequency words, then, are expected to show a greater degree of imitation, as more weight is given to new stimuli. Moreover, the more a speaker is exposed to a stimulus associated with the modeled speech for a specific word, the higher the degree of convergence is expected to be (Goldinger, 1998; Nielsen, 2014).
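The arithmetic behind this exemplar-based prediction can be illustrated with a deliberately simplified sketch (hypothetical numbers, not a model from the literature): if a category is a cloud of stored traces and production reflects their average, then one new trace from a model speaker shifts a small cloud (a low-frequency word, or a young speaker with few exemplars) more than a large one.

```python
import numpy as np

def shift_after_exposure(n_traces, new_value, cloud_center=0.0):
    """How far the category mean moves after storing one new trace.

    The cloud is idealized as n_traces identical stored exemplars; the new
    trace is the model speaker's token on some phonetic dimension.
    """
    cloud = np.full(n_traces, cloud_center)
    before = cloud.mean()
    after = np.append(cloud, new_value).mean()
    return after - before

# A densely populated cloud (high-frequency word) vs. a sparse one.
high_freq = shift_after_exposure(n_traces=1000, new_value=1.0)  # shift 1/1001
low_freq = shift_after_exposure(n_traces=10, new_value=1.0)     # shift 1/11
print(low_freq > high_freq)  # prints True: sparse clouds converge more
```

The same averaging logic yields the developmental prediction discussed below: fewer stored exemplars means each new token carries more weight, hence stronger imitation.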

Regarding developmental changes in phonetic imitation/convergence between children and adults, no explicit predictions are made by the theory described above. In fact, most studies conducted on phonetic convergence concern adult data. Literature on how children imitate speech is scarce in comparison to the adult literature, and the developmental course of phonetic convergence therefore remains speculative (Nielsen, 2014). However, some studies have looked into the phenomenon in child speech. For instance, Kuhl and Meltzoff (1996) analyzed vowel (/a/, /i/, /u/) productions of 12-, 16- and 20-week-old infants while they listened to model stimuli and found that the vowels produced by the older infants showed more similarities with those of the model speaker. Although the speech of children at the age investigated by Kuhl and Meltzoff (1996) is generally considered to be pre-linguistic, this indicates that children are able to imitate aspects of speech when exposed to a model speaker from a very young age onwards. In line with this, a study by Ryalls and Pisoni (1997) found that the word duration of utterances produced by younger children was more similar to that of the model speaker than that of utterances produced by older children and adults. In yet another study, Nielsen (2014) compared the voice-onset time (VOT) of pre-schoolers, third-graders and adults before and after exposure to model speech with artificially increased VOT. She found that children showed significantly more imitation of the model speech than adults, but that the level of imitation was similar for the pre-schoolers and the third-graders. The studies by Ryalls and Pisoni (1997) and Nielsen (2014), although focusing on different aspects of speech (i.e. vowel production, word duration and VOT), seem to suggest that phonetic convergence is present to a higher degree in the speech of children than in the speech of adults. This would be in line with the exemplar-based theory proposed by Goldinger (1998), which predicts that, as younger speakers have fewer exemplars in memory than older speakers, child speech should exhibit higher levels of imitation than adult speech (Nielsen, 2014). However, more research is needed to establish how the level of imitation develops in children over time and how and when they reach an adult-like stage.

Considering the gap in the literature when it comes to phonetic imitation/convergence in children, it is remarkable that its role is rarely mentioned in studies on coarticulation, especially since these studies often make use of a shadowing paradigm. Given the findings that anticipatory coarticulatory cues are used in word perception to prepare for an upcoming utterance, it is expected that there is also a greater degree of coarticulation in the production of speech in a task that involves some form of imitation (Zellou, Scarborough & Nielsen, 2016). For instance, when studying adults, Zellou et al. (2016) found that participants altered their degree of coarticulatory vowel nasality to a level similar to that demonstrated in the prime they were exposed to, compared to their pronunciation of the target words at baseline. In a follow-up study, Zellou, Dahan & Embick (2017) used an adapted version of the shadowing paradigm in which participants had to read aloud a printed word after auditory exposure to a different word. Again, the researchers found that participants imitated the coarticulatory vowel nasality of the model speaker, even when the stimuli that participants were primed with were different from those they had to produce themselves. The authors concluded that participants in their study were able to extract the nasal feature from the utterances they were exposed to and subsequently integrate it into their own production of different utterances.

The studies by Zellou et al. (2016, 2017) show that in a shadowing task, a speaker's anticipatory coarticulation patterns are more similar to those of the model speaker than to the speaker's own baseline patterns, which suggests that a speaker's coarticulation patterns might differ when they are recorded in a task that does not involve a shadowing paradigm. Moreover, given the findings that children are more likely than adults to imitate model speech, children might also exhibit stronger coarticulatory patterns when their speech is elicited in a shadowing task. However, to our knowledge, no studies have yet investigated anticipatory coarticulation in children by directly comparing coarticulatory patterns under different task circumstances.

Another downside to using a shadowing paradigm could be that the speech recorded is not very natural. While the method ensures that the level of similarity between recordings can be maximized, which is especially practical when dealing with young participants, the shadowing paradigm leaves little room for self-paced speech output (Whiteside & Hodgson, 2000).


A picture naming task is a method that allows for eliciting more spontaneous speech. In such a task, the participant is required to name the concept presented to them in an image. Picture naming tasks are often used in studies of single-word production and are suitable for participants of all ages, as reading competence is not required. In the context of coarticulation, a few studies have employed a picture naming paradigm (e.g. Katz, Kripke & Tallal, 1991; Soo-Eun, Ohde & Conture, 2002; Whiteside & Hodgson, 2000), but these studies do not reflect on the potential influence of this task on the coarticulation patterns found. Instead, their stated reason for using the picture naming paradigm is mostly that it allows for the collection of more self-paced speech output than the more commonly used shadowing paradigm.

2.3. Measuring anticipatory coarticulation

2.3.1. Ultrasound tongue imaging

Although earlier studies on anticipatory coarticulation largely focused on acoustic analyses (e.g. Katz, Kripke & Tallal, 1991; Repp, 1986; Whalen, 1990), articulatory data can provide more insight into the dynamic movements of the articulators. Coarticulatory processes engage multiple articulators such as the lips and the tongue, whose actions need to be coordinated for intelligible speech production to occur. The current study focuses on the organization of tongue gestures, as this organ is essential for the production of vowels and consonants (Barbier et al., 2015; Noiray, Ménard, & Iskarous, 2013; Song, Demuth, Shattuck-Hufnagel, & Ménard, 2013; Zharkova, Hewlett, Hardcastle, & Lickley, 2014). Measuring tongue movements is not an easy task, as the tongue is located deep within the oral cavity, and measurement therefore often requires a device to be inserted into the mouth. Examples of techniques that do this in order to provide real-time visualization of the articulators are electropalatography (EPG) and electromagnetic articulography (EMA). In EPG research, a custom-made artificial palate containing electrodes is fit against a speaker's hard palate, which allows researchers to measure the contact between the tongue and the palate during speech. In EMA research, small sensor coils are attached directly to the speaker's tongue and other articulators to measure their movements. In contrast with these methods, ultrasound tongue imaging (UTI) is less invasive, as it does not require any devices to be inserted into the mouth. Ultrasound imaging is well-known for its medical applications in fetal imaging and in studying internal organs, but it has also gained popularity in the field of speech research since the 1980s (Stone, 2005). With the use of a transducer (also called a probe) that is placed on the skin, the


functioning of internal organs can be visualized. When studying tongue movements for the purpose of speech research, the transducer is placed underneath the chin of the participant, after which an image of the tongue surface can be projected on a screen (Gick et al., 2008; Abel et al., 2015). An example of an ultrasound image of the tongue is given below in Figure 2.

Figure 2. Example of an ultrasound image of the tongue (left), which is created when the probe is placed underneath the chin (right).

Figure 2 displays the characteristic inverted-cone shape by which an ultrasound image can be recognized. With the transducer at the bottom of the image, the ultrasonic waves are emitted upwards. The tongue contour is indicated by the white line, which represents the reflection from the air just above the tongue surface. The tongue shape is presented in the mid-sagittal view, with the tongue tip on the left side of the image and the more posterior areas of the tongue on the right.

The main advantage of UTI over the other techniques mentioned above lies in its non-invasiveness, which makes it more suitable for testing young children, for instance. Other advantages are that many UTI devices are portable, which allows for use outside of a laboratory environment, and that the equipment is more readily available (unlike, for instance, EPG, which usually requires custom-made palates). These aspects, in combination with UTI's ability to provide real-time images of the tongue, also open up possibilities for use in speech therapy (Bernhardt, Gick, Bacsfalvi & Adler-Bock, 2005; Cleland, Scobbie, Nakai & Wrench, 2015; Preston et al., 2017).


However, there are also some disadvantages to the use of UTI, most of which relate to issues of quantification. A large amount of information is presented in an ultrasound image, and there has been little standardization, with different researchers using different approaches to quantify tongue movements (Gick et al., 2008). In terms of data collection, the probe is likely to move while a person speaks, so some form of head and probe stabilization is necessary (Stone, 2005; Zharkova, 2013). Another issue concerns the variation in image quality between speakers. For instance, image quality tends to be better for younger speakers than for older speakers, which can partly be related to there being less fat in the tongue tissue to refract the ultrasound waves (Stone, 2005). Considering the high level of variability in ultrasound data, as is the case for most speech data, normalization is one of the preprocessing steps that should be taken when drawing comparisons between speakers (Stone, 2005; Zharkova, 2013).
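To make the idea of between-speaker normalization concrete: one common approach is to z-score an articulatory measure within each speaker, so that every value expresses a distance from that speaker's own mean in units of that speaker's own variability. The sketch below is a generic illustration in Python, not the exact procedure used in this thesis; the data values and variable names are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical illustration of per-speaker z-score normalization of an
# articulatory measure (e.g., vertical tongue position). Values and speaker
# labels are invented for demonstration purposes only.

def zscore_by_speaker(records):
    """records: list of (speaker_id, value) tuples.
    Returns a list of (speaker_id, normalized_value) tuples."""
    by_speaker = {}
    for spk, val in records:
        by_speaker.setdefault(spk, []).append(val)
    # Per-speaker mean and standard deviation
    stats = {spk: (mean(vals), stdev(vals)) for spk, vals in by_speaker.items()}
    return [(spk, (val - stats[spk][0]) / stats[spk][1]) for spk, val in records]

data = [("s1", 10.0), ("s1", 12.0), ("s1", 14.0),
        ("s2", 3.0), ("s2", 4.0), ("s2", 5.0)]
normalized = zscore_by_speaker(data)
# Both speakers now span the same range (-1, 0, 1) despite different raw scales.
```

After normalization, measurements from speakers with very different vocal tract sizes (and hence raw coordinate ranges) become comparable on a common scale.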

2.3.2. Measuring anticipatory coarticulation over time: Generalized additive modeling

Speech data is inherently dynamic, and variability between speakers as well as between items should be taken into account when analyzing such data (Wieling, 2018). When it comes to studying anticipatory coarticulation, this item- and subject-related variability means that coarticulation patterns tend to be complex and therefore difficult to capture with a linear approach. At the same time, the fact that dynamic data (e.g., tongue movement over time) can be difficult to analyze means that researchers often simplify the dataset before starting the analysis in order to obtain a more manageable size. Many studies therefore make use of single-time-point analyses with techniques such as linear regression models, locus equations or linear mixed-effects models. The main disadvantage of such linear approaches is that they might miss out on some of the complex, dynamic features of the patterns underlying coarticulation.

In this study, we address this problem by using Generalized Additive Modeling (GAM; Wood, 2017). Unlike with the techniques mentioned above, there is no need to simplify the dataset when using GAMs. GAM is a non-linear regression method that can be used to pinpoint general patterns over time, while still accounting for variability between speakers and items (Wieling, 2018). Because of this, it is possible to identify more fine-grained, complex patterns that purely linear methods could overlook, which makes GAMs more suitable for the analysis of dynamic articulatory data. Section 5.2. will provide a detailed description of how GAMs were used in our study.
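To make the contrast with a purely linear fit concrete: a GAM replaces linear predictor terms with smooth functions of the predictors, typically built from basis expansions. The toy sketch below (plain numpy, not the penalized-spline machinery of an actual GAM implementation such as R's mgcv) fits a small polynomial basis to a non-linear trajectory, showing how a basis expansion captures curvature that a straight line cannot.

```python
import numpy as np

# Toy illustration of the idea behind a GAM smooth: model y as a weighted sum
# of basis functions of time, rather than as a single linear term. Real GAMs
# use penalized spline bases; an unpenalized cubic polynomial basis is used
# here purely for illustration. The "trajectory" is synthetic.

t = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * t)            # a non-linear articulatory-like trajectory

# Linear fit: design matrix with columns [1, t]
X_lin = np.column_stack([np.ones_like(t), t])
beta_lin, *_ = np.linalg.lstsq(X_lin, y, rcond=None)
rss_lin = np.sum((y - X_lin @ beta_lin) ** 2)

# "Smooth" fit: basis expansion with columns [1, t, t^2, t^3]
X_smooth = np.column_stack([t ** k for k in range(4)])
beta_s, *_ = np.linalg.lstsq(X_smooth, y, rcond=None)
rss_smooth = np.sum((y - X_smooth @ beta_s) ** 2)

# The basis expansion leaves a smaller residual sum of squares,
# because the linear model is nested within it.
print(rss_lin > rss_smooth)
```

An actual GAM analysis additionally penalizes the wiggliness of the smooth and adds random-effect smooths for speaker- and item-related variability, which is what makes it suitable for the data in this study.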


3. Research aims and hypotheses

Following the discussion of the literature in the previous chapter, our study aimed to gain more understanding of intersyllabic anticipatory coarticulation in schwa-consonant-vowel sequences across a word boundary by analyzing data on tongue movements, which we collected using ultrasound tongue imaging.

In short, our study can be split up into two main objectives, namely:

1) To study variation in long-distance vowel anticipation over time in different age groups;
2) To compare anticipatory patterns in a setting aimed at eliciting more spontaneous speech (picture naming) to those in a setting where utterances are repeated after a model speaker (shadowing).

With the first objective, we aim to replicate findings from previous studies and hope to gain more knowledge on the maturation of anticipatory coarticulation processes in children. To this end, our study is modelled after the one conducted by Noiray et al. (2019). Based on their findings, we had several predictions. First of all, we expected anticipatory coarticulation patterns to be present for all children (Katz, Kripke & Tallal, 1991; Sereno, Baum, Marean & Lieberman, 1987). Moreover, we expected the vocalic anticipation over time to follow non-linear patterns for all age cohorts, as the degree of anticipatory behavior is expected to vary between subjects and to depend on stimulus characteristics such as the target vowel and its neighboring consonants (Noiray et al., 2018; Rubertus & Noiray, 2018). Since non-linearity was predicted, we opted for a GAM analysis, which can take this non-linearity into account. We expected the non-linear patterns to differ between age cohorts. In line with Noiray et al. (2019), we expected greater levels of anticipatory coarticulation for younger speakers under the age of 7 in comparison to older children, who were hypothesized to show more adult-like patterns.

With regard to the first research aim, it should be noted that our study differed from Noiray et al. (2019) in a number of ways. The first difference is that rather than pseudowords, real words were used. The reason for this and simultaneously the second research aim of our study was that we wanted to compare anticipatory coarticulation patterns between two tasks. Since we wanted to be able to include participants of a young age who had not yet acquired reading competence, we opted for a picture naming task rather than a reading task. This way, it was


possible to compare coarticulation patterns in a more spontaneous speech-eliciting situation (picture naming) versus those in the commonly used shadowing task.

Based on previous studies, we expected that speech elicited in the shadowing task might be influenced by exposure to the model speaker (Babel, 2012; Pardo, Urmanche, Wilman & Wiener, 2017; Shockley, Sabadini & Fowler, 2004). Following from this, anticipatory coarticulation patterns were expected to manifest differently in the picture naming and the shadowing task. Since phonetic convergence tends to be stronger in child speech than in adult speech (Ryalls & Pisoni, 1997; Nielsen, 2014), we expected that the age of the participants would affect the difference in anticipatory coarticulation between the two tasks. If younger children exhibited stronger patterns of anticipatory coarticulation at baseline but showed signs of imitation of the (adult) model speaker in the shadowing task, the degree of anticipatory coarticulation could be expected to be higher in the picture naming task than in the shadowing task. If older children did indeed show more adult-like patterns of anticipatory coarticulation in general (Noiray et al., 2019), differences between the tasks were expected to be small, as their coarticulation patterns would already be more similar to those of the model speaker at baseline.

By investigating the role of age and task choice, we believe that our study can contribute to the body of literature on anticipatory coarticulation in several ways. First of all, despite the high number of coarticulation studies available, a relatively small percentage of those looks at developmental patterns. As laid out by Noiray et al. (2019), it is valuable to gain more understanding not only on non-linear patterns within individuals when it comes to anticipatory coarticulation, but also on how these patterns develop over time. Moreover, relating to our second research aim, we hope to provide more insights into the role of task choice in studying anticipatory coarticulation. If different coarticulation patterns are exhibited for the two different tasks used in our study, this could influence the task choice for future studies on coarticulation.


4. Method

In this section, we will discuss our choices regarding the methodology that was used to answer our research questions. This includes an explanation of our research setting, the recruitment of participants, our stimuli design, the equipment that was used and the experimental protocol.

4.1. Participants

Because it can be difficult to recruit children to participate in a study, we opted for data collection in a place where we knew there would be a large pool of potential participants. Therefore, we sent in a proposal to the NEMO Science Museum in Amsterdam, a museum that aims to bring science and technology closer to people and is specifically targeted at children. The NEMO Science Live program offers researchers the opportunity to conduct research in which visitors of the museum can participate. This way, researchers are able to collect data, but at the same time, the program also serves the purpose of outreach.

The participants for our study were recruited from the pool of visitors of the museum. A notification about the study was placed on the website of NEMO in the weeks leading up to the data collection. Moreover, staff members of the museum were briefed about the project so that they could inform visitors about the study and show them the designated area where it would take place. In principle, all visitors who spoke Dutch and who showed interest were allowed to participate, as the research environment also served the purpose of outreach. However, children under the age of 18 were only allowed to participate if a parent or legal guardian was present to sign the consent form. In total, 86 people participated in the experiment. No compensation was provided for participation. Ethical approval for this study was obtained from the Research Ethics Committee of the Faculty of Arts of the University of Groningen (CETO), confirming that the research protocol was in line with internationally recognized standards for the protection of research participants (see Appendix A1). We used several exclusion criteria to select the data that would be included in the final analysis. We only included the data of participants who were native speakers of Dutch. We excluded participants with physical features that they indicated affected their speech, such as missing teeth or abnormalities in the shape of the tongue. Data was also excluded from the analysis if participants had speech, hearing or motor disorders that could (negatively) influence their speech. As many Dutch children receive some (mild) form of speech therapy, this was only used as an exclusion criterion if parents/guardians indicated that their child suffered from a speech disorder. The final


sample (out of a total of 86 participants), taking into account the exclusion criteria, consisted of 61 participants (28 male, 33 female) aged between 5 and 15 years old. Figure 3 gives an overview of the distribution of age and gender within the sample. While ideally we would have had more younger participants, it became clear that in this particular research setting, it was difficult to find young children willing to participate. Young children were often accompanied by older siblings, and in this scenario, parents/guardians often indicated that their older child would be more likely to finish the entire experiment and provide better quality data. This explains the lower number of participants in the younger age group.

Figure 3. Overview of the distribution of age and gender across participants.

4.2. Materials

The speech production material for this experiment consisted of nouns (target words) accompanied by a fixed adjective that served as the carrier phrase. Several criteria needed to be taken into account when designing the stimulus materials. The first was a time constraint set by NEMO, namely that the duration of the experimental procedure could not exceed 20 minutes. As testing children takes more time in terms of keeping their attention and making them feel comfortable, this limited the size of our stimulus set. In order to still obtain a rich dataset, we opted for a smaller set of target words so that we could record multiple repetitions of each word in the two tasks.


The target words themselves also had to meet a number of criteria. First of all, as one of the tasks in the study involved picture naming, the target words needed to be representable in a recognizable visual form. Moreover, the words needed to be familiar to children of all ages and thus have a relatively high frequency. As a last requirement, for the purpose of comparison, the target words needed to be similar in phonetic structure as well as word length. Many studies on coarticulation make use of pseudowords, for which it is relatively easy to control the types of consonants and vowels that are used. However, as we intended to use real words instead of pseudowords, it was not possible to find words that were fully controlled for phonetic environment, highly frequent, and presentable in the form of an image. Therefore, we made a compromise by using target words that would be easily recognizable for children of all ages, while keeping the structure of the words as similar as possible. The final production material included in the analysis consisted of nine nouns that followed a C1VC2 pattern (consonant (1)-vowel-consonant (2)), followed by a schwa (see Table 1). For most items, the final schwa in the target word was followed by /n/. However, it should be noted that in standard Dutch, when /n/ occurs in word-final position after a schwa, the nasal is typically not realized, especially in spontaneous speech (Van de Velde & Van Hout, 1996; Van Oss & Gussenhoven, 1984; Voortman, 1994).

Table 1. Overview of the target words used in the study

Word | Meaning | C1 (IPA) | V (IPA) | C2 (IPA)
Benen (/beːnə/) | Legs | /b/ | /eː/ | /n/
Bezem (/beːzəm/) | Broom | /b/ | /eː/ | /z/
Boeken (/bukə/) | Books | /b/ | /u/ | /k/
Boten (/boːtə/) | Boats | /b/ | /oː/ | /t/
Kiezen (/kizə/) | Molars | /k/ | /i/ | /z/
Molen (/moːlə/) | Mill | /m/ | /oː/ | /l/
Poezen (/puzə/) | Cats | /p/ | /u/ | /z/
Tenen (/teːnə/) | Toes | /t/ | /eː/ | /n/
Voeten (/vutə/) | Feet | /v/ | /u/ | /t/

Apart from the words in Table 1, the original production set also contained the words ‘lepel’ (/leːpəl/ - ‘spoon’), ‘zeven’ (/zeːvən/ - ‘seven’), ‘tafel’ (/taːfəl/ - ‘table’), ‘koffer’ (/kɔfər/ -


‘suitcase’) and ‘vogel’ (/voːxəl/ - ‘bird’). However, these items were excluded from the analysis because a large degree of variation in pronunciation was found between speakers. For ‘koffer’ (‘suitcase’), this mostly concerned the pronunciation of the final ‘r’ (either as /ʀ/ or /ɹ/), whereas for ‘zeven’ (‘seven’) the /eː/ was often pronounced as /øː/, which is common for speakers from the southern regions of the Netherlands (Schönfeld, 1970). For the words ending in /l/, the preceding schwa was often replaced with a vowel more closely resembling /ɔ/, because of which these items were excluded as well.

Target words were preceded by the Dutch word ‘kleine’ (/klɛinə/, ‘small’), as this word was closest to the carrier phrase /aɪnə/ that was used in the (German) study by Noiray et al. (2019). For instance, the whole sequence for the target word ‘boeken’ would be /klɛinəbukə/. In the analysis, vocalic anticipation in these sequences was estimated using four time points: the midpoint and offset of the schwa in ‘kleine’ and the midpoint and offset of the consonant preceding the vowel of the target word. Figure 4 provides an example of the timepoints for the target sequence /klɛinəbukə/.
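Given the segment boundaries obtained from the acoustic annotation, the four time points follow directly: a midpoint is the mean of an interval's onset and offset, and an offset is taken as-is. The sketch below is an illustrative Python helper, not the project's actual code; the function name, argument names and example boundary values are hypothetical.

```python
# Hypothetical helper deriving the four analysis time points (in seconds)
# from annotated segment boundaries. Names and example values are
# illustrative only, not taken from the actual dataset.

def analysis_timepoints(schwa_on, schwa_off, cons_on, cons_off):
    """Time points for the schwa of the carrier word ('kleine') and the
    consonant preceding the target vowel."""
    return {
        "schwa_mid": (schwa_on + schwa_off) / 2,
        "schwa_off": schwa_off,
        "cons_mid": (cons_on + cons_off) / 2,
        "cons_off": cons_off,
    }

# e.g. schwa annotated from 0.40-0.48 s, consonant from 0.48-0.56 s
points = analysis_timepoints(0.40, 0.48, 0.48, 0.56)
```

The tongue contour extracted at each of these four time stamps then serves as the input for estimating vocalic anticipation over the sequence.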

Figure 4. Overview of the timepoints used in the study.

For the picture naming task, the MultiPic picture database (Duñabeitia et al., 2018) was used to find pictures with which the target words could be visualized. Some items did not have a picture equivalent in the MultiPic database. For these items, we used a copyright-free image found using the Google Advanced search option. Appendix A2 shows all pictures.

The auditory stimuli that were used in the shadowing task (the carrier phrase ‘kleine’ plus the target words) were recorded by an adult female model speaker who was a native speaker of Dutch.


Prior to the data collection at NEMO, a pilot study was conducted with a three-year-old participant in order to improve the experimental setup, to optimize the timing of the recording and to ensure that pictures were unambiguous and easy to name for children of all ages.

4.3. Experimental equipment

The ultrasound tongue imaging data was collected using the Articulate Assistant Advanced software (Articulate Instruments Ltd., 2019). The frame rate for the ultrasound recordings was 83.33 Hz. The probe was positioned centrally underneath the participant's chin in order to record the tongue surface contour in the midsagittal plane. A special headset (Articulate Instruments Ltd.; see Figure 5) was used to stabilize the position of the ultrasound probe with respect to the head as much as possible.

Figure 5. The UTI headset used in the experiment.

In order to obtain the best image quality, the probe was positioned to maintain firm contact with the chin without causing discomfort to the participant. Additional head-to-probe stabilization methods were not used, in order to keep the setup comfortable for children and to maximize the level of natural, unimpeded speech recording. Although video recordings are often used to correct for probe movement, we decided not to record video data for this study, as we believed that parents/guardians would otherwise be less likely to consent to their child's participation.

In addition to the ultrasound tongue imaging data, acoustic recordings were made using a Shure WH20 XLR microphone, which was attached to the ultrasound headset. During the experiment, the microphone was positioned close to the mouth of the participant. The acoustic speech signal


was recorded at 22.05 kHz. Acoustic and ultrasound data were synced automatically in the AAA software.

4.4. Experimental procedure

During the days of data collection at NEMO, one or two researchers recruited participants outside of the experiment room. Interested passers-by were given information about the experiment and were shown ultrasound images on a screen to attract their attention.

Once a participant showed interest in participating in the study, their parent(s) or legal guardian(s) received an information letter (Appendix A3) and a consent form (Appendix A4), which they were asked to sign. They were informed that the study would take approximately 30 minutes and that both the participant and they themselves could ask questions or withdraw from participation at any point. The parent(s)/guardian(s) were also asked to fill out a questionnaire that contained standard background questions about the participant, such as age, gender and province of birth and residence, but also more specific questions about (a history of) language, speech, hearing and motor disorders. The full questionnaire can be found in Appendix A5.

Following the intake procedure, the participant was led to the experiment room: the Science Live area of the museum, decorated in a way that made it attractive for participants (see Figure 6). The parent(s)/guardian(s) were allowed to enter the room as well if this made the child feel more comfortable, but they were asked to remain silent during the experiment to avoid distraction.


Upon entering the room, the participant received an explanation of the experiment and the tasks that they would be doing. The researcher then explained the experimental setup by showing the probe and explaining how it would be attached during the experiment. In order to make the child feel as comfortable as possible, we took as much time as they needed for this part and allowed them to ask any questions they had. The participant was also informed that they could withdraw from participation at any point if they no longer felt comfortable. The researcher then proceeded to attach the ultrasound equipment. The stabilization headset was adjusted to fit the participant's head, after which the probe was positioned underneath the participant's chin. Participants were seated approximately one meter away from a 27-inch monitor positioned at eye level. A pillow was placed on the back of the chair to prevent participants from slouching and to help them remain seated in an upright position throughout the experiment.

Once the headset was attached, the researcher explained to the participant how to interpret the ultrasound image on the screen. Afterwards, as we did not want participants to be able to view the ultrasound image during the experiment, a cover was placed in front of the ultrasound image and the prompt list, so that the participant could not see upcoming prompts and would keep their focus on the top part of the screen, where the picture would appear in the picture naming task (see Figure 7).


Figure 7. Participant screen. This image shows the screen during the experiment. The part below the red line (i.e. the prompt names and the UTI signal) was blocked from the participant's view during recording.

4.4.1. Picture Naming task

The goal of the first task was to name the object(s) in the pictures presented on the screen. Before the start of the task, the participant underwent a short training session, during which the researcher went through all pictures and the target words associated with them. The aim of this training session was for the participant to become familiar with the pictures that they would be asked to name later. In order to minimize the influence of the researcher during the familiarization phase, the researcher asked the participant to name the object in the image and only used the target words herself if the participant could not name the correct target word. The participant also received the instruction to always use the carrier phrase ‘kleine’ prior to naming the objects in the pictures. Stimuli were presented in a semi-random order in the form of lists, as the AAA software did not allow for full randomization of stimuli. Three repetitions of each item were recorded unless a child was not able to concentrate for the whole period of time. In the latter case, only two repetitions were recorded for each word. If the participant named a picture incorrectly (for instance, if they uttered the synonym ‘katten’ instead of ‘poezen’), the researcher paused the recording and asked the participant if they remembered the correct target word. If they did, another recording was made using the correct target word. If the participant


did not remember the correct target word, the researcher did not repeat the correct word, so as to prevent a shadowing effect, but simply continued with the next item.

4.4.2. Shadowing task

For the shadowing task, the procedure was similar to the picture naming task, except that the participant was presented with auditory instead of visual stimuli. The participants wore headphones (Trust) through which they heard the carrier phrase and target word as pronounced by a female model speaker. The goal of the task was to repeat aloud the sequences that they had heard. Again, items were presented in a semi-random order and three repetitions of each item were recorded, unless a child was unable to complete the experiment, in which case two repetitions per word were recorded.

It should be noted that the picture naming task and the shadowing task were always presented in the same order (i.e. no counterbalancing was used). This way, the data collected in the picture naming task could be seen as a baseline recording of the participant's speech. If the shadowing task had been presented first, the potential effect of the speaker adjusting their utterances to those of the model speaker could have carried over to the speech output of the picture naming task. Although this issue could have been solved by administering the picture naming task twice (i.e. as a pre- and post-test), the time constraints that we faced did not allow for this procedure.

4.4.3. Diadochokinesis task

If there was any time left, participants also took part in a diadochokinesis (DDK) task which is used to measure the ability to carry out alternating speech motor movements at a high speed (e.g. Stackhouse, 2000; Karlsson & Hartelius, 2019). In this task, participants were asked to repeat the syllables /pa/, /ta/ and /ka/ as quickly and as many times as possible within one breath. This was then also done for the multisyllabic /pa-ta-ka/. The data of this task is not part of the study discussed in this thesis, but may be used in a future study comparing the speech motor movements of healthy controls to those of children with Duchenne muscular dystrophy.

Ultrasound and acoustic data was recorded for all items during both tasks. The researcher constantly monitored the ultrasound image and the audio stream in AAA. At the same time, they kept an eye on the participant in order to check whether the ultrasound probe was still in the proper position. If the researcher noticed that the probe had moved, which typically occurred once or twice for older participants and more often for younger participants, the recording was paused. The researcher adjusted the probe to its original position, which was


done by comparing the live ultrasound image to the image of the first recording. In order to keep participants comfortable, the researcher asked them several times how they were doing and reminded them that they could withdraw their participation at any time. After finishing the experiment, participants (and their parent(s)/guardian(s)) were debriefed (see Appendix A6) and children received a certificate, provided by the NEMO Science Museum, stating that they had participated in the study.


5. Analysis

In order to test for the effects of age and task choice on anticipatory coarticulation, we conducted several analyses. Since analyzing ultrasound data requires a substantial number of preprocessing steps, this section first describes how the data was processed: the data preparation procedure in PRAAT (version 6.1.03; Boersma & Weenink, 1996), the semi-automatic tongue detection performed in MATLAB (version 9.4.0; R2018a), and finally the statistical analysis performed in RStudio (RStudio Team, 2020).

5.1. Preprocessing

5.1.1. File preparation

As described in the methodology section, audio and ultrasound data were recorded using the Articulate Assistant Advanced software. After data collection, the ultrasound files were exported from the software as AVI video files. Each recording was exported along with a text file containing the target word, the task during which it was recorded (picture naming or shadowing), the participant number of the speaker and the timestamp of the recording. These elements were then used to automatically rename the audio and video files so that all file names contained the necessary information on the recording (target word, task, participant number and the repetition number of the item).
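The renaming step can be sketched as a small helper that composes a file name from the exported metadata. The field order and the exact naming scheme below are assumptions for illustration, not the actual format used in the study:

```python
# Hypothetical sketch of the file renaming step: the metadata exported
# alongside each recording is combined into one informative file name.
# The naming scheme (order, separators, zero-padding) is an assumption.

def build_file_name(target_word, task, participant, repetition, extension):
    """Compose a file name such as 'poezen_naming_p07_rep2.avi'."""
    return f"{target_word}_{task}_p{participant:02d}_rep{repetition}.{extension}"
```

Applying such a function to both the AVI and the WAV file of a recording guarantees that the two always share a common stem, which simplifies pairing them in later processing steps.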

As for the audio files, these were not exported from AAA directly, as their duration did not match the duration of the ultrasound video of the same recording. The reason for this is that during data collection, the audio recording is initiated slightly before the ultrasound recording to make sure that synchronization signals are captured accurately. As a result, there is a slight delay between the start of the audio and the time of the first ultrasonic frame. In the exported video files, the alignment within the AAA software is maintained and the video file is trimmed so that it only contains the part of the recording for which both audio and ultrasonic data are available. However, as we needed audio and video files of equal length for further processing, we used a Python script that extracted a WAV file from the AVI file, ensuring that both files had the same length.
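The underlying length-matching logic can be illustrated as follows. This is a minimal sketch, not the actual script: it assumes the surplus audio samples lie at the start of the signal (because the audio recording begins before the first ultrasonic frame) and keeps only the trailing portion whose duration equals that of the video:

```python
# Minimal sketch of matching audio length to video length. Assumes the
# audio surplus is at the beginning of the signal; the real script
# instead extracted the already-aligned audio track from the AVI file.

import numpy as np

def trim_audio_to_video(audio, sample_rate, n_frames, fps):
    """Keep the trailing part of `audio` whose duration equals the video's.

    audio:       1-D array of audio samples
    sample_rate: audio samples per second
    n_frames:    number of ultrasound video frames
    fps:         video frames per second
    """
    n_keep = round(n_frames / fps * sample_rate)
    return audio[len(audio) - n_keep:]
```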

5.1.2. PRAAT annotation

The next preprocessing step involved the phonetic labeling of the target sounds in the audio files. This was done manually for all recordings using the PRAAT software (Boersma & Weenink, 2016). Recordings were excluded from the annotation process (and thus from further analysis):

• If there seemed to be a problem with the audio recording;

• If either the carrier phrase or the target word was incorrectly pronounced by the speaker (e.g. /buzə/ was pronounced instead of /puzə/);

• If an /n/ was inserted between the carrier phrase and the target word (e.g. /klɛinənpuzə/ instead of /klɛinəpuzə/);

• If the wrong target word was uttered (this was the case only for the picture naming task, e.g. ‘tanden’ (teeth) instead of ‘kiezen’ (molars));

• If there was a noticeable hesitation or stutter in the pronunciation of the target word (e.g. /klɛinək-kizə/), as this could influence coarticulation;

• If there was a noticeable pause between the carrier phrase and the target word, as this could also influence coarticulation.

For the recordings that did not contain any of these issues and thus would be included in the analysis, two tiers were created to label the orthographic representation of the target sequence (carrier phrase ‘kleine’ plus the target word) and the sounds within this sequence (@1, C1, V1, C2, @2) respectively. Figures 8a and 8b below provide examples of the annotation for the items ‘kleine benen’ and ‘kleine poezen’.

Figure 8a. Phonetic labeling of the target sequence ‘kleine benen’. This image shows the waveform (top row), the spectrogram (second row), the ORT tier containing the orthographic representation of the target sequence (third row) and the MAU tier1 containing the target phonemes (bottom row).

1 The name of the MAU tier generally refers to data that has been segmented automatically using the software WebMAUS (Kisler, Schiel, & Sloetjes, 2012). Our segmentation was done manually, but we used the same tier names as those used in the annotation protocol we followed.

Figure 8b. Phonetic labeling of the target sequence ‘kleine poezen’.
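Each labeled interval on the segment tier provides a start and end time, from which the landmarks used in the analysis (midpoints and offsets) can be derived. The sketch below illustrates this with (label, start, end) triples; this triple representation is an assumption for illustration only, not PRAAT's actual TextGrid file format:

```python
# Illustrative sketch of deriving analysis time points from labeled
# intervals. Intervals are assumed to be (label, start, end) triples in
# seconds; the real landmarks were read from the PRAAT TextGrids.

def time_points(intervals):
    """Return midpoint/offset landmarks for the schwa (@1), consonant (C1)
    and the midpoint of the vowel (V1)."""
    by_label = {label: (start, end) for label, start, end in intervals}
    points = {}
    for label in ("@1", "C1"):
        start, end = by_label[label]
        points[label + "_mid"] = (start + end) / 2  # interval midpoint
        points[label + "_off"] = end                # interval offset
    v_start, v_end = by_label["V1"]
    points["V1_mid"] = (v_start + v_end) / 2
    return points
```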

Stable periodic cycles in the oscillogram were used as vocalic reference. To determine the onset of the schwa as well as of the vowel, we followed the same procedure as Noiray et al. (2019), namely locating the first ascending zero-crossing in the oscillogram at the beginning of the periodicity. The first ascending zero-crossing after the periodicity was marked as the beginning of the consonant. The output of this phonetic labeling process consisted of PRAAT TextGrids containing the relevant time points used in the subsequent analyses (midpoint and offset of the schwa, midpoint and offset of the consonant, and midpoint of the vowel).
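The ascending zero-crossing criterion can be made concrete with a small sketch: the function below finds the first sample at which a signal crosses from non-positive to positive values. This merely illustrates the criterion; the labeling in this study was done manually in PRAAT, not with this code:

```python
# Sketch of the boundary criterion: the first ascending zero-crossing,
# i.e. the first sample where the signal passes from <= 0 to > 0.
# For illustration only; annotation in the study was done manually.

import numpy as np

def first_ascending_zero_crossing(signal, start=0):
    """Index of the first sample at or after `start` where the signal
    crosses from non-positive to positive, or None if there is none."""
    s = np.asarray(signal)[start:]
    crossings = np.flatnonzero((s[:-1] <= 0) & (s[1:] > 0))
    if crossings.size == 0:
        return None
    return start + int(crossings[0]) + 1
```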

5.1.3. Semi-automatic tongue contour tracking in MATLAB

The next preprocessing step involved obtaining the tongue contour information for the five target time points. We used the software package SLURP (Laporte & Ménard, 2018), which contains MATLAB scripts that semi-automatically track the tongue contours in ultrasound recordings. To do this, the user must first choose an initial frame and select a number of anchor points below the tongue surface (see Figure 9 below). This frame is then used for subsequent tongue detection in all other frames of the recording. The process results in a 100-point spline that is automatically fitted to the midsagittal tongue surface contour of each frame and allows for the extraction of the x- and y-coordinates of each of these 100 points.
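The value of a fixed-length spline is that tongue shapes become directly comparable across frames and recordings. The sketch below illustrates the general idea of resampling a tracked contour to a fixed number of points along its arc length; SLURP's internal spline fitting is more sophisticated, so this is only a conceptual illustration, not its actual algorithm:

```python
# Conceptual sketch of a fixed-length contour representation: resample a
# tracked (x, y) contour to n_points equally spaced along its arc length.
# This is not SLURP's algorithm, only an illustration of the idea.

import numpy as np

def resample_contour(x, y, n_points=100):
    """Linearly resample a contour to `n_points` along its cumulative length."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    seg = np.hypot(np.diff(x), np.diff(y))           # lengths of each segment
    dist = np.concatenate([[0.0], np.cumsum(seg)])   # cumulative arc length
    target = np.linspace(0.0, dist[-1], n_points)    # equally spaced stations
    return np.interp(target, dist, x), np.interp(target, dist, y)
```

With every frame reduced to the same 100 (x, y) pairs, point-wise comparisons between tongue contours at the different target time points become straightforward.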

Figure 9. Selecting anchor points in reference ultrasound frame

Note. This image was reprinted from a protocol for the SLURP software, created by Dzhuma Abakarova and Dr. Aude Noiray (Laboratory for Oral Language Acquisition, University of Potsdam).

After the automatic detection is finished, an energy plot is created in which the quality of the contour tracking is indicated. As can be seen in the energy plot in Figure 10 below, frame numbers are found on the x-axis and tongue positions along the contour (the top being the posterior part of the tongue, the bottom the anterior part) are found on the y-axis. In this greyscale image, black indicates that the quality of the tracking is probably sufficient, whereas white indicates potentially poor quality. By clicking on the energy plot, the user is able to check the tracking for the corresponding frame. For instance, the example energy plot in Figure 10
