
Content-induced Motor Interference in Speech Production

Kasper I. Kok, University of Amsterdam


Abstract

We carried out two experiments in which participants recited tongue twisters consisting of grammatical and meaningful sentences. Both experiments showed that speech error patterns in this task reflect those elicited in more artificial tongue twister paradigms. One of the two experiments, which involved a visually presented context, revealed that speech error rates are sensitive to content: participants made significantly more errors on stimuli describing a mouth-related action than on those that were mouth-unrelated. We interpret this in the light of recent literature on embodied sentence processing, arguing that the increased error rates are due to motor interference induced by mouth-specific motor simulations. Finally, we provide preliminary evidence that motor simulation in language production may take place concurrently with, but not before, articulation of action verbs.


Content-induced Motor Interference in Speech Production

Physical actions are inherent to language use. Ample literature suggests that comprehending the meaning of sentences that describe motor action involves re-enactment of previous sensorimotor experiences (for foundational work, see Barsalou 1999; Glenberg and Robertson 2000). Glenberg & Kaschak (2002) were among the first to demonstrate this experimentally. They reported an Action-Sentence Compatibility Effect: when people are exposed to sentences that describe a particular action, they are relatively fast to subsequently perform a congruent motor action themselves. This effect has been found for arm movements toward and away from the body (Glenberg and Kaschak 2002), hand rotation in a particular direction (Zwaan and Taylor 2006), and hand shape (e.g. flat palm or closed fist; Bergen and Wheeler 2005).

Converging experimental evidence furthermore reveals that the motor system’s involvement in language comprehension is not limited to manual actions. Studies employing behavioral techniques, fMRI and TMS have shown that processing language about the hands, feet and mouth recruits motor areas specific to the corresponding parts of the body (Bergen et al. 2010; Buccino et al. 2005; Tettamanti et al. 2005).

The role of motor resonance in language production, by contrast, has remained relatively unexplored. Does production of meaningful utterances engage the motor system in the same way? This paper reports two studies that address this question by means of a novel, interference-based paradigm.

Motor simulation in language production

Plausibly due to a lack of well-established empirical methods, our knowledge about motor simulation in language production is relatively limited. Currently, the only sources of evidence are studies on co-speech gesture and studies on non-linguistic priming.


Gesture studies have shown that people use more co-verbal gestures when they speak about spatial scenes than when they speak about non-spatial information (Alibali et al. 2001; Krauss 1998). Moreover, the spatial characteristics of these gestures are often congruent with the spatial dimensions of the verbal descriptions they accompany (e.g., Casasanto et al. 2006). Among other findings, these observations have given rise to the Gesture as Simulated Action framework (see Hostetter and Alibali 2008). Verbal and gestural aspects of a linguistic utterance, according to this view, are rooted in one and the same motor simulation of the expressed content.

Another, more direct line of evidence comes from behavioral experiments. In a recent study, Sato (2010) observed that the content of the sentences people produce in a semi-constrained message formulation task1 can be influenced by non-linguistic priming. After performing an action toward or away from the body, participants were more likely to produce a sentence describing an action in a compatible direction than in an incompatible direction. Burke & Feist (unpublished), in a more natural setting, observed that people giving verbal descriptions of the route between two locations used constructions involving the word "up" more often if they had just walked up a staircase, and used the word "down" more often when they had just walked down one. These findings suggest that message formulation and lexical selection are sensitive to recent motor performance.

In the current study, we introduce a novel method for examining motor simulation in language production that has two advantages over the currently available work. First, rather than using a priming method, we aspire to induce direct interference between semantic processing and the motor actions underlying speech production.

1. The task required participants to produce a sentence that involved two given words. These words were selected such that participants were likely to produce sentences describing motion in a particular direction relative to the body (e.g. "plug" and "outlet").


Thus we can provide evidence that these two capacities are not only closely connected (as follows from the aforementioned priming studies) but in fact rely on overlapping neural resources. Second, our paradigm allows us to address questions about the temporal aspects of motor simulation in speech production, which have thus far remained unanswered. Examining how semantic content affects speech performance on individual words in a produced sentence can tell us when motor simulation takes place relative to articulation of the relevant action verbs, and for how long it persists.

Overview of the method

According to the literature discussed above, comprehending linguistic descriptions of actions exploits neural circuits specific to the parts of the body that perform them. If the same is true for semantic encoding in language production, it follows that speaking about actions performed with the mouth involves capacity sharing: the mouth-specific motor areas that are involved in semantic processing are concurrently recruited for speaking itself. Given that capacity sharing causes impaired performance (Navon and Gopher 1979; Tombu and Jolicœur 2003), we predict that when people speak about mouth-related actions, their speech performance is comparatively poor.

For measuring speech performance, speech error research provides the most established methods. Psycholinguists have designed ways to experimentally elicit speech errors at a variety of levels of linguistic analysis (for a review, see Baars 1992). In order to assess which type of error is the most appropriate candidate measure for the current experiment, it is vital to consider the main tenets of psycholinguistic models of speech production.

Early 'serial processing' speech production models (Bock and Levelt 1994; Fromkin 1971; Garrett 1975; Garrett 1980) and 'spreading activation' models (Dell 1986; Dell et al. 1999) account for speech errors in different ways, but roughly agree on the processing stages to which they can be attributed. Most models assume that speech production involves (1) specifying the message one aims to formulate, (2) encoding this message into a lexically and syntactically apt verbal phrase, (3) encoding the corresponding phonological sequences, and (4) producing the appropriate phonetic sounds for articulating it. The many kinds of speech errors reported in the literature (for a review, see Stemberger 1983) can be classified according to these levels of encoding.

In the current study, our interest lies solely in errors that occur at the level of phonological and phonetic encoding, since these have been found to be sensitive to (pre)motor malfunctioning (e.g. Bradford and Dodd 1994). Errors at this level are therefore candidates for being affected by simulation-induced motor activity. To elicit such errors we employ two variants of the "tongue twister" paradigm (Kupin 1982; Shattuck-Hufnagel 1983; Wilshire 1999; Wilshire and McCarthy 1996), which requires participants to repeatedly recite sequences of phonetically similar words (e.g. "moose knife noose muff"). This paradigm is known to generate errors at the level of phonological planning as well as at the level of articulatory implementation (Goldrick and Blumstein 2006). Moreover, it allows for the use of relatively naturalistic linguistic stimuli. With the purpose of studying speech errors as an indicator of motor interference in sentence production, we modify this paradigm such that the stimuli comprise natural sentences.

In order to evaluate the performance of our modified paradigm, we first explore to what extent the error patterns it yields resemble those reported in Wilshire's (1999) study, which used stimuli comprising nonsense word sequences. On the basis of the same data we then address our main research question: do people make more speech errors when producing mouth-related sentences than when producing mouth-unrelated ones? Finally, we analyze what the observed speech error patterns reveal about the temporal dynamics of motor simulation during speech production.

Experiment 1

Many effects reported in artificial error induction studies are observed in analyses of naturally occurring speech errors as well (Stemberger 1992). The external validity of tongue twister studies that use artificial stimuli consisting of nonsense sequences of monosyllabic nouns (Kupin 1982; Shattuck-Hufnagel 1983; Wilshire 1999), however, has never been explicitly tested. In order to interpret the findings of this study with respect to the main research questions, it is useful to understand how our 'naturalistic' tongue twister paradigm relates to the more artificial paradigms on which it is based. For this reason, we first contrast our results with those of Wilshire (1999), who conducted a detailed analysis of artificially constructed tongue twisters.

Subsequently, we examine the impact of an additional manipulation: the stimulus set was constructed such that it includes sentences that describe mouth-related actions as well as paired, phonetically minimally different sentences that are mouth-unrelated. Since these two groups of sentences differ only in their semantics, this setup allows us to probe whether tongue twisters generate more errors when they describe actions performed using the mouth.

Participants

Sixty UCSD undergraduate students aged between 18 and 24 (µ= 21.07, σ= 1.48; 19 males, 41 females) served as participants in exchange for course credit. All had acquired English before age five and reported normal or corrected-to-normal vision and hearing.

Materials

Thirty-six pairs of five-word sentences were devised. The sentences in each pair differed only in the verb: one described a mouth-related action (e.g. "the roadie licks large raisins") and the other was phonologically similar but mouth-unrelated (e.g. "the roadie lacks large raisins"). The two verbs in each pair differed by at most three phonemes, either as a result of substitution (e.g. "screeches" vs. "scratches") or addition/omission ("chomps" vs. "chops"). Because the purpose was to induce speech errors, the four content words all began with one of two similar onsets, in an ABBA order, as in the examples above. This ABBA design was based on a paradigm first devised by Shattuck-Hufnagel (1983), in which participants repeat nonsense sequences of four words in an ABBA pattern. This has been demonstrated to evoke a high speech error rate, particularly on the third word of the four-word sequence (Wilshire 1999), but the effect had never been replicated for natural sentences like the ones we used. The two competing onsets in each sentence were either articulatorily similar consonant clusters (e.g. /kl/ and /kr/) or intrusion-prone phoneme pairs (e.g. /l/ and /r/), based on the Phoneme Confusion Matrix (Shattuck-Hufnagel and Klatt 1979).

For the purpose of examining temporal aspects of simulation-driven error patterns, the stimuli were divided into three groups on the basis of the position of the verb. Sentences in the Verb-2 condition had a Noun-Verb-Adjective-Noun structure (“the pirate beats/bites bad people”); sentences in the Verb-3 condition had an Adjective-Noun-Verb-Adverb structure (e.g. “the blunt baker belches/bulges blushfully”); sentences in the Verb-4 condition had an Adjective-Noun-Adverb-Verb structure (“the pale priest presently pukes/peaks”).

To prevent facilitation of the task by prediction of phonological arrangement, critical stimuli in each block were supplemented by six filler items that shared the grammatical structure with the other stimuli but had a different order of onset phonemes (ABAB rather than ABBA).
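For concreteness, the onset patterning that distinguishes critical items from fillers can be expressed as a small check. The sketch below is illustrative only: the onsets are hand-coded rather than looked up in a pronunciation dictionary, and the function name is ours.

    # Verify that a stimulus sentence's content-word onsets follow the
    # critical ABBA pattern (fillers instead follow ABAB).
    def has_abba_pattern(onsets):
        a1, b1, b2, a2 = onsets
        return a1 == a2 and b1 == b2 and a1 != b1

    # "the roadie licks large raisins" -> roadie, licks, large, raisins
    print(has_abba_pattern(["r", "l", "l", "r"]))  # True: critical item
    print(has_abba_pattern(["r", "l", "r", "l"]))  # False: ABAB filler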


Norming. To ensure that the stimuli in the mouth condition and the non-mouth condition did not differ in any way other than their semantics that could affect speech error rate, several norming procedures were conducted.

A set of 16 UCSD undergraduate students who did not participate in the main experiment completed a sensibility rating task, in which they rated each sentence on a 7-point Likert scale ranging from 1 ("Not sensible at all") to 7 ("Completely sensible"). Mouth sentences and non-mouth sentences were overall rated approximately equally sensible.2

A different set of 17 UCSD undergraduate students, who also did not participate in the main experiment, completed a survey in which they judged whether the verbs in our stimuli described actions that are typically performed using (a part of) the mouth. The results confirmed that the verbs classified as mouth-related were indeed thought of as describing a mouth action.3

To examine differences in the semantic relatedness of words within each sentence, we conducted latent semantic analysis (Landauer and Dumais 1997). For each sentence, we computed LSA indices for all possible word pairs that involved the verb. The resulting average LSA scores did not differ across conditions.4
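The per-sentence relatedness measure can be sketched as follows. This is a minimal illustration, assuming each word is represented by an LSA-style vector and that pairwise relatedness is measured by cosine similarity; the toy random vectors and the function names are ours, not the materials actually used.

    import numpy as np

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def verb_relatedness(vectors, verb):
        # Average similarity of all word pairs that involve the verb.
        scores = [cosine(vectors[verb], vec)
                  for word, vec in vectors.items() if word != verb]
        return sum(scores) / len(scores)

    # Toy vectors for "the roadie licks large raisins" (verb = "licks")
    rng = np.random.default_rng(0)
    vectors = {w: rng.normal(size=300)
               for w in ["roadie", "licks", "large", "raisins"]}
    print(verb_relatedness(vectors, "licks"))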

2. Mouth items scored 5.16, non-mouth items 4.96 (p > 0.27). Within the three verb-position groups, sensibility ratings of mouth and non-mouth sentences also did not differ significantly (p > 0.85 for the verb-2 condition; p > 0.24 for the verb-3 condition; p > 0.07 for the verb-4 condition).

3. The mouth verbs were judged to be performed using the mouth in 92.0% of the responses, versus 5.7% for the non-mouth verbs. Within the verb-position groups these scores were 90% vs. 26% (verb-2), 90% vs. 5% (verb-3) and 98% vs. 6% (verb-4).

4. The average LSA score was 0.061 in the mouth-related condition and 0.062 in the mouth-unrelated condition (p > 0.96). Within the verb-position groups, this analysis did not indicate significant differences either (p > 0.26 for the verb-2 condition; p > 0.58 for the verb-3 condition; p > 0.46 for the verb-4 condition).


The total number of phonemes in the verbs of each condition was approximately equal.5 Cases in which the phonemes distinguishing the mouth sentence from the non-mouth sentence also occurred elsewhere in the sentence were balanced across conditions.6 Finally, there were no substantial differences in word frequencies between conditions.7

Procedure

In each trial, participants were presented with a question and a cue for an answer. They were instructed to combine these into a complete sentence and say that sentence out loud four times. Participants were told to maintain their rhythm as best they could in case of any disfluencies or memory difficulties.

The question and the cue each contained two of the four content words of the resulting sentence (e.g. for the roadie licks large raisins, the question was what does the roadie lick? and the cue was large raisins). The question and the cue were presented for 4500 ms and 3500 ms, respectively, in black 18-point Courier New font on a white background on a 22-inch computer screen. Subsequently, the word "Speak!" appeared centered on the screen, and a sequence of dots appeared over time at the bottom of the screen to indicate the desired speaking pace. A new dot appeared to the right of the previous one for each subsequent word, with vertical lines after every five words indicating sentence boundaries (see figure 1). Time intervals were 600 ms for content words and 400 ms for the initiating article (which was always "the").

5. 168 in the mouth condition vs. 172 in the non-mouth condition (p > 0.68).

6. There were 12 such occurrences in both mouth-relatedness conditions.

7. Word frequencies are based on the British National Corpus. Log-frequency was 6.13 for mouth words and 6.49 for non-mouth words (p > 0.37). Within the verb-position groups, no significant differences were detected: p > 0.15 for the verb-2 condition; p > 0.67 for the verb-3 condition; p > 0.73 for the verb-4 condition.

Between trials, a blank screen appeared for 3000 ms.
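To make the pacing concrete, the dot schedule can be computed as below. This is a sketch under stated assumptions only: the paper specifies the 400 ms article interval and 600 ms content-word intervals, while the function name and the bookkeeping are ours.

    # Onset times (ms) of the pacing dots for four recitations of a
    # five-word sentence: 400 ms article + 4 x 600 ms content words.
    def dot_onsets(n_recitations=4, article_ms=400, word_ms=600):
        onsets, t = [], 0
        for _ in range(n_recitations):
            for interval in [article_ms] + [word_ms] * 4:
                onsets.append(t)
                t += interval
        return onsets

    times = dot_onsets()
    print(times[:5])        # first recitation: [0, 400, 1000, 1600, 2200]
    print(times[-1] + 600)  # total speaking time: 11200 ms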

Figure 1. Sequence diagram of experiment 1

The three types of sentences (Verb-2, Verb-3 and Verb-4) were presented in separate blocks. The order of blocks was counterbalanced (by permutation) across participants; within each block, sentences were presented in randomized order. The sentences were counterbalanced across two lists such that participants never saw both the mouth and non-mouth versions of a given sentence. Each block was preceded by two demonstration trials and six practice trials for training purposes.

Coding and error classification

Transcribing the audio recordings was done offline by three coders, one of whom was unaware of the research hypothesis. All word responses were categorized into one of four classes. The class of 'target errors' consisted of only those responses that involved a substitution of onset phonemes in the absence of any other errors (e.g. "the pirate bites" → "the pirate pites"), including self-corrected instances (e.g. "the pirate p-bites"). 'Nontarget errors' were all other erroneous responses. These included onset errors with no apparent contextual source (e.g. "the pirate mites"), nucleus and coda errors, disfluent attempts (e.g. "p-pirate"), partial word omissions and any combination of these. Word responses on which both a target error and a nontarget error were committed (e.g. "the pirate bites" → "the pirate points") were classified as nontarget errors.

Word omissions were marked for exclusion from analysis. In order to maintain comparability among trials, all words that were preceded by at least three word omissions were discarded as well. Moreover, if a subject on a given trial spoke clearly out of the indicated rhythm, the entire trial was excluded from analysis.
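A minimal sketch of these exclusion rules, assuming (our reading, not spelled out in the coding protocol) that "preceded by" counts omissions cumulatively within a trial and that omitted words are represented as None:

    # Keep only word responses that survive the exclusion rules:
    # omitted words are dropped, as is every word that follows three
    # or more omissions within the same trial.
    def usable_words(responses):
        omissions = 0
        for i, word in enumerate(responses):
            if word is None:          # omitted word: exclude and count it
                omissions += 1
                continue
            if omissions < 3:         # fewer than three prior omissions
                yield (i, word)

    trial = ["the", "pirate", None, "bad", None, None, "people"]
    print(list(usable_words(trial)))
    # [(0, 'the'), (1, 'pirate'), (3, 'bad')] -- "people" is discarded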

Statistical analysis

We analyzed the data at two levels of specificity. For the word-level analysis, all individual word responses (except the articles) were treated as single data points. For each subject and for each item, error rates were computed by averaging over (1) the four recitations, (2) the four renditions of the individual words and (3) the two mouth-relatedness conditions.

With the purpose of comparing our findings to Wilshire's (1999) analysis, we used the same statistical method: a bootstrap resampling procedure (Efron 1981) with Bonferroni corrections for familywise error. T-values are reported separately for the subject analysis and the item analysis (indicated by s and i subscripts). For all omnibus tests and interaction effects we used repeated measures ANOVA and repeated measures MANOVA; for these analyses, F-values are likewise reported separately for the subject analysis (Fs) and the item analysis (Fi).
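As an illustration of this approach, the sketch below implements a generic paired bootstrap test with a Bonferroni-corrected alpha. It is written in the spirit of Efron (1981); the exact resampling scheme used by Wilshire (1999) and here may differ in detail, and the data in the example are invented.

    import numpy as np

    # Two-sided bootstrap p-value for a paired condition difference,
    # resampling mean differences under H0 (differences centered at zero).
    def bootstrap_p(cond_a, cond_b, n_boot=10000, seed=0):
        rng = np.random.default_rng(seed)
        diffs = np.asarray(cond_a) - np.asarray(cond_b)
        observed = diffs.mean()
        centered = diffs - observed
        boots = np.array([rng.choice(centered, size=len(diffs)).mean()
                          for _ in range(n_boot)])
        return float(np.mean(np.abs(boots) >= abs(observed)))

    # Illustrative per-subject error rates on recitations 1 and 2
    rec1 = [0.01, 0.02, 0.01, 0.03, 0.02, 0.01]
    rec2 = [0.04, 0.05, 0.03, 0.06, 0.05, 0.04]
    alpha = 0.05 / 6   # Bonferroni correction over six pairwise tests
    print(bootstrap_p(rec1, rec2) < alpha)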

To avoid skewed results arising from dependencies between error rates on successive words, additional analyses were carried out at the sentence level. For these analyses, each rendition of a sentence was coded in a binary manner, according to whether or not it contained any error. The resulting sentence-by-sentence error corpus was analyzed in the same way as the word-by-word error corpus.

Results

The tongue twister task was sufficiently challenging to elicit speech errors, but it also brought about inaccuracies of other sorts. For the reasons described in the coding section, a substantial fraction (21.3%) of the response items was excluded from analysis. The remaining speech error corpus consisted of 34560 words, of which 1090 were errors (3.15% of all responses). Of these errors, 827 were classified as target errors and 263 as nontarget errors. This error corpus is analyzed in the light of our three research questions.

Do error patterns induced by 'natural language' tongue twisters mirror those induced by tongue twisters comprising nonsense word sequences? Two variables are of particular interest: we aim to examine whether the effects of recitation number and word position (within a recitation) in our experiment reflect the corresponding findings of Wilshire (1999). We focus on these effects not only because they have been found to be fairly robust, but also because they are relevant to our further research questions.

Figure 2 displays speech error rates as a function of recitation number in the current experiment in comparison to Wilshire’s (1999) findings.


Figure 2. Error percentages on each recitation

Akin to the observations of Wilshire (1999), we find that recitation has a strong effect on error rate (Fs = 20.4, p < 0.001; Fi = 21.5, p < 0.001; α = 0.05). Participants made relatively few target errors on the first recitation compared to the second (ts = -6.68, p < 0.001; ti = -7.04, p < 0.001; α = 0.008), third (ts = -7.63, p < 0.001; ti = -7.28, p < 0.001; α = 0.008) and fourth (ts = -6.36, p < 0.001; ti = -5.56, p < 0.001; α = 0.008). The error rates on the three latter recitations, however, do not differ from one another. In terms of pairwise comparisons, this pattern is exactly homologous to the results obtained in Wilshire's version of the tongue twister paradigm.


As apparent in figure 3, speech error patterns as a function of word position (within sentence recitations) are also isomorphic to the findings reported in Wilshire (1999).8 In both studies, word position is a strongly significant predictor of error rate (Fs(3,57) = 17.9, p < 0.001; Fi(3,33) = 9.2, p < 0.001; α = 0.05) and errors peak on the first and third word. All significant pairwise differences reported in Wilshire's analysis are significant in the current study as well: 1 vs. 2 (ts = 5.60, p < 0.001; ti = 4.33, p = 0.002; α = 0.008); 2 vs. 3 (ts = -6.60, p < 0.001; ti = -4.00, p = 0.003; α = 0.008); and 3 vs. 4 (ts = 5.04, p < 0.001; ti = 3.32, p = 0.005; α = 0.008). In addition, we observe a difference between the error rates on word positions 1 and 4 by subject (ts = 5.23, p < 0.001), but not by item (ti = 2.17, p = 0.051; α = 0.008).

8. Notably, the numbers adopted from Wilshire's (1999) displays include not only errors on initial positions, but also word-final consonant substitutions. The word-initial errors (corresponding to our 'target errors'), however, constitute the majority of this class, and are not reported to exhibit considerably different patterns than the non-initial errors.


Figure 3. Error percentages on each word position (within sentence recitations) in experiment 1 and Wilshire (1999)

In sum, our modified version of Wilshire's tongue twister paradigm produces error patterns that are very similar to those produced by the original version.9

Do tongue twisters that describe actions performed using one’s mouth induce more errors than those that describe mouth-unrelated actions?

9. Although the pairwise differences between error rates are much alike, the absolute error rates are not of the same magnitude. There may be a number of reasons for this. Most importantly, Wilshire's 'consonant errors' include not only word-initial but also word-final errors (see footnote 8). Moreover, many erroneous responses in our study may have been excluded as part of a trial that was discarded from analysis in its entirety. All of the cross-experimental differences described in the discussion section may also have affected absolute error rates, as may many other factors (e.g. different lab, country, time, etc.).


Main effects of mouth-relatedness in the word-level analysis were not found, regardless of whether we considered all errors (ts = -0.74, p = 0.46; ti = 0.04, p = 0.97; α = 0.05), only target errors (ts = -0.63, p = 0.53; ti = 0.14, p = 0.89; α = 0.05), or only nontarget errors (ts = -0.64, p = 0.53; ti = -0.44, p = 0.67; α = 0.05). On the sentence level, no effect of mouth-relatedness was found either (ts = -0.27, p = 0.78; ti = 0.05, p = 0.96; α = 0.05).

Analyzing the verb-position conditions separately showed that this null result is not consistent across verb-position groups: the subject analysis shows a mouth-relatedness by verb position interaction (Fs(2,58) = 5.17, p = 0.009; Fi(2,34) = 1.51, p = 0.24; α = 0.05). Pairwise comparisons reveal that subjects made fewer errors on mouth-related stimuli in the verb-3 condition, although this was only significant in the subject analysis (ts = -2.70, p = 0.011; ti = -1.33, p = 0.214; α = 0.017). A (non-significant) numerical trend in the opposite direction is found in the verb-2 condition (ts = 1.15, p = 0.247; ti = 0.54, p = 0.60; α = 0.017).

Figure 4. Percentages of target errors and nontarget errors in the mouth-related and mouth-unrelated conditions (subject analysis).


What temporal regularities do speech error patterns exhibit relative to articulation of the verb? To investigate the temporal dynamics of motor simulation during speech production, we examined the word position by verb position by mouth-relatedness interaction. If motor simulation consistently interferes with articulation immediately before, concurrently with, or immediately after producing mouth-related verbs, this interaction should be significant. However, the statistical analysis shows no sign of such an interaction (Fs(6,54) = 1.25, p = 0.30; Fi(6,62) = 0.24, p = 0.96; α = 0.05).

Figure 4 plots mean error rates on the verb and the words directly adjacent to it, for the mouth and non-mouth conditions. In line with the absence of a three-way interaction effect, this chart shows no sign of a consistent temporal locus of content-related motor interference.


Discussion

The observed isomorphisms between our findings and those reported in Wilshire (1999) are remarkable because the two studies differ in at least four fundamental respects.

First of all, our stimuli are much closer to natural language than those used by Wilshire. They are not just quadruples of semantically unrelated nouns, but full sentences composed of nouns, verbs, adjectives and adverbs that are combined under grammatical constraints.

Second, as a consequence of their increased naturalness, the phonemic structure of our stimuli is much less homogeneous than that of the original version. Wilshire's stimuli are constructed from monosyllabic words with specific phonemic patterns in onset as well as coda positions. In contrast, many of the words comprising our stimuli are multisyllabic, and they are constrained only in their word-initial phonemes. This might well affect speech behavior: Wilshire's report, in line with later findings by McMillan et al. (2009), suggests that articulation of word strings is most error prone when successive words differ by no more than one competing feature.

A third difference concerns the setup of the experimental task. Whereas Wilshire’s participants simply had to read word sequences out loud, our study employed a question-cue procedure (with the purpose of mimicking spontaneous speech production). Participants therefore faced not only articulatory challenges but also had to cope with explicit memory demands.

Finally, the two experiments differ in their timing. Although the speech rate in terms of words per minute is the same across experiments (100 words per minute), the use of multisyllabic words caused the average number of syllables per minute to be substantially higher in the current study (167 per minute on average). Moreover, the timeframe for articulating the four target words was preceded by a 400 ms interval for pronouncing the article ("the"). Also, the desired speech rate was indicated visually rather than by means of a metronome.

Given all of these differences, the fact that the same factors affect speech error rates in very similar ways across these studies is rather striking. It suggests that Wilshire's tongue twister paradigm is flexible with respect to the nature of the stimuli it employs. What's more, the observation that the findings from this 'artificial' tongue twister paradigm extend to a more naturalistic setting can be interpreted as corroborating the paradigm's external validity.

As far as the effect of mouth-relatedness is concerned, the results do not confirm our hypotheses. Mouth-related items were not found to be more susceptible to erroneous articulation than mouth-unrelated items. Since no content-related speech performance impairment was found, these data are not informative regarding the temporal aspects of motor simulation in speech production.

This null finding, however, does not decisively show that motor simulation plays no role in speech production. There are reasons to suppose that the apparent absence of motor interference is at least partly due to high task demands: observed speech patterns as well as self-report data suggest that participants often completed the task without any consideration of the meanings of the sentences they said.

Although this may to some degree be due to deliberate strategies, the details of the experimental setup also plausibly contributed to this undesired observation. The words that constituted the stimuli were often fairly infrequent and semantically unrelated to each other (according to our norming). Moreover, subjects were exposed to these words only once per trial. As a consequence, construing a coherent sentence meaning during the task may have been fairly challenging, especially given the limited timeframe available for doing so.

In experiment 2, the current paradigm is adjusted to overcome this drawback.

Experiment 2

By having participants memorize words rather than read them from the screen, experiment 1 imposed high working memory demands. This undoubtedly contributed to the significant proportion of unusable data. A second concern regarding the task is that it did not sufficiently motivate participants to process the meaning of the sentences they produced, rather than merely articulate the sounds associated with the phonological sequences presented to them.

In experiment 2, the experimental paradigm was adjusted in two ways to resolve these shortcomings. Each trial was now preceded by a word-by-word description of the state of affairs described by the emergent sentence. Every noun in this description, furthermore, was supplemented with a picture.

The same research questions as in experiment 1 are addressed in the light of this visually augmented version of the experiment.

Participants

Thirty-two UCSD undergraduate students aged between 18 and 33 (µ= 21.25, σ= 2.69; 13 males, 19 females) served as participants in exchange for course credit. All were native English speakers and reported normal or corrected-to-normal vision and hearing.


Materials

Fourteen pairs of sentences were used as stimuli. Twelve of these were the stimuli from the verb-2 block used in experiment 1, two were new. For devising the new stimuli, the same criteria applied as in experiment 1.

No additional norming procedures were carried out.

All nouns in the word-by-word descriptions were supplemented by a 160 by 160 pixel colored cartoon image. These were gathered from the Microsoft Office Clip-Art collection and from a Clip-Art database powered by Google.

Procedure

The testing procedure differed slightly from the one in experiment 1. Most importantly, the task involved a word-by-word description of the situation that the stimuli described.

Before the question-cue procedure started, four short utterances appeared successively at the top of the screen for 3000 milliseconds. For the item the pirate bites bad people, for example, these statements were: there is a pirate; he bites; people; that are bad (see figure 6). For the first and the third statement of each trial, a cartoon image depicting the noun appeared in the center of the screen below the printed words.

With the purpose of better eliciting a natural speech pace, the timing of the recording phase was slightly adjusted relative to experiment 1. Participants now were given 550 ms (rather than 600 ms) to articulate each word. The time for pronouncing the article remained 400 ms.

The stimuli were not divided by verb position, as we only used the verb-2 sentences. Apart from that, we performed the same randomization and counterbalancing procedures as in experiment 1.


Figure 6. Sequence diagram of experiment 2

Coding and error classification

Coding, error classification and exclusion followed the same procedures as in experiment 1.

Results

While the number of excluded data points was much lower than in experiment 1 (7.1%), the average overall error rate increased to 4.1%. The remaining speech error corpus consisted of 6879 successfully uttered words, 206 target errors and 83 nontarget errors.

This error corpus is analyzed and discussed in the light of the same research questions as in experiment 1.

Do error patterns induced by 'natural language' tongue twisters mirror those induced by tongue twisters comprising nonsense word sequences? Recitation once again has a strong effect on error rate (Fs(2,29) = 9.87, p < 0.001; Fi(3,25) = 10.3, p < 0.001; α = 0.05). Participants made more errors on the second (ts = -3.91, p = 0.004; ti = -3.00, p = 0.038; α = 0.008), third (ts = -3.41, p = 0.002; ti = -3.26, p = 0.016; α = 0.008) and fourth recitation (ts = -3.33, p = 0.004; ti = -3.00, p = 0.034; α = 0.008) than on the first. The error rates on the three latter recitations do not differ from one another. In terms of pairwise comparisons, this pattern closely resembles the results of experiment 1 as well as those of Wilshire (1999).

The observed effect of word position also reflected the previously obtained findings. Error peaks are again on the first and third word. The pairwise differences reported in Wilshire's analysis are significant only in the subject analysis: 1 vs. 2 (ts = 4.72, p < 0.001; ti = 3.04, p = 0.026; α = 0.008); 2 vs. 3 (ts = -5.37, p < 0.001; ti = -1.78, p = 0.18; α = 0.008); 3 vs. 4 (ts = 3.52, p = 0.003; ti = 1.71, p = 0.170; α = 0.008). In contrast to experiment 1, we do not observe a difference between the error rates on word positions 1 and 4 in either the subject or the item analysis (ts = 2.36, p = 0.026; ti = 2.00, p = 0.084; α = 0.008).

Do tongue twisters that describe actions performed using one's mouth induce more errors than those that describe mouth-unrelated actions? At the word level, the main effect of mouth-relatedness is significant for subjects and items (ts = 2.87, p = 0.009; ti = 3.37, p = 0.015; α = 0.05) and is in the predicted direction. Analyzing the two error categories separately, we see marginally significant effects for target errors in the subject analysis only (ts = 2.173, p = 0.038; ti = 2.75, p = 0.23; α = 0.05) and for nontarget errors in the item analysis (ti = 2.15, p = 0.09) but not in the subject analysis (ts = 0.21, p = 0.054; α = 0.05).

At the sentence-by-sentence level, the main effect of mouth-relatedness holds (ts = 3.11, p = 0.004; ti = 3.20, p = 0.007; α = 0.05).

Figure 7. Percentages of target and nontarget errors on mouth and non-mouth items committed in experiment 2

What temporal regularities do speech error patterns exhibit relative to articulation of the verb? The data do not show a significant word position by mouth-relatedness interaction effect (Fs(3,29) = 1.08, p = 0.37; Fi(3,11) = 2.03, p = 0.17; α = 0.05). Nonetheless, as can be seen in figure 8, the differences between error rates in the mouth-related and mouth-unrelated conditions are not homogeneous across the individual word positions. Pairwise comparisons show strong numerical trends on the second (ts = 2.53, p = 0.16; ti = 2.44, p = 0.029; α = 0.013) and third word (ts = 2.58, p = 0.16; ti = 2.71, p = 0.043; α = 0.013).

Figure 8. Error percentages on mouth items and non-mouth items at each individual word position (within sentences)

Discussion

The effects of recitation number and word position closely resemble the results of experiment 1 and are homologous to the results of Wilshire (1999). This adds robustness to the finding that these two factors evoke specific error patterns regardless of whether they apply to nonsense word sequences or complete sentences. The adjustments relative to experiment 1 do not appear to affect these results in any way.

Unlike experiment 1, the results of experiment 2 strongly suggest that mouth-relatedness increases speech error rate. In the analyses at both the word level and the sentence level, participants made significantly more errors on mouth-related items than on mouth-unrelated items. Since we made multiple adjustments relative to experiment 1, the specific contributions of the changes in stimulus presentation and timing cannot be inferred from the current data. Since the differences in timing were very subtle, however, the more vivid and detailed presentation of the stimuli is most likely responsible for this outcome.

Although not decisively supported by statistical analyses,10 the effect of mouth-relatedness on error rate is strongest for the second and third content words of a sentence, i.e. the verb itself and the word following it. Provided that this discrepancy in error rate can be interpreted as an indicator of simulation-induced motor interference, these results suggest that motor simulation is performed simultaneously with the articulation of the relevant action words, and persists over the subsequent word as well.

General Discussion

ABBA tongue twisters elicit specific error patterns even when they are composed not of nonsense word sequences but of full sentences. This supports the external validity of research paradigms such as the one used by Wilshire (1999) and shows that they are flexible enough to be employed for research questions that concern natural sentence processing.

10. Since the statistical power of the subject analysis (n = 32) and the item analysis (n = 28) is rather low, these results are contingent on obtaining more compelling results from additional participants.


Using a relatively naturalistic tongue twister paradigm, we find that mouth-relatedness impacts speech error rate, although this was only observed when the stimuli were presented in a visual context. The observed divergence between experiments 1 and 2 may result from participants in the two experiments relying on different linguistic features of the stimuli. Whereas participants in experiment 1 often appeared to rely on phonological features of the sentences they produced, the setup of experiment 2 encouraged participants to understand the meaning of the words they were saying. The fact that language production inherently involves semantic processing may imply that the results of experiment 2 are more likely to extend to natural communicative scenarios than those of experiment 1.

In experiment 2, furthermore, we find that the effect of mouth-relatedness on error rate is most salient for the words on the second and third position of the sentences. This could be interpreted as evidence that motor simulation of action verbs in sentence production is performed concurrently with (and not before) articulation itself and has a time span of between 1000 and 1500 milliseconds. In order to know whether this finding is robust and generalizable to sentences with a different grammatical structure, further research is needed.
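One way to unpack that estimate (our reconstruction; the text does not spell out the arithmetic): with the verb in second position and elevated error rates on words two and three, the interference window spans roughly two word intervals of the pacing schedule,

    2 x 550 ms = 1100 ms   (experiment 2 pacing)

which falls within the stated window of 1000 to 1500 milliseconds once uncertainty about where simulation starts and ends within those intervals is allowed for.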

Conclusion

This research contributes to the literature on embodied language processing in two ways. Our findings indicate direct interference between semantic processing and motor performance. This suggests that motor simulation in language production recruits the same neural systems that are responsible for actual performance of motor actions. Furthermore, the current study provides a novel method for examining temporal aspects of mental simulation.

In addition, our findings are relevant for psycholinguistic models of speech production. The motor interference observed in experiment 2 hints at temporal overlap between early stages of conceptual encoding (or 'message formulation') and articulation of the corresponding acoustic signal. The assumption of sequentiality that most models of speech production hold to some degree11 might thus be valid in terms of the order in which encoding processes are accessed, but not in terms of their temporal persistence. A model of speech production that takes temporal aspects of linguistic encoding into account therefore has to adhere to an architecture that allows for parallelism between processing stages.

11. Although spreading activation models, as well as most serial models after Bock & Levelt (1994), assume that speech production is not an entirely serial process, it is commonly acknowledged that the different encoding stages provide input for one another and are therefore accessed one after another.

Since the research paradigm we used is rather novel, further research is needed to reveal the robustness and scope of our findings. The current findings raise the question of whether simulation-induced motor interference can be demonstrated using measures other than ours. Onset timing data, for instance, could be a valuable alternative measure of articulation difficulty. Another interesting avenue for further research concerns the generalizability of the observed effect to other modalities. According to the same logic that underlies the current experiments, writing about manual actions should be more difficult than writing about non-manual actions.

These investigations could help to confirm that the embodied nature of semantic processing makes language production prone to self-impairment.

References

Alibali, M. W., D. C. Heath, et al. (2001). "Effects of visibility between speaker and listener on gesture production: Some gestures are meant to be seen." Journal of Memory and Language 44(2): 169-188.

Baars, B. J. (1992). "A dozen competing-plans techniques for inducing predictable slips in speech and action." Experimental Slips and Human Error: Exploring the Architecture of Volition. B. Baars. New York, Plenum Press.



Barsalou, L. W. (1999). "Perceptual symbol systems." Behavioral and Brain sciences 22(04): 577-660.

Bergen, B., T. T. C. Lau, et al. (2010). "Body part representations in verbal semantics." Memory & cognition 38(7): 969-981.

Bergen, B. and K. Wheeler (2005). Sentence understanding engages motor processes. Proceedings of the 27th Annual Conference of the Cognitive Science Society, Mahwah, NJ: Erlbaum.

Bock, K. and W. Levelt (1994). Language production: Grammatical encoding. Handbook of Psycholinguistics. Gernsbacher. New York, Academic Press: 945 - 984.

Bradford, A. and B. Dodd (1994). "The motor planning abilities of phonologically disordered children." International Journal of Language & Communication Disorders 29(4): 349-369.

Buccino, G., L. Riggio, et al. (2005). "Listening to action-related sentences modulates the activity of the motor system: a combined TMS and behavioral study." Cognitive Brain Research 24(3): 355-363.

Burke, M. and M. Feist (unpublished). When Actions Speak: Embodied Effects on Non-Literal Use of Spatial Terms. Abstract submitted to CSDL conference 2012. Vancouver.

Casasanto, D., S. Lozano, et al. (2006). Metaphor in the mind and hands. Proceedings of the 28th Annual Conference of the Cognitive Science Society, Mahwah, NJ: Erlbaum.

Dell, G. S. (1986). "A spreading-activation theory of retrieval in sentence production." Psychological review 93(3): 283.

Dell, G. S., F. Chang, et al. (1999). "Connectionist models of language production: Lexical access and grammatical encoding." Cognitive Science 23(4): 517-542.

Efron, B. (1981). "Nonparametric estimates of standard error: the jackknife, the bootstrap and other methods." Biometrika 68(3): 589-599.

Fromkin, V. A. (1971). "The non-anomalous nature of anomalous utterances." Language: 27-52.

Garrett, M. (1975). "The Analysis of Sentence Production." The psychology of learning and motivation: Advances in research and theory 9: 133.

Garrett, M. F. (1980). "Levels of processing in sentence production." Language production 1: 177-220.

Glenberg, A. M. and M. P. Kaschak (2002). "Grounding language in action." Psychonomic bulletin & review 9(3): 558-565.


Glenberg, A. M. and D. A. Robertson (2000). "Symbol grounding and meaning: A comparison of high-dimensional and embodied theories of meaning." Journal of Memory and Language 43(3): 379-401.

Goldrick, M. and S. E. Blumstein (2006). "Cascading activation from phonological planning to articulatory processes: Evidence from tongue twisters." Language and Cognitive Processes 21(6): 649-683.

Hostetter, A. B. and M. W. Alibali (2008). "Visible embodiment: Gestures as simulated action." Psychonomic bulletin & review 15(3): 495-514.

Krauss, R. M. (1998). "Why do we gesture when we speak?" Current Directions in Psychological Science 7(2): 54-60.

Kupin, J. J. (1982). Tongue twisters as a source of information about speech production, Indiana University Linguistics Club.

Landauer, T. K. and S. T. Dumais (1997). "A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge." Psychological Review 104(2): 211.

McMillan, C. T., M. Corley, et al. (2009). "Articulatory evidence for feedback and competition in speech production." Language and Cognitive Processes 24(1): 44-66.

Navon, D. and D. Gopher (1979). "On the economy of the human-processing system." Psychological review 86(3): 214.

Sato, M. (2010). Message in the "Body": Effects of Simulation in Sentence Production. Unpublished Ph.D. dissertation, Linguistics.

Shattuck-Hufnagel, S. (1983). "Sublexical units and suprasegmental structure in speech production planning." The production of speech: 109-136.

Shattuck-Hufnagel, S. and D. H. Klatt (1979). "The limited use of distinctive features and markedness in speech production: Evidence from speech error data." Journal of Verbal Learning and Verbal Behavior 18(1): 41-55.

Stemberger, J. P. (1983). Speech errors and theoretical phonology: A review. Bloomington, Indiana University Linguistics Club.

Stemberger, J. P. (1992). The reliability and replicability of naturalistic speech error data: A comparison with experimentally induced errors. Experimental Slips and Human Error: Exploring the Architecture of Volition. B. Baars. New York, Plenum Press: 195-215.


Tettamanti, M., G. Buccino, et al. (2005). "Listening to action-related sentences activates fronto-parietal motor circuits." Journal of Cognitive Neuroscience 17(2): 273-281.

Tombu, M. and P. Jolicœur (2003). "A central capacity sharing model of dual-task performance." Journal of Experimental Psychology: Human Perception and Performance 29(1): 3.

Wilshire, C. E. (1999). "The “tongue twister” paradigm as a technique for studying phonological encoding." Language and Speech 42(1): 57-82.

Wilshire, C. E. and R. A. McCarthy (1996). "Experimental investigations of an impairment in phonological encoding." Cognitive Neuropsychology 13(7): 1059-1098.

Zwaan, R. A. and L. J. Taylor (2006). "Seeing, acting, understanding: motor resonance in language comprehension." Journal of Experimental Psychology: General 135(1): 1.
