
The Contribution of the Motor and Auditory Cortex in Priming Action and Sound Verbs: a Pilot Study

Monica Vanoncini1,2

1Faculty of Arts - Rijksuniversiteit Groningen; 2Center for Language and Brain - HSE University

Master’s Thesis

Supervisors: Dr. Adrià Rofes1 and Dr. Olga Dragoy2

August 25th, 2020

Student details

Student Number: S3857514

Address: 54/A Locatelli, Almè (BG), Italy
Email: m.vanoncini@student.rug.nl


Abstract

Introduction. According to embodied cognition theories, concepts are represented in sensory and motor brain systems depending on the weight of their sensorimotor features (e.g., motor, acoustic, visual, emotional). In this regard, action verbs (e.g., to carry) are more strongly associated with motor features, whereas sound verbs (e.g., to sing) are more strongly associated with auditory features. These representations can emerge as a priming effect (i.e., by making target processing faster and/or more accurate) or an antipriming effect (i.e., by making target processing slower and/or less accurate).

Aim. This pilot study assesses whether activation of the sensorimotor cortex, particularly the motor and auditory cortices, differentially influences the visual lexical decision of action and sound verbs.

Methods. Seventy-five Russian-speaking healthy adults participated in the experiment. An online lexical-decision task with cross-modal feature-repetition priming was administered. Participants saw a video clip of a moving hand (i.e., hand prime), heard a bike bell sound (i.e., sound prime), or saw a static video clip (i.e., neutral prime). After that, they saw a letter string in written form (i.e., a hand-related action verb, a sound verb, or a pseudoverb) and had to decide whether it was a word or not by pressing a button.

Results. Linear mixed-effects models indicated no priming effects between the hand prime and action verbs, nor between the sound prime and sound verbs. However, responses to action verbs were significantly more accurate than those to sound verbs.

Discussion. The results showed a somatotopic verb-motor priming effect for action verbs due to the manual response, which supports the contribution of the motor cortex to the lexical decision of action verbs. The non-significant interactions between the hand prime and action verbs, and between the sound prime and sound verbs, might be due to the limitations of a meaningless (i.e., hand) and a general (i.e., sound) prime.

Conclusion. Our findings support embodied cognition theories and await further replication. Recommendations and future directions are discussed.


Acknowledgements

I would like to take this opportunity to thank the European Commission for providing me with a scholarship for the duration of the two-year Master's programme, which gave me the opportunity to realize one of my dreams: to go abroad and study.

I would like to express my deepest gratitude to my supervisors, Dr. Adrià Rofes and Dr. Olga Dragoy, for giving me the chance to carry out this pilot study and for their guidance throughout each stage of the process. I also would like to acknowledge Professor Roelien Bastiaanse for our insightful discussions in the field of embodied cognition and her support during an exceptional time in Moscow.

Additionally, I would like to thank the members of the Center for Language and Brain in Moscow. Particularly, Victoria Pozdnyakova for helping me with participant recruitment, and with the stimuli preparation together with Dunya Novogilova and Tatyana Bolgina. Sincere thanks to Andrey Shulyatyev for the development of the pseudowords generation script and his kind permission to use it for this study.

I am grateful to my friends and colleagues from the European Master’s in Clinical Linguistics for making these two years a memorable journey.

Finally, I would like to thank Nicolò for encouraging me throughout this experience, starting from when this programme was just on my wish list.


Contents

Abstract
Acknowledgements
LIST OF TABLES
LIST OF FIGURES
Introduction
Criticisms of embodied cognition
Verbs and embodied cognition
Action and sound verbs
The case of sound and action verbs and mirror neurons
Priming and cross-modal experiments
Aim, research questions, hypotheses and predictions
Methods
Participants
Stimuli and stimuli preparation
Conditions
Design
Procedure
Data analysis
Results
Accuracy data
Reaction time data
Discussion
The contribution of motor cortex in processing action verbs
The contribution of auditory cortex in processing sound verbs
Summary
Limitations and future directions
Conclusion
References
APPENDIX A: Online consent form (translated in English)
APPENDIX B: Experiment Lists


LIST OF TABLES

Table 1 Transitivity and instrumentality across lists
Table 2 Matching of conceptual and psycholinguistic stimulus features within list A
Table 3 Matching of conceptual and psycholinguistic stimulus features within list B
Table 4 Matching of conceptual and psycholinguistic stimulus features between lists
Table 5 Summary generalized linear mixed model M2 for Accuracy
Table 6 Summary generalized linear mixed model M2 for Reaction Time


LIST OF FIGURES

Figure 1 Perceptual and amodal systems
Figure 2 Prominent theories of conceptual representation
Figure 3 The grounded meaning of action verbs
Figure 4 Semantic somatotopy model of action word processing
Figure 5 The grounded meaning of sound verbs
Figure 6 Trial presentation structure
Figure 7 Observed and fitted densities to RT
Figure 8 Mean percentage of correct answers by verb types
Figure 9 Kernel density estimation of RT measures by verb types
Figure 10 RTs measures by verb type


Introduction

How does language relate to real entities in the world and handle interactions with the environment? Could it be that concepts are represented in sensory and motor brain areas? Can verbs be rooted in sensory, and not only motor, brain areas? Are sensory-motor areas automatically activated by the processing of verbs? The debate concerning the nature of concept representations owes its origins to the field of philosophy: under the rationalist account, concepts are mental entities based upon innate categories, totally distinct from perception; whereas, according to the empiricist view, significant knowledge is gained through sensory experiences (Kiefer & Pulvermüller, 2012).

In the last decades, modality-specific approaches resulting from this debate have stated that cognition is closely linked to perceptual and motor brain areas and networks. The conceptual system therefore not only keeps a link with its sensory-motor origin, but also entails a weak reactivation of the sensory-motor experience (Iachini, 2011). Thus, sensory concepts, such as "feel", have a sensory format, whereas action concepts, such as "pull", have a motor format (for a review, see Mahon & Hickok, 2016). This approach goes under the label of embodied cognition.

The theory of embodied cognition claims that the brain areas activated during sensory and motor experiences are also activated while processing linguistic material related to those experiences, in practice supporting the existence of perceptual symbol systems (Barsalou, 1999). Under this view, perceptual states in sensory-motor systems are extracted and stored as perceptual symbols in long-term memory. These perceptual symbols are modal, as they are represented in the same systems in which the perceptual states arose, and analogical, as their structure is similar to that of the original perceptual states. This symbol formation process operates on different aspects of perceived experience: the sensory modalities (i.e., vision, audition, haptics, olfaction, and gustation), as well as proprioception and introspection. This results in the storage of several symbols, each rooted in its own brain area. For instance, from the auditory system people acquire perceptual symbols for sounds, which are stored in auditory areas; from proprioception, people acquire perceptual symbols for movements and positions, and these symbols are set in motor areas. Perceptual symbol systems were proposed in contrast to traditional amodal symbol systems, according to which perceptual states in sensory-motor systems are converted into a new representational system (Barsalou, 1999). These new symbols are amodal, as they do not have any correspondence with the perceptual states, and arbitrary, due to their conventional associations with the referent (see Figure 1; Barsalou, 1999).

Figure 1

Perceptual and amodal systems

Note. Perceptual (left) and amodal (right) symbol systems represented for the verb "to sing". Linguistic forms are typically used to represent amodal symbols. Based on Barsalou, 1999.

An experimental framework that can include the three components essential for embodiment (i.e., perception, action, and cognition) is priming (see section Priming and cross-modal experiments): the effect that perceiving a certain prime (perception) has on answering the target (action) can shed light on cognition.

Criticisms of embodied cognition

Recently, several researchers have argued that it is not possible to tackle embodied cognition by choosing, in a binary way, between a disembodied and an embodied view. Rather, the problem concerns gradations of embodiment in a broader space. Thus, theories of semantics are better represented along a continuum varying in the degree of separation between perception/action and cognition (see Figure 2).


Figure 2

Prominent theories of conceptual representation

Note. Taken from Binder and Desai (2011).

Its extremities are represented by the unembodied view on one side and strong embodiment on the other, both mentioned above. In the middle lie secondary embodiment (or grounding by interaction) and weakly embodied theories (or embodied abstraction). The former propose that the cortical areas representing concepts do not overlap with the areas dedicated to processing sensory and motor information; this independence, however, does not preclude an interaction with the perceptual systems (Mahon & Caramazza, 2008). The latter posit that conceptual representations gradually converge in multiple modality-specific systems: the extent to which the perceptual system mediates semantic content depends on factors such as task demands, context, frequency, and familiarity (Binder & Desai, 2011). They also predict that during semantic processing there is an engagement of areas anterior or adjacent to the primary sensory and motor cortices (Vigliocco et al., 2004; for a review see Meteyard et al., 2012). In a theoretical review, Meteyard and colleagues (2012) argued that the extremities of this continuum lack support, whereas there might be convergence zones along with the activation of modal content. Similarly, Binder and Desai (2011) suggested that embodied abstraction is the view most compatible with the current evidence. Nowadays, although there are many methods that explore neural activity during conceptual processing, the debate is still open.

Three major objections have been raised against the embodied views. Firstly, it is unclear whether the debate about the nature of concepts refers to the conceptual format or to the conceptual content (Mahon & Hickok, 2016). The embodiment of the format would be determined by the content of the concept; for instance, an action concept (e.g., to grasp) has a motor format. On the other hand, the embodiment of the content does not automatically entail the embodiment of the format, which can be amodal; that is, an action concept (e.g., to grasp) may share a common format with a visual concept (e.g., beautiful).

Secondly, activations in perceptual and motor brain systems observed in imaging studies (such as MEG and EEG) cannot, by their nature, provide a clear reason for activations in certain brain regions; thus, such activations might not be causally related to comprehension. That is, they potentially reflect the mental simulation of word meaning, or other cognitive functions (e.g., cognitive control, memory retrieval, prediction, information integration) (Hauk & Tschentscher, 2013).

Thirdly, an activation may reflect post-lexical processing, such as mental imagery or strategic responses to the task (Hauk & Tschentscher, 2013; Meteyard et al., 2012). For instance, the lexico-semantic processing of the word "to ring" might recall episodic memories (e.g., the last time one heard the telephone ringing). That is, any resulting activation in the auditory cortex might be due to the recall of this episode, rather than to the pure and immediate semantic decoding of the word. Hence the importance of distinguishing between early semantic access, occurring within 200-300ms after stimulus presentation, and late conceptual re-processing (Hauk et al., 2008), such as mental imagery, defined as a quasi-perceptual experience occurring without a proper external stimulus (Thomas, 2008).

In the effort to overcome these issues, Kiefer et al. (2008) identified four criteria that must be fulfilled to test the nature of concept representations. The task used (1) has to implicitly retrieve conceptual features, for example by using a lexical decision task; (2) should activate perceptual regions, in order to identify the overlap between conceptual and perceptual processing; and brain areas should respond (3) rapidly and (4) selectively, in order to exclude any post-conceptual strategic process occurring after a concept has been fully accessed. The assumption behind a lexical decision task is that participants, in order to decide whether a string is a word or not, have to check whether it corresponds to a lexical representation. Therefore, the time and the accuracy required to make a decision give an estimation of the operations needed to access the lexical representation. In particular, the lexicality effect shows that pseudowords are classified more slowly, but not less accurately, than words (Carreiras et al., 2007). In addition, response time and accuracy are also affected by the context provided (e.g., a prime). Finally, a visual lexical decision task requiring a finger-press response entails increased activation related to the manual response (Carreiras et al., 2007): it is fundamental to keep this in mind when testing the nature of hand-related concepts.

Verbs and embodied cognition

Verbs represent predication, describing events and relations among entities, whereas nouns represent denotation, referring to entities. Verb tasks, currently used in language mapping procedures, are claimed to involve at least partially distinct neural substrates compared to noun tasks because they engage a larger number of cognitive and linguistic resources (Rofes & Miceli, 2014). Verbs indeed have more complex semantic representations than nouns: while nouns are organized into semantic hierarchies, the semantic network of verbs is mutable. Moreover, each verb specifies the type and number of its arguments, whereas nouns behave grammatically in a more uniform way. Finally, verbs tend to be less imageable than nouns and, remarkably, across languages verbs are more morphologically complex than nouns (for a review, see Mätzig et al., 2009). All these factors may play a role in their different timing of acquisition and their different vulnerability in brain damage and language mapping procedures. However, if we base our assumption on embodied cognition theories, the neural distinction is not necessarily due to the noun-verb dichotomy, but to the weight of the sensorimotor features carried by each concept (e.g., the concepts of cinnamon and to smell might have similar representations).

Differences within verbs are based on grammatical, conceptual lexical, and conceptual semantic variables. Grammatical features include transitivity (i.e., whether a verb requires a direct object, e.g., to buy vs. to walk; Bastiaanse & Van Zonneveld, 2004), number of arguments (e.g., to smile vs. to insert; Thompson, 2003), and regularity (e.g., to fit vs. to hit). Conceptual lexical features comprise name relatedness to a noun (i.e., verb-noun homophony, e.g., an iron vs. to iron; Jonkers & Bastiaanse, 2007), familiarity (i.e., the extent to which a word is common or usual in one's experience, e.g., to write vs. to incise), age of acquisition (i.e., the estimated age at which one learned the word), word length, and frequency (i.e., the number of occurrences of a word in a large corpus). Conceptual semantic features are instrumentality (i.e., whether a verb refers to actions for which an instrument is required, e.g., to mop vs. to polish; Jonkers & Bastiaanse, 2007), imageability (i.e., the degree to which a word evokes a mental image, e.g., to cut vs. to forget; Paivio et al., 1968), and concreteness (i.e., the extent to which a word can be experienced by the senses, e.g., to shout vs. to accept; Paivio et al., 1968; Buccino et al., 2019).

These latter features, referring to conceptual semantics, are of major interest in the embodied cognition framework. In particular, we take into account concrete concepts, those related to an immediate sensory-motor experience; abstract concepts lie beyond the scope of this study. On the one hand, most nouns are associated with a distinct sensory-motor modality: olfactory (e.g., cinnamon), visual (e.g., photography), motor (e.g., race), haptic (e.g., screen), auditory (e.g., radio). On the other hand, it is hard to find verbs that mainly entail a sensory, rather than a motor, experience. That is, when the focus is on verbs, vision, hearing, and haptics are usually downplayed, or even neglected. However, consider the concept of playing the ukulele. We are probably trying to recreate all the aspects assumed to be relevant for the simulation, through partial reactivation of the neural ensembles originally recruited by the experience or concept (Iachini, 2011). There might be multi-modal (sensory and motor) associations: a small guitar (visual) having a certain tone and volume (auditory), with its chords giving specific feedback on the fingertips (haptic). In addition, there might be a simulation of the different actions performed by the left and the right hands (motor). In this thesis we focus on two very specific types of verbs: action verbs and sound verbs.

Action and sound verbs

Action verbs are characterized by a higher association with the motor feature compared to other conceptual features (e.g., emotion, auditory, visual) (see Figure 3).


Figure 3

The grounded meaning of action verbs

Note. The meaning of action verbs, in this case "to open", might be grounded in the motor cortex (1). Secondarily, it might be linked to the auditory cortex (2), the somato-sensory cortex (3), and the visual cortex (4). Based on Buccino et al. (2019).

Many studies have focused on action verbs, particularly those denoting actions performed with specific body parts. For instance, in a functional Magnetic Resonance Imaging (fMRI) study, the passive reading of action words referring to face, arm, or leg actions (e.g., to lick, pick, or kick) was shown to activate areas along the motor strip that overlap with, or are adjacent to, the activation patterns observed for the actual movement of the tongue, fingers, or feet (Hauk et al., 2004). Similar results have been found during passive listening to sentences including mouth-, hand-, or leg-actions (Tettamanti et al., 2005). Therefore, action words are defined as semantic links that bind language and action at the cortical level (see Figure 4): an exchange of information has been observed as differences in language processing when the corresponding motor site is stimulated (Pulvermüller et al., 2005).


Figure 4

Semantic somatotopy model of action word processing

Note. Taken from Pulvermüller, 2005.

In infancy, children performing an action for the first time are reinforced by their caregivers, who typically use the corresponding action word (Pulvermüller, 2005). Thus, the motor program and the representation of the word are activated almost simultaneously (Pulvermüller, 2005). Remarkably, sensorimotor networks seem to be recruited even for words learned after middle childhood, including in adulthood (for a review, see Kogan et al., 2020).

Sound verbs are those with a higher association with the auditory feature compared to other conceptual features (e.g., motor, emotion, visual) (see Figure 5).


Figure 5

The grounded meaning of sound verbs

Note. The meaning of sound verbs, in this case "to yell", might be grounded in the auditory cortex (2). Secondarily, it might be linked to the motor cortex (1) and to emotion-related brain regions (5). Based on Buccino et al. (2019).

Fewer studies have focused on this type of verb. Studies on the identification of human action sounds have shown a cortical distinction between the processing of sounds made by humans, which activate the action-sound network, and sounds produced by nonliving sources, which activate areas related to visual form, feature, and object recognition (Engel et al., 2009; Lemaitre et al., 2018).

Promising findings with nouns (Kiefer et al., 2008) recently led a research group to compare action-related and sound-related verbs for the first time (Popp et al., 2016; Popp et al., 2019a; Popp et al., 2019b). Popp et al. (2016) assessed whether there were feature-specific Event-Related Potential (ERP) differences between action and sound verbs in German. In one experiment, participants performed a visual lexical decision task: action verbs elicited a more positive scalp potential at a parietal scalp region, whereas sound verbs elicited a more negative scalp potential at a central scalp region. In a second experiment, they tested whether pre-activating the verb concept in a context phase, in which the verb is presented with a related context noun (e.g., ball - to throw), modulates subsequent verb processing within the lexical decision task. Feature-specific ERP effects exhibited a reversed polarity, presumably due to a specific deactivation of the motor or auditory brain areas.

To overcome the limited spatial resolution of ERP, they subsequently conducted a similar study using fMRI (Popp et al., 2019b). Sound-related verbs bilaterally recruited the superior temporal cortex (an auditory area), but also visual and motor areas, whereas action-related verbs activated motor regions within the frontal cortex and the cerebellum. This partial overlap makes these findings compatible with weakly embodied cognition theories (Popp et al., 2019b). In another study, the same group used an explicit semantic context decision task followed by an implicit lexical task. The explicit task yielded activation in the corresponding sensorimotor brain regions, while the implicit task did not show any differences between the two verb categories, providing limited support for embodied cognition theories (Popp et al., 2019a). However, the stimuli were only coarsely divided between action and sound verbs, without considering, for action verbs, whether they were executed by mouth, hand, or foot, and, for sound verbs, whether the sounds were made by nonliving sources, human actions, or animals. Remarkably, a category-preferential organization for processing real-world sounds has been supported by dissociations of activated cortical networks for human (e.g., applause), animal (e.g., buzzing insect), mechanical (e.g., egg timer), and environmental (e.g., heavy rain) sound sources (Engel et al., 2009).

In sum, studies reporting the semantic somatotopy of action verb processing, together with those showing a category-preferential organization for processing real sounds, provide the basis for the embodiment of sound verbs as well. However, experiments including both action and sound verbs do not seem to fully support the embodiment of verbs, unless an explicit task is used.

The case of sound and action verbs and mirror neurons

An important discovery that has shaped the debate on meaning representation is that of mirror neurons (Rizzolatti et al., 1996). Mirror neurons are a class of neurons originally found in the rostral part of the inferior premotor cortex of the macaque monkey that respond when the monkey performs an action, but also when it observes the experimenter performing similar actions. In particular, mirror neurons in monkeys are activated only by transitive actions targeting a simultaneously presented object (Di Pellegrino et al., 1992; Caramazza et al., 2014). In the human brain, by contrast, these types of neurons are also present outside the "motor regions", in the medial temporal lobe (Mukamel et al., 2010). This discovery established a clear link between executed and observed movements. Therefore, there should be a common code between actor and observer, which might have become the one between the sender and the receiver of each message (Rizzolatti & Arbib, 1998).

A closely related finding is that of audiovisual mirror neurons, which make the coding possible independently of whether actions are performed, heard (e.g., as sounds), or seen (Kohler et al., 2002). However, Caramazza et al. (2014) argued that there might be a difference between a visual and an auditory signal: the former allows one to determine different properties of the action, such as which effectors are used, their position, and their speed; the latter provides much less information (e.g., about the effector) and represents a learned association. Further, the action understanding attributed to mirror neurons, considered to be largely automatic, is achieved without any high-level mental processes (Rizzolatti & Sinigaglia, 2010); along the same lines, the core claim of embodied cognition theories is that conceptual understanding is achieved through sensorimotor simulation. Hence the association between mirror neurons and embodied cognition theories.

Mirror neurons revived interest in motor theories of action recognition, according to which, in order to recognize and identify an action, it is necessary to covertly simulate the movements required to produce that action (Mahon & Hickok, 2016). Importantly, they have lent support to the view that perception and action share a common code, as postulated by the Theory of Event Coding (TEC; Hommel, 2015; Hommel, 2019). In addition, TEC assumes that the common coding between perception and action happens through multimodal feature codes, referring to the distal features of the represented event. In other words, the basic units of both perception and action are sensorimotor entities: they are activated by sensory input and control motor output. Hence, in feature-repetition priming, feature codes tend to prime all representations that include that particular feature code. For instance, seeing something green, such as a salad, facilitates saying "green". However, the complexity of the format must be considered: more complex features (e.g., those included in the concept "parenthood") require more time, as there is more information to deal with, compared to simpler features (e.g., "school") (Hommel, 2019).


Remarkably, TEC has been considered theoretically commensurable with the meanings of the embodiment concept. That is, TEC embraces the view that human cognition emerges from sensorimotor processing (Hommel, 2015). Nonetheless, the matter is not that straightforward. De Zubicaray et al. (2013) examined motor cortex activity during action word processing in an fMRI study. Their results showed that verb-like nonwords evoked increased activity compared to noun-like nonwords, and this activity overlapped with that of hand-related action verbs. These findings provided evidence that motor cortex activity does not selectively reflect motor-semantic content and/or its simulation, but also ortho-phonological processing. In other words, because motor cortex activity also reflects sensitivity to orthographic and phonological properties, it is important to ortho-phonologically match pseudoverbs with their verb counterparts.

Priming and cross-modal experiments

Context has been shown to influence our understanding of concepts (Popp et al., 2016; Hommel, 2019). One way to study this is by using priming. A prime is an item that is presented for a short amount of time. The prime is then followed by a target, another item, to which the participant needs to respond. When a prime is presented in the same sensory modality as the target, this is called uni-modal priming: for instance, a written word as prime and a written word as target. Conversely, when a prime is presented in one sensory modality and influences the response to a target in a different sensory modality, this is called cross-modal priming (Scherer & Larsen, 2011). An example might be a video clip as prime and a written word as target, or vice versa.

Another important distinction that must be made in order to predict the effects a prime can have on a target is between repetition priming and antipriming. Repetition priming consists of presenting the same item as prime and as target; this type of priming makes the processing of the target faster or more accurate. Antipriming is the antithesis of repetition priming: prime and target share component features (e.g., piano and desk in visual object identification) and therefore have overlapping representations. This type of priming has been shown to have a detrimental effect on target processing (Marsolek, 2008).

The assimilative or contrastive effect of a primed concept on behavior can also depend on the range of meanings the concept might have, and on whether those meanings overlap with those of the target (Janiszewski & Wyer, 2014). The reason is that the person activates the concept of the prime, but must then deactivate it in order to access the target. Furthermore, a prime is not an isolated piece of content or a single processing event: a prime results in the activation of an array of content and process nodes (Janiszewski & Wyer, 2014). Importantly, a prime has a certain level of activation; to keep it low, the prime must be simple enough to be easy to process. But how have all these variables been used in the study of action- and sound-related concepts?

Cross-modal priming experiments have shown that encoding an iconic gesture activates semantically related words (So et al., 2013; Yap et al., 2011). Concerning action verbs, the most recent study including cross-modal priming is the one conducted by Murteira et al. (2019). This research group used pantomimed gestures as primes and action verb retrieval as the target across two experiments. In the first, they compared the naming of verbs primed by congruent (representing the same action) or incongruent (representing an unrelated but meaningful action) pantomimed gestures. In the second, they added a neutral prime condition (i.e., a video of the gesturer in a static standing position). They found faster responses when the action picture was preceded by a congruent gesture. Also, the more transparent the gesture, the greater the priming effect, because the target verb was evoked more consistently.

Another type of cross-modal priming technique consists of recruiting the motor system during the processing of action-related language. Klepp and colleagues (2019), using magnetoencephalography, created an experiment combining action verbs related to hand and foot with response effectors (hand and foot). Congruent verb-response conditions were associated with faster reaction times, showing evidence for a somatotopic verb-motor priming effect.

However, experiencing the world does not only involve processing visual stimuli, but also stimuli in other modalities, such as the auditory modality. Chen and Spence (2011) ran a semantic priming experiment and found that the presentation of a semantically congruent naturalistic sound, such as the sound of a dog's bark or of a creaking door, can enhance the sensitivity of visual picture identification, for the picture of a dog or of a door, respectively. The same authors compared the time courses of the priming effects elicited by naturalistic sounds and spoken words on visual picture processing. They found that naturalistic sounds access their associated meaning faster than spoken words which, nevertheless, elicit a more prolonged priming (Chen & Spence, 2011; Chen & Spence, 2013; Chen & Spence, 2018). To the best of our knowledge, no studies have investigated cross-modal priming using the same unspecified primes in visual and auditory modalities.

Aim, research questions, hypotheses and predictions

The aim of this study is to assess whether activation of the sensorimotor cortex differentially influences the processing of action and sound verbs.

To do so, we ran a cross-modal priming study to answer two research questions:

a. Does the previous observation of a visual prime (i.e., unspecified hand movement) affect the processing of action-related verbs and/or sound-related verbs by making accuracy higher/lower and reaction times faster/slower in a lexical decision task?

b. Does the previous listening of an acoustic prime (i.e., bike bell sound) affect the processing of action-related verbs and/or sound-related verbs by making accuracy higher/lower and reaction times faster/slower in a lexical decision task?

Accordingly, we have two hypotheses:

a. The previous observation of an unspecified hand movement will affect the processing of action-related verbs and not sound-related verbs by making accuracy higher/lower and reaction times faster/slower in a visual lexical decision task.

b. The previous listening of a bike bell sound will affect the processing of sound-related verbs and not action-related verbs by making accuracy higher/lower and reaction times faster/slower in a visual lexical decision task.

We did not indicate a specific direction for accuracy and reaction times because the primes do not specifically repeat the whole target concept (e.g., the verb to moo is not preceded by a moo sound as prime). As a result, they might behave as repetition primes, making target processing faster or more accurate, or as antiprimes, making target processing slower or less accurate. For instance, the unspecified hand movement is followed by an action verb such as to wash, with which it shares only the feature of being an action performed by a hand; the bike bell sound is followed by a sound verb such as to sing, with which it shares only the auditory feature.

An interaction between perceptual and linguistic processing (i.e., action prime with action verb, and acoustic prime with sound verb) would support the idea that the action prime and action verbs are represented, to some extent, in the motor cortex, whereas the sound prime and sound verbs are represented, to some extent, in the auditory cortex. Consequently, it would support the embodied cognition views.

Methods

Participants

Seventy-five volunteers (24 males; age 18-77, M=29, SD=12.28) were recruited using online advertisements on social media sites (e.g., Facebook) and word of mouth. Six participants were excluded because they did not meet the inclusion criteria: three had neurological disorders, two had poor eyesight, and one had poor eyesight and hearing. Sixty-nine participants (33 for list A; 36 for list B) were then included in the data analysis (21 males; age 18-77, M=28.5, SD=11.74). They were Russian native speakers (3 were bilingual native speakers of Ukrainian or Tatar), with normal or corrected-to-normal vision and normal hearing, and without a history of neurological (including reading and motor impairments) or psychiatric disorders, as assessed by means of an online form. Fifty-eight participants (84%) were right-handed, 10 (14.5%) were ambidextrous, and 1 (1.5%) was left-handed (Veale, 2014). Regarding education, 61 individuals had obtained a university degree, 7 had completed high school, and 1 had not completed high school. Fifty-one percent of the participants (N=35) reported playing video games.

Informed consent was obtained through an online form, and participants were informed that they could opt out during the experiment (see Appendix A). Participation was voluntary and no financial compensation was provided. The procedure was approved by the Research Ethics Review Committee (Commissie Ethische Toetsing Onderzoek, eCETO) of the University of Groningen, The Netherlands.

Stimuli and stimuli preparation

To avoid any grammatical class ambiguity, the language chosen was Russian, in which verbs are clearly marked by their endings (i.e., -ть, t'). The experimental paradigm consisted of three types of stimuli: 48 action verbs (e.g., толкать, tolkat', to push), 48 sound verbs (e.g., храпеть, khrapet', to snore), and 96 pseudoverbs (e.g., надбыть, nadbyt'), for a total of 192 stimuli (see Appendix B). A pseudoverb is defined as a legal, pronounceable but meaningless letter string; that is, a pseudoverb conforms to the orthographic and phonological patterns of a language (Keuleers & Brysbaert, 2010), in the current case, of Russian verbs (e.g., хвачить, khvachit'). Pseudoverbs were generated using an algorithm broadly based on the principles described in Keuleers and Brysbaert (2010). Pseudoverbs were matched with their verb counterparts phonologically (ending in -ть, t') as well as for length in graphemes and frequency (that is, each pseudoverb was generated from a list of verbs within a specific frequency range).
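The generation script itself (credited to Andrey Shulyatyev) is not described in detail in the thesis. Purely as a toy illustration of the general principle — chaining attested character bigrams into candidates constrained to the Russian infinitive ending, broadly in the spirit of Keuleers and Brysbaert (2010) — one might sketch it in R as follows; the three-verb mini-lexicon and all names are ours, not the actual implementation:

```r
set.seed(1)

# Illustrative mini-lexicon; the real script drew on a larger,
# frequency-banded list of Russian verbs (our assumption)
verbs <- c("толкать", "храпеть", "трогать")

# Collect all character bigrams attested in the lexicon
bigrams <- unlist(lapply(strsplit(verbs, ""), function(ch) {
  paste0(ch[-length(ch)], ch[-1])
}))

# Chain overlapping bigrams into a stem, then force the -ть ending;
# reject candidates that are real verbs
make_pseudoverb <- function(target_len) {
  stem <- sample(bigrams, 1)
  while (nchar(stem) < target_len - 2) {
    last <- substr(stem, nchar(stem), nchar(stem))
    next_bigrams <- bigrams[startsWith(bigrams, last)]
    if (length(next_bigrams) == 0) return(NA_character_)
    stem <- paste0(stem, substr(sample(next_bigrams, 1), 2, 2))
  }
  candidate <- paste0(stem, "ть")
  if (candidate %in% verbs) NA_character_ else candidate
}

make_pseudoverb(7)   # one 7-letter pseudoverb candidate
```

A full implementation would additionally screen candidates for phonotactic legality and keep length and frequency bins matched to the verb stimuli, as the thesis describes.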

Prior to the creation of the experimental paradigm, we obtained word properties for 160 Russian verbs. To do so, we used two different procedures:

a. A Google Forms survey (Смыслы и ассоциации, Smysly i assotsiatsii, Meanings and associations; https://forms.gle/dfQAF2CGThPUmBTn8), which was shared on social networks (e.g., Facebook). Its completion took around 10-15 minutes. The survey was completed by 140 healthy individuals (32 males; age 18-72, M=37.1, SD=13.6) and was used to obtain ratings for the relevance of visual, action, sound, and emotional features, and for familiarity. Visual and sound ratings were obtained following the instructions given by Paivio et al. (1968), and action, emotional, and familiarity ratings following Popp et al. (2016) (Appendix C). The verbs were rated on a five-point Likert scale (one = low familiarity/relevance; five = high familiarity/relevance).

The survey contained 80 hand-related action verbs (mean number of letters = 6.94), including 28 performed with a tool (e.g., ковать, kovat', to hammer), 34 performed with a hand (e.g., трогать, trogat', to touch), and 18 that can be performed with both (e.g., открывать, otkryvat', to open); and 80 sound-related verbs (mean number of letters = 6.89), comprising 28 produced by animals (e.g., мычать, mychat', to moo), 20 produced by inanimate objects (e.g., звякать, zvyakat', to tinkle), and 32 human-made sounds (e.g., чихать, chikhat', to sneeze). In order to reduce the length of the questionnaire, the stimulus set was pseudo-randomly split into four equal lists: each list included 40 verbs and was randomly assigned to 35 participants.

b. Manual checking of grammatical (i.e., transitivity), lexical (i.e., word length, frequency), and semantic (i.e., instrumentality) properties, in order to pinpoint the underlying processes needed to perform the task. Transitivity was checked in a dictionary of the Russian language (Efremova, 2000). Word length in graphemes was checked manually. Instrumentality ratings were obtained using the StimulStat database (Alexeeva et al., 2018); in the few instances in which a verb was not included in the database, the ratings were obtained from two linguists who rated the items individually and independently. Word frequency was checked using the frequency dictionary of Russian vocabulary (Lyashevskiy & Sharov, 2009).

After that, to create the stimulus list for the ensuing priming experiment, verbs included in the survey and in the manual ratings that had a rating of four or five for action or sound features were selected. However, ambiguous verbs, that is, verbs having a high rating (> 4.5 on the 5-point Likert scale) for any other feature (except familiarity), were excluded. Then, two final lists were created (see Appendix B), each with 3 verb types (96 verbs in total). List A was composed of 24 action verbs (4 performed with a tool, such as to paint; 14 with a hand, such as to push; 6 with both, such as to wash), 24 sound verbs (6 produced by animals, such as to woof; 4 by inanimate objects, such as to creak; 14 by humans, such as to call), and 48 pseudoverbs. List B was composed of 24 action verbs (8 performed with a tool, such as to paddle; 11 with a hand, such as to touch; 5 with both, such as to smear), 24 sound verbs (10 produced by animals, such as to roar; 7 by inanimate objects, such as to click; 7 by humans, such as to sneeze), and 48 pseudoverbs. Action and sound verbs were not matched for transitivity and instrumentality (see Table 1).

Table 1

Transitivity and instrumentality across lists

|                 |                  | Action verbs |        |       | Sound verbs |        |       |
|                 |                  | List A       | List B | Total | List A      | List B | Total |
|-----------------|------------------|--------------|--------|-------|-------------|--------|-------|
| Transitivity    | Transitive       | 13           | 11     | 24    | 1           | 0      | 1     |
|                 | Can be both      | 11           | 13     | 24    | 9           | 10     | 19    |
|                 | Intransitive     | 0            | 0      | 0     | 14          | 14     | 28    |
| Instrumentality | Instrumental     | 6            | 8      | 14    | 0           | 0      | 0     |
|                 | Can be both      | 5            | 6      | 11    | 1           | 0      | 1     |
|                 | Not instrumental | 13           | 10     | 23    | 23          | 24     | 47    |

Action verbs were comparable to sound verbs in familiarity, word length, and word frequency within list A (see Table 2) and within list B (see Table 3).


Table 2

Matching of conceptual and psycholinguistic stimulus features within list A

|                                     | Acoustic | Emotion | Visual | Motor  | Familiarity | Word length | Word frequency |
|-------------------------------------|----------|---------|--------|--------|-------------|-------------|----------------|
| Sound verbs                         | 4.77     | 3.26    | 3.32   | 2.57   | 4.63        | 7.08        | 24.25          |
| Action verbs                        | 2.38     | 2.31    | 4.14   | 4.30   | 4.65        | 6.92        | 26.15          |
| Sound vs. Action verbs (p-values ª) | < 0.01   | < 0.01  | < 0.01 | < 0.01 | 0.77        | 0.65        | 0.88           |

ª P-values of two-tailed t-tests.

Table 3

Matching of conceptual and psycholinguistic stimulus features within list B

|                                     | Acoustic | Emotion | Visual | Motor  | Familiarity | Word length | Word frequency |
|-------------------------------------|----------|---------|--------|--------|-------------|-------------|----------------|
| Sound verbs                         | 4.70     | 3.10    | 3.71   | 2.71   | 4.59        | 6.79        | 79.72          |
| Action verbs                        | 2.59     | 2.43    | 4.25   | 4.37   | 4.65        | 6.83        | 35.7           |
| Sound vs. Action verbs (p-values ª) | < 0.01   | < 0.01  | < 0.01 | < 0.01 | 0.28        | 0.92        | 0.55           |

ª P-values of two-tailed t-tests.

Verbs of list A and verbs of list B were matched for most conceptual and psycholinguistic features. The visual feature was not perfectly matched between lists. However, this difference was not due to differences between the action verbs, for which the visual component is highly relevant to action performance, but to differences between the sound verbs of list A and those of list B (see Table 4).


Table 4

Matching of conceptual and psycholinguistic stimulus features between lists

|                                                    | Acoustic | Emotion | Visual | Motor | Familiarity | Word length | Word frequency |
|----------------------------------------------------|----------|---------|--------|-------|-------------|-------------|----------------|
| Verbs List A                                       | 3.58     | 2.79    | 3.73   | 3.44  | 4.64        | 7           | 25.2           |
| Verbs List B                                       | 3.64     | 2.76    | 3.98   | 3.54  | 4.62        | 6.81        | 57.71          |
| Verbs List A vs. Verbs List B (p-values ª)         | 0.79     | 0.88    | 0.03   | 0.59  | 0.56        | 0.49        | 0.38           |
| Sound v. List A vs. Sound v. List B (p-values ª)   | 0.25     | 0.39    | 0.02   | 0.32  |             |             |                |
| Action v. List A vs. Action v. List B (p-values ª) | 0.39     | 0.43    | 0.21   | 0.30  |             |             |                |

ª P-values of two-tailed t-tests.

Finally, we created three different primes: (1) a video clip of a moving right hand, (2) a sound produced by a bike bell, and (3) a static video clip showing only the background of the first prime. All primes had a duration of 900ms. In addition, primes (1) and (3) shared the same dimensions (width = 608, height = 858) and were always presented as silent video primes. Prime (3) was included as a neutral baseline condition to determine whether any effect on hand-related action verbs was due to prime (1) or to the hand responses (Klepp et al., 2019; Carreiras et al., 2007). In each list, for each verb category (i.e., action verbs, pseudoverbs, sound verbs), 8 targets were preceded by prime (1), 8 targets by prime (2), and 8 targets by prime (3). Between lists, verbs belonging to the same verb category and preceded by the same prime were matched for the critical feature (i.e., acoustic for sound verbs, motor for action verbs), familiarity, word length, and word frequency.

Conditions

The experiment had three conditions:



• A congruent condition where the prime and target are matched (e.g., hand video clip preceding an action verb, or bike bell sound preceding a sound verb);

• A non-congruent condition where the prime and target are not matched (e.g., hand video clip preceding a sound verb, or bike bell sound preceding an action verb); and

• A neutral condition where the prime, completely meaningless, cannot have any potential influence on any target (e.g., static video clip preceding an action verb, or static video clip preceding a sound verb).

Design

Each trial comprised: (a) a blank screen for 1000ms; (b) a fixation cross (+) appearing on the screen for 1000ms; (c) a prime for 900ms; (d) a fixation cross (+) for 100ms; (e) a target word displayed until the participant's response (see Figure 6). The structure of the trials, including the stimulus presentation times and inter-stimulus intervals, was based on Murteira et al. (2019).


Figure 6

Trial presentation structure

Note. Trial presentation structure: a blank screen for 1000ms and a fixation cross for 1000ms, followed by a prime for 900ms. A fixation cross then appears for only 100ms, followed by the target verb (or pseudoverb). In this figure, three conditions are presented: congruent (here, the action verb "to touch" preceded by the hand video clip), non-congruent (here, an action verb preceded by the bike bell sound, not accompanied by any visual input), and neutral (here, an action verb preceded by a static video clip). All materials are in Russian. To see a live experimental paradigm click here: https://youtu.be/9NviqvLdbPA

Visual elements were placed in a full-width screen layout (100%), but differed in their height layout: fixation crosses had a layout height of 50% (between 10 and 60), while video clips had a layout height of 53% (between 15 and 68). The target stimuli were placed in the default screen layout provided for text content.


Procedure

The Gorilla Experiment Builder (www.gorilla.sc) was used to create and host the experiment (Anwyl-Irvine et al., 2018). Data was collected between 27-05-2020 and 08-06-2020.

To avoid distractions, participants were asked to find a quiet place where they would not be disturbed, to wear headphones, and to place their mobile devices behind the computer (or somewhere out of their view). Before starting the task, individuals had to fill in a questionnaire (see Appendix D) asking for education level, primary occupation, and handedness, the latter assessed with the short form of the Edinburgh Handedness Inventory (Veale, 2014). In addition, because the online experiment collected reaction times, participants were asked whether they habitually play video games, given the enhanced perceptual and motor skills associated with gaming, including visual/spatial processing and hand-eye coordination (Powers et al., 2013).

Participants were instructed that they would first watch a brief video clip or hear a sound, and that no response was needed at that point. They were then instructed to respond, as fast and accurately as possible, to the second stimulus (i.e., the written word), deciding whether the string they saw was a word, by pressing "←" on their keyboard, or not a word, by pressing "→". A practice round (6 items) was included to rule out technical issues. The same practice session preceded list A and list B. A complete online session took around 15 minutes.

Data analysis

The data analysis was conducted in RStudio (RStudio Team, 2020). The primary dependent variables of interest were accuracy and Reaction Time (RT) for correct answers; they were analyzed separately using the lme4 package (Bates et al., 2015). Accuracy had a binary classification, with a score of 1 for a correct answer and 0 for an incorrect answer in the lexical decision task. RT was defined as the time in milliseconds elapsed between the onset of the target word and the response (i.e., button press) in the lexical decision task. Both were scored automatically by the Gorilla Experiment Builder. The independent variables were verb type (action verbs vs. pseudoverbs vs. sound verbs) and prime type (hand prime vs. neutral prime vs. sound prime). Other independent variables were included only if they improved the model: list (A vs. B), gender (female vs. male vs. other), handedness (left-handed vs. mixed-handed vs. right-handed), and video games (whether the participant plays video games or not). Because gender, handedness, and video games have fewer than five levels each, they were treated as fixed effects (Harrison et al., 2018). Finally, participants and items were included as random effects in the overall model estimation. Because participants were exposed to different stimuli (list A or list B), participants and items were not treated as crossed effects.

For accuracy, generalized linear mixed models (using the glmer function) were constructed to measure the effects of prime type, verb type, and their interaction on accuracy. The models were fit using a Laplacian approximation to the log-likelihood. A binomial distribution was chosen to match the properties of measured accuracy. Four different models were compared using ANOVA, all with the same random effects specification:

• M0: Accuracy ~ (1 | Participant) + (1 | Item)

• M1: Accuracy ~ List + Gender + Handedness + Video games + (1 | Participant) + (1 | Item)

• M2: Accuracy ~ Prime type * Verb type + (1 | Participant) + (1 | Item)

• M3: Accuracy ~ Prime type * Verb type + List + Gender + Handedness + Games + (1 | Participant) + (1 | Item)

The model with a significant p-value as well as the lowest Akaike Information Criterion (AIC) was further analyzed. In particular, post-hoc analyses were conducted where necessary using the testInteractions function of the phia package (Martinez, 2015).
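As a concrete illustration, the four accuracy models and their comparison might look as follows in R. This is a sketch, not the thesis's actual script: the data frame `dat` and its column names (Accuracy, PrimeType, VerbType, List, Gender, Handedness, Games, Participant, Item) are our assumptions.

```r
library(lme4)   # glmer()
library(phia)   # testInteractions()

# M0: random intercepts for participants and items only
m0 <- glmer(Accuracy ~ (1 | Participant) + (1 | Item),
            data = dat, family = binomial)

# M1: covariates only
m1 <- glmer(Accuracy ~ List + Gender + Handedness + Games +
              (1 | Participant) + (1 | Item),
            data = dat, family = binomial)

# M2: experimental factors and their interaction
m2 <- glmer(Accuracy ~ PrimeType * VerbType +
              (1 | Participant) + (1 | Item),
            data = dat, family = binomial)

# M3: experimental factors plus covariates
m3 <- glmer(Accuracy ~ PrimeType * VerbType + List + Gender +
              Handedness + Games + (1 | Participant) + (1 | Item),
            data = dat, family = binomial)

# Likelihood-ratio comparison; retain the model with a significant
# test and the lowest AIC (M2 in the thesis)
anova(m0, m1, m2, m3)

# Post-hoc pairwise contrasts between verb types on the retained model
testInteractions(m2, pairwise = "VerbType")
```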

Concerning RTs, as in most simple decision tasks, the distribution was positively skewed. Following the example set by Lo and Andrews (2015), and given the potential unreliability and uninformativeness of back-transformation, the observed deviation from normality was not eliminated. Therefore, in order to perform a statistical assessment on the raw RT data while meeting the assumptions of the statistical model, generalized linear mixed models (using the glmer function) were constructed. The Gamma and inverse Gaussian distributions, both unimodal skewed distributions with continuous responses greater than or equal to 0, and thus suited to RT measures (Lo & Andrews, 2015), were visually compared with our raw RT data (see Figure 7).


Figure 7

Observed and fitted densities to RT

Note. Bars represent the observed densities of RT, while lines show the fitted densities.

The inverse Gaussian distribution seemed to show a better fit to the RT responses: it best approximated the surface characteristics of the distribution of the observed RTs. Hence, in the generalized linear mixed models, the inverse Gaussian distribution was chosen to match the properties of measured RT. The models were fit using a Laplacian approximation to the log-likelihood. Four different models were compared using ANOVA, all with the same random effects specification:

• M0: RT ~ (1 | Participant) + (1 | Item)

• M1: RT ~ List + Gender + Handedness + Video games + (1 | Participant) + (1 | Item)

• M2: RT ~ Prime type * Verb type + (1 | Participant) + (1 | Item)

• M3: RT ~ Prime type * Verb type + List + Gender + Handedness + Games + (1 | Participant) + (1 | Item)

The model with a significant p-value as well as the lowest Akaike Information Criterion (AIC) was further analyzed. In particular, post-hoc analyses were conducted where necessary using the testInteractions function of the phia package (Martinez, 2015).
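The RT models mirror the accuracy models, with the response distribution as the substantive change. A minimal sketch under the same illustrative naming assumptions as above; note that the thesis reports the inverse Gaussian family but not the link function, so the identity link (recommended by Lo and Andrews, 2015, for raw RTs) is our assumption:

```r
# Inverse Gaussian GLMM on raw RTs for correct responses only.
# The identity link keeps the effects on the millisecond scale
# (Lo & Andrews, 2015); the link choice is our assumption.
m2_rt <- glmer(RT ~ PrimeType * VerbType + (1 | Participant) + (1 | Item),
               data = subset(dat, Accuracy == 1),
               family = inverse.gaussian(link = "identity"))
```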


Results

Accuracy and reaction time measures were collected and analyzed across 69 participants and 192 stimulus items (48 action verbs, 96 pseudoverbs, 48 sound verbs). Overall, data points with physically implausible short RTs (button presses within 200ms of stimulus onset) or very long response latencies (exceeding 3 seconds) were excluded (Balota et al., 2007): 108 out of 6624 (1.63%) data points were excluded from the analysis of both accuracy and RTs.
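In code, this absolute cutoff amounts to a one-line filter (a sketch; the column names follow the earlier illustrative snippets):

```r
# Keep responses between 200 ms and 3 s after target onset
dat <- subset(dat, RT > 200 & RT <= 3000)
```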

Accuracy data

Overall accuracy was 96.32% (correct: 6340; incorrect: 176). Following model comparisons based on maximum likelihood ratio tests (ANOVA), the most representative model was M2 (for a summary, see Table 5), with an AIC value of 1437.5; the likelihood ratio test yielded χ²(2) = 10.8871, p = 0.004324.

Table 5

Summary generalized linear mixed model M2 for Accuracy

Model

M2: Accuracy ~ Prime type * Verb type + (1 | Participant) + (1 | Item)

AIC BIC log Lik deviance df. resid

1437.5 1512.1 -707.7 1415.5 6505

Scaled residuals:

Min 1Q Median 3Q Max

-10.9676 0.0664 0.0942 0.1472 1.0469

Random effects:

Groups Name Variance Std.Dev.

Item (Intercept) 1.9860 1.4093

Participant (Intercept) 0.3817 0.6178

Fixed effects:

Estimate Std. Error z value Pr(>|z|)

(Intercept) 6.3359 0.7957 7.963 1.68e-15***

Neutral prime -1.3632 0.9305 -1.465 0.1429

Sound prime -0.1354 1.0515 -0.129 0.8976

Pseudoverb -1.9523 0.8385 -2.328 0.0199*

Sound verb -1.4993 0.9234 -1.624 0.1044


Sound prime : Pseudoverb 0.2115 1.1580 0.183 0.8551

Neutral prime : Sound verb 0.4401 1.1581 0.380 0.7039

Sound prime : Sound verb -0.6810 1.2570 -0.542 0.5880

Note. Number of observations: 6516. Number of items: 192. Number of participants: 69. The reference level for Verb type is action verb and for Prime type is hand prime. ***p < 0.001; *p < 0.05.

No significant interaction between prime type and verb type emerged from the overall model. However, the model showed a significant effect of pseudoverbs on accuracy, with an estimate of -1.9523 (p = 0.0199). Therefore, pairwise contrasts between verb types were analyzed (Martinez, 2015). Action verbs (accuracy = 98.97%, SD = 2.57) were significantly more accurate than pseudoverbs (accuracy = 96.89%, SD = 6.70) (χ²(1) = 8.6689, p = 0.006474), and also significantly more accurate than sound verbs (accuracy = 96.16%, SD = 6.05) (χ²(1) = 10.7085, p = 0.003199).

In sum, in all three conditions, whether the targets were preceded by the hand, neutral, or sound prime, the percentage of correct answers was higher for action verbs (98.97%) than for sound verbs (96.16%) and pseudoverbs (96.89%). Finally, no difference in accuracy appeared between pseudoverbs and sound verbs (χ²(1) = 0.6329, p = 0.426293). Summaries of the accuracy measures are reported in Figure 8.


Figure 8

Mean percentage of correct answers by verb types

Note. * indicates a significant difference.

Reaction time data

First, an outlier identification procedure was adopted: any correct RT greater than 3 standard deviations above that participant's mean was recognized as an outlier (Balota et al., 2007), using the sdTrim function in R. In this way, 4.28% of the data points (N=279) did not enter the RT analysis. The distribution of the remaining RTs was still positively skewed (see Figure 9).
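A minimal base-R sketch of this per-participant rule (the thesis used the sdTrim function; the version below re-implements the same logic, with illustrative column names as before):

```r
# Drop correct RTs more than 3 SDs above that participant's mean;
# this mirrors our reading of what sdTrim does under these settings
correct <- subset(dat, Accuracy == 1)
cutoff  <- ave(correct$RT, correct$Participant,
               FUN = function(x) mean(x) + 3 * sd(x))
correct <- correct[correct$RT <= cutoff, ]
```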

Figure 9

Kernel density estimation of RT measures by verb types

Following model comparisons based on maximum likelihood ratio tests (ANOVA), the most representative model was M2 (for a summary, see Table 6), with an AIC value of 83536.5; the likelihood ratio test yielded χ²(2) = 36.9333, p < 0.001.

Table 6

Summary generalized linear mixed model M2 for Reaction Time

Model

M2: RT ~ Prime type * Verb type + (1 | Participant) + (1 | Item)

AIC BIC log Lik deviance df. resid

83536.5 83617.3 -41756.2 83512.5 6225

Scaled residuals:

Min 1Q Median 3Q Max

-1.9794 -0.6544 -0.2362 0.3856 7.5843

Random effects:


Groups Name Variance Std.Dev.

Item (Intercept) 2.972e+03 5.451e+01

Participant (Intercept) 1.066e+04 1.033e+02

Residual 6.516e-05 8.072e-03

Fixed effects:

Estimate Std. Error z value Pr(>|z|)

(Intercept) 1075.15 20.15 53.361 < 2e-16***

Neutral prime 42.14 22.61 1.864 0.0623

Sound prime -15.88 22.20 -0.715 0.4746

Pseudoverb 120.47 19.89 6.058 1.38e-09***

Sound verb 13.61 22.41 0.607 0.5436

Neutral prime : Pseudoverb -34.01 28.35 -1.200 0.2303

Sound prime : Pseudoverb 12.65 28.00 0.452 0.6516

Neutral prime : Sound verb -21.90 32.06 -0.683 0.4945

Sound prime : Sound verb 42.40 31.78 1.334 0.1822

Note. Number of observations: 6237. Number of items: 192. Number of participants: 69. The reference

level for Verb type is action verb and for Prime type is hand prime. ***p < 0.001.

No significant interaction between prime type and verb type emerged from the overall model. Moreover, the RTs for pseudoverbs were significantly longer than for action verbs, with an estimate of 120.47 (p < 0.001). This result was followed up with a post hoc test analyzing the possible pairwise contrasts (Martinez, 2015). RTs in the lexical decision task were longer for pseudoverbs (RT = 1021.71 ms, SD = 373.20) than for both action verbs (RT = 868.79 ms, SD = 342.42; χ²(1) = 96.8574, p < 0.001) and sound verbs (RT = 896.87 ms, SD = 338.13; χ²(1) = 63.7472, p < 0.001). Summaries of the RT measures by condition are presented in Figure 10.
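For reference, such follow-up contrasts can also be obtained with the emmeans package, shown here as a swapped-in alternative to the procedure of Martinez (2015); m2 and verb_type refer to the hypothetical names used in the earlier sketches.

    library(emmeans)

    # Pairwise contrasts between verb types, averaged over prime types
    # (an alternative post hoc route; not necessarily the thesis's procedure).
    emmeans(m2, pairwise ~ verb_type)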


Figure 10

Summary of RT measures by condition

Note. * indicates a significant difference.

Discussion

This pilot study investigated the embodiment of action and sound verbs. To do so, it assessed whether the previous activation of the sensorimotor cortex differently influences the processing of hand-related action verbs and sound verbs in a cross-modal feature-repetition priming paradigm. In order to determine priming and/or antipriming effects, the experiment included congruent prime-target pairs (i.e., hand prime followed by an action verb; sound prime followed by a sound verb), incongruent prime-target pairs (i.e., hand prime followed by a sound verb; sound prime followed by an action verb), and a control condition, where verbs were preceded by a neutral prime. Reaction times and accuracy were obtained and analyzed to identify potential similarities and differences in the processing of action and sound verbs preceded by a given prime. Overall, the task was performed carefully, given that the overall accuracy was 96.32%.

The first hypothesis stated that the previous observation of an unspecified hand movement (i.e., hand prime) would affect the processing of action-related verbs, and not sound-related verbs, by making accuracy higher/lower and reaction times faster/slower in a lexical decision task. The second hypothesis asserted that previously listening to a bike bell sound (i.e., sound prime) would affect the processing of sound-related verbs, and not action-related verbs, by making accuracy higher/lower and reaction times faster/slower in a lexical decision task. Two generalized linear mixed models, one for accuracy and one for RT, were adopted to evaluate the above hypotheses. The two models showed no significant difference related to the interaction between hand prime and action verbs compared to that between hand prime and sound verbs (first research question), nor related to the interaction between sound prime and sound verbs compared to that between sound prime and action verbs (second research question).

Although the findings did not match our predictions and hypotheses, two additional results must be taken into account: (1) relative to accuracy, a significant difference emerged in overall accuracy, which was higher for action verbs compared to both pseudoverbs and sound verbs; (2) pseudoverbs elicited significantly slower response times compared to the real verbs. While the latter finding (2) is in line with previous studies (Carreiras et al., 2007) and might be due to the lexicality effect, the former result (1) is relevant to discuss because sound verbs and action verbs were carefully matched for familiarity, word length, and frequency, though not for transitivity and instrumentality. Why did we not find any significant interaction? Did the primes not work, and why? Why did we find this difference only in accuracy? Do our findings give further insight into embodied cognition? Because the primes were based on different assumptions, the two verb categories must be analyzed separately. In the following sections, the accuracy and RTs of the action verbs will be discussed first, followed by those related to sound verbs.

The contribution of motor cortex in processing action verbs

Our experiment tested whether pre-activating verbs with a hand prime modulates the processing of hand-related action vs. sound verbs differently within a lexical decision task. The generalized linear mixed models chosen to test accuracy and reaction time, respectively, did not show any difference between action and sound verbs preceded by the hand prime. This lack of findings, which does not support the first hypothesis, allows three possible interpretations. First, the activation of the motor cortex, assumed to happen after seeing the hand prime, does not influence the processing of action verbs; in other words, the motor cortex does not contribute to the processing of sensorimotor-related verbs. That would be because, in contrast with embodied cognition theories, concepts are represented in the brain in an amodal format. Second, the lexical decision task did not activate the verbs' meanings, as it evoked their processing just at a pre-semantic level, and sensorimotor activation occurs only after access to the amodal concept, as a consequence of semantic elaboration. This implies that category-specific effects for action- and sound-related verbs are not detectable in the current implicit task (Popp et al., 2019a). The third interpretation concerns the hand prime, which was based on the assumption of mirror neurons (Rizzolatti et al., 1996): it is possible that the item we chose might not have properly activated the motor cortex. Before opting for one interpretation, the overall results must be considered.

Interestingly, hand-related action verbs showed higher accuracy than sound verbs and pseudoverbs, whether they were preceded by the hand, sound, or neutral prime. That might indicate an effect of the response effector. That is, participants were instructed to make a lexical decision by pressing the "←" or "→" button with their right hand on the keyboard. It has been shown that in this type of task there is increased activation in the regions associated with finger press responses (Carreiras et al., 2007). Hence, the results may indicate a somatotopic verb-motor priming effect, that is, responses are facilitated when they are performed with the effector (in this case, the hand) described by a verb (in this case, hand-related action verbs). These findings are also in line with the work of Klepp and colleagues (2019), which showed a similar somatotopic verb-motor priming effect: hand and foot verbs followed by hand or foot responses, respectively, were processed faster. Nevertheless, significant findings emerged only in accuracy, rather than in both accuracy and reaction times. This might be due to (1) the different task (i.e., response selection task vs. lexical decision task); (2) the response time limit, which in Klepp et al. (2019) was set to 1150 ms, while in the current study it was unlimited; thus, the prime could have become ineffective because of the longer interval between prime and response. In addition, the current evidence is compatible with the somatotopic activation of the motor system when processing action words (Hauk et al., 2004; Tettamanti et al., 2005), and supports the contribution of the motor cortex to action verb processing. Indeed, hand-related action verbs followed by a hand response were responded to significantly more accurately than sound verbs followed by a hand response. Also, it is consistent with the definition of action words as semantic links binding language and action (Pulvermüller et al., 2005). Moreover, this explanation suggests that the action verb-response effector pair is stronger than the hand prime-action verb pair (see Figure 11).


Figure 11

Note. On the left side, the priming effect expected (hand prime-action verb pair); on the right side, the priming effect observed (action verb-response effector pair).

Therefore, it is unlikely that the first two above-mentioned explanations are adequate for these data: on the one hand, pseudoverbs were not only phonotactically legal but also ended in the suffix indicating the infinitive in Russian, which necessitates reading the full verb to perform the task; hence, Popp et al.'s (2019a) explanation, which relies on the lack of semantic elaboration in an implicit task, cannot hold. On the other hand, it would be hard to explain the specific difference observed only for action verbs and not for sound verbs. Thus, why did the hand prime not show the predicted effects? The hand prime was built on the assumption that the activation of mirror neurons is largely automatic and is achieved without any high-level mental processes (Rizzolatti & Sinigaglia, 2010). In other words, the observation of a moving hand should easily and quickly activate those brain areas required to perform the same action (i.e., dorsolateral sites of the motor strip). However, their potential activation during action observation makes their representations not exclusively motor but also visual (Caramazza et al., 2014); therefore, motor and visual information might have interfered with each other. Finally, the efficacy of the action verb-response effector pair compared to the hand prime-action verb pair (see Figure 11) emphasizes the importance of having a specific prime. That, together with the fact that mirror neurons respond only during certain movements (Caramazza et al., 2014), might explain the observed inefficacy of our hand prime.

The contribution of auditory cortex in processing sound verbs

With its second research question, the experiment tested whether pre-activating verbs with a sound prime (i.e., a bike bell sound) modulates the processing of sound vs. action verbs differently within a visual lexical decision task. The generalized linear mixed models chosen to test accuracy and reaction time, respectively, did not show any difference between action and sound verbs preceded by the sound prime. These findings could lead to three possible interpretations. The first one concerns the support of an amodal format of concepts represented in the brain, in line with the classical cognitive, nonembodied view (Mahon & Caramazza, 2008). However, if this were the case, it would be difficult to explain the difference in accuracy between action and sound verbs. The second possible explanation is that verbs, because they denote actions (even if they are sound verbs), are represented exclusively in the motor cortex: it might be that sound verbs are related to the mouth motor cortex (e.g., to sing) or to the hand motor cortex (e.g., to click). Nevertheless, if that had been the case, a difference between action and sound verbs, specifically those related to the hand motor cortex, would not have been discernible in our study. Thus, why did sound verbs not show any priming effect? Did the sound prime have limitations?

A third interpretation for the present set of findings, the most compelling in our view, is that the cortical network is category-organized for sound processing. Because previous studies have shown a category-preferential organization for processing real-world sounds (i.e., human, animal, mechanical, and environmental sound sources; Engel et al., 2009; Lemaitre et al., 2018), it is possible that supplying a mechanical sound as the prime for every sound verb (i.e., verbs related to human-made sounds, animal sounds, or inanimate-object sounds) did not generate any specific activation/facilitation. For instance, the sound prime (i.e., a bike bell sound) and the target, which could be an animal-related sound verb (e.g., to moo), might have activated different cortical networks: for the mechanical sound, there might have been a preferential activation of areas associated with
