Iterated learning of gestures: Between cultural transmission and critical periods

Academic year: 2021

Author: Ekaterina Abramova

Student number: 10032851

Supervisors:

Prof. Simon Kirby

Dr. Kenny Smith

Dr. Jelle Zuidema

Edinburgh

August 2012

Contents

1 Introduction

1.1 Language evolution and iterated learning

1.2 Emerging sign languages

1.3 Iterating sign languages

1.3.1 Evolutionary experiments for sign languages

1.3.2 Sign experiments for understanding evolution

1.4 Predictions

2 Methods

2.1 Participants

2.2 Materials

2.3 Procedures

2.4 Gesture encoding

3 Results

3.1 Communicative accuracy

3.2 Learnability

3.3 Regularity

3.4 Compositionality

3.5 Follow-up study

4 Discussion

5 Future Work

5.1 Technical issues

5.2 Theoretical issues

References

Abstract

The study presented here combines two strands of research on the emergence of linguistic structure: the Iterated Learning paradigm and studies of emerging sign languages. It is argued that such a combination can be mutually beneficial and lead to new findings on language evolution and language change. We propose an experimental paradigm for studying the emergence of languages in the manual modality and partially replicate a pattern recently described in Nicaraguan Sign Language (NSL): the segmentation and linearization of motion event expressions (Senghas et al., 2004). While the original interpretation of the NSL phenomenon pointed to the importance of the critical period in creating structure in sign languages, we show that similar structure can emerge in an adult hearing population. However, our prediction that this happens as a result of iterated transmission is not confirmed. We discuss potential explanations for this and outline future work to be carried out.

1 Introduction

The question of language evolution has attracted growing attention in recent decades. It is no longer considered a realm of pure speculation; rather, interdisciplinary efforts have been devoted to sketching possible scenarios, testing them for coherence, and probing them with the empirical tools of modeling and psychological experimentation. We cannot go back in time and hear the first words of our ancestors, but it is hoped that the general principles that were operational then can still be observed in how language emerges when no previous model is available. This can happen naturally, as in the case of pidgins, homesign systems and sign languages invented by deaf communities. However, it can also be triggered in a lab, where greater control is possible, by asking people to learn or create a new language. Curiously, there is not much cross-talk between these two strands, the natural and the experimental one, and this is what we aim to address in this study. In the remainder of this section we present the Iterated Learning paradigm for testing evolution in the lab (Section 1.1) and the issues surrounding emerging sign languages (Section 1.2). In Section 1.3 we discuss the benefits of combining the two perspectives and go on to present our attempt at that in the current study (Section 1.4).

1.1 Language evolution and iterated learning

Iterated Learning is a paradigm that started out as a model for investigating the changes in linguistic structure that occur when language is transmitted across generations of simulated users. "Iterated" stands for the fact that the process of language acquisition and transmission is repeated over and over, with each generation learning from the previous generation and the output of that learning serving as the input to the next generation down the line (Kirby & Hurford, 2002). The changes that result from such a process point to the possibility that individual cognition is not the only factor at play in language evolution.

If we focus on just the individual, the nature of human language and its evolution can easily be attributed to some kind of innate language faculty, genetically encoded and realized in the brain (Hauser et al., 2002). Such a faculty can be deemed responsible for the emergence of linguistic structure in the first place and also viewed as a module guiding language acquisition. The necessity of positing a language acquisition device (LAD) of this sort is usually motivated by pointing out that the linguistic input children receive is not sufficient for inducing syntactic regularities and that, therefore, the knowledge required to become a proficient language user must reside in a specifically evolved module. Language acquisition is then viewed as maturational growth triggered by the environment (Chomsky, 1980).

The language faculty can be considered a rich module containing a wealth of syntactic rules but, especially in recent theorizing, it is usually limited to a recursion mechanism and the resulting compositionality, which is one of the core features of human language, responsible for its open-endedness and productivity. A compositional system is one in which "the meaning of a complex expression is a function of the meanings of its immediate syntactic parts and the way in which they are combined" (Krifka, 2001, p.152). If such a function is not present, we are dealing with a language in which meanings are related to signals holistically, with complete signals linked to complete meanings and no structure inherent in these mappings. Simulations show that such a system is inflexible and unstable: if a certain mapping is not given to a new user, it will not be acquired and the user will have to invent a new signal for this meaning when they need to express it. According to a biology-focused view, the emergence of compositionality is explained by the reproductive advantage it affords, which forces it to be hard-coded inside the language faculty. Iterated learning research, on the other hand, shows that even an LAD of such a restricted sort is not necessary because instead of characterizing the poverty of the stimulus as a constraint on transmission, overcome by the LAD, it is best conceived of as a determinant in the evolution of compositional syntax (Brighton, 2002, p.26): in a compositional system an overarching structure can be induced from a limited set of examples and generalized to examples that have not been experienced, thus preserving the language (Brighton et al., 2005).
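The contrast between holistic and compositional mappings can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than material from the cited simulations: the meaning space (three shapes by three motions), the invented signals, the fixed two-character prefix, and the two toy learners.

```python
from itertools import product

SHAPES = ["square", "circle", "triangle"]
MOTIONS = ["bounce", "spiral", "slide"]
MEANINGS = list(product(SHAPES, MOTIONS))  # 9 structured meanings

# Hypothetical compositional language: prefix encodes shape, suffix encodes motion.
SHAPE_MORPH = {"square": "ne", "circle": "la", "triangle": "ki"}
MOTION_MORPH = {"bounce": "pilu", "spiral": "hoki", "slide": "wamo"}
COMPOSITIONAL = {(s, m): SHAPE_MORPH[s] + MOTION_MORPH[m] for s, m in MEANINGS}

# Hypothetical holistic language: one unanalyzable signal per meaning.
HOLISTIC = dict(zip(MEANINGS, [
    "kamone", "gaku", "hokako", "lumonamo", "pihino",
    "nepi", "mola", "wekiko", "kapihu",
]))

# The bottleneck: the learner only ever sees these three pairs.
SEEN = [("square", "bounce"), ("circle", "spiral"), ("triangle", "slide")]


def memorizing_learner(observed):
    """Store whole meaning-signal pairs; nothing unseen can be expressed."""
    return dict(observed)


def generalizing_learner(observed, prefix_len=2):
    """Split each signal into prefix and suffix, then recombine the parts."""
    shape_part, motion_part = {}, {}
    for (shape, motion), signal in observed.items():
        shape_part[shape] = signal[:prefix_len]
        motion_part[motion] = signal[prefix_len:]
    return {(s, m): shape_part[s] + motion_part[m]
            for s in shape_part for m in motion_part}


holistic_result = memorizing_learner({m: HOLISTIC[m] for m in SEEN})
compositional_result = generalizing_learner({m: COMPOSITIONAL[m] for m in SEEN})

print(len(holistic_result), "of", len(MEANINGS), "meanings expressible")
print(compositional_result == COMPOSITIONAL)
```

Under these assumptions the memorizing learner can express only the three meanings it has seen, whereas the generalizing learner recovers the full compositional language from the same three examples. This is the sense in which a limited input favors compositional systems.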

Of course, in order to move from a holistic to a compositional language and to propagate it, several conditions need to be met. First, a transmission bottleneck, i.e. a limit on the amount of data available, needs to be present: if all meaning-signal pairs are available to the learner, they can in principle be memorized and reproduced. Second, the meaning space needs to have structure so that meanings can be decomposed into features and values that will receive structured signals. Third, a set of learner biases needs to be in place, the most important of them being the generalization bias: the ability to perceive regularities in the input, infer structure from it and apply it to unseen data. This is presumably based on a preference for componential analysis (decomposing the perceived signals and meanings into sub-units and looking for correspondences) and a preference for more compact representations that are easier to store and process. The generalization bias is a general feature of learning systems. Another set of biases has been considered that is more language-specific and that has most likely evolved because of communicative functionality, namely a bias against many-to-one and one-to-many mappings that can lead to ambiguity. Even those biases, however, are not considered to be hard-coded features of linguistic structure and as such can themselves be acted upon by inter-generational transmission, for example by being amplified in the process (Brighton, 2002; Smith & Kirby, 2008).
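The three conditions can be combined into a minimal transmission chain. The sketch below is an assumption-laden toy, not the model of Brighton (2002) or Kirby & Hurford (2002): the syllable inventory, the fixed two-syllable signal format, the rule that the first syllable is analyzed as shape and the second as motion, and the function names `learn` and `iterate` are all hypothetical.

```python
import random
from itertools import product

SHAPES = ["square", "circle", "triangle"]
MOTIONS = ["bounce", "spiral", "slide"]
MEANINGS = list(product(SHAPES, MOTIONS))  # structured meaning space
SYLLABLES = ["ka", "lu", "mo", "ne", "pi", "wa", "ho", "ki"]


def learn(observed, rng):
    """One generation: memorize seen pairs, generalize to unseen meanings."""
    # Componential analysis (assumed): first syllable ~ shape, second ~ motion.
    shape_part, motion_part = {}, {}
    for (shape, motion), signal in observed.items():
        shape_part.setdefault(shape, signal[:2])
        motion_part.setdefault(motion, signal[2:])
    language = {}
    for shape, motion in MEANINGS:
        if (shape, motion) in observed:
            language[(shape, motion)] = observed[(shape, motion)]
        else:
            # Reuse an induced part if there is one, otherwise invent.
            first = shape_part.get(shape) or rng.choice(SYLLABLES)
            second = motion_part.get(motion) or rng.choice(SYLLABLES)
            language[(shape, motion)] = first + second
    return language


def iterate(generations, bottleneck, seed=0):
    """Run a chain; each learner sees only `bottleneck` meaning-signal pairs."""
    rng = random.Random(seed)
    # Generation 0: a random holistic language.
    language = {m: rng.choice(SYLLABLES) + rng.choice(SYLLABLES) for m in MEANINGS}
    history = [language]
    for _ in range(generations):
        sample = rng.sample(MEANINGS, bottleneck)
        language = learn({m: language[m] for m in sample}, rng)
        history.append(language)
    return history


no_bottleneck = iterate(generations=10, bottleneck=len(MEANINGS))
with_bottleneck = iterate(generations=10, bottleneck=5)
```

With `bottleneck=len(MEANINGS)` every pair is observed and simply memorized, so the initial language is passed on unchanged. With a smaller bottleneck, unseen meanings have to be rebuilt from induced parts, which is where restructuring can enter the chain.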

The main point to derive from research on iterated learning is that linguistic structure is not only a matter of individual cognition but also a result of cultural processes and the interaction between these two levels. The adaptation goes in both directions - humans (and therefore their genes and brains) had to adapt to learning and using language but language had to adapt to surviving inter-generational transmission, i.e. languages that could not be acquired by new members of human community could not survive and we do not see them today (Kirby & Hurford, 2002). In time, therefore, languages become more structured and more learnable as they are more likely to survive the bottleneck and are easier to store and process by a cognitive system (Brighton et al., 2005).

Recently, the iterated learning paradigm that started as a computational model has been extended to actual human subjects. Kirby et al. (2008) asked people to learn an "alien" language for a set of objects that differed on three dimensions (color, shape and movement) and were assigned labels consisting of random sequences of letters. After the learning phase, participants were tested on producing the alien labels for the objects but, unbeknown to them, they had not seen half of the objects before. They were therefore forced to invent new labels. The set produced by one participant was then given to the next participant in a chain and the process was repeated for several generations. Over time, just as in simulations, the language became more learnable, i.e. more easily transmitted (fewer errors towards the end of the chain), and more structured (the mapping between meanings and signals became more regular and predictable). In the first version of the experiment it also became more ambiguous due to a lack of pressure to communicate. In such a situation, the demand on language to be more learnable leads to underspecification, with the same label being used for all objects of a certain type. In the extreme case it could lead to the preservation of just one symbol for all objects, which would minimize encoding effort and create a language that is 100% transmittable but useless. Such a situation is prevented, however, by both communicative pressure and cognitive biases against ambiguity.
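Learnability and ambiguity of the kind described above can be quantified in a few lines. The sketch assumes that transmission error is measured as the mean normalized edit distance between corresponding labels in consecutive generations; the function names and the two-item example languages are hypothetical, and signals are assumed to be non-empty.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def transmission_error(parent, child):
    """Mean normalized edit distance over the meanings both languages share."""
    shared = parent.keys() & child.keys()
    total = sum(
        levenshtein(parent[m], child[m]) / max(len(parent[m]), len(child[m]))
        for m in shared
    )
    return total / len(shared)


def distinct_signals(language):
    """Number of different signals; 1 means a fully ambiguous language."""
    return len(set(language.values()))


# Hypothetical labels for two objects in consecutive generations.
parent = {"red square": "kihemiwi", "blue square": "wikuki"}
child = {"red square": "kihemiwa", "blue square": "wikuki"}
error = transmission_error(parent, child)
```

A chain that becomes more learnable shows `transmission_error` falling across generations, while a falling `distinct_signals` count flags the underspecification that appears when there is no pressure to communicate.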

Psychological experimentation of this kind allows us to answer a variety of questions about the interaction between the cognitive and the cultural. For example, one can investigate different directions of transmission (vertical, adult-to-child, versus horizontal, peer-to-peer), the influence of population size, and different network structures (Cornish et al., 2009). Another question that has been raised, in an experimental setting involving graphical communication, is whether iterated learning can lead to the emergence of symbolicity, not just compositionality (further discussed in Section 1.3.2). Given the wealth of evidence on the rapid emergence of linguistic structure in sign languages, it seems crucial to ask whether the principles thought to be operational in the written or spoken word generalize to the manual modality.

1.2 Emerging sign languages

Nature provides us with several kinds of experiments on language emergence. First, one can study the birth of pidgin and creole languages in areas where speakers of different languages come together and respond to the need to communicate with each other by inventing a new system. This process, however, is affected by their native languages and thus does not let us examine how languages are created with no prior linguistic background, which is obviously more relevant if we are interested in language evolution, not mere language change.

Second, we can study deaf individuals born into hearing families. Some parents decide to teach their children sign language; many, however, send them to oral schools, which means that the children are not exposed to any conventional language at the beginning of their lives. In such cases children usually develop homesigns to communicate. The system that results is not a collection of ad hoc pantomimes, although many signs are iconic. What is interesting is that many typically linguistic features appear as well, what Goldin-Meadow (2002b) calls "resilient properties of language", such as a stable and systematically organized lexicon, grammatical distinctions and meaningful word order (Goldin-Meadow, 2002a). Although we would expect iconicity to play a big role in developing language in the manual modality, Morford (1996) argues that iconicity does not in reality help in processing the signs and is only useful when symbols are being created or used with individuals who do not know the system. Once the system has been established, it is the symbol-to-symbol relationships that become more relevant and, for example, affect the form of each new symbol. Therefore, ASL signers do not rely on iconicity when asked to produce signs for novel objects, while hearing subjects do. Such a process makes it tempting to view homesign as a potential window on language evolution (Botha, 2007). However, homesign use is usually restricted to a single family, often to a single user and a single generation, and therefore constitutes an impoverished model.

A third kind of natural experiment - emerging sign languages - enables us to address a wider spectrum of evolutionary issues. These languages are created when a group of people with no prior exposure to conventional language comes together and forms a signing community, either because there is a higher incidence of deafness in a particular location (a village, an island) or because a school or another community structure for the deaf is created. One widely studied recent example is Nicaraguan Sign Language (NSL).

Prior to the 1970s there was no deaf community in Nicaragua and deaf people had little contact with each other. They were developing independent homesign systems and some were schooled in the oralist tradition. The first special school in Managua was also oralist and until the 1970s the pupils did not interact outside the classroom. The situation began to change in 1977 when a new school was established, as well as a vocational school for adolescents. Attendance in both schools was rising and students were socializing more, leading to the establishment of a social club in 1986. With more interaction, Nicaraguan Sign Language appeared and started to develop. The lexicon became more regular and consistent, enriched with borrowed forms and principles. Grammar also emerged (Senghas et al., 2005).

Such a young language as NSL presents an opportunity to closely observe changes in its structure as they happen from generation to generation and to investigate the influence of both the age of entry and the time of entry of language users into the system. In particular, it has been shown that grammatical complexity, measured by the number of arguments per verb as well as by patterns of inflection and agreement, is higher in signers from later cohorts and higher in those who started learning NSL at a younger age. The effect turned out to be additive, with those characterized by both a later year of entry and a younger age exhibiting the most complexity (Senghas, 1995).

combinatorial patterning. Certain events can be expressed holistically, especially in the manual modality, but languages consist of categorical units that can be recombined to create an infinite number of forms. Even though many sign languages use simultaneous constructions, the way they are acquired shows a preference for linear sequencing, e.g. complex verb expressions in ASL are oversegmented during development. Language-learning machinery is set up to help identify the basic units. Just as in homesign, in emerging sign languages a shift from holistic iconic forms to more combinatorial and less iconic patterning signifies the development of mature linguistic structure.

Senghas et al. (2004) examined the expression of complex motion events in NSL to see (1) whether they are inspired by co-speech gestures available in the hearing environment and (2) whether they are expressed simultaneously, directly representing the referent meaning, or rather transformed into linguistic forms. Motion constructions are often analyzed using Talmy's (1985) typology. Talmy considers both translational and self-contained motion and distinguishes six components of motion events: (1) the presence or absence of translational motion (Motion), (2) the moving entity (Figure), (3) the object with respect to which the Figure moves (Ground), (4) the course followed by the Figure with respect to the Ground (Path), (5) the manner in which the motion takes place (Manner) and (6) the cause of its occurrence (Cause). In the experiment described, hearing and deaf participants were presented with a cartoon that contained motion events and asked to retell it. Their signs or co-speech gestures were analyzed as to whether manner and path were expressed simultaneously or sequentially, and compared between hearing Spanish speakers and NSL signers from different cohorts (different years of entry into the signing community). The simultaneous way was used by all Spanish speakers and most first-cohort signers. Second- and third-cohort signers, by contrast, mostly used the sequential way, for example signing first "roll" and then "down" for "rolling down". The segmentation started to appear in the first cohort in the late 1970s and spread in later cohorts. This pattern was then supplemented with further constructions, like A-B-A to express simultaneity, and then generalized to a variety of other simultaneous aspects of motion events like Figure and Ground ("cat climb cat" and "climb pipe climb"). In this way, NSL signers went from gestures to conventional signs.
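The simultaneous versus sequential distinction can be operationalized as follows. This is only a sketch under a hypothetical annotation scheme, in which each manner or path segment of a response carries an onset and an offset time; it is not the coding procedure of Senghas et al. (2004), and the names `Segment`, `overlaps` and `classify` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    component: str  # "manner" or "path"
    start: float    # onset in seconds
    end: float      # offset in seconds


def overlaps(a, b):
    """True if the two segments share any stretch of time."""
    return a.start < b.end and b.start < a.end


def classify(segments):
    """Label an expression 'simultaneous', 'sequential', or 'incomplete'."""
    manner = [s for s in segments if s.component == "manner"]
    path = [s for s in segments if s.component == "path"]
    if not manner or not path:
        return "incomplete"
    if any(overlaps(m, p) for m in manner for p in path):
        return "simultaneous"
    return "sequential"


# A rolling hand that descends at the same time: one conflated gesture.
conflated = [Segment("manner", 0.0, 1.0), Segment("path", 0.0, 1.0)]
# "roll" then "down" then "roll": the A-B-A pattern, segmented in time.
a_b_a = [Segment("manner", 0.0, 0.6), Segment("path", 0.7, 1.2),
         Segment("manner", 1.3, 1.8)]
```

Under this scheme `conflated` counts as simultaneous and `a_b_a` as sequential, so the proportion of sequential responses per cohort (or per generation in a chain) becomes a simple tally.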

For Senghas, the principles responsible for linguistic structure have their source in the mind, in particular in the mind of children acquiring language. The fact that it is children from later generations who show the most complexity is interpreted as evidence for children being the primary motor of language change: "It appears that the processes of dissection, reanalysis, and recombination are among those that become less available beyond adolescence" (Senghas et al., 2004, p.1781). On this view, only the critical period makes it possible to dissect previously unanalyzed wholes available in the input (be it real-world experience or linguistic productions). Since there is no fully established adult model that would replace over-segmented forms, children do not unlearn their inventions and so the forms they created persist. The process reduces the elaborateness of the surface forms that had previously been necessary for the signing to be adequately expressive, washing away the mimetic dimension (Senghas, 1995).

On the other hand, Senghas et al. (2005) argue that not only psychological principles need to be considered but also sociocultural ones, namely the fact that while children are better at acquiring and processing language, adults have a bigger sociocultural impact and it is they who create a new niche for new cohorts. Ultimately, there is also the question of retaining the newly created forms: "Novel forms would be more likely to be retained by speakers if those forms are seen as more effective at communicating, whether through increased efficiency, precision, flexibility, or compatibility with either the cognitive capacities of the speakers or the structures of the linguistic system" (p.302). This is where we come back to the iterated learning paradigm.

1.3 Iterating sign languages

It is our belief that an integration of iterated learning research with research on emerging sign languages can be fruitful for both sides concerned. We review the questions that arise at such crossroads in turn.

1.3.1 Evolutionary experiments for sign languages

While emerging sign languages allow us to observe the birth and subsequent changes of language in situ, and a variety of psychological tests can be administered to the users, greater control in testing the main findings from the field is still advisable. Meir et al. (2010) admit that because researchers approach their investigations with different questions, different materials, and different analytic tools, it is at present difficult to rigorously compare emerging sign languages (p. 278), while Fay et al. (2010) claim that studying emerging sign languages in nature makes it difficult to distinguish between the contributions of different evolutionary processes like iterated transmission and social collaboration. It therefore seems useful to simplify the scenarios that occur in emerging sign languages to something that can be tested in a lab. We can then construct settings that take into account a number of variables considered important for language emergence, construct sets of pre-defined data and observe processes of change over a shorter time than is available in nature. Admittedly, the participants will carry the background of their first spoken language, but we can still expect biases to affect how they structure language in the manual modality and try to tease apart the effect of the manual modality from more general cognitive or population processes, from the effects of age and so forth. That brings us back to the issue of a critical period, which needs further discussion.

One could view such a period as a consequence of having a language acquisition device, a specific computational mechanism present during maturation and declining with age. Alternatively, it could be viewed as a domain-general process, akin to the Minimum Description Length bias widely discussed in the context of iterated learning (Brighton, 2002). It could also be due to cognitive deficiencies in childhood, such as memory or executive function constraints. The latter option was in fact investigated empirically as Newport's "less is more" hypothesis, according to which it is children's perceptual and memory limitations that are responsible for children analyzing complex signs into components:

That is, rather than storing sets of complex form-meaning pairs and then somehow finding within these complex stimuli the subparts which such pairs share, the child learner may often store components directly, and may thereby be in a better position to note when these components appear within more complex forms (Newport, 1988, p.167).

Newport investigated the maturational constraints by comparing ASL native signers with early and late signers on word order and morphology. It was shown that non-native signers are characterized by a lot of variability in their language production and tend to produce frozen, holistic forms, which means they have not performed adequate morphological analysis. Native learners, on the other hand, make componential errors and omissions and later acquire further morphemes. The explanation offered by Newport (1990) is that the difference between child and adult learners lies not in how capable they are of performing linguistic analysis but in how the input is perceived and stored: children's limitations reduce the problem space for inducing the form-meaning mapping. Adults are capable of taking in more data and memorizing complete signals, which hinders their analytical process.

Cochran et al. (1999) tested Newport's hypothesis in an Artificial Language Learning context. If the critical period constraint is biologically linked to a certain maturational stage, adult learners could never learn language the way children do. If, however, in line with Newport's hypothesis, the constraint is related to cognitive capacities, we could try to manipulate adults into a state resembling the critical period, either by changing the kind of input they receive (chunks rather than complete signals) or by manipulating their processing through limiting cognitive resources. The latter manipulation was tested with subjects being asked to learn ASL constructions while counting high tones interspersed randomly between low tones. Adults who could learn ASL verb agreement and ASL verbs of motion in normal conditions acquired the constructions faster but holistically and made mistakes in new contexts due to a failure to generalize. Adults under cognitive load showed a more variable pattern of responses; they were not better at generalizing than subjects in the normal condition, but they were not worse either.

The evidence produced by Cochran et al. (1999) was tentative, but it could be, for example, that if the same experiment had been carried out in a chain set-up, the weak tendencies in cognitively impoverished subjects would have been amplified and led to better learning or even re-invention of the ASL morphology over time. This hypothesis is based on a study by Smith & Wonnacott (2010), who tested a different explanation for why children propel language change, namely that they show a preference for regularization, not just segmentation (Kam & Newport, 2005). Regularization of a plural marker was tested and it was found that the predictability of the marker increased in diffusion chains composed of adult participants but not in isolated learners. Population-level effects can therefore stem not only from strong child learner biases but also from weak adult biases that become amplified. The upshot is that individual, single-generation experiments are sometimes not enough. Perhaps what we need is not only comparisons of children and adults in such experiments but also comparisons between them while they are embedded in the process of transmission and are forced to react to input that gets restructured over time.
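The notion of marker predictability can be illustrated with an information-theoretic sketch. It assumes, as one reasonable operationalization, that predictability is captured by the conditional entropy of the marker given the noun; the function names and the two toy data sets (nouns `cow` and `pig`, markers `fip` and `tay`) are hypothetical and are not data from the study.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())


def conditional_entropy(pairs):
    """H(marker | noun) for a list of (noun, marker) productions."""
    by_noun = {}
    for noun, marker in pairs:
        by_noun.setdefault(noun, []).append(marker)
    total = len(pairs)
    return sum(len(ms) / total * entropy(ms) for ms in by_noun.values())


# Unpredictable variation: each noun takes either marker half of the time.
variable = [("cow", "fip"), ("cow", "tay"), ("pig", "fip"), ("pig", "tay")]
# Lexically conditioned variation: both markers survive, one per noun.
regular = [("cow", "fip"), ("cow", "fip"), ("pig", "tay"), ("pig", "tay")]
```

Both toy languages use the two markers equally often, so the entropy of the marker alone is one bit in each case. Only the conditional entropy separates them: it is one bit for `variable` and zero for `regular`, which is the kind of increase in predictability a diffusion chain could produce without any single learner eliminating variation outright.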

In line with the latter hypothesis, in their commentary on the Senghas paper, Russo and Volterra (2005) note that Nicaraguan children were exposed to some communicative input before joining the school and that differences between generations might be attributed to the different communicative inputs they received. They also, however, raise the question of what hearing participants would have done if requested to perform the task without using spoken language, which is a relevant concern because co-speech gestures share the communicative load with speech and therefore cannot be directly compared to either of the NSL cohorts. Again, it could be that if a hearing adult is asked to communicate a motion event using only their hands, sequential signs would appear, either after a single trial (due to spontaneous invention), after several trials (through compression of the representation) or after passing on the message to other people in an iterated scenario.

Another factor that needs to be taken into account in explaining NSL changes is the population structure itself. Meir et al. (2010) distinguish between village sign languages, e.g. Al-Sayyid Bedouin Sign Language (ABSL), and deaf community sign languages, e.g. NSL. In the former, people share a common culture and social environment from the beginning, which they can rely on in communication, and can therefore be less explicit. The inflow of new users is limited and the influence of hearing people is bigger because of a sparser distribution of the deaf. In the deaf community, on the other hand, heterogeneity and changes in population size and structure make systematic linguistic features develop faster. The contact and transmission are different and the information context is wider. Constant attunement in a group (peer group, family) creates less variability in the lexicon and grammatical solutions.

If all it took to evolve complex language was a child's brain, we would not expect large differences between NSL, ABSL and even homesigns. Again, we can carefully study different types of signing communities, but laboratory manipulation of the meaning space, the population composition and the kinds of interaction available can potentially provide much clearer answers on the influence of these different factors on the emergence of linguistic complexity.

It should be noted that iterated learning is not the only account being applied to language evolution and perhaps, to fully explain the processes taking place in NSL or ABSL, we will need to supplement it with the social collaborative account proposed by Fay et al. (2010). According to this account, global systems emerge from local interactions that become aligned over time. The interactions introduce a number of competing signs, some of which are selected over time and spread through the community. Obviously, the two accounts can be complementary and simply explain two parts of the process: first, the sharing of innovations among peers and rapid convergence on certain solutions and, second, their transmission to the next generations and slower change in structure.

Finally, the choice of NSL users for sequential representation of motion events can be attributed not only to children's cognitive biases but also to other characteristics of NSL (Senghas et al., 2002). Or, to approach the issue from a different end, the fact that adults do not readily incorporate this new way into their productions, or the fact that they did not invent it, could be due to settling onto a system which cannot easily be changed, especially when the system they do have is already communicatively functional. It could also be maladaptive to be excessively flexible in the adult stage of life. A fair comparison would perhaps be to try to elicit restructuring of a new system in adults with and without a prior system established. This is yet another case where lab experiments can be used for testing hearing adults in modalities other than speech.

1.3.2 Sign experiments for understanding evolution

Trying to go in the other direction - from sign languages to studying language evolution - also requires justification.

In the light of what we discussed in the previous sections, the claim that linguistic structure is a product of children and their cognitive abilities seems at odds with an account that would posit cultural transmission itself and the necessity of languages to adapt at its source. Of course, there is nothing inherently incompatible between the two perspectives and we could say that the biases and capacities needed for language to evolve reside in children, not adults. We could also supplement this with an evolutionary account of the critical period itself (Kirby & Hurford, 1997). In that case, however, we still need to explain why compositional structure emerges in iterated learning experiments that use adult participants. Are adult NSL users capable of contributing to progressive regularization of their language and contributing to language change but for some reason settle on the patterns they acquired? Or are IL experiments testing other mechanisms than those operational in natural contexts?

It could also be argued that the difference between NSL and iterated learning experiments can be explained by a difference in modality: speech versus manual signs. In the iterated learning paradigm compositionality is said to be an effect of the pressure to generalize to unseen data, and the human tests rely on arbitrary random labels presented to participants, usually in writing. With the iconic signs afforded by the manual modality there is no such pressure, because even if the stimulus has not been encountered, a signal for it can be constructed on the spot by mime and will likely be understood in the right context. One could then claim that it is the increased pressure to generalize in experiments with written labels that leads adult participants onto the compositional path, while we cannot expect the same when we are dealing with iconic signals. This, however, creates a chasm between different modalities and between adults and children, and turns the fact of children creating structure within iconicity into a mystery, questioning at the same time the role of the bottleneck in the emergence of compositional structure. Meeting the challenge of explaining this chasm requires a short digression to work on another modality that involves iconic signs: graphical communication.

Experiments in the graphical mode have emerged relatively recently as a way to avoid the influence of speech background as well as to investigate whether iterated learning can explain the transition from iconic to symbolic signs. Icons are signs which structurally resemble the objects they refer to. The information required for interpreting the meaning of such a sign resides primarily in the sign itself. The simplification of signs and gradual loss of iconicity marks an emergence of symbols, which carry meaning by information contained in the cognitive systems of their users, who either remember how the sign has arisen, memorize the relationship between sign and meaning or can infer the meaning from knowledge of other signs (for example, via compositionality). The relationship between symbols and objects is not natural but conventional, though this does not mean that all iconicity is lost, just as it is not entirely lost in sign languages although its role in cognitive processing is diminished.

Garrod et al. (2007) asked participants to repeatedly play a game of Pictionary, in which a director is asked to draw a concept and a matcher guesses its meaning. The game was played with a set of 16 concepts drawn from meaning domains such as museum, theater, art gallery, and the amount of feedback between directors and matchers was varied. It was shown that over time drawings become simpler and more abstract and, in the presence of interaction and exchange of roles, they converge - the drawings produced by each of the players for a particular concept become more similar by the end of the game. The authors consider two explanations for the simplification of graphical signs - mere repeated production and grounding - and conclude that the latter hypothesis is supported because of the key role played by communicative feedback and exchange of roles. The authors also discuss the increasing compositionality and systematicity of signs. They point out that many signs contain independently interpretable elements that combine to produce the meaning of a whole. Interestingly, not only are the signs structured, the system itself becomes structured, so that subparts of the signs correspond to taxonomic subparts of the domain of objects as a whole (p. 985), showing that symbolization is not only a matter of increasing arbitrariness of iconic signs but also, and perhaps more importantly, of restructuring of the system as a whole. Unfortunately, they do not probe the issue further.

The latter is accomplished by Theisen et al. (2010), who specifically investigated the relationship between arbitrariness and systematicity of graphical signs, systematicity being present when signals for similar meanings share an element. A more coherent set of task stimuli, representing five entity types and ten themes, was chosen to ensure that they shared semantic features that could be exploited. The participants played a game of drawing and guessing, switching roles every six trials. By the end of the game, signs became more systematic and more arbitrary, but it was also found that this systematicity is already present in initial trials. Participants tried to reference previous drawings in an attempt to communicate, showing that the connection of signal to meaning can be easily established by iconicity but also by relation to previously communicated meanings.

The results of these studies are significant for language evolution in general because they suggest that systematicity and compositionality could get off the ground from iconic mappings, not merely chance commonalities. The latter possibility is exemplified by Wray's (1998) fractionation account, according to which the first arbitrary vocal signals were holistic, and random similarities between sounds used in utterances that expressed similar situations led to associating what is common in form with what is common in meaning, and thus to the emergence of the first words. A process of the same sort needs to be assumed in studies such as that by Kirby et al. (2008), where we begin with randomly created labels. By contrast, allowing for iconicity to play a role in language evolution, either in a manual form such as proposed in gestural theories of language evolution (Arbib, 2005; Armstrong & Wilcox, 2007; Corballis, 2002) or directly as sound-symbolic mappings (Kita et al., 2010), allows us to consider a wider range of possibilities for the emergence of systematicity. Constructing a gestural paradigm for testing language evolution could help us investigate the plausibility of such scenarios, while extending it to sound symbolism would further our understanding of iconicity as a cross-modal phenomenon.

Sign languages often prompted gestural speculations but they can also lead to more detailed questions about the form of signs that survive the transmission. As we have seen from research by Senghas et al. (2004), motion events are progressively expressed by separated and linearized forms. Other research in the area, for example by Goldin-Meadow on word order (2002) or on object and handling gestures (In press) shows that there might be universal tendencies in other domains as well. We could try to explain these tendencies as cognitive biases per se (e.g. certain word order being more natural given the way we perceive events) or we could again take an evolutionary perspective.


The puzzle of which signs eventually win in an iconic medium, where potentially any sign can be understood, is an evolutionary question when we look at it as a matter of fitness. Fay et al. (2008) investigated precisely this - whether cultural transmission leads to the emergence of an optimized system of signs. Again using a game of Pictionary, they compared signs that emerged in two conditions: a community, in which interaction partners change but stay within a group, and isolated pairs. In addition to complexity and information content, the signs were compared on measures of sign efficiency and discriminability because an effective sign should be easily detected, efficiently encoded into memory and its meaning should be accurately and efficiently derived from the sign (p. 3555).¹ It turned out that community signs were better decoded by naive observers - people who did not participate in system creation - but the benefit could not be attributed to either speed or ease of detection. Instead, the residual iconicity of community-evolved signs was found to be responsible. Thus, while symbols are usually easier to produce and perceive since they require only enough structure to be distinguished from other symbols, icons are better at supporting transmission to new group members. In other words, signs become optimized in two ways at once: ease of production and ease of learning by subsequent generations. They can become optimized because the need to align is supported by the availability of a big and varied set of exemplars in a community.

In sum, considering emerging sign language phenomena leads us to ask questions about the particular adaptive solutions that are employed, beyond the core feature of compositionality traditionally investigated in the iterated learning paradigm, and increases the range of our inquiry. We can start probing the patterns observed in nature and add the issues of a critical period, transmission itself, the nature of iconicity², population structure, the fitness of signals or perhaps some internal constraints provided by the language system. We can ask whether what we find is generalizable to both modalities or explain the differences if found. It might be that, given the right context, the same results as those that appeared in NSL could be obtained in adult participants, and the way this comes about would not only shed light on the NSL issues but also add data to the evolutionary research agenda.

1.4 Predictions

Pulling together both strands of research presented above, we propose a test that aims to replicate the NSL motion segmentation phenomenon with adult hearing participants. If it is children's ability for segmentation and generalization that is responsible for the change in NSL, we should not be able to get the same effects with hearing adults. If the changes are an effect of transmission and/or resulting differences in the input available to each new generation, we could expect the segmentation to emerge in an iterated learning context.

Hypothesis 1 : The signs for motion events will become segmented in later generations.

We propose testing this hypothesis in the simplest iterated learning scenario, with each chain consisting of a series of individuals passing on the system from generation to generation, with no interaction. If Hypothesis 1 is not confirmed in such a simple set-up, we will treat this study as an opportunity to establish an experimental approach to testing gestural evolution and examine more complex conditions in the future.

¹ Galantucci (2005) in earlier experiments adds that the forms that best facilitate convergence are perceptually distinct, produced by simple motor sequences, and tolerant of individual variations (p. 760).

² For example, can we conceptualize it as a learning bias? Is there an interaction between a preference for iconicity and the size of the system (Gasser, 2004)?

Another motivation for this study is the fact that while iconicity has been investigated in the graphical communication setting, it is far from clear that it is the same phenomenon as iconicity in the manual modality and whether, for example, the results from graphical communication experiments can be generalized to sign languages. One difference is the time variable, which is not available to pictures but is exploited by NSL signers. Another difference could lie in the kind of iconicity that can be employed in graphical settings. McNeill (1996; 2000) talks about the Kendon continuum for gestures, which goes from gesticulations (co-speech gestures) through pantomime and conventionalized emblems to signs. Each of these elements has a different relation to speech and a different degree of systematicity and conventionality, and also engages different semiotic modes. The latter can be global or segmented, depending on whether the meaning of the parts is determined by the meaning of the whole or the other way around; and synthetic or analytic, depending on whether one sign expresses all components of the scene. It is not clear whether the same continuum exists in the graphical domain, but this could potentially affect the conclusions drawn from the Pictionary experiments. For example, can we identify anything like gesticulations, which are global and synthetic, in pictures? If not, could the emergence of systematicity in pictures be related to such an absence, and to a potential lack of such emergence in experiments in the manual modality?

The point is that it can only be shown empirically whether the processes that occur in Pictionary experiments do also occur in gestural experiments. Based on previous research, we thus predict the following:

Hypothesis 2 : The transmission error will drop in later generations.

Hypothesis 3 : The signs within individuals will become more systematic in later generations.

In constructing our set of motion stimuli we follow Theisen et al. (2010) and create a structured set of meanings which allows for investigating systematicity. We construct a set of 16 motion events with 4 manners and 4 paths that can all be combined. Due to a lack of interaction, to ensure that the system does not become too underspecified we use a discrimination task embedded in the learning phase, which will hopefully encourage participants to be clear in their signaling, owing to an awareness that the quality of one's gesturing matters to the next participant, who will have to identify its meaning. We aim to test different combinations of the number of videos in the discrimination task and the number of learning rounds in order to determine the best level of difficulty for the experiment. It might be, for example, that having just 2 discrimination videos is too simple and does not motivate participants to produce clear messages, while having 4 discrimination videos is too difficult and requires more learning rounds for participants to learn anything from the previous generation's input.

2 Methods

2.1 Participants

40 students from the University of Edinburgh participated in the study. They were invited to participate in a gesture comprehension and production experiment and informed that they would be recorded and that the resulting videos would be shown to other participants. After the experiment participants were paid a fee of £4 and debriefed.

2.2 Materials

A set of motion event stimuli (Motion Set) was constructed by using a list of motion verbs provided by Levin (1993). All the verbs that are listed in section 51 of that work (Verbs of Motion) were collected, totaling 236 verbs. Next, every verb was analyzed as to its potential for occurring in a perceptually simple scene with as few additional objects as possible. For example, to abandon was excluded because it is difficult to convey in a simple scene, and to bike was excluded because it would involve signing an additional object instead of only the agent. The resulting set of 9 verbs of motion manners was limited to 4 verbs that could be used with 1 agent and 4 paths (presented in Figure 1). We chose an inanimate agent - a ball - for our motion scenes. Animations of the 16 manner and path combinations were created using Adobe Flash Professional CS5.5 software. An example of the roll down video is shown in Figure 2.


Figure 1: Motion Set

Paths                                    Manners

straight across                          bounce
diagonally straight down                 roll (turn along horizontal axis)
horizontally in a wave (slalom-like)     spin (turn along vertical axis)
horizontally in a circle                 jitter

Figure 2: Roll down video frames

For the initial set of the 16 gesture videos (Gesture Set) in Generation 0, 16 University of Edinburgh graduate students were each asked to watch a single motion video and communicate what they had witnessed using their hands. They were recorded and their permission for the use of these recordings in further studies was obtained. We chose to use 16 different people in order to approximate a random, unbiased set of videos, uninformed by the knowledge of the full meaning space.

The experimental tasks were created using Matlab Psychtoolbox software and presented on a computer screen.

2.3 Procedures

Each participant was invited to the lab on their own. They were seated in front of a computer and a camera, briefly introduced to the experiment and told that they would receive instructions on the screen once they began the task (the scheme of the experimental design is shown in Figure 3 and the instructions for both phases of the experiment in Figure 4).

Figure 3: Experimental set-up

Figure 4: Instructions

Learning Phase:

Welcome to the experiment!

In each round you will be presented with a video of a person gesturing and then with [two/four] videos of a moving ball. Your task is to identify the video that corresponds to the witnessed gesture. You will be given feedback on whether you identified the motion video correctly. There will be [2/4] learning phases.

Press SPACE to proceed.

Teaching Phase:

Welcome to the experiment!

In each round you will be presented with a video of a moving ball and your task is to communicate the meaning of that video to another person using only your hands. You will have 7 seconds to complete that task. You will be recorded during that phase and the resulting videos will be shown to another participant. Please make sure that you communicate all the aspects of the scene that you find relevant because the next participant will have to identify the correct video from your message. There will first be a practice phase; you may ask questions if anything is unclear after that.

Press SPACE to proceed.

In the first part of the experiment (Learning Phase) each participant was asked to watch 12 gesture videos. These videos were semi-randomly drawn from the Gesture Set for the first participant in each chain and from the gesture videos produced by the participant in the previous generation for all later generations. The videos were chosen in such a way that exactly one instance of each manner and one instance of each path was missing. For example, the set in Figure 5a would be acceptable while the set in Figure 5b would not, because it eliminates two instances of spinning while preserving four instances of bouncing. This was done to ensure a balanced presentation of each meaning dimension, which is important given the small size of the meaning space.

Figure 5: Randomization constraints

(a) Acceptable set

bounce straight    bounce down    bounce wave    x
x                  roll down      roll wave      roll circle
spin straight      x              spin wave      spin circle
jitter straight    jitter down    x              jitter circle

(b) Unacceptable set

bounce straight    bounce down    bounce wave    bounce circle
x                  roll down      roll wave      roll circle
spin straight      x              spin wave      x
jitter straight    jitter down    x              jitter circle
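The randomization constraint illustrated in Figure 5 can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not the Matlab code used in the experiment: the names MANNERS, PATHS and draw_learning_set are hypothetical, and the constraint is implemented by pairing each manner with a distinct, randomly chosen path to leave out.

```python
import random

# Hypothetical labels for the 4 x 4 meaning space described in the text.
MANNERS = ["bounce", "roll", "spin", "jitter"]
PATHS = ["straight", "down", "wave", "circle"]


def draw_learning_set(rng):
    """Return 12 of the 16 meanings so that exactly one instance of each
    manner and one instance of each path is left out."""
    shuffled_paths = PATHS[:]
    rng.shuffle(shuffled_paths)
    # Pairing manner i with a distinct path guarantees the balance constraint.
    missing = set(zip(MANNERS, shuffled_paths))
    return [(m, p) for m in MANNERS for p in PATHS if (m, p) not in missing]


learning_set = draw_learning_set(random.Random(1))
```

Every set produced this way contains three instances of each manner and three of each path, so a set like the one in Figure 5b can never be drawn.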

After each gesture video, in a discrimination task, participants were shown a set of 2 or 4 motion videos (depending on the version of the experiment) and asked to identify the video that expressed the meaning they had just seen conveyed to them in gesture. Of the discrimination videos drawn from the Motion Set, one video was the target corresponding to the gesture video they had just seen, while distractor videos were randomly chosen from the remaining set in such a way that they shared either manner or path with the target video. Participants received feedback on whether they identified the video correctly.
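The construction of a discrimination trial described above can be sketched as follows. The sketch is a hedged illustration rather than the actual experiment code: MOTION_SET and discrimination_trial are hypothetical names, and we assume that distractors are sampled without replacement from all non-target videos sharing a manner or a path with the target.

```python
import random
from itertools import product

# Hypothetical representation of the Motion Set as (manner, path) pairs.
MANNERS = ["bounce", "roll", "spin", "jitter"]
PATHS = ["straight", "down", "wave", "circle"]
MOTION_SET = list(product(MANNERS, PATHS))


def discrimination_trial(target, n_videos, rng):
    """Return n_videos options: the target plus distractors that share
    either manner or path with it, in random order."""
    candidates = [
        v for v in MOTION_SET
        if v != target and (v[0] == target[0] or v[1] == target[1])
    ]
    options = rng.sample(candidates, n_videos - 1) + [target]
    rng.shuffle(options)
    return options


options = discrimination_trial(("roll", "down"), 4, random.Random(0))
```

With a 4 x 4 meaning space each target has six such candidates (three sharing its manner and three sharing its path), so both the 2-video and the 4-video versions can always be filled.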

The Learning Phase was repeated 2 or 4 times. The 2-Version involved 2 Learning Rounds and 2 videos in the discrimination task while 4-Version involved 4 Learning Rounds and 4 videos in the discrimination task.

[...] communicate what they had witnessed using only their hands.³ Four of these videos were randomly selected to be new, i.e. to express meanings the participants had not witnessed in the Learning Phase, so that participants were forced to generalize from the set of meanings they had seen. The recorded videos from the Teaching Phase were used in the Learning Phase for the next participant in the chain and the process was repeated until 5 generations in 4 chains were obtained.

The experiment took approximately 20 minutes for the 2-Version and 30 min for the 4-Version.

2.4 Gesture encoding

Given our research questions about the changes in language structure and relationships between different chains and different generations, we did not want to limit ourselves to merely establishing whether signs segmented and linearized in time are present and counting in how many instances this occurred. We also wanted to capture any subtler forms of regularity were they to emerge. Therefore, we attempted to set up a coding scheme for gestures produced in our study and encode them as series of symbols that could be further subjected to quantitative analysis.

In a seminal work on the structure of sign language, Stokoe (1960) describes signs as composed of three elements: tab, dez and sig. Tab is the sign location with respect to the body - whether it is made close to the head, torso, in relation to the non-active hand and so on. Dez is the hand configuration, a handshape. Sig is the movement or a change in configuration of the active hand. These elements occur simultaneously and establish a system of distinctions between different signs. A fist, for example, can be considered the same sign whether it is made with the palm facing down or to the side, or it could be recognized as two different signs, depending on how they are used by signers themselves.

Klima & Bellugi (1979) developed this work further by talking about Hand Configuration, Place of Articulation and Movement and analyzing 20 ASL handshapes into sets of features like extension and contraction of fingers, contact between fingers and thumb, active fingers, spread or compact fingers and so forth. Other dimensions that can be described are the kind of contact between hands or between hand and body part (if any) and orientation of the palmar surface. Movement can be hand-internal or directional - along paths in space.

The problem that one encounters in trying to develop a coding scheme lies in the fact that it is difficult to state a priori which dimensions of shape and movement will be relevant for a given situation before the sign system is created. There is a great variety of possible shapes that a human hand can form but each sign language selects a limited number of them. Trying to cover every possibility is likely to result in a scheme that will introduce so many distinctions that any regularity will be lost. Choosing too few dimensions could lead to over-interpreting the data and weaken our conclusions. In other words, the problem of over- and under-fitting is unavoidable and will require additional studies to resolve in a manner suitable for our research questions.

Until such work is carried out, we limited ourselves to a number of dimensions that seemed to be most distinctive from a first look at our data. In addition, we were guided by the nature of our meaning space and the task context. Hearing adults are unlikely to invent too many manual distinctions in the course of the 20 minutes of the experiment, or to meaningfully employ different places of articulation. Therefore, we (1) did not encode gestures with respect to location but (2) did encode both global and local movement (deemed crucial for the expression of motion events). The chosen features were encoded as a series of numbers for each video, noting handshape and movement for each hand and for several time steps if the video contained visibly distinct gesture fragments. The resulting coding scheme is shown in Figure 6.

³ Unfortunately, due to an experimenter error we recorded only 12 videos in each Teaching Round; therefore, our transmission error analyses below are based on 8 matching videos for each pair of subsequent generations.


Figure 6: Coding scheme of a single video gesture

Time step    Hand    Hand configuration                                Global motion    Local motion
                     Shape                            Orientation
                     Fingers    Flexion    Spread
1/2/.../n    1       0-3        0-3        0-2        0-7              0-4              0-6
             2       ...
(repeat for the desired number of time steps)

Hand tags: 1 - right; 2 - left
Configuration:
  Shape:
    Fingers involved: 0 - none; 1 - all; 2 - index finger; 3 - lower arm
    Flexion: 0 - none; 1 - closed; 2 - flexed; 3 - straight
    Spread: 0 - none; 1 - fingers closed; 2 - fingers spread
  Orientation: 0 - none; 1 - up; 2 - down; 3 - left; 4 - right; 5 - front; 6 - back; 7 - diagonally
Global motion: 0 - none; 1 - straight; 2 - diagonally; 3 - wave; 4 - circle
Local motion: 0 - none; 1 - X-oscillations (side-side); 2 - Y-oscillations (up-down); 3 - Z-oscillations (front-back); 4 - X-rotation (spin); 5 - Y-rotation (roll); 6 - Z-rotation (roll front-back)

3 Results

All results were processed separately for our two versions of the experiment.

3.1 Communicative accuracy

First, we computed the correctness scores for each participant, i.e. whether the meaning intended by the teacher was correctly guessed by the learner. We asked whether communicative accuracy measured this way (1) increases over trial rounds for each participant and (2) increases for subsequent generations in each chain.

The accuracy scores were high overall in the 2-Version (with the maximum score being 12, Mean = 9.55, SD = 1.8109, Var = 3.28) and did not increase significantly in the second compared to the first learning round, nor across generations.

Figure 7: Communicative Accuracy in the 4-Version

(a) Over learning rounds (b) Over generations

In the 4-Version, the scores in Learning Round 4 were significantly higher than the scores in Learning Round 1 (p < 0.01, t(19) = −3.018), suggesting that participants learned the meaning of the teacher's gestures over time. In addition, a Jonckheere-Terpstra test revealed a significant trend in the data, with scores becoming higher in later generations (J = 111.5, SD = 2.0971, r = .47, p < .05), which implies that the language became more communicatively functional over time (see Figure 7).

It needs to be noted, however, that the scores did not reach 100% accuracy, with a mean of 8.85 for Learning Round 4 and 9.125 for Generation 5. It remains to be seen whether further adjustments to the combination of the number of discrimination video choices and the number of learning rounds are necessary, as well as whether creating longer chains would allow accuracy to reach a still higher level.

3.2 Learnability

Even though participants were not instructed explicitly to copy the gestures of people they have observed in the Learning Round, post-test interviews revealed that they tried to re-use the gestures they saw and improve on them when necessary. In order to measure how faithfully people copy the output of the previous generation, we calculated transmission errors for each chain and each generation.

Transmission error is usually determined by computing the average Levenshtein edit distance between homologous signals in subsequent generations (signals that express the same meaning). Because our video gesture encodings have fixed lengths and each symbol at a given position encodes specific features, it would not be meaningful to employ Levenshtein insertions and deletions, and we therefore restricted ourselves to a simplified edit distance based on substitutions. In addition, we assigned different weights to different gesture dimensions. Our simple weighted distance measure for each pair of homologous signals was thus the following:

$$d(a, b, W) = \frac{1}{N} \sum_{i=1}^{N} W_i \, \gamma(a, b, i) \tag{1}$$

where a, b are the video gesture encodings (represented as arrays of feature symbols of length N), W is the weights vector and

$$\gamma(a, b, i) = \begin{cases} 1, & \text{if } a_i \neq b_i \\ 0, & \text{if } a_i = b_i \end{cases} \tag{2}$$

In other words, the measure was the number of substitutions multiplied by the corresponding weights and divided by the length of the gesture encodings considered. We chose the weights in such a way as to give more importance to the difference in motion, slightly less importance to the handshape and less to the orientation of the palmar surface, according to our understanding of how important different gesture dimensions are for expressing motion events. W was set to [0.3,0.3,0.3,0.1,0.5,0.5] for 1 hand, 1 time step. It must be noted, however, that setting all weights to 1 merely flattens the resulting plots without affecting the general trends.
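The weighted distance described above can be illustrated with a minimal Python sketch. The function name weighted_distance and the example encodings are our own illustrative assumptions; only the weights W are taken from the text, and the sketch covers the case of 1 hand and 1 time step.

```python
# Weights for 1 hand, 1 time step, in the order: fingers involved, flexion,
# spread, orientation, global motion, local motion (values from the text).
W = [0.3, 0.3, 0.3, 0.1, 0.5, 0.5]


def weighted_distance(a, b, weights):
    """Weighted substitution distance between two equal-length encodings:
    the sum of the weights at mismatching positions, divided by the length."""
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("encodings and weights must have the same length")
    mismatch_weight = sum(w for x, y, w in zip(a, b, weights) if x != y)
    return mismatch_weight / len(a)


# Hypothetical encodings that differ only in the orientation of the palm.
gesture_a = [1, 1, 1, 2, 1, 0]
gesture_b = [1, 1, 1, 3, 1, 0]
orientation_only = weighted_distance(gesture_a, gesture_b, W)  # 0.1 / 6
```

Under these weights a change of orientation alone contributes far less to the distance than a change in either motion dimension, which is the intended effect of the weighting.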

We take our distance measure to be an average rather than a sum in order to make gestures that involve different numbers of time steps comparable with each other. Unfortunately, at this stage we are only able to handle the differences that occur between chains, not between generations, i.e. our formula requires that the gesture encodings in corresponding generations are of the same length. It is not yet clear how to judge similarity between gestures that contain a variable number of time steps or how to weigh a different order of time steps. For example, it would seem that a difference between two people signing roll down holistically (in one time step) but using different handshapes to do so should be judged as smaller than a difference between one person signing roll down holistically and another person signing first roll, then down (in two time steps). Considering that only one of our chains contained multiple time steps, this did not pose a difficulty for our analysis, but it is an issue that needs to be resolved in future studies.


We calculated transmission error for all videos for all adjacent generations and then averaged to obtain one number for each pair of participants compared. The resulting plots for the 2-Version and 4-Version can be seen in Figure 8.

As is evident from the plots, unfortunately, we did not find a decrease in transmission error across generations (estimated with Page's trend test). Of course, it could be that the increase in faithfulness of transmission can only be expected in chains with good signs to begin with and that when signs are confusing, incoherent or difficult to produce, people choose to innovate rather than replicate the bad patterns. It is also difficult to distinguish between randomly inventing new gestures and generalizing. For example, a qualitative look at the last two generations in Chain 1, 2-Version reveals that the differences between these two participants lie in the Generation 5 participant generalizing the spatial segmentation pattern. It is possible that once the chains reach such a stable state, transmission error would drop and stay at a low level. Perhaps 5 generations is not long enough for this to occur.

Figure 8: Transmission Error

(a) 2-Version (b) 4-Version

3.3 Regularity

One feature that becomes evident from looking at the obtained gesture videos is that people tend to be consistent in their handshapes. This is true for (1) consistency within individuals (using only one hand or only both, using the index finger or a whole fist etc. for all motion events), (2) consistency within chains (a tendency to propagate the shape choices down the generations)⁴ and (3) slight tendencies within particular meaning dimensions (bouncing and jittering are usually expressed with a fist handshape while rotational motions are expressed with the index finger). Qualitative analysis also shows that over generations people become more consistent.

The most straightforward method of turning these observations into more quantitative conclusions is relying on frequencies of different handshapes. With regard to (2), taking the 2-Version data under consideration, we calculated that there were 30 unique handshapes employed. Only 8 of them occurred in more than one chain, and only 6 with a frequency larger than 10. Plotting the frequencies of those 6 handshapes shows that they are rather specific - each handshape is very frequent in a particular chain but very infrequent in all the other chains (Figure 9).

⁴ This is related to the faithfulness of transmission measured by the edit distance above. While edit distance, however, deals with homologous signals, here we are interested in consistency of sets of signals as a whole.

Going beyond looking at simple frequencies, we calculated the entropy of the handshapes used by each participant and compared these entropies for all generations and all chains. Entropy is a measure of our average uncertainty when we do not know the outcome of an information source. It can be used as a measure of regularity of the data because entropy is higher the more distinct values there are in the measured variable and the more evenly they are distributed. When variability goes down because there are fewer distinct values or some values are clearly preferred, the entropy is lower.
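As an illustration of how entropy tracks regularity, the following minimal Python sketch computes the Shannon entropy (in bits) of a list of handshape symbols. The function name shannon_entropy and the example symbols are hypothetical and are not taken from our data.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy in bits of the empirical distribution of symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    # p * log2(1 / p) summed over all distinct symbols.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# A participant who uses a single handshape for all 12 videos is fully regular.
consistent = shannon_entropy(["fist"] * 12)
# A participant who spreads 12 videos evenly over 4 handshapes is less regular.
varied = shannon_entropy(["fist", "index", "flat", "arm"] * 3)
```

In this toy example the first participant has an entropy of 0 bits and the second an entropy of 2 bits, the maximum for four equally frequent handshapes.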

Figure 9: Handshape frequencies

If we represent handshapes as distinct symbols by concatenating handshape and orientation numbers for both hands (i.e. we omit for now the time and hand tags, as well as motion numbers, because here we are interested in investigating consistency of handshape only), the maximum entropy for all possible handshape combinations given our encoding is 17.1699. We could also, however, represent handshapes as two symbols, concatenated separately for each hand. Finally, we could leave the encoding as separate numbers, reflecting the fact that dimensions can change independently. In the latter two cases, entropy values need to be normalized to the maximum entropy of each of the separate dimensions.
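The maximum entropy value quoted above can be reproduced with a short calculation. The sketch below assumes that every combination of feature values in the coding scheme of Figure 6 counts as a possible handshape (4 values for fingers involved, 4 for flexion, 3 for spread and 8 for orientation, for each of two hands) and that all combinations are equally likely; the variable names are our own.

```python
import math

# Number of possible values per handshape feature, read off the coding scheme.
levels = {"fingers": 4, "flexion": 4, "spread": 3, "orientation": 8}

# Distinct handshape-plus-orientation codes for a single hand.
per_hand = math.prod(levels.values())

# With two hands the number of combinations is per_hand ** 2, so the maximum
# entropy in bits is log2(per_hand ** 2) = 2 * log2(per_hand).
max_entropy = 2 * math.log2(per_hand)
```

Under these assumptions there are 384 codes per hand, and rounding max_entropy to four decimal places gives the value of 17.1699 bits reported in the text.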

First, we tested whether entropy within each chain across all generations is lower than entropy in randomly created pseudo-chains. We calculated entropies for the veridical chains and then for 1000 simulated pseudo-chains, created by randomly assigning the data gathered from participants in the 2-Version to different chains. We performed this procedure 100 times and counted how many times the z-scores of entropies obtained from the veridical chains were lower than -1.96. The result was 63 for Chain 2, 23 for Chain 4 and 0 for Chains 1 and 3. These numbers could not be obtained for the 4-Version because our procedure does not yet handle gesture encodings of different lengths.
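A single run of this comparison can be sketched in Python as follows. The sketch rests on several assumptions of our own: the data layout (a dictionary mapping chain labels to lists of per-participant handshape symbols), the function names, and the choice to build each pseudo-chain by sampling participants without replacement from the pooled data. It shows one z-score computation, not the 100 repetitions reported above.

```python
import math
import random
from collections import Counter
from statistics import mean, pstdev


def shannon_entropy(symbols):
    """Shannon entropy in bits of the empirical distribution of symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def chain_z_score(chains, target, n_pseudo=1000, seed=0):
    """z-score of the entropy of one veridical chain relative to the
    entropies of randomly assembled pseudo-chains of the same size."""
    rng = random.Random(seed)
    pooled = [participant for chain in chains.values() for participant in chain]
    size = len(chains[target])
    observed = shannon_entropy([s for p in chains[target] for s in p])
    pseudo = []
    for _ in range(n_pseudo):
        sample = rng.sample(pooled, size)  # one pseudo-chain
        pseudo.append(shannon_entropy([s for p in sample for s in p]))
    spread = pstdev(pseudo)
    if spread == 0:
        return 0.0  # no variation among pseudo-chains, z-score undefined
    return (observed - mean(pseudo)) / spread


# Toy data: 4 chains of 5 participants, each chain with its own handshape.
toy_chains = {c: [[shape] * 12 for _ in range(5)]
              for c, shape in enumerate(["fist", "index", "flat", "arm"], 1)}
z = chain_z_score(toy_chains, target=1)
```

In the toy data every chain uses a single handshape, so the veridical entropy is zero while the mixed pseudo-chains have higher entropy, and the resulting z-score is negative.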

Second, we plotted entropies separately for each chain and each generation. In Figure 10 we first present the plots obtained by using the three ways of representing handshapes described above and the resulting entropy values for our 2-Version. We also plot the baseline entropy of Generation 0, which is our Gesture Set, produced by 16 different people. As can be seen from the plots, the 2-hands and separate dimensions representations lead to similar results; we therefore use the separate representation whenever possible. 4-Version entropies are presented in Figure 10d. It can also be noted that there is a clear drop in entropy between Generation 0 and Generation 1, reflecting the fact that each participant in Generation 1 is more consistent in their handshapes than a set of 16 different people (in line with our observation (1) above). Finally, there is a slight decreasing tendency over generations, but unfortunately it did not reach statistical significance.

Figure 10: Handshape Entropy

(a) V2 entropy from concatenated handshapes (b) V2 entropy from handshapes concatenated to 2 hands

(c) V2 entropy from handshapes as separate numbers (d) V4 entropy from handshapes as separate numbers

Finally, we asked whether handshape entropies differ between meaning dimensions and whether, for example, certain meaning dimensions regularize more rapidly. This turned out not to be the case. The average normalized entropies for all participants ranged between 0.6237 and 0.6444 for different manners and between 0.5809 and 0.7336 for paths. There were also no notable differences in entropy changes over generations for different meaning dimensions.

3.4 Compositionality

In addition to mere signal regularity, we can examine the regularity of the mapping between signals and meanings and thereby tap into compositionality and systematicity. Drawing inspiration from the NSL study, we qualitatively examined the data for any cases of segmentation or linearization of gestures. Our expectation that such solutions would be an outcome of transmission, and would therefore emerge in later generations (at least in some chains), was not confirmed. What did happen, however, is that in two chains participants spontaneously invented such systems, which were then propagated down those chains.

In the 2-Version, a participant from Generation 2, Chain 1 (henceforth the Space Chain), broke up the signs for all rolling and spinning meanings spatially, with one hand tracing the path of the event and the other hand signing the manner (see a snapshot in Figure 11a). She continued signing bouncing and jittering holistically, but the segmentation convention spread to these domains by the end of the chain and was the only solution used in Generation 5 (see bouncing in Figure 11b).

Figure 11: Spatial segmentation

In the 4-Version, already in Generation 1, Chain 2 (the Time Chain), a participant created temporally segmented signs, just like in NSL, by signing manner and path separately (see Figure 12 for rolling straight). In a post-test interview he explicitly denied knowing NSL or any other sign language. Unfortunately, the system so created neither survived nor generalized by the end of the chain, as it appears that just as one person is enough to create a system, one person is enough to break it in the current setup (more on this in Section 4).

Figure 12: Temporal segmentation

Of course, the absence of segmentation in other chains does not mean that there is no structure in the signal-meaning mapping; its form may simply be subtler than qualitative analysis alone can detect. Ideally, we need a compositionality/systematicity measure that (1) will detect regularity between signals and meanings from video encodings, (2) will agree with naive observers' judgments, and (3) will be gradual, so as to detect any potential increase in regularity between subsequent generations and to compare systems that employ segmentation with those that do not. We tested two measures that have previously been used in iterated learning studies: pairwise distance correlation and the RegMap metric.

Pairwise distance correlation relies on the fact that in compositional languages neighboring meanings and neighboring signals share structure, so that similar meanings will in general be expressed with similar signals while more semantically distant meanings will be expressed with more distant signals. We can measure the correlation between the distances between pairs of meanings and the distances between the corresponding pairs of signals: the higher that correlation, the more structure there is in the system (Kirby et al., 2008). The significance of the obtained correlation can be estimated using Monte Carlo analysis techniques. Accordingly, we calculated pairwise distance correlations and plotted their z-scores (Figure 13). The calculation was done on a representation that contained both handshape and motion dimensions.
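The logic of this measure can be sketched as follows. The sketch makes several assumptions that are not taken from the text: meanings and signals are equal-length tuples compared by Hamming distance, the correlation is Pearson's r, and the Monte Carlo step shuffles the assignment of signals to meanings. All names and the toy language are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean, stdev

def hamming(a, b):
    """Number of positions at which two equal-length tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def distance_correlation(meanings, signals):
    """Correlate meaning distances with signal distances over all pairs."""
    pairs = list(combinations(range(len(meanings)), 2))
    d_m = [hamming(meanings[i], meanings[j]) for i, j in pairs]
    d_s = [hamming(signals[i], signals[j]) for i, j in pairs]
    return pearson(d_m, d_s)

def structure_z_score(meanings, signals, n_sim=1000, rng=None):
    """z-score of the observed correlation against shuffled mappings."""
    rng = rng or random.Random()
    observed = distance_correlation(meanings, signals)
    shuffled = list(signals)
    sims = []
    for _ in range(n_sim):
        rng.shuffle(shuffled)  # break the meaning-signal pairing
        sims.append(distance_correlation(meanings, shuffled))
    return (observed - mean(sims)) / stdev(sims)

# Toy, fully compositional language (assumption): each signal is a
# (local motion code, global motion code) pair that mirrors its meaning.
manners = ["bounce", "roll", "spin"]
paths = ["straight", "down", "wave", "circle"]
meanings = [(m, p) for m in manners for p in paths]
signals = [(manners.index(m), paths.index(p)) for m, p in meanings]
z = structure_z_score(meanings, signals, rng=random.Random(0))
print(z > 1.96)  # far more structure than in shuffled mappings
```

A holistic system, in which signal similarity is unrelated to meaning similarity, would be expected to yield a z-score near zero under the same procedure.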

Figure 13: Structure by pairwise correlation

(a) 2-Version (b) 4-Version

RegMap (Tamariz & Smith, 2008) is a measure that combines the conditional entropy of meanings given signals and of signals given meanings and normalizes the result to make it comparable across systems of different sizes (Cornish et al., 2009, p.8). In other words, it checks how predictable a signal component is given a meaning component and the other way round. The advantage of this metric is that we can determine which signal component encodes which part of the meaning and, in our case, let the data tell us whether different time steps, different hands, or different hand configuration dimensions selectively represent manners and paths.
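The conditional-entropy idea behind this kind of metric can be illustrated with a small sketch. It is not the RegMap formula itself, whose exact definition and normalization are given in Tamariz & Smith (2008); the `predictability` score, the variable names, and the toy data below are assumptions made purely for illustration.

```python
from collections import Counter
from math import log2

def entropy(items):
    """Shannon entropy (bits) of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

def conditional_entropy(target, given):
    """H(target | given) = H(target, given) - H(given), in bits."""
    return entropy(list(zip(target, given))) - entropy(given)

def predictability(target, given):
    """1 - H(target | given) / H(target): 1 = fully predictable, 0 = not."""
    h = entropy(target)
    return 1.0 if h == 0 else 1.0 - conditional_entropy(target, given) / h

# Toy data (assumption): four scenes crossing two manners with two paths.
manner = ["bounce", "bounce", "roll", "roll"]
path = ["straight", "circle", "straight", "circle"]
local_motion = [1, 1, 2, 2]   # hypothetical signal dimension
global_motion = [1, 2, 1, 2]  # hypothetical signal dimension

print(predictability(local_motion, manner))  # local motion tracks manner
print(predictability(local_motion, path))    # but carries no path information
print(predictability(global_motion, path))   # global motion tracks path
```

Computing such scores for every pairing of a signal dimension with a meaning dimension, in both directions, gives a table from which selective encoding can be read off.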

Figure 14: Structure by RegMap

(a) 2-Version (b) 2-Version


We calculated RegMap values for two different ways of representing gestures: (1) with handshape and motion all expressed as separate numbers and (2) with handshape concatenated into one numeric string and motion dimensions expressed as separate numbers, e.g. [1224,1,2,1223,1,2]. The resulting plots did not differ much between these two representations, so we only show the latter (Figure 14) because it is more compact and allows for clearer analyses, i.e. of the relative contribution of handshape versus motion in expressing motion events. We present the RegMap values separately for both directions of predictability, from signals to meanings L(SM) and from meanings to signals L(MS), as they are not reducible to each other (a given signal can be less predictable when it maps to different meanings, as is the case in ambiguity, while a given meaning can be less predictable when it maps to different signals, as in synonymy). As can be seen from the plots, structure by pairwise distance correlation does overall increase above the baseline Gesture Set level as well as over generations. This is also true at least for some chains when structure is measured using the RegMap metric. Unfortunately, the trends are not strong enough to reach statistical significance as measured by the Jonckheere-Terpstra test.
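For readers unfamiliar with the Jonckheere-Terpstra test, the sketch below shows its statistic for an increasing trend across ordered groups (here, generations). The permutation-based p-value is an assumption made for illustration; the text does not say how significance was computed, and the toy scores are invented.

```python
import random

def jt_statistic(groups):
    """Over ordered group pairs (i < j), count value pairs with x < y.

    Ties contribute 0.5, following the usual convention.
    """
    stat = 0.0
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            for x in groups[i]:
                for y in groups[j]:
                    stat += 1.0 if x < y else 0.5 if x == y else 0.0
    return stat

def jt_permutation_p(groups, n_perm=2000, rng=None):
    """One-sided permutation p-value for an increasing trend."""
    rng = rng or random.Random()
    observed = jt_statistic(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        it = iter(pooled)
        # Re-split the shuffled values into groups of the original sizes.
        permuted = [[next(it) for _ in range(n)] for n in sizes]
        if jt_statistic(permuted) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy structure scores (assumption): 3 generations, 4 chains each.
generations = [
    [0.10, 0.20, 0.15, 0.12],
    [0.30, 0.25, 0.35, 0.28],
    [0.50, 0.45, 0.55, 0.60],
]
p = jt_permutation_p(generations, rng=random.Random(0))
print(jt_statistic(generations), p)
```

With only four chains per generation, such a test has limited power, which is consistent with trends that are visible in the plots failing to reach significance.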

Interesting observations can be made by plotting RegMap values for particular signal dimensions for different chains (Figure 15). Usually we are interested in changes occurring from one generation to the next, so one could wonder whether it is reasonable to collapse all generations for a given chain. However, we have seen that entropy within at least some chains is smaller than in random pseudo-chains; we therefore believe such observations can be a useful tool for exploring the data. Examining different signal dimensions shows that overall, not surprisingly, in all chains global motion selectively encodes path (iconically tracing straight, down, wave and circle) while local motion encodes manner (oscillating up and down for bouncing, rotating the hand for rolling and spinning). In addition, however, in the Space Chain handshape is also found to be more related to manner than to path, perhaps signifying a greater systematization in our compositional chain (also evident in Figures 13a and 14a).

Figure 15: Selectivity of different signal dimensions

3.5 Follow-up study

Given the novelty of our finding of segmented sign systems in hearing adults, we attempted to overcome the limitations of our simple design and determine whether such systems would prevail if, for example, more interaction were available. Our design restricted the space of new signs and made them susceptible to sudden death simply because one participant in a chain was not willing to propagate them further. We suspect that in a more natural setting, within a community, such signs, being less ambiguous and more structured, would be more likely to spread, which could explain the success of such signs in NSL. We conducted a follow-up study to test these predictions but, given the limited time available, could only do so for the 2-Version, i.e. the spatially segmented signs of the Space Chain.

Ten people were invited to participate in the study. The procedure was the same as in the original experiment with two modifications: (1) there was no chain, just 10 separate Learning+Teaching rounds and (2) participants were given a new Gesture Set. Two versions of the Gesture Set were created, each containing 50% signs from the compositional system of the Space Chain, Generation 5, and 50% signs from a non-compositional system in Chain 4, Generation 5. The latter was chosen because it was clear and consistent, with all meanings expressed by one hand formed into a fist. The two versions were mirror images of each other with respect to which meanings are expressed by which signals, to minimize the effect of the initial choice. The results of the experiment are shown in Figure 16.

As can be seen, 5 out of 10 people used the Space Chain system in 10 or more motion scenes out of 12. On the other hand, 3 people did not use it at all and 1 person used it for just 1 sign. However, it is interesting to note that out of these 4 individuals, only one generalized the Chain 4 system to 11 out of 12 signs; the remaining 3 invented largely new systems. The results are inconclusive but encouraging, given the exploratory nature of this experiment, and invite further investigation.

Figure 16: Replicability of the compositional system

4 Discussion

Our study has shown that a segmented and linearized system of signing motion events can emerge in adult hearing participants. This does not directly undermine the conclusion that it is children who are responsible for language change, but it at least weakens the position of the critical period and associated cognitive-biological mechanisms as the primary explanans of the phenomenon.

Our results point to several facts that are easy to overlook when we focus on segmentation and linearization in time. First, we have witnessed that in addition to segmentation in time, segmentation in space is a viable option for systematizing signs in the manual modality. In our 2-Version of the experiment we have seen the spreading of information over two different hands, with one hand showing manner of motion and path and another hand tracing the path only. Such a sign seems redundant, especially because the path information seems easier to comprehend from both signs and motion videos - several participants
