
The Emergence of Linguistic Conventions for Word Order

Layout: typeset by the author using LaTeX.

The Emergence of Linguistic Conventions for Word Order

An Agent Based Model

Loïs M. Dona

11873116

Bachelor thesis

Credits: 18 EC

Bachelor Kunstmatige Intelligentie

University of Amsterdam

Faculty of Science

Science Park 904

1098 XH Amsterdam

Supervisor

Dr. M. Schouwstra

Institute for Logic, Language and Computation

Faculty of Science

University of Amsterdam

Science Park 907

1098 XG Amsterdam


Abstract

This research aimed to gain more insight into the relative contributions of semantics and structural priming to the emergence of linguistic conventions for word order by using an agent based model. This model has been designed based on an existing experiment in which two participants communicated with each other using improvised silent gesture (Schouwstra, Smith, & Kirby, 2020). The data collected in that experiment revealed that the word order of the participants' gestures started off as being dependent on the semantics of the message, but became more regular through repeated communication, signifying the emergence of linguistic conventions for word order. Multiple agent based models with different kinds of structural priming and weights that regulated the ratio in which semantics and structural priming affected the agents' word order preferences were implemented and compared to each other in order to find the model which best fitted the regularization of word order that has been observed in the experiment. The optimal model indicated that the regularization of word order is related to an increased influence of structural priming and a decreased influence of semantics on the participants' word order choices. Furthermore, the results showed that structural priming is a cumulative process of taking into account the entire memory of encountered word order structures when producing a new improvised gestured utterance. These findings provide a useful base for the design of more complex models and signify the usefulness of interaction between experiments and computational models.


Acknowledgements

I wish to express my gratitude to my supervisor, dr. M. Schouwstra, for guiding me through this project, giving feedback and providing me with her knowledge and expertise in this subject.

I also want to thank the coordinator, dr. S. van Splunter, for checking in with my progress and giving me useful feedback.


Contents

1 Introduction
2 Theoretical Background
 2.1 The evolution of word order
 2.2 Agent Based Models for Language Evolution
 2.3 Learning Through Interaction
3 Method
 3.1 The Analysis of the Experimental Data
 3.2 The Model
  3.2.1 The Agents
  3.2.2 The Simulation
 3.3 The Comparison of Different Model Implementations
  3.3.1 The Weights
  3.3.2 Structural Priming
  3.3.3 The Optimal Model
4 Results
 4.1 The Analysis of the Experimental Data
 4.2 The Agent Based Model
5 Discussion
6 Conclusion
References

1 Introduction

The evolution of language is an important topic in the field of linguistics, and researchers have been trying to get to the bottom of the underlying processes of language evolution and the origins of language for many years. Studying this topic is useful, because natural language is a unique trait that distinguishes humans from other species, and knowledge of this unique feature can provide valuable insights for the formation of linguistic theories and a better understanding of our main communication system. Language evolution is a broad field in which it is hard to cover every aspect of language at once, which is why researchers often focus on one specific aspect of human language evolution. An example is the emergence of linguistic conventions for word order, which concerns the way in which the majority of emerging languages eventually converge on a dominant word order (Dryer, 2013). Word order refers to a language's ordering of subject, object and verb in sentences and is a way of describing who did what to whom. Much research on this subject has been published, often consisting of experimental approaches, including the research of Schouwstra et al. (2020).

This research consisted of three different experiments in which participants described a given event to another participant, using only improvised gesture to ensure that they could not use any linguistic knowledge they already possessed. In this way, an emerging linguistic system was resembled as closely as possible. One of the experiments used fixed pairs of participants taking turns performing this task, which will be explained more thoroughly in section 2.1. They found that word order started off as being dependent on the semantics of the message, but slowly became more regular over the course of trials. This shift indicated the emergence of linguistic conventions for word order. The dependence of linguistic structure on semantics is also referred to as naturalness (Schouwstra et al., 2020). Furthermore, they discovered that observations from the previous trial affected the word orders that participants produced in the next trial, which is referred to as structural priming. Nevertheless, how and to what extent semantics and structural priming contributed to word order regularization exactly remained unclear.

An experimental approach like this is advantageous, because it involves real world participants and circumstances. However, the downside is that such experiments are not flexible, due to the limit to what can be demanded from the participants and conductors in terms of time and energy. Computational models do have this flexibility and large computational power. For example, it would not be humane to execute thousands of trials at once in a real life experiment, whereas this would be a possibility when using a computational model. Agent based models are an example of such models, being able to mimic a simplified real world situation using computational resources (Lekvam, Gambäck, & Bungum, 2014). The findings from Schouwstra's research and the promising characteristics of agent based models raised the question of what agent based modelling could reveal about the relative contributions of semantics and structural priming to the emergence of linguistic conventions for word order. Investigating their relative contributions and how these change over time using agent based modelling is believed to be possible, because such models have been proven to successfully model complex phenomena like this one (Kirby, 2001; Smith, Kirby, & Brighton, 2003; Steels, 1995; Kirby & Hurford, 2002). Such a model would simulate the communication between two agents in a similar way to the previously mentioned experiment (Schouwstra et al., 2020). Those agents would update their word order preferences by calculating a weighted average over a structural priming component and the semantics of the desired message prior to each trial. The weights which best resemble real world regularization will be able to reveal more, in numerical terms, about the relative contributions of these factors to the emergence of linguistic conventions for word order.

The expectation is that the agents' produced word orders should gradually shift from being largely dependent on semantics to being largely dependent on structural priming over time for regularization to take place. This is based on the prediction that the shift from naturalness to regularity that has been observed in the experiment mentioned earlier was related to structural priming becoming more and semantics becoming less influential to the participants' produced gestures over the course of trials (Schouwstra et al., 2020). This expectation is supported by previous research, which has revealed that interaction and structural priming play a role in the regularization of language, because language users unconsciously try to align their behaviour with that of their conversation partner, which causes regularization (Fehér, Wonnacott, & Smith, 2016; Pickering & Garrod, 2004; Ferreira & Bock, 2006; Pickering & Garrod, 2017). Moreover, the results from the experiment have shown that the word order of the participants' productions was affected by both their own previous production, referred to as production-priming, and that of their conversation partner, referred to as comprehension-priming (Schouwstra et al., 2020). This is why both types of structural priming need to be included in the model. It is expected that their relative contributions to the emergence of linguistic conventions for word order shifted from more production-priming to more comprehension-priming, which could be a possible explanation for the increased consensus between participants towards the end of the experiment (Schouwstra et al., 2020). In addition, structural priming is expected to be more complicated than just looking back at one previous trial in this case. Instead, it is believed to be a process of incorporating every previously encountered trial in a weighted average which discounts observations that lie further in the past. Several studies provide evidence in favour of this claim, suggesting that structural priming is cumulative and not just based on recently observed linguistic structures, and that the effects are long-lived (Kaschak, Kutta, & Schatschneider, 2011; Coyle & Kaschak, 2008; Hartsuiker & Kolk, 1998; Ferreira & Bock, 2006). The decision to discount memories that are further in the past is based on the famous forgetting curve that was introduced by Ebbinghaus, who discovered that memories lose their strength more and more as time progresses (Ebbinghaus, 2013). The proposed model is able to provide a more general outlook on the regularization of word order and the possibility to predict how this proceeds over a longer time period, because agent based models have the flexibility of creating as many participants and trials as is desired.

The process of researching this problem from start to end has been reported in the following sections, starting off with the necessary background containing relevant concepts and theories about language and word order evolution, agent based modelling and learning strategies for agents. It continues with the methodology, in which the approach used to answer the research question has been explained in detail. First, an explanation of the analysis of the experimental data has been given, which is followed by a detailed description of the agent based model framework. Next, the procedure for defining the parameters for the model and determining the optimal ones by comparing different models has been outlined, followed by an explanation of how they are evaluated. The results of these comparisons have been presented and discussed relative to the predictions that have been made earlier. Finally, the most important results have been summarised and recommendations for future research have been stated.

2 Theoretical Background

In this section, previous research supporting the decisions that have been made for the model has been presented and several concepts that we will use throughout this work have been explained. First, an introduction on word order and language evolution has been given, followed by research in which computational models have been used to investigate this evolution. Lastly, two different learning strategies that agents in these computational models can use have been compared to each other.

2.1 The evolution of word order

Numerous experimental and literature based studies have been conducted concerning the evolution and origins of word order conventions. Most existing languages form sentences with a regular ordering of subject, object and verb, of which the two most common ones are SOV (Subject Object Verb) and SVO (Subject Verb Object) (Dryer, 2013), and it has been argued by multiple researchers that the word orders of all languages of the world must have evolved from SOV (Givón, 2014; Newmeyer, 2000; Gell-Mann & Ruhlen, 2011). On top of that, research has provided evidence which shows that people prefer using SOV when they communicate using solely improvised gesture, irrespective of the word order of their native language (Goldin-Meadow, So, Özyürek, & Mylander, 2008). While these are possible explanations for the dominance of SOV in the languages of the world, they do not explain why SVO is dominant as well.


Further research has revealed that SOV and SVO are used to distinguish semantic properties of sentences in the situation of an emerging linguistic system (Motamedi, Schouwstra, Culbertson, Smith, & Kirby, 2017; Gibson et al., 2013; Hall, Mayberry, & Ferreira, 2013; Schouwstra & de Swart, 2014). One of these studies has suggested that the split between the use of SOV and SVO in early emerging languages results from a similar semantic distinction between extensional and intensional events (Schouwstra & de Swart, 2014). The latter involve objects which are non-specific, possibly imaginary or dependent on the verb; these are exemplified in examples 1, 2 and 3 respectively. Extensional events are all other events, for example "the boy throws the ball" (Schouwstra & de Swart, 2014; Forbes, 2020).

(1) The girl wants a bike.
(2) The boy saw a monster.
(3) The woman makes a sandwich.

They tested their hypothesis by letting participants use improvised silent gesture to describe an event shown on a picture. The participants had no linguistic rules or information to rely on due to the silent gesture component of this experiment, which ensured that the situation of an emerging language was resembled as closely as possible. Additionally, they did not have a conversation partner; rather, the experiment purely focused on the improvisation aspect. The word order of participants' productions seemed to depend on the event type of the message they were conveying. They used SOV for expressing extensional events and SVO for expressing intensional events, irrespective of their native language. In addition to this research that primarily focused on the effect of semantics on production, there also exists a similar study which focused on how the word order of a silently gestured message influences the interpretation of the semantics of the message (Schouwstra, de Swart, & Thompson, 2019). They showed that participants were more likely to interpret an event as extensional when the word order was SOV, and the same holds for SVO and intensional events. Thus, the results of these experiments provided convincing evidence in favor of semantics playing a crucial role in the evolution of word order. The claim that all word orders evolved from SOV could therefore be too simplistic an account of how dominant word orders became established in the majority of languages. In contrast to these studies that have investigated word order in the absence of a linguistic system, there has also been research on word order in the presence of one. As mentioned before, most languages have a fixed word order for describing who did what to whom (Dryer, 2013). A possible explanation for this could be a cognitive bias that favours simpler rule systems, which is referred to as simplicity bias (Culbertson & Kirby, 2016). This bias predicts that people prefer to minimize random variation in language. Thus, it can be argued that the regularized word order that is often found in established languages has resulted from minimizing the variation in possible word orders by decreasing their dependence on semantics over time.


Being provided with this information, one might wonder about how languages evolve from having a natural word order in an emerging linguistic system, to having a regular word order in an established language. It has been mentioned in different studies that there are three main mechanisms that interactively impact the evolution of language: improvisation, interaction and cultural transmission (Schouwstra, Motamedi, Smith, & Kirby, 2016; Lekvam et al., 2014). When there are no linguistic rules to rely on, people must improvise to communicate with others. Repeating this communication by interaction with each other will create a common ground between individuals, which will eventually be passed on to the next generation (Schouwstra et al., 2016). These mechanisms and their effect on the emergence of linguistic conventions for word order have been researched (Schouwstra et al., 2020).

This research consisted of three different experiments in which participants used improvised gesture to describe an intensional or extensional event shown on a picture. The influence of communication and vertical cultural transmission has been researched in experiment 1 by repeating the following process: letting two participants communicate for thirty-two trials in the presence of an observer, which was referred to as a generation, and then letting the observer communicate with a new participant. The gesturing participant was called the producer and they gestured a message to another participant who was called the interpreter.

Experiment 2 had the same setup as experiment 1, except that it investigated the influence of communication and horizontal cultural transmission by letting fixed pairs of participants communicate repeatedly. As opposed to experiment 1, there was no generational transfer in this experiment, since no new participants were introduced. Experiments 1 and 2 gave similar results. At the beginning of both experiments, participants preferred SOV for communicating about extensional events and SVO for intensional events. At the end of both experiments the word order became more regular, meaning that word order became less dependent on event type, denoting the emergence of linguistic conventions for word order. The similarity between the results of these two experiments indicated that cultural transmission is not necessary for word order to regularize. There exist similar observations from another study which focused on the emergence of compositional linguistic structure, which showed that generational transmission was not necessary for compositional structure to arise (Raviv, Meyer, & Lev-Ari, 2019).

SVO was most often the resulting dominant order in experiments 1 and 2, which could possibly be explained by it being the order of the native language of the participants. However, experiment 3 showed that event frequency influenced which word order became conventionalized, by using unbalanced frequencies of the event types that were shown to the participants. SOV became the dominant word order instead of SVO when the events being expressed were predominantly extensional. Moreover, the results also revealed that previously encountered word orders had an influence on the participants' word order use, which is referred to as structural priming; this can be divided into comprehension-priming and production-priming and has been defined in section 1 (Schouwstra et al., 2020). Taken together, the results of the three experiments demonstrated that naturalness weakens and regularity strengthens as linguistic conventions, in this case for word order, emerge through repeated communication in an emerging linguistic system.

2.2 Agent Based Models for Language Evolution

The research that has been presented so far has been primarily experimental, although the use of agent based simulations is also a popular method for researching the underlying processes of language evolution (Lekvam et al., 2014; Kirby & Hurford, 2002; Smith et al., 2003; Steels, 1995). Such models are simplified versions of the world in which agents act and interact and thereby affect the state of the world (Lekvam et al., 2014). They have been proven to be powerful, because they are far more flexible than real world experiments; the number of trials, the number of participants and the conditions of the simulated experiment are all mutable. Combining computational models and experimental work has also been proven to be an effective way of studying the evolution of language (Schouwstra et al., 2019; Culbertson & Smolensky, 2012; Kirby, Tamariz, Cornish, & Smith, 2015) and this work is also an example of such a combination.

The connection between horizontal or vertical cultural transmission and language evolution is a popular topic of study when using agent based models. Horizontal transmission involves transmitting knowledge within a generation (Gong, 2010), meaning that the information flow stays within one generation, which is illustrated in figure 1. Figure 2 pictures vertical transmission, which involves communication of an agent with another agent from a younger generation (Gong, 2010). Despite Schouwstra et al. (2020) describing their experiment 1 as involving transmission and experiment 2 as not involving transmission, experiment 2 did involve some kind of horizontal transmission and experiment 1 involved vertical transmission.

Figure 1: Horizontal cultural transmission

Figure 2: Vertical cultural transmission

Kirby’s work concerning the iterated learning model has been very influential in the field of agent based modelling for language evolution (Kirby, 2001; Kirby & Hurford, 2002). His models are based on the assumption that linguistic structure arises through vertical cultural transmission and the iterated learning model provides a framework for studying this kind of cultural transmission. The essence of iterated learning is the repeated process of a generation producing a set of utterances that are then observed by the next generation, which continues until a stable end state has been reached.

Contrary to the iterated learning model, Steels introduced a model for simulating horizontal cultural transmission (Steels, 1995). The model consisted of several agents in a two-dimensional grid where they could move freely. A speaker described an object and the hearer attempted to identify the object; the speaker could then either confirm or reject the hearer's guess. This chain of events was repeated until the agents reached consensus about the intended object or ended when reaching consensus was impossible. Their vocabulary was updated based on communicative success. The model revealed that a common vocabulary emerged among agents through this repeated updating.

As was mentioned in section 2.1, research has indicated that vertical transmission was not necessary for the regularization of word order; horizontal transmission alone could also accomplish regularization (Schouwstra et al., 2020; Raviv et al., 2019). This was the reason that the model used in this research simulated the special kind of horizontal cultural transmission from experiment 2 that was mentioned in section 2.1 (Schouwstra et al., 2020), similar to the work of Steels (1995), the only difference being that communication happened only within fixed pairs, instead of free communication within a generation.

2.3 Learning Through Interaction

The agents in agent based models can learn by updating their beliefs about the state of the world based on information acquired through observation of the world and other agents in particular (Steels, 1995; Kirby & Hurford, 2002; Banerjee, 1992; Zeng & Sycara, 1998). The actions of others reveal information about their beliefs, which can be used by another agent to improve its decision. There are two main methods which agents can use to do this, namely Bayesian or non-Bayesian learning. First, a general overview of Bayesian learning has been given, which is followed by an overview of non-Bayesian learning, the learning method that has been used in this research.


Bayesian learning is a popular rational method for learning in agent based models. There are multiple studies in which agents use this learning strategy (Banerjee, 1992; Zeng & Sycara, 1998). The actions of an agent are observed by others and used in combination with their private information to update their beliefs about the world. Updating beliefs about the state of the world being S given the observations O is done using Bayes' rule, which is shown in equation 1 (Zeng & Sycara, 1998). The posterior probability P(S|O) that results from this is the new prior probability to update in the next iteration. Here, P(S) is their prior belief about the state of the world being S, P(O|S) is referred to as the likelihood of the observations given a word order, and P(O) is the probability of the observation O occurring. Agents repeatedly updating their beliefs eventually results in a consensus between agents about the true state of the world.

P(S|O) = \frac{P(O|S) \cdot P(S)}{P(O)}    (1)
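As a concrete illustration of one such update, the following sketch applies Bayes' rule once to a prior over two hypothetical states of the world; the state names, likelihoods and numbers are invented for the example and do not come from the thesis.

def bayes_update(prior, likelihood, observation):
    """One application of Bayes' rule (equation 1) over a finite set of states S."""
    # P(O): total probability of the observation under the current prior.
    p_obs = sum(prior[s] * likelihood[s][observation] for s in prior)
    # P(S|O) for every state S; this posterior becomes the prior of the next iteration.
    return {s: prior[s] * likelihood[s][observation] / p_obs for s in prior}

# Illustrative states (the partner's dominant word order) and likelihoods.
prior = {"SOV-dominant": 0.5, "SVO-dominant": 0.5}
likelihood = {"SOV-dominant": {"SOV": 0.8, "SVO": 0.2},
              "SVO-dominant": {"SOV": 0.2, "SVO": 0.8}}
print(bayes_update(prior, likelihood, "SOV"))  # {'SOV-dominant': 0.8, 'SVO-dominant': 0.2}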

Despite the popularity of Bayesian learning, non-Bayesian learning is a more suitable learning strategy for the desired model. This method, where agents learn by linearly combining their own beliefs with those of others, has been introduced in a model named the De Groot model for social learning (DeGroot, 1974). An agent has Bayesian prior beliefs at the beginning of the simulation and updates them by computing the weighted sum of the beliefs of all N agents, including its own, with the weights summing to one. This updating rule has been stated in equation 2, where B stands for the belief of an agent and p is a weight assigned to the belief of an agent.

B_i = \sum_{j=0}^{N} p_{ij} \cdot B_j    (2)

Agents cannot directly observe the beliefs of other agents, which is why they need observations to infer them. For this particular problem of word order regularization, the belief of the other agents would correspond to which word order they have used: in other words, the comprehension-priming component of structural priming. Additionally, the semantics of the intended message and the production-priming component must also be included in the linear combination, since they also influence the word order of a production (Schouwstra et al., 2020). The ease of adding or taking away factors in the equation makes this learning strategy flexible. In addition, the computation is less costly in comparison to the Bayesian updating formula (Molavi, Tahbaz-Salehi, & Jadbabaie, 2018; Liu, Fang, Wang, & Wang, 2014), where several probabilities need to be calculated at each iteration, especially in situations where there are many variables and observations to consider.
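A minimal sketch of one De Groot update step (equation 2) is given below, assuming two agents with row-stochastic weights; the numbers are illustrative.

import numpy as np

def degroot_update(beliefs, weights):
    """One De Groot update step (equation 2): every agent's new belief is a
    weighted average of all agents' current beliefs; each row of weights sums to one."""
    weights = np.asarray(weights, dtype=float)
    beliefs = np.asarray(beliefs, dtype=float)
    return weights @ beliefs

# Two agents that weigh their own belief 0.7 and the other agent's belief 0.3.
beliefs = np.array([1.0, 0.0])
weights = np.array([[0.7, 0.3],
                    [0.3, 0.7]])
for _ in range(5):
    beliefs = degroot_update(beliefs, weights)
print(beliefs)  # both beliefs move towards the shared value 0.5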

Classic non-Bayesian learning assumes that the weights that are used for linearly combining the beliefs are constant, while other research has shown that it is also possible to let the proportions in which agents include their own beliefs and those of the other agents change at each time point (Liu et al., 2014). This means that different weights can be used for updating the beliefs of the agents at each iteration. This technique involved computing the weights using a time dependent equation, an example of which has been given in equation 3. They proposed a model in which η(t) corresponded to the weight assigned to the belief of other agents and 1 − η(t) corresponded to the weight assigned to their own belief at time t. This resulted in the agents being more reliant on the belief of the others at first, but gradually including more of their own beliefs in their decision making as time progressed.

η(t) = \frac{1}{t}    (3)

This possibility is another advantage of the non-Bayesian approach, since it allows implementation of the assumption that the relative contributions of comprehension-priming, production-priming and semantics to the regularization of word order are not constant (Schouwstra et al., 2020). More precisely, the assumption is that naturalness, which is expected to be related to semantics, makes way for regularity, which is expected to be related to structural priming, as time progresses.

3 Method

The investigation of the relative contributions of structural priming and semantics to the emergence of linguistic conventions for word order required mimicking the communication between two individuals in an emerging linguistic system. An agent based model that simulates word order regularization through horizontal cultural transmission has been used to do this, because such models are flexible and suitable for simulating complex phenomena of this kind, as has been mentioned before in sections 1 and 2.2. The goal for the simulation was to accurately represent the emergence of linguistic conventions for word order, for which the data from experiment 2 in the research of Schouwstra et al. (2020) were held as a standard. The manner in which regularity evolved through generations in the experimental data has been used to calculate several sets of time dependent parameters for the agent based model (Liu et al., 2014). The simulation involved two agents, which took turns producing a word order based on the type of event that was going to be communicated about and the previous observations of themselves and the other agent, in proportions that were determined by these time dependent parameters. The resulting data from several differently set up simulations were compared in order to form conclusions about the relative contributions of semantics and structural priming to the emergence of linguistic conventions for word order.

3.1 The Analysis of the Experimental Data

The data collected in experiment 2 (Schouwstra et al., 2020), which was mentioned in section 2.1, has been used for constructing the model. All future references to "the experiment" refer to experiment 2. This experiment consisted of twelve pairs of participants, alternating between the roles of producer and interpreter, who communicated using improvised gesture for six generations of thirty-two trials each, meaning that the data consisted of 2304 entries. The most important information present in the data were the word order and corresponding event type of each production. Since this research focused on the semantic distinction between the use of SOV and SVO, the trials in which another order was produced were deleted from the data using the Pandas library; all of the programming in this research has been done using Python 3.8.5. The first step in constructing the model was getting more insight into the experimental data. There were three measures that could provide information about the regularity of word order in the data, namely the proportion of natural utterances, entropy and mutual information. The proportion of natural utterances is a straightforward measure: it shows the proportion of utterances in which the combination of word order and event type was natural. In other words, the proportion natural is one when the order-semantics combination is always SOV-extensional or SVO-intensional. Entropy is a measure which reveals the uncertainty or surprise of the outcomes of word order (Shannon, 1948). Mutual information is related to entropy, but quantifies the dependence between event type and word order (Shannon, 1948). It has been calculated by subtracting the conditional entropy of word order given event type from the entropy of word order, as can be seen in equation 4 below. Equation 5 shows the calculation of the entropy and equation 6 shows the calculation of the conditional entropy, where X represents the set of possible word orders and Y represents the set of possible event types.

I(X; Y) = H(X) − H(X|Y)    (4)

H(X) = −\sum_{x \in X} P(x) \cdot \log P(x)    (5)

H(X|Y) = −\sum_{x \in X} \sum_{y \in Y} P(x, y) \cdot \log \frac{P(x, y)}{P(y)}    (6)

These measures were calculated from the experimental data for each of the six generations and for each participant pair. On top of that, the averages of these measures over all pairs were calculated for all generations. The measures and average measures for each pair were plotted together in a graph using the Matplotlib library. In addition to this, the exponential function in equation 7 has been fitted to the average curve for each of the measures using the Scipy library. The shape of the mean curves hinted that an exponential function was an appropriate fit. Moreover, the entropy and mutual information cannot be smaller than zero and the proportion of natural utterances is not expected to go below 0.5 (Schouwstra et al., 2020). If the participants had used just one word order throughout the entire experiment, the proportion of natural utterances would have been 0.5, because the word order would have been considered natural only for the half of the trials with the matching event type, due to both event types occurring in equal proportions. An exponential function like the one in equation 7 meets all of these restrictions. Fitting the function to the curve of the naturalness measure required adding +0.5, because the proportion of natural utterances in a generation was not expected to go below 0.5 (Schouwstra et al., 2020).

f(x) = a \cdot e^{k \cdot x}    (7)

To evaluate the quality of the fit, the mean squared error (MSE) between vectors containing the average measures of the experimental data (X) and the corresponding estimated points from the fitted curve (X̃) has been calculated for all three graphs using the Sklearn library. The MSE was defined as follows, with N being the number of data points on the mean curve:

MSE(X, \tilde{X}) = \frac{1}{N} \sum_{i=0}^{N} (x_i − \tilde{x}_i)^2    (8)
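To make the analysis pipeline concrete, the sketch below computes the three measures from a list of (word order, event type) trials, fits equation 7 with scipy and evaluates equation 8 with sklearn. The example data, the choice of log base 2 and all names are illustrative assumptions, and the +0.5 offset used for the naturalness fit is omitted.

import numpy as np
from collections import Counter
from scipy.optimize import curve_fit
from sklearn.metrics import mean_squared_error

def proportion_natural(trials):
    """Share of trials whose order-semantics pairing is natural
    (SOV with an extensional event or SVO with an intensional event)."""
    natural = {("SOV", "extensional"), ("SVO", "intensional")}
    return sum(1 for t in trials if t in natural) / len(trials)

def entropy(labels):
    """Shannon entropy (equation 5); log base 2 is an assumption."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def mutual_information(trials):
    """I(order; event type) = H(order) - H(order | event type) (equations 4 and 6)."""
    orders = [o for o, _ in trials]
    h_conditional = 0.0
    for event in {e for _, e in trials}:
        subset = [o for o, e in trials if e == event]
        h_conditional += len(subset) / len(trials) * entropy(subset)
    return entropy(orders) - h_conditional

def exponential(x, a, k):
    """Equation 7: f(x) = a * exp(k * x)."""
    return a * np.exp(k * x)

# Example measures for one generation of (order, event type) trials:
gen = [("SOV", "extensional"), ("SVO", "intensional"), ("SVO", "extensional"), ("SOV", "extensional")]
print(proportion_natural(gen), entropy([o for o, _ in gen]), mutual_information(gen))

# Hypothetical mean mutual information per generation (six generations).
generations = np.arange(6)
mean_mi = np.array([0.49, 0.40, 0.33, 0.28, 0.23, 0.19])

# Fit the exponential to the mean curve and evaluate the fit (equation 8).
(a, k), _ = curve_fit(exponential, generations, mean_mi, p0=(0.5, -0.2))
mse = mean_squared_error(mean_mi, exponential(generations, a, k))
print(a, k, mse)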

3.2 The Model

This section elaborates on the structure of the hypothesized model where the influence of semantics on the decision making of the agents decreases and that of structural priming increases over time, with the latter being implemented as long term structural priming where older observations are discounted. It also serves as a detailed overview of the general setup of the model, which is followed by an overview of different weights that have been compared in section 3.3.1 and a comparison of this model to models in which other kinds of structural priming have been implemented in section 3.3.2. These comparisons were necessary for the evaluation of the hypothesis.

3.2.1 The Agents

Simulating the regularization of word order as it proceeded in experiment 2 required two agents communicating exclusively with each other (Schouwstra et al., 2020). Such an agent needed three attributes:

1. A role in the conversation, which could be either producer or interpreter
2. Internal preferences for SOV and SVO word order, which are represented as preference scores
3. A memory in which their previous productions and the previous productions of the other agent are stored

A producer agent could produce a word order by treating the SOV and SVO preference scores as probabilities of choosing that order and using these probabilities to sample a word order using Numpy. However, during the first trial, an agent had no observations of itself or its partner on which to base its preferences. This has been solved by making these scores the conditional probabilities of order given event type, calculated from the first three trials of all the first generations in the experimental data, ensuring an accurate reflection of word order choices made under the condition of having no prior knowledge available other than the event type that is to be expressed. The ideal reflection of this situation would have been to use only the first trials, but that amount of data was not sufficient to calculate all the necessary conditional probabilities. At the beginning of each communication trial, the producer agent updated its preference scores by linearly combining the event type that was to be communicated and structural priming, which has been divided into comprehension- and production-priming. This has been formalized in equations 11 and 12 and visualized in figure 3, where the process of the producer agent incorporating the structural priming and semantic aspects, resulting in a production, has been pictured together with the mutual dependencies between the different components of the model.

Figure 3: Visualization of a trial from the simulation. The producer agent (white) is influenced by the event type that is to be described and all previously encountered productions from itself and the other agent, both indicated by the dotted lines. The produced word order is observed by the grey interpreter agent and by itself, indicated by the dashed lines.


In equation 11, X corresponds to the collection of previous productions of an agent itself, Y to the collection of previous productions of the other agent and z to the event type that is going to be communicated about. This means that in equation 11, f(X, n) refers to production-priming and f(Y, n) to comprehension-priming, X and Y being the memory of an agent. Equations 11 and 12 have been used to calculate the preference scores: subtracting the SOV preference score from one resulted in the SVO preference score, because the scores are probabilistic. Equation 9 explains the structural priming component f(X, n) in more detail. It has been calculated as hypothesized: a weighted average over an agent's memory in which older memories were discounted, where p is a weight and x an item from the agent's memory. Other kinds of structural priming have been discussed and compared to this implementation in section 3.3.2. The outcomes of that equation were rounded to either zero or one, depending on which is closest, corresponding to an SVO or SOV preference respectively. The weights for long term structural priming have been chosen based on the slope of human memory decay, for which a hyperbolic function has been shown to be a suitable representation, because it balances the trade-off between goodness of fit and complexity for this phenomenon (Lee, 2004). This is why the weights for this formula were drawn from the hyperbolic function 1/t, older memories weighing less than recent memories. However, there were also cases in which the agent had either no observations or no productions yet, which was the case in the second and third trial. This has been solved by not letting structural priming split into comprehension- and production-priming, but letting it be fully defined by either the productions or the observations during those trials.

The semantic component from equation 11 is represented by g(x), which is explained in equation 10. The given event type was the input variable, which was mapped to either zero or one, formalizing the observation that participants generally preferred SOV for extensional and SVO for intensional events (Schouwstra et al., 2020; Schouwstra & de Swart, 2014). For example, g(extensional) = 1 when computing an SOV score with the event type being extensional, because SOV was preferred for such an event by the majority of participants.

f(X, n) = \left\lfloor \frac{\sum_{t=0}^{n} p_t \cdot x_t}{\sum_{t=0}^{n} p_t} \right\rceil    (9)

g(x) = \begin{cases} 1 & \text{if } x = \text{extensional} \\ 0 & \text{otherwise} \end{cases}    (10)

score_{SOV}(X, Y, z, n) = p_1 \cdot (p_3 \cdot f(X, n) + p_4 \cdot f(Y, n)) + p_2 \cdot g(z)    (11)

score_{SVO}(X, Y, z, n) = 1 − score_{SOV}(X, Y, z, n)    (12)
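A minimal sketch of how a producer agent could compute its SOV preference score from equations 9, 10 and 11 is given below. The coding SOV = 1 and SVO = 0 follows the text; the function names, the exact form of the 1/t discounting and the example weights are illustrative assumptions.

import numpy as np

def priming(memory):
    """Equation 9: weighted average over a memory of word orders (SOV = 1, SVO = 0),
    with hyperbolic 1/t discounting of older items, rounded to 0 or 1."""
    if not memory:
        return 0.0
    n = len(memory)
    # The most recent item (last in the list) gets weight 1, older ones 1/2, 1/3, ...
    w = np.array([1.0 / (n - t) for t in range(n)])
    return float(round(np.dot(w, memory) / w.sum()))

def semantics(event_type):
    """Equation 10: SOV (1) is the natural choice for extensional events, SVO (0) otherwise."""
    return 1.0 if event_type == "extensional" else 0.0

def score_sov(own_memory, other_memory, event_type, p1, p2, p3, p4):
    """Equation 11: combine production-priming f(X, n), comprehension-priming f(Y, n)
    and the semantics of the event g(z) into an SOV preference score."""
    structural = p3 * priming(own_memory) + p4 * priming(other_memory)
    return p1 * structural + p2 * semantics(event_type)

# Example with illustrative weights; the SVO score follows from equation 12.
sov = score_sov([1, 1, 0], [1, 0, 0], "extensional", p1=0.6, p2=0.4, p3=0.7, p4=0.3)
print(sov, 1.0 - sov)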


3.2.2 The Simulation

A simulation started off with the creation of two agents, a producer and an interpreter, which both had the same initial preference scores. The producer agent used these scores to produce the first word order, which could have been either SOV or SVO. A single trial of a simulation has been visualized in figure 3, where the two different types of agents and what influenced them are pictured. Simulations consisted of a hundred communicating pairs, which all communicated for seventy generations consisting of thirty-two trials each. Simulating only a few pairs could have caused too large fluctuations across simulations and in turn unstable results, due to the probabilistic component of the decision making. Moreover, there were more generations than in the experiment in order to gain more insight into word order regularization over a longer time period. After the first trial, when both agents had the ability to collect external information, the updating of their scores after each trial happened according to the updating rule in equation 11. As in the experiment, the intensional and extensional event types occurred in equal proportions. The producer chose SOV or SVO based on its calculated preference scores when given an event type at the beginning of a trial, after which the agents switched roles. The process of a whole simulation has been outlined in algorithm 1. The simulation yielded a data set similar to that of the experiment, which has been visualized in the same way as has been done for the experimental data.

Algorithm 1 Process of the simulation for one conversation pair

model = Model(weights, initial_scores)
for i generations do
    semantics = generate desired event types for each trial
    for j trials do
        if i = 0 and j = 0 then
            producer.update_preferences(initial_scores, semantics_j)
            production = producer.produce(semantics_j)
            interpreter.observe(production)
            switch roles of producer and interpreter
        else
            producer.update_preferences(weights_i, semantics_j)
            production = producer.produce(semantics_j)
            interpreter.observe(production)
            switch roles of producer and interpreter
        end if
    end for
    save productions of both agents
end for
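A rough sketch of the simulation loop of algorithm 1 for a single conversation pair could look as follows. The scoring function is passed in (for instance the score_sov sketch from section 3.2.1), the special handling of the first few trials in which priming cannot yet be split is simplified away, and the initial scores in the usage comment are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def run_pair(score_sov, weights_per_generation, initial_sov_scores, generations=70, trials=32):
    """One conversation pair following algorithm 1; returns a list of (word order, event type).
    score_sov(own, observed, event, p1, p2, p3, p4) is assumed to implement equation 11."""
    own = {"A": [], "B": []}        # each agent's own previous productions (X)
    observed = {"A": [], "B": []}   # productions observed from the partner (Y)
    producer, interpreter = "A", "B"
    productions = []
    for i in range(generations):
        # Intensional and extensional events occur in equal proportions, as in the experiment.
        events = rng.permutation(["extensional", "intensional"] * (trials // 2))
        for j, event in enumerate(events):
            if i == 0 and j == 0:
                sov = initial_sov_scores[event]
            else:
                p1, p2, p3, p4 = weights_per_generation[i]
                sov = score_sov(own[producer], observed[producer], event, p1, p2, p3, p4)
            order = "SOV" if rng.random() < sov else "SVO"
            productions.append((order, event))
            coded = 1 if order == "SOV" else 0
            own[producer].append(coded)
            observed[interpreter].append(coded)
            producer, interpreter = interpreter, producer
    return productions

# Usage sketch (with the score_sov function from section 3.2.1 and a list of
# (p1, p2, p3, p4) tuples per generation; the initial scores here are illustrative):
# data = run_pair(score_sov, weights, {"extensional": 0.8, "intensional": 0.3})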

3.3 The Comparison of Different Model Implementations

3.3.1 The Weights

The updating of the preferences after each trial was dependent on the weights that were used in the updating formula in equation 11. The selection of these weights was important, because they determined the productions of the agents and thereby the outcome of the simulation, despite the slight variability caused by the probabilistic aspect of the scores. Fitting a function to the mean proportion of natural utterances, the mean entropy and the mean mutual information curves of the experimental data revealed two parameters belonging to an exponential curve, represented by a and k in equation 7. This resulted in three sets of two parameters, which were used to calculate weights that determined the relative contributions of semantics and structural priming to the agents' word order preferences in the model.

Each parameter set has been used to calculate time-varying weights per generation by filling in the parameters in equation 7, where x symbolizes the time in generations. The outcome of equation 7 was the weight associated with the semantics of the event in generation t, which is stated as p2 in equation 11. The weight associated with structural priming was 1 − η(t), which corresponds to p1. Equation 11 shows that structural priming was defined by two types of priming, namely comprehension-priming and production-priming, represented by f(X, n) and f(Y, n). A simplifying assumption has been made, namely that these two types of priming contributed in the same proportions as semantics and structural priming: p4 = p1 from equation 11 was associated with production-priming and p3 = p2 was associated with comprehension-priming. This strategy translated the expectation that the development of regularization in the experimental data was related to structural priming, comprehension-priming in particular, becoming more important and semantics less important in the participants' decision making as time progresses.
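The weight computation described above could be sketched as follows, turning one fitted (a, k) parameter set into per-generation weights for equation 11; the p3/p4 pairing mirrors the text, and the example parameter values are the mutual information fit from table 1.

import numpy as np

def weights_per_generation(a, k, generations=70):
    """Turn one fitted (a, k) parameter set into per-generation weights for equation 11."""
    weights = []
    for t in range(generations):
        eta = a * np.exp(k * t)   # equation 7: the weight of semantics in generation t
        p2 = eta                  # semantics
        p1 = 1.0 - eta            # structural priming
        p3 = p2                   # priming weight paired with p2 (see text)
        p4 = p1                   # priming weight paired with p1 (see text)
        weights.append((p1, p2, p3, p4))
    return weights

# Example with the mutual information parameters from table 1.
weights = weights_per_generation(a=0.49, k=-0.18)
print(weights[0], weights[45])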

The remaining task was to determine which of the three sets of weights was the best option for modelling the regularization of word order, which has been done by comparing the output of each simulation performed using one of the weight sets to the experimental data. The simulated data consisted of the agents' productions and the corresponding event types and have been visualized using the same procedure that was used for the experimental data. The different simulations were evaluated by measuring their similarity to the experimental data. The MSE between each of these different simulation data sets and the experimental data was calculated for the proportion of natural utterances, entropy and mutual information. The sum of these three MSE values has been used as the measure of similarity between the experiment and the data of a simulation: a lower value meant a higher similarity. This procedure is clarified in algorithm 2. No comparison has been made between the produced word orders per trial of the experiment and the simulation, because optimizing such a similarity would risk overfitting the simulated data to the experimental data. It was more important to make the simulation resemble the general flow of the regularity through generations of the real world data.

Algorithm 2 Calculation of the similarity between the simulation data and the experimental data

similarity = 0
for all 3 fitted curves of simulation do
    similarity += MSE(curve, corresponding curve from experimental data)
end for
return similarity

3.3.2 Structural Priming

Alongside the determination of the optimal weights, research on the underlying processes of structural priming has been conducted. Comparisons of three different kinds of structural priming were made to evaluate the hypothesis that structural priming is a process of taking a weighted average over the entire memory, discounting older observations. Models utilizing short term structural priming, long term weighted structural priming and long term non-weighted structural priming have been compared to each other to test this. Short term structural priming was implemented by making the comprehension-priming component the one previous observation of the other agent's production and the production-priming component the one previous production of the agent itself; these corresponded to SVO or SOV coded as zero or one respectively for f(X, n) and f(Y, n) in equation 11. The long term weighted version was the hypothesized type of structural priming and has been thoroughly explained in section 3.3.1. The non-weighted condition worked the same, but used a regular average over the entire memory instead of a weighted average, meaning the values of p in equation 9 were all equal. The data resulting from these simulations have been visualized in the same way as for the experimental data. After that, the similarity to the experimental data has been measured according to algorithm 2: the MSEs between each of these three different simulation data sets and the experimental data were calculated for the proportion of natural utterances, entropy and mutual information, and the sum of these three MSE values has been used as the measure of similarity between the experimental data and the data of a simulation.
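As an illustration, the three priming conditions can be written as variations on the structural priming component of equation 9; the coding SOV = 1 / SVO = 0 and the hyperbolic weights follow the text, while the function names are assumptions.

import numpy as np

def short_term_priming(memory):
    """Short term condition: only the single most recent item in memory counts."""
    return float(memory[-1]) if memory else 0.0

def long_term_priming(memory, weighted=True):
    """Long term conditions: average over the entire memory (equation 9), either with
    hyperbolic 1/t discounting of older items or with all weights equal."""
    if not memory:
        return 0.0
    n = len(memory)
    w = np.array([1.0 / (n - t) for t in range(n)]) if weighted else np.ones(n)
    return float(round(np.dot(w, memory) / w.sum()))

memory = [1, 1, 0, 1, 0]   # five remembered productions, most recent last
print(short_term_priming(memory),
      long_term_priming(memory, weighted=True),
      long_term_priming(memory, weighted=False))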

Verifying the prediction that the relative influence of production-priming on the word order of the participants' productions decreased, whereas that of comprehension-priming increased over time, was also necessary. This was done by running all simulations that have been discussed before with the weights for production-priming and comprehension-priming reversed and comparing them to the simulations where the weights were not reversed. In other words, p3 and p4 from equation 11 were switched, and the outcomes of these two cases were compared to each other in order to evaluate the hypothesis of the increase in influence of comprehension-priming and the decrease in that of production-priming on word order over time. This comparison has been done according to algorithm 2, as before.

In order to fully test the hypothesis, another step needed to be taken, namely investigating the prediction that the regularization in the experiment resulted from a decreased effect of semantics and an increased effect of structural priming on the productions of participants over time. This has been done by reversing their weights in a similar way as discussed in the previous paragraph, meaning that p1 and p2 in equation 11 were switched. The results of these simulations have been visualized and compared to the graphs of the corresponding non-reversed conditions.

3.3.3 The Optimal Model

Out of all compared models, the model with the optimal trade-off between computational cost and goodness of fit was considered to be the best for modelling the emergence of linguistic conventions for word order. The different weight sets, the reversing of weights and the different kinds of structural priming were all taken into account in the search for this optimal model. However, it is important to note that small variations could occur in the resulting MSE values when executing the same model repeatedly, due to the probabilistic component of the agents' decision making. The optimal model has been used to form conclusions about the relative contributions of structural priming and semantics to the regularization of word order and also to make predictions about this regularization over a longer time period.

4 Results

4.1 The Analysis of the Experimental Data

Analysing the experimental data resulted in graphs of the proportion naturalness, entropy and mutual information per generation, shown in figure 4. The goodness of fit has been quantified by the MSE between the black mean line and the red exponential curve that has been fitted to the mean line; the results of those can be seen in table 1. These values are low, meaning that an exponential fit was indeed a good fit and that the mean curve of mutual information was the hardest to fit. Looking at the graphs in figure 4, it can be seen that all mean curves decrease faster in the beginning than in later generations. Looking closer, it seems that the mean entropy and mutual information curves in figures 4b and 4c would not go below 0.4 and 0.3 respectively if there were more generations. Despite this observation, the decision has been made not to include this in the exponential fitting function, because there was no evidence for it. Other arguments have already been mentioned in section 3.1 and have been discussed further in section 5. The parameters of the fitted curves are stated in table 1 and they were used to calculate the different sets of weights for the agents' preference updating function in the simulation.

Figure 4: The resulting plots of the analysis of the experimental data: (a) proportion naturalness, (b) entropy, (c) mutual information. The dashed lines represent the individual pairs, the black line represents the mean and the red line represents the fitted curve.

                     MSE      a      k
Naturalness          0.0003   0.23   -0.22
Entropy              0.0008   0.56   -0.10
Mutual information   0.0026   0.49   -0.18

Table 1: The quality of fit (MSE) and parameters of all three exponential curve fits


4.2 The Agent Based Model

According to the results of the analysis of the experimental data in table 1, the weights based on the slope of the proportion naturalness of the experimental data are calculated using the function η(t) = 0.23 · e^{−0.22·t}. Second, the weights based on the slope of entropy are calculated with η(t) = 0.56 · e^{−0.10·t}. Finally, those based on the slope of mutual information are calculated using the function η(t) = 0.49 · e^{−0.18·t}. Table 2 shows the resulting similarities between the experimental data and the data of eighteen different simulations, each using a different combination of weight function, type of structural priming and possible reversal of weights. They have been compared to each other to discover which one is best for describing the experimental data. The simulations that made use of the weight set based on the slope of mutual information from the experimental data are visualized in figures 6, 7 and 8 in appendix A. Figure 6 shows the visualization of the data from the simulation using short term structural priming and figure 7 visualizes the simulation in which long term weighted structural priming has been implemented. Finally, figure 8 shows the results of the simulation which utilizes long term non-weighted structural priming. In all these visualizations, the right hand side shows the simulations where regular weights were used and the left hand side shows simulations where the weights for production- and comprehension-priming were switched. It is important to note that all these results are prone to slight variability, because of the probabilistic component in the decision making of the agents.

As can be seen in table 2, the simulation with the weights based on mutual information has the smallest MSE value for all conditions, except for the condition where short term priming with non-reversed weights has been used. The largest MSE is that of the simulation based on the proportion naturalness parameters, in both the reversed and non-reversed condition. When comparing the reversed to the non-reversed conditions, they seem to be similar when looking at the MSE values in table 2. However, there are visible differences when looking at the visualizations: the reversed condition shows some curves that remain horizontal and do not go down over generations in figures 6d, 7d, 8d, 6f, 7f and 8f, or curves that keep fluctuating even at later generations in figures 6b, 7b and 8b.

                             Long term priming                                  Short term priming
                             Weighted average          Regular average
                             Reversed  Non-reversed    Reversed  Non-reversed    Reversed  Non-reversed
Naturalness weights          0.073     0.079           0.108     0.089           0.051     0.016
Entropy weights              0.021     0.019           0.018     0.016           0.051     0.090
Mutual information weights   0.004     0.004           0.004     0.004           0.016     0.038

Table 2: Similarities between the simulation and the experimental data for differently set up simulations, measured by the MSE. A lower value indicates a higher similarity between the experimental data and the simulation data.


So far, only the results of the reversal of the comprehension- and production-priming weights, p3 and p4 in equation 11, have been presented. However, to be able to fully test the hypothesis, it was also necessary to compare the reversal of the semantics and general structural priming weights to the non-reversed condition: in other words, the reversal of p1 and p2 from equation 11. An example result of this can be viewed below in figure 5. This visualization is of the simulation with weights based on mutual information and long term weighted structural priming. It is clear that all three measures rise exponentially over the course of generations, which was also the case for all the other simulations.

Figure 5: An example of a simulation where the weights for structural priming and semantics have been reversed: (a) proportion naturalness, (b) entropy, (c) mutual information. The dashed lines represent the individual conversation pairs and the black line represents the mean over those pairs.

Finally, short term structural priming, long term weighted structural priming and long term non-weighted structural priming were compared to each other. The visualizations of simulations with these different kinds of structural priming are shown for their reversed and non-reversed priming weights conditions in figures 6, 7 and 8 in appendix A. It can be seen in figure 6 that short term structural priming results in fluctuation across generations concerning the proportion of natural utterances, entropy and mutual information. This heavy fluctuation is not observed for the two long term structural priming conditions in figures 7 and 8. Moreover, there does not seem to be a significant difference between the weighted and regular average long term structural priming when looking at the MSE values in table 2 and the visualizations.

Aside from attempting to recreate experiment 2 (Schouwstra et al., 2020), the model has also been extended to seventy generations in order to reveal more about the relative contributions of structural priming and semantics over a longer time period than in the experiment. The results of this have been presented in appendix A, figures 6, 7 and 8. All graphs clearly show the entropy and mutual information approaching zero and the proportion of natural utterances approaching 0.5 around the forty-fifth generation.

5 Discussion

The analysis of the different simulations has provided valuable information about the relative contributions of semantics and structural priming to the emergence of linguistic conventions for word order. First of all, the experimental data analysis revealed that word order regularized in an exponential manner. This means that word order regularized faster in earlier generations than in later generations of the experiment. The simulations were designed to regularize accordingly by basing the weights for the agents' updating equation on the slope of the experiment's regularization. This faster regularization at the start might seem counterintuitive, because naturalness has been proven to be dominant at the beginning of the experiment (Schouwstra et al., 2020). A possible explanation may be that there is more room for regularization at the beginning of the experiment, resulting in bigger differences in regularity between generations, whereas later in the experiment the fine-tuning happens, resulting in smaller differences between generations. The results of the weights based on the exponential fit of the different regularization measures can be viewed in table 2. The simulations using the weights based on the fit of mutual information generally yielded the best similarity, meaning they best described the regularization that was seen in the experimental data, which was to be expected, as mutual information contains the most information about the dependency between semantics and word order out of all three measures (Shannon, 1948).

Apart from testing the simulation with different weights, different kinds of structural priming have also been compared to each other. First of all, it is clear that short term structural priming was a worse fit to the experimental data than long term structural priming, because of the larger MSE values in table 2. The visualizations of the short term priming condition in figure 6 confirmed this, because the values of all three measures fluctuated considerably more across generations in comparison to the long term priming conditions, causing the results of the short term priming condition to be unstable and thereby unreliable. These observations indicated that structural priming is more complex than basing a sentence structure on the previously encountered one: instead, the participants seemed to have incorporated every previously encountered structure in their word order decision making, which is in agreement with the expectations that were based on previous research (Kaschak et al., 2011; Coyle & Kaschak, 2008; Hartsuiker & Kolk, 1998; Ferreira & Bock, 2006). This result is remarkable, because previous research showed a significant presence of short term structural priming in the used data (Schouwstra et al., 2020). On the other hand, the several studies on the cumulative properties of structural priming made this result less surprising. It seems that short term structural priming might have been present, but that long term effects were more suitable for modelling the gradual regularization. This outcome is a valuable addition to the existing findings, as little research has yet been done on structural priming in the context of emerging linguistic rules in the visual-manual modality.
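The difference between the two priming mechanisms can be illustrated with a small sketch. Assuming word orders are coded as strings such as "SOV" and "SVO", short term priming bases the priming component only on the most recently encountered structure, while long term priming uses the agent's entire memory. The representation and function names below are illustrative assumptions, not the thesis's implementation.

    def short_term_priming(memory):
        # Probability of producing SOV based only on the last encountered structure.
        return 1.0 if memory[-1] == "SOV" else 0.0

    def long_term_priming(memory):
        # Cumulative priming: the proportion of SOV structures over the whole memory.
        return sum(order == "SOV" for order in memory) / len(memory)

    memory = ["SOV", "SVO", "SOV", "SOV", "SVO"]
    print(short_term_priming(memory))  # 0.0, driven entirely by the last trial
    print(long_term_priming(memory))   # 0.6, driven by all encountered trials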

Moreover, the distinction between the weighted and regular condition of long term structural priming did not seem to make much difference in terms of goodness of fit to the experimental data: defining the structural priming component as a regular average over the memories of the agents gave similar results to the memory discounting strategy. Nevertheless, using a regular average instead of a weighted average decreased computational cost, because the formula for calculating the weights for discounting older memories did not have to be estimated and included in the model. Consequently, a non-weighted approach was preferred over a weighted approach, because it optimized the trade-off between goodness of fit and complexity. This means that the hypothesis about the workings of structural priming was partially incorrect, since these results indicated that participants did not discount older memories: instead, the results indicated that participants included every encountered word order in equal proportion. This signifies that the forgetting curve from Ebbinghaus did not apply to long term structural priming in this situation (Ebbinghaus, 2013), which could be because the curve was designed to reflect the discounting of consciously retrieved memories, whereas structural priming might be considered an unconscious and implicit process (K. Bock & Griffin, 2000; Ferreira & Bock, 2006; Kaschak et al., 2011).
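As an illustration of the discounting alternative that was rejected, the sketch below contrasts a plain mean over the memory with an exponentially decaying weighted mean, loosely inspired by the Ebbinghaus forgetting curve. The decay rate and the numeric coding of word orders are assumptions for illustration; the thesis's weighted formulation may differ.

    import numpy as np

    def unweighted_priming(memory):
        # Plain average: every remembered structure counts equally (the preferred variant).
        return float(np.mean(memory))

    def discounted_priming(memory, decay=0.5):
        # Weighted average: older observations receive exponentially smaller weights
        # (illustrative decay rate, loosely Ebbinghaus-inspired).
        n = len(memory)
        weights = np.exp(-decay * np.arange(n - 1, -1, -1))  # oldest observation gets the smallest weight
        return float(np.average(memory, weights=weights))

    # Memory coded as 1 for SOV and 0 for SVO, oldest observation first (illustrative).
    memory = [1, 1, 0, 1, 0]
    print(unweighted_priming(memory))   # 0.6
    print(discounted_priming(memory))   # about 0.41, pulled towards the more recent observations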

Furthermore, models with the differing factor being the reversal of the weights of comprehension- and production-priming have been compared to each other. The results in table 2 revealed that this reversal gave similar results in terms of goodness of fit. However, the visualizations of the simulation data in figures 6, 7 and 8 show that it did affect the behaviour of some agents: the right-hand graphs show certain pairs whose word order did not regularize. For example, the purple curve in figures 7b, 7d and 7f shows a pair for which word order did not regularize, indicated by the proportion naturalness, entropy and mutual information not decreasing over generations. This phenomenon was also observed in other, differently set up simulations when reversing the weights for comprehension- and production-priming. An explanation for this could be that the agents did not take the other agent's behaviour into consideration enough when choosing a word order and in turn were not able to align and regularize their word order, due to the relative influence of comprehension priming on the preferences of agents being too low throughout the simulation. Figure 4 shows that such outliers were not present in the experimental data, meaning that they should not be present in the simulation either. These observations indicate that decreasing the weights for production-priming and increasing those for comprehension-priming over time is necessary in order to prevent pairs from struggling to converge on a dominant word order. This supports the expectation that word order regularization is related to an increased effect of comprehension-priming and a decreased effect of production-priming on word order over time.
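A minimal sketch of the time-varying weighting argued for here is given below: the weight on comprehension-priming grows over generations while the weight on production-priming shrinks, and the reversed condition simply swaps the two trajectories. The exponential form and rate are borrowed from the shape of equation 13 purely for illustration; they are not the thesis's actual weight equations.

    import math

    def comprehension_weight(t, rate=0.18):
        # Grows towards 1 over generations (illustrative form, not the thesis's equation).
        return 1.0 - math.exp(-rate * t)

    def production_weight(t, rate=0.18):
        # Shrinks towards 0 over generations; complements the comprehension weight.
        return math.exp(-rate * t)

    # Non-reversed condition: production-priming dominates early, comprehension-priming later.
    for t in (0, 5, 10):
        print(t, round(production_weight(t), 2), round(comprehension_weight(t), 2))

    # Reversed condition (as tested above): swap the two trajectories.
    reversed_production_weight = comprehension_weight
    reversed_comprehension_weight = production_weight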

Moreover, the simulation with a reversal of the weights for semantics and structural priming as a whole had to be executed to fully test the hypothesis. The visualizations in figure 5 immediately showed that this condition was the worst way of attempting to model word order regularization. Instead of increasing regularity over generations like in the experiment, regularity decreased exponentially, modelling the opposite of regularization. This means that the hypothesis about regularization being connected to the increasing influence of structural priming on word order is likely to be correct.

Taken together, these findings indicate that the model with long term non-weighted structural priming and non-reversed weights based on the slope of the mutual information of the experimental data is the optimal one for describing the regularization from the experiment while minimizing model complexity. This means the optimal weight function for the simulation was the one based on the mutual information fit of the experimental data, presented in equation 13. In addition, these last two paragraphs demonstrate the effectiveness of time-varying weights for modelling the shift from naturalness to regularity (Liu et al., 2014).

η(t) = 0.49 · e^(−0.18·t)    (13)

Extending this model to simulate seventy generations revealed that the mean proportion of natural utterances approached 0.5 and the mean entropy and mutual information approached zero after forty-five generations of thirty-two trials, meaning that the agents' language fully regularized after approximately 1440 trials, as shown in figure 8 in appendix A. This suggested that a conversing pair could establish a regular word order pattern after interacting approximately 1440 times. Yet, it is important to note that this holds only for language in the visual-manual modality, since the simulations were based on an experiment where the communication consisted solely of silent gesture. It is not safe to claim that the same conditions hold for other languages.
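As a quick sanity check on these numbers, evaluating equation 13 shows that by generation forty-five the weight it defines has effectively vanished, which is consistent with regularization being essentially complete after 45 generations × 32 trials = 1440 interactions. The snippet below only spells out this arithmetic, assuming, as the discussion suggests, that η(t) scales the (decreasing) influence of semantics.

    import math

    def eta(t):
        # Equation 13: the time-varying weight fitted to the mutual information curve.
        return 0.49 * math.exp(-0.18 * t)

    print(eta(0))    # ~0.49 at the first generation
    print(eta(45))   # ~0.00015: the weight has effectively vanished
    print(45 * 32)   # 1440 trials in total after forty-five generations of thirty-two trials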

Related to this point is the uncertainty about whether real world languages in the visual-manual modality are able to fully regularize, despite the simulations being designed to do so. The mean curves in figures 4b and 4c flatten at some point and do not seem able to reach zero, indicating that the assumption about the form of the exponential fit might have been incorrect. As a matter of fact, previous research has shown that some sign languages do not fully regularize (Napoli, Spence, & de Quadros, 2017; Flaherty, Schouwstra, & Goldin-Meadow, 2018; Napoli & Sutton-Spence, 2014). These studies have provided evidence that traces of the semantic distinction between intensional and extensional events still surface in word order structure in both younger and older sign languages. This distinction has not been researched for spoken natural language yet, but research on the influence of other semantic factors on word order in a spoken language called Odawa has indicated that semantically conditioned variability is present there (Christianson & Ferreira, 2005). On top of that, results from another research project showed that cognitive and semantic factors can cause word order variation in English (J. K. Bock & Warren, 1985). These findings might relate to the naturalness pattern that has been observed in this research. Altering the model according to these findings might have been a good decision if there had been more information about whether and when the regularization was supposed to come to an end. The fact that the experiment stopped after six generations limits the certainty with which conclusions about this can be made.

The final point of discussion concerns the agents' preference updating rule, which implies that structural priming and semantics are sufficient to model the regularization of word order accurately. However, more factors could contribute to this phenomenon, because there are many external influences that could affect a person. For example, research on sign language convergence revealed that sign languages in small and highly connected communities are less regular than sign languages in large communities (Thompson, Raviv, & Kirby, 2020). Another aspect of the real world that was not included in the proposed model is that people usually interact with multiple people, sometimes simultaneously, whereas the model simulated two individuals interacting exclusively with each other.

6 Conclusion

The relative contributions of semantics and structural priming to the emergence of linguistic conventions for word order were investigated using an agent based model. It has been discovered that the regularization seen in the experimental data resulted from the influence of semantics on word order choices decreasing and that of structural priming increasing as time progressed. Within structural priming, comprehension-priming became more and production-priming became less influential over time. Additionally, the results partially confirmed the hypothesis about the workings of structural priming: the simulations showed that structural priming was a process of incorporating observations from every previously encountered trial in equal proportion, meaning that observations from further in the past were not discounted. The following two paragraphs contain recommendations for future research based on the discussion points from section 5.

The first point from section 5 concerned the inability to extend the results to real world languages of all modalities. Future experimental research similar to that of Schouwstra et al. (2020), but focusing on spoken language, will provide the possibility to extend the proposed model to spoken language. However, reflecting an emerging linguistic system through improvisation will be a challenge when designing such an experiment, since speakers possess existing linguistic knowledge about their language. Performing more experiments will also help to solve the problem of insufficient data for correctly estimating the initial scores that was mentioned in section 3. Furthermore, section 5 mentioned the uncertainty about when, or whether, languages, including the one from the experiment, stop regularizing at some point before full regularization. Future research could continue to explore this by performing more experiments and field work concerning the regularization of word order. In addition, two points were made in section 5 about the possibility of increasing the resemblance between the simulation and real world communication. The first point was about additional influences from external factors in the real world, which could be researched more extensively and included in a more complex and better informed model. An important thing to consider when doing this is whether the extra complexity that would be added to such a model makes a significant difference in goodness of fit. The second point concerned the possibility of a more accurate reflection of real world interaction. The proposed model could be altered by creating multiple agents that exist in a grid and letting those that are in proximity of each other communicate; a minimal sketch of this idea is given after this paragraph. The non-Bayesian linear learning strategy that has been used would be advantageous for this situation (DeGroot, 1974; Liu et al., 2014), because its flexibility and computational efficiency would make it easier to expand the model. Multiple interpreter agents would be able to observe one producer agent and agents would be able to communicate with different agents each trial instead of communicating in fixed pairs. This expanded model would represent the emergence of linguistic conventions for word order more realistically, making any conclusions resulting from the model more applicable to real world situations and thereby more useful.
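A minimal sketch of such a grid-based extension, under several assumptions (a single numeric SOV preference per agent, von Neumann neighbourhoods with wrap-around edges, and a simple non-Bayesian linear update towards observed productions in the spirit of DeGroot-style averaging), is given below. It illustrates the proposed direction only and is not an implementation of the thesis's model.

    import random

    GRID_SIZE = 10
    GENERATIONS = 50
    LEARNING_RATE = 0.1  # illustrative step size for the linear update

    # Each cell holds an agent's preference for SOV, initialised at random.
    grid = [[random.random() for _ in range(GRID_SIZE)] for _ in range(GRID_SIZE)]

    def neighbours(x, y):
        # Von Neumann neighbourhood with wrap-around edges (an assumption for simplicity).
        return [((x - 1) % GRID_SIZE, y), ((x + 1) % GRID_SIZE, y),
                (x, (y - 1) % GRID_SIZE), (x, (y + 1) % GRID_SIZE)]

    for _ in range(GENERATIONS):
        new_grid = [row[:] for row in grid]
        for x in range(GRID_SIZE):
            for y in range(GRID_SIZE):
                # Each agent observes a randomly chosen neighbour producing an utterance...
                nx, ny = random.choice(neighbours(x, y))
                observed = 1.0 if random.random() < grid[nx][ny] else 0.0
                # ...and nudges its own preference linearly towards the observation.
                new_grid[x][y] += LEARNING_RATE * (observed - grid[x][y])
        grid = new_grid

    mean_preference = sum(sum(row) for row in grid) / GRID_SIZE ** 2
    print(f"mean SOV preference after {GENERATIONS} generations: {mean_preference:.2f}")

In this toy setting neighbouring preferences will typically drift towards agreement over time, loosely analogous to conversing pairs converging on a dominant word order; semantics and priming terms would still need to be added to make it comparable to the proposed model.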

In brief, the proposed model showed that linguistic conventions for word order emerged due to an increased influence of structural priming and a decreased influence of semantics on word order decision making, and due to participants increasingly attempting to align their word order choices with the observed behaviour of their conversation partner over time. This means that regularization of word order was related to structural priming and naturalness to semantics, confirming part of the hypothesis. It was also confirmed that structural priming is in this case a process of taking into account the entire memory of encountered word order structures, instead of taking into account just one previous observation. These findings provide a strong foundation for the design of more realistic and widely applicable models and also signify the usefulness of interaction between experimental research and computational models.


References

Banerjee, A. V. (1992). A simple model of herd behavior. The quarterly journal of economics, 107 (3), 797–817.

Bock, J. K., & Warren, R. K. (1985). Conceptual accessibility and syntactic structure in sentence formulation. Cognition, 21 (1), 47–67.

Bock, K., & Griffin, Z. M. (2000). The persistence of structural priming: Transient activation or implicit learning? Journal of experimental psychology: General, 129 (2), 177.

Christianson, K., & Ferreira, F. (2005). Conceptual accessibility and sentence production in a free word order language (Odawa). Cognition, 98 (2), 105–135.

Coyle, J. M., & Kaschak, M. P. (2008). Patterns of experience with verbs affect long-term cumulative structural priming. Psychonomic bulletin & review, 15 (5), 967–970.

Culbertson, J., & Kirby, S. (2016). Simplicity and specificity in language: Domain-general biases have domain-specific effects. Frontiers in psychology, 6, 1964.

Culbertson, J., & Smolensky, P. (2012). A bayesian model of biases in artificial language learning: The case of a word-order universal. Cognitive science, 36 (8), 1468–1498.

DeGroot, M. H. (1974). Reaching a consensus. Journal of the American Statistical Association, 69 (345), 118–121.

Dryer, M. S. (2013). Order of subject, object and verb. In M. S. Dryer & M. Haspelmath (Eds.), The world atlas of language structures online. Leipzig: Max Planck Institute for Evolutionary Anthropology. Retrieved from https://wals.info/chapter/81

Ebbinghaus, H. (2013). Memory: A contribution to experimental psychology. Annals of neurosciences, 20 (4), 155.

Fehér, O., Wonnacott, E., & Smith, K. (2016). Structural priming in artificial languages and the regularisation of unpredictable variation. Journal of Memory and Language, 91, 158–180.

Ferreira, V. S., & Bock, K. (2006). The functions of structural priming. Language and cognitive processes, 21 (7-8), 1011–1029.

Flaherty, M., Schouwstra, M., & Goldin-Meadow, S. (2018). Do we see word order patterns from silent gesture studies in a new natural language. In The evolution of language: Proceedings of the 12th international conference (EvolangXII). NCU Press. Retrieved from http://evolang.org/torun/proceedings/papertemplate.html
