
The cultural evolution of color convexity in populations of neural networks


Layout: typeset by the author using LaTeX.



Nienke C. Duetz

11418656

Bachelor thesis

Credits: 18 EC

Bachelor Kunstmatige Intelligentie

University of Amsterdam

Faculty of Science

Science Park 904

1098 XH Amsterdam

Supervisor

dr. F. Carcassi

Institute for Logic, Language and Computation

Faculty of Science

University of Amsterdam

Science Park 907

1098 XG Amsterdam


Abstract

Human languages are subject to iterated learning: a child learns a language from its parents, who have learned it from their parents, who have learned it from their parents, and so on. Because of small differences between the language that a child learns and the language that its parents learned, the language changes over time. This process is called the cultural evolution of language. Previous research suggests that a consequence of this cultural evolution is the emergence of semantic universals. Semantic universals are features of language that apply to meaning, and which are shared by all human languages. One such semantic universal is the convexity universal of color terms: all color terms in a language carve out convex regions of the color circle.

This paper aims to test whether the convexity universal arises as a result of cultural evolution, by modelling iterated learning of color terms. The model consists of populations of neural networks, where each network is able to learn color systems. Each population represents a generation that uses the output data from the previous generation as its input data, thus simulating iterated learning. The input of the initial generation consists of randomly generated color systems that are completely non-convex.

The results of the model show that non-convex color systems do not evolve toward convex ones. A possible explanation for this is that the neural networks are not accurate enough, thus losing valuable information regarding convexity. Another possible explanation is that the measure of convexity that is used is not sufficient for analyzing the convexity of entire color systems.


Contents

1 Introduction

2 Related work
  2.1 Color in language
  2.2 Semantic universals and modelling cultural evolution
  2.3 Color convexity and conceptual spaces
  2.4 Neural networks

3 Method
  3.1 Modelling cultural evolution
    3.1.1 Populations
    3.1.2 Bottlenecks
    3.1.3 PyTorch and neural networks
  3.2 Color spaces and measure of convexity
    3.2.1 Calculating the degree of convexity
    3.2.2 Generation of color spaces
  3.3 Initial generation
  3.4 Evaluation of the model

4 Results and evaluation
  4.1 Accuracy of the neural networks
  4.2 Results of iterated learning

5 Conclusion and future work


1 Introduction

One of the main features that separate humans from other animals is that we are capable of communicating through complex language. The question of why humans are so unique in having complex, structured languages remains controversial. On one side, it is argued that the emergence of language is the result of natural selection and therefore has a biological cause. The other side proposes that language evolves on its own through transmission of language over many generations (Christiansen and Kirby, 2003). This process is called the cultural evolution of language. This paper focuses on the second approach. The cultural evolution of language occurs as a result of iterated learning: a child learns the language of its parents, who have learned it from their parents, who have learned it from their parents, and so on. Each time the language is transmitted to a new individual, it changes slightly. The individual can then, in turn, teach their language to other individuals, who in turn learn a slightly different language. When this happens over many generations, the language will culturally evolve (Kirby et al., 2015).

A key concept that drives cultural evolution is the learning bottleneck. The learning bottleneck means that an individual needs to learn a large scope (in this case, an extensive language), but only has a limited set of input to learn from (Kirby et al., 2014). Kirby et al. (2014) suggest that because of this restriction, language evolves to have structure, which makes it possible to learn more information from a smaller set of input. The reason for this is that a structured language has compositional rules that make it possible for individuals to infer different meanings, instead of only having a one-to-one mapping for each word-meaning combination. For example, English has the rule that one can convey the opposite of a word by placing the word 'not' in front of it. The opposite of anything can then be conveyed using one rule, instead of having to learn a separate antonym for each word in the language. That would require far more input data than a single rule.

Such a structure is something that all human languages share, and it is considered to be one of the key factors that make human language unique compared to communication in other animals. Expanding on this, there are other features that all languages appear to share, even languages that have evolved completely separately from each other. Such features are called linguistic universals. A (seemingly obvious) example of a linguistic universal is that all languages have vowels and consonants (Siemund, 2009). When such properties apply to meaning, they are called semantic universals. Some explanations for the existence of linguistic universals have been proposed, but not much research has been done on the existence of semantic universals. One explanation for the existence of semantic universals is proposed by Steinert-Threlkeld and Szymanik (2020): expressions that satisfy a semantic universal are easier to learn than expressions that do not. Steinert-Threlkeld and Szymanik (2020) test this learnability by modelling two semantic universals and their non-universal counterparts, using neural networks (see section 2.4). The networks learn meanings that satisfy the semantic universals faster and more accurately than meanings that do not, showing that they are easier to learn. The semantic universals that Steinert-Threlkeld and Szymanik (2020) researched were monotonicity and the convexity of color categories.

However, this learnability property only applies to individual learning, and not to language as a whole. Thus, the learnability of a semantic universal does not entirely explain why it occurs in all languages. Carcassi et al. (2019) complete this explanation for the monotonicity universal. They model the cultural evolution of language and show that the monotonicity universal emerges when a language evolves.

The other semantic universal Steinert-Threlkeld and Szymanik (2020) analyse lies in the domain of color terms. A color term can be defined as a term that denotes a specific section of a color space (for example, see Figure 1). Gardenfors (2004) proposes a specific universal over the domain of color terms: he argues that color terms across languages carve up the color circle into convex regions. In mathematics, a convex region is a space where, for each pair of points p1 and p2, a straight line l can be drawn such that each point on l is also contained in the region (Gardenfors, 2004) (see section 2.3).

Research about the convexity universal has not yet been extended to language as a whole. This paper expands on the previous work done by Carcassi et al. (2019) on the cultural evolution of the monotonicity universal, by modelling the cultural evolution of color categories in the same way. The research question is therefore:

To what degree do convex color categories emerge when modelling the cultural evolution of color categories?

Modelling the cultural evolution of color categories is done by using populations of neural networks that are able to learn color spaces. Each population represents a generation in cultural evolution, and the networks represent individuals in a population. Each of the networks in a population will be trained on the output color systems of the previous generation. The first population will be trained on randomly initialized color systems that are not convex. The color systems of the previous generation are not used in their entirety: only a fraction of these color systems is used as input. This represents the learning bottleneck that is necessary for cultural evolution to occur.

The expectation is that the color systems produced by populations will become more and more convex each generation. After a certain number of generations, the color systems will have converged to a high degree of convexity and will not become more convex afterwards, similar to how the quantifiers in Carcassi et al. (2019) evolve towards monotonicity.


2 Related work

Human language evolves through iterated learning (Kirby et al., 2014). Learning is considered iterated learning when a behaviour (such as language) is learned by observing the behaviour in other individuals, and those individuals have themselves also learned their behaviour by observing other individuals (Kirby et al., 2014).

It is quite obvious that language is learned in such a way: children start speaking after hearing their parents and other people in their environment speak, and the parents have learned language the same way from their parents, and so on. The consequence of this is that a cultural evolution of language occurs, where the language changes over many generations of iterated learning. As language evolves, a structured language emerges. Structured languages exhibit system-level structure (Kirby et al., 2015): there exist compositional rules from which meaning can be inferred. Other forms of language are holistic languages, which exhibit no structure and where every sound has a unique meaning, and degenerate languages, which have very high structure but where one sound can have multiple meanings (degenerate languages usually consist of a single sound which is associated with every meaning) (Kirby et al., 2015).

The reason that language evolves to structured language is that a language is under pressure to be both learnable and expressive (Kirby et al., 2015). If a language is not learnable, it cannot be transmitted to other individuals, and if a language is not expressive, it cannot function as a reliable form of communication. Holistic languages are not very learnable: the scope of such a language is too great to be transmitted to another individual. Degenerate languages, on the other hand, are not very expressive: because sounds do not have unique meanings, they cannot reliably be used to communicate (Kirby et al., 2015). In languages, there is a trade-off between these two traits: when a language becomes more learnable, it loses expressivity and vice versa. This means that when languages culturally evolve, a (nearly) optimal balance between the two is reached. This optimal balance results in structured language, where there is a limited number of sounds that may not each have a unique meaning, but a structured combination of parts allows a language to be very expressive (Kirby et al., 2015).

2.1 Color in language

In a study done by Regier et al. (2015), semantic categories across languages are researched, with the goal of explaining why languages have the particular categories that they do. They discuss a trade-off in language similar to that of Kirby et al. (2015), simple versus informative, in terms of category systems in language. Here, a simple category system is one that minimizes cognitive load, while an informative system supports precise communication. Thus, simple is similar to learnable in Kirby et al. (2015), and informative is similar to expressive. Regier et al. (2015) conclude that a category system is efficient when it achieves a near-optimal trade-off between simplicity and informativeness.

One of the domains Regier et al. (2015) studied was color naming systems. They researched color systems in languages that are present in the WCS, the World Color Survey. These are languages specifically from non-industrialized societies, which means that they have not been subject to much influence from other languages. Regier et al. (2015) measured the complexity of a color naming system as the number of color terms that are present in the language. Interestingly, the color naming systems in these languages tend to be highly informative, even with a low number of color terms.

This means that the majority of the world's languages indeed reflect a near-optimal trade-off between simplicity and informativeness, which suggests that these color naming systems emerge from the cultural evolution of language. Whether the color naming systems also have a high degree of convexity is not discussed in Regier et al. (2015).

2.2 Semantic universals and modelling cultural evolution

As previously stated, Steinert-Threlkeld and Szymanik (2020) have aimed to explain the existence of semantic universals, features of meaning that are shared across all languages. They trained neural networks to learn meanings across two domains and concluded that expressions that satisfy a semantic universal are easier to learn than expressions that do not satisfy a universal. The semantic universals that Steinert-Threlkeld and Szymanik (2020) researched were monotonicity and the convexity of color categories.

However, the learnability of a semantic universal does not entirely explain why it occurs in all languages. This is because learnability only applies to individual learning. Carcassi et al. (2019) complete this explanation for the monotonicity universal. They model the cultural evolution of language and show that the monotonicity universal emerges when a language evolves.

2.3 Color convexity and conceptual spaces

Carcassi et al. (2019) only model the monotonicity universal. The other semantic universal modelled in Steinert-Threlkeld and Szymanik (2020) is the color convexity universal. The color convexity universal means that color terms in all languages denote convex regions in a geometrical color space, which is the conceptual space of colors (Gardenfors, 2004). Steinert-Threlkeld and Szymanik (2020) show that convex color terms are indeed easier to learn by a single neural network.

In mathematics, a convex region is a space where, for each pair of points p1 and p2, a straight line l can be drawn such that each point on l is also contained in the region (Gardenfors, 2004). In formal terms, this means: X is convex if and only if for all x, y ∈ X, for all t ∈ [0, 1], t · x + (1 − t) · y ∈ X (Steinert-Threlkeld and Szymanik, 2020). Figure 1 illustrates this concept. Here, subspace C is convex while the subspaces nC and nC+ are not.

Figure 1: A color circle with subspaces of varying degrees of convexity. (Steinert-Threlkeld and Szymanik, 2020)

Figure 1 also illustrates that some spaces can be more convex than others. Intuitively, one would observe that subspace nC is more convex than subspace nC+, even though neither is convex. It can be said that in subspace nC, more points satisfy the rule for convexity than in subspace nC+. That is, in nC there are more pairs of points between which a line can be drawn such that all points on said line are in nC as well, than there are such pairs in nC+.

2.4 Neural networks

To model iterated learning, neural networks are used to represent agents. Specifically, feed-forward neural networks are used. Feed-forward neural networks are models that can be used to make predictions based on input data. The networks used in this research assign a color category to a point based on its location in the color space.

A neural network is a network that consists of processing units, or neurons, that are connected by weights (Kröse et al., 1993). These neurons are usually structured as layers, where each neuron in a layer is connected to all the neurons in both the previous and the next layer (shown in Figure 2). Each neuron in a layer receives input from the previous layer, and uses this to compute the output that it sends to the next layer. Each connection to a neuron in the next layer carries a weight, and these weights represent how strongly this neuron influences the neurons in the next layer (Kröse et al., 1993).

A neural network always has one input layer, one output layer and a variable number of hidden layers. The input layer receives data from outside the network, the output layer determines the output of the network, and the hidden layers perform computations that stay inside the network (these are called 'hidden' because their output can never be observed from outside the network) (Kröse et al., 1993). Each layer can have a variable number of neurons. In the neural networks used in this paper, each output neuron represents a color category.

Figure 2: A neural network with two hidden layers. The arrows represent the weights from one neuron to another.

To create a model using a neural network, the network needs to be trained first. This is done by feeding the network many instances of input data together with the output data it should produce. The goal is for the neural network to set the weights in such a way that it produces the desired output for each input it receives.

Initially, the weights are set randomly. When the network receives input data, the input layer computes the output that each neuron sends to the next layer, using the activation function. How much each neuron influences the neurons in the next layer is determined by the values of the weights. This is repeated in each hidden layer until the output layer is reached, which produces the final result, which is then compared to the desired output. Since the weights are set randomly, this output will initially have a large error. The goal is to get this error (close) to 0 without overfitting the model. To achieve this, the error is sent back through the network, and the weights of all neurons are adjusted to minimize the error. This is called back-propagation (Kröse et al., 1993).
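As a minimal illustration (standard gradient descent notation, not taken from Kröse et al. (1993)), each weight w_ij is adjusted against the gradient of the error E, scaled by a learning rate η:

w_ij ← w_ij − η · ∂E/∂w_ij

Repeating this update for every weight gradually reduces the error on the training data.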


This process is repeated many times until the error is minimized. After this training process, the neural network can be used to produce predictions for input data of which the output is not yet known.

The neural networks used in this paper predict which color category a data point belongs to. The input is a data point in a 3D color space. Therefore, the input layer consists of three neurons: one for each coordinate in the color space. The size of the output layer depends on the number of color categories present in the color space: one neuron represents one category. For a network to learn a color space, the network is fed many data points from the color space with their respective category assignments. After the training phase, it is able to assign a color category to a data point based on the point's location in the color space.


3 Method

This section describes the methods used to model the cultural evolution of color terms. First, a description of the model itself is given. Then, the color spaces and the measure of convexity are discussed. Lastly, there is an explanation of how the results of the model are evaluated.

3.1 Modelling cultural evolution

Cultural evolution of language relies on two important factors: iterated learning and the learning bottleneck. Iterated learning is modelled using populations of neural networks. The learning bottleneck is represented by an extra parameter in the model. This subsection describes both, as well as the way the neural networks are implemented.

3.1.1 Populations

To model the cultural evolution of color spaces, populations of 10 agents are created. Each agent in the population is represented by a simple neural network, which is designed to learn a color space. Each generation is then represented by one population. To model iterated learning, each generation needs to learn from the previous generation. This is done by using the output of the previous generation as input for the next generation. Initially, each agent of the first population is trained on a color space with a very low overall convexity (see Figure 4). For each child in the next generation, one agent from the parent generation is randomly selected. This agent creates a mapping of a new color space by assigning categories to the points in that color space, based on its own previously trained model. This mapping is then used as input for the child agent. This process is repeated for 150 generations, as sketched below.
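A minimal sketch of this loop, with placeholder functions standing in for the PyTorch training and prediction code of section 3.1.3 (names and stand-in details are assumptions, not the thesis implementation):

```python
import random
import numpy as np

def train_agent(points, labels):
    # Placeholder: in the real model this trains a neural network
    # on (points, labels) and returns the trained network.
    return {"labels": labels}

def produce_labels(agent, points):
    # Placeholder: in the real model the trained network assigns
    # a category to every point in a fresh color space.
    return agent["labels"]

grid = np.random.rand(9261, 3)               # points of the 3D color space
init_labels = np.random.randint(0, 7, 9261)  # 7 categories, non-convex start

population = [train_agent(grid, init_labels) for _ in range(10)]
for generation in range(150):
    children = []
    for _ in range(10):
        parent = random.choice(population)   # one random parent per child
        children.append(train_agent(grid, produce_labels(parent, grid)))
    population = children
```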

3.1.2 Bottlenecks

Essential for the cultural evolution of language is the learning bottleneck: individuals learning the language only receive a portion of the language to learn from, instead of the entire scope of the language. To account for this in the model, the parameter bottleneck size is introduced. This bottleneck is the fraction of the data points of the color space that the child generation receives as input. The model is run with bottleneck sizes of 0.2, 0.4, 0.6, 0.8 and 1. Thus, with a bottleneck of 0.2, if the color space learned by the parent agent contains 9000 data points, a child agent will only receive 1800 of these data points as input to train its network on. When the bottleneck size is 1, the entire output data of the parent is used as input for the next generation. A bottleneck of 1 is included to test whether the bottleneck size is really the cause of the results (because if a bottleneck of 1 produces the same results as a small bottleneck, it is not the bottleneck that causes the particular pattern that emerges).
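To illustrate (variable names are hypothetical, not the thesis code), applying a bottleneck of 0.2 amounts to sampling a random 20% of the parent's labelled points:

```python
import numpy as np

rng = np.random.default_rng()
points = rng.random((9000, 3))       # parent's labelled color space
labels = rng.integers(0, 7, size=9000)

bottleneck = 0.2
idx = rng.choice(len(points), size=int(bottleneck * len(points)), replace=False)
child_points, child_labels = points[idx], labels[idx]
print(child_points.shape)            # (1800, 3): what the child trains on
```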


The smaller the bottleneck size, the more the color space that the child agent learns from the parent will differ from the parent's color space. This is because the neural network needs to create a model with less input than if the bottleneck size were large. It is expected that the resulting color spaces, after being passed through many generations, will have a lower degree of convexity when a higher bottleneck size is used. When the neural network has more input information to base its predictions on, it does not have to converge to a high convexity as much, because it can create more individual mappings instead of having to infer rules. The same can be observed for the monotonicity universal in Carcassi et al. (2019): a smaller bottleneck size results in convergence towards monotonicity, while the model does not converge towards monotonicity when a large bottleneck is used.

3.1.3 PyTorch and neural networks

The model is written in Python, and the neural networks are created using the library PyTorch. PyTorch is a framework for implementing deep learning that aims to provide both usability and speed (Paszke et al., 2019). By using this library, basic properties of neural networks do not need to be implemented manually, a task that is both time-consuming and error-prone. The neural networks in the model receive a three-dimensional color space as input. Each point in the color space is one input data point, and each point is assigned one of seven colors. The networks consist of an input layer of three neurons, two hidden layers of 32 neurons each, and an output layer of seven neurons. Each of the three input neurons corresponds to one coordinate of the data point in the color space. Each of the neurons in the output layer corresponds to one of the possible colors a point can be assigned to.

The ReLU function is used as the activation function, the loss is calculated using softmax cross-entropy, and the Adam optimizer is used. The networks train in batches of 32 data points over six epochs.
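A minimal PyTorch sketch matching this description (the learning rate is left at the Adam default, since it is not specified above; the training data here are random stand-ins for a labelled color space):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Network as described: 3 input neurons, two hidden layers of 32 neurons
# with ReLU, and 7 output neurons (one per color category).
net = nn.Sequential(
    nn.Linear(3, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 7),
)
loss_fn = nn.CrossEntropyLoss()        # softmax cross-entropy
optimizer = torch.optim.Adam(net.parameters())

points = torch.rand(9261, 3)           # coordinates in the color space
labels = torch.randint(0, 7, (9261,))  # one category per point
loader = DataLoader(TensorDataset(points, labels), batch_size=32, shuffle=True)

for epoch in range(6):                 # six epochs, batches of 32
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(net(x), y)
        loss.backward()
        optimizer.step()
```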

3.2 Color spaces and measure of convexity

This subsection describes how the degree of convexity of the color spaces is measured. While it is easy to observe whether a color space is convex or not, the degree of convexity is not so straightforward. A sufficient measure for the degree of convexity is necessary to observe whether the model converges towards convexity. This subsection also describes how the color spaces that are used as input for the initial generation are generated, as there is no previous generation to receive input from.

3.2.1 Calculating the degree of convexity

To analyse the color spaces that the agents in a population produce as output, a measure for the degree of convexity of the entire color space needs to be defined.


Here, the same definition of degree of convexity as in Steinert-Threlkeld and Szymanik (2020) is used. This definition is based on the notion that a non-convex region can be entirely contained in a larger convex region, called the convex hull (see Figure 3). The degree of convexity of the non-convex region is then expressed as the fraction of its smallest convex hull that the region covers. This measure is illustrated by Figure 1: for both nC and nC+, the smallest convex hull is a circle through the arms of the stars, but the area of nC covers a larger fraction of the area of said circle than the area of nC+ does. Therefore, nC+ is less convex than nC.

Figure 3: A non-convex shape with its convex hull

However, this measure for degree of convexity is only defined for individual regions, not for entire color spaces containing multiple regions. To calculate the degree of convexity of the entire color system, the degree of convexity of each individual color term in the system is calculated. Then the weighted average of these convexities is used to represent the degree of convexity of the entire color system.
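The exact implementation is not given above, so the following is an assumption-laden sketch of one way to compute this measure on a discrete point set: grid points inside each region's convex hull are counted via a Delaunay triangulation, and the per-region convexities are averaged, weighted by region size.

```python
import numpy as np
from scipy.spatial import Delaunay

def region_convexity(region_points, all_points):
    # Fraction of the grid points inside the region's convex hull that
    # actually belong to the region (1.0 for a convex region).
    hull = Delaunay(region_points)
    inside_hull = np.sum(hull.find_simplex(all_points) >= 0)
    return len(region_points) / inside_hull

def system_convexity(points, labels):
    # Average of the per-category convexities, weighted by region size.
    total = 0.0
    for category in np.unique(labels):
        region = points[labels == category]
        total += len(region) * region_convexity(region, points)
    return total / len(points)
```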

3.2.2 Generation of color spaces

The particular color systems used in the model are partitions of the CIELab color space, and contain 9261 points. The CIELab color space is an approximately uniform color space that aims to replicate the human perception of colors (Hill et al., 1997). The partitions are generated using the algorithm described in Steinert-Threlkeld and Szymanik (2020), using 7 different color categories. The algorithm takes two parameters, c and s, which control the degree of convexity the resulting color system will have. Parameter c controls how connected the regions of the color system are (that is, how spread out the categories are), while s controls the smoothness of the borders of the color regions.

Suitable values of these parameters produce a color system with a high degree of convexity. Color spaces whose convexity approaches 1 consist of regions with very few angles, and each color is contained in a single region of the color space. Color spaces that have a very low degree of convexity are an almost completely random scattering of colors.

Figure 4: A color system of 7 categories with a convexity of 0.1516. Such color systems are used as the input data for the first generation.


3.3 Initial generation

The agents in the initial generation receive as input color systems that need to have a very low degree of convexity. This is achieved using parameters s = 0 and c = 1. The color systems that are created using these values have a degree of convexity of around 0.15 and have the same structure as the color space in Figure 4.

3.4 Evaluation of the model

Before modelling iterated learning, the neural networks need to be evaluated. This is done by first training the network on a fraction of a color space, and then testing the accuracy of the network on the remaining fraction. This fraction is equivalent to the bottleneck parameter. The network is trained on bottlenecks of 0.75, 0.2 and 0. This is done to test how large an effect a smaller bottleneck has on the accuracy of the network. The larger bottlenecks are represented by 0.75 and not 0.8, because if a training bottleneck of 0.8 is used, there are not enough remaining data points to test the accuracy on. The network accuracy is tested on color spaces with degrees of convexity ranging from 0.1 to 0.99.

After these accuracies are computed, the correlation between the accuracy of a network and the degree of convexity of a color space is computed using a simple linear regression model.
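For instance, using scipy (the numbers below are invented purely to illustrate the computation, not results from this thesis):

```python
from scipy.stats import linregress

convexities = [0.1, 0.3, 0.5, 0.7, 0.9, 0.99]
accuracies = [0.30, 0.44, 0.57, 0.69, 0.81, 0.86]  # illustrative values only

fit = linregress(convexities, accuracies)
print(f"slope={fit.slope:.2f}, R^2={fit.rvalue**2:.2f}")
```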

To evaluate the results of the model for cultural evolution, one agent of the final generation is chosen. The chain of parent agents of this agent across all previous generations is taken, and for each parent the convexity is plotted against the generation. The reason only one agent is chosen is that in each generation, a random parent is chosen for each child to learn from. If one parent happens not to be chosen for any of the children, the entire chain of previous agents of that parent is omitted. The probability of this happening in 150 generations is high, which means that the chains are the same for all agents for a large part of the first generations. Only at the end of the chain will the parent chains of the agents diverge.

Each of the five bottlenecks (0.2, 0.4, 0.6, 0.8 and 1) is plotted in a separate graph. For each of the bottlenecks, the model is run five times.


4 Results and evaluation

This section contains the results and evaluation of the model. First, the accuracy of the networks is evaluated. It is important to evaluate the accuracy of the networks before evaluating the model itself, as the accuracy of the networks influences the results of the model.

4.1 Accuracy of the neural networks

Figure 6: The accuracies of the neural network for color spaces with various degrees of convexity. The linear regression and R² values are shown for each bottleneck.

Figure 6 shows the accuracy of the networks for three bottlenecks, together with a linear regression of the results. There is a clear linear relation between the degree of convexity of the color space that the network is trained on and the accuracy of the network. Using a bottleneck of 0, the degree of convexity has no influence on the accuracy of the network. The reason for this is obvious: the network does not receive any input data and cannot learn anything, which means that it computes its output based on the randomly initialized values of the weights in the network.

The color spaces with the lowest degrees of convexity result in similar accuracies for all the bottlenecks. The reason for this is that color spaces with a low convexity contain an almost random distribution of categories (see Figure 4). Therefore, there are no structures for the network to learn, which means that it would have to learn each value individually. That is not possible, since there are only 32 neurons in each hidden layer and the color spaces contain 9261 points. Thus, the network does not have enough computational power to compute the value of each point individually.

The higher degrees of convexity result in significantly higher accuracies for both bottleneck 0.75 and 0.2, though the linear regression shows that the accuracies with bottleneck 0.2 are slightly lower. This makes sense: a higher bottleneck means that the network receives a larger input data set, which gives it more opportunities to adjust the weights in the model and to reduce the error. It is interesting, however, that the R² of bottleneck 0.2 is very close to the R² of bottleneck 0.75. This suggests that even when the input data set is small, the network is able to detect the structures that are present in color spaces with a high degree of convexity.


4.2 Results of iterated learning

Figure 7: Plots modelling iterated learning of color spaces for each bottleneck over 150 generations.


As opposed to what was expected, the color spaces do not evolve towards a high degree of convexity. Instead of slowly converging towards convexity, the convexities of the output color spaces oscillate between a high convexity and a low convexity. However, the amplitudes of these oscillations are much larger for the smaller bottlenecks. This would suggest that networks trained on smaller bottlenecks do have a tendency towards convexity, but that when the color space is already convex, it returns to a non-convex color space again. However, running the model using a bottleneck of 0 seems to suggest otherwise. This model exhibits the same pattern of oscillation as a bottleneck of 0.2. When a bottleneck of 0 is used, there is no input data and the network makes predictions based only on the randomly initialized weights. Thus, it seems that the oscillations in the plot occur because of the initial values of the networks, and that they have little to do with the way the networks learn the color spaces. This would in turn mean that neither the neural networks nor the learning bottleneck influences the degree of convexity at all.

On the other hand, the neural networks are learning something from the color spaces when a small bottleneck is used, as evidenced by the accuracy plot. Figure 8 and Figure 9 show two color spaces: one with a high convexity, and the color space a network outputs when learning the first with a bottleneck of 0.2. The accuracy of the learned color space is 0.6172, and it has a low degree of convexity of 0.2495. However, even though its convexity is low, there are clearly some structures to be seen in the color space in Figure 9. But even though there are structures present in the color space, that does not necessarily mean that the color space has a high degree of convexity. For example, two colors alternating is a pattern that can be learned by the network, but it is a completely non-convex pattern. Therefore, the way the networks learn the color spaces has little influence on the degree of convexity.


Figure 8: Input color space with a convexity of 0.9172

Figure 9: Color space that a neural network outputs when using a bottleneck of 0.2. The network has an accuracy of 0.6172 and the color space has a convexity of 0.2495.


5 Conclusion and future work

Modelling the cultural evolution of color categories using neural networks does not result in a convergence towards convex color spaces. The expectation was that when using small bottlenecks, the color spaces would become more and more convex. Instead, the degree of convexity of the output color spaces oscillates between high and low convexities.

This does not mean that a network does not learn anything from the input when a small bottleneck is used. As can be seen in the analysis of the accuracy, a higher degree of convexity results in a higher accuracy, even for a small bottleneck. However, even when the accuracy is relatively high, this does not mean that the degree of convexity is preserved when learning the color space. For example, the network could learn an alternating pattern between two colors. This can still have a relatively high accuracy, but it is a very non-convex structure. The results could also be explained by the accuracies not being high enough. The 0.2 bottleneck results in accuracies that are generally not higher than 0.8, which means that information still gets lost. If the neural network learns a region to be one color, but because the accuracy is 0.8 it assigns a fraction of 0.2 of the points in the region to a different color, the resulting convexity of said region is low.

Future research could investigate which configuration of the neural networks results in a higher accuracy. Of course, for lower bottlenecks, the accuracy would still need to be quite low, otherwise it is not possible for cultural evolution to occur. If the accuracy is higher for the color spaces with a high degree of convexity, this high convexity might be preserved better in the output color spaces. That could prevent the oscillations that can be seen in Figure 7.

Yet there could be a different explanation for why the oscillations occur with smaller bottlenecks. Intuitively, it could be said that color spaces that have patterns (or structures) have a higher chance of being convex than color spaces that are more random. If the categories in a color space are randomly distributed, the color space will always have a low convexity, as there cannot be large regions of the same color. When a bottleneck of 1 is used to train a network on an (almost) random color space, more of the randomness of the color space will be preserved than when a smaller bottleneck is used. A network trained using a smaller bottleneck might infer more patterns than a network trained using a larger bottleneck. If there is a pattern in the output color space, it could be convex, but if there is no pattern in the output color space, it can never be convex. Thus, with a smaller bottleneck, there can be more instances where a network infers a convex color space when trained on a less convex color space. This results in more oscillations with a higher amplitude than when a larger bottleneck is used. Currently, this is only a hypothesis, but it could be tested in future work.


Another explanation for the lack of convergence towards convexity might be that the measure of convexity is not sufficient. The measure of convexity that is used here is not very sensitive to smooth borders, while it is sensitive to shapes that have many angles. For example, a shape with very rough borders might be considered very convex, while an X-shape is not, even if it has completely smooth borders. Thus, if the regions in the color spaces have more angles but smooth borders, they will have a lower degree of convexity.

The degree of convexity of the entire color system is computed by taking the weighted average of the convexity of each of the regions. When some regions are very non-convex, this can drag down the overall convexity of the system, even if other regions become more convex than before.

Thus, a final topic for future work could be to research whether there are other measures of convexity that more accurately represent the degree of convexity of the particular color spaces used in this paper.


References

F. Carcassi, S. Steinert-Threlkeld, and J. Szymanik. The emergence of monotone quantifiers via iterated learning. 2019.

M. H. Christiansen and S. Kirby. Language evolution: Consensus and controversies. Trends in cognitive sciences, 7(7):300–307, 2003.

P. Gardenfors. Conceptual spaces as a framework for knowledge representation. Mind and Matter, 2(2):9–27, 2004.

B. Hill, T. Roger, and F. W. Vorhagen. Comparative analysis of the quantization of color spaces on the basis of the CIELAB color-difference formula. ACM Transactions on Graphics (TOG), 16(2):109–154, 1997.

S. Kirby, T. Griffiths, and K. Smith. Iterated learning and the evolution of language. Current opinion in neurobiology, 28:108–114, 2014.

S. Kirby, M. Tamariz, H. Cornish, and K. Smith. Compression and communi-cation in the cultural evolution of linguistic structure. Cognition, 141:87–102, 2015.

B. Kröse and P. van der Smagt. An introduction to neural networks. 1993.

A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. In Advances in neural information processing systems, pages 8026–8037, 2019.

T. Regier, C. Kemp, and P. Kay. Word meanings across languages support efficient communication. The handbook of language emergence, 87:237, 2015.

P. Siemund. Linguistic universals and vernacular data. Vernacular universals and language contacts: Evidence from varieties of English and beyond, pages 321–346, 2009.

S. Steinert-Threlkeld and J. Szymanik. Ease of learning explains semantic universals. Cognition, 195:104076, 2020.
