
Incorporating meaning into the discovery of generic sentences

Roan D.J. de Jong 10791930

Bachelor thesis
Credits: 18 EC

Bachelor's Programme in Artificial Intelligence (Bachelor Opleiding Kunstmatige Intelligentie)

University of Amsterdam
Faculty of Science
Science Park 904
1098 XH Amsterdam

Supervisors

prof. dr. ing. R.A.M. van Rooij & dr. M.A.F. Lewis

Institute for Logic, Language and Computation
Faculty of Science & Faculty of Humanities
University of Amsterdam
Science Park 107
1098 XG Amsterdam


Abstract

In this thesis, we develop a model for incorporating meaning into the statistical approach to learning generic sentences proposed by Rooij and Schulz (to appear). We show that our model can be used to determine the probability of indirectly observing a hypernym by examining the probability of encountering its hyponyms in a corpus.

“Colourless green ideas sleep furiously” - Noam Chomsky, 1957

Contents

1 Introduction
  1.1 Generic sentences
    1.1.1 Conceptual approach
    1.1.2 Statistical approach
    1.1.3 Van Rooij's approach
  1.2 Hypernymic relations
    1.2.1 Hearst patterns
    1.2.2 Distributional semantics
    1.2.3 Hypernymic relations and bias
2 Proposed model
  2.1 Incorporating meaning
3 Method
  3.1 Corpus
  3.2 Hearst Patterns
    3.2.1 Syntactic structure
    3.2.2 Syntactic tree traversal
  3.3 Distributional semantics
4 Results
  4.1 Retrieval of Hearst relations
  4.2 Recreating the Implicit Association Test
5 Discussion
  5.1 Retrieval of Hearst Patterns
  5.2 Recreating the Implicit Association Test
6 Conclusion
Appendices
  A Hypernyms and their groups
  B Implicit association groups


1 Introduction

1.1 Generic sentences

Generic sentences are common utterances in human language. They are the sentences we use to attribute physical features, behavioural patterns, sentiments we hold, and other beliefs to a group. The generics people use demonstrate what they hold to be true about these groups. For example: “dogs bark”, “milk is white”, and “rocks are hard”.

Generic sentences are in some cases accepted even if there is no factual basis for these beliefs. For example, the sentence “Mosquitoes carry the West Nile virus” is generally accepted, even though less than 1% of mosquitoes carry the disease (Leslie, 2008). There are several theories on how people acquire an understanding of which utterances they should accept as generic sentences.

1.1.1 Conceptual approach

Leslie (2008) argues that we use a conceptual model to discover generic sentences. Leslie identifies three main categories of generic sentences: generics based on counter-instances, characteristic features, and striking features.

Generics based on counter-instances look at the existence of other features within a class. If a feature f can be replaced with a second feature f′ within the class G, then f′ is said to be a positive counter-instance of f. For example, “dogs are female” has the positive counter-instance “dogs are male”, as the feature “being female” can be replaced with the feature “being male”. A negative counter-instance f′ is a feature that cannot replace f within the class G. Features with only negative counter-instances are said to yield generic sentences. For example, to reject “dogs bark” we would need to replace barking with another sound that dogs make, such as “hooting” or “meowing”; since these sounds only occur outside the class of dogs, they are negative counter-instances, and “dogs bark” is therefore an accepted generic sentence.

Characteristic features are based on groups of classes. For these features, we assume that every class within a group has a distinctive feature of the same type. For example, we assume that every species of tree has a distinctive type of leaf. This allows people to quickly assume that a feature is present in all members of a class, even if we only encounter a single member that has that feature. For example, when we encounter a new type of tree, we will immediately generalise the pattern of its leaves to the entire class.

The final category is based on the importance of knowing about a feature. Examples of such features are “being poisonous”, “being dangerous”, or “being healing”. Leslie describes these features as “striking”. This category would explain why people commonly hold the belief that mosquitoes carry the West Nile virus, as it is an infectious disease associated with an epidemic in North America in 1999 (Campbell et al., 2002).


1.1.2 Statistical approach

A second approach to generics is statistical in nature and is based on the prevalence of a feature in a group (Cohen, 1999). This approach makes a distinction between absolute and relative generics. Absolute generics merely look at the occurrence of a feature within a group and accept a feature as generic when more than half of the class possesses that feature.

Relative generics, on the other hand, are determined by comparing the relative occurrence of a feature to its occurrence in the contextually alternative classes. For example, if the class we are investigating is a currently practised religion, such as Judaism, its alternatives would be the set of all other religions currently still in practice. If a feature is more prevalent in a class than in its alternatives by more than a certain threshold, it is said to be relatively generic. For example, overall, mosquitoes do not tend to carry the West Nile virus; compared to other insects, however, carrying it is a distinctive quality.

1.1.3 Van Rooij’s approach

In this research, we will focus on a novel method proposed by Rooij and Schulz for identifying generic sentences. This method uses the concept of typicality, which builds on the theory of associative learning developed in psychology and on a variant of the statistical approach to generics discussed in the previous section. The concept is defined as follows:

Definition 1.1. Typicality (Rooij and Schulz, to appear)

\[
\nabla P^f_G := \frac{P(f \mid G) - P(f \mid alt(G))}{1 - P(f \mid alt(G))} \times Value(f)
\]

In this definition, f is a feature which we are attributing to a group G. The function alt(G) gives the group of all contextual alternatives of G. The function Value(f) is a measure of the intensity of the feature f. As there are often no predefined measures for Value(f), this factor is optional. In applications where no clear alternative sets of G can be defined, the probability of observing f can be used instead:

\[
\Delta P^f_G := P(f \mid G) - P(f)
\]

This measure of intensity can be compared to the third class of generic sentences described by Leslie (2008). For that class, the value is taken to be high when the feature is considered important, even when the feature itself is not widespread across the class it is attributed to.
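As a purely illustrative calculation, with probabilities invented for exposition rather than taken from Leslie (2008) or any corpus, suppose P(f|G) = 0.01 and P(f|alt(G)) = 0.001 for the feature f of carrying the West Nile virus and the group G of mosquitoes. Then

\[
\nabla P^f_G = \frac{0.01 - 0.001}{1 - 0.001} \times Value(f) \approx 0.009 \times Value(f),
\]

so with Value(f) = 1 the score stays close to the raw prevalence, while a striking feature with, say, Value(f) = 50 would reach roughly 0.45; this is how a rare but important feature can still come out as typical.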

1.2 Hypernymic relations

A hypernymic relation is a relation which shows that all members of a group are also part of a more general group; the entire group is therefore subordinate to its hypernym. In set-theoretic terms, X would be a hypernym of Y if Y is a proper subset of X. Common examples of hypernymic relations are phrases such as “Dogs are mammals” and “Forks are utensils”. Notice that these phrases are themselves generic sentences.


1.2.1 Hearst patterns

A long-standing approach for extracting hypernymic relations from a corpus is the use of so-called ‘Hearst patterns’. First proposed in Hearst (1998), Hearst patterns are patterns in sentences that signal hypernymic relations. The original paper only proposed six patterns, but many others have been found by later research. The patterns are useful for detecting hypernyms, as a single occurrence can already confidently signal the existence of a belief in a hypernymic relation. For example, the sentence:

(...) infectious diseases, such as malaria, HIV and the flu (...)

follows the Hearst pattern:

NP such as {NP,}* and NP

This sentence accurately shows that malaria, HIV, and the flu are all instances of the class of infectious diseases, without requiring a large number of examples to learn this relationship. Recent studies even show that in some cases this rule-based approach outperforms modern methods (Roller et al., 2018). However, in most cases, Hearst patterns create many false positives that need to be judged by hand, which is a monotonous and time-consuming task (Hearst, 1998).
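To illustrate the idea, the sketch below matches a purely regex-based simplification of the 'such as' pattern; the thesis instead matches patterns over full syntactic parses (section 3.2), and the regular expression and function name here are ours:

```python
import re

# Simplified, regex-only version of the Hearst pattern "NP such as {NP,}* and NP".
SUCH_AS = re.compile(
    r"(?P<hyper>\w[\w ]*?)\s*,?\s+such as\s+"
    r"(?P<hypos>[\w ]+(?:,\s*[\w ]+)*(?:,?\s+and\s+[\w ]+)?)",
    re.IGNORECASE,
)

def extract_such_as(sentence: str):
    """Return (hypernym, [hyponyms]) pairs found by the 'such as' pattern."""
    pairs = []
    for m in SUCH_AS.finditer(sentence):
        hyper = " ".join(m.group("hyper").strip().split()[-2:])   # crude NP-head guess
        hypos = re.split(r",\s*|\s+and\s+", m.group("hypos"))
        pairs.append((hyper, [h.strip() for h in hypos if h.strip()]))
    return pairs

print(extract_such_as("infectious diseases, such as malaria, HIV and the flu"))
# e.g. [('infectious diseases', ['malaria', 'HIV', 'the flu'])]
```

A regex like this also shows why the rule-based approach produces false positives: it has no notion of noun-phrase boundaries, which is exactly what the parse-based matching in section 3.2 is meant to provide.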

1.2.2 Distributional semantics

Distributional semantics is a second approach that can be used to learn hypernymic relations (Fu et al., 2014). In distributional semantics, the meaning of a word is characterised by the context it is used in. Its hypothesis is that two expressions are semantically similar if they can appear in similar contexts. To discover which expressions share common contexts, distributional semantics uses word co-occurrences, deriving the meaning of an expression from the expressions it commonly appears with in a single utterance (Lenci, 2008). These co-occurrence representations are often transformed into a lower-dimensional space, as this can improve the performance of the language model and is faster than the original representation (Mnih and Kavukcuoglu, 2013).

There are many approaches to creating these lower-dimensional models, most of which are described by Mikolov et al. (2013a). In this thesis, however, we will focus on the new approaches that Mikolov et al. (2013a) propose: continuous bag-of-words (CBOW) and skip-gram. Both methods are unsupervised algorithms that rely on the context of a term to learn embeddings of those terms.

The goal of the CBOW model is to maximise the probability of predicting a term from its neighbours as input data. The model learns this probability by creating a word embedding for every term and optimising these embeddings to correctly predict the target word from the surrounding terms (Mikolov et al., 2013a). The skip-gram model reverses this setup and predicts the surrounding terms from the target word.


1.2.3 Hypernymic relations and bias

While generic sentences can be useful for identifying what people deem to be generally accepted facts about the world, they can also have unwanted side effects: they can be used to express biases. Caliskan et al. discovered that GloVe word embeddings show similar biases with respect to race and gender as people show in an Implicit Association Test (IAT). The IAT is a psychological test which has demonstrated that people can link two concepts more quickly when the association between these concepts is stronger. Caliskan et al. developed a method to transfer this test from psychology to similarity between vectors in word embeddings, and identified similar patterns (Caliskan et al., 2017). Some of the associations they report can be viewed as hypernymic relations. For example, their finding that female names are associated with arts rather than science can be viewed as representing the sentence “Mathematicians are men”.

Being aware of these findings allows us to discover further bias in other domains or datasets. It also enables us to remove the bias or modify the data when it would cause noise for our intended use. For example, one could build an algorithm whose task is to match a high-school student to an academic program based on their hobbies, (dis)likes, and other factors, by comparing them to satisfied graduates with similar interests. As there would be more examples of girls choosing a program in arts and boys choosing a program in mathematics, this algorithm could unintentionally group students based on their gender.

Goal of the project

The approach described by Rooij and Schulz (to appear) bases the typicality of a feature on how often we observe that feature in a member of a group. A feature is learned when we deem it typical. However, in the modern age, where literacy rates in the developed world are exceedingly high, we often learn about the world through written media (Roser and Ortiz-Ospina, 2016). Therefore, we will need to adapt the method proposed by Rooij and Schulz to cover the domain of written text.

As the theory presented by Rooij and Schulz can be used to determine the acceptability of all types of generics, we will examine whether our distributional model can be used to identify hypernymic relationships specifically. Our approach is as follows: first, we extract accepted generics from the corpus using Hearst patterns, which we use to test whether our model can also establish these accepted generics. Secondly, we examine whether we can observe more general associations by attempting to recreate the association bias described in Caliskan et al. (2017).

We, therefore, pose the following question:

“Can our distributional model infer hypernymic relations in text for the theory proposed by Van Rooij and Schulz?”


2 Proposed model

Before we introduce our model, it is necessary to clarify our definition of context. In this thesis, context will refer to the scope of the corpus that is relevant for establishing a relationship between the terms f and g. Examples of contexts that could be examined are entire articles, paragraphs, sentences, or even just the direct neighbours of a term. This definition can be used to calculate the probability of encountering a term f in our corpus.

Definition 2.1. The probability of observing f in a corpus

\[
P(f) = \frac{\#(f)}{\#(\text{context})}
\]

Note that the function #(f) returns the number of times that f occurs in our corpus, and #(context) refers to the number of contexts in our corpus.

A naive way of adapting the approach proposed by Rooij and Schulz (to appear) would be to replace the event of observing a feature in a member of a group by the observation of the feature in text. In this definition, observing the terms f and g in the same context counts as an observation of the feature with a member of the group. We denote this observation by #(f ∧ g). The total number of co-occurrences of f and g can then be used to calculate the conditional probability of f given g.

Definition 2.2. The conditional probability of observing f given g

\[
P(f \mid g) = \frac{\#(f \wedge g)}{\#(g)}
\]

In this thesis, however, we are not just interested in the co-occurrence with a single term, but with a group of terms. As defining a method for measuring the significance of a term within the group is outside the scope of this thesis, we will assume that the influence of every member of a group is equal. Therefore, we will use the mean co-occurrence over all members of the group.

Definition 2.3. The probability of observing f given a group G

\[
P(f \mid G) = \frac{1}{|G|} \sum_{g \in G} \frac{\#(f \wedge g)}{\#(g)}
\]
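A minimal sketch of Definitions 2.1 to 2.3 in Python, assuming each context is a list of terms (for example one cleaned sentence); the data structures and function names are ours, not the thesis implementation:

```python
from collections import Counter
from itertools import combinations

def build_counts(contexts):
    """Count terms (once per context) and unordered co-occurrences per context."""
    term_counts, pair_counts = Counter(), Counter()
    for terms in contexts:
        unique = set(terms)
        term_counts.update(unique)
        pair_counts.update(frozenset(p) for p in combinations(unique, 2))
    return term_counts, pair_counts, len(contexts)

def p_f(f, term_counts, n_contexts):                       # Definition 2.1
    return term_counts[f] / n_contexts

def p_f_given_g(f, g, term_counts, pair_counts):           # Definition 2.2
    return pair_counts[frozenset((f, g))] / max(term_counts[g], 1)

def p_f_given_group(f, group, term_counts, pair_counts):   # Definition 2.3
    return sum(p_f_given_g(f, g, term_counts, pair_counts) for g in group) / len(group)
```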

2.1 Incorporating meaning

However, this model would find low co-occurrence counts for hypernymic relations, as hypernym and hyponym do not often occur in the same context (Hearst, 1998). Instead, people often use either the hyponym or the hypernym. To alleviate this issue, we propose extending this model to observe the indirect relation between the term f and the group G.


To this end, we introduce a random term n for which we will calculate its influence on the indirect co-occurrence between f and G. First, our method determines the direct rate of co-occurrence between n and G. Secondly, we ascertain the relation between the meaning of n and f in order to establish the relevance of n to the indirect relation between f and G. We measure the similarity of n to f using the cosine similarity between the word embeddings of n and f. Instead of observing just a single term n, we apply this model to all terms in the corpus. This approach results in the following model:

Definition 2.4. Indirectly observing f given a group G for a random term n

\[
P_\psi(f \mid G) = \frac{1}{N} \sum_{n} \left( sim(f, n) \times \frac{1}{|G|} \sum_{g \in G} \frac{\#(n \wedge g)}{\#(g)} \right)
\]

This extended model also requires us to incorporate the random term n into the probability of observing f in our corpus, by determining the probability of indirectly observing f. This can be achieved by counting the occurrences of a random term n in the corpus and using its similarity to the meaning of f to determine its contribution to the probability of f occurring indirectly. As with the previous formula, we apply this to all terms in the corpus.

Definition 2.5. The probability of indirectly observing f in a corpus

\[
P_\psi(f) = \frac{1}{N} \sum_{n} \left( sim(f, n) \times \frac{\#(n)}{\#(\text{context})} \right)
\]

This extension allows us to apply these definitions of indirect co-occurrence within the model proposed by Rooij and Schulz.

Definition 2.6. Typicality of f given a group G for hypernymic relations

\[
\nabla P^f_G = \frac{P_\psi(f \mid G) - P_\psi(f \mid alt(G))}{1 - P_\psi(f \mid alt(G))}
\]

As mentioned in the introduction, when there is no clear definition of the alternatives of a group G, we can use a simplified version of the typicality model. As our research does not have these alternatives defined in all cases, we will apply this simplified model as well:

\[
\Delta P^f_G := P_\psi(f \mid G) - P_\psi(f)
\]

Which is equal to:

Definition 2.7. Typicality of f given a group G without alternative groups

\[
\Delta P^f_G = \frac{1}{N} \sum_{n} \left( sim(f, n) \times \frac{1}{|G|} \sum_{g \in G} \frac{\#(n \wedge g)}{\#(g)} \right) - \frac{1}{N} \sum_{n} \left( sim(f, n) \times \frac{\#(n)}{\#(\text{context})} \right)
\]
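Building on the counting sketch in the previous section, Definitions 2.4, 2.5 and 2.7 could be sketched as follows, assuming a gensim KeyedVectors object `wv` for the cosine similarities and a list `vocab` of modelled terms; all names are ours and this is not the thesis code:

```python
def p_psi_f_given_group(f, group, wv, term_counts, pair_counts, vocab):
    """Definition 2.4: similarity-weighted mean co-occurrence with the group."""
    total = 0.0
    for n in vocab:                                    # all modelled terms n
        mean_cooc = sum(
            pair_counts[frozenset((n, g))] / max(term_counts[g], 1) for g in group
        ) / len(group)
        total += wv.similarity(f, n) * mean_cooc
    return total / len(vocab)

def p_psi_f(f, wv, term_counts, n_contexts, vocab):
    """Definition 2.5: similarity-weighted probability of indirectly observing f."""
    return sum(wv.similarity(f, n) * term_counts[n] / n_contexts for n in vocab) / len(vocab)

def delta_p(f, group, wv, term_counts, pair_counts, n_contexts, vocab):
    """Definition 2.7: simplified typicality, without alternative groups."""
    return (p_psi_f_given_group(f, group, wv, term_counts, pair_counts, vocab)
            - p_psi_f(f, wv, term_counts, n_contexts, vocab))
```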


3 Method

In this section, we discuss the technical implementation of our project. First, we introduce our corpus, followed by our application of Hearst patterns and our usage of distributional semantics.

3.1 Corpus

For this project, we use the British National Corpus (BNC) as our source (British National Corpus, 2001). The BNC is a collection of written and spoken excerpts of British English from the late twentieth century (1991-1994). The written part is a collection of widely varying sources, such as academic essays and newspaper articles. The spoken part consists of transcripts of different conversations, ranging from formal business conversations to informal phone calls. It contains over 100 million words of text with more than 8 million distinct terms.

As this corpus gives us such a wide spread of sources, it allows us to discover generic associations that were widely accepted around the time the corpus was compiled. The corpus was cleaned by filtering punctuation and non-word characters and by removing superfluous whitespace. Characters that were replaced, rather than removed, are shown in the following table.

Character   Replacement
£           ' POUND '
%           ' percent '
°           ' degrees '
&           ' and '

Table 1: Characters that have been replaced in our dataset

The spoken excerpts of the BNC are often too long for our implementation, as they often lack proper sentence endings. They also contain stuttering, sounds that are not actual words (such as “errr” and “hmm”), and repeated words. These excerpts can become extremely protracted due to these problems, and during early testing we noticed that such sentences often caused memory issues for our implementation. We therefore decided to exclude all sentences longer than 50 words, where a ‘word’ is every substring surrounded by whitespace characters.
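A sketch of this cleaning step, assuming the raw material is available as one extracted sentence per line; the replacements follow Table 1 and the 50-word cut-off described above, while the file name and helper names are ours:

```python
import re

# Replacements from Table 1; other non-word characters are simply dropped.
REPLACEMENTS = {"£": " POUND ", "%": " percent ", "°": " degrees ", "&": " and "}

def clean_sentence(line: str):
    """Clean one raw sentence; return its tokens, or None if it has more than 50 words."""
    for char, repl in REPLACEMENTS.items():
        line = line.replace(char, repl)
    line = re.sub(r"[^\w\s'-]", " ", line)    # strip punctuation and other non-word characters
    tokens = line.split()
    return tokens if len(tokens) <= 50 else None

# "bnc_sentences.txt" is a placeholder for a file with one extracted BNC sentence per line.
with open("bnc_sentences.txt", encoding="utf-8") as fh:
    sentences = [t for t in (clean_sentence(line) for line in fh) if t is not None]
```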


3.2 Hearst Patterns

In order to identify hypernymic relations using Hearst patterns, we use a collection provided in Roller et al. (2018). This list is an extension of the original six patterns defined in Hearst (1998) and includes patterns which were discovered after the original paper was published. As all patterns contain static strings that are necessary for a sentence to fit a Hearst pattern, all sentences that do not contain these strings are filtered out before processing. The full list of patterns can be found below.

Pattern                              Source                 Excluded
NP_H including NP* (and|or) NP       Hearst (1998)
NP_H such as NP* (and|or) NP         Hearst (1998)
NP_H especially NP* (and|or) NP      Hearst (1998)
NP* (and|or) other NP_H              Hearst (1998)          Yes
Such NP_H as NP* (and|or) NP         Hearst (1998)          Yes
NP which is an example of NP_H       Roller et al. (2018)
NP which is a class of NP_H          Roller et al. (2018)
NP which is a kind of NP_H           Roller et al. (2018)
NP (and|or) any other NP_H           Roller et al. (2018)
NP (and|or) some NP_H                Roller et al. (2018)
NP which is called NP_H              Roller et al. (2018)
NP is a special case of NP_H         Roller et al. (2018)
NP is (a|an) NP_H that               Roller et al. (2018)
like (all|most|any|other) NP_H       Roller et al. (2018)
unlike (all|most|any|other) NP_H     Roller et al. (2018)
NP is JJS (most) NP_H                Roller et al. (2018)   Yes
X is a !(member|part|given) NP_H     Roller et al. (2018)   Yes

Table 2: Hearst patterns used for the extraction of hypernymic relations

In this table, the noun phrase that signals the hypernym is tagged with the subscript H, a vertical bar means that one of the options must be present, and NP* represents an enumeration of hyponyms. The term JJS refers to the Penn Treebank POS tag that identifies a superlative adjective.

As shown in the table, some patterns were excluded from this research because our implementation was unable to capture them correctly. Note that Roller et al. (2018) also provide the pattern !(features|properties) NP_H such as NP*. This pattern was also not captured correctly by our implementation and was not included in the table due to its similarity to the such as pattern provided by Hearst (1998).


3.2.1 Syntactic structure

As Hearst patterns rely on the syntactic structure of a sentence, our implementation requires a method to create this structure. We use the Stanford CoreNLP toolkit to parse the syntactic structure of a given sentence. The Stanford CoreNLP toolkit is an integrated toolkit for natural language processing that allows us to quickly parse a sentence, tag all constituents with their part of speech, and build the syntactic tree using a PCFG parser (Manning et al., 2014). The part-of-speech tagging is performed using the Penn Treebank tag set, and the parsing of the syntactic structure is based on a pre-trained English language model also based on the Penn Treebank (Marcus et al., 1994).
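For illustration, a constituency parse with Penn Treebank tags can be obtained through NLTK's client for a locally running CoreNLP server; the setup below (server URL, NLTK wrapper, example sentence) is our assumption and not necessarily the configuration used for the thesis:

```python
from nltk.parse.corenlp import CoreNLPParser

# Assumes a Stanford CoreNLP server has been started locally, e.g. on port 9000.
parser = CoreNLPParser(url="http://localhost:9000")

sentence = "He studied infectious diseases such as malaria, HIV and the flu."
tree = next(parser.raw_parse(sentence))   # constituency tree with Penn Treebank labels
tree.pretty_print()

# Noun phrases can then be collected by walking the tree.
noun_phrases = [" ".join(st.leaves()) for st in tree.subtrees() if st.label() == "NP"]
print(noun_phrases)
```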

3.2.2 Syntactic tree traversal

The final step in identifying Hearst patterns in our corpus is traversing the syntactic tree to select sentences that contain the correct structure and extracting the hypernymic relations when a Hearst pattern is identified. To produce these traversals, we developed an algorithm that traverses the syntactic structure bottom-up to identify the local structure of the phrase that possibly contains a Hearst pattern. The extracted patterns were then selected by hand.

3.3 Distributional semantics

Our distributional model consists of two parts: raw counts of the co-occurrence between phrases and a lower-dimensional Word2Vec model generated using the skip-gram approach (Řehůřek and Sojka, 2010). The data was preprocessed by automatically detecting common phrases, which correspond to common multi-word expressions, using the approach described by Mikolov et al. (2013b) and Bouma (2009). The phrases were generated using the same library as was used to create the Word2Vec model (Řehůřek and Sojka, 2010). To reduce the size of the model, only phrases that occur more than twenty times in the corpus are included in the final model.
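A minimal sketch of this pipeline with gensim (version 4 parameter names); the dimensionality, window and NPMI threshold below are placeholders of ours, while the minimum-count setting mirrors the twenty-occurrence cut-off described above:

```python
from gensim.models import Word2Vec
from gensim.models.phrases import Phrases, Phraser

# `sentences` is a list of token lists, e.g. the cleaned BNC sentences from section 3.1.
bigrams = Phraser(Phrases(sentences, min_count=20, scoring="npmi", threshold=0.5))
phrased = [bigrams[s] for s in sentences]          # merge detected multi-word expressions

model = Word2Vec(
    sentences=phrased,
    sg=1,             # skip-gram
    min_count=20,     # drop rare phrases
    vector_size=300,  # placeholder dimensionality
    window=5,         # placeholder context window
    workers=4,
)
print(model.wv.most_similar("zulu", topn=10))      # example query
```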

We also use the raw co-occurrence counts in our model as a measure of how indicative a term is of the class we want to model. To create these co-occurrence counts, we take the surroundings of a phrase to be the two preceding and the two following phrases. This provides us with the local structure within a sentence.
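These window-based counts could be collected along the following lines, assuming the `phrased` sentences from the Word2Vec sketch above; the function name is ours:

```python
from collections import Counter

def window_cooccurrences(sentences, window=2):
    """Count each unordered pair of phrases that occur at most `window` positions apart."""
    counts = Counter()
    for sent in sentences:
        for i, term in enumerate(sent):
            for j in range(i + 1, min(len(sent), i + window + 1)):
                counts[frozenset((term, sent[j]))] += 1   # counted once per occurrence
    return counts

pair_counts = window_cooccurrences(phrased)
```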


4 Results

4.1 Retrieval of Hearst relations

Our first experiment investigates whether our model can recreate hypernymic relations. To generate these relations, we extracted 23 relations from the BNC at random and verified them using our model. The hyponyms that were determined for these classes are shown in appendix A. Two of these classes were not modelled in our distributional model and have therefore been excluded from the results.

We also generated 23 random categories as a baseline in order to review whether the model produces a significant increase in typicality for accepted hypernymic relations. For the same reason, we also generated random classes for the hypernyms that were established using Hearst patterns. In the following tables, we present the typicality scores for these hypernyms using the actual classes and the classes we randomly generated, as well as our control group. In the table that reports the typicality scores of our actual hypernyms, we emphasise whether the random group or the extracted hyponyms give the higher result.

Random hypernym       ∆P^f_G (random group)
Manly                 1.008 · 10^-5
Soap opera            5.501 · 10^-6
Owners abroad         6.021 · 10^-6
Jack Mason            7.472 · 10^-6
Nodule                8.274 · 10^-6
Carbon dioxide        9.248 · 10^-6
Imperfection          9.252 · 10^-6
Falsified             8.264 · 10^-6
Projection            8.765 · 10^-6
McLuhan               5.508 · 10^-6
Agora                 6.988 · 10^-6
Democratic            6.873 · 10^-6
Armstrongs            9.638 · 10^-6
Murchison             7.846 · 10^-6
Markers               1.080 · 10^-5
Sir Joshua            6.801 · 10^-6
Newbridge             7.862 · 10^-6
Reimbursement         7.163 · 10^-6
Roasts                5.577 · 10^-6
Saddle                8.254 · 10^-6
Khmer Rouge           6.496 · 10^-6
High concentrations   9.374 · 10^-6
Brusque               7.370 · 10^-6

Term               ∆P^f_G (random group)   ∆P^f_G (hyponyms)
Narcotic           7.321 · 10^-6           1.010 · 10^-5
Writing            6.307 · 10^-6           8.921 · 10^-6
Poison             7.573 · 10^-6           1.031 · 10^-5
Atom               6.930 · 10^-6           1.102 · 10^-5
Activities         8.557 · 10^-6           1.187 · 10^-5
Movements          1.063 · 10^-5           1.171 · 10^-5
Countries          7.356 · 10^-6           9.769 · 10^-6
Characteristics    1.096 · 10^-5           1.327 · 10^-5
Rocks              9.506 · 10^-6           1.332 · 10^-5
Organic matter     9.998 · 10^-5           1.430 · 10^-5
Minerals           9.498 · 10^-6           1.593 · 10^-5
Sports             6.098 · 10^-6           8.229 · 10^-6
Teas               6.442 · 10^-6           9.419 · 10^-6
Herbs              8.372 · 10^-6           1.233 · 10^-5
Fish               7.853 · 10^-6           1.220 · 10^-5
Medical condition  7.455 · 10^-6           1.077 · 10^-5
Languages          7.696 · 10^-5           9.881 · 10^-6
Materials          1.064 · 10^-5           1.219 · 10^-5
Institutions       1.093 · 10^-5           1.415 · 10^-5
Organism           8.860 · 10^-6           1.212 · 10^-5
Weapons            1.029 · 10^-5           1.270 · 10^-5
Churchmen          1.047 · 10^-5           1.095 · 10^-5
Religion           1.211 · 10^-5           1.666 · 10^-5

Table 3: Typicality scores for random hypernyms (top) and for hypernyms extracted using Hearst patterns with random groups versus their extracted hyponyms (bottom)


4.2 Recreating the Implicit Association Test

The second experiment attempts to recreate the Implicit Association Test (IAT) as described by Greenwald et al. (1998). We use this experiment to examine whether our model can be used to establish basic associations. As the original examples from the IAT are based on associations with certain racial groups in American society, it would be negligent to assume that these views were similar to those in the United Kingdom in 1991-1994. As the paper on the IAT was written after this timeframe, there is no data on this subject, and generating or hypothesising about these norms is outside the scope of this project. However, as a baseline, Greenwald et al. (1998) used the association of positive and negative terms with insects, flowers, instruments, and weapons. As these associations are less dependent on societal norms and historical context, we assume that they are similar in British society. Therefore, we use these classes in this experiment.

These classes are also clear alternatives of each other. This allows us to use the original definition of typicality as defined by Rooij and Schulz (to appear):

\[
\nabla P^f_G = \frac{P_\psi(f \mid G) - P_\psi(f \mid alt(G))}{1 - P_\psi(f \mid alt(G))}
\]

In this experiment, we compare the performance of the original definition to that of the simpler approach.
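As an illustration of this comparison, the original typicality of Definition 2.6 could be computed by treating the positive and negative attribute groups from appendix B as each other's alternative; this sketch reuses the hypothetical helpers from section 2, and the variable names (positive_terms, negative_terms, wv, and so on) are ours:

```python
def nabla_p(f, group, alt_group, wv, term_counts, pair_counts, vocab):
    """Original typicality (Definition 2.6) with an explicit alternative group."""
    p_g = p_psi_f_given_group(f, group, wv, term_counts, pair_counts, vocab)
    p_alt = p_psi_f_given_group(f, alt_group, wv, term_counts, pair_counts, vocab)
    return (p_g - p_alt) / (1 - p_alt)

# IAT-style scores for one flower term: the positive and negative attribute groups
# act as each other's alternative.
positive_score = nabla_p("aster", positive_terms, negative_terms,
                         wv, term_counts, pair_counts, vocab)
negative_score = nabla_p("aster", negative_terms, positive_terms,
                         wv, term_counts, pair_counts, vocab)
```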

In the following tables, we report the typicality of all target terms in the original groups mentioned above. For reference, these groups are listed in appendix B. They are identical to the ones reported in Greenwald et al. (1998), except for the term ‘bagpipe’, which was replaced by the British English term ‘bagpipes’. No results are given for terms that were not present in our model.

For each term, we emphasise whether the positive or the negative typicality score is higher. For each group, we only report the typicality score ∇P^f_G for the terms that are associated with that class in Greenwald et al. (1998), as in our full results the typicality with respect to the other group is equal to −∇P^f_G. This is due to the small values of P_ψ(f|G) and P_ψ(f|alt(G)), which make the denominator approach 1. To show this, we determine the limit of the formula for ∇P^f_G; for readability, we replace P_ψ(f|G) and P_ψ(f|alt(G)) with X and Y respectively.

\[
\lim_{X \to 0} \lim_{Y \to 0} \left( \frac{X - Y}{1 - Y} + \frac{Y - X}{1 - X} \right)
= \lim_{X \to 0} \left( \frac{X}{1} + \frac{-X}{1 - X} \right)
= 0
\]


Term        ∆P^f_G positive   ∆P^f_G negative   ∇P^f_G positive
Aster       7.201 · 10^-6     7.053 · 10^-6      1.836 · 10^-7
Clover      8.028 · 10^-6     8.227 · 10^-6     −1.641 · 10^-7
Hyacinth    9.254 · 10^-6     8.703 · 10^-6      5.992 · 10^-7
Marigold    8.900 · 10^-6     8.415 · 10^-6      5.256 · 10^-7
Poppy       8.560 · 10^-6     8.097 · 10^-6      5.061 · 10^-7
Azalea      5.873 · 10^-6     5.812 · 10^-6      9.758 · 10^-7
Crocus      6.851 · 10^-6     6.600 · 10^-6      2.909 · 10^-7
Iris        9.544 · 10^-6     8.960 · 10^-6      6.282 · 10^-7
Orchid      7.606 · 10^-6     7.553 · 10^-6      8.925 · 10^-8
Rose        6.701 · 10^-6     6.660 · 10^-6      7.267 · 10^-7
Bluebell    7.077 · 10^-6     6.682 · 10^-6      4.393 · 10^-7
Daffodil    7.380 · 10^-6     7.083 · 10^-6      3.407 · 10^-7
Lilac       8.676 · 10^-6     8.329 · 10^-6      3.914 · 10^-7
Pansy       7.269 · 10^-6     6.811 · 10^-6      4.995 · 10^-7
Tulip       8.061 · 10^-6     7.827 · 10^-6      2.795 · 10^-7
Buttercup   6.976 · 10^-6     6.725 · 10^-6      2.907 · 10^-7
Daisy       9.880 · 10^-6     9.339 · 10^-6      5.915 · 10^-7
Lily        9.988 · 10^-6     9.227 · 10^-6      8.097 · 10^-7
Peony       7.242 · 10^-6     6.830 · 10^-6      4.515 · 10^-7
Violet      9.200 · 10^-6     8.560 · 10^-6      6.860 · 10^-7
Carnation   7.448 · 10^-6     7.068 · 10^-6      4.208 · 10^-7
Gladiola    -                 -                  -
Magnolia    7.008 · 10^-6     6.761 · 10^-6      2.901 · 10^-7
Petunia     -                 -                  -
Zinnia      -                 -                  -

Term         ∆P^f_G positive   ∆P^f_G negative   ∇P^f_G negative
Ant          6.780 · 10^-6     6.687 · 10^-6     −1.263 · 10^-7
Caterpillar  8.174 · 10^-6     8.182 · 10^-6     −2.680 · 10^-8
Flea         8.008 · 10^-6     7.920 · 10^-6     −1.239 · 10^-7
Locust       7.610 · 10^-6     7.951 · 10^-6      3.126 · 10^-7
Spider       8.587 · 10^-6     8.272 · 10^-6     −3.519 · 10^-7
Bedbug       -                 -                  -
Centipede    7.568 · 10^-6     7.511 · 10^-6     −9.279 · 10^-8
Fly          6.997 · 10^-6     6.884 · 10^-6     −1.401 · 10^-7
Maggot       7.807 · 10^-6     8.446 · 10^-6     −2.951 · 10^-7
Tarantula    7.229 · 10^-6     7.075 · 10^-6     −1.951 · 10^-7
Bee          7.807 · 10^-6     7.482 · 10^-6     −3.610 · 10^-7
Cockroach    7.349 · 10^-6     7.390 · 10^-6      3.226 · 10^-9
Gnat         6.900 · 10^-6     6.724 · 10^-6     −2.147 · 10^-7
Mosquito     6.561 · 10^-6     6.930 · 10^-6      3.398 · 10^-7
Termite      8.953 · 10^-6     9.114 · 10^-6      1.292 · 10^-7
Beetle       8.116 · 10^-6     8.285 · 10^-6      1.351 · 10^-7
Cricket      5.311 · 10^-6     5.027 · 10^-6     −3.166 · 10^-7
Hornet       7.228 · 10^-6     7.147 · 10^-6     −1.223 · 10^-7
Moth         8.554 · 10^-6     7.147 · 10^-6     −1.684 · 10^-7
Wasp         7.829 · 10^-6     7.862 · 10^-6     −2.140 · 10^-9
Blackfly     -                 -                  -
Dragonfly    8.455 · 10^-6     8.548 · 10^-6      5.655 · 10^-8
Horsefly     -                 -                  -
Roach        6.552 · 10^-6     6.383 · 10^-6     −2.085 · 10^-7
Weevil       -                 -                  -

Table 4: Typicality of flowers and insects

Term         ∆P^f_G positive   ∆P^f_G negative   ∇P^f_G positive
Bagpipes*    8.037 · 10^-6     7.504 · 10^-6      5.715 · 10^-7
Cello        5.996 · 10^-6     5.479 · 10^-6      5.560 · 10^-7
Guitar       6.223 · 10^-6     5.694 · 10^-6      5.650 · 10^-7
Lute         7.209 · 10^-6     6.770 · 10^-6      4.772 · 10^-7
Trombone     6.796 · 10^-6     6.612 · 10^-6      4.007 · 10^-7
Banjo        8.420 · 10^-6     7.982 · 10^-6      4.822 · 10^-7
Clarinet     5.554 · 10^-6     5.179 · 10^-6      4.079 · 10^-7
Harmonica    7.022 · 10^-6     6.578 · 10^-6      4.849 · 10^-7
Mandolin     7.258 · 10^-6     6.636 · 10^-6      6.633 · 10^-7
Trumpet      7.651 · 10^-6     7.080 · 10^-6      6.098 · 10^-7
Bassoon      6.094 · 10^-6     5.692 · 10^-6      4.368 · 10^-7
Drum         8.045 · 10^-6     7.915 · 10^-6      1.650 · 10^-7
Harp         6.642 · 10^-6     6.214 · 10^-6      4.655 · 10^-7
Oboe         5.579 · 10^-6     5.154 · 10^-6      4.595 · 10^-7
Tuba         6.412 · 10^-6     6.251 · 10^-6      1.942 · 10^-7
Bell         6.440 · 10^-6     6.244 · 10^-6      2.303 · 10^-7
Fiddle       8.503 · 10^-6     8.003 · 10^-6      5.343 · 10^-7
Harpsichord  5.909 · 10^-6     5.593 · 10^-6      3.520 · 10^-7
Piano        6.415 · 10^-6     5.910 · 10^-6      5.424 · 10^-7
Viola        8.476 · 10^-6     7.746 · 10^-6      7.741 · 10^-7
Bongo        7.659 · 10^-6     7.492 · 10^-6      2.037 · 10^-7
Flute        7.159 · 10^-6     6.566 · 10^-6      6.313 · 10^-7
Horn         8.368 · 10^-6     8.037 · 10^-6      3.732 · 10^-7
Saxophone    7.032 · 10^-6     6.463 · 10^-6      6.152 · 10^-7
Violin       6.384 · 10^-6     5.757 · 10^-6      6.628 · 10^-7

Term         ∆P^f_G positive   ∆P^f_G negative   ∇P^f_G negative
Arrow        6.819 · 10^-6     6.606 · 10^-6     −2.446 · 10^-7
Club         5.758 · 10^-6     5.225 · 10^-6     −5.672 · 10^-7
Gun          8.173 · 10^-6     8.240 · 10^-6      3.437 · 10^-8
Missile      5.742 · 10^-6     5.945 · 10^-6      1.758 · 10^-7
Spear        8.710 · 10^-6     8.753 · 10^-6      1.088 · 10^-8
Axe          7.375 · 10^-6     7.525 · 10^-6      1.224 · 10^-7
Dagger       9.577 · 10^-6     9.327 · 10^-6     −2.864 · 10^-7
Harpoon      6.697 · 10^-6     6.896 · 10^-6      1.675 · 10^-7
Pistol       8.360 · 10^-6     8.344 · 10^-6     −5.007 · 10^-8
Sword        9.341 · 10^-6     9.125 · 10^-6     −2.487 · 10^-7
Blade        8.919 · 10^-6     8.780 · 10^-6     −1.742 · 10^-7
Dynamite     7.952 · 10^-6     7.899 · 10^-6     −8.923 · 10^-8
Hatchet      6.708 · 10^-6     6.447 · 10^-6     −2.959 · 10^-7
Rifle        7.551 · 10^-6     7.595 · 10^-6      1.318 · 10^-8
Tank         6.819 · 10^-6     6.978 · 10^-6      1.273 · 10^-7
Bomb         6.138 · 10^-6     6.495 · 10^-6      3.331 · 10^-7
Firearm      5.135 · 10^-6     5.632 · 10^-6      4.764 · 10^-7
Knife        8.946 · 10^-6     8.899 · 10^-6     −8.042 · 10^-8
Shotgun      7.993 · 10^-6     8.280 · 10^-6      2.568 · 10^-7
Teargas      7.202 · 10^-6     7.532 · 10^-6      2.976 · 10^-7
Cannon       7.897 · 10^-6     7.954 · 10^-6      1.877 · 10^-8
Grenade      7.576 · 10^-6     7.719 · 10^-6      1.098 · 10^-7
Mace         9.229 · 10^-6     8.799 · 10^-6     −4.715 · 10^-7
Slingshot    -                 -                  -
Whip         7.639 · 10^-6     7.530 · 10^-6     −1.397 · 10^-7

Table 5: Typicality of instruments and weapons


5 Discussion

5.1 Retrieval of Hearst Patterns

Before we can examine the results of our model, we must first establish a baseline for when we accept a typicality score as significant. In table 3 we emphasise the higher typicality score; however, it would be too hasty to assume that a higher score directly proves a hypernymic relation. Instead, we observe a consistent relative increase from our randomly selected groups to the groups of extracted hyponyms, which is similar to the relative increase over the completely random groups, as shown in table 6. The mean deviation of the relative distance from the typicality of the random hyponyms to that of the extracted hyponyms is also relatively small, at 9.049 · 10^-7.

                                  Random hypernyms   Random groups   Extracted hypernyms
Mean value                        9.036 · 10^-6      7.801 · 10^-6   1.192 · 10^-5
Relative to random hypernyms      0                  1.235 · 10^-6   4.119 · 10^-6
Relative to extracted hypernyms   4.119 · 10^-6      2.884 · 10^-6   0

Table 6: Mean and relative typicality scores

We excluded the typicality scores for the hypernym ‘languages’ from these calculations, as we discuss this outlier further below. As there is currently no accepted threshold above which a typicality score indicates an accepted generic sentence, we cannot definitively state that these hypernyms are accepted in this corpus. However, we do consider the consistent increase in typicality amongst the hypernyms to be a strong indication that these sentences are accepted as generic.

We examined which terms in our model show the highest similarity with the hyponyms associated with ‘languages’ and discovered that the term ‘Zulu’ shows no high similarity with language-related terms, as shown in table 7. Moreover, even though the representation of ‘Zulu’ appears to be associated with the people instead of their language, we note that some of these terms connect the Zulu to Islamic beliefs, even though the majority of the Zulu people are Christian (Houle, 2011). This makes us question the reliability of this representation of ‘Zulu’ even more.

Term              Cosine similarity   Term                 Cosine similarity
Damour            0.742               Lancers              0.708
Druze             0.736               Gulbuddin Hekmatyar  0.706
Crusader          0.733               José Antonio         0.705
Jihad             0.729               Caliph               0.705
South Vietnamese  0.725               Barbarian            0.704

Table 7: Terms most similar to ‘Zulu’


5.2 Recreating the Implicit Association Test

For our examination of the recreation of the IAT, we first consider the performance of our model on the four different groups. Overall, our model was able to correctly identify the terms in the positive groups of flowers and instruments, apart from the term ‘clover’. As with the hypernym ‘languages’ in our previous experiment, we examined the words most similar to ‘clover’, but were unable to establish that the term was modelled incorrectly, as shown in table 8.

Term     Cosine similarity   Term        Cosine similarity
Grasses  0.846               Seaweed     0.808
Clovers  0.835               Stalks      0.798
Berries  0.817               Eucalyptus  0.792
Barley   0.813               Oats        0.783
Turnips  0.811               Cabbage     0.783

Table 8: Terms most similar to ‘Clover’

We were also unable to explain this result based on the number of occurrences of ‘clover’ in our corpus, as it is not an outlier in that respect. We report the occurrences of the class of flowers in table 9.

Term      Count   Term       Count
aster     21      pansy      35
clover    148     tulip      74
hyacinth  127     buttercup  36
marigold  48      daisy      518
poppy     196     lily       695
azalea    25      peony      43
crocus    45      violet     448
iris      671     carnation  58
orchid    120     gladiola   -
rose      9574    magnolia   76
bluebell  43      petunia    -
daffodil  95      zinnia     -
lilac     162

Table 9: The occurrences of the terms in the class ‘flowers’

As the difference between the positive and negative typicality scores is only 2.00 · 10^-7 and we were unable to determine that ‘clover’ was modelled incorrectly, we conclude that the model is neutral about this term. As the differences between the positive and negative typicality scores for ‘aster’, ‘azalea’, ‘orchid’, ‘rose’, and ‘daffodil’ were smaller than the difference for ‘clover’, we must conclude that our model is also neutral about these terms. This brings the performance of our model for this class down to 16 out of 22 flowers correctly classified.


The model performed worse on the negative groups, correctly classifying only 13 out of 24 weapons and a similarly low share of the 22 insects. We examined whether the negative features we used occurred less frequently in our corpus than the positive terms. The results of this examination are reported in table 10.

Positive terms:
Term      Count   Term      Count
caress    200     lucky     3205
freedom   5457    rainbow   749
health    14433   diploma   559
love      20953   gift      2361
peace     6339    honour    2580
cheer     716     miracle   906
friend    13636   sunrise   192
heaven    1821    family    30205
loyal     1047    happy     9942
pleasure  4436    laughter  1973
diamond   893     paradise  775
gentle    2483    vacation  215
honest    2686

Negative terms:
Term      Count   Term      Count
abuse     2045    pollute   110
crash     1736    tragedy   1592
filth     257     divorce   1816
murder    4774    jail      1032
sickness  830     poverty   2557
accident  4726    ugly      1224
death     17249   cancer    2289
grief     1190    kill      4281
poison    788     rotten    650
stink     203     vomit     225
assault   1690    agony     848
disaster  2346    prison    4539
hatred    909

Table 10: Counts of positive and negative terms

Overall, the negative terms occur 59,906 times in our corpus, whereas the positive terms occur 128,762 times. This could explain the lower typicality of the negative group, as the probability of the group occurring in the text is an important part of P_ψ(f|G): when there are few co-occurrences between members of the group and the term in our corpus, the overall probability is reduced.

Normally this would be intuitive, as a group that does not occur often is less likely to be a typical feature. However, when comparing different groups with a term, we should consider their frequency relative to that term instead of relative to the whole corpus. Our model currently does not take this into account.

This could also be a result of the size of our dataset: the model of Caliskan et al. (2017) is trained on a dataset with 2.2 million distinct tokens, whereas ours contains over 8 million.

We also reviewed whether the negative terms were modelled correctly in our distributional model by studying the terms they are most similar to; these terms are reported in appendix C. This investigation showed that ‘tank’, ‘harpoon’, ‘club’, and ‘blade’ were not modelled as weapons and that ‘ant’, ‘hornet’, ‘cricket’ and ‘gnat’ were not modelled as insects; however, this does not explain the overall low performance of our model on these negative groups. Finally, we examined


whether using the terms in the positive and negative groups as our term f and the classes of flowers, insects, instruments, and weapons as our group G provides better results, as intuitively the probability of observing a negative term should increase when we encounter weapons or insects. We report the mean performance for the positive and negative groups per class in table 11 and the full results in appendix D.

            µ flowers       µ insects       µ instruments   µ weapons
Positive    7.175 · 10^-6   7.434 · 10^-6   7.813 · 10^-6   7.441 · 10^-6
Negative    6.870 · 10^-6   7.608 · 10^-6   7.314 · 10^-6   7.495 · 10^-6

Table 11: Mean typicality of positive and negative terms per group

The results of this experiment are, however, worse than our original results. Overall, the typicality is lower for these groups, and the mean typicality scores for the positive and negative terms lie very close together.

The results for the original typicality score ∇P^f_G are identical to those of the simplified version ∆P^f_G in identifying the associations for all groups except the group of insects. For this group, it performs even worse than the simplified model: it was only able to correctly identify 6 out of the 22 insects modelled. This could also be explained by the relatively low number of occurrences of the negative group, as discussed earlier.

Its identical performance on the other groups is most likely linked to the denominator of the formula approaching 1, as previously discussed in section 4.2.

6 Conclusion

Our model was able to capture hypernymic relations with a higher typicality than that of a random group of phrases. We were also able to recreate findings from Greenwald et al. (1998) by showing a correlation between positive terms and a group of positive phrases. Even though we were not able to recreate similar results for negative terms and a group of negative phrases, this can be explained by the limitations of our algorithm. We were, however, unable to fully evaluate the original model, as the probabilities we report are too small to show substantial differences with the typicality of the alternative group.

Our work could be extended by determining the value of features as explained in section 1.1.3. This could alleviate the issue of comparing groups of features that occur with different frequencies in our dataset and provide results more in line with the original proposal by Rooij and Schulz. Our model could also be utilised for other tasks, such as establishing all meronyms that share a holonym or exploring the acceptance of previously established hypernyms with a defined set of alternatives.


References

Gerlof Bouma. Normalized (pointwise) mutual information in collocation extraction. Proceedings of GSCL, pages 31-40, 2009.

British National Corpus. 2001. URL http://www.natcorp.ox.ac.uk.

Aylin Caliskan, Joanna J. Bryson, and Arvind Narayanan. Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334):183-186, 2017.

Grant L. Campbell, Anthony A. Marfin, Robert S. Lanciotti, and Duane J. Gubler. West Nile virus. The Lancet Infectious Diseases, 2(9):519-529, 2002.

Ariel Cohen. Generics, frequency adverbs, and probability. Linguistics and Philosophy, 22(3):221-253, June 1999. ISSN 1573-0549. doi: 10.1023/A:1005497727784. URL https://doi.org/10.1023/A:1005497727784.

Ruiji Fu, Jiang Guo, Bing Qin, Wanxiang Che, Haifeng Wang, and Ting Liu. Learning semantic hierarchies via word embeddings. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1199-1209, 2014.

Anthony G. Greenwald, Debbie E. McGhee, and Jordan L. K. Schwartz. Measuring individual differences in implicit cognition: the Implicit Association Test. Journal of Personality and Social Psychology, 74(6):1464, 1998.

Marti A. Hearst. Automated discovery of WordNet relations. WordNet: An Electronic Lexical Database, pages 131-153, 1998.

Robert J. Houle. Making African Christianity: Africans Reimagining Their Faith in Colonial South Africa. Lehigh University Press, 2011.

Alessandro Lenci. Distributional semantics in linguistic and cognitive research. Italian Journal of Linguistics, 20(1):1-31, 2008.

Sarah-Jane Leslie. Generics: Cognition and acquisition. Philosophical Review, 117(1):1-47, 2008. doi: 10.1215/00318108-2007-023.

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics (ACL) System Demonstrations, pages 55-60, 2014. URL http://www.aclweb.org/anthology/P/P14/P14-5010.

Mitchell Marcus, Grace Kim, Mary Ann Marcinkiewicz, Robert MacIntyre, Ann Bies, Mark Ferguson, Karen Katz, and Britta Schasberger. The Penn Treebank: Annotating predicate argument structure. In Proceedings of the Workshop on Human Language Technology, 1994.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013a.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111-3119, 2013b.

Andriy Mnih and Koray Kavukcuoglu. Learning word embeddings efficiently with noise-contrastive estimation. In Advances in Neural Information Processing Systems, pages 2265-2273, 2013.

Radim Řehůřek and Petr Sojka. Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45-50, Valletta, Malta, May 2010. ELRA.

Stephen Roller, Douwe Kiela, and Maximilian Nickel. Hearst patterns revisited: Automatic hypernym detection from large text corpora. arXiv preprint arXiv:1806.03191, 2018.

Max Roser and Esteban Ortiz-Ospina. Literacy. Our World in Data, 2016. URL https://ourworldindata.org/literacy.

Robert van Rooij and Katrin Schulz. Generics and typicality: A bounded rationality approach. Linguistics and Philosophy, to appear. doi: 10.1007/s10988-019-09265-8.


Appendices

A Hypernyms and their groups

In this appendix we report the hypernyms and their corresponding hyponyms that were extracted from the BNC using Hearst patterns and used in section 4.1. Italicised terms were extracted from the corpus using Hearst patterns but not included in our distributional model; these terms were therefore not included in the results presented in section 4.1.

Narcotic: alcohol, morphine
Writing: report, essays, diaries, stories, letters, accounts
Poison: belladonna, arsenic
Atom: halogen, oxygen
Activities: demonstrations, assemblies, research, assault
Movements: labour, womens suffrage, independence, fascist, feminists, peace, antislavery
Countries: Indonesia, India, Japan, Iceland, Czechoslovakia, Austria, Switzerland, Finland, Norway, Sweden
Characteristics: ethnicity, education, age, gender, status, unemployment, history
Rocks: obsidian, baryte, alabaster, gypsum, limestone, marble, siltstones, serpentinite
Organic matter: shell, membranes, filaments, vomit, urine, faeces, food, manure, compost, mould, peat
Minerals: quartz, halite, gold, silver, barium, pyrite, dolomite, siderite, magnetite, sylvite
Sports: football, windsurfing, sailing, badminton, tennis, cricket, golf
Teas: blackberry, orange, mango
Herbs: peppers, mustard seed, jasmine, ginseng, peppermint, rosemary, chicory, yarrow, burnet, lavender, basil, myrtle, shallot flakes, harissa powder, limeflower, rib grass, sheeps parsley, verbena, pelargonium
Fish: carp, salmon, cod, halibut, haddock, snapper, pike, herring, sardines, cod, hake, eel, golden mullet, sea trout, scrod
Medical condition: incontinence, chest pain, HIV
Languages: Italian, Portuguese, Slovak, Spanish, Afrikaans, Zulu, German, Latin, Malay, Tswana
Materials: wood, clay, plaster, metal, plastics, laminates, glass, paper, textiles, rubber, iron, aluminium, rock, salt, ciment fondu
Institutions: banks, insurance companies, university, parliament
Organism: animal, human, bacteria, viruses, fungi
Weapons: napalm, missiles, artillery, mortars, machine guns, lasers, shotguns, axe, sword, kalashnikovs


Random hyponyms

In the following section we report all the random hyponyms that were generated for the hypernyms we extracted using Hearst patterns.

Narcotic: car collided, half-year

Writing: hythe, witch-hunt, butterworth, retail trade, supernatant, paw
Poison: call, coastal marshes

Atom: tightly wrapped, indoor

Activities: Rolling Stones, Ian Smith, averted, Dietrich

Movements: slackened, terrestrial ecology, roosevelts, deputy chair, campaign aimed, stocks, self-satisfied

Countries: thoroughly enjoyable, Berwick, brothers sisters, migrating birds, sterling, gallery’s, Gethsemane, like wildfire, Mahogany, sinker plate

Characteristics: hill farmers, cliff path, threshed, motioned, antiquated, unruly, bridge pickup
Rocks: semi-detached house, feral, white stripes, barrett, turn-taking, deflating
Organic matter: musings, allegations, gold watch, treasurer current, resevoirs, Namibians, violinists, underwent surgery, stay awake, well-repared, prussians
Minerals: surveys, wastes, Montaine, appear, silkily, Miss Hatherby, spastics society, ah hah

Sports: mishandled, well-loved, littering, breathtaking view, gentle persuasion, cmht, conductivity

Teas: WSR, reinterpreted, prospective study

Herbs: anyone who’s, unadorned, foster-parents, forties, expressed grave, erosions, christs sake, woorden bridge, pallets, lifted, undertones, enfant terrible

Fish: Gombrich, nestlings, jobless, AWB, arcs, lif-saving, roamed, main drawback, transcribed, Andres, escapism, ministrations

Medical condition: divides into, variable, sports centres

Languages: tring, nationalise, mismatch between, luxury, sweeties, asset stripping, blackcurrant, heading towards, similar fate

Materials: wheeled, termed, entrance fees, whacking, video recorder, wretchedly, indium-111, purser, sanctioning, varying amounts, grown steadily, kiwi, triplets, mini

Institutions: trade-in, mucosal surface, station, guiseley

Organism: bampton, samuelson, owner-occupiers, gypsies, mutatis mutandis
Weapons: marshalling, foresee, scandinavian, 14-year-olds, aroused considerable, nett, skinnergate, market counterparty, windsurfing, revitalization
Churchmen: satellite television, m4 corridor, insurrections


Random groups

In the following section we report all the random groups that were generated as a comparison baseline for the hypernyms we extracted using Hearst patterns.

Manly: transubstantiation, hamster, finally deciding, plasminogen activators, rupturing administrative structures, left-back, anthropoid, churn out, knot, special pleading, matting, third-class, leather gloves, dowager duchess, collapses, diem, air ministry, anything remotely, brightest
Soap opera: clem, physical contact, Neil Kinnocks, replacement, watering hole, resign, Sharpe, verso, Cosford, child-like, larger-than-life, South Wales, senior vice-president, potentate, aisles, bandwidth, cottoned on, student protests
Owners abroad: 800 feet, missed
Jack Mason: flower arrangements, 30mm, mlmin, rebuild, formidable, glove compartment, half-caste, commendation, private contractors, soapy water, subordinating, drum, distinguishes, chafing, sports facilities

Nodule: pay rises, deviousness, factual information, walker cup, inferiority complex, intragastric, bonhomie, Sudbury, ballooning, Antioch, Denny, shyly, diocese

Carbon dioxide: Boaz, thus avoiding, educated privately, downward, cockroft, archaic heritage, radiations, cpsu politburo, population estimates, rotted, warily, zero, Dorian Gray
Imperfection: kings bench, Herta, inaccuracies, weather, helicopters, library associations, turns, adulterous, harpers, passenger seat, historical periods, Manchester-based, safeguard, herman

Falsified: precision architecture, simplistically, atomism, presidential

Projection: Hondas, svga, iron bars, 06 social, froth, Croke Park, value-for-money, coach Bob, roly, about-turn, medicaid, fuming, walsh, instances
McLuhan: Yucatan, efta, backwoods, stolen property, Llewelyn, source rocks, impeding, utah, media items, slip stitch, Scanlon, bunches, disapprove, bourdieus, tornados, rash, gdansk, proportional representation, commissions proposals

Agora: shopped, christian communicators, concretion

Democratic: synthesis, reassuringly, much-needed, pigging, Paul II, people who've, computer serial, de, valet, neoplasia, v6 engine, 5in, international herald, form n79, kilometres away, daft, Betts, catching sight, gramophone
Armstrongs: pedantic, 12 noon, plea bargain, copley
Murchison: scotvec, Antonia, scurry, chemical plant, supercharged, white stripes, windscale, drug-related, indicates, William Joyce, telecoms, soviet invasion, scrupulously clean, moral code, mi6, larger, hasn't, as follows, phenomenological
Markers: monthly repayments, Credit Suisse, likelihood ratio, salivation, filigree, potencies, nov, Kristevas, John Locke, consultative body, financial devolution, St Mirren, handyman, Moldava, hornets, rewriting, missions


Saddle: intrinsic, economic recession, brain tumour

Khmer Rouge: agreeing, simpler, gauntlet, Christies sale, synapse, pop-up, Kim, sedate, woo woo, lashings, fleury

High concentrations: intelligences, ladybirds, organisational culture, dreadfully, two-person, Lizzie, head-dresses, Mohicans, re-investigation, discourage, visit

Brusque: Drance, senior, wedded, receptionists, repented, postal service

B Implicit association groups

The following lists show the groups of features used in the Implicit Association Test as used by Greenwald et al. (1998). These are identical to the groups Greenwald et al. use in their first and second experiments.

Positive: caress, freedom, health, love, peace, cheer, friend, heaven, loyal, pleasure, diamond, gentle, honest, lucky, rainbow, diploma, gift, honor, miracle, sunrise, family, happy, laughter, paradise, vacation

Negative: abuse, crash, filth, murder, sickness, accident, death, grief, poison, stink, assault, disaster, hatred, pollute, tragedy, divorce, jail, poverty, ugly, cancer, kill, rotten, vomit, agony, prison


C Similarity with negative groups

In this appendix we report the ten phrases that were most similar in our Word2Vec model to the terms of the negative groups used in section 4.2. Each table caption lists the terms in the same order as the lists above it.

Insects

Ant: albino 0.760, elephant 0.733, owl 0.714, eel 0.709, apple tree 0.690, alligator 0.685, ostrich 0.675, hedgehog 0.656, squirrel 0.653, reptile 0.647
Caterpillar: beetle 0.815, wasp 0.805, moth 0.804, leopard 0.784, snail 0.781, pea 0.775, vine 0.768, snake 0.767, pheasant 0.752, fungus 0.752
Flea: goat 0.765, wasp 0.764, monkey 0.748, poodle 0.746, moth 0.744, panther 0.730, bee 0.726, squirrel 0.724, toad 0.724, leopard 0.722

Table 12: Ant, Caterpillar, Flea

Locust: locusts 0.688, wildebeest 0.678, guano 0.665, beaked 0.661, salamanders 0.658, dragonfly 0.654, sugar-cane 0.651, vetch 0.646, pelagic 0.641, hippos 0.637
Spider: snake 0.860, lizard 0.807, moth 0.785, cat 0.785, bird 0.783, monkey 0.778, butterfly 0.776, hedgehog 0.775, wasp 0.774, frog 0.773
Centipede: tarantula 0.710, flatfish 0.702, wild cat 0.698, bindweed 0.696, dried-up 0.695, plover 0.683, dragonfly 0.683, buttercup 0.682, banshee 0.681, splutters 0.677

Table 13: Locust, Spider, Centipede


Fly: sail 0.784, shoot 0.777, ride 0.726, swim 0.719, float 0.717, catch 0.714, jump 0.691, drive 0.685, crawl 0.684, paddle 0.679
Maggot: caster 0.837, waggler 0.830, chub 0.810, roach 0.809, pike 0.774, dace 0.771, bream 0.769, 3lb 0.743, gudgeon 0.731, lob 0.728
Tarantula: piranha 0.756, viper 0.751, canary 0.741, flightless 0.740, sable 0.731, dipper 0.727, bindweed 0.725, cow parsley 0.724, flippers 0.723, peregrine falcon 0.723

Table 14: Fly, Maggot, Tarantula

Bee: moth 0.804, bird 0.796, snake 0.791, wasp 0.783, shark 0.779, frog 0.776, squirrel 0.776, cat 0.771, parrot 0.765, butterfly 0.758
Cockroach: passion fruit 0.732, clams 0.728, sizzle 0.726, 4oz 0.725, half-eaten 0.724, carcass 0.722, 400g 14oz can 0.721, marinated 0.718, lard 0.716, pancake 0.711
Gnat: three-inch 0.673, midget 0.662, thrower 0.652, hot pot 0.651, twirl 0.648, dogfish 0.646, hyena 0.645, loach 0.644, halberd 0.644, pudgy 0.644

Table 15: Bee, Cockroach, Gnat

Mosquito: shark 0.725, moth 0.715, dolphin 0.714, wasp 0.713, insect 0.684, fungus 0.677, larva 0.671, seals 0.669, squid 0.659, caterpillar 0.655
Termite: insectivorous 0.752, arboreal 0.735, spiny 0.705, eurasian 0.699, long-necked 0.693, ficus 0.693, sessile 0.689, deciduous 0.688, coniferous 0.686, dragonfly 0.685
Beetle: fungus 0.827, wasp 0.820, caterpillar 0.815, bark 0.807, moth 0.800, snake 0.787, snail 0.782, toad 0.780, larva 0.772, rabbit 0.772

Table 16: Mosquito, Termite, Beetle


Cricket: rugby 0.877, football 0.831, soccer 0.817, hockey 0.813, tennis 0.760, athletics 0.754, boxing 0.753, snooker 0.730, badminton 0.729, golf 0.729
Hornet: floatplane 0.738, gloster 0.731, qe2 0.730, sunbeam 0.721, fairey 0.719, fighter plane 0.718, nimrod 0.716, boeing 707 0.716, fishing boat 0.715, corsair 0.712
Moth: wasp 0.874, leopard 0.838, butterfly 0.821, caterpillar 0.804, bee 0.804, snake 0.803, tiger 0.801, beetle 0.800, larva 0.789, cuckoo 0.787

Table 17: Cricket, Hornet, Moth

Wasp: moth (0.874), leopard (0.851), tiger (0.833), beetle (0.820), shark (0.807), caterpillar (0.805), snake (0.796), butterfly (0.794), squirrel (0.789), frog (0.788)
Dragonfly: primula (0.770), waterlily (0.752), white-tailed (0.750), flightless (0.745), dunlin (0.742), ptarmigan (0.742), sea lions (0.739), plover (0.734), long-tailed (0.734), dragonflies (0.734)
Roach: chub (0.908), dace (0.896), bream (0.813), maggot (0.809), tench (0.804), skimmers (0.800), trout (0.797), caster (0.790), gudgeon (0.788), pike (0.773)

Table 18: Wasp, Dragonfly, Roach

Weapons

eyeball (0.729), inverted (0.725), arrowhead (0.725), indentation (0.717), arch (0.707), groove (0.689), earring (0.689), adder (0.686), iron bar (0.683)
golf club (0.763), football club (0.757), clubs (0.743), football team (0.739), cricket club (0.699), stadium (0.687), british legion (0.681), supporters club (0.680), display team (0.677)
pistol (0.873), shotgun (0.866), rifle (0.852), revolver (0.846), knife (0.828), grenade (0.818), spear (0.776), bullet (0.769), helmet (0.757)


Missile: rocket (0.836), launcher (0.792), long-range (0.780), missiles (0.758), jet (0.756), submarine (0.754), torpedo (0.746), helicopter (0.745), bomber (0.741), polaris (0.729)
Spear: dagger (0.845), sword (0.845), knife (0.798), pistol (0.795), gun (0.776), revolver (0.775), helmet (0.773), bullet (0.770), fist (0.766), saddle (0.763)
Axe: oar (0.745), iron bar (0.679), dagger (0.669), sword (0.665), hammer (0.655), grenade (0.653), arrow (0.645), gun (0.643), spear (0.642), oak tree (0.639)

Table 20: Missile, Spear, Axe

Dagger: sword (0.866), spear (0.845), helmet (0.808), walking stick (0.801), pistol (0.798), muzzle (0.792), knife (0.789), cloak (0.785), greatcoat (0.778), holster (0.775)
Harpoon: fire extinguisher (0.719), sandbank (0.709), submersible (0.702), flat-bottomed (0.699), grasshopper (0.697), ju87 (0.696), boeing 707 (0.695), qe2 (0.690), coaster (0.686), executive transporter (0.685)
Pistol: revolver (0.889), rifle (0.886), gun (0.873), shotgun (0.849), grenade (0.806), handgun (0.806), dagger (0.798), spear (0.795), knife (0.794), cannon (0.772)

Table 21: Dagger, Harpoon, Pistol

dagger (0.866), spear (0.845), helmet (0.812), pistol (0.768), lance (0.757), gun (0.754), fist (0.741), armour (0.740), rifle (0.740), knife (0.738)
needle (0.839), rod (0.833), seam (0.828), shaft (0.825), cord (0.824), rope (0.804), thread (0.796), muzzle (0.790), wire (0.790), strap (0.784)
banger (0.662), clanking (0.645), bellows (0.639), hooter (0.636), poker (0.636), gunpowder (0.633), drums (0.632), chainsaw (0.631), pirate (0.623), red hot (0.621)


Hatchet: machete (0.646), baseball bat (0.633), bandit (0.629), lanky (0.622), zafonic (0.622), pitchfork (0.617), surfboard (0.612), stealer (0.609), tigress (0.609), serial killer (0.607)
Rifle: pistol (0.886), gun (0.852), revolver (0.837), shotgun (0.809), musket (0.786), grenade (0.782), machine gun (0.755), cannon (0.754), spear (0.753), pistols (0.746)
Tank: cylinder (0.762), cistern (0.761), pond (0.755), radiator (0.749), container (0.736), heater (0.713), tanks (0.711), boiler (0.710), bulb (0.708), loft (0.707)

Table 23: Hatchet, Rifle, Tank

Bomb: bombs (0.750), car bomb (0.736), grenade (0.726), torpedo (0.718), hurricane (0.709), bombers (0.692), jumbo jet (0.691), blast (0.688), high explosive (0.686), raid (0.684)
Firearm: shotgun (0.688), stolen goods (0.655), dishonestly (0.649), carrier (0.635), firearms (0.635), detonator (0.632), cause grievous (0.631), pistol (0.618), criminal damage (0.607), bodily harm (0.604)
Knife: gun (0.828), shotgun (0.816), razor (0.800), spear (0.798), pistol (0.794), dagger (0.789), revolver (0.780), rope (0.771), bullet (0.767), grenade (0.767)

Table 24: Bomb, Firearm, Knife

gun (0.866), pistol (0.849), revolver (0.824), knife (0.816), rifle (0.809), grenade (0.794), handgun (0.763), spear (0.761), machete (0.751), sawn-off shotgun (0.743)
water cannon (0.835), tear-gas (0.828), automatic weapons (0.827), riot police (0.768), incendiaries (0.756), israeli army (0.753), riot gear (0.753), live ammunition (0.750), hand grenades (0.748), armoured vehicles (0.746)
cannons (0.774), pistols (0.773), pistol (0.772), gunner (0.770), machine-guns (0.764), crane (0.755), rifle (0.754), musket (0.750), machine-gun (0.749), gun (0.729)


gun (0.818), bullet (0.816), revolver (0.813), pistol (0.806), shotgun (0.794), rifle (0.782), machine-gun (0.777), knife (0.767), handgun (0.762), detonator (0.750)
thrower (0.696), partridge (0.686), jonny (0.676), heinz (0.673), laurent (0.672), carmichael (0.657), vivian (0.657), drake (0.654), reed (0.654), dirk (0.654)
kick (0.718), snap (0.710), flick (0.710), jab (0.699), jerk (0.699), hammer (0.693), saddle (0.685), slap (0.682), fist (0.681), stick (0.676)


D Typicality of positive and negative terms

Positive: caress (1.045 · 10^-5), freedom (7.285 · 10^-6), health (5.663 · 10^-6), love (6.903 · 10^-6), peace (6.650 · 10^-6), cheer (6.646 · 10^-6), friend (6.959 · 10^-6), heaven (7.141 · 10^-6), loyal (4.957 · 10^-6), pleasure (8.570 · 10^-6), diamond (9.332 · 10^-6), gentle (9.981 · 10^-6), honest (6.341 · 10^-6), lucky (6.714 · 10^-6), rainbow (9.494 · 10^-6), diploma (5.153 · 10^-6), gift (6.091 · 10^-6), honour (5.621 · 10^-6), miracle (5.974 · 10^-6), sunrise (9.211 · 10^-6), family (6.533 · 10^-6), happy (6.575 · 10^-6), laughter (9.431 · 10^-6), paradise (7.037 · 10^-6), vacation (4.670 · 10^-6)

Negative: abuse (7.431 · 10^-6), crash (5.649 · 10^-6), filth (1.042 · 10^-5), murder (5.872 · 10^-6), sickness (6.134 · 10^-6), accident (5.015 · 10^-6), death (5.533 · 10^-6), grief (8.009 · 10^-6), poison (7.875 · 10^-6), stink (8.967 · 10^-6), assault (5.190 · 10^-6), disaster (5.186 · 10^-6), hatred (9.074 · 10^-6), pollute (6.832 · 10^-6), tragedy (6.283 · 10^-6), divorce (5.517 · 10^-6), jail (5.367 · 10^-6), poverty (7.102 · 10^-6), ugly (7.715 · 10^-6), cancer (5.444 · 10^-6), kill (6.183 · 10^-6), rotten (7.542 · 10^-6), vomit (9.947 · 10^-6), agony (8.404 · 10^-6), prison (5.049 · 10^-6)

Table 27: Typicality of positive and negative terms with ‘flowers’

Positive: caress (1.033 · 10^-5), freedom (8.397 · 10^-6), health (6.360 · 10^-6), love (6.918 · 10^-6), peace (7.549 · 10^-6), cheer (7.012 · 10^-6), friend (7.029 · 10^-6), heaven (7.370 · 10^-6), loyal (5.141 · 10^-6), pleasure (8.930 · 10^-6), diamond (9.033 · 10^-6), gentle (1.012 · 10^-5), honest (7.052 · 10^-6), lucky (7.319 · 10^-6), rainbow (9.196 · 10^-6), diploma (5.655 · 10^-6), gift (6.064 · 10^-6), honour (6.086 · 10^-6), miracle (6.433 · 10^-6), sunrise (9.238 · 10^-6), family (6.706 · 10^-6), happy (6.715 · 10^-6), laughter (9.375 · 10^-6), paradise (6.926 · 10^-6), vacation (4.902 · 10^-6)

Negative: abuse (8.620 · 10^-6), crash (6.525 · 10^-6), filth (1.107 · 10^-5), murder (6.341 · 10^-6), sickness (7.102 · 10^-6), accident (5.518 · 10^-6), death (6.150 · 10^-6), grief (8.621 · 10^-6), poison (9.059 · 10^-6), stink (9.313 · 10^-6), assault (6.357 · 10^-6), disaster (5.932 · 10^-6), hatred (1.016 · 10^-5), pollute (7.791 · 10^-6), tragedy (7.052 · 10^-6), divorce (6.073 · 10^-6), jail (5.885 · 10^-6), poverty (8.290 · 10^-6), ugly (7.823 · 10^-6), cancer (6.557 · 10^-6), kill (7.213 · 10^-6), rotten (7.582 · 10^-6), vomit (1.059 · 10^-5), agony (9.031 · 10^-6), prison (5.549 · 10^-6)


Positive: caress (1.071 · 10^-5), freedom (8.535 · 10^-6), health (7.260 · 10^-6), love (7.798 · 10^-6), peace (7.687 · 10^-6), cheer (7.277 · 10^-6), friend (7.839 · 10^-6), heaven (7.405 · 10^-6), loyal (5.312 · 10^-6), pleasure (9.625 · 10^-6), diamond (9.441 · 10^-6), gentle (1.095 · 10^-5), honest (6.864 · 10^-6), lucky (7.503 · 10^-6), rainbow (9.677 · 10^-6), diploma (5.862 · 10^-6), gift (6.817 · 10^-6), honour (6.695 · 10^-6), miracle (6.684 · 10^-6), sunrise (9.272 · 10^-6), family (7.026 · 10^-6), happy (7.002 · 10^-6), laughter (1.080 · 10^-5), paradise (6.152 · 10^-6), vacation (5.137 · 10^-6)

Negative: abuse (8.519 · 10^-6), crash (6.154 · 10^-6), filth (9.652 · 10^-6), murder (6.744 · 10^-6), sickness (7.529 · 10^-6), accident (5.429 · 10^-6), death (5.888 · 10^-6), grief (8.682 · 10^-6), poison (7.288 · 10^-6), stink (8.239 · 10^-6), assault (6.182 · 10^-6), disaster (5.740 · 10^-6), hatred (1.008 · 10^-5), pollute (8.149 · 10^-6), tragedy (7.426 · 10^-6), divorce (5.999 · 10^-6), jail (5.944 · 10^-6), poverty (8.715 · 10^-6), ugly (7.367 · 10^-6), cancer (5.956 · 10^-6), kill (6.684 · 10^-6), rotten (6.707 · 10^-6), vomit (9.488 · 10^-6), agony (8.820 · 10^-6), prison (5.465 · 10^-6)

Table 29: Typicality of positive and negative terms with ‘instruments’

Positive: caress (1.106 · 10^-5), freedom (8.713 · 10^-6), health (6.518 · 10^-6), love (6.698 · 10^-6), peace (8.161 · 10^-6), cheer (7.143 · 10^-6), friend (7.320 · 10^-6), heaven (7.390 · 10^-6), loyal (5.371 · 10^-6), pleasure (9.072 · 10^-6), diamond (8.680 · 10^-6), gentle (1.006 · 10^-5), honest (6.674 · 10^-6), lucky (7.257 · 10^-6), rainbow (8.771 · 10^-6), diploma (5.387 · 10^-6), gift (6.185 · 10^-6), honour (6.589 · 10^-6), miracle (6.138 · 10^-6), sunrise (8.728 · 10^-6), family (6.774 · 10^-6), happy (6.556 · 10^-6), laughter (1.006 · 10^-5), paradise (5.671 · 10^-6), vacation (5.039 · 10^-6)

Negative: abuse (8.680 · 10^-6), crash (7.410 · 10^-6), filth (1.089 · 10^-5), murder (7.305 · 10^-6), sickness (7.187 · 10^-6), accident (6.012 · 10^-6), death (6.699 · 10^-6), grief (9.043 · 10^-6), poison (8.913 · 10^-6), stink (9.105 · 10^-6), assault (7.653 · 10^-6), disaster (6.218 · 10^-6), hatred (1.084 · 10^-5), pollute (7.763 · 10^-6), tragedy (6.960 · 10^-6), divorce (6.215 · 10^-6), jail (6.853 · 10^-6), poverty (8.145 · 10^-6), ugly (7.423 · 10^-6), cancer (6.266 · 10^-6), kill (7.232 · 10^-6), rotten (6.885 · 10^-6), vomit (1.101 · 10^-5), agony (1.008 · 10^-5), prison (6.388 · 10^-6)
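The definition of typicality used for these values is given in the main text of this thesis. Purely as an illustration of the kind of corpus statistic involved, and assuming (only for this sketch) that typicality is a smoothed conditional probability P(term | group) estimated from sentence-level co-occurrence, values of this order of magnitude could be computed roughly as follows; this is not necessarily the measure used in this thesis.

# Illustrative sketch only; not necessarily the typicality measure defined
# in this thesis. It estimates a smoothed conditional probability
# P(term | group) from sentence-level co-occurrence counts.
def typicality(term, group, sentences, alpha=1.0):
    # sentences: iterable of token lists; group: set of group terms (e.g. flower names)
    term_hits = 0    # sentences containing the term together with a group word
    group_hits = 0   # sentences containing any group word
    vocab = set()
    for tokens in sentences:
        vocab.update(tokens)
        if group & set(tokens):
            group_hits += 1
            if term in tokens:
                term_hits += 1
    # Laplace smoothing keeps the estimate non-zero for unseen terms
    return (term_hits + alpha) / (group_hits + alpha * len(vocab))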


E Removed characters

The following table lists all characters that were removed from the corpus, including their Unicode value. A brief illustrative sketch of this kind of character stripping is given after the table.

Character (Unicode value):

< (U+003C)   : (U+003A)   → (U+2192)
> (U+003E)   ; (U+003B)   ′ (U+2032)
∗ (U+002A)   = (U+003D)   ` (U+0060)
∗ (U+2217)   + (U+002B)   ″ (U+2033)
± (U+00B1)   - (U+002D)   / (U+002F)
& (U+0026)   • (U+2022)   \ (U+005C)
! (U+0021)   – (U+002D U+002D)   ‘ (U+2018)
? (U+003F)   — (U+002D U+002D U+002D)   ’ (U+2019)
… (U+2026)   [ (U+005B)   ( (U+0028)
" (U+0022)   ] (U+005D)   ) (U+0029)
# (U+0023)   ^ (U+005E)   &lt;
, (U+002C)   ∘ (U+2218)   &gt;
. (U+002E)   — (U+2014)   &amp;
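As referenced above, the following is a minimal sketch of how characters such as those in the table could be stripped from corpus text, assuming a Python cleaning step based on str.translate; it is not the exact preprocessing script used for this thesis.

# Minimal sketch (assumed, not this thesis's actual preprocessing code):
# delete the listed characters from a line of corpus text. Single code
# points are removed with str.translate; HTML entities are removed with
# str.replace. The double and triple hyphens in the table are already
# covered by deleting U+002D.
SINGLE = [0x003C, 0x003E, 0x002A, 0x2217, 0x00B1, 0x0026, 0x0021, 0x003F,
          0x2026, 0x0022, 0x0023, 0x002C, 0x002E, 0x003A, 0x003B, 0x003D,
          0x002B, 0x002D, 0x2022, 0x005B, 0x005D, 0x005E, 0x2218, 0x2014,
          0x2192, 0x2032, 0x0060, 0x2033, 0x002F, 0x005C, 0x2018, 0x2019,
          0x0028, 0x0029]
ENTITIES = ["&lt;", "&gt;", "&amp;"]

DELETE_TABLE = {cp: None for cp in SINGLE}

def clean(line: str) -> str:
    for entity in ENTITIES:
        line = line.replace(entity, " ")
    return line.translate(DELETE_TABLE)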
