
My Reality is not Your Reality

Alexandra E.E. Arkut 10583149

Bachelor thesis
Credits: 18 EC

Bachelor Opleiding Kunstmatige Intelligentie
University of Amsterdam
Faculty of Science
Science Park 904
1098 XH Amsterdam

Supervisors:
Dr. M. J. Marx
H. Azarbonyad MSc
ILPS, IvI
Faculty of Science
University of Amsterdam
Science Park 904
1098 XH Amsterdam

2016-06-24

(2)

Contents

1 Introduction . . . 4
  1.1 Overview of thesis . . . 5
2 Related Work . . . 5
3 Methodology . . . 6
  3.1 Data Description - New York Times Corpus . . . 7
    3.1.1 Extracting the Data . . . 7
    3.1.2 Tokenising the Data . . . 8
    3.1.3 Training the Word2Vec Models . . . 8
  3.2 Quantification Methods . . . 11
    3.2.1 Quantification of Differences Method One: Linear Mapping . . . 11
      3.2.1.1 Stop List . . . 11
      3.2.1.2 Creating the mapping . . . 11
    3.2.2 Quantification of Differences Method 2: Neighbour Method . . . 13
    3.2.3 Quantification of Differences Method 3: Co-Occurrence of Neighbours and Linear Mapping . . . 16
  3.3 Summarisation Methods . . . 19
    3.3.1 Summarisation Method 1: Using Neighbour-Method Results . . . 19
    3.3.2 Summarisation Method 2: Using Co-Occurrence and Mapping Results . . . 21
    3.3.3 Summarisation Method 3: Summarisation Methods 1 and 2 Combined . . . 22
4 Evaluation . . . 24
  4.1 Google Form: Peer Evaluation . . . 24
    4.1.1 Results for Individual Summarisations . . . 25
    4.1.2 Overview of Summarisation Results . . . 32
    4.1.3 Results of Concept-Pair Questions . . . 32
5 Conclusions . . . 33
  5.1 Discussion . . . 34
  5.2 Future Work . . . 34
  5.3 Acknowledgements . . . 35
A Appendix . . . 36
  A.1 Evaluation Form . . . 36


Abstract

In this bachelor project, the meaning of given concepts before and after 9/11 is represented using New York Times articles covering a time period of twenty years (1987-2007). Moreover, the difference in meaning of the concepts between the two perspectives is quantified. The words for both time periods are represented as word embeddings in the form of a vector, acquired through applying the Word2Vec algorithm to the words. The 'agreement measure', or similarity between the meaning of a word for both time periods, can then be computed utilising three methods. The first method involves creating a linear mapping between the vector spaces that represent the meanings of words in the two datasets. In the second method, the agreement measure is based on the similarity of neighbouring words in the two vector spaces for a given concept. The final method bases the agreement measure on the co-occurrence of neighbours in the two Word2Vec models, combined with the linear mapping from the first method. The quantifications of differences in meaning are used in summarising a list of concepts related to the 9/11 attacks and are evaluated by peers. The results of the peer evaluation seem to indicate that there is definite potential in the methods posed and that future research should be considered. Future work includes lemmatisation, and investigating the influence of term frequency and of the occurrence of new words in the vocabulary of the Word2Vec models.


1 Introduction

A noteworthy topic of study in the field of opinion mining concerns analysing the varying opinions that occur in political settings, as political parties often possess contrasting opinions on various topics. Expressing these differences in opinion in a comprehensible manner may be of use in various environments, not necessarily limited to the political domain. In a study related to this topic, Fang et al. (2012) developed a method that is capable of expressing contrastive opinions in political records from US senators. Moreover, a measure has been proposed with which the quantity of difference between the opinions seen from various viewpoints can be determined. In addition to differences in political records, the research describes how the quantified difference between the opinions on topic words from three newspapers, originating from different countries, can be obtained. An interesting issue that has not been investigated, however, is the quantified difference between opinions on certain topic words within one newspaper; the meaning of the topic words may have been influenced by important events, such as the 9/11 attacks in 2001. Investigating this change in meaning could provide great insight into the attitude of society towards certain topics before and after the attacks, and, hence, could also contribute to the understanding of the impact of terrorist-related attacks on society.

Similar to the goal described in Fang et al. (2012), the aim of the project ’My reality is not your reality’ is to:

Represent the meaning of a given concept seen from two viewpoints, quan-tify the difference between the two meanings and summarise them.

Datasets of speeches from the UK Conservative and Labour parties have been used in quantifying the difference between these two viewpoints¹. The UK Parliament proceedings serve as an appropriate dataset for this research when compared to reports from Dutch political parties, as they clearly portray two points of view on a given topic, whereas in the Netherlands, several different political parties exist that may all express different opinions on a topic.

The focus of this bachelor thesis lies in creating a 'timeline of meaning', obtained through comparing the meaning of concepts in New York Times articles, covering a time period of twenty years, namely 1987-2007, before and after the 9/11 attacks.

The words for both time periods are represented as word embeddings in the form of a vector, acquired through applying the Word2Vec algorithm to the words. The ’agreement measure’, or similarity between the meaning of a word for both time periods, can then be computed utilising three methods. The first method involves creating a linear mapping between the vector spaces that represent the meanings of words in the two datasets. In the second method, the agreement measure is based on the similarity of neighbouring words in the two vector spaces for a given concept. The final method focuses on the intersection of neighbouring words for a concept in both Word2Vec models, combined with the linear mapping created in the first method.

After quantification, the next goal of the project is to summarise the meanings for a selected list of concepts for both time periods. These summaries should, in theory, portray the change in meaning of a concept over time (if present). Finally, a peer evaluation will be performed to see whether the summarisations are comprehensible and representative of a given concept from a human perspective.

¹ This research has been performed by H. Azarbonyad, one of the supervisors for this project.

In order to achieve the research goal, the following questions should be answered:

RQ1 Can the change in meaning of concepts between two viewpoints be detected?

1. What would be the best way to detect this change?

2. Does the quantified change in meaning of a set of concepts obtained through the methods used agree with the change in meaning of these concepts according to the results of peer evaluation?

3. Which of the three methods, being the linear mapping method, the neighbour method and the combined co-occurrence-linear-mapping method, produces the best results when quantifying the change in meaning of a concept?

RQ2 How can the meanings of words seen from two viewpoints be summarised in a comprehensible manner?

1. Would key-word summarisation be feasible?

2. How can one find the terms that determine the change in meaning of the concepts?

1.1 Overview of thesis

In Section 3, the methodology of the project is discussed in three subsections, each one covering one objective from the research goal. The results obtained through the methods are assessed in Section 4. Conclusions and future work are discussed in Section 5.

2 Related Work

The research by Fang et al. (2012) described in the introduction utilises a Cross-Perspective Topic Model, in which the generation of topic words, i.e. words on which one may have an opinion, and the generation of opinion words are performed individually, so as to facilitate the opinion modelling. Although the results of the research seem promising, it may still be of interest to investigate various ways of representing the words in the dataset. A study by Levy et al. (2014) compares two types of word representations in three tasks, consisting of analogy questions and ranking of word pairs according to semantic relations that might hold between the two words in a pair. The two word representations used are an embedded representation (obtained through applying the Word2Vec algorithm, which is explained below, to the words), and an explicit representation, which uses the context in which words occur. Both resulting vectors were used in the aforementioned tasks by either looking at the offset between two vectors to determine similarity, or by maintaining the direction of the vectors, leaving the offset out of consideration. The directional method worked well on the ranking task, but obtained low results for the analogy question tasks. The offset method performed relatively well on all tasks, especially when all aspects of a word were represented in a way that avoids some aspects being weighted more heavily than others.

The current research project will use word embeddings created by the Word2Vec algorithm as word representations, as in the research by Levy et al. (2014). This algorithm uses shallow neural networks to form continuous vector representations of words [Mikolov et al., 2013a]. Words that are similar to each other tend to be close together in the vector space [Mikolov et al., 2010]. This embedded representation has already been proven to capture semantic and syntactic relations in analogy questions [Mikolov et al., 2013c]. Moreover, these representations have been used successfully in machine translation, translating words between two languages that need not be similar, as described in research by Mikolov et al. (2013b). That research uses the vector spaces of the words from both languages to create a mapping from one vector space to another. This bachelor project will also utilise two vector spaces, one representing words that occur in New York Times articles before 9/11, and the other representing the data from after 9/11. A similar linear mapping between the two spaces, as described in Mikolov et al. (2013b), will be performed in the current research project as well.

An additional method that may be used for comparing two contrasting viewpoints is similar to the method described in research by Jeh and Widom (2002). This article discusses a method to measure the similarity between nodes in a graph. To compute the similarity between two nodes, the similarities of the neighbouring nodes of the two nodes are measured and used in computing the similarity of the two original nodes. The similarities of these original nodes in turn influence the similarities of the neighbouring nodes.

3 Methodology

The methodology consists of three sections, each covering one element of the research goal described in the introduction. To recall, the research goal is to:

Represent the meaning of a given concept seen from two viewpoints, quan-tify the difference between the two meanings and summarise them.

Section 3.1 addresses the first item of the research goal by specifying how the New York Times dataset is converted to two Word2Vec models: one containing data from before (and including) 9/11, and one containing data from after the 9/11 attacks.

Section 3.2 covers the second subgoal, being the quantification of the difference between the meaning of a concept before and after 9/11. Three methods are posed for determining this change, namely a linear mapping method, a method based on neighbouring words in the Word2Vec models, and a method that utilises the co-occurrence of neighbours combined with the mapping from the first method.

The final objective of the research goal is considered in the closing section of the methodology, Section 3.3. As for the quantification, three approaches to summarisation are presented. The first approach uses the agreement measure obtained through the neighbour method. The second technique utilises the measure acquired through the co-occurrence-mapping method. These two summarisation methods are combined in the final approach.

The programming language used for the project is Python (multiple versions). The Word2Vec algorithm is part of the gensim package [Řehůřek and Sojka, 2010].

3.1 Data Description - New York Times Corpus

The New York Times dataset contains 1,855,671 XML files with a total size of 16.49 GB, representing news articles covering a time period of twenty years (1987-2007). Each XML file contains a news article, annotated with tags denoting, among other types of information, the date of publication, the title, the lead paragraph and the full text of the article. An example can be seen in Figure 1.

Figure 1: Example of an XML file from the New York Times corpus.

The XML files are processed so as to enable the training of the Word2Vec models. Figure 2 shows an overview of the processes involved. The individual tasks are explained in the sections below.

Figure 2: Schematic of the processes involved in processing the data.

3.1.1 Extracting the Data

What is of interest for the current research project is the full text of the article, which is used for training the Word2Vec algorithm. As the full text² also contains the lead paragraph, it is important to parse the data in such a way that no duplicate pieces of text occur for an article.

² On some occasions, a TypeError is raised when parsing the XML (46,546 times in 1,855,671 files). It was decided to continue parsing and skip the parts that could not be parsed; as a result, for some articles the entire text could not be parsed. However, given the size of the dataset, this is not deemed important.

The full texts are written to two files, one containing the extracted texts from the XML files before (and including) 9/11³, and one containing the extracted texts from after 9/11. This facilitates the subsequent steps that prepare the data for training the Word2Vec models, in the sense that in these steps, no additional measures need to be taken to determine the date of the texts. Moreover, splitting the data into two files ensures that the later processes can be performed on two smaller files, rather than on one large file.
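The extraction step can be sketched with the standard library's XML parser. This is an illustrative reconstruction, not the thesis code: the tag name `full_text` and the helper name are assumptions, as the actual tag layout of the annotated corpus is not shown here.

```python
import xml.etree.ElementTree as ET

def extract_full_text(xml_string):
    """Parse one article's XML and return its full text, skipping the
    article when parsing fails (mirroring the skip-on-error behaviour
    described in the footnote above)."""
    try:
        root = ET.fromstring(xml_string)
    except ET.ParseError:
        return None
    # Hypothetical tag name; the real corpus uses its own tag layout.
    node = root.find(".//full_text")
    return node.text if node is not None else None

article = """<article>
  <publication_date>2001-09-12</publication_date>
  <lead_paragraph>Lead paragraph here.</lead_paragraph>
  <full_text>Lead paragraph here. And the rest of the article.</full_text>
</article>"""
print(extract_full_text(article))
```

Because the full text already contains the lead paragraph, only the `full_text` element is read, which avoids the duplicate-text problem mentioned above.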

3.1.2 Tokenising the Data

The extracted text is processed further into the form of a list of lists of words per sentence, in order for the Word2Vec algorithm to handle the input. Furthermore, punctuation is removed and all words are cast to lowercase. This ensures that words such as 'Computer' and 'computer' are perceived as the same word. For example, the sentences

"This short, explanatory sentence is an example sentence. The second sentence is explanatory as well."

are transformed to:

[['this', 'short', 'explanatory', 'sentence', 'is',
'an', 'example', 'sentence'], ['the', 'second', 'sentence',
'is', 'explanatory', 'as', 'well']]

Although some information regarding named entities may be lost when casting to lowercase, the positive effect of casting in regard to decreasing the vocabulary size for the Word2Vec algorithm outweighs this relatively small negative effect. All punctuation except for '-' is removed from the sentences, as '-' denotes a syntactic relation between words in a sentence that might contain useful information.

The extracted text obtained through performing the first task is first split into sentences before removing punctuation and casting to lowercase. Finally, the tokenised lists are written to a file.
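The tokenisation step described above can be sketched in a few lines of pure Python. This is an illustrative reconstruction, not the thesis code; in particular, the naive sentence split on '.', '!' and '?' is an assumption:

```python
import re
import string

def tokenise(text):
    """Split text into sentences, lowercase them, and strip all
    punctuation except '-', returning a list of lists of words."""
    # Naive sentence split (a real pipeline would use a proper
    # sentence tokeniser).
    sentences = re.split(r"[.!?]+", text)
    # Remove every punctuation character except '-'.
    punct = string.punctuation.replace("-", "")
    table = str.maketrans("", "", punct)
    tokenised = []
    for s in sentences:
        words = s.lower().translate(table).split()
        if words:
            tokenised.append(words)
    return tokenised

example = ("This short, explanatory sentence is an example sentence. "
           "The second sentence is explanatory as well.")
print(tokenise(example))
```

Applied to the example above, this reproduces the list-of-lists form expected by Word2Vec.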

3.1.3 Training the Word2Vec Models

The lists of tokenised sentences serve as the input for the Word2Vec algorithm, which creates continuous vector representations of the words in the sentences using a shallow neural network [Mikolov et al., 2013a]. Each word in the corpus (that occurs at least a given number of times) is converted to such a vector. A simple illustration of the method can be seen in Figure 3. For training the Word2Vec models for data before (and including) and after 9/11, the following settings have been used, as can also be seen in Figure 4:

min_count = 20: Only words that occur twenty times or more in the corpus are converted to a vector.

workers = 25: The number of parallel processes; this enables the models to be trained faster.

size = 300: The length of the word vector.

window = 10: To capture contextual relations between words in the vectors, a maximum of ten words within the same sentence before and after the word that is vectorised is considered when creating the vector.

³ Throughout the paper, 'before (and including) 9/11' is sometimes shortened to 'before 9/11'.

Figure 3: Schematic of the Word2Vec method.

Using the tokenised files from the previous step, the Word2Vec models are trained. As the files containing the data are quite large (3.4 GB and 8.1 GB for the data before (and including) and after 9/11, respectively), and as this data needs to be processed further, a generator is created to loop through the pickled files instead of loading them into memory (Figure 4, lines 2-6).

In lines 7-9 (Figure 4), the data is used to train bigrams and trigrams. Bigrams and trigrams consist of two and three consecutive words, respectively, which often occur together. Based on how often the words occur together in a sentence, the two (or three) separate words are concatenated with an underscore. Terms such as 'artificial intelligence' are then not seen as two separate words, but displayed as 'artificial_intelligence'. The same is the case for trigrams. The main aim of training the bi- and trigrams is to discover collocations that contain relevant information. Collocations such as 'also say' or 'i am' do not provide information, whereas collocations such as 'artificial_intelligence' or 'terrorist_attacks' do.
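The underscore-joining idea can be illustrated with a crude count-based sketch. gensim's phrase-detection models do this with a proper scoring function; the code below is a simplified stand-in with hypothetical names, not the thesis implementation:

```python
from collections import Counter

def detect_bigrams(sentences, min_count=2):
    """Join word pairs that occur together at least min_count times
    with an underscore (a crude stand-in for gensim's Phrases)."""
    pair_counts = Counter()
    for sent in sentences:
        pair_counts.update(zip(sent, sent[1:]))
    frequent = {p for p, c in pair_counts.items() if c >= min_count}
    result = []
    for sent in sentences:
        out, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and (sent[i], sent[i + 1]) in frequent:
                out.append(sent[i] + "_" + sent[i + 1])
                i += 2
            else:
                out.append(sent[i])
                i += 1
        result.append(out)
    return result

sents = [["artificial", "intelligence", "is", "growing"],
         ["she", "studies", "artificial", "intelligence"],
         ["intelligence", "varies"]]
print(detect_bigrams(sents))
```

Here 'artificial intelligence' occurs twice and is joined, while the lone 'intelligence' in the last sentence is left untouched; gensim additionally normalises the raw counts so that very frequent individual words do not produce spurious collocations.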

After the bi- and trigrams have been trained, the output is processed by the Word2Vec algorithm to train the model using the settings shown in lines 12-13. The model is then saved to a file so as to be able to load the trained model in the future.


1
2 class Iter:
3     def __iter__(self):
4         # loop through the pickled file, yielding tokenised sentences
5
6 pickle_object = Iter()  # create generator
7 bigram_transformer = create bigrams from pickle_object
8 trigram_transformer = create trigrams from bigram_transformer
9     using bigram[pickle_object]
10
11 model = train word2vec model using settings:
12     gensim.models.Word2Vec(trigram_transformer[pickle_object],
13         min_count=20, workers=25, size=300, window=10)
14
15 model.save(to file)
16

Figure 4: Overview of how the Word2Vec model is trained.

The specifications of the final trained models before (and including) 9/11 and after 9/11 can be seen in Table 1.

Specifications        Before     After
Vocabulary Size       388485     258552
Vocabulary Overlap    181703
Unigrams              184336     112905
Bigrams               204149     145647
Trigrams              0          0

Table 1: Specifications of the trained Word2Vec models before (and including) 9/11 and after 9/11.

As the Word2Vec model before (and including) 9/11 covers a time period of roughly fourteen years (1987-2001), its vocabulary size is larger than that of the Word2Vec model after 9/11, which only concerns six years (2001-2007). The overlap of the vocabularies, as can be seen in Table 1, is however deemed substantial enough for this research to obtain relevant and coherent results.


3.2 Quantification Methods

3.2.1 Quantification of Differences Method One: Linear Mapping

The linear mapping method is based on the linear mapping used in the research by Mikolov et al. (2013b). The assumption made in this method is that a vector that represents a given word in one embedding can be mapped to a vector in the second embedding. If this word has not changed meaning over time, the same word in the second embedding should be returned. If a different word is obtained, this may suggest a shift in meaning of the word over time. An illustration of the method⁴ can be seen in Figure 5.

Figure 5: Schematic of the linear mapping method. A word vector from embedding one is mapped to a vector in embedding two.

3.2.1.1 Stop List

To learn a linear mapping from one vector space to another, it is important to base the mapping on vectors representing words for which the meaning in the two datasets is not significantly different. These are neutral words such as 'about' or 'after'. To learn the mapping, a standard list⁵ containing such stop words, with a few additional words added, such as 'computer', is used. As the meaning of these words should in theory be similar in both time periods, they serve as fixed points in the mapping, around which the words with varying meaning are situated. The mapping was learned on a total of 813 words from the stop list.

3.2.1.2 Creating the mapping

The linear mapping is created making use of the gradient descent algorithm with a maximum number of 50,000 iterations, an ε of 0.000000001 and a learning rate α of 0.012. A bias row is added to the mapping before training. The mapping is optimised by training on the words in the stop list that occur in the vocabularies of both Word2Vec models, using the following formula:

min_f Σ_{i=1}^{n} (f · v^{E1}_{wi} − v^{E2}_{wi})²    (1)

where f is the trained mapping, n the number of stop words on which the mapping is trained, and v^{E1}_{wi} and v^{E2}_{wi} represent word w_i in embeddings E1 and E2 respectively. The training is complete if the error is smaller than or equal to ε or if the maximum number of iterations is reached. The mapping from before to after 9/11 was trained with a final error of 0.078, whereas the mapping from after to before 9/11 was trained with a final error of 0.178. The agreement of a word w is then defined as:

agreement(w) = cosine_{E1}(v^{E1}_w, f · v^{E2}_w)    (2)

where v^{E1}_w is the Word2Vec vector of word w in embedding E1, and f the mapping from embedding E2 to E1. cosine_{E1} is used to denote that the cosine similarity is computed in embedding E1. The average cosine similarities (stop words included) for the four mappings are shown in Table 2.

⁴ Code provided and method posed by H. Azarbonyad.

Mapping    Average Cosine Similarity    Mapping to Self    Words in Vocabulary
B->A       0.49                         64748              -
A->B       0.48                         67359              -
B->A->B    0.85                         181564             388485
A->B->A    0.87                         181582             258552

Table 2: Average cosine similarities for the four mappings (stopwords included), along with the number of words that are mapped to themselves and the total number of words per vocabulary.

A mapping is created to map words from the Word2Vec model before 9/11 to after 9/11, and vice versa. These mappings can then be used to map words back and forth, to see whether the returned word matches the original word and with what similarity, as seen in Table 2. Furthermore, a list can be created for both mappings containing the similarity of each mapped word to the original word, as can be seen in Figure 6. The similarities in this list could, in theory, be used as suggestions that indicate how much the meaning of words has changed over time.


# Loop through the vocabulary of the 'after' model. If a word appears
# in both vocabularies, map its vector to the other embedding and
# compute the cosine similarity between the mapped word and the
# original word (in both directions).
for word in after_keys:
    if word in before_keys:
        i_vec = m_after[word] with extra value (1) added  # bias term
        i_mapped = map.dot(i_vec)
        i_orig = m_before[word]
        dict_sim_ab[word] = 1 - spatial.distance.cosine(i_mapped, i_orig)

        i_vec = m_before[word] with extra value (1) added  # bias term
        i_mapped = map_rev.dot(i_vec)
        i_orig = m_after[word]
        dict_sim_ba[word] = 1 - spatial.distance.cosine(i_mapped, i_orig)

Figure 6: A list of similarities between mapped words and the corresponding original words in the vector space is created.
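Putting Section 3.2.1 together, the mapping training (Formula 1) and the agreement computation (Formula 2) can be sketched end-to-end. The snippet below is a toy reconstruction, not the thesis code: it uses numpy, small synthetic 3-dimensional "embeddings" in place of the trained Word2Vec vectors, and hypothetical names (`E1`, `E2_b`, `agreement`).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two embeddings: 50 "stop words" with
# 3-dimensional vectors, where E2 is an exact linear transform of E1.
n_words, dim = 50, 3
E1 = rng.normal(size=(n_words, dim))
true_map = rng.normal(size=(dim, dim))
E2 = E1 @ true_map.T

# Add a bias column to E2 (the 'bias row' of the mapping).
E2_b = np.hstack([E2, np.ones((n_words, 1))])

# Gradient descent on f, minimising sum ||f . v_w^E2 - v_w^E1||^2
# (the direction used in Formula 2: mapping E2 into E1).
f = np.zeros((dim, dim + 1))
lr, eps = 0.012, 1e-9
for _ in range(50000):
    err = E2_b @ f.T - E1            # mapped vectors minus targets
    if np.mean(err ** 2) <= eps:
        break
    f -= lr * (2 * err.T @ E2_b / n_words)

def agreement(w_idx):
    """Formula 2: cosine similarity between a word's E1 vector and
    its E2 vector mapped into E1."""
    v1 = E1[w_idx]
    v2 = f @ E2_b[w_idx]
    return v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))

print(round(agreement(0), 3))  # close to 1.0 for an unchanged word
```

Because the synthetic E2 is an exact linear transform of E1, the learned mapping recovers each word almost perfectly; with real embeddings, words whose meaning has shifted would map back with a lower similarity.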

3.2.2 Quantification of Differences Method 2: Neighbour Method

In the neighbour method6, it is assumed that the meaning of a word can be

(partially) determined by looking at its neighbours. This method is based on the method described in Jeh and Widom (2002), with which the similarity of nodes in a tree can be computed utilising neighbouring nodes. A schematic of the method for this bachelor thesis can be seen in Figure 7. An overview in pseudo-code of the neighbour-method can be seen in Figure 8.

Figure 7: Schematic of the neighbour method 6Code provided and method posed by H. Azarbonyad.


 1 def stability(model1, model2, vocab, Embedding1, Embedding2, k, numIter):
 2     # the top k most similar words to each word
 3     # are used for calculating stability values
 4     s_old = {}
 5     s_new = {}
 6     for w in vocab:
 7         s_old[w] = 1.  # start with agreement value 1 for each word
 8
 9     th = 0.55  # consider words for which the top N neighbours
10                # have a similarity higher than th
11     N = 10
12
13     ### iteration ###
14     for i in range(numIter):  # number of iterations (10)
15         j = 0
16         maxVal = 0
17         for w in vocab:  # for each word in the overlapping vocabulary
18             j += 1
19             neigh1 = neighbours of w in embedding 1
20             neigh2 = neighbours of w in embedding 2
21             s_new[w] = 0
22             count = 0
23             for w1 in neigh1:
24                 compute similarity between w and w1 if
25                     w1 is in vocab and similarity > th
26                 count += 1
27             check = 0
28             if count > N:
29                 check = 1
30             temp = 0
31             count = 0
32             for w2 in neigh2:
33                 compute similarity between w and w2 if
34                     w2 is in vocab and similarity > th
35                 count += 1
36             if s_new[w] > maxVal:
37                 maxVal = s_new[w]  # save highest value
38                                    # for normalisation
39             if count < N or check == 0:
40                 s_new[w] = 1.  # if not enough neighbours meet the
41                                # similarity requirements,
42                                # make their agreement equal to 1
43
44         # Normalisation
45         minVal = find minimum value
46         for w in s_new:
47             if s_new[w] != 1:
48                 normalise value
49
50
51         s_old = dict(s_new)
52         sorted_s = sort words from low to high agreement value
53         write dictionary after iteration i to file
54

Figure 8: Pseudo-code of the neighbour method.


The aim is to quantify the agreement between the meaning of a given concept (”Target Word” in the Figure) in both embeddings by looking at the neighbours of the concept in the opposing embedding. These neighbours are looked up in the original embedding. The agreement is then computed using the following formula:

agreement(w) = (sim_i(w_i, N_j(w)) + sim_j(w_j, N_i(w))) / 2    (3)

with sim_i(w_i, N_j(w)) being the cosine similarity between word w in embedding i and N_j(w), the top 100 most similar neighbours to word w in embedding j, looked up in embedding i.

As N_j(w) represents a set of neighbours rather than a single neighbour, sim_i(w_i, N_j(w)) is computed as:

sim_i(w_i, N_j(w)) = 1/|N_j(w)| * Σ_{w' ∈ N_j(w)} cosine_i(v_{wi}, v_{w'i})    (4)

where v_{wi} represents the word vector of word w in embedding i, and v_{w'i} the word vector of neighbour w' from embedding j, looked up in embedding i. An illustration of Formulas 3 and 4 can be seen in Figure 9.

Figure 9: Overview of Formulas 3 and 4. Cosine similarities are computed for the target word in embedding one and the neighbours in embedding two (shown) and vice versa (not shown in Figure).

The next step of this method involves determining the agreement for the neighbours of the target word, and the neighbours thereof, up until a certain depth, therefore making the process iterative. Formula 3 then becomes:

agreement^t(w) = (sim^t_i(w_i, N_j(w)) + sim^t_j(w_j, N_i(w))) / 2    (5)

where agreement^t(w) represents the agreement at iteration t. The agreement at iteration zero is set to 1. Formula 4 now becomes:

sim^t_i(w_i, N_j(w)) = 1/|N_j(w)| * Σ_{w' ∈ N_j(w)} cosine_i(v_{wi}, v_{w'i}) * agreement^{t−1}(w')    (6)

The iterative step of this method is shown in Figure 10. After each iteration, the obtained agreement measures are saved to a file (line 53 in Figure 8).

Figure 10: Overview of Formulas 5 and 6. Cosine similarities are computed for the neighbours of the target word in embedding one and their neighbours in embedding two (shown) and vice versa (not shown in Figure).

Lines 9-11 in Figure 8 set two conditions for the agreement measure to be computed; for a word w, the top N (=10) neighbours must have a higher similarity to w than th (=0.55). For words that do not meet these requirements, the agreement value will remain equal to one. While this ensures that noise in the data (e.g. words that do not occur frequently in the dataset or occur in a wide range of contexts) is flagged and does not acquire an agreement value, it may be that the flagged words contain valuable information about the change in meaning of a word from before to after 9/11 and that this knowledge is discarded through flagging.

An advantage of the iterative neighbour-method is that it takes into account that the agreement for a concept is perhaps not only influenced by its direct neighbours, but by its indirect neighbours as well.
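As a concrete illustration of Formulas 3 and 4 (the non-iterative base case, i.e. all previous agreement values equal to 1), consider the following toy sketch. The 2-dimensional embeddings, neighbour lists and function names are hypothetical, not thesis data:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def sim(word, neighbours_other, emb):
    """Formula 4: average cosine, in embedding `emb`, between `word`
    and its neighbours taken from the *other* embedding (only
    neighbours present in `emb` are used)."""
    sims = [cosine(emb[word], emb[n]) for n in neighbours_other if n in emb]
    return sum(sims) / len(sims)

def agreement(word, emb1, emb2, neigh1, neigh2):
    """Formula 3: average of the two directional similarities."""
    return (sim(word, neigh2, emb1) + sim(word, neigh1, emb2)) / 2

# Toy 2-d "embeddings" with invented values.
emb1 = {"war": np.array([1.0, 0.0]),
        "conflict": np.array([0.9, 0.1]),
        "battle": np.array([0.8, 0.3])}
emb2 = {"war": np.array([0.2, 1.0]),
        "conflict": np.array([0.9, 0.2]),
        "battle": np.array([0.3, 0.9])}

# Top neighbours of "war" in each embedding (excluding itself).
neigh1 = ["conflict", "battle"]
neigh2 = ["battle", "conflict"]

print(round(agreement("war", emb1, emb2, neigh1, neigh2), 3))
```

In this toy setup "war" points in a different direction in the second embedding, so its agreement falls noticeably below 1; the iterative variant of Formulas 5 and 6 would additionally weight each neighbour's contribution by that neighbour's agreement from the previous iteration.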

3.2.3 Quantification of Differences Method 3: Co-Occurrence of Neighbours and Linear Mapping

This method bases the agreement between the meaning of a given concept in both Word2Vec models on the intersection of neighbours for both embeddings, as well as on the linear mapping method described in Section 3.2.1. This method is not iterative, as opposed to the neighbour method described in Section 3.2.2, so as not to lose the simplicity of direct neighbour comparison. Figure 11 shows a schematic of a target word in both embeddings and their co-occurring neighbours. For neighbours that do not occur in the intersection, the linear mapping is used to map the neighbour vector to the other embedding. This is also described in the pseudo-code of the method, shown in Figure 12.

Figure 11: Overview of the intersection part of method 3; the agreement of meaning of the target word is based on co-occurring neighbours (black lines) in both embeddings.

 1 before_neighbours = list of neighbours of target word before 9/11
 2 after_neighbours = list of neighbours of target word after 9/11
 3
 4 for b in before_neighbours:
 5     if b in after_neighbours:
 6         count1_same += N - after_neighbours.index(b)  # add count of
 7             # intersecting neighbours based on index
 8     else:  # if b does not occur in after_neighbours
 9         i_vec = get vector of b in before model (with bias)
10         i_mapped = map from before model to after model
11         sim1 += abs(cosine similarity of target word vector
12             and mapped neighbour vector)
13         count1_sim += 1
14
15 repeat for neighbours in after_neighbours

Figure 12: Pseudo-code for the intersection-mapping method.

This method uses the top 100 most similar neighbours to the target word w for embedding i. The count of co-occurring neighbours is not based on co-occurrence alone, but also on each neighbour's index in the neighbour-list, which represents the neighbour's similarity to the target word. Neighbours that occur higher in the neighbour-list are weighted more heavily, using the following formula:

C_i(w) = |N(w)| * |N_i(w) ∩ N_j(w)| − Σ_{n ∈ N_i(w) ∩ N_j(w)} index_j(n)    (7)

where C_i(w) represents the count of neighbours of word w in embedding i based on their index in embedding j, |N(w)|⁷ the number of neighbours considered (i.e. 100), N_i(w) the 100 neighbours most similar to word w in embedding i, and index_j(n) the index that neighbour n, which is an element of N_i(w) ∩ N_j(w), has in embedding j. Take as an example the following neighbour-lists:

a(w) = [n1, n2, n3, n4, n5]
b(w) = [n2, n4, n1, n5, n6]

Each neighbour in list a(w) is looked up (if possible) in list b(w), along with its index. The final count C_a(w), using Equation 7, then becomes:

C_a(w) = 5*4 - (2+0+1+3) = 14

Note that the summation in Equation 7 contains four terms instead of five, as neighbour n3 does not occur in list b(w). To take neighbour n3 into account when computing the agreement, the linear mapping is used to map the n3 vector from list a(w) to a vector representing the neighbour in b(w), after which the absolute cosine similarity between the mapped vector and the target word vector is taken. This can be seen in line 8 and onwards in Figure 12. The formula concerning the similarity of the linear mapping is as follows^8:

Sim_i(w) = Σ_{n ∈ N_i(w) \ N_j(w)} abs(cosine_j(w, f(n)))    (8)

where in Formula 8, w is the word for which the agreement is computed, f the mapping from embedding i to embedding j, and cosine_j denotes that the similarity is computed in embedding j. The agreement(w) function is defined as:

agreement(w) = α * (C_1(w) + C_2(w)) / (2 * Σ_{i=1}^{N} i)
             + (1 − α) * ( Sim_1(w) / (2 * |N_1(w) \ N_2(w)|) + Sim_2(w) / (2 * |N_2(w) \ N_1(w)|) )    (9)

^7 For a set S, |S| denotes the cardinality.

^8 Using the absolute cosine could perhaps be considered illogical, as neighbours with negative similarities, which are less similar to a word, are now counted as positive. However, this is only the case if negative similarities occur, so it is assumed that taking the absolute cosine similarity will not have a great impact on results. The absolute value was taken so that the agreement measures lie, in theory, between 0 and 1.


where Σ_{i=1}^{N} i denotes the maximum value of C_1 and C_2, attained if all neighbours in the top 100 occur in the opposing embedding. α has the following properties:

α = 1    if |N_1 \ N_2| = 0 and |N_2 \ N_1| = 0
α = 0    if C_1 = 0 and C_2 = 0
α = 0.5  otherwise    (10)
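Equations 9 and 10 can be sketched in Python as follows, assuming C1 and C2 come from Equation 7, sim1 and sim2 from Equation 8, and n_only1 and n_only2 denote |N_1 \ N_2| and |N_2 \ N_1|; all names are illustrative, not taken from the thesis code:

```python
def agreement(C1, C2, sim1, sim2, n_only1, n_only2, N=100):
    # Maximum attainable count: N*N minus the smallest possible index sum
    # (0 + 1 + ... + N-1), which equals sum_{i=1}^{N} i
    max_count = N * (N + 1) // 2
    # Equation 10: choose alpha from the boundary cases
    if n_only1 == 0 and n_only2 == 0:
        alpha = 1.0
    elif C1 == 0 and C2 == 0:
        alpha = 0.0
    else:
        alpha = 0.5
    # Normalised co-occurrence component of Equation 9
    count_term = (C1 + C2) / (2 * max_count)
    # Normalised mapped-similarity component of Equation 9
    sim_term = 0.0
    if n_only1 > 0:
        sim_term += sim1 / (2 * n_only1)
    if n_only2 > 0:
        sim_term += sim2 / (2 * n_only2)
    return alpha * count_term + (1 - alpha) * sim_term
```

With N = 100, max_count is 5050; if every neighbour co-occurs at the same rank in both embeddings, C_1 = C_2 = 5050 and the agreement equals 1.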

3.3 Summarisation Methods

Using the quantifications from the three methods described in the previous sections, the summarisation of concepts related to the 9/11 attacks can be performed. This is explained in the following subsections.

3.3.1 Summarisation Method 1: Using Neighbour-Method Results

This summarisation method is based solely on the agreement measure obtained through the neighbour-method described in Section 3.2.2. To summarise a concept before and after 9/11, the following parameters should be defined:

word  The word that is to be summarised.

N  The number of words in the summarisation.

threshold  The agreement of a word should be lower than or equal to this threshold for the word to be considered for the summary.

iteration  The agreement measures obtained through the neighbour-method are saved after every iteration; this parameter states which iteration's results should be used (starting at iteration 0).

remove_duplicates  States whether or not words are allowed to occur in both summaries (before and after 9/11).

Summarisations using this method have been performed with the following settings:

N=5, threshold=0.15, iteration=9, remove_duplicates=y

Figure 13 shows the pseudo-code for the summarisation method. The 100 words most similar to the word that is to be summarised in both models are considered during summarisation. If a neighbour is in the overlapping vocabulary of the two Word2Vec models and its agreement is equal to or below the set threshold, the word is added to a list (one list for each summarisation). As the top 100 neighbours are ordered from highest to lowest similarity, the summaries follow the same trend, showing the summarisation words that meet the remaining requirements ordered from high to low similarity.


before_list = []
after_list = []
b_n = neigh_before[word]
a_n = neigh_after[word]
for i in range(neigh):  # top 100 neighbours
    try:
        if dict_agreement[b_n[i]] <= thresh:
            before_list.append(b_n[i])
    except KeyError:
        pass  # still check the after-neighbour below
    try:
        if dict_agreement[a_n[i]] <= thresh:
            after_list.append(a_n[i])
    except KeyError:
        continue

Figure 13: Pseudo-code for the summarisation method based on the neighbour-agreement measure.

The remove_duplicates parameter determines whether or not the two summarisations of length N may contain the same words. If not, duplicates are removed in the manner described in Figure 14.

if remove_duplicates == "y":
    i = 0
    while i < (N+1):
        if before_list[i] in after_list[0:N]:
            # word in before summary is in the top N of the after summary
            remove word from before and after summary
            add (N+1)th word to before summary
            add (N+1)th word to after summary
            i = 0
            # repeat until no duplicates occur in the top N words of
            # both summaries
        else:
            i += 1

Figure 14: Pseudo-code for removing duplicate words from the two summaries.

Take as an example the following two summarisation lists:

before_list = [a, b, c, d, e, f, z]
after_list = [a, g, h, i, e, r, q]

Let us say that the length of the summarisation, N, is equal to five words. As word a occurs in before_list and in the top N of after_list, the word is removed from both lists and the following lists remain:


before_list = [b, c, d, e, f, z]
after_list = [g, h, i, e, r, q]

The lists are checked for duplicates until the list lengths are less than or equal to N, or until no duplicates are found among the top N words of the summarisations. The final summarisations will be:

before_list = [b, c, d, f, z]
after_list = [g, h, i, r, q]

Removing the duplicates in a list is done so as to make the summarisations more contrastive, with the aim of enhancing the visualisation of the change in meaning of a concept before and after 9/11.
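The duplicate-removal procedure of Figure 14 can be written as a small, self-contained Python function. This is a sketch: where the thesis pseudo-code pulls in the "(N+1)th word", here the full candidate lists are passed in and truncated to N at the end, which has the same effect.

```python
def remove_duplicates(before_list, after_list, N):
    before, after = list(before_list), list(after_list)
    i = 0
    while i < min(N, len(before)):
        word = before[i]
        if word in after[:N]:
            # word appears in both top-N summaries: drop it from both,
            # letting the next candidates shift up, and restart the scan
            before.remove(word)
            after.remove(word)
            i = 0
        else:
            i += 1
    return before[:N], after[:N]

print(remove_duplicates(["a", "b", "c", "d", "e", "f", "z"],
                        ["a", "g", "h", "i", "e", "r", "q"], 5))
# (['b', 'c', 'd', 'f', 'z'], ['g', 'h', 'i', 'r', 'q'])
```

This reproduces the worked example above: a and e are removed from both lists, and z and q move up into the top five.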

3.3.2 Summarisation Method 2: Using Co-Occurrence and Mapping Results

Apart from the agreement measures obtained through the neighbour-method, the measure acquired through the co-occurrence-mapping method described in Section 3.2.3 serves as an alternative standard on which the summarisations can be based. As for the first summarisation method, the top 100 neighbours are considered for summarisation, and a few parameters should be defined:

word  The word that is to be summarised.

N  The number of words in the summarisation.

threshold  The agreement of a word should be lower than or equal to this threshold for the word to be considered for the summary.

sim  The minimum cosine similarity to the summarised word that a neighbour whose agreement is at or below the threshold above should have.

sim_only  The minimum cosine similarity to the summarised word that a neighbour that does not occur in the vocabulary of both Word2Vec models should have to be considered for the summarisation; this threshold is higher than sim.

Summarisations using this method have been performed with the following settings:

N=5, threshold=0.4, sim=0.3, sim_only=0.5

The summaries of this method may still contain duplicates. This was decided so as to preserve the simplicity of the co-occurrence-mapping method, yielding fairly unprocessed summarisations. Take the following neighbour-list (measures added for illustration), where the first number denotes the agreement measure obtained through the co-occurrence-mapping method and the second the cosine similarity of the neighbour to the word that is to be summarised:


before_list = [a(0.6,0.8), b(0.45,0.6), c(0.33,0.4), d(0.25,0.2), e(0.6,0.15), f(0.2,0.09)]

Let us take threshold=0.4, sim=0.3 and sim_only=0.5. The words that are considered for the summarisation will be:

before_list = [a(0.6,0.8), b(0.45,0.6), c(0.33,0.4)]

The main difference between this and the previous summarisation method is that the previous method does not apply a threshold to the similarity of the neighbouring words, whereas this method does. Moreover, duplicates are not removed in this method, though this is done in the previous method.
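One reading of the selection rule that is consistent with the worked example above: a neighbour is kept if its agreement is at or below threshold and its similarity at least sim, or if its similarity alone reaches sim_only. The function name and tuple layout are illustrative, not taken from the thesis code:

```python
def select_for_summary(neighbours, threshold, sim, sim_only):
    # neighbours: list of (word, agreement, cosine similarity) tuples
    selected = []
    for word, agreement, cosine in neighbours:
        # low disagreement with sufficient similarity, or very high
        # similarity on its own, qualifies the neighbour
        if (agreement <= threshold and cosine >= sim) or cosine >= sim_only:
            selected.append(word)
    return selected

neigh = [("a", 0.6, 0.8), ("b", 0.45, 0.6), ("c", 0.33, 0.4),
         ("d", 0.25, 0.2), ("e", 0.6, 0.15), ("f", 0.2, 0.09)]
print(select_for_summary(neigh, threshold=0.4, sim=0.3, sim_only=0.5))
# ['a', 'b', 'c']
```

With threshold=0.4, sim=0.3 and sim_only=0.5, a and b pass on similarity alone, c passes the combined test, and d, e and f are rejected, matching the example lists.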

3.3.3 Summarisation Method 3: Summarisation Methods 1 and 2 Combined

So as to capture the positive aspects of both quantification methods, the final method consists of a combination of the two summarisation methods described above. As described in Section 3.2.2, the agreement measure for words that are considered noise is set to one. The summarisation method in Section 3.3.1 is therefore not capable of including these words in the summary, possibly discarding words that clearly portray a change in meaning of a concept over time. To detect these words, the agreement measure obtained by the co-occurrence-mapping method described in Section 3.2.3 is used as a benchmark for words for which the neighbour-method returns a value of one. As with the previous two methods, the top 100 neighbours of a word are considered for summarisation and the following parameters should be defined:

word  The word that is to be summarised.

N  The number of words in the summarisation.

threshold  The agreement of a word obtained through the neighbour-method should be lower than or equal to this threshold for the word to be considered for the summary.

iteration  The agreement measures obtained through the neighbour-method are saved after every iteration; this parameter states which iteration's results should be used (starting at iteration 0).

thresh_cooc  For words where the agreement measure for the neighbour-method is equal to one, the agreement value obtained through the co-occurrence-mapping measure is considered instead; this value should be lower than or equal to thresh_cooc.

sim  The minimum cosine similarity to the summarised word that a neighbour whose co-occurrence-mapping agreement is at or below thresh_cooc should have.

remove_duplicates  States whether or not words are allowed to occur in both summaries (before and after 9/11).

Summarisations using this method have been performed with the following settings:

N=5, threshold=0.15, iteration=9, thresh_cooc=0.4, sim=0.3, remove_duplicates=y

An illustration of the method is shown in pseudo-code in Figure 15. Duplicates in the summaries are removed as described in Figure 14. As in the first summarisation method, only words that occur in both Word2Vec vocabularies are considered. The difference is that the "noise" is analysed further using an additional agreement measure, which could result in more contrastive and defining summarisations before and after the 9/11 attacks.

before_list = []
after_list = []
b_n = neigh_before[word]
a_n = neigh_after[word]
for i in range(neigh):  # top 100 neighbours
    try:
        if dict_agreement[b_n[i]] <= thresh:
            before_list.append(b_n[i])
        if dict_agreement[b_n[i]] == 1.0:  # flagged as noise
            try:
                # thresh_own corresponds to the thresh_cooc parameter
                if (agreement_own[b_n[i]] <= thresh_own and
                        m_before.similarity(word, b_n[i]) >= sim):
                    before_list.append(b_n[i])
            except KeyError:
                continue
    except KeyError:
        continue

    try:
        if dict_agreement[a_n[i]] <= thresh:
            after_list.append(a_n[i])
        if dict_agreement[a_n[i]] == 1.0:  # flagged as noise
            try:
                if (agreement_own[a_n[i]] <= thresh_own and
                        m_after.similarity(word, a_n[i]) >= sim):
                    after_list.append(a_n[i])  # fixed: was before_list
            except KeyError:
                continue
    except KeyError:
        continue

Figure 15: Pseudo-code for the combined summarisation method.


4 Evaluation

The following subsections describe the results for the three summarisation methods discussed in Section 3.3. A total of sixteen concepts, which were either chosen based on relevant literature or deemed fit for summarisation as they are related to the 9/11 attacks, were summarised. The individual summarisations are discussed in Section 4.1.1, where a brief motivation is given for the inclusion of each concept in the evaluation list. Section 4.1.2 provides an overview of the summarisation results and possible explanations. Finally, Section 4.1.3 discusses the concept-pair results.

As the agreement measures obtained through the three quantification methods are used for summarisation, the methods are not evaluated individually. Rather, the summarisations produced using the three summarisation methods are assessed through peer evaluation (summarisations and concept-pairs, explained below), which indirectly leads to an analysis of the quantifications.

4.1 Google Form: Peer Evaluation

To assess the comprehensibility of the summarisations as well as how descriptive they are for a concept before and after 9/11, a Google Form was created and sent to fellow students for peer evaluation. A total of ten people filled out the form, which consists of sixteen summarisation questions and five concept-pair questions. All concepts were title-cased or capitalised and punctuation has been added where deemed fit to make the terms easily readable.

Summarisation questions consist of a concept and its accompanying summarisations before and after 9/11 for all three summarisation methods, as can be seen in Figure 16.

Figure 16: Example of a summarisation question

In general, a total of six summarisations per concept are given. However, as some summarisations for the Neighbour quantification method and the combined Neighbour Co-Occurrence Mapping measure are the same, duplicate summarisations were removed, sometimes resulting in fewer summarisations for a concept. For each concept, the question is as follows:

In your opinion, which of the summaries belong to the given concept ’Before 9/11’ and which of the summaries belong to ’After 9/11’ ?


No specifications were given regarding how many summarisations per time period were given per concept, leading to a fairly open evaluation.

The Concept-Pair questions are composed of two concepts, for which the question is:

Which of the two terms has changed most since 9/11?

This type of question can be used to investigate whether or not the obtained agreement measures used in summarisation are logical. An example can be seen in Figure 17.

Figure 17: Example of a concept-pair question

Questions were randomised per type per survey, as were the options for the summarisation questions. For the evaluation form in its entirety, please refer to Appendix A.1.

4.1.1 Results for Individual Summarisations

The sixteen concepts in Table 3 have been selected for evaluation based on literature or were deemed fit for summarisation as they are related to the 9/11 attacks. Concepts have been run through the Co-Occurrence Mapping summarisation method as a final part of the selection process. The summarisation results are discussed in groups consisting of concepts that are similar to one another. Section 4.1.2 discusses the results in their entirety.

Islam               Airport
Muslim              Hijacking
Fundamentalism      Terrorist Targets
Radicalized         Security
Anti-Americanism    Anti-Terrorism
Terrorism           WTC
Terrorist           911
Terrorist Attacks   Ground Zero

Table 3: The sixteen concepts that were summarised and used for evaluation.

Islam, Muslim  As the terrorists involved in the 9/11 attacks were practitioners of Islam, it could be of value to investigate whether a change in meaning of these concepts has occurred since 9/11. According to a study investigating eleven terrorist-related events in the United States since the 9/11 attacks, the term 'Muslim' was often adopted when describing the attackers [Powell, 2011]. Furthermore, the representation of terrorist events in the media contributes to an increase in the fear of Islam.


Islam, Before:
- Neighbours (C: 8/10): Islams, Islamic Fundamentalism, Christians, Roman Catholicism, Secular
- Co-Occurrence Mapping (C: 7/10): Moslem, Jihad, Moslems, Louis Farrakhan, Shiite Islam
- Combined (C: 8/10): Islams, Islamic Fundamentalism, Moslem, Christians, Roman Catholicism

Islam, After:
- Neighbours (C: 8/10): Religion, Muslims, Secularism, Extremism, Non-Muslims
- Co-Occurrence Mapping (C: 7/10): Wahhabism, Radical Islam, Islamic Law, Jihad, Non-Muslims
- Combined (C: 8/10): Religion, Muslims, Secularism, Extremism, Non-Muslims

Muslim, Before:
- Neighbours (C: 9/10): Shiite, Hindu, Sunni, Shiite Muslim, Druse
- Co-Occurrence Mapping (C: 9/10): Moslem, Shiite, Predominantly Muslim, Moslems, Sunni
- Combined (C: 7/10): Moslem, Shiite, Hindu, Sunni, Sunni Muslim

Muslim, After:
- Neighbours (C: 6/10): Fundamentalist, Sikhs, Hindus, Non-Muslims, Christians
- Co-Occurrence Mapping (C: 1/10): Non-Muslim, Sikhs, Sikh, Non-Muslims, Sunni Muslim
- Combined (C: 9/10): Islam, Fundamentalist, Sikhs, Hindus, Non-Muslims

Table 4: Summarisations for the concepts ’Islam’ and ’Muslim’. Column ’C’ denotes the number of times the summary was labeled ’correctly’ during peer evaluation.

The summarisations shown in Table 4 obtained fairly positive results in peer evaluation, with the exception of the 'After 9/11' Co-Occurrence Mapping summarisation for the concept 'Muslim'. The main difference between this summarisation and the remaining 'After 9/11' summarisations seems to be that no terms concerning extremism are mentioned in this summary. It could therefore be argued that the occurrence of an extremism-related term is decisive for fellow students to label a summarisation as 'After 9/11'. This trend is also observed for the summarisations of other concepts.

Fundamentalism, Radicalized, Anti-Americanism  Terrorists affiliated with terrorist events with an international background since the 9/11 attacks were often described as being extremist [Powell, 2011]. As the three concepts mentioned all portray some form of extremism, they have been included for summarisation.


Fundamentalism, Before:
- Neighbours (C: 7/10): Fanaticism, Atheism, Separatism, Absolutism, Authoritarianism
- Co-Occurrence Mapping (C: 0/10): Muslim Fundamentalism, Anarchism, Jihad, Ayatollahs, Nationalisms
- Combined (C: 8/10): Fundamentalists, Fanaticism, Atheism, Separatism, Absolutism

Fundamentalism, After:
- Neighbours (C: 8/10): Militancy, Anti-Americanism, Anti-Western, Relativism, Totalitarianism
- Co-Occurrence Mapping (C: 10/10): Islamism, Islamic Extremism, Radical Islam, Militancy, Wahhabism
- Combined (C: 10/10): Islamic Extremism, Militancy, Anti-Americanism, Anti-Western, Relativism

Radicalized, Before:
- Neighbours (C: 8/10): Embittered, Repressed, Enraged, Reviled, Militant
- Co-Occurrence Mapping (C: 8/10): Cowed, Internalized, Racked, Indoctrinated, Repelled
- Combined (C: 9/10): Embittered, Cowed, Repressed, Internalized, Enraged

Radicalized, After:
- Neighbours (C: 9/10): Anti-Western, Islamic Fundamentalists, Revolted, Fundamentalist, Fundamentalists
- Co-Occurrence Mapping (C: 10/10): Indoctrinated, Muslim Militants, Jihadists, Deeply Influenced, Stateless
- Combined (C: 9/10): Anti-Western, Islamic Fundamentalists, Revolted, Fundamentalist, Fundamentalists

Anti-Americanism, Before:
- Neighbours (C: 8/10): Disaffection, Militancy, Radicalism, Isolationism, Antagonism
- Co-Occurrence Mapping (C: 9/10): Militancy, Nationalist Sentiment, Russian Nationalism, Social Inequities, Nationalist Passions
- Combined (C: 9/10): Disaffection, Militancy, Radicalism, Isolationism, Nationalist Sentiment

Anti-Americanism, After:
- Neighbours (C: 9/10): Anti-Semitism, Islamic Fundamentalism, Fundamentalism, Extremism, Hatred
- Co-Occurrence Mapping (C: 10/10): Islamism, Militancy, Radical Islam, Islamic Extremism, Islamic Radicalism
- Combined (C: 9/10): Anti-Semitism, Islamic Fundamentalism, Fundamentalism, Extremism, Hatred

Table 5: Summarisations for the concepts ’Fundamentalism’, ’Radicalized’ and ’Anti-Americanism’. Column ’C’ denotes the number of times the summary was labeled ’correctly’ during peer evaluation.

Table 5 shows that, according to peer evaluation, all summaries were deemed descriptive enough for a time period to be labeled correctly, except for the 'Before 9/11' Co-Occurrence Mapping summary of the concept 'Fundamentalism'. Similar but opposite to the problem described for the 'Muslim' summarisation, it can be argued that, as extremist terms related to Islam occur in this summarisation but not in the other 'Before 9/11' summarisations, fellow students have, in a way, decided that Islamic extremism and terms related to it mainly occur 'After 9/11'.

Terrorism, Terrorist, Terrorist Attacks As the 9/11 attacks were of a


Terrorism, Before:
- Neighbours (C: 7/10): Hostage-Taking, Aggression, Holy War, Libya, Israel
- Co-Occurrence Mapping (C: 4/10): Terrorist Attacks, Hostage-Taking, Domestic Terrorism, State-Sponsored Terrorism, Terror Attacks
- Combined (C: 7/10): Terrorist Attacks, Terrorist Groups, Terrorist Organizations, Hostage-Taking, Terrorist Activity

Terrorism, After:
- Neighbours (C: 10/10): Islamic Terrorists, Iraq, Extremists, Islamic Fundamentalism, Insurgency
- Co-Occurrence Mapping (C: 10/10): Terror, International Terrorism, Global Terrorism, Islamic Terrorism, Islamic Extremism
- Combined (C: 9/10): Terror, Islamic Extremism, Domestic Terrorism, Extremism, Terrorist Threats

Terrorist, Before:
- Neighbours (C: 4/10): Terrorist Group, Abu Nidal, Radical Islamic, Terrorist Attacks, Islamic Terrorist
- Co-Occurrence Mapping (C: 8/10): Terrorist Group, Terrorist Organization, Palestinian Terrorist, Suspected Terrorist, Abu Nidal
- Combined (C: 4/10): Terrorist Group, Abu Nidal, Radical Islamic, Terrorist Attacks, Islamic Terrorist

Terrorist, After:
- Neighbours (C: 10/10): Terrorist Activity, Islamic Militants, Extremists, Extremist Groups, Hijacking
- Co-Occurrence Mapping (C: 9/10): Al Qaeda, Qaeda, Islamic Terrorists, Terrorist Group, Jihadist
- Combined (C: 9/10): Qaeda, Terrorist Cells, Terrorist Activity, Terror, Islamic Militants

Terrorist Attacks, Before:
- Neighbours (C: 6/10): Missile Attacks, Hostage-Taking, Terrorist Groups, Air Raids, Bombing Raids
- Co-Occurrence Mapping (C: 5/10): Terrorist Attack, Terror Attacks, Terrorist Threats, Suicide Attacks, Retaliatory Attacks
- Combined (C: 7/10): Terrorist Actions, Terrorist Bombings, Guerrilla Attacks, Bomb Attacks, Terrorist Activity

Terrorist Attacks, After:
- Neighbours (C: 2/10): Attack, Natural Disasters, Tsunami, Suicide Bombing, Suicide Attacks
- Co-Occurrence Mapping (C: 10/10): Terror Attacks, Terrorist Attack, Terror Attack, Terrorist Strikes, 9/11 Attacks
- Combined (C: 9/10): Hijackings, Attack, Terrorist Threats, Suicide-Attacks, Suicide-Bombings

Table 6: Summarisations for the concepts ’Terrorism’, ’Terrorist’ and ’Terrorist Attacks’. Column ’C’ denotes the number of times the summary was labeled ’correctly’ during peer evaluation.

As can be seen in Table 6, two of the three summarisations for the concept 'Terrorist' after 9/11 contain a key-word concerning 'Al-Qaeda'. This term has frequently been linked to terrorist events after 9/11, even though this terrorist group was not necessarily connected to the affairs [Powell, 2011]. The same study states that for the concept 'terrorism', 'domestic terrorism' is perceived as less perilous than international and global terrorism. Both 'International Terrorism' and 'Global Terrorism' occur in the 'After 9/11' Co-Occurrence Mapping summarisation for 'Terrorism', which seems to suggest that more attention is paid to the international side of terrorism.

Airport, Hijacking, Terrorist Targets  A defining aspect of the 9/11 attacks in 2001 was the hijacking of airplanes, two of which would ultimately be used to attack the Twin Towers. In answer to the attacks, security measures concerning airports and aviation have been sharpened [Blalock et al., 2007].

While 'Terrorist Targets' could be perceived as quite a broad concept, it could be that a shift has taken place regarding the location of terrorist attacks since 9/11.


Airport, Before:
- Neighbours (C: 10/10): Airport Terminal, Landing Strip, Plane, Passenger Terminal, Heliport
- Co-Occurrence Mapping (C: 10/10): Airport Terminal, Ohare Airport, Teterboro Airport, Midway Airport, Ohare
- Combined (C: 10/10): Airport Terminal, Airstrip, Teterboro Airport, Ohare, MacArthur Airport

Airport, After:
- Neighbours (C: 2/10): Airtrain, Border Crossing, Port, Pennsylvania Station, Terminals
- Co-Occurrence Mapping (C: 7/10): Dulles, Reagan International, Security Checkpoint, JFK, Stansted
- Combined (C: 6/10): Dulles, Security Checkpoint, Terminal, JFK, Airtrain

Hijacking, Before:
- Neighbours (C: 0/10): Kidnapping, Hijackings, Bombings, Airliner, Bomb Attack
- Co-Occurrence Mapping (C: 9/10): 1985 Hijacking, Achille Lauro, Hijacker, Hijackers, Hijackings
- Combined (C: 0/10): Kidnapping, Hijackings, Bombings, Airliner, Bomb Attack

Hijacking, After:
- Neighbours (C: 9/10): Terrorist Act, Terrorist Acts, Missile Attack, Islamic Terrorists, Terrorist
- Co-Occurrence Mapping (C: 5/10): Hijackings, Terrorist Plot, Hijacker, Hijackers, Achille Lauro
- Combined (C: 9/10): Terrorist Act, Terrorist Acts, Missile Attack, Islamic Terrorists, Terrorist

Terrorist Targets, Before:
- Neighbours (C: 1/10): Civilian Targets, Terrorist Attacks, Terrorist Actions, Terrorist Acts, Terror Attacks
- Co-Occurrence Mapping (C: 6/10): Retaliatory Strikes, Serbian Targets, Civilian Targets, Terrorist Attacks, Terror Attacks
- Combined (C: 3/10): Retaliatory Strikes, Civilian Targets, Terrorist Attacks, Terrorist Actions, Terrorist Acts

Terrorist Targets, After:
- Neighbours (C: 3/10): Airfields, Landmarks, Terrorist Groups, Depots, Command Posts
- Co-Occurrence Mapping (C: 3/10): Potential Targets, Military Installations, Chemical Plants, Terrorist Target, Bridges Tunnels
- Combined (C: 6/10): Bridges Tunnels, Airfields, Trouble Spots, Breeding Grounds, Transportation Hubs

Table 7: Summarisations for the concepts ’Airport’, ’Hijacking’ and ’Terrorist Targets’. Column ’C’ denotes the number of times the summary was labeled ’correctly’ during peer evaluation.

Table 7 shows scores for summarisations that are somewhat lower than for the concepts previously discussed. In particular, the Neighbours and Combined 'Before 9/11' summarisations (which are identical) for the concept 'Hijacking' were labeled incorrectly by all fellow students, suggesting that the key-words are either mostly associated with 'After 9/11', or that no key-words occurred in the summary for which students felt a strong enough link to 'Before 9/11'. Overall, the same trend is seen for the concepts 'Airport' and 'Terrorist Targets'. It could be argued that these concepts have not changed in meaning since the 9/11 attacks, or that the methods used for summarisation were not able to capture the change.

A noteworthy key-word in the Co-Occurrence Mapping and Combined 'After 9/11' summarisations for the concept 'Airport', however, is the term 'Security Checkpoint', perhaps due to the increased security measures described in the motivation for including this concept in the evaluation.

Security, Anti-Terrorism  Following the same reasoning as for the concept 'Airport' described above, these concepts are considered for summarisation to investigate whether there has been an increase or shift in general security measures or the prevention of terrorism.


Security, Before:
- Neighbours (C: 8/10): Alarm Systems, Security Forces, Armed Forces, Paramilitary, Weapons
- Co-Occurrence Mapping (C: 2/10): Internal Security, Safety, Anti-Terrorist, Intelligence Gathering, Counterterrorism
- Combined (C: 1/10): Anti-Terrorist, Military, Antiterrorist, Anti-Terrorism, Israels Security

Security, After:
- Neighbours (C: 1/10): Security Forces
- Co-Occurrence Mapping (C: 8/10): Internal Security, Airport Security, Antiterrorism, Aviation Security, Intelligence Gathering
- Combined (C: 6/10): Antiterrorism, Coordination, Preparedness, Counterterrorism Efforts, Civil Aviation

Anti-Terrorism, Before:
- Neighbours (C: 5/10): Terrorist, Paramilitary, Germ Warfare, Suspected Terrorist, Death Squad
- Co-Occurrence Mapping (C: 6/10): Antiterrorism, Anti-Terrorist, Counterterrorism, Anti-Narcotics, Antiterrorist
- Combined (C: 8/10): Anti-Terrorist, Counter-Terrorism, Antinarcotics, Anti-Drug, Drug-Interdiction

Anti-Terrorism, After:
- Neighbours (C: 3/10): Paramilitary
- Co-Occurrence Mapping (C: 10/10): Antiterrorism, Antiterror, Antiterrorist, Counterterrorism, Intelligence-Gathering
- Combined (C: 5/10): Antiterror, Interagency, Anticrime, Counterterrorist, Anti-Terror

Table 8: Summarisations for the concepts ’Security’ and ’Anti-Terrorism’. Col-umn ’C’ denotes the number of times the summary was labeled ’correctly’ during peer evaluation.

Table 8 shows fairly average scores for the summarisations of the concepts 'Security' and 'Anti-Terrorism'. As these concepts are quite broad and not limited to the 9/11 attacks (especially 'Security', which is quite a universal term), the summarisations seem rather ambiguous in the sense that no 'clues' appear that could match the summarisations to a time period. An exception is perhaps the 'After 9/11' Co-Occurrence Mapping summarisation, in which the terms 'Airport Security' and 'Aviation Security' occur; the score for this summarisation is indeed higher than for the remaining 'After 9/11' summarisations for the concept 'Security'.

WTC, 911, Ground Zero  These three concepts are perhaps most closely related to the 9/11 attacks compared to the other terms considered for summarisation and are therefore expected to produce non-ambiguous, descriptive summarisations.

The World Trade Center (WTC) was one of the targets of the 9/11 attacks. '911' is the emergency number in the United States. As punctuation was removed during tokenisation of the NYT data, '911' should, in theory, also represent September 11th after the attacks (being 9/11). Ground Zero denotes the location where the Twin Towers formerly stood.


WTC, Before:
- Neighbours (C: 10/10): Golf Association, USGA, Womens Tennis, WTA, WTA Tour
- Co-Occurrence Mapping (C: 10/10): Golf Association, Soccer Federation, USTA
- Combined (C: 10/10): Golf Association, USTA, USGA, Womens Tennis, WTA

WTC, After:
- Neighbours (C: 9/10): Firefighter
- Co-Occurrence Mapping (C: 8/10): North Tower, South Tower, Trade Center, 92nd Floor, Cantor Fitzgerald
- Combined (C: 8/10): Firefighter, Murrah Building, FDNY, Vesey Street, AON Corporation

911, Before:
- Neighbours (C: 9/10): 917, Phones, 646, Cellular Phone, 611
- Co-Occurrence Mapping (C: 9/10): 911 Emergency, 911 Telephone, 911 Operator, 911 System, Emergency 911
- Combined (C: 9/10): 911 Emergency, 911 Operator, EMS, 911 Call, 911 Calls

911, After:
- Neighbours (C: 10/10): Terror Attacks, Terrorist Attack, Terrorist Attacks, Tragedy, Hijackings
- Co-Occurrence Mapping (C: 10/10): Sept 11, 9/11 Attacks, Sept 11th, September 11th, Terror Attacks
- Combined (C: 10/10): Terror Attacks, Katrina, Terrorist Attack, Terrorist Attacks, Twin Towers

Ground Zero, Before:
- Neighbours (C: 9/10): Terminus, Southern Tip, Telluride, Fort Irwin, DDB
- Co-Occurrence Mapping (C: 6/10): Hot Spot, Nerve Center, Post-Apocalyptic, Pasadena Playhouse, Boomtown
- Combined (C: 6/10): Hot Spot, Nerve Center, Post-Apocalyptic, Pasadena Playhouse, Boomtown

Ground Zero, After:
- Neighbours (C: 5/10): Lower Manhattan, Towers, Columbus Circle, Brooklyn Bridge, Yankee Stadium
- Co-Occurrence Mapping (C: 10/10): Trade Center, Disaster Site, Lower Manhattan, Freedom Tower, 9/11 Memorial
- Combined (C: 9/10): Lower Manhattan, Twin Towers, Memorial, 16-Acre Site, Fresh Kills

Table 9: Summarisations for the concepts ’WTC’, ’911’ and ’Ground Zero’. Column ’C’ denotes the number of times the summary was labeled ’correctly’ during peer evaluation.

Table 9 shows summaries of the concepts that are perhaps most specifically related to 9/11 compared to the other concepts used for evaluation. This can be seen in the scores for the summarisations of the concepts 'WTC' and '911'.

A noteworthy observation is the difference between the 'Before 9/11' summarisations of the concept '911' for the Neighbours and Combined methods. To recall, the Neighbour quantification method assigns an agreement measure of 1.0 to words that are considered noise. The combined summarisation method takes these words into account by using the Co-Occurrence Mapping measure for them. While both summarisations were labeled correctly by most fellow students, it can be argued that the Combined summarisation is more descriptive of the concept than the Neighbour summarisation, which consists mainly of other numbers. The combined method captures the defining key-words (which all have a Neighbour agreement of 1.0) in this case, instead of labeling them as noise.

A slight disadvantage of the Combined summarisation method can be observed in the 'After 9/11' summarisation for '911'. While the concept 'Twin Towers', a defining term for the 9/11 attacks, is added to the summary, the term 'Katrina', possibly referring to the hurricane, is added as well. Both terms obtained a Neighbour agreement measure of 1.0. The problem, from now on referred to as 'Noise vs Defining Concept', is thus that, while there are certainly descriptive concepts in the 'noise', it may be hard to distinguish these terms from truly irrelevant, noisy words.


4.1.2 Overview of Summarisation Results

                          Neighbours            Co-Occurrence Mapping   Combined
                          C Before   C After    C Before   C After      C Before   C After
Recall Peers vs Method    68.13%     65.00%     67.50%     80.00%       66.25%     81.88%

Table 10: Recall for summarisation questions. ’C’ stands for correct labeling by fellow students.

Table 10 describes the overall recall for the summarisation questions, obtained by dividing the number of votes cast correctly per summarisation type by fellow students by the total number of votes that should have been cast per summarisation type. As can be seen, the obtained results seem average to fairly positive, with the 'After 9/11' summarisations for the Co-Occurrence Mapping and Combined methods scoring best. This may be because, as mentioned in the previous section, for some concepts a trend has been detected where summarisations containing extremist terms are labeled as 'After 9/11' quite often. Table 11 shows the average number of correct labels assigned to summarisations (before and after 9/11) for the three methods, along with the Neighbour and Co-Occurrence agreement measures of each concept.
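The recall values in Table 10 are plain ratios of correct votes over total votes; with sixteen concepts and ten evaluators, a summarisation type receives at most 160 votes per time period. The vote counts below are hypothetical, chosen only to illustrate the computation:

```python
def recall(correct_votes, total_votes):
    # fraction of summaries assigned to the right time period by evaluators
    return correct_votes / total_votes

# e.g. 109 of 160 correct labels gives a recall of 0.68125, i.e. 68.125%
print(recall(109, 160))
```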

Concept             Neighbour Agreement   C        Co-Occurrence Mapping Agreement   C        C Combined
islam               0.138                 8/10     0.489                             7/10     8/10
muslim              0.098                 7.5/10   0.460                             5/10     8/10
terrorism           0.177                 8.5/10   0.429                             7/10     8/10
terrorist targets   1.0                   2/10     0.194                             4.5/10   4.5/10
airport             0.167                 6/10     0.520                             8.5/10   8/10
wtc                 1.0                   9.5/10   0.030                             9/10     9/10
911                 1.0                   9.5/10   0.067                             9.5/10   9.5/10
fundamentalism      0.121                 7.5/10   0.566                             5/10     9/10
terrorist attacks   0.135                 4/10     0.351                             7.5/10   8/10
security            1.0                   4.5/10   0.442                             5/10     3.5/10
hijacking           0.145                 4.5/10   0.385                             7/10     4.5/10
ground zero         1.0                   7/10     0.074                             8/10     7.5/10
terrorist           0.137                 8.5/10   0.408                             7/10     6.5/10
anti-americanism    0.120                 8.5/10   0.549                             9.5/10   9/10
anti-terrorism      1.0                   4/10     0.398                             8/10     6.5/10
radicalized         1.0                   8.5/10   0.428                             9/10     9/10

Table 11: Average of correct labels assigned to summarisations (before and after 9/11) for the three methods, along with the Neighbour and Co-Occurrence agreement measures of the concept. Note that for the Neighbour agreement summarisations, the concepts that were summarised may have an agreement of 1.0, whereas for the terms used in their summarisations, this is not the case.

Based on the results of peer evaluation, the agreement measures for the concepts themselves do not necessarily indicate how descriptive the summarisations for a time period are. What can be observed, however, is that in quite a few cases the Combined summarisation method performs on par with the best of the single summarisation methods, if not better (with a few exceptions).

4.1.3 Results of Concept-Pair Questions

The concept-pairs shown in Table 12 were included in the evaluation form to further assess the agreement measures obtained during quantification:


Terrorism vs Ground Zero
WTC vs Muslim
Security vs Fundamentalism
911 vs Airport
Terrorist Targets vs Islam

Table 12: Concept-pairs included in the evaluation form.

Table 13 shows the results of peer evaluation, along with the agreement measures obtained through the Neighbour and Co-Occurrence Mapping methods. As can be seen, the results of peer evaluation do not provide convincing evidence that the agreement measure quantifies the change of a concept over time. What can be observed, however, is that the two agreement measures complement each other, in such a way that for each concept-pair, one of the two agreement measures represents the ’correct’ change in meaning according to peer evaluation.

Concepts               Agr. Neighbour (it. 9)   C      Agr. Co-Occurrence Mapping   C
Terrorism              0.18                     3/10   0.43                         7/10
  vs Ground Zero       1.0                             0.07
WTC                    1.0                      3/10   0.03                         7/10
  vs Muslim            0.098                           0.46
Security               1.0                      7/10   0.44                         3/10
  vs Fundamentalism    0.12                            0.57
911                    1.0                      0/10   0.07                         10/10
  vs Airport           0.17                            0.52
Terrorist Targets      1.0                      9/10   0.19                         1/10
  vs Islam             0.14                            0.49
Average                                         44%                                56%

Table 13: Concept-pairs with, for each term, the agreement measure obtained through the Neighbour method and the Co-Occurrence Mapping method. Iteration 9 for the Neighbour method is the 10th iteration, as counting starts at 0.
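The disagreement visible in Table 13 can be made concrete: each measure predicts that the concept with the lower agreement (lower similarity between the two time periods) changed more, and the two measures can point at opposite members of a pair. A minimal sketch using the Terrorism/Ground Zero pair; the helper function is an assumption, not part of the actual implementation:

```python
def more_changed(concept_a, agr_a, concept_b, agr_b):
    """Predict which of two concepts changed more in meaning: a lower
    agreement between the 'before' and 'after' vector spaces is taken
    to mean a larger shift."""
    return concept_a if agr_a < agr_b else concept_b

# Neighbour measures (iteration 9): terrorism 0.18 vs ground zero 1.0
print(more_changed("terrorism", 0.18, "ground zero", 1.0))   # -> terrorism
# Co-Occurrence Mapping measures: terrorism 0.43 vs ground zero 0.07
print(more_changed("terrorism", 0.43, "ground zero", 0.07))  # -> ground zero
```

Note that the Neighbour value of 1.0 for ’ground zero’ marks the word as noise rather than as unchanged, which is one reason the two measures can disagree on such pairs.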

5 Conclusions

To recall the research goal from the introduction, the aim of the project ’My Reality is not Your Reality’ is to:

Represent the meaning of a given concept seen from two viewpoints, quan-tify the difference between the two meanings and summarise them.

To achieve the research goal, two main questions had to be answered, namely:

RQ1 Can the change in meaning of concepts between two viewpoints be quantified?

RQ2 How can the meanings of words seen from two viewpoints be summarised in a comprehensible manner?

The meanings of concepts before and after 9/11 were represented as vectors in two Word2Vec models. Using three methods, the Linear Mapping method, the Neighbour method, and the Co-Occurrence Mapping method, the difference in meaning of a concept before and after 9/11 has been quantified. Based on results obtained through peer evaluation, it seems that there is a logical foundation for all three methods, producing quantifications that are capable of, to a certain extent, expressing the change in meaning of a given concept over time.

Agreement measures obtained through the Neighbour method and the Co-Occurrence Mapping method appear to complement one another, which could be an explanation for the Combined summarisation results scoring relatively well in peer evaluation. It can be argued, based on the results of the Neighbour and Co-Occurrence Mapping summarisations as well as the Combined summarisations, that there is definite potential in summarising concepts in the manner described in this project.

Overall, the methods posed in this research produce promising results regarding the quantification of the difference between the meanings of a concept and its summarisation, making additional future research a logical step.

5.1 Discussion

A few concerns expressed by some fellow students who filled out the peer evaluation form may have contributed to the results in a negative manner. For some students, it was (initially) not clear that the summarisations belonged to a concept, even though this was explained in the evaluation form; some summarisations may therefore have been judged differently. Furthermore, as the students were quite young at the time of the attacks, it appeared to be quite difficult for them to express an opinion on whether or not a summarisation belonged to the concept before 9/11, as they did not have a clear recollection of these terms before 9/11.

One final note is that, even though the results for some concepts seem to have been influenced by the 9/11 attacks, this event need not be the only reason for the change in meaning of a concept. Other important events, attacks and developments may have played a significant role in the change in meaning; this is, however, hard to detect without additional research.

5.2 Future Work

A noteworthy addition to this research that could perhaps improve quantification and summarisation concerns the lemmatisation of key-words. Words such as ’terrorism’, ’terrorist’ and ’terrorists’ are all seen as separate words and could therefore all occur in one summarisation, leaving less room for more descriptive terms. Grouping these concepts into one single term avoids this problem, hence leading to perhaps more contrastive summarisations. As the Word2Vec model bases the formation of vectors partially on the context of a word, it would most likely be necessary to perform lemmatisation before training the model, so as to ensure that the same term is used everywhere.
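This preprocessing step can be sketched with a simple lookup-based normaliser. The lemma table below is a hand-made illustration; a real pipeline would use a proper lemmatiser (e.g. from NLTK or spaCy) applied to the corpus before Word2Vec training:

```python
# Toy normalisation table merging inflected and derived forms into one
# key-word; in practice this mapping would come from a lemmatiser or a
# morphological dictionary rather than being written by hand.
LEMMA = {
    "terrorists": "terrorism",
    "terrorist": "terrorism",
    "terrorism": "terrorism",
}

def normalise(tokens):
    """Map each token to its group lemma so that Word2Vec learns a single
    vector per concept instead of one vector per surface form."""
    return [LEMMA.get(token, token) for token in tokens]

sentence = ["two", "terrorists", "planned", "terrorist", "attacks"]
print(normalise(sentence))
# -> ['two', 'terrorism', 'planned', 'terrorism', 'attacks']
```

Applying such a normalisation to every tokenised sentence before training would ensure that the same term is used everywhere, at the cost of collapsing distinctions (e.g. between the act and the actor) that may themselves be meaningful.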
