
Contradiction detection between news articles

Kasper van Veen

June 8, 2016

Supervisor(s): Christof Monz (UvA)

Informatica, Universiteit van Amsterdam


sentences are filtered. The last phase is to apply logistic regression using five features. This experiment focuses on contradictions containing antonyms and negations. The experiments at the end show the results using contradictions found in the RTE datasets and between two news articles. From the results we can conclude that the chosen features work well, but are not enough to cover the whole RTE dataset.


Contents

1 Introduction
  1.1 Structure
2 Background
  2.1 Methodology
  2.2 What are contradictions
3 How do we detect contradictions?
  3.1 Find semantics and syntax of sentences
    3.1.1 Using spaCy for syntax analysis
    3.1.2 Dependency graphs
  3.2 Alignment between dependency graphs
  3.3 Filter non co-referent sentences
  3.4 Logistic Regression
  3.5 Features
    3.5.1 Antonyms feature
    3.5.2 Switching of object and subject feature
    3.5.3 Alignment feature
    3.5.4 Negation feature
4 Experiments


1 Introduction

When the MH17 plane disaster occurred two years ago, the Western media immediately pointed their fingers at the Russians. The Russians, however, claimed that the Ukrainian government was guilty. Instead of simply accepting what the Western news said, I decided to do my own research to find out what both parties were saying. I found an article published by the BBC on the 14th of October 2015, which stated:

‘Mr Joustra said pro-Russian rebels were in charge of the area from where the missile that hit MH17 had been fired.’

while Pravda (one of the biggest Russian news sources) published an article on the 15th of October 2015 which stated:

‘Group representatives confirmed that the plane was shot down from the territory controlled by the official Kiev.’

It can easily be seen that two of the biggest news sources in the world were making completely different statements about the same subject. This is an interesting observation, because it can be seen as propaganda by both parties. Although it is not possible to find out the truth, it is possible to show the statements on which the parties contradict each other. This is where the idea for this thesis came from: to build a program that is able to detect contradictions and thereby show the differences in opinion.

Stanford University did research on contradiction detection [3]. Their system uses four steps to determine whether sentence pairs from the RTE3 dataset are a contradiction or not. This thesis uses a similar approach, but with different tools and features, which will be discussed later in this thesis.

The research question of this thesis is: what kind of contradictions can I detect in two related news articles?

1.1 Structure

This thesis starts with a short introduction to the methodology, recent research and the definition of a contradiction. Chapter three goes further into the subject and describes the four steps by which computers are able to detect contradictions. Chapter four shows the results of the experiment using RTE datasets and a few contradictions found in news articles. Chapter five presents the conclusion and a discussion of future work.


2 Background

2.1 Methodology

Instead of using the Stanford parser as a dependency parser, as Stanford University did, this experiment uses spaCy for a faster and more accurate result. The WordNet database is used to acquire a large number of antonyms and negations. This thesis will not only focus on the contradictions in the RTE (Recognizing Textual Entailment) datasets, it will also try to detect the contradictions found between different news articles about the MH17 disaster. The features that are used for the logistic regression are based on contradictions found in the news articles; these are mainly antonyms and negations.

We used the RTE1 dataset to verify whether the features and the logistic regression classifier are sufficient to detect contradictions.

2.2 What are contradictions

Before the experiment could start, the definition of a contradiction should be clear: ‘Contradictions occur when sentence A and sentence B are unlikely to be true at the same time.’ [3] In terms of logical expressions this is: A ∧ ¬B or ¬A ∧ B. An important requirement for contradictions is that both sentences are about the same event (co-referent sentences). Contradictions occur in many different forms and levels of difficulty. Antonyms and negations are the easiest to recognize, followed by numerical differences. Next come factive and modal words, and the hardest are sentences which require world knowledge to be understood.

Examples are a good way to show what the differences are and how they are recognized. Antonyms are words that are opposites of each other, like big/small, rich/poor and young/old:

‘The people of Kenya are rich’ vs ’The people of Kenya are poor’

Negations are words that are negations of each other like did/didn’t, have/haven’t and could/couldn’t:

‘Frank committed the crime’ vs ‘Frank didn’t commit the crime’

Numerical differences occur when there is a difference between numbers:

‘Apple’s annual revenue was 50 million in 2016’ vs ‘In 2016, Apple’s annual revenue was 40 million.’

Also, a difference between the dates on which an event occurred could be seen as a contradiction:

‘Willem-Alexander became king of the Netherlands in 2013’ vs ‘Willem-Alexander became king of the Netherlands in 2010.’

To detect other numerical differences, the computer should be able to recognize words like ‘no’, ‘some’, ‘many’, ‘most’, and ‘all’ [6]. These words add extra meaning to the number next to them. An example:


‘More than 500 people attended the ceremony’ vs ‘700 people attended the ceremony’

The program would detect this as a contradiction, since 500 and 700 are different numbers. However, ‘more than 500’ is technically consistent with ‘700’ [1]. This does not hold in all cases: it depends on the range between the two numbers. ‘At least 200’ and ‘5000’ are too far apart from each other for the sources to be considered reliable, although the statements are technically compatible. It is up to the end user to determine this boundary.
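To illustrate how such a user-defined boundary could work, a small hedged sketch follows; the parse_quantity and numbers_conflict helpers and the max_ratio parameter are hypothetical and not part of the thesis:

from typing import Tuple

def parse_quantity(phrase: str) -> Tuple[str, float]:
    # hypothetical helper: "more than 500" -> (">", 500.0), "700" -> ("=", 700.0)
    words = phrase.lower().split()
    comparator = "="
    if words[:2] == ["more", "than"]:
        comparator, words = ">", words[2:]
    elif words[:2] == ["at", "least"]:
        comparator, words = ">=", words[2:]
    return comparator, float(words[0])

def numbers_conflict(phrase_a: str, phrase_b: str, max_ratio: float = 2.0) -> bool:
    # max_ratio is the user-chosen boundary: ">500" vs "700" is accepted,
    # but ">200" vs "5000" is considered too far apart to trust
    op_a, val_a = parse_quantity(phrase_a)
    op_b, val_b = parse_quantity(phrase_b)
    if op_a == "=" and op_b == "=":
        return val_a != val_b                        # plain numerical difference
    if op_a in (">", ">=") and op_b == "=":          # lower bound vs exact number
        return not (val_b >= val_a and val_b <= val_a * max_ratio)
    return False                                     # other combinations not handled here

print(numbers_conflict("more than 500 people", "700 people"))   # False: consistent
print(numbers_conflict("at least 200 people", "5000 people"))   # True: too far apart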

Factive words add necessity or possibility to a verb:

‘The burglar managed to open the door’ vs ‘The burglar opened the door’

Modal words add modality to a verb:

‘He will go to work’ vs ‘He would go to work’

The last and hardest type of contradiction needs world knowledge to be understood. To the human eye it might be easy to recognize these as a contradiction, but for a computer it is difficult. An example:

‘Albert Einstein was in Austria’ vs ‘Albert Einstein was in Germany’

This is not a contradiction, because both sentences can be true. Albert Einstein could have been in both places, just not at the same time.

‘Albert Einstein died in Austria’ vs ‘Albert Einstein died in Germany’

This obviously is a contradiction, because both sentences cannot both be true: someone can only pass away in one place. ‘Died in’ should be seen as a function of a person’s unique place of death [7]. For a program that is able to detect contradictions it is hard to know all of these functions, so an idea for further research is to build a dataset of such functions. Research showed that only a few contradictions can be detected using syntactic matching; the rest depends on world knowledge and an understanding of the semantic structure of the sentences [2]. A relatively easy part of world knowledge is location relations. It is common sense that Amsterdam is the capital of the Netherlands. A computer, however, often does not have that knowledge.

‘Today, the mayor returned to the capital of the Netherlands, Amsterdam.’ vs ‘Today, the mayor returned to Rotterdam, the capital of the Netherlands.’

The program would detect a contradiction here, because the mayor returned to two different cities on the same day. Still, this is not a contradiction, because the second sentence is false: Rotterdam is not the capital of the Netherlands. Holonyms could be used to construct a dataset containing world knowledge [1]. Holonymy is a semantic relationship between terms: ‘house’ is a holonym of ‘door’ and ‘window’. In the case above, ‘capital of the Netherlands’ is a holonym of ‘Amsterdam’. If implemented correctly in a dataset, the program should see the second sentence as false and thereby not detect a contradiction.
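As an illustration, holonym relations can be queried from WordNet. This sketch uses NLTK, which is an assumption (the thesis only mentions the WordNet database), and the exact output depends on the installed WordNet version:

from nltk.corpus import wordnet as wn

def holonyms(word: str) -> set:
    # collect the "wholes" WordNet knows for a noun, e.g. things a door is part of
    wholes = set()
    for synset in wn.synsets(word, pos=wn.NOUN):
        relations = (synset.part_holonyms()
                     + synset.member_holonyms()
                     + synset.substance_holonyms())
        for holonym in relations:
            wholes.update(lemma.name() for lemma in holonym.lemmas())
    return wholes

print(holonyms("door"))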

There are many different sorts of contradictions, which makes it hard to detect them. Often the sentence pairs are not as similar as the sentences above and thus require knowledge of their syntactic structure. This thesis will mainly focus on antonyms and negations to test whether the contradiction detection program works on a specific level.


3 How do we detect contradictions?

Two sentences are needed to determine whether they are contradictions or not, so we name them sentence A and sentence B. For contradiction detection there are various steps that need to be taken. The first step is to find the syntactic structure of each sentence. The syntactic structure of a sentence shows the words that form the sentence and their properties, such as verbs and adjuncts. spaCy is used as a dependency parser to achieve this first step. Next, the two graphs obtained from spaCy are aligned with each other to acquire a specific score which determines whether sentences A and B have the possibility of being a contradiction. This score is based on the occurrence of antonyms, negations and other words that might lead to a contradiction. The third step is to filter the non co-referent sentences, which are sentences that are not about the same event. The final step is to apply logistic regression, which determines whether the sentences are true contradictions of each other. For this experiment, a couple of existing datasets are used. The RTE datasets from Stanford University contain 800 pairs of sentences with the possibility of being a contradiction. WordNet is a dataset that contains many of the English antonyms and negations. These datasets are used as tools to complete this experiment.
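To make these steps concrete, here is a minimal Python sketch of how such a pipeline could be wired together. The model name en_core_web_sm, the simplified extract_features helper and the class encoding are assumptions for illustration, not the author's actual code:

import spacy

nlp = spacy.load("en_core_web_sm")  # assumed model name; the thesis only states that spaCy is used

def extract_features(doc_a, doc_b):
    # heavily simplified stand-in for the five features used in the thesis:
    # document similarity as an alignment proxy and a crude negation-count difference
    alignment = doc_a.similarity(doc_b)
    negations_a = sum(token.dep_ == "neg" for token in doc_a)
    negations_b = sum(token.dep_ == "neg" for token in doc_b)
    return [alignment, abs(negations_a - negations_b)]

def detect_contradiction(sentence_a, sentence_b, classifier):
    # step 1: dependency-parse both sentences with spaCy
    doc_a, doc_b = nlp(sentence_a), nlp(sentence_b)
    # step 3 (filtering non co-referent sentences) is skipped, as in the thesis;
    # steps 2 and 4: featurise the pair and let a trained logistic regression decide
    features = extract_features(doc_a, doc_b)
    return classifier.predict([features])[0] == 0   # class 0 = no entailment, i.e. likely contradiction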

3.1 Find semantics and syntax of sentences

Computers and humans differ in many ways. When a human hears a sentence, their prior knowledge is used to understand it. The person not only uses their knowledge of grammar, but also understands the words and their meanings. To detect contradictions, a computer should also be able to understand the context of the sentence and the meaning of the words. To get a better understanding of the content of a sentence, its semantics should be studied.

Semantics is the study of the meaning of linguistic expressions. It shows how words and phrases are related to their denotation. The meaning of a word often depends on the whole sentence. A good example to show this:

‘Spears have a very sharp point’ vs ‘You should not point at people’

Words like ‘point’ are called homonyms: they are written the same but can have different meanings. In the first sentence, ‘point’ is a noun, while in the second sentence ‘point’ is a verb. Syntax comprises the grammar, rules and principles that give structure to a language. The grammar and the meaning of a sentence are closely related. Sometimes the grammar can be right but the sentence is meaningless, and vice versa.

‘The helpless apple saw anger’ vs ‘The young man took in the shop some cigarettes’

The first sentence doesn’t make any sense, even though its syntactic structure is correct. The second sentence is syntactically incorrect, but the reader can still understand it.

3.1.1 Using spaCy for syntax analysis

spaCy is used to get the syntactic structure of a sentence. spaCy is a dependency parser which reads every word and links the words together based on their syntactic structure. It uses tokenization to split the sentence into separate words and numbers; whitespace characters are used to separate the tokens from each other. Although it uses 1.5GB of RAM, the choice for spaCy instead of other dependency parsers, such as the Stanford parser, was quickly made: spaCy is a fast and very accurate parser written in Python, which made it easier to combine with the other components of the contradiction detection program. It can also recognize homonyms based on the rest of the sentence, as can be seen in figures 3.1 and 3.2, where the sentence pair ‘Spears have a very sharp point’ and ‘You should not point at people’ is used. The figures are images taken from the CSS version of spaCy’s dependency parser [8].
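A minimal example of how spaCy’s parse of this sentence pair can be inspected programmatically; the model name en_core_web_sm is an assumption, as the thesis does not state which model was loaded:

import spacy

nlp = spacy.load("en_core_web_sm")  # assumed model

for text in ("Spears have a very sharp point", "You should not point at people"):
    doc = nlp(text)
    for token in doc:
        # token.pos_ is the coarse word class, token.dep_ the relation to its syntactic head
        print(f"{token.text:<8} {token.pos_:<6} {token.dep_:<10} head={token.head.text}")
    print()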


PROPN   proper noun
AUX     auxiliary verb
ADJ     adjective
ADV     adverb
ADP     adposition (preposition or postposition)
CONJ    conjunction
DET     determiner
INTJ    interjection
NUM     cardinal number
PRT     particle or other function word
PUNCT   punctuation
SCONJ   subordinating conjunction
SYM     symbol

For this experiment it is not enough to only use the CSS version of spaCy, so the API is used to obtain the graphs. The next pressing issue that came up during this experiment is syntactic ambiguity. Syntactic ambiguities occur when the word order alone is not enough to fully understand the sentence. A famous example comes from a quote by Groucho Marx:

‘One morning I shot an elephant in my pajamas. How he got into my pajamas I’ll never know..’

This sentence can be interpreted in two different ways: either I shot the elephant while I was wearing my pyjamas, or I shot the elephant, who was wearing my pyjamas. When this sentence is run through spaCy’s dependency parser we get the result seen in figure 3.3.

Figure 3.3: Syntactic ambiguity

This means that spaCy interprets this sentence as: I shot the elephant while I was wearing my pyjamas. ‘Shot’ refers to ‘in’, which refers to ‘pyjamas’. In the other interpretation, ‘elephant’ would have referred to ‘pyjamas’. Although syntactic ambiguities don’t occur very often, it is something to take into consideration for this experiment. So far this experiment is only able to detect one interpretation and therefore it might not be able to detect some contradictions. When we transform the sentence above into a more realistic version we get:

‘The burglar shot someone in his pyjamas’

which translates to:

‘The burglar shot someone, while the burglar was wearing his pyjamas’

Now consider a sentence that might look like a contradiction:

‘Someone was not shot wearing his pyjamas’,

This might look like a contradiction, but it is not about the same event: nobody was shot while wearing pyjamas, instead somebody shot someone while wearing pyjamas.

So after both sentences are parsed, they should get their corresponding dependency graphs.


3.1.2 Dependency graphs

A dependency graph is a graph which represents the dependencies of various objects (in this case, words) on each other. Each graph contains as much information as possible about the semantic structure of the sentence. Each sentence is split up into words (nodes) and each edge shows the grammatical relationship between two words. In the program, there are three different annotations that are shown to the user: POS, tag and NER. From the official spaCy documentation: POS represents the word class of a token; it is coarse-grained and less detailed. Tag is different: it is fine-grained and more detailed than POS. It represents not only the word class but also some standard morphological information about the token. Tags are used by the syntactic parser because they are language and treebank dependent. The tagger is able to predict these fine-grained tags, and a mapping table is used to reduce them to the coarse-grained .pos tags [9]. Finally, NER stands for named-entity recognition; named entities are names of persons, places, companies and other known entities.

The program allocates each word to a single node. However, some words are auxiliary verbs, which are verbs used to form the tenses of other verbs, for example ‘were lost’ and ‘must go’. An auxiliary verb is attached to a main verb, which carries the semantic content of the sentence. Another example: in ‘I did not complete my homework’, the main verb is ‘complete’ while ‘did not’ is used to support it. The program sees these words as one unit and makes sure to allocate them to a single node. The output of this graph is:

I did not complete my homework
['PRP', 'VB', 'PRP$', 'NN']
['PRON', 'VERB', 'ADJ', 'NOUN']
['', '', '', '', '']
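A sketch of how per-token tag, POS and NER lists like the output above could be collected with spaCy, folding auxiliaries and negations into the main verb as described; the dependency labels used and the model name are assumptions, not the author's code:

import spacy

nlp = spacy.load("en_core_web_sm")  # assumed model

def token_tags(sentence):
    doc = nlp(sentence)
    # one node per word, but auxiliaries ("did") and negations ("not") are folded
    # into the main verb they support, so they do not get a node of their own
    kept = [token for token in doc if token.dep_ not in ("aux", "auxpass", "neg")]
    tags = [token.tag_ for token in kept]        # fine-grained tags, e.g. PRP, VB
    pos = [token.pos_ for token in kept]         # coarse-grained word classes
    ner = [token.ent_type_ for token in kept]    # empty string if the token is not a named entity
    return tags, pos, ner

print(token_tags("I did not complete my homework"))
# roughly: (['PRP', 'VB', 'PRP$', 'NN'], ['PRON', 'VERB', ...], ['', '', '', ''])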

The next step is to align the two graphs to each other to find the similarities and differences.

3.2 Alignment between dependency graphs

After each sentence is transformed into a dependency graph, the graphs are aligned to each other. Alignment between graphs is the concept of mapping two graphs onto each other, to make them as similar as possible. For contradiction detection it is used to map words (nodes) from sentence A to similar words in sentence B. If a word does not have any similar words, it is ignored.

The idea is to obtain a specific score based on the alignment. Synonyms and antonyms will get the highest score while words that have no similarity (irrelevant words) will get the lowest score.

The similarity score is based on the cosine metric. This is a similarity measurement between two vectors that evaluates the cosine of the angle between them. The spaCy toolkit uses the word2vec model vectors produced by Levy et al. [5], and those vectors are used for the cosine metric.
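As an illustration, a minimal sketch of the cosine metric over spaCy word vectors; the numpy implementation and the model en_core_web_md (one that ships with vectors) are assumptions:

import numpy as np
import spacy

nlp = spacy.load("en_core_web_md")  # assumes a model that ships with word vectors

def cosine(u, v):
    # cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom else 0.0

fell, rose = nlp("fell")[0], nlp("rose")[0]
print(cosine(fell.vector, rose.vector))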


‘Mitsubishi Motors Corp. sales fell 46 percent’ vs ‘Mitsubishi sales rose more than 40 percent’

If merging is not limited to PROPN, the NUM-PERCENT pair will also be merged. This results in a low alignment score, because the program sees ‘40 percent’ and ‘46 percent’ as totally different words with no similarity:

similarity of 40 percent (40 percent) and 46 percent (46 percent) = 0.0000

However, when merging named entities is turned off for NUM (cardinal numbers), the total alignment score is much higher:

similarity of 40 (40) and 46 (46) = 0.7749
similarity of 46 (46) is the highest (0.7749)

percent
---
similarity of percent (percent) and Mitsubishi Motors Corp. (Mitsubishi Motors Corp.) = 0.0000
similarity of percent (percent) and sales (sale) = 0.3118
similarity of percent (percent) and percent (percent) = 1.0000
similarity of percent (percent) is the highest (1.0000)

Mathematically speaking: f(‘40 percent’, ‘46 percent’) < f(‘40’, ‘46’) + f(‘percent’, ‘percent’), where f is the similarity function. Computing this results in 0 < 0.7749 + 1.0000.

The words which have the best similarity and thus the highest score will be used to determine the total alignment score.
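A hedged sketch of such a greedy best-match alignment score, restricted to PROPN, NOUN and VERB tokens (that restriction is mentioned in chapter 4); the function, model name and use of spaCy's token similarity are illustrative assumptions, not the thesis implementation:

import spacy

nlp = spacy.load("en_core_web_md")  # assumes a model with word vectors

CONTENT = {"PROPN", "NOUN", "VERB"}  # word classes considered for matching

def alignment_score(sentence_a, sentence_b):
    doc_a, doc_b = nlp(sentence_a), nlp(sentence_b)
    total = 0.0
    for token_b in doc_b:
        if token_b.pos_ not in CONTENT:
            continue
        # take the best-matching content word of sentence A; ignore the token if nothing matches
        scores = [token_b.similarity(token_a) for token_a in doc_a if token_a.pos_ in CONTENT]
        if scores:
            total += max(scores)
    return total

print(alignment_score("iTunes software has seen strong sales in Europe.",
                      "Strong sales for iTunes in Europe."))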

3.3 Filter non co-referent sentences

Some sentence pairs might obtain a high alignment score but are not contradictions at all. An important requirement of contradictions is that both sentences are about the same event; this is called co-reference. A good example to illustrate this is:

‘The palace of the Dutch royal family is in Amsterdam’ vs ‘The palace of the British royal family is in London’

This pair will get a very high alignment score because most words are the same. Amsterdam and London are different places and therefore the program would see this pair as a contradiction, because the palace of the royal family can only be in one country.

similarity              alignment score
palace → palace         1.0000
family → family         1.0000
is → is                 1.0000
Amsterdam → London      0.6464

However, this pair is about two different royal families, so it is not a contradiction at all. This shows the importance of filtering non co-referent sentences. This thesis will only try to detect contradictions based on antonyms and negations in a given text. Contradictions based on world knowledge are very hard to detect without a reliable database containing world knowledge; therefore this part is not used further.

3.4 Logistic Regression

When the alignment between two graphs has resulted in a high score, logistic regression is the final step in this experiment. It determines whether two sentences entail each other. Entailment occurs when the truth of sentence A guarantees the truth of sentence B: if A is true, then B is true (A |= B). If there is no entailment at this stage, it means that sentences A and B have a high probability of being a contradiction. An example of entailment:

‘The child broke the glass’ vs ‘The glass is broken’


In this example it can be seen that if the first sentence is true, the second sentence is true as well. If the child broke the glass it means that the glass is broken.

Logistic regression is a mathematical model which determines the probability that a specific event will occur. It uses previously seen data to predict the outcome for new data. The goal is to let a computer make the decisions without being programmed for the task specifically. The model learns from a training set, which consists of a matrix X and a vector y. X contains all the features, while the vector y contains the decisions. In matrix X, each column corresponds to a single feature and each row to one training example.

A mathematical approach for logistic regression starts with the following formula:

\[ \log \frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n \]

where x1, x2, . . . , xn are the elements of the feature vector. The bias term β0 is used to translate the decision boundary vertically, so that it does not have to pass through the origin. Since there is no feature x0, its value is taken to be 1 (1 · β0 = β0); therefore every feature vector implicitly starts with the value 1, the bias term. The remaining β1, β2, . . . , βn are the weights that correspond to the elements of the feature vector.

The probability p denotes the chance that something happens, which is always between 0 and 1. The log-odds of the dependent variable are log(p / (1 − p)). So the odds for Y are:

\[ \frac{P(Y=1)}{P(Y=0)} = \frac{P(Y=1)}{1 - P(Y=1)} \]
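For completeness, solving the log-odds equation above for p gives the familiar sigmoid form (a standard rearrangement, not stated explicitly in the thesis):

\[ p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}} \]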

In the case of this experiment, the data that is used is based on features which, taken together, determine whether two sentences entail each other. Stanford University tried a similar experiment; it uses 28 features to recognize entailment using specific patterns [6]. The features that are used in this thesis mainly focus on antonyms, negations, switching of subjects and objects, alignment, and possibly more in the future. If a pair of sentences contains antonyms or many negations, the probability of a contradiction is high and there is no entailment.

This experiment uses a three-class classifier to determine entailment. The three classes are: ‘yes’, ‘no’, and ‘unknown’.
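A minimal sketch of training such a classifier with scikit-learn; the feature values, labels and class encoding below are invented for illustration and are not the thesis's actual data or code:

import numpy as np
from sklearn.linear_model import LogisticRegression

# each row is one sentence pair: [alignment, antonym, subject/object switch, negation, extra]
# (values invented for illustration; not taken from the thesis)
X = np.array([
    [1, 0, 0, 0, 0],   # high alignment, no contradiction cues
    [1, 1, 0, 0, 0],   # antonym detected
    [1, 0, 0, 1, 0],   # negation detected
    [0, 0, 0, 0, 0],   # low alignment, nothing detected
])
y = np.array([1, 0, 0, 2])  # assumed encoding: 1 = "yes", 0 = "no", 2 = "unknown"

classifier = LogisticRegression(max_iter=1000)
classifier.fit(X, y)
print(classifier.predict([[1, 1, 0, 0, 0]]))  # an antonym pair should come out as class 0 ("no")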

A famous three-class classification problem is the iris dataset. The iris dataset contains the measurements of four attributes of 150 iris flowers from three different types of irises: setosa, virginica and versicolor [4]. The four attributes are sepal length, sepal width, petal length and petal width, all in cm.

Some data points in figure 3.5 are incorrectly classified because the decision boundary is in the center of a cluster. The two features are not distinctive enough to separate the two classes. The conclusion is that it is very important to have distinctive features in order to predict the right class.

3.5 Features


A line will separate the non-entailment from the entailment. All sentences above the line will have no entailment, and thus have a high chance of being a contradiction. Everything below the line will have high entailment and therefore a low chance of being a contradiction. This line, also called the decision boundary, is obtained after fitting the model using training data.

3.5.1 Antonyms feature

‘Fell’ and ‘rose’ are not direct antonyms like ‘good’ and ‘bad’. However, the synonyms of ‘fell’ and ‘rose’ are antonyms. The program first checks whether words A and B are antonyms; if not, it checks whether word A is an antonym of a synonym of B, or vice versa. To improve the detection of antonyms, the lemma of a word is used. A lemma is the result of canonicalization of a word; in this case the lemma of ‘fell’ is ‘fall’ and the lemma of ‘rose’ is ‘rise’.

The word ‘fall’ can have different meanings: it could be a synonym for ‘autumn’ or a synonym for ‘descend’. In this example the program will compare ‘rise’ (meaning: to go up) with a synonym of ‘fall’: ‘descend’ (meaning: to go down). The output of the program shows which synonyms are used:

### are_antonyms(fell, rose):
lemma1: Lemma('fall.v.01.fall')
lemma2: Lemma('rise.v.01.rise')
===========
synonyms: {Lemma('decrease.v.01.decrease'), Lemma('descend.v.01.come_down'),
Lemma('precipitate.v.03.precipitate'), Lemma('fall.v.32.settle'),
Lemma('decrease.v.01.lessen'), Lemma('hang.v.05.flow'), Lemma('fall.v.21.return'),
Lemma('fall.v.20.light'), Lemma('fall.v.04.come'), Lemma('decrease.v.01.diminish'),
Lemma('fall.v.23.fall_down'), Lemma('fall.v.01.fall'), Lemma('hang.v.05.hang'),
Lemma('fall.v.08.shine'), Lemma('fall.v.21.pass'), Lemma('descend.v.01.go_down'),
Lemma('fall.v.21.devolve'), Lemma('accrue.v.02.accrue'), Lemma('fall.v.08.strike'),
Lemma('descend.v.01.descend')}
antonyms: [Lemma('descend.v.01.fall')]
fell rose => True

Figure 3.5: Two features of the iris dataset. This dataset is used to classify each type of iris based on the given measurements. There is a cluster in the blue area and a second cluster in the brown/red area. The blue cluster contains Iris setosa, while the other two types of flowers are grouped together in the second cluster.

The string of each lemma consists of four parts, separated by dots. The first part is the head word of the synset, followed by a letter which indicates whether it is a verb (v), noun (n), adjective (a), adjective satellite (s) or adverb (r). Next is a two-digit number identifying the sense of that word, and the last part is the lemma itself. The function are_antonyms returns either ‘true’ or ‘false’ to indicate whether the two words are antonyms of each other.
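A sketch of this antonym check using WordNet through NLTK; the NLTK interface, the helper name and the use of wn.morphy for lemmatisation are assumptions, as the thesis only states that WordNet is used:

from nltk.corpus import wordnet as wn

def are_antonyms(word_a, word_b, pos=wn.VERB):
    # canonicalise first: "fell" -> "fall", "rose" -> "rise"
    lemma_a = wn.morphy(word_a, pos) or word_a
    lemma_b = wn.morphy(word_b, pos) or word_b
    # all synonym lemmas of each word, taken over all of its synsets
    lemmas_a = [lemma for synset in wn.synsets(lemma_a, pos) for lemma in synset.lemmas()]
    names_b = {lemma.name() for synset in wn.synsets(lemma_b, pos) for lemma in synset.lemmas()}
    # is any antonym of word A (or of one of its synonyms) also a synonym of word B?
    for lemma in lemmas_a:
        for antonym in lemma.antonyms():
            if antonym.name() in names_b:
                return True
    return False

print(are_antonyms("fell", "rose"))  # expected: True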

3.5.2 Switching of object and subject feature

In some cases the subject and object are switched between the sentences. This feature detects when an object becomes a subject and vice versa. If it detects a switch, there is no entailment and thus a higher chance of a contradiction. Since the filtering of non co-referent sentences is not implemented, it cannot be said with certainty that the sentences contradict each other, because they might be about different events.

‘CD Technologies announced that it has closed the acquisition of Datel, Inc.’ vs ‘Datel acquired CD Technologies’

In this example, ‘CD Technologies’ is the subject in the first sentence, but the object in the second sentence. So in this case the sentences contradict each other.
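A hedged sketch of how such a switch could be detected with spaCy's dependency labels nsubj and dobj; the helper names, the model and the simplified example pair are assumptions for illustration, not the thesis implementation:

import spacy

nlp = spacy.load("en_core_web_sm")  # assumed model

def subjects_and_objects(sentence):
    doc = nlp(sentence)
    subjects = {token.lemma_.lower() for token in doc if token.dep_ in ("nsubj", "nsubjpass")}
    objects = {token.lemma_.lower() for token in doc if token.dep_ in ("dobj", "pobj")}
    return subjects, objects

def subject_object_switch(sentence_a, sentence_b):
    # a word that is a subject in one sentence but an object in the other (or vice versa)
    subjects_a, objects_a = subjects_and_objects(sentence_a)
    subjects_b, objects_b = subjects_and_objects(sentence_b)
    return bool((subjects_a & objects_b) or (objects_a & subjects_b))

print(subject_object_switch("CD Technologies acquired Datel.",
                            "Datel acquired CD Technologies."))  # expected: True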


4 Experiments

In this experiment, various RTE datasets are used to detect entailment. They contain the following numbers of pairs [3]:

Dataset      # of contradictions   # of total pairs
RTE1 dev1    48                    287
RTE1 dev2    55                    280
RTE1 test    149                   800
RTE2 dev     111                   800
RTE3 dev     80                    800
RTE3 test    72                    800

These datasets contain many sentence pairs, divided into three classes based on entailment: ‘yes’, ‘no’ and ‘unknown’. Unknown entailment is often the result of non co-referent sentences. An example of a non co-referent sentence pair found in the RTE1 dev2 3ways dataset is:

‘The Irish Sales, Services and Marketing Operation of Microsoft was established in 1991’ vs ‘Microsoft was established in 1991’

The first sentence is about a specific department of Microsoft, the second sentence is about Microsoft itself. Although they are similar, they are not about the same event and therefore the entailment is unknown. To detect entailment, 10 pairs containing antonyms and negations were chosen from the RTE1 dev1 3ways and RTE1 dev2 3ways datasets and 16 more were chosen from the RTE1 test 3ways dataset.

Dataset      # of pairs   # cont. antonyms   # cont. negations   % accurate
RTE1 dev     10           5                  2                   90%
RTE1 test    16           11                 5                   62.5%

The table above shows the total results. For the RTE1 dev datasets, 10 pairs were chosen, of which 5 contained antonyms, 2 contained negations and 2 contained none of them. With these features, the program achieved an accuracy of 90%. For the RTE1 test dataset, 16 pairs were chosen, of which 11 contained antonyms and 5 contained negations. Here we achieved an accuracy of 62.5%. Some results showing the output of individual sentence pairs are shown below:

=== id=13 entails=1 length=None task=IR ===
T: iTunes software has seen strong sales in Europe.
H: Strong sales for iTunes in Europe.
alignment: 3.0
features: [ 0. 0. 0. 0. 0.]

This output shows that the alignment score is 3.0. There are three words (‘sales’, ‘iTunes’ and ‘Europe’) that are exactly the same. Exact matches are 100% identical and therefore get a score of 1.0000 each; the sum of these values gives the final alignment score. The word ‘strong’ is not a PROPN, NOUN or VERB and is therefore not considered for matching. The next number is the antonym feature; since there are no antonyms in the two sentences, its value is 0. The last number is the feature for switching of object and subject.


Since there is no verb in the second sentence, there are no objects and subjects to switch. Therefore the object and subject switch feature has a value of 0.

=== id=148 entails=0 length=None task=RC ===
T: The Philippine Stock Exchange Composite Index rose 0.1 percent to 1573.65.
H: The Philippine Stock Exchange Composite Index dropped.
alignment: 5.50424838962
features: [ 1. 1. 0. 0. 0.]

In this example the alignment score is high, although it should be lower: ‘The Philippine Stock Exchange Composite Index’ should be seen as one organization and therefore only get an alignment score of 1.0000. The words ‘rose’ and ‘dropped’ are antonyms and therefore also get a high alignment score. Since there is an antonym, the second value is now 1. This means that there is no entailment and thus a high probability of a contradiction.

=== id=177 entails=0 length=None task=PP ===
T: Increased storage isn't all Microsoft will be offering its Hotmail users --they can also look forward to free anti-virus protection.
H: Microsoft won't offer increased storage to its users.
alignment: 4.99999963253
n't n't
features: [ 1. 0. 1. 1. 1.]

This case shows a high alignment score and a change of subject and object. In the first sentence ‘increased storage’ is the object and ‘Microsoft’ is the subject. This is a false positive, because ‘Microsoft’ is not switched to an object; only its position in the sentence changes. In both sentences negations are detected. Although there is no antonym, there is no entailment, so the probability of a contradiction is high.

=== id=969 entails=0 length=None task=PP ===
T: Doug Lawrence bought the impressionist oil landscape by J. Ottis Adams in the mid-1970s at a Fort Wayne antiques dealer.
H: Doug Lawrence sold the impressionist oil landscape by J. Ottis Adams
alignment: 4.78047287886
features: [ 1. 1. 0. 0. 0.]

This is a clear example of an antonym.

=== id=971 entails=0 length=None task=PP ===
T: Mitsubishi Motors Corp.'s new vehicle sales in the US fell 46 percent in June
H: Mitsubishi sales rose 46 percent
alignment: 4.48539948256
features: [ 1. 1. 0. 0. 0.]

This sentence pair, used earlier in this thesis, is also a clear example of an antonym.

=== DEV DATASET =============================


=== id=1981 entails=0 length=None task=PP ===
T: The bombers had not managed to enter the embassy compounds.
H: The bombers entered the embassy compounds.
alignment: 4.00000032203
not
features: [ 1. 0. 0. 1. 0.]

These two sentences both contain negations. This sentence pair does not get a value of 1 for the alignment feature because the score is not larger than 4.

=== TEST DATASET ============================
number of pairs in test: 16
logreg score on test: 0.625

pair ids: [1370, 2167, 2019, 934, 1847, 1990, 1984, 1421, 1445, 1981, 1960, 2088, 1044, 986, 1078, 1077]

answers: [0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0]

predicted: [ 0. 1. 1. 0. 1. 0. 0. 1. 0. 0. 1. 1. 1. 0. 0. 0.]

It can be seen that not all pairs are predicted correctly; we get an accuracy of 62.5%. All of the incorrectly predicted pairs have the same feature vector [0, 0, 0, 0, 0] and thus automatically get entailment 1. More features need to be implemented to achieve a higher accuracy.

When running the program on the sentence pair found in the BBC and Pravda articles, we get:

Mr Joustra said pro-Russian rebels were in charge of the area from where the missile that hit MH17 had been fired

Group representatives confirmed that the plane was shot down from the territory controlled by the official Kiev

alignment: 2.31372259356

features: [ 0. 0. 0. 0. 0.]

predicted: [ 1.]

Unfortunately the program detects entailment for this sentence pair, while there should be no entailment. The cause of the low alignment score is that ‘MH17’ and ‘plane’ are not seen as synonyms. This is considered world knowledge and should therefore be manually put into a database. The synonyms that were found are ‘fired’ - ‘shot down’ and ‘area’ - ‘territory’.


5 Conclusion and discussion

Contradiction detection is hard due to all the different kinds of contradictions and the difficulty of language in general. This thesis focused on recognizing antonyms and negations using four phases. The first phase uses spaCy as a dependency parser. The second phase aligns the two dependency graphs obtained from spaCy. The third phase, filtering non co-referent sentences, was not implemented because this experiment does not focus on contradictions requiring world knowledge. In the last phase, logistic regression is applied using five features. These features are based on antonyms, negations, the alignment score and the switching of objects and subjects.

In the experiments, we achieved an accuracy of 90% on the dev set and 62.5% on the test set. Antonyms and negations are detected, but when all of the features have value 0, the predictions are wrong. When using the program to detect entailment in the sentence pair found regarding MH17, the result is unfortunately not accurate. A database containing world knowledge should be used; for example, ‘MH17’ and ‘plane’ should be seen as synonyms.

In the future, this program could be extended with more features to detect a wider range of contradictions. Detecting numerical differences could be achieved by adding words like ‘no’, ‘some’, ‘many’, ‘most’, and ‘all’ to a database containing numbers. If implemented correctly, it should identify ‘more than 500’ as technically consistent with ‘700’.

To detect contradictions containing world knowledge, another database should be incorporated. This database should answer queries related to specific events. For example, ‘born in’ has to be related to one specific place, since a person can only be born in one place.

Sentence pairs containing geographical places should make use of holonyms. These are words that have a semantic relationship with other words; for example, ‘house’ is a holonym of ‘door’ and ‘window’, and ‘capital of the Netherlands’ is a holonym for ‘Amsterdam’. This database should therefore be expanded with places and their relationships to other places.

With these kinds of databases, containing information about numbers, specific functions and holonyms, one should be able to detect a larger number of contradictions.


References

[1] Daniel Cer. Aligning semantic graphs for textual inference and machine reading.

[2] Ido Dagan, Bill Dolan, Bernardo Magnini, and Dan Roth. Recognizing textual entailment: Rational, evaluation and approaches – erratum. Natural Language Engineering, 16(01):105–105, 2010.

[3] Marie-Catherine De Marneffe, Anna N Rafferty, and Christopher D Manning. Finding contradictions in text. In ACL, volume 8, pages 1039–1047, 2008.

[4] Ravindra Koggalage and Saman Halgamuge. Reducing the number of training samples for fast support vector machine classification. Neural Information Processing-Letters and Reviews, 2(3):57–65, 2004.

[5] Omer Levy and Yoav Goldberg. Dependency-based word embeddings. In ACL (2), pages 302–308, 2014.

[6] Bill MacCartney, Trond Grenager, Marie-Catherine de Marneffe, Daniel Cer, and Christopher D Manning. Learning to recognize features of valid textual entailments. In Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, pages 41–48. Association for Computational Linguistics, 2006.

[7] Alan Ritter, Doug Downey, Stephen Soderland, and Oren Etzioni. It’s a contradiction— no, it’s not: a case study using functional relations. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 11–20. Association for Computational Linguistics, 2008.

[8] spaCy. spaCy CSS demo, 2015. [Online; accessed 15-May-2016].

[9] spaCy. spaCy documentation, 2015. [Online; accessed 15-May-2016].
