Master thesis:
Automatic acquisition of qualia roles
using an Italian semantically annotated corpus
by
Marco Trovato Mascali
Supervisor: Gosse Bouma
European Master of Language and Communication Technologies
University of Groningen
University of Nancy 2
August 2011
Declaration
I hereby confirm that the thesis presented here is my own work, with all assistance
acknowledged.
Abstract
This thesis investigates and improves a machine learning approach that automatically recognizes agentive and telic roles for nouns in a large CoNLL-parsed Italian corpus.
Following the work of Yamada and Baldwin [25], I use a token-level supervised classifier to dynamically discover, for each noun in a set, a number of verbs considered to be the agentive or telic role of that noun. Furthermore, after creating three semantically different training sets, corresponding to three subclasses of the SimpleOWL ontology (Location, Artifact and LivingEntity), I run two different experiments to evaluate whether selecting positive and negative instances from the same semantic group can improve the results, and to understand whether semantic features are useful for this goal. The research rests on the assumption of the compositional nature of natural language, as described in the Generative Lexicon (GL) theory by Pustejovsky, and in particular on the importance of the so-called qualia roles. Lexica based on GL can be used in many computational linguistic tasks such as question answering and textual entailment, but they are extremely time-consuming and expensive to develop manually.
Contents
Abstract
1 Introduction
  1.1 Motivation
  1.2 Goals
  1.3 Outline
2 Related works
  2.1 Existing lexical databases
  2.2 Qualia acquisition techniques
3 Resources
  3.1 The Simple OWL ontology
  3.2 ItalWordnet
  3.3 A semantically annotated Italian corpus
4 Mapping Simple OWL and ItalWordnet
  4.1 Motivation
  4.2 Mapping
  4.3 Remarks and results
5 Creating the Gold Standard
  5.1 Preparing the data
  5.2 Instructions to the annotators
  5.3 Profile of the annotators
  5.4 Analysis of the Gold Standard
6 The machine learning method
  6.1 Features
  6.2 Creating the training data
  6.3 Experiments
7 Evaluation and results
  7.1 Training on only one semantic class
  7.2 Training on different semantic classes
  7.3 Summing up and comparing the results
8 Conclusions and future work
Bibliography
Chapter 1
Introduction
One of the advantages of the Generative Lexicon theory is that it addresses the main issue of semantic knowledge: while lexical semantic knowledge appears to be extremely varied and irregular, computable and unambiguous knowledge is necessary for Natural Language Processing. Pustejovsky's main idea was that lexical semantic knowledge can be formalized through a decompositional process, which led him to identify four basic “oppositions” [16], called qualia roles. Each of these roles is a feature consisting of a logical predicate. Any given lexical unit can have at most four roles (often some roles are not typical for a semantic class):
Formal : the class the entity belongs to.
Constitutive : relations about the internal constitution of an entity.
Agentive : the origin of the entity or the way this was created.
Telic : the typical purpose or function of the entity.
Generalizing a little, we can say that through the qualia roles a noun can be related to other nouns by means of classical relations such as hyperonymy, and to verbs specifying its origin and its typical functions. For example, the noun “beer” would have the following qualia roles:
Formal : beverage
Constitutive : barley
Agentive : to produce
Telic : to drink
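A qualia structure like the one above can be represented directly as a small data structure. The sketch below is only an illustration of the idea (the names `qualia` and `role` are mine; the values simply transcribe the “beer” example):

```python
# Minimal sketch of a qualia structure as a Python dict.
# The four roles follow the Generative Lexicon theory; the values
# for "beer" are the ones given in the example above.
qualia = {
    "beer": {
        "formal": "beverage",      # the class the entity belongs to
        "constitutive": "barley",  # internal constitution
        "agentive": "produce",     # how the entity comes into being
        "telic": "drink",          # typical purpose or function
    }
}

def role(noun, role_name):
    """Return the filler of a qualia role, or None if missing."""
    return qualia.get(noun, {}).get(role_name)

print(role("beer", "telic"))  # -> drink
```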
In this thesis I describe an approach to improve a method for automatically acquiring Agentive and Telic roles for nouns from an Italian CoNLL-parsed and semantically annotated corpus. Similarly to Yamada and Baldwin [25], I use a token-level supervised classifier to achieve my goal. Compared to their work, I also use more complex features for the vector and I split the training data into three subsets according to the semantic classification of each noun in the Simple OWL ontology. The main hypothesis behind my research is that some semantic background knowledge and a semantically annotated corpus can improve the overall system. To evaluate the results I use a manually annotated Gold Standard consisting of 1200 noun-verb pairs indicating agentive and telic roles and a grade describing their level of “relation”.
1.1 Motivation
Despite the influence that the Generative Lexicon (GL) theory by Pustejovsky has had on the field, lexical databases based on it are not yet very developed and common. This is mainly due to the fact that creating this kind of database is extremely expensive and time-consuming for researchers.
Furthermore, the majority of studies have focused especially on the acquisition of other kinds of relations, such as the Constitutive (the internal constitution of an entity) and Formal (the so-called “is-a” relation) roles, neglecting the importance of the Telic and Agentive roles, which can be used, for example, for the interpretation of logical metonymy [12]. Such information can be useful for the interpretation of the following sentences:
(1) Beatrix enjoyed the book.
(2) Beatrix enjoyed the cake.
(1) and (2) are almost identical and syntactically equal, but the (possible) implicit verbs are clearly different:
(1a) Beatrix enjoyed reading the book.
(2a) Beatrix enjoyed eating the cake.
The interpretation of these sentences presupposes a deep knowledge of the semantics of the two entities “book” and “cake”. Going a bit deeper into the analysis of this example, it is possible to state that once we know the characteristics of an entity, we can probably use them for other entities belonging to the same semantic class. So, for example, the interpretation of (2) would be similar if we substituted “cake” with another noun classified as “Food”. Having a large lexicon describing Telic and Agentive roles can help the automatic interpretation of such cases. More generally, GL lexicon databases can be extremely useful for a variety of different tasks, from anaphora resolution to machine translation [14]. Research dedicated to the automatic acquisition of qualia roles has usually led to modest results, so much still has to be done.
1.2 Goals
The goal of my thesis is to improve the results obtained by other works on the automatic discovery of qualia roles. In particular, I want to focus on the importance of filtering the training data and splitting it into different semantic classes. Furthermore, I expect that using features of a semantically annotated corpus should help achieve this goal.
1.3 Outline
In chapter 2 I will discuss some related works; in particular I will introduce two already existing lexical semantic databases annotated following the GL theory, and then I will compare other studies on the automatic acquisition of qualia roles. Chapter 3 will be dedicated to the description of the resources I need for my research: the Simple OWL ontology, an OWL resource based on the Simple model [13]; ItalWordnet, the Italian version of the well-known Wordnet developed at Princeton University; and finally wikiCoNLL, an Italian CoNLL-parsed and semantically annotated corpus extracted from Wikipedia (http://wacky.sslmit.unibo.it/doku.php?id=download). As I need a group of semantically annotated nouns for collecting different training data, in chapter 4 I will explain how I collected them by mapping the Top Ontology of ItalWordnet to the Simple OWL ontology. The topic of chapter 5 will be a description and an analysis of the manually built Gold Standards. Chapter 6 is the main part of the thesis: it contains a description of the features used and of the training data, and it continues by describing the classification algorithms used. In chapter 7 I will show the results of the classification methods, and finally I will present the conclusions in chapter 8.
Chapter 2
Related works
2.1 Existing lexical databases
As already mentioned in the introductory chapter, there are few available lexical databases based on GL. All of them have been manually built through introspection, thanks to the effort and time that some researchers have dedicated to the goal. One of the first projects in this field was the Parole-Simple lexicon [13]. Sponsored by the European Union, Simple's main purpose was to develop “wide-coverage harmonized computational semantic lexica for 12 European languages” [22]. Parole was instead dedicated to the annotation of the morpho-syntactic layer. The combination of these two projects led to the Parole-Simple lexicon, composed of 10,000 word meanings per language, annotated according to a language-independent ontology of semantic types and to morphological and syntactic features specific to each language.
A sample of the final lexicon is publicly available at http://www.ub.edu/gilcub/SIMPLE/simple.html.
Unfortunately the entire Italian lexicon is not free.
Following the Simple specifications, Pustejovsky and other researchers [17] developed a large generative lexicon ontology and dictionary in English to be used for computational tasks: the Brandeis Semantic Ontology (BSO). It is an ongoing project [10] containing 40,000 lexical entries and 3,700 ontological nodes. As in Simple, the ontology is divided into 3 major types: events, entities and properties.
Although the qualia contained in the BSO are always correct (remember that they are the result of introspection), an evaluation performed by Havasi [10], comparing it with the British National Corpus (BNC) and ConceptNet [15], highlighted a lack of the qualia structures used in everyday language. The main differences between Simple and the BSO are the size, the BSO being much larger than any lexicon of any language of the Simple project, and the fact that the BSO lacks parallel lexicons in other languages.
2.2 Qualia acquisition techniques
The problem of the automatic acquisition of qualia is not new: several researchers have tried over the years to find a good method to extract this information.
Cimiano and Wenderoth [8] have suggested a method based on Hearst patterns [11].
Following the assumption that some semantic relations (qualia roles in this case) are related to the syntactic structure of the sentence, Cimiano and Wenderoth manually created for each role a set of queries (called clues) indicating the relation of interest, each connected to a regular expression. For example, “π(t) are used to” is one of the four clues built for the telic role, where π(t) is a variable standing for the term; it is connected to the regular expression (A|a|An|an)? NP0 is{VBZ} used{VBN} to{TO} PURP. Afterwards they sent queries like these to the Google API, then downloaded and part-of-speech tagged the snippets of the first 10 Google hits matching the clue. Finally they matched the regular expressions related to the clue and weighted the qualia candidates according to the Jaccard coefficient.
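As an illustration, the telic clue above can be approximated over plain text with an ordinary regular expression, and the candidate verbs weighted with the Jaccard coefficient. This is only a simplified sketch of the idea (the original method matches POS-tagged snippets, not raw text, and the helper names are mine):

```python
import re

# Simplified, plain-text approximation of the telic clue
# "pi(t) are used to": matches "a/an <noun> is used to <verb>".
TELIC_CLUE = re.compile(r"\b[Aa]n? (\w+) is used to (\w+)")

def telic_candidates(snippets):
    """Collect candidate telic verbs for each noun found in the snippets."""
    candidates = {}
    for snippet in snippets:
        for noun, verb in TELIC_CLUE.findall(snippet):
            candidates.setdefault(noun, set()).add(verb)
    return candidates

def jaccard(a, b):
    """Jaccard coefficient, used to weight qualia candidates."""
    return len(a & b) / len(a | b)

snippets = ["A violin is used to play folk tunes.",
            "A violin is used to perform concertos."]
print(telic_candidates(snippets))  # {'violin': {'play', 'perform'}}
```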
Another interesting work was done by Claveau et al. [9]. In their paper they describe a method to automatically acquire qualia roles for noun-verb pairs from a semantically and POS-tagged French corpus using inductive logic programming, without any predefined patterns. This method is motivated by the linguistic assumption that knowing a priori all the possible patterns describing a certain qualia role is not an easy (or even possible) task. To achieve this goal they extracted noun-verb pairs from a French semantically and syntactically annotated corpus, included some background knowledge (position and distance of the terms), and manually classified the pairs as positive or negative examples of qualia relations. The inductive logic programming step did the rest. In this way they obtained not only the qualia for a given noun, but also rules explaining how noun-verb pairs having a qualia relation differ from those lacking it.
A completely different approach is proposed by Amaro et al. [7]. In their study they highlighted the similarities between the Wordnets and qualia information, noticing how, for example, the hyperonymy/hyponymy relation corresponds to the formal qualia role and the cause relations to the agentive qualia role. On this basis, they presented a table of links between the lexical relations of the Portuguese Wordnet and the qualia roles. Although this is not a method for acquiring qualia from a corpus, it is still a cheap way to obtain a GL-based lexicon by a simple conversion from the existing Wordnet relations.
Finally, Yamada and Baldwin [25] proposed two techniques for discovering a ranked list of agentive and telic roles for noun-verb pairs. The first, similarly to Cimiano and Wenderoth's, is based on fixed templates, while the second is a machine learning method using as feature vector the dependency relations of the noun-verb pairs and their contexts. In my thesis I will follow the idea of this second technique and try to improve it.
Chapter 3
Resources
3.1 The Simple OWL ontology
Simple-OWL [22] is a Generative Lexicon based ontology imported from a pre-existing computational linguistic lexicon called Parole-Simple-CLIPS (the Italian section derived from the European Simple project [13]) and converted into the W3C standard. Thanks to this tool, different word senses (encoded as semantic units in Simple-OWL) can be identified and discriminated from other word senses sharing the same lexical item. The Simple-OWL hierarchy, exactly as in Simple, consists of 153 language-independent types (OWL classes) built under the influence of Pustejovsky's GL model. In fact, the top classes of the ontology (Figure 3.1) are Entity, Agentive, Constitutive and Telic. The Formal qualia role is instead modelled through the owl:subClassOf relation.
OWL object properties and data type properties are used in Simple-OWL respec-
tively to link two semantic units and to link a semantic unit to a value within a closed
Figure 3.1: Simple OWL top hierarchy
range. Each class, delimited by the constraints and conditions created by object and datatype properties, works like a template guiding the encoding of the semantic units.
The OWL file containing the hierarchy and the attributes is freely available upon request. Unfortunately the lexicon is neither included nor free.
3.2 ItalWordnet
ItalWordnet [20] is a lexical-semantic database created in the wake of two other projects, EWN (EuroWordNet) and SI-TAL (Sistema Integrato per il Trattamento Automatico del Linguaggio). Similarly to the Princeton Wordnet, IWN is structured in synsets (synonym sets) containing word senses related through a partial synonymy relation (two lexemes are partial synonyms if they have the same connotation and can be interchanged in a context). It contains around 67,000 word senses, 50,000 lemmas and 130,000 semantic relations, such as metonymy, antonymy, hyponymy, etc. Each synset is assigned to one or more of the 65 classes of the Top Ontology, a language-independent hierarchy of concepts indicating fundamental semantic differences (Artifact, Natural, Dynamic, Static, etc.) which are considered common to all languages. So, for example, the word sense ‘tavolo’ (table) is an instance of a synset which is classified according to the Top Concepts (TCs) as [Artifact, Object, Furniture]. These TCs are distributed over three categories depending on their semantic aspects, without any kind of distinction in terms of part of speech. The first category (1stOrderEntity) (Figure 3.2) was conceived following Pustejovsky's qualia roles and contains only concrete nouns; the second (2ndOrderEntity) was partially influenced by the Aktionsart [23] and contains nouns, verbs, and adjectives referring to properties, events, states or processes; finally, in the third category (3rdOrderEntity) we find abstract nouns.
3.3 A semantically annotated Italian corpus
The semantically and syntactically annotated Italian Wikipedia, which can be downloaded from the WaCky project web page [1], is a corpus (at the moment of the download, in February 2011) of about 200 million tokens and more than 10 million sentences in CoNLL format, built by the University of Pisa. The extraction of the corpus and the annotation were performed automatically and include morpho-syntactic, syntactic and semantic information. Each sentence is separated from the others by a blank line. Each line of the corpus consists of 12 fields separated by a tab character, indicating the following information:
1. ID: a number starting from 1 at the beginning of each new sentence
2. Word form: a word form or a punctuation mark

Figure 3.2: First Order Entities of ItalWordnet

3. Lemma: a lemma or a punctuation mark
4. CPOSTAG: coarse-grained part-of-speech tag
5. POSTAG: fine-grained part-of-speech tag
6. Features: morpho-syntactic features such as gender, number and person
7. Head: 0 (if the token is the root) or the ID of the parent node
8. DEPREL: dependency relation
9. PHEAD: empty
10. PDEPREL: empty
11. empty
12. Semantic label
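A line in this format can be parsed by splitting on tabs and naming the 12 fields. The sketch below is a minimal reader under these assumptions; the example line is invented for illustration (its field values are not taken from the corpus):

```python
# Names for the 12 tab-separated CoNLL fields described above.
FIELDS = ["id", "form", "lemma", "cpostag", "postag", "feats",
          "head", "deprel", "phead", "pdeprel", "unused", "sem"]

def parse_token(line):
    """Turn one corpus line into a dict keyed by field name."""
    values = line.rstrip("\n").split("\t")
    return dict(zip(FIELDS, values))

def read_sentences(lines):
    """Group token lines into sentences, separated by blank lines."""
    sentence = []
    for line in lines:
        if line.strip():
            sentence.append(parse_token(line))
        elif sentence:
            yield sentence
            sentence = []
    if sentence:
        yield sentence

# Hypothetical line: a noun "violino" depending on token 2.
sample = "3\tviolino\tviolino\tS\tS\tnum=s|gen=m\t2\tobj\t_\t_\t_\tnoun.artifact"
token = parse_token(sample)
print(token["lemma"], token["deprel"], token["sem"])
```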
The tools used to obtain this format are described on the website of the University of Pisa [2]. In particular, the list of the semantic tags used to annotate adverbs, verbs, nouns and adjectives with their SuperSense tagger is available in [3].
Chapter 4
Mapping Simple OWL and ItalWordnet
4.1 Motivation
The goal of this chapter is to create a lexicon of word senses divided into classes corresponding to the concrete Simple-OWL classes (see Figure 4.1), to be used for the training corpus of my machine learning method. Lacking the Simple lexicon, I will show how I obtained a similar lexicon by using and modifying an already existing mapping between the two concept ontologies of Simple-OWL and ItalWordnet. Results and remarks will be discussed in the final section of this chapter.
4.2 Mapping
As I mentioned in the previous chapter, the Simple lexicon is unfortunately not freely available, while ItalWordnet is, for thesis purposes. Although these projects have
Figure 4.1: Concrete Simple OWL classes
different approaches and goals, they can still be related and gain knowledge from one another. In fact, a manual mapping from SimpleOWL to ItalWordnet (IWN) has already been performed by Nilda Ruimy [19] [21] [18]. Following the Simple-OWL structure, a semantic unit can be assigned to only one semantic type; on the other hand, IWN allows a cross-classification along multiple Top Concepts. In spite of this evident difference, and of the fact that we are comparing semantic units (in Simple) with word senses in synsets (in IWN), it is possible to consider their semantic classifications as equivalent, given that both are based on the GL theory. In her paper Ruimy describes the similarities of the two lexica and maps the top ontologies of Simple and IWN by analysing their shared words (semantic units in Simple, variants of a synset in IWN). Despite the unpredictable discrepancies she finds, usually caused by an incomplete encoding in IWN or by differences in the granularity of the senses of the two models, a substantial overlap is evidenced. The list of mappings she obtained indicates for each Simple type the possible corresponding IWN type (or intersection of types). A sample is shown in Table 4.1.
Ruimy’s work is extremely useful for my goal. Having both lexicons, she performed a linking method based on the shared words. As the mapping is already done, it’s now easier to transfer the IWN lexemes to the appropriate Simple categories. Nevertheless, a few problems arise here:
• Ruimy’s mapping [4] is directed from Simple to IWN, while I need the reverse mapping
• The mapping is not 1 to 1
• Reversing it is not straightforward; in fact a group of IWN types can correspond to more than one Simple type (e.g. in Table 4.1 “Place u Part” (IWN) is linked to both “Opening” and “GeopoliticalLocation”)

SimpleOWL types        | ItalWordnet Top Concepts (intersection)
Location               | Solid u Part u Place
Location               | Relation
3DLocation             | Place u Solid
3DLocation             | Place u Natural u Liquid u Part
GeopoliticalLocation   | Part u Place
Area                   | Part u Place u Solid
Opening                | Location u Relation
Opening                | Place u Part

Table 4.1: Mapping SimpleOWL concepts to IWN performed by Ruimy
A way to overcome these obstacles is to use the power of Formal Concept Analysis (FCA), treating Simple-OWL classes as objects and IWN classes as attributes. In all those cases in which a Simple-OWL type is linked to more than one group of IWN types, I split the original Simple class into as many classes as the number of mappings, to avoid any information loss. The example in Table 4.2 represents the last 3 mappings of Table 4.1. The classes on the rows belong to the Simple-OWL hierarchy, while those on the columns belong to IWN. The class “Opening” is thus split into two parts, corresponding to the 2 possible mappings Ruimy found.
Following these guidelines I built an FCA context with 202 objects and 56 attributes. Using Galicia, a software tool created by the University of Montreal [5], I automatically generated the corresponding concept lattice from this information. A section of the lattice is shown in Figure 4.2.

SimpleOWL \ IWN      | Place | Part | Solid | Location | Relation
GeopoliticalLocation |   X   |  X   |       |          |
Area                 |   X   |  X   |   X   |          |
Opening              |   X   |  X   |       |          |
Opening2             |       |      |       |    X     |    X

Table 4.2: Intent and Extent example
Figure 4.2: A section of the lattice created
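The concept derivation that Galicia performs can be sketched in a few lines: treating the rows of Table 4.2 as a formal context, every formal concept is a pair (extent, intent) closed in both directions. The brute-force enumeration below is only illustrative (real lattice software scales far better than this):

```python
from itertools import combinations

# Formal context from Table 4.2: Simple-OWL classes (objects)
# and the IWN Top Concepts (attributes) they map to.
context = {
    "GeopoliticalLocation": {"Place", "Part"},
    "Area": {"Place", "Part", "Solid"},
    "Opening": {"Place", "Part"},
    "Opening2": {"Location", "Relation"},
}
all_attrs = set().union(*context.values())

def intent(objs):
    """Attributes shared by all objects (all attributes for the empty set)."""
    sets = [context[o] for o in objs]
    return set.intersection(*sets) if sets else set(all_attrs)

def extent(attrs):
    """Objects possessing every attribute in attrs."""
    return {o for o in context if attrs <= context[o]}

# A formal concept is a pair (E, I) with I = intent(E) and E = extent(I).
concepts = set()
for r in range(len(context) + 1):
    for subset in combinations(context, r):
        i = intent(subset)
        concepts.add((frozenset(extent(i)), frozenset(i)))

for e, i in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(e), "<->", sorted(i))
```

On this toy context the pair ({GeopoliticalLocation, Area, Opening}, {Place, Part}) comes out as one concept, mirroring the observation that “Place u Part” corresponds to more than one Simple type.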
The concepts of the lattice obtained can be divided into three main groups of analysis:
1. Concepts having as intent a certain number of attributes and only one object as extent. For example, concept 81 (Figure 4.2) has as intent “Container u Object” and as extent “Container”. This indicates that the words classified according to the IWN ontology as “Container u Object” can be mapped to the Simple-OWL “Container” class. There are 58 such concepts in the lattice; among these we also find concepts sharing an intent which I had artificially split (Table 4.3).
Intent (Simple-OWL types) | Extent (intersection of IWN types)
NaturalSubstance          | Solid u Liquid
NaturalSubstance2         | Liquid u Substance
NaturalSubstance3         | Gas u Substance

Table 4.3: NaturalSubstance split in 3 classes
This is due to a finer-grained ontological classification in IWN. (see annexes 1:
mapping 1 to 1).
2. Concepts having as intent a certain number of attributes and, as extent, more than one object belonging to a common Simple-OWL superclass. Often the IWN top ontology is not specific enough to be linked to a Simple-OWL leaf class. However, a mapping is still possible if, climbing up the Simple-OWL ontological hierarchy, a shared parent node is found (see Table 4.4).
Intent (Simple-OWL types)           | Extent (inters. of IWN types) | Common Simple-OWL parent node
EarthAnimal, AirAnimal, WaterAnimal | Animal u Object               | Animal

Table 4.4: Example of a common Simple-OWL parent node
In Table 4.4 we see how the three Simple-OWL types EarthAnimal, AirAnimal and WaterAnimal have the same IWN extent and at the same time share a common parent node, Animal. Certainly this kind of mapping makes us lose some information, but it can still be considered a good approximation. Reversing the situation discussed in the previous paragraph, in these cases the Simple-OWL ontology has the finer classification. This is the case for 21 concepts (see annex 2: mapping with shared parent node).
3. All the other concepts, not belonging to the two previous groups. These concepts are not useful for my purpose: apart from the top and bottom concepts, which are characterized respectively by the lack of intension and of extension, they do not specify any extent referring either to a single Simple-OWL class or to a parent node of the hierarchy of the ontology.
4.3 Remarks and results
As already noticed by Ruimy, the mapping from one resource to the other appears much more challenging when dealing with IWN 2nd Order Entities. This is probably due to an extremely fine-grained ontology under the classes Events and Properties in Simple-OWL, and to an intrinsic complexity in the semantic encoding of the word senses falling under these types.
In this study I will focus my analysis on the Simple-OWL hierarchy types belonging to the supertype ConcreteEntity (Figure 4.1), for at least two reasons: first, because there is a more detailed mapping from IWN to Simple-OWL under this class; second, and more importantly, because Agentive and Telic qualia roles are less problematic to annotate for concrete entities than for events.
Given these considerations, I leave out all the mappings other than those under the ConcreteEntity superclass. With the help of a Python program I selected all the synsets marked as ‘N’ (nouns) and encoded in IWN with the sets of types I needed to map, then I extracted all the variants (word senses) contained in those synsets and automatically assigned them the proper Simple-OWL class. First I ran the program for the first group of concepts (those with 1-to-1 mappings), obtaining 5635 classified word senses. I then applied the same method to the second group of concepts (those sharing a common parent node), obtaining 14501 classified word senses. As I expected, a small subset of the word senses obtained from the first group was also present in the second group. The reason is that in a few cases one of the concepts belonging to the first group was subsumed by a concept of the second group, as shown in Figure 4.3.
Figure 4.3: Example of a subsumed concept
The overlap was solved simply by eliminating from the second group the word senses that appeared in both groups. This reduced the total amount of word senses mapped from 20136 to 20002.
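The transfer step described above can be sketched as follows. The mapping tables and synset records here are hypothetical stand-ins (the real ones come from the FCA lattice and the IWN database), so this only illustrates the shape of the procedure:

```python
# Hypothetical fragments of the two mapping groups; the real tables
# are derived from the lattice (1-to-1 mappings and shared parent nodes).
ONE_TO_ONE = {frozenset({"Container", "Object"}): "Container"}
PARENT_NODE = {frozenset({"Animal", "Object"}): "Animal"}

def classify(synsets):
    """Assign a Simple-OWL class to every variant of each noun synset,
    preferring 1-to-1 mappings and removing duplicate word senses."""
    seen, result = set(), []
    for pos, top_concepts, variants in synsets:
        if pos != "N":                      # keep noun synsets only
            continue
        key = frozenset(top_concepts)
        simple_class = ONE_TO_ONE.get(key) or PARENT_NODE.get(key)
        if simple_class is None:            # no mapping for this type set
            continue
        for word in variants:
            if (word, simple_class) not in seen:
                seen.add((word, simple_class))
                result.append((word, simple_class))
    return result

synsets = [
    ("N", {"Container", "Object"}, ["bottiglia", "fiasco"]),
    ("V", {"Dynamic"}, ["aprire"]),
    ("N", {"Animal", "Object"}, ["cane"]),
    ("N", {"Container", "Object"}, ["bottiglia"]),  # duplicate sense
]
print(classify(synsets))
```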
Chapter 5
Creating the Gold Standard
In this chapter I will introduce the procedure I followed to create three semantically different Gold Standards, randomly selecting a total of 60 nouns from the 3 main SimpleOWL classes belonging to the concrete SimpleOWL superclass.
5.1 Preparing the data
The data was divided into three groups, corresponding to the main Simple-OWL classes under the superclass ConcreteEntity: Location, LivingEntity and Artifacts.
I collected the semantic units obtained after the mapping (see chapter 4) separately for each group. The result of this process was 7257 living entities, 2004 locations and 3740 artifacts. Notice that even though I could sometimes have chosen more specific classes (not always possible, because of the different granularity of IWN and SimpleOWL; see chapter 4), I preferred not to, mainly because I wanted to obtain 3 semantically different groups of nouns not too difficult to annotate manually and general enough to be found in a non-domain-specific corpus. After this choice I randomly took from each group 20 nouns which were linked to at least 20 different verbs (deliberately skipping copulas and auxiliaries) by a dependency relation in the wikiCoNLL corpus. This choice was made to avoid the inclusion of nouns which are extremely rare in the corpus. For each noun, 20 verbs were extracted from the wikiCoNLL corpus. Among these verbs, at least 2 were chosen as representative of the Agentive and Telic roles, taken from the literature (if available) or chosen directly by me. All the other verbs were selected randomly. For example, for the noun violino (violin), belonging to the semantic class “Artifact”, the following verbs were selected:
Suonare (play)
Accordare (tune)
Costruire (build)
Divenire (become)
Impugnare (hold, draw)
Includere (include)
Abbandonare (give up)
Sfoderare (unsheathe)
Introdurre (insert, put)
Eseguire (execute, play)
Entrare (enter)
Produrre (produce)
Comprendere (understand)
Insegnare (teach)
Aggiungere (add)
Strimpellare (strum)
Scomparire (disappear)
Mantenere (maintain)
Ispirare (inspire)
Dedicare (dedicate)
The final data was thus formed by 20 nouns × 20 verbs × 3 semantic groups (Artifact, LivingEntity, Location), for a total of 1200 noun-verb pairs. Note that the same data is used to annotate both Telic and Agentive roles.
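The selection procedure can be sketched as below, where `noun_verbs` stands for the dependency-linked verb sets extracted from the corpus. The data here is synthetic, and the hand-picking of at least two Agentive/Telic representatives per noun, as done for the real Gold Standard, is omitted:

```python
import random

def select_pairs(noun_verbs, n_nouns=20, n_verbs=20):
    """Pick n_nouns nouns linked to at least n_verbs distinct verbs,
    then sample n_verbs verbs for each of them."""
    eligible = [n for n, vs in sorted(noun_verbs.items())
                if len(vs) >= n_verbs]
    chosen = random.sample(eligible, n_nouns)
    return {n: random.sample(sorted(noun_verbs[n]), n_verbs)
            for n in chosen}

# Synthetic stand-in for the noun -> verb sets of one semantic group.
random.seed(0)
noun_verbs = {f"noun{i}": {f"verb{j}" for j in range(30)}
              for i in range(25)}
selection = select_pairs(noun_verbs)
print(len(selection), {len(v) for v in selection.values()})  # -> 20 {20}
```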
5.2 Instructions to the annotators
After having selected the noun-verb pairs to annotate, two volunteers were asked to perform the annotation separately for each semantic group. In other words, they were asked to annotate 3 data sets composed of 400 noun-verb pairs each. Before starting, I gave them a short review of the GL theory. There were two tasks:
1. Judging with a mark the degree of correlation for each noun-verb couple.
2. Deciding for each noun-verb pair if the verb should be classified as an Agentive or a Telic role for that noun.
The marks were based on a numeric scale from 0 to 6. Each degree of relation was described as follows:
0 No relation at all. Not even in any imaginable metaphorical sense.
1 Extremely poor relation.
2 Poor relation.
3 Not clear.
4 Possible case.
5 Very common case of qualia role (either Agentive or Telic).
6 Typical case of qualia role (either Agentive or Telic).
After having judged the degree of correlation, the annotators were asked to decide what kind of relation was linking each noun-verb pair, selecting among three options: Agentive, Telic, or nothing for those cases in which making a clear decision was difficult or in which none of the two roles was suitable for that noun-verb pair. If a noun had more than one meaning, a definition for disambiguation was given to the annotators. All the instructions were given in Italian.
5.3 Profile of the annotators
Two adult Italian native speakers, graduate students of Communication Sciences and
5.4 Analysis of the Gold Standard
The result of the annotation process is a collection of 1200 noun-verb pairs, annotated with a degree and a qualia role and divided into three semantic classes. For example, the first annotator judged 5 of the 20 noun-verb pairs for the noun violino, belonging to the Gold Standard for Artifacts, as follows:
Violino Accordare 4 T
Violino Costruire 6 A
Violino Divenire 0 _
Violino Impugnare 4 T
Violino Includere 0 _
where the first two words are the noun-verb pair the annotator was given, the third entry is the degree of relation (from 0 to 6) and the last entry is the qualia role he/she assigned to the pair, which can be T (for Telic), A (for Agentive) or an underscore (for none of the two roles or for difficult decisions).
To combine the two annotators' versions of each of the three sets, I simply calculated the average of the grades the 2 annotators gave to each noun-verb pair and evaluated their agreement on the qualia roles. For example, the final Gold Standard for 5 of the 20 noun-verb pairs for the noun violino (the same pairs as in the previous example) is:
Violino Accordare 5.0 TT
Violino Costruire 6.0 AA
Violino Divenire 0.5 _A
Violino Provare 5.0 TT
Violino Disdegnare 1.5 _T
This time the correlation is expressed by an average, and the qualia role can be expressed by an agreement (both annotators judged the verb as telic, as agentive, or as neither), as in the first, second and fourth noun-verb pairs, or by a disagreement.
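The combination step is straightforward; the helpers below sketch how the averages and agreement strings above can be produced (the function names are mine, not from the thesis):

```python
def combine(judgement1, judgement2):
    """Merge two annotations (grade, role) into a Gold Standard entry:
    average grade plus the concatenated role letters."""
    (g1, r1), (g2, r2) = judgement1, judgement2
    return (g1 + g2) / 2, r1 + r2

def agree(roles):
    """True when both annotators chose the same role (AA, TT or __)."""
    return roles[0] == roles[1]

entry = combine((4, "T"), (6, "T"))
print(entry, agree(entry[1]))  # -> (5.0, 'TT') True
```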
A few statistics can help in understanding the data obtained after this process. In Table 5.1 I discretized the correlation scores (indicating the goodness of a noun-verb pair) into three categories: low scores, from 0 to 1.5; medium scores, from 2 to 4; and high scores, from 4.5 to 6. Notice that at the moment I am not considering the annotators' agreement on the qualia role.
              | 0 - 1.5 | 2 - 4 | 4.5 - 6 | Tot
Artifact      |   147   |  123  |   130   | 400
Location      |   118   |  156  |   126   | 400
Living-entity |   251   |  129  |    20   | 400

Table 5.1: Scores of the 3 Gold Standards
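Since the averaged grades are always multiples of 0.5, the discretization used in Table 5.1 can be expressed with two thresholds. A minimal sketch (the function name and sample scores are mine):

```python
from collections import Counter

def score_band(avg):
    """Discretize an averaged grade (a multiple of 0.5 between 0 and 6)
    into the three bands of Table 5.1."""
    if avg <= 1.5:
        return "low"       # 0 - 1.5
    if avg <= 4.0:
        return "medium"    # 2 - 4
    return "high"          # 4.5 - 6

# Illustrative averaged grades, two per band.
scores = [0.5, 1.5, 2.0, 3.5, 4.5, 6.0]
print(Counter(score_band(s) for s in scores))
```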
Now (see Table 5.2), analysing only the high scores (those between 4.5 and 6), we can see how many of them have been annotated with a perfect Agentive (AA) or Telic (TT) agreement. The last column instead shows the high-scoring noun-verb pairs with a disagreement on the qualia role.
Tot. high (4.5 - 6) AA TT disagreement
Artifact 130 28 64 38
Location 126 36 63 27
Living-entity 20 6 7 7
Table 5.2: High scores analysis
Table 5.3 shows the mean and the variance (without making any selection according to the qualia role) for each of the 3 Gold Standards (one for each semantic class).
Mean Variance
Artifact 2.797 4.397
Location 3.012 3.297
Living-entity 1.594 2.233
Table 5.3: Mean and Variance of the 3 Gold Standards
This result shows that the annotators judged the 3 Gold Standards in different ways. In particular, the variance for Artifact is higher than for the other classes, which means that more clear-cut decisions were made for this semantic class when choosing a degree of correlation for the noun-verb pairs. In other words, since the variance measures how far a set of numbers spreads from its mean, a high variance here suggests that the annotators often gave either a high or a low mark to the noun-verb relation. More importantly, the Living-entity gold standard shows that, as expected, this class was very difficult to annotate and yielded very few Agentive and Telic roles. This was not surprising, because the literature I used to obtain the Agentive and Telic roles for the nouns, while building the data to be annotated, was extremely poor when it came to Living entities. For these reasons, from now on I will use only the Gold Standards for Artifact and Location, neglecting the Living-entity data.
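The statistics of tables 5.1 and 5.3 can be computed with a short script like the following (my own sketch; the thesis does not state whether the population or the sample variance was used, so I assume the population variance, i.e. division by n):

```python
def statistics(grades):
    """Mean and population variance of a list of grades (0-6 scale)."""
    mean = sum(grades) / len(grades)
    variance = sum((g - mean) ** 2 for g in grades) / len(grades)
    return mean, variance

def discretize(grade):
    """Bin an averaged grade into the categories of table 5.1."""
    if grade <= 1.5:
        return "low"     # 0 - 1.5
    if grade <= 4.0:
        return "medium"  # 2 - 4
    return "high"        # 4.5 - 6
```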
Chapter 6
The machine learning method
This chapter is dedicated to the main part of the thesis. In the first section I'll describe the features extracted from the CoNLL parsed Italian Wikipedia corpus (see Chapter 3.3), then I'll describe the procedure used to obtain and balance the training corpus. The last part instead focuses on the machine learning method and on the selection of the features to use.
6.1 Features
For each sentence of the Italian CoNLL corpus containing a noun-verb pair present in both Gold Standards (remember I'm not using the Living-entity Gold Standard), I extracted, and then converted into csv (comma separated values), a set of features including the dependency relation linking the noun-verb pair, the part of speech of the two terms, the semantic tags they were given and finally all the other dependency relations in which they were involved. So, for example, the sentence "In Italia si costruì il primo violino." (The first violin was built in Italy), which includes the noun-verb pair "violino-costruire" (violin-build), part of the Gold Standard for Artifacts, was represented in the CoNLL corpus as:
1 In in EA 4 comp_loc O
2 Italia Italia SP 1 mod B-noun.location
3 si si PC 4 clit O
4 costruì costruire V 0 ROOT B-verb.cognition
5 il il RD 7 det O
6 primo primo NO 7 mod O
7 violino violino S 4 subj B-noun.artifact
8 . . FS 1 con O
For the noun-verb pair violino-costruire in this sentence my python program extracted the following features:
POS of the Noun: S
POS of the Verb: V
Dependency Relation of the Noun: subj
Dependency Relation of the Verb: ROOT
Semantic label of the Noun: B-noun.artifact
Semantic label of the Verb: B-verb.cognition
Dependency environment of the Noun: det, mod
Dependency environment of the Verb: comp_loc, clit
(Notice that in this example I reduced the CoNLL fields from 12 to 7 (ID, Word form, Lemma, POSTAG, Head, DepRel, Semantic Label), which are those useful for the features. See Chapter 3.3 for more details.)
The last among these features (the dependency environment) needs a little explanation. It is actually represented as a list of features expressing the presence or absence of each of the 26 dependency relations of the CoNLL tagset. So, in the previous example, for the noun violino only the dependency relations det and mod are present, while the remaining 24 are tagged as absent. Those (rare) cases in which a single dependency relation occurs more than once are simply marked as present. This choice was made to avoid the creation of long and almost unique features, which would have decreased the machine learning performance. So, at the end of this process I have 2 sets of instances (one for each semantic class) in csv, each instance being represented like:
796_violino_costruire,S,subj,B-noun.artifact,V,Root,B-verb.cognition,no,yes,no,no..
where the first element is the ID, formed by the number of the extracted sentence and the noun-verb pair. Then the actual features follow: the second, third and fourth are respectively the POS, the dependency relation and the semantic label of the Noun. Similarly, the 5th, 6th and 7th are the same features for the Verb. Finally, 26 features for the Noun and another 26 for the Verb (only 4 of the total 52 are shown in the example) indicate with a yes or a no the dependency environment (as explained previously).
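The extraction can be sketched roughly as follows (a sketch under my own assumptions: the function name and the data structures are mine, and only a few dependency relations are listed where the real program iterates over the full 26-relation tagset):

```python
# A few of the 26 dependency relations of the CoNLL tagset; the real
# feature vector covers all 26 (full list in the corpus documentation).
DEPRELS = ["subj", "obj", "det", "mod", "clit", "comp_loc"]

def extract_features(sentence, noun, verb):
    """Build the feature vector for a noun-verb pair found in a sentence.

    `sentence` is a list of 7-field rows as in the example above:
    (id, word form, lemma, POS, head, deprel, semantic label).
    """
    by_lemma = {row[2]: row for row in sentence}
    n_id, _, _, n_pos, _, n_dep, n_sem = by_lemma[noun]
    v_id, _, _, v_pos, _, v_dep, v_sem = by_lemma[verb]
    # Dependency environment: which relations occur among the *other*
    # dependents of the noun and of the verb (presence/absence only).
    n_env = {row[5] for row in sentence if row[4] == n_id and row[2] != verb}
    v_env = {row[5] for row in sentence if row[4] == v_id and row[2] != noun}
    features = [n_pos, n_dep, n_sem, v_pos, v_dep, v_sem]
    features += ["yes" if d in n_env else "no" for d in DEPRELS]
    features += ["yes" if d in v_env else "no" for d in DEPRELS]
    return features

sentence = [
    (1, "In", "in", "EA", 4, "comp_loc", "O"),
    (2, "Italia", "Italia", "SP", 1, "mod", "B-noun.location"),
    (3, "si", "si", "PC", 4, "clit", "O"),
    (4, "costruì", "costruire", "V", 0, "ROOT", "B-verb.cognition"),
    (5, "il", "il", "RD", 7, "det", "O"),
    (6, "primo", "primo", "NO", 7, "mod", "O"),
    (7, "violino", "violino", "S", 4, "subj", "B-noun.artifact"),
    (8, ".", ".", "FS", 1, "con", "O"),
]
feats = extract_features(sentence, "violino", "costruire")
```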
6.2 Creating the training data
Now that I have extracted all the necessary features, I need to refine the data to obtain positive and negative instances for agentive and telic roles, following the Gold Standard indications. So, for each instance extracted from the CoNLL corpus in csv format, I attached the corresponding (averaged) grade of correlation and the agreement on the role, as expressed in the Gold Standard for that specific noun-verb pair. To give an example, the instance described in the last paragraph of the previous section would become:
796_violino_costruire,S,subj,B-noun.artifact,V,Root,B-verb.cognition,no,.. 6.0,TT
where the last 2 features are taken from the Gold Standard for Artifacts, for the noun-verb pair "violino-costruire". Finally, I can process these data to obtain the final training data I'll use for the machine learning experiments. I follow three steps: Balancing, Positive or Negative, Agentive or Telic.
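The attachment of the gold-standard labels can be sketched as a simple lookup (a hypothetical sketch; the function name and data structures are mine):

```python
def label_instances(instances, gold):
    """Append the averaged grade and the role agreement to each instance.

    `instances` maps an ID like '796_violino_costruire' to its feature
    list; `gold` maps (noun, verb) pairs to a (grade, roles) tuple.
    """
    labelled = {}
    for inst_id, features in instances.items():
        _, noun, verb = inst_id.split("_")
        grade, roles = gold[(noun, verb)]
        labelled[inst_id] = features + [grade, roles]
    return labelled
```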
Balancing: in table 6.1 I compare the mean (of the noun-verb correlation grades) of both Gold Standards with the mean of the two extracted sets of instances. It is easy to notice that the means of the extracted instances are higher than those of the respective Gold Standards.
Gold Standard grade mean Extracted instances mean
Artifact 2.797 5.021
Location 3.012 3.9
Table 6.1: Grade’s mean of Gold Standard and extracted instances (scale 0-6)
This means that in the corpus the frequency of the noun-verb pairs which received a high grade (on the scale from 0 to 6) is higher than the frequency of the pairs which scored less. Going a little deeper, I discovered that a few noun-verb pairs with a high degree of correlation were dominating the extracted instances; for example, the noun-verb pair libro-scrivere (book-write), which scored 6.0, appeared 2761 times in the instances extracted for the Gold Standard Artifacts, out of a total of 14944 extracted instances for that semantic class. To avoid a training set that does not conform at all to the Gold Standard, I eliminated all further instances containing the same noun-verb pair after I had seen it 100 times. Although this operation may seem a little artificial, it is useful to reduce the weight of the noun-verb pairs which were dominating the set of extracted instances. At the end of this process the number of instances in both sets was reduced considerably, as shown in table 6.2.
Num instances before Num instances after
Artifact 14944 6102
Location 9048 6449
Table 6.2: Number of instances before and after Balancing
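The balancing step (capping each noun-verb pair at 100 occurrences) can be sketched as follows (my own sketch; the data structure is hypothetical):

```python
from collections import Counter

def balance(instances, cap=100):
    """Keep at most `cap` instances per noun-verb pair, scanning the
    instances in corpus order and dropping everything past the cap."""
    seen = Counter()
    kept = []
    for noun, verb, features in instances:
        if seen[(noun, verb)] < cap:
            seen[(noun, verb)] += 1
            kept.append((noun, verb, features))
    return kept
```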
Positive or Negative: The goal of my machine learning approach is to automatically understand whether a noun-verb pair represents a good example of a Telic or Agentive role or not. At this step I extracted from the balanced sets of instances only those which clearly represented a good or a bad example of a qualia role, leaving out the instances extracted from noun-verb pairs which proved problematic and uncertain even for the annotators (notice that I'm still not considering the difference between the Agentive and the Telic role; this will be done in the next paragraph). To achieve this, I selected as negative examples the instances with a grade between 0 and 1.5, and as positive those between 4.5 and 6, eliminating all the instances which scored between 2 and 4.
This operation reduced the number of instances in each set (see table 6.3) less dramatically than the previous one.
Num instances before Num instances after
Artifact 6102 4540
Location 6449 4753
Table 6.3: Number of instances before and after Pos or Neg
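The selection of clear positive and negative cases can be sketched as a pair of grade filters (hypothetical names; the thresholds are those given above):

```python
def select_clear_cases(instances):
    """Split instances into positives (grade 4.5-6) and negatives
    (grade 0-1.5), discarding the uncertain middle band (2-4)."""
    positives = [i for i in instances if i["grade"] >= 4.5]
    negatives = [i for i in instances if i["grade"] <= 1.5]
    return positives, negatives
```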
Agentive and Telic: This last step finally takes into account the difference between the Telic and the Agentive role. Each of the two sets of instances (Artifact and Location), reduced by the operations called "Balancing" and "Positive or Negative", is here split into two parts: one containing positive and negative instances for the Agentive role and the other containing positive and negative instances for the Telic role. Only the instances with perfect agreement on the role (both judges marked the noun-verb pair as Agentive or as Telic), and obviously with a grade of at least 4.5 (as said before), were selected as positive instances. The negative instances for each role are those which scored 1.5 or less (notice that these are common to both the Agentive and the Telic set) plus the positive instances of the opposite role. So, for example, the Artifact-Agentive set has as negative instances the low scoring ones and the positive instances of the Artifact-Telic set. The result of this last step is 4 sets of positive and negative instances: 2 sets (one per role) for each of the 2 semantic classes (table 6.4).
Positive instances Negative instances
Artifact-Agentive 723 2882
Artifact-Telic 1897 1708
Location-Agentive 1189 2816
Location-Telic 1201 2804
Table 6.4: Pos and Neg instances for each training set
These 4 sets are the final training data I will use in the machine learning experiments.
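The construction of the four role-specific training sets can be sketched as follows (a sketch with hypothetical field names; `positives` and `negatives` stand for the output of the previous step for one semantic class):

```python
def build_role_sets(positives, negatives):
    """Build the Agentive and Telic training sets for one semantic class.

    Positives for a role are the high-scoring instances with perfect
    agreement ('AA' or 'TT'); negatives for a role are the low-scoring
    instances (shared by both roles) plus the positives of the opposite
    role.
    """
    agentive_pos = [i for i in positives if i["roles"] == "AA"]
    telic_pos = [i for i in positives if i["roles"] == "TT"]
    agentive_set = (agentive_pos, negatives + telic_pos)
    telic_set = (telic_pos, negatives + agentive_pos)
    return agentive_set, telic_set
```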
6.3 Experiments
On the training data obtained I run 2 different classifiers using the Weka software [6], to discover which algorithm gives the best accuracy: a NaiveBayes classifier and a decision tree algorithm. The choice of a NaiveBayes and a decision tree (I'll use an
2