
The potentiality of computational metaphor identification systems for studying human metaphor processing

Xiaoyu Tong

Abstract

Natural language processing (NLP) systems approach language-related events from a unique perspective and could render invaluable insights for the study of human cognition, including metaphor processing. This thesis identifies and focuses on two aspects in which NLP systems for metaphor identification can contribute to the study of human metaphor processing: 1) the extraction of metaphor instances for linguistic analysis or experiments, and 2) the decoding of brain activity through representational similarity analysis. For the extraction of metaphor instances, this thesis finds that automated and manual metaphor identification methods are based on different assumptions about word senses. As a result, explicit incorporation of the comparison processes typically required for manual metaphor identification does not guarantee better model performance. It is advisable for the development of future models to begin with an investigation of the learned parameters of the state-of-the-art models. The decoding of brain activity requires that the computational model be comparable to neurocognitive processes, which does not always correlate with good performance. This dissociation between comparability and performance is parallel to the dissociation between the naturalness and accuracy of human metaphor identification. This thesis thus suggests that comparative studies between automated metaphor identification and human judgement can be a new avenue for studying human metaphor processing. This thesis also suggests that experiments on various cognitively plausible features or strategies, the development of cross-lingual systems, and the exploration of systems capable of discourse-level metaphor identification are promising directions for future research.


Contents

1 Introduction
2 Applying automated metaphor identification to the study of human metaphor processing
  2.1 Theories and terminology of human metaphor processing
  2.2 Manual metaphor identification procedures
  2.3 Challenges faced by cognitive research
    2.3.1 Linguistic analysis
    2.3.2 Psychological and neuroscience experiments
  2.4 Opportunities offered by NLP techniques
3 Metaphor datasets
4 Extracting linguistic metaphors for linguistic analysis or experiments
  4.1 Explicit models
    4.1.1 Taking minimal context
    4.1.2 Taking the entire sentence as context
  4.2 Implicit models
    4.2.1 Taking minimal context
    4.2.2 Taking a longer range of context
    4.2.3 The inclusion of various features
5 Decoding brain activity
  5.1 Explicit models
  5.2 Implicit models
    5.2.1 The inclusion of various features
    5.2.2 Sentence length manipulation

1 Introduction

Metaphor is a vital means of organising knowledge and conveying ideas. We evaluate an argument or idea using words such as well-grounded and baseless, because we perceive idea as building;1 we talk about a question in one’s mind, based on the metaphorical mapping between mind and container. This emphasis and interest in the cognitive perspective of metaphor use is often associated with the ‘cognitive turn’ in metaphor research in the late 1970s (Honeck and Hoffman 1980; Lakoff and Johnson 1980; Ortony 1979), when metaphor began to be recognised as an important component of human cognition instead of a mere rhetorical device. A long time has passed since then, and the mental representation of conceptual metaphor as well as metaphor comprehension and production as cognitive processes have been investigated in a wide range of scientific fields, from linguistics to cognitive neuroscience; yet there has not been a consensus regarding the nature of either metaphor or metaphor processing (e.g., Bowdle and Gentner 2005; Gibbs 2011, 2013; Gibbs and E. Chen 2017; Steen 2017).

The cognitive turn in metaphor research has also brought about an interest in computational metaphor processing systems in the natural language processing (NLP) community: the ubiquity of metaphor use in natural discourse requires that linguistic metaphors be detected and properly interpreted for a machine to adequately process everyday human language. Since the first metaphor identification system was proposed by Fass (1991), more than 40 such systems have followed; the state-of-the-art systems employ deep learning architectures and are capable of sequential word-level metaphor identification, achieving F-scores of over 0.7.

Computational models not only find inspiration in cognitive scientific research, but can also provide insights to the latter. However, there have not been many attempts at applying computational metaphor identification systems to the study of metaphor processing. In this thesis, I lay out the aspects in which NLP techniques can benefit studies of human metaphor processing (section 2), review the metaphor datasets that have been used by wide-coverage metaphor identification systems (section 3), and go over the wide-coverage metaphor identification systems from a cognitive perspective, with respect to:

1. The identification of the types of linguistic metaphors associated with the cognitive processes of metaphor comprehension (section 4).

2. The comparability of the architecture of the models with human metaphor processing or text processing in general (section 5).

This thesis thus intends to help cognitive scientific studies find the computational models that best suit their purposes, as well as to provide suggestions for the development of computational models that can better facilitate studies of human metaphor processing.

1 Here, small caps are used to denote that ‘idea’ and ‘building’ are concepts instead of referents. Since this thesis adopts the CMT view of metaphor as conceptual cross-domain mappings (see section 2), it also follows the convention of CMT and uses small caps for the analysis of conceptual metaphors.

2 Applying automated metaphor identification to the study of human metaphor processing

2.1 Theories and terminology of human metaphor processing

One of the most influential works associated with the cognitive turn in metaphor research is Lakoff and Johnson (1980), which laid the foundation of the Conceptual Metaphor Theory (CMT). Lakoff and Johnson found through linguistic analysis that much of everyday English use seems to be suggestive of systematic conceptual mappings between different conceptual domains. These cross-domain conceptual mappings are termed conceptual metaphors. The two conceptual domains involved in a conceptual metaphor are called the source domain and the target domain. The conceptual metaphor argument as war, for instance, establishes a systematic mapping between a set of properties and relations of the source domain, war, and the target domain, argument: an argument could remind people of wars, when it is caused by conflicting positions between people and involves aggressive behaviour. The conceptual metaphor thus serves as a tool to perceive argument as war.

CMT has called attention to the fundamental role of metaphor in the organisation of knowledge in the mind and the construction of meaning in language use. One may still wonder, however, what it means for the brain/mind to process manifestations of underlying conceptual metaphors, given that conceptual metaphors are cross-domain mappings. In other words, CMT does not directly deal with metaphor processing, what happens in the brain/mind when one encounters a metaphor in text or other modalities. One of the questions concerning metaphor processing is about the nature of online cross-domain mapping, whether it is essentially a comparison process or a categorisation process (e.g., Barnden 2016; Bowdle and Gentner 2005). According to the comparison view (e.g., Ortony 1979), when one processes (1), which exhibits a metaphorical mapping between flower (the source domain) and beauty (the target domain), one searches for the shared properties and/or relations of the two domains, such as being pleasing to the eye, and the possibility of being fragile. Scholars advocating the categorisation view (e.g., Glucksberg and Keysar 1990), however, would argue that the object a flower refers to a superordinate category which includes both beauty and flower, with the latter being a prototypical member of the category.

(1) Beauty is but a flower / which wrinkles will devour; . . .2

The comparison-vs-categorisation debate gave rise to another milestone in the study of metaphor, the Career of Metaphor theory advanced by Bowdle and Gentner (2005). The theory suggests that both comparison and categorisation are possible paths of metaphor processing; which path is chosen depends on the interaction between the conventionality of the metaphor and its linguistic realisation (Bowdle and Gentner 2005, p. 208). More specifically, novel metaphors or metaphors that establish new conceptual cross-domain mappings are processed through comparison. As the metaphor becomes conventionalised due to frequent use, people tend to process the metaphor through categorisation, as categorisation is less computationally expensive for the brain than comparison (Bowdle and Gentner 2005, p. 199). The syntactic form that realises the metaphorical relation also affects the processing of a conventional metaphor. Compare the following statements:


(2) a. An argument is like a war.
    b. An argument is a war.
(3) a. Okonomiyaki is like pizza.
    b. Okonomiyaki is Japanese food.

The two examples in (2) are realisations of the conventional metaphor argument as war. Bowdle and Gentner (2005) contend that while a conventional metaphor can be processed through either comparison or categorisation, it is processed through comparison when expressed in the form of a simile, like (2-a), because its syntactic structure is the same as literal comparisons such as (3-a). On the other hand, copula metaphors such as (2-b) prompt the recipient to process the metaphors through categorisation, as they share the syntactic structure of (3-b), which is a literal categorisation.

The conventionality/novelty of metaphor is one of the most frequently addressed dimensions in metaphor research. To facilitate discussion across disciplines and studies, this thesis adopts a clear-cut operational definition of conventional and novel metaphor use: a metaphorically used word is motivated by a conventional metaphor if its meaning in the context is a registered sense in a dictionary of contemporary language use; otherwise, it is an instance of novel metaphor use. This operational definition is in line with both Bowdle and Gentner (2005) and MIPVU, a CMT-based metaphor identification procedure (see section 2.2).3 Since MIPVU is used for analysing metaphor instances in all subsequent sections, this thesis also follows MIPVU in consulting the Macmillan dictionary4 about word senses.

The Career of Metaphor theory primarily deals with the processing of direct metaphors (a term coined by Steen, Dorst, Herrmann, Kaal, Krennmayr, and Pasma 2010), linguistic metaphors that employ words clearly incongruent with the topic of the sentence or discourse in the given context (i.e., vocabulary typically associated with the source domain). Similes and copula metaphors are typical examples of direct metaphors. They can be distinguished from indirect metaphors (Steen, Dorst, Herrmann, Kaal, Krennmayr, and Pasma 2010), which are linguistic metaphors that feature the metaphorical use of polysemous words (that is, the metaphorical meaning is an established meaning of the word). The use of the verb attack below is an indirect metaphor based on the conceptual metaphor argument as war:

(4) His work was attacked by the critics.

Since the syntactic structure does not resemble either literal comparison or literal categorisation, it is difficult to analyse the processing of such metaphor uses given the Career of Metaphor theory. Furthermore, it seems counterintuitive that these metaphor uses invoke online metaphor processing in the first place. It is unlikely that when one encounters (4), one needs to refer to the domain of war to understand what the sentence means.

The processing of indirect metaphors like (4) is dealt with in the Deliberate Metaphor Theory proposed by Steen (2008, 2011a, 2015, 2017). Bringing the communication dimension into metaphor research (in addition to the language and the thought dimensions), the theory draws attention to the deliberateness of metaphor use or the intention behind metaphor use. According to the theory, a linguistic metaphor can be identified as deliberate metaphor use if a topic shift or perspective change is intentionally introduced to the discourse; otherwise, the linguistic metaphor is a non-deliberate metaphor use. Steen (2017) associates deliberate metaphor use with online metaphor processing, the construction of cross-domain mappings as part of the process of text comprehension. Examples of deliberate metaphors include novel metaphors like (1) and conventional metaphors realised in the form of simile or copula metaphor, such as the two examples in (2). Indirect metaphors like (4) are non-deliberate metaphor uses, as there is no clear evidence that the speaker intends to introduce the domain of war by using the verb attack.

3 The differentiation between conventional and novel metaphors is not among the compulsory steps of MIPVU, but the criterion is used consistently in the discussion (e.g., Steen, Dorst, Herrmann, Kaal, and Krennmayr 2010; Steen, Dorst, Herrmann, Kaal, Krennmayr, and Pasma 2010).

Steen (2017) also differentiates the deliberateness of metaphor use from the consciousness or awareness of metaphor use. Conscious metaphor processing is signalled by the metacognition that a metaphor is being processed. In other words, one is consciously processing a metaphor if one thinks to oneself, ‘there is a metaphor going on’. Conscious metaphor use should therefore be a fraction of deliberate metaphor use.

2.2 Manual metaphor identification procedures

The cognitive turn in metaphor research calls for inter- and multidisciplinary studies, which in turn calls for metaphor identification procedures that produce replicable results. Taking CMT as the starting point, Pragglejaz Group (2007) attempted to develop such a procedure called MIP (short for Metaphor Identification Procedure). The procedure is summarised in figure 1.

Despite the systematicity of the procedure, a major drawback is that it does not exhaust all the word uses that imply an underlying conceptual metaphor. Consider the following instances:

(5) a. His eyes are like stars.
    b. His eyes are stars.

Both (5-a) and (5-b) are undoubtedly metaphorical, as both clearly establish a mapping between eye and star, which belong to different conceptual domains. However, if one examines the sentences using MIP, one may have to decide that none of the words in (5-a) should be marked metaphorical. All of the words in (5-a) are literally used, although the entire sentence is metaphorical. One may also experience a difficult time deciding the metaphoricity of are and stars in (5-b), which can be understood as either ‘his eyes are like stars’ (comparison), or ‘his eyes are things that shine’ (categorisation). In the former case, only the linking verb are should be marked metaphorical; in the latter case, only the object stars.

An improvement and extension of MIP, called MIPVU (Metaphor Identification Procedure VU University Amsterdam), was then proposed (Steen, Dorst, Herrmann, Kaal, Krennmayr, and Pasma 2010). The general steps are presented in figure 1. Instead of metaphorically used words, MIPVU aims at identifying metaphor-related words, which involve all word uses that are potentially related to an underlying conceptual metaphor. The method distinguishes four types of metaphor-related words: 1) indirect metaphors, for which the identification procedure is practically the same as MIP, 2) direct metaphors, 3) implicit metaphors, which include pronouns and ellipses that refer to source-domain entities, and 4) metaphor flags, which are indicators of a cross-domain comparison (e.g., as, like, resembling).

(a) MIP
1. Read and understand the text.
2. Determine the lexical units.
3. For each lexical unit:
   (a) Determine its contextual meaning.
   (b) Determine whether it has a more basic meaning.
   (c) Compare the contextual and the basic meaning.
4. Decide the metaphoricity of the lexical unit.

(b) MIPVU
1. Determine the lexical units.
2. Identify indirect metaphors.
3. Identify direct metaphors.
4. Identify implicit metaphors.
5. Identify metaphor flags.
6. Deal with new coinages.

Figure 1: The general guidelines of MIP (Pragglejaz Group 2007, p. 3) and MIPVU (Steen, Dorst, Herrmann, Kaal, Krennmayr, and Pasma 2010, pp. 25–26). I retain the different levels of specificity of the cited instructions.

There are three aspects of MIP(VU) that require special attention. Firstly, lexical units are not simply delimited by spaces. In principle, the procedure follows the Macmillan dictionary and treats every word or multiword expression that has a dictionary entry as a single lexical unit. Therefore, phrasal verbs and/or idioms such as pull yourself together are considered individual lexical units; the metaphoricity of the components (viz., the words pull, yourself, and together in pull yourself together) is not examined.

Secondly, basic meanings are identified relative to contextual meanings; the procedure does not determine the literal senses of a lexical unit prior to examining its meaning in the context. It is also explicitly pointed out that a more basic meaning is not necessarily also more frequently used (Pragglejaz Group 2007, p. 3). This marks a fundamental difference between the underlying assumptions about metaphor adopted by MIP(VU) and selectional preference violation (SPV, Wilks 1975, 1978). According to SPV, a sentence is metaphorical if the nouns in the sentence do not satisfy the selectional preference of the verb or adjective modifying the nouns. For example, the expression follow the academic tradition (BNC CLP 856)5 is metaphorical, because the verb follow prefers the object to refer to something visible, whereas traditions are invisible. The SPV view of metaphor is essentially based on co-occurrence frequency, whereas MIP(VU) follows CMT and assumes that basic meanings are usually embodied.
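To make the contrast concrete, the following is a minimal sketch of an SPV-style check in the frequency-based spirit described here (my own illustration, not Wilks's original feature-based system); the verb-object counts and the threshold are hypothetical placeholders.

```python
# A minimal sketch of a selectional-preference-violation (SPV) style check,
# assuming verb-object co-occurrence counts have been extracted from a parsed
# corpus beforehand. All numbers below are hypothetical placeholders.
verb_object_counts = {
    "follow": {"person": 200, "road": 120, "car": 85, "tradition": 3},
}

def object_preference(verb: str, noun: str) -> float:
    """Relative frequency of `noun` among the observed objects of `verb`."""
    counts = verb_object_counts.get(verb, {})
    total = sum(counts.values())
    return counts.get(noun, 0) / total if total else 0.0

def violates_preference(verb: str, noun: str, threshold: float = 0.05) -> bool:
    """Flag a verb-object pair as potentially metaphorical if the noun is a
    rare object of the verb -- a frequency-based criterion, unlike MIP(VU)."""
    return object_preference(verb, noun) < threshold

print(violates_preference("follow", "tradition"))  # True with these toy counts
```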

Lastly, note that the procedure requires that the contextual and the more basic meaning be not only sufficiently different, but also semantically related. If the contextual meaning cannot be considered to be derived from the more basic meaning despite the fact that the latter satisfies the requirements of being more basic (i.e., more concrete, specific, and human-oriented), the annotator should consider taking both as basic meanings (Steen, Dorst, Herrmann, Kaal, Krennmayr, and Pasma 2010, p. 38).

5 Examples of usage taken from the British National Corpus (BNC) were obtained under the terms of the BNC End User Licence. Copyright in the individual texts cited resides with the original IPR holders. For information and licensing conditions relating to the BNC, please see the web site at http://www.natcorp.ox.ac.uk/.


2.3 Challenges faced by cognitive research

2.3.1 Linguistic analysis

MIPVU provides a systematic way to extract almost all word uses that can potentially be explained by an underlying conceptual metaphor, which marks a good start for linguistic analysis of conceptual metaphors. As is pointed out by the authors of MIP, the manual procedure also enriches the annotators’ understanding of the nature of metaphor (Pragglejaz Group 2007, p. 36), which is of great value to metaphor researchers. Nonetheless, I identify the following challenges that linguistic analysis of metaphor still faces.

To begin with, the reliability of MIP(VU) has been tested within research groups (e.g., Lu and Wang 2017; Marhula and Rosiński 2014; Pragglejaz Group 2007; Steen, Dorst, Herrmann, Kaal, Krennmayr, and Pasma 2010), but has not been tested across research groups. It is possible that the annotators reach agreements on the decisions concerning their current study, but a different research group would have reached another set of agreements given the same study. If the high reliability rates are only within groups, then the replicability of the procedure is still questionable.

A similar potential problem can be spotted when it comes to the reliability and replicability of MIP(VU) across languages. Since MIP(VU) is based on English word uses and only specifies how the procedure should be applied to (contemporary) English texts, there have been adaptations of the procedure to other languages (Nacey et al. 2019). However, it remains a question to what extent the proposed versions of MIPVU for other languages are accepted by the other research groups that deal with metaphor uses in those languages. Additionally, there have not been cross-linguistic comparisons of the adapted procedures. It is therefore also questionable whether the different versions produce comparable results for investigating the universals and cross-linguistic differences of metaphor use.

Lastly, it should be noted that MIP(VU) as well as other means of linguistic analysis of metaphor are built upon various hypotheses regarding the relationship between conceptual and linguistic metaphors. The findings of linguistic analysis with respect to metaphor processing are only valid if the underlying assumptions are true. To verify the underlying assumptions about metaphor processing or conceptual metaphors, one would need converging evidence from other fields of study, such as cognitive psychology and cognitive neuroscience.

2.3.2 Psychological and neuroscience experiments

Psychological and neuroscience experiments have been a major means of verifying and exploring theories of metaphor processing. However, I observe the following issues regarding these experiments.

A characteristic of these experiments that cannot be neglected is the restriction on the syntactic form of the linguistic metaphors used as stimuli. It is typical of an experiment to employ metaphors realised by adjective-noun relations (e.g., sweet person), short verb phrases (e.g., my car drinks gasoline), or is-a patterns (e.g., life is a maze), as the restricted forms help to control confounding effects. These examples are valid metaphor uses, but may not be representative of how metaphors are used in natural discourse, as it is not always the case that a linguistic metaphor can be understood without ambiguity given limited context. It is therefore questionable to what extent the findings regarding the processing of these linguistic metaphors can be generalised to metaphor uses in natural discourse.


Apart from formal restrictions, experiments also distinguish metaphors with respect to underlying cognitive processes (see section 2.1). Novel metaphors, for instance, are a popular object of investigation, as they are usually regarded as directly associated with online metaphor processing (e.g., Bowdle and Gentner 2005; Steen 2011b). However, novel metaphors occur much less frequently than conventional metaphors in natural discourse (Steen, Dorst, Herrmann, Kaal, and Krennmayr 2010). It is therefore a great challenge to collect a large number of novel metaphor uses from natural discourse. Using made-up metaphors might be a way around the challenge, but it could further widen the distance between experimental and natural settings that the formal restrictions already introduce.

2.4 Opportunities offered by NLP techniques

Both linguistic analysis and experiments can benefit from NLP systems that extract metaphor uses from natural discourse. As an alternative to MIP(VU), computational metaphor identification systems can not only significantly reduce the time spent on extracting linguistic metaphors, but also to a large extent avoid the problems with respect to between-research-group and cross-linguistic reliability and replicability. Since the actual metaphor identification is done by computational systems instead of human analysts, it is much easier to achieve replicable results that are relatively independent of the individual research groups. The use of computational systems can also facilitate adaptation of a metaphor identification procedure to different languages. Although decisions still need to be made with respect to the equivalences of various elements of a procedure in another language, automated systems make the process less time-consuming and less expensive. In section 4, therefore, I review the potential of existing metaphor identification systems with respect to collecting linguistic metaphors for linguistic analysis or experiments.

The use of computational models in neuroimaging studies can also contribute to providing converging evidence for the verification of hypotheses about metaphor processing and/or conceptual metaphors. In this thesis, I focus on the use of representational similarity analysis (Kriegeskorte et al. 2008), a method for decoding brain activity by calculating the correlation between brain activity and the hidden states of computational models; the model that is more positively correlated with the resultant brain activity is more likely to represent how metaphor is processed in the brain/mind. The potential of existing metaphor identification systems in this respect is dealt with in section 5.
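As a generic illustration of how such a comparison works (a sketch of standard RSA, not of any specific study reviewed here), one can build a representational dissimilarity matrix (RDM) from a model's hidden states for a set of stimuli, build another RDM from the corresponding brain responses, and correlate the two; the arrays below are random placeholders.

```python
# A minimal sketch of representational similarity analysis (RSA): correlate the
# pairwise dissimilarity structure of model hidden states with that of brain
# responses for the same stimuli. Random arrays stand in for real data.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_stimuli = 20
model_states = rng.normal(size=(n_stimuli, 300))      # e.g., one hidden state per sentence
brain_responses = rng.normal(size=(n_stimuli, 5000))  # e.g., one voxel pattern per sentence

# Condensed RDMs: one dissimilarity value per pair of stimuli.
model_rdm = pdist(model_states, metric="correlation")
brain_rdm = pdist(brain_responses, metric="correlation")

# Rank correlation between the two RDMs; a higher value means the model's
# representational geometry is closer to that of the brain responses.
rho, p = spearmanr(model_rdm, brain_rdm)
print(f"RSA correlation: {rho:.3f} (p = {p:.3f})")
```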

By combining computational metaphor processing systems with other modules, NLP models can also serve as a tool to ‘visualise’ the results of generalising findings based on restricted metaphor examples to metaphor processing in natural settings. The visualisation can provide researchers with a concrete idea of the role of metaphor processing in text processing and how its role is affected by the interaction of various variables, and may even help to generate new insights about metaphor processing and text processing. The actual incorporation of this prospect is beyond the scope of this thesis, but the review of the currently available models will point out directions to help realise the incorporation.

3 Metaphor datasets

The development of automated metaphor identification systems requires metaphor datasets for model training and testing. It is therefore essential to be clear, prior to using a metaphor dataset, about how the metaphor examples were obtained and what types of metaphors are covered by the dataset.

***cool***

*nonliteral cluster*

wsj26:2702 N While the 17 male supers who play guards and courtiers cool our heels , we are shoehorned into a room 15 feet square , reading newspapers and exchanging small talk ./.

wsj26:1046 U To cool down the economy , China has been trying to limit money supply ./.

*literal cluster*

wsj41:160 L Heavy water is used to control and cool the reaction of small research reactors ./.

Figure 2: An excerpt of the TroFi Example Base (Birke and Sarkar 2006)

The TroFi Example Base6 consists of 6436 (1592 literal, 2145 non-literal, and 2699 unannotated) sentences from the BLLIP 1987-89 WSJ Corpus Release 1,7 manifesting the use of 50 target verbs. Attached to each sentence are an identifier (e.g., wsj26:2702) specifying the location of the sentence in the original corpus and a label denoting whether the sentence is literal (L), non-literal (N), or unannotated (U). An example for the verb cool is presented in figure 2.
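For readers who want to work with the example base directly, sentence lines in the format shown in figure 2 can be read into structured records roughly as follows (a sketch based only on the fields described above; the cluster and target-word header lines would need to be skipped separately).

```python
# A minimal sketch of parsing TroFi Example Base sentence lines of the form
# "wsj26:2702 N While the 17 male supers ... ./." into structured records.
from typing import NamedTuple

class TroFiExample(NamedTuple):
    identifier: str  # location in the original corpus, e.g. "wsj26:2702"
    label: str       # "L" (literal), "N" (non-literal), or "U" (unannotated)
    sentence: str    # tokenised sentence text

def parse_line(line: str) -> TroFiExample:
    identifier, label, sentence = line.strip().split(maxsplit=2)
    return TroFiExample(identifier, label, sentence)

example = parse_line(
    "wsj26:1046 U To cool down the economy , China has been trying to limit money supply ./."
)
print(example.label, example.sentence)
```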

The TroFi Example Base was built by Birke and Sarkar (2006) using the TroFi (Trope Finder) system developed by the authors, together with an Active Learning component (Birke and Sarkar 2007). The system classifies target sentences into the literal or the non-literal category via sentence clustering, which involves calculation of the similarity between target sentences and seed sentences; two sentences are considered similar if they contain the same words. It should be noted that the non-literal category includes not only metaphorical uses, but other tropes as well (Birke and Sarkar 2006, p. 330). Consider sentence wsj26:2702 in figure 2, which is marked non-literal and exemplifies the use of cool in the idiom cool your heels. Since the idiom is an entry in the Macmillan dictionary,8 however, MIPVU would consider it a single lexical unit and would not mark its use in the sentence metaphorical, as it is the basic meaning of the idiom (i.e., to wait) that is used in the sentence. While the metaphoricity of phrasal verbs and/or idioms is debatable (e.g., Marhula and Rosiński 2014, p. 40), the inclusion or exclusion of the arguably non-metaphorical instances should be justified in studies that employ this dataset.

Additionally, the TroFi system only deals with entrenched word uses (Birke and Sarkar 2006, p. 329). The sentence clustering algorithm relies on the words that target sentences share with seed sentences, which are extracted from WordNet (Fellbaum 1998), Wayne Magnuson English Idioms Sayings and Slang9, and the Master Metaphor List10, and thus exhibit existing metaphor uses. As a result, the linguistic metaphors in the example base are likely to be conventional metaphor uses.
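The clustering criterion described above (two sentences count as similar if they share words) can be sketched roughly as follows; this illustrates the idea only and is not a reimplementation of the TroFi system, whose seed sets and scoring are far richer.

```python
# A rough sketch of word-overlap similarity for TroFi-style classification: a
# target sentence is assigned to whichever seed set (literal or non-literal)
# it shares more words with. The seed sentences here are invented placeholders.
def overlap(sentence_a: str, sentence_b: str) -> int:
    """Number of word types shared by two sentences."""
    return len(set(sentence_a.lower().split()) & set(sentence_b.lower().split()))

def classify(target: str, literal_seeds: list[str], nonliteral_seeds: list[str]) -> str:
    literal_score = max(overlap(target, s) for s in literal_seeds)
    nonliteral_score = max(overlap(target, s) for s in nonliteral_seeds)
    return "literal" if literal_score >= nonliteral_score else "nonliteral"

literal_seeds = ["heavy water is used to cool the research reactor"]
nonliteral_seeds = ["the government tried to cool the overheated economy"]
print(classify("china wants to cool down its economy", literal_seeds, nonliteral_seeds))
```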

Tsvetkov, Boytsov, et al. (2014) released a dataset (henceforth: the TSV dataset) consisting of an equal number (884) of metaphorical and non-metaphorical adjective-noun phrases (see figure 3 for examples). The examples were collected from the web and verified by more than one additional annotator (Tsvetkov, Boytsov, et al. 2014, p. 251).

6 http://natlang.cs.sfu.ca/software/trofi.html
7 https://catalog.ldc.upenn.edu/LDC2000T43
8 https://www.macmillandictionary.com/dictionary/british/cool-your-heels
9 http://esl-bits.net/idioms/index.htm
10 http://araw.mede.uic.edu/~alansz/metaphor/METAPHORLIST.pdf

met            nonmet
sweet dreams   sweet candy
sweet person   sweet smell
academic gap   widening gap
mad money      mad manager
               mad hair
               plain yogurt

Figure 3: Examples from the adjective-noun dataset of Tsvetkov, Boytsov, et al. (2014)

As is shown in figure 3, the metaphorical expressions include both cases where the adjective is metaphorically used and cases where the noun is metaphorically used (e.g., academic gap).

A drawback of the dataset, however, is that it does not come with a detailed account of the criteria used for metaphor annotation, which makes it difficult for researchers using the dataset to decide whether metaphor is defined in the same way in their research and in the dataset. For example, the dataset regards mad hair as a non-metaphorical expression. Following MIPVU, the contextual meaning of mad in the phrase is messy or extraordinary (typically going in all directions). The meaning has not entered the Macmillan dictionary11, but is related to sense 1, referring to someone acting in a strange way that would be considered crazy, silly, or funny. Therefore, mad in mad hair can be understood as referring to someone’s hairstyle that is so strange that it looks as if the person is mentally disturbed; the use is metaphorical rather than metonymic, as a mad person does not necessarily wear an extraordinary hairstyle. While the phrase may simply be a rare mistake that does not affect the overall reliability of the dataset, it may also have resulted from a systematic criterion that was used for annotating all of the examples.

It should also be noted that the absence of the metaphorical uses of an adjective that appears in the literal examples does not necessarily mean that the adjective does not have an entrenched metaphorical meaning. For example, the dataset includes a literal use of the adjective plain (plain yogurt in figure 3), but leaves out its metaphorical uses, as in plain facts. It is unlikely that the dataset excludes conventional metaphor uses, either. All of the metaphorical uses of sweet in the dataset (sweet air, sweet dreams, sweet life, sweet looks, sweet person), for instance, are conventionalised.

The metaphor dataset released by Mohammad et al. (2016) (henceforth: the MOH dataset) consists of 1639 (1230 literal and 409 metaphorical) sentences extracted from WordNet, manifesting the use of 440 verbs. The uses of the verb abuse are shown in figure 4. Mohammad et al. (2016) collected the WordNet example sentences of verbs that have three to nine senses, and had the metaphoricity of the verb uses annotated by at least ten amateur annotators. Provided that the senses associated with each example sentence accurately reflect the contextual meanings of the verbs in each sentence, the verb uses that are marked metaphorical in the dataset can be considered conventionalised metaphorical uses that, when prompted, can still evoke online metaphor processing; the confidence ratings can be regarded as a measure of degree of metaphoricity. The specified association between word senses and verb uses in sentences also provides a clearer picture of the domains engaged in the metaphorical mappings.

term   sense      sentence                                                          class         confidence
abuse  abuse#v#1  This boss abuses his workers.                                     literal       0.52
abuse  abuse#v#2  Don't abuse the system.                                           metaphorical  0.5
abuse  abuse#v#3  The actress abused the policeman who gave her a parking ticket.   metaphorical  0.5
abuse  abuse#v#4  Her husband often abuses alcohol.                                 metaphorical  0.7

Figure 4: An excerpt from the dataset of Mohammad et al. (2016)

The Language Computer Corporation (LCC) metaphor datasets (Mohler, Brunson, et al. 2016) contain linguistic metaphors concerning a set of predetermined target-domain concepts in four languages: English, Spanish, Russian, and Farsi. The annotated sentences were extracted from corpora of web contents, and four metaphor annotation procedures were employed, resulting in four separate datasets: 1) SYS, which consists of validated outputs of two automated metaphor identification systems (Bracewell et al. 2014; Mohler, Tomlinson, et al. 2015), 2) UNV, which consists of all the outputs of the two automated metaphor identification systems, without validation by annotators, 3) REC, consisting of manually extracted sentences containing potential source-domain lexemes, and 4) ANN, a dataset of novel metaphors extracted by annotators given a set of conceptual metaphors.

The sentences are annotated with respect to the following aspects: 1) the target-domain and source-domain concepts and lexemes, 2) the syntactic relationship between the source-domain and target-domain lexemes, 3) the metaphoricity of the sentence, rated on a four-point scale, from 0 ‘no metaphoricity’ to 3 ‘clear metaphor’, and 4) the affect of the source-domain and target-domain lexemes of metaphorical sentences (i.e., sentences receiving a metaphoricity rating above 1), rated on a seven-point scale, from -3 ‘strongly negative’ to 3 ‘strongly positive’.
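One compact way to see this annotation scheme is as a record type; the sketch below simply mirrors the four aspects listed above, and the field names are mine rather than the datasets' own.

```python
# A sketch of one LCC-style annotation as a record type, mirroring the four
# aspects listed above. Field names are illustrative, not the datasets' own.
from dataclasses import dataclass
from typing import Optional

@dataclass
class LCCAnnotation:
    target_concept: str       # predetermined target-domain concept
    target_lexeme: str        # target-domain lexeme in the sentence
    source_concept: str       # source-domain concept
    source_lexeme: str        # source-domain lexeme in the sentence
    syntactic_relation: str   # relation between source- and target-domain lexemes
    metaphoricity: int        # 0 ('no metaphoricity') to 3 ('clear metaphor')
    source_affect: Optional[int] = None  # -3 to 3; only for metaphoricity above 1
    target_affect: Optional[int] = None  # -3 to 3; only for metaphoricity above 1
```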

Since the LCC datasets provide metaphoricity annotations concerning a predetermined set of target-domain concepts in four languages, the datasets can be used for exploring the universals as well as cross-linguistic differences in metaphor use. The affect annotations can also be useful for investigating the relationship between metaphor use and the expression of emotions (e.g., Mohammad et al. 2016).

The VU Amsterdam Metaphor Corpus (VUAMC) (Steen, Dorst, Herrmann, Kaal, Krennmayr, and Pasma 2010) is the fruit of the authors’ application of their own metaphor identification method, MIPVU (section 2.2). The annotation was done on the entire BNC-Baby corpus, a four-million-word subset of the British National Corpus (BNC)12, including the separation of lexical units and a metaphoricity label for each lexical unit. What distinguishes this corpus from all the other metaphor datasets is that it also deals with the metaphoricity of function words.

VUAMC also acknowledges the use of extended metaphors, or successive realisations of the same conceptual metaphor, usually across several sentences. A typical example of extended metaphor is the ‘All the World’s a Stage’ speech written by William Shakespeare. The first four lines are as follows:

(6) All the world’s a stage, / and all the men and women merely players; / they have their exits and their entrances; / and one man in his time plays many parts, . . .13

The speech begins with a world as stage metaphor, and proceeds with exploring various mappings within the metaphor. Note that the metaphoricity of the last two lines needs to be considered within the given context, that the speaker is making a comparison between world and stage; without the context, the two lines could be mistaken as literal sentences.

12 http://www.natcorp.ox.ac.uk/

As is revealed in this thesis, the latest metaphor identification systems, which usually employ a deep learning architecture, operate at sentence level instead of discourse level (see section 5). The identification of extended metaphors is therefore one of the less explored tasks in the NLP community. However, extended metaphors are closely associated with deliberate metaphor use (e.g., Reijnierse et al. 2018; Steen 2015), and are therefore valuable linguistic material for studying human metaphor processing. The development of metaphor identification systems that capture extended metaphors would be a huge contribution to metaphor research.

4 Extracting linguistic metaphors for linguistic analysis or experiments

Let us start our journey through wide-coverage metaphor identification systems. I examine the metaphor instances these systems extract in this section, and compare their architectures with neurocognitive processes in the next section.

The processing of literal and metaphorical messages is usually considered distinct due to the association of the latter with the notion of incongruity: despite the effort of studies such as Bowdle and Gentner (2005) in bringing the two processes together (section 2.1),14 it is usually assumed that metaphor processing begins with the (conscious or unconscious) recognition that an expression is incongruent with its context.15 It is thus typical of rule-based metaphor identification models to perform some sort of comparison between the target word and the context. With the emergence of machine learning models, such comparison became increasingly implicit. I introduce the term explicit models for models that explicitly incorporate some sort of comparison between the target word and the context, and implicit models for those that do not. This and the next sections are structured according to this difference in explicitness, because explicit incorporation of metaphor comprehension processes entails higher predictability in terms of both the detection of metaphors relevant for cognitive research and comparability with neurocognitive processes. In this section, the two types of models are reviewed with different emphases: for explicit models, this section infers the metaphors that are likely to be detected or overlooked from the design of the models; for implicit models, the focus is on their performance as well as the effect of the features (e.g., concreteness information) that the studies experimented with. The types of metaphors the implicit models are likely to detect or overlook cannot be inferred without examination of the learned parameters, and are therefore beyond the scope of this thesis.

14 The nature of metaphor processing is still under investigation. This thesis therefore does not insist that assuming an association between metaphor and incongruity or comparison is problematic.

15 MIP(VU) identifies metaphor-related words by examining the contextual and basic meanings of target words (see section 2.2). While the contextual-basic contrast seems to be within the different uses of a word instead of between a word and its context (as in SPV), to interpret the contextual meaning of a word basically means to think of an alternative expression that is congruent with the context; whether a more basic meaning can be identified indicates whether there is a considerable difference between the target word and the alternative expression. Since the alternative expression is congruent with the context, a target word identified as a metaphor-related word is essentially incongruent with the context. Therefore, this thesis considers this essential step in MIP(VU) to be an incorporation of the idea of word-context incongruity as well.

While most models take context into consideration, there are also a few exceptions: Beigman Klebanov, B. Leong, et al. (2014) and Beigman Klebanov, C. W. Leong, et al. (2016). The models proposed in these two studies pass a representation of the target word to a classifier; the word representation is concerned with aspects of the target word independent of the current context. The classification is thus based on the frequency with which a word is used metaphorically in the training data, and the functionality of the models is likely to be restricted to locating conventional metaphor uses. Nonetheless, these models serve as strong baselines for studies of automated metaphor identification.

4.1 Explicit models

Before deep neural networks were widely used for automated metaphor identification, most models were dedicated to the identification of one or more of the three types of linguistic metaphors distinguished by Krishnakumaran and Zhu (2007). Their study deals with realisations of metaphors in three types of syntactic relations involving nouns: 1) subject-be-object, 2) subject-verb-object (the verb is not be, and one of the nouns can be absent), and 3) adjective-noun. The linguistic metaphors are named Type I, Type II, and Type III metaphors respectively. The subjects in Type I metaphors and the nouns in Type II and Type III metaphors are assumed to denote the target domains; the verbs (including different forms of be in Type I metaphors) or adjectives are assumed to be used metaphorically. The models following this typology thus take into account minimal context (the grammatical relations that can be accepted by each explicit model are summarised in table 1).16 The restriction on the syntactic form of candidate expressions also indicates that the models are unlikely to detect all the metaphor instances in a text.17 Nonetheless, they can be useful for collecting stimuli for experiments, which typically have strict restrictions on the syntactic form of the stimuli as well. In the examination of explicit and implicit models, therefore, this section first goes over the models that predict metaphoricity from minimal context, and then moves on to models that accept a longer range of context, typically an entire sentence.

16 I consider the presence of at least one context word to be necessary for metaphor identification. Following MIP(VU), it is impossible to interpret the contextual meaning of a word if context is not provided. The presence of minimal context therefore refers to adjective-noun and verb-noun pairs in a restricted sense. Subject-be-object and subject-verb-object relations are grouped with the minimal cases because they have more resemblance to the latter than full sentences in natural discourse.

17 It is for this reason that this thesis considers the identification of the three types of metaphors separately for the study of Krishnakumaran and Zhu (2007), although their model eventually identifies metaphor at sentence level.

Table 1: Explicit models: accepted input structures

                                 IsA  SVO  AN   S
Krishnakumaran and Zhu (2007)     X    X   X    -
Su et al. (2017)                  X    -   -    -
Neuman et al. (2013)              X    X   X    -
Li et al. (2013)                  X    X   X    -
Turney et al. (2011)              -    -   X    X
Assaf et al. (2013)               -    -   X    -
Wilks et al. (2013)               -    X   -    -
Shutova and Sun (2013)            -    X   -    -
Shutova, Kiela, et al. (2016)     -    X   X    -
Rei et al. (2017)                 -    X   X    -
Mao et al. (2018)                 -    -   -    X
Swarnkar and Singh (2018)         -    -   -    X
Mao et al. (2019)                 -    -   -    X

IsA = subject-be-object; SVO = subject-verb-object; AN = adjective-noun; S = sentence. For models that extract certain grammatical relations from input sentences, the extracted grammatical relations are considered as their actual inputs. The models are ordered according to the sequence in which they are first described in the thesis.

4.1.1 Taking minimal context

4.1.1.1 Type I metaphors   Type I or copula metaphors are essentially instances of deliberate metaphor use, as the expressions cannot be processed without constructing a mapping between the concepts denoted by the subjects and the objects respectively. As is mentioned earlier, deliberate metaphors can be either novel or conventional (see section 2.1). Since conventional metaphors tend to be realised by indirect metaphors, which usually do not require online metaphor processing (e.g., Steen, Dorst, Herrmann, Kaal, and Krennmayr 2010), computational models that can extract copula metaphors that are realisations of conventional metaphors can provide useful instances of language use for research on the relationship between linguistic metaphors and metaphor processing. Krishnakumaran and Zhu (2007) use the hyponymy relationships stored in WordNet (Fellbaum 1998) for the identification of copula metaphors. If WordNet confirms a hyponymy relationship between the subject and the object, the subject-be-object relation will be marked ‘normal’. While this approach is in accord with the idea that copula metaphors are essentially ‘false’ hyponymy relationships, it is, strictly speaking, the hyponymy relationship between the contextual meaning of the target-domain noun and the more basic meaning of the source-domain noun (compared to its contextual meaning) that needs to be searched for.18 Consider the following two copula metaphors:

(7) The woman is a cat.

(8) My cat is a tiger. (Neuman et al. 2013)

In (7), the subject woman takes the sense woman.n.01 ‘an adult female person’ in WordNet19; and the object cat takes the sense cat.n.03 ‘a spiteful woman gossip’. The algorithm of Krishnakumaran and Zhu (2007) will mark the expression non-metaphorical, as there is a hyponymy relationship between the two senses in WordNet. However, a more basic meaning can be identified for cat referring to a woman who gossips a lot: cat.n.01 ‘feline mammal usually having thick soft fur and no ability to roar’, which does not have a hyponymy relationship with woman; it is also based on this relation that (7) is regarded as an instance of conventional metaphor. The other example, (8), on the other hand, is a novel metaphor use, as the contextual meaning of the object tiger can be interpreted as ‘a fierce creature’, which is not an entrenched sense of the noun. The algorithm, however, will falsely mark the expression non-metaphorical, based on the hyponymy relationship between tiger.n.02 ‘large feline of forests in most of Asia having a tawny coat with black stripes’ and big cat.n.01 ‘any of several large cats typically able to roar and living in the wild’. While tiger.n.02 is the more basic meaning of the object in the context, big cat.n.01 is not the contextual meaning taken by the subject cat. The sense of the subject should be cat.n.01, which does not have a hyponymy relationship with tiger. The method of Krishnakumaran and Zhu (2007) will therefore result in false negatives when dealing with words with entrenched metaphorical senses, the ultimate reason being that it does not involve interpretation of contextual and more basic meanings. As is exemplified by the two copula metaphors above, the false negatives will contain both novel and conventional metaphors.

18 Since the subject is assumed to be used literally, there is not a more basic meaning compared to its contextual meaning; the contextual meaning is basic in itself. Nonetheless, the meaning to be examined for the subject should still be the contextual meaning: for a noun that has more than one literal use, it is the literal meaning that is brought out by the context (i.e., the object) that should be compared with the basic meaning of the object.

19 The senses were extracted using nltk (Bird et al. 2009), presented in the format synset name followed by the gloss.
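The WordNet lookup underlying this behaviour can be sketched with nltk (the library mentioned in footnote 19) as follows; this is my own illustration of the all-senses hyponymy check and its insensitivity to contextual meaning, not the original system.

```python
# A minimal sketch of a WordNet hyponymy check over all noun senses of a
# subject-object pair, in the spirit of Krishnakumaran and Zhu (2007).
# Requires the WordNet data: nltk.download('wordnet').
from nltk.corpus import wordnet as wn

def in_hyponymy_relation(subj_lemma: str, obj_lemma: str) -> bool:
    """True if any noun sense of one word lies in the hypernym closure of any
    noun sense of the other (or the two share a sense)."""
    for s in wn.synsets(subj_lemma, pos=wn.NOUN):
        for o in wn.synsets(obj_lemma, pos=wn.NOUN):
            s_hypernyms = set(s.closure(lambda x: x.hypernyms()))
            o_hypernyms = set(o.closure(lambda x: x.hypernyms()))
            if s == o or s in o_hypernyms or o in s_hypernyms:
                return True
    return False

# Because every sense pair is tried, the entrenched sense cat.n.03 ('a spiteful
# woman gossip') links 'cat' to 'woman', so 'The woman is a cat' is marked
# 'normal' -- the false negative discussed above.
print(in_hyponymy_relation("woman", "cat"))
```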

The method used by Krishnakumaran and Zhu (2007) is adopted by Su et al. (2017) as one of the criteria for metaphor identification: a subject-be-object relation will be regarded as metaphorical if 1) there is no hyponymy relationship between the two nouns and 2) the cosine similarity of their embeddings is smaller than the threshold. This method also does not deal with the different uses of the same word. Since semantic relatedness is employed as an additional criterion, it does not resolve the false negatives of Krishnakumaran and Zhu (2007). On the contrary, the additional criterion may increase false negative cases: the source-domain and target-domain word pairs that are typically used to realise a highly conventionalised metaphor (e.g., money and time) are likely to be used in similar contexts due to the established systematic mappings between the two conceptual domains (consider expressions such as I have used up my time and I have a lot of time, where the concept time is essentially perceived as a valuable resource). The high semantic relatedness of these word pairs may therefore result in expressions of conventional metaphors such as time is money being falsely classified into the non-metaphorical category. Nonetheless, it might be possible to resolve the problem by including such word pairs in the training data, provided that the cosine similarity of word pairs exhibiting conventionalised metaphorical relations is distinguishable from that of word pairs with hyponymy relationships.
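The additional semantic-relatedness criterion can be sketched as follows (an illustration with placeholder vectors and an arbitrary threshold, not the authors' code): a subject-object pair is only marked metaphorical if the nouns are neither in a hyponymy relation nor too similar in embedding space.

```python
# A minimal sketch of the two-part criterion described above: no hyponymy link
# AND low embedding similarity. The vectors, the hyponymy flag, and the
# threshold are illustrative placeholders.
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def is_copula_metaphor(subj_vec: np.ndarray, obj_vec: np.ndarray,
                       in_hyponymy_relation: bool, threshold: float = 0.4) -> bool:
    if in_hyponymy_relation:                      # criterion 1
        return False
    return cosine(subj_vec, obj_vec) < threshold  # criterion 2

rng = np.random.default_rng(0)
time_vec, money_vec = rng.normal(size=300), rng.normal(size=300)
# With real embeddings, highly conventional pairs such as (time, money) tend to
# have high cosine similarity and may therefore be missed, as discussed above.
print(is_copula_metaphor(time_vec, money_vec, in_hyponymy_relation=False))
```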

Neuman et al. (2013) developed an algorithm called CCO** for identifying copula metaphors, based on the CCO and the CCO* algorithms for identifying Type II and Type III metaphors respectively (see section 4.1.1.2). The algorithm begins with the WordNet-based categorisation of nouns provided by the WordStat categorisation dictionary, which is basically in accord with the categorisation provided by the WordNet lexicographer files20. The algorithm marks the target relation metaphorical if there is no overlap between the categories of the subject and the object. If there is overlap, the algorithm will search ConceptNet (Speer et al. 2017) to see whether the main categories of the two nouns also overlap; if not, the subject-be-object relation will be considered metaphorical despite the WordStat categories that the two nouns share. If the main categories of the two nouns are also the same, the algorithm will continue and check whether there is overlap between the 100 most concrete nouns that are associated with the two nouns respectively. The algorithm will only mark the subject-be-object relation literal if it detects overlap in this step.
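Schematically, the cascade just described can be written as follows; the three lookups are passed in as plain dictionaries with invented entries, whereas the actual algorithm queries WordStat, ConceptNet, and a collocation resource.

```python
# A schematic sketch of the CCO** decision cascade described above. The three
# lookup tables below contain invented placeholder entries.
def cco_star_star(subj: str, obj: str,
                  wordstat_categories: dict,
                  main_category: dict,
                  concrete_associates: dict) -> str:
    # Step 1: no shared WordStat category -> metaphorical.
    if not wordstat_categories[subj] & wordstat_categories[obj]:
        return "metaphorical"
    # Step 2: shared category but different ConceptNet main categories -> metaphorical.
    if main_category[subj] != main_category[obj]:
        return "metaphorical"
    # Step 3: literal only if the most concrete associated nouns overlap.
    if concrete_associates[subj] & concrete_associates[obj]:
        return "literal"
    return "metaphorical"

print(cco_star_star(
    "cat", "tiger",
    wordstat_categories={"cat": {"animal"}, "tiger": {"animal"}},
    main_category={"cat": "feline", "tiger": "an animal"},
    concrete_associates={"cat": {"fur", "whisker"}, "tiger": {"stripe", "jungle"}},
))  # -> "metaphorical" at step 2, as in the cat/tiger example discussed below
```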

By consulting ConceptNet, the metaphoricity of the woman is a cat and my cat is a tiger will be detected successfully, as the main categories of woman, cat, and tiger are adult, feline, and an animal respectively. It should be noted, however, that the main categories are determined (by ConceptNet) prior to the metaphor identification task, independent of the context of the candidate expressions. In other words, like Krishnakumaran and Zhu (2007), the algorithm of Neuman et al. (2013) does not interpret the contextual and more basic meanings either. It could resolve some of the false negatives because it is assumed that the metaphoricity of candidate subject-be-object relations can be determined with minimal context, which requires that the basic meanings of the subject and the object are their most common uses. It is this assumption that helps to simplify the task for models that rely on minimal context, and makes it possible for CCO** to use ConceptNet for word-sense disambiguation.

Also note that while consulting ConceptNet helps to reduce some of the false negatives resulting from the method of Krishnakumaran and Zhu (2007), the function of the preceding and the following steps is not as clear. The function of the preceding step with WordStat is not clear because words whose main categories are different are unlikely to be in the same WordStat category. The last steps that make use of concreteness information and collocations are not clearly motivated. It is therefore unclear whether or how these steps help to further reduce incorrect classification. To sum up, the use of ConceptNet in the CCO** algorithm essentially serves as a word-sense disambiguation process, which helps to resolve some of the false negatives of Krishnakumaran and Zhu (2007). However, it might be possible to simplify the algorithm if it does not need to resemble the CCO and CCO* algorithms for Type II and Type III metaphors respectively.

Instead of using existing lexical resources, Li et al. (2013) built a hyponymy and a metaphor knowledge base, and proposed a method to identify metaphors with the help of the two knowledge bases. The hyponymy knowledge base (W. Wu et al. 2012) contains 16 million pairs of noun phrases exhibiting a hyponymy relationship, which were obtained using Hearst patterns (Hearst 1992). Each pair is associated with a set of probabilistic scores (e.g., how typical the paired noun phrases are of each other). The metaphor knowledge base consists of close to 3.6 million pairs of noun phrases with metaphorical relationships, each pair being associated with a probability score. The metaphorical pairs were obtained from similes (e.g., life is like a boat) as well as is-a patterns not found in the hyponymy knowledge base.

The process for identifying copula metaphors is as follows: if the candidate target-source pair already exists in the hyponymy or the metaphor knowledge base, it is classified into the literal or metaphorical category directly; otherwise, the algorithm compares the probabilities that the candidate represents a metaphorical and a hyponymy relationship, by searching the knowledge bases for noun phrases that have a metaphorical relationship with a member of the candidate pair and a hyponymy relationship with the other member. For example, if the hyponymy knowledge base has (Ferrari, sports car) and the metaphor knowledge base has (sports car, beast), then the expression my Ferrari is a beast is very likely metaphorical. As is pointed out for the method of Neuman et al. (2013), the candidate expressions, as well as the expressions stored in the two knowledge bases, will only avoid ambiguity if the basic meanings of the nouns in the context are their most common uses. Therefore, like Neuman et al. (2013), this method can resolve some of the false negatives of Krishnakumaran and Zhu (2007) without direct word-sense disambiguation.
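The back-off lookup can be sketched as follows; the two knowledge bases are reduced to tiny sets of lowercased noun-phrase pairs and the probability comparison is simplified to a single boolean check, so this only illustrates the shape of the method.

```python
# A rough sketch of the knowledge-base lookup described above, with tiny
# placeholder knowledge bases (the real ones hold millions of scored pairs).
hyponymy_kb = {("ferrari", "sports car"), ("sports car", "car")}
metaphor_kb = {("sports car", "beast"), ("life", "boat")}

def classify_copula(target: str, source: str) -> str:
    if (target, source) in hyponymy_kb:
        return "literal"
    if (target, source) in metaphor_kb:
        return "metaphorical"
    # Back-off: look for an intermediate noun phrase, e.g. (ferrari, sports car)
    # in the hyponymy KB plus (sports car, beast) in the metaphor KB supports
    # 'my Ferrari is a beast' being metaphorical.
    for kb_target, intermediate in hyponymy_kb:
        if kb_target == target and (intermediate, source) in metaphor_kb:
            return "metaphorical"
    return "unknown"

print(classify_copula("ferrari", "beast"))  # -> "metaphorical"
```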

4.1.1.2 Type II and Type III metaphors   Like Type I metaphors, Type II and Type III metaphors can be either conventional or novel. For example, monosemous adjectives and verbs all have the potential to be used in a completely novel context, thus resulting in novel metaphors. Unlike Type I metaphors, however, the deliberateness of Type II and Type III metaphors may require more context to determine. Type II and Type III metaphor identification is also trickier than Type I metaphor identification. For Type I metaphors, it can be assumed prior to metaphor identification that the subject is literal and it is the metaphoricity of the object that needs to be predicted. For Type II and Type III metaphors, however, both the noun(s) and the verb or adjective could be metaphorically used. Compare the following pairs of expressions:

(9) a. My car drinks gasoline.
    b. She married a brick. (Wilks et al. 2013)
(10) a. sweet person (TSV)
     b. academic gap (TSV)

In both pairs, the verb or adjective in the former expression (drinks and sweet) is metaphorically used, whereas it is a noun in the latter expression (brick and gap) that is metaphorically used. Therefore, it could be problematic to assume beforehand that the nouns always take their literal senses.

To identify Type II and Type III metaphors, Krishnakumaran and Zhu (2007) used bigram counts or co-occurrence frequency in addition to the WordNet hyponymy relationships that are also used for identifying Type I metaphors (see section 4.1.1.1). An adjective-noun or verb-noun relation is marked ‘normal’ if the noun has a hyponymy relationship with one or more of the nouns that frequently collocate with the adjective or verb. The problem with the identification of Type I metaphors is also present in this method: since the method does not involve word-sense disambiguation, it will result in false negatives when dealing with words with entrenched metaphorical meanings. For example, since there is a hyponymy relationship between woman and cat, the method is likely to consider personifications of cat literal.

Additionally, the method of Krishnakumaran and Zhu (2007) adopts the SPV view of metaphor use and assumes that the most frequent uses of a word are always literal (see section 2.2), which could also lead to false positives and false negatives when it comes to words whose most frequent uses are metaphorical. For example, the method is unlikely to detect the metaphoricity of bright colours, in which the contextual meaning of the adjective can be associated with a more basic, but less frequently used meaning that refers to the brightness of light, as in bright sunlight.[21] While overlooking highly conventionalised metaphorical expressions like this might be acceptable for NLP tasks, this means that the method might not be suitable for extracting stimuli for experiments that aim to investigate the processing of conventional metaphors.

[21] This example is in accord with the metaphor annotation of VUAMC (see section 3), in which the adjective bright in expressions such as bright blue eyes is marked as a metaphor-related word. The frequency information is from the Macmillan dictionary: www.macmillandictionary.com/dictionary/british/bright_1.

Turney et al. (2011) utilised concreteness information for automated metaphor identification, based on the observation that many metaphorical mappings are between a concrete domain and an abstract domain. To identify Type III metaphors in which the adjective is concrete, the method looks up the concreteness of the noun, and marks the adjective-noun relation metaphorical if the noun is abstract. This method is unlikely to falsely classify bright sunlight into the metaphor category like Krishnakumaran and Zhu (2007) might do, as it is not based on co-occurrence frequency and the concreteness of the two words is similar.[22] The method can also be improved fairly easily by comparing the concreteness of the adjective and the noun instead of assuming that the adjective is concrete. Nonetheless, it will overlook metaphorical mappings between domains with similar levels of concreteness, including both false negatives (e.g., bright colours) and true positives (e.g., light taste) that might result from Krishnakumaran and Zhu (2007). Also note that concrete and abstract words may be processed in different ways in the brain (see Mkrtychian et al. 2019). If an experiment exclusively uses metaphorical expressions that are realisations of concrete-abstract domain mappings, the generalisability of the findings should be considered with caution.

[22] The concreteness ratings (ranging from 100 to 700) of bright and sunlight are 473 and 515 respectively, according to the MRC Psycholinguistic Database (https://websites.psychology.uwa.edu.au/school/MRCDatabase/uwa_mrc.htm), from which Turney et al. (2011) obtained the abstractness ratings used in their experiment.
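A minimal sketch of this concreteness heuristic follows; the rating table is a hypothetical toy lookup on the 100-700 MRC scale (only the values for bright and sunlight are taken from footnote 22, the others are invented), and the cut-off value is likewise an assumption.

```python
# Hypothetical concreteness ratings on the MRC 100-700 scale.
concreteness = {"bright": 473, "sunlight": 515, "dark": 450, "thought": 300}

ABSTRACT_CUTOFF = 400  # assumed boundary between abstract and concrete nouns

def adjective_noun_is_metaphorical(adjective, noun):
    """Mark the pair metaphorical if the (assumed concrete) adjective modifies an abstract noun."""
    return concreteness.get(noun, ABSTRACT_CUTOFF) < ABSTRACT_CUTOFF

print(adjective_noun_is_metaphorical("dark", "thought"))     # True: abstract noun
print(adjective_noun_is_metaphorical("bright", "sunlight"))  # False: concrete noun
```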

Assaf et al. (2013) proposed the Concrete Category Overlap (CCO) algorithm, which uses WordNet categories, co-occurrence frequency, and concreteness information for the detection of Type III metaphors. An adjective-noun relation is marked literal if, according to the WordStat dictionary (see section 4.1.1.1), the categories of the noun overlap with the categories of the most concrete nouns that are frequently modified by the adjective (co-occurrence frequency is examined before concreteness). This method is an improvement compared to the method of Turney et al. (2011), as it is likely to detect the metaphoricity of concrete-concrete mappings such as light taste. Nonetheless, it is not necessarily also an improvement on the method of Krishnakumaran and Zhu (2007). The WordStat dictionary is based on the categorisation of words provided by the WordNet lexicographer files, which in turn is based on the uses of words in all contexts (e.g., cat as a noun belongs to four categories: animal, person, artefact, and act). The use of the WordStat dictionary in CCO (and its derivatives) is thus similar to the use of WordNet hyponymy relationships by Krishnakumaran and Zhu (2007) (see section 4.1.1.1) in that neither method differentiates the different uses of the same word. The false negatives (e.g., bright colours) and false positives (e.g., bright sunlight) that may result from the method of Krishnakumaran and Zhu (2007) are due to its dependence on co-occurrence frequency. Since the same criterion is employed in CCO, it is unlikely to resolve those false judgements.
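The sketch below approximates the CCO decision rule; wordstat_categories, top_collocates, and concreteness are hypothetical stand-ins for the WordStat dictionary, corpus co-occurrence counts, and an abstractness rating resource, and the data in the demonstration is invented.

```python
def cco_is_literal(adjective, noun, wordstat_categories, top_collocates, concreteness,
                   n_collocates=20, n_concrete=5):
    """Literal if the noun's categories overlap with the categories of the most
    concrete nouns frequently modified by the adjective."""
    # Co-occurrence first: the nouns most frequently modified by the adjective ...
    collocates = top_collocates(adjective, n_collocates)
    # ... then concreteness: keep only the most concrete of those collocates.
    most_concrete = sorted(collocates, key=concreteness, reverse=True)[:n_concrete]
    reference_categories = set().union(*(wordstat_categories(c) for c in most_concrete))
    return bool(wordstat_categories(noun) & reference_categories)

# Toy demonstration with invented data: 'light taste' comes out non-literal.
print(cco_is_literal(
    "light", "taste",
    wordstat_categories=lambda w: {"taste": {"cognition"},
                                   "bag": {"artifact"},
                                   "colour": {"attribute"}}.get(w, set()),
    top_collocates=lambda adj, n: ["bag", "colour"],
    concreteness=lambda w: {"bag": 600, "colour": 450}.get(w, 300),
))
```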

Based on the CCO algorithm, Neuman et al. (2013) developed the CCO* algorithm for the identification of Type II metaphors. After consulting the WordStat categorisation dictionary for overlapping categories between the object in the subject-verb-object relation and the most concrete nouns that most frequently collocate with the verb, the algorithm only marks the relation literal if the main categories (obtained from ConceptNet) of the object and the concrete nouns also overlap. Similar to the comparison with the method of Krishnakumaran and Zhu (2007) for Type I metaphors, the use of ConceptNet by Neuman et al. (2013) can help to resolve some of the false negatives, when the nouns in the candidate verb-noun relation are used literally and all the words have only one literal use, which corresponds to their main categories in ConceptNet. In other words, the algorithm may lead to false positives for words with multiple literal uses, and it should be verified for the true positives whether it is the verbs that are used metaphorically.

Also note that Neuman et al. (2013) exclude monosemous adjectives and verbs in their CCO and CCO* algorithms. This will lead to false negatives in the case of novel metaphors realised by the use of monosemous words.

Wilks et al. (2013) attempted to deal with Type II metaphors that are realisations of conventional metaphors. The methods rely on the selectional restrictions of verb classes specified in VerbNet (Schuler 2005) and the mappings between VerbNet and WordNet word senses. The basic idea is that a verb and its arguments realise a conventional metaphor if the nouns satisfy the selectional restrictions of the verb, but the satisfying sense is not the primary sense of the verb or the nouns in WordNet. The methods are thus based on two assumptions: word senses are ordered according to primacy in WordNet, and the first sense is always the literal sense. Apart from the fact that words can have multiple literal uses, another problem with these assumptions is that, according to WordNet itself, the ordering of word senses should be considered random (Princeton University 2010). It is, strictly speaking, not entirely valid to assume a particular role for the first senses in the resource.

(20)

Wilks et al. (2013) addressed the cases where the noun in a verb-noun relation is metaphorical (e.g., she married a brick) and proposed separate methods that deal with metaphorically used verbs and nouns in verb-noun constructions respectively. Both methods take as input a verb and its arguments. To identify metaphorically used verbs, the system looks for the sense of the target verb that belongs to the VerbNet class whose selectional restrictions are satisfied by the arguments. If it is not the first sense of the verb in WordNet, the verb is a realisation of a conventional metaphor. To identify metaphorically used nouns, the system first checks whether the noun arguments satisfy the selectional restrictions of the verb; if not, the expression is what they call ‘a Preference Violation metaphor’. If the selectional restrictions are satisfied, a noun argument is a realisation of a conventional metaphor if the sense that satisfies the selectional restrictions is not the first sense of the noun in WordNet.
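The following sketch outlines the verb-oriented variant of this first-sense test. The helpers verbnet_class_satisfied_by and wordnet_sense_for_class are hypothetical stand-ins for the VerbNet selectional-restriction check and the VerbNet-to-WordNet sense mapping; only the first-sense comparison uses a real resource (WordNet via NLTK).

```python
from nltk.corpus import wordnet as wn

def classify_verb_use(verb, arguments, verbnet_class_satisfied_by, wordnet_sense_for_class):
    """Label a verb-argument construction following the first-sense logic described above."""
    vn_class = verbnet_class_satisfied_by(verb, arguments)  # hypothetical helper
    if vn_class is None:
        # No VerbNet class of the verb has selectional restrictions satisfied by the
        # arguments; in the noun-oriented variant this is a 'Preference Violation metaphor'.
        return "preference violation metaphor"
    verb_synsets = wn.synsets(verb, pos=wn.VERB)
    if not verb_synsets:
        return "unknown verb"
    first_sense = verb_synsets[0]
    satisfying_sense = wordnet_sense_for_class(verb, vn_class)  # hypothetical helper
    # If the sense that satisfies the restrictions is not WordNet's first sense,
    # the verb is taken to realise a conventional metaphor.
    return "literal" if satisfying_sense == first_sense else "conventional metaphor"
```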

The study of Wilks et al. (2013) is distinguishable from other studies that deal with Type II metaphors in that it addresses metaphorical verb-noun constructions in which the noun rather than the verb is metaphorically used. However, it has to be assumed for the identification of verb metaphors that the noun arguments are used literally, and for the identification of noun metaphors that the verb is used literally. If it can be decided beforehand that the verb or the nouns are used literally, however, there is no need for a metaphor identification system in the first place. Ultimately, the methods of Wilks et al. (2013), like the other methods that assume nouns are used literally, can only detect metaphoricity at phrase or sentence level; it remains a question which words are used metaphorically.

Li et al. (2013) use the selectional preferences of adjectives and verbs in addition to their hyponymy and metaphor knowledge bases (see section 4.1.1.1) for the identification of Type II and Type III metaphors. In particular, if the adjective or verb in the candidate relation prefers another noun phrase that, according to the metaphor knowledge base, has a metaphorical relationship with the noun phrase in the relation, the candidate relation is metaphorical. Since selectional preference is based on co-occurrence frequency, the method may not be able to resolve false negatives like bright colours, which are conventionalised metaphorical expressions that are more frequent than the literal uses of the adjective or verb. Additionally, since the method is based on a corpus of Type I metaphors, which are essentially deliberate metaphor uses (see section 4.1.1.1), the method may overlook realisations of metaphorical mappings that are less likely to be used deliberately.

Shutova and Sun (2013) also obtain potential metaphorical mappings before searching for linguistic metaphors. The system begins with building a hierarchical categorisation of nouns by means of hierarchical graph factorisation clustering (Yu et al. 2006), using grammatical relations or verb-noun relations as features. The system then identifies metaphorical mappings between a given source-domain noun and five target-domain clusters. This is done by searching the graph for the top six clusters that are associated with a given source-domain noun and excluding the cluster to which the input noun belongs (i.e., the literal cluster). Based on the identified metaphorical mappings, the system collects as source-domain verbs the verbs that contribute the most to the identified metaphorical associations and form literal verb-noun relations with the input source-domain noun. The verb-noun relations in the text corpus in which the verb is one of the source-domain verbs and the noun belongs to one of the target-domain clusters will be identified as realisations of the identified metaphorical mappings. Just as Li et al. (2013) tend to overlook non-deliberate metaphor uses, the system of Shutova and Sun (2013) may overlook Type II metaphors based on metaphorical mappings that have not been realised by verb-noun relations. Since metaphorical mappings are between conceptual domains instead of individual words, metaphorical mappings that have not been realised by verb-noun relations could, in principle, be realised by such relations as well. This means that the system may overlook some instances of novel metaphor use.

Shutova, Kiela, et al. (2016) proposed two methods, WordCos and PhrasCos, that use cosine similarity between word or phrase embeddings for predicting the metaphoricity of adjective-noun or verb-noun pairs; a word pair is marked metaphorical if the cosine similarity is below a trained threshold. The WordCos method calculates the cosine similarity between the pretrained word embeddings of the two words. The rationale is that co-occurring words having a metaphorical relation are less similar than co-occurring words having a literal relation. The PhrasCos method calculates the cosine similarity between the embeddings of the phrase and the individual words. The rationale is that metaphor gives rise to emergent meaning (e.g., Johnson 1981); compared to literal phrases, the meaning of a metaphorical phrase should be less similar to the combination of the individual words. Shutova, Kiela, et al. (2016) also experimented with fusing visual representations with linguistic representations, which improved the performance of the model for both Type II and Type III metaphors. While the rationale behind the methods captures important aspects of metaphor as defined in CMT, it should be noted that pretrained word embeddings, which usually capture the most common uses of the words, do not necessarily denote literal word uses. Therefore, this method is also likely to overlook highly conventionalised, frequently used metaphorical expressions.
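A minimal sketch of the two similarity tests is given below, assuming pretrained embeddings are available as numpy vectors; the threshold values are arbitrary placeholders for the trained thresholds, and summarising PhrasCos with the minimum of the two word-level comparisons is an implementation choice made here for illustration.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def word_cos_is_metaphorical(emb_word1, emb_word2, threshold=0.2):
    """WordCos: metaphorical if the two word embeddings are sufficiently dissimilar."""
    return cosine(emb_word1, emb_word2) < threshold

def phras_cos_is_metaphorical(emb_phrase, emb_word1, emb_word2, threshold=0.3):
    """PhrasCos: metaphorical if the phrase embedding diverges from its component words."""
    return min(cosine(emb_phrase, emb_word1), cosine(emb_phrase, emb_word2)) < threshold

# Toy usage with random vectors standing in for pretrained embeddings.
rng = np.random.default_rng(42)
adj, noun, phrase = rng.normal(size=300), rng.normal(size=300), rng.normal(size=300)
print(word_cos_is_metaphorical(adj, noun), phras_cos_is_metaphorical(phrase, adj, noun))
```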

Taking the methods of Shutova, Kiela, et al. (2016) as the starting point, Rei et al. (2017) proposed a supervised similarity network, which learns a measure of the semantic relatedness of the components of candidate word pairs that is specific to the task of metaphor identification. Taking an adjective-noun or verb-noun pair represented by pretrained word embeddings as input, the network first uses a gating function to obtain a representation of the noun with respect to the word embedding of the adjective or verb. The two word representations are then mapped onto a task-specific vector space, where a weighted cosine similarity is calculated and compared to a threshold. The method of Rei et al. (2017) addresses the problem of relying on co-occurrence frequency and resulted in better performance of their models compared to those of Shutova, Kiela, et al. (2016), without the use of visual representations. However, it is also less transparent what the models of Rei et al. (2017) are comparing and how different aspects of word meaning contribute to the calculation of cosine similarity. A closer examination of the weights in the trained models might reveal more about the improved performance and the relationship between vector space word representations and metaphor identification.
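The sketch below approximates the idea of a gated, task-specific similarity rather than reproducing the published architecture: the noun vector is re-weighted conditioned on the adjective or verb, both vectors are projected into a task-specific space, and a learned per-dimension weighting enters the cosine computation. All weights and the threshold are random or arbitrary placeholders that would normally be trained on labelled metaphor data.

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_task = 100, 50
W_gate = rng.normal(size=(d, 2 * d)) * 0.01   # gating conditioned on both words
W_map = rng.normal(size=(d_task, d)) * 0.01   # projection into the task-specific space
w_sim = np.abs(rng.normal(size=d_task))       # per-dimension similarity weights
threshold = 0.0                               # decision threshold (tuned during training)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_similarity(adj_vec, noun_vec):
    gate = sigmoid(W_gate @ np.concatenate([adj_vec, noun_vec]))
    noun_gated = gate * noun_vec                  # noun representation w.r.t. the adjective/verb
    a, b = W_map @ adj_vec, W_map @ noun_gated    # task-specific projections
    weighted_dot = np.sum(w_sim * a * b)          # weighted cosine numerator
    return weighted_dot / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

adj, noun = rng.normal(size=d), rng.normal(size=d)
print("metaphorical" if gated_similarity(adj, noun) < threshold else "literal")
```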

Since Shutova, Kiela, et al. (2016) and Rei et al. (2017) use trained thresholds to determine the metaphoricity of candidate word pairs, both methods have the potential to differentiate different types of metaphors (e.g., conventional versus novel metaphors) by training separate models using datasets that manifest different criteria for metaphor identification. Note, however, that the performance of the models could vary with the metaphor type to be detected, which could also reveal interesting facts about the differences between metaphor types with respect to the distributional behaviour of the linguistic realisations.

4.1.1.3 Summary The three types of linguistic metaphors defined by Krishnakumaran and Zhu (2007) do not cover all types of metaphor uses in natural discourse; a system that detects Type I, Type II, and Type III metaphors may therefore not be ideal for linguistic analysis. Nonetheless, the detected examples should suffice as stimuli in experiments.

Among the methods dedicated to the identification of Type I or copula metaphors, the CCO** algorithm proposed by Neuman et al. (2013) and the method of Li et al. (2013), which is equipped with hyponymy and metaphor knowledge bases, are likely to handle nouns with entrenched metaphorical meanings correctly (i.e., when dealing with expressions like the woman is a cat and my cat is a tiger). Nonetheless, it should be noted that the methods are not designed to exhaust all the copula metaphors in a text corpus. Instances whose metaphoricity needs to be considered in a longer range of context are likely to be left out.

The methods for detecting Type II and Type III metaphors typically assume the SPV view of metaphor. Highly conventionalised metaphor uses such as bright colours are likely to be overlooked. Also note that for Type II and Type III metaphors, both the verbs or adjectives and the nouns can be metaphorically used. Most methods, however, seem to assume that the nouns are literally used; Wilks et al. (2013) explicitly addressed the problem, but their method does not seem to resolve it. The methods therefore essentially detect metaphors at the phrase rather than the word level.

A common issue of the explicit models examined so far is that they acknowledge and incorporate the distinctness of target and source domains, but tend to overlook the relatedness of the two domains that exhibit a metaphorical relation. As a result, the extracted linguistic phenomena may include anomalous or nonsensical expressions (e.g., colourless green ideas sleep furiously). Wilks et al. (2013) might be able to differentiate anomalous expressions from realisations of conventional metaphors, but the ‘Preference Violation Metaphors’ might be a mixture of novel metaphors and anomalies. For models that use a threshold to determine metaphoricity (Rei et al. 2017; Shutova, Kiela, et al. 2016; Su et al. 2017), the issue might be alleviated by using a lower threshold together with the existing upper threshold.

4.1.2 Taking the entire sentence as context

The first explicit models that take into account a longer range of context were proposed by Turney et al. (2011). Unlike the detection of metaphorical adjective-noun relations, which only considers the concreteness of the noun (see section 4.1.1), Turney et al. (2011) detect metaphorically used concrete verbs by considering the concreteness of all of the content words in the same sentence. The input sentence is represented by a five-dimensional vector, the dimensions being the average abstractness ratings of all the nouns, proper nouns (the nouns and the proper nouns are mutually exclusive), verbs (excluding the target verb), adjectives, and adverbs in the sentence respectively. While the prediction considers properties of more context words, it is still based solely on concreteness information. Metaphor uses that might be overlooked by the method include personifications such as my car drinks gasoline, as animals and objects (which function as target domains in such metaphors) are usually as concrete as or even more concrete than human actions or events (which the source-domain verbs refer to). It is therefore essential to examine the domains associated with the words even when more context words are taken into account.
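A sketch of this five-dimensional feature vector is given below; it assumes a POS-tagged input sentence with Penn Treebank tags and a hypothetical abstractness lookup (0 = concrete, 1 = abstract), and the neutral default used for empty categories is also an assumption.

```python
NEUTRAL = 0.5  # assumed default when a sentence contains no words of a given category

def sentence_features(tagged_sentence, target_verb, abstractness):
    """tagged_sentence: list of (token, Penn Treebank tag) pairs."""
    buckets = {"noun": [], "proper": [], "verb": [], "adj": [], "adv": []}
    for token, tag in tagged_sentence:
        word = token.lower()
        if tag in ("NNP", "NNPS"):                          # proper nouns
            buckets["proper"].append(word)
        elif tag.startswith("NN"):                          # common nouns
            buckets["noun"].append(word)
        elif tag.startswith("VB") and word != target_verb:  # verbs other than the target
            buckets["verb"].append(word)
        elif tag.startswith("JJ"):                          # adjectives
            buckets["adj"].append(word)
        elif tag.startswith("RB"):                          # adverbs
            buckets["adv"].append(word)

    def mean_abstractness(words):
        rated = [abstractness[w] for w in words if w in abstractness]
        return sum(rated) / len(rated) if rated else NEUTRAL

    return [mean_abstractness(buckets[k]) for k in ("noun", "proper", "verb", "adj", "adv")]

# Toy usage with invented ratings.
tagged = [("My", "PRP$"), ("car", "NN"), ("drinks", "VBZ"), ("gasoline", "NN")]
print(sentence_features(tagged, "drinks", {"car": 0.1, "gasoline": 0.2}))
```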

The system proposed by Mao et al. (2018) identifies a word as being metaphorically used if its contextual meaning within the sentence differs from its most common meaning. The most common meaning is represented as the pretrained word embedding of the target word itself. The contextual meaning is represented as the pretrained word embedding of the candidate word that is the most similar to the context words. The candidate
