Academic year: 2021

The relation between responsivity and neg-raising

Layout: typeset by the author using LaTeX.

Kaj D. Meijer
10509534

Bachelor thesis
Credits: 18 EC

Bachelor Kunstmatige Intelligentie
University of Amsterdam
Faculty of Science
Science Park 904
1098 XH Amsterdam

Supervisor
Dr. C. Qing
Institute for Logic, Language and Computation
Faculty of Science
University of Amsterdam
Science Park 907
1098 XG Amsterdam


Abstract

There is an ongoing debate regarding clause-embedding verbs and the relation between responsivity and neg-raising. This research aims to provide another approach to this debate. Thresholds for responsivity and neg-raising have been used to gather responsive neg-raising predicates from the MegaNegraising and MegaAcceptability datasets.

The responsive neg-raising predicate decide has been found, depending on the thresholds. However, the neg-raising evaluation shows that, although decide has a high neg-raising likelihood, it is not neg-raising according to White’s interrogative-to-declarative test, which uses strong NPIs. This evaluation highlights the difference in how neg-raising is measured in White’s and Theiler et al.’s methodologies and how the neg-raising evaluation can differ between these two methodologies.

So even though this research subject is still open for discussion, especially the measurement method for neg-raising, this thesis could not find any responsive neg-raising predicates that would indicate any relation between responsivity and neg-raising other than being mutually exclusive.


Chapter 1

Introduction

This thesis investigates English clause-embedding verbs. The focus of this thesis is on whether these clause-embedding verbs could take certain complements based on the neg-raising property of a clause-embedding verb. The term predicate will be used to refer to clause-embedding verbs.

Some predicates can take declarative complements (1a) or interrogative complements (1b) and some can even take both.

(1) a. Mary {knew, thought, *wondered} that John left.
    b. Mary {knew, *thought, wondered} whether John left.

There are also categories of clause-embedding verbs that indicate which complements the predicates can take: antirogative if they only take declarative clausal complements, rogative if they only take interrogative clausal complements and responsive if they take both declarative and interrogative clausal complements.

This thesis will also mention declarativity, a predicate’s ability to take a declarative complement, and interrogativity, a predicate’s ability to take an interrogative complement.

There are more properties to clause-embedding verbs, such as neg-raising. If a predicate is neg-raising it means that this predicate keeps the same meaning while raising the negation from the embedded or subordinate clause to the main clause. The meaning of (2b) is different from (2a), as Mary is certain in (2a) and uncertain in (2b). On the other hand, sentences (2c) and (2d) are generally thought to mean the same. This shows that know is not neg-raising and think is.

(2) a. Mary knew that John didn’t leave.
    b. Mary didn’t know that John left. (different meaning than (2a))
    c. Mary thought that John didn’t leave.
    d. Mary didn’t think that John left. (same meaning as (2c))


1.1 Theoretical context

There are several generalizations regarding clause-embedding verbs and discussions on the validity of these generalizations. These generalizations on clause-embedding verbs discuss whether these predicates should take declarative, interrogative or both complements. One important generalization about both these types of clauses and neg-raising is Zuber’s (1982) generalization: all neg-raising predicates are antirogative. Theiler et al. (2019) observed that this is indeed the case and provided a formal analysis.

Following Bartsch (1973) and Gajewski (2007), Theiler et al. (2019) use a presuppositional account where neg-raising behavior results from a so-called excluded-middle (EM) presupposition. For example, sentence (3) presupposes that Mary is opinionated.

(3) Mary believes that John left

Presupposition: Mary believes that John left or Mary believes that John didn’t leave

Even though (3) might be an obvious example, if (3) is negated it implies that Mary believes that John didn’t leave, which shows the neg-raising property of believe, exemplified in (4).

(4) Mary doesn’t believe that John left

Presupposition: Mary believes that John left or Mary believes that John didn’t leave

∴ Mary believes that John didn’t leave.
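The inference in (4) can be written out schematically (a sketch of the reasoning in the text; here \(B_m\) abbreviates “Mary believes that” and \(p\) stands for “John left”):

```latex
\begin{align*}
\text{asserted:}              &\quad \neg B_m(p)             \\
\text{EM presupposition:}     &\quad B_m(p) \lor B_m(\neg p) \\
\text{disjunctive syllogism:} &\quad \therefore\; B_m(\neg p)
\end{align*}
```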

Using this EM presupposition, Theiler et al. (2019) create a generalized EM presupposition that applies to both declarative and interrogative complements, which they used to formulate a lexical entry for believe. This generalized EM presupposition has a different effect depending on whether the complement is declarative or interrogative. The effect for declarative complements comes down to the normal version of this presupposition and for interrogative complements it leads to systematic triviality. Theiler et al. (2019) show how Gajewski’s (2002) L-analyticity can be used to explain the antirogativity of neg-raising predicates.

Gajewski’s (2002) L-analyticity, which stands for logical analyticity, appears at the level of grammar, such that L-analytical sentences are perceived as ungrammatical. A logical skeleton (LS) is needed in order to determine whether a sentence is L-analytical, because a sentence S is L-analytical just in case S’s LS receives the denotation 1 (or 0) for all interpretations in which its denotation is defined (Gajewski, 2002). An LS can be constructed by identifying all constituents that don’t contain logical items and replacing them with constants of the same type, illustrated by example (5).

(5) a. *There is every tall tree.
    b. Logical skeleton of (5a): [there is every P]

Following Barwise and Cooper (1981), it is assumed that there refers to the domain of individuals De. There isn’t any possible interpretation I such that ⟦there is every P⟧^(De, I) = 0, since I(P) ⊆ De for all I. This means that sentence (5a) is L-analytical, which explains why it is not grammatical.

It should be noted that the focus on believe was only for their discussion and that Theiler et al. (2019) stated that the presuppositional account can be extended to other neg-raising predicates.

However, White (2021) argues that the generalizations based on the assumption that think, believe, hope and fear are antirogative predicates are not viable when they are evaluated at the scale of the entire English lexicon. White (2021) used White and Rawlins’ (2016) methodology and An and White’s (2020) MegaNegraising dataset to test Zuber’s (1982) generalization. An and White’s (2020) dataset contains likelihood judgements where participants had to judge with a slider how likely it is for a speaker to mean one thing when they say something else, as seen in (6).

(6) If I were to say I don’t think that a particular thing happened, how likely is it that I actually mean that I think that that thing didn’t happen?

Later, An and White (2020) applied their method to all English verbs that take declarative complements, according to White and Rawlins’ (2016) criterion, varying the subject and tense, as shown in (7).

(7) {I {didn’t, don’t}, A particular person {didn’t, doesn’t}} think that a particular thing happened.

White (2021) assumes that a predicate is neg-raising if it is neg-raising in any of the four combinations of subject and tense, so White uses the maximum of the normalized neg-raising judgements for each predicate (An and White, 2020, Appendix D). White (2021) then plots this measure of neg-raising against White and Rawlins’ (2016) measure of responsivity and shows a linear regression weighted by the declarative-taking weight, which displays a positive correlation (White, 2021, Figure 3 (right)). Figure 1.1 shows a reconstruction of that figure, but only uses the unweighted normalized judgement scores and neg-raising scores of the NP Ved that S-frame. White (2021) suggests that this positive correlation violates the prediction of Zuber’s (1982) generalization and that the generalization is therefore false about the entire English lexicon.
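White’s (2021) per-predicate measure, taking the maximum normalized neg-raising judgement over the four subject/tense combinations, can be sketched as follows (the data layout and scores are hypothetical, standing in for the MegaNegraising entries):

```python
# White (2021): a predicate counts as neg-raising if any of the four
# subject/tense combinations is, i.e. take the per-predicate maximum.
from collections import defaultdict

# (predicate, subject, tense) -> normalized neg-raising score (toy values)
judgements = {
    ("think", "first", "present"): 0.82,
    ("think", "first", "past"): 0.74,
    ("think", "third", "present"): 0.79,
    ("think", "third", "past"): 0.71,
    ("know", "third", "past"): 0.18,
}

negraising_score = defaultdict(float)
for (predicate, _subject, _tense), score in judgements.items():
    negraising_score[predicate] = max(negraising_score[predicate], score)

print(negraising_score["think"])  # 0.82
print(negraising_score["know"])   # 0.18
```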


However, White’s (2021) suggestion isn’t necessarily true. As seen in Figure 1.1, there is indeed a positive correlation between responsivity and neg-raising, but Zuber’s (1982) generalization actually only excludes predicates from the top right corner, where responsivity and neg-raising should be at their highest, and a large area of that corner is indeed empty. White (2021) used a responsivity threshold of zero with the normalized data from White and Rawlins’ (2016) MegaAcceptability dataset to determine whether a predicate is responsive or not, but as will be discussed in the next chapter, this threshold is too low. Additionally, White (2021) did not declare any neg-raising threshold, but instead spoke of a general neg-raising likelihood.

Figure 1.1: The correlation between responsivity and neg-raising for the NP Ved that S-frame.

Therefore, this thesis will investigate which thresholds for responsivity and neg-raising would suffice and how large the area in Figure 1.1 has to be for predicates to be evaluated as responsive and neg-raising according to Theiler et al.’s (2019) neg-raising measure.

Furthermore, White (2021) states that the predicates think and believe are neg-raising and thus according to Zuber’s (1982) generalization should not take interrogative complements. However, White (2021) used corpus evidence to attest that think and believe do take interrogative complements and argues that Zuber’s (1982) generalization should be jettisoned. This thesis will also try to find corpus evidence for any responsive neg-raising predicates that follow from the thresholds.


White (2021) also comments on Theiler et al.’s (2019) note that believe does in fact take interrogative complements in some constructions, but not constructions that use polar interrogative complements, which are complements that can be answered with yes or no. Theiler et al. (2019) also note that an anonymous reviewer pointed out that believe was not neg-raising in the constructions where it took interrogative complements. White (2021) objects to the fact that particular uses of a predicate are invoked to save a generalization, and that Theiler et al. (2019) also used this to save their generalization from examples such as (8). White (2021) suggests that it is the particular uses of a predicate in some context that are relevant to the generalization, not the fact that the predicate can trigger the relevant inferences in some contexts.

(8) You won’t believe who called (*in ages)!

Therefore, the relation between responsivity and neg-raising is under heavy debate, more specifically whether neg-raising can be used as a property that predicts responsivity. There is still much more information to be added to this conversation and this thesis will address this subject by using An and White’s (2020) MegaNegraising dataset and White and Rawlins’ (2016) MegaAcceptability dataset, since these are the only quantitative data available on this subject.


1.2 Research question

The main research question is:

How are responsivity and neg-raising related?

To investigate this question, the research question has been split into the following sub-questions that form the structure of the thesis:

1. What thresholds could be used for the identification of responsive neg-raising predicates?

2. Are there responsive neg-raising predicates that can be extracted from large-scale data?

3. Can the responsivity and neg-raising measures of the extracted responsive neg-raising predicates be validated?

1.3 Hypotheses

It is expected that responsivity and neg-raising do not exclude each other, i.e., responsive predicates could also be neg-raising and neg-raising predicates could also be responsive.

The hypotheses of the sub-questions are as follows:

1. The thresholds should be high enough to exclude false positives.

2. It is expected that multiple predicates, besides believe and think, can be found that could be considered responsive and neg-raising.

3. There is at least one responsive neg-raising predicate, with an interrogative complement, that can be found in a corpus.


Chapter 2

Method

As mentioned before, An and White’s (2020) MegaNegraising dataset was used for the measure of neg-raising in this thesis. However, this dataset does not contain all necessary information to test the relation between responsivity and neg-raising, so White and Rawlins’ (2016) MegaAcceptability dataset is utilized to provide the data needed for the measure of responsivity.

First of all, responsive and neg-raising predicates needed to be identified to test the relation between responsivity and neg-raising. However, even though the MegaNegraising dataset contains participants’ judgement scores for the predicates, this data is not labeled or divided into classes that could indicate which predicates are actually neg-raising. The same holds for the MegaAcceptability dataset: only assumptions can be made about whether a predicate is antirogative, rogative or responsive. Therefore, thresholds were used due to the lack of any labels or classifications in the data. These thresholds were chosen in a way that would minimize the number of false positives.

Finally, the responsive neg-raising predicates found with these thresholds were evaluated in large corpora to confirm whether these predicates could actually take interrogative complements.

The code that was written for this thesis can be found at: https://github.com/Kajzzer/responsivity-negraising-relation.


2.1 Frames

Frames are used in both An and White’s (2020) MegaNegraising dataset and White and Rawlins’ (2016) MegaAcceptability dataset to formulate sentence structure, see (9) and (10). The frames in (9c) and (10c) were used by White and Rawlins (2018) to determine which predicates could take declarative complements, and these frames are the most simplified frames that contain either declarative or, if that is replaced by whether, interrogative complements. The frames seem very similar, besides the different predicates in (9a) and (10a), but note that in (9a) Mary does the talking herself, while in (10a) Mary is listening to someone else telling her something.

It is to be noted that frames (9c) and (9d) have the same structure, but (9c) belongs to the normalized MegaAcceptability dataset and (9d) belongs to the MegaNegraising dataset. The same holds for (10c) and (10d).

(9) a. Mary said that John left.
    b. The robber believed that he escaped.
    c. NP Ved that S
    d. NP V that S

(10) a. Mary was told that John left.
     b. The policeman was advised that the robber ran.
     c. NP was Ved that S
     d. NP be V that S

This thesis was restricted to the following frames: the declarative NP Ved that S-frame and the interrogative NP Ved whether S-frame. Note that the frames differ only in their complement. These frames were chosen because they appear in both the MegaAcceptability dataset and the MegaNegraising dataset, and this restriction allowed for results specific to these frames, without interference from the responsivity and neg-raising measures of other frames.

During the rest of this thesis, it can be assumed that all results and measures of responsivity and neg-raising belong to these frames.


2.2 Responsive Predicates

The measure of responsivity was extracted from the mega-acceptability-v1-normalized data of White and Rawlins’ (2016) MegaAcceptability dataset. The original data from this dataset contains ordinal scale acceptability judgements on the scale of [1, 7] for 1007 verbs and 50 frames. Participants were given lists of semantically bleached sentences, sentences that contain as little semantic content as possible. For example, (11a) and (11b) would both be bleached to (11c), to ensure that plausibility effects are controlled for.

(11) a. Mary thought that John left.
     b. The head of security thought that his staff properly switched shifts.
     c. Someone thought that something happened.

Furthermore, the MegaAcceptability dataset has an alarmingly low number of responses per predicate. Every predicate only has five responses from different participants that can be used to determine its responsivity, and thus this data is very sensitive to poor judgements. White and Rawlins (2020) did develop a normalization method to make the data less sensitive to poor responses and also proposed a threshold for responsivity. However, as we will see below, their data are still very noisy and their threshold is too low.

2.2.1 White and Rawlins’ normalization

In this method White and Rawlins (2020) utilized the Spearman rank correlation, which measures the strength and direction of association between two variables, to measure the agreement of participants that rated the same list of predicates, and used that measurement to weigh the participants’ responses. White and Rawlins (2020) also note that this participant agreement is low and that this is likely due to the fact that the MegaAcceptability dataset contains many low-frequency predicates that participants could be less certain about, as well as the rate of poor participants in the data.

However, White and Rawlins (2020) also mention that most participants only rated one list of predicates and that a good participant could be assigned a low quality score by the Spearman rank correlation if that same list is rated by mostly bad participants. White and Rawlins (2020) addressed this issue by fitting a linear mixed effects model with random intercepts for the participants on the Spearman rank correlations, followed by the extraction of the Best Linear Unbiased Predictors (BLUPs) for the participant intercepts. White and Rawlins (2020) then z-scored the BLUP results and squashed them through a Gaussian cumulative distribution function to [0, 1], and these were combined into a single variability score by computing their mean, weighted by the quality score of the participant who provided the rating.
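The z-score-then-squash step can be sketched with the standard normal CDF (a toy illustration of the transformation only, not White and Rawlins’ full model):

```python
# Z-score a list of values and squash through the standard normal CDF,
# mapping them to (0, 1) while preserving their order.
import math

def squash_to_unit(values):
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    # Standard normal CDF expressed via the error function.
    return [0.5 * (1 + math.erf((v - mean) / (std * math.sqrt(2))))
            for v in values]

scores = squash_to_unit([-1.2, 0.0, 0.4, 2.1])
assert all(0.0 < s < 1.0 for s in scores)   # all values land in (0, 1)
assert scores == sorted(scores)             # order is preserved
```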

The ordinal scale participant judgements were treated by White and Rawlins (2020) as interval data, where each participant is associated with a different manner of binning, defined by per-participant cutpoints. White and Rawlins (2020) created these bins of varying size, where the bin corresponding to the worst rating, 1, is the interval (-∞, cutpoint1] and the bin corresponding to the best rating, 7, is the interval (cutpoint6, ∞). White and Rawlins (2020) then estimated the acceptabilities for all verb-frame pairs and the cutpoints for the participants by using gradient descent to maximize the likelihood of the data mentioned above, together with an exponential prior on the distance between the cutpoints and a smoothing term, so that the third cutpoint is locked to zero. White and Rawlins (2020) claim that the predicates are identifiable by locking the third cutpoint to zero, which means that the responsivity threshold should be zero in the normalized data.

2.2.2 Thresholds for responsive predicates

Further investigation of the predicates around White and Rawlins’ (2020) threshold for responsivity indicated that this threshold was insufficient for this thesis. The two lowest scoring predicates for each of the declarativity and interrogativity measures have been retrieved for a manual evaluation. The relevant information in Table 2.1 is the declarativity of coach and conspire and the interrogativity of chirp and weep.

Predicate   Declarativity   Interrogativity
coach       0.001           -0.237
conspire    0.0015          1.427
chirp       0.778           0.0025
weep        1.774           0.007

Table 2.1: The two lowest scoring predicates for the declarativity and interrogativity measures that just exceed their respective thresholds.

According to White and Rawlins’ (2020) normalization, these predicates should respectively take declarative or interrogative complements. However, after a closer inspection of the bleached sentences of these predicates, (12) and (13), it became clear that this threshold is insufficient. Only sentence (12b) is a possible exception, but the other sentences indicate that this threshold could admit many false positives.

(12) a. *Someone coached that something happened.
     b. ?Someone conspired that something happened.

(13) a. *Someone chirped whether something happened.
     b. *Someone wept whether something happened.

Therefore, the threshold of White and Rawlins’ (2020) normalization is not used for this thesis, and new thresholds were created separately for the declarativity and interrogativity measures, using the standard deviation and mean of all predicates for the earlier mentioned frames. The standard deviation (STD) indicates the average amount of variability in the data, and it should be safe to assume that, with a normalized declarativity measure with the rounded range of [-3.0, 4.8] and a normalized interrogativity measure with the rounded range of [-3.3, 4.1], the STD will provide a considerably higher value than zero. White and Rawlins (2020) mention that they attempted to select every verb in English that could take a clause of some type for the MegaAcceptability dataset, so it was assumed that the means of the declarativity and interrogativity measures are higher than zero.

Since the normalized data of White and Rawlins’ (2016) MegaAcceptability dataset was supposed to indicate responsivity at zero, the new threshold adopted a combination of the STD and the mean. The acceptability scores were divided between the declarativity measure, of the NP V that S-frame, and the interrogativity measure, of the NP V whether S-frame.

The STDs and means of the declarativity and interrogativity measures in Table 2.2 confirm the assumptions that both the STD and mean were positive numbers and that the thresholds for the measures are significantly higher than the original thresholds of zero.

                 STD     Mean    Total
Declarativity    1.351   0.769   2.12
Interrogativity  1.2156  0.4457  1.661

Table 2.2: The STDs and means of the declarativity and interrogativity measures, complemented with the rounded total that should function as the new threshold for the measures.
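The threshold construction above is simply mean plus standard deviation, computed over all predicates for the chosen frame. A sketch with toy scores (the real values come from the normalized MegaAcceptability data):

```python
# Threshold = mean + population standard deviation of a measure's scores.

def threshold(scores):
    n = len(scores)
    mean = sum(scores) / n
    std = (sum((s - mean) ** 2 for s in scores) / n) ** 0.5
    return mean + std

# Toy normalized declarativity scores standing in for the dataset:
declarativity = [2.6, -0.3, 1.8, 0.1, -1.0, 2.4]
t = threshold(declarativity)
assert t > sum(declarativity) / len(declarativity)  # strictly above the mean
print(round(t, 3))
```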

Table 2.3 shows the two lowest scoring predicates for each of the declarativity and interrogativity measures, and only warn could be considered a responsive predicate with these thresholds.

Predicate   Declarativity   Interrogativity
freak_out   2.122           1.167
certify     2.126           1.616
warn        2.258           1.669
disregard   2.079           1.673

Table 2.3: The two lowest scoring predicates for the declarativity and interrogativity measures that just exceed their respective thresholds.

The analysis of the bleached sentences, (14) and (15), indeed showed that these predicates, which are just above the thresholds, can be considered to take declarative or interrogative complements. That being the case, the thresholds used were 2.12 for the declarativity measure and 1.661 for the interrogativity measure. These thresholds were still not perfect, but the purpose of this thesis was to find some possible responsive neg-raising predicates and these thresholds led to few false positives.

(14) a. Someone freaked out that something happened.
     b. Someone certified that something happened.

(15) a. Someone warned whether something happened.
     b. Someone disregarded whether something happened.

2.3 Neg-raising predicates

The measure of neg-raising was extracted from the mega-negraising-v1 data of An and White’s (2020) MegaNegraising dataset. As seen in the introduction, participants were asked to judge how likely it is for a speaker to mean one thing when they say something else, see (6), on a [0, 1] slider. The MegaNegraising dataset has some key features that differ from the MegaAcceptability dataset: it uses slightly different bleached sentences, see (7), a verb-frame-subject-tense combination instead of a verb-frame pair, and a different notation for the frames. However, the different frame notation can be disregarded, as was shown earlier using examples (9) and (10).

Restrictions on the verb-frame-subject-tense combinations have already been made earlier in the thesis by limiting the frame. The subject and tense complement the frame in order to create the correct bleached sentence (7), so these are limited to the third person subject and past tense in order to match the general semantics of this bleached sentence with those used in the MegaAcceptability dataset, see (11c).

Besides the four key parameters that form the verb-frame-subject-tense combination, there is another parameter that should be mentioned: acceptability. In the context of the MegaNegraising dataset, An and White (2020) explain how acceptability was used by participants to indicate, on a [0, 1] slider, how likely it would be that a particular sentence would be said by a person, regardless of the neg-raising, see (16).

(16) How easy is it for you to imagine someone saying I don’t announce that a particular thing happened?

However, in contrast to An and White (2020), the acceptability responses were not used to weigh the neg-raising measure. Predicates with a high neg-raising measure were assumed to also have high acceptability responses, and this thesis only focuses on predicates with a high neg-raising measure. This assumption was made on the basis that it is very unlikely for a participant to be unable to imagine someone saying a sentence (16), while being able to think that that person actually meant something else (6).

2.3.1 Threshold for neg-raising

Unlike the MegaAcceptability dataset, the MegaNegraising dataset doesn’t have normalized data with a proposed threshold.

An and White (2020) plotted the likelihood of neg-raising of the third person subject against the likelihood of the first person subject and something notable is that most predicates are in the bottom left area, indicating a low neg-raising measure (An and White, 2020, Figure 1). This figure has also been included in this thesis, see Figure 2.1.

Figure 2.1: Normalized neg-raising scores for different subject, tense, and frame pairs (An and White, 2020, Figure 1).

The top left plot shows the likelihood of neg-raising for the NP Ved that S-frame, the frame that is used in this thesis, and indicates that the overall neg-raising likelihood is low. To support this, the means of the neg-raising measure were calculated for different frame-subject-tense combinations, see Table 2.4, and this confirms that the average neg-raising measure is low, especially for the NP V that S-frame.

Frame Subject Tense Mean (rounded)

All All All 0.356

All First All 0.347

All Third All 0.366

All All Present 0.309

All All Past 0.351

NP V that S Third Past 0.217

Table 2.4: Means of the neg-raising measures for different frame-subject-tense combinations.

After this observation, the STD of the NP V that S-frame, rounded to 0.196, was evaluated together with the mean of 0.217. Yet, a possible threshold of 0.413 was immediately considered to be insufficient, since it didn’t even reach the halfway mark of the measure range.


That being the case, the new approach was to investigate the threshold based on the quantity of predicates that exceed that threshold, using the neg-raising measures of the NP V that S-frame with third person subject and past tense.

This approach resulted in a threshold of 0.511, where only 5% of the predicates are considered neg-raising. This percentage was assumed to contain almost exclusively neg-raising predicates, because it includes only 46 predicates. Reducing the quantity of neg-raising predicates even further would render the predictive power of neg-raising based theories close to none.
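The quantity-based cutoff can be sketched as follows: pick the threshold so that only the top 5% of predicates, by their NP V that S, third person, past tense neg-raising score, count as neg-raising (toy scores stand in for the real dataset):

```python
# Choose a threshold admitting only the top `fraction` of scores.

def top_fraction_threshold(scores, fraction=0.05):
    ranked = sorted(scores, reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[keep - 1]  # lowest score still inside the top fraction

scores = [i / 100 for i in range(100)]  # 0.00 .. 0.99
t = top_fraction_threshold(scores)
print(t)  # 0.95: the top 5 of 100 scores are 0.95 .. 0.99
```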


2.4 Evaluation

The evaluation of the responsivity focused on validating the interrogativity of the responsive neg-raising predicates, since neg-raising predicates are by definition assumed to take declarative complements. This evaluation utilized multiple corpora from NLTK, which provides access to over 50 corpora and other lexical resources, along with text processing libraries for tokenization, tagging, parsing, classification and more. The corpora of NLTK span a huge number of categories (books, news items, web and chat texts), so there should be some corpora that can provide correct English sentences with interrogative complements.

The corpora that were used for the evaluation were: the Brown Corpus (created at Brown University and containing a great number of genres), the Gutenberg Corpus (containing parts of some 25,000 e-books) and the Reuters Corpus (containing 10,788 news documents).

The MegaAcceptability dataset also keeps track of the form the predicates take in certain sentences, in this case the active present and past tense. However, as mentioned before, the neg-raising property of predicates can differ between configurations, so only the third person past tense uses of responsive neg-raising predicates, in combination with an interrogative complement, will be considered for the evaluation.
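The core pattern the corpus evaluation looks for, a past tense form of a candidate predicate immediately followed by whether, can be sketched like this. The thesis runs this kind of search over NLTK’s Brown, Gutenberg and Reuters corpora; plain tokenized sentences are used here so the sketch stays self-contained:

```python
# Find sentences where a given past tense verb form is directly
# followed by "whether" (i.e. takes an interrogative complement).

def find_interrogative_uses(sentences, past_form):
    hits = []
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i in range(len(tokens) - 1):
            if tokens[i] == past_form and tokens[i + 1] == "whether":
                hits.append(sentence)
                break
    return hits

sentences = [
    "He had not yet decided whether he was an honest man or a knave.",
    "She decided to leave early.",
    "They saw whether it would rain.",
]
print(len(find_interrogative_uses(sentences, "decided")))  # 1
```

With NLTK installed, the same function can be fed `" ".join(s) for s in nltk.corpus.brown.sents()` instead of the toy list.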

2.4.1 Neg-raising

In addition to the corpus evaluation, responsive neg-raising predicates were also evaluated on their neg-raising measure by using intuitive examples. Unlike the examples in the MegaNegraising dataset (6), these intuitive examples were not bleached, but still had the same frame-subject-tense combination. An additional evaluation of the neg-raising was done to evaluate the neg-raising measure of the sentences that were found in the corpus evaluation of the responsive neg-raising predicates. White (2021) proposed a test, the interrogative-to-declarative test, that removes any effect of the complement form while keeping all other aspects of the context. In other words, it converts an interrogative complement (17a) to a declarative complement (17b).

(17) a. You won’t believe who called (*in ages)!


Chapter 3

Results

3.1 Responsive neg-raising predicates

Table 3.1 shows all eight responsive neg-raising predicates that were found using the acceptability threshold of 2.12 for the declarativity measure, 1.661 for the interrogativity measure and a threshold of 0.511 for the neg-raising measure.

Predicate   Declarativity   Interrogativity   Neg-raising
conclude    2.618           1.746             0.561
decide      2.792           1.971             0.58
detect      4.035           2.108             0.53
discover    3.446           2.41              0.535
like        2.248           1.981             0.561
pinpoint    2.352           2.036             0.531
see         2.413           2.421             0.54
surmise     2.668           2.061             0.537

Table 3.1: The responsive neg-raising predicates of the normalized dataset with rounded measures. Bold predicates were found in the NLTK corpora with interrogative complements.
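The selection behind Table 3.1 is a three-way threshold filter, which can be reproduced as follows (threshold values and the decide/see rows are taken from the text; forget is a made-up non-example):

```python
# Keep predicates whose declarativity, interrogativity and neg-raising
# scores all exceed their respective thresholds.
THRESHOLDS = {"declarativity": 2.12, "interrogativity": 1.661, "negraising": 0.511}

predicates = {
    "decide": {"declarativity": 2.792, "interrogativity": 1.971, "negraising": 0.58},
    "see":    {"declarativity": 2.413, "interrogativity": 2.421, "negraising": 0.54},
    "forget": {"declarativity": 3.1,   "interrogativity": 2.2,   "negraising": 0.2},
}

responsive_negraising = [
    verb for verb, scores in predicates.items()
    if all(scores[m] > t for m, t in THRESHOLDS.items())
]
print(sorted(responsive_negraising))  # ['decide', 'see']
```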

Lines were added to Figure 1.1 to represent the thresholds for responsive neg-raising predicates, which resulted in Figure 3.1. Note that there are more than eight predicates in the selected area; Figure 1.1 was created with White’s (2021) responsivity thresholds of zero.

The values in Table 3.1 not only show why these predicates are considered responsive neg-raising predicates, but can also be used in the discussion of setting thresholds and determining the relation between responsivity and neg-raising.


Figure 3.1: The correlation between responsivity and neg-raising for the NP Ved that S-frame. The red lines visualize the thresholds.

The current thresholds indicate that there is some relation between responsivity and neg-raising, but they were adjusted to the data. The responsivity thresholds use the STD and mean of the responsivity measure. The neg-raising threshold on the other hand was adjusted to only include 5% of the predicates.

This raises the question of which thresholds would not show this relation. The thresholds needed to accomplish this can easily be found in Table 3.1 by selecting the highest value per column: either a declarativity threshold of 4.035, an interrogativity threshold of 2.421 or a neg-raising threshold of 0.58. Using any of these thresholds as values that need to be exceeded, or a combination of them, would leave no responsive neg-raising predicates at all. Look at Figure 1.1, for example: the linear regression shows a positive correlation between responsivity and neg-raising, yet most of the scatter plot points sit in the middle of the plot and none are in the top right corner, the corner that should contain responsive neg-raising predicates.

Even though a neg-raising threshold of 0.58 on a scale of [0, 1] seems more than reasonable, such a threshold was already ruled out in the method, since the predictive power of neg-raising-based theories would then become close to none.
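The "exclusion" thresholds are simply the column maxima of Table 3.1, which can be checked directly. A small sketch using the rounded values from the table:

```python
# rounded measures from Table 3.1: (declarativity, interrogativity, neg-raising)
rows = {
    "conclude": (2.618, 1.746, 0.561),
    "decide":   (2.792, 1.971, 0.580),
    "detect":   (4.035, 2.108, 0.530),
    "discover": (3.446, 2.410, 0.535),
    "like":     (2.248, 1.981, 0.561),
    "pinpoint": (2.352, 2.036, 0.531),
    "see":      (2.413, 2.421, 0.540),
    "surmise":  (2.668, 2.061, 0.537),
}

# the highest value per column is the lowest threshold that excludes everything
decl_max = max(v[0] for v in rows.values())
inter_max = max(v[1] for v in rows.values())
neg_max = max(v[2] for v in rows.values())
print(decl_max, inter_max, neg_max)  # 4.035 2.421 0.58
```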


3.2

Evaluation

The evaluation resulted in 63 sentences with interrogative-complement-taking verbs, of which 34 contained see, 19 decide and 10 decided. The examples in (18) illustrate some of the sentences that were found in the NLTK corpora, one per verb form.

(18) a. The British Government will have to decide whether to let U.S. coal in.

b. The EC Commission met in Brussels on Friday to see whether the 12 EC cocoa consuming nations could narrow their differences at this month’s meeting.

c. In short, he had not yet decided whether he was an honest man or a knave.
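The corpus search can be sketched as a regular-expression scan over sentences read from the NLTK corpora (e.g. via `nltk.corpus.reuters.sents()`). The sample sentences below stand in for the real corpora (the first two echo (18a) and (18c)), and the verb-form pattern is a simplified illustration of the actual procedure:

```python
import re

# stand-ins for sentences read from the NLTK corpora
sentences = [
    "The British Government will have to decide whether to let U.S. coal in.",
    "In short, he had not yet decided whether he was an honest man or a knave.",
    "The talks ended without any decision on the matter.",
]

# match a form of "decide" or "see" directly followed by "whether"
pattern = re.compile(r"\b(decide[ds]?|see|saw|seen)\s+whether\b")

hits = [s for s in sentences if pattern.search(s)]
print(len(hits))  # 2
```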

Yet, as discussed in the method, only responsive neg-raising predicates with a third person subject and past tense should be considered, leaving the ten sentences containing decided, see (19).

(19) a. U.S. WEIGHS LIFTING JAPANESE TRADE CURBS The White House has completed a new review of Japanese semiconductor trading practices but has not yet decided whether trade sanctions levied against Japan last April should be lifted, U.S. officials said.

b. Richard Wiener, attorney for Apex at Cadwalader Wickersham and Taft, said the company has not yet decided whether to pursue its case against Belcher Oil.

c. Industrial Equity (Pacific) added it is considering launching a tender offer for Calmat stock or making a merger proposal to the company, but said it has not decided whether it will pursue a Calmat acquisition on a non-negotiated basis.

d. He added the executive Commission has not yet decided whether the scheme should become a permanent feature of the EC’s struggle to find a use for its massive stocks of farm produce.

e. It said it may buy additional shares, but had not decided whether to offer its shares in response to a tender offer by Sunter Acquisition Corp, a unit of First Boston Inc.

f. Boschwitz told Reuters that neither he nor the U.S. Agriculture Department had decided whether or how deficiency payments should be guaranteed to farmers who might choose not to plant under the decoupling scheme.


g. It also said it is studying the 35 dlr a share leveraged buyout offer made by Purolator managers and E.F. Hutton LBO Inc but has not decided whether it will tender its stock in the offer.

h. SENATOR UNCOMMITTED ON OFFERING 0/92 BILL Senator Richard Lugar of Indiana, ranking Republican on the U.S. Senate Agriculture Committee, has not decided whether to introduce an administration-backed bill to apply the so-called 0/92 provision to 1988 through 1990 grain crops, an aide to the senator said.

i. He did not give comparative figures and said the government-owned Food Corporation of India (FCI) had not yet decided whether these grains could be used as cattle feed.

j. In short, he had not yet decided whether he was an honest man or a knave.

All sentences in (19) share the same structure in some way or another: all of them contain the pattern in (20). There is one exception to this observation, namely (19f), but that sentence also means, slightly simplified, that both Boschwitz and the U.S. Agriculture Department had not decided whether payments should be guaranteed. White (2021) attributed the appearance of patterns such as (20) to a collection of possible factors: tense, aspect and modality (TAM) morphology on the predicate. According to White (2021), the pattern in (20) appears to be the perfect aspect. So the counterexample decide takes on the perfect aspect, in contrast to the counterexamples think and believe that White (2021) found.

(20) {has, had} not (yet) decided whether
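Pattern (20) translates directly into a regular expression. The sketch below checks simplified versions of three sentences from (19); note that the (19f)-style sentence, which lacks not, falls outside the pattern, mirroring the exception discussed above:

```python
import re

# pattern (20): {has, had} not (yet) decided whether
pattern = re.compile(r"\b(has|had)\s+not\s+(yet\s+)?decided\s+whether\b")

examples = [
    "It said it had not decided whether to offer its shares.",
    "The Commission has not yet decided whether the scheme should become permanent.",
    "Neither he nor the Department had decided whether payments should be guaranteed.",  # no "not"
]

matches = [bool(pattern.search(s)) for s in examples]
print(matches)  # [True, True, False]
```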

So this evaluation validates that decided can take interrogative complements, but not within the restrictions of the NP Ved whether S-frame.

Furthermore, the neg-raising measure of decided still had to be validated, which was done with the intuitive examples in (21). These examples show that decided should indeed be considered neg-raising.

(21) a. If I were to say The student didn’t decide that they were finished, how likely is it that I actually mean that The student decided that they weren’t finished?

b. If I were to say The head of security didn’t decide that his staff should get a day off, how likely is it that I actually mean that The head of security decided that his staff shouldn’t get a day off?

c. If I were to say The robber didn’t decide that he should turn himself in, how likely is it that I actually mean that The robber decided that he shouldn’t turn himself in?


The interrogative-to-declarative test was applied to (19j) as a final part of the evaluation, using the strong NPI either. The sentence was slightly altered by removing yet, see (22a), but this change doesn't affect the result of the test. This sentence was used, even though it does not satisfy the restrictions of the NP Ved whether S-frame, because the corpus evaluation wasn't able to confirm the use of decide in that frame, and this test could confirm the neg-raising measure of decide in a broader context. Example (22) shows a context where a neg-raising reading of decide in (22b) is not possible: the strong NPI either is used in (22b), but (22b) has a different meaning than (22c).

(22) a. In short, he had not decided whether he was an honest man or a knave.

b. In short, he had not decided he was an honest man either.

c. *In short, he had decided he was not an honest man either.

The intuitive examples indicate that decide should be neg-raising, but the interrogative-to-declarative test does not. This is a clear example of the difference between White's (2021) and Theiler et al.'s (2019) methods of measuring neg-raising, and it supports the argument that another dataset is needed to capture this measure of neg-raising for all predicates.


Chapter 4

Discussion & Conclusion

4.1

Discussion

This thesis made use of the MegaAcceptability dataset, which only had five responses per predicate. Even though this issue was addressed to some extent by White and Rawlins' (2020) normalization, the lack of responses and the rate of poor participants mentioned by White and Rawlins (2020) are very noticeable, especially around the acceptability measure of zero. This thesis used high acceptability thresholds in an attempt to counteract the lack of responses and to guarantee a higher level of certainty about the acceptability measure of the resulting predicates.

The selected thresholds, including the one for the neg-raising measure, aren't perfect, but as mentioned before, their main purpose was to minimize the number of false positives. The lack of information on these datasets made it hard to determine which predicates were actually responsive or neg-raising without checking them manually. It is possible that the thresholds were already too high and that some false negatives could have been confirmed by the evaluation as counterexamples.

The frame selection in this thesis was meant to restrict the number of evaluated constructions. However, this restriction also meant that other possible analyses were lost, such as combining the responsivity and neg-raising measures of the predicates over multiple frames and analysing the differences with the current method, and it left other possible discoveries unexplored.

The corpus evaluation, even though informative, can be improved. The approach used in this thesis only considered three corpora; the results could have been more insightful if more corpora had been used, providing more examples of responsive neg-raising predicates in different frames.


Another thing to note is the evaluation of the neg-raising measure with the intuitive examples. This method was a manual check to validate the neg-raising measure and thus a personal judgement. White's (2021) measure of neg-raising differs from the neg-raising measure of Theiler et al. (2019), and a new dataset with interrogative-to-declarative tests is needed to truly evaluate Theiler et al.'s (2019) analysis.

4.2

Conclusion

First of all, this thesis has established some thresholds that can be used to identify responsive neg-raising predicates for a single frame in the MegaAcceptability and MegaNegraising datasets. These thresholds were established so that they could give some guarantee that the results wouldn't contain false positives, and they provided a more quantifiable method to identify responsive neg-raising predicates. The measures of the responsive neg-raising predicates also showed possible thresholds that can be used to exclude all predicates from being considered responsive and neg-raising. It could be argued that the current thresholds were too low and that they should be replaced by the possible higher thresholds, especially for the neg-raising measure.

Second, the results of the experimental data suggest that there are eight responsive neg-raising predicates, given the examined thresholds, namely conclude, decide, detect, discover, like, pinpoint, see and surmise. It was expected that believe and think would also be included in this list, but their absence doesn't mean that these predicates can't be both responsive and neg-raising, since many possibly valid predicates were excluded to minimize the number of false positives.

Third, the evaluation of these eight responsive neg-raising predicates shows that see and decide saw usage in the evaluated corpora, and only decide with a third person subject and past tense, but not in the NP Ved whether S-frame. The neg-raising evaluation showed that decide is indeed neg-raising in White's (2021) neg-raising measure, but not in Theiler et al.'s (2019) neg-raising measure. This means that decide has been validated in this thesis to be responsive, albeit only in a broader context instead of the NP Ved whether S-frame, but not neg-raising in all measures of neg-raising.

Finally, this thesis provided a possible counterexample, decide, to Zuber's (1982) generalization, but decide could not be validated using a strong NPI in the interrogative-to-declarative test. So even though this research subject is still open for discussion, especially the measurement method for neg-raising, this thesis couldn't find any responsive neg-raising predicates that would indicate any relation between responsivity and neg-raising other than being mutually exclusive.


4.3

Future research

The first and foremost recommended research is to create a new dataset that captures neg-raising for predicates using the interrogative-to-declarative test of White (2021). This dataset should contain neg-raising data that is consistent with Theiler et al.'s (2019) neg-raising measure. A possible use for this dataset could be to find actual counterexamples to Zuber's (1982) generalization. A project similar to this one could also be conducted in which more frames or thresholds are evaluated, in order to analyse how the thresholds behave in these different frames and which responsive neg-raising predicates can be found.

It is currently difficult to evaluate the MegaAcceptability and MegaNegraising datasets regarding precision and recall when certain thresholds are used to identify, for example, responsive or neg-raising predicates. It would be extremely helpful if there were expert knowledge regarding these datasets that could (partially) indicate which verb-frame pairs should be accepted. However, given the size of these datasets and the ongoing debate about whether predicates are antirogative or responsive and whether predicates are neg-raising, this would be a huge project to take on and would need a lot of cooperation and consensus from this area of research.
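If such expert gold labels existed, precision and recall would follow directly. A sketch under that assumption; the gold set below is entirely hypothetical:

```python
# predicates selected by the thresholds in this thesis
predicted = {"conclude", "decide", "detect", "discover",
             "like", "pinpoint", "see", "surmise"}

# hypothetical expert judgments of which predicates truly qualify
gold = {"decide", "know", "forget"}

true_positives = predicted & gold
precision = len(true_positives) / len(predicted)  # share of selections that are correct
recall = len(true_positives) / len(gold)          # share of true cases that were found
print(round(precision, 3), round(recall, 3))  # 0.125 0.333
```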

Another interesting project would be to create an unsupervised classifier that could split the predicates into antirogative, rogative and responsive categories, or find patterns in the frames for predicate classes, such as attitude predicates, likelihood predicates and so on.


References

An, H. Y. and White, A. S. (2020). The lexical and grammatical sources of neg-raising inferences. Proceedings of the Society for Computation in Linguistics, 3(1):220–233.

Bartsch, R. (1973). "Negative transportation" gibt es nicht. Linguistische Berichte, 27(7):1–7.

Barwise, J. and Cooper, R. (1981). Generalized quantifiers and natural language. In Philosophy, language, and artificial intelligence, pages 241–301. Springer.

Gajewski, J. (2002). L-analyticity and natural language. Manuscript, MIT.

Gajewski, J. R. (2007). Neg-raising and polarity. Linguistics and Philosophy, 30(3):289–328.

Theiler, N., Roelofsen, F., and Aloni, M. (2019). Picky predicates: why believe doesn’t like interrogative complements, and other puzzles. Natural Language Semantics, 27(2):95–134.

White, A. S. (2021). On believing and hoping whether.

White, A. S. and Rawlins, K. (2016). A computational model of s-selection. In Semantics and linguistic theory, volume 26, pages 641–663.

White, A. S. and Rawlins, K. (2018). The role of veridicality and factivity in clause selection. In Proceedings of the 48th Annual Meeting of the North East Linguistic Society, page to appear, Amherst, MA. GLSA Publications.

White, A. S. and Rawlins, K. (2020). Frequency, acceptability, and selection: A case study of clause-embedding. arXiv preprint arXiv:2004.04106.

Zuber, R. (1982). Semantic restrictions on certain complementizers. In Proceedings of the 13th international congress of linguists, pages 434–436. Tokyo.
