Cognitively constrained parsing in ACT-R
The effects of function tags and lexical information on eye movement models

Evelyne J. van Oers
11026049

Bachelor thesis
Credits: 18 EC
Bachelor Opleiding Kunstmatige Intelligentie
University of Amsterdam
Faculty of Science
Science Park 904
1098 XH Amsterdam

Supervisor: Dr. Jakub Dotlacil
Institute for Language and Logic
Faculty of Science
University of Amsterdam
Science Park 107
1098 XG Amsterdam


Abstract

This thesis addresses the implementation of a syntactic parser inside the Adaptive Control of Thought-Rational (ACT-R) cognitive architecture, as opposed to similar research, which uses independent parsers alongside a cognitive architecture. This paper specifically explores the effects of function tags and lexical information on parser results. Implementing a parser directly into a cognitive architecture allows results to be read directly off the parser, and this novel approach might provide new insights. An experiment was conducted comparing a default parser containing function tags, a simple parser containing no function tags or lexical information, and a lexicalized parser containing lexical information but no function tags. Parser recall and precision measures were 36.68% and 39.48% for the default parser, 29.08% and 35.17% for the simple parser, and 34.76% and 38.78% for the lexicalized parser. No correlation was found between any of the parser-generated models and the self-paced reading data of Frank et al. (2013). Correlation measures consistently doubled after the addition of lexical information to the parser. The removal of function tags had no clear effect on correlation measures. In conclusion, lexical information has a positive effect on eye movement model accuracy. Although the model generated by the parser discussed in this paper is less accurate than rival models, the approach used provides a generous amount of freedom for improvements, which can be explored in further research.


Contents

1 Introduction
  1.1 Problem Description
  1.2 The Composition of this Paper
2 Theoretical Foundation
  2.1 Eye Movement
  2.2 Cognitive Modeling
  2.3 Modeling Eye Movements During Reading Tasks
3 Research Method
  3.1 Materials Used
    3.1.1 Penn Treebank
    3.1.2 Eye Tracking Data
  3.2 Pre-Processing the Data
    3.2.1 Factorization of POS-tags
    3.2.2 Multiple Order Rules
  3.3 The Default, Simple, and Lexicalized Parser
    3.3.1 The Default Parser
    3.3.2 The Simple Parser
    3.3.3 The Lexicalized Parser
4 Discussing the Results
  4.1 Parser Tagging Accuracy
  4.2 Correlation Measures
5 Experiment Conclusion and Further Research
6 Appendix
  6.1 Results with Shift of Zero


1 Introduction

1.1 Problem Description

One of the main objectives in the field of psycholinguistics is to explain the mechanisms the human brain uses to process and represent language. Although a vast amount of research has been conducted in the past, many of the details of these brain processes are still unclear. In recent years, computational modeling has gained popularity as a promising tool to shed light upon the questions that remain. The ambition of this research is to contribute new insights to these questions by using a unique approach in computational modeling. Here, specific interest will be taken in human eye movements during reading tasks. Contrary to previous research (Demberg & Keller, 2008; Engelmann, Vasishth, Engbert, & Kliegl, 2013), which uses syntactic parsers designed for purposes other than modeling human behavior, this research concerns itself with a syntactic parser that is built directly into a cognitive architecture. The cognitive architecture used in this research is the Adaptive Control of Thought-Rational (ACT-R). (Anderson et al., 2004)

1.2 The Composition of this Paper

The objective of this research will thus consist of two sub-goals:

1. Build a cognitively constrained syntactic parser that can parse sentences correctly.
2. Fit the generated model to human behavioral data and analyze the findings.

This thesis will focus on the effects that different types of information provided to the parser have on these sub-goals. This paper specifically describes the effects of tag specification and lexical information on parser performance. The questions this paper thus aims to answer are:

1. whether the addition of function tag and lexical information improves the accuracy of the parser;
2. whether the addition of function tag and lexical information improves the accuracy of the models of human eye tracking data created by the parser.

To answer these research questions, an experiment will be conducted comparing three different parsers: the default parser, the simple parser, and the lexicalized parser, the only difference being the information provided to each parser. The default parser contains the function tag information, whereas the simple parser and the lexicalized parser do not. The lexicalized parser differs from the simple parser and the default parser in that it contains lexical information. The results of this experiment are documented and discussed in section 4. Before the results are analyzed, section 2 will provide the theoretical foundation on which this research is based. Section 3 will explain the research method and introduce the materials and methods used to set up the experiment. The paper is concluded in section 5, where topics for further research are proposed. The appendix contains results whose discussion is beyond the scope of this paper.

Parser       Function tags  Lexical info
Default      yes            no
Simple       no             no
Lexicalized  no             yes

Table 1: Overview of the characteristics of the different parsers.

2 Theoretical Foundation

Prior to describing the experiment, this paper will review background knowledge and findings of contemporary research. In this section, important concepts and findings with regard to eye tracking and eye movement will be reviewed. Additionally, important background knowledge on cognitive modeling and the ACT-R architecture will be highlighted. Lastly, similar research is discussed briefly.

2.1 Eye Movement

Eye tracking data has become a popular tool in psycholinguistic research since eye movements were established as an important descriptor of language processing behavior during reading. (Engelmann et al., 2013; Rayner, 2009) In this research, one of the main goals is to fit an eye movement model on eye tracking data in order to better understand the brain processes that might cause certain eye movements. The hypothesis behind this approach is that, given that eye tracking data can explain mental processes in language processing, one should also be able to do the reverse: a good cognitive model should be able to generate eye tracking data similar to that of humans.

Generating eye tracking data, however, is less straightforward than it might initially seem. Different types of eye movement exist and it is important to specify which of these movements are modeled.

One of the most important types of eye movement is the saccade. Rather than moving smoothly, the focus of the eyes jumps from place to place. These 'jumps' in focus are called saccades. (Rayner, 1998) Between saccades, the eyes will linger. These moments when the eyes remain relatively still are called fixations. In information processing tasks where visual cues take a central role (such as reading a text), researchers typically focus on these eye fixations. The experiment in this paper likewise focuses on modeling the duration of eye fixations.

It is important to note, according to Rayner (1998), that during these fixations the eyes are never really still. Aside from a constant tremor called nystagmus, the eyes are also observed to make slightly larger movements called drifts and microsaccades. What causes these movements is not completely clear, and will also not be the focus of this project. For simplification purposes, these small movements are treated as noise, and hence are neglected in the model and in the human eye movement data used.

Another type of eye movement that often occurs during reading is the regression. Regressions are made when the eyes move back to re-read small parts of the text. An example of the eye movements made when regression occurs whilst reading can be seen in figure 1. As stated at the beginning of this subsection, eye fixation patterns and processing difficulty of words are closely linked. (Demberg & Keller, 2008; Engelmann et al., 2013; Juhasz & Rayner, 2003; Kliegl, Grabner, Rolfs, & Engbert, 2004) Regression is one of these patterns and may occur because a reader experiences problems with the processing of the currently fixated word, which slows down the reading process. As text becomes conceptually more difficult, fixation duration and the frequency of regressions increase. (Rayner, 2009) Although the occurrence of regression is an important sign of sentence processing difficulty, predicting when regression occurs is beyond the scope of this paper and could be considered in further research.

Figure 1: Examples of saccades, skips, return sweeps, regressions and refixations as eye movements during reading. (Rayner et al., 2015)

Current research points at multiple factors that indicate the processing complexity of a word and influence eye fixation duration and the number of regressions occurring. (Juhasz & Rayner, 2003; Kliegl et al., 2004) In their research, Juhasz & Rayner mention word frequency, subjective familiarity, word length, concreteness, and age of acquisition as five variables that all influence eye movement patterns. Kliegl et al. also mention word length and frequency, as well as the predictability of a word, as contributing factors to fixation duration. In this research, word frequency and predictability have been taken into account by the model. Factors that are less intuitive to implement, such as concreteness, may be considered in future research.

In conclusion, research using eye tracking data has shown an immediate connection between eye movement behavior and a number of factors that determine the processing complexity of a word. To understand in more detail how these factors influence the word processing mechanism of the brain, cognitive architectures may be of help. The next subsection will provide important background knowledge on cognitive architectures. We will specifically take a close look at the ACT-R architecture, which was used in this research to build the eye movement model.

2.2 Cognitive Modeling

In the previous section it was briefly mentioned that eye tracking data similar to that of humans can be generated by a good cognitive model. In cognitive science, using cognitive models is common practice. An important goal in this field is to build a bridge between theoretical linguistics and psycholinguistics. To attain this goal it is vital to construct mechanistic explanations for both grammar and cognition. Using a cognitive model that models human cognition while linguistic tasks are performed is an effective way of reaching a mechanistic understanding of cognition and linguistics. (Hale, 2014)

Cognitive modeling makes use of cognitive architectures. These architectures are designed to represent the various processes in the human brain and the communication between these processes. ACT-R is one of the most well-known cognitive architectures (Anderson et al., 2004) and is also the cognitive architecture used in this research.

A general schematic overview of the ACT-R architecture can be seen in figure 2. As this figure shows, the ACT-R architecture consists of a set of modules. Each module processes a different category of information and stores this information in its respective module buffer. In ACT-R, the units of information stored in these buffers are called chunks. Communication between these separate modules is achieved through the central production system. This system works in critical cycles: it can execute only one action per cycle, involving only the available information chunks. Which production rule the production system chooses to execute depends on the available chunks at the beginning of the cycle. How quickly the production rule is executed depends on the activation value during the execution of the rule: a high activation value corresponds to fast execution. After a production rule is chosen, this production fires, and the buffers are then updated for another cycle.

Figure 2: ACT-R structure (Anderson et al., 2004)

In this research, the parser is implemented directly in ACT-R. This means that every action of the parser is executed as a production rule. Grammar rules and lexical information will, for example, be retrieved from the declarative memory module through the retrieval buffer, and the success of this retrieval depends on the activation value.
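To make this concrete, the sketch below shows what such a parsing production could look like in pyactr, the library used in this project. The chunk types, slot names, and the rule itself are hypothetical illustrations of the idea, not the thesis's actual implementation.

```python
import pyactr as actr

parser = actr.ACTRModel()

# Hypothetical chunk types: a parsing goal and a stored grammar rule.
actr.chunktype("parse_goal", "task, stack_top")
actr.chunktype("grammar_rule", "lhs, rhs_first, rhs_second")

# Grammar knowledge lives in declarative memory as chunks.
parser.decmem.add(actr.makechunk(
    typename="grammar_rule", lhs="VP", rhs_first="VBZ", rhs_second="NP"))

# A parsing step is a production rule: match the goal buffer, then
# request a grammar rule from declarative memory. Whether (and how
# fast) the retrieval succeeds depends on the chunk's activation.
parser.productionstring(name="retrieve grammar rule", string="""
    =g>
    isa parse_goal
    task parsing
    stack_top =top
    ?retrieval>
    state free
    ==>
    =g>
    isa parse_goal
    task retrieving_rule
    +retrieval>
    isa grammar_rule
    lhs =top
""")
```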

It may be worthwhile to note that the design of ACT-R takes into account several popular concepts within the domain of language acquisition research. Understanding how these concepts influence the use of language is simplified because one can read their interaction straight from the model. The first distinction ACT-R makes is between declarative (factual) and procedural (action-based) knowledge. (Brasoveanu & Dotlacil, 2018; Lebière & Anderson, 2008) The former is stored in the declarative memory module, and the latter can be found in the form of the production rules along which the central production system acts. Secondly, ACT-R also makes a distinction between explicit and implicit knowledge. The declarative memory chunks that can be retrieved and inspected can be seen as explicit knowledge, whereas implicit knowledge is reflected by the various equations that define the behavior of the declarative memory module. (Anderson et al., 2004) In this thesis project, an example of what is treated as explicit declarative knowledge is lexical knowledge. After all, we know and can explain why certain words are verbs or nouns. Implicit declarative knowledge might be the structural knowledge encoded in each word. This is intuitive knowledge: as human beings, we expect the word gives to be followed by a noun phrase, as in the following sentence:

"John gives an apple."

On the other hand, we expect the word sleeps to be followed by a prepositional phrase rather than a noun phrase:

"Mary sleeps in bed."

Observing the effects of providing this lexical knowledge is one of the main focuses of this paper, and will be discussed more in-depth in sections 3 and 4.

As opposed to declarative knowledge, procedural knowledge is not explicitly available as ready-to-retrieve chunks of knowledge. Procedural knowledge is instead reflected by, for example, the inner workings of ACT-R, such as when to access the buffers of certain modules. This paper will not focus on the inner workings of ACT-R and their influence on the results of the experiments described here. Details of the ACT-R architecture and their effect on these experiments are described in 'Modeling behavioral data using a cognitively constrained parser'. (Bal, 2018)

The other main reason that makes ACT-R an interesting choice of cognitive architecture is the modules it contains. Most research uses cognitive modeling only to investigate the relation between declarative memory and natural language processing. (Brasoveanu & Dotlacil, 2018) Using the ACT-R architecture, the research in this thesis can easily be expanded further with information contributed by, for example, the motor and vision modules.

2.3 Modeling Eye Movements During Reading Tasks

As has been previously mentioned, research into language processing using cognitive models is ample. The research of Demberg and Keller (2008) and Engelmann et al. (2013) are two noteworthy examples. Both of these papers use the ACT-R cognitive architecture and concern themselves with the cognitive processes that might explain the eye movement patterns of humans as they read. It is important to note, however, that neither of these papers directly programs the parser into ACT-R. Instead, they both use an independently developed parser and use its results to then model the reading times in ACT-R. Demberg and Keller (2008) use the Minipar dependency parser combined with ACT-R to study the correlation of dependency distance and surprisal with eye fixation durations. The second paper is a similar study by Engelmann et al. (2013), who use the Boston dependency parser (Boston, Hale, Vasishth, & Kliegl, 2011). Like Demberg and Keller, Engelmann et al. study the correlation of eye fixations with dependency distance and surprisal. However, their study additionally focuses on the influence of cue-based memory retrieval on eye fixations.

Both studies link higher surprisal rates to longer processing times. Dependency measures and memory retrieval times also seem to influence reading patterns. In other words, frequently occurring sentence structures and words are easier for humans to retrieve and process, and structures and words that appear rarely are typically hard to process. It is expected that the model in this paper will also find this correlation.

3 Research Method

This section will introduce the materials and research approach used for the experiment. First, the materials for the experiment are highlighted. Second, the actions taken to pre-process the data for the experiment are described in detail. Third, the specific actions taken to create the default, simple, and lexicalized parsers are reviewed.

3.1 Materials Used

For this research, several materials are needed. Primarily, part-of-speech (POS) tagged sentences are necessary for building a grammar that can be used to provide language information to the ACT-R model. The Penn Treebank will be used to provide the POS-tagged data, as it is one of the biggest and most extensive treebanks available to date. The second requirement for this project is a tool with which the ACT-R cognitive model can be implemented. Primarily because of its power in data processing, Python 3.5 has been chosen as the programming language for this research. The second reason for choosing Python lies in the availability of the pyactr library, which takes care of the low-level workings of the ACT-R model. Lastly, this project requires human eye tracking data as a gold standard for the predictions of the ACT-R model. The chosen corpora are the corpus used by Frank et al. (2013) and the Dundee Treebank. (Barrett, Agić, & Søgaard, 2015)

3.1.1 Penn Treebank

The dataset used for this experiment is the Penn Treebank, a treebank database containing more than 7 million words of part-of-speech tagged text. The annotated material includes many different types of text, such as IBM computer manuals, nursing notes, Wall Street Journal articles and transcribed telephone conversations. (Taylor, Marcus, & Santorini, 2003) The version of the Penn Treebank used in this experiment contains 26 sections and has been divided into training data, development data, and test data categories. Table 2 displays the distribution of sections used in this project.

Data Type         Sections
Training Data     0-20
Development Data  21
Test Data         25

Table 2: Section distribution of the PTB

Because this experiment does not specifically focus on parser accuracy, a development stage was unnecessary. For this reason, section 21 of the Penn Treebank database is omitted. This section thus remains available for use in further research.

3.1.2 Eye Tracking Data

The eye tracking corpora used for the experiment in this paper are the corpus of Frank et al. (Frank, Monsalve, Thompson, & Vigliocco, 2013) and the Dundee Treebank (Barrett et al., 2015). The corpus of Frank et al. consists of a self-paced reading section and an eye tracking section. The eye tracking section holds information about first fixation, first pass, right-bounded and go-past reading times. First fixation time includes only the duration of the first time a reader fixates on a word. First pass reading time is the sum of the durations for which the reader fixates on a word before moving on to a new word. Right-bounded reading time is the sum of fixation durations on a certain word before the reader fixates on a word to the right of the current word. Go-past reading time is the sum of fixation durations on the current word and words to the left of the current word, up to but not including the moment a reader fixates on a word to the right of the current word for the first time. In the experiment of this paper, the parser model will be compared to all of these times except the first fixation time.

The Dundee corpus, as opposed to the corpus of Frank et al., contains only two sections: a first pass reading time section like the corpus of Frank et al., and a section with total eye fixation times. The model generated by the parser is also compared to this corpus.

3.2 Pre-Processing the Data

Before it is possible to run the ACT-R model on the Penn Treebank data, the data needs to be pre-processed. As this paper focuses on the effects of specific combinations of information being fed to an ACT-R based parser, multiple combinations and forms of pre-processing are applied. These pre-processing steps will be discussed in the next subsection. The current subsection focuses on the vital pre-processing steps that are used for all three parsers.

Much of the mandatory pre-processing of the treebank data is done in a fashion similar to Roark (2001), who describes a detailed implementation of a probabilistic context-free grammar.

The most notable mandatory pre-processing technique used on the data is the factorization of the POS-tags as described by Roark. This section discusses the factorization of POS-tags and the use of multiple order rules.

3.2.1 Factorization of POS-tags

The nonterminal labels in the parse trees obtained from the Penn Treebank are rewritten in a format similar to the format used by Roark (2001). The rewriting of these labels is called factorization, and it allows a nonterminal label to contain information about its sister and parent nodes. Allowing this information to be contained in the tree labels is vital, as it enables lexical items to become part of the left context. This in turn makes it possible to condition production probabilities. (Roark, 2001)

The factorization of nonterminal labels consists of two steps. First, the parse trees must be rewritten into their Chomsky normal form; after that, the labels need to be rewritten according to the factorization rules as specified by Roark. This subsection discusses both steps in depth.

Before factorization can be applied, the Chomsky normal form (CNF) of a parsed sentence must be obtained. A grammar is in CNF when all of its rules obey the following forms:

A → B C
A → α
S → ε

In other words, in a grammar that is in CNF, a nonterminal symbol can only map to two nonterminal symbols, or to one terminal symbol. Furthermore, only the start symbol may map to the empty string ε. (Jurafsky & Martin, 2009) In this thesis project a looser definition of the CNF is used, allowing the grammar to contain rules of the form A → B. Structurally, allowing this rule does not change much: all such rules do, in essence, is rewrite a nonterminal symbol to a new nonterminal symbol.

An advantage of the Chomsky normal form is that it assures that the right-hand side of any rule consists of at most two symbols. This advantage plays an important role, as one of the expected qualities of the parser is that it can predict the next POS-tag based on its current state in the parse tree. Without binarization of the rules, the parser would run into a lookahead problem: taking the grammar as-is would in theory allow it to contain a rule that maps from one POS-tag to an unbounded number of new symbols, making predictions impossible. Even in less extreme cases, not binarizing the parse trees may cause problems.

Assume for a moment that a grammar exists with a rule of the form:

A → B C D

In this case the left corner parser would have to predict both categories C and D based on the information of the leftmost tag B and the parent A (i.e. the probability P(C, D | A, B)). With a binarization of the grammar rules, this prediction becomes much easier, as the parser only needs to predict the next category at hand. POS-tag prediction is thus postponed until the parser has seen as much of the sentence as it possibly can. The example rule would be rewritten to the rules A → B E and E → C D. Now the probabilities the parser uses for its prediction have become the easier-to-calculate values P(E | A, B) and P(D | E, C).

With the Penn Treebank sentences in CNF, it is possible to apply factorization to the POS-tag labels. According to Roark's definition of factorization, a factored grammar Gf is defined in the following way:

1. (A → B A-B) ∈ Gf iff (A → Bβ) ∈ G, s.t. B ∈ V and β ∈ V*
2. (A-α → B A-αB) ∈ Gf iff (A → αBβ) ∈ G, s.t. B ∈ V and α ∈ V*
3. (A-αB → ε) ∈ Gf iff (A → αB) ∈ G, s.t. B ∈ V and α ∈ V*
4. (A → a) ∈ Gf iff (A → a) ∈ G, s.t. a ∈ T

Here, V is the set of all nonterminal symbols and T is the set of all terminal symbols that occur in the grammar G.

The factorization process in this thesis project deviates slightly from Roark's definition, in that the presence of additional epsilon productions and stop symbols is implicit and therefore not included in the process. An example of the effect of the transform as used on the Penn Treebank sentences can be seen in figure 3.

Figure 3: Example of factorization of a syntactic tree. (a): standard tree in CNF, (b): tree (a) after the left corner transform as used in this paper.

3.2.2 Multiple Order Rules

Now that the tree structures of the Penn Treebank have been rewritten to their correct form, the grammar rules associated with these trees are extracted to provide the parser with knowledge about the Penn Treebank grammar. The rules are subdivided into their left-hand side and right-hand side. Each rule is saved in combination with its frequency count.

Because parse trees for sentences often have branches of varying depth and the parser needs to predict the correct parse tree structure, multiple order rules need to be generated. For this research, rules up to the 3rd order are generated, where the order corresponds to the branch depth. During the calculation of the frequency of multiple order rules, independence of the probability of encountering each sub-rule is assumed to simplify the calculation.

freq   left  right1  right2
22033  VP    VBZ     <VP> ∧ <VBZ>

Table 3: An example of a first order rule

3.3 The Default, Simple, and Lexicalized Parser

The experiment conducted in this paper shows the impact of providing different types of information to the parser. The information is provided to the parser through the data sets of multiple order rules as described in section 3.2.2. For the experiment, three parsers are compared against one another: the default parser, the simple parser, and the lexicalized parser. This subsection describes all three parsers.

3.3.1 The Default Parser

The default parser is created without further major pre-processing or addition of data. This means that the tagset used for the grammar rules provided to the parser is the same as the tagset deployed by the Penn Treebank itself. The only alteration made to the Penn Treebank tagset is the removal of the coreference annotation in the tags. Coreference annotation is used for tasks that seek information about large-scale text coherence, such as automatic summarizing and other information extraction tasks. (Van Deemter & Kibble, 1999) Since this is not the goal of this parser, the annotation can safely be removed, and this removal leads to less data sparsity.

3.3.2 The Simple Parser

The simple parser is similar to the default parser, but differs from the latter in that the tagset used is simplified. Apart from the removal of the coreference annotation of the tags, the function tags are also removed. This leaves the parser with only the base labels, such as S, NP, VP, and ADJP, without additional information. Using this underspecification of tags is based on the hypothesis that the information encoded in the function tags (such as the grammatical role of a phrase) adds no useful information to the parser and can be omitted to combat data sparsity problems.

3.3.3 The Lexicalized Parser

The lexicalized parser extends the simple parser by adding lexical information to each grammar rule. The simple parser was selected for extension, as opposed to the default parser, because adding lexical information to the default parser data would lead to a data sparsity problem.

Lexical information is provided to the parser by adding the head information belonging to each rule to the grammar rules provided to the parser. The head of a phrase is the word that determines the syntactic category of that phrase. (Miller, 2011) This means that, for example, the head of a noun phrase will always be a noun, and the head of a verb phrase will always be a verb. Often, but not always, the head of a phrase occurs at the start of the phrase. For this parser, the heads are extracted according to the following algorithm (sketched in code after the list), where step two is added because of practical considerations caused by the Penn Treebank tagging system:

1. If the current phrase is a noun phrase, verb phrase or adjective phrase, retrieve the first word within this phrase that has the correct word type.
2. If the current phrase is any other type, return the first word of the phrase.

After extraction, the head information was added in a separate column of the grammar rule file. Only the head information of the phrases occurring on the left-hand side of the grammar rule needs to be considered. It is not important to know the head of the phrases that are parsed into, because the head information is used as lexical information to make predictions, and these predictions are based on the information of the left-hand side of the rule. Therefore, head information is not added to any phrase that does not occur on the left-hand side at least once. This means that a first order rule has the following form after lexical information is added:

freq  left  head     right1  right2
9     VP    belongs  VBZ     <VP> ∧ <VBZ>

Table 4: First order rule after adding lexical information

4 Discussing the Results

In this section, the results of the experiment are reviewed. The first results to be considered are the parsing accuracy results of the different types of parser. After that, the correlation results of the model with the self-paced reading data and the eye tracking corpora are analyzed.

4.1 Parser Tagging Accuracy

An important goal of the parser is to tag the sentences it needs to read correctly: the parser's knowledge of sentence structures is used to model reading times. The model operates on the assumption that lesser-known sentence structures are harder for the parser to recall and thus take longer to read, whereas well-known sentence structures are easier to recall, so the parser spends little time reading these structures. In order to recognize the sentence structures, however, the parser first needs to recognize and tag the unseen test sentences accurately.

Table 5 shows the average precision (AvP), average recall (AvR) and the proportion of sentences skipped (Skipped) for every parser type. A sentence is skipped by the parser whenever the parser is stuck and cannot recognize the sentence structure. As can be seen from the table, the default parser holds the overall best accuracy scores, with the lexicalized parser a close second.

Parser       AvP    AvR    Skipped
Default      0.395  0.367  0.15
Simple       0.352  0.291  0.225
Lexicalized  0.388  0.348  0.15

Table 5: Parser tagging accuracy on Penn Treebank section 25

An explanation for these low accuracy results could be found in the restrictions put on the parser: for example, the current parser only allows for five parallel parses. This limitation was introduced because, although humans do consider alternative sentence structures as they read, the number of considered alternatives is low. Also, the current parser only considers alternative parses at a local level. This means any alternative is generated from the current node the parser has arrived at, ruling out any possibility of reconsidering alternative parses discarded at previous nodes. Adding the possibility of keeping around those alternative parses that would otherwise be discarded might improve overall parsing accuracy.

This overall low parsing accuracy means, however, that leaving the wrongly tagged sentences in would add too much noise to the model data. To counter this problem, the skipped sentences were left out of the correlation measure. Hand-selecting only the completely correctly tagged sentences from the eye tracking corpora was considered, but this would make the dataset too small to test on adequately.

4.2 Correlation Measures

Correlation measures for each corpus can be seen in tables 6, 7, 8, 9, 10 and 11. The results below show the correlation of the parser results with eye tracking data on which a shift by one word was performed. The shift of one is performed on the data because the activation values as modeled by the parser are delayed by approximately one word. A negative correlation is expected, as a high activation value causes quick retrieval (Anderson et al., 2004). Quick retrieval in turn means reading times will be shorter; hence high activation values will be associated with low fixation duration values.

Results of correlation with data on which no shift is performed are added to the appendix, as discussion of these results is beyond the scope of this paper. An in-depth discussion of these results can be found in Bal (2018).

Parser       Activation  Correlation  p-value
Default      Minimum     -0.024928    0.274699
             Maximum     -0.015617    0.493808
             Average     -0.013761    0.546549
Simple       Minimum     -0.022298    0.328549
             Maximum     -0.012717    0.577399
             Average     -0.016310    0.474842
Lexicalized  Minimum     -0.034941    0.125698
             Maximum     -0.030916    0.175469
             Average     -0.036349    0.111153

Table 6: Results of parser on self-paced reading data (Frank et al. (2013)), number of sentences = 239, shift = 1

Parser       Activation  Correlation  p-value
Default      Minimum     -0.068362    0.022321
             Maximum     -0.104601    0.000462
             Average     -0.088808    0.002972
Simple       Minimum     -0.073965    0.013411
             Maximum     -0.085426    0.004275
             Average     -0.094526    0.001563
Lexicalized  Minimum     -0.239104    5.493455e-16
             Maximum     -0.242980    1.789783e-16
             Average     -0.248369    3.640618e-17

Table 7: Results of parser on first pass eye tracking data (Frank et al. (2013)), number of sentences = 180, shift = 1

Parser       Activation  Correlation  p-value
Default      Minimum     -0.070338    0.018720
             Maximum     -0.105856    0.000394
             Average     -0.089744    0.002681
Simple       Minimum     -0.080729    0.006945
             Maximum     -0.092124    0.002056
             Average     -0.101253    0.000702
Lexicalized  Minimum     -0.241198    3.004202e-16
             Maximum     -0.244189    1.256331e-16
             Average     -0.249965    2.255050e-17

Table 8: Results of parser on right-bounded eye tracking data (Frank et al. (2013)), number of sentences = 180, shift = 1

Parser       Activation  Correlation  p-value
Default      Minimum     -0.070530    0.018398
             Maximum     -0.089483    0.002760
             Average     -0.080021    0.007457
Simple       Minimum     -0.094141    0.001634
             Maximum     -0.085703    0.004152
             Average     -0.103279    0.000546
Lexicalized  Minimum     -0.232566    3.484584e-15
             Maximum     -0.228533    1.059504e-14
             Average     -0.237056    9.856411e-16

Table 9: Results of parser on go-past eye tracking data (Frank et al. (2013)), number of sentences = 180, shift = 1

Parser       Activation  Correlation  p-value
Default      Minimum     -0.077239    7.165929e-12
             Maximum     -0.022784    4.347006e-02
             Average     -0.115550    9.306488e-25
Simple       Minimum     -0.008452    0.450116
             Maximum     0.028247     0.011590
             Average     -0.053041    0.000002
Lexicalized  Minimum     -0.103440    3.607711e-20
             Maximum     -0.086739    1.281407e-14
             Average     -0.113845    4.058381e-24

Table 10: Results of parser on Dundee total time eye tracking data, number of sentences = 439, shift = 1


Parser       Activation  Correlation  p-value
Default      Minimum     -0.096185    1.311492e-17
             Maximum     -0.040644    3.147173e-04
             Average     -0.135990    9.792262e-34
Simple       Minimum     -0.013536    2.264671e-01
             Maximum     -0.028846    9.938009e-03
             Average     -0.057861    2.288598e-07
Lexicalized  Minimum     -0.122763    8.317138e-28
             Maximum     -0.106785    2.133247e-21
             Average     -0.129899    5.831059e-31

Table 11: Results of parser on Dundee first pass eye tracking data, number of sentences = 439, shift = 1

The overall results show a slight increase in correlation when comparing the default parser to the simple parser, and a noticeable increase in correlation when comparing the simple parser to the lexicalized parser.

A correlation of the model with the eye tracking data is significant when p < 0.05. By this criterion, the parser yields a model that significantly correlates with the eye tracking data of Frank et al. (2013); however, the model bears no significant correlation with the self-paced reading data. An explanation for this lack of correlation might be that self-paced reading data is known to be less accurate than eye tracking data.

The results show that changing from the default tagset used by the Penn Treebank to a simpler tagset has little effect on the correlation results, which are sometimes even unexpected: the correlation of the simple parser model with the total time eye tracking data of the Dundee corpus is insignificant, and the correlation of the maximum activation is even positive. However, adding lexical information to the parser does have a significant effect on the correlation results. In most cases, the correlation scores of the corpus with the lexicalized parser model have doubled compared to the default and simple parsers. The results from this experiment confirm the assumption that lexical information also carries information about likely sentence structures and, more importantly, imply that humans use this lexical information to predict sentence structure while reading.

5 Experiment Conclusion and Further Research

In conclusion, implementing a parser directly into ACT-R seems a worthwhile path to venture down. Preliminary results are sub-optimal because the restrictions placed on the parser, such as the restricted number of parallel parses, cause parsing inaccuracies. However, many solutions and improvements to the parsing problems faced are yet to be considered. During the experiment, a data sparsity problem was observed. A fast solution to combat this problem is to provide the parser with more eye tracking data. Next to using a larger eye tracking corpus to provide more training data for the parser, the parser itself can also be improved in multiple ways. One of these is to consider other factors well known to contribute to fixation duration, such as word length, concreteness, and age of acquisition. (Juhasz & Rayner, 2003; Kliegl et al., 2004) Further research might focus on using multiple order rule data with frequency counts that are not based on the assumption of independent probability between rules. Another way to improve the parser is by feeding it more information about the global structure of the sentence, as the parser in this thesis mainly depends on the local information provided by its current state. This can be achieved by keeping track of alternative parses on a global scale, holding a select number of the most likely alternative parses in memory. Lastly, further research might consider modeling eye movement beyond eye fixation, such as predicting the occurrence of regression.


6 Appendix

6.1 Results with Shift of Zero

                            Mean first-pass RT     Mean right-bounded RT  Mean go-past RT
Rule structure  Activation  Correlation  p-value   Correlation  p-value   Correlation  p-value
Normal          Minimum     0.0522       0.0600    0.0422       0.128     0.0364       0.189
                Maximum     0.0363       0.191     0.0304       0.272     0.0229       0.409
                Average     0.0506       0.0685    0.0423       0.127     0.0363       0.191
Simple          Minimum     0.126        5.04e-6   0.116        3.0e-5    0.0930       0.000796
                Maximum     0.0618       0.0259    0.0499       0.0724    0.0277       0.319
                Average     0.108        9.14e-5   0.0959       0.000541  0.0716       0.00985
Lexicalized     Minimum     0.231        3.48e-17  0.212        1.24e-14  0.170        6.65e-10
                Maximum     0.183        3.00e-11  0.165        2.45e-9   0.128        4.01e-6
                Average     0.218        2.11e-15  0.200        4.01e-13  0.159        8.30e-9

Table 12: Results of parser on eye tracking data (Frank et al. (2013)), shift = 0, n = 180

Rule structure  Activation  Correlation coefficient  p-value
Normal          Minimum     -0.0391                  0.0678
                Maximum     0.0161                   0.451
                Average     -0.00191                 0.928
Simple          Minimum     -0.00358                 0.867
                Maximum     -0.00672                 0.754
                Average     -0.000942                0.965
Lexicalized     Minimum     -0.00471                 0.827
                Maximum     0.00419                  0.846
                Average     -0.00479                 0.824

Table 13: Results of parser on self-paced reading data (Frank et al. (2013)), shift = 0, n = 239

Parser       Activation  Correlation  p-value
Default      Minimum     0.179280     6.959717e-61
             Maximum     0.200046     1.189492e-75
             Average     0.179859     2.843465e-61
Simple       Minimum     0.168422     1.022114e-54
             Maximum     0.171246     1.562264e-56
             Average     0.130610     2.033106e-33
Lexicalized  Minimum     0.274856     5.488660e-144
             Maximum     0.276411     1.153960e-145
             Average     0.256411     6.289697e-125

Table 14: Results of parser on Dundee total time eye tracking data, number of sentences = 443, shift = 0


Parser       Activation  Correlation  p-value
Default      Minimum     0.187458     1.709083e-66
             Maximum     0.166761     8.273052e-53
             Average     0.168443     7.385588e-54
Simple       Minimum     0.160499     8.625147e-50
             Maximum     0.154396     3.643485e-46
             Average     0.120512     1.151643e-28
Lexicalized  Minimum     0.279625     3.645614e-149
             Maximum     0.275544     9.972953e-145
             Average     0.260407     6.276365e-129

Table 15: Results of parser on Dundee first pass eye tracking data, number of sentences = 443, shift = 0

References

Anderson, J., Bothell, D., Byrne, M., Douglass, S., Lebiere, C., & Qin, Y. (2004). An integrated theory of the mind. Psychological Review, 111(4), 1036-1060.

Bal, S. (2018, July). Modeling behavioral data using a cognitively constrained parser.

Barrett, M., Agić, Ž., & Søgaard, A. (2015). The Dundee Treebank. The 14th International Workshop on Treebanks and Linguistic Theories.

Boston, M. F., Hale, J., Vasishth, S., & Kliegl, R. (2011). Parallel processing and sentence comprehension difficulty. Language and Cognitive Processes, 26, 301-349.

Brasoveanu, A., & Dotlacil, J. (2018). Formal linguistics and cognitive architecture: Integrating generative grammars, cognitive architectures and Bayesian methods. (draft)

Demberg, V., & Keller, F. (2008). Data from eye-tracking corpora as evidence for theories of syntactic processing complexity. Cognition, 109, 193-210.

Engelmann, F., Vasishth, S., Engbert, R., & Kliegl, R. (2013). A framework for modeling the interaction of syntactic processing and eye movement control. Topics in Cognitive Science, 5, 452-474.

Frank, S. L., Monsalve, I. F., Thompson, R. L., & Vigliocco, G. (2013). Reading time data for evaluating broad-coverage models of English sentence processing. Behavior Research Methods, 45(5), 1182-1190.

Hale, J. (2014). Automaton theories of human sentence comprehension. CSLI Publications, Center for the Study of Language and Information. (draft)

Juhasz, B. J., & Rayner, K. (2003). Investigating the effects of a set of intercorrelated variables on eye fixation durations in reading. Journal of Experimental Psychology: Learning, Memory and Cognition, 29(6), 1312-1318.

Jurafsky, D., & Martin, J. H. (2009). Speech and language processing. Pearson Education, Inc. (International Edition)

Kliegl, R., Grabner, E., Rolfs, M., & Engbert, R. (2004). Length, frequency, and predictability effects of words on eye movements in reading. European Journal of Cognitive Psychology, 16(1/2), 262-284.

Lebière, C., & Anderson, J. R. (2008). A connectionist implementation of the ACT-R production system.

Miller, J. (2011). A critical introduction to syntax. London: Continuum.

Rayner, K. (1998). Eye movements in reading and information processing. Psychological Bulletin, 124(3), 372-422.

Rayner, K. (2009). Eye movements and attention in reading, scene perception, and visual search. The Quarterly Journal of Experimental Psychology, 62(8), 1457-1506.

Rayner, K., Abbott, M. J., Schotter, E. R., Belanger, N. N., Higgins, E. C., Leinenger, M., ... Plummer, P. (2015). Keith Rayner eye movements in reading data collection. UC San Diego Library Digital Collections. (http://dx.doi.org/10.6075/J0JW8BSV)

Roark, B. (2001). Probabilistic top-down parsing and language modeling. Computational Linguistics, 27(2), 249-276.

Taylor, A., Marcus, M., & Santorini, B. (2003). The Penn Treebank: An overview. In Treebanks (pp. 5-22). Springer.

Van Deemter, K., & Kibble, R. (1999). What is coreference, and what should coreference annotation be? Proceedings of the Workshop on Coreference and its Applications, 90-96.
