
Modeling behavioral data using a cognitively constrained parser
Left-corner parsing in ACT-R

Steven Bal
11013818

Bachelor thesis
Credits: 18 EC

Bachelor Opleiding Kunstmatige Intelligentie
University of Amsterdam
Faculty of Science
Science Park 904
1098 XH Amsterdam

Supervisor: dr. J. Dotlačil
Institute for Logic, Language and Computation
Faculty of Science
University of Amsterdam
Science Park 107
1098 XG Amsterdam


Abstract

This paper describes a syntactic parser that was constructed directly inside a cognitive architecture, namely Adaptive control of thought-rational (ACT-R), and analyzes the performance of such a cognitively constrained parser in a psycholinguistic context. Three different grammar rule structures (default, simple and lexicalized) were created and employed by the parser during testing, to evaluate the difference in performance between the rule structures. The parser was initially tested for its parsing capabilities by comparing its results to the gold standard present in the Penn Treebank corpus. Subsequently, it was used to read sentences from eye tracking and self-paced reading experiments, to investigate whether correlation was present between the activation levels of the parser and the behavioral data. Finally, the results of the parser on garden-path sentences were analyzed. Application of the parser to the Penn Treebank corpus yielded an average of 40% recall and an average of 43% precision. Although the activation levels produced by the parser did not result in any significant correlations with the self-paced reading data, significant correlations were found for two rule structures between the activation levels and eye tracking data from Frank et al. (2013) (simple: ρ = 0.126; lexicalized: ρ = 0.231), and significant correlations were found for all three rule structures with eye tracking data from the Dundee corpus (Kennedy, Hill, and Pynte, 2003) (default: ρ = 0.187; simple: ρ = 0.160; lexicalized: ρ = 0.280). On garden-path sentences, the parser displayed behavior similar to that of human readers. These results indicate that the parser is psycholinguistically accurate to a degree, which may lend more credence to ACT-R as a cognitive architecture and as a theory for understanding the human mind.


Contents

1 Introduction
  1.1 Research question
2 Theoretical foundation
  2.1 Computational linguistics
    2.1.1 Grammars
    2.1.2 Syntactic parsing
    2.1.3 Garden-pathing
  2.2 Cognitive architectures
  2.3 Eye tracking in psycholinguistics
  2.4 Related work
3 Research
  3.1 Materials
    3.1.1 Python implementation of ACT-R
    3.1.2 Penn Treebank corpus
    3.1.3 Behavioral data from Frank et al. (2013)
    3.1.4 Behavioral data from Dundee corpus
  3.2 Method
    3.2.1 Preprocessing
    3.2.2 Parser construction
    3.2.3 Parser evaluation
4 Results
5 Evaluation
  5.1 Penn Treebank corpus
  5.2 Self-paced reading data from Frank et al. (2013)
  5.3 Eye tracking data from Frank et al. (2013)
  5.4 Dundee corpus
  5.5 Garden-path sentences
6 Conclusion
7 Future work
8 References
A Appendices
  A.1 Preprocessing algorithm for Dundee corpus
  A.2 Example production strings for pyactr


1 Introduction

Human language has always been an intriguing subject of research, and with the ever-increasing amounts of data and computational power available, it is not surprising that the field at the intersection of linguistics and computer science, computational linguistics, remains a fruitful one. Within this field, computational psycholinguistics is concerned with investigating the means by which the human mind comprehends natural language. As the name suggests, it draws from both psychological and linguistic theories to examine the behavior of the mind.

1.1 Research question

In computational psycholinguistics, attempts have been made to model behavioral data, such as eye tracking data, using different metrics of syntactic processing complexity (Demberg and Keller, 2008). It has, however, not been attempted to implement a complete parser directly inside a cognitive architecture and to use this parser to model behavioral data. The research question of this paper is therefore:

How well can a cognitively constrained data-driven syntactic parser, constructed in Adaptive control of thought-rational (ACT-R), produce a fit to eye tracking and self-paced reading data?

This will be done by comparing the activation levels produced by the parser in ACT-R (Anderson et al., 2004) for certain sentences to the reading times of human subjects for the same sentences. This comparison will be performed for three different grammar rule structures used by the parser. Furthermore, the results of the parser on classic examples of garden-pathing will be examined.

It is hypothesized that the activation levels of ACT-R should display a correlation with the reading times from behavioral data, since these activation levels are used by ACT-R to compute retrieval latency, which is analogous to reading time. In addition, it is hypothesized that a cognitively constrained parser should perform similarly to human subjects when reading garden-path sentences, and it would thus be expected that the parser produces the same errors that human subjects frequently produce for those sentences.


2 Theoretical foundation

2.1 Computational linguistics

2.1.1 Grammars

In order to model natural language, the field of computational linguistics employs a wide variety of statistical and rule-based techniques. One approach to modeling the syntactic combinatorics of language is through a context-free grammar (CFG), which utilizes a set of terminal symbols; a set of non-terminal symbols; a set of production rules, each of which maps a single non-terminal symbol to a combination of terminal and/or non-terminal symbols (for example: S → NP VP); and a start symbol S, which must be a non-terminal symbol. This notion of grammar may be extended to a probabilistic context-free grammar (PCFG), which assigns conditional probabilities to each of the production rules described earlier (Hale, 2014). Such a grammar may be used to capture the syntactic structure of a sentence, which is often represented as a parse tree. A small example of a PCFG is sketched below.
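The following sketch builds a toy PCFG with nltk, the package this thesis also uses for part-of-speech tagging; the rules and probabilities are invented for illustration and are not drawn from any corpus.

import nltk

# A toy PCFG: the probabilities of all rules sharing a left-hand side
# must sum to 1.
toy_pcfg = nltk.PCFG.fromstring("""
    S   -> NP VP   [1.0]
    NP  -> DT NN   [0.7] | NN [0.3]
    VP  -> VBD     [0.4] | VBD NP [0.6]
    DT  -> 'the'   [1.0]
    NN  -> 'horse' [0.5] | 'barn' [0.5]
    VBD -> 'raced' [1.0]
""")

# The Viterbi parser returns the most probable parse tree.
parser = nltk.ViterbiParser(toy_pcfg)
for tree in parser.parse("the horse raced".split()):
    print(tree)  # most probable parse of the input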

Probabilistic context-free grammars may be expanded even further by lexicalization, which is a modification of conventional PCFGs in which non-terminal symbols in the rules of a grammar are annotated with a lexical head. In order to find the lexical head for a rule, one constituent on the right-hand side of that rule must be selected as the head daughter, which is trivial in certain cases but non-trivial in others (Jurafsky and Martin, 2014). For instance, consider the following structure:

(a) Unlexicalized rule: (NP (DT the) (JJ old) (NN man))

(b) Lexicalized rule: (NP[man] (DT[the] the) (JJ[old] old) (NN[man] man))

Figure 1: Example of trivial rule lexicalization

The example above demonstrates a case in which finding the lexical head is trivial, as it is often the case that the lexical head for a noun phrase (NP) is the noun itself (NN in this case). For non-trivial cases, an algorithm or a table containing a priority list may be used to determine lexical heads for rules.

Lexicalized grammars provide an advantage over standard grammars, as they can help ensure that the correct rules are employed for specific words when parsing a sentence. For instance, transitive verbs (e.g. to eat) are often followed by a noun phrase that serves as the object of the verb, whereas intransitive verbs (e.g. to walk) lack such an object noun phrase. A disadvantage of lexicalization is that grammars can become sparse, as certain rules with certain words might be encountered only once, even within large annotated corpora.


2.1.2 Syntactic parsing

The syntactic structure of a string of words conforming to a certain grammar can be uncovered by parsing that string. A multitude of parsing methods can be used, one of which is top-down parsing, which starts at the highest level of a parse tree (e.g. the start symbol S) and attempts to reach the words of the string by expanding the rules of the grammar, using a stack. For instance, if a non-terminal symbol A is encountered on top of the stack, the parser will attempt to find a rule in the grammar of the form A → α, and if it succeeds in doing so, will replace A on top of the stack with the sequence of symbols α. Alternatively, if a terminal symbol x is encountered on top of the stack, and this symbol x is identical to the next word in the string that is being parsed, then x is removed from the input and from the stack (Hale, 2014).

A parsing method that operates in the reverse order is bottom-up parsing, which starts at the word level and advances upwards through the parse tree. For example, if a sequence of symbols α is encountered on top of the stack, the parser will attempt to find a rule in the grammar of the form A → α, and if it succeeds in doing so, will replace α on top of the stack with the non-terminal symbol A. Beyond this, if the upcoming word of the string to be parsed occurs in the grammar as a terminal symbol, that word is pushed on top of the stack (Hale, 2014).

When comparing the two aforementioned parsing methods, a few advantages and disadvantages can be identified for both. An advantage of top-down parsing is that this method will never attempt to create tree structures that are not present in the grammar and thus will not lose time doing so. Disadvantageous to this method is the fact that a top-down parser might spend time exploring trees whose leaves (the terminal nodes of the parse) do not match the words in the string that is being parsed. The reverse holds for bottom-up parsers: starting at the word level can cause incorrect parses to be considered, but it does ensure that the terminal nodes of the parse will always match the words of the string that is being parsed (Jurafsky and Martin, 2014).

In order to exploit the advantages of both top-down and bottom-up parsing, left-corner parsing provides a combination of the two methods. This technique advances in the same direction as bottom-up parsing: it starts at the word level and requires evidence before a rule is employed. However, encountering the first symbol on the right-hand side of a production rule (also known as the left corner of the rule) is enough for a left-corner parser to use that rule. The remaining symbols on the right-hand side are stored as symbols that are expected to be encountered later during parsing, similar to top-down parsing (Hale, 2014).


The left-corner parsing method employs three different operations: the shift operation checks whether the next word in the sentence exists as a terminal node in the grammar, and if this condition holds, the word is pushed on top of the stack; the project operation checks whether the top of the stack contains a symbol B and whether a rule A → B γ exists in the grammar, and if both of these conditions hold, B is replaced with A together with the expectation of encountering [γ], where γ is a combination of one or more symbols (for example γ = C1 C2 C3 ... Cn); and the project and complete operation checks whether the top of the stack contains a symbol B and whether the element below it on the stack is an expectation [A], and if these conditions hold, both B and [A] are replaced with the components of [γ] (Hale, 2014). A stack-based sketch of these operations is given below.
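To make the operations concrete, here is a minimal Python sketch. The grammar representation (a dict from a left-corner symbol to a single rule (A, γ)) and the expectation bookkeeping are simplifications for illustration, not the thesis's implementation.

def shift(stack, word, lexicon):
    # Shift: if the next word is a terminal in the grammar, push its
    # category on top of the stack.
    if word in lexicon:
        stack.append(lexicon[word])

def project(stack, rules):
    # Project: B on top of the stack licenses a rule A -> B gamma;
    # replace B with A, storing the expectation [gamma] below it.
    b = stack.pop()
    a, gamma = rules[b]
    stack.append(("expect", list(gamma)))  # symbols still to be found
    stack.append(a)

def project_and_complete(stack, rules):
    # Project and complete: B on top, expectation [A] below it; the
    # rule A -> B gamma satisfies [A], leaving the expectation [gamma].
    b = stack.pop()
    tag, expected = stack.pop()
    a, gamma = rules[b]
    if tag == "expect" and expected == [a]:
        stack.append(("expect", list(gamma)))

lexicon = {"the": "DT"}
rules = {"DT": ("NP", ["NN"])}   # NP -> DT NN, with DT as left corner
stack = []
shift(stack, "the", lexicon)     # stack: ['DT']
project(stack, rules)            # stack: [('expect', ['NN']), 'NP']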

2.1.3 Garden-pathing

In the field of linguistics, a phenomenon that exposes the difference between existing parsing methods and the mechanisms that humans use to parse or read a sentence is called garden-pathing. A sentence is said to be garden-pathed if it is grammatically correct and a human reader reads the initial part of the sentence with a certain expectation in mind of what is to come, an expectation the sentence never meets, confusing the reader as to whether the sentence is grammatically correct. Such sentences are considered temporarily ambiguous. A classic example of a garden-path sentence is as follows (Bever, 1970):

The horse raced past the barn fell

Upon reading, human readers generally follow a structure that will not result in a proper syntactic structure for the sentence, as the correct structure is often the less intuitive one. Most parsers would not parse such a garden-path sentence incorrectly, which might indicate that such parsers are not psycholinguistically accurate, because garden-path theory suggests that human readers resolve ambiguities in sentences by following a single-path analysis, which is incorrect in the case of garden-path sentences (Frazier and Rayner, 1982). Displayed below are the incorrect single-path parse that is often followed by human readers and the correct parse that would be found by most parsers:

(a) Incorrect parse: (S (NP (DT the) (NN horse)) (VP (VBD raced) (PP (IN past) (NP (DT the) (NN barn)))) (? fell))

(b) Correct parse: (S (NP (NP (DT the) (NN horse)) (VP (VBN raced) (PP (IN past) (NP (DT the) (NN barn))))) (VP (VBD fell)))


2.2 Cognitive architectures

In the field of cognitive science, cognitive architectures are theories that attempt to model the human mind. According to Newell (1994), all behavior in humans is fueled by one single system, the mind, which may consist of multiple parts. As such, cognitive architectures as theories must take this into consideration, meaning that a cognitive architecture should explain all behavior produced by the mind, as opposed to selecting one aspect of behavior to be explained.

Adaptive control of thought-rational, or ACT-R (Anderson et al., 2004), is an example of such a cognitive architecture. It consists of multiple modules and components, each of which is concerned with processing different types of information, integrated into one single structure that can be used in a simulation to perform tasks. Every module has its own buffer, which allows for the storage of a certain amount of information and also enables ACT-R to differentiate between the information of which there is awareness (e.g. a certain event in memory that has been recalled) and the information of which there is no awareness at a specific moment in time (e.g. all other events in memory). Information present within the buffers may be exchanged with the production system, the central module that connects all of the other modules and provides the architecture with procedural knowledge. This module contains a number of production rules, structured as if-then clauses specifying conditions that must be met (e.g. on the contents of certain buffers) for certain actions to be carried out by the architecture. The execution of production rules in ACT-R is serial, which necessitates a way to select which production rule to use at a given point during simulation, as more than one production rule may be applicable at that point, based on the conditions. This selection is enabled by a production utility equation, where the production rule with the highest associated utility is employed.

Besides the procedural memory module, the architecture also contains a declarative memory module, which allows it to store factual knowledge. Linked to this module is the retrieval buffer, which enables the recall of facts from the declarative memory module, facilitating their usage by the central production system. These facts are stored in so-called chunks, which are lists of attributes and values that allow the architecture to differentiate between different types of facts. For every chunk an activation level can be computed, allowing ACT-R to determine the retrieval latency for that specific chunk.

Furthermore, there is a perceptual-motor system, which, as its name suggests, is divided into a system that allows the architecture to process visual input and a system that allows it to perform actions in a simulated environment. The visual system in ACT-R is unique in that it consists of two different modules to account for visual input: a visual-location module and a visual-object module, each connected to its own buffer. The two-stream hypothesis of visual processing is implemented in ACT-R through these two modules, as the visual-location module acts as the dorsal "where" stream and the visual-object module serves as the ventral "what" stream. The production system can utilize these two modules to search the environment by applying constraints to productions, which pertain to the location of an object when sent to the visual-location system and to the properties of an object when sent to the visual-object system. Object identification can be achieved by sending a chunk containing a location in the environment to the visual-object system, allowing this system to move its attention to that location and store the object at that location as a chunk in declarative memory. The motor module in ACT-R is based on EPIC (Meyer and Kieras, 1997) and allows for the control of hand movement.

Also present is a goal module and an associated buffer to record the intentions of the architecture during a simulation, allowing ACT-R to demonstrate behavior that works towards its goals without requiring external stimuli. The goal module stores the overarching goal of ACT-R during a task, but it may also store intermediate goals that ought to be reached to achieve the final goal. Displayed below is a schematic of the cognitive architecture and the relations between its modules.

Figure 3: Schematic representation of ACT-R, image courtesy of Anderson et al. (2004)

2.3 Eye tracking in psycholinguistics

One important type of experiment in psycholinguistic research is eye tracking (Duchowski, 2007). Although a plethora of different eye tracking methods exists, most techniques have in common that they monitor the fixations and motions of the eyes over a period of time. Examples of eye motions that may be tracked are saccades, which are rapid eye movements used to find a new fixation location, and regressions, which are revisits to words that have been read previously when reading a sentence (Duchowski, 2007).

Eye tracking in a psycholinguistic setting is mostly applied to reading experiments, in which human subjects are tasked to read sentences displayed on a screen while an eye tracking device records their eye movements. In certain experiments, the readers may also be asked questions about the sentences they read, to ensure that they maintained an understanding of the sentences during reading. From this eye tracking data, different measures like first pass reading times and total reading times may be derived.


An alternative type of reading experiment is self-paced reading, which differs slightly from eye tracking experiments: subjects in eye tracking experiments are shown full sentences on the screen to read, whereas subjects in self-paced reading experiments are shown one word at a time and proceed to the next word at the press of a button. In self-paced reading experiments, only the time between button presses is recorded as the reading time, and it is not possible for the subject to regress to previously read words.

2.4 Related work

Syntactic parsers as described earlier are widely used within computational linguistics, but are not limited to applications within this field of research, as parsers are also utilized in the fields of psycholinguistics and cognitive science.

In Demberg and Keller (2008), for example, parsing was used to predict eye fixation durations on sentences using two theories of syntactic processing complexity: dependency locality theory (DLT), which postulates that the cause of processing complexity is the cost of the computational resources consumed by a processor, and surprisal, which quantifies the probability of a word in a linguistic context. DLT is divided into two costs: integration cost, which is the cost of integrating new input into a built structure at a given computational stage, and memory cost, which is the cost of storing parts of the input that are to be used in parsing later parts of the input. The research concluded that DLT integration cost was not a significant predictor of reading times for arbitrary words, but it did demonstrate that DLT integration cost was a successful predictor of reading times for nouns. An unlexicalized formulation of surprisal proved to be a successful predictor of reading times for arbitrary words. Through comparison, it appeared that the two measures were uncorrelated with each other, which implied that a complete theory must integrate both to account for processing complexity.

Similarly, in Boston et al. (2011), eye fixation durations on sentences were predicted using two metrics: surprisal, and cue-based retrieval, which quantifies the comprehension difficulty of a given sentence. This paper also explains the notion of parallel processing in cognitive theories, which states that the mind can perform multiple cognitive operations at the same time. A family of dependency parsers was used to examine the predictions at multiple levels of assumed parallel processing in the human mind. In dependency grammars, words depend on other words in asymmetric relations between heads and dependents, where one head may have multiple dependents, but not vice versa. This research demonstrated that surprisal and retrieval became better predictors of eye fixation duration as a higher degree of parallel processing was assumed.

Beyond this, Engelmann et al. (2013) aimed to examine the interaction between oculomotor control and sentence-level language comprehension by using surprisal and cue-based retrieval as measures of parsing difficulty, both integrated into the eye movement model EMMA (Salvucci, 2001) inside the cognitive architecture ACT-R. Moreover, a reading model was implemented that initiated Time Out regressions, which allowed the model, in case a word in a sentence was identified more rapidly than the word before it, to regress to the previously read word and continue normal reading once that word had been identified.


3 Research

3.1 Materials

3.1.1 Python implementation of ACT-R

The parser was constructed in pyactr (Dotlačil, 2018), a Python 3.5 implementation of ACT-R, which implements the modules described in section 2.2. pyactr utilizes mostly the same formulas as ACT-R 5.0 (Anderson et al., 2004), and this section will outline in more detail the formulas that are of importance in this research.

In pyactr, chunks are represented as lists of slot-value pairs, and the equation used to compute the activation level of a specific chunk is as follows:

    A_i = B_i + \sum_j W_j S_{ji}    (1)

In this equation, A_i represents the activation level of chunk i; B_i is the base activation level of chunk i; W_j is the weight of an element in the goal buffer, which is set to 2; and S_ji is the strength of association between element j in the goal buffer and chunk i, which is estimated to be equal to S - \log(fan), where S is the general strength of association and fan is the number of facts in the declarative memory associated with element j (Anderson, 2007, p. 110, Table 3.2).

In order to establish the base activation level B_i, a learning equation is applied:

    B_i = \ln\left( \sum_{j=1}^{n} t_j^{-d} \right)    (2)

where t_j is the amount of time that has passed since the architecture retrieved chunk i for the j-th time, and d is a decay parameter with a default value of 0.5. The base activation level of a chunk quantifies how prior use and retrieval affect memory.

Subsequently, the activation level A_i for chunk i is utilized to determine the probability of retrieval for that chunk, which is achieved using this formula:

    P_i = \frac{1}{1 + \exp\left( -\frac{A_i - \tau}{s} \right)}    (3)

where τ denotes the retrieval threshold for activation and s is the noise parameter, which has a default value of approximately s = 0.4. The resulting value quantifies the probability for a chunk to be retrieved from the declarative memory.

Finally, the activation level is used to determine the retrieval latency T_i for chunk i:

    T_i = F \cdot \exp(-f \cdot A_i)    (4)

where F is the latency factor and f is the latency exponent, which is often simply set to 1. The retrieval latency for a chunk signifies the time in seconds needed for the architecture to recall that chunk. A Python sketch of equations (1) through (4) is given below.
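As a concrete illustration, the following sketch computes equations (1) through (4); the default values are the ones mentioned in the text, and the function and variable names are illustrative.

import math

def base_activation(retrieval_times, now, d=0.5):
    # Equation (2): B_i = ln(sum_j t_j^(-d)), where t_j is the time
    # elapsed since the j-th retrieval of chunk i and d defaults to 0.5.
    return math.log(sum((now - t) ** -d for t in retrieval_times))

def activation(base, associations, weight=2.0):
    # Equation (1): A_i = B_i + sum_j W_j S_ji, with the goal-buffer
    # weight W_j set to 2 as in the text.
    return base + sum(weight * s_ji for s_ji in associations)

def retrieval_probability(a_i, tau, s=0.4):
    # Equation (3): P_i = 1 / (1 + exp(-(A_i - tau) / s)).
    return 1.0 / (1.0 + math.exp(-(a_i - tau) / s))

def retrieval_latency(a_i, F, f=1.0):
    # Equation (4): T_i = F * exp(-f * A_i), in seconds.
    return F * math.exp(-f * a_i)

b = base_activation([0.0, 2.5], now=5.0)     # retrieved at t=0s and t=2.5s
a = activation(b, associations=[1.2, 0.8])   # the S_ji values are made up
print(retrieval_probability(a, tau=0.0), retrieval_latency(a, F=0.1))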

Although the values used for the retrieval threshold τ and the latency factor F differ from task to task, a relationship between the two variables has been proposed in Anderson et al. (1998):

    F = 0.35 \cdot \exp(\tau)    (5)

which indicates that the retrieval latency for a chunk with activation level A_i = τ, given latency exponent f = 1, is T_i = 0.35 · exp(τ) · exp(-A_i) = 0.35 seconds (by substitution of equation 5 into equation 4).

Procedural memory productions in pyactr are encoded as strings, where an arrow is used to differentiate between the antecedent and the consequent of the implication. In order to determine which production rule to employ at a given time during simulation, the system computes the production utility:

    U_i = P_i G - C_i    (6)

where P_i is an estimate of the probability that the current goal will be achieved, given that production rule i is employed; G is the value of that goal; and C_i is an estimate of the cost of reaching the goal. Since these equations are not of specific importance for this research, refer to p. 1044 of Anderson et al. (2004) for further explanation of the equations that underlie the procedural memory module of ACT-R.

3.1.2 Penn Treebank corpus

The data that was fed into the parser was garnered from the Penn Treebank corpus (Marcus, Marcinkiewicz, and Santorini, 1993), a corpus consisting of 25 sections of annotated English sentences (approximately 50,000 in total) from the Wall Street Journal. These sentences are stored as parse trees in order to show their syntactic structure, which allows for the usage of these parse trees as training data in a data-driven parser. Furthermore, Penn Treebank was used to test the performance of the parser. The table below displays the split of the data for training and testing of the parser:

Phase      Section(s) used
Training   0-20
Testing    22 & 25

Table 1: Split of Penn Treebank corpus for different phases

None of the sections were used as development data, because the focus of this research was specifically to model behavioral data, not to maximize parsing performance. A decent parsing performance was therefore considered sufficient to start applying the parser to the behavioral data.

3.1.3 Behavioral data from Frank et al. (2013)

Since this research was concerned with a syntactic parser in a psycholinguistic setting, the parser was used to model self-paced reading data and eye tracking data from Frank et al. (2013). Both data sets draw from a small corpus of 361 English sentences, collected from three novels that were uploaded to a website for free use. The sentences were corrected in case of grammatical or spelling errors and consist of 13.7 words on average (σ = 6.36). Furthermore, the occurrence of identical proper names was limited to two across all sentences, to prevent the human subjects from connecting different sentences into a story.


The self-paced reading data was produced by 117 human subjects, each of whom was tasked to read the aforementioned 361 sentences and was subsequently asked yes/no questions about specific sentences they had read. The data was acquired by letting the subjects read the sentences word by word and press a button to display the next word, recording the time in milliseconds between button presses as the reading time for that word. Data from human subjects whose question-answering error rate was higher than 25% was excluded; the final data set thus consists of data from 104 subjects.

In order to collect the eye tracking data, 205 of the original 361 sentences were selected, as those were the sentences that could fit on a single line of the display from which the human subjects had to read. In the experiment, 43 human subjects had their eye movements tracked while reading these 205 sentences, with the eye tracker sampling at 500 Hz. Four measures were derived from the eye tracking data: first fixation time, which is the duration of the first fixation on a certain word; first pass time or gaze duration, which is the sum of a subject's fixation durations on a certain word before that subject fixates for the first time on another word; right-bounded reading time, which is the sum of a subject's fixation durations on a certain word before that subject fixates for the first time on another word to the right of the current word; and go past reading time, which is the sum of a subject's fixation durations on a certain word up until the first fixation on a word to the right of the current word. Since go past reading time also includes the fixations on words to the left of the current word, this measure is also called regression-path time. In the same fashion as mentioned before, in order to minimize the question-answering error rate, the data of one human subject was excluded; the final eye tracking data set thus consists of data from 42 subjects.

3.1.4 Behavioral data from Dundee corpus

As the aforementioned eye tracking data set from Frank et al. (2013) only comprises 205 sentences, the parser was also used to model data from the Dundee corpus (Kennedy, Hill, and Pynte, 2003). The Dundee corpus contains eye tracking data produced by ten native English readers and ten native French readers over twenty texts comprising approximately 50,000 words, displayed as paragraphs on a screen. The sampling rate for the eye tracking was 1000 Hz (Cop et al., 2017). The measures derived from the experiments are the fixation duration per word, in which the same word may occur multiple times, and the landing position for each word.


3.2 Method

3.2.1 Preprocessing

Before the annotated data from the Penn Treebank could be used, the syntactic trees had to be converted to a binary form, similar to the tree conversion specified in Roark (2001), only slightly modified in the sense that no empty terminal symbol ε was added to unary rules, and thus these rules remained unary after conversion. A formal definition of this conversion is as follows:

Let G = (V, T, P, S) be the original Penn Treebank grammar, where V is a set of non-terminal symbols; T is a set of terminal symbols; S ∈ V is the start symbol; and P is a set of production rules of the form A → α, where α ∈ (V ∪ T)*. Let G_c be the converted grammar. Every rule in the converted grammar must satisfy one of the following definitions:

1. (A → B A-B) ∈ G_c iff (A → Bβ) ∈ G, such that B ∈ V and β ∈ V*

2. (A-α → B A-αB) ∈ G_c iff (A → αBβ) ∈ G, such that B ∈ V, α ∈ V+ and β ∈ V*

3. (A → a) ∈ G_c iff (A → a) ∈ G, such that a ∈ T

This rule conversion was applied in order to prevent a lookahead problem from occurring when using left-corner parsing. For instance, consider a sentence being parsed whose gold standard parse contains a rule with a great number of symbols on the right-hand side. Since a left-corner parser only determines whether the left corner of a right-hand side occurs, it has to store the rest of the right-hand side as a prediction; for a rule with many symbols, such a prediction would be almost impossible to make. A sketch of this conversion on trees is given below.
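A minimal sketch of the conversion on nltk trees, under the reading that the final daughter in a chain closes it as a unary rule (since no empty terminal is added); the helper names are illustrative and phrasal daughters are assumed to be Tree objects.

from nltk import Tree

def binarize(tree):
    # Terminals (words) are returned unchanged; unary rules stay unary.
    if isinstance(tree, str):
        return tree
    kids = [binarize(kid) for kid in tree]
    if len(kids) == 1:
        return Tree(tree.label(), kids)
    return Tree(tree.label(), _fold(tree.label(), kids))

def _fold(label, kids):
    # A -> B C ... Z becomes A -> B A-B, A-B -> C A-B-C, and so on,
    # with the final daughter closing the chain as a unary rule.
    head, rest = kids[0], kids[1:]
    if not rest:
        return [head]
    new_label = label + "-" + head.label()
    return [head, Tree(new_label, _fold(new_label, rest))]

print(binarize(Tree.fromstring("(NP (DT the) (JJ old) (NN man))")))
# (NP (DT the) (NP-DT (JJ old) (NP-DT-JJ (NN man))))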

After the rule conversion was completed, the newly binarized rules were used to establish a probabilistic context-free grammar, through computation of the conditional frequencies of the Penn Treebank rules. For every rule, the frequency of the symbols on the right-hand side was computed, given the symbol on the left-hand side. These frequencies were computed for rules that end in terminal symbols (rule 3 in the definitions above) and rules that end in non-terminal symbols (rules 1 and 2 in the definitions above). For rules of types 1 and 2, separate files were created for first-order, second-order and third-order rule frequencies, where the order indicates how far the expansion into the left corner proceeded before the frequency was computed. These conditional frequencies were utilized by the parser later on to parse sentences.

The rule structure of the PCFG created by the preprocessing explained above will be called the default rule structure from here on; that is, the tagset of Penn Treebank was not modified in the default rule structure. Furthermore, another PCFG was constructed with a simple rule structure, in which coreference tags were omitted from non-terminal symbols where this information occurred. For example, noun phrases in Penn Treebank often carry information about their role in the sentence, displayed as NP-SBJ to indicate that the noun phrase is the subject; in the simple rule structure, this was converted to NP. Moreover, a third PCFG was constructed, namely one with a default rule structure that has been lexicalized. The lexical head of each rule was selected as follows: if the left-hand side of the rule was a noun phrase, verb phrase or adjective phrase, the first word in the phrase that matched the phrase type was selected as the lexical head. For other phrase types, the first word of the phrase was selected as the lexical head for the rule (van Oers, 2018). The performance of these three rule structures was compared and contrasted when the parser was applied to behavioral data.

The sentences from the behavioral data of Frank et al. (2013) were also preprocessed slightly, as punctuation such as commas and sentence-final periods was detached from the words it was previously attached to. This modification was made because the Penn Treebank corpus also treats punctuation as separate tokens. Furthermore, the sentences were tagged by the part-of-speech tagger of the nltk package.

The Dundee corpus required more extensive preprocessing, as it is a much less organized data set than the data from Frank et al. (2013). In the corpus, the stimuli are not grouped together as sentences; rather, the words are grouped by text and line number. Since the pyactr parser only accepted single sentences, the corpus had to be preprocessed to identify sentence numbers before tagging was possible. The algorithm used for this task is displayed in Appendix A.1.

A sentence in the Dundee corpus was considered finished if the last character of the last word was a period, a question mark or an exclamation mark (line 7 in the algorithm). Furthermore, words were lowercased, and leading and trailing apostrophes were removed from the words; apostrophes in contractions were not removed by this operation (lines 12 and 13 in the algorithm). Words were then checked for any remaining non-alphanumeric, non-apostrophe, non-dash characters, which were subsequently separated from the words with a space and treated as separate words (lines 14-20). This was done to conform to the structure of Penn Treebank, in which punctuation is separated from other words. A sketch of this normalization is given below.
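A minimal sketch of the normalization just described (the thesis's actual algorithm is in Appendix A.1); the regular expression is an assumption, not the original code.

import re

def normalize(word):
    # Lowercase; strip leading/trailing apostrophes only, so
    # apostrophes inside contractions are preserved.
    word = word.lower().strip("'")
    # Separate remaining non-alphanumeric, non-apostrophe, non-dash
    # characters with spaces, so punctuation becomes its own token.
    return re.sub(r"([^a-z0-9'\-])", r" \1 ", word).split()

def ends_sentence(word):
    # A sentence is considered finished on a final ., ? or !
    return word.endswith((".", "?", "!"))

print(normalize("(House,"))   # ['(', 'house', ',']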

Beyond this, recall from section 3.1.4 that the eye tracking data in the Dundee corpus only consists of fixation durations per word, where the same word can occur multiple times. This data was preprocessed in order to derive the first pass reading time and the total reading time for each word. The fixation duration registered for the first occurrence of a specific word in a sentence was interpreted as the first pass reading time for that word, and the sum of all fixation durations for that word in the sentence was interpreted as the total reading time for that word.

The sentence tagging was applied to the first five English texts of the corpus, resulting in a total of 554 sentences with an average length of 25.3 words (σ = 13.5). The preprocessing of eye tracking data was applied to the data of the first nine human subjects (subjects a, b, c, d, e, f, g, h, i) for those first five texts. Subject j was omitted, as the data for this subject caused unidentified errors during preprocessing.

3.2.2 Parser construction

Before a left-corner parser could be constructed in pyactr, the types of chunks that were to be imported into the declarative memory had to be specified. The first chunk type to be defined was the word chunk, which stored information about the form or pronunciation of a word and the category of the word as it occurs in Penn Treebank; this information was retrieved from the rules in the Treebank of the form shown in rule 3 of the rule conversion. Secondly, the reading chunk stored information about the current state of the parsing process; the current word that was being parsed and its position; the rule symbol that was expected to be encountered; and the top four contents of the stack used by the parser. Lastly, the rule chunk stored the head of the rule that is expected, based on the top of the stack; the two daughters of this expected rule; the rule that was actually found; and the two predicted daughters of this rule. Below are three example chunks, displayed as attribute-value matrices:

(a) word chunk:
[form: distance, category: NN]

(b) reading chunk:
[state: parsing, position: None, word: None, expected: NP-JJ, stack2: VP-VBD-NP, stack3: S-S-,-S, stack4: S-S, stack5: None]

(c) rule chunk:
[expected: PP-IN, daughter1: None, daughter2: NP, found: PRP$, predicted1: NP-PRP$, predicted2: None, predicted3: None]

Figure 4: Attribute-value matrices for different chunk types as they might be used during parsing
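A minimal sketch of how such chunk types might be declared in pyactr; the slot names follow Figure 4, while the chunk name and exact declarations are illustrative rather than taken from the thesis code.

import pyactr as actr

# Declare the three chunk types with the slots from Figure 4.
actr.chunktype("word", "form, category")
actr.chunktype("reading", "state, position, word, expected, stack2, stack3, stack4, stack5")
actr.chunktype("rule", "expected, daughter1, daughter2, found, predicted1, predicted2, predicted3")

# A lexical chunk like the one in Figure 4(a).
word_chunk = actr.makechunk(nameofchunk="w1", typename="word",
                            form="distance", category="NN")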

Subsequently, a model was instantiated in pyactr with latency factor F = 0.1; latency exponent f = 0.1; retrieval threshold τ = -20; noise s = 0.8; and strength of association S = 10. The values for these settings were chosen rather arbitrarily and should be optimized in the future.
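The instantiation itself might look as follows; the keyword names are pyactr's model parameters, and their mapping onto the symbols above is an assumption rather than code from the thesis.

# Sketch: instantiate the model with the settings listed above.
parser = actr.ACTRModel(
    subsymbolic=True,                      # enable activations, noise and latencies
    latency_factor=0.1,                    # F
    latency_exponent=0.1,                  # f
    retrieval_threshold=-20,               # tau
    instantaneous_noise=0.8,               # s
    strength_of_association=10,            # S
    buffer_spreading_activation={"g": 2},  # goal-buffer weight W_j = 2
)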

The parser was designed with a parallelization factor of 5, which allowed it to record at most five different parses at any given moment during simulation. Separate goals and expectations were stored in the goal buffer of the ACT-R model for each of these five parallel parses. The ACT-R model was then granted the capability to parse by encoding the left-corner parsing method as production strings. Left-corner parsing was chosen because of the advantages this method provides over top-down and bottom-up parsing, and because it resembles the manner in which human readers parse sentences more closely than those two methods. A few examples of the production strings and chunk declarations described here are included in Appendix A.2.

Firstly, a reading chunk was added to the goal buffer of the ACT-R model and was initialized with state start_reading and word position 0, to indicate that the parser was ready to start reading and parsing the sentence. Then, for each of the five parallel parses, a reading chunk with state parsing and an expectation of the start symbol S was stored in the goal buffer.

Before the parser could be used to parse a sentence, all of the words of that sentence and their respective part-of-speech tags had to be loaded into the declarative memory of pyactr. A production for moving to the next word was employed whenever the state of the reading chunk was finished_recall, indicating that the parser could move to the next word and read it. If this condition was met, the state of the reading chunk would be set to start_reading, and its word and position would be modified to match the next word and its position in the sentence. A sketch of such a production is shown below.
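A sketch of what such a production might look like in pyactr's string format; the slot tests are illustrative, and the thesis's actual productions are in Appendix A.2.

parser.productionstring(name="move to next word", string="""
    =g>
    isa     reading
    state   finished_recall
    ==>
    =g>
    isa     reading
    state   start_reading
""")

# (Updating the word and position slots to the next word is part of the
# same step in the text; it is omitted here to keep the sketch short.)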

(17)

Furthermore, the converted Penn Treebank rules ending in terminal symbols, and the first-, second- and third-order rules from section 3.2.1, were loaded into the declarative memory of pyactr, with the decay parameter set to 0.2, which controls the speed of activation decay (and thereby reading times) in ACT-R.

Finally, the ACT-R model was given production rules that enabled: the recall of lexical information from declarative memory; the recall of rules supplied by Penn Treebank from declarative memory; a set of rules for each parallel parse that allowed the model to identify when parsing was done for certain predictions and to generate new predictions; and rules that allowed it to flush the remaining information from the buffers and the stack.

After the model was provided with these production rules and the chunks for a specific sentence, simulation could begin. During simulation, the model selected the appropriate production to employ, given the state of the parsing process, and utilized these productions to parse the sentence. The model selected the five most likely parses at every step of the process by selecting the grammar rule chunks from declarative memory with the highest retrieval probabilities. Since these probabilities were subject to noise, the model was not deterministic in its parsing output. The parsing results could be extracted from pyactr by recording the constituents (from Penn Treebank) found for certain parts of the sentence. In addition, whenever a production for rule recall fired, the (at most) five activation levels for the retrieved rule chunks were recorded. From the five parallel parses, the most likely parse was selected based on the sum of all activation levels for that parse.

3.2.3 Parser evaluation

Once the parser had been constructed, its performance was evaluated on the Penn Treebank corpus. The sentences from sections 22 and 25 of Penn Treebank were fed into the parser to produce parse results, and from these parse results a mapping was constructed from phrases of the sentences to their constituents (as found by the parser). For example, the sentence "champagne and dessert followed .", if parsed correctly, would result in the following mapping:

{('champagne', 'and', 'dessert'): 'NP-SBJ', ('followed',): 'VP'}

In the same fashion, a gold standard mapping was constructed from the trees of the sentences as they occur in Penn Treebank sections 22 and 25, which allowed for comparison between the parse result and the gold standard. In order to assess the performance of the parser, the precision and recall were computed using the following formulas:

    precision = (# correct constituent mappings in parse result) / (# total constituent mappings in parse result)    (7)

    recall = (# correct constituent mappings in parse result) / (# total constituent mappings in gold standard)    (8)

In order to fit the results of the parser to the behavioral data, the sentences from Frank et al. (2013) and the Dundee corpus were parsed, and three measures were extracted from pyactr during the parsing process: the minimum activation of the parallel parses at a given simulation step; the maximum activation of the parallel parses at a given simulation step; and the average activation of the parallel parses at a given simulation step. These activations were computed for each word using formula 1 once the parser moved to the next word. They were chosen as measures to fit to reading times because these values are directly involved in the computation of the retrieval latency (formula 4) in pyactr.
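Returning to the evaluation step, a minimal sketch of equations (7) and (8) over the mapping format shown earlier; since the mappings are dicts, each phrase span carries exactly one constituent label.

def precision_recall(parse_result, gold_standard):
    # Count spans that received the same constituent label as the gold standard.
    correct = sum(1 for span, label in parse_result.items()
                  if gold_standard.get(span) == label)
    precision = correct / len(parse_result) if parse_result else 0.0
    recall = correct / len(gold_standard) if gold_standard else 0.0
    return precision, recall

parse = {('champagne', 'and', 'dessert'): 'NP-SBJ', ('followed',): 'VP'}
gold = {('champagne', 'and', 'dessert'): 'NP-SBJ', ('followed',): 'VP'}
print(precision_recall(parse, gold))  # (1.0, 1.0)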

For every sentence in Frank et al. (2013), the reading times and parser measures of the first and last word were omitted, as the reading times on the first and last word of a sentence are usually significantly slower than those for the other words. The remaining parser measures, the reading times (in the case of self-paced reading data) and the first pass, right-bounded and go past reading times (in the case of eye tracking data) were appended to lists. Correlation analysis was performed by computing the Pearson correlation coefficient between the lists of the three parser measures and the lists of reading times from the data of Frank et al. (2013) individually. In the same fashion, correlation analysis was performed on the parser measures and the first pass and total reading times for the Dundee corpus.

For both the data from Frank et al. (2013) and the Dundee corpus, the same correlation analysis as described above was also performed with the activation levels from pyactr shifted to the right by 1. This shift aligned the activation levels produced for the first word of a sentence with the reading time of the second word, the activation levels for the second word with the reading time of the third word, and so forth. This was done because the effect of the activation levels on the reading times may have been delayed by one word.

Finally, the parser was used to parse several classic examples of garden-path sentences, to assess whether it made mistakes similar to those made by human readers. These examples were taken from several papers on garden-path theory, namely Frazier and Rayner (1982), Staub (2011) and Tabor, Galantucci, and Richardson (2004), and every example was a pair: one sentence that was not temporarily ambiguous, and one sentence that was temporarily ambiguous (garden-pathed). The parser would be expected to parse the unambiguous sentences correctly, while it would be expected to parse the garden-path sentences incorrectly more often than not, similar to how human readers parse these sentences.


4 Results

The parser was evaluated on section 22 of the Penn Treebank corpus with the parallelization factor limited to 5 parallel parses at any given time. In addition, the parser was tested on section 25, a smaller section, with the same parallelization factor and with the three different rule structures. Below are the resulting recall, precision and percentage of sentences that were not parsed, given the rule structure used by the parser.

Rule structure   Average recall   Average precision   Percentage skipped
Default          0.397            0.431               19%

Table 2: Results of parser on Penn Treebank corpus section 22, number of sentences = 1,700

Rule structure   Average recall   Average precision   Percentage skipped
Default          0.367            0.395               15%
Simple           0.291            0.352               22.5%
Lexicalized      0.348            0.388               15%

Table 3: Results of parser on Penn Treebank corpus section 25, number of sentences = 40

Furthermore, correlation analysis between the activations as reported by the parser and the reading times from self-paced reading data (Frank et al., 2013) yielded the following results:

Rule structure   Measure               Correlation coefficient   p-value
Default          Minimum activations   -0.0391                   0.0678
                 Maximum activations    0.0161                   0.451
                 Average activations   -0.00191                  0.928
Simple           Minimum activations   -0.00358                  0.867
                 Maximum activations   -0.00672                  0.754
                 Average activations   -0.000942                 0.965
Lexicalized      Minimum activations   -0.00471                  0.827
                 Maximum activations    0.00419                  0.846
                 Average activations   -0.00479                  0.824

Table 4: Results of parser on self-paced reading data (Frank et al., 2013), number of sentences = 239

(20)

Rule structure   Measure               Correlation coefficient   p-value
Default          Minimum activations   -0.0249                   0.275
                 Maximum activations   -0.0156                   0.494
                 Average activations   -0.0138                   0.547
Simple           Minimum activations   -0.0223                   0.329
                 Maximum activations   -0.0127                   0.577
                 Average activations   -0.0163                   0.475
Lexicalized      Minimum activations   -0.0349                   0.126
                 Maximum activations   -0.0309                   0.175
                 Average activations   -0.0363                   0.111

Table 5: Results of parser on self-paced reading data (Frank et al., 2013), number of sentences = 239, shift=1

Correlation analysis between the activations as reported by the parser and the reading times from the eye tracking data from Frank et al. (2013) led to the following scores:

                                       Mean first-pass RT        Mean right-bounded RT     Mean go past RT
Rule structure   Measure               Corr.      p-value        Corr.      p-value        Corr.      p-value
Default          Minimum activations   0.0522     0.0600         0.0422     0.128          0.0364     0.189
                 Maximum activations   0.0363     0.191          0.0304     0.272          0.0229     0.409
                 Average activations   0.0506     0.0685         0.0423     0.127          0.0363     0.191
Simple           Minimum activations   0.126      0.00000504     0.116      0.000030       0.0930     0.000796
                 Maximum activations   0.0618     0.0259         0.0499     0.0724         0.0277     0.319
                 Average activations   0.108      0.0000914      0.0959     0.000541       0.0716     0.00985
Lexicalized      Minimum activations   0.231      3.48·10^-17    0.212      1.24·10^-14    0.170      6.65·10^-10
                 Maximum activations   0.183      3.00·10^-11    0.165      2.45·10^-9     0.128      4.01·10^-6
                 Average activations   0.218      2.11·10^-15    0.200      4.01·10^-13    0.159      8.30·10^-9

Table 6: Results of parser on eye tracking data (Frank et al., 2013), number of sentences = 180

                                       Mean first-pass RT        Mean right-bounded RT     Mean go past RT
Rule structure   Measure               Corr.       p-value       Corr.       p-value       Corr.       p-value
Default          Minimum activations   -0.0684     0.0223        -0.0703     0.0187        -0.0705     0.0184
                 Maximum activations   -0.105      0.000462      -0.106      0.000394      -0.0895     0.00276
                 Average activations   -0.088808   0.00297       -0.0897     0.00268       -0.0800     0.00746
Simple           Minimum activations   -0.0740     0.0134        -0.0807     0.00695       -0.0941     0.00163
                 Maximum activations   -0.0854     0.00428       -0.0921     0.00206       -0.0857     0.00415
                 Average activations   -0.0945     0.00156       -0.101      0.000702      -0.103      0.000546
Lexicalized      Minimum activations   -0.239      5.49·10^-16   -0.241      3.00·10^-16   -0.233      3.48·10^-15
                 Maximum activations   -0.243      1.79·10^-16   -0.244      1.26·10^-16   -0.229      1.06·10^-14
                 Average activations   -0.248      3.64·10^-17   -0.250      2.26·10^-17   -0.237      9.86·10^-16

Table 7: Results of parser on eye tracking data (Frank et al., 2013), number of sentences = 180, shift=1

(21)

Correlation analysis between the activations as reported by the parser and the reading times from the eye tracking data from Dundee corpus (Kennedy, Hill, and Pynte, 2003) resulted in the following correlations:

                                       Mean first-pass RT         Mean total RT
Rule structure   Measure               Corr.      p-value         Corr.      p-value
Default          Minimum activations   0.167      8.27·10^-53     0.179      6.96·10^-61
                 Maximum activations   0.168      7.39·10^-54     0.180      2.84·10^-61
                 Average activations   0.187      1.71·10^-66     0.200      1.19·10^-75
Simple           Minimum activations   0.154      3.64·10^-46     0.168      1.02·10^-54
                 Maximum activations   0.121      1.15·10^-28     0.131      2.03·10^-33
                 Average activations   0.160      8.63·10^-50     0.171      1.56·10^-56
Lexicalized      Minimum activations   0.276      9.97·10^-145    0.275      5.49·10^-144
                 Maximum activations   0.260      6.28·10^-129    0.256      6.29·10^-125
                 Average activations   0.280      3.65·10^-149    0.276      1.15·10^-145

Table 8: Results of parser on eye tracking data from Dundee corpus, number of sentences = 443

                                       Mean first-pass RT        Mean total RT
Rule structure   Measure               Corr.       p-value       Corr.       p-value
Default          Minimum activations   -0.136      9.79·10^-34   -0.116      9.31·10^-25
                 Maximum activations   -0.0406     3.15·10^-4    -0.0228     4.35·10^-2
                 Average activations   -0.0962     1.31·10^-17   -0.0772     7.17·10^-12
Simple           Minimum activations   -0.0579     2.29·10^-7    -0.0530     2.00·10^-6
                 Maximum activations    0.0288     9.94·10^-3     0.0282     0.0116
                 Average activations   -0.0135     2.26·10^-1    -0.00845    0.450
Lexicalized      Minimum activations   -0.130      5.83·10^-31   -0.114      4.06·10^-24
                 Maximum activations   -0.107      2.13·10^-21   -0.0867     1.28·10^-14
                 Average activations   -0.123      8.32·10^-28   -0.103      3.61·10^-20

Table 9: Results of parser on eye tracking data from Dundee corpus, number of sentences = 439, shift=1

Below are the parse results on two pairs of unambiguous and garden-path versions of sentences. The full results are shown in Appendix A.3.

Sentence pair from Frazier and Rayner (1982):

1. Unambiguous: since jay always jogs a mile this seems like a short distance to him .

• Correct parse:

{('a', 'short', 'distance'): 'NP',
 ('to', 'him'): 'PP',
 ('a', 'mile', 'this', 'seems', 'like', 'a', 'short', 'distance', 'to', 'him'): 'NP',
 ('this',): 'NP',
 ('jay', 'always'): 'NP',
 ('this', 'seems', 'like', 'a', 'short', 'distance', 'to', 'him'): 'S',
 ('him',): 'NP',
 ('a', 'short', 'distance', 'to', 'him'): 'NP',
 ('short',): 'ADJP',
 ('seems', 'like', 'a', 'short', 'distance', 'to', 'him'): 'VP',
 ('like', 'a', 'short', 'distance', 'to', 'him'): 'PP',
 ('since', 'jay', 'always'): 'PP'}

2. Temporarily ambiguous: since jay always jogs a mile seems like a very short distance to him .

• Incorrect parse:

{('jay',): 'NP',
 ('always', 'jogs', 'a', 'mile'): 'ADVP',
 ('very', 'short', 'distance'): 'ADJP',
 ('like', 'a', 'very', 'short', 'distance'): 'PP',
 ('jogs', 'a', 'mile'): 'VP',
 ('him',): 'NP',
 ('jay', 'always', 'jogs', 'a', 'mile'): 'S',
 ('a', 'very', 'short', 'distance'): 'S',
 ('a', 'mile'): 'NP',
 ('to', 'him'): 'PP',
 ('since', 'jay', 'always', 'jogs', 'a', 'mile'): 'SBAR'}

Sentence pair from Staub (2011):

1. Unambiguous: while mary was mending , a sock fell on the floor .

• Correct parse:

{('while', 'mary'): 'PP',
 ('mary',): 'NP',
 ('fell', 'on', 'the', 'floor'): 'VP',
 ('a', 'sock'): 'NP',
 ('was', 'mending'): 'VP',
 ('mending',): 'VP',
 ('on', 'the', 'floor'): 'PP',
 ('the', 'floor'): 'NP'}

2. Temporarily ambiguous: while mary was mending a sock fell on the floor .

• Incorrect parse:

{('fell', 'on', 'the', 'floor'): 'VP',
 ('mary', 'was', 'mending', 'a', 'sock'): 'S',
 ('a', 'sock'): 'NP',
 ('the', 'floor'): 'NP',
 ('on', 'the', 'floor'): 'SBAR',
 ('was', 'mending', 'a', 'sock'): 'VP',
 ('mary',): 'NP'}

5 Evaluation

5.1 Penn Treebank corpus

As can be seen in table 2, the parser with the default rule structure achieved around 40% recall and 43% precision when employed on section 22 of Penn Treebank, with approximately 81% of the sentences parsed. These scores are lower than those of the basic parser from Roark (2001, p. 265, table 2), which produced 71.1% recall and 75.3% precision. However, that parser had a parallelization factor between 200 and 500 parses per word, far greater than the five parallel parses considered by the parser in this research; the results of the cognitively constrained parser were therefore deemed a good enough baseline to continue the research and apply the parser to behavioral data.

Furthermore, the parsing capability of the parser using the three different rule structures was evaluated on section 25 of Penn Treebank (table 3), which consists of 40 sentences. The default and lexicalized rule structures appear close to each other in terms of performance, while the simple rule structure performed slightly worse. This could be explained by the fact that, with the simple rule structure, coreference tags were removed from the grammar rules, which might have caused the parser to employ less specific rules that may have led to incorrect parses.

5.2 Self-paced reading data from Frank et al. (2013)

In table 4, the correlations between the activation levels of the parser and the self-paced reading data from Frank et al. (2013) are displayed. The parser failed to parse 121 of the 361 sentences provided, mainly because these sentences were significantly longer than the others. For all three rule structures, the correlations between the activations (with and without shifting) are close to 0, and their p-values indicate no significant correlations, because p > 0.05 holds for all of them. A possible explanation for these results could be that the reading times per word in self-paced reading experiments are not representative of the fixation times on the words of the entire sentence, as the human subjects decided when to move to the next word by pressing a button manually.

5.3 Eye tracking data from Frank et al. (2013)

Furthermore, in table 6, the correlations without shifting for the eye tracking data from Frank et al. (2013) are shown. For this data set, there are quite significant differences between the correlations for the three rule structures. The three activation measures of the default rule structure do not seem to have any significant correlations with the behavioral data, although the p-values are closer to p = 0.05 than in table 4.

For the simple rule structure, all three activation measures correlate significantly with the mean first-pass reading times, the strongest correlation being with the minimum activations and the weakest with the maximum activations. This ordering of correlation strength (minimum > average > maximum) also holds for the correlations with mean right-bounded reading times and mean go-past reading times, although for those two measures the correlation with the maximum activations is not significant.


These weak positive correlations could occur because an increase in activation (above the retrieval threshold) for chunks across the parallel parses in pyactr caused the model to retrieve all of these chunks for each parallel parse, which might have increased the time it took the model to process the word being read; with lower activations, the model would not have had to retrieve all chunks, resulting in a lower latency.

For the lexicalized rule structure, the results follow a pattern similar to that of the simple rule structure, the only difference being the strength of the correlations, which is greater for the lexicalized structure. This increase could be caused by the lexicalized rule structure being more similar to the rule structures utilized by human readers while parsing a sentence than the default and simple rule structures are.

The results of the shifted correlations are displayed in table 7. Immediately noticeable across the board is that the correlations are negative in the case of a shift by 1. As mentioned earlier, this could be because the effects of activation levels on reading times were delayed by one word. The negative correlation would then indicate that an increase in activation level is paired with a decrease in reading time, which is consistent with the formula that pyactr employs to compute its retrieval latency (formula 4). Similar to the non-shifted results, the lexicalized rule structure provides the strongest (albeit negative) correlations with the eye tracking measures, whereas the correlations of the default and simple structures appear to be similar in strength.
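For reference, the retrieval latency that pyactr computes from an activation level has the standard ACT-R form

    T = F · e^(−f·A)

where A is the activation of the retrieved chunk, F the latency factor, and f the latency exponent of the model. Higher activation therefore means shorter retrieval time, which is why a negative correlation between activation levels and reading times is the theoretically expected direction.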

5.4 Dundee corpus

In table 8, the non-shifted correlations of the activations on Dundee corpus are shown. As with the results in table 6, these correlations are positive, with the lexicalized structure providing the strongest correlations for both data sets. Different from table 6 is that the correlations are stronger across all three rule structures. This could be due to the fact that the average length of the sentences used from Dundee corpus was 25.3 words, significantly higher than the average of 13.7 words per sentence for Frank et al. (2013). The correlation between activation levels and reading times may have been better captured in longer sentences, as a single sentence provided more data points to compute the correlation from. Additionally, the higher correlations could be attributed to the higher sampling rate used for eye tracking in the construction of the Dundee corpus.

The shifted correlations in table 9, although weaker than the shifted results on the eye tracking data from Frank et al. (2013), are also significantly negative. For all rule structures, the minimum activation has the strongest negative correlation. This is to be expected: a decrease in activation level increases latency, and since the model considered multiple parallel parses at any given simulation step, the lowest activation level across those parses is the best indicator of the overall latency at that step. Conversely, that the maximum activation has the weakest correlations for all rule structures is also expected, because the highest activation across parallel parses is a poor estimate of the latency of the entire model at a simulation step.
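As a brief sketch, the three activation measures discussed above can be obtained per word by aggregating over the activations of the parallel parses (five in this research); the data layout below is illustrative.

    activations_per_word = [
        [0.8, -0.3, 0.1, 0.55, -0.1],  # word 1: one activation per parallel parse
        [1.2, 0.4, 0.9, 0.2, 0.6],     # word 2
    ]

    minimum = [min(acts) for acts in activations_per_word]  # tracks the slowest retrieval
    average = [sum(acts) / len(acts) for acts in activations_per_word]
    maximum = [max(acts) for acts in activations_per_word]  # least informative for latency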

5.5 Garden-path sentences

Finally, the results of the parser on two pairs of sentences, each containing one unambiguous sentence and one garden-path sentence, are shown in section 4. For the first pair (item 1), the unambiguous sentence was parsed somewhat as expected: although the parser attached the dependent clause "this seems like a short distance to him" to the noun phrase "a mile", it did manage to correctly identify the dependent clause and map it to the constituent S. The garden-path sentence was parsed incorrectly, as can be seen from the fact that the phrase "jay always jogs a mile" was mapped to the constituent S, meaning that the parser followed the garden path of the sentence. Moreover, the parser did not manage to find a constituent for the verb "seems", because it had already identified the noun phrase "a mile" as an object.

For the second pair, the unambiguous sentence was once again parsed correctly: the parser identified the two different clauses in the sentence, as they were separated by a comma. The garden-path version, however, was once again parsed incorrectly: because there was no comma to separate the two clauses, the parser attached the phrase "a sock" to the first part of the sentence, identifying "mary was mending a sock" as a clause by itself, whereas the correct parse would not select this attachment.

This pattern of parsing errors appeared in most of the garden-path sentences that were considered.

6 Conclusion

As was hypothesized and shown by the results, the activation levels of the cognitively constrained parser fit behavioral data reasonably well, since moderately strong correlations exist between them, although the nature of these correlations has two mutually exclusive explanations. Furthermore, as expected, the errors produced by the parser when parsing temporarily ambiguous sentences were similar to how human subjects incorrectly interpret the same garden-path sentences, because the parser tried to attach the noun phrases locally. This could indicate that this parser is more psycholinguistically accurate than state-of-the-art parsers, which would not have produced the same errors.

The results of this research indicate that the cognitively constrained parser is psycholinguistically accurate to a degree. This could be a first step toward a better understanding of human language processing and could grant more credence to the cognitive architecture of ACT-R as a theory of the human mind.

7 Future work

There is still plenty of room for improvement and further research in this domain; this section briefly mentions a few suggestions for future research on this subject.

Firstly, the performance of the parser in ACT-R could be optimized. This could be achieved by tuning the settings of the model, such as the latency exponent, latency factor, retrieval threshold, and the number of parallel parses; the values for these parameters were chosen rather arbitrarily in this research. Additionally, a type of grammar other than a probabilistic context-free grammar could be chosen as the framework for the parser. Moreover, the existing production rules could be modified and new productions could be added to improve performance.
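As a sketch of what such tuning could look like, pyactr exposes these quantities as model parameters; the values below are illustrative starting points, not the settings used in this research.

    import pyactr as actr

    parser = actr.ACTRModel(
        subsymbolic=True,          # activation-based retrieval latencies
        latency_factor=0.1,        # F in the retrieval latency equation
        latency_exponent=0.5,      # f in the retrieval latency equation
        retrieval_threshold=-2.0,  # minimum activation for a successful retrieval
    )
    NUMBER_PARALLEL = 5            # number of parallel parses, as in appendix A.2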

Secondly, linear regression could be used with the activation levels as input to predict reading times and to identify whether these activation levels are significant predictors of those reading times. Alternatively, logistic regression could be used to predict whether a human reader would regress to a previous word or not, given the activation levels.
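A minimal sketch of the linear-regression variant, assuming word-aligned activation measures as predictors; statsmodels is one reasonable choice, and the data are illustrative.

    import numpy as np
    import statsmodels.api as sm

    min_act = np.array([0.4, -0.2, 1.1, 0.3, -0.5])   # minimum activation per word
    avg_act = np.array([0.6, 0.1, 1.3, 0.5, -0.1])    # average activation per word
    reading_times = np.array([310, 355, 280, 305, 402])

    X = sm.add_constant(np.column_stack([min_act, avg_act]))
    fit = sm.OLS(reading_times, X).fit()
    print(fit.summary())  # per-coefficient p-values show which measures predict RTs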

Finally, the visual module of ACT-R could be integrated into the parsing process. The words of the sentence would then be displayed in a simulated environment, and the ACT-R model would read these words and parse the sentence in the same manner as done in this research. The visual module would add an extra layer to the process and could possibly make the parser more psycholinguistically accurate.
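A rough sketch of such a set-up, following pyactr's environment interface; the stimulus format, trigger, and timing arguments here are assumptions for illustration, not a tested configuration.

    import pyactr as actr

    env = actr.Environment(focus_position=(0, 0))
    parser = actr.ACTRModel(environment=env, subsymbolic=True)

    words = ["while", "mary", "was", "mending", ",", "a", "sock", "fell"]
    # One display per word; the model would attend and encode each stimulus
    # before the parsing productions fire.
    stimuli = [{i: {"text": w, "position": (160, 180)}} for i, w in enumerate(words)]

    sim = parser.simulation(
        realtime=False,
        environment_process=env.environment_process,
        stimuli=stimuli,
        triggers="space",  # key press advancing to the next word
        times=1,
    )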

8 References

Anderson, John R (2007). How Can the Human Mind Occur in the Physical Universe? Oxford University Press.

Anderson, John R et al. (1998). “An integrated theory of list memory”. In: Journal of Memory and Language 38.4, pp. 341–380.

Anderson, John R et al. (2004). “An integrated theory of the mind.” In: Psychological review 111.4, p. 1036.

Bever, Thomas G (1970). “The cognitive basis for linguistic structures”. In: Cognition and the development of language 279.362, pp. 1–61.

Boston, Marisa Ferrara et al. (2011). “Parallel processing and sentence comprehension difficulty”. In: Language and Cognitive Processes 26.3, pp. 301–349.

Cop, Uschi et al. (2017). “Presenting GECO: An eyetracking corpus of monolingual and bilingual sentence reading”. In: Behavior research methods 49.2, pp. 602–615.

Demberg, Vera and Frank Keller (2008). “Data from eye-tracking corpora as evidence for theories of syntactic processing complexity”. In: Cognition 109.2, pp. 193–210.

Dotlačil, Jakub (2018). pyactr. https://github.com/jakdot/pyactr.

Duchowski, Andrew T (2007). Eye tracking methodology: Theory and practice. Springer.

Engelmann, Felix et al. (2013). “A framework for modeling the interaction of syntactic processing and eye movement control”. In: Topics in cognitive science 5.3, pp. 452–474.

Frank, Stefan L et al. (2013). “Reading time data for evaluating broad-coverage models of English sentence processing”. In: Behavior Research Methods 45.4, pp. 1182–1190.

Frazier, Lyn and Keith Rayner (1982). “Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences”. In: Cognitive psychology 14.2, pp. 178–210.

Hale, John T (2014). Automaton theories of human sentence comprehension. CSLI Publications.

Jurafsky, Dan and James H Martin (2014). Speech and language processing. Vol. 3. London: Pearson.

Kennedy, Alan, Robin Hill, and Joël Pynte (2003). “The Dundee corpus”. In: Proceedings of the 12th European conference on eye movements.

Marcus, Mitchell P, Mary Ann Marcinkiewicz, and Beatrice Santorini (1993). “Building a large annotated corpus of English: The Penn Treebank”. In: Computational linguistics 19.2, pp. 313–330.

Meyer, David E and David E Kieras (1997). “A computational theory of executive cognitive processes and multiple-task performance: Part 1. Basic mechanisms.” In: Psychological review 104.1, p. 3.

Newell, Allen (1994). Unified theories of cognition. Harvard University Press.

Roark, Brian (2001). “Probabilistic top-down parsing and language modeling”. In: Computational linguistics 27.2, pp. 249–276.

Salvucci, Dario D (2001). “An integrated model of eye movements and visual encoding”. In: Cognitive Systems Research 1.4, pp. 201–220.


Staub, Adrian (2011). “Word recognition and syntactic attachment in reading: Evidence for a staged architecture.” In: Journal of Experimental Psychology: General 140.3, p. 407.

Tabor, Whitney, Bruno Galantucci, and Daniel Richardson (2004). “Effects of merely local syntactic coherence on sentence processing”. In: Journal of Memory and Language 50.4, pp. 355–370.

van Oers, Evelyne (2018). “Cognitively constrained parsing in ACT-R: The effects of function tags and lexical information on eye movement models”.


A Appendices

A.1 Preprocessing algorithm for Dundee corpus

Algorithm 1: Tag sentences from one text file in Dundee corpus

import re

from nltk import pos_tag  # assumed to be NLTK's off-the-shelf POS tagger


def tag_dundee_text(textfile):
    """Data: iterable of words from one text of Dundee corpus.
    Result: list of (sentence, POS tags) pairs.
    """
    sentences = []
    current_sentence = []
    previous_word = ""
    for word in textfile:
        # Close the current sentence once the previous word ended in . ? or !
        if current_sentence and previous_word and previous_word[-1] in ".?!":
            sentences.append(current_sentence)
            current_sentence = []
        word = previous_word = word.lower()
        word = word.strip("'")  # remove apostrophes at start and end of word
        # Detach non-alphanumeric, non-apostrophe, non-dash characters
        # (e.g. punctuation glued to a word) by inserting a space before them.
        matches = re.findall(r"[^a-z0-9'-]", word)
        for match in matches:
            word = word.replace(match, " " + match)
        # Split on the inserted spaces; drop empty tokens from leading punctuation.
        tokens = [t for t in word.split(" ") if t]
        current_sentence.extend(tokens)
    if current_sentence:  # flush the final sentence of the file
        sentences.append(current_sentence)
    return [(" ".join(s), pos_tag(s)) for s in sentences]


A.2 Example production strings for pyactr

Chunk string to initialize reading state for the model:

parser.goals["g"].add(actr.chunkstring(string="""
    isa reading
    state start_reading
    position 0
    word """+str(sentence[0])))

Example production string to store the lexical information when moving to the next word:

for i in range(len(sentence)-1):
    parser.productionstring(name="moving from word " + str(i), string="""
        =g>
        isa reading
        state finished_recall"""+str(NUMBER_PARALLEL+1)+"""
        position """+str(i)+"""
        word """+str(sentence[i])+"""
        ==>
        =g>
        isa reading
        state start_reading
        position """+str(i+1)+"""
        word """+str(sentence[i+1]))

Example production string to recall lexical information:

parser.productionstring(name="recall word cat", string=""" =g> isa reading state start_reading word ~None word =x ?retrieval> state free ==> ~retrieval1> ~retrieval2> ~retrieval3> ~retrieval4> ~retrieval5> =g> isa reading state recall_word +retrieval> isa word form =x""")


parser.productionstring(name="recall rule", string=""" =g> isa reading state recall_word =g1> isa reading state parsing =g2> isa reading state parsing =g3> isa reading state parsing =g4> isa reading state parsing =g5> isa reading state parsing ?retrieval> state free buffer full =retrieval> isa word form ~None cat =f ==> =g> isa reading state recall_rule ~retrieval> """)
