
ShuffleMill; Semantic Feature Extraction for Interactive Text

SUBMITTED IN PARTIAL FULFILLMENT FOR THE DEGREE OF MASTER OF SCIENCE

Jordan Brown
11277130

MASTER INFORMATION STUDIES
HUMAN-CENTERED MULTIMEDIA
FACULTY OF SCIENCE
UNIVERSITY OF AMSTERDAM

August 16, 2017

1st Supervisor: Dr. Frank Nack
2nd Supervisor: Hartmut Koenitz


ABSTRACT

This paper recounts and evaluates the efforts of a software project called ShuffleMill, which reshuffles the presentation of text in an experimental textual narrative. As the interactor progresses through the narrative, the software system begins to make certain structural transformations of its own. Meant to be the beginning of a larger text-processing toolkit, the eventual goal of the project is to leverage many of the more recent computational developments in algorithmic approaches to natural language processing. At the same time, the project aims to foreground both the author's intent and the richness of the text itself, thus diminishing the algorithmic phenomenon. The latter task separates the whole text into parts and mines the embedded knowledge while striving to preserve the text's coherence. The aim here is to isolate the basic information and include it in an inductive system that connects the original text and the agents that interact with it. To the original author, the software offers natural language processing as an expressive option that affords the playful yet intelligent arranging and rearranging of their own text. The result is an iterative shifting of the text towards patterns extracted from the text itself, as selected by the interactor. Its success can be measured by its preservation as an interlocutor between the user and the text; each iteration in ShuffleMill sustains this connection by creating a balance between the interactor's unpredictable selections and the machinic processes. Through this avenue, ShuffleMill can fit into the framework not only of repurposed traditional text but also of preexisting interactive narratives not originally designed for its use.

Keywords

Natural language processing; interactive storytelling; hypertext; computational linguistics; literary theory; simulacra; rhetorical structure theory; discourse analytics

1. Introduction

As a software effort, the system alongside the project seeks to provide a possible answer to the following compounded research questions. How can annotative structures (computational, author-implemented, or otherwise) elucidate certain aspects of a text? Then, can these elucidated aspects of the text inform the construction of a system that dynamically transforms the text? And finally, can these transformations be triggered by user selection of subsections in the text, yet simultaneously exist as a tool for authoring narrative devices in both traditional and interactive textual narratives?

Explicitly, the tool's broader insight positions the linguistic information extracted from the text as the instrumental means for enhancing text-based narrative design. The information of importance here is found within the text, and excavated from it using a number of tools from computational linguistics and natural language processing. These attributes are then attached to the text in question when it is run through ShuffleMill. ShuffleMill looks for certain features at the behest of the author, ones which may provide additional, unforeseen insight if they were to become the focus. While this is the dominion of the author of the interactive narrative, the tool is equally concerned with rewiring agency, particularly with regard to the ordering in which the viewer experiences the text. As a result, its usage hopes to impart a bilateral consideration of interactivity between users in general. While there are reasonable concerns about the destruction of the narrative in shuffling its order[22], interactivity through a digital medium assigns itself a different rulebook than traditional text[21]. The system is envisioned to operate most effectively in conjunction with narratives that do not rely on an obstinate sequence to build storyline, but instead evoke a more aspectual dynamic to build story.

Its layout and implementation are geared to remain extensible, supporting derivative or original authorship. Initially, preexisting experimental text was implemented as primary source material because of its well-established linguistic and artistic merit. Additionally, evoking new structures from the text's own embedded knowledge (drawn from the modernist classic Ulysses, as well as several other experimental literary texts such as "Marriage" by Marianne Moore and Mrs. Dalloway by Virginia Woolf) exemplifies the power of these tools in a more grounded frame of reference and critique. Figure 1 displays just how much certain underlying features can reformulate the entire text when selected by the user. The green ordering at the bottom displays results from an actual trial run using Ulysses; the blue ordering above is an example of its traditional ordering.

The project intakes the documents, splitting them first into significant events, phrases, and other lexical spans. Then it extracts all relevant semantic information on the text, embedding it within as annotation markup. Such a schema contains an internal model, written in regular expressions, designed to further elucidate the text. The author is given the ability to manipulate the values and emphases of these variables, as well as to add parameters of their own. The text is then presented to the reader as any other sort of text-based narrative, accompanied by instructions to highlight the parts of text they personally deem significant. At each iteration, when they move to the next page, the system processes the dynamics of the underlying selection and, based on these values, reorders the text in terms of events. As a result, the text transforms itself based on both the semantic knowledge engendered by the author and the machine's knowledge of the text. It lends itself to a more playful approach to narrative despite the complex, annotation-driven guideposts instilled at the core of this format. At the same time, however, the transformed presentation preserves the entirety of the initial text, retaining even its syntactic coherence and word-to-word lexicography.


Fig 1. An intuitive visualization of what happens when a traditional text (blue) is brought into ShuffleMill (green)

Figure 1 above gives a primitive representation of how the text is tangibly manipulated by the system. A traditional narrative proceeds unilaterally through its order of events by default, here assigned numbers in order of their occurrence in the text. The same text implemented in ShuffleMill will proceed through the first 5 events in the same order and with the same internal elements. Then, after event 5, in a process described henceforth as an iteration, the user's selections within the prior events redistribute the values of the events throughout the entirety of the text. This causes the events that relate most significantly to the selections to be drawn towards each new iteration. These iterations carry on in a progression similar to traditional text, which is to say that in going through the text, there is no chance of unintended repetition of events.

1.1 Motivation

One central motivation in this project arose from seemingly disparate theories on the appropriate measurements and uses of language in the computational era driven by their codification[35]. On the one hand, linguistics is fundamentally a humanities discipline. As with any other humanities discipline, many scrutinize, if not outwardly reject, the possibility of success for a mechanized process, especially for analysis that might otherwise be done by the scholars themselves[26]. Meanwhile, the target of this skepticism would be remiss not to address it. Instead, the camps stand in stark contrast to each other, despite models inclined to admit insight from one another. At the same time, for some, the developments in computational insight into language could not be ignored, even as they raised valid concerns about, for example, what qualifies as knowledge representation. Another significant concern for this project was with markup in general as a weak primary structure of annotation.

With respect to design thinking relative to text, a motivating factor argued that an opportunity was being squandered. By not openly borrowing from disciplines that clearly borrowed from each other in the first place, systems and frameworks imposed unnecessary restrictions on themselves. From these concerns arose an opportunity for experimenting with the text for purposes in accordance with emergent narrative, yet validated as experimentation by more computational methods of extracting semantic and linguistic substructure from text.

2. Related Work

The most instrumental tenet in building this system, in conjunction with rhetorical structure theory in computational linguistics[40], is that no mode of the process should fragment the text any more than its original design does. Rather, the system should be designed such that every selection made by the user results in a reconstruction of the text as a whole without compromising authorship at the lexical node. Words stay the same, but their ordering and presentation change based on the user's selections and the semantic information extracted by the natural language processing pipelines. This has a number of implications for what was introduced into the environment, along with what was not. Work in rhetorical structure theory helped theorize how the features would work. Even though they were extracted through natural language processing, without demonstrating a relational, recitational aspect they were unhelpful for the system's intents and purposes[27].

The CorpusTools software program, also based in rhetorical structure analysis[34], enabled the construction of multi-layered and sophisticated feature design exportable to XML or RST3[41]. The written work released alongside the software explains the motivations for the design. Additionally, the authors behind the first software to introduce the RST3 format also present rhetorical structure theory for the first time as it is seen now. Rhetorical structure theory, in its introduction, also champions the generation of coherence relations in natural language extraction from text; this espouses a consistency in relations between underlying significant features in text[24]. A particularly formative consideration in the design of this system concerns a quality and breadth of computation not common among those from the humanities side. Its thinking grants the engine behind computational linguistics a sense of ingenuity of its own: rethought as comprising a gestalt, whereby semantics brings these concepts together, beyond systematic expressions lay a cogent ability to illuminate a unique understanding of text[41]. Computation within linguistics did more than automate processes which linguistics alone had realized; in turn, the field in general went beyond applying linguistic knowledge. One could extend determinations of one text to present new perspectives on narratives of text with purposeful, concrete goals of discovery; this provoked a more nuanced pursuit of the utilization of such tools.

In the past, this project has been anticipated by similar program statements in multiple individual and group efforts. For example, the TACT program allows users to interact with key strings of text by adding annotation schema[19]. It was designed to highlight literary features within a text while making use of semantic extraction features typically concerned with more concrete structures of language[19]. Its framework is even marked by a logically consistent set of relations in first-order logic. This project does not alter the primary text, however, but only adds substructure. Additionally, it makes less use of processing capabilities, instead offering the possibility for humans to incorporate them to the extent that they may be useful.

Fluid text, despite founding its realizations within a historical context, espoused a comprehensive analysis of how semantic and temporal transformations in the text could lay the groundwork for internal revision[6]. Its author's work with Herman Melville's Typee concords the differences between each revision of the publication, highlighting that even a relatively small number of word or phrase changes may start at one site but can quickly snowball into dramatic changes[11]. The changes in question outline not only changes in a word's meaning and context, but also typographical and prosodic changes, as well as temporal changes between dates of publication. With each of these aspects lending keen yet subtle importance to even minor changes, the result is aptly measured by full-scale resonance, i.e. fluidity. The Time Browser aims for a fluid-like account of text in relation to temporality. Although more solely dedicated to time, it is equally concerned with text, balancing verisimilitude and accuracy; this contrast leads to an understanding of how text can internally represent a multiplicity of unified sub-textual meanings, sometimes within some of its smallest, irreducible elements. As such, exhibiting its full potential is a challenge, assisted in part by knowledge bases and various timecoding efforts. While the model has a similar understanding of textual complexity, it aims to represent non-fictional narrative, involving different tasks and features.

Concepts in interactive literature share a similar endgame but differ in methodology, principle, and overall framework; such is the case with foundations built upon the methodological thinking of interactive literature as software by way of its functionality[31]. In reality, some inherent differences between these interpretations of interactive literature are found within the subject text, or example text, itself[12]. However, certain compositions found in these other mediums, this system rejects outright. On one hand, from the perspective of ergodic literature, nontriviality is a matter worthy of consideration when transforming and manipulating text[1]. Further, the priority of designing a system to be interoperable is essential to the endurance and flexibility of a medium in general[25]. On the other hand, other perspectives in interactive fiction, in producing an autonomous programmatic system, engender a tangential degree of separation between the organic, original text and the object dealt with by the tendrils of the system[5].

Other considerations of textual narrative within dynamic structure take place in Seymour Chatman's Story and Discourse. Chatman distinguishes five common types of distortion between the temporality of the events in a narrative and the temporality of the discourse in the same narrative. He labels these distortions summary, ellipsis, scene, stretch, and pause. These represent the full spectrum of the inverse relationship between what Chatman calls discourse-time and story-time. Using the annotation schema explained in a later section, the sentences expressed as aspectual, perceptual, or modal are classified as oriented towards the furthering of discourse-time. Conversely, those which express concrete events and occurrences are designated as events in story-time. Not only can these sentences be abridged to interleave between one another[22], they can be structured within each of the five paradigms outlined by Chatman. Further, the case of prolepsis[23], in which concentrated areas of the narrative express distilled atemporality but overall retain the elliptical, story-time-oriented structure, can be allotted to the sequence of events relative to the ordering of sentences.

Many projects in computational linguistics, corpus linguistics, and natural language processing provided computational annotations to the text. Nothing was applied from the output of such pipelines and toolkits without careful and, in most cases, extensive tinkering. FrameNet's annotations, through their relations, bases, and lexical entries, offered the largest number of features to the system. Combined with the Python package Tkinter[39], a project called CanvasWidget was created[3], which represents visually (though not implemented visually here) one of the two main components in this system.

Online projects have existed on a community-wide scale, where a collective reading of a narrative on an open document has allowed groups to annotate sections in an effort to enhance the experience for the community as a whole. In particular, one project did so with the same sample of text used for processing and experimentation in this project. The results were valid and delivered on the goal of uncovering information which the reader might not have encountered otherwise. Some of their annotations, as well as the model as a whole, are reimagined as an extension to this system[timeviz.org].

One of the annotations found in the sample contains its own format for annotation that can easily be added to the rest of the system. This annotation appears as XML markup; more specifically, a more human-readable format exists in accordance with the TEI guidelines[13], on top of a TEI-encoded, user-annotated rendition. While more dedicated processing tools were used in favor of the features built into the markup language, this one feature was implemented. The TEI guidelines establish an unambiguous, sophisticated structure, an advantage that could be utilized in future work. The user who created this TEI file was also associated with a collaborative project in which a collective of hundreds of users annotated the entire novel from the sample[10]. In terms of content, the annotations bore a closer resemblance to footnotes. However, the project exemplified the construction of sub-structural elements, such as annotative 'features' as they are redefined in this project, as a type of knowledge representation, and in turn especially highlighted author annotations in their potential to bring entirely new knowledge to the table[10].

3. Methodology

In attempting to pose a possible response to the research question, the methodology lists the most important insights required for the system's performance. The system outlined here aims to provide insight into the research question above, namely what these sub-structures can elucidate about the surface text, and how a narratological mechanism can be constructed to showcase the significance of these structures; the system is a manifestation of this pursuit. While invisible to the reader, the linguistic and annotative features are added to the implicit structure of the text in the preprocessing stage. As a result, the original structure retains continuity as explicitly designed, while at the same time the deeper meanings found can be transformed and manipulated without straying from the corresponding event. For the system to manipulate the raw text in such a way, it also needs to divide the raw text into chunks that can be manipulated. Unlike every other aspect of the system, the case for how to divide these chunks is not supported by ideas related to semantics or narrative. There is a complete subjectivity to choosing to parse the text based on its number of paragraphs. Thus, in this experiment, the sample text was chosen because of its original structure, one which lends itself to intuitive fragmentation. This alleviates the triviality, as well as the possible negative effect that an uninformed disconnect between events could cause. Events, as they are called, are divided by units implicit in the text itself: the sequence of catechistic question-answer series in the chapter "Ithaca" of Joyce's Ulysses[23]. Pretesting conducted with sample user selections determined the degree of fluctuation permissible in the weights of the data. For example, if the author wanted to place emphasis on certain semantic features while minimizing others, the results generated by the system would need to be kept to a relative scale: the value of the largest weight, regardless of frequency or infrequency, cannot be more than ten times larger than the smallest weight. Otherwise, when the smallest weights are instantiated, they have nearly no effect on the outcome of the system transformations. Anything else would go against the research question's objective of finding a system architecture built on the implicit structure of the text at hand.
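The ten-to-one scale constraint lends itself to a small validation step. The sketch below is illustrative only; the function name and the sample weight set are assumptions for the example rather than thesis code, though the weight values are drawn from Table 1.

def within_relative_scale(weights, max_ratio=10.0):
    # Pretest rule: the largest weight, regardless of feature frequency,
    # may not be more than ten times larger than the smallest weight.
    values = list(weights.values())
    return max(values) <= max_ratio * min(values)

sample = {'TEI_other': 2.5, 'Becoming': 4.45, 'Endeavor Failure': 12.45}
print(within_relative_scale(sample))  # True: 12.45 <= 10 * 2.5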

The system is then tested in a qualitative manner. The conceptual rubric in Section 4.3 is reexamined at the testing stage to see if its principles were upheld. Additionally, user testing provides important survey responses and overall feedback for qualitative evaluation. From the areas of expertise of each user in the tests, more in-depth feedback on the value of specific annotations can be given in certain instances, while in others, a perspective as either an author or reader of the system is highlighted.

4. System Requirements

These arrays of words interrogate the system in any number of dynamic ways. The system stores its information in a pandas schema that is not even bound to linearity within a table[37], and performs, at minimum, the following operations: parsing, retrieving, and tagging within the text, based on the interactions with the reader.
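As a minimal illustration of such a schema (all column names and values below are assumed for the example, not taken from the system):

import pandas as pd

# One row per event; feature lists are held as nested Python objects
# rather than flat columns, so the frame is not bound to a strictly
# linear, tabular layout.
events = pd.DataFrame({
    'event_id': [0, 1, 2],
    'text':     ['Q1 ...', 'A1 ...', 'Q2 ...'],
    'features': [['Existence'], ['Event-Time', 'Signal'], ['Spatial_Extent']],
    'value':    [1.45, 10.69, 4.54],
})

# Retrieving: every event tagged with a given feature.
hits = events[events['features'].apply(lambda fs: 'Signal' in fs)]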

4.1 Algorithmic Function

The system's algorithmic functions have two primary operations: they sort the values of the results in each iteration, and they provide a simple arithmetic pipeline for translating user selections into corresponding values, which serve as input data to an equation.

There are concrete, logic-based methods in place for dividing the text into a sequence of events, which can then be sorted by these iterative functions. There are multiple reasons for this use of simplicity. As is the case with many complex knowledge representations, simplicity grants the users a greater understanding of which components in the system are having which effect, ultimately increasing effectiveness in the long run[18]. Thus, the values of specific semantic features avoid having conflated relationships with one another. Another reason is part and parcel of the means of addressing the question as a whole: simple arithmetic and sorting functions permit a simple use of first-order expressions. This allows for expressions in which variables are either free or bound; in the former case, infinitely substitutable. This, in turn, allows for multi-dimensional, overlapping annotative structures that can modify a single word in the text yet still maintain a singular effect on the expression as a whole.
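A minimal sketch of these two operations, under assumed data shapes (an event-to-features mapping and a feature-to-weight dictionary; none of the names below come from the thesis code):

def score_events(event_features, selected, weights):
    # Simple arithmetic: each feature in the user's selection adds its
    # base weight to every event that carries it; no hidden model.
    return {e: sum(weights[f] for f in feats if f in selected)
            for e, feats in event_features.items()}

def reorder(scores):
    # Sorting: events are ordered by increasing value, as in Section 5.3.1.
    return sorted(scores, key=scores.get)

event_features = {0: ['Existence'], 1: ['TimeX3', 'Signal'], 2: ['TimeX3']}
weights = {'Existence': 1.45, 'TimeX3': 8.15, 'Signal': 6.15}
print(reorder(score_events(event_features, {'TimeX3', 'Signal'}, weights)))
# [0, 2, 1]: event 1 accumulates 8.15 + 6.15, event 2 only 8.15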

4.2 Narrative Magnification

The system's guidelines explicitly state that all of the expressions and semantic extractions must relate to narrative elements in the text insofar as it is a creative work. Whether intuitively surmised with respect to the users, or demonstrated in comparison to existing structures of emergent narrative, the reasoning behind each component's implementation is based on its ability to contribute to a greater understanding of the narrative found in the text. Thus, the following model demonstrates a known understanding of dynamic narrative elements, translated back into the same expressions used in ShuffleMill's system of features, fields, weights, and events. Given the first-order logic premise (1) and the conclusion (2), a possible model for theta satisfies the following statements, with the domain becoming increasingly more specific:

(1) [first-order logic premise; the formula did not survive extraction]

On the other hand, the essence of this relation, from a broader perspective, reaffirms not just one scene but additionally its related scenes, portrayed here as a narrative device. It says that once this mode of subtext has been identified, it at least exists wherever else it is found. However, a scenario could easily be pictured in which the 'true' above is instead given 'false' as a truth value: perhaps an event results in a fallout that prevents it from being repeated, or even reiterated in the same sense again. This can be done without disrupting the fact that a truth value remains consistent, as in (2), for example:

(2) [first-order logic conclusion; the formula did not survive extraction]

This expression adapts the narrative structure described in The Language Games of Siva[16], amalgamating the flexibility of the system's fields and features with the notations X and Y; in this narrative depiction, time and space are confined to free and bound variables that shape the perspective of the speaker in a unique spatiotemporal relation[33], which is the cause of a moment, which is in turn the result of a conveyable, particular scene.

4.3 Linguistic Extraction

Some of the information extracted from the pipelines is more robust than the rest. Overall, the evaluation methods for the annotations themselves depended on author intuition or on metrics provided by the frameworks themselves. The annotations that result from such information extraction follow a set of guidelines in order to maintain validity and relational functionality[4]. In the first section of the features, the project rejects the use of a more comprehensive entity recognition feature as a result of this intuition. In using the PIKES toolkit[7], a confidence level threshold is imposed; the system only implemented the top 80% of features extracted from FrameNet[15], for example, by sorting them by the measure of this value. PIKES resolves and confirms its annotative findings with regard to databases lacking pipelines of their own, such as FrameNet, "where each mapping is a reified relation identified by a URI and described by properties 'ks:role' (pointing to the URI for the predicate argument as defined in PreMOn)[7]." PreMOn's ontological model is a knowledge base that offers a predicate matrix and regular expression mappings of its rules for extracting predicates from raw text, well within the guidelines of natural language processing[9]. Spatiotemporal expressions in accordance with the TimeML markup language follow a similar rubric but relate to the machinic process differently. Thus, they can be implemented differently, as highlighted in (1-2) of Section 4.2, within a narrative structure that places spatiotemporal language at the forefront[33][16].
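A sketch of the confidence cut, assuming the pipeline reports each frame as a dictionary with a 'confidence' value (the data shape and names are assumptions for illustration):

def top_confidence(frames, keep=0.80):
    # Sort extracted frame annotations by confidence and keep only the
    # top fraction: here the top 80%, as described above.
    ranked = sorted(frames, key=lambda f: f['confidence'], reverse=True)
    return ranked[:max(1, int(len(ranked) * keep))]

frames = [
    {'frame': 'Elusive_Goal', 'confidence': 0.91},
    {'frame': 'Becoming',     'confidence': 0.78},
    {'frame': 'Dispersal',    'confidence': 0.34},
    {'frame': 'Existence',    'confidence': 0.66},
    {'frame': 'Impact',       'confidence': 0.88},
]
print([f['frame'] for f in top_confidence(frames)])
# ['Elusive_Goal', 'Impact', 'Becoming', 'Existence']: the lowest 20% is dropped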

4.4 User-specific Considerations

4.4.1 Author

From this perspective, the system aims to simplify the ways in which it values, divides, and manipulates the text. The relationships between these processes are invariant. On the other hand, several of the dynamics, weights, and effects within each of them can be reformulated based on the author's desired effect. The identification of the annotative structures in the text is conducted apart from the author, but their respective values and relations to events can be altered by the author. In turn, this directly corresponds to the transformations' outcomes. These respective values are visible and editable to the author, displayed in a format similar to the table in Section 5.1.4; these values remain consistent unless altered by the author.

In addition, the author is granted the ability to remove and add any number of 'fields,' as defined below. These are used to introduce components within the text that could only be affirmed by the author, as well as meta-linguistic considerations such as plot elements. Adding such a field binds all of the events contained within it to the same coefficient, which increases the likelihood of its sequence remaining compact.

The author, in future cases, receives an instructional prompt unique to their end. It simply explains the way the text is divided into events and the way the sub-structures' values relate to the transformations as a whole.

4.4.2 Reader

Consistent with how it has been explained throughout the methodology and the system, the triggering of fields, features, and weights within an event depends on the text selected by the reader. Without this aspect, the text may showcase the knowledge of implicit or embedded features, but there would be no catalyst for manipulating the text in a way that illuminates the underlying knowledge. User selection from the reader provides the essential trigger that the transformative aspect of the system depends on.

This application of the role of reader does not stray far from its conception in the seminal text regarding the genesis of digital interactive narrative, Murray's work in the field[32]. The user as interactor provides input for the process of the computational system.

4.5 Conceptual Rubric

This primer sets up key characteristics of the system and justifications for them in axiomatic form. The explanations that follow account for these principles as givens, and their consistency is then tested later as a form of evaluation. At its foundation, this project comprises an intersection between, at the very least, semantics, natural language processing, and emergent narrative, not counting the adoption of others in particular cases. Hopefully, in the explanations that follow, this framework emanates from practicing a balancing act between all of them at each stage of the text transformation process.

Given all the possibilities of natural language processing, the system will only contend with information that will impact the narrative, which otherwise includes only the user's input and the author's participation.

No a priori narrative mapping will be built into the system, in an effort to crystallize not just what the system is doing but why. Of course, the author will construct the basis for narrative events to occur. Instead, the system will display a transformation based on real information extracted from the text.

No computation within the machine can overshadow, conflate, or diminish the presence of other factors. As such, there cannot be any hidden data structures, switchboards, hashing, or inference models. For the reasons mentioned in Section 2, learning models, as well as any of their constituents, are absent. Given this corollary, the system's goal is to showcase the narrative, the user, and the text. The task of the system is to refrain from interference; hence its intelligence is measured by its simplicity. The only algorithmic process occurs if the system needs to correct its own bias. In the event of this occurrence, there will be an algorithm to correct it and an algorithm to check whether the correction needs to take place.

The computational elements that do exist in the system are connected by first-order relations, and the computation occurring globally will not extend beyond its scope.

When the scope exceeds the system functionality, the mathematical use of the chain rule between parameters will be used to resolve the issue.

Now, given the nature of the input data and the machine structure, predicate logic would be contradictory in certain scenarios[14], but this is resolved by virtue of the system's intrarelationships between structures and parameters, modeled after the chain rule and the total derivative[36]. Combating this scenario in particular will be discussed in a later section. In the meantime, the total derivative is the premise of the final conceptual axiom.

In light of the same concept, the structure's relationship between map and model is built by default as a spatiotemporal series of events[33]; thus in its tabula rasa form it looks initially like a straight line, or a horizontal sequence of empty values, as would be the case with general proceedings in traditional textual narrative. However, given the consistency of the chain rule and its relation to first-order expressions, there is in essence neither a linear map nor a linear model to be found. This is a result of the multidimensional internal structure and the nature of its relations. The problem at work is threefold, presenting a contradiction that would undermine each component and jeopardize all of the principles laid out thus far. Yet all of this is simply solved by the nature of the internal structure and the rules that express and connect its parts. Later on, the scenario in which all three problems co-occur will be discussed further.

5. SYSTEM

The raw text shown to the reader, its annotated TEI markup, and the data structures that hold the implicit features, fields, rules, and weights are stored in multiple formats and locations within the system for the purposes of recall, iteration functions, and computation: NAF, JSON, Python, PypoLibbox, and TEI formats are all incorporated; in some cases, they were converted from CAF[27], TTK[42], or XML[20] formats. Additional Python packages include numpy, pandas, pandastable, untangle, six, matplotlib, nltk, xmldataset, pprint, future, and functools.

5.1 System Architecture

5.1.1 Fields

The two principal components of the system are labelled features and fields. The main distinction, as they are defined here, is that fields exist as a kind of conceptual overlay, affecting the result regardless of selection, whereas features become factors specifically when selected by the user. The two work in conjunction with one another to distribute significance within multiple aspects and contexts of the text per segmentation. In part, this accounts for the dynamic variance awarded to structure at the cost of its external fragmentation.

Within a regular expression, fields are free variables, meaning they are not confined to certain relations across the text but are infinitely substitutable[2]. At the same time, they incorporate entire spans of text, representing a more conceptualized domain. One can think of them visually as transparent, 2-D filters placed at various places along a line, with the ability to shift in transparency and location along the line, and also to overlap each other. In addition, their 'filter' becomes a quantifier of whatever lies beneath it at the time, refracting the selections of features that either directly correspond to, or associate in relation to, the set of the same feature. As authoring tools, they offered a subjective treatment of the system without diminishing the volatility of the internal, more organic features, and they produced more immediate and more dynamic transformations of the text. At the same time, they granted the system the parameters necessary to not require an a priori mapping or shaping. There are two definitive reasons for these qualities: first, since they are a continuous range of values, their domain lies parallel to the text, thus occupying a kind of situational plane; secondly, since they are not bound by anything else, they can shift freely across the narrative in any number of illustrative ways. Despite this, the experimentations here both left the positions of fields unaltered and weighed them to be overall fairly equal to the features. In terms of natural language processing, and as a data structure, they are called in the form of a chunk. As with the typical use of chunks, they neither supersede nor intersect the words and features that fall into their subset, which thereby prevents unintentional dependencies with other features.
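A toy model of a field as an overlapping overlay, offered as a sketch under stated assumptions: the class name is invented, and the multiplicative compounding of overlapping fields follows the amplification described in Section 5.2.3 rather than any code from the thesis.

from dataclasses import dataclass

@dataclass
class Field:
    # A 'field' as a transparent filter over a range of events: it has a
    # location along the line and a strength (its 'transparency').
    start_event: int
    end_event: int
    strength: float

    def covers(self, event_id):
        return self.start_event <= event_id <= self.end_event

def field_multiplier(fields, event_id):
    # Overlapping fields compound on the same event.
    m = 1.0
    for f in fields:
        if f.covers(event_id):
            m *= f.strength
    return m

fields = [Field(0, 10, 1.2), Field(5, 8, 1.5)]  # two overlapping overlays
print(field_multiplier(fields, 7))  # 1.2 * 1.5, since both cover event 7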

A cohesive construction of these fields was crucial for adherence to the conceptual framework, and was made possible with RST and discourse semantics software. Even though their computational qualities were designed to be simplistic, narrative-minded interpretations of the text allowed a diverse range of fields to be placed within the known initial formation. This made the conceptual framework a practical possibility.

5.1.2 Features

Features constitute a more concrete layer of data structure. Individually, each of them is implemented as a Python object of the same class; fittingly, they take the form of a hashed feature structure. Each feature was selected based on a number of criteria, such as its dictionary entry within FrameNet, and denotes the markup of particular, fixed strings of text. Currently, each is assigned a value subject to the particular preference of the author. Consequently, the values might be very normalized; considering the descriptive and coagulating nature of the features, however, authors might also distort them to underscore certain aspects of the text. In the initial experiment, these values ranged between 2 and 15, but in tandem with larger fields, they could see a higher variance.
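Since the thesis names NLTK's FeatStruct as the carrier for these structures (Section 5.1.3), a single feature might be sketched as follows; the attribute names, span, and weight are illustrative, not the thesis code:

from nltk import FeatStruct

# A feature names the fixed string it marks up, its source, and an
# author-assigned weight.
feat = FeatStruct(name='Elusive_Goal',
                  source='FrameNet',
                  span='an unattained goal',
                  weight=12.45)
print(feat['weight'])  # 12.45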

5.1.2.1 Contextual Features

This category of features extracts basic information from the default TEI annotation of the text when available. Some of these annotations allude to entities in the text, such as artistic works, people, and countries. These were pre-encoded into the sample by the TEI annotation schema[13]. The system also extracts entity coreference in the text to account for pronouns, but does so with greater caution than is common in standard processing pipelines, forgoing standard entity coreference for a non-projective coreference resolution[29]. This was enacted after investigating the results within this particular sample, where the style is long-winded and often ungrammatical. Additionally, both main characters within the text are male, making the system even more susceptible to error. Weighed against the consequences of misrepresenting such a common surface element in the text, the benefit of annotating every pronoun was too potent to forgo.

5.1.2.2 User-Appended Features

This category of feature simply gives the author the ability to go into the text and create annotations of their own. The figure below demonstrates a section of the TEI document, which is the formal input into the Javascript application. This particular screenshot displays all of the potential markups in the TEI guidelines. The selections under 'user-appended' or 'ua' both relate to annotations that identify elements in the text that the features would have missed otherwise. This structure is intended for elements that are always going to subvert natural language processing. Often, this will involve structures of symbolism that have wider social implications[10], signaled by outside knowledge rather than by language alone.

To some extent, this constitutes manual metaphor annotation. Recent efforts in natural language processing have pursued unsupervised detection of metaphor, but the common practice in semantics is to uncover it manually[28]. To this effect, the system's policy was to invoke metaphorical concepts, but not structures of metaphor imposed by the processing itself. If the concepts were then seemingly upheld in the eyes of the author, they were considered fair game.

5.1.2.3 Discourse Semantic Features

This section makes up the lion's share of annotations transposed from natural language processing pipelines and, for the sake of clarity and attribution, it is split into four additional sub-sections. The majority of these annotations originate from the FrameNet knowledge representation, a remarkably perceptive corpus in natural language processing. The corpus is constructed as a tuple, with linked databases coordinated between FrameBase, FrameNet, and FrameRelation[38], using natural language processors that identify and annotate the frames. From the hundreds of semantic considerations in the corpora, this project parses the frame elements that could be considered meaningful when discussing narrative. For example, the frame element named 'Elusive_Goal' explains its occurrence as when "An Experiencer does not achieve a Desired_goal. Typically, the Experiencer is the object of the verb. Metaphorically, a pursued entity is not captured." Relative to its FrameBase complement, it explains, "This frame is a metaphorical extension of the Evading frame. The Evading frame concerns literal motion of two self-directed entities, one of which moves in such a way as to successfully prevent the other from capturing it. In the case of the 'Elusive_Goal' frame, however, there is no literal motion, but rather a 'Desired_goal' that is not "captured" by the Experiencer that would like the 'Desired_goal' to be true or known."

5.1.2.4 Temporal

TIMEX3[42] annotations are spread syntactically across a sample text, compounding into an important account of the temporal instances in the language space. Simply put, however, the annotation used as a standalone expression only returns temporal expressions that require no understanding of the text beyond the surface level. The markup efficiently captures dates, times, calendrical events, and any vocabulary that necessarily references some timeframe, such as 'yesterday.' EVENT is a classification structure signifying an occurrence and/or a state. An occurrence covers three general uses: a nominal noun, a stative adjective, or a tensed verb. Occurrence is an attribute distinguishable from 'Event,' even though many of the categories that subsume a state could naturally be thought to intersect with events. Unlike 'Event,' it does not contain a rigid time series; nevertheless, it contributes semantically to the overall temporal meaning. These words have a direct and explicit temporality in TimeML, and the lexical entry a state refers to is explicit in its temporal function. The difference is that states are by nature less fixed notions of temporality, so their effect is undeniable yet indeterminate. Like the three types of occurrence, several sub-classifications exist within an event's state. State is a label for a type of copula verb, including all forms of be and do. These connect a subject and a predicate in a temporality of their own, but without an inherent interval in accordance with their supra-class. I-state is similar to 'perception' in that its validity depends on the subject who uses the verb. Verbs like 'believe' have a different sense of temporal modality: they are stated with contingency on the possibility of their occurrence.

All of these classifications of event, within both occurrence and state, have two possible sub-classes: tense and aspect. The tenses included are past, present, future, or none. In narrative, present tenses, and especially a pattern of particular tenses, evoke a time modification separate from the time of the narration. Aspect functions much like tense and is usually described in relation to it, which is why it is classified on its own but on the same plane as tense. As instances, these connote the temporal properties of an implicit, rather than explicit, expression of the semantics in the natural language text.

SIGNAL classifications comprise prepositions, connectives, and subordinate pronouns. These lexical items share similar characteristics in how they function and interact within their context. SIGNAL items do not themselves denote any sense of temporality. In 'When Suzy went to the moon,' the word when adds no insight into the temporality of her going to the moon. Rather, its denotation is a placeholder for the temporality.

LINKs, like TIMEX3, cannot be represented in the single context of a lexical entry. A LINK exists to describe relations between events or dates within the text, illuminating a unique dimension of temporality that a single word could not indicate. A TLINK connects an event to another event with the same justification as events, but with predicate values.

5.1.3 Annotation Structure

The annotations were stored in multiple formats, and relational structures were constructed based on these data and storage formats.

Fig 2. A TEI lexical markup of the interactive text, containing several different types of ShuffleMill 'features'

'Fields,' as the project calls them, were processed as 'RangeFeature' objects of the FeatStruct class, a framework of the Python module distributed by nltk.org[3]. This has numerous advantages over a more generic range of values corresponding to the series of events, one being the ability to attribute semantic expressions to its makeup. The system relied on the class's ability to be called into any number of the different lists and dictionaries also available in the module. It also made frequent use of its integration with regular expression, or first-order logic, representation[#Arnold2005].

Features are placed in multiple lists, since they call different external sources in different formats. One list simply links each feature to each of the events in which it takes place. Another, when that event is selected, looks in the corresponding NAF file; after concordance, it retrieves the terms assigned there to specific phrases within the event, along with any other information needed from the term in the NAF file[17]. This happens only after the event has been selected; at that point, the feature determines whether or not it was selected directly. If not, its weight is halved in this iteration.
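A sketch of that selection check, with the lookup table and function name assumed for illustration:

feature_events = {'Proliferating': [3, 17, 42]}  # feature -> events containing it

def contribution(feature, event_id, base_weight, selected_directly):
    # A feature in a selected event contributes its full base weight only
    # when the user's highlight touched it directly; otherwise the weight
    # is halved for this iteration.
    if event_id not in feature_events.get(feature, []):
        return 0.0
    return base_weight if selected_directly else base_weight / 2.0

print(contribution('Proliferating', 17, 5.0, selected_directly=False))  # 2.5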

All of the features pertaining to discourse semantics[40], including RST, or Rhetorical Structure Theory, are connected programmatically to the Natural Language Toolkit framework, using the FeatStruct Expression object from the semantic expression module[3]. When the user selects a phrase which contains 'Proliferating,' for example, all locations with 'Proliferating' add the value of the base weight. However, the annotation, which lies within the FeatStruct, was assigned a regular expression in which all of the remaining uses of the annotation are given a causal relation. Here is pseudocode written in first-order logic detailing such a relation:

PROLIF(e_select) × 5W ∧ PROLIF(e1) × 3W ⟹ ([PROLIF(eθ) ⊆ PROLIF(e1) ⊆ PROLIF(e2)] ⟺ e1 > eθ)

Keep in mind that e is initially 1, so decimal multiples result in a small but positive increase. Even while prior event selections by the reader continue to gain value, relations like the expression above are among the ways the system assigns greater inflation to unread events. Relations like this also build conditional relations between different features, ascribed by the author when they seem to establish linkable parameters. Most of these relations, however, assign only one conditional, whereas this ruleset contains two sub-conditionals to place greater value on subsequent events.

Features were implemented as methods within FeatStruct but stored as a simple Python dictionary[3]. This was preferred because of the potential, outlined in future work, to trigger alterations of the weight values using statistical criteria that would be less accessible through FeatStruct as a class.

5.1.4 Weight Distribution

Each individual aspect was quantified as attributable weights, governing the effect of features, fields, and the overall relations between system, text, and users. The weight assignments were given with consideration to both their subjectivity and objectivity. Along one dimension, the weights are instantiated in such a way that the author is given a subjective, yet non-disastrous, freedom to affect the algorithm. As authors, they were given weighting schemes for experimentation. With the intention of deflating or inflating the values as a means of emphasis, assignments were unrestricted as long as a measurable scale was maintained across the entire feature set. In the sample weight set, the maximum value was assigned to RST 'theme' at 24.05, while the minimum was given to discourse 'Attribute' at 1.55. This, of course, was valued with knowledge of the frequency with which these features occur; the former occurs 15 times while the latter occurs over 200 times. Conversely, some features, like discourse 'Domain,' were given a vastly inflated value of 18 because of the interesting context they generated in the text. Table 1 below displays the weights assigned to some of the features in the dictionary for the experiment that follows. For a feature such as 'Change/@Change,' the FrameNet definition requires that for every semantic feature deemed to possess a sense of 'Change,' there must also be a corresponding element '@Change,' a direct object directly modified by this 'Change.' Thus, both must exist complementarily in order for the instance to take place.

##### feature_weights_initial

# surface_features           # FrameNet/FrameBase          # TTK Tarsqi
'TEI_emphasis': 8.5,         'Change/@Change': 2.65,       'TimeX3': 8.15,
'TEI_other': 2.5,            'Becoming': 4.45,             'TLINK/@TLINK': 8.15,
# user_annotated             'Dispersal': 1.45,            'Signal': 6.15,
'Event_Stopgap': 15,         'Dimension': 6.65,            # discourse
'Vanishing_Entity': 15,      'Elusive_Goal': 1.45,         'Spatial_Extent': 4.54,
'Unreliable Narrator': 15,   'Endeavor Failure': 12.45,    'Event-Time': 4.54,
'sentiment_compare': 15,     'Existence': 1.45,            'Temporal-Location': 4.54,
'fatherson_loss': 15,        'Impact': 1.45,               '@Cognizer': 1.45,
'jewishness': 15,            'Importance': 5.45,           '@Perceiver': 1.45,

Table 1. An incomplete dictionary of annotated fields and their corresponding values, as well as their sub-categories, categorized by their source.

The only constraints are those within the expression rules themselves; without them, a conflation of features would obscure the sense of each other's presence in the text. On this basis, some have disputed the soundness of discourse semantics on the grounds that it constitutes an anaphora from a bound-variable clause and a free-variable consequent[14]. Two problems exist in this argument. First, the premises through (7) are only halfway treated as first-order: where x is universal and y should be existential, it is nothing. This leads to a conflation in some terms, up to (12), between monotonicity and anaphora. This is only to say that within an isolated narrative space, order-preserving rulesets can be helpful in raising parameters and functions to clear representation. In an application that van Eijck all but predicts, this arena remains programmatically consistent with the module, and in terms of relative discourse, anaphora brought in externally informs a different interpretation.

5.2 Text Transformation Setup

5.2.1 Event Segmentation

Within the current framework, the events are automatically segmented, and from this initial order of segmentation the transformations proceed. Events are detected via natural language processing toolkits that split and mark paragraphs separately; a minimal sketch of this step follows below. This architecture could yield more effective context awareness in later work. Of course, as with relations between features, events too could be set as members of conditional logic expressions that link events together.
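The sketch assumes blank-line paragraph breaks as the event boundary (as in the interface described in Section 6.1); the sample text is placeholder material, not the corpus:

import re

def segment_events(raw_text):
    # Events are paragraph-like units: split on blank lines, discard empty
    # fragments, and keep the original index as the event id.
    units = [u.strip() for u in re.split(r'\n\s*\n', raw_text) if u.strip()]
    return dict(enumerate(units))

sample = 'What did he see?\n\nHe saw the stars.\n\nOf what did he think?'
print(segment_events(sample))  # {0: 'What did he see?', 1: ..., 2: ...}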

5.2.2 Field-Feature Relation Schema

Frame relations are expressions programmed into the system, established in semantic, first-order logic, using the same Natural Language Toolkit module mentioned above[3]. Practically and intuitively, a relative function instills consistency among every instance of each feature, so that not only does a selected instance of text trigger a function based on the prevalent feature, but the output of this function also has an impact on the order of events presented in future iterations. These rulesets, like every component in this system, are determined with a degree of subjectivity by the author. By default, features generate this entailment relation:
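The default expression itself did not survive into this copy of the text. Purely as an illustration, in the notation of the PROLIF rule from Section 5.1.3, a default entailment of this kind could be read along the lines of:

FEAT(e_select) × W ⟹ ∀e [FEAT(e) → value(e) + W]

That is, selecting one instance of a feature entails that every other event carrying the same feature gains value from the same base weight; the actual default relation may differ.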

5.2.3 Field-Feature Co-Occurrence

Referring to the example above, let us say there is a field within the scope of e30. This would give it a reduced multiplier based on the value of the field. The case is much different, however, when the field and feature are triggered by the same selection. Since fields are free variables and features are bound variables, they have explicitly non-substitutable bindings, ruling that a field and a frame cannot use the same e. But looking at Figure 3 below, we see an intuitive operation that brings additional value to concatenating features and events.


Fig. 3. The orange box resembles a 'field' since it modifies the relation between the textual selections made by the user (top) and its corresponding features on the bottom. These two then yield a new order of events along the timeline, iterated by 'E1' through 'E10.'

The figure simply indicates that when these versatile fields traverse features, or when the two intersect one another, they multiply and amplify each other, signifying the importance of the fields and skewing the results in a way that emphasizes their influence. As a methodology afforded to the author of the interactive text, this may or may not be used to guide the interactive tool towards certain events or instances.

5.3 Transformation

5.3.1 Iterative Process

The final results arose when the weights were totaled, distributed to fields, then frames, then retotaled. Afterwards, they were fractioned by a ratio of '# of current iteration / # of total iterations.' After these calculations, the events were reordered by increasing value using a simple heapsort function in Python. The sorting eliminated previous event iterations, and it only factored in the events that had been called by annotations of any kind. This constituted the completion of a single 'iteration' as the term has been used in this paper.
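A minimal sketch of one such iteration, with the data shape assumed (event id mapped to its totaled weight) and Python's heapq standing in for the heapsort:

import heapq

def complete_iteration(event_values, current_iter, total_iters):
    # Fraction the totals by (# of current iteration / # of total
    # iterations), then reorder the called events by increasing value.
    ratio = current_iter / total_iters
    heap = [(value * ratio, event)
            for event, value in event_values.items() if value > 0]
    heapq.heapify(heap)
    order = []
    while heap:
        _, event = heapq.heappop(heap)
        order.append(event)
    return order

print(complete_iteration({3: 12.4, 7: 2.6, 9: 8.1}, current_iter=2, total_iters=5))
# [7, 9, 3]: increasing value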

5.3.2 Realignment

The figure below demonstrates the effect of the procedure after one iteration. The natural progression of the structured events in the text would count up from one, since the events are just a linear sequence of the text. This is demonstrated by the blue boxes in Figure 3. The green diagram shows the results of one pretest, where the events have been redistributed out of their original order by the users' selections.

6. EVALUATION

The system is tested under two evaluation strategies. First, the conceptual framework introduced in Section 4.5 reflexively serves as a rubric against the system architecture. Second, user testing and surveys measure the success of the system on a qualitative scale via questionnaires and general feedback. The users were given identification numbers from 1 to 3. User 1 is a postdoctoral researcher in semantics, so she was asked more detailed questions about the merit of the linguistic features themselves. User 2 is a published creative writer who experiments with form and text, so they were surveyed on the potentials of this function.

6.1 Setup

Figure 4 shows the visual display used with the participants in the evaluation of the system. The simplistic interface displays the current iteration of the text. Explicit event segmentation is not identified, other than a normal blank line between paragraphs. Not pictured in this image are the top of the page, with the instructions discussed in the next section, and the bottom of the page, where the user can choose to advance to the next iteration. The left side is a complete list of the user selections in each iteration. That data is then sent to the system as input, translated into utf-8 character positions in the text.

Fig 4. The visual representation of the system in the first prototype, as it is displayed to the user
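The translation of a highlight into character positions can be sketched as follows; the function is an assumption for illustration, and byte-level utf-8 offsets would additionally require encoding the text:

def selection_to_offsets(text, selection):
    # Translate a highlighted substring into character positions in the
    # raw text, the form in which selections are sent to the system.
    start = text.find(selection)
    return None if start == -1 else (start, start + len(selection))

text = 'What is a catechism? A question and answer scheme.'
print(selection_to_offsets(text, 'question and answer'))  # (23, 42)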

6.1.1 Source Material

The penultimate chapter of Ulysses, "Ithaca," was chosen subjectively, but not at random. There were attempts to modify preexisting interactive narratives as samples for this system, but their own dynamic layering was deemed impermissible. "Ithaca" is unique even within the notoriously complex modernist novel Ulysses[23]. Referred to as a 'catechism repository,' its narrative proceeds in a meticulous yet confessional unfolding of events. This offers a strange respite from the confusing prose and stream of consciousness before and after it[30]. Rarely do overloads of information give the impression of an oasis, with chaos coming before and after, yet that is the case with Ulysses. As a format, there are internal, concrete segmentations across the text in the form of a series of 309 question-answer schemes, which seemed tailor-made for this project's functionality.

Additionally, Ulysses has an infamous reputation for the inaccessibility of its lexicography. This chapter alone has 7,083 unique lexical entries. This presented an interesting challenge to the infrastructure of corpus linguistics, in which the selection of the annotative fields was researched. There was the potential to offer clarification of the context, as well as the potential to meddle with increasing confusion. One corpus linguistics tool, Corpkit[29], allowed users to add metadata entries across the results of corpus interrogation and concordance search. The results indicated that while there were still plenty of word repetitions and common usages to identify important concordances, there were also lexical entries given unique characteristics and perhaps never used again. The unique format of the text is nearly pre-transformative, and with swaths of pertinent information, lexical diversity, complex relations, and sheer density, it offered a potentially unwieldy source that was too remarkably suitable not to choose.


Additional texts were considered to complement the complexity and difficulty of Ulysses. Selections were made from "Marriage," a poem by Marianne Moore, and Mrs. Dalloway, by Virginia Woolf.

6.2 Findings and Discussion

Since the project was realized through a combination of quantitative and qualitative procedures, the evaluation criteria should reflect this multiplicity. Hence, qualitative and quantitative evaluations were conducted with respect to both methodologies. For the former, the logical structure of the conceptual rubric is analyzed quantitatively to measure its validity. For the latter, the project was hosted online and users were asked to engage as they saw fit.

6.2.1 Surveys

User testing was organized with three participants, each of whom studies or practices in a field of expertise directly related to the project. In the following sections, the participants are assigned ID numbers 1, 2, and 3: 'User 1' studies semantics at the doctoral level, 'User 2' is a publisher and experimental text-based artist, and 'User 3' works professionally with computational metrics and statistical analysis.

6.2.2 Testing Procedure

After receiving a basic rundown of the project's goals and dynamics, the users were directed to a website which hosted the engine. Before the system appeared, an explainer was presented as follows:

“The program underneath is called ShuffleMill, designed to elucidate computational aspects of language in narrative. The way this works is this:

You are creating your own pattern out of the text, which in turn shapes the narrative from metadata within the text; by doing this, you create an abstract filter of your own. The text will display the subsection that has been transformed the most drastically, but the transformations affect the entire chain of text. Are you reading the sample text - Ulysses, 'Ithaca,' Chapter 17 - for the first time?”

Since it was known beforehand that one of the three subjects had not read this section, the reduction of the final output for the first three iterations was completed in advance. This was done so that the user could ease into the text, getting an idea of what it would be like before their answers had a real impact.

With 5,000 words or a minimum of 20 event segments presented in each iteration, the users were given 45 minutes to go through the text at whatever speed they preferred, with another 15 minutes allotted for waiting during iteration runtime. The nonrestrictive approach to pacing caused a significant variation in the number of iterations completed - between four and eleven.

6.2.3 Intra-User Reiteration

After the first trial run with User 2, who had no familiarity with the material in question, an A/B-structured instantiation was implemented as an extension of the original procedure. This user, unlike the previous user, had not seen the text apart from the system. Thus, an experiment into the effects of the machine apart from the content was introduced, divergent from the original intent: after each iteration following the first two, the user was asked how much of the text they felt had been transformed in one way or another. While the other two users would have known the location of certain topics in the text, User 2's judgment of transformation was based on their ability to identify a manufactured shift. They asserted that, given the variance of style and length between neighboring events, those events must have been rearranged or shuffled in some way; for the events in consideration, they were correct that the system had transformed the text as a result of some other underlying connection within the sequence.

6.2.4 User Feedback

Participants varied considerably in their overall impressions and feedback. The semantics researcher commented that while the functionality of the machine was interesting, it was not the machine being displayed but the text; therefore, “whatever exists under the hood either projects itself or seems arbitrary, if not ill-advised.” The second user, who had no prior experience with the text but a strong interest in digital experimentation in prose, noted that the first impression of the text made it difficult to recognize transformations, but that they did get a sense of systematic manipulation taking place.

With User 1, the experiment afterwards provided links to descriptions of several of the features on the FrameNet website; FrameNet annotations fit a flexible range of circumstances, but not explicitly narrative ones[15]. User 1 was asked if she agreed with the analysis of the frame with respect to the corresponding term, then whether this instance could be said to exist within the text, and finally whether this instantiation might reflect a dynamic utilized in textual narrative. User 3's relevant background pertains to application design, concerned with user experience and system interaction. Their feedback was more critical, suggesting that no matter how enriched the text becomes, the result ultimately depends on how and what the user is able to generate from the text. With the user's end of the experience expressed in a form as simple as highlighted versus non-highlighted text, the intricacy of the system and of the author's capabilities is significantly minimized. Overall, they concluded that the system failed to elucidate the internalized semantic features of the text. Their subsequent critique questioned whether the project made enough effort to analyze what was really going on in the adopted natural language processing pipelines and toolkits. If the adoption of the features is taken to be inconclusive, an investigation into the code that validates the existence of such features is necessary; if conclusive, such an investigation is just as necessary, and even harder to defend against when the only justification for using the elements in this way is a reference to another linguistic tool that was never intended for such use.

User 2 is a published creative writer in digital literary magazines and blogs on the Web. Their feedback was nearly uniformly positive, stating that this tool could be implemented in their own work. When shown some of the criteria for identifying the underlying features in the text, they found the applications interesting for the sake of manipulating the text alone, but also sound in an intuitive sense.

6.2.5 Reflection Against the Conceptual Rubric

As stated previously in the methodology, the conceptual rubric outlined in the system requirements can be reevaluated in light of the experiments and the detailing of the framework. If the principles are upheld, the system can at least be considered an answer to the research question, though not necessarily a solution to it. Each axiom is summarized and assessed below.

1. User interaction is the only way to start the machine. True. If the user chooses to select nothing in the text, the next page simply shows the traditional following sequence.

2. All semantic markup clearly and directly impacts the narrative. Discussed further in Section 6.2.6.

3. No a priori mapping of narrative structure. True. The input structure initially remains consistent with how it would be structured without the system.

4. No concatenation between semantic data structures. True. This follows from axiom 5, but in general, all parameters are considered separately by virtue of the ruleset to which they are applied.

5. All relations are expressed in first-order logic. True.

6. All calculations are synonymous with the chain rule. True, insofar as, with only one span (i.e., one derivative) differentiating the structures, the chain rule reduces to a sum of quotients, with the higher derivative multiplied first.
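For reference, and assuming the iterated transformations are read as nested functions (an interpretive framing rather than the system's code), the single-composition chain rule that axiom 6 invokes is

\[
\frac{d}{dx}\, f(g(x)) = f'(g(x)) \cdot g'(x),
\]

with a longer chain of compositions contributing one multiplicative factor per layer.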

6.2.6 Selected Discussion

One concern in the findings from the qualitative analysis addresses the need to investigate the external systems that generate the annotations used in this system, in order to responsibly invoke and repurpose them. As discussed in the related work, the conflation of autonomous results between systems espouses a problematic, groundless structure in each of them[5]. This motivated the selection process for determining which annotative features could be repurposed for this system. The primary natural language processing pipeline, which itself incorporates results borrowed from other natural language processing projects, is the PIKES toolkit. One preliminary measure when adopting from PIKES is the built-in confidence level of each annotation in the text, whereby the system only makes use of annotations with a confidence level above 80%[7]. Additionally, PIKES has a detailed and sophisticated measurement of the frames of surrounding sentences, as well as programmatic explanations of its incorporation of other systems[8]. Thus, the results from PIKES meet a sound end, for the most part, barring a low confidence level. With all of the other annotation pipelines and semantic toolkits used in this system, human interaction plays a significant role in their results[34][41][38], which makes them less susceptible to these problems of a black-box conflation between systems.
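To make the confidence gate concrete, a minimal sketch of the filtering described above is shown below; the annotation record is a hypothetical stand-in and does not reproduce PIKES's actual RDF-based output format.

```python
# A minimal sketch of gating annotations by pipeline confidence.
# The Annotation record is a hypothetical stand-in for PIKES output.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Annotation:
    span: Tuple[int, int]  # (start, end) character offsets in the text
    frame: str             # e.g. a FrameNet frame label
    confidence: float      # pipeline-reported confidence in [0, 1]

def usable(annotations: List[Annotation], threshold: float = 0.8) -> List[Annotation]:
    """Keep only annotations scoring above the confidence threshold."""
    return [a for a in annotations if a.confidence > threshold]
```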

7. Conclusion

The findings bring important concerns to the forefront, but also reiterate that, for the sake of experimentation within narrative, the ShuffleMill system provides interesting insight. Not to be taken as an impregnable framework, its features and fields offer flexibility between author experimentation and natural language processing discoveries alike.

What does it mean in this era to deal with text, and why do so exclusively? One reason is that, in spite of technical advancement in other creative forms and mediums, books are still widely distributed in print, or in print-based digital copy. This is important to note because this paper aims to project an interactive literary hyperspace onto text, which becomes, first and foremost, research into interactivity; if the reader only interacts with the text by reading it, the rest of the space stays hidden. Nevertheless, this is an option. In a sense, any additional tasks of reader interactivity beyond reading must of course start with reading. Thus, to target the form itself, in regards to its perceived limits or potentials, becomes a misguided effort. The problem here arises not as a critique but as a proposition. Literary theory, narratology, and hyperspace are all intertwined at the center of this experimentation.

The findings, then, explore this synthesis through the crafting of a practical tool which incorporates all three. The aim in logic, theory, and testing avoids certain machinic pitfalls through the simplicity of its computational processes; it also does so by allowing the author, in tandem with the extracted semantic features, to quantify the transformations of the text with each iteration. Further, the system can be integrated into preexisting interactive narratives as well as into repackaged traditional texts. In this context, ShuffleMill answers the research questions, but cannot be said to provide an indisputable or even a sole answer to them. Many systems could claim to posit the same principles and meet the same ends, but ShuffleMill attempts to reflexively construct its framework based on its own insights and first-order expressions each step of the way.

8. References

[1] Aarseth, E.J. 1997. Cybertext: Perspectives on Ergodic Literature. Johns Hopkins University Press.

[2] Arnold, D. 2005. Chart Parsing. 2 (2005), 1–9.

[3] Bird, S. et al. 2016. NLTK Book.

[4] Bird, S. and Liberman, M. 2001. A formal framework for linguistic annotation. Speech Communication. 33, 1–2 (2001), 23–60.

[5] Bishop, R. and Phillips, J. 2007. Baudrillard and the Evil Genius. Theory, Culture & Society. 24, 5 (2007), 135–145.

[6] Bryant, J. 2017. Moby-Dick: Reading, Rewriting, and Editing. 9, 2 (2017), 87–100.

[7] Corcoglioniti, F. et al. 2015. Extracting knowledge from text with PIKES. CEUR Workshop Proceedings. 1486, (2015).

[8] Corcoglioniti, F. et al. 2016. Frame-Based Ontology Population with PIKES. IEEE Transactions on Knowledge and Data Engineering. 28, 12 (2016), 3261–3275.
