
University of Amsterdam

Bachelor thesis

Information studies

Semi-Automatic Construction of Skeleton Concept Maps

from Case Judgement Documents

Author:

Bas Sijtsma

Supervisor:

Dr. A. Boer


Contents

1 Introduction 5

2 Theory 7

2.1 Research on Conceptual Maps . . . 7

2.1.1 Mind mapping . . . 9

2.1.2 Argument Mapping . . . 9

2.1.3 Knowledge mapping . . . 10

2.2 Constructing a concept map . . . 10

2.3 Research on computationally generating conceptual maps . . . 11

2.4 Conceptual map generation review: looking back . . . 15

2.4.1 Purpose of the research . . . 15

2.4.2 Structure of the data source . . . 16

2.4.3 Methods used in the generation of conceptual maps . . . 16

2.4.4 The result output . . . 17

2.4.5 The evaluation of the results . . . 17

2.5 Language processing techniques in detail . . . 18

2.5.1 Sentence boundary detection . . . 18

2.5.2 Tokenization . . . 19

2.5.3 Stemming and Lemmatization . . . 19

2.5.4 Part-of-speech tagging . . . 20

2.5.5 Chunking . . . 21

2.5.6 Parse tree . . . 21

2.6 Proposition extraction algorithm . . . 22


3 Methodology 25

3.1 Purpose of research . . . 25

3.2 Type of data . . . 25

3.3 Conceptual map extraction method . . . 26

3.4 The resulting output . . . 28

3.5 Evaluating the output . . . 29

4 Analysis 30

4.1 Evaluation of input data . . . 30

4.2 Evaluation of the language processing tasks . . . 31

4.3 Evaluation of concept ranking . . . 31

4.4 Evaluation of the proposition extraction algorithm . . . 33

4.5 Gold-standard comparison . . . 35

4.5.1 First level: extraction of concepts . . . 37

4.5.2 Second level: extraction of linking phrases . . . 39

4.5.3 Third level: extraction of propositions . . . 39

4.6 Skeleton Concept Map . . . 40

4.6.1 Shortcomings . . . 41

4.6.2 Improvements . . . 43

5 Conclusion 45

6 Discussion 46

Appendices 48


B Annotations for chunks used by the Apache OpenNLP tool. 48

C Table with ranked lemmas and their frequency 49

D Expert concept map created by Meijer 51


Abstract

This thesis proposes an integral approach to generate Skeleton Conceptual Maps (SCM) semi-automatically from legal case documents provided by the United Kingdom's Supreme Court. SCMs are incomplete knowledge representations that provide a foundation of information for the purpose of scaffolding learning. The proposed system intends to provide students with a tool to pre-process text, to help them assess their own knowledge, and to extract knowledge from documents in a time-saving manner. A combination of natural language processing methods and a proposition extraction algorithm is used to generate the output. It is concluded that, even though interesting results were achieved, improvements are necessary to provide results of sufficient quality to support students. An important takeaway is that a level of abstraction is required that is not attained by the proposed system.


1 Introduction

Oftentimes, students have to deal with large amounts of complex information. One approach to deal with this is to construct conceptual maps of the domain. Conceptual maps are a graphical representation of knowledge, containing a hierarchical structure of core concepts and concept relationships. These conceptual maps help students to assess their knowledge and share it with others. Moreover, conceptual maps can serve as a scaffold to structure prior and newly attained knowledge into their existing cognitive structure (Novak and Cañas, 2006a). Since its initial formulation in 1972, much research has been dedicated to the use of concept maps as a cognitive tool in education and as an analytical tool for domain experts.

A relatively new field of research that has gained traction amongst academics involved in concept maps is the skeleton concept map. Skeleton concept maps are incomplete representations of information prepared by teachers or domain experts, containing some of the building blocks required to form a conceptual map. They serve as a foundation to start a process of meaningful learning. The skeleton concept map has proven to be an effective tool in education, promoting collaborative learning and self-assessment (Novak, 2010; Novak and Cañas, 2006b; Fischer et al., 2002). However, building skeleton conceptual maps requires considerable time and effort. In today's online landscape we are faced with the availability of large amounts of information. The opportunity therefore arises to investigate an approach that constructs skeleton concept maps computationally. Since 2001, researchers have attempted to accomplish this task, each dealing with various parts of the problem.

This thesis proposes an integral approach to generate skeleton conceptual maps semi-automatically from legal case documents provided by the United Kingdom's Supreme Court. The goal of this approach is to provide an educational tool to support students in their study of lengthy and complex documents. The output generated by the proposed approach can be offered to students for several purposes. First, the output can serve as a tool to pre-process text by providing a foundation of the information. Second, the output can be used by students to assess their knowledge of the domain, by confronting them with information and links they might not have considered before, or by prompting them to think critically about the content presented to them. Finally, because students have to deal with large quantities of information, it could serve as a tool to extract knowledge in a time-saving manner.

A Supreme Court case document has been selected as input for the generation of a skeleton concept map. The selection is based on three criteria. Firstly, case judgements can be very complex. Creating a concept map of this type of document could help students comprehend some of the complexity it holds. Secondly, these documents are very large. A random selection of the available documents showed that some contain up to 130 pages. If one were to compare different judgements, automated extraction could possibly save a substantial amount of time. Finally, the documents provided by the Supreme Court are selected because of their consistency. Different documents display identical structures, allowing the results of this thesis to be generalized to the court's entire database.

This thesis builds on existing theory and methodology and combines several techniques into a single approach. To achieve this, various academic papers dedicated to conceptual maps are first examined. The theoretical underpinning of the conceptual map, the components a conceptual map consists of, and what conceptual maps are used for are analyzed. Then, research dedicated to the automated extraction of conceptual maps is summarized. Categorical differences in their approaches are recognized and discussed in detail. Afterwards, common natural language processing techniques used to construct conceptual maps computationally are explained. Finally, the process of proposition extraction is discussed, with one promising algorithm explained in detail.

After the theoretical analysis, the approach to generate skeleton conceptual maps from legal case documents is explained. First, a set of natural language processing tasks is carried out over the input data. Then, a set of concepts is extracted and ranked in order of relative importance. Following that, propositions are extracted, filtered and ordered by ranking. Finally, a skeleton concept map is put together by hand with the extracted information.

In the analysis that follows, the proposed approach is evaluated by examining the output of each of the steps that were performed. In addition, a domain expert in the field of law is given the opportunity to define a “gold-standard”. This sets a benchmark to which the results are compared on several different levels. Concluding the analysis, shortcomings and potential improvements to the proposed approach are discussed.


2 Theory

2.1 Research on Conceptual Maps

In 1972, Joseph Novak and his team of researchers at Cornell University were studying children's emerging understanding of science concepts. They created 28 science lessons, and attempted to understand how children developed knowledge of the concepts presented to them. While analyzing the many interviews they had held with the participants, they found it particularly difficult to determine if the children had acquired new understanding of concepts, and whether this understanding was integrated into their existing knowledge framework (Novak and Cañas, 2006a).

The integration of new concepts into a learner's existing knowledge framework was based on psychologist David Ausubel's assimilation theory. Ausubel makes the distinction between learning meaningfully and learning by repetition, or rote. In meaningful learning, learners assimilate new concepts into their existing concept and proposition framework, which Ausubel refers to as the learner's cognitive structure (Ausubel, 1978). He describes the cognitive structure as a hierarchical structure, with general concepts at the top and more specific concepts underneath. He describes two processes: subsumption, in which learners subsume new knowledge into existing concepts and propositions, and superordinate learning, in which prior knowledge is subsumed into new, more general or abstract concepts (Ausubel, 1978).

Because of the trouble Novak and his colleagues had determining if the children had learned the meaning of the taught science concepts, they searched for a tool that would facilitate explicitly displaying the children's cognitive structure. To this end, they developed the conceptual map. Since then, the concept map has proven itself useful in many different applications: for instance, as a teaching and learning strategy tool; as a tool for gathering, diagnosing and modeling knowledge; and as an application to guide a collaboration process (Fischer et al., 2002). It has received wide attention in the academic world, and has proven its validity in many different contexts (Markham et al., 1994).

The concept map, as formalised by Novak and his team, is a structured diagram containing concepts connected by linking phrases. It is a hierarchical, tree-like structure, which is often concentrated around a focus question. The focus question creates the context and helps determine the scope of the knowledge that is to be represented. At the top of the concept map hierarchy the superordinate concept is displayed. Concepts can be abstract terms referring to an object or event, with their meaning in part tied to the concepts directly related to them. Novak defines a concept as “a perceived regularity in events or objects, or records of events or objects, designated by a label” (Novak and Cañas, 2006b). Concepts are indicated by a label inside a box, often consisting of a noun phrase. A different type of abstraction is a construct. Constructs are also concepts, but they are more often intangible or inferential. As an example, motivation is a construct that is not directly observable by itself; it is inferred from other concepts related to it. Although constructs are not displayed differently from concepts, they are often higher up in the concept map's hierarchy.

Linking phrases are the arcs connecting associated concepts, in most cases consisting of a verb phrase. A linking phrase is inherently bi-directional: a concept linking from concept A to concept B with the linking phrase has-child also links concept B to concept A with the linking phrase has-parent. Concepts connected by a linking phrase are referred to by Novak and Cañas (2006b) as propositions: “Propositions are statements about some object or event in the universe, either naturally occurring or constructed. Propositions contain two or more concepts connected using linking words or phrases to form a meaningful statement”.

Figure 1: Breakdown of a proposition
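To make this building block concrete, the sketch below shows how a proposition could be represented as a simple data structure in Java, the language used for the prototype later in this thesis. It is an illustrative sketch only, not part of the thesis prototype; the class and field names are chosen for this example.

// Illustrative sketch only: a proposition as a (concept, linking phrase, concept)
// triple, the basic building block of a concept map.
public final class Proposition {
    private final String fromConcept;
    private final String linkingPhrase;
    private final String toConcept;

    public Proposition(String fromConcept, String linkingPhrase, String toConcept) {
        this.fromConcept = fromConcept;
        this.linkingPhrase = linkingPhrase;
        this.toConcept = toConcept;
    }

    @Override
    public String toString() {
        // Matches the notation used in this thesis, e.g. (Bas - eats - pizza).
        return "(" + fromConcept + " - " + linkingPhrase + " - " + toConcept + ")";
    }
}

A concept map can then be modelled as a collection of such triples, with the hierarchical ordering added on top.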

Another feature of concept maps is the cross-link. Cross-links are usually realized after the construction of the initial concept map. Cross-links allow two concepts in different parts of the domain to be linked together and illustrate how these may be related to each other. Research by Novak and Gowin (1984) indicates that students identifying insightful cross-links have reached a high level of understanding of the domain. Figure 2 below displays a conceptual map as defined by Novak and Cañas (2006b) for the subject of photosynthesis.


Although the concept map has proven to be an effective tool for educators, it is not without its drawbacks. Several restrictions to the application of concept mapping apply. Firstly, it is not a simple and quick visualization tool, because of the formal rules that must be adhered to. It can be difficult to properly identify the concepts and the relationships between them. Secondly, domains with particularly large interconnectedness between concepts may become very difficult to model. Visually complex concept maps may feel chaotic, and their interpreters could become overwhelmed attempting to understand them fully. Finally, concept maps may not be suited for procedural content, with temporal or sequential steps between its different parts.

As discussed above, Novak's definition of the concept map has been extensively researched. Because Novak's model is the first to be widely used in education, it is often the model assumed when concept maps are discussed. However, literature often uses the terms concept mapping, mind mapping, argument mapping and knowledge mapping synonymously. Clear distinctions between these terms can be made. Although these cognitive tools share many of the same objectives, there are significant differences in their execution. The section below briefly touches upon the different qualitative visualization techniques intended to capture knowledge, to eliminate any incorrect notions regarding the concept mapping tool.

2.1.1 Mind mapping

Mind mapping is a non-linear visualization tool. A central concept radiates related categories and concepts, which in turn branch out into several other concepts, images or keywords. Together, the branches form a connected structure. It is a quick visualization tool, allowing different thoughts to be expressed in a spontaneous manner. Different colors, images or emphasis can be used throughout the mind map, making it a tool fit for expressing associated ideas. As its form is freer than that of a conceptual map, it is argued to promote creative thinking and brainstorming (Davies, 2011). However, with its free form come some disadvantages. Mind maps can often be difficult to read and are inconsistent in their representation. Complex relations between different domains within a mind map are hard to represent (Davies, 2010). Its main difference with the concept map lies in its application. Where concept maps can be used to display a person's cognitive structure, mind mapping is more suitable for taking notes and pre-analytic idea jostling (Eppler, 2006).

2.1.2 Argument Mapping

Argument mapping produces graphs that are also hierarchical in nature. However, argument maps do not focus on concepts. Argument maps represent claims as nodes, with arguments supporting the claims attached to them. Their main components are premises, co-premises, rebuttals and objections. Unlike the mind map, the argument map is even stricter in its representation than the concept map, in the sense that all premises supporting assumptions must be grounded. At the top of an argument map is a conclusion. Underneath this conclusion lie reasons, which consist of the components mentioned above. The main purpose of an argument map is to promote critical thinking. Eppler (2006) summarizes research on argument maps, and concludes that argument maps can significantly improve critical thinking, especially amongst students with poor argument analysis skills. Because of its nature, the argument map is not suitable for describing concepts and relations to the extent that concept maps potentially can.


2.1.3 Knowledge mapping

Knowledge mapping is a general term covering the process of schematically capturing, developing and sharing knowledge. It usually includes concepts and relations, which are either labelled or unlabelled, and can be directional or non-directional. It comprises several graphical representation tools, such as the concept map, organisational charts, or semantic networks. Because it is used in different contexts, such as business, education or AI, knowledge mapping serves multiple goals. Because of its many different interpretations, the term knowledge mapping is avoided in this thesis.

2.2 Constructing a concept map

With the theoretical basis of the concept map examined, the theory on how to actually construct a conceptual map according to Novak and Cañas (2006b) is discussed. As has been explained above, concept maps often target a focus question. The focus question serves as a starting point. Then, a “parking lot” of concepts related to the focus question is made. This serves as a pre-processing step to get a clear idea of the domain. The concepts are then put in hierarchical order of importance. Afterwards, the concepts are placed on the map one by one, while simultaneously identifying the links between them. After the links have been drawn, they are given names that represent the relation between the connected concepts. Finally, when the map is considered to be sufficiently covered by concepts, cross-links can be added to link different areas of the domain.

Novak and Cañas (2006b) discuss providing a “skeleton concept map” to learners to scaffold the learning process. A skeleton concept map is an incomplete representation of the domain, a concept map that is halfway through its construction. This approach attempts to provide a foundation for students to construct their own concept maps. One such skeleton map, as suggested by Novak and Cañas (2006b), is displayed in figure 3.

Figure 3: An example of a skeleton map as defined by Novak (2010)


The skeleton concept map fits within Novak's proposed “new model of education” (Novak, 2010), in which he attempts to emphasise meaningful learning over rote learning. As starting a concept map from scratch can be difficult, skeleton concept maps help learners get a head start in structuring their knowledge. Students can also use skeleton concept maps as a self-assessment tool. Finally, it has proven to be an effective tool for collaborative learning, increasing long-term meaningful learning amongst its users (Novak, 2010; Cicognani et al., 2000; Fischer et al., 2002). As mentioned in the introduction, this aligns nicely with the purpose of this thesis.

However, skeleton concept maps can take considerable time to prepare. Therefore, the task of automatically generating skeleton concept maps becomes an interesting topic. To generate skeleton concept maps computationally, some interesting problems have to be dealt with. The following section describes research dedicated to the extraction of conceptual maps from text.

2.3 Research on computationally generating conceptual maps

In the past, research dedicated to the concept map was mostly aimed at proving its validity as an educational tool, or at using the tool to analyze the knowledge structures of participants. Recently, however, more research has been dedicated to generating conceptual maps in a semi-automatic way (Kowata et al., 2009). Interestingly, research in the field of concept map generation does not converge on a single method of execution. Approaches differ in their purpose of research, the structure of the data source, the methods used, the resulting output and the evaluation of their results. The section below summarizes previous research in the field of conceptual map generation to discover the possibilities for this thesis.

Olney et al. (2011) discuss an approach for extracting concept maps from biology textbooks. The concept maps generated by their method serve as skeleton concept maps for students in biology classes. They discuss a conceptual map representation first formulated by SemNet: a set of triples focusing around a core concept, linked to other concepts by a pre-determined set of relations. End nodes are arbitrary concepts that do not themselves connect to other concepts. An example of one of their maps can be found in figure 4.

Figure 4: A set of connected propositions taken from Olney et al. (2011)

First, they extracted thirty core concepts from a biology book's index and glossary. Then, using a set of 4400 premade biology triples, they clustered categories of linking phrases by hand and distinguished thirty different types of relations. Afterwards, they ran their triple extractor algorithm on the contents of the biology book. When one of the concepts identified earlier was found in a sentence, the algorithm ran a semantic role labeler. Using natural language processing tools, they targeted semantic features, adjectives, prepositions and predicates to find the related concept and the verb phrase connecting the two concepts. After the triple extraction, four filters were applied to exclude certain propositions: a repetition filter (cell has-property cell), an adjective filter (when the concept is an adjective instead of a noun), a nominal filter (using a lexical database: light has-property the energy of sunlight), and finally, a likelihood ratio based on a statistical significance threshold (if concepts do not often occur close together, the proposition is discarded). They evaluated their method by having judges rate both their output and a handmade gold standard. Judges were not informed which map was computer generated and which was handmade. They concluded that their method is successful in generating skeleton conceptual maps, as the outputs are mostly accurate.

The previously discussed research by Olney et al. (2011) does not deal with identifying main concepts from the text. Rather, they selected core concepts from the text's glossary by hand, and focused on extracting linking phrases related to these concepts. Leake (2006) performed both the identification of concepts and the identification of linking phrases from the text automatically. Their goal was to create conceptual maps of sufficient quality to support document summarization and to provide a conceptual index to help a human get a basic understanding of the document. They pre-processed the text to speed up the construction of conceptual maps for experts. They split their research into two extraction steps: identifying concepts from text, and identifying and naming relationships for linking phrases using syntactic dependencies between chunks of text. To identify the main concepts, they first parsed the source document and performed word normalization. They clustered all morphological variations using an algorithm for word stemming, and used WordNet to find synonym relations. They ranked nouns by their frequency. The algorithm then searched through the text until it found one of the most highly ranked nouns. Then, searching through the parsed text, it found all pairs of concepts within the same sentence, took the verb phrase that had a dependency link, and assumed the verb phrase represented the relationship between the two concepts. The resulting extraction for the text “paper is a thin, flat material produced by compressed fibers. The fibers are usually natural and composed of cellulose” can be seen in figure 5.

Figure 5: Example set of propositions by Leake (2006)

In the evaluation of their approach, they only looked at the extraction of concepts, not at the related concepts or the representation of the relationship itself. They took a list of annotated documents, ran the algorithm to extract the concepts, and then used statistical measures such as precision (the ratio of relevant concepts retrieved to the total number of concepts retrieved) and recall (the ratio of relevant concepts retrieved to the total number of relevant concepts in the document) to score the performance.
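To illustrate these measures, the sketch below computes precision and recall for a set of extracted concepts against a manually annotated set. It is illustrative only, not Leake's implementation; the concept sets are hypothetical and loosely based on the paper example above.

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch: precision and recall of an extracted concept set
// against a manually annotated (relevant) concept set.
public class ConceptEvaluation {
    public static void main(String[] args) {
        Set<String> extracted = new HashSet<>(Arrays.asList("paper", "fiber", "cellulose", "material", "thing"));
        Set<String> relevant  = new HashSet<>(Arrays.asList("paper", "fiber", "cellulose", "material"));

        Set<String> correctlyExtracted = new HashSet<>(extracted);
        correctlyExtracted.retainAll(relevant); // intersection of the two sets

        double precision = (double) correctlyExtracted.size() / extracted.size(); // 4/5 = 0.8
        double recall    = (double) correctlyExtracted.size() / relevant.size();  // 4/4 = 1.0

        System.out.printf("precision = %.2f, recall = %.2f%n", precision, recall);
    }
}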


Both of the approaches mentioned above made use of language processing techniques, parsing the text computationally before generating concept maps. Clariana and Koul (2004), on the other hand, introduced a lot of manual work so as to induce students into meaningful learning. First, students created a concept map of the human heart while researching the topic online. Afterwards, the students translated the concept maps into summaries. These summaries served as the input source for the conceptual map construction. A list of up to thirty hand-picked concepts was created from the summaries. The system then searched through each sentence, looking for co-occurrences of the selected terms. It aggregated a large co-occurrence matrix (an example of which can be found in figure 6), and finally converted it to a pathfinder network representation of a conceptual map (which can be seen in figure 7). To evaluate the performance of the approach, they had the students who initially created the conceptual maps rate the generated propositions. They compared these scores with the output of a mathematical algorithm, which determines the least-weighted paths in the graphs.

Figure 6: Co-occurrence matrix
Figure 7: Pathfinder network of the co-occurrence matrix

Although the approach used by Clariana and Koul (2004) offers interesting insights into the matching of related concepts by the use of a co-occurrence matrix, the resulting graph differs from Novak's definition of a conceptual map. Clariana and Koul (2004) acknowledge this in their title by phrasing it as “concept map-like representations”, but refer to them as concept maps in the remainder of the text.
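The co-occurrence counting idea can be illustrated with a short sketch: for a fixed list of hand-picked terms, count how often each pair of terms appears in the same sentence. This is a minimal illustration of the general technique, not the reviewed system itself; the terms and sentences are invented for the example.

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of sentence-level co-occurrence counting (illustrative only).
public class CooccurrenceMatrix {
    public static void main(String[] args) {
        List<String> terms = Arrays.asList("heart", "blood", "valve", "oxygen"); // hand-picked concepts
        List<String> sentences = Arrays.asList(
                "the heart pumps blood through the body",
                "the valve prevents blood from flowing back",
                "blood carries oxygen to the organs");

        // matrix.get(a).get(b) = number of sentences in which both terms occur
        Map<String, Map<String, Integer>> matrix = new HashMap<>();
        for (String sentence : sentences) {
            for (String a : terms) {
                if (!sentence.contains(a)) continue;
                for (String b : terms) {
                    if (a.equals(b) || !sentence.contains(b)) continue;
                    matrix.computeIfAbsent(a, k -> new HashMap<>())
                          .merge(b, 1, Integer::sum);
                }
            }
        }
        System.out.println(matrix); // e.g. {blood={heart=1, valve=1, oxygen=1}, ...}
    }
}

A pathfinder network can then be derived from such a matrix by keeping only the strongest links between terms.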

Kowata et al. (2010) stated that statistical approaches (such as the former) are not suitable for creating concept maps that are fit for human interpretation. In their research they attempted to achieve exactly this goal: generating human-readable concept maps. Their approach was based on the metaphor “disassemble to reassemble” (Kowata et al., 2010). They built a pipeline of tasks, which is displayed in figure 8.


They extracted their core concepts linguistically from parsed text: a noun phrase, verbal phrase and prepositional phrase are identified as a (concept - relation - concept) triple. From a dataset of 51,397 Brazilian-Portuguese sentences that were manually tagged with semantic relations, they selected roughly two percent of the text. Out of 927 sentences, 927 graphs were produced. To evaluate their result, they manually clustered the graphs into one of three categories: Not-understandable, None, or Understandable (the None category consisted of graphs that did not contain any links between elements). They concluded that their system has decent performance, but had problems with anaphora. Anaphora are expressions whose interpretation depends upon another expression in the context. The problem of anaphora is to be expected, as they do not consider any semantic dependency in their algorithm.

Oliveira et al. (2001) use an approach that attempts to deal with anaphora. Their research consisted of two separate modules: TextStorm, which extracted concept relations using linguistic techniques, and Clouds, a machine learning module that built the concept maps semi-automatically by use of “inductive learning”.

The TextStorm module started by tagging sentences with natural language processing tools. It found the parts of speech of all words in a sentence, then chunked the sentences according to their own grammar (which was not specified further). It then generated propositions by the use of a linguistic algorithm: the subject in a sentence was the main concept, the main verb was the linking phrase, and the object in the verb phrase was used as the second concept in the proposition.

The generated propositions were fed to the Clouds module, a machine learning system that built on feedback from its users. The Clouds module attempted to guess new relations by the use of inductive learning and proposed these relations to students. Students then provided feedback by rating these relations to improve its performance. For instance, in the concept map shown in figure 9, the dog concept was given two new relations: (dog - property - small) and (dog - have - fur). The system then induced that the dog concept could have the property relation friendly, because the cat concept shared these similarities.

Figure 9: Example of inductive learning system from Oliveira et al. (2001)

Oliveira et al. (2001) discuss a solution for the anaphora problem. Their solution considers the sentence directly preceding the anaphor. In the case of “John was hungry. He entered a restaurant”, the anaphor “he” is resolved by taking the concept “John” from the previous sentence.


Finally, we discuss research by Tseng et al. (2007). In their search to offer teachers support in creating their courseware, they analyzed students' historical test records. They split their research into two phases and dubbed their approach Two-Phase Concept Map Construction. In phase one, they pre-processed the historical test data. A table relating questions to concepts was created. They applied fuzzy association rule mining algorithms to find four association rules. They attempted to discover learning patterns and applied the mined association rules in the creation of new concept maps in phase two. Simplified, given two questions Q1 and Q2, if the concepts in Q1 are a prerequisite for the concepts in Q2, achieving a high grade on Q2 implies that a high grade on Q1 is achieved. Conversely, if a low grade on Q1 is achieved, the student may also get a low grade on Q2. This gives teachers insight into how to adapt their classes. They give the example of a physics class: concept 1 (speed and direction of motion) and concept 2 (change of speed and direction) are related (Tseng et al., 2007). If students do not perform well on concept 2, teachers can spend time teaching concept 1 to enhance the performance of students on concept 2. The resulting graph displayed the taught concepts and the (quantitative) strength of the relations towards other concepts.

2.4 Conceptual map generation review: looking back

Looking back on the previously discussed research, several distinctions can be identified. These distinctions relate to the purpose of the research, the data sources, the methods used to construct the conceptual maps, the resulting output and the evaluation of that output. These distinctions can be sorted into several categories. The section below briefly discusses the differences and their impact on the execution of the research.

2.4.1 Purpose of the research

The purpose of the research is a factor in deciding a number of the research's other attributes, such as the completeness of the generated output, the target audience, and the method of evaluation. The purposes of the reviewed research fall into two categories: analytical and educational.

The analytical category is characterized by the attempt to advance the work of domain experts. The following purposes are identified:

• Support domain experts in sharing and generating knowledge (Leake (2006); Kowata et al. (2010); Alves et al. (2001));

• Add to the analysis and understanding of text research (Alves et al., 2001);

The educational category is characterized by its purpose to use concept maps as educational tool and to extend its use in the classroom:


• Support students in assessing their own knowledge and sharing it with others, providing a collaboration tool (Clariana and Koul, 2004);

• Provide teachers with insights into possible modification of teaching practices (Tseng et al., 2007);

2.4.2 Structure of the data source

The data sources used for concept map construction in the reviewed research were in most cases dependent on the purpose of the research. For educational purposes, data sources related to education were often used. Broadly, we can identify structured and unstructured data sources. Unstructured text is considered to be regular text that is not pre-annotated, either computationally or manually. The data sources in approaches using unstructured texts were:

• Text books of a specific educational domain (Olney et al., 2011);

• Texts by domain experts (Leake, 2006);

• Non-specific unstructured text (Oliveira et al. (2001); Clariana and Koul (2004));

Data sources in the structured category can be defined as data that was annotated in some form. This category comprises the following types of data sources:

• Texts from a large corpus with semantic structures manually annotated (Kowata et al., 2010);

• Statistical data with annotations of related concepts (Tseng et al., 2007);

2.4.3 Methods used in the generation of conceptual maps

Although each of the authors has their own approach to constructing the conceptual maps, processes involving linguistic analysis are most common. Other methods can be categorized as machine learning techniques, statistical techniques or a combination of these.

The linguistic analysis approach deals with common language processing techniques, such as sentence boundary detection, tokenization, part-of-speech tagging and chunking. In some cases, external assets such as WordNet or NomBank are used to complement the performance of the system by finding morphological variations of words (Oliveira et al. (2001); Leake (2006); Olney et al. (2011)). These approaches build propositions from sentences and apply rules to filter results that are unlikely to make sense or add value.

The statistical approaches either make use of a quantitative data source or quantify the contents of the data source. In the former case, a relation is established between statistics and concepts, which is further analyzed to extract regularities (Tseng et al., 2007). Quantification of the data source involves counting the appearance of concepts in text (Clariana and Koul, 2004). Although the results of these approaches do not represent a conceptual map as formalised by Novak and Cañas (2006b), one can consider the results to be naturally “objective”, in that they are derived from hard data.


The machine learning techniques that are applied involve requesting feedback from users to increase the performance of the system. These techniques, however, are not used for the automatic construction of conceptual maps alone, but to complement the performance of the linguistic processes in an earlier phase (Olney et al. (2011); Oliveira et al. (2001)). As a result, the output of the two approaches combined is of significantly better quality than that of the other approaches.

2.4.4 The result output

The result of the used methods can be categorized into three classes: a fully connected graph; a connected graph without named relations; or merely proposition triples. As mentioned above, the highest quality output is achieved by a combination of linguistic and machine learning processes (Olney et al. (2011); Oliveira et al. (2001)); these generate a fully connected graph. Their results, however, are not sorted hierarchically, as intended by Novak and Cañas (2006b). Kowata et al. (2010) also claim to produce connected graphs, but it is suspected that they generate many “small” graphs containing only a few connected concepts.

Tseng et al. (2007) and Clariana and Koul (2004) produce connected concepts without describing the relation between these concepts. For the purpose of their research, the naming of linking phrases was not relevant.

Finally, the approach producing proposition triples is meant to provide information for readers to interpret as a text pre-processing step (Leake, 2006). As multiple propositions with the same starting concept can exist, they can connect the concept node to several other concepts with different relations. The fully connected graph is made manually after analysis of the propositions.

2.4.5 The evaluation of the results

Lastly, a distinction can be made between the methods used to evaluate the results. Generally, the researchers that performed an evaluation attempted either to identify the measure of “coverage” of their output on the initial data source, or to rate the quality of the output on a certain scale.

Authors differ in the way they measure coverage. For instance, Leake (2006) takes a set of pre-annotated documents and measures precision and recall only for the concepts that were extracted. Linking phrases and the proposition structure are not taken into consideration. Olney et al. (2011), on the other hand, take a combined approach: a domain expert creates a gold-standard concept map, and human judges then rate both concept maps on a scale that represents coverage. Oliveira et al. (2001) measure only the precision score.

Kowata et al. (2010) do not measure the coverage. Instead, they have domain experts rate the output on a three-point scale of comprehensibility. Finally, to complement their statistical approach, Tseng et al. (2007) use statistical measurements to rate the redundancy and circularity of the output.

The many different approaches to evaluating the output demonstrate the difficulty of rating concept maps properly. Indeed, Wandersee (2011) mentions that, because of their many features, concept maps do not lend themselves to a single dominant method of evaluation. An evaluator should consider whether more weight is to be given to, for instance, the hierarchical order, the scientific validity of the propositions, the labeling of relations, or the graphical effectiveness.

2.5 Language processing techniques in detail

Natural language processing (NLP) is concerned with the manipulation of natural language by computers. Applications of technologies based on natural language processing have become very common, such as speech recognition in mobile phones or language translation by online services such as Google Translate. Various linguistic processes solved by computational techniques were briefly mentioned in the literature review. Amongst these processes are sentence segmentation, tokenization, stemming, part-of-speech tagging and chunking. These processes form a pipeline of language analysis, and lie at the heart of proposition extraction. The following section describes the natural language processing pipeline and the difficulties that arise when attempting to solve these problems computationally.

Many NLP systems are based on supervised machine learning. Where in the past NLP systems were created manually with predefined rules and handmade dictionaries, the many exceptions in language and the availability of large corpora of text have allowed researchers to create NLP systems automatically. Machine learning based NLP systems, when sufficiently trained, are generally more robust than manually built NLP systems (Rajaraman and Ullman, 2011).

Supervised machine learning is the task of building a predictive model from training data. The training data consists of a set of inputs labelled or annotated with the correct answer. The supervised learning algorithm analyzes the data and seeks to create a predictive model that can classify data it has not encountered before. For instance, for a part-of-speech tagger, a corpus is provided in which words are manually labeled with their correct part of speech. The tagger uses its “experience” with the training data to solve the problem of tagging new data.

2.5.1 Sentence boundary detection

Because many of the natural language processes deal with separate words, one of the first steps in NLP is the segmentation of sentences. At the core, sentence segmentation is about splitting text into a sequence of logical units. As an example, take the following text:

"Bas talked to Jaimy. She was very upset."

A supervised machine learning sentence detector analyzes this sentence, and could generate two sentences as output:

1: "Bas talked to Jaimy." 2: "She was very upset."


In essence, this seems like a simple problem. Look for a period, exclamation mark or other forms of punctuation, and split the sentence at the position where it is encountered. However, this approach becomes inaccurate quickly because of the ambiguity of punctuation marks. For instance, consider the following input:

"Dr. Bas talked to Jaimy; she was very upset."

The sentence detection rule defined above would incorrectly output the following classification:

1: "Dr."

2: "Bas talked to Jaimy;" 3: "she was very upset."

Arguably, the input could be classified as a single sentence. Although new rules can be defined to deal with these kinds of exceptions, training a classifier with large amounts of varied annotated data will result in more robust output. After a text has been split into sentences, it is fed to the tokenizer.
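A trained sentence detector of this kind is available in Apache OpenNLP, the toolkit used later in this thesis. The sketch below is illustrative only and assumes the pre-built English sentence model (en-sent.bin) distributed by the OpenNLP project has been downloaded.

import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;

// Sketch of machine-learning based sentence boundary detection with Apache OpenNLP,
// assuming the pre-built English model file "en-sent.bin" is present.
public class SentenceDetection {
    public static void main(String[] args) throws Exception {
        try (InputStream modelIn = new FileInputStream("en-sent.bin")) {
            SentenceModel model = new SentenceModel(modelIn);
            SentenceDetectorME detector = new SentenceDetectorME(model);

            String[] sentences = detector.sentDetect("Dr. Bas talked to Jaimy; she was very upset.");
            for (String sentence : sentences) {
                System.out.println(sentence);
            }
            // A well-trained model should not split after the abbreviation "Dr.".
        }
    }
}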

2.5.2 Tokenization

The tokenizer has the task of detecting word boundaries, splitting a sentence into “tokens”. Similar to sentence boundary detection, this seems like a simple problem: analyze a sentence and split off a word whenever a whitespace character is encountered. In reality, the detection of individual words is not that simple. Consider the sentence:

"Bas doesn’t like the Rotterdam-based football club."

This sentence displays two problems: doesn’t could be split into two tokens, does and n’t (not), and Rotterdam-based could be split at the hyphen. A sufficiently large training set would catch exceptions that are easily overlooked by hand-written grammars. Tokenization often serves as a pre-processing step, dividing texts into units that are to be processed further. Even though it is considered one of the easier steps in the NLP pipeline, errors made at this stage will propagate to later stages of the process (Rajaraman and Ullman, 2011).
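As with sentence detection, OpenNLP ships a trained tokenizer. The sketch below is illustrative and assumes the pre-built English token model (en-token.bin) is available; the exact splits depend on the model used.

import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;

// Sketch of tokenization with Apache OpenNLP, assuming the pre-built English
// token model "en-token.bin" is present.
public class Tokenization {
    public static void main(String[] args) throws Exception {
        try (InputStream modelIn = new FileInputStream("en-token.bin")) {
            TokenizerModel model = new TokenizerModel(modelIn);
            TokenizerME tokenizer = new TokenizerME(model);

            String[] tokens = tokenizer.tokenize("Bas doesn't like the Rotterdam-based football club.");
            // A trained model typically splits "doesn't" into "does" and "n't",
            // while keeping "Rotterdam-based" together; exact output depends on the model.
            for (String token : tokens) {
                System.out.println(token);
            }
        }
    }
}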

2.5.3 Stemming and Lemmatization

One of the tasks in text analysis is determining the importance of specific words within a text. A method to achieve this is counting the frequency with which words appear within the text or corpus. However, because of the many morphological variations of words, treating each word as is will result in a distorted view of the outcome. For instance, the words “breaks”, “broke” and “broken” are all variations of the word break. To determine the frequency of words, a form of normalization should be performed.


Stemming and lemmatization are two methods to deal with this problem. Stemming is the process of removing affixes from a word, reducing it to a base form, or stem. Stemming is usually performed by algorithms that chop off the ends of words. This means that the reduced form is not always a valid morphological root. For instance, consider the words “rescue” and “rescuing”. A stemmer would reduce both words to the base form “rescu”. This makes stemming a rather blunt approach. Lemmatization, on the other hand, reduces words to linguistically valid lemmas. This is done by comparing words to records in a lexical database. Confronted with the sentence below:

"Those men have broken all their promises"

A lemmatizer would compare the words with a database and could output the following result:

those man have break all their promise

Notice that it identified the lemmas of words with irregular conjugations. Lemmatization is in theory also capable of taking the context of a word into account to return the appropriate result. The word “meeting” can be interpreted as either a noun or a verb. In the case of a noun, a lemmatizer would return the lemma “meeting”. If the word is used as a verb, it would result in the lemma “meet”. In this sense, a lemmatizer is a more sophisticated method of normalization.
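To illustrate how blunt stemming can be, the sketch below uses the Porter stemmer that ships with recent versions of Apache OpenNLP; the exact outputs depend on the stemmer implementation, and a lemmatizer backed by a lexical database such as WordNet would be needed for the irregular forms.

import opennlp.tools.stemmer.PorterStemmer;

// Sketch of stemming with the Porter stemmer included in recent Apache OpenNLP
// versions. Lemmatization against a lexical database is more involved and not shown.
public class Stemming {
    public static void main(String[] args) {
        PorterStemmer stemmer = new PorterStemmer();
        System.out.println(stemmer.stem("rescue"));   // "rescu"
        System.out.println(stemmer.stem("rescuing")); // "rescu"
        System.out.println(stemmer.stem("breaks"));   // "break"
        System.out.println(stemmer.stem("broken"));   // "broken" - irregular forms are not resolved,
                                                      // which is where a lemmatizer is needed
    }
}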

2.5.4 Part-of-speech tagging

A part-of-speech (POS) tagger denotes the part of speech of each word inside a sentence, such as noun, verb or adjective. As input, the POS tagger takes the tokens identified at the previous stage. Because a word can represent a different part of speech depending on the way it is used, a word cannot simply be assigned a single fixed tag. This is exemplified in the following sentences:

1: "Can you answer my difficult question?"

2: "What is your answer to my difficult question?"

In the first sentence, the word answer is expressed as a verb. However, in the second sentence, answer can be classified as a noun.

One of the ways a trained NLP system can identify the POS tag is by a combination of frequency-based tagging and transformation-based tagging. First, it checks the most frequently occurring tag for a token. The system then takes the context into consideration by looking at the sentence as a whole. If a token most often occurs as a verb (frequency based), but is sufficiently often tagged as a noun when preceded and succeeded by certain kinds of words, the tag is changed (transformation based).

Taking our previous example, we can analyze the following results of a trained POS tagger:

1: Can/MD you/PRP answer/VB my/PRP$ difficult/JJ question/NN?

2: What/WP is/VBZ your/PRP$ answer/NN to/TO my/PRP$ difficult/JJ question/NN?

In the first sentence, answer is classified as VB (verb, base form). In the second sentence, answer is classified as NN (noun, singular). These notations are commonly used abbreviations defined by the Penn Treebank Project, a project that set out to generate a large annotated corpus. The full range of notations can be found in appendix A. After tokens have been tagged with their POS, they can be “chunked” together to reason about which parts of the text belong together.
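The sketch below shows how such a tagger can be run with Apache OpenNLP, assuming the pre-built English maxent model (en-pos-maxent.bin) is available; the tags produced may differ slightly from the hand-written example above depending on the model.

import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;

// Sketch of part-of-speech tagging with Apache OpenNLP, assuming the pre-built
// English maxent model "en-pos-maxent.bin" is present. Input is the token
// sequence produced by the tokenizer.
public class PosTagging {
    public static void main(String[] args) throws Exception {
        try (InputStream modelIn = new FileInputStream("en-pos-maxent.bin")) {
            POSModel model = new POSModel(modelIn);
            POSTaggerME tagger = new POSTaggerME(model);

            String[] tokens = {"What", "is", "your", "answer", "to", "my", "difficult", "question", "?"};
            String[] tags = tagger.tag(tokens);
            for (int i = 0; i < tokens.length; i++) {
                System.out.println(tokens[i] + "/" + tags[i]); // e.g. answer/NN in this context
            }
        }
    }
}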

2.5.5 Chunking

Chunking is a technique used to segment and label text into syntactically correlated sequences. It is a pre-processing step for natural language understanding, and forms a basis for the extraction of propositions from text. Chunking is concerned with applying a “grammar” learned from annotated training data. The grammar consists of rules that determine how sentences are chunked. For example, one of the rules could be that a Noun Phrase (NP) chunk should be formed whenever a noun (NN) preceded by a personal pronoun (PRP) is found. The abbreviated notations and their definitions can be found in appendix B. Analysing the sentence used above, a chunker could produce the result shown in figure 10.

Figure 10: Breakdown of a proposition

Chunking is an important step in the identification of entities. Imagine the problem of trying to identify persons, organisations or locations in text. These terms can consist of multiple words. In the sentence “Bas Sijtsma visited the United States of America”, a chunker could identify that the words “Bas” and “Sijtsma”, or “the”, “United”, “States”, “of”, and “America”, together form single entities.
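The OpenNLP chunker consumes the tokens together with their POS tags, as in the sketch below. The pre-built English chunker model (en-chunker.bin) is assumed, and the POS tags shown are illustrative; in practice they come from the tagger in the previous step.

import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.chunker.ChunkerModel;

// Sketch of chunking with Apache OpenNLP, assuming the pre-built English chunker
// model "en-chunker.bin" is present. The chunker outputs labels such as B-NP
// (begin noun phrase) and I-NP (inside noun phrase).
public class Chunking {
    public static void main(String[] args) throws Exception {
        try (InputStream modelIn = new FileInputStream("en-chunker.bin")) {
            ChunkerModel model = new ChunkerModel(modelIn);
            ChunkerME chunker = new ChunkerME(model);

            String[] tokens = {"Bas", "Sijtsma", "visited", "the", "United", "States", "of", "America"};
            String[] tags   = {"NNP", "NNP", "VBD", "DT", "NNP", "NNPS", "IN", "NNP"}; // illustrative tags
            String[] chunks = chunker.chunk(tokens, tags);
            for (int i = 0; i < tokens.length; i++) {
                System.out.println(tokens[i] + "\t" + tags[i] + "\t" + chunks[i]);
            }
        }
    }
}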

2.5.6 Parse tree

A parse tree is an ordered representation of the syntactic structure of a sentence. It attempts to identify the possible relations between parts of a sentence in the form of nodes and branches. It does so by the use of context-free grammars, a set of rules to generate patterns of strings. POS tagged tokens and chunks created in the previous stages serve as input for the tree construction. However, a parse tree is a representation of the likely interpretation of a sentence. This can be problematic. Consider the sentence:

"Driving cars can be dangerous."

Subtle meanings of words can affect how the sentence is interpreted. The sentence above could mean that the act of driving a car is dangerous, or that cars that are driving are dangerous. This would be represented in different parse trees.

To generate a parse tree, the parser needs detailed semantic knowledge of the specific words in a sentence. It needs to be able to predict whether dangerous refers to driving or to cars. The actual output of the parser depends on the training data that served as input and the grammar rules that were inferred from it. Parse trees can be used to infer meaning from sentences. For example, Rusu et al. (2007) use parse trees to identify the subject and object in sentences.
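A constituency parser is also available in OpenNLP. The sketch below is illustrative only and assumes the pre-built English parser model (en-parser-chunking.bin) is available; it asks for the single most likely parse, so the ambiguity discussed above is resolved implicitly by the model.

import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.cmdline.parser.ParserTool;
import opennlp.tools.parser.Parse;
import opennlp.tools.parser.Parser;
import opennlp.tools.parser.ParserFactory;
import opennlp.tools.parser.ParserModel;

// Sketch of constituency parsing with Apache OpenNLP, assuming the pre-built
// English parser model "en-parser-chunking.bin" is present.
public class Parsing {
    public static void main(String[] args) throws Exception {
        try (InputStream modelIn = new FileInputStream("en-parser-chunking.bin")) {
            ParserModel model = new ParserModel(modelIn);
            Parser parser = ParserFactory.create(model);

            // Ask for the single most likely parse of the (ambiguous) example sentence.
            Parse[] parses = ParserTool.parseLine("Driving cars can be dangerous .", parser, 1);
            for (Parse p : parses) {
                p.show(); // prints the bracketed parse tree
            }
        }
    }
}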

2.6 Proposition extraction algorithm

As has been discussed, conceptual maps consist of a set of linked propositions in a hierarchical structure. This puts proposition extraction at the center of the problem. As briefly mentioned in the literature review above, natural language processing approaches make use of triple extraction algorithms. These algorithms consist of specific grammars: a rule set that defines what constitutes a proposition. For instance, consider the following sentence:

Bas eats pizza on Sundays.

A very basic grammar could define that a proposition consists of the sequence (noun - verb - noun). Running this rule set over the example sentence, the proposition (Bas - eats - pizza) would be extracted.

One very promising algorithm to extract propositions comes from Fader et al. (2011). They start by criticizing algorithms defined by others in the field of information extraction by stating that the resulting linking phrases are often incoherent and omit critical information. This is caused by algorithms that prioritize the extraction of nouns as concepts. They provide the following scenario (Fader et al., 2011):

1. "Faust made a deal with the devil" 2. Faust - made - a deal

3. Faust - made a deal with - the devil

The first sentence is the example sentence. The second is a proposition extracted by algorithms prioritizing concepts: nouns are selected first, followed by the selection of a verb to bridge the nouns. The third is a proposition extracted by their proposed algorithm, which prioritizes linking phrases. Clearly, the third proposition is much more informative. The algorithm selects linking phrases based on a syntactic and a lexical constraint. The syntactic constraint consists of three rules. These rules state that a relation phrase should match one of the patterns in table 1:


1. A verb (e.g. “was”)
2. A verb followed by nouns, adjectives or adverbs, ending in a preposition (e.g. “was in”)
3. Multiple adjacent sequences of the above (e.g. “was primarily built in”)

Table 1: Syntactic constraints of the proposition extraction algorithm

However, the syntactic constraint can match linking phrases that are very long, and therefore overly specific. To combat this, a lexical constraint is used. Fader et al. (2011) analyzed a corpus of 500 million sentences. Linking phrases were matched with concepts according to the syntactic constraint in table 1. Then, all linking phrases that occur with fewer than 20 distinct concept pairs are filtered out. This results in 1.7 million distinct relation phrases. Simplified: if a linking phrase occurs with fewer than 20 different concept pairs, it is probably too specific to be meaningful.
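A simplified version of the syntactic constraint can be expressed as a regular expression over the POS tags of a sentence, as sketched below. This is only an approximation of the idea in table 1, not Fader et al.'s implementation; mapping the matched tag span back to the tokens and applying the lexical constraint are omitted.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified sketch of the syntactic constraint in table 1: a relation phrase is
// matched against the sequence of POS tags of a sentence. V = verb,
// W = noun/adjective/adverb/pronoun/determiner, P = preposition/particle;
// the pattern is (V | V W* P)+, applied to a space-separated tag string.
public class SyntacticConstraint {
    // VB, VBD, VBZ, ... all start with "VB"; NN* nouns, JJ* adjectives, RB* adverbs,
    // PRP(  $) pronouns, DT determiners, IN prepositions, RP particles, TO "to".
    private static final String V = "VB\\w*";
    private static final String W = "(?:NN\\w*|JJ\\w*|RB\\w*|PRP\\$?|DT)";
    private static final String P = "(?:IN|RP|TO)";
    private static final Pattern RELATION =
            Pattern.compile("(?:" + V + " (?:(?:" + W + " )*" + P + " )?)+");

    public static void main(String[] args) {
        // POS tags of "Faust made a deal with the devil" (tags are illustrative).
        String tags = "NNP VBD DT NN IN DT NN ";
        Matcher m = RELATION.matcher(tags);
        while (m.find()) {
            System.out.println("relation tag span: " + m.group().trim()); // VBD DT NN IN
        }
    }
}

The matched span corresponds to “made a deal with”, i.e. the longer, more informative linking phrase preferred by the algorithm.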

After a linking phrase is identified, the algorithm attempts to find the concepts related to it. To find the related concepts, it searches for the noun-phrase chunks closest to the identified linking phrase. The noun phrase to the left of the linking phrase serves as the first concept. This noun phrase is not allowed to be an anaphor. The noun phrase to the right of the linking phrase serves as the second concept. If no such pair of noun phrases can be found, no proposition is extracted. Finally, when the noun phrase to the left of the linking phrase does consist of an anaphor, the algorithm attempts to perform anaphora resolution in a similar way to Oliveira et al. (2001). When a word such as “which” is found, it is skipped as a concept, and the noun phrase to the left of the anaphor is selected. Consider the following example:

"Bas lives in Amsterdam, which is the capital of Holland"

The first proposition is extracted without problems: (Bas - lives in - Amsterdam). Then, the algorithm identifies the linking phrase “is”, skips over the word “which” (an anaphor) to the noun “Amsterdam”, and forms the proposition (Amsterdam - is - the capital of Holland).

The algorithm was compared to a set of state-of-the-art proposition extraction tools on precision and recall scores. Its performance was also rated manually by human judges. It was concluded that the algorithm outperformed the other tools substantially (Fader et al., 2011).

2.7 Concluding

In the section above, the theoretical basis of the conceptual map was discussed. The components of a conceptual map (concepts, linking phrases and propositions) and its hierarchical structure were described. Other knowledge representations that share similarities with the conceptual map were briefly mentioned to demonstrate their different purposes. The development of a new tool by Novak, the skeleton concept map, was introduced.

Afterwards, research dedicated to generating conceptual maps computationally was reviewed. Several distinctions were identified in the purpose of the research, the structure of the data source, the methods used in the approach, the resulting output and the method of evaluation. An important aspect in most of the reviewed research was the use of natural language processing tools, which were then described in detail.

Clearly, the natural language processing tools that were described provide useful input for the purpose of information extraction. These tools serve as the basis for the extraction of propositions by algorithms. In the following section, a method is proposed to generate conceptual maps from text based on reviewed research, natural language processing tasks and one such proposition extraction algorithm.


3 Methodology

Categorical differences were distinguished in the literature review. The approach to generating conceptual maps proposed in this thesis is subsequently described in the same fashion. Though the purpose of this research has been discussed in the introduction, it is briefly repeated for consistency. After that, the data source is defined. Following that, the methods used to extract the conceptual map from the input data are explained. Thereafter, the resulting output and the method of evaluation are described.

3.1 Purpose of research

This thesis was written with several goals in mind. The first goal was to provide students with a tool to pre-process text. This could potentially help them in gaining an understanding of the input data by providing a foundation of the information. This could be especially helpful if the input data source is large and complex, such as the text being analyzed in this research. The second goal was to provide learners with the opportunity to self-assess their knowledge by confronting them with specific information. The third and final goal is to tackle the problem of large quantities of available data. The proposed approach could provide a tool to extract knowledge in a time-saving manner.

The first two goals clearly fit the educational category. The third goal could be said to benefit both the educational and the analytical purpose depending on the context in which the tool is used.

3.2 Type of data

Two types of data sources were identified in the literature review: structured and unstructured data. Generally, a structured data source is either annotated text or quantitative data. Any text not containing meta-data about its content is placed in the unstructured data category. This means that a text could contain enumerations or lists, and appear structured in a way, but still be defined as unstructured. For the purpose of this research, legal documents provided by the United Kingdom's Supreme Court are used (Court, 2014a). These are unstructured documents, containing background information on an appeal and substantiations of a verdict. The Supreme Court provides both the full-length judgement, some containing up to 130 pages, and a press summary.

These documents were selected because of their complexity, their length, their consistency and their availability. The complexity and length of the text are attributes that can be troublesome for students. If concept maps of sufficient quality are created, they could substantially help students in their understanding of the documents. Imagine, for instance, comparing two verdicts with similar backgrounds. The consistency of the documents allows some form of generalization of performance across the court's entire database. Finally, the United Kingdom's Supreme Court provides these documents for an indefinite amount of time, ensuring that any reader attempting to access them later will be able to do so.


A single document is used for the construction of a conceptual map. This is done to ensure that the output of the approach is somewhat bound within a subject and not focused around many different domains. Of course, this is a very simplified model of reality. The single input document may contain multiple topics that require distinct concept maps. Also, multiple documents could refer to the same domain, making them appropriate for merging to generate a more complete image of the subject. Therefore, this approach requires the assumption that the input data source is specifically related to a single domain. This assumption is evaluated in the analysis.

3.3 Conceptual map extraction method

The linguistic analysis approach is selected for the extraction of conceptual maps. This approach is chosen for two reasons. First, natural language processing has great potential to deal with unstructured text. Using statistical approaches would require quantifying the input data in some way, and one can easily imagine information getting lost in that translation. It is therefore not deemed appropriate for the purpose of this research. Second, the availability of free, high-quality natural language processing libraries makes this approach very accessible. Many excellent researchers have contributed to these libraries to tackle problems that no longer have to be dealt with in this thesis. Although there is great potential for machine learning techniques to improve the quality of the extractions, this is left for further research in order to bound the scope of this thesis.

Architecture of the system

The construction of a conceptual map will proceed in a series of steps, as shown in figure 11. The steps are explained in detail below.

Figure 11: Architecture of the conceptual map building

Step 1: Document NLP parsing

The first step consists of preparing a document for input and performing the natural language processing pipeline on its content. The preparation of the document is done by saving the text that will be analyzed to a separate file. The text remains unaltered otherwise. The file is then read by the system, which performs the sentence boundary detection, tokenization, part-of-speech tagging and chunking with the help of Apache OpenNLP.

OpenNLP is a machine learning based NLP toolkit. Although there are many alternatives available online, OpenNLP was selected based on four criteria. Firstly, OpenNLP is capable of performing all NLP tasks that are required. Secondly, one of the goals of OpenNLP is to provide a large number of pre-built models for the training of classifiers. These models were freely available for download, which meant that much functionality was available out-of-the-box. While some other toolkits require pre-defined grammars to perform the task of chunking, OpenNLP is able to run this task based on the training models alone. Thirdly, it provides a minimal interface in Java, with which the author already had some experience. Finally, it is (somewhat) actively developed, with a large community that can provide support.
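A minimal sketch of this step, assuming the pre-built English models distributed by the OpenNLP project (en-sent.bin, en-token.bin, en-pos-maxent.bin, en-chunker.bin) and a placeholder input file name, could look as follows; it illustrates how the pipeline can be wired together rather than the exact implementation used here.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.chunker.ChunkerModel;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;

public class NlpPipeline {
    public static void main(String[] args) throws Exception {
        // Load the pre-built English models distributed by the OpenNLP project.
        SentenceDetectorME sentenceDetector;
        TokenizerME tokenizer;
        POSTaggerME tagger;
        ChunkerME chunker;
        try (InputStream sent = new FileInputStream("en-sent.bin");
             InputStream tok = new FileInputStream("en-token.bin");
             InputStream pos = new FileInputStream("en-pos-maxent.bin");
             InputStream chunk = new FileInputStream("en-chunker.bin")) {
            sentenceDetector = new SentenceDetectorME(new SentenceModel(sent));
            tokenizer = new TokenizerME(new TokenizerModel(tok));
            tagger = new POSTaggerME(new POSModel(pos));
            chunker = new ChunkerME(new ChunkerModel(chunk));
        }

        // Read the prepared judgement text (the file name is a placeholder).
        String document = new String(
                Files.readAllBytes(Paths.get("judgement.txt")), StandardCharsets.UTF_8);

        // Sentence boundary detection, tokenization, POS tagging and chunking.
        for (String sentence : sentenceDetector.sentDetect(document)) {
            String[] tokens = tokenizer.tokenize(sentence);
            String[] tags = tagger.tag(tokens);
            String[] chunks = chunker.chunk(tokens, tags);
            for (int i = 0; i < tokens.length; i++) {
                System.out.println(tokens[i] + "\t" + tags[i] + "\t" + chunks[i]);
            }
        }
    }
}
```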

Step 2: Extract all nouns

As stated in the first chapter, concepts are commonly represented by nouns (Novak and Cañas, 2006b). For the purpose of finding the concepts in the text, all nouns tagged in the previous step will be extracted and saved in a separate file. The consequence of this approach is that any concept not represented by a noun, or any concept missed by the POS tagger, will not be extracted. A sample of the input text will be evaluated to determine the scale of this problem.
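As an illustration of this step, the noun filter can operate directly on the token and tag arrays produced in step 1, since all Penn Treebank noun tags start with "NN" (NN, NNS, NNP, NNPS); the class name and example tokens below are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class NounExtractor {
    // Keep every token whose Penn Treebank tag marks it as a noun (NN, NNS, NNP, NNPS).
    public static List<String> extractNouns(String[] tokens, String[] tags) {
        List<String> nouns = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tags[i].startsWith("NN")) {
                nouns.add(tokens[i]);
            }
        }
        return nouns;
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "Secretary", "dismissed", "the", "claims"};
        String[] tags   = {"DT", "NNP", "VBD", "DT", "NNS"};
        System.out.println(extractNouns(tokens, tags)); // [Secretary, claims]
    }
}
```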

Step 3: Match morphological variations

Several variations of the same noun may be extracted. To deal with this, one can perform the task of stemming and lemmatization. As discussed in an earlier chapter, lemmatization is a more sophisticated approach if the lexical resource used is sufficiently large. Lemmatization can also return the appropriate lemma depending on the part of speech. Because we have already performed the POS tagging in an earlier step, we will make use of this opportunity to achieve the best results.

The lexical database that will be used for this step is WordNet (Miller, 1995). It is one of the largest freely available lexical resources that can be used for the process of lemmatization, containing over 150,000 words in total. It is also capable of returning the correct lemma based on the POS tag.
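A particular Java interface to WordNet is not prescribed here; as an illustration, the sketch below assumes the MIT Java WordNet Interface (JWI) and a placeholder path to a local WordNet installation.

```java
import java.net.URL;
import java.util.List;

import edu.mit.jwi.Dictionary;
import edu.mit.jwi.IDictionary;
import edu.mit.jwi.item.POS;
import edu.mit.jwi.morph.WordnetStemmer;

public class Lemmatizer {
    public static void main(String[] args) throws Exception {
        // Path to a local WordNet "dict" directory (placeholder).
        String wordnetDict = "/usr/local/WordNet-3.0/dict";
        IDictionary dict = new Dictionary(new URL("file", null, wordnetDict));
        dict.open();

        // WordnetStemmer maps an inflected form to its possible lemmas,
        // taking the part of speech into account.
        WordnetStemmer stemmer = new WordnetStemmer(dict);
        List<String> lemmas = stemmer.findStems("appellants", POS.NOUN);
        System.out.println(lemmas); // expected: [appellant]

        dict.close();
    }
}
```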

Step 4: Rank nouns in order of frequency

Now that all variations of the same noun have been matched, the term frequency measure can be used to determine the rank of a potential concept. It can be questioned whether term frequency is the right measure to rank concepts with. Normalizing term frequency across a corpus, for instance with term frequency-inverse document frequency, could be another suitable approach for this task. However, the goal of this research is to generate conceptual maps from a single data file. Term frequency within the input data is therefore deemed most appropriate, and will be evaluated on its performance.
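A minimal sketch of this ranking step is shown below; the example lemmas are placeholders.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConceptRanker {
    // Count how often each lemma occurs and print the lemmas in descending order of frequency.
    public static void main(String[] args) {
        List<String> lemmas = Arrays.asList(
                "evidence", "appeal", "evidence", "report", "evidence", "appeal");

        Map<String, Integer> frequency = new HashMap<>();
        for (String lemma : lemmas) {
            frequency.merge(lemma, 1, Integer::sum);
        }

        frequency.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .forEach(e -> System.out.println(e.getKey() + "\t" + e.getValue()));
    }
}
```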

Step 5: Apply proposition extraction algorithm

The system returns to the input document that was parsed in step 1. Because the text is now annotated with its POS tags and chunks, it is ready for the proposition extraction algorithm. This step extracts both the linking phrases and the concepts, and returns them in the format (concept - linking phrase - concept).

The extraction of propositions will be done according to the algorithm proposed by Fader et al. (2011). This is done for four reasons. Firstly, a tried and tested extraction algorithm defined by experienced researchers is likely to be of better quality than an algorithm the author of this thesis could come up with within a reasonable time span. Secondly, as described in the literature review, it outperforms many other state-of-the-art proposition extraction tools. Thirdly, other algorithms require the generation of a parse tree. Not only is this a computationally expensive task, it is also sensitive to ambiguity. As has been explained, the performance of a parser is very much dependent on the type of training data that is provided. The available training models are not specifically focused on the same type of text as our input data. Although this assumption is not tested, it can possibly hurt the performance of the parser and consequently the performance of the proposition extraction. Finally, Fader et al. (2011) are the only authors that made their extraction algorithm fully available for others to use. Other authors omitted details related to the extraction algorithm or did not specify the algorithm at all.
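The full algorithm is available as the ReVerb software released by Fader et al. (2011). As a rough illustration of the idea only, the heart of their extractor is a syntactic constraint on the POS tag sequence of a candidate relation phrase: a verb, optionally followed by nouns, adjectives, adverbs, pronouns or determiners and ending in a preposition, particle or infinitive marker. The sketch below expresses a simplified version of that constraint as a regular expression over Penn Treebank tags; the class and method names are illustrative, not part of the ReVerb API.

```java
import java.util.regex.Pattern;

public class RelationPattern {
    // Simplified version of the ReVerb syntactic constraint (Fader et al., 2011),
    // applied to a space-joined sequence of Penn Treebank POS tags.
    private static final Pattern RELATION = Pattern.compile(
            "VB[DGNPZ]?"                                        // V: a single verb tag (VB, VBD, VBG, VBN, VBP, VBZ)
            + "(( (NN[PS]{0,2}|JJ[RS]?|RB[RS]?|PRP\\$?|DT))*"   // W*: nouns, adjectives, adverbs, pronouns, determiners
            + " (IN|TO|RP))?");                                 // P: preposition, infinitive marker or particle

    public static boolean isRelationPhrase(String[] tags) {
        return RELATION.matcher(String.join(" ", tags)).matches();
    }

    public static void main(String[] args) {
        System.out.println(isRelationPhrase(new String[]{"VBD", "IN"}));             // e.g. "was in" -> true
        System.out.println(isRelationPhrase(new String[]{"VBZ", "DT", "NN", "IN"})); // e.g. "gives an account of" -> true
        System.out.println(isRelationPhrase(new String[]{"NN", "IN"}));              // e.g. "weight of" -> false
    }
}
```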

Step 6: Perform proposition filtering

Because the proposition extraction algorithm only analyzes the annotations of the data and does not consider its content (apart from its lexical constraint), some propositions will be extracted that are not useful for representation in a concept-map. In this step, two filters will be applied to narrow down the list of available propositions. The first filter that is carried out is the repetition filter, as proposed by Olney et al. (2011). The filter works by comparing the two concepts in the proposition with each other. If the text in one of the concepts is identical to the text in the other concept, the proposition is discarded.

The second filter serves to clean up propositions consisting mostly of symbols, special characters and punctuation. This filter is a safety net in case the NLP methods fail to catch and properly deal with these types of characters. Whenever more than half of an item in the proposition is not regular text, the proposition is discarded.
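A sketch of the two filters is given below, assuming a proposition is represented as a three-element array (concept, linking phrase, concept); the exact representation in the actual system may differ.

```java
public class PropositionFilters {

    // Repetition filter (Olney et al., 2011): discard a proposition whose two
    // concepts are identical after trimming and case-folding.
    public static boolean passesRepetitionFilter(String[] proposition) {
        String left = proposition[0].trim().toLowerCase();
        String right = proposition[2].trim().toLowerCase();
        return !left.equals(right);
    }

    // Symbol filter: discard a proposition if, for any of its three items, more
    // than half of the characters are not letters, digits or whitespace.
    public static boolean passesSymbolFilter(String[] proposition) {
        for (String item : proposition) {
            long irregular = item.chars()
                    .filter(c -> !Character.isLetterOrDigit(c) && !Character.isWhitespace(c))
                    .count();
            if (irregular * 2 > item.length()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] repeated = {"the tribunal", "accepted", "the tribunal"};
        String[] useful   = {"the Secretary of State", "dismissed", "the claims"};
        String[] garbled  = {"evidence", "is", "*** §§ ---"};

        System.out.println(passesRepetitionFilter(repeated));                            // false
        System.out.println(passesRepetitionFilter(useful) && passesSymbolFilter(useful)); // true
        System.out.println(passesSymbolFilter(garbled));                                  // false
    }
}
```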

Step 7: Select only propositions containing top ranked nouns

A concept map often consists of approximately thirty different concepts (Novak and Cañas, 2006b). Therefore, only propositions containing at least one of the thirty most frequently occurring nouns identified in step 4 will be selected. Propositions not containing any of these nouns will be discarded.

The amount of output that the proposition extraction algorithm generates is proportional to the length of the text. This step is an attempt to ensure that only propositions containing critical information remain. The result of this step is a list of propositions in hierarchical order.

Step 8: Combine propositions into skeleton concept map

The final step consists of putting together the selected propositions. In this step, propositions are manually selected and combined to form the skeleton concept map. Items taken from the list of most important concepts generated in step four will be put in the concept “parking lot”, where they can be used by a learner to connect to the rest of the graph.

3.4 The resulting output

Because the proposed method is capable of returning propositions with named relations, the output of the system will be a set of propositions ordered by their importance. It is expected that many propositions share similar concepts. These will be combined into a skeleton concept map by hand. The graphical tool used to represent the concept map is considered to be arbitrary.

3.5 Evaluating the output

As illustrated in the theory review, because of the many features of a concept map, no single dominant approach for evaluation exists yet. Regardless, a commonly used technique for evaluating various aspects of a concept map is a gold-standard approach. To evaluate the results of the proposed approach, a method somewhat similar to that of Leake (2006) is used. A set of instructions to construct a conceptual map is provided to a domain expert. These instructions are taken directly from research by Novak and Cañas (2006b). The expert is then given the same data source as used in this research, and attempts to construct a concept map from it. All propositions generated by the domain expert are extracted and compared to the propositions that were automatically constructed. Using the gold-standard propositions, measures of precision and recall can be calculated. All distinct concepts identified by the domain expert are used as “correct” concepts. The precision score is calculated as the ratio of correctly extracted concepts to the total number of extracted concepts. The recall score is calculated as the ratio of correctly extracted concepts to the total number of “correct” concepts identified by the domain expert. These two measures serve as a way to rate the coverage of the used method on the input data.
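Written out, the two measures as described above amount to the standard definitions:

```latex
\mathrm{precision} = \frac{|\text{correct concepts} \cap \text{extracted concepts}|}{|\text{extracted concepts}|}
\qquad
\mathrm{recall} = \frac{|\text{correct concepts} \cap \text{extracted concepts}|}{|\text{correct concepts}|}
```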

After the propositions have been evaluated, the skeleton concept map that was created is critiqued. Shortcomings and potential improvements are considered. Finally, observations about the overall quality are made to serve as feedback for the conclusion of this thesis.


4 Analysis

4.1 Evaluation of input data

The proposed methodology is tested on a Judgement document from the United Kingdom’s Supreme Court (Court, 2014a). The document was randomly selected from the first page of the Supreme Court website. It describes the judgement made on the 21st of May 2014. It contains twenty-five pages: the first four consist of an introduction of the case, followed by six pages of information related to the appeal. Then, three pages describe the decision of the Upper Tribunal. The remainder of the document describes the issues that the Supreme Court has to consider, and the arguments that led it to its final judgement.

Summarized, the case deals with the use of expert evidence in asylum appeals. In asylum appeals, a crucial issue is whether an appellant is honest about where they come from. The appellants claimed to come from a particular region of Somalia where they were at risk of persecution. However, based on linguistic evidence, the Secretary of State dismissed those claims, stating that their speech was linked to Kenya instead. The linguistic evidence was in the form of linguistic analysis reports, provided by a Swedish commercial organisation called “Sprakab”. Sprakab’s methods involve listening to audio recordings of speech, followed by interviews with the speaker, which are processed by anonymous linguists and analysts.

During the case, changes were implemented in the process of hearing asylum appeals. A new set of regulations (Practice Directions) that applied to the immigration and asylum chambers was issued, containing guidance on the principles for applying expert evidence. The Practice Directions contained no specific guidance for evidence in the form of the Sprakab Reports. The Supreme Court was to consider five issues: whether the immigration judges were to attribute weight to the reports generated by Sprakab; whether the witnesses from the Sprakab organisation are to be granted anonymity; whether there are any particular rules governing expert evidence offered by organisations instead of individuals; to what extent evidence not in a form specified by the Practice Directions can be accepted; and to what extent the Upper Tribunal can give guidance on the weight that is given to the expert evidence reports.

The appeal is unanimously dismissed by the Supreme Court. It decided that the Sprakab expert evidence reports, provided the process is properly checked, were to be accepted. The weight given to these reports should be examined critically in any future cases. The expert witnesses’ anonymity is justified in this particular case, but it remains to be determined in any future cases.

After analyzing the document, it can be concluded that it seems fit for the purpose of this research. As mentioned in the methodology, for the document to be suitable for the automated extraction, it should not cover many different topics. This is to ensure that a single concept map is the appropriate tool to coherently cover all information. The summary above mentions concepts that mostly relate to the same domain: the use of linguistic analysis by experts as evidence, and practice directions and regulations related to the use of this type of evidence. Furthermore, using the suggested approach on this document seems suitable for the educational purposes this thesis intends to address, as it is quite lengthy and rather complex. It took several read-throughs for the author of this thesis to fully comprehend the issues that this judgement deals with.

4.2 Evaluation of the language processing tasks

The extraction of concepts proceeds according to the natural language processing pipeline. From the POS tagged text, all nouns are extracted. This means that any errors made in the identification of nouns by the tagger will propagate into the steps that follow. To evaluate the performance of the tagger, the Press Summary document of the same judgement (Court, 2014b) is analyzed. Firstly, all nouns in the text were annotated by hand. Then, the NLP pipeline was run over the document and compared with the manually annotated data. After the comparison, the precision and recall measures were calculated. The results are displayed in table 2.

Extracted nouns: 263
Correctly tagged nouns: 263
Total nouns in data source: 266
Precision (correctly tagged nouns / extracted nouns): 100%
Recall (correctly tagged nouns / total nouns in data source): 98.87%

Table 2: Performance of POS tagger

Out of 960 tokens, the tagger identified 263 nouns, missing only 3 nouns by mistaking them for verbs or adjectives. It did not incorrectly tag any tokens as nouns that were in fact of a different part of speech. It can be concluded that the tagger performed exceptionally well in the task of noun tagging, which means that concepts are extracted from the press summary with high reliability. It cannot be said with certainty that the NLP tasks carried out on the full case judgement text performed just as well. However, because the language is very similar in both types of text, the high scores on the press summary allow the assumption that the performance is at least adequate.
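For reference, filling in the counts reported above gives:

```latex
\mathrm{precision} = \frac{263}{263} = 100\%
\qquad
\mathrm{recall} = \frac{263}{266} \approx 98.87\%
```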

4.3 Evaluation of concept ranking

To rank concepts, nouns were extracted from the full judgement text and lemmatized with the lexical database. All identical lemmas were summed to count the frequency of appearance. In total, 2930 nouns were extracted from the judgement. Out of these nouns, 674 distinct lemmas were identified, 518 of which occurred fewer than five times. A list of the fifty-one most frequently occurring lemmas can be found in table 3. The full list of lemmas and their frequencies can be found in appendix C.
