
UNIVERSITY OF AMSTERDAM

Finding relevant legal case law

documents by reference structure

by

Wolf Bernard Willem Vos

10197923

BSc Artificial Intelligence Final Project (18 EC)
University of Amsterdam, Faculty of Science
Science Park 904, 1098 XH Amsterdam

Supervisors: dhr. dr. R.G.F. Winkels, dhr. dr. A.W.F. Boer

Leibniz Center for Law, University of Amsterdam
Faculty of Law, Vendelstraat 8, 1000 BA Amsterdam


Abstract

by Wolf Bernard Willem Vos

In this research project an attempt is made to find improvements for a legal recommender system. The project focuses on recommending case law within the Dutch tax domain. Bag-of-words combined with TFIDF weighting and cosine similarity is used to measure the textual similarity between two case law documents; this serves as the baseline for evaluation. The same algorithms are used to calculate the reference structure similarity between two documents. The references are extracted from the documents by a parser that has a precision of 0.55 on this domain. Experts evaluated the recommendations made by the baseline and by text similarity combined with reference similarity. The evaluation leads to the conclusion that adding a similarity measure on reference structures does not perform as well as text similarity alone when not all references are identified by the parser.


Contents

Abstract

1 Introduction
2 Related Work
   2.1 Adding a centrality measure
   2.2 Sequential relevancy
3 Background Theory
   3.1 Centrality
   3.2 Relevancy within case law
   3.3 Text similarity algorithm
       3.3.1 Bag of words
       3.3.2 Term Frequency * Inverse Document Frequency (TFIDF)
       3.3.3 Cosine similarity
   3.4 Data
4 Method & Approach
   4.1 Gathering of data
   4.2 Pre-processing of data
   4.3 Reference parser
       4.3.1 Initial
       4.3.2 Improvements & results
       4.3.3 Evaluation
   4.4 Baseline implementation
   4.5 Bag-of-references implementation
   4.6 Bag-of-words with bag-of-references
   4.7 Centrality
5 Evaluation method
   5.1 Expert evaluation
   5.2 Bag-of-words vs Bag-of-references
6 Results
   6.1 Results
       6.1.1 Expert evaluation
       6.1.2 Bag-of-words vs bag-of-references
   6.2 Conclusion
   6.3 Discussion
       6.3.1 Adding bag-of-references
       6.3.2 Parser
       6.3.3 Centrality

A XML Case Law Document
B Regular expression for retrieving references
C Examples of top 10 recommendations


Chapter 1

Introduction

Legal workers in the Netherlands have been struggling to find a system that provides useful recommendations of legal documents regarding the specific case they are working on (Winkels et al., 2013). In the past decades, a significant number of recommender systems have been designed that perform well on a great number of subjects. However, these systems fail to perform as desired on Dutch legal documents. In an effort to create a system that delivers the requested performance, Winkels et al. (2014) of the Leibniz Center for Law have made significant contributions to solving this problem by creating their own legal portal, 'OpenLaws: a legal recommender system'. This is a system where users can read legal documents; the one they are reading is called the focus document. The system recommends legal documents to users which are relevant to the specific case they are working on. Despite recent improvements to the system, it still does not perform as expected.

This research project tries to find ways to improve the performance of the system and focuses on finding case law documents which are relevant to a Dutch tax case law document in focus. Firstly, a bag-of-words combined with term frequency inverse document frequency (TFIDF) and cosine similarity is implemented to serve as a baseline for the evaluation process. Secondly, it is investigated whether adding a centrality measure would assist in finding relevant articles. Lastly, the possibility to link relevant case law documents by their outgoing references to other case law documents is examined. This is done by applying a bag-of-words algorithm (combined with TFIDF and cosine similarity) to the references found in the case law documents (bag-of-references). The references in the case law documents are extracted from the plain text by a parser which was originally created by Vredebregt (2014) for Dutch immigration law and edited in this project to improve performance on Dutch tax law. In order to evaluate the system, a group of experts is asked to rank the relevant documents recommended by the different algorithms.


Chapter 2

Related Work

2.1 Adding a centrality measure

In Winkels et al. (2013) two forms of centrality measure (degree centrality and betweenness centrality) have been used to find relevant legal documents. In their research, centrality is only applied to six laws and not to case law. Therefore, it is interesting to investigate whether performance improves when a centrality measure is applied to case law.

A recommender system for academic papers is created in Strohman et al. (2007); because the aim of that paper is similar to the aim of this research project, it was used as a source of inspiration. Strohman et al. (2007) approach the problem by combining several features into one recommendation system. They use a summary of an unpublished paper without references as input i. A text-based similarity algorithm is then used to find r (related), the n articles most relevant to i. Subsequently, all articles referred to by r are added, resulting in R (related and referred). A centrality measure is applied to the collection of articles R; they use the Katz centrality, which makes a significant contribution to the performance of their system. A Katz centrality method has not been tried yet by Winkels et al. (2014). In the results of Strohman et al. (2007) it is deemed the most important feature of the system, and therefore it is an interesting feature to try.

2.2 Sequential relevancy

Shani et al. (2002) created a Markov Decision Process (MDP) that uses n-grams to create relations between articles, recommending articles with a depth of n articles. They take into account the sequential order of the articles read by users, and conclude that one article is relevant to another when they are read in succession. After reading an article, a user generally navigates to the next article by selecting a recommended article. If a user clicks on such a recommendation, its weight is increased. Shani et al. (2002) state that the system has to be initialized before it can start gathering data. They propose random initialization of recommended articles but conclude that this form of initialization is insufficient for the system to work properly. Thus, it is necessary that the system is initialized by a content-based algorithm. In this research project an attempt is made to solve this problem for Dutch tax case law.


Chapter 3

Background Theory

This research project utilizes multiple algorithms and theories, which are described and explained in this chapter. Firstly, centrality and some of its forms are explained. Secondly, theory about relevancy within the law domain is covered, followed by the theory behind bag-of-words, term frequency inverse document frequency and cosine similarity.

3.1 Centrality

Centrality is a measure that indicates the influence of the nodes or edges within a network; this influence can be calculated by different methods. In this section some of those methods are illustrated.

Katz centrality is a way to measure the relative influence of a node within a network. It sums the contributions of all nodes that can reach the node over walks of any length, where each walk of length k is attenuated by an attenuation factor α^k. The equation used for this method is shown in equation 3.1.

C_{Katz}(v) = \sum_{k=1}^{\infty} \sum_{j=1}^{n} \alpha^k (A^k)_{jv} \qquad (3.1)

Degree centrality is a measure that counts the number of incoming and outgoing connections of a node.

C_D(v) = \deg(v) \qquad (3.2)


Betweenness centrality is a measure of the amount of 'data' that passes through a node: a node is awarded a 'point' whenever a message travelling between two other nodes has to pass through it. The number of shortest paths between s and t that pass through node v, σ_st(v), is divided by the total number of shortest paths between s and t, σ_st.

C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}} \qquad (3.3)
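To make these definitions concrete, the following minimal sketch (not part of the original project; it assumes the networkx library and a toy citation graph) computes the three measures:

import networkx as nx

# Toy citation graph; nodes stand in for case law documents,
# edges for references between them (illustrative only).
G = nx.DiGraph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])

katz = nx.katz_centrality(G, alpha=0.1)        # equation 3.1, attenuation factor alpha
degree = nx.degree_centrality(G)               # equation 3.2, normalized degree
betweenness = nx.betweenness_centrality(G)     # equation 3.3

for node in sorted(G):
    print(node, round(katz[node], 3), round(degree[node], 3), round(betweenness[node], 3))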

3.2 Relevancy within case law

Relevancy within case law is very hard to define, mainly because relevancy is very subjective. A certain document can be relevant for one person and irrelevant for another; a document can be relevant for an expert but too complicated for a student. To be able to answer this question, the following assumption is made: document X is relevant to person Y, who is working on case Z, if person Y intends to refer to document X in case Z. This initially only shifts the question to: when will someone refer to a document in a case?

A current case can be strengthened or weakened by a previously decided case if it has similar factors. There are multiple ways in which factors can match or differentiate cases; we can split them up into five ways¹ (Wyner and Bench-Capon, 2007):

• An argument can be strengthened because a previous verdict is similar to the situation of the current case and the person is attempting to argue in line with the previous case.

• An argument can be weakened because a previous verdict is similar to the situation of the current case but the person is attempting to argue the opposite of the previous case.

• An argument can be strengthened/weakened as a result of a missing differentiating factor.

• An argument can be strengthened/weakened as a result of an extra differentiating factor.

• Factors can be identified as present in neither case, so they should not be taken into account.

¹ In Wyner and Bench-Capon (2007) seven ways are stated, but for clarity four of them have been combined into two.


These five factors are essential to be aware of when a case law document is used to win a case. In contrast to case law in the United States of America or the United Kingdom, a previous case in the Netherlands can only serve as informative to current cases.

This project intends to aid experts in finding relevant case law documents that can help them make a case. General previous cases are very interesting for beginners or students but not for experts since they already know most of those general cases. Therefore, this project will focus on recommending case law documents that are less well known but which are highly relevant to the expert.

3.3 Text similarity algorithm

In order to determine whether the performance of the system increases when adding a new feature, a baseline has to be established. In this project this baseline is set by using only bag-of-words processed with Term Frequency * Inverse Document Frequency (TFIDF) and calculating the similarity between documents with cosine similarity.

3.3.1 Bag of words

To grasp the content of a document, bag-of-words can be used: a simple technique that collects all the terms used in the document and stores their counts in a vector. In most cases single words are used as terms. However, performance can increase when using n-grams as terms, because word order might carry more information than word count alone.

3.3.2 Term Frequency * Inverse Document Frequency (TFIDF)

Comparison of two document bag-of-words vectors is generally done by cosine similarity (explained in section 3.3.3), a simple and effective way to calculate the similarity of two vectors. If unweighted, a bag-of-words vector does not portray the important content of a text, because the cosine similarity is biased by stop words (e.g., the, of, he, she) and by the total number of terms in a text (e.g., bigger documents tend to have higher similarities than relatively smaller documents). To solve these problems, TFIDF weighting is applied to the bag-of-words vector.

Term Frequency (TF)

The term frequency tf of a term t is the total number of occurrences of that term in the document.


Inverse Document Frequency (IDF)

The inverse document frequency of term t is calculated by taking the logarithm of the total number of documents N divided by the number of documents that contain term t (Robertson, 2004), resulting in equation 3.4.

idf(t) = \log\left(\frac{N}{df_t}\right) \qquad (3.4)

Combining TF and IDF

In the last step of the TFIDF preparation for calculating the cosine similarity, tf and idf are combined. Every term frequency in tf for document d is multiplied by its respective weight stored in the idf vector, resulting in the tfidf vector for document d. This solves the bag-of-words vector's bias towards stop words and thus increases the difference between the meaningful words in a document. In this project the TfidfVectorizer of the Python library SciKit-Learn (Pedregosa et al., 2011) is used; it normalizes the tfidf vectors before the similarity is calculated, eliminating the document size bias of the bag-of-words vector. Normalization of the vector is done by dividing the vector by its norm, as shown in equation 3.5.

normalized(tfidf(d)) = \frac{tfidf(d)}{\sqrt{w_1^2 + w_2^2 + \dots + w_m^2}} \qquad (3.5)

3.3.3 Cosine similarity

Cosine similarity calculates the similarity of two vectors by taking the dot product of the two vectors and dividing it by the norm of each vector. In this project the normalization step has already been done, as described in section 3.3.2. Therefore, only the dot product remains, which results in the similarity s (0 ≤ s ≤ 1) between two documents.
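The steps of sections 3.3.2 and 3.3.3 can be illustrated with the following minimal sketch (an assumed implementation with a toy corpus, not the project's code): tf * idf weighting as in equation 3.4, L2 normalization as in equation 3.5, and cosine similarity as a plain dot product of the normalized vectors.

import numpy as np

# Toy corpus: each document is a list of (Dutch) terms.
docs = [["belasting", "aanslag", "belasting"],
        ["aanslag", "bezwaar"],
        ["belasting", "bezwaar", "bezwaar"]]

vocab = sorted({term for doc in docs for term in doc})
tf = np.array([[doc.count(t) for t in vocab] for doc in docs], dtype=float)

N = len(docs)
df = np.count_nonzero(tf > 0, axis=0)        # documents containing each term
idf = np.log(N / df)                         # equation 3.4
tfidf = tf * idf

norms = np.linalg.norm(tfidf, axis=1, keepdims=True)   # equation 3.5
normalized = np.divide(tfidf, norms, out=np.zeros_like(tfidf), where=norms > 0)

similarity = normalized @ normalized.T       # cosine similarity matrix
print(similarity[0])                         # similarities to document 0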

3.4 Data

To be able to conduct this research project, sufficient data has to be available. A large corpus of annotated Dutch case law has been made available to the public on rechtspraak.nl. The data is freely retrievable via an Application Programming Interface (API). The documents are in XML format and generally contain three parts: Description, inhoudsindicatie (summary) and uitspraak (verdict). Description contains all the metadata that is available for the document, inhoudsindicatie contains a short summary of the case, and uitspraak contains the full verdict of the case. Appendix A shows an example of a document. Further information about the open data provided by rechtspraak.nl can be found at https://www.rechtspraak.nl/Uitspraken-en-Registers/Uitspraken/Open-Data/Pages/default.aspx


Chapter 4

Method & Approach

In this chapter the method and approach are described. Firstly, the method for gathering usable data is explained; this covers the retrieval of data, the pre-processing of data and reference extraction. Secondly, the baseline implementation of the algorithm based on bag-of-words is described. Thirdly, the bag-of-words method for references is described, followed by the motivation and implementation of the combination of the bag-of-words and the bag-of-references.

4.1 Gathering of data

The data used in this research project is available through the rechtspraak.nl API explained in section 3.4. In this project 5959 documents were used; they were retrieved using the following query:

http://data.rechtspraak.nl/uitspraken/zoeken?sort=DESC&max=7000&return=DOC&subject=http://psi.rechtspraak.nl/rechtsgebied/bestuursrecht_belastingrecht

These documents vary widely in court, year and subject. The variety of documents is sufficient to treat this set as representative of the Dutch tax domain.
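As an illustration, the documents can be retrieved with a short script along the following lines. This is a sketch, not the project's code: the query is taken verbatim from the text above, but the use of the requests library and the handling of the response are assumptions.

import requests

BASE_URL = "http://data.rechtspraak.nl/uitspraken/zoeken"
params = {
    "sort": "DESC",
    "max": 7000,
    "return": "DOC",
    "subject": "http://psi.rechtspraak.nl/rechtsgebied/bestuursrecht_belastingrecht",
}

response = requests.get(BASE_URL, params=params)
response.raise_for_status()
print(response.text[:500])  # first part of the XML feed listing the documents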


4.2 Pre-processing of data

In order to be able to process the data, some pre-processing has to be done. For this project only the textual content and the references are used; therefore all unnecessary metadata is removed from the document. This is done by implementing the following steps (a sketch follows the list):

1. Removal of all the metadata except the references, and saving the textual content.
2. Removal of all the XML tags.
3. Removal of all characters other than letters and digits.
4. Removal of all double and unnecessary white-spaces.
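The following sketch implements the four steps under stated assumptions: it uses lxml, and the exact metadata elements kept or dropped may differ from the project's implementation.

import re
from lxml import etree

def preprocess(xml_bytes):
    """Steps 1-4 from the list above (illustrative implementation)."""
    root = etree.fromstring(xml_bytes)

    # Step 1: save the references from the metadata before removing it.
    references = [el.text for el in root.iter()
                  if isinstance(el.tag, str)
                  and el.tag.endswith("references") and el.text]
    for rdf in [el for el in root.iter()
                if isinstance(el.tag, str) and el.tag.endswith("RDF")]:
        rdf.getparent().remove(rdf)          # drop the remaining metadata

    # Step 2: remove all XML tags, keeping only the textual content.
    text = " ".join(root.itertext())

    # Step 3: remove all characters other than letters and digits.
    text = re.sub(r"[^0-9a-zA-Z\s]", " ", text)

    # Step 4: collapse double and unnecessary white-space.
    text = re.sub(r"\s+", " ", text).strip()
    return text, references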

4.3 Reference parser

Most of the documents contain references to laws. Some of those references are listed in the metadata of the XML document, but these lists are limited to recent case law and are rarely complete. An attempt to solve this problem has been made by Vredebregt (2014) by implementing a parser which extracts references from the text; they are located by the regular expression shown in Appendix B. If a reference is found, it is matched against the Basis Wetten Bestand (a list of all Dutch laws with some of their abbreviations); if a match is found, the parser adds the identification number of the law to the metadata of the document (true positive). If no match is found, it is counted as a false positive.
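The matching step can be pictured as follows. This is a heavily simplified sketch with a hypothetical mini-pattern and a toy one-entry stand-in for the Basis Wetten Bestand; the real parser uses the full regular expression in Appendix B.

import re

# Hypothetical simplified pattern; the actual pattern is in Appendix B.
ARTICLE_RE = re.compile(
    r"[Aa]rt(?:ikel|\.)\s+(\d+[a-z]?)\s+van\s+(?:de|het)\s+([A-Z][\w .-]+?)(?=[,.;])"
)

# Toy stand-in for the Basis Wetten Bestand (law name -> identification number).
BWB = {"Algemene wet bestuursrecht": "BWBR0005537"}

def parse_references(text):
    true_positives, false_positives = [], 0
    for article, law in ARTICLE_RE.findall(text):
        law_id = BWB.get(law.strip())
        if law_id:
            true_positives.append((law_id, article))  # matched against the BWB
        else:
            false_positives += 1                      # found but not resolved
    return true_positives, false_positives

refs, fps = parse_references(
    "Gelet op artikel 8 van de Algemene wet bestuursrecht, ...")
print(refs, fps)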

4.3.1 Initial

The parser was initially implemented to extract references within Dutch immigration law, with a precision of around 0.75. On Dutch tax law the parser performed considerably worse: initial tests showed a precision of 0.43.

4.3.2 Improvements & results

Some adjustments have been made to the parser to improve performance on the Dutch tax domain; these are shown in table 4.1. Some abbreviations were added to the list of laws, and other improvements include some additional pre-processing.


Table 4.1: Reference parser improvements

Improvement                                        Precision before  after   ∆
Added 'btw' & 'ivbpr'                              0.4398            0.4904  +0.0506
Stripped 'wet' from abbreviations found            0.4904            0.5397  +0.0493
Stripped the year from found references            0.5397            0.5438  +0.0041
Added 'ksb', 'cdw', 'ucdw', 'awda',
'kostenwet', 'biz', 'bvr'                          0.5438            0.5540  +0.0102

These quick improvements led to a total precision increase of 0.114 (11.4 percentage points). Since the reference parser is not the main focus of this research project, no further improvements have been made. The number of identified and matched references is sufficient to continue the research.

4.3.3 Evaluation

There are several reasons why the performance of the reference parser is mediocre. Normally the parser recognizes when a law has been abbreviated (e.g., 'wet op de inkomstenbelasting', afterwards referred to as 'de wet') and then applies the original reference identification number to all subsequent references made via 'de wet'. However, if the abbreviated law is not recognized the first time, none of the subsequent references are recognized either, resulting in multiple false positives which decrease the precision score of the parser. This problem accounts for around 35% of the false positives. The remaining 65% of the false positives are a combination of references to laws unknown to the parser and cases where the regular expression fails to extract the right words (e.g., 'protocol' is neglected by the regular expression, but that word is used a lot within Dutch tax law).

4.4 Baseline implementation

In order to conclude something about the performance of a similarity measure on the reference structure of Dutch tax case law documents, a baseline algorithm has to be implemented. For this research project a bag-of-words (section 3.3.1) has been combined with TFIDF (section 3.3.2) and cosine similarity (section 3.3.3) to serve as the baseline for evaluation. The cosine similarity is calculated between each document in the test database (5959 documents) and the focus document.

In this research project the TfidfVectorizer function of the Python library SciKit-learn (Pedregosa et al., 2011) has been used for the bag-of-words and TFIDF vector creation. This function allows the user to adjust a wide variety of settings. For this experiment only two settings were changed:


TfidfVectorizer(min_df=1, max_df=0.7)

• min_df=1: minimal document frequency (cut-off); when building the vocabulary, terms with a document frequency lower than this threshold are ignored. This is done to decrease the matrix size, which decreases the time needed to compute the similarity measures. If float, the parameter represents a proportion of documents; if integer, absolute counts. (Pedregosa et al., 2011)

• max_df=0.7: maximum document frequency (cut-off); when building the vocabulary, terms with a document frequency higher than this threshold are ignored. If float, the parameter represents a proportion of documents; if integer, absolute counts. This threshold is set to remove corpus-specific stop words (e.g., the, it, a, of). (Pedregosa et al., 2011)

With the resulting TFIDF matrix the similarity scores can be computed with cosine similarity (section 3.3.3). The script outputs the n most similar documents to the focus document. Appendix C shows an example with n = 10.
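A sketch of this pipeline follows. It is an assumed reconstruction, not the thesis script: the toy corpus stands in for the 5959 pre-processed verdict texts, while the vectorizer settings are the ones reported above.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus; in the project this is the 5959 pre-processed verdict texts.
documents = [
    "de inspecteur legt een aanslag inkomstenbelasting op",
    "het bezwaar tegen de aanslag omzetbelasting is afgewezen",
    "de rechtbank vernietigt de aanslag inkomstenbelasting",
]

vectorizer = TfidfVectorizer(min_df=1, max_df=0.7)   # settings from the thesis
tfidf = vectorizer.fit_transform(documents)

focus = 0
scores = cosine_similarity(tfidf[focus], tfidf)[0]
ranking = scores.argsort()[::-1]                     # place 0 is the focus document itself
for rank, idx in enumerate(ranking):
    print(rank, idx, round(scores[idx], 4))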

4.5 Bag-of-references implementation

This section describes the extension of the system with a similarity measure over the references to laws which are extracted from the documents. The implementation follows the same steps as the baseline implementation (described in section 4.4).

Here too, the TfidfVectorizer function of the Python library SciKit-learn (Pedregosa et al., 2011) has been used to create the TFIDF matrix. For this implementation two settings were changed from their default value:

TfidfVectorizer(min_df=1, analyzer=’word’)

• min_df=1: set to 1, just as in the baseline implementation, to reduce matrix size.

• analyzer='word': this setting determines whether a term is a word or a character n-gram.


• (max_df=0.7 was removed for this experiment because of the scarcity of references within some documents.)

With the resulting TFIDF matrix the similarity scores can be computed with cosine similarity (section 3.3.3). The script outputs the n most similar documents to the focus document. Appendix C shows an example with n = 10.

4.6 Bag-of-words with bag-of-references

The final step of the algorithm combines the bag-of-words similarity score and the bag-of-references similarity score by taking the average of the two scores. Some problems arise when combining the scores. If the focus document has only one outgoing reference which is very general, the reference similarity scores will mostly be either 0.0 or 1.0. This would be acceptable in later work, when the reference parser works perfectly and we know the document actually has only one outgoing reference, but at this stage too many documents are ignored when a focus document has binary similarity scores. One idea is to apply weights to the scores (e.g., bag-of-words = 0.7, bag-of-references = 0.3), but this does not solve the problem of the binary similarity scores. Therefore we chose to only take into account the focus documents that have at least four outgoing references. This ensures that similarity scores are not binary unless that is directly meaningful (e.g., if the focus document has four outgoing references and another document has the same four outgoing references, it is logical that they have a similarity score of 1).
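A sketch of the combination step, under stated assumptions: the inputs are toy data, the law identifiers are illustrative, references are joined into one space-separated string per document before vectorizing, and the fallback for focus documents with fewer than four references is an assumption.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy inputs: per document a pre-processed text and a list of extracted
# law identifiers (section 4.3); both are illustrative.
texts = [
    "aanslag inkomstenbelasting door de rechtbank vernietigd",
    "naheffingsaanslag omzetbelasting gehandhaafd door het hof",
    "aanslag inkomstenbelasting gehandhaafd na bezwaar",
]
references = [
    ["BWBR0011353", "BWBR0002320", "BWBR0005537", "BWBR0002629"],
    ["BWBR0002629", "BWBR0005537"],
    ["BWBR0011353", "BWBR0002320", "BWBR0005537", "BWBR0002629"],
]

text_scores = cosine_similarity(TfidfVectorizer().fit_transform(texts))
ref_scores = cosine_similarity(
    TfidfVectorizer().fit_transform([" ".join(r) for r in references]))

focus = 0
if len(references[focus]) >= 4:          # avoid near-binary reference scores
    combined = (text_scores[focus] + ref_scores[focus]) / 2
else:
    combined = text_scores[focus]        # text similarity only
print(np.argsort(combined)[::-1][1:])    # ranked recommendations, skipping itself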

4.7 Centrality

During this project, literature research has led to the conclusion that a centrality measure is not suitable for finding relevant documents with regard to a focus document. Centrality measures find the most general or most influential documents within a group of documents (Newman, 2010), but this research focuses on finding the documents most relevant to a focus document. Finding the most general or most influential documents within a domain might be interesting for students, but because this project aims to aid experts in their search for relevant documents, a centrality measure would most likely decrease the performance of the system.


Chapter 5

Evaluation method

In this chapter the evaluation method is described. An expert evaluation is done on the actual relevance of returned documents.

5.1 Expert evaluation

A group of experts is asked to evaluate the system by completing the following steps. They are first asked to read a focus document; this document is selected from a database and is shown on the evaluation website that was created for this purpose.

Figure 5.1: Focus document: ’Read this document so that you understand its content.’

Subsequently the experts are asked to read the six recommended documents: the three documents with the highest similarity scores for the baseline implementation and the three documents with the highest similarity scores for the bag-of-words combined with the bag-of-references; an example is shown below. In order to determine which of the two algorithms performs best, the experts are asked to rank the documents on relevancy with regard to the focus document, from one (most relevant) to six (least relevant). Lastly, the experts are asked to rate each document on relevancy with regard to the focus document; this is done to discover the performance of the algorithms in general. The decision to evaluate in this fashion was made because the intuition is that it is easier to rank documents than to determine whether a document is relevant or not. This method of evaluation is rather subjective if only a small number of test subjects can be found, but it can serve to determine whether the research is heading in the right direction and to find possible bugs and errors (Shani and Gunawardana, 2011).

Figure 5.2: Evaluation input example

For the expert evaluation 60 experiments were run, resulting in 60 focus documents with 6 recommended documents each. Because the goal is to determine the increase or decrease in performance when adding the bag-of-references, the focus documents whose recommendations occurred in both algorithms were removed; 21 focus documents remained.

5.2 Bag-of-words vs Bag-of-references

This second experiment has been done to see whether the two different algorithms (i.e. the baseline implementation and the bag-of-references) recommend similar documents. The idea behind this test is that if the two algorithms recommend similar documents, they are performing as expected. If, on the other hand, the two algorithms recommend completely different documents, the question arises whether both algorithms are doing something useful or are returning documents close to random. We check how many of the best ten documents recommended by the bag-of-references algorithm are also in the best ten documents recommended by the baseline implementation. The chance that this happens when the documents are selected randomly out of 5959 documents is 10/5959 = 0.001678.


Chapter 6

Results

6.1 Results

The results are not as expected: they mainly indicate that the bag-of-references addition decreases the performance of the algorithm.

6.1.1 Expert evaluation

For the expert evaluation four test subjects were found. In total 18 evaluations were done, 15 of unique focus documents and 3 of the same document, the latter to determine whether the test subjects rank the documents in the same way. The evaluation consists of two parts, as described in section 5.1. Both the ranking step and the general relevancy step show that adding the bag-of-references only decreases performance: the average rank¹ favors the baseline algorithm by almost one point, and the average of the general relevancy step is also in favor of the baseline algorithm.

Table 6.1: Results expert evaluation

                          avg(rank)   avg(relevancy)
baseline                  3.0556      5.5741
bag-of-references added   3.9444      4.4630

These results are the opposite of what was expected; the possible causes are discussed in section 6.3.1.

¹ Rank 1 is most relevant, rank 6 is least relevant.


6.1.2 Bag-of-words vs bag-of-references

In this experiment it is tested whether the bag-of-words and the bag-of-references return the same recommendations. The experiment was done with 1000 focus documents and the top ten recommendations of each algorithm. For those documents, 9528 recommendations were made, of which 1135 were recommended by both algorithms. This results in 1135/9528 = 0.1191 = 11.91%. This shows that the algorithms converge towards the same recommendations to some extent, but not enough, as the evaluation in section 6.1.1 shows.

6.2 Conclusion

This research concludes that reference similarity is not a good way to find relevant case law documents when not all references are located and identified. Applying a centrality measure might improve performance as evaluated by novices, but it is not a useful way of finding relevant case law documents for experts.

6.3 Discussion

6.3.1 Adding bag-of-references

The bag-of-references addition has proven to be less effective than the baseline text similarity recommender, for several reasons. The main reason is that the reference parser did not extract all references from the documents, as explained in section 6.3.2. Another reason is that too much weight was assigned to the references: the combination algorithm weighted them 50-50, and a different weighting in favor of the text similarity might have been better.

6.3.2 Parser

Several improvements could make the parser perform better on the Dutch tax domain. In hindsight, it was wrong to treat the parser as outside the main focus of the research, since it is a key component when computing reference similarity.

Firstly, tax-domain-specific reference words could be added to the regular expression; this could allow the regular expression to locate more references, including the less general ones. At this point the reference parser is very good at retrieving the general references (Vredebregt, 2014), up to a precision of 0.95. This number is achieved because a great number of interesting references, which were not in the scope of the research conducted by Vredebregt (2014), are counted as true negatives instead of false positives, thus increasing precision. Those references are the most interesting ones, because they differentiate the most between cases.

In addition, a list of abbreviations could be dynamically updated at document level. This means that when the writer of a document states that 'wet op de inkomstenbelasting' is abbreviated to 'WIB' (which is an unusual abbreviation for this law), the abbreviation can still be resolved to the original identification number. This can improve the precision and recall of the parser and thus the performance of the algorithm.

Another improvement could be to introduce tokens for references that are not identified. For instance, if a reference A is found but cannot be resolved to a law, it can still be used as a token A_t; if in another document a reference A is found that also cannot be resolved, it can still be matched to token A_t, thus increasing the similarity between the two documents.

A labelled data set could be very useful for future research, because evaluation is very time consuming: experts needed an average of 12 minutes to evaluate one focus document, which involves reading 7 case law documents. If a labelled data set were available, testing could be done a lot faster and minor changes in the algorithm could also be measured. As it stands, evaluation is subjective, and it is therefore hard to discover changes in performance after small changes.

6.3.3 Centrality

As discussed in section 4.7, literature research has led to the conclusion that a centrality measure is not suitable for finding relevant documents with regard to a focus document. Centrality measures find the most general or most influential documents within a group of documents (Newman, 2010), whereas this research focuses on finding the documents most relevant to a focus document. Those general or influential documents might be interesting for students, but because this project aims to aid experts in their search for relevant documents, a centrality measure would most likely decrease the performance of the system.


Appendix A

XML Case Law Document

As retrieved from www.rechtspraak.nl:

<?xml version='1.0' encoding='utf8'?>
<open-rechtspraak>
  <rdf:RDF>
    <rdf:Description>
      <dcterms:identifier>...</dcterms:identifier>
      <dcterms:format>...</dcterms:format>
      <dcterms:accessRights>...</dcterms:accessRights>
      <dcterms:modified>...</dcterms:modified>
      <dcterms:issued>...</dcterms:issued>
      <dcterms:publisher>...</dcterms:publisher>
      <dcterms:language>...</dcterms:language>
      <dcterms:creator>...</dcterms:creator>
      <dcterms:date>...</dcterms:date>
      <psi:zaaknummer>...</psi:zaaknummer>
      <dcterms:type>...</dcterms:type>
      <psi:procedure>...</psi:procedure>
      <dcterms:coverage>...</dcterms:coverage>
      <dcterms:spatial>...</dcterms:spatial>
      <dcterms:subject>...</dcterms:subject>
      <dcterms:references>...</dcterms:references>
      <dcterms:hasVersion>
        <rdf:list>
          <rdf:li>...</rdf:li>
          <rdf:li>...</rdf:li>
          <rdf:li>...</rdf:li>
          <rdf:li>...</rdf:li>
          <rdf:li>...</rdf:li>
        </rdf:list>
      </dcterms:hasVersion>
    </rdf:Description>
    <rdf:Description>
      <dcterms:identifier>...</dcterms:identifier>
      <dcterms:format>...</dcterms:format>
      <dcterms:accessRights>...</dcterms:accessRights>
      <dcterms:modified>...</dcterms:modified>
      <dcterms:issued>...</dcterms:issued>
      <dcterms:publisher>...</dcterms:publisher>
      <dcterms:language>...</dcterms:language>
      <dcterms:title>...</dcterms:title>
      <dcterms:abstract/>
    </rdf:Description>
  </rdf:RDF>
  <preserve:inhoudsindicatie>
    [SUMMARY]
  </preserve:inhoudsindicatie>
  <preserve:uitspraak>
    [VERDICT]
  </preserve:uitspraak>
</open-rechtspraak>


Appendix B

Regular expression for retrieving references

regex = (
    # '(?:\.\s+)([A-Z].*?'  # Matches the entire sentence
    '([^a-zA-Z](?:(?:[Aa]rtikel|[Aa]rt\\.?) ([0-9][\(0-9a-zA-Z:.\)]*)'
    '|[Bb]oek ([0-9][\(0-9a-zA-Z:.\)]*)'
    '|[Hh]oofdstuk ([0-9][\(0-9a-zA-Z:.\)]*)),?'
    # Matches Artikel and captures the number (and letter) combination for the article
    '((?:\s+(?:lid|aanhef en lid|aanhef en onder|onder)? (?:[0-9a-z ]|tot en met)+,?'
    # matches "lid .. (tot en met ...)"
    '|,? (?:[a-z]+ lid|[a-z]+ en [a-z]+ lid),?)*)'
    # matches a word followed by "lid", e.g. "eerste lid"
    '(,? onderdeel [a-z],?)?'  # captures "onderdeel ..."
    '(,? sub [0-9],?)?'  # captures "sub ..."
    '(?:(?: van (?:de|het|)(?: wet)?|,?)? *'  # matches e.g. "van de wet "
    '((?:(?:wet|bestuursrecht|Wetboek van|op het [A-Z0-9][a-zA-Z0-9]*'
    '|[A-Z0-9][a-zA-Z0-9]*)(?:[^\S\n]*|\\.))+))? *'  # matches the Title
    '(?:\(([^\)]+?)\))?)'  # matches anything between () after the title
    # '.*?(?:\.|\:))(?:\s+[A-Z]|\$)'
)


Appendix C

Examples of top 10 recommendations

Top ten recommendations with the bag-of-words, TFIDF and cosine similarity.

similarity vector for ECLI-NL-RBAMS-2010-BO1378.xml
Place 0 is itself, and therefore it has a value of 1.0
0 - ECLI-NL-RBAMS-2010-BO1378.xml : 1.0
1 - ECLI-NL-RBALK-2008-BC6103.xml : 0.721333351833
2 - ECLI-NL-RBDOR-2010-BM0117.xml : 0.686513967946
3 - ECLI-NL-RBALK-2011-BQ0469.xml : 0.64270130861
4 - ECLI-NL-RBARN-2006-AY9465.xml : 0.639606249155
5 - ECLI-NL-RBDOR-2010-BO5257.xml : 0.615909353675
6 - ECLI-NL-RBAMS-2011-BV6758.xml : 0.612440694902
7 - ECLI-NL-RBDOR-2010-BM2339.xml : 0.610510053653
8 - ECLI-NL-GHAMS-2013-CA2684.xml : 0.608986502282
9 - ECLI-NL-RBAMS-2011-BR6478.xml : 0.596049227139
10 - ECLI-NL-RBHAA-2006-AZ2187.xml : 0.591708776125

Top ten recommendations with the bag-of-references, TFIDF and cosine similarity.

similarity vector for ECLI-NL-RBAMS-2010-BO1378
These are the top ten recommendations with only references
Place 0 is itself, and therefore it has a value of 1.0
0 - ECLI-NL-RBAMS-2010-BO1378 : 1.0
1 - ECLI-NL-RBSGR-2008-BD1495 : 0.891948002503
2 - ECLI-NL-RBUTR-2010-BU4490 : 0.72704418227
3 - ECLI-NL-RBOVE-2014-951 : 0.692230862824
4 - ECLI-NL-RBALK-2008-BD7537 : 0.685972757339
5 - ECLI-NL-RBALK-2007-BB9105 : 0.685972757339
6 - ECLI-NL-RBAMS-2011-BQ4241 : 0.647746243493
7 - ECLI-NL-RBALK-2008-BC4175 : 0.637261231395
8 - ECLI-NL-RBALK-2012-BX0044 : 0.58874966142
9 - ECLI-NL-RBALK-2008-BD5937 : 0.576297362383
10 - ECLI-NL-GHAMS-2001-AD8208 : 0.569354443333


Bibliography

Newman, M. (2010). Networks: An Introduction. Oxford University Press.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.

Robertson, S. (2004). Understanding inverse document frequency: on theoretical arguments for IDF. Journal of Documentation, 60(5):503–520.

Shani, G., Brafman, R. I., and Heckerman, D. (2002). An MDP-based recommender system. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, pages 453–460. Morgan Kaufmann Publishers Inc.

Shani, G. and Gunawardana, A. (2011). Evaluating recommendation systems. In Recommender Systems Handbook, pages 257–297. Springer.

Strohman, T., Croft, W. B., and Jensen, D. (2007). Recommending citations for academic papers. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 705–706. ACM.

Vredebregt, B. (2014). New interface for wetten.nl.

Winkels, R., Boer, A., and Plantevin, I. (2013). Creating context networks in Dutch legislation. Available at SSRN 2368852.

Winkels, R., Boer, A., Vredebregt, B., and van Someren, A. (2014). Towards a legal recommender system.

Wyner, A. and Bench-Capon, T. (2007). Argument schemes for legal case-based reasoning. In JURIX, pages 139–149.
