
Classifying civil law cases involving Brussels I Regulation and recast using machine learning techniques

Roos Slingerland (10775935)

Bachelor thesis, 18 EC
Bachelor Opleiding Kunstmatige Intelligentie
University of Amsterdam, Faculty of Science
Science Park 904, 1098 XH Amsterdam

Supervisor: dr. A.W.F. Boer
Leibniz Center for Law, Faculty of Law, University of Amsterdam
Vendelstraat 8, 1012 XX Amsterdam


Summary

The European Union created a set of rules about the jurisdiction, recognition and enforcement of judgments in civil and commercial matters involving individuals resident in different Member States. This Brussels I Regulation 44/2001 came into force in 2002, but was replaced by the Brussels I Regulation Recast 1215/2012 in 2015. Analysis of the beginning of this change of law, or of law change in general, is important for legal workers, but since there is no agreed way of referencing this regulation, finding these cases manually is hard. In this thesis an AI solution for finding these cases is proposed and the grey area is investigated with a legal expert. The thesis focuses on the classification of 15,000 judgments within civil law from rechtspraak.nl. Based on the occurrence of words from a list of keywords, each case was classified true or false. Once these keywords were deleted from the text, it was investigated to what extent the machine learning algorithms decision trees, random forest and gradient boosting trees were able to classify the cases using the remaining words as features. This was done with three different classification systems: one to distinguish the Brussels I Regulation cases from the other civil law cases, one to distinguish the old regulation from the not-old regulation and one to distinguish the recast from the not-recast cases. The respective optimal accuracies were 0.9450, 0.853 and 0.778. However, the data was scarce and the initial labelling doubtful, so further research should focus on these aspects to improve the analysis of law change.

Acknowledgements

I would like to thank my supervisor, dr. A.W.F. Boer, for creating the tender of this thesis and for guiding me through the maze of the legal domain. My gratitude also goes to dr. S. van Splunter for supervising the process and sharing his own struggles as a graduate student. Mister M. de Rooij helped me interpret the results and shared his knowledge about the European Union and law in general with me. I am grateful for his insights and for the time he took to come all the way from Brussels to Amsterdam. I would also like to take this moment to thank mister T. Leunissen and mister J. van Gerwen for their support. Not only were you my personal tech support, you kept believing in my academic skills, even at the moments that I did not.


Contents

1 Introduction
2 Literature Review
3 Tools
4 Classifier 1
   4.1 Method
      4.1.1 Data
      4.1.2 Pre-processing
      4.1.3 Experimental setup
      4.1.4 Evaluation
   4.2 Results
   4.3 Grey area
5 Classifier 2
   5.1 Method
      5.1.1 Data
      5.1.2 Pre-processing
      5.1.3 Experimental setup
      5.1.4 Evaluation
   5.2 Results
6 Discussion
7 Conclusions and future work

1 Introduction

The field of AI and Law has existed for multiple decades but reached its peak in the 1980s, according to Rissland, Ashley, and Loui [14]. The connection between law and AI works in both directions, and this synergy has been serving as a two-way catalyst for the development of new approaches. The synergy is so fruitful because of the many similarities between the domains. The article by Rissland, Ashley, and Loui [14] lists many of them, but one of the most relevant is the fact that legal concepts, and law in general, evolve. AI is capable of both detecting and analyzing these changes, which can be useful for legal workers.

An institution that also deals with a great deal of change is the European Union (EU). With 28 Member States under its wings, it is necessary to provide structure. In the field of law this was done by initiating the European Case Law Identifier (ECLI), an initiative of the Council of the European Union to improve the accessibility of case law in Europe [12]. In 2016 there were 18 Member States working on the implementation or already using it [17]; the Netherlands joined in 2013. The ECLI replaced the Landelijk Jurisprudentienummer (LJN), which was in use until 2013 in the Netherlands; Dutch judgments can be found at the open data initiative rechtspraak.nl. The ECLI codes consist of five parts, separated by colons:

1. the term ECLI

2. country code

3. court identifier

4. year of decision

5. specific identifier

In addition to this code, the ECLI contains uniform metadata to improve search options for European case law, such as subject, publisher and place of the court of justice. Since 2016 a search engine for European law, based on ECLI, has been online [12]. Because of the implementation of the ECLI codes, an EU-wide foundation was laid that techniques from AI can build on.
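For illustration, the five parts can be read off by splitting an identifier on its colons. A minimal sketch (the example ECLI is one that appears later in this thesis):

```python
ecli = "ECLI:NL:GHSHE:2017:1873"

# the five colon-separated parts: term, country code, court, year, number
term, country, court, year, number = ecli.split(":")
print(country, court, year, number)  # -> NL GHSHE 2017 1873
```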

A regulation that involves all Member States of the EU is the Brussels I Regulation, a set of rules about the jurisdiction, recognition and enforcement of judgments in civil and commercial matters involving individuals resident in different Member States of the European Union and the European Free Trade Association (EFTA). The Regulation (EC) 44/2001, as the regulation is strictly called, was created by the Council of the European Union and came into force in March 2002. However, the Court of Justice of the European Union (CJEU) stated that Article 23, regarding application of jurisdiction, was too concise and therefore suggested a recast. Hence a recast was created, this time by both the Council and the European Parliament. It was implemented in 2015 and is officially called Regulation (EU) No 1215/2012. The largest difference between the old and new regulation is that in the recast the rules of Brussels I were extended to defendants not domiciled in a Member State of the EU [15].


Although this recast was supposed to bridge all the gaps of 44/2001, Danov [3] states that its application has been largely overlooked by both policymakers and literature. Since the regulation is still new and opinions differ as to its usage, more insight into the start period of the recast is useful. Secondly, AI research tends to focus on data about the end of a period, yet legal workers are interested in the beginning of a change [1]. Thirdly, AI research in the legal domain is often aimed at laymen, although it is the group of legal experts that uses the designed tools [1], so research should go into more depth. More insight into the usage of the Regulation would be of great value for legal workers and the EU, but research into both the Regulation and the Recast is complicated. Since there are no agreements regarding the referencing of the regulation or the recast, these cases are hard to find manually; the use of AI could help finding them. This thesis will therefore focus on text analysis of cases involving the Brussels I Regulation and will answer the following research question:

To what extent is it possible to design a supervised classification system that uses judgments of civil law cases from rechtspraak.nl to distinguish:

1. cases about Brussels I Regulation from all the other civil law cases?

2. cases from the Brussels I Regulation Recast from the Brussels I Regulation?

Answers to these questions would indicate whether or not a system could be used to reliably distinguish cases involving the Regulation or Recast, after which further research is possible. The method of binary classification could then be extended to other Member States or even to other regulations that underwent changes of law. It could then be used by experts instead of laymen, and at the start of a new law instead of at the end, which are big advantages, as stated before. Because the second classification problem has to deal with a smaller set of data, its results are expected to score lower on accuracy than the results of the first classification problem.

The rest of this thesis is organized as follows. First, more information about text analysis in the legal domain is outlined in the Literature Review, and more information about the tools used is given in the section Tools. Then the thesis is split into the first classifier (Brussels I Regulation cases vs. other civil law cases) and the second classifier (Brussels I Regulation vs. Brussels I Regulation recast). The Method sections explain how a list of keywords provided by legal expert Michiel de Rooij from the T.M.C. Asser Institute was used to label the data, which pre-processing steps and classification algorithms were used and how the evaluation was obtained. The results, which were also discussed with mister de Rooij, can be found in the Results sections. The subject of that meeting was whether the grey area of false positives and negatives is still informative when the dataset is small. After these results, a critical discussion of both classifiers can be found in the Discussion section. Finally, in the conclusion an answer is given to the question to what extent a binary classification system can distinguish cases involving the Brussels I Regulation.

2 Literature Review

Text classification is a problem that has been studied in many domains, including the legal domain. Bruninghaus and Ashley [2] explain this demand by the desire of attorneys to find the most relevant cases and argue that this information need caused the wide interest in classification in the legal domain. De Maat, Krabben, and Winkels [4], for example, describe a study about the classification of legal sentences and a comparison of machine learning techniques against knowledge-based classification. It describes the bag-of-words method, which will also be used in this thesis. Gonçalves and Quaresma [7] applied multiple algorithms to European Portuguese legal texts and stated that legal texts are very suitable for text classification because of the unstructured format of the data. The bag-of-words method was used, but part-of-speech tags and lemmatisation were also applied to study the linguistics in more detail. The article discusses the shortcomings of the bag-of-words method, the method being too simplistic to obtain good results, which was also mentioned by Bruninghaus and Ashley [2] and De Maat, Krabben, and Winkels [4]. Moreover, the researchers state that legal language has a unique style and that its vocabulary and word distributions differ from 'regular' English.

Besides the right algorithms, proper feature selection is of great importance. Not only is it necessary to make large problems computationally efficient [5], it can also improve the accuracy substantially [5]. This increase in accuracy could also mean that less data is needed to obtain good results, which is a big advantage of a system [5]. For this study too, proper feature selection and a sufficient amount of data should be kept in mind.

Since in this study the initial labelling is done with a list of keywords and cannot be assumed correct, analysis of the false positives and false negatives is important. Sokolova and Lapalme [16] researched different performance measures and discussed true positives, false positives, true negatives and false negatives, also known as the confusion matrix. The article states that false negatives are often due to manual labelling or unreliable labels. A large number of false positives is due to data with many outliers that cannot be explained by the mainstream data (also called counterexamples) [16]. According to the authors, these counterexamples often occur in case of subjective labelling. Given the subjective and perhaps unreliable labelling in this thesis, false positives and false negatives are expected.

Zheng [18] analysed a data set obtained from rechtspraak.nl for cases up to October 2016 and used the MAchine Learning for LanguagE Toolkit (Mallet). Rechtspraak.nl is an initiative to increase the possibilities of open data, an idea that has been gaining interest over the last 20 years [10] and that will be used in this thesis as well. First, an indication of the field of law was made, with civil law as the largest field in both the old (77%) and new (73%) regulation. Topic modelling was used to generate multiple topics and then multiple classifiers were trained and tested, varying the ratio between train and test data and the number of documents (judgments) per class. The following machine learning algorithms were used: Naive Bayes, MaxEnt, Decision Tree and C4.5. The first classifier did not seem able to find the difference between a document from the Brussels I Regulation and a document from the recast.


The proposed solution was to give the system more data about one of the two regulations, but this was unsuccessful. Thus the decision was made to train the system without the distinction in themes. By doing so, the results of this thesis can be compared to those results. For the old regulation an accuracy of 0.64 was obtained, and for the recast an accuracy of 0.78. There are multiple differences between the report of [18] and this thesis. First of all, this thesis does not use topic modelling, but treats the judgments as a bag of words, where pre-processing is applied to decrease the number of features and improve efficiency and accuracy. Secondly, whereas [18] retrieved more than 2 million cases (including unpublished ones) about the entire field of law up to October 2016, this thesis works on 15,000 published civil law cases up to May 2017.

3 Tools

Multiple tools were used for this thesis; they are discussed below.

KNIME is an open source platform focusing on data mining, manipulation, visualization and prediction. With its accessible user interface, many machine learning applications can be used by building a workflow from different building blocks [8]. An example of the workflows built for this thesis can be found in the Appendices.

MongoDB is also an open source and free tool, focused on storing data in JSON-like documents. It can handle large amounts of data, change data later on and retrieve specific sets of data based on specific requirements, for example: retrieve all documents that are classified positive and contain 3 keywords. Each data point is stored as an object, to which multiple instances of different sizes can be added [11].
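As an illustration of such requirement-based retrieval, a minimal pymongo sketch is given below. The connection string and the database and collection names are assumptions, not taken from the thesis; the field names label and term follow the instance names described in section 4.1.1, but the exact schema is likewise an assumption.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local MongoDB instance
cases = client["thesis"]["cases"]                  # hypothetical database/collection names

# e.g. retrieve all documents that are classified positive and contain 3 keywords
for case in cases.find({"label": True, "term": {"$size": 3}}):
    print(case["title"])
```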

4 Classifier 1

4.1 Method

4.1.1 Data

From rechtspraak.nl 15,000 cases were retrieved, starting with the newest in May 2017. For these cases the XML was obtained, including all sorts of tags. For each case a new object was made in MongoDB with the XML as instance and the title, which had to be unique in the database, as metadata. The tags were then stripped off, the documents were converted to txt files, and these were added as new instances to each case in MongoDB. These txt instances were then checked for occurrences of the keywords. Michiel de Rooij, who also helped interpreting the results, made a list of words in multiple languages that indicate both 44/2001 and 1215/2012. In Dutch the combination of these two lists was as follows: EEX-Vo; EG-Executieverordening; EEX-Verordening; Brussel I-Verordening; Brussel I; 44/2001; EEX-Vo II; Brussel Ibis; Brussel I-bis; EU-executieverordening; Brussel I bis-Verordening; EEX-Verordening II; Brussel 1 bis-Vo; Brussel 1 bis; Herschikte EEX-Vo; 1215/2012. This list was turned into one regular expression and a new instance 'clean' was created, containing the txt file without the keywords. Moreover, all the terms that matched the regex were listed as new instances, and a new metadata instance was set to 'true' if the case contained a keyword and 'false' if it did not. Figure 2 in the Appendices shows an example of how all the instances are related.
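A minimal sketch of this keyword-based labelling step, assuming the judgments are already available as plain text; the exact regular expression used in the thesis is not given, so the pattern below is an approximation:

```python
import re

# the Dutch keyword list from this section
KEYWORDS = ["EEX-Vo", "EG-Executieverordening", "EEX-Verordening", "Brussel I-Verordening",
            "Brussel I", "44/2001", "EEX-Vo II", "Brussel Ibis", "Brussel I-bis",
            "EU-executieverordening", "Brussel I bis-Verordening", "EEX-Verordening II",
            "Brussel 1 bis-Vo", "Brussel 1 bis", "Herschikte EEX-Vo", "1215/2012"]

# longest alternatives first, so that e.g. "EEX-Vo II" is not matched as "EEX-Vo"
PATTERN = re.compile("|".join(re.escape(k) for k in sorted(KEYWORDS, key=len, reverse=True)),
                     re.IGNORECASE)

def label_case(text):
    """Return the 'clean' text, the matched terms and the true/false label."""
    terms = PATTERN.findall(text)
    clean = PATTERN.sub(" ", text)  # judgment text with the keywords removed
    return clean, terms, bool(terms)
```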

It is important to understand that the labels true and false are used here as gold standard. However, this does not mean that there are no cases involving the regulation that are labelled false. Since this study researches the reliability of such a classification system, the wrongly classified cases are important as well. As stated in the introduction, there are no agreements regarding referencing to this regulation, so the number of cases involving the regulation is expected to be larger than the number of true cases.

Once this was done, an analysis of the true and false cases could be made. Of the 15,000 cases, 360 were classified true and 14,640 were classified false using the keywords. Table 1 shows in how many documents multiple keywords appeared, and table 2 lists the frequency of each keyword. There are even cases in which 6 keywords were found (namely ECLI:NL:GHSHE:2017:1873 and ECLI:NL:GHSHE:2017:1874); however, most of the documents contained only 1 keyword.

From all these instances, a CSV file of 15,000 rows was created with the columns 'title', 'document' and 'label', containing the title of the case, the text of the judgment without keywords and the classification true or false respectively. This CSV file was then ready to be handled by KNIME.

4.1.2 Pre-processing

To be able to use algorithms on the data, a binary vector of unique terms was needed for each case. However, since the texts of the cases were sometimes very extensive and the positive examples scarce, proper pre-processing was important. Based on trial and error, it was decided to take the following pre-processing steps.


Number of keywords   Frequency
1                    214
2                    103
3                     32
4                      8
5                      1
6                      2
Total                565

Table 1: The frequency of appearance of multiple keywords in one document.

Keyword                     Frequency
1215/2012                   103
44/2001                      72
Brussel I                   134
Brussel I bis                 0
Brussel I Bis-Verordening     8
Brussel I bis-Vo              0
Brussel Ibis                 12
Brussel I-bis                 3
Brussel I-Verordening         7
EEX-Verordening             123
EEX-Verordening II            7
EEX-Vo                       71
EEX-Vo II                     0
EG-Executieverordening        1
EU-Executieverordening        1
Herschikte EEX-Vo            23
Total                       565

Table 2: The number of documents each keyword appears in.

1. For each document, delete:
   - terms consisting of the punctuation characters !#$%()*+,./:;<>=?@^`|[]
   - terms consisting only of numbers
   - terms consisting of fewer than 4 characters
   - terms that equal words in the stop list (see figure 5 in the Appendices)
   - terms that occur in less than 1% of the documents
   - terms that occur in more than 95% of the documents

2. For each term in each document:
   - convert all characters to lowercase
   - apply the Snowball stemmer for the Dutch language
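A minimal sketch of these pre-processing steps, with scikit-learn and NLTK as stand-ins for the KNIME nodes actually used. Here texts is assumed to be the list of keyword-stripped judgments, and the stop word set shown is only an excerpt of Appendix B:

```python
from nltk.stem.snowball import SnowballStemmer
from sklearn.feature_extraction.text import CountVectorizer

stemmer = SnowballStemmer("dutch")
STOP_WORDS = {"aan", "achter", "alle", "als", "maar", "niet"}  # excerpt; full list in Appendix B

def analyzer(text):
    tokens = []
    for token in text.lower().split():
        token = token.strip("!#$%()*+,./:;<>=?@^`|[]")
        if len(token) < 4 or token.isdigit() or token in STOP_WORDS:
            continue                        # drop short, numeric and stop-list terms
        tokens.append(stemmer.stem(token))  # Snowball stemmer for Dutch
    return tokens

# binary term vectors; min_df/max_df drop terms in <1% or >95% of the documents
vectorizer = CountVectorizer(analyzer=analyzer, min_df=0.01, max_df=0.95, binary=True)
X = vectorizer.fit_transform(texts)
```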

This resulted in around 6000 unique terms that were then used as features. Within these 6000, a few terms stood out because of their similarity to the keywords. The appearance of these terms in cases classified 'true' and 'false' was investigated further; this analysis can be found in table 7 in section 4.3, where expert mister de Rooij also gave his opinion. To be certain not to train the model on wrongly labelled cases, the 34 cases containing one of these 'grey keywords' were excluded from the list before selecting the test and train data.

4.1.3 Experimental setup

To create a baseline of 50%, 360 negative examples were drawn randomly in addition to the 360 positive examples. To increase the reliability of this random sample, this was done with 10 different random seeds. Each experiment of 720 cases was then split into 504 cases to train on (70%) and 216 cases to test the classification on (30%).
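A sketch of this sampling and splitting procedure; pos_cases and neg_cases stand for the labelled judgment vectors and are illustrative names:

```python
import random
from sklearn.model_selection import train_test_split

def make_experiment(pos_cases, neg_cases, seed):
    """Draw as many negatives as there are positives (360), then split 70/30."""
    rng = random.Random(seed)
    cases = pos_cases + rng.sample(neg_cases, len(pos_cases))
    labels = [1] * len(pos_cases) + [0] * len(pos_cases)
    return train_test_split(cases, labels, test_size=0.3, random_state=seed)

# ten experiments with different random seeds, as described above
experiments = [make_experiment(pos_cases, neg_cases, seed) for seed in range(10)]
```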

Since early experiments (see table 11 in the Appendices) already showed poor results for the algorithms naive Bayes (accuracy of 0.51) and k-nearest neighbour (accuracy of 0.77), these were excluded from further experiments. The remaining algorithms were decision trees, gradient boosting trees and random forest; they are explained below.

Decision trees is a tree-structured algorithm, developed around thirty years ago, where each internal node represents a test on an attribute, each branch corresponds to an attribute value and each leaf node represents a class label. Decision trees can deal with noisy data and function well with disjunctive hypotheses [9]. The algorithm has no requirements on the distribution of the data (Naive Bayes, for example, requires independent variables), since it is a non-parametric technique. Lawrence et al. [9] mention the disadvantage of poor performance on skewed data.

Gradient boosting trees is an algorithm that keeps improving its model by calculating the error and fitting new decision trees to the corresponding cost function, thereby increasing its complexity. Lawrence et al. [9] state that in most cases it outperforms decision trees or at least performs equally well. It is said to deal with overfitting better than decision trees.

Random forest is a machine learning algorithm that again uses decision trees, by learning multiple decision trees simultaneously. It then chooses the most common label over all the models. This has the advantage of decreasing the overfitting problem that decision trees tend to have, but Prasad, Iverson, and Liaw [13] also state the disadvantages of time and computational resources and the 'black box' characteristic.
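The thesis ran these algorithms as KNIME nodes; the following is an equivalent scikit-learn sketch, assuming the binary term vectors X_train/X_test and labels y_train/y_test from the sketches above:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

models = {
    "decision tree": DecisionTreeClassifier(),
    "gradient boosting": GradientBoostingClassifier(n_estimators=100),
    "random forest": RandomForestClassifier(n_estimators=100),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))  # accuracy on the 30% test split
```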

4.1.4 Evaluation

In the pre-processing all the cases were labelled true or false based on the occurrence of a few keywords. The research question is to what extent the classifier can classify cases based on their texts without these keywords. This can be measured with the relative number of correctly classified cases, also known as the accuracy. Precision is the number of correctly classified positive examples divided by the number of examples labelled by the system as positive, and recall is the number of correctly classified positive examples divided by the number of positive examples in the data. The F1-measure is the harmonic mean of precision and recall. Since this study is mainly interested in cases that involve the regulation (positive cases) but are not classified as true because of a lack of keywords or the lack of agreement on referencing, recall is in this case more important than precision. A measure that weighs recall higher than precision is the F2-measure, which was calculated for each experiment using the following formula:

F2 = 5 * (precision * recall) / (4 * precision + recall)

Besides the calculated accuracy, precision, recall, F1 and F2 values, a list of false positives and false negatives was created in the Appendices for each experiment. False positives are cases classified as true while the gold standard classified them as false because of the lack of keywords. It is possible that these cases do involve the regulation, and further research could focus on these lists.
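All of these measures are available off the shelf; a sketch with scikit-learn, using a fitted model from the sketch above:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, fbeta_score

y_pred = model.predict(X_test)
print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
print("F1       :", fbeta_score(y_test, y_pred, beta=1))
print("F2       :", fbeta_score(y_test, y_pred, beta=2))  # weighs recall higher
```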

4.2 Results

Tables 3, 4 and 5 show that the mean over the 10 samples indicates the highest accuracy for random forest (0.9450) and almost equal accuracies for decision trees (0.9053) and gradient boosting trees (0.9059). For clarity, the bottom lines (the mean values) of all three tables were copied and put together in table 6. From this table it becomes clear that random forest scores best on recall, precision, F1-score and F2-score and is therefore the best on all performance measures. Decision trees comes second almost everywhere, except on accuracy and the F2-measures, where gradient boosting performs better and decision trees comes third.

Sample  Recall-  Recall+  Precision-  Precision+  F1-     F1+     F2-     F2+     Accuracy
1       0.885    0.923    0.926       0.881       0.905   0.901   0.337   0.561   0.903
2       0.901    0.972    0.971       0.904       0.935   0.936   0.957   0.914   0.935
3       0.927    0.832    0.850       0.918       0.887   0.873   0.848   0.911   0.880
4       0.935    0.917    0.918       0.935       0.927   0.926   0.921   0.932   0.926
5       0.873    0.907    0.906       0.876       0.889   0.890   0.900   0.879   0.889
6       0.946    0.857    0.876       0.938       0.910   0.896   0.872   0.931   0.903
7       0.904    0.920    0.913       0.912       0.908   0.916   0.919   0.906   0.912
8       0.891    0.860    0.867       0.885       0.879   0.872   0.865   0.886   0.876
9       0.929    0.941    0.929       0.941       0.929   0.941   0.933   0.958   0.935
10      0.876    0.913    0.917       0.872       0.896   0.892   0.905   0.884   0.894
Mean    0.9067   0.9042   0.9073      0.9062      0.9065  0.9043  0.8456  0.8761  0.9053

Table 3: Results of classifying different draws of negative samples against all (360) positive samples, using decision trees, where + means a case involving the regulation and - means not.


Sample  Recall-  Recall+  Precision-  Precision+  F1-     F1+     F2-     F2+     Accuracy
1       0.903    0.923    0.927       0.897       0.915   0.910   0.918   0.907   0.912
2       0.865    0.962    0.960       0.872       0.910   0.915   0.943   0.882   0.912
3       0.918    0.841    0.856       0.909       0.886   0.874   0.854   0.905   0.880
4       0.926    0.917    0.917       0.926       0.922   0.922   0.919   0.924   0.922
5       0.909    0.925    0.926       0.908       0.917   0.917   0.922   0.912   0.917
6       0.893    0.905    0.909       0.888       0.900   0.896   0.901   0.896   0.899
7       0.913    0.876    0.872       0.917       0.892   0.896   0.884   0.905   0.884
8       0.864    0.879    0.880       0.862       0.872   0.870   0.875   0.867   0.871
9       0.939    0.898    0.886       0.946       0.912   0.922   0.908   0.928   0.917
10      0.912    0.913    0.920       0.905       0.916   0.909   0.912   0.913   0.912
Mean    0.9042   0.9039   0.9053      0.9030      0.9042  0.9031  0.9035  0.9040  0.9059

Table 4: Results of classifying different draws of negative samples against all (360) positive samples, using gradient boosting trees, where + means a case involving the regulation and - means not.

Sample  Recall-  Recall+  Precision-  Precision+  F1-     F1+     F2-     F2+     Accuracy
1       0.912    0.942    0.945       0.907       0.928   0.925   0.935   0.918   0.926
2       0.964    0.962    0.964       0.962       0.964   0.962   0.962   0.964   0.963
3       0.936    0.944    0.945       0.935       0.941   0.940   0.942   0.938   0.940
4       0.963    0.954    0.954       0.963       0.959   0.959   0.956   0.961   0.959
5       0.927    0.953    0.953       0.927       0.940   0.940   0.948   0.932   0.940
6       0.920    0.971    0.972       0.919       0.945   0.944   0.960   0.930   0.945
7       0.904    0.965    0.959       0.916       0.931   0.940   0.954   0.914   0.935
8       0.945    0.925    0.929       0.943       0.937   0.934   0.929   0.942   0.935
9       0.929    0.941    0.929       0.941       0.929   0.941   0.941   0.929   0.935
10      0.965    0.981    0.982       0.962       0.973   0.971   0.977   0.968   0.972
Mean    0.9365   0.9537   0.9532      0.9375      0.9447  0.9456  0.9505  0.9397  0.9450

Table 5: Results of classifying different draws of negative samples against all (360) positive samples, using random forest, where + means a case involving the regulation and - means not.

Algorithm          Recall-  Recall+  Precision-  Precision+  F1-     F1+     F2-     F2+     Accuracy
Decision trees     0.9067   0.9042   0.9073      0.9062      0.9065  0.9043  0.8456  0.8761  0.9053
Gradient boosting  0.9042   0.9039   0.9053      0.9030      0.9042  0.9031  0.9035  0.9040  0.9059
Random forest      0.9365   0.9537   0.9532      0.9375      0.9447  0.9456  0.9505  0.9397  0.9450

Table 6: Overview of the average over 10 samples of multiple scoring methods using different algorithms, where + means a case involving the regulation and - means not.

4.3 Grey area

This study tries to answer the question to what extent it is possible to make a supervised classification system to assist this kind of research into a newly developing field of law after a redesign of a legal framework. The results of the first classifier were presented in the previous section. However, as explained in the evaluation section, the data is scarce and it is not certain that the initial labelling was indeed the right labelling. Therefore, the false positives and false negatives of both classifiers (see Appendices) might be more than just random mistakes by the classifier. False positives, for example, might indicate cases that do involve the regulation but were classified as false because of their lack of keywords. Further investigation of this grey area is therefore needed.

This could be the result of missing essential keywords; in other words, the list of keywords used for the labelling may not have been sufficient. In section 4.1.2 a few grey keywords were already given. To decide whether or not these 'keywords' should be added to the list of keywords, a legal expert from Brussels was asked for his expertise.

Term            Label true  Label false
eg-verordening  13          17
eex-verdrag     24           8
i-bis           28           1
i-verordening    9           2
i-vo             6           3
ii-vo           25           1
ibis            24           2
brussel-ii-bis   1           1
Total          130          35

Table 7: 'Grey terms' that were found as features but needed extra attention, with the frequency of appearance in true and false labelled data from the first classifier.

Mister de Rooij [15] went over the keywords and concluded the following. The term eex-verdrag stands for Europees Executieverdrag, the predecessor of the old regulation 44/2001. It was created in 1968 and has been rectified multiple times. This treaty was in use in parallel with European law; the two were then merged in the Brussels I Regulation (44/2001). For example, case ECLI:NL:RBAMS:2007:BA1277, which is classified as false but does contain the keyword eex-verdrag, involves an American company with flashlights. It is very similar to positive cases, which can now be explained because the eex-verdrag covered similar cases; as from March 1, 2002, such cases fell under Brussels I.


The term eg-verordening can refer to the Brussels I Regulation, but this is not always the case. As an example, case ECLI:NL:RBAMS:2017:298, regarding toys from Canada, does contain 'eg-verordening' but has nothing to do with Brussels I.

The cases containing the terms i-verordening, i-vo and ii-vo that were classified as false applied to the Rome I-verordening and Rome II-verordening and therefore do not involve Brussels I.

Ibis as a grey term involved the Ibis hotel chain and therefore has nothing to do with Brussels I.

Brussels-II-Bis, at last, is similar in characters but involves different cases. The Brussels-II-Bis Regulation (2201/2003) and its predecessor Brussels-II (1347/2000) concern international divorces, parental authority and child abduction. This regulation and its recast share the characteristic of international unification with Brussels I and its recast. However, since the content of the Brussels II cases is completely different, these cases do not overlap with Brussels I.

Mister de Rooij also brought up the Lugano Convention from 1988 and 2007 (also called EVEX) and stated that this convention is similar to Brussels I, so he would not be surprised to see a few of these cases in the list of false positives.

Finally, mister de Rooij expressed an interest in applying the developed classifier to unseen and unclassified decisions from the databases of unpublished decisions (unpublished because they are deemed of no interest) held by the courts themselves. In the context of this thesis, access to this corpus certainly cannot be realized in time (if at all), but for the project this is certainly useful. In conclusion, these terms should not be added to the list of keywords, because they do not strictly indicate cases of Brussels I, and the research could be extended by adding the unpublished cases from the private collections of the courts.

5 Classifier 2

5.1 Method

5.1.1 Data

For the second classifier the goal was to reliably distinguish old cases from new cases. So all the previous negative examples were excluded and only the 360 positive examples were included. The keywords that were used in the first classifier were used again, but now split into the Brussels I Regulation and the Brussels I Regulation recast (as originally instructed by mister de Rooij) as follows:

• Brussels I Regulation: EEX-Vo; EG-Executieverordening; EEX-Verordening; Brussel I-Verordening; Brussel I; 44/2001

• Brussels I Regulation recast: EEX-Vo II; Brussel Ibis; Brussel I-bis; EU-executieverordening; Brussel I bis-Verordening; EEX-Verordening II; Brussel 1 bis-Vo; Brussel 1 bis; herschikte EEX-Vo; 1215/2012

In MongoDB new instances were then created, indicating whether cases were labelled old or new, where it was possible to be both. This labelling could be done because the keywords were already in MongoDB as the instance 'term'. The labelling resulted in 310 old cases and 124 new cases, of which 74 were labelled both old and new. Table 8 shows the frequency of each keyword in both old and new cases.

Keyword                     Frequency in old  Frequency in new
1215/2012                    65               103
44/2001                      72                18
Brussel I                   134                18
Brussel I bis                 0                 0
Brussel I Bis-Verordening     2                 8
Brussel I bis-Vo              0                 0
Brussel Ibis                  6                12
Brussel I-bis                 2                 3
Brussel I-Verordening         7                 1
EEX-Verordening             123                37
EEX-Verordening II            2                 7
EEX-Vo                       71                29
EEX-Vo II                     0                 0
EG-Executieverordening        1                 0
EU-Executieverordening        1                 1
Herschikte EEX-Vo            12                23
Total                       498               260

Table 8: The number of documents each keyword appears in, divided into old cases (44/2001) and new cases (1215/2012), where overlap is possible.
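A minimal sketch of this old/new labelling, reusing the matched terms stored in MongoDB. The set contents follow the two sublists above; the function name is illustrative:

```python
OLD_KEYWORDS = {"EEX-Vo", "EG-Executieverordening", "EEX-Verordening",
                "Brussel I-Verordening", "Brussel I", "44/2001"}
NEW_KEYWORDS = {"EEX-Vo II", "Brussel Ibis", "Brussel I-bis", "EU-executieverordening",
                "Brussel I bis-Verordening", "EEX-Verordening II", "Brussel 1 bis-Vo",
                "Brussel 1 bis", "Herschikte EEX-Vo", "1215/2012"}

def label_old_new(terms):
    """terms: the keyword matches of one case; a case may be labelled both."""
    is_old = any(t in OLD_KEYWORDS for t in terms)
    is_new = any(t in NEW_KEYWORDS for t in terms)
    return is_old, is_new
```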

5.1.2 Pre-processing

For the pre-processing of the second classifier the same steps were taken as for the first classifier, namely:


1. For each document, delete:
   - terms consisting of the punctuation characters !#$%()*+,./:;<>=?@^`|[]
   - terms consisting only of numbers
   - terms consisting of fewer than 4 characters
   - terms that equal words in the stop list (see figure 5 in the Appendices)
   - terms that occur in less than 1% of the documents
   - terms that occur in more than 95% of the documents

2. For each term in each document:
   - convert all characters to lowercase
   - apply the Snowball stemmer for the Dutch language

5.1.3 Experimental setup

The classification problem was split up into two parts: old against not-old and new against not-new. By doing so, the classification was still binary, just like the first classification problem. Whereas the first classification problem had 50% positive and 50% negative examples, so that the baseline could be set at 50%, this was different in the second classification problem. For the old cases the baseline was 310/360 * 100% = 86.1% and for the new cases the baseline was (360 - 124)/360 * 100% = 65.6%. Again, the data was split into train and test data using a 70/30 ratio. Because of the small number of new cases, the 80/20 ratio was also used for that part. The same machine learning algorithms were used as in the first classifier: decision trees, gradient boosting trees and random forest for both classification parts; further explanation of these algorithms can be found in section 4.1.3. For the gradient boosting trees, multiple settings for the maximum tree depth were also tried this time. In the first classifier this hardly changed the accuracy, but in this classifier it did, as can be seen in the following sections.
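A sketch of varying the maximum tree depth, again with scikit-learn standing in for the KNIME gradient boosting node and reusing the train/test variables from the earlier sketches:

```python
from sklearn.ensemble import GradientBoostingClassifier

for depth in (4, 10):  # the two threshold settings compared in the results
    gb = GradientBoostingClassifier(n_estimators=100, max_depth=depth)
    gb.fit(X_train, y_train)
    print(f"max tree depth {depth}: accuracy {gb.score(X_test, y_test):.3f}")
```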

5.1.4 Evaluation

The evaluation of the results of the two sub-parts of the second classifier was done by calculating recall, precision, F1-measure and F2-measure for both the positive and negative examples, and the overall accuracy. The results of the classification of the old and new cases can be found in tables 9 and 10 respectively. Again, it should be noted that the labelling obtained in the pre-processing step is not necessarily the right labelling, and cases that are classified wrongly are interesting. Lists of false positives and false negatives can again be found in the Appendices (tables 15 to 20).

5.2 Results

From table 9 it can be concluded that the baseline of 86.1% was never reached by the machine learning algorithms. The highest accuracy was obtained using gradient boosting trees with the maximum tree depth threshold set to 4 (0.853). Looking at the F1-values, this algorithm also outperforms the rest. The F2-values, however, show a difference. For the negative cases gradient boosting with threshold 10 scores best, but for the positive (meaning old) cases gradient boosting with threshold 10 scores worst and gradient boosting with threshold 4 scores best. Random forest scores worst on the negative cases for both the F1-value and the F2-value, although this is not the case for the positive cases.

Algorithm  Recall-  Recall+  Precision-  Precision+  F1-    F1+    F2-    F2+    Accuracy
DT         0.273    0.977    0.750       0.842       0.400  0.904  0.313  0.947  0.835
GB max 10  0.364    0.897    0.471       0.843       0.410  0.872  0.381  0.886  0.789
GB max 4   0.273    1        1           0.845       0.429  0.916  0.319  0.965  0.853
RF         0.091    0.977    0.500       0.810       0.154  0.885  0.109  0.938  0.798

Table 9: Results of classifying old cases against not-old cases, using decision trees (DT), gradient boosting trees (GB) and random forest (RF), where + means an old case and - means a not-old case.

From table 10 it can be concluded that, by accuracy, gradient boosting trees with a maximum tree depth threshold of 4 was again best, now using the 80/20 ratio (0.778). Only for random forest was the accuracy lower with the 80/20 ratio than with 70/30. All the algorithms reached the baseline of 65.6%. Although random forest (80/20) obtained the highest recall on the negative examples, it got the lowest recall on the positive examples. Looking at the F1-values (80/20), gradient boosting with threshold 4 scores best on both positive and negative examples. The same holds for the F2-values and, as stated before, for the accuracy.

Ratio  Algorithm  Recall-  Recall+  Precision-  Precision+  F1-    F1+    F2-    F2+    Accuracy
70/30  DT         0.828    0.556    0.726       0.694       0.774  0.617  0.579  0.811  0.716
70/30  GB max 10  0.797    0.622    0.750       0.683       0.773  0.651  0.633  0.787  0.725
70/30  GB max 4   0.906    0.578    0.753       0.812       0.823  0.675  0.613  0.871  0.771
70/30  RF         0.906    0.444    0.699       0.769       0.789  0.563  0.485  0.855  0.716
80/20  DT         0.786    0.667    0.767       0.690       0.776  0.678  0.671  0.782  0.736
80/20  GB max 4   0.857    0.667    0.783       0.769       0.818  0.714  0.685  0.841  0.778
80/20  RF         0.881    0.400    0.673       0.706       0.763  0.511  0.438  0.830  0.681

Table 10: Results of classifying new cases against not-new cases, using decision trees (DT), gradient boosting trees (GB) and random forest (RF), where + means a new case and - means a not-new case.

Finally, it was decided to make two wordclouds, one from the old and one from the new cases, to see if there were big differences there. To make these, it was decided not to use stemming, in order to retrieve the complete words instead of the stemmed words, and to include only words longer than 4 characters. A wordcloud is a method to visually show the importance of terms based on their relative term frequency. Many legal terms occur in the wordclouds, as do some countries and words like 'staat' and 'international'. There are also some dates, months and places of court visible in the wordclouds that stood out for the system.

Figure 1: Wordclouds of old cases (blue) and new cases (green).
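A minimal sketch of how such a wordcloud can be generated with the third-party wordcloud package; old_case_text is assumed to be the concatenated text of the old cases:

```python
from wordcloud import WordCloud

# keep only words longer than 4 characters, without stemming, as described above
text = " ".join(w for w in old_case_text.split() if len(w) > 4)
WordCloud(background_color="white").generate(text).to_file("wordcloud_old.png")
```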

6 Discussion

In this thesis a new method was suggested for analyzing a newly developing field of law after a redesign of a legal framework, using a supervised binary classification system. It was notable that, of the different algorithms tried, the ones with the best accuracy were all based on trees; apparently this type of algorithm works well on this sort of data. There were two main obstacles though, scarce data and doubtful labelling, which will now be discussed.

Scarce data

With only 124 cases referring to the 1215/2012 regulation, the second classifier did not perform well on accuracy. The number of false positives and false negatives was high, and further research is needed to find out what these wrongly classified cases say about the data. Are they counterexamples due to too little data, or was the initial labelling incorrect?

Since only 15,000 cases were examined, scaling up this research is recommended. Because of the 'black box' character of rechtspraak.nl, it might be possible that not all the cases involving the regulation were retrieved, even though the code was written in such a way that it would start with the newest. And although cases from 1999 were found in the dataset, it cannot be said for sure that this means it contained all cases up to 2000.

Because of limitations in computing power, it was decided to use only 360 negative cases in the first classifier. However, choosing 360 out of 14,640 can be done in many ways, and even though this was done 10 times to average the results, chance still played a big role in selecting the negative cases.

To draw conclusions for the entire European Union, it is necessary to scale the research up to other languages. Michiel de Rooij made the list of keywords for multiple countries, so in principle this should not differ much from this research. Besides delivering more data, other countries might also give more insight into the referencing style of the Netherlands: maybe the clerks of other countries use only 2 keywords, in contrast to the Netherlands, which uses over 15. However, it is not certain that the other Member States use the same format as rechtspraak.nl, with XML files, and that might limit the possibility to scale this research up to other countries.

Michiel de Rooij stated that it might be possible to get access to the unpublished cases held by the courts themselves, for example on the Parnassusweg in Amsterdam. Since he could not tell the format of these cases, it cannot be stated that these cases could also be used to scale this research and to enlarge the data set. It is unknown how many cases are located there, and the computational limitations should be kept in mind; KNIME in particular is not suited to extremely large data for text analysis.

Doubtful labelling

The performance of the classification systems was measured by the resemblance between the label the system decided on per case after the learning process and the original label from before the learning process. When these were not identical, this could have two reasons: the system decided on the wrong label after the learning process, or the original label was wrong. Since the original labelling in this thesis was based on a list of keywords that clerks are not obliged to use in the judgments, this kind of labelling must be called doubtful. False negatives especially are often due to unreliable labels, which could be an explanation for the large lists of false negatives found in the Appendices. On the other hand, it is possible that the learning process simply was not sufficient, which can be due to the scarce amount of data.

In sum, the problems of scarce data and doubtful labelling could be the reason for the sometimes poor results. To scale up this research to obtain more data and more reliable results, further investigation is needed into the type of data in other countries and the courts' unpublished cases, where computational and time limitations should be kept in mind and tree-based algorithms perform well.

7 Conclusions and future work

The objective of this thesis was to answer the following research question: To what extent is it possible to design a supervised classification system that uses judgments of civil law cases from rechtspraak.nl to distinguish:

1. cases about Brussels I Regulation from all the other civil law cases?

2. cases from the Brussels I Regulation Recast from the Brussels I Regulation?

This was done by labelling 15,000 civil law judgments from rechtspraak.nl based on the occurrence of a list of keywords provided by a legal expert. The text of the judgments was pre-processed such that around 6000 terms were used as features for the classification task. Within these 6000 terms there were a few terms similar to the keywords. With the help of Michiel de Rooij the conclusion was drawn not to include these terms in the list of keywords, and not to include the cases containing these terms in either the train or the test set. The first classification contained 360 positive cases and 14,640 negative cases. Ten times a sample of 360 negative cases was drawn to accomplish a baseline of 50%. The random forest algorithm scored best on accuracy (0.9450), precision, recall, F1-measure and F2-measure when predicting 30% new data (test data), leaving the decision trees and gradient boosting trees algorithms behind.

For the second classifier the negative cases were excluded and the 360 positive cases were split into the 44/2001 (310) and the recast 1215/2012 (124) cases, with an overlap of 74 cases. An experiment was done to distinguish the old cases from the not-old cases, and the best result on all performance measures mentioned earlier was obtained with the gradient boosting algorithm with the maximum tree depth set to 4. This accuracy of 0.853 was, however, not higher than the baseline of 0.861.

The new cases were again treated as a binary problem with the two classes new and not-new. This time the 80/20 ratio for train and test data was also used, which gave overall better results than the 70/30 ratio. Again the gradient boosting tree algorithm with threshold 4 resulted in the highest accuracy (0.778), which was higher than the baseline of 0.656. It also scores best on both the F1- and the F2-measure.

In the wordclouds of the old regulation and the recast many legal terms, words about international contact and countries were found.


The study had to deal with scarce data and doubtful labels. The first problem can be solved partially (on the condition that the data is in the same format) by using cases from other Member States, by using the private collections of the courts or by enlarging the number of observed Dutch cases.

Taking all this into account, the research question can be answered as follows. Classifying Brussels I Regulation cases against other civil law cases can be done with an accuracy of 0.9450, but with computational limitations. Distinguishing the Brussels I Regulation from the Brussels I Regulation Recast is harder, due to the small amount of data, and mistakes are made more commonly. The grey area of wrongly classified cases was discussed with legal expert Michiel de Rooij from the T.M.C. Asser Instituut, and he recommended more data to improve the obtained results. All wrongly classified ECLIs can be found in the Appendices.

As mentioned in the discussion, further research could investigate this grey area more, to conclude whether the labelling should be improved and/or the data should be scaled up in any of the ways mentioned before. This would contribute to more information about the usage of the regulation and change of law in general.

Appendices

A Tools

Figure 2: Example of a positive case in MongoDB with all instances.

Figure 3: Overview of the KNIME workflow used for both classification problems.


Figure 4: Overview of the KNIME pre-processing workflow used for both classification problems.

B Stopwords

aan aangaande aangezien achte achter achterna af afgelopen al aldaar aldus alhoewel alias alle allebei alleen alles als alsnog altijd altoos ander andere anders anderszins beetje behalve behoudens beide beiden ben beneden bent bepaald betreffende bij bijna bijv binnen binnenin blijkbaar blijken boven bovenal bovendien bovengenoemd bovenstaand bovenvermeld buiten bv daar daardoor daarheen daarin daarna daarnet daarom daarop daaruit daarvanlangs dan dat de deden deed der derde derhalve dertig deze dhr die dikwijls dit doch doe doen doet door doorgaand drie duizend dus echter een eens eer eerdat eerder eerlang eerst eerste eigen eigenlijk elk elke en enig enige enigszins enkel er erdoor erg ergens etc etcetera even eveneens evenwel gauw ge gedurende geen gehad gekund geleden gelijk gemoeten gemogen genoeg geweest gewoon gewoonweg haar haarzelf had hadden hare heb hebben hebt hedden heeft heel hem hemzelf hen het hetzelfde hier hierbeneden hierboven hierin hierna hierom hij hijzelf hoe hoewel honderd hun hunne ieder iedere iedereen iemand iets ik ikzelf in inderdaad inmiddels intussen inzake is ja je jezelf jij jijzelf jou jouw jouwe juist jullie kan klaar kon konden krachtens kun kunnen kunt laatst later liever lijken lijkt maak maakt maakte maakten maar mag maken me meer meest meestal men met mevr mezelf mij mijn mijnent mijner mijzelf minder miss misschien missen mits mocht mochten moest moesten moet moeten mogen mr mrs mw na naar nadat nam namelijk nee neem negen nemen nergens net niemand niet niets niks noch


nochtans nog nogal nooit nu nv of ofschoon om omdat omhoog omlaag omstreeks omtrent omver ondanks onder ondertussen ongeveer ons onszelf onze onzeker ooit ook op opnieuw opzij over overal overeind overige overigens paar pas per precies recent redelijk reeds rond rondom samen sedert sinds sindsdien slechts sommige spoedig steeds tamelijk te tegen tegenover tenzij terwijl thans tien tiende tijdens tja toch toe toen toenmaals toenmalig tot totdat tussen twee tweede u uit uitgezonderd uw vaak vaakwat van vanaf vandaan vanuit vanwege veel veeleer veertig verder verscheidene verschillende vervolgens via vier vierde vijf vijfde vijftig vol volgend volgens voor vooraf vooral vooralsnog voorbij voordat voordezen voordien voorheen voorop voorts vooruit vrij vroeg waar waarom waarschijnlijk wanneer want waren was wat we wederom weer weg wegens weinig wel weldra welk welke werd werden werder wezen whatever wie wiens wier wij wijzelf wil wilden willen word worden wordt zal ze zei zeker zelf zelfde zelfs zes zeven zich zichzelf zij zijn zijne zijzelf zo zoals zodat zodra zonder zou zouden zowat zulk zulke zullen zult true false

C Early experiments classifier 1

Algorithm and parameters                                  split 1  split 2  split 3  mean
Decision tree                                             0.908    0.912    0.889    0.903
Gradient boosting, 100 models, no treedepth threshold     0.876    0.917    -        0.897
Gradient boosting, 100 models, treedepth threshold = 2    0.94     0.926    -        0.933
Gradient boosting, 100 models, treedepth threshold = 4    0.935    -        -        0.935
Gradient boosting, 100 models, treedepth threshold = 10   0.876    0.922    -        0.899
Naive Bayes, maximum 2, p = 0.00                          0.521    0.498    0.488    0.502
Naive Bayes, maximum 2, p = 1/5659                        0.544    0.516    0.502    0.521
K-nearest neighbour, k = 5, without weighting             0.816    0.71     0.737    0.754
K-nearest neighbour, k = 5, with weighting                0.797    0.751    -        0.774
K-nearest neighbour, k = 9, without weighting             0.793    0.751    -        0.772
K-nearest neighbour, k = 9, with weighting                0.788    0.77     -        0.779
K-nearest neighbour, k = 3, without weighting             0.811    0.751    -        0.781
K-nearest neighbour, k = 3, with weighting                0.793    0.751    -        0.772

Table 11: Accuracy of multiple machine learning algorithms, all with words occurring in less than 1% or more than 95% of the documents deleted, and a train-test ratio of 70%-30%.

D Grey Keywords

ECLI (columns: eg-verordening, i-bis, i-verordening, i-vo, ibis, ii-vo, eex-verdrag, brussel-ii)
ECLI:NL:CRVB:2004:AO3131
ECLI:NL:GHAMS:2014:5826  1
ECLI:NL:GHARN:2012:BX0484  1
ECLI:NL:GHDHA:2016:3305  1
ECLI:NL:GHSHE:2016:5305  1
ECLI:NL:GHSHE:2016:5728  1
ECLI:NL:HR:2004:AO0903  1
ECLI:NL:HR:2006:AU5704  1
ECLI:NL:HR:2006:AV0650  1
ECLI:NL:OGEAA:2017:47  1
ECLI:NL:OGEAC:2017:28  1
ECLI:NL:PHR:2006:AU5704  1
ECLI:NL:PHR:2006:AV0650  1
ECLI:NL:PHR:2006:AY7918  1
ECLI:NL:PHR:2007:AZ3534  1
ECLI:NL:RBAMS:2007:BA1277  1
ECLI:NL:RBAMS:2016:6282  1
ECLI:NL:RBAMS:2017:2841  1
ECLI:NL:RBAMS:2017:298  1
ECLI:NL:RBDHA:2016:13741  1
ECLI:NL:RBDHA:2016:15834  1
ECLI:NL:RBDHA:2016:8293  1
ECLI:NL:RBDHA:2017:1087  1
ECLI:NL:RBDHA:2017:1659  1
ECLI:NL:RBDHA:2017:3668  1
ECLI:NL:RBDHA:2017:4382  1
ECLI:NL:RBDHA:2017:4392  1
ECLI:NL:RBDHA:2017:907  1
ECLI:NL:RBDHA:2017:921  1
ECLI:NL:RBLIM:2017:2921  1 1
ECLI:NL:RBMNE:2016:6315  1
ECLI:NL:RBROT:2016:4923  1
ECLI:NL:RBROT:2016:8570  1
ECLI:NL:RBROT:2017:714  1
Total  16 1 2 3 2 1 8 1

E Classifier 1

ECLI classified as false positives (columns: random forest, gradient boosting, decision trees, number of draws, total)
ECLI:NL:GHAMS:2014:2274  1 1 1
ECLI:NL:GHAMS:2015:1519  1 1 1
ECLI:NL:GHAMS:2015:2887  1 1 1
ECLI:NL:GHAMS:2015:3003  1 1 1
ECLI:NL:GHAMS:2015:4034  1 1 1 1 3
ECLI:NL:GHAMS:2015:5443  1 1 1
ECLI:NL:GHAMS:2016:1403  1 1 1 1 3
ECLI:NL:GHAMS:2016:1548  1 1 1
ECLI:NL:GHAMS:2016:1730  1 2 1 2 4
ECLI:NL:GHAMS:2016:3586  1 1 1 2
ECLI:NL:GHAMS:2016:3739  1 1 1
ECLI:NL:GHAMS:2016:3911  1 1 1 1 3
ECLI:NL:GHAMS:2016:4601  1 1 1
ECLI:NL:GHAMS:2016:5118  1 1 1
ECLI:NL:GHAMS:2016:5229  1 1 1
ECLI:NL:GHAMS:2016:5362  1 1 1 1 3
ECLI:NL:GHAMS:2017:1048  1 1 1 2
ECLI:NL:GHAMS:2017:31  1 1 1
ECLI:NL:GHAMS:2017:464  1 1 1 2 3
ECLI:NL:GHAMS:2017:92  1 1 1
ECLI:NL:GHAMS:2017:927  1 1 1
ECLI:NL:GHARL:2014:10047  1 1 1 2
ECLI:NL:GHARL:2014:4340  1 1 1
ECLI:NL:GHARL:2015:3059  1 1 1
ECLI:NL:GHARL:2015:5207  2 2 2
ECLI:NL:GHARL:2016:10234  1 1 1 2
ECLI:NL:GHARL:2016:4449  1 1 1 1 3
ECLI:NL:GHARL:2016:6966  1 1 1
ECLI:NL:GHARL:2016:9294  1 1 1 2
ECLI:NL:GHARL:2017:324  1 1 1
ECLI:NL:GHARL:2017:3246  1 1 1
ECLI:NL:GHARL:2017:609  1 1 1
ECLI:NL:GHARL:2017:910  1 1 1 2
ECLI:NL:GHARN:2010:BN0249  1 1 1
ECLI:NL:GHARN:2010:BO1073  1 1 1 2
ECLI:NL:GHARN:2011:BQ0496  1 1 1
ECLI:NL:GHDHA:2014:4490  1 1 1
ECLI:NL:GHDHA:2015:1032  1 1 1


ECLI:NL:GHDHA:2015:2745  1 1 1 2
ECLI:NL:GHDHA:2015:3990  1 1 1
ECLI:NL:GHDHA:2016:162  1 1 1
ECLI:NL:GHDHA:2016:1844  1 1 1 2
ECLI:NL:GHDHA:2016:2137  1 1 1 2 3
ECLI:NL:GHDHA:2016:2539  1 1 1 2
ECLI:NL:GHDHA:2016:3101  1 1 1 2
ECLI:NL:GHDHA:2016:3351  1 1 1 2
ECLI:NL:GHDHA:2016:3386  1 1 1
ECLI:NL:GHDHA:2016:4127  1 1 1 2
ECLI:NL:GHDHA:2016:876  1 1 1 1 3
ECLI:NL:GHDHA:2017:121  1 1 1 2
ECLI:NL:GHDHA:2017:1359  1 1 1
ECLI:NL:GHDHA:2017:470  1 1 1
ECLI:NL:GHDHA:2017:798  1 1 1 1 3
ECLI:NL:GHSGR:2009:BL2811  1 1 1 2
ECLI:NL:GHSHE:2006:AZ0868  1 1 1
ECLI:NL:GHSHE:2007:BB9559  1 1 1
ECLI:NL:GHSHE:2008:BF5205  1 1 1 2
ECLI:NL:GHSHE:2011:BP9911  1 1 1
ECLI:NL:GHSHE:2012:BY6964  1 1 1
ECLI:NL:GHSHE:2015:2079  1 1 1 2
ECLI:NL:GHSHE:2015:4428  1 1 1
ECLI:NL:GHSHE:2015:601  1 1 1 2
ECLI:NL:GHSHE:2016:4394  1 1 1
ECLI:NL:GHSHE:2016:4487  1 1 1
ECLI:NL:GHSHE:2016:4550  1 1 1
ECLI:NL:GHSHE:2016:856  1 1 1
ECLI:NL:GHSHE:2017:1114  1 1 1
ECLI:NL:GHSHE:2017:1549  1 1 1 1 3
ECLI:NL:GHSHE:2017:1550  1 1 1
ECLI:NL:GHSHE:2017:208  1 1 1
ECLI:NL:GHSHE:2017:939  1 1 1
ECLI:NL:HR:2003:AF4610  1 1 1
ECLI:NL:HR:2007:AY9707  1 1 1 1 3
ECLI:NL:HR:2007:AZ4061  1 1 1 1 3
ECLI:NL:HR:2013:BZ3670  1 1 1
ECLI:NL:HR:2016:2988  1 1 1
ECLI:NL:OGEAA:2016:746  1 1 1
ECLI:NL:OGEAA:2016:879  1 1 1


ECLI:NL:PHR:2004:AO9077  1 1 1
ECLI:NL:PHR:2005:AU4784  1 1 1 2
ECLI:NL:PHR:2006:AU6093  1 1 1 2
ECLI:NL:PHR:2006:AV0050  1 1 1
ECLI:NL:PHR:2006:AV0624  1 1 1
ECLI:NL:PHR:2006:AZ2721  1 1 1 2
ECLI:NL:PHR:2009:BG9951  2 2 2
ECLI:NL:PHR:2009:BH2822  1 1 1
ECLI:NL:PHR:2011:BP4454  1 1 1
ECLI:NL:PHR:2011:BQ1684  1 1 1 1 3
ECLI:NL:PHR:2011:BQ1696  1 1 1 1 3
ECLI:NL:PHR:2012:BX9020  1 1 1
ECLI:NL:PHR:2013:BY7841  1 1 1 2
ECLI:NL:PHR:2014:1856  1 1 1 1 3
ECLI:NL:PHR:2016:1004  1 1 1 2
ECLI:NL:PHR:2016:1193  1 1 1
ECLI:NL:PHR:2016:1346  1 1 1 2
ECLI:NL:PHR:2016:1434  1 1 1
ECLI:NL:PHR:2016:536  1 1 1
ECLI:NL:PHR:2016:920  1 1 1
ECLI:NL:PHR:2016:941  1 1 1 2
ECLI:NL:PHR:2016:987  1 1 1
ECLI:NL:PHR:2017:190  1 1 1
ECLI:NL:PHR:2017:202  1 1 1 2
ECLI:NL:RBAMS:2007:BA3914  1 1 1
ECLI:NL:RBAMS:2016:1019  1 1 1 1 3
ECLI:NL:RBAMS:2016:2119  1 1 1 1 3
ECLI:NL:RBAMS:2016:3060  1 1 1 1 3
ECLI:NL:RBAMS:2016:6888  1 1 1 1 3
ECLI:NL:RBAMS:2016:8449  1 1 1
ECLI:NL:RBAMS:2017:1067  1 1 1
ECLI:NL:RBAMS:2017:1693  1 1 1 1 3
ECLI:NL:RBAMS:2017:228  1 1 1 1 3
ECLI:NL:RBAMS:2017:3433  1 1 1
ECLI:NL:RBAMS:2017:917  1 1 1
ECLI:NL:RBARN:2010:BN2002  1 1 1 2
ECLI:NL:RBDHA:2013:19462  1 1 1 2
ECLI:NL:RBDHA:2016:15712  1 1 1 1 3
ECLI:NL:RBDHA:2017:1170  1 1 1 1 3
ECLI:NL:RBDHA:2017:2614  1 1 1 2


ECLI:NL:RBDHA:2017:2719  1 1 1
ECLI:NL:RBDHA:2017:3685  1 1 1 2
ECLI:NL:RBDHA:2017:4254  1 1 1
ECLI:NL:RBGEL:2016:286  1 1 1
ECLI:NL:RBGEL:2016:4868  1 1 1
ECLI:NL:RBGEL:2016:6856  1 1 1
ECLI:NL:RBGRO:2012:BW6269  1 1 1
ECLI:NL:RBHAA:2007:BA5466  1 1 1
ECLI:NL:RBLIM:2017:1147  1 1 1 2
ECLI:NL:RBLIM:2017:1419  1 1 1
ECLI:NL:RBLIM:2017:3038  1 1 1
ECLI:NL:RBLIM:2017:3213  1 1 1
ECLI:NL:RBMAA:2005:AT3587  1 1 1 1 3
ECLI:NL:RBMNE:2017:1541  1 1 1
ECLI:NL:RBMNE:2017:1748  1 1 1
ECLI:NL:RBMNE:2017:175  1 1 1
ECLI:NL:RBMNE:2017:2257  1 1 1
ECLI:NL:RBMNE:2017:277  1 1 1 2
ECLI:NL:RBMNE:2017:843  1 1 1 1 3
ECLI:NL:RBNHO:2015:11985  1 1 1
ECLI:NL:RBNHO:2015:3194  1 1 1 2
ECLI:NL:RBNHO:2016:11223  1 1 1
ECLI:NL:RBNHO:2016:1148  1 1 1
ECLI:NL:RBNHO:2017:4185  1 1 1 1 3
ECLI:NL:RBOBR:2016:6603  1 1 1 2
ECLI:NL:RBOVE:2016:5306  1 1 1
ECLI:NL:RBOVE:2017:1457  1 1 1 1 3
ECLI:NL:RBOVE:2017:233  1 1 1 2
ECLI:NL:RBROT:2015:5922  1 1 1 2
ECLI:NL:RBROT:2016:10334  1 1 1 2
ECLI:NL:RBROT:2016:2339  1 1 1 2
ECLI:NL:RBROT:2016:7965  1 1 1 2
ECLI:NL:RBROT:2016:8224  1 1 1 2
ECLI:NL:RBROT:2016:8317  1 1 1
ECLI:NL:RBROT:2016:8654  1 1 1
ECLI:NL:RBROT:2016:9671  1 1 1 2
ECLI:NL:RBROT:2017:1672  1 1 1
ECLI:NL:RBROT:2017:1832  1 1 1 2
ECLI:NL:RBROT:2017:3525  2 2 2 2 6
ECLI:NL:RBROT:2017:646  1 1 1

(32)

ECLI classified as false positives random forest gradien t b o osting decision trees n um b er of dra ws total ”ECLI:NL:RBSGR:2007:AZ9059” 1 1 1 ”ECLI:NL:RBZWB:2014:9448” 1 1 1 2 ”ECLI:NL:RBZWB:2016:1539” 1 1 1 1 3 ”ECLI:NL:RBZWB:2016:4237” 1 1 1 Total 72 100 107 174 279

Table 13: ECLIs from Classifier 1 that were classified as false positives, i.e. classified as involving the regulation while their label was false. The counts are totals over all 10 draws, split by algorithm.


ECLI classified as false negatives (columns: random forest / gradient boosting / decision trees / number of draws / total)

ECLI:NL:GHAMS:2014:5507  1 1 2 2
ECLI:NL:GHAMS:2015:4512  1 1 1
ECLI:NL:GHAMS:2016:4212  1 1 1
ECLI:NL:GHARL:2014:2717  1 1 1
ECLI:NL:GHARL:2015:7355  4 3 4 4 11
ECLI:NL:GHARL:2016:1854  1 1 1
ECLI:NL:GHARL:2016:2327  1 1 1
ECLI:NL:GHARL:2016:9173  1 1 1
ECLI:NL:GHARL:2017:1954  1 1 1
ECLI:NL:GHARL:2017:2325  2 2 2 2 6
ECLI:NL:GHARL:2017:998  3 3 3
ECLI:NL:GHARN:2011:BR3312  1 1 1 2
ECLI:NL:GHDHA:2014:4644  1 1 1
ECLI:NL:GHDHA:2016:2384  1 1 1
ECLI:NL:GHDHA:2016:3755  2 4 5 6
ECLI:NL:GHDHA:2016:4051  1 1 1
ECLI:NL:GHDHA:2016:4284  4 4 4 4 12
ECLI:NL:GHSGR:2003:AH9364  1 1 1
ECLI:NL:GHSGR:2011:BQ5061  1 1 1
ECLI:NL:GHSHE:2013:6299  1 1 1 2
ECLI:NL:GHSHE:2016:467  1 1 1
ECLI:NL:GHSHE:2016:4863  1 1 1
ECLI:NL:GHSHE:2016:652  1 1 2 2
ECLI:NL:GHSHE:2017:1382  1 1 1 2
ECLI:NL:GHSHE:2017:146  4 4 4 4 12
ECLI:NL:GHSHE:2017:1868  2 2 2 4
ECLI:NL:GHSHE:2017:1873  1 1 1
ECLI:NL:GHSHE:2017:2274  1 1 1 2
ECLI:NL:GHSHE:2017:324  1 1 1
ECLI:NL:HR:2006:AU8179  3 2 2 3 7
ECLI:NL:HR:2006:AX5381  4 4 2 4 10
ECLI:NL:HR:2011:BQ0510  1 1 1 1 3
ECLI:NL:HR:2017:408  3 2 3 3 8
ECLI:NL:PHR:2003:AF9714  1 1 2 2
ECLI:NL:PHR:2004:AO0903  2 1 2 3
ECLI:NL:PHR:2012:BV3678  3 2 3 5
ECLI:NL:PHR:2016:1324  1 1 1
ECLI:NL:PHR:2016:1337  2 2 2 4
ECLI:NL:PHR:2017:35  3 4 4 7
ECLI:NL:RBAMS:2016:1938  2 2 2
ECLI:NL:RBAMS:2016:7841  1 1 1 2
ECLI:NL:RBDHA:2014:9323  2 4 4 5 10
ECLI:NL:RBDHA:2016:12523  2 2 2
ECLI:NL:RBDHA:2016:13933  1 1 1
ECLI:NL:RBDHA:2016:14189  1 1 1
ECLI:NL:RBDHA:2016:14326  1 1 2 2
ECLI:NL:RBDHA:2016:14426  1 1 1
ECLI:NL:RBDHA:2017:1025  1 1 1
ECLI:NL:RBDHA:2017:1907  1 1 1
ECLI:NL:RBDHA:2017:5222  1 1 1
ECLI:NL:RBDHA:2017:555  3 3 4 6
ECLI:NL:RBDHA:2017:814  1 1 1
ECLI:NL:RBGEL:2015:2116  2 1 2 3
ECLI:NL:RBGEL:2015:4277  1 1 1
ECLI:NL:RBGEL:2017:1262  1 1 1
ECLI:NL:RBGEL:2017:1517  1 1 1
ECLI:NL:RBLIM:2016:2848  1 1 1
ECLI:NL:RBLIM:2016:9897  1 1 1
ECLI:NL:RBLIM:2017:3003  1 1 1
ECLI:NL:RBLIM:2017:3038  1 1 1
ECLI:NL:RBMNE:2016:1798  1 2 2 2 5
ECLI:NL:RBMNE:2016:2087  1 1 1
ECLI:NL:RBMNE:2016:5346  1 1 2 2
ECLI:NL:RBMNE:2016:6290  1 3 3 4
ECLI:NL:RBMNE:2016:7045  1 1 1 2
ECLI:NL:RBMNE:2017:412  3 1 3 4
ECLI:NL:RBOBR:2013:7463  1 1 1 2
ECLI:NL:RBOBR:2014:1912  1 1 1 2
ECLI:NL:RBOBR:2016:7220  2 2 2 4
ECLI:NL:RBOVE:2015:5822  2 2 2
ECLI:NL:RBOVE:2017:1503  1 1 1
ECLI:NL:RBOVE:2017:2152  1 1 1
ECLI:NL:RBROT:2016:10263  1 1 1
ECLI:NL:RBROT:2016:5863  5 4 5 5 14
ECLI:NL:RBROT:2016:7258  1 2 2 3
ECLI:NL:RBROT:2016:8662  1 2 2 3 5
ECLI:NL:RBROT:2016:8738  3 4 4 7
ECLI:NL:RBROT:2016:9795  4 4 4 8
ECLI:NL:RBROT:2017:1976  1 1 1 2
ECLI:NL:RBROT:2017:3564  3 1 1 4 5
ECLI:NL:RBROT:2017:929  1 1 1 2
ECLI:NL:RBZWB:2013:BZ4984  1 1 1 1 3
ECLI:NL:RBZWB:2016:4351  1 1 1 2
ECLI:NL:RBZWB:2016:7594  1 1 2 2
Total  49 105 103 154 257

Table 14: ECLIs from Classifier 1 that were classified as false negatives, i.e. classified as not involving the regulation while their label was true. The counts are totals over all 10 draws, split by algorithm.
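The tallies in Tables 13 and 14 are aggregated over the 10 random draws and over the three algorithms. As an illustration of how such a tally can be computed, a minimal Python sketch is given below; the pipeline of this thesis itself was built with the tools listed in the References (KNIME [8]), so the names used here (tally_misclassifications, predictions, labels) are purely hypothetical.

    from collections import defaultdict

    def tally_misclassifications(predictions, labels):
        # predictions: dict mapping (draw, algorithm) to {ECLI: predicted label (bool)}
        # labels: dict mapping ECLI to its true label (bool)
        # Returns per-ECLI, per-algorithm counts of false positives and false negatives.
        fp = defaultdict(lambda: defaultdict(int))
        fn = defaultdict(lambda: defaultdict(int))
        for (draw, algorithm), predicted_labels in predictions.items():
            for ecli, predicted in predicted_labels.items():
                actual = labels[ecli]
                if predicted and not actual:    # false positive: predicted true, label false
                    fp[ecli][algorithm] += 1
                elif actual and not predicted:  # false negative: predicted false, label true
                    fn[ecli][algorithm] += 1
        return fp, fn

Summing the per-algorithm counts of fp and fn over all ECLIs would reproduce the column totals of Tables 13 and 14.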


F Classifier 2

ECLI classified as false positives (columns: random forest / gradient boosting / decision trees / total)

ECLI:NL:GHAMS:2016:4156  1 1 1 3
ECLI:NL:GHAMS:2016:5379  1 1 1 3
ECLI:NL:GHARL:2016:2327  1 1 1 3
ECLI:NL:GHARL:2017:2874  1 1 2
ECLI:NL:GHARL:2017:998  1 1 1 3
ECLI:NL:RBDHA:2016:13933  1 1 1 3
ECLI:NL:RBDHA:2017:1025  1 1
ECLI:NL:RBDHA:2017:3174  1 1 2
ECLI:NL:RBDHA:2017:4379  1 1
ECLI:NL:RBGEL:2016:5611  1 1 1 3
ECLI:NL:RBLIM:2017:2843  1 1 1 3
ECLI:NL:RBMNE:2016:7045  1 1 1 3
ECLI:NL:RBNHO:2017:4377  1 1 1 3
ECLI:NL:RBNNE:2016:4935  1 1 1 3
ECLI:NL:RBNNE:2016:5613  1 1 1 3
ECLI:NL:RBROT:2016:5863  1 1 1 3
ECLI:NL:RBROT:2016:7435  1 1 1 3
ECLI:NL:RBROT:2017:1976  1 1
ECLI:NL:RBZWB:2017:1105  1 1 2
ECLI:NL:RBZWB:2017:1400  1 1 1 3
Total  16 15 20 51

Table 15: ECLIs from Classifier 2 - old regulation - that were classified as false positives, i.e. classified as involving the old regulation while their label was false.


ECLI classified as false negatives (columns: random forest / gradient boosting / decision trees / total)

ECLI:NL:GHSHE:2015:3905  1 1
ECLI:NL:RBDHA:2016:12523  1 1
ECLI:NL:RBROT:2016:10090  1 1
ECLI:NL:RBROT:2016:10263  1 1
Total  2 2 4

Table 16: ECLIs from Classifier 2 - old regulation - that were classified as false negatives, i.e. classified as not involving the old regulation while their label was true.


ECLI classified as false positives (columns: random forest / gradient boosting / decision trees / total)

ECLI:NL:GHARL:2016:9863  1 1 2
ECLI:NL:GHSGR:2003:AH9364  1 1
ECLI:NL:GHSHE:2008:BG7778  1 1
ECLI:NL:GHSHE:2015:3905  1 1
ECLI:NL:HR:2006:AX3080  1 1
ECLI:NL:PHR:2006:AU4795  1 1
ECLI:NL:PHR:2006:AX3080  1 1
ECLI:NL:PHR:2012:BV3678  1 1
ECLI:NL:RBAMS:2014:1376  1 1
ECLI:NL:RBARN:2010:BO7637  1 1
ECLI:NL:RBDHA:2016:12523  1 1
ECLI:NL:RBDHA:2017:555  1 1
ECLI:NL:RBGEL:2015:7708  1 1 2
ECLI:NL:RBLIM:2016:7412  1 1 2
ECLI:NL:RBLIM:2016:8473  1 1 2
ECLI:NL:RBROT:2014:7077  1 1 2
ECLI:NL:RBROT:2016:10263  1 1 2
Total  11 6 6 23

Table 17: ECLIs from Classifier 2 - new regulation, 70/30 train/test ratio - that were classified as false positives, i.e. classified as involving the new regulation while their label was false.


ECLI classified as false negatives (columns: random forest / gradient boosting / decision trees / total)

ECLI:NL:GHAMS:2016:4156  1 1 2
ECLI:NL:GHAMS:2016:5379  1 1
ECLI:NL:GHARL:2016:10356  1 1 1 3
ECLI:NL:GHARL:2016:2327  1 1 1 3
ECLI:NL:GHARL:2017:2874  1 1 1 3
ECLI:NL:GHARL:2017:998  1 1 1 3
ECLI:NL:GHSHE:2016:5607  1 1 2
ECLI:NL:RBAMS:2016:5691  1 1 2
ECLI:NL:RBAMS:2016:7750  1 1
ECLI:NL:RBDHA:2017:1025  1 1
ECLI:NL:RBDHA:2017:110  1 1
ECLI:NL:RBDHA:2017:1102  1 1 2
ECLI:NL:RBDHA:2017:2312  1 1
ECLI:NL:RBGEL:2015:4277  1 1 1 3
ECLI:NL:RBGEL:2016:4202  1 1 1 3
ECLI:NL:RBGEL:2016:4814  1 1
ECLI:NL:RBGEL:2016:5611  1 1 2
ECLI:NL:RBGEL:2017:2436  1 1 1 3
ECLI:NL:RBLIM:2017:2843  1 1 1 3
ECLI:NL:RBLIM:2017:3038  1 1 1 3
ECLI:NL:RBLIM:2017:765  1 1
ECLI:NL:RBMNE:2016:1674  1 1 1 3
ECLI:NL:RBMNE:2016:6005  1 1
ECLI:NL:RBMNE:2016:7045  1 1 2
ECLI:NL:RBMNE:2017:1676  1 1
ECLI:NL:RBNNE:2016:2736  1 1
ECLI:NL:RBNNE:2016:5613  1 1
ECLI:NL:RBOBR:2015:19  1 1
ECLI:NL:RBOVE:2016:4630  1 1 2
ECLI:NL:RBROT:2016:10090  1 1
ECLI:NL:RBROT:2016:5863  1 1 1 3
ECLI:NL:RBROT:2016:7435  1 1
ECLI:NL:RBROT:2017:1976  1 1
ECLI:NL:RBROT:2017:2996  1 1
ECLI:NL:RBZWB:2017:1400  1 1
Total  20 19 25 64

Table 18: ECLIs from Classifier 2 - new regulation, 70/30 train/test ratio - that were classified as false negatives, i.e. classified as not involving the new regulation while their label was true.


ECLI classified as false positives (columns: random forest / gradient boosting / decision trees / total)

ECLI:NL:GHARL:2016:9863  1 1 2
ECLI:NL:GHSHE:2015:3905  1 1 2
ECLI:NL:HR:2006:AX3080  1 1
ECLI:NL:PHR:2006:AX3080  1 1
ECLI:NL:PHR:2012:BV3678  1 1
ECLI:NL:RBAMS:2004:AT3893  1 1
ECLI:NL:RBARN:2010:BO7637  1 1
ECLI:NL:RBDHA:2016:12523  1 1 2
ECLI:NL:RBDHA:2017:555  1 1
ECLI:NL:RBGEL:2015:7708  1 1 2
ECLI:NL:RBLIM:2016:7412  1 1
ECLI:NL:RBLIM:2016:8473  1 1 2
ECLI:NL:RBROT:2016:10263  1 1 1 3
Total  9 6 5 20

Table 19: ECLIs from Classifier 2 - new regulation, 80/20 train/test ratio - that were classified as false positives, i.e. classified as involving the new regulation while their label was false.


ECLI classified as false negatives (columns: random forest / gradient boosting / decision trees / total)

ECLI:NL:GHAMS:2016:5379  1 1
ECLI:NL:GHARL:2016:2327  1 1 1 3
ECLI:NL:GHARL:2017:2874  1 1 2
ECLI:NL:GHARL:2017:998  1 1 1 3
ECLI:NL:GHSHE:2016:5607  1 1 2
ECLI:NL:RBAMS:2016:5691  1 1
ECLI:NL:RBDHA:2016:13933  1 1
ECLI:NL:RBDHA:2017:1102  1 1
ECLI:NL:RBGEL:2015:4277  1 1 1 3
ECLI:NL:RBGEL:2016:4202  1 1 1 3
ECLI:NL:RBGEL:2016:4814  1 1
ECLI:NL:RBGEL:2016:5611  1 1 1 3
ECLI:NL:RBGEL:2017:2436  1 1 1 3
ECLI:NL:RBLIM:2017:3038  1 1 1 3
ECLI:NL:RBMNE:2016:6005  1 1
ECLI:NL:RBMNE:2016:7045  1 1
ECLI:NL:RBOBR:2015:19  1 1
ECLI:NL:RBROT:2016:10090  1 1
ECLI:NL:RBROT:2016:5863  1 1 1 3
ECLI:NL:RBROT:2016:7435  1 1
Total  10 10 18 38

Table 20: ECLIs from Classifier 2 - new regulation, 80/20 train/test ratio - that were classified as false negatives, i.e. classified as not involving the new regulation while their label was true.
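Tables 17-20 report the same classification task under a 70/30 and an 80/20 train/test ratio. A hedged sketch of such a comparison with scikit-learn is shown below; the feature matrix X and label vector y are assumed to be given, and the default hyperparameters do not necessarily match those used in this thesis.

    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
    from sklearn.metrics import confusion_matrix

    def compare_split_ratios(X, y, test_sizes=(0.30, 0.20), seed=0):
        # Fit the three tree-based algorithms at each train/test ratio and
        # report the false-positive and false-negative counts on the test set.
        models = {
            "random forest": RandomForestClassifier(random_state=seed),
            "gradient boosting": GradientBoostingClassifier(random_state=seed),
            "decision tree": DecisionTreeClassifier(random_state=seed),
        }
        for test_size in test_sizes:
            X_train, X_test, y_train, y_test = train_test_split(
                X, y, test_size=test_size, random_state=seed)
            for name, model in models.items():
                model.fit(X_train, y_train)
                tn, fp, fn, tp = confusion_matrix(y_test, model.predict(X_test)).ravel()
                print("%.0f/%.0f split, %s: FP=%d, FN=%d"
                      % (100 * (1 - test_size), 100 * test_size, name, fp, fn))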


References

[1] Alexander Boer. personal communication. June 7, 2017.

[2] Stefanie Bruninghaus and Kevin D Ashley. “Toward Adding Knowledge to Learning Algorithms for Indexing Legal Cases”. In: Proceedings of the 7th international conference on Artificial intelligence and law. ACM, 1999, pp. 9–17.

[3] Mihail Danov. “The Brussels I Regulation: Cross-Border Collective Redress Proceedings and Judgments”. In: Journal of Private International Law 6.2 (2017), pp. 359–393.

[4] Emile De Maat, Kai Krabben, and Radboud Winkels. “Machine Learning versus Knowledge Based Classification of Legal Texts”. In: JURIX (2010), pp. 87–96.

[5] George Forman. “An extensive empirical study of feature selection metrics for text classification”. In: Journal of machine learning research 3.Mar (2003), pp. 1289–1305.

[6] genediazjr. Stopwords-nl. 2017. url: https://github.com/stopwords-iso/stopwords-nl/blob/master/stopwords-nl.txt (visited on 06/19/2017).

[7] Teresa Gonçalves and Paulo Quaresma. “Is linguistic information relevant for the classification of legal texts?” In: Proceedings of the 10th international conference on Artificial intelligence and law. ACM, 2005, pp. 168–176.

[8] KNIME.COM AG. KNIME Open for Innovation. 2017. url: http://www.knime.org (visited on 06/19/2017).

[9] Rick Lawrence et al. “Classification of remotely sensed imagery using stochastic gradient boosting as a refinement of classification tree analysis”. In: Remote sensing of environment 90.3 (2004), pp. 331–336.

[10] Leonardo Lezcano, Salvador Sánchez-Alonso, and Antonio J Roa-Valverde. “A survey on the exchange of linguistic resources: Publishing linguistic linked open data on the Web”. In: Program 47.3 (2013), pp. 263–281.

[11] MongoDB.COM AG. What is MongoDB. 2017. url: https://www.mongodb.com/ (visited on 06/19/2017).

[12] Marc van Opijnen. Rechtspraakdata: open, linked en big. Sdu Uitgevers BV, 2014, pp. 12–39.

[13] Anantha M Prasad, Louis R Iverson, and Andy Liaw. “Newer classification and regression tree techniques: bagging and random forests for ecological prediction”. In: Ecosystems 9.2 (2006), pp. 181–199.

[14] Edwina L. Rissland, Kevin D. Ashley, and R. P. Loui. “AI and Law: A fruitful synergy”. In: Artificial Intelligence 150 (2003), pp. 1–15.

[15] Michiel de Rooij. personal communication. June 22, 2017.

[16] Marina Sokolova and Guy Lapalme. “A systematic analysis of performance measures for classification tasks”. In: Information Processing & Manage-ment 45.4 (2009), pp. 427–437.


[17] Marc Van Opijnen and Bart Veenman. “Jurisprudentie zoeken op Europese schaal: Hoe de ECLI-zoekmachine de grensoverschrijdende toegankelijkheid van rechterlijke uitspraken vergroot”. In: Nederlands tijdschrift voor Europees recht 4 (2016), pp. 137–140.

[18] Kah Ho Zheng. Brussel I project. Tech. rep. University of Amsterdam, 2016.
