
The Potential of Machine Learning

in Risk Assessment for Policing

Detecting Front Firms based on their Business Website

L.C. Verhees

10837833

Supervisor: dr. V.M. Dirksen

2nd Examiner: drs. A.W. Abcouwer

Special thanks to:

P.A.M. Wagener, Dutch National Police

Bachelor thesis Information Science

Faculty of Science, University of Amsterdam


Abstract

Recently, it has come to light that front firms exploit a normal business website to cover up their criminal activities and, by doing so, give the impression that they are legitimate businesses. Such websites are referred to as facade websites. Consequently, the suspicion has arisen that this phenomenon could be an interesting source for the detection of such front firms. This thesis presents research on the potential of ML for the detection of facade websites as a method for police intervention. The main goal of this thesis is to determine to what extent the current approaches for assessing the risk of fake websites are sufficient for facade websites and, assuming they are not, which approach is suitable for the risk assessment of facade websites by the police. Based on an in-depth literature study combined with expert interviews, this study proposes a set of design guidelines, which advocate the development of an ML classification system for the detection of facade websites by the police. By taking the specific characteristics of risk assessment in policing into account, a mixed methods approach with an embedded design is suggested. The quantitative ML aspect of this approach should consist of unsupervised learning until enough information is gathered to perform accurate classification. Once this point is reached, semi-supervised learning based on recursive trust labelling (RTL) should be applied to predict the labels of new instances of facade websites.

Keywords

Facade website detection, machine learning, criminal undermining, front firms, policing, internet fraud, website classification.


Contents

1. Introduction

2. Literature Review

2.2 Machine Learning in Risk Assessment

2.2.1 Supervised Learning

2.2.2 Unsupervised Learning

2.3 The Detection of Fake Websites

2.3.1 The Types of Fake Websites

2.3.2 Machine Learning Approaches for Fake Website Detection

2.4 Conclusion

3. Methodology

3.1 Research Design

3.2 Data Collection

3.3 Data Analysis

3.4 Reliability & Validity

4. Results

4.1 Perspective on Machine Learning for Risk Assessment

4.1.1 Human Analysis vs Computer Analysis

4.1.2 Bias

4.1.3 Unsupervised vs (Semi-)Supervised Approaches

4.2 The Detection of Facade Websites

4.2.1 Preconditions of the Analysis

4.2.2 Facade Website Fraud Cues

5. Discussion

6. Conclusion

7. Limitations and Implications for Future Work

References

Appendices

Appendix I - Facade Website Fraud Cues

Appendix II - Description of Participants’ Functions and Expertises

Appendix III - Topic List of the Expert Interviews


1. Introduction

Criminals abuse or weaken the structure of society, which reduces citizens' confidence in that society. Furthermore, the integrity and credibility of the government are under growing pressure, because the upperworld and underworld are becoming increasingly intertwined. This phenomenon is referred to as criminal undermining and can cause a society to falter (Joldersma et al., 2008). Hence, undermining has become a spearhead of the Dutch National Police, which struggles with the issue of how to deal with criminals who undermine society through the structures of the upperworld (Pilger, 2014). Consequently, the need for new methods of intervention has arisen, not only in forensic investigation but also in daily police surveillance, both of which are core tasks of the Dutch police (Politie, n.d.).

A way in which the upperworld and underworld intertwine is through front firms. Front firms are a typical characteristic of this intertwinement (Joldersma et al., 2008). A front firm is an entity that is set up and controlled by an organisation and that can act for this organisation without the actions being attributed to it, thereby allowing the controlling organisation to hide from public view. The more criminal networks succeed in exploiting such ‘connectors’ in the upperworld – including governmental institutes and private businesses that may serve them in their criminal business processes, also referred to as facilitators – the more successful a criminal enterprise can become (Boerman, Grapendaal, Nieuwenhuis & Stoffers, 2012).

Recently, an interesting discovery has been made concerning these front firms. During an investigation into organised crime by the Dutch National Police, it came to light that front firms exploit a normal business website to cover up their criminal activities and, by doing so, give the impression that the business is legitimate. Such a website is referred to as a facade website.¹ Consequently, the suspicion has arisen that this phenomenon could be an interesting source for early interventions. To date, no research has been conducted on such facade websites. However, the phenomenon of fake websites has been studied thoroughly. As defined by the literature, fake websites are fraudulent websites used to deceive unsuspecting internet users (Wang, Zhu, Tan & Zhou, 2017). Both fake websites and facade websites are websites that involve fraud. Despite this similarity, there is one important difference: facade websites function merely as a cover-up, and the actual fraud takes place elsewhere, whereas the goal of fake websites is to deceive their direct users (Abbasi & Chen, 2013). Thus, the target group of the fraud differs.

Various authors have proposed models for the detection of fake websites utilising a machine learning (ML) approach (Abbasi & Chen, 2013; Adedoyin, Kapetanakis, Samakovitis & Petridis, 2017; Chua & Wareham, 2004). ML is a practice that uses statistical techniques to give computers the ability to ‘learn’ from data without being explicitly programmed (Mohri, Rostamizadeh & Talwalkar, 2012). These fake website detection systems involve a specific set of fraud cues. A fraud cue is a piece of information, also referred to as a feature or indicator, based on which the risk of a fake website can be determined. For example, a fraud cue could be a company phone number, and the presence or absence of this fraud cue increases or decreases the risk of a facade website. The way in which a fraud cue influences this risk, referred to as the weight or parameter of the fraud cue, is determined using the ML algorithm. Consequently, the ML algorithm can predict whether a website is "fake" or "real". This will be further explained in section 2.2.
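To make the weighting idea concrete, the following minimal sketch combines binary fraud cues into a single weighted risk score. The cue names and weights are purely hypothetical examples and not validated indicators of facade websites.

```python
# Minimal sketch of the weighted fraud-cue idea described above.
# Cue names and weights are hypothetical, not validated cues.

def risk_score(cues, weights):
    """Combine binary fraud cues into a single weighted risk score."""
    return sum(weights[name] for name, present in cues.items() if present)

weights = {
    "missing_phone_number": 0.4,   # absence of contact details raises the risk
    "no_contact_form": 0.3,
    "few_outgoing_links": 0.3,
}
cues = {
    "missing_phone_number": True,
    "no_contact_form": False,
    "few_outgoing_links": True,
}
print(risk_score(cues, weights))   # prints 0.7
```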

In short, the Dutch police could benefit from such an adoption of ML to detect facade websites as an alternative method of intervention. However, since facade websites differ from fake websites as described above, systems proposed for the detection of fake websites are not sufficient for facade websites. To adequately assess the risk of facade websites, research should focus on the characteristics of this type of website and its corresponding fraud cues. Investigation methods involving ML for policing are relatively new, especially in relation to risk assessment, and scientific research in this field is therefore scarce. This research aims to contribute to the discussion on this topic. Furthermore, there is hardly any literature to date on websites used as a cover-up for fraud and on detection methods for such websites. Hence, this research targets a gap in scientific knowledge. Moreover, this research is socially relevant, because an approach to risk assessment for facade websites contributes to obtaining insight into methods to combat fraud.

This study presents research on the potential of ML for the detection of facade websites as a method for police intervention. The main goal of this thesis is to determine to what extent the current approaches for assessing the risk of fake websites are sufficient for facade websites and, assuming they are not, which approach is suitable for the risk assessment of facade websites by the police. Literature on fake websites describes websites that aim to deceive their direct users. However, as discussed above, there is little to no literature on the concept of facade websites, which are used as a facade to cover up criminal activities, or on approaches to assess the risk of such websites. Therefore, the research question for this study can be described as follows: to what extent is an ML approach suitable to determine the risk of a facade website? In order to answer the main research question, the following sub-questions have been formulated:

1. Which perspective on ML is the most suitable for the detection of facade websites?
2. Based on which fraud cues can the risk of a facade website be determined?

These two sub-questions, in combination with the literature study, will provide an answer to the main research question.

The remainder of this thesis consists of seven chapters. First, an in-depth literature study on ML perspectives and fake websites is conducted in chapter two. In chapter three, the methodology of the conducted expert interviews is described. The results from these interviews are described in chapter four. In chapter five these findings are combined with the insights from the literature study and are discussed to answer sub-questions one and two. In chapter six, conclusions are drawn, and an answer to the main research question is formulated. Based on these conclusions, chapter seven discusses the limitations of this study and proposes suggestions for future work. This thesis concludes with chapter eight, in which several topics of advice for the adoption of ML for the detection of facade websites by the Dutch police are formulated.


2. Literature Review

This chapter contains an in-depth literature study focusing on the phenomenon of fake websites and the potential of ML approaches to assess the risk of those fraudulent websites. The literature study was structured according to the approach presented by Randolph (2009), which contains four steps: problem formation (narrowing down the concepts that are central to the study), data collection (determining which sources to examine), analysis and interpretation (synthesising retrieved studies) and public presentation (applying editorial criteria to separate relevant from irrelevant information). Sources selected for the retrieval of information were MIT, Springer, Elsevier, MIS Quarterly, IEEE, ACM, IOS Press, Journal of Management Information Systems, International Journal of Information Security & Cybercrime and Google Scholar. Retrieved studies concerning the aforementioned concepts were analysed and interpreted, and the relevant information found is described below.

2.2 Machine Learning in Risk Assessment

Machine learning consists of designing efficient and accurate prediction algorithms (Mohri et al., 2012). The application of ML algorithms to real-world datasets has allowed people to solve problems that were previously considered unsolvable (Kotsiantis, Zaharakis & Pintelas, 2007). Consequently, numerous ML perspectives and corresponding algorithms exist. However, ML tasks are typically classified into two categories: supervised and unsupervised learning. These categories are described in this section. In section 2.3, the findings on ML will be linked to facade websites.

2.2.1 Supervised Learning

A supervised learning algorithm makes predictions based on a set of predefined examples, called training data. This training data consists of instances with a known weight for each feature and a corresponding outcome; in the case of fake websites this outcome is "fake" or "real" (Murphy, 2012). The algorithm looks for patterns in the training data and then uses the best fitting pattern to make predictions for unlabelled data. The specific algorithm that is applied can vary. Several statistical approaches exist and the most suitable approach depends on the specific case. As explained by James, Witten, Hastie & Tibshirani (2013), in a supervised learning setting there is typically access to a set of p features X1, X2, ..., Xp (i.e. indicators or fraud cues), measured on n observations (i.e. instances), and a response Y (i.e. "fake" or "real"), also measured on those same n observations. The goal is then to predict Y using X1, X2, ..., Xp.

Detecting facade websites concerns the prediction of whether a website belongs to one of two categories, namely "facade" or "real". Such a problem with a qualitative response is referred to as a classification problem (James et al., 2013). Furthermore, since the prediction consists of two possible categories, a two-class classification algorithm is necessary (Mohri et al., 2012). Examples of supervised two-class classification algorithms are logic-based algorithms (e.g. decision trees and random forests), Support Vector Machines (SVM) and neural networks (Goh & Singh, 2015). Choosing the most suitable algorithm mostly depends on the type of features involved. Generally, logic-based systems tend to perform better when dealing with discrete or categorical features, whereas SVMs and neural networks tend to perform much better when dealing with multiple dimensions and continuous features (Kotsiantis et al., 2007). Furthermore, factors such as accuracy, training time, linearity, number of parameters and number of features are important to take into account (Mohri et al., 2012). Specific approaches for the classification of facade websites will be discussed in section 2.3.2.

Figure 1. Supervised Machine Learning Flow
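As an illustration of such a two-class classifier, the sketch below trains a Support Vector Machine on a toy matrix of fraud-cue values; the features, labels and values are invented for illustration only and do not represent validated facade website cues.

```python
# Sketch of a supervised two-class classifier for the "facade" vs "real"
# problem described above; the feature values and labels are toy data.
import numpy as np
from sklearn.svm import SVC

# Each row is a website, each column a fraud cue (X1 ... Xp).
X_train = np.array([[0, 1, 3], [1, 0, 12], [0, 1, 2], [1, 1, 15]])
y_train = np.array(["facade", "real", "facade", "real"])  # known labels

clf = SVC(kernel="rbf")    # SVMs handle continuous, multi-dimensional cues well
clf.fit(X_train, y_train)  # learn a decision boundary from the training data

print(clf.predict([[0, 1, 4]]))  # predicted label for an unseen website
```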

Training the Machine Learning Algorithm

Independent of the approach chosen, the first step in supervised ML is to define a training dataset with known labels (Murphy, 2012). The training data is used to fit, or train, the model (see Figure 1). In the case of a set of p features, the parameters β0, β1, ..., βp (i.e. the weights) need to be estimated, such that Y ≈ β0 + β1X1 + β2X2 + ... + βpXp. Kotsiantis et al. (2007) argue that a training dataset can be obtained in several ways. If a requisite expert is available, this person can suggest which fields (attributes and features) are most informative. If such an expert is not available, a so-called "brute-force" method can be applied. This method involves measuring every aspect available in the hope that the right (both informative and relevant) features can be isolated. However, a dataset based on a brute-force method is not directly suitable for induction; it often contains noise and missing feature values, which requires significant pre-processing.
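A minimal sketch of this parameter estimation is given below, assuming a logistic regression as the underlying model; the fitted coefficients then play the role of the fraud-cue weights β1, ..., βp. The data is invented.

```python
# Sketch of estimating fraud-cue weights (parameters) with a logistic
# regression; the data and the meaning of the cues are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1, 0], [1, 1], [0, 1], [0, 0], [1, 0], [0, 1]])  # two fraud cues
y = np.array([1, 1, 0, 0, 1, 0])                                # 1 = facade, 0 = real

model = LogisticRegression().fit(X, y)
print(model.intercept_)  # beta_0
print(model.coef_)       # beta_1, beta_2: how each cue shifts the predicted risk
```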

The amount of labelled data required for the training set of the ML algorithm depends on a number of different considerations (Mohri et al., 2012). A study performed by Figueroa, Zeng-Treitler, Kandula & Ngo (2012) states that the initial annotated sample of training data should contain at least around 100-200 samples. In the case of facade websites, however, this raises a problem, because only a few instances of facade websites are available. Moreover, the indicators by which these websites can be identified are unknown. Thus, not only is the sample of training data small, the training data is also incomplete. As a result, a semi-supervised approach can be applied to handle this lack of samples.

Figure 2. Semi-Supervised Machine Learning Flow

Evaluating the Machine Learning Algorithm

After the model is defined using the learning algorithm, it should be evaluated using a test dataset. The test dataset is similar to a training dataset in the sense that it contains labelled data; however, it is used only to test the accuracy of the model. In other words, the instances are fed to the model without their labels, and the predictions are compared to the actual labels of the test dataset. It is essential to quantify the extent to which the predicted response value for a given observation is close to the actual response value for that observation (James et al., 2013). Therefore, the quality of the results needs to be assessed. If the quality of the results is too low, adjustments can be made to the set of fraud cues or the collection of training data (James et al., 2013; Murphy, 2012).
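The sketch below illustrates this evaluation step on a synthetic dataset: the labels of the held-out test set are withheld from the classifier and only used afterwards to score the predictions.

```python
# Sketch of evaluating a trained classifier on a held-out, labelled test set,
# as described above; the dataset is synthetic.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = SVC().fit(X_train, y_train)
predictions = clf.predict(X_test)          # labels are withheld from the model
print(accuracy_score(y_test, predictions)) # compare predictions to the actual labels
```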

Semi-supervised learning

Semi-supervised learning is a variant of supervised learning that, in contrast to fully supervised learning, makes use of unlabelled data for training in addition to labelled data (James et al., 2013; Murphy, 2012). This process typically involves a small amount of labelled data and a large amount of unlabelled data. The initial model is used to predict the labels for the unlabelled training data, and these predictions are then used to retrain the classifier (see Figure 2). Research has found that unlabelled data, when used in conjunction with a small amount of labelled data, can produce considerable improvement in learning accuracy (Zhu & Goldberg, 2009). This approach can be highly beneficial, since the training set is often very limited relative to the overall size of the testbed (i.e. there are only a small number of known good/bad websites) (Wu & Chellapilla, 2007). Given that facade websites have only recently been discovered and, therefore, little information is available about this phenomenon, a semi-supervised approach could be beneficial for their detection.
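A minimal sketch of this idea is given below, using scikit-learn's self-training wrapper as one possible implementation (an assumption for illustration, not an approach prescribed by the cited works): unlabelled websites are marked with the label -1, and confident predictions are folded back into the training set.

```python
# Sketch of semi-supervised learning with a small labelled set and a large
# unlabelled pool (label -1). Data is synthetic.
import numpy as np
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.rand(100, 5)            # fraud-cue features for 100 websites
y = np.full(100, -1)            # -1 marks unlabelled websites
y[:10] = np.tile([0, 1], 5)     # only 10 websites have known labels

base = SVC(probability=True)    # the base classifier must output probabilities
model = SelfTrainingClassifier(base, threshold=0.8)
model.fit(X, y)                 # confident predictions become pseudo-labels
print(model.predict(X[:5]))
```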


Figure 3. Unsupervised Machine Learning Flow

Bias

However, in both a semi-supervised and a supervised learning setting, the fraud cues of the labelled data have to be available. Although several instances of facade websites are known, the fraud cues that indicate that a website is a facade website are not yet known. As mentioned previously, this missing information can be based on the judgement of a requisite expert. However, this judgement is conditional on specific background knowledge. This background knowledge covers data, information and justified beliefs, often formulated as assumptions. These assumptions are the primary cause of biases (Amundrud & Aven, 2015). Bias refers to the error that is introduced by approximating a real-life problem, which may be extremely complicated, with a much simpler model (Kotsiantis et al., 2007).

2.2.2 Unsupervised Learning

In an unsupervised learning setting, the instances are unlabelled and, therefore, a response variable that can supervise the analysis is missing (Mohri et al., 2012). Consequently, only a set of features X1, X2, ..., Xp measured on n observations is available (James et al., 2013). In contrast to supervised learning, prediction is not the primary objective of this approach, because an associated response variable Y (i.e. the label "real" or "fake") is not available. Instead, the goal is to discover interesting things about the measurements on X1, X2, ..., Xp. By applying unsupervised algorithms, researchers hope to discover unknown, but useful, classes of items (Maglogiannis, 2007). Thus, since the outcome of unsupervised learning is not a label Y, this approach is not focused on classification, but instead on finding structure and identifying clusters (see Figure 3). Moreover, an unsupervised approach can be valuable if a training set is not available and the fraud cues are unknown, or when the set of existing fraud cues is too limited and needs to be extended.

Despite its advantages, unsupervised learning is often much more challenging than supervised learning (James et al., 2013). When a predictive model is fit using a supervised learning technique, it is possible to assess its quality by seeing how well the model predicts the response Y on observations not used in fitting the model. However, in an unsupervised learning setting this is impossible, because the true answer is unknown. Consequently, there is no universally accepted mechanism for validating the results in an unsupervised learning setting, which makes it difficult to assess the quality of an unsupervised learning approach. A study by Dahl, Adams & Larochelle (2012) proposes an approach for evaluating unsupervised learning algorithms by applying supervised learning: the learned instances from unsupervised learning are used as input for supervised learning, and the performance of the unsupervised learning algorithm is then assessed by its ability to improve the performance of the supervised learning compared to a baseline using a standard representation. A second challenge is that, in contrast to supervised learning, unsupervised learning focuses on finding clusters instead of making classifications. The outcome therefore does not consist of a predicted label but of a set of clusters, and an extra manual iteration is necessary to interpret the results. In short, analysing facade websites using unsupervised learning can contribute to the discovery of new fraud cues. However, the fact that this analysis does not directly result in a prediction of whether a website is real or facade is a disadvantage. The application of unsupervised learning for facade websites will be further discussed in section 2.3.2.
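As a sketch of how such a cluster-then-inspect analysis could look in practice (a toy example with synthetic features, not a validated pipeline):

```python
# Sketch of an unsupervised analysis as described above: cluster websites on
# their feature vectors and inspect the clusters manually afterwards.
import numpy as np
from sklearn.cluster import KMeans

X = np.random.RandomState(0).rand(60, 8)   # 60 unlabelled websites, 8 features each

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_                    # a cluster id per website, not "real"/"facade"

# An analyst would now inspect each cluster to see whether it groups
# suspicious-looking websites together.
for cluster_id in range(3):
    print(cluster_id, int(np.sum(labels == cluster_id)), "websites")
```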

2.3 The Detection of Fake Websites

Although little research has been done regarding facade websites, as discussed in the introduction, various researchers have focused on fake websites. Fake websites, as defined by the literature, are fraudulent websites mainly used to deceive unsuspecting internet users (Wang et al., 2017). A study conducted by Gyongyi and Garcia-Molina (2005) estimates that fake websites comprise nearly 20% of the entire web. This section provides an overview of the types of fake websites, the detection systems proposed for fake websites and how these relate to facade websites.

2.3.1 The Types of Fake Websites

Previous studies have identified three categories of fake websites: spam, spoof and concocted websites. The first type, spam websites, are websites that target search engines (Abbasi & Chen, 2009b; Gyongyi & Garcia-Molina, 2005). The second category exactly copies the look and feel of another website to trap users into providing personal information, and is referred to as spoof, phishing or hoax websites (Adedoyin et al., 2017; Dinev, 2006). The third category contains those fake websites that appear as unique and legitimate entities with a failure-to-ship fraud objective², referred to as concocted or scam websites (Abbasi & Chen, 2009b). A popular variant of concocted websites is the escrow website, which targets online marketplaces (Abbasi & Chen, 2013). Each type of fake website adopts a different method to mislead its users. Despite these different methods, the primary goal of all three types of fake websites is to defraud their direct users. Whether the focus is on stealing credit card information, login credentials or money, the direct visitor of the website is the target.

² Businesses collect money from unsuspecting users using false promises; after that, they disappear without delivering the promised product or service.

In contrast to fake websites, facade websites merely function as a cover for concealing the actual fraud and thus giving the impression that it concerns a legitimate business. Nevertheless, facade websites show the most resemblance to concocted websites, since both facade websites and concocted websites appear as unique and legitimate entities and do not exploit an existing company. Due to this similarity, the characteristics of concocted websites can contribute to an understanding of facade websites.

Various characteristics of concocted websites can be found in the literature. Concocted websites adopt rules of social engineering together with trust-enabling features to reach their target audience (Abbasi et al., 2015). Social engineering is an act of psychological manipulation, and trust-enabling features refer to specific design elements that have been proven to increase trust, such as providing contact information. Abbasi, Zhang, Zimbra, Chen & Nunamaker (2010) argue that, to build trust, fake websites are often professional-looking and sophisticated in their design. The high-quality appearance makes it difficult for users to identify them as fraudulent. Furthermore, since concocted websites do not copy an existing brand, they can be harder to recognise (Abbasi & Chen, 2009b). However, a study by Abbasi et al. (2010) states that the text is often similar to that of prior fake websites. Moreover, the same study stresses that fake sites tend to have less linkage than legitimate websites. Finally, Abbasi et al. (2010) point out that a crucial difference between concocted websites and legitimate websites can be found in user conversion. Legitimate websites seek to increase the amount and frequency of website visits and conversions, thus aiming for a conversion rate that is as high as possible. Conversely, phishing websites are primarily focused on successfully defrauding unsuspecting users once, thus focusing on a "single conversion". As a consequence, there are differences in the extensiveness of certain parts of the website, such as customer service and FAQ pages. This last point, however, marks a difference with facade websites. In contrast to concocted websites, facade websites function merely as a cover-up, so the user is not supposed to see what happens behind this facade. As a consequence, the focus of facade websites is "zero conversion". Thus, the absence of best practices to increase user conversion, such as call-to-action buttons or contact forms, can increase the risk of a facade website.

2.3.2 Machine Learning Approaches for Fake Website Detection

By successfully exploiting human vulnerabilities, fake websites have emerged as a significant source of online fraud (Gyongyi & Garcia-Molina, 2005). Furthermore, by creating as much trust as possible with the user, as discussed in the previous section, fake websites aim to make users believe they are securely connected to a trusted website when the opposite is true (Dinev, 2006). Such websites span numerous domains, including financial, medical, legal, retail, social networking, and search/portal websites (Abbasi et al., 2015). Due to the increasing number of fake websites and the ever-changing nature of attacks, the automatic detection of fake websites has become a crucial task (Patil & Patil, 2016).


Several authors propose fake website detection systems based on ML, which have been found to be more accurate in detecting fake websites than traditional approaches such as look-up systems³ (Abbasi et al., 2010). In the remainder of this section, the ML approaches for fake website detection are discussed. This discussion can contribute to the understanding of possible approaches for the detection of facade websites.

Design Requirements for Machine Learning Based Detection Systems

When applying ML to the detection of fake websites, multimedia such as images and text are part of the input data. Compared to traditional data mining, which mainly deals with numbers, multimedia mining reaches a much higher complexity. This increased complexity is caused by several factors (Kotsiantis et al., 2007). First, the algorithm has to deal with a huge volume of data. Second, the variability and heterogeneity of multimedia data are much higher than those of regular data (e.g. diversity of sensors and time or conditions of acquisition). Third, the meaning of multimedia content is qualitative and often subjective. Furthermore, a website usually consists of various web pages, resulting in an even higher complexity of the analysis. Consequently, only the first page (e.g. the homepage) of the website could be analysed. However, Abbasi et al. (2010) show that performing the analysis at web page level results in higher accuracy. In this case, all pages within a website of interest are independently classified, and the aggregate of these page-level classification results is used to determine whether a website is real or fake. By adopting such a page-level approach, the complexity of the analysis increases exponentially. Thus, it is important that this higher complexity is taken into account when choosing an algorithm for an ML approach.
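The following sketch illustrates the page-level strategy on synthetic data: every page of a website is scored independently and the page-level scores are aggregated into a site-level decision. The classifier, the aggregation by averaging and the 0.5 threshold are assumptions made for illustration only.

```python
# Sketch of page-level classification: score each page of a website
# independently and aggregate the results into a site-level decision.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
site_pages = rng.rand(12, 6)   # feature vectors for the 12 pages of one website

# A toy "pre-trained" page classifier; in practice this would be trained on labelled pages.
clf = LogisticRegression().fit(rng.rand(50, 6), rng.randint(0, 2, 50))

page_scores = clf.predict_proba(site_pages)[:, 1]   # per-page probability of "facade"
site_score = page_scores.mean()                     # aggregate the page-level results
print("facade" if site_score > 0.5 else "real", round(float(site_score), 2))
```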

Abbasi et al. (2010) describe four essential requirements for effective fake website detection. First, the ability to generalise patterns in existing data is important; these patterns can then be applied to unseen data to make predictions. Since the goal of ML is to identify patterns in data, this first requirement is covered by applying any ML approach. Second, research has identified a large number of website fraud cues, spanning multiple components of website design. The five main fraud cue categories for fake website detection are web page text, URL, source code, image attributes and linkage attributes. Nevertheless, the set of fraud cues required to represent these design elements for accurate fake website detection may encompass thousands of attributes. Furthermore, it has been found that systems based on limited fraud cues are susceptible to easy exploits. Since the characteristics of facade websites overlap with those of fake websites, it is very likely that facade websites encompass at least hundreds of fraud cues as well. Thus, the system should be able to handle large feature sets. Moreover, little knowledge about facade websites is available at this point in time, but the amount of knowledge will increase as more research is conducted. Therefore, the system must be scalable regarding the number of fraud cues utilised. Third, since fake website detection involves multimedia mining, it often involves complex properties which can be difficult to encode in flat vectors. Abbasi et al. (2010) therefore propose an approach involving custom kernels to enable the utilisation of domain knowledge to represent such information. This is especially valuable for detecting duplicate content, which is crucial for detecting spoof websites. However, facade websites do not necessarily copy existing websites. Therefore, the utilisation of domain knowledge is less relevant for facade websites and may result in unnecessary complexity. Finally, a significant challenge associated with fake website detection is the dynamic nature of websites. They evolve over time and do not always do so in a predictable fashion. Furthermore, criminals can adapt their methods and thereby bypass existing detection systems. Although facade websites might be more static than fake websites, the risk that criminals will change their method also exists for facade websites. Thus, the classification models need constant revision. This can be done by designing systems that include a layer of dynamic learning and can relearn on newer, more up-to-date training sets.

³ A look-up system is a system that "looks up" the URL of a website against a blacklist of URLs reported by users (Abbasi et al., 2010).

Figure 4. Machine Learning Flow for Website Classification

Summing up, the design characteristics that are important for detecting facade websites are the ability to generalise patterns in existing data, the ability to handle large and dynamic feature sets and the ability to incorporate a layer of dynamic learning. Regarding the utilisation of domain-specific knowledge, further research is necessary to determine the added value of this aspect in comparison to the increased complexity of the analysis. Nevertheless, the three other characteristics need to be taken into account when designing a system for the risk assessment of facade websites. Figure 4 illustrates an ML classification system that incorporates these requirements.
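One way the dynamic-learning requirement could be realised is with an incrementally trainable classifier that is updated on newer batches of labelled websites instead of being retrained from scratch. The sketch below uses scikit-learn's SGDClassifier as one possible (assumed) choice, with synthetic data.

```python
# Sketch of a "layer of dynamic learning": an incremental classifier that can
# be updated on newer training batches without full retraining.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
clf = SGDClassifier(random_state=0)

# Initial training on the currently available labelled websites.
X_initial, y_initial = rng.rand(100, 5), rng.randint(0, 2, 100)
clf.partial_fit(X_initial, y_initial, classes=[0, 1])

# Later: update the same model on a newer, more up-to-date batch of labels.
X_new, y_new = rng.rand(20, 5), rng.randint(0, 2, 20)
clf.partial_fit(X_new, y_new)

print(clf.predict(rng.rand(3, 5)))
```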

Supervised Content-based or Link-based Classifier Systems

Most fake website detection systems proposed in the literature are content-based or link-based classifier systems (Chou, Ledesma, Teraguchi & Mitchell, 2004; Wu, Miller & Garfinkel, 2006). These systems utilise supervised classification algorithms to detect fake websites based on the appearance of specific fraud cues in website content and linkage attributes (Wang et al., 2017). Various authors have identified sets of fraud cues that may be applicable to facade websites (Abbasi & Chen, 2009a; Abbasi & Chen, 2009b; Abbasi et al., 2010; Fetterly, Manasse & Najork, 2004; Le, Markopoulou & Faloutsos, 2011). It is important to note that, while evidence has been provided regarding the similarities between fake websites and facade websites, not all fake website fraud cues are sufficient for facade websites, and no formal evaluation has been conducted to assess the effectiveness of these cues for the identification of facade websites. A collection of the fraud cues derived from the literature can be found in Table 1 of Appendix I.

Furthermore, the fraud cues of fake websites are largely continuous (i.e. the possible values of a fraud cue are ordered in the degree to which they increase or decrease the risk). As previously discussed, SVMs and neural networks tend to perform best with this type of data. This is supported by a study performed by Abbasi et al. (2010), in which the performance results of various learning-based classification techniques were compared. Support Vector Machines significantly outperformed the other six learning classifiers (logistic regression, J48 decision tree, Bayesian network, naïve Bayes, winnow and neural network) in terms of overall performance. However, the results show that the performance varies per type of fake website. Thus, in line with what was discussed in section 2.2.2, it would be necessary to test the performance for the specific case of facade websites based on overall accuracy, precision, recall and F1 score.⁴
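These four metrics can be computed directly from a set of predictions and their true labels, as in the following sketch; the labels here are invented.

```python
# Sketch of computing the evaluation metrics mentioned above (accuracy,
# precision, recall, F1) for a set of predictions.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = facade, 0 = real
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # classifier output

print("accuracy ", accuracy_score(y_true, y_pred))
print("precision", precision_score(y_true, y_pred))
print("recall   ", recall_score(y_true, y_pred))
print("F1       ", f1_score(y_true, y_pred))
```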

Since limited fraud cues make a classifier susceptible to easy exploits, an extended set of fraud cues combining various sources of information results in a higher accuracy. Thus, all available sources should be combined, which may encompass hundreds of attributes. Since a complete training data set is not available for facade websites, the data should be labelled manually, which would be very time-consuming. Moreover, as discussed in section 2.2.1, the training set would be built on assumptions and background knowledge, which increases bias. Furthermore, as mentioned above, the system should contain a layer of dynamic learning. Thus, a supervised approach is not suitable in the case of facade websites and a semi-supervised approach is necessary.

Semi-Supervised Classifier Systems

Several scholars propose a semi-supervised approach to overcome the lack of training data (Abbasi & Chen, 2009b; Abbasi et al., 2010; Wang et al., 2017). As previously mentioned, semi-supervised approaches to fake website detection add a layer of adaptive learning to the analysis. Wang et al. (2017) claim that adding a layer of adaptive learning to static classification results in higher performance than relying solely on a static analysis. This is in line with the argumentation in section 2.2.1. By incorporating adaptive learning, the fraud cues are dynamically updated by relearning on newer, more up-to-date training collections of real and fake websites; such systems are therefore comparable to the previously discussed semi-supervised approaches.

Tian et al. (2007) propose a system in which semi-supervised learning is applied to generate new link features, which are then added to the original feature set and used to reclassify the testing data. Building on this work, Abbasi, Zahedi & Kaza (2012) propose a recursive trust labelling (RTL) system that uses underlying static classifiers, containing fraud cues from both content-based and link-based sources, coupled with a recursive labelling mechanism. The results revealed that such an approach was able to improve the performance of the detection system significantly.

⁴ Overall accuracy, precision, recall and F1 score are common metrics to evaluate the performance of a classification system (Kazamian & Ahmed, 2015).

However, in a semi-supervised learning setting, a critical problem can arise when misclassified instances are added to the training data. Misclassified instances can result in incorrect rules and assumptions being incorporated into the classification model, which in turn can result in increased bias and amplified error rates (Gan & Suel, 2007). Abbasi, Zahedi & Kaza (2012) describe several ways to avoid this problem. The semi-supervised learning can be limited so that it generates only one or a few new instances; however, in that case the adaptive learning components provide minimal improvements. Another approach is to add only those test instances that have the most substantial prediction agreement across the static classifiers to the training data. Furthermore, resetting the training dataset and reclassifying each training instance during each iteration allows for error correction. This method significantly improves performance across several iterations.
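The sketch below illustrates the agreement-based variant of this idea in a simplified form: only unlabelled instances on which two independent classifiers agree with high confidence are added to the training set, over a few iterations. It is an illustration of the principle on synthetic data, not the RTL system of Abbasi, Zahedi & Kaza (2012).

```python
# Simplified sketch of agreement-based iterative labelling: pseudo-label only
# the unlabelled instances on which two classifiers confidently agree.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X_lab, y_lab = rng.rand(20, 5), rng.randint(0, 2, 20)   # small labelled set
X_unlab = rng.rand(200, 5)                              # large unlabelled pool

for _ in range(3):                                      # a few relabelling iterations
    if len(X_unlab) == 0:
        break
    clf_a = LogisticRegression().fit(X_lab, y_lab)
    clf_b = RandomForestClassifier(random_state=0).fit(X_lab, y_lab)
    proba_a = clf_a.predict_proba(X_unlab)[:, 1]
    proba_b = clf_b.predict_proba(X_unlab)[:, 1]
    pred_a = (proba_a > 0.5).astype(int)
    pred_b = (proba_b > 0.5).astype(int)

    # Keep only instances on which both classifiers agree with high confidence.
    confident = (pred_a == pred_b) & (np.abs(proba_a - 0.5) > 0.3)
    X_lab = np.vstack([X_lab, X_unlab[confident]])
    y_lab = np.concatenate([y_lab, pred_a[confident]])
    X_unlab = X_unlab[~confident]

print(len(y_lab), "labelled instances after self-training")
```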

Unsupervised Approaches to Fake Website Detection

To date, no unsupervised approach has been applied to the detection of concocted websites, the fake websites that show the most resemblance to facade websites. As discussed in section 2.2.2, unsupervised approaches result in a set of clusters rather than a predicted label. Therefore, unsupervised learning is not suitable for current fake website detection systems, since those systems are designed to be used by internet users and have to provide a predicted label instantly. The case of facade websites is different: the analysis is performed by the police and, therefore, it is less critical to provide a predicted label instantly. Consequently, an unsupervised approach in which the clusters are used as inspiration for further analysis can be useful for facade websites.

2.4 Conclusion

Based on the literature review, several conclusions can be drawn. First, facade websites most likely adopt social engineering together with trust-enabling features to reach their target audience. To build trust, facade websites will often be professional-looking, which makes it difficult to identify them as fraudulent, all the more so because facade websites do not copy an existing brand. Furthermore, it is likely that the text of facade websites is similar to that of prior fake websites and that facade websites tend to have less linkage than legitimate websites. Finally, since facade websites focus on "zero conversion", the absence of best practices to increase user conversion can increase the risk of such a website. In addition, three design characteristics essential for accurate facade website detection have been identified: the ability to generalise patterns in existing data, the ability to handle large and dynamic feature sets and the ability to incorporate a layer of dynamic learning.

Furthermore, it can be concluded that both semi-supervised learning and unsupervised learning show potential for the assessment of facade websites. A semi-supervised setting is preferred over a supervised setting, since little training data is available for facade websites and the fraud cues can be dynamically updated by relearning on newer, more up-to-date training collections of labelled data. Nonetheless, both semi-supervised and unsupervised approaches have disadvantages. On the one hand, with semi-supervised learning, the parameters that indicate that a website is a facade website are unknown. This missing information is necessary for the training set and can be based on the judgement of a requisite expert. However, such assessments are then conditional on judgements and specific background knowledge, which causes biases. On the other hand, although no adoptions of unsupervised learning for fake websites exist, this approach can be valuable for facade websites to discover unknown but useful classes of items. In the case of facade websites, the analysis is performed by the police and, therefore, it is less critical to provide a predicted label instantly. However, an extra manual iteration is necessary to interpret the results, and if the data is too scattered, the system may not be able to find useful clusters. Furthermore, there is no universally accepted mechanism for validating the results in an unsupervised learning setting, which makes it difficult to assess the quality of an unsupervised learning approach. As a solution, the learned instances from unsupervised learning can be used as input for supervised learning. Consequently, compared to a baseline using a standard representation, the performance of the unsupervised learning algorithm can be assessed by its ability to improve the performance of the supervised learning algorithm.

3. Methodology

This chapter describes the methodology that was adopted in this research. In the first section, the research design is discussed. Next, a description of the data collection and the data analysis is provided. The chapter concludes with a discussion concerning the reliability and validity of the conducted research.

3.1 Research Design

The focus of this study is the potential of ML for the risk assessment of facade websites within the domain of the police. The discovery of facade websites originated within the Dutch National Police, but not much knowledge concerning this phenomenon is available. Furthermore, due to the sensitivity of the topic, the details about the discovery of this phenomenon may not be disclosed. Therefore, in addition to the in-depth literature review, a qualitative study consisting of expert interviews was conducted. These interviews were held to stimulate the orientation process and gather information concerning this phenomenon. This approach is appropriate since the emphasis of the interviews was the participants' perspective on the topic (Bryman, 2016). Experts from various domains were interviewed, including the domains of ML, innovation and new methods of intervention within the police, as well as innovation concerning the web and the internet. Although details concerning the discovery of facade websites may not be disclosed, it can be noted that this discovery was related to criminal facilitators active as shipping agents in the port of Rotterdam in the Netherlands. Therefore, both an expert in the field of criminal facilitators and a participant involved in this discovery were interviewed. A more detailed description of each participant's expertise and function can be found in Appendix II.


3.2 Data Collection

As discussed in the previous section, qualitative data was gathered by conducting expert interviews to supplement the theoretical knowledge from the literature review. A semi-structured approach was chosen, since such interviews tend to be flexible: the interviewer can respond to the direction in which participants take the interview (Bryman, 2016). This flexibility is valuable given the preliminary stage of the research topic. The interviews were only minimally structured.

The main objective of the interviews was to gather all relevant information concerning the detection of fake websites using ML as a method of intervention for the police. A list of topics was used during each interview to lightly guide the conversation and make sure that the most relevant aspects were discussed; the complete list can be found in Appendix III. The most important aspects that were discussed involved the characteristics of facade websites versus those of fake websites, the possibilities of detecting facade websites as a method of intervention, unsupervised versus (semi-)supervised ML approaches and the emergence of bias within the police.

A total of six participants were interviewed. These participants have already been briefly described in section 3.1, and a more detailed description can be found in Appendix II. Five of the six participants were identified in collaboration with the thesis supervisor from the Dutch National Police. During each interview, participants were asked which other employees they would recommend interviewing. This approach is referred to as ‘snowball sampling’ (Bryman, 2016), and by applying it, one extra participant was found. In the end, five employees of the Dutch National Police and one person related to the SIDN Fund were interviewed. The SIDN Fund is a Dutch fund that provides financial support to ideas and projects that aim to make the internet stronger or that use the internet in innovative ways (SIDN Fonds, n.d.).

All participants received an invitation for the interview from a colleague who introduced the researcher and briefly described the research topic. Most of the interviews took place at the office of the participant; two interviews took place in a similarly formal environment. At the start of each interview, the procedure was explained to the participant, who was informed of his/her anonymity and asked for permission to record the interview. Next, the interviewer introduced him/herself and the research topic, followed by an introduction of the participant. The semi-structured interview was conducted according to the list of topics mentioned above. Topics were discussed in a slightly different way and sequence per interview, and new topics were added as the interviewer picked up on interviewees' replies. At the end of each interview, the research steps were explained, and the participant was thanked for his/her time.

3.3 Data Analysis

Each interview session was recorded as an audio file and processed to extract the findings. First, the interview was transcribed verbatim. Then, each transcription was coded using the software Atlas.ti. A first cycle of coding was conducted to thematically group the statements; based on these codes, relations between codes were identified and categories were constructed. A list of the codes and categories can be found in Appendix IV. These codes resulted in the outcomes described in chapter four.

3.4 Reliability & Validity

To guarantee the reliability of this study, consistency and replicability of the gathered results are essential (Bryman, 2016). Since this qualitative study focuses on a local case, this can be challenging. However, the process of how the interviews were conducted is described, the interviews were conducted until no new information was extracted, and a topic list was used to structure them. Furthermore, the attached codebook (Appendix IV) provides information concerning the details of the interviews that were interpreted as important. These steps contribute to the reliability of this study (Boeije, 2005; Bryman, 2016).

The validity of the study concerns the extent to which the findings truly represent the phenomenon that should be measured (Bryman, 2016). Factors that contribute to high validity include reducing the risk of misunderstandings regarding the answers of the interviewees and providing an environment for the interview that does not influence the results (ecological validity) (Boeije, 2005). First, the steps taken during the analysis of the data are described (Boeije, 2005). Second, the results have been reviewed by the participants to ensure face validity and to ensure that no misunderstandings could have arisen concerning the answers given. Furthermore, the interviews were held face-to-face and have been recorded and transcribed verbatim. Finally, to ensure ecological validity, the interviews took place in a neutral environment, where it was unlikely that the participant would be interrupted.

4. Results

In this chapter, the results of the semi-structured interviews are discussed. Six interviews were held, and the insights from these interviews can be roughly grouped into two topics: the perspective on ML for risk assessment within the Dutch police and the specific case of facade websites. Both topics are discussed in this chapter. To preserve anonymity, participants are referred to by their expertise. A more detailed description of each participant can be found in Appendix II. A list of the codes used to structure the interviews can be found in Appendix IV.

4.1 Perspective on Machine Learning for Risk Assessment

Similar to what has been discussed in the literature review, all participants described (both directly and indirectly) an approach based on a classification system. The participant specialised in data science stressed that data science approaches are increasingly necessary for the police due to the large amounts of data. The participant involved in the discovery of facade websites noted that facade websites can be identified based on indicators (i.e. fraud cues). However, none of the fraud cues provides a conclusion in itself; it is a combination of factors that ultimately leads to a suspicion. This participant described such an approach based on fraud cues: "The result can be expressed based on scoring every single indicator, and then combining those weighted scores into a final score". Furthermore, the innovation specialist noted that the practice of risk analysis can be very complicated and that such risk assessments should be interpreted with care, since they are always a reflection of the past. Therefore, as he stressed, risk assessments should only be used as an inspiration, but never as the whole truth.

4.1.1 Human Analysis vs Computer Analysis

The relation between an analysis performed by a human and an analysis performed by a computer for policing was discussed with the participants. During the interviews, most participants referred to human assessment as the qualitative aspect of the analysis, whereas a computer assessment was referred to as the quantitative aspect of the analysis. In the remainder of this section, the terms quantitative and qualitative will be used to describe the corresponding aspects of the analysis.

A majority of the participants, including the innovation, IT and data science specialists, argued that a combination of a quantitative aspect and a qualitative aspect is crucial for risk assessment performed by the police. They described that a qualitative aspect is necessary for evidence in police investigations. The IT specialist argued that one cannot do without the other. Additionally, he stressed that when relying solely on quantitative analysis, it can be difficult to provide argumentation for certain decisions, which is crucial in police work. The participant specialised in innovation continued by arguing that, especially when it comes to errors due to inaccuracies in the system, the police cannot afford such mistakes. However, he stated that to keep up with the ever-evolving tactics of criminals, the police need the support of approaches involving technology, such as ML, which often require quantitative methods of analysis. In line with this statement, the IT specialist described an approach for detecting facade websites in which a quantitative assessment could be used to make a pre-selection of potential facade websites, after which a police officer performs a qualitative analysis based on this preselection. The participant specialised in data science agreed with this approach and argued that there should always be someone from the police to make the final analysis, which will consequently be a qualitative analysis. In order to support this qualitative assessment, this participant suggested the following process:

1. Someone provides a list of websites which needs to be investigated.
2. The detection tool makes a quantitative assessment and classifies all the websites as 'fake' or 'real', together with an accuracy score for each classification.
3. In addition to this quantitative assessment, the tool delivers a report in the form of an Excel file, containing detailed information about the intelligence this classification is based on.
4. A police officer then selects several websites that stand out and performs a final qualitative analysis based on the information in the Excel report.

The participant stressed that the report document is a crucial part, since the sources of each assessment, which are necessary for further police work, can be documented.
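As a rough sketch of what the reporting step of such a process could look like, the code below scores a list of websites and writes a report; the classifier, column names, website names and file name are all assumptions made for illustration, not part of the suggested tool.

```python
# Sketch of the reporting step described above: classify a list of websites,
# attach the score, and export a report for the qualitative follow-up.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
clf = LogisticRegression().fit(rng.rand(50, 4), rng.randint(0, 2, 50))  # toy pre-trained model

websites = ["example-shipping.example", "acme-logistics.example", "port-services.example"]  # hypothetical list
features = rng.rand(len(websites), 4)        # fraud-cue values extracted per website
scores = clf.predict_proba(features)[:, 1]

report = pd.DataFrame({
    "website": websites,
    "predicted_label": np.where(scores > 0.5, "facade", "real"),
    "score": scores.round(2),
})
report.to_excel("facade_report.xlsx", index=False)   # writing .xlsx requires openpyxl
print(report)
```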

4.1.2 Bias

Interestingly, all participants described the challenge that the police often build and act on knowledge which is already present. Furthermore, as described by the participant with expertise in ICT and society, by the time the police have thoroughly developed a particular system, criminals are often already several steps ahead. The innovation specialist argued that existing knowledge does not necessarily say something about the future; thus, by building upon this knowledge, the bias of a system increases, which results in inefficiency and consequently produces counterproductive effects. Nevertheless, the ICT and society expert argued that this is very hard to overcome, since criminals can do everything they want, something a government cannot do: "You are almost always behind organised crime, I think that is just a given fact that you have to take into account."

Furthermore, as discussed in the literature review, bias is not only a result of an analysis based on existing knowledge. The innovation specialist confirmed this and stressed the risk when various types of bias are combined in an analysis: "There is always bias in your model, and there will be bias in your application as well. For example, there are many variations that, especially if you are going to predict, make it dangerous, especially when used as evidence in police work." In line with this statement, the participant specialised in data science argued that in this case, fraud cues and corresponding weights are always biased since they are selected by humans.

However, the innovation specialist continued by summing up the advantages of an analysis based on computer judgement, in comparison to an analysis based on human judgement:

People have [bias] too. Moreover, a person becomes tired, and a system does not. A system also learns much smarter and faster. These developments will soon go beyond human capabilities, especially with certain tasks. What we do with that technology is that you can let agents work together. So, if one can read well in a system and knows what it means, another agent can consult other sources very quickly and get information from there.

Furthermore, this participant argued that although an analysis based on a computer includes bias, an analysis based on human judgement can contain even more bias. Moreover, a computer works and learns faster than a human and the assessment quality of a human can decrease due to external effects, whereas the assessment quality of a computer is consistent.

4.1.3 Unsupervised vs (Semi-)Supervised Approaches

Three participants, those specialised in IT, data science and innovation, discussed a supervised approach. However, as mentioned in the previous section, all participants referred to both the reactivity of the police and the risk of bias, which suggests that a supervised approach is not ideal. Consequently, these participants rejected this approach.

A semi-supervised approach was also discussed. The participants specialised in IT and data science described an approach in which the fraud cues are provided, but the weights of the fraud cues are defined using ML. The data scientist argued that this approach requires a substantial labelled training set; however, this information is often not available: "Criminals conceal everything and, therefore, you do not know what the real labels are, what you normally do know in data science, which makes it more difficult." Furthermore, he stressed that an approach such as RTL, where instances are added to the training set, could even increase bias, since it is based on biased features which are emphasised during this process. Moreover, this participant discussed that a large number of weak fraud cues could have a negative influence on the accuracy of the model. However, he continued by stating that in the case of the police, a qualitative analysis will follow the quantitative analysis, so it is less of a problem if the algorithm classifies websites as fake with low accuracy.

To overcome the bias, the data scientist described an unsupervised approach:

Imagine you start with nothing, how can you use neutral features to determine what the outliers are and what stands out? Then you can say, this group of websites stands out within a certain category. Then a person can look at it and label the groups of websites that are very suspicious or very normal.

This approach results in clusters instead of specific classifications and thus closely resembles a purely unsupervised approach. Both the innovation specialist and the IT specialist described a similar approach. However, the IT specialist claimed that such an approach might require more work, since the websites are not labelled instantly. On the other hand, he argued that the quantitative analysis will be followed by a qualitative analysis and that, therefore, this extra work is negligible.
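A minimal sketch of this clustering-then-review workflow, assuming hypothetical neutral features and toy data, could look as follows: websites are clustered with DBSCAN and the points that fit no cluster are flagged for a human analyst to label.

```python
# Minimal sketch, assuming hypothetical neutral features and toy data:
# cluster websites and hand the outliers to a human analyst for labelling.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN

FEATURES = ["page_count", "days_since_last_update", "outbound_links"]

X = np.array([
    [12, 30, 40], [15, 25, 38], [14, 28, 42],  # typical business websites
    [11, 35, 36], [13, 27, 41],
    [3, 400, 2],                               # small, stale and isolated: stands out
])

labels = DBSCAN(eps=1.2, min_samples=3).fit_predict(StandardScaler().fit_transform(X))

# DBSCAN marks points that fit no cluster with the label -1; these are the
# candidates a police analyst would inspect and label as suspicious or normal.
for site, label in zip(X, labels):
    print(dict(zip(FEATURES, site.tolist())),
          "-> outlier" if label == -1 else f"-> cluster {label}")
```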

4.2 The Detection of Facade Websites

4.2.1 Preconditions of the Analysis

As mentioned previously, facade websites are business websites of front firms used by criminal facilitators as a cover-up. The participant specialised in criminal facilitators noted that criminal facilitators are located in the layer between the upperworld and the underworld. In addition, he described the various types of criminal facilitators:

The area between the upperworld and the underworld is a grey field with many different shades of grey. Some facilitators are just wrong and belong to the underworld. Some are located in the upperworld and mainly perform legitimate work, but the majority is in the grey area in-between both. Moreover, there are many different types. [...] They come in all shapes and sizes, ranging from the self-employed person to the large offices with 30-40 employees of which one is malicious, and a huge range in-between.

Since the diversity among criminal facilitators is high, both the participant specialised in this group and the IT specialist argued that it is important to determine which group of facilitators is targeted with the analysis of facade websites. The IT specialist stressed: “You have to be aware of what you are scanning. You can separate the malicious versus legitimate companies from each other. However, you will not detect a single malicious employee within a legitimate company.” Thus, only true front firms can be detected. The participant specialised in criminal facilitators continued by arguing that there is a chance that this group of front firms is rather small:


If you look at how that process works, it rarely happens that criminals work their way up to become an upperworld figure and then succeed in the upperworld. On the contrary, it is almost always someone from the upperworld who slips into the underworld. So the interweaving almost always runs from top to bottom.

Additionally, various participants noted the importance of focusing on the criminals with the most significant impact in the criminal network. The ICT and society expert explained that it is easier to arrest two small, poorly organised criminals; however, they can easily be replaced. Thus, it is crucial to focus on those with a more significant impact.

Furthermore, three participants noted the possibility of focusing on a broader perspective than websites alone. The innovation expert referred to a focus on the digital representation of a company rather than only the website. In addition, the IT specialist argued that the metadata of a facade website can be interesting. Other digital media that were discussed included apps and social media, such as Facebook, LinkedIn, Twitter and Google Business pages. In line with this view of the complete digital representation of a company instead of only the website, the IT specialist argued that it is crucial to design a detection system in such a way that it can handle varying sources and fraud cues:

You do not know what the internet will look like in 10 years. If you look back 10 years, it was also very different. [...] Maybe LinkedIn will go bankrupt tomorrow; then you cannot rely on that source anymore. So I think that the tooling and methodology will continue to exist, but that your sources, and thus your fraud cues, will change. There will be something new, and you should be able to do something with that.

The innovation specialist supported this claim: "What you want is sustainable innovation [i.e. long-lasting innovation], so you want to develop a model with a specific set of fraud cues for websites, and another specific set of fraud cues for apps, etc." Thus, in line with the literature review, the participants suggested an approach in which various sources of information are combined, including the Chamber of Commerce, the branch association and public social media pages. The Dutch Chamber of Commerce maintains its own indicators to detect fraudulent websites. When this source is included in the analysis, these indicators can be adopted as fraud cues. A list of the indicators of the Dutch Chamber of Commerce can be found in Table 2 of Appendix I.
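A minimal sketch of such a source-agnostic design, with invented source names and cues, is given below: the scoring methodology stays fixed, while extractors for individual sources can be registered or dropped as those sources appear and disappear.

```python
# Minimal sketch, assuming hypothetical sources and cues: keep the methodology
# stable and treat each source as a replaceable cue extractor.
from typing import Callable, Dict

# A cue extractor maps raw material from one source to named fraud cues (0-1).
CueExtractor = Callable[[dict], Dict[str, float]]

def website_cues(site: dict) -> Dict[str, float]:
    return {"static_content": 1.0 if site.get("last_update_days", 0) > 365 else 0.0}

def chamber_of_commerce_cues(record: dict) -> Dict[str, float]:
    return {"missing_registration": 0.0 if record.get("registered") else 1.0}

REGISTERED_EXTRACTORS: Dict[str, CueExtractor] = {
    "website": website_cues,
    "chamber_of_commerce": chamber_of_commerce_cues,
    # "linkedin": linkedin_cues,  # registered or dropped as the source (dis)appears
}

def collect_cues(company: Dict[str, dict]) -> Dict[str, float]:
    """Merge cues from every source that is available for this company."""
    cues: Dict[str, float] = {}
    for source, material in company.items():
        extractor = REGISTERED_EXTRACTORS.get(source)
        if extractor:
            cues.update(extractor(material))
    return cues

print(collect_cues({
    "website": {"last_update_days": 900},
    "chamber_of_commerce": {"registered": False},
}))
```

The design choice here is simply that the downstream analysis only ever sees named cues, so a disappearing source removes some cues but does not break the tooling.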

Besides generalising the detection system across media sources, generalisation across industries was also discussed. Various participants discussed a generalisable approach that can be used across domains. The data scientist argued that the system should consist of a module that can be used for every type of website, in other words, a much more generic approach. However, three participants discussed the need for an industry-specific aspect in detecting facade websites. Two approaches were discussed. On the one hand, the IT specialist described a general set of fraud cues with industry-specific weightings per indicator. On the other hand, the researcher specialised in criminal facilitators described a general set of fraud cues together with a subset of industry-specific fraud cues. However, for both approaches, domain-specific knowledge is necessary. Both participants argued that obtaining this knowledge can be time-consuming and that, in some cases, it can be hard to determine which companies belong to a specific industry: "For example, there is no central registration of shipping agents. [...] At the Chamber of Commerce they belong to a category together with shipbrokers and stevedores, but that is too vague. So it is much work to gain insight into that."
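Both variants can be expressed as a small, purely illustrative configuration; the industries, cue names and weights below are invented.

```python
# Minimal sketch, assuming hypothetical industries, cues and weights, of the
# two industry-specific variants described above.

# Variant 1 (IT specialist): one general cue set, industry-specific weights.
GENERAL_CUES = ["static_content", "no_contact_form", "missing_registration"]
INDUSTRY_WEIGHTS = {
    "default":        {"static_content": 0.3, "no_contact_form": 0.3, "missing_registration": 0.4},
    "shipping_agent": {"static_content": 0.2, "no_contact_form": 0.2, "missing_registration": 0.6},
}

# Variant 2 (criminal-facilitator researcher): general cues plus an
# industry-specific subset of extra cues.
INDUSTRY_EXTRA_CUES = {
    "shipping_agent": ["no_port_address", "no_branch_association_membership"],
}

def cue_set(industry: str) -> list:
    """Variant 2: the cues to extract for a company in a given industry."""
    return GENERAL_CUES + INDUSTRY_EXTRA_CUES.get(industry, [])

def score(cues: dict, industry: str) -> float:
    """Variant 1: weight the observed cues with industry-specific weights."""
    weights = INDUSTRY_WEIGHTS.get(industry, INDUSTRY_WEIGHTS["default"])
    return sum(weights.get(name, 0.1) * value for name, value in cues.items())

print(cue_set("shipping_agent"))
print(score({"static_content": 1.0, "missing_registration": 1.0}, "shipping_agent"))
```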

4.2.2 Facade Website Fraud Cues

As discussed in the introduction, existing fake website detection systems involve a specific set of fraud cues; based on these fraud cues, the risk that a website is fake can be determined. The literature review has shown that not all fraud cues for fake websites are sufficient for the detection of facade websites. Thus, the participants were asked to identify specific fraud cues for facade websites. Table 3 in Appendix I presents an overview of the fraud cues mentioned in the expert interviews, together with their potential influence on the risk of a facade website, as judged by the participants.

In addition, during the interviews, several more general comments were made regarding fraud cues for facade websites. The IT specialist noted that the more of a facade a website is, the more static the website appears to be. Furthermore, both this participant and the cyber resilience specialist noted that creating trust is an important aspect of facade websites, which is in line with what was discussed in the literature review: such websites adopt design elements which have been proven to increase trust. Additionally, the participant specialised in criminal facilitators noted that potential clients find such facilitators themselves, and that facilitators therefore do not have to search for clients. Thus, facade websites do not provoke conversion but are more closed, which is in line with targeting zero conversion, as discussed in the literature review.
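Purely as an illustration, these observations could be translated into measurable cues along the following lines; the heuristics, keywords and threshold are hypothetical and would need to be validated against real cases.

```python
# Minimal sketch, assuming hypothetical heuristics: turn the interview
# observations (staticness, trust-building elements, zero conversion) into
# simple measurable cues for a single page.
import re

def facade_cues(html: str, days_since_last_update: int) -> dict:
    html_lower = html.lower()
    return {
        # "The more of a facade a website is, the more static this website is."
        "static_content": 1.0 if days_since_last_update > 365 else 0.0,
        # Trust-building design elements (e.g. seals, certifications, testimonials).
        "trust_elements": 1.0 if re.search(r"keurmerk|certified|testimonial", html_lower) else 0.0,
        # Facade sites reportedly aim at zero conversion: no forms, no calls to action.
        "no_conversion_elements": 0.0 if re.search(r"<form|contact us|request a quote", html_lower) else 1.0,
    }

print(facade_cues("<html><body>Certified since 2001</body></html>", days_since_last_update=700))
```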

5. Discussion

This research aimed to explore the potential of ML for the detection of facade websites by the police. The in-depth literature study focused on the phenomenon of fake websites and the potential of ML approaches to assess the risk of those fraudulent websites. Expert interviews were conducted to supplement the insights from the literature review. Experts from various domains were interviewed, including the domains of ML, new methods of intervention and innovation within the police. In this chapter, the findings from both the in-depth literature review and the expert interviews are combined to answer the two sub-questions which were formulated in the introduction of this study.

Two challenges faced by the police have been identified. First, several developments, including the increasing amount of data, smarter ways of committing crime and the intertwinement of the upperworld and the underworld, make it more and more difficult for police officers to investigate all relevant information of a case. Thus, smarter ways of analysis and intervention are desired. Second, in relation to the previous point, it became clear from the expert interviews that the police often build and act upon knowledge that is already present. This limitation causes an increase in bias, making preventive interventions more difficult. Although this reactiveness cannot be fully prevented, acting based on existing knowledge should be minimised.

Furthermore, there is one essential characteristic of the police that is important to take into account. With every case, the final analysis and assessment are performed by a person, referred to by the participants as a qualitative analysis. Thus, the analysis supported by technology, referred to by the participants as a quantitative analysis, cannot be decisive, but can only be used as inspiration for the final assessment. However, to overcome the first challenge, computer analysis is necessary, since it makes it possible to analyse massive amounts of data in a minimal amount of time. From this, it can be concluded that a mixed methods approach is the most suitable, combining human and computer analysis to get the best out of both.

Based on the literature regarding mixed methods as a research approach, namely combining qualitative and quantitative analysis, several interesting findings can be distilled. Bryman (2016) described mixed methods as research that combines quantitative and qualitative research. Furthermore, he argues that mixed methods research can be classified in terms of priority and sequence. If, in the case of the police, the human analysis is seen as the qualitative approach and the computer analysis as the quantitative approach, the qualitative method is the principal data-gathering tool. Furthermore, the data collection associated with both methods is sequential and not concurrent. Consequently, this approach can be classified as an embedded design, in which qualitative analysis is the priority approach but draws on a quantitative approach as well within the context of the analysis. Bryman (2016) continues by describing this design as follows:

The need for an embedded design can arise when the researcher needs to enhance either quantitative or qualitative research with the other approach. The phasing of the data collection may be simultaneous or sequential. The need for the design arises when the researcher feels that quantitative (or qualitative) research alone will be insufficient for understanding the phenomenon of interest.

This description of an embedded design matches the desire of the police to support the human analysis with computer analysis, with the human analysis being the primary assessment.

Although there is a difference between mixed methods as a research approach, as formulated by Bryman (2016), and the mixed methods approach combining human analysis and computer analysis, these characteristics can provide insights into the most suitable ML approach for the police. The second challenge, stated at the beginning of this section, can be addressed by using unsupervised learning. By applying an unsupervised approach, the algorithm can identify clusters in the data that can provide input for new fraud cues (i.e. if all instances in a cluster share a characteristic that was not yet known, this can be identified as a new fraud cue). This way, the police will be less dependent on existing knowledge, and new patterns can be discovered. Since the ML analysis will always be followed by an analysis performed by a police officer, the need for an extra iteration to interpret the results of unsupervised learning is covered. As discussed in the literature review, in order to evaluate the
