
Using Machine Learning to analyse factors impacting darknet marketplaces listings

Bachelor Thesis

Information Science - University of Amsterdam

Joep Harmsen

10813349

Supervisor:

mw. dr. V.M. (Vanessa) Dirksen

V.M.Dirksen@uva.nl

June 29, 2018

Abstract

Background. Since the rise of the internet, new types of online illicit drug marketplaces have sprouted. This thesis investigates the driving forces behind gaining revenue on these markets through a case study of cocaine listings on the former darknet marketplace Alphabay.

Methods. Raw data originating from Alphabay is processed, organized and analysed. Using Python and scikit-learn, various machine learning algorithms for regression are applied.

Results. Regularized linear regression and random forest regression provide the best goodness-of-fit. The random forest algorithm also provides the importance of each feature in explaining the variance of the target variable. The most compelling finding is that the effect of vendor reputation and the perceived level of trust is insignificant.

Conclusions. Machine learning proves to be a suitable way of analysing the effect of economic factors on listings of illicit drugs on darknet marketplaces. Various known economic principles apply in the case presented in this thesis while others do not, which results in novel information about darknet marketplaces.


Contents

1 Introduction

2 Problem Statement

3 Literature Review

3.1 Factors affecting revenue in online markets

3.1.1 Vendor Reputation

3.1.2 Trust

3.1.3 Product Information

3.1.4 Product Pricing

3.2 Illicit Drug Markets

3.2.1 Conventional Drug Markets & Distribution Networks

3.2.2 Online Illicit Drug Markets

3.2.3 Darknet Marketplaces

3.2.4 Case Study: Cocaine listings on Alphabay

4 Methodology & Data

4.1 Programming Techniques

4.1.1 NumPy & Pandas

4.1.2 Regular Expressions

4.1.3 Matplotlib

4.1.4 Machine Learning & Scikit-Learn

4.2 Preparing the dataset

4.3 Analytical Methodology & Evaluation Metrics

4.4 Data Exploration

4.4.1 Exploration

4.4.2 Correlation heat map

5 Results & Discussion

5.1 Correlation analysis

5.2 Regression Evaluation

5.3 Polynomial Regularized Linear Regression (Lasso)

5.4 Random Forest Algorithm

5.5 Dropping number of Successful Transactions

5.6 Discussion

6 Conclusion & Limitations

6.1 Conclusion

7 Acknowledgements

8 Appendix


1 Introduction

“In the last decade, use of the Internet has grown exponentially and has become an integral part of daily life, providing global communication, access to information and provision of entertainment” (Anderson et al., 2017, p. 430). Together with the rise of the internet there has been a tremendous increase in the number of digital marketplaces. Virtually any product or service is nowadays also available online, including products or services of an illegal nature. A vast majority of these illegal products and services were sold on illegal digital marketplaces such as Silk Road I and Alphabay, which have both been taken down. A lion’s share of the products sold on these marketplaces consisted of various types of illegal drugs. This research analyses which factors impact success in an unsupervised illegal digital market. The main research question posed in this thesis is therefore: “Which factors have the greatest impact on the total gross per day for a listing of illicit drugs on an unsupervised illegal digital market?”. To answer this question adequately, it is important to define the characteristics that may influence the target variable. Furthermore, information needs to be extracted from data originating from illegal digital marketplaces. Where possible, this information is classified as one of the factors impacting the total gross per day. These characteristics can then be used as features in a machine learning problem, both to model the impact of each feature on the target value and to predict this target value as accurately as possible. Once the importance of each feature is clear, it is possible to reflect on the information gained and to see whether it agrees with prior literature on online licit and illicit markets.

Furthermore, Caulkins & Reuter (2006, p. 1) state that “markets for illicit drugs present an interesting case study for economics, combining non-standard characteristics such as addiction and product illegality”. A wide variety of prior research has been conducted on licit online markets (Xu and Kim, 2008). Additionally, there has been research on the size and scope of vendors on darknet markets (Paquet-Clouston et al., 2018) and on the characteristics of their users (Van Hout and Bingham, 2013; Barratt et al., 2014). However, no research has yet been conducted on which factors impact the success of darknet listings, as described by a variable such as total gross per day. Barratt et al. (2016) and Van Hout & Bingham (2013) propose that, because of the revolutionary and novel nature of these darknet marketplaces, they are not expected to disappear in the future. It is therefore important to analyse the driving forces behind such markets. Since data is one of the few things that can be recorded about these marketplaces, it is relevant to gain as much insight as possible into how to process and make sense of this data. This thesis conducts a case study on a darknet marketplace in a way that is representative of, and reproducible for, other darknet markets. The methodological novelty of using machine learning algorithms to analyse the impact of various factors on darknet marketplace listings hopefully provides a base for future research.
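As an illustration of the approach outlined in this introduction, the following minimal sketch shows how scikit-learn could be used to fit a regularized linear model and a random forest and to read off the importance of each feature. It is not the analysis performed in this thesis: the data is synthetic, the feature names are hypothetical, and the hyperparameters are arbitrary assumptions chosen only to make the example run.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical feature names; the synthetic data below is NOT Alphabay data.
FEATURES = ["price_per_gram", "vendor_rating", "views", "description_length"]

rng = np.random.default_rng(0)
X = rng.normal(size=(500, len(FEATURES)))
# Assumed toy relation: the target depends strongly on the first feature,
# weakly on the third, and not at all on the other two.
y = 3.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Regularized linear regression (Lasso) on standardized features.
scaler = StandardScaler().fit(X_train)
lasso = Lasso(alpha=0.05).fit(scaler.transform(X_train), y_train)

# Random forest regression; it exposes one importance value per feature.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Goodness-of-fit (R^2) on held-out data for both models.
lasso_r2 = lasso.score(scaler.transform(X_test), y_test)
forest_r2 = forest.score(X_test, y_test)

# Map each (hypothetical) feature name to its importance in the forest.
importances = dict(zip(FEATURES, forest.feature_importances_))
for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

In this toy setting the random forest assigns the largest share of importance to the feature that was constructed to drive the target. On real listing data, such a ranking would have to be interpreted together with the goodness-of-fit on held-out data.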


2 Problem Statement

Because of the illegal and anonymous nature of illegal digital markets, prior research has mainly provided descriptive characteristics of vendors and buyers. This thesis, however, attempts to identify the driving forces behind these descriptives. Therefore, as stated in the introduction, the central questions throughout this thesis are:

(i) “Which factors have the greatest impact on the total gross per day for a listing of illicit drugs on an unsupervised illegal digital market?”

(ii) “How do the findings in the analysis of this thesis relate to prior literature on licit online markets and illicit markets?”

(a) “What is in accordance with prior literature?”

(b) “What differs from prior literature?”

3 Literature Review

3.1 Factors affecting revenue in online markets

As stated in the introduction, Caulkins & Reuter (2005) point out that markets for illicit drugs present an interesting case for studying economics. However, to analyse whether the economic principles behind high revenue in licit online marketplaces also hold for illicit online marketplaces, these principles first need to be identified. “In order to improve online revenues, there is an imminent need to understand the factors that influence online purchases and how to tackle these factors” (Ranganathan and Ganapathy, 2002, p. 87). A large amount of research has already been devoted to this subject. In a study by Chio et al. (2014), key categories are presented which all have a significant effect on the initial purchase intention, while only some have a significant effect on the repeat purchase intention. Since this thesis makes no distinction between first-time buyers and repeat buyers, it is important to account for factors impacting both. The five key categories presented are: website attributes, seller attributes, consumer beliefs/perceptions, customer service and shopping benefits. Since darknet markets work as listings, the website attributes are not applicable. The seller attributes describe the seller’s online reputation as perceived by potential online buyers, as well as the size of the seller. The consumer beliefs/perceptions describe the risk perceived by the online buyer. This also concerns the level of trust that a buyer has in the seller. Customer service is not measurable on online illicit marketplaces and is therefore disregarded. The shopping benefits category consists of two subcategories: the product offering, which describes the information that is available about the product, and the price. Consequently, the following relevant factors will be investigated in the remainder of this thesis: vendor reputation, vendor size, trust level, product information and price. The weight of these factors on illicit markets is also discussed throughout the following section.

3.1.1 Vendor Reputation

A vendor’s reputation remains one of the few pieces of information that a consumer can assess during the online decision-making process. According to Einweiler (2001, p. 6), “reputation is an indicator that signals the experiences of third parties with a potential exchange partner.” A substantial amount of research has been dedicated to vendor reputation and its impact on economic factors. Xu & Kim (2008, p. 485) state that as “a vendor’s good reputation grows, the vendor may consider changing its pricing strategy for better profit.” However, sensitivity to vendor reputation differs per situation, since online markets vary heavily. A study by Gregg & Walzack (2010) investigates the impact of vendor reputation on price and perceived trustworthiness for listings on the auction platform eBay. This research shows that a poor online auction reputation can negatively influence the price an auction item receives. Furthermore, the combination of a good-quality website and a poor vendor reputation does not affect trustworthiness, whereas the combination of a poor-quality website and a good vendor reputation score does. More studies have investigated the impact of vendor reputation on the perceived trustworthiness of the vendor, such as a study conducted by Tams (2009), who found that web vendors’ reputations had a significant influence on consumers’ trust in them to deliver on their promises. Utz et al. (2012, p. 56) add to this that “the effect of reviews online is stronger than the effect of store reputation. A negative review can have negative effects, and these effects cannot easily be compensated by a good reputation”.

This brings us to the impact of reviews on vendor reputation. “Web based technologies have created numerous opportunities for electronic word-of-mouth (eWOM) communication” (Cheung et al., 2008, p. 1). The information exchanged through these communication channels, in the form of reviews and ratings of vendors, can have a great impact on online commerce. In a study by Ye et al. (2009), the effect of reviews on the sales of hotel rooms is investigated. The study states that “traveler reviews had a significant impact on the online sales of hotel rooms. Online reviews have the ability to reduce cognitive load, increasing the awareness of the customer, resulting in more sales” (Ye et al., 2009, p. 638). Finally, a study by Davis & Khazanchi (2008) investigates the effect of the number of views on product sales. The results show that the number of views per product is statistically significant in explaining the increase in product sales. Furthermore, they add that the relationship between volume and sales will change depending on the number of product views.

Similarly, vendor reputation has its effects in conventional drug markets. In a study by Denton & O’Malley (1999) it is established that “a good reputation is equivalent to a stack of references in the licit economy.” Furthermore, dealers with a good reputation often acquired a base of buyers who bought more frequently and steadily. According to the literature above, vendor reputation has a significant impact on perceived trust, pricing and sales. Therefore, vendor reputation can take on an important role throughout this thesis.

3.1.2 Trust

Trust is an important part of the buyer–seller relationship which needs to be built up from the start and cannot be neglected (Warren et al., 2008). In research by Gefen (2002) it is stated that not being able to control the actions of third parties results in complexity, which can cause consumers to simply choose not to engage in certain actions. Nonetheless, this complexity can be lowered by trust. Therefore, according to Brynjolfsson & Smith (2000, p. 564), “trust may take on a heightened importance in electronic markets because of the spatial and temporal separation between buyers and sellers imposed by the medium”. According to Kollock (1999), transactions made in a digital marketplace can be described as Prisoner’s Dilemma transactions that require a great amount of trust, given that it may not be possible to track down or even identify the other party. Furthermore, the two sides of the transaction are likely to be separated in both time and distance. As stated earlier, trust is highly dependent on the perceived vendor reputation and the information available.

Trust also plays an important role in conventional drug markets. Parties involved in conventional drug markets share a level of uncertainty because of the illegality of the market. Trust can eliminate uncertainty both for buyers, who might question product quality in relation to price, and for sellers, who need trustworthy relationships to avoid imprisonment (Tzanetakis et al., 2016). On illegal digital markets, however, trust has to be established between anonymous sellers and buyers. Trust on these markets is generated by multiple factors, such as vendor reputation by means of a rating, number of views, number of successful transactions and reviews by other buyers (Rhumorbarbe et al., 2016). These findings highlight the importance of creating trust, often through feedback systems that enable buyers to assess the previous behaviour of vendors towards other buyers. Another way of establishing trust on illegal digital marketplaces is by making use of third-party services such as escrow. Escrow is a system that prevents the vendor from receiving the buyer’s money until after the product has been received. However, there are multiple cases where illegal digital marketplaces were closed after money in escrow was stolen by administrators or external hackers (Martin, 2014). Seemingly, trust, which is highly dependent on a vendor’s reputation, is an essential factor for the success of online and offline, licit and illicit vendors. However, the importance of trust throughout this thesis is yet to be discussed.


3.1.3 Product Information

An important resource for online consumers is information. Since there is no real-life experience of the product available on a digital market, information plays a much bigger role in the decision-making process of the buyer. Valuable information can often be obtained from online customer reviews by previous consumers. Online customer reviews can be defined as “peer-generated product evaluations posted on company or third party web sites” (Mudambi and Schuff, 2010, p. 186). Prior research has made evident that online customer reviews can affect online businesses in multiple ways. Primarily, positive ratings generally have a positive effect on online businesses. Specifically, Clemons et al. (2006) found, in a study on the effect of online ratings in the beer industry, that strongly positive ratings can have a positive influence on the growth of product sales. Furthermore, they stated that “while the highest ratings are a good predictor of rapidly growing future sales, the presence of poor ratings is not a good predictor of poor sales” (Clemons et al., 2006, p. 160). Partly similar principles are discussed in a study by Chevalier & Mayzlin (2006), who investigate the effect of consumer reviews on the sales patterns of two leading online book stores, namely Amazon.com and Barnesandnoble.com (bn.com). Their results also indicate that five-star reviews improve sales. However, they also find that a poor review, such as a one-star review, hurts the sales of a book at Amazon.com in a statistically significant way. Moreover, one-star reviews have a larger impact in absolute value than five-star reviews, indicating that one-star reviews, even when they are not heavily represented, have more impact on buyers (Chevalier and Mayzlin, 2006).

The findings above about the effect of negative ratings are somewhat contradictory. This can be explained by the nature of the products studied. Firstly, the way of pricing differs essentially: where the general price of a book is consistent, the price of beer varies with the quality and nature of the beer. Secondly, because beer is a consumable product which can be bought repeatedly, high ratings tend to reflect positive consumer behaviour, which influences sales more than a one-time negative experience. According to prior literature, product reviews, which provide information regarding the product of interest, can significantly influence sales. Since the subject of this thesis is listings of illicit drugs, the product is most similar to that studied by Clemons et al. (2006), since it can be seen as a consumable product which can be bought repeatedly. However, the actual impact of product information on illicit drug listings on an illegal digital marketplace is also yet to be discussed.

3.1.4 Product Pricing

Firstly, it is important to clearly define the concept of price. From a consumer’s point of view, price is usually defined as what the consumer must give up to purchase a product or service. Research typically views price only in terms of the amount of currency asked or paid for an item or a service (Peter et al., 1999). “Online or offline, price is unquestionably one of the most important cues utilized during a consumer’s decision-making process” (Chiang and Dholakia, 2003, p. 179). Next to price itself, the combination of pricing and the available information also seems to be of importance (Pan et al., 2002). However, since the quantity of available information varies per digital market and its products, it is important to understand the different levels of impact of pricing on the customer’s perception of the product. Zeithaml (1988) states that considerable empirical research has investigated the relationship between price and quality and has shown that consumers use price to infer quality when it is the only available cue. However, when price is shown together with other cues, the results are less robust. According to Grewal (1998), performance and price perceptions may be related in such a way that a higher price indicates a higher quality product or service. Nonetheless, in accordance with Zeithaml (2009), this relationship is likely to reduce in significance in the presence of other informational cues. The same relation is investigated in a study by Kardes et al. (2004), in which experiments with varying informational loads are conducted to investigate the selectiveness of information processing and its relation to perceived quality based on price. The results of the experiments show that, although consumers typically assume that price and quality are correlated, the degree to which price is used as a basis for inferring quality is reduced when other informational loads are high (Kardes et al., 2004). However, information can be scarce in illegal digital marketplaces. With customers relying only on a vendor’s description and rating, it is important to keep considering the relation between pricing and perceived quality.

In conventional drug markets price is also an important factor. According to Ritter (2006), detailed price analyses show that drug demand is sensitive to price. This is illustrated by the trends in the prices of cocaine, heroin and cannabis, which show a substantial decrease over the past 28 years (Ritter, 2006). Storti & De Grauwe (2009) add to this by stating that modern globalization has an opening effect on the drug market, as it has on other markets. This results in a more open and competitive market. Consequently, drugs like cocaine and heroin have seen a substantial decline in retail price during the last two decades due to the reduction of transportation costs and the availability of new information technology (IT). This new IT has also improved the distribution and stock management of drugs as well as the communication around drugs. According to Bright & Ritter (2010, p. 360), “retail price has been regarded as a vital measure of drug market activity, and law enforcement effectiveness”. The price of a pure gram of cocaine in the US in 2003 was estimated to be around $120 (DeSimone and Farrelly, 2003). According to Storti & De Grauwe (2009), the retail price of a gram of cocaine is around $100.

Furthermore, research on the impact of price on illegal online marketplaces is of interest. According to Rhumorbarbe et al. (2016), price is also a key feature of illicit drug market(s). Multiple studies state that consumers are generally willing to pay high prices for illicit drugs on illegal digital marketplaces. Some consumers also stated that the promise of a very high quality product was incentive enough to buy at a higher price (Barratt et al., 2014; Van Hout and Bingham, 2013). Contrary to this, Martin (2014) states that consumers are able to source better quality products than those available from street retailers, at a lower price. Christin’s (2012) research into an early online digital marketplace called Silk Road indicates this as well, reporting that consumers were extraordinarily satisfied with both price and quality. In conclusion, price is an important cue in the consumer decision process. When it is the only cue available, price is strongly associated with perceived quality. On highly anonymous markets, price is often one of the only cues available. This is supported by research which finds that consumers do not mind paying a higher price since they believe in the promise of a higher quality product. Contrary to this, other research states that prices are generally lower on darknet marketplaces. In this thesis the effect of price on revenue on online digital marketplaces is yet to be investigated.

3.2 Illicit Drug Markets

3.2.1 Conventional Drug Markets & Distribution Networks

There has been much research on the various sizes and characteristics of drug distribution networks and markets. Firstly, the size and operational nature of drug distribution networks will be discussed. In a study by Martin (2014) it is established that traditional illicit drug distribution networks are often large, spanning multiple countries. There are many nodes in these networks, which account for different levels of control and power. Furthermore, May & Hough (2004, p. 554-555) state that “traditionally, the structure of drug distribution systems has been viewed as pyramidal, with large-scale importers and traffickers operating at the apex, filtering down to street dealers who operate on the lowest tier.” A similar view is presented by Bruinsma & Bernasco (2004), who state that the size of a distribution network is generally large, the network has a high density and strong cohesion, and frequent contact between actors inside the network is typical. As stated in Caulkins & Reuter (1998, p. 2), “Illicit drugs are, ultimately, consumer goods, and like other goods in modern societies they are provided primarily through markets.” In contrast to the distribution networks discussed above, the size of conventional drug trading parties on drug markets seems to be smaller. A study by Bouchard (2007), which investigated the size of various drug organisations in Quebec, states that the majority of the more local organisations were strikingly small, consisting of 2 to 10 members. Additionally, small organisations are more easily created and more easily replaced than large ones. Independently of size, drug distribution networks and drug markets are dynamic, constantly fluid and highly variable. “This flexibility explains their characteristic durability, and accounts for the continued functioning of illicit drug networks despite the ongoing removal of dealers, traffickers or any other nodes that constitute part of the system of distribution” (Martin, 2014, p. 364). As stated in the study by Bouchard (2007), illegal drug networks are, to an extent, resilient to external shocks caused by heavily increased pressure from law enforcement. In his study it becomes evident that, in spite of significant rises in repressive policy by the criminal justice system in the U.S. between the 1980s and 1990s, no indicator shows that this affected markets for illegal drugs. From certain angles, drug markets may even seem stronger than ever. “Recruitment and replacement of dealers do not appear to present any difficulties, and users perceived availability of most drugs has remained stable” (Bouchard, 2007, p. 326). However, the nature of illicit drug markets remains highly competitive. Bouchard (2007) also states that many organisations last only for a very short period of time before splitting up and sometimes coming back in different formations.

Furthermore, the variance in the incomes of the actors on the different levels of the drug trade is discussed. “Though drug markets generate hundreds of billions of dollars in sales and have created great wealth for some traffickers, it is important to understand that the overwhelming majority of those involved in the drug trade make very modest incomes” (Reuter & Trautman, 2009, p. 9). To conclude, a thorough understanding of drug distribution systems and drug markets is necessary to comprehend the darknet marketplaces discussed throughout this thesis. Conventional distribution networks seem large, while parties operating in local drug markets are traditionally small and replaceable. Moreover, drugs are a consumer good with consumer behaviour behind them. However, since darknet markets operate online, what does this imply for darknet markets in contrast to more conventional offline illicit drug markets?

3.2.2 Online Illicit Drug Markets

As stated in Martin (2014), the internet has been used for facilitating drug trades since it was first established. According to Markoff (2005), the first online e-commerce transaction was the exchange of a bag of marijuana in 1971 between students of Stanford University and Massachusetts Institute of Technology, using the Arpanet. This example illustrates the powerful nature of the internet in connecting people with personal goals, such as buying or selling a product. It is also the first example of the internet being used as an environment for both the selling and consuming of illicit drugs. As stated in Buxton & Bingham (2015), both the rise of the personal home computer and the increasing use of the internet resulted in the establishment of areas for sharing knowledge about drug use, legislation and drug manufacturing. In a study by Schneider (2003), the nature of the popular newsgroup alt.drugs, which originated in 1987, is investigated. The newsgroup was heavily used to discuss the manufacturing and distribution of popular drugs, to a point where it could be seen as a way of teaching. It is also stated that the information provided on the newsgroup formed the foundation for potentially violating laws. Early online drug deals were later arranged through the private messaging functions of the forum.

However, “key vulnerabilities such as ease of public access over the clear net (enabling law enforcement infiltration of user groups), and their use of traceable payment systems (Paypal, Pecunix, Western Union and cash in the post) allowed for interference of law enforcement” (Buxton & Bingham, 2015, p. 3). The consequences of these vulnerabilities are illustrated by the arrest of Marc Willems, a Dutch citizen living in Lelystad, who, in collaboration with multiple American citizens, set up an online illegal drug marketplace called Adamflowers, also known as ‘The Farmers Market’. The market first operated through Hushmail, an encrypted email service, and later started using Tor for stronger encryption; the functionality of Tor is explained in the following section. However, payments went through Western Union, Paypal and cash sent by mail. After more than 5000 successful transactions accounting for more than a million dollars in revenue, the United States Drug Enforcement Agency (DEA) was able to take down this predecessor of the first so-called darknet marketplaces (DEA Public Affairs, 2012). The potential for selling and buying illicit goods through the internet was recognized in the early stages of its emergence. Early communities and even markets were certainly used substantially, but lacked the essential security measures needed to survive. The following section discusses the markets that originated after the rise of various novel technologies.

3.2.3 Darknet Marketplaces

Darknet marketplaces first arose in 2011 and can be seen as a revolutionary way of trading illicit drugs (Broséus et al., 2016). Darknet marketplaces are illegal digital marketplaces located on the dark web that facilitate anonymous buying and selling of services and goods. Other studies refer to these illegal digital marketplaces as cryptomarkets, due to the cryptocurrency-based transactions on these markets (Martin, 2014), or as anonymous online markets (Christin, 2013). Throughout this thesis the term darknet marketplace will be used. The birth of these darknet marketplaces originates from a combination of several recent technologies. Firstly, anonymity online became available through the use of Tor. Tor is an abbreviation of The Onion Router and was the result of a project by the United States Naval Research Laboratory to facilitate secure online communication for sensitive government intelligence. Tor uses a special kind of routing called onion routing. Onion routing implements multiple layers of encryption, hence the onion metaphor. As a result, the IP address of the user is hard to trace and the user can remain largely anonymous (Dingledine et al., 2004). A second novel technology which played an important part in the birth of these new darknet marketplaces was Bitcoin. Bitcoin was the first of many cryptocurrencies and was the digital currency used for transactions on darknet marketplaces. Bitcoin is stored in a virtual wallet (Van Hout & Bingham, 2014). Payments with Bitcoin are regarded as anonymous since no third party such as a bank is involved; transactions are recorded on the blockchain under cryptographic addresses rather than names, which makes them pseudonymous rather than fully anonymous (Nakamoto, 2008). Lastly, the practical nature of the darknet markets closely resembles that of other successful marketplaces, such as eBay. For instance, they offer a grid-based overview with large numbers of listings and a community of buyers which provides public information about vendors (Barratt, 2012). Moreover, Tzanetakis et al. (2016) state:

‘professional-looking’ shipments that contain drugs are delivered by traditional postal services without their knowledge. Unlike traditional markets, geographical restrictions are only related to customs on online drugs market. Drugs can be purchased anytime globally on darknet marketplaces and thus create an even larger and more competitive market. (p. 66).

In research conducted by Paquet-Clouston et al. (2018) the size and scope of vendors on a darknet marketplace called Alphabay is discussed. It is stated that the majority of vendors are small and make little to no sales on a regular basis, whereas high-level vendors spend more money on repetitive listings and advertising. Because of the interwovenness of the online illicit drug market and the physical labor in the offline world, such as packaging and shipping, growing as a vendor will always remain costly. Still, the combination of anonymous vendors and buyers, untraceable transactions, a high level of accessibility, vendor feedback systems and consumer-friendly design makes for a very interesting platform for the commerce of illegal products and services. The customers on these illegal digital marketplaces, just like on regular digital marketplaces, are looking for the best product for the best possible price. Van Hout & Bingham (2014) even describe the customers on Silk Road I, one of the biggest and most successful darknet marketplaces, as intelligent, educated and wealthy. Therefore it is essential to see what factors influence these customers’ buying behaviour, which eventually determines the total revenue on these markets. Where the research of Van Hout & Bingham (2014) provides detailed information about the characteristics of customers on darknet marketplaces and the research of Paquet-Clouston et al. (2018) accurately analyzes the size and scope of vendors on a darknet marketplace, this research goes beyond descriptives and characteristics and will provide analytical results about the driving forces in such a marketplace.

3.2.4 Case Study: Cocaine listings on Alphabay

To answer the research question and the corresponding sub-questions, real darknet marketplace data will be analysed in this research. The dataset used for the analysis in this thesis originates from the data library of an article by David Denton, an American data scientist who scraped the data with a Google Chrome extension developed by Martins Balodis (Denton, 2017). The raw dataset describes more than 600 cocaine listings and the data originates from the darknet marketplace Alphabay. In a study by Van Buskirk et al. (2016) it becomes evident that “the two largest marketplaces still operating at the end of the monitoring period were Alphabay and Dream Market. Alphabay recorded the highest number of unique vendors seen on any one market since the monitoring project began in September 2012.” Alphabay is characterized by its ‘Vendor level’ and ‘Trust level’ scores, which are available throughout the whole dataset. To better understand the nature of these listings an example is provided in the figure below. With its active community of vendors and buyers and its clear ways of presenting information, Alphabay provides an interesting case in regard to the research question.

Figure 1: Exemplary illicit drug Listing on Alphabay

4 Methodology & Data

The essence of licit markets forms the basis of this research. The literature discussed above provides the current state of knowledge about the factors affecting sales and consumer decisions which result in revenue. Having discussed the nature of conventional illicit drug markets and the emergence of new online illicit drug markets, the analysis of these markets becomes the focus.

4.1 Programming Techniques

Several programming languages and techniques are used throughout the analysis of the dataset. The most important tool used in this thesis is Python. Python is a powerful programming language, first released in 1991, which focuses on code readability. Python’s syntax enables programmers to write programs in fewer lines of code in comparison to older programming languages such as C (Van Rossum, 2007).

4.1.1 NumPy & Pandas

NumPy provides an elegant way of representing numerical data in Python, making use of N-dimensional NumPy arrays, comparable to the columns and rows in a Microsoft Excel sheet. Pandas is a Python library of data structures that builds on NumPy. The tool was initially developed for quantitative finance applications. It is a powerful tool for the manipulation and ordering of datasets and makes use of techniques such as automatic data alignment and hierarchical indexing (McKinney, 2011). In this thesis the pandas module will be used to manipulate the raw dataset. This dataset should also be processed in such a way that machine learning techniques can be applied.

4.1.2 Regular Expressions

Data is often unstructured and messy. Therefore it is essential to efficiently extract useful information from the raw dataset. A method for doing so in pandas is using regular expressions. This method locates specific character strings embedded in character text (Thompson, 1968). Given the correct regular expression, the output is the part of the original text that matches the expression. For instance, this way all digits preceding a % sign can be extracted or specific values behind the letter combination USD can be found.
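The two extractions mentioned above can be sketched with pandas string methods. This is only an illustration: the listing titles, the column names and the exact patterns are hypothetical and are not taken from the raw Alphabay data.

```python
import pandas as pd

# Hypothetical listing titles; the real raw data may be formatted differently.
raw = pd.DataFrame({
    "title": [
        "1g Cocaine 90% pure - USD 85.00",
        "0.5 gram uncut flake 87.5 % - USD 47.50",
        "Sample pack, price on request",
    ]
})

# Digits (optionally with decimals) that are directly followed by a % sign.
raw["quality"] = (
    raw["title"].str.extract(r"(\d+(?:\.\d+)?)\s*%", expand=False).astype(float)
)

# Numeric value that follows the letter combination USD.
raw["price"] = (
    raw["title"].str.extract(r"USD\s*(\d+(?:\.\d+)?)", expand=False).astype(float)
)

# Titles without a match yield NaN, so they can be filtered out afterwards.
print(raw[["quality", "price"]])
```

Listings for which no purity or price can be found end up with missing values, which is one plausible way in which a raw set of listings shrinks to a smaller set of clean listings.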

4.1.3 Matplotlib

Matplotlib is a library for making 2D plots of arrays of data in Python. Even though Matplotlib is primarily written in pure Python, it makes heavy use of NumPy and other extension code to provide good performance even for large arrays (Hunter, 2007). In this thesis Matplotlib is used to visualize the outcome of all machine learning algorithms as graphs.

4.1.4 Machine Learning & Scikit-Learn

The problem that arises in this case can be categorized as a supervised machine learning problem. In supervised learning problems, a program predicts an output based on the input it receives. The program analyses known pairs of inputs and correct outputs and tries to learn from these pairs. The problem is also a regression problem and therefore regression algorithms will be applied. In regression problems the program must predict the value of a continuous response variable (Hackeling, 2014). In this thesis the continuous response variable is total gross per day. Scikit-learn is a Python library that provides implementations of machine learning algorithms (Pedregosa, 2011). Many factors affect the success of machine learning on a given problem. “Reducing the dimensionality of the data reduces the hypothesis space and allows for algorithms to operate faster and more efficiently” (Hall, 2000, p. 1). Therefore, only features that affect the target variable directly are included in the model. The regression algorithms which will be used are linear regression, regularized linear regression, decision tree regression and random forest regression. The last two allow for the inspection of feature importance (Liaw et al., 2002).

4.2 Preparing the dataset

Firstly, the raw dataset was processed into a pandas DataFrame, which allows for information extraction, computational operations and general data structuring and modification. For the extraction of a listing’s purity or quantity, regular expressions proved to be an indispensable tool. After extracting every piece of potentially useful information from the dataset, various computational operations between the resulting variables were executed, resulting in a dataset representing all the essential pieces of information per listing. The features presented in table 1 were extracted from the raw dataset, and computed where applicable. The final dataset counted 129 clean listings.

Table 1: Dataset features

| Feature | Description |
|---|---|
| Quantity | The amount of cocaine offered in grams |
| Quality | The percentage of purity of the cocaine |
| Successful transactions | The number of successful transactions per listing |
| Vendor level | The vendor level measured on a scale from 1-10 |
| Trust level | The trust level of the vendor measured on a scale from 1-10 |
| Price | Price in U.S. dollars |
| Price per gram | Price per whole gram. Calculated by dividing price by quantity |
| Price per pure gram | Price per 100% pure whole gram. Calculated by dividing price per gram by quality and multiplying by 100 |
| Days in business | The number of days the listing has been online |
| Total gross per day | The total amount of money the listing has produced per day. Calculated by multiplying the number of successful transactions by price and dividing this by the number of days in business |
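The computed features can be sketched in pandas as follows. The column names and the two example listings are hypothetical. The sketch assumes that price per gram is simply price divided by quantity and that quality is expressed as a percentage, which is consistent with the descriptives reported later in this thesis.

```python
import pandas as pd

# Hypothetical cleaned listings; column names are illustrative assumptions.
listings = pd.DataFrame({
    "quantity": [1.0, 3.5],                # grams
    "quality": [90.0, 80.0],               # purity in percent
    "successful_transactions": [120, 40],
    "price": [90.0, 280.0],                # U.S. dollars
    "days_in_business": [300, 200],
})

# Price per whole gram.
listings["price_per_gram"] = listings["price"] / listings["quantity"]

# Price per 100% pure gram: correct the price per gram for the purity.
listings["price_per_pure_gram"] = (
    listings["price_per_gram"] / listings["quality"] * 100
)

# Target variable: revenue of the listing per day online.
listings["total_gross_per_day"] = (
    listings["successful_transactions"] * listings["price"]
    / listings["days_in_business"]
)

print(listings[["price_per_gram", "price_per_pure_gram", "total_gross_per_day"]])
```

For the first example listing this gives a price per gram of 90, a price per pure gram of 100 and a total gross per day of 36.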

4.3 Analytical Methodology & Evaluation Metrics

After preparing the dataset, a general exploration of the dataset provides areas of interest for the rest of the analysis. To illustrate potentially telling relationships in the dataset a correlation heat map is constructed. For the machine learning algorithms the dataset is split into a dataset containing solely the target variable, namely total gross per day, and a dataset containing all of the remaining features. Before attempting to calculate the impact of every feature on the target value it is of fundamental importance to evaluate how well the features explain the variance in the target variable. In order to find out which algorithm models this particular regression problem best, multiple regression algorithms will be tested and evaluated. All algorithms will be applied as polynomial regression, allowing for non-linear trends to be modelled as well by creating more features (Kokkinos and Maragos, 2005). Since the dataset is relatively small, instead of splitting the dataset once into a training and a test dataset, k-fold cross validation will be applied. Because the training and test datasets rotate over the iterations, the whole dataset can be used in the evaluation of the fit (Francis, 2003). A 5-fold cross validation is used throughout the research, meaning that the dataset will be split into five different pairs of training and test datasets.
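The evaluation procedure described above can be sketched with scikit-learn as follows. This is a minimal sketch on synthetic data, not the code used in this thesis: the data, the polynomial degree of 2 and the random seeds are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for 129 listings with three features (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(129, 3))
y = 2.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(0, 1.0, size=129)

# Polynomial regression: expand the features, then fit a linear model.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())

# Five folds: every listing is used for testing exactly once,
# and each training set holds roughly 80% of the data.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

scores = cross_validate(
    model, X, y, cv=cv,
    scoring=("r2", "neg_mean_absolute_error"),
    return_train_score=True,
)

# Report the mean over the five folds.
print("mean training R2:", scores["train_r2"].mean())
print("mean test R2:", scores["test_r2"].mean())
print("mean test MAE:", -scores["test_neg_mean_absolute_error"].mean())
```

Swapping `LinearRegression` for another regressor in the pipeline allows several algorithms to be compared on the same folds.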

While evaluating the performance of every algorithm the same set of metrics will be used, namely: training R2 score, test R2 score, explained variance score, mean absolute error (MAE) and root mean squared error (RMSE). The first three metrics are reported on a 0-100% scale, so the higher the value, the better the fit of the model. The training data score is the R2 score on the training dataset, which contains 80% of all the data points. This score is structurally high because the model is fitted on the training data. The R2 score, also known as R-squared or the coefficient of determination, generally measures the goodness-of-fit for regression models. This score indicates the percentage of the variance in the target variable that is explained by the remaining features in the model (Nagelkerke et al., 1991). In other words, R2 measures the strength of the relationship between the model and the target variable. The explained variance score measures the same type of relationship but uses a different approach for the residual errors. The other two metrics are error measures, which naturally need to be minimized when evaluating model performance. The first is the mean absolute error (MAE). “MAE measures the average magnitude of the errors in a set of predictions, without considering their direction. It’s the average over the test sample of the absolute differences between prediction and actual observation where all individual differences have equal weight” (Karim, 2018). The second is the root mean squared error (RMSE). The RMSE is the square root of the average of squared differences between prediction and actual observation, and therefore gives more weight to large errors. Whether either one of the metrics is better for model evaluation has been the subject of discussion in the past. Willmott & Matsuura (2005, p. 82) state that “MAE is the most natural measure of average error magnitude, and that (unlike RMSE) it is an unambiguous measure of average error magnitude.” Chai & Draxler (2014, p. 1249-1250) state that “a combination of metrics, including but certainly not limited to RMSEs and MAEs, are often required to assess model performance”. During the analysis in this thesis MAE will be the main measurement of error, but other metrics will not be excluded from the results.
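To make the difference between the metrics concrete, the following small sketch computes MAE, RMSE and R2 by hand with NumPy. The four observations and predictions are made-up numbers, chosen only to keep the arithmetic easy to follow.

```python
import numpy as np

# Made-up observations and predictions (illustrative only).
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([12.0, 18.0, 33.0, 35.0])

errors = y_pred - y_true                 # 2, -2, 3, -5

# MAE: every error counts in proportion to its size.
mae = np.mean(np.abs(errors))            # (2 + 2 + 3 + 5) / 4

# RMSE: squaring gives the largest error (5) extra weight.
rmse = np.sqrt(np.mean(errors ** 2))     # sqrt((4 + 4 + 9 + 25) / 4)

# R2: one minus the ratio of residual to total sum of squares.
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MAE = {mae:.3f}, RMSE = {rmse:.3f}, R2 = {r2:.3f}")
```

Here the MAE is 3.0 while the RMSE is about 3.24: the single error of 5 pulls the RMSE above the MAE, which is why the two metrics can rank models differently.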


4.4 Data Exploration

4.4.1 Exploration

A general look at the descriptives of the dataset provides information about the cocaine listings on Alphabay. For instance, the offered quantity varies from a minimum of 0.1 grams to a maximum of 14 grams with an average of 2.11 grams. Some other notable facts from table 2 are that the vendor level is generally higher than the trust level, that the cocaine has a mean purity of 90.65% and that the mean price per gram is in line with cocaine prices discussed earlier in this thesis (Storti and De Grauwe, 2009; DeSimone and Farrelly, 2003). Most striking perhaps is the target variable total gross per day, which varies from a minimum of $1.81 a day to a maximum of $459.05 a day with a mean of $63.99 and a high standard deviation, illustrating the variance in the scope, size and gained revenue of the vendors in the dataset.

Table 2: Dataset Descriptives

| | Quantity | Quality | Successful transactions | Vendor level | Trust level | Number of views |
|---|---|---|---|---|---|---|
| Count | 129.00 | 129.00 | 129.00 | 129.00 | 129.00 | 129.00 |
| Mean | 2.11 | 90.65 | 146.85 | 6.08 | 5.61 | 4895.69 |
| Standard deviation | 2.68 | 5.96 | 170.52 | 2.25 | 1.41 | 7703.95 |
| Minimum | 0.10 | 50.00 | 32.00 | 1.00 | 3.00 | 202.00 |
| 25% | 0.50 | 90.00 | 48.00 | 4.00 | 4.00 | 1490.00 |
| 50% | 1.00 | 90.00 | 81.00 | 6.00 | 6.00 | 2423.00 |
| 75% | 2.00 | 93.00 | 148.00 | 8.00 | 6.00 | 4385.00 |
| Maximum | 14.00 | 100.00 | 803.00 | 10.00 | 10.00 | 62724.00 |

| | Price | Price per gram | Price per pure gram | Days in business | Total gross per day |
|---|---|---|---|---|---|
| Count | 129.00 | 129.00 | 129.00 | 129.00 | 129.00 |
| Mean | 154.49 | 95.65 | 105.95 | 336.79 | 63.99 |
| Standard deviation | 174.46 | 71.16 | 79.54 | 172.48 | 78.72 |
| Minimum | 10.00 | 39.29 | 43.89 | 71.00 | 1.81 |
| 25% | 55.00 | 58.31 | 64.82 | 196.00 | 13.45 |
| 50% | 94.04 | 74.62 | 79.55 | 303.00 | 33.68 |
| 75% | 186.00 | 98.06 | 105.26 | 444.00 | 97.25 |
| Maximum | 987.82 | 410.00 | 455.56 | 746.00 | 459.05 |

4.4.2 Correlation heat map

The figure below illustrates the correlation of each feature with every other feature in the dataset through a correlation heatmap. The red diagonal indicates the 100% correlation score of every feature with itself. The other red blocks in the heatmap show correlation between various features. Points of interest are therefore the correlations between quantity and price, successful transactions and number of views, vendor level and trust level, trust level and days in business, total gross per day and price, total gross per day and number of views, and total gross per day and successful transactions.


[Heatmap image; both axes list the features Quantity, Quality, Successful Transactions, Vendor Level, Trust Level, Number of views, Price, Price per gram, Price per pure gram, Days in business and Total gross per day.]

Figure 2: Correlation heatmap for all features in the dataset
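The correlation matrix underlying such a heatmap can be computed directly in pandas. The sketch below uses synthetic data with hypothetical column names, in which the number of successful transactions is constructed to depend on the number of views; it does not reproduce the actual dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption): transactions are built to follow views.
rng = np.random.default_rng(1)
views = rng.integers(200, 5000, size=50).astype(float)
df = pd.DataFrame({
    "number_of_views": views,
    "successful_transactions": 0.03 * views + rng.normal(0, 10, size=50),
    "quality": rng.uniform(80, 100, size=50),
})

# Pairwise Pearson correlation of every feature with every other feature.
corr = df.corr()
print(corr.round(2))
```

The resulting square matrix has ones on its diagonal and is symmetric. It could then be drawn as a colour-coded grid, for instance with Matplotlib's `imshow`, to obtain a figure like the one above.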

5 Results & Discussion

5.1 Correlation analysis

The relations between the features mentioned above, except the ones correlating with total gross per day, are visualized in the figures below. In line with the research of Davis & Khazanchi (2008), there is a relation between the number of views and the number of successful transactions. Likewise, a rational relation between the price of a listing and the quantity in grams that is offered is visible. Another finding that agrees with the literature on this subject is the relation between vendor reputation and the perceived level of trust (Gregg and Walczak, 2010; Tams, 2009). Finally, there is a correlation between the level of trust and the number of days a listing is online: every listing with a trust level of 8 or higher has been online for more than 340 days. This potentially illustrates that trust online is built over time. In the next section the application, analysis and evaluation of multiple machine learning algorithms to model the behaviour of the target variable will be discussed.


(a) Correlation: 0.66 (b) Correlation: 0.78


5.2 Regression Evaluation

The first part of the analysis consists of evaluating whether the features stated above really do explain the target value, which is the total gross earned per day per unique listing. In the table below the results of multiple machine learning algorithms are presented. Since the predicted variable is a continuous numerical variable, the problem is approached as a regression problem. The metrics discussed in the methodology are presented in the table to assess the performance of the algorithms that are used. All of the scores are the mean of five scores due to the 5-fold cross validation. In general the scores are good, with no R2 score below 0.626 and no error higher than 46.700. However, when looking at the scores in the table, the algorithms with the best model fit are most interesting for further investigation.

Table 3: The five regression models with evaluation metrics

| Metric | Polynomial Linear Regression | Polynomial Regularized Linear Regression (Lasso) | Polynomial Regularized Linear Regression (Ridge) | Polynomial Decision Tree | Polynomial Random Forest |
|---|---|---|---|---|---|
| Training Data Score | 0.747 | 0.997 | 0.997 | 0.945 | 0.968 |
| R2 Score | 0.627 | 0.856 | 0.836 | 0.626 | 0.786 |
| Explained Variance Score | 0.636 | 0.858 | 0.849 | 0.639 | 0.791 |
| MAE (Mean Absolute Error) | 30.049 | 16.128 | 16.750 | 25.490 | 19.055 |
| RMSE (Root Mean Squared Error) | 46.022 | 29.417 | 29.831 | 46.700 | 34.838 |

5.3 Polynomial Regularized Linear Regression (Lasso)

The model with the highest evaluation scores and the lowest error rates is the model that uses linear regression with Lasso regularization. Lasso regularization shrinks the coefficients of features towards zero and even sets some coefficients to exactly zero, resulting in models that are less prone to overfitting by reducing variance at the cost of only a small increase in bias (Zou and Hastie, 2005). The cross-validated R2 score of the Lasso model is 0.856. Therefore, 85.6% of the variance in the total gross per day can be explained by the features as stated in the data section of this thesis. Together with an MAE of 16.128, this indicates a high goodness-of-fit for this model. However, this linear regression model doesn’t allow for the inspection of feature importance, which is just as important as goodness-of-fit for the research question posed. An algorithm that does allow for feature importance inspection is the random forest algorithm.
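The shrinking effect of Lasso regularization can be illustrated with a small scikit-learn sketch. The synthetic data, the polynomial degree of 2, the standardization step and the value `alpha=0.1` are all assumptions chosen for illustration, not the settings used in this thesis.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (assumption): only the first two of four features matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(129, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, size=129)


def fit_coefficients(regressor):
    """Fit a degree-2 polynomial model and return its coefficients."""
    model = make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False),
        StandardScaler(),  # put all polynomial terms on the same scale
        regressor,
    )
    model.fit(X, y)
    return model[-1].coef_


ols_coef = fit_coefficients(LinearRegression())
lasso_coef = fit_coefficients(Lasso(alpha=0.1))

# Lasso sets the coefficients of uninformative terms to exactly zero.
print("non-zero OLS coefficients:  ", np.count_nonzero(ols_coef))
print("non-zero Lasso coefficients:", np.count_nonzero(lasso_coef))
```

With 14 polynomial terms, ordinary least squares assigns a non-zero coefficient to every term, whereas Lasso keeps only a subset, which is what limits overfitting when polynomial features are added to a small dataset.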


5.4 Random Forest Algorithm

The goodness-of-fit for this model is lower, with a cross-validated R2 score of 0.786 and an MAE of 19.055. Nevertheless, the possibilities and functionality of the random forest algorithm serve the purpose of this thesis a lot better. While still predicting the target variable total gross per day, the importance of the separate features in this process is also calculated. The importance of the features for explaining the variance in the target variable total gross per day for the cocaine listings in the dataset is visualized in the graph below. A visualization of one of the trees of the random forest is available in the appendix.

Figure 3: Random Forest feature importance graph with all features

The feature importances add up to 100%. Therefore, the percentages as seen in the graph represent the importance of each feature for explaining the variance in the predicted total gross per day. Price and number of successful transactions account for the lion’s share of the variance. Notably, trust level and vendor level together account for less than 3 percent of the variance. Also, quality doesn’t seem to explain more than 2 percent of the variance.
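How such importances can be obtained from scikit-learn is sketched below. The data is synthetic and the three feature names are only borrowed from the dataset for readability: the target is constructed from the first two features, so the third should receive a low importance.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (assumption): the target depends on the first two features only.
rng = np.random.default_rng(0)
feature_names = ["price", "successful_transactions", "trust_level"]
X = rng.uniform(0, 1, size=(129, 3))
y = 100 * X[:, 0] * X[:, 1] + rng.normal(0, 1.0, size=129)

forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X, y)

# feature_importances_ is normalised to sum to 1; express it as percentages.
importances = 100 * forest.feature_importances_

for name, value in sorted(zip(feature_names, importances), key=lambda p: -p[1]):
    print(f"{name}: {value:.1f}%")
```

Refitting the forest on a feature matrix from which one column has been removed shows how the importances are redistributed over the remaining features.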


5.5 Dropping number of Successful Transactions

Since the variable which expresses the number of successful transactions accounts for such a big part of the variance it might be interesting to remove this variable from the model. It is also a factor which vendors can’t really control. Removing it results in a new R2 score of 0.672 with an MAE of 26.014. The goodness-of-fit is substantially lower, but the model still explains a good amount of the variance of the target value. Moreover, the feature importances become more telling.

Figure 4: Random Forest feature importance graph without successful transactions

Price remains an important feature, but the number of views takes over the importance of successful transactions, as a result of their correlation as seen in the correlation heatmap above. What is most striking is that the importance of the vendor level and trust level remains insignificant. This is contrary to previous literature on trust in licit (online) markets, which states that a high level of perceived trust has a significant impact on sales (Brynjolfsson & Smith, 2000). Trust also proved to be an important factor in previous studies on conventional illicit drug markets (Tzanetakis et al., 2016) but seems to be insignificant when the illicit drug market operates online with a high level of anonymity between buyer and vendor.

5.6 Discussion

The aim of this thesis is to see which factors impacted the target variable total gross per day most, as a way to extract the driving forces behind an online unsupervised illicit drug market. As darknet marketplaces provide interesting economic cases, it is valuable to identify similarities with established economic principles, but even more so, differences with these principles. In accordance with previous literature, price can be a highly important informational cue in the consumer decision making process (Chiang & Dholakia, 2003). Furthermore, price can take on a vital role in assessing the quality of a product when information is scarce or potentially unreliable (Grewal et al., 1998; Kardes et al., 2004). In line with Davis & Khazanchi (2008), there is evidence that the number of views has an impact on the target value total gross per day when the variable of successful transactions is dropped from the model. Another finding that proves to be in accordance with prior literature is the relation between vendor reputation and the perceived level of trust. Prior literature states that trust, which is often a result of a good vendor reputation, has a significantly positive effect on sales. The results in this research however show no such impact. Altogether, the variables vendor level and trust level combined account for no more than 5% of the total variance explained of the total gross of a listing. Where trust does play an important role in conventional drug markets, the nature of the darknet marketplaces isn’t based on the same levels of trust, since anonymity is guaranteed.

6 Conclusion & Limitations

6.1 Conclusion

As discussed in the introduction, the aim of this thesis is to find factors and their corresponding level of impact on the total gross per day of a listing on a darknet marketplace by applying machine learning. Literature on licit online markets provides factors that may also apply to darknet marketplaces, which, just as conventional illicit drug markets, differ from licit online markets. In order to find similarities or differences between licit and illicit online markets a case study was conducted. Rather than describing the scope and size of vendors on the darknet marketplace like Paquet-Clouston et al. (2018), this research focuses on analysis. By cleaning and restructuring data originating from the darknet marketplace Alphabay, analysis becomes possible. The supervised regression problem that arises while trying to model and predict a numerical target variable, namely total gross per day per listing, is approached with multiple regression algorithms. After evaluation, regularized polynomial linear regression, using Lasso, and random forest regression prove to be the best fit for explaining variance in the target variable. By inspecting the feature importances in the random forest regressor, insights were gained about the factors impacting the total gross per day. Price, successful transactions and later number of views per listing proved to be of high importance, which is in accordance with prior research. The correlated features vendor level and trust level however prove to be insignificant on the darknet marketplace, which is novel information. While trust plays an important role in licit online markets and conventional illicit drug markets, the anonymity on the darknet marketplaces could fulfil a similar role.


6.2 Limitations

The cleaned-up dataset consists of 129 listings, which is acceptable for the use of machine learning algorithms. However, more data would make the results more generalizable. Therefore, in future studies the size of the raw dataset should be increased by scraping the darknet marketplace website over a longer period. Another limitation is that solely the listings concerning cocaine are taken into account. Other categories of illicit drugs or even other categories of products could result in different findings. Furthermore, “statistical methods can help identify significant correlations, but these are rarely considered to be sufficient to posit the existence of a causal connection” (Mittelstadt et al., 2016, p. 4). Even though correlation provides information about the features involved, machine learning doesn’t deliver causal relationships. However, this research offers a first approach to analyzing darknet marketplaces and their economic principles based on machine learning, and the correlations provided in the results might be valuable when considering future research.

7 Acknowledgements

The author would like to thank mw. dr. V.M. (Vanessa) Dirksen for her guidance during this research. The author would also like to thank dhr. prof. dr. E. Kanoulas for taking the time to assess the code written throughout this thesis and providing the author with useful advice concerning the methodology. The author would like to thank his family and friends for lending him their critical eyes and opinions whilst remaining supportive throughout the years of this bachelor programme and throughout this thesis.

References

Affairs, D. P. (2012). Operation Adam Bomb: Arrest of Creators, Operators of Online Secret Narcotics Marketplace. https://www.dea.gov/pubs/pressrel/pr041612a.html. [Online; accessed 12-May-2018].

Anderson, E. L., Steen, E., and Stavropoulos, V. (2017). Internet use and problematic internet use: A systematic review of longitudinal research trends in adolescence and emergent adulthood. International Journal of Adolescence and Youth, 22(4):430–454.

Barratt, M. J. (2012). Silk road: ebay for drugs. Addiction, 107(3):683–683.

Barratt, M. J., Ferris, J. A., and Winstock, A. R. (2014). Use of silk road, the online drug marketplace, in the united kingdom, australia and the united states. Addiction, 109(5):774–783.

Barratt, M. J., Ferris, J. A., and Winstock, A. R. (2016). Safer scoring? cryptomarkets, social supply and drug market violence. International Journal of Drug Policy, 35:24–31.

Betancourt, R. R. and Gautschi, D. (1993). Two essential characteristics of retail markets and their economic consequences. Journal of Economic Behavior & Organization, 21(3):277–294.

Bouchard, M. (2007). On the resilience of illegal drug markets. Global crime, 8(4):325–344.

Bouchard, M. and Morselli, C. (2014). Opportunistic structures of organized crime. The Oxford handbook of organized crime, pages 288–302.

Bright, D. A. and Ritter, A. (2010). Retail price as an outcome measure for the effectiveness of drug law enforcement. International Journal of Drug Policy, 21(5):359–363.

Broséus, J., Rhumorbarbe, D., Mireault, C., Ouellette, V., Crispino, F., and Décary-Hétu, D. (2016). Studying illicit drug trafficking on darknet markets: structure and organisation from a canadian perspective. Forensic science international, 264:7–14.

Bruinsma, G. and Bernasco, W. (2004). Criminal groups and transnational illegal markets. Crime, Law and Social Change, 41(1):79–94.

Brynjolfsson, E. and Smith, M. D. (2000). Frictionless commerce? a comparison of internet and conventional retailers. Management science, 46(4):563–585.

Buxton, J. and Bingham, T. (2015). The rise and challenge of dark net drug markets. Policy Brief, 7.

Caulkins, J. P. and Reuter, P. (1998). What price data tell us about drug markets. Journal of drug issues, 28(3):593–612.

Caulkins, J. P. and Reuter, P. (2006). Illicit drug markets and economic irregularities. Socio-Economic Planning Sciences, 40(1):1–14.

Chai, T. and Draxler, R. R. (2014). Root mean square error (rmse) or mean absolute error (mae)?–arguments against avoiding rmse in the literature. Geoscientific model development, 7(3):1247–1250.

Cheung, C. M., Lee, M. K., and Rabjohn, N. (2008). The impact of electronic word-of-mouth: The adoption of online opinions in online customer communities. Internet research, 18(3):229–247.


Chevalier, J. A. and Mayzlin, D. (2006). The effect of word of mouth on sales: Online book reviews. Journal of marketing research, 43(3):345–354.

Chiang, K.-P. and Dholakia, R. R. (2003). Factors driving consumer intention to shop online: an empirical investigation. Journal of Consumer psychology, 13(1):177–183.

Chiu, C.-M., Wang, E. T., Fang, Y.-H., and Huang, H.-Y. (2014). Understanding customers’ repeat purchase intentions in b2c e-commerce: the roles of utilitarian value, hedonic value and perceived risk. Information Systems Journal, 24(1):85–114.

Clemons, E. K., Gao, G. G., and Hitt, L. M. (2006). When online reviews meet hyperdifferentiation: A study of the craft beer industry. Journal of management information systems, 23(2):149–171.

Davis, A. and Khazanchi, D. (2008). An empirical study of online word of mouth as a predictor for multi-product category e-commerce sales. Electronic markets, 18(2):130–141.

Denton, B. and O’Malley, P. (1999). Gender, trust and business: Women drug dealers in the illicit economy. British Journal of Criminology, 39(4):513–530.

Denton, D. (2017). Annual sales estimation of a darknet market. https://rpubs.com/davidldenton/dnm. [Online; accessed 15-April-2018].

DeSimone, J. and Farrelly, M. C. (2003). Price and enforcement effects on cocaine and marijuana demand. Economic Inquiry, 41(1):98–115.

Dingledine, R., Mathewson, N., and Syverson, P. (2004). Tor: The second-generation onion router. Technical report, Naval Research Lab Washington DC.

Einwiller, S. (2001). The significance of reputation and brand for creating trust in the different stages of a relationship between an online vendor and its customers. In Proceedings of the 8th Research Symposium on Emerging Electronic Markets (RSEEM2001), September, volume 16, page 18. Citeseer.

Ellison, G. and Ellison, S. F. (2009). Search, obfuscation, and price elasticities on the internet. Econometrica, 77(2):427–452.

Francis, L. (2003). Martian chronicles: Is mars better than neural networks? In Casualty Actuarial Society Forum, pages 75–102.

Gregg, D. G. and Walczak, S. (2010). The relationship between website quality, trust and price premiums at online auctions. Electronic Commerce Research, 10(1):1–25.

Grewal, D., Monroe, K. B., and Krishnan, R. (1998). The effects of price-comparison advertising on buyers’ perceptions of acquisition value, transaction value, and behavioral intentions. The Journal of Marketing, pages 46–59.

Hackeling, G. (2014). Mastering Machine Learning with scikit-learn. Packt Publishing Ltd.

Hall, M. A. (2000). Correlation-based feature selection of discrete and numeric class machine learning.

Hunter, J. D. (2007). Matplotlib: A 2d graphics environment. Computing in science & engineering, 9(3):90–95.

Kardes, F. R., Cronley, M. L., Kellaris, J. J., and Posavac, S. S. (2004). The role of selective information processing in price-quality inference. Journal of Consumer Research, 31(2):368–374.

Karim, M. R. (2018). Scala Machine Learning Projects: Build real-world machine learning and deep learning projects with Scala. Packt Publishing Ltd.

Kokkinos, I. and Maragos, P. (2005). Nonlinear speech analysis using models for chaotic systems. IEEE Transactions on Speech and Audio Processing, 13(6):1098–1109.

Liaw, A., Wiener, M., et al. (2002). Classification and regression by randomforest. R news, 2(3):18–22.

Markoff, J. (2005). What the Dormouse Said: How the Sixties Counterculture Shaped the Personal Computer Industry. Penguin.

Martin, J. (2014). Drugs on the dark net: How cryptomarkets are transforming the global trade in illicit drugs. Springer.

McKinney, W. (2011). pandas: a foundational python library for data analysis and statistics. Python for High Performance and Scientific Computing, pages 1–9.

McKnight, D. H., Choudhury, V., and Kacmar, C. (2002). Developing and validating trust measures for e-commerce: An integrative typology. Information systems research, 13(3):334–359.

Mittelstadt, B. D., Allo, P., Taddeo, M., Wachter, S., and Floridi, L. (2016). The ethics of algorithms: Mapping the debate. Big Data & Society, 3(2):2053951716679679.

Mudambi, S. M. and Schuff, D. (2010). Research note: What makes a helpful online review? a study of customer reviews on amazon.com. MIS quarterly, pages 185–200.

Nagelkerke, N. J. et al. (1991). A note on a general definition of the coefficient of determination. Biometrika, 78(3):691–692.

Nakamoto, S. (2008). Bitcoin: A peer-to-peer electronic cash system.

Pan, X., Ratchford, B. T., and Shankar, V. (2002). Can price dispersion in online markets be explained by differences in e-tailer service quality? Journal of the Academy of Marketing science, 30(4):433–445.

Paquet-Clouston, M., Décary-Hétu, D., and Morselli, C. (2018). Assessing market competition and vendors’ size and scope on alphabay. International Journal of Drug Policy, 54:87–98.

Park, C.-H. and Kim, Y.-G. (2003). Identifying key factors affecting consumer purchase behavior in an online shopping context. International journal of retail & distribution management, 31(1):16–29.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al. (2011). Scikit-learn: Machine learning in python. Journal of machine learning research, 12(Oct):2825–2830.

Peter, J. P., Olson, J. C., and Grunert, K. G. (1999). Consumer behavior and marketing strategy.

Ranganathan, C. and Ganapathy, S. (2002). Key dimensions of business-to-consumer web sites. Information & Management, 39(6):457–465.

Ranganathan, C. and Grandon, E. (2002). An exploratory examination of factors affecting online sales. Journal of Computer Information Systems, 42(3):87–93.

Reuter, P. and Trautmann, F. (2009). Global illicit drugs markets 1998–2007. Utrecht: European Commission.

Rhumorbarbe, D., Staehli, L., Broséus, J., Rossy, Q., and Esseiva, P. (2016). Buying drugs on a darknet market: A better deal? studying the online illicit drug market through the analysis of digital, physical and chemical data. Forensic science international, 267:173–182.

Ritter, A. (2006). Studying illicit drug markets: Disciplinary contributions. International Journal of Drug Policy, 17(6):453–463.

Salam, A. F., Iyer, L., Palvia, P., and Singh, R. (2005). Trust in e-commerce. Communications of the ACM, 48(2):72–77.

Schneider, J. L. (2003). Hiding in plain sight: an exploration of the illegal (?) activities of a drugs newsgroup. The Howard Journal of Crime and Justice, 42(4):374–389.

Shankar, V. and Bolton, R. N. (2004). An empirical analysis of determinants of retailer pricing strategy. Marketing Science, 23(1):28–49.

Smith, M. D., Bailey, J., and Brynjolfsson, E. (1999). Understanding digital markets: review and assessment. MIT press.

Storti, C. C. and De Grauwe, P. (2009). Globalization and the price decline of illicit drugs. International Journal of Drug Policy, 20(1):48–61.

Tams, S. (2009). Trust-building in electronic markets: Relative importance and interaction effects of trust-building mechanisms. In Value Creation in E-Business Management, pages 143–154. Springer.

Teece, D. J. (2010). Business models, business strategy and innovation. Long range planning, 43(2-3):172–194.

Thompson, K. (1968). Programming techniques: Regular expression search algorithm. Communications of the ACM, 11(6):419–422.

Tzanetakis, M., Kamphausen, G., Werse, B., and von Laufenberg, R. (2016). The transparency paradox. building trust, resolving disputes and optimising logistics on conventional and online drugs markets. International Journal of Drug Policy, 35:58–68.

Utz, S., Kerkhof, P., and Van Den Bos, J. (2012). Consumers rule: How consumer reviews influence perceived trustworthiness of online stores. Electronic Commerce Research and Applications, 11(1):49–58.

Van Buskirk, J., Naicker, S., Bruno, R., Breen, C., and Roxburgh, A. (2016). Drugs and the internet.

Van Hout, M. C. and Bingham, T. (2013). ‘surfing the silk road’: A study of users’ experiences. International Journal of Drug Policy, 24(6):524–529.

Van Hout, M. C. and Bingham, T. (2014). Responsible vendors, intelligent consumers: Silk road, the online revolution in drug trading. International Journal of Drug Policy, 25(2):183–189.

Van Rossum, G. (2007). Python programming language. USENIX Annual Technical Conference, 41(1):36.

Warren, P., Davies, J., and Brown, D. (2008). ICT futures: Delivering pervasive, real-time and secure services. John Wiley & Sons.

Willmott, C. J. and Matsuura, K. (2005). Advantages of the mean absolute error (mae) over the root mean square error (rmse) in assessing average model performance. Climate research, 30(1):79–82.

Xu, Y. C. and Kim, H.-W. (2008). Order effect and vendor inspection in online comparison shopping. Journal of Retailing, 84(4):477–486.

Ye, Q., Law, R., Gu, B., and Chen, W. (2011). The influence of user-generated content on traveler behavior: An empirical investigation on the effects of e-word-of-mouth to hotel online bookings. Computers in Human behavior, 27(2):634–639.

Zeithaml, V. A. (1988). Consumer perceptions of price, quality, and value: a means-end model and synthesis of evidence. The Journal of marketing, pages 2–22.

Zhang, Z., Ye, Q., Law, R., and Li, Y. (2010). The impact of e-word-of-mouth on the online popularity of restaurants: A comparison of consumer reviews and editor reviews. International Journal of Hospitality Management, 29(4):694–700.

Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):301–320.

8 Appendix

8.1 Random Forest Tree Visualization

The graph can be seen on the following page.
