
BIG DATA & COUNTERTERRORISM IN THE POST-SNOWDEN ERA

Abstract

Ever since the terror attacks in Paris in 2015, the Belgian security agencies had been on the highest terror alert possible. Still, they were not able to prevent three coordinated terror attacks in Brussels on 22 March 2016. In the wake of terror attacks such as those in Paris and Brussels, European leaders call for an extension of the measures available to intelligence and security agencies to collect, store, and exchange data. This call stems from the belief that data mining might be the solution for predicting terrorism. The primary purpose of this thesis was to look critically at the theories and concepts behind data mining for counterterrorism and to examine the multiple challenges that come with them. We do not know everything about the effectiveness of the data mining measures national security agencies deploy against terrorism; most of it is confidential. What we do know, as listed throughout this thesis, points to a lack of effectiveness on a technical, epistemological, and social level, and to a number of challenges that data mining researchers have to cope with. The Snowden disclosures were of great value, and through the critical use of the available literature this thesis contributes to elucidating the complexity and opacity of the Big Data surveillance programs. Still, suggestions for further research can be found in areas such as privacy and the dichotomy of freedom and security. From this perspective, and on the basis of the results of this research, it can be concluded that the question is not whether data mining programs can be justified under law, but rather whether they are actually effective in the first place. Our political structures are not yet comfortable discussing the challenges and problems of our calculated surveillance society, but whether they like it or not, it is a world that is coming fast and we are going to have to confront it.

Key words

Data mining, terrorism, dataveillance, Snowden

MA Thesis New Media
Supervisor: Niels van Doorn
Second reader: Stefania Milan

UvA – MA New Media & Digital Culture

Date: 24 June 2016
Word count: 19,518

Name: Hidde van der Beek
Student number: 10170952


There are known knowns, which we know we know; known unknowns, which we know we don't know; and unknown unknowns, which we do not know we don't know.
– Donald Rumsfeld, United States Secretary of Defense (2002)


Contents

Introduction
1. Database Populism and the Technical Challenges of Big Data Science
1.1 Statistics and Database Governmentality
1.2 Data Science
1.3 The Era of Big Data
1.4 Database of Intentions and Metadata
1.5 Calculated Publics and the Opacity of Algorithms
1.6 False Positives and the Challenges of Classification
2. Big Data Purposes for National Security and Terrorism
2.1 The Data Derivative
2.2 Governmental Data Mining
2.3 Dataveillance
2.4 The Politics of Preemption and the Ambiguity of New Terrorism
2.5 The Incalculable Threat
2.6 The Search for the Needle in the Haystack
3. Counterterrorism and Data Obfuscation in the Post-Snowden Era
3.1 Citizenfour and the Revelation of PRISM
3.2 Data Obfuscation in the Post-Snowden Era
3.3 Where Do We Go From Now?
Conclusion
Epilogue


Introduction

"What we feared has happened" (n. pag.). These are the words that Belgian Prime Minister Charles Michel used in his press conference broadcasted by the NOS in the article Belgische

premier: wat we vreesden is gebeurd (2016) as a reaction to the terror attacks on Brussels

Zaventem airport and a subway train in the EU quarter on 22 March 2016. Ever since the Paris attacks in January and November 2015, Belgian national security agencies have been on the highest alert possible. Still, the attacks in Brussels could not have been prevented. They were correctly worried that a deadly terror attack was likely to occur. On the morning of 22 March 2016, three coordinated nail bombings occurred in the capital city of Belgium, which left more than 300 people wounded and 34 people were killed. The bombings were the deadliest act of terrorism in the history of Belgium. The Islamic State of Iraq and Syria claimed responsibility for the attacks.

In the wake of terror attacks such as those in Paris and Brussels, European leaders call for an extension of the measures available to intelligence and security agencies to collect, store, and exchange data. This call also draws considerable criticism in relation to privacy infringements and is part of an ongoing debate between privacy and security. The call for extended data gathering possibilities stems from the belief that data mining might be the solution for predicting terrorism. As Dempsey & Flint note in Commercial Data and National Security (2004), the goal is to search "based on the premise that the planning of terrorist activity creates a pattern or 'signature' that can be found in the ocean of transaction data created in the course of everyday life" (1464). The Belgian government and counterterrorism agencies knew that an attack was likely to happen, yet they were not able to prevent it and find the needle in the ever-growing haystack of data. It is hard to say to what extent these counterterrorism measures work, as most of that information is confidential. What this thesis can do, by contrast, is go in depth and outline the theories and concepts underpinning the imagined value of Big Data techniques.

There are particular hopes and ideas connected to preventive and predictive data mining. The goal of this thesis is to look critically at the theories and concepts behind data mining for counterterrorism and to examine the multiple challenges that come with them. One set of challenges is technical, concerning data mining techniques and working with data in general. The object of counterterrorism needs to be outlined as well: terrorism itself. When characterizing the terrorism we face nowadays, what are the challenges? Finally, the disclosures of Edward Snowden in 2013 revealed the scale of the monitoring and surveillance programs of the US National Security Agency. Even though we have known since his disclosures that we are being watched, we simply continue our daily lives. If people could mitigate state surveillance, how could they do it? There is a friction between the imagined value of data mining techniques and the results we see in the news with attacks such as those in Paris and Brussels. The hopes and ideas connected to Big Data and the friction that comes with them lead to the core question of this thesis: how might predictive data mining be of value for counterterrorism programs, and what are the challenges on a technical, social, and political level that pertain to these techniques?


To answer this question, this thesis collects multiple studies from different academic disciplines such as computer science, social science, and political science. Through the critical use of key texts on Big Data and terrorism from each field, it seeks to clarify a subject characterized by complexity and opacity. The thesis is divided into three chapters. The first chapter starts with a short historical overview of database populism in relation to the concept of governmentality. Using Driscoll's article From Punched Cards to "Big Data": A Social History of Database Populism (2012), it examines the role of statistics in gaining insights into the population across three periods, the first of which goes all the way back to the late 19th century. Moving to the present, the concept of data science will be outlined, as well as one of the key characteristics of today's data mining techniques: automated data analysis. This will be further discussed in relation to the concept of 'Big Data' and the role of metadata with regard to databases of intentions. Near the end of the first chapter there is space to discuss some technical challenges of dealing with data in relation to false positives, the challenges of classification, and the opacity of algorithms, which are at the very heart of the criticism of data mining techniques.

Continuing this technical and epistemological critique, the second chapter applies and emphasizes the production of knowledge with data mining applications in relation to national security, which is at the core of counterterrorism. The chapter starts with a more precise concept of the 'pattern' data mining applications are searching for: a data derivative. It continues by examining the concept of dataveillance, which is all about profiling people with suspicious and uncommon behavior and calculating their future malevolent actions. This calculation will be problematized when analyzing the characteristics of the terrorism we face nowadays. German sociologist Ulrich Beck (2002) asked perhaps the most interesting question in relation to this terrorism: "how to feign control over the uncontrollable" (41). Furthermore, in relation to the outlined characteristics, the politics of preemption play a key role in this chapter as well, as they are about acting on something that could potentially happen. Analyzing the new terrorism as an incalculable threat yields some very interesting observations in relation to the beliefs and thoughts outlined in the previous paragraphs regarding the search for the needle in the haystack.

The third and final chapter of this thesis starts by discussing the Snowden disclosures regarding the PRISM program of the NSA, a landmark in the global awareness of worldwide spying and surveillance programs. This final chapter continues with the challenges of Big Data mining as listed throughout the previous chapters, in relation to how people could mitigate questionable state surveillance, by addressing the concept of data obfuscation. This concept, developed by Brunton and Nissenbaum (2015), can be seen as a weapon of the weak against the problematic surveillance programs. The chapter examines the asymmetry of information and power in relation to the secrecy of the surveillance programs of national security agencies. The question is whether obfuscation will directly stop the addressed asymmetries and challenges. Therefore this chapter ends by outlining where we can go from here, in a future characterized by state surveillance that has become questionable in our post-Snowden era.


1. Database Populism and the Technical Challenges of Big Data Science

Where does the belief come from that data mining programs could prevent a terrorist attack? One could start by looking at the history of statistics, as it was primarily developed to get insights into populations, which is at the core of today's data mining programs. The arrival of the personal computer in the 1970s and 1980s revolutionized access to data and expanded the ability to work with information about the population. This first chapter focuses on the development of so-called data science. It outlines the challenges of dealing with data and the most important developments that eventually led to the core element of today's data-driven society: Big Data. What is data, when does data become Big Data, and how does this relate to knowledge production? These questions, which have to be answered first, are of particular value for looking at today's data mining techniques in relation to counterterrorism.

1.1 Statistics and Database Governmentality

In order to understand the practices behind governing, 'measuring' and gaining insights into (read: producing knowledge about) the population, which is at the core of today's data mining programs, this chapter starts with a short historical overview of the rise of the database in relation to the concept of governmentality. French philosopher and social theorist Michel Foucault developed the concept of governmentality in the later years of his life as a guideline for the analysis of the genealogy of the modern state. In his lecture Governmentality (1991), Foucault describes this concept as "the art of government" (87). Governmentality is a complex form of power that does not act on people as individuals, nor on citizens as subjects, but works on the aggregated and abstract concept of the population. Prior to statistics there was no such thing as population. Population is not a social construction but a material, real construction: it becomes something only after it is calculated and constructed through apparatuses consisting of measuring instruments and statistics. "Statistics enables the specific phenomena of population to be quantified" (141), as Foucault mentions in his Security, Territory, Population: Lectures at the Collège de France 1977—1978 (2007). With statistics, governments can control, 'capture', regulate, and conduct society as a population. With statistical tools, techniques, and measures, governments can capture reality in new ways that might give interesting new insights into things like the size and scale of national territory, wealth, and the economy. Considering these possibilities, Scott refers to statistics in his book Seeing Like A State (1998) as "tools of legibility" (25). With statistics, a population can be read. In the same vein, Desrosières notes in his well-known book The Politics of Large Numbers: A History of Statistical Reasoning (1998) that statistics were "a formal framework for comparing states. A complex classification aimed to make it easier to retain and to teach facts, and for those in government to use them" (399). For instance, the bigger the population and the more men it has, the stronger it will be. This only holds if one assumes that more people are better and stronger, and that men are seen as stronger than women. From these points of view of Scott and Desrosières, one can say that statistics as the politics of measurement produced a taxonomy (the science of organizing and classification) in addition to a mere quantification of the population. Continuing with Desrosières (1998): statistics "give us a scale to measure the levels at which it is possible to debate the objects we need to work on" (398). These tools of measurement and calculation produced a certain worldview and generated a form of knowledge production about society and the population. One can say it was the birth of a new epistemological paradigm, as governments learned about and gained insights into their populations in new ways that gave a certain image of 'reality' that was called the 'social'. Other aspects of society were not included in this view of reality, which is of course a huge methodological and epistemological limitation that is still relevant in today's data mining programs. So statistics structure the public space by categorization and classification, to which I will return in paragraph 1.5. Collecting all these social data about the population produced these insights and this information, while our very idea of 'the social' is itself shaped by these instruments of measurement and calculation. In the first half of the 19th century, according to Hacking in his chapter How Should We Do The History of Statistics? (1991), this "enthusiasm for numbers" consequently caused an "avalanche of numbers" as well (186), which became problematic.

The data that was produced and collected with statistics had to be stored somewhere, which gave rise to the modern archive as we know it today. This modern archive became increasingly important to the functioning of governing institutions, which otherwise could not have handled the growing amount, or 'avalanche', of data. Within this framework we need to look at the rise of the database, as we move from the archive to the database. Besides the 'classic' storing of data, which archives were good at, governments wanted to analyze and process the data, and with analyzing and processing came human interaction with data. These statisticians can be seen as early data processors who turned data into information, which is a very important step in today's data mining programs. They were looking for patterns and correlations between these data that could mean something and might hold a particular truth. Near the end of the 19th century, according to Driscoll in his From Punched Cards to "Big Data": A Social History of Database Populism (2012), "public and private bureaucratic institutions were growing increasingly large and geographically dispersed, creating a rising demand for new systems to manage information" (7). Processing information, however, was still manual work and therefore time-consuming. A corporation that later called itself IBM introduced an electro-mechanical prototype of a database that worked with punched cards containing machine-readable information. It was a very important step in the automation of data processing. As Dorman notes in his The Creation and Destruction of the 1890 Federal Census (2008): "the punched cards became, in effect, not only a copy of the information, but its principal medium" (366). This automated the process and made it quicker, more efficient, more intensive and, perhaps most importantly, scalable. It marks the first of three periods that Driscoll (2012) identifies regarding changes in the accessibility and infrastructure of database technologies. IBM's machines could process the punched cards far faster than the hands and brains of classical statisticians. This can be seen as the birth of automated data analysis and is central to the development of modern governmental data mining applications. Not everyone, however, saw this as a satisfying development. As Driscoll (2012) continues, "fear and anxiety marked much of the popular response to database technology in this early period" (10). This can be seen as a very early form of critique regarding surveillance and control via governmental databases; this notion of fear and anxiety will be further emphasized in the last chapter of this thesis. Continuing with information processing in the late 19th century: as Beniger notes in his book The Control Revolution (1986), the early applications of information processing by the state tended to use these data for "the active control of individuals" (408). Although this took place in the late 19th century, it has remarkable similarities with 21st-century counterterrorism programs, which will become clear in the next chapter. Beniger uses the example of a railway ticketing system to describe how punched cards played a role in gaining insights into the population. At the point of purchase, the conductor manually punched holes in specific areas of the train ticket to indicate the purchaser's appearance, i.e. gender, hair color, eye color, etc. This way each ticket was uniquely connected to its buyer and provided information about the train's travelers.

Still, these automated data analysis machines were vast, very expensive and almost impossible for non-specialists to understand. As Driscoll (2012) continues: "the role of the database in society remained remarkably stable from 1890 to 1960" (11). It was not until the 1970s that these information-processing machines became available to individuals, instead of only to large bureaucracies such as governments and big corporations. During the intervening decades the technology became cheaper and smaller, though it remained a very niche market. In the late 1970s and early 1980s there was a general popularization of the home computer, which made possible a proliferation not only of computers but of personal databases as well, and which marks the second period in Driscoll's historical overview of database technologies. As Driscoll notes regarding this second period: "during the popularization of personal computing in the 1980s, databases were imagined in one of two forms: remote collections of information accessed via a telecommunication system, or locally-produced, grassroots systems for the storage, analysis, and production of information" (18). Eventually, in the 1990s, the introduction of the Internet created new markets for commercial databases, which can be seen as early versions of cloud computing and storage. This cloud model of distributed computing and subscription-based online services was remotely accessible and marks the third period in Driscoll's typology. Especially with the rise of professional database intermediaries, we moved from being close to the database and interacting with it directly to having more distance, remote access, and thus layers of mediation and obfuscation. The 'old' relational databases made room for flexible, open, and unstructured databases. Due to the exponential growth of technical capacities, we gather much more data, which are simply thrown into these unstructured databases while we wait for an algorithm to find whatever we are looking for. One can use Deleuze's concept of the 'dividual' to describe the subjective form of the database. In his Postscript on the Societies of Control (1992) Deleuze mentions, "we no longer find ourselves dealing with the mass/individual pair. Individuals have become dividuals, and masses, samples, data, markets, or banks" (5). We can see the dividual as a digital representation. The numerical language that makes up this data double, composed of units that are not self-contained, allows whole new ways of control. Databases do not abstract and detach from a pre-given social realm; they are immanent to and part of the very relations that make up what we understand to be the social. Databases construct how we understand the social. As the ways of storing, analyzing, and processing data expanded in the second half of the 20th century, the way of doing research changed as well, toward a more data-intensive form of science.

1.2 Data Science

The term 'data science' was first coined by Danish computer science pioneer and Turing Award winner Peter Naur in his book Concise Survey of Computer Methods (1974). In this book he defines data science as "the science of dealing with data" (72). Data science was more than statistics; it was not just about calculating data, but rather about dealing with it by using a computer. Dealing with data includes processing, storing, cleaning, and manipulating data before the data is analyzed. This is not new and already had to be done in statistical research as well. Dealing with data is just one part of the entire 'data-chain': first you collect data; second, you prepare and deal with the data; third, you analyze the data; fourth, you interpret the data; and fifth, you apply the produced knowledge. The work of Naur was organized around the concept of 'data', defined in the IFIP-ICC Vocabulary of Information Processing (1966) as "a representation of facts or ideas in a formalized manner capable of being communicated or manipulated by some process" (n. pag.). To mark the contrast between people and technology, and to develop an international standard vocabulary for the profession, a distinction was made between data and information. They define information as "the meaning that a human assigns to data by means of the known conventions used in its representation" (n. pag.). This connects to the fourth step in the data-chain. To further outline the difference between data and information: data are the bits from which information is derived. Data therefore need to be put into context to become meaningful and useful as information.

In the 1990s Jonathan Berry of BloombergBusiness published a cover story on Database Marketing (1994). At the core of database marketing, according to Berry, lies an important notion: prediction by using data. "Companies are collecting mountains of information about you, crunching it to predict how likely you are to buy a product, and using that knowledge to craft a marketing message precisely calibrated to get you to do so" (n. pag.). The notion of prediction in relation to the provisional unit of the dividual, which is based on statistical correlation, has particular resonance with the text by Fayyad et al. In their From Data Mining to Knowledge Discovery in Databases (1996) they write: "knowledge discovery in databases refers to the overall process of discovering useful knowledge from data, and data mining refers to a particular step in this process. Data mining is the application of specific algorithms for extracting patterns from data" (39). Algorithms can be seen as automated, self-contained procedures of calculation. Extracting patterns, producing useful knowledge from data, and predicting future trends are important aspects of today's data mining techniques and applications. The additional steps, returning to Naur, of processing, storing, cleaning, and manipulating are essential to ensure that useful knowledge is derived from the data. Without attention to 'dealing with data' (step two in the data-chain), the production of meaningless or invalid patterns might be the result.
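To make the notion of 'extracting patterns from data' less abstract, the following minimal sketch in Python applies a trivially simple algorithm, counting co-occurring attribute values, to a handful of invented records. It only illustrates steps three and four of the data-chain under these assumptions; the records, the attributes, and the threshold are all hypothetical and bear no relation to any real data mining system.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, already 'cleaned' transaction records (step two of the data-chain).
records = [
    {"city": "Brussels", "payment": "cash", "hour": "night"},
    {"city": "Brussels", "payment": "card", "hour": "day"},
    {"city": "Paris",    "payment": "cash", "hour": "night"},
    {"city": "Brussels", "payment": "cash", "hour": "night"},
    {"city": "Paris",    "payment": "card", "hour": "day"},
]

# Step three, analysis: count how often pairs of attribute values co-occur.
pair_counts = Counter()
for record in records:
    values = sorted(record.items())
    for (k1, v1), (k2, v2) in combinations(values, 2):
        pair_counts[(f"{k1}={v1}", f"{k2}={v2}")] += 1

# A 'pattern' is here simply any pair present in at least half of the records
# (an arbitrary cut-off, chosen by the analyst).
threshold = 0.5
patterns = {pair: n for pair, n in pair_counts.items()
            if n / len(records) >= threshold}

# Step four, interpretation, is left to a human: the numbers say nothing
# about *why* cash payments cluster at night, only *that* they do here.
for pair, n in sorted(patterns.items(), key=lambda x: -x[1]):
    print(pair, f"{n}/{len(records)} records")
```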

Today's databases can involve millions of rows of data due to the diffusion of Internet access, and therefore, even though computational capacities have expanded as well, scalability has become a huge issue. As Driscoll (2012) writes: "the development of new database technologies is driven by the demands of extremely large data sets, especially those produced by highly centralized web services such as Google and Facebook—a cross-cutting field of research colloquially termed Big Data" (1). When does data become Big Data? In Trending: The Promises and the Challenges of Big Social Data (2011), Lev Manovich relates Big Data to data that exceeds the computational capacities of normal computers, such as the "ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time" (1). This chapter continues with the concept of data, and in particular Big Data, now that scalability has become a central issue in a society characterized by multiple devices collecting all kinds of information and data. What are the characteristics of (Big) Data, and which technical challenges arise when working with data that are of particular interest for counterterrorism techniques and applications?

1.3 The Era of Big Data

The first two paragraphs made clear that facing a vast collection of data about the population is not entirely new; in fact, its history goes back two centuries. This scaling up of social data has its own history. As more and more data is being stored, especially with the new technological and computational possibilities of the past ten years, boyd & Crawford mention in their Critical Questions for Big Data: Provocations for a Cultural, Technological and Scholarly Phenomenon (2012) that "the era of Big Data has begun" (662). Big Data can be placed within this history of social statistics. As Beer notes in his How Should We Do the History of Big Data? (2016) regarding the introduction of Big Data: "the type of data may have changed as might its analytics – with the shift toward commercial and algorithmic forms amongst other changes – but the lineage is clear" (2). The scholarly and popular literature contains enthusiasm and positivity about Big Data. This arises mostly from the business sector, where Big Data offers new insights and possibilities for direct and personal marketing, supply-chain optimization, and other means of becoming more efficient and generating more profit through enhanced insight and management control. This excitement has spread to the domain of national security and counterterrorism as well, which is the topic of this thesis. As boyd & Crawford (2012) mention, Big Data triggers a utopian and a dystopian rhetoric: "on one hand, Big Data is seen as a powerful tool to address various societal ills, offering the potential of new insights into areas as diverse as cancer research, terrorism, and climate change. On the other, Big Data is seen as a troubling manifestation of Big Brother, enabling invasions of privacy, decreased civil freedoms, and increased state and corporate control" (663-664). The consequences of Big Data are broad and extend to the epistemological level as well: Big Data changes the definition of knowledge and how we come to knowledge.

As boyd & Crawford continue: "Big Data has emerged a system of knowledge that is already changing the objects of knowledge, while also having the power to inform how we understand human networks and community" (665). In the end, Latour made this already clear in his Tarde’s Idea of Quantification (2009): "change the instruments, and you will change the entire social theory that goes with them" (9). Big Data changed the instruments at a whole new and mostly scaled-up level and therefore creates a shift in how we think about doing research. Big Data moves focus away from explanation and implies a data-driven approach instead of the hypothesis-driven approach of the traditional scientific method. What people do becomes more important than why they do it. Lazer et al. outline in Computational Social Science (2009), Big Data offers "the capacity to collect and analyze data with an unprecedented breadth and depth and scale" (722). An unprecedented breadth, depth and scale reminds of the 'avalanche' of data in 19th century as

mentioned in the first paragraph and gives weight to the escalation of data resources as more and more objects are getting 'smart' and therefore produce data. Albeit scale is an important aspect of the new possibilities that arise alongside Big Data, a profound development in the aspects of epistemological foundations is relevant as well, as it reframes fundamental assumptions of science (i.e. the constitution of knowledge and the process of doing research). Hey, Tansley & Tolle argue in The Fourth Paradigm: Data-Intensive Scientific Discovery (2009) that Big Data represents a "fourth paradigm" to characterize the revolutionary transformation to a data-intensive form of science (xix). The first paradigm was thousand years ago, when science was empirical and was used to describe natural phenomena. The last few hundred years, as the second paradigm, there was a theoretical branch characterized by using models and generalizations. The third paradigm from the last few decades was a computational branch simulating complex phenomena. Today's fourth paradigm is about data exploration, unifying the previous paradigms of theories, experiments, and simulations. To an even more extreme view, in The End of Theory: Will the Data

Deluge Make the Scientific Method Obsolete (2008) according to Anderson, Big Data represents

the "end of theory" (n. pag.) as Big Data generate more useful and accurate results than domain experts for example who traditionally craft hypotheses and other research strategies. I do not agreed with Anderson saying that Big Data generate more useful and accurate results, at least in the case of terrorism, which I will further explain in the next paragraphs. Instead of such an extreme view, this thesis sees the new paradigm as complementary instead of substitutive to pre-existing paradigms such as observation, experimentation, and simulation. Though, when Big Data

(12)

impacts epistemological foundations, a paradigm shift in scientific research is inevitable due to the challenges that come along with a data-intensive form of science. This chapter attempts to outline the impact of Big Data in the context of science and summarize its technical challenges because next to the fact that there is enthusiasm and positivity about Big Data, there is a lot of critical literature as well. With the new possibilities of Big Data, new limitations and challenges may come along that needs to be examined.

In the 21st century, digital methods and technological innovations allow us to capture, process, edit, store, and analyze vast and extensive data sets in new, easier, and faster ways. Perhaps most important is the fact that nowadays data is born digital, or natively digital. The daily use of the Internet, social media platforms, smartphones, and various applications generates enormous amounts of data that allow 'Big Data' to emerge. Further, digital media platforms create affordances to produce and, at the same time, mine real-time customer activities through their APIs. However, we should not forget what is called the 'raw data fantasy'. As Naur (1974) already mentioned, you have to deal with data: it has to be 'cooked', that is, processed, stored, and cleaned. Continuing with Manovich (2011): "we need to be careful of reading communications over social networks and digital footprints as 'authentic'. Peoples' posts, tweets, uploaded photographs, comments, and other types of online participation are not transparent windows into their selves; instead, they are often carefully curated and systematically managed" (6). They are pre-fabricated and standardized artifacts that are medium-specific. This is something Langlois & Elmer (2013) address as well: social media do not record but structure, organize, and pattern activity. Data is always prefigured through the gathering mechanisms of a platform. Therefore Lisa Gitelman (2013) concluded that 'raw data is an oxymoron', as the juxtaposed elements (raw and data) appear to be contradictory. If you want to work with data, you have to deal with data; consequently, raw data has to be 'cooked'. Data is, in the end, not a 'given': you have to take it and capture it, and it is therefore always manufactured.
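As a minimal illustration of this 'cooking', the sketch below shows the kind of preparation the paragraph describes: normalizing and de-duplicating a few invented records before any analysis takes place. Every field name and every rule in it is a hypothetical, subjective choice rather than a description of an actual pipeline.

```python
import re

# Hypothetical 'raw' records as they might arrive from different platforms:
# inconsistent casing, stray whitespace, duplicates, and missing fields.
raw = [
    {"name": " Jan Jansen ", "country": "be", "email": "JAN@EXAMPLE.COM"},
    {"name": "Jan Jansen",   "country": "BE", "email": "jan@example.com"},
    {"name": "Marie Curie",  "country": "FR", "email": None},
]

def cook(record):
    """Normalize one record: every rule here is a manual, subjective choice."""
    return {
        "name": re.sub(r"\s+", " ", record["name"]).strip().title(),
        "country": record["country"].upper(),
        "email": record["email"].lower() if record["email"] else None,
    }

cooked = [cook(r) for r in raw]

# De-duplicate on (name, email): deciding that these two fields identify
# 'the same person' is itself an act of manipulation, not a given.
seen, deduplicated = set(), []
for r in cooked:
    key = (r["name"], r["email"])
    if key not in seen:
        seen.add(key)
        deduplicated.append(r)

print(deduplicated)  # two records remain; the 'raw' data never looked like this
```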

According to boyd & Crawford (2012) what is essential considering Big Data is the relation between data sets, which might give particular new insights that could not have been produced without a data-intensive form of science. Therefore, the importance and the value of data does not lie in its size, but rather in the possibility to add new insights to other data. To extend this notion, Savage & Burrows emphasize in The Coming Crisis of Empirical Sociology (2007) the essence of complete data sets or speaking in mathematical terms: n=all (number=all, in other words: you have all the data). Big Data is not about a small sample but about a complete set that might give insights in the granular (subcategories and submarkets that samples cannot assess). But as a remark on this idea about Big Data, Leonelli writes in What Difference Does Quantity Make? On the

Epistemology of Big Data in Biology (2014) that "having a lot of data is not the same as having all

of them, and cultivating such a vision of completeness is a very risky and potentially misleading strategy" (7). If the n=all principle is true, Big Data should provide a complete view of reality, which is hard to imagine.


1.4 Database of Intentions and Metadata

The aim of collecting all of those data is to interpret them and eventually to profile people with suspicious and uncommon behavior and predict their future, potentially malevolent actions. Profiling people is not new. As mentioned in the first paragraph with the example of the railway ticketing system described in Beniger's book (1986), already in the late 19th century each ticket was uniquely connected to its buyer via their personal appearance. With the introduction of the Internet and digital databases, in relation to Deleuze's concept of the 'dividual', the possibilities for profiling people and predicting their future behavior expanded and became more comprehensive and precise. John Battelle researched the different data points that multiple services of Google store in individual profiles. These different data points from multiple services relate to one of the key elements of Big Data according to boyd & Crawford (2012): connecting multiple data sets together. In Battelle's research, the native services of Google were investigated: Gmail, Google Maps, Google Search, etc. In his well-known book The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture (2005), Battelle calls these specific personal profiles the 'database of intentions'. In spite of the fact that this book primarily focuses on search engines and business purposes, a database of intentions has particular resonance for counterterrorism as well. The collected, and more importantly connected and related, data might give particular insights into someone's intentions. The importance of such a database is addressed by Richard Rogers in his The Googlization Question: Towards the Inculpable Engine (2009) too: "the database contains one's flecks, content about interests and habits" (175). Again, despite the fact that Google uses this information mostly for providing personalized advertising and recommendations for Google Search, it might be useful for profiling persons in general and eventually for security measures as well.

Large technology companies such as Apple, Google, Microsoft, Facebook, and Amazon have databases of intentions, consisting of people's minds, thoughts, and ideas, that have enormous value. As a critique of these databases of intentions: they are not just about intentions, but also about all the detritus and traces we leave behind all the time, which fall under the concept of 'metadata'. This can be the date and time of a call or the location from which you last accessed your email. The data that is collected does not contain content-specific details but rather transactional information. Metadata is data about data: the fact that a communication occurred. This is the same kind of information produced when a private investigator follows you all day. They cannot be close enough to hear every word you say, but they can be close enough to know when you left your house, whom you have met, and how long you were in a particular place. After the Snowden revelations in the summer of 2013, US President Obama said the following in a press conference broadcast by NBC News regarding the NSA's secret surveillance program: "nobody is listening to your telephone calls. [...] They're not looking at names and they're not looking at content, but sifting through this so-called metadata, they may identify potential leads with respect to people that might engage in terrorism". Perhaps a more suitable and clearer term for metadata, and one that makes it interesting in the case of terrorism, is 'behavioral data', as it suggests that metadata is about what people do and about their connections, which might have particular value.

1.5 Calculated Publics and the Opacity of Algorithms

At the very heart of finding patterns in data lie algorithms, automated self-contained procedures of calculation that make meaning out of data and transform it into value. Gillespie writes in The Relevance of Algorithms (2014) about the concept of calculated publics, noting that certain algorithms "seem to spot patterns that researchers could not see otherwise" (190). Calculated publics are therefore algorithmically calculated presentations of the public that are often opaque in their mechanisms of calculation. According to Gillespie, "the intention behind these calculated representations of the public is by no means actuarial" (189). Lagoze questions the value of calculated publics in Big Data, Data Integrity, and the Fracturing of the Control Zone (2014) in relation to the accuracy or veracity of data: "one acknowledged factor is an overconfidence in the veracity of the data as a true sample of reality, rather than a random snapshot in time and the result of algorithmic dynamics" (5). It is hard to reach a complete view when you do not have all the data but rather a random sample. This algorithmically calculated presentation of the public is based, according to Hasselbalch in his Standing in the Rip Current of the Algorithmic Economy With Closed Eyes (2015), on "subjective assumptions, perhaps even biases, and interests – commercial, governmental, scientific etc." (n. pag.). The complication is that this is evolving with no ethical oversight and no public scrutiny. The total lack of algorithmic transparency in proprietary computational algorithms and socially consequential mechanisms (false positives) is often referred to as the opacity of algorithms. As Fourcade & Healy point out in Classification Situations: Life-Chances in the Neoliberal Era, published in Accounting, Organizations and Society (2013), this shift toward an algorithmic economy is made possible by "the emergence and expansion of methods of tracking and classifying consumer behavior" (560). Algorithms do not solve the problems that come with classification, as listed in the previous paragraph. The claim that algorithms classify more 'objectively' cannot simply be made, because a degree of human judgment and influence is still involved in designing algorithms. The problem or question that is posed must be translated into a formal language that the computer can understand. This means that both domain knowledge and knowledge of the formal computer language (code) are of great importance.
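The built-in judgment described here can be made visible with a small sketch: the question 'who is suspicious?' has to be translated into formal code, and in the hypothetical rule-based classifier below every feature, weight, and threshold is a design decision made by a human, not an objective property of the data.

```python
# Hypothetical 'suspicion' classifier: the question "who is suspicious?" has to
# be translated into formal rules, and every feature and threshold below is a
# human design decision, not an objective fact.
SUSPICION_WEIGHTS = {          # chosen by the designer
    "paid_cash": 2.0,
    "one_way_ticket": 3.0,
    "night_travel": 1.0,
}
ALERT_THRESHOLD = 4.0          # also chosen by the designer

def suspicion_score(profile: dict) -> float:
    """Sum the weights of whichever hand-picked indicators are present."""
    return sum(w for feature, w in SUSPICION_WEIGHTS.items() if profile.get(feature))

travellers = [
    {"id": "P1", "paid_cash": True,  "one_way_ticket": True,  "night_travel": False},
    {"id": "P2", "paid_cash": True,  "one_way_ticket": False, "night_travel": True},
]

for t in travellers:
    flagged = suspicion_score(t) >= ALERT_THRESHOLD
    print(t["id"], "flagged" if flagged else "not flagged")
# P1 is flagged, P2 is not -- but change one weight or the threshold and the
# outcome flips, which is exactly the built-in judgment the paragraph describes.
```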

This degree of (built-in) human judgment and influence is important if we want to assess the ethics of a service. If you want to know how an algorithm is designed and on which factors decisions and classifications of people are based, exposing the ethics of a service is necessary. Yet because of the level of opacity of algorithms this information is not available, a situation often captured by the black box metaphor due to its malleable and obscure character. As Andrejevic & Gates mention in their Big Data Surveillance: Introduction (2014): "the emerging, massively data-intensive paradigm of knowledge production relies more than ever on highly complex automated systems that operate beyond the reach of human analytical capacities" (186). The process of analyzing data and its results are systemically and structurally opaque for ordinary citizens. The opacity can be seen as a form of proprietary protection or, as Pasquale calls it in his The Black Box Society: The Secret Algorithms that Control Money and Information (2015), 'corporate secrecy', whereby he proposes that the opacity is a product of lax or lagging regulations: "What if financiers keep their doings opaque on purpose, precisely to avoid or to confound regulation?" (2). The opacity of algorithms could be explained as self-protection in the name of competitive advantage in the case of intelligence agencies and businesses. Though, this could also be a cover to "deploy strategies of obfuscation and secrecy to consolidate power and wealth" (14). However, what if there is a form of opacity that admits no human semantic explanation at all? Burrell argues in How the Machine 'Thinks': Understanding Opacity in Machine Learning Algorithms (2016) that "when a computer learns and consequently builds its own representation of a classification decision, it does so without regard for human comprehension" (10). Machine learning algorithms thus challenge the search for more transparency and ethical standards of classification at a more fundamental level. Tufekci notes in Engineering the Public: Big Data, Surveillance and Computational Politics (2014) that the opacity of algorithms "alters the ability of the public to understand what is ostensibly a part of the public sphere" (26). The public cannot assess the ethics of a service, and therefore how an algorithm is designed and on which factors decisions and classifications of people are based remains opaque. As a consequence, as Zarsky notes in Transparent Predictions (2013): "a non-interpretable process might follow from a data-mining analysis that is not explainable in human language. Here, the software makes its selection decisions based upon multiple variables (even thousands)" (1519). In this regard, determinations of class, risk, and suspicion are the result of complex and opaque data interactions that are unanticipatable and inexplicable. Such determinations cannot be properly understood as long as intelligence agencies, businesses, and even governments keep their doings opaque in a 'black box'.

1.6 False Positives and the Challenge of Classification

The amount of data and its quality and veracity are inevitably an issue and could lead to what McFarland & McFarland (2015) call "being precisely inaccurate" (1). The previous paragraphs made clear that the list of technical challenges that come with working with data is long, and the outcome of working with inaccurate and biased data is therefore problematic as well. In Data Mining and Data Analysis for Counterterrorism (2004) DeRosa mentions, "if the data are not corrected or 'cleansed' before they become the basis for government data analysis, inaccurate or incomplete identification could result" (14). Should we not see cleaning as manipulating? The more I read and write, the more I tend to say yes, which is of course very problematic. Unfortunately, manipulation of various kinds is often necessary if you want to use data at all; it is the price you pay for 'dealing with' data. This becomes even more important when scale, and therefore the amount of data, increases. Mayer-Schönberger and Cukier state in their Big Data: A Revolution that Will Transform How We Live, Work, and Think (2013) that "looking at vastly more data also permits us to lessen our desire for exactitude" (13). As scale increases, it becomes harder to reach a high level of exactitude because the number of inaccuracies increases as well. Mayer-Schönberger and Cukier further outline how Big Data challenges the way we live and interact with the world. Our data-driven society "will need to shed some of its obsession for causality in exchange for simple correlations: not knowing why but only what" (7). These correlations may not tell us why something is happening, but they do alert us that something is happening. In some situations you do not need to know what is inside: as long as you can predict what it will do, that is enough to act on. For example, when you want to buy a plane ticket you want to know the best time to save the most money ('what'). The airfare madness behind this practice ('why') is less important for the actual act. This is not the case with terrorism. With data mining techniques you may be able to answer 'what', 'when', and 'who', while the most essential questions in relation to terrorism, 'how' and 'why', remain unanswered. In the end, correlations are not the same as causalities. Causality means that variable Y causes variable Z. Correlation means only that variable Y and variable Z are connected with each other. A correlation that is statistically significant is not necessarily causal. Without a theory about the why of correlations, and without good evidence that they are causal, interventions based on them can completely miss the point. This is a huge epistemological limit that Mayer-Schönberger & Cukier do not fully address.

Inaccurate or incomplete identification can work in two ways. It can lead to false negatives, which is a significant security issue as it, in the case of this thesis, identifies potential terrorists as not being a threat. Conversely, it can lead to false positives, which means innocent people are incorrectly identified as suspicious. Metadata tells a story about you that is made up of facts, but that is not necessarily true. It is one of the critiques boyd & Crawford (2012) address in their article as well: "taken out of context, data lose meaning and value" (670). In the end, not every pattern or connection is equivalent to every other pattern or connection. Separating the 'noise' of innocent people from the 'signal' of potential terrorists is a major technical challenge at the core of data mining. It is not something new, though: it is known in statistics as the 'base rate fallacy' and it applies in other domains as well. The base rate fallacy basically means that base rate statistical information is ignored in favor of other, case-specific data when making a probability judgment. This connects to the problem of apophenia, which is the perception of patterns within random data. The most common example in daily life is people seeing faces in the clouds. Apophenia is, according to Bratton in his article Some Trace Effects of the Post-Anthropocene: On Accelerationist Geopolitical Aesthetics (2013), about "drawing connections and conclusions from sources with no direct connection other than their indissoluble perceptual simultaneity" (n. pag.). If you search long enough, you will eventually find something. You will always find some correlations, but whether they are actually a meaningful signal or not, you do not know. The critical issue is what security agencies do with these correlations and false positives, as this might have major consequences for innocent people. However, the reality is that the stakes in fighting terrorism are very high, and there will therefore be a great temptation to act on suspicious behavior, whether the process of identification has been done correctly or not. As DeRosa (2004) continues: "even if the government later corrects its mistake, the damage to reputation could already be done, with longer-term negative consequences for the individual" (15). Redressing inaccurate identification is a complicated procedure, as the result might already have been disseminated to other databases.
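The base rate problem can also be restated as simple arithmetic. The sketch below uses invented round numbers, not figures from any real program, to show that even a highly accurate screening system produces vastly more false positives than true positives when the behavior it searches for is extremely rare.

```python
# Worked base-rate example with invented round numbers.
population       = 10_000_000   # people screened (hypothetical)
actual_positives = 100          # actual would-be attackers among them (hypothetical)
true_positive_rate  = 0.99      # the system flags 99% of real cases
false_positive_rate = 0.01      # and wrongly flags 1% of innocent people

true_positives  = actual_positives * true_positive_rate                   # about 99
false_positives = (population - actual_positives) * false_positive_rate   # about 99,999

precision = true_positives / (true_positives + false_positives)
print(f"flagged in total : {true_positives + false_positives:,.0f}")
print(f"actually a threat: {precision:.2%} of those flagged")   # roughly 0.1%
# Roughly a thousand innocent people are flagged for every genuine case: the
# 'needle in the haystack' problem restated as simple arithmetic.
```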

The concept of false positives is part of a larger challenge, and of the most commonly encountered task of data mining: classification. In a broad sense this concerns the characteristics that determine, and therefore divide, the population into different groups. This challenge has its own history. Returning to the article by Hacking (1991) cited in the first paragraph of this chapter: "when the avalanche of numbers began, classifications multiplied because this was the form of this new kind of discourse" (192). The emergence of new ways of measuring people led to people being classified in new ways as well, which itself had large implications for the way in which individuals and groups were perceived and treated. I have already addressed the connection between statistics and classification in the first paragraph of this chapter by citing Desrosières (1998), who talked about 'categories', a word derived from the Greek term kategoria that refers to judgment rendered in public. Categories, as he notes, as quantified objects within statistics are the "bonds that make the whole of things and people hold together" (236) and are at the core of classification. In a different, earlier work, his chapter How To Make Things Which Hold Together: Social Science, Statistics and the State (1990), Desrosières notes that "classifications appear to be conventions, as is shown by comparisons between different countries, or between different historical periods in statistical descriptions of the social world" (197). This is a huge epistemological limit. At the core of the criteria used for segmentation lie predefined classes. In their Data Mining (2010) Weiss & Davison note that "some classification tasks may require that complex decision boundaries be formed in order to achieve good classification performance" (546). However, it is often impossible to perfectly separate each example into a specific class, because there is often noise in the data and a degree of inaccuracy. Classification tasks are a key area of application and one where many sociological and ethical concerns arise even before the introduction of machine learning algorithms. In their Sorting Things Out: Classification and Its Consequences (1999), Bowker & Star already addressed a fundamental aspect of classification that becomes more and more relevant in today's computational politics and the classification mechanisms of Big Data applications: "each category valorizes some point of view and silences another" (5). Data change over time, are not always accurate, and have to be 'cooked', that is, prepared, normalized, and aggregated before processing; hereby you are manipulating data as well. These classificatory struggles increase the number of inaccurate decisions, or perhaps even make them inevitable, and are connected to what Lyon describes in his book Surveillance as Social Sorting: Privacy, Risk, and Digital Discrimination (2003) as surveillance that reinforces social "divisions" (2). If the data is seen as an accurate reflection of a certain population, which it is not, as outlined throughout the previous paragraphs, the conclusion that is drawn about a population is inaccurate as well. The bias that is produced results in increasing social stratification and digital discrimination. Besides that, this discrimination may result in a cumulative disadvantage for certain groups in society. As Zarsky notes in his Understanding Discrimination in the Scored Society (2014), individuals might be judged "based on what inferences and correlations suggest they might do, rather than for things they have actually done" (1409). This is of course deeply problematic, discriminatory, and unethical, as people can be wrongly identified. As mentioned in the previous paragraph, determinations of class, risk, and suspicion cannot be properly understood as long as intelligence agencies, businesses, and even governments keep their doings opaque in a 'black box'. Therefore individuals who become the victim of inaccurate or incomplete identification and categorization, such as false positives, are not able to challenge the process by which they are assigned. Fearfully, we must simply accept the (discriminatory) consequences of a data science that 'knows best', even if we know it does not.

2. Big Data Purposes for National Security and Terrorism

The technical and epistemological challenges that accompany a more data-intensive form of science in search of particular patterns, relations, and correlations were outlined in the first chapter of this thesis. Still, the terms 'pattern', 'relation', and 'correlation' may remain a bit abstract. What the algorithms of data mining applications may produce is a data derivative. This chapter further applies and emphasizes the production of actionable knowledge with data mining applications in relation to national security, which is at the core of counterterrorism. I will introduce key concepts such as dataveillance and the metaphor of the needle in the haystack, and I will try to outline and emphasize the characteristics of terrorism and the possibility of patterning it. The politics of preemption play an important role, as data mining techniques work on speculative ideas and suspicion. On the basis of these concepts, the purposes of and ideas behind governmental data mining and data mining techniques for national security in relation to terrorism will be expounded.

2.1 The Data Derivative

In Data Derivatives: On the Emergence of a Security Risk Calculus for Our Times (2011), Amoore emphasizes the importance of the relations between data for national security measures and data mining techniques and applications: "Contemporary risk calculus does not seek a causal relationship between items of data, but works instead on and through the relation itself" (27). In other words, the relation is just as real as what it relates, and it is the relation itself that holds information. The importance of the relation is also what boyd & Crawford (2012) identified as the most important element of their definition of Big Data. This leads to the concept of the data derivative, which is what the algorithms produce: the product of the analysis. A data derivative comes up as soon as particular relations fall into place that flag particular behavior and turn on an alarm. The relation, and the automated recognition of the pattern in that relation, is at the core of such data analysis. The relevant information does not come from the data points themselves but emerges in the (meta)space between the data points, which is a conceptual difference and relates to the notion of metadata mentioned in paragraph 1.4. "The data derivative is not centered on who we are, nor even on what our data says about us, but on what can be imagined and inferred about who we might be – on our very proclivities and potentialities" (28). This is an interesting assumption in relation to this research on counterterrorism, as Amoore suggests that (suspicious) behavior might be predicted. When it comes to calculating risk profiles, it is according to Amoore "of lesser consequence whether data accurately captures a set of circumstances in the world than whether the models can be refined for precision" (32). This is because "movement in any direction can be secured so long as it is possible to correlate the mobility to some future amalgam of possible outcomes" (36). That is what this logic is about, and it connects to Jonathan Berry's article on database marketing addressed in the previous chapter: any movement can be secured as long as it is possible to correlate its mobility to some future set of possible outcomes. The more we move and act freely, the more data we generate for a database (of intentions) in terms of relations, and the more patterns emerge that might lead to a prediction of possible futures. As such, "the data derivative embodies a risk mode that modulates via mobility (freedom) itself" (36). Databases are about establishing new relations and connections between elements in an environment in order to predict possible futures, and perhaps possible criminal futures that give insights into future terrorist attacks.
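How such a derivative might 'turn on an alarm' can be sketched as follows. In the hypothetical example below, no single record is alarming on its own; the flag is produced only when a particular combination of relations falls into place, which is the conceptual point Amoore makes. The relations, names, and rule are all invented.

```python
# Hypothetical illustration of a 'data derivative': no single record raises the
# alarm; it is a particular combination of relations falling into place that
# does. All relations and the rule itself are invented for this sketch.
relations = [
    ("alice", "bought", "one_way_ticket"),
    ("alice", "called", "+32-000-0001"),
    ("+32-000-0001", "located_in", "region_Z"),
    ("alice", "withdrew", "large_cash_amount"),
    ("bob",   "bought", "one_way_ticket"),
]

def derivative_flag(person: str, rels: list) -> bool:
    """Fire only when separate facts line up *through their relations*."""
    bought_ticket = (person, "bought", "one_way_ticket") in rels
    called_numbers = [obj for subj, pred, obj in rels
                      if subj == person and pred == "called"]
    contact_in_region = any((num, "located_in", "region_Z") in rels
                            for num in called_numbers)
    return bought_ticket and contact_in_region

for person in ("alice", "bob"):
    print(person, "-> alarm" if derivative_flag(person, relations) else "-> nothing")
# alice triggers the alarm, bob does not: the difference lies entirely in the
# relations between data points, not in any attribute of the persons themselves.
```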

2.2 Governmental Data Mining

What will governments do with all of this data and such a data derivative? Returning briefly to the first paragraph of the first chapter: statistics have become an important component in the act of governance. Hacking (1991) argued that "statistics has helped determine the form of laws about society and the character of social facts. It has engendered concepts and classifications within the human sciences" (181). Statistics is part of the infrastructures and modes of governance of the state, with important, direct implications for how people are classified and therefore treated. To extend this notion and to answer the question above in more detail, Rubinstein, Lee & Schwartz focus in their Data Mining and Internet Profiling: Emerging Regulatory and Technological Approaches (2008) on profiling in relation to national security and governmental data mining. "Companies can track and document a broad range of people's online activities and can develop comprehensive profiles of these people" (261). Developing such comprehensive profiles is at the core of governmental data mining: "its particular focus is on pattern-based searches of databases according to a model of linkages and data patterns that are thought to indicate suspicious behavior" (262). The last part suggests that suspicious behavior, and therefore possible future malevolent actions, can be discovered with data mining techniques. This is not the first quoted article that claims behavior might be predicted. Is human behavior actually so predictable? Is it possible to determine someone's plans by analyzing the metadata of their phone calls, emails, and bank transactions? At the Amsterdam Privacy Conference in 2015, journalist Alejandro Tauber had a conversation with former NSA employee William Binney and asked him perhaps the most important question regarding predicting human behavior. In Tauber's article Ik ben eindelijk bang voor massasurveillance ('I am finally afraid of mass surveillance', 2015), Binney's answer to this question was, "with twinkling eyes: yes" (n. pag.). It is not a long answer, but a fully convinced 'yes' from someone with almost thirty years of experience in data intelligence and analysis at one of the biggest national security agencies scares me.

Returning to Rubinstein, Lee & Schwartz, there is an important distinction to note in how data mining is used for national security. They outline two different forms of data mining. First, there is the subject-based approach, in which intelligence agencies gather information about subjects they already suspect. This can be seen as a more traditional, post-hoc form of intelligence: the suspect is already on the radar and the goal is to gather more information. The second approach, which is becoming more prevalent and is also more controversial in relation to privacy standards, is the pattern-based approach: "the government investigator develops a model of assumptions about the activities and underlying characteristics of culpable individuals or the indicators of terrorist plans" (262). Developing a model of assumptions sounds straightforward, but it is remarkable and questionable as well. Who determines which assumptions are part of the profile and which are not? There is no such thing as objectivity in making profiles. It is the same form of critique regarding the opaqueness and supposed objectivity of algorithms and data that I addressed in the previous chapter. Besides that, according to Dempsey & Flint in their Commercial Data and National Security (2004), this technique is in tension with "the constitutional presumption of innocence and the Fourth Amendment principle that the government must have individual suspicion before it can conduct a search" (1466). This connects to the possibility of false positives, which can intrude on the lives of innocent people and have serious consequences. Despite these concerns, governments and intelligence agencies are searching for, in Amoore's words, a data derivative that indicates a match between someone's data and a particular pattern or predetermined profile. This is exactly why data mining is more about people like you than about you specifically. An abstract example: if there is data about terrorist A, who has a particular search history, demographic profile, network, and set of bank transactions, then person B, whose data looks similar, is assigned a certain probability of being a future terrorist with possibly malevolent intentions as well. Person B becomes suspicious and appears on the radar simply for behaving like a terrorist, regardless of the fact that, as I mentioned in the previous chapter, not every pattern or connection is equivalent to every other pattern or connection. Not every correlation contains a meaningful signal or an actual truth. With these data mining techniques, governments and security agencies are working towards a 'new' form of surveillance: dataveillance.
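
The 'people like you' logic of the abstract example above can be sketched in a few lines of code. The sketch below is a toy illustration only: the feature names, the scores, and the similarity threshold are all invented, and real pattern-based systems are vastly more complex and opaque. It merely shows how someone can be flagged purely for resembling a known subject, which is also where false positives enter the picture.

```python
# A toy sketch of the pattern-based "people like you" logic. Feature names,
# scores, and the threshold are all invented; this is not any real system.

import math

def cosine_similarity(a, b):
    """Similarity between two feature dictionaries sharing the same keys."""
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

# A profile derived from known subject "A" (values are made up).
profile_a = {"flagged_searches": 1.0, "foreign_transfers": 1.0,
             "network_overlap": 1.0, "travel_alerts": 0.0}

# Unknown persons described with the same invented features.
population = {
    "B": {"flagged_searches": 0.9, "foreign_transfers": 0.8,
          "network_overlap": 0.7, "travel_alerts": 0.1},
    "C": {"flagged_searches": 0.1, "foreign_transfers": 0.0,
          "network_overlap": 0.0, "travel_alerts": 0.9},
}

THRESHOLD = 0.8  # an arbitrary cut-off: resemblance, not evidence

for name, features in population.items():
    score = cosine_similarity(profile_a, features)
    if score >= THRESHOLD:
        # Flagged only for *resembling* A: this is exactly where
        # false positives enter the picture.
        print(f"Person {name} flagged (similarity {score:.2f})")
```

In this toy run only person B crosses the threshold, but nothing in the calculation distinguishes a genuine suspect from an unlucky person whose data merely looks similar.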


2.3 Dataveillance

The theories of governmental data mining techniques are connected to national security-related surveillance purposes, or what Levi & Wall, following Roger Clarke (1994), call 'dataveillance' in their Technologies, Security, and Privacy in the Post-9/11 European Information Society (2004). Dataveillance basically means that everything we do is recorded and stored in a database. A combination of the words 'data' and 'surveillance', dataveillance is according to Levi & Wall the "proactive surveillance of what effectively become suspect populations, using new technologies to identify 'risky groups' by their markedly different patterns of suspect behavior" (200). Returning to Battelle (2005) from the previous chapter, the collection, gathering, and connecting of the data from our online behavior that is stored in 'databases of intentions' lies at the origin of this concept of dataveillance. Surveillance technology, as Levi & Wall continue, relies on "the proposition that each movement or transaction leaves a trail of electronic traces, which means that individuals cannot easily disappear" (206). This is exactly why transactional data, or metadata, is so important. Much of the data we produce is not produced intentionally. While it is mostly used for commercial purposes such as advertising, it can be seen as the beginning of something that can be extended and expanded into a society where power, control, monitoring, and thus sophisticated surveillance are empowered by data mining techniques serving national security purposes. This is connected to what Haggerty & Ericson in their The Surveillant Assemblage (2000) call "the disappearance of disappearance" (619): it is increasingly difficult for individuals to maintain their anonymity or to escape the monitoring of national security agencies. Panoptical aspirations are nowadays executed through advanced surveillance programs that enable the few to scrutinize the many like never before.
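
To illustrate, in a deliberately simplified way, how mundane transactional traces add up to the kind of trail Levi & Wall describe, consider the sketch below. The records are invented; the point is only that ordinary, unintentionally produced metadata (timestamps, locations, merchant names), once joined and sorted, already reconstructs a person's day without any access to the content of their communication.

```python
# A toy illustration (invented data): reassembling a day from nothing but
# transactional metadata scattered across different "databases".

from datetime import datetime

# Hypothetical traces, as different services might log them.
card_payments = [("2016-03-01 08:12", "coffee bar, Amsterdam Centraal")]
transit_checkins = [("2016-03-01 08:25", "metro, Amsterdam Centraal"),
                    ("2016-03-01 17:40", "metro, Amstel")]
phone_cell_pings = [("2016-03-01 12:03", "cell tower, Amstel business district")]

# Merge all traces into one timeline, regardless of their source.
timeline = []
for source, records in [("card", card_payments),
                        ("transit", transit_checkins),
                        ("phone", phone_cell_pings)]:
    for timestamp, place in records:
        timeline.append((datetime.strptime(timestamp, "%Y-%m-%d %H:%M"), source, place))

# Sorting by time turns disconnected traces into a movement profile.
for when, source, place in sorted(timeline):
    print(f"{when:%H:%M}  [{source:7}] {place}")
```

None of these traces was produced with surveillance in mind, which is precisely what 'the disappearance of disappearance' captures.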

Surveillance by Big Data programs, as in dataveillance, mobilizes the promise of a surfeit of data as a means of control. (Corporate) technology is emerging to collect everything about everyone at all times; we know we are being monitored constantly and that our data is being collected. To relate this development back to Rubinstein, Lee & Schwartz (2008): the second approach to data mining, in which everything is collected beforehand, before there is even a suspect, is becoming more dominant. Lyon continues on this notion in his Surveillance, Snowden, and Big Data: Capacities, Consequences, Critique (2014). As he notes, in today's data mining applications "data are obtained and data are aggregated from different sources before determining the full range of their actual and potential uses" (4). It is a preventive approach to national security, characterized by the bulk collection of (meta)data. The problem with this way of working with data is, as Frické argues in The Knowledge Pyramid: A Critique of the DIKW Hierarchy (2009), that it "encourages the mindless and meaningless collection of data in the hope that one day it will ascend to information – pre-emptive acquisition" (136). It all rests on the belief and promise that one day the information will become useful, and it is connected to the politics of preemption.
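
The 'collect first, decide later' logic that Lyon and Frické describe can be caricatured in a few lines. In the hypothetical sketch below (all names and events are invented), nothing is filtered at collection time; every event is stored in bulk, and the question the data is supposed to answer is only defined afterwards.

```python
# A toy caricature of pre-emptive acquisition (invented events): store
# everything first, formulate the query only after the fact.

bulk_store = []

def collect(event):
    """Ingest any event, from any source, without deciding why it matters."""
    bulk_store.append(event)

# Collection phase: no suspect, no hypothesis, no filter.
collect({"source": "email_meta", "from": "x@example.org", "to": "y@example.org"})
collect({"source": "travel", "person": "X", "route": "AMS-IST"})
collect({"source": "banking", "person": "X", "amount": 950, "currency": "EUR"})

# Analysis phase, possibly years later: a query is invented that the data
# was never collected to answer.
def query(predicate):
    return [event for event in bulk_store if predicate(event)]

suspicious_travel = query(lambda e: e.get("source") == "travel" and "IST" in e.get("route", ""))
print(suspicious_travel)
```

At collection time there is no criterion of relevance at all; the hope that the stored events will 'ascend to information' materializes only in a query defined afterwards.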
