Life-mining Children: The Datafied Child and her Digital Shadow

Academic year: 2021

The Datafied Child and her Digital Shadow

Student Name: Sabina Bahisheva

Supervisor: Drs. Lonneke van der Velden

Second reader: Dr. Anne Helmond

Program: M.A. Media Studies

Specialization: New Media and Digital Culture

Abstract

Companies and government agencies are drawn to the growing mass of metadata accumulated from online platforms because it makes it possible to track human behavior online (van Dijck 198). The data that users leave behind on online communication channels allows for the identification of patterns and serves as a predictor of future behavior. Much of the debate surrounding children’s online participation addresses issues of “cyberbullying, stranger danger, and sexting”, whereas less attention is given to data privacy (Brown and Pecora 201). Little research has been conducted on the effects of data surveillance practices on children – when every detail of their lives is recorded and stored online (Lupton and Williamson). As the number of children accessing the Internet grows, tracking activities pose a threat to children’s privacy and their online identity (Lupton and Williamson). The main focus of this research is web tracking and life-mining activities on popular children’s websites in the United States and Russia. The research question is: What type of trackers are at work on popular children’s websites in the United States and Russia, and how do they contribute to life-mining and the data shadow of children? In order to answer the research question, top-ranked children’s websites in the United States and Russia were analyzed. With the help of the Digital Methods Initiative tool ‘Tracker Tracker’, the presence of third parties on the children’s websites was detected. Moreover, the Ghostery database was consulted in order to obtain information about the trackers detected. Analyzing what user data the trackers collect showed how these devices contribute to life-mining, while the trackers’ retention rates and their data sharing with third parties made their contribution to the data shadow visible. The results showed that the majority of the trackers detected belong to the category of Advertising. Moreover, it was found that, on average, the trackers on US websites collect more user data than the trackers on Russian websites. The type of data most often collected by trackers on US and Russian websites is of a technical nature, such as ‘IP Addresses’, ‘Browser Information’, and ‘Hardware and Software type’, followed by more personal data. Additionally, the results showed that tracker companies detected on US websites share more data with third parties than those detected on Russian websites.


Abstract

Tables and Figures

1. Introduction

Outline

2. Theoretical Framework

Dataveillance

Problems surrounding dataveillance

Life-mining and Data shadow

Datafied Child

Parental role

3. Methodology

3.1 Why tracking the trackers?

3.2 Choice of websites

3.3 Tools

Tracker Tracker

Geo IP

3.4 Glossary

4. Case Study I: Tracking the trackers on children’s websites in the United States

4.1 Most common trackers

4.2 Children’s websites with the most trackers

4.3 What type of trackers are most common?

4.4 Server locations of the trackers found

4.5 What type of data is collected?

4.6 Data retention rate of trackers

4.7 Companies’ privacy policies regarding Children

4.8 Data Sharing with third parties

5. Case Study II: Tracking the Trackers on children’s websites in Russia

5.1 Most common trackers

5.2 Children’s websites with the most trackers

5.3 What type of trackers are most common?

5.4 Server locations of the trackers

5.5 What type of data is collected?

5.6 Data retention rate of trackers

5.8 Data Sharing with third parties

6. Discussion

6.1 Life-mining

Type of Data

6.2 Data Shadow

Tracker Companies Retention Rate

7. Conclusion

Tables and Figures

Table 1. The Top 10 of the most common trackers on US websites for children

Table 2. The Top 10 of children’s websites in the United States with the most trackers

Table 3. Type of trackers appearing on children’s websites in the US

Table 4. Server locations of trackers found on children’s websites in the United States

Table 5. Top 10 of the most collected types of data on children’s websites in the United States

Table 6. Data retention rate of trackers detected on children’s websites in the United States

Table 7. Number of trackers detected on children’s websites in the United States with statement about data privacy of children

Table 8. Type of data trackers share with third parties

Table 9. Most common trackers on Russian children’s websites analyzed

Table 10. The Top 10 of children’s websites in Russia with the most trackers

Table 11. Type of trackers on children’s websites in Russia

Table 12. Server locations of trackers found on Russian websites for children

Table 13. Top 10 of the most collected data on Russian children’s websites

Table 14. Data retention rate of trackers on children’s websites in Russia

Table 15. Number of trackers (detected on children’s websites in Russia) with statement about data privacy of children

Table 16. Type of data trackers share with third parties

Figure 1. Categories of Trackers detected on children’s websites in the United States and Russia

Figure 2. The type of Data collected on children’s websites in the United States and Russia

Figure 3. The type of Data Trackers share with third-parties

1. Introduction

The development of new technologies has opened up new pathways to datafy our daily activities, for example, our location, reading or exercise activities, and education (e.g. Khan Academy). Datafication, as Timo Elliott explains, “is about taking a process or activity that was previously invisible and turning it into data […] that can be tracked, monitored, and optimized” (Elliott). This process can lead to both new opportunities and new challenges. Through datafication, our body is extending into the digital world, where data shadows and digital bodies manifest. This data shadow is generated partially with and partially without user consent. As Philip N. Howard argues, “[t]he data shadow follows us almost everywhere[,] [w]e are not always aware of its appearance, but others can observe our silhouette” (Howard). However, as Felix Stalder points out, the data body does not merely follow us; at times, it also precedes us (Stalder). Prior to our arrival somewhere, in either the digital or the physical sphere, our data is measured and classified, and we are met with treatment that fits a profile which supposedly represents us.

Companies and government agencies are drawn to the growing mass of metadata accumulated from online platforms, such as Facebook, YouTube, Twitter, Skype, LinkedIn, iTunes, Gmail, and Hotmail, because of the possibility of tracking human behavior online (van Dijck). On the same note, Viktor Mayer-Schönberger and Kenneth Cukier remark that “[w]e can now collect information that we couldn’t before, be it relationships revealed by phone call or sentiments unveiled through tweets” (Mayer-Schönberger and Cukier). Scholars such as José van Dijck are concerned about how “datafication has grown to become an accepted new paradigm for understanding sociality and social behavior” and propose a more critical look at the entire ecosystem of connective media (van Dijck 198).

The data that users leave behind on online communication channels allows for the identification of patterns and serves as a predictor of future behavior. In their research on activity prediction, Wouter Weerkamp and Maarten de Rijke found a new type of time-aware information extraction that they characterized as life-mining (Weerkamp and de

combined digital trails left behind by people who live a considerable part of their life online” (Weerkamp and de Rijke). This gives rise to the question: to whom is this knowledge useful? Weerkamp and de Rijke believe that activity prediction is an interesting task for police and intelligence services, marketers, event organizers and TV stations (Weerkamp and de Rijke). It is understandable, from a surveillance and marketing point of view, that activity prediction is a powerful tool that gives insight into humans and their behavior. However, it is important to take into consideration that this logic, when applied to human behavior, is “a slippery slope between analysis and projection, between deduction and prediction” (Amoore qtd. in van Dijck 200).

Datafication and life-mining have become the social norm and play an important role in the interconnected knot of sociality, research and commerce. In addition, datafication promotes a culture of dataveillance: a form of continuous surveillance of a person’s online activities through a data trail created by online actions such as web browsing and online purchases (Raley). Overall, dataveillance is used for the purposes of monitoring, identification, tracking, regulation, prediction and prescription (Clarke; Raley). The problem with this type of surveillance of citizens, as van Dijck points out, is that it does not monitor for specific purposes (as regular surveillance does) but instead continuously tracks data for unstated preset purposes (205), often leaving citizens in the dark. Dataveillance is often conducted by “‘first parties’, which are sites the user visits directly, and ‘third parties’, which are typically hidden trackers such as ad networks embedded on most web pages” (Englehardt and Narayanan). The extent of people’s knowledge of and consent to dataveillance varies. Certain forms of surveillance are undertaken voluntarily, for example social surveillance on social media websites (Marwick) and self-surveillance by means of self-tracking devices (Albrechtslund and Lauritsen).

Today, children spend a considerable amount of time online. According to research conducted for Kaspersky Lab about connected kids, “[f]our in ten (44%) admit to being online constantly; ranging from 25% of those aged eight to ten to 61% of 14 to 16 year olds” (Kaspersky Lab). In addition, this research found that the most avid young Internet users are in the United States and Russia, where 83% to 88% of children use it daily (Kaspersky Lab). For children, datafication and dataveillance start at a very early stage of their lives. Life-mining, which is arguably a part of dataveillance, might be a relatively new concept for an average adult, as some parts of his/her life are not datafied. Young children and children in utero, on the other hand, often cannot escape this fate, especially when parents post their child’s life online and use tracking and monitoring devices.

A two-year investigation showed that well-known companies – Viacom, Hasbro, Mattel and JumpStart Games – collected information from their (young) users without consent and allowed third parties to track users’ behavior on the Internet (L. H. Newman). Little research has been conducted on the effects of data surveillance practices on children – when every detail of their lives is recorded and stored online (Lupton and Williamson). As the number of children accessing the Internet grows, tracking activities pose a threat to children’s privacy and their online identity (Lupton and Williamson).

The main focus of this research is web tracking and life-mining activities on popular children’s websites in the United States and Russia. The research question is: What type of trackers are at work on popular children’s websites in the United States and Russia, and how do they contribute to life-mining and the data shadow of children?

In order to answer the research question and to provide structure to this thesis, two sub-questions are posed:

- Does life-mining of children occur more on popular children’s websites in the United States or in Russia?

- How large is the data shadow of children in the United States and Russia?

To answer these questions, the top-ranked children’s websites in the United States and Russia are analyzed. With the help of the Digital Methods Initiative tool Tracker Tracker, which was developed in 2012, the presence of third parties on the children’s websites is detected (Gerlitz and Helmond). The Ghostery database is consulted to obtain information about the trackers detected, in particular the type of data they collect. Trackers are “devices that allow for user-data collection, such as internal tracking devices, bugs, widgets, external analytic services and further interfaces to the cloud” (Helmond). Ghostery is a free privacy extension for the browsers Cliqz, Firefox, Chrome, Opera, Safari, Edge and Internet Explorer, and for mobile browsers on Android and iOS (“Download the Ghostery Browser Extension”). Using the ‘Tracker Tracker’ and the Ghostery database, an overview is created of the tracking activities on children’s websites in the United States and Russia. Furthermore, tracking is briefly discussed in light of the datafication of children and the problems surrounding dataveillance.
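The distinction between first and third parties on which this kind of detection rests can be illustrated with a minimal Python sketch. It is not the implementation of the ‘Tracker Tracker’ or of Ghostery: the page markup, all host names and the naive two-label domain comparison are assumptions made purely for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ResourceCollector(HTMLParser):
    """Collects the src attributes of script, img and iframe tags."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "img", "iframe"):
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def base_domain(host):
    # Naive assumption: the last two labels form the registrable domain.
    # This is wrong for suffixes such as .co.uk; a real tool would need
    # a public-suffix list.
    return ".".join(host.lower().split(".")[-2:])


def third_party_hosts(page_url, html):
    """Return the sorted hosts of embedded resources served from another domain."""
    first_party = base_domain(urlparse(page_url).hostname)
    collector = ResourceCollector()
    collector.feed(html)
    hosts = set()
    for src in collector.sources:
        host = urlparse(src).hostname  # None for relative URLs
        if host and base_domain(host) != first_party:
            hosts.add(host)
    return sorted(hosts)


# Hypothetical page markup; every host name below is made up.
SAMPLE_HTML = """
<html><body>
<script src="https://static.kids-example.com/app.js"></script>
<script src="https://ads.tracker-example.net/tag.js"></script>
<img src="https://pixel.analytics-example.org/p.gif">
<img src="/images/logo.png">
</body></html>
"""

print(third_party_hosts("https://www.kids-example.com/", SAMPLE_HTML))
```

In this sketch, resources served from the site’s own domain or referenced by a relative path count as first-party, while the two made-up external hosts are reported as third parties. Matching such hosts against a curated database of known trackers, as Ghostery does, is a separate step that is not shown here.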

In this thesis, it will be argued that the data shadow is an effect of mining. The life-mining activity collects user data, resulting in a data shadow that follows and sometimes precedes users, while growing over time. Analyzing life-mining provides insight into the content of the tracking activities, while the data shadow analysis gives insight into the scope of these activities. The scope indicates the degree to which the data shadow extends. In order to analyze how trackers contribute to life-mining, information about the type of trackers and the type of data they collect is gathered. Looking at this data provides insight into the content, concretizes life-mining, and gives an idea of what information is collected. In addition, by analyzing the data retention rate, data sharing and privacy policies of the observed trackers, the scope of the trackers, and hence the data shadow, is made visible.
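The two analytical steps described above, content (life-mining) and scope (data shadow), can be sketched as simple tallies over tracker records. The following Python sketch is only an illustration under stated assumptions: the records are invented placeholders rather than findings of this study, and the field names are hypothetical and do not reflect the structure of the Ghostery database.

```python
from collections import Counter

# Placeholder records; names, categories and values are invented for
# illustration and are not findings of this study.
TRACKERS = [
    {"name": "tracker-a", "category": "Advertising",
     "data_types": ["IP Address", "Browser Information"],
     "retention_days": 365, "shares_with_third_parties": True},
    {"name": "tracker-b", "category": "Analytics",
     "data_types": ["IP Address"],
     "retention_days": None, "shares_with_third_parties": False},
    {"name": "tracker-c", "category": "Advertising",
     "data_types": ["IP Address", "Hardware and Software type"],
     "retention_days": 30, "shares_with_third_parties": True},
]


def life_mining_summary(trackers):
    """Content of tracking: tracker categories and collected data types."""
    categories = Counter(t["category"] for t in trackers)
    data_types = Counter(d for t in trackers for d in t["data_types"])
    return categories, data_types


def data_shadow_summary(trackers):
    """Scope of tracking: disclosed retention periods and data sharing."""
    disclosed = [t["retention_days"] for t in trackers
                 if t["retention_days"] is not None]
    sharing = sum(1 for t in trackers if t["shares_with_third_parties"])
    return {"undisclosed_retention": len(trackers) - len(disclosed),
            "longest_retention_days": max(disclosed, default=None),
            "sharing_with_third_parties": sharing}


categories, data_types = life_mining_summary(TRACKERS)
print(categories.most_common())
print(data_types.most_common(1))
print(data_shadow_summary(TRACKERS))
```

Keeping the two summaries separate mirrors the argument of the thesis: the first describes what is collected, the second how long it is kept and how far it travels.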

Although the focus on tracking the trackers in studies is growing, little research has been conducted on the topic of dataveillance and life-mining of children. Additionally, life-mining is a relatively new concept which has not been applied to children’s studies before. This thesis, therefore, contributes to the literature on dataveillance and life-mining by seeking to expose the tracking activities on children’s websites. Moreover, existing studies on children’s online privacy do not compare the tracker behavior on children’s websites in the United States and Russia – the countries where children spend the most time online on average (Kaspersky Lab). The current study aims to address this gap with a qualitative and quantitative analysis. Furthermore, this study also contributes to the literature on children’s online privacy concerns in two ways: firstly, by showing an imbalance of power between users and data collectors; and secondly, by highlighting the shortcomings of the Children’s Online Privacy Protection Act (COPPA) in the United States and the lack of such regulations in Russia. Given the growing concern about online privacy (Rainie), especially that of children, this subject can also be useful for parents, as this research shows how children are being tracked by third parties on popular children’s websites.

This thesis is divided into seven chapters. Following this introduction, Chapter 2 presents the theoretical framework of the thesis, including the theory of life-mining and the concept of the data shadow. Problems surrounding dataveillance and the datafication of children are also presented in that chapter.

Chapter 3 describes and explains the methodology of this research, that is, the way the data was collected and analyzed. Moreover, it explains the choice of focusing on trackers on children’s websites in the United States and Russia, and the parameters of the datasets. Additionally, the research and visualization tools used – the ‘Tracker Tracker’ and ‘Geo IP’ – are explained in more detail. Furthermore, a glossary defines the terms used in this research. The aim of this chapter is to clarify how this research was carried out.

In Chapters 4 and 5, the case study findings are presented, using the methodology described in Chapter 3. Chapter 4 presents the findings regarding children’s websites in the US, while Chapter 5 focuses on those in Russia. Each case study answers the sub-questions of its chapter, taking a closer look at the most common trackers, the websites with the most trackers, and the types of trackers and their characteristics. Both chapters also examine tracker companies’ privacy statements about collecting children’s data.

Chapter 6 links the theory with the empirical research and presents the discussion of the results: the findings are placed in the context of the literature discussed and are used to make observations about the impact of trackers on life-mining and the data shadow of children, and about the differences and similarities in the web tracking of children on websites in the United States and Russia.

In Chapter 7, the contribution of this study to the debate on dataveillance and life-mining of children is elucidated, together with the extent to which the case study findings can be generalized. Moreover, the chapter provides recommendations for future research.

2. Theoretical Framework

Dataveillance

Problems surrounding dataveillance

Nowadays, “the term ‘big data’ is often used in popular media, business, computer science and computer industry” (Manovich 1). According to boyd and Crawford, as well as Lev Manovich, Big Data is poorly defined. As Manovich observes, in computer science Big Data refers to “data sets whose size is beyond the ability of commonly used software tools to capture, manage and process the data within a tolerable elapsed time” (Manovich 1). However, this definition is outdated, as data can now be analyzed on desktop computers with standard software. A more fitting definition is provided by boyd and Crawford: “Big Data is less about data that is big than it is about a capacity to search, aggregate, and cross-reference large data sets” (2).

In the realm of Big Data, as Manovich proposes, people and organizations are divided into three classes: “those who create data (both consciously and by leaving digital footprints), those who have the means to collect it, and those who have expertise to analyze it” (Manovich). The last group has the power to determine the rules: who gets to participate and how the data will be used. One of the problems concerning dataveillance, and Big Data in general, is that “only social media companies have access to really large social data - especially transactional data” (Manovich). In addition, only researchers working for these companies have access to the data, and even that access is often limited, while the rest of the scholarly community has none. Similarly, boyd and Crawford point out that there is a power imbalance in Big Data studies, as researchers and companies have tools and access that are not available to social media users as a whole. Moreover, many people are unaware of what is going on backstage, where a multiplicity of agents and algorithms gather and store user data for future use; this reality is often not part of users’ imagined audience (boyd and Crawford).

The class action lawsuit brought against Google in 2012 illustrates the imbalance between users and data collectors, who differ in how they think about personal data. The suit claimed that Google intercepted, read, and mined the content of Gmail messages; at that time Google had around 104 million users in the United States alone (Rosenblatt). According to the case brought against Google, the information collected was “used to target ads and build user profiles” (Rosenblatt). However, Google believed that because mainstream media covered its data collection practices, people would be sufficiently informed about them. Therefore, according to Google, people who used Google’s email services “implicitly consented to having their correspondence scanned, sorted, and mined, since allegedly the public had widespread knowledge of Google's data mining practices” (Mendoza qtd. in Sugimoto, Ekbia, and Mattioli). Although Google’s defense was rejected, the argument remains alarming. As Sugimoto et al. argue, it is becoming increasingly commonplace to claim that the “use of digital platforms, applications, and services like Google should be based upon an implicitly accepted exchange of personal information for service” (65). On the same note, Declan McCullagh argues that our norms are changing and people are becoming more open. He claims that “[p]articipating in YouTube, […] Flickr, and other elements of modern digital society means giving up some privacy, yet millions of people are willing to make that trade-off every day” (McCullagh).

Additionally, in December 2015, the Electronic Frontier Foundation (EFF), a privacy-focused consumer advocacy group, published a public complaint against Google Inc. and “the privacy practices of Google for Education” (Cardozo and Cope). According to the EFF, Google collected information on students’ browsing behavior, search history, clickstream data, videos watched on YouTube, browser extensions installed, and passwords, regardless of its relation to schoolwork (Cardozo and Cope). In response, Google stated that it was up to the school districts to allow students “to use Chrome Sync or use the app beyond its core education suite” and claimed that the (aggregate) data collected is used to improve Google's products (Peterson). The aforementioned examples concerning Google underline the power imbalance in Big Data. The public complaint of 2015 emphasizes a power imbalance between Google, school administrators, parents and children, as no party besides Google was aware that children’s activities were being tracked. Google has faced multiple lawsuits and investigations on multiple fronts (Kastrenakes; Cardozo and Cope; Sugimoto, Ekbia, and Mattioli). These examples are worth mentioning as they sketch how user data is mined and used by a company that has a global market share of approximately 92%¹ in general Internet search services (“Search Engine Market Share Worldwide”). It can be argued that Google has monopolistic control of user data, as the company collects not only data from users of the search engine, but also data about their interests (based on their Gmail accounts), what they view on YouTube, their location (using data from Google Maps), “a whole array of other data from use of Google's Android phone, and user information supplied from Google's whole web of online services” (N. Newman).

In 2006, a group of researchers from the Berkman Center for Internet & Society at Harvard University gathered the Facebook profiles of 1640 freshman college students in order to study the changes in their interests and friendships over a period of three years (Lewis et al.). The Harvard-based research group released a data set of Facebook profile information of college students from “an anonymous, northeastern American university” (qtd. in Zimmer), which gave other researchers the opportunity for further analysis and exploration. Soon after, other researchers found ways to de-anonymize parts of the data set, which compromised the privacy of the students, who were unaware that they were subjects of this research and who had not consented to the collection of their data (Zimmer). This example raises questions about whether data can remain anonymous, especially if user data is mined and controlled by large companies such as Google and Facebook. Moreover, this incident raised many new ethical questions that concerned not only researchers but society as a whole: is public data on social media sites public? Can it therefore be used without consent? What is the code of ethical practice in this respect? Such incidents fuel privacy advocacy, underlining the need for better privacy protection. However, as danah boyd and Kate Crawford wrote, “the difficulty is that privacy breaches are hard to make specific” (boyd and Crawford). Although the study of data opens up new opportunities for studying communities and societies, any study involving humans raises privacy issues, as it is hard to quantify the risk of abuse of such data (Berry).

¹ Search engine market share by all platforms (desktop, tablet and mobile). Based on the last 12 months (accessed on June 6, 2017).

In 1999, M. Jill Austin and Mary Lynn Reed already foresaw the problem of websites recording data by tracing children's online behaviors, such as the sites they visit and the choices they make on particular webpages. Although some companies claim that they neither share the data gathered on their sites with other advertisers nor require personal information to be provided, they do use cookies (Austin and Reed). Cookies are small text strings that are sent by a Web server to a browser and stored on the user's hard drive (Millett, Friedman, and Felten). They keep track of the number of times a user has visited a site, the frequency with which a user visits various parts of a website or application, the type of computer operating system, the browser used, and the user’s ID. Companies also often collect data for improvement of the site, customization of content, and research and analysis, with the purpose of improving the overall performance of the webpage.
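How a cookie carries an identifier and a visit counter between a server and a browser can be shown with a short Python sketch based on the standard library module `http.cookies`. The header values, cookie names and the made-up domain are assumptions for illustration; they do not reproduce the cookies of any tracker examined in this thesis.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie header values; names and values are made up.
RAW_HEADERS = [
    "uid=abc123; Max-Age=31536000; Path=/; Domain=tracker-example.net",
    "visits=1; Max-Age=31536000; Path=/",
]


def parse_set_cookie(header_values):
    """Parse Set-Cookie header values into a single SimpleCookie jar."""
    jar = SimpleCookie()
    for value in header_values:
        jar.load(value)
    return jar


def count_visit(jar):
    """Increment the visit counter the way a server might on each request."""
    visits = int(jar["visits"].value) if "visits" in jar else 0
    jar["visits"] = str(visits + 1)
    return jar


jar = count_visit(parse_set_cookie(RAW_HEADERS))
print(jar["uid"].value)       # identifier sent back on every visit
print(jar["uid"]["max-age"])  # lifetime in seconds (one year here)
print(jar["visits"].value)    # visit counter after this request
```

The lifetime attribute is what connects cookies to retention: as long as the cookie has not expired, the same identifier is returned to the server on every visit.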

Dataveillance of people raises concerns and questions about human rights in the digital age. In the European Union, a reform of data protection rules was proposed in 2012 (“Protection of Personal Data”). The new set of rules, which will go into effect on 25 May 2018, aims “to give citizens back control over their personal data [... and] allow European citizens and businesses to fully benefit from the digital economy” (“Protection of Personal Data”). Unfortunately, however, current legislation around the world primarily focuses on the protection of the physical self and less so on the protection of the digital self. The protection of the digital self is often carried out through privacy measures. Privacy is “the claim of individuals, groups, or institutions to determine for themselves when, how, and to what extent information about them is communicated to others” (Chai et al.). Advances in new media and computer technology bring a wide array of beneficial developments to our everyday life. However, alongside these developments, threats in cyberspace and privacy breaches continue to grow, developing into critical problems. As Stalder suggests, the concept of privacy no longer works (121), as it has been eroding for a while with the growth of smartphones, TVs, home appliances, self-tracking devices, social networking sites and applications. The privacy bubble is being “pierced by more and more connections to the outside world” (Stalder). However, to be completely absent from databanks “is neither practical nor desirable, as there are cases in which we want “them” to have our data” – in cases of national security or health-related issues (Stalder 122). Moreover, it is important to take into consideration that the notion of privacy is complex and confusing, as there is no neutral conception of it. The concept of privacy varies among people, making it hard to arrive at a clear definition or law on this matter. In addition, it is often challenging for people to discover and legally charge organizations that generate data without their knowledge, which refers back to the power inequality. On a similar note, Daniel Solove argues that it is too difficult for individuals to self-manage their privacy, “to weigh up the costs and benefits of agreeing to terms and conditions without knowing how the data might be used now and in the future, and to assess the cumulative and holistic effects of their data being merged with other datasets” (Solove qtd. in Kitchin 172).

Given that children might be less aware of online risks, consequences and safeguards, they merit specific protection with regard to their personal data. As a representative from the New York Attorney General's Office said, “[a]dults are cognitively more mature about the quid pro quo of visiting a site, […][but] [k]ids 12 and under don't have that understanding” (La). Children of today are exposed to a growing number and range of commercial messages, both online and offline. Various studies show that children do not take adequate care to protect themselves online and engage in risky and inappropriate behavior online (Children’s Commissioner), which, as mentioned earlier, might be harmful to their future. One of the problems, as Eric K. Clemons and Joshua S. Wilson stress, is that “young Internet users engage in online activities that reveal a great deal about the cost to serve them and their willingness to pay for goods and services, which could be used against them by well-informed sellers” (40). Moreover, Clemons and Wilson discuss the dangers of data-mining activities not only by businesses but also by providers of educational applications and services, which “create new risks to preteen and teen privacy” (41). Studies by Google and Fordham University “demonstrate that these practices (data mining of educational application accounts) ignore the explicit preferences of parents and students” (Clemons and Wilson).

The European Parliament and the US government are implementing protection rules that are intended to benefit individuals and businesses and that limit marketing to children under the age of 13. The EU’s General Data Protection Regulation (GDPR) and the US Children’s Online Privacy Protection Act (COPPA) state in general that “processing of personal data of a child below the age of 13 years shall only be lawful if and to the extent that consent is given or authorized by the child's parent or custodian” (Pfeifle). However, given the growing number of technological devices, their growing use by children, and the increasingly young age at which these Internet-connected devices are used, it becomes challenging for controllers to obtain verifiable consent and for governments to track what data businesses are collecting.

Despite US legislation, research shows that there are well-known websites for children (or targeted at children) that collect information from their young users without consent and allow third parties to track users' behavior on the Internet (L. H. Newman). After a two-year investigation, New York Attorney General Eric Schneiderman concluded that Viacom, Hasbro, Mattel and JumpStart Games had violated COPPA. The investigation showed that “third-party vendors used cookies and IP addresses to track kids under 13-years-old, giving them access to some of their personal information without first receiving their parents’ approval” (Addady). Nick Jr., Nickelodeon, Barbie, Hot Wheels, American Girl, Neopets and My Little Pony were among the websites that were illegally tracking children's online behavior. This investigation underlines the power imbalance in Big Data between companies and users, which makes it difficult for parents to stay informed about everything that happens behind the veil of the Internet. Moreover, the violation of COPPA by these child-targeted companies raises questions: Is collecting data of children under the age of 13 acceptable even with parental approval? And whose task is it to protect these children, especially if parents are not educated about the online world and are ill-equipped to carry out this task? In addition, the effectiveness of COPPA has been called into question: once the law was implemented, many social networking sites (e.g. Twitter, Facebook, Instagram, Snapchat, Pinterest, Reddit, Vine) chose to ban children under the age of 13 instead of obtaining affirmative consent from their parents. Research shows that such bans are ineffective, as millions of children under the age of 13 use Facebook (Lenhart et al.). COPPA, which was promoted as a ‘parental empowerment tool’, did lead to a power shift in “privacy control between the state, media industry and parents”: parents were obliged to take greater responsibility for their children (Montgomery).

Interestingly, online privacy regulations in Russia do not include rules on the processing of children's personal data. However, in September 2015, Russia did pass a data localization law (Russian Federal Law No. 242-FZ), which states that companies "collecting personal data of Russian citizens online or offline, are obliged to record, systematize, accumulate, store, update, change and retrieve such data in databases located within the territory of the Russian Federation" (Savelyev). This gives rise to several complications, as Russian law defines personal data broadly, in terms of "the ability to identify among many persons a specific, unique individual" (Henni et al. 12). For example, if a person’s first name is stored but not his/her last name, the data will not be considered personal, as it is insufficient to identify an individual (Henni et al. 12). Moreover, the law does not address whether an email address and phone number count as personal data, nor does it explain how companies are to determine the citizenship of individuals. In addition, the law states that "personal data may be collected, stored and used only with the consent of the data subject" (Henni et al.). With user consent, cross-border and B2B (business-to-business) transmission outside of Russia is permitted if the data is first collected and updated in a database located in Russia and only then transmitted “to third parties for secondary processing outside Russia” (Savitsky).

Life-mining and Data shadow

As mentioned earlier, datafication has become "an accepted new paradigm for understanding sociality and social behavior" (van Dijck). Aspects of social life, such as "friendships, interests, casual conversations, information searches, expressions of tastes, emotional responses, and so on", are increasingly quantified (van Dijck 198). Friending people and liking posts and pictures on Facebook has become an algorithmic relation (Gerlitz and Helmond), exchanging and watching audiovisual content is datafied by YouTube (Ding et al.), and opinions and political leanings are quantified from Tweets (posts on Twitter) and Retweets (reposts of someone else's Tweets on one's own profile) (Wong et al.). As social interactions move to web environments, it becomes possible to datafy and life-mine these activities. Some studies use this data to predict activities in the near future (Szabo and Huberman; Yu and Kak; Asur and Huberman). Weerkamp and de Rijke introduced activity prediction as a particular instance of life-mining. Using user data on Twitter, they tried to "establish a set of activities that are likely to become popular at a later time" (Weerkamp and de Rijke). It can be argued that dataveillance and life-mining are intertwined, as both can be used to monitor, identify, track, regulate and predict users and their behavior. Although most studies focus on the activity prediction instance of life-mining, this thesis looks into life-mining as a whole: it explores the ways in which trackers life-mine children.

Datafied Child

When we speak of (online) identity, the agency of the individual is central. As Tama Leaver argues, there is "a presumption that identity should be controlled, curated and managed by the self in question [...] [and] [i]f the individual is not in control, then that is seen as a problem to be fixed, whether it be through privacy settings, terms of use, rights or access" (151). Children may engage in activities that record details of their lives, but many other actors also do so on their behalf (Lupton and Williamson). Before children are grown and have agency over their online identity, others set the initial framework of this identity and start building a database on the individual from an early age. Pregnancy itself has become a subject of representation and interpretation via digital technologies (Lupton and Thomas). It is common for parents to announce and share their pregnancy with a fetal ultrasound image (Leaver). There are hundreds of apps that claim to aid, to some extent, in the management of pregnancy and children. According to statistics provided by the app analytics and app market data firm ‘App Annie’, the ‘Pregnancy Tracker & Baby Development Calendar’ app is one of the most popular pregnancy apps in the Health and Fitness category (App Annie). This app allows expectant mothers to track their weight, belly growth and the number of times the baby kicks, and it even has a contraction timer. After the baby is born, parents can record and track their baby's sleeping and feeding patterns, as well as the infant's growth and weight. Keeping such records is not new, but previously they were kept on paper and used for medical oversight. Nowadays, this data is also stored in corporate databases, and users are not informed about its use. As Lane DeNicola argues, given the commercial mindset and profit motive of such apps, "the question of what happens to this information now, or in the future, is very unclear" (DeNicola). As children grow, this type of monitoring and tracking can continue with the expansion of wearable digital devices (e.g. GPS and activity trackers).

As mentioned earlier, new media is becoming a fundamental element in the social and cultural development of children (Huk). More and more children, starting from a young age, access social media websites, gaming sites, virtual worlds, video sites and blogs through smartphones, tablets and laptops. For young people, these websites, which have grown exponentially in recent years, are a gateway to entertainment and communication. Once children start accessing websites, playing online games and using apps, their online browsing and searching habits are datafied and surveilled. Currently, Facebook, YouTube and Instagram are the most popular social networking websites (Kallas), and they are frequently an object of scientific research (Vanderhoven et al.; Jordaan and Van Heerden; Aggarwal, Agrawal, and Sureka). The media habits of young children are often overlooked in public discourse, as the focus tends to be on teens and tweens.

Datafication and dataveillance also expand into the educational system. It is possible to monitor not only a child's educational performance and progress but also his/her movement around the school (Taylor). Around the globe, surveillance schools are emerging. These schools are “characterized by an array of routine practices that identify, verify, categorize and track pupils in ways never before thought possible" (Taylor). In her study on surveillance schools, Emmeline Taylor describes how microchipping (radio frequency identification), CCTV cameras, fingerprints and retina scanning are used to track and monitor children’s movements and purchases on school grounds (Taylor). In the United States, "children are monitored not only by commercial companies when they log into software, but also their personal health, wellbeing and education details are tracked by government agencies from early infancy until they start work" (Lupton and Williamson). These initiatives were introduced by the Obama administration with the purpose of using the data to inform educational policies (Lupton and Williamson). With this knowledge, it can be argued that schools are becoming calculable datasets whose data can be analyzed in order to anticipate and predict their futures, and then used to prescribe and enact interventions to pre-empt or prevent identified risks" (64).

Parental role

Compared to previous generations, parents today face new and unique challenges in raising their children in a digitalized culture. Social networking sites such as Facebook, Instagram and YouTube are also becoming useful platforms on which parents can share the joys and challenges of parenthood. Increasingly, it is becoming a social norm for parents to document their children’s lives publicly. There is a lack of research on parents actively sharing information about their children on social media, a gap that Anna Brosch also addressed in her research on the sharenting culture among parents on Facebook. Sharenting is described as "the practice of a parent to regularly use the social media to communicate a lot of detailed information about their child” (Brosch). Parents willingly put a great deal of information about their children online without the children's consent. Research conducted by the Parent Zone on behalf of Nominet shows that the average parent posts about 1000 pictures of a child before the child turns five (Nominet). Sharenting brings with it other risks, such as sexual predators, ridicule by strangers (Klausner) and digital kidnapping, in which a stranger steals baby photos and presents them online as his or her own (O’Neill).

Additionally, when parents share their children’s lives online, they often forget that this content might be inappropriate in the future. As Richard Follett, an ambassador for the Child Exploitation and Online Protection Centre, said: “[n]ot only might these images be used to embarrass them in their delicate teenage years, they could also be accessed by potential employers or university admissions departments” (Daily Mail). Views on the ethics of posting pictures of children differ. Sharenting behavior should be taken into account when considering parents’ awareness of the dangers their child might face online, as well as their capacity to properly educate their children about navigating the digital world safely.

According to a survey conducted by Duggan et al. in 2015, 33% of parents "have had concerns or questions about their child's technology use" (Duggan et al.). Moreover, they found that only 12% of parents have ever felt uncomfortable about something a family member, relative or friend has posted on social media about their child (Duggan et al.). Earlier research, “The 2011 Teens and Digital Citizenship Survey”, showed that parents have mixed feelings about the impact of digital technology on their children (Lenhart et al.). The research involved 799 American teens aged 12 to 17 and their parents. The survey findings showed that, compared to Latinos, whites2 had a more positive view of technology and believed that it helped their child(ren) connect with friends and gain access to information (Lenhart et al.). Similarly, high-income households (earning $75,000 and above), as well as college-educated parents, were more likely to feel positive about the impact of the Internet and cell phones on their kids (Lenhart et al.). The overall findings of Lenhart et al. showed that although parents in general were positive about the influence of digital technology on their children, there was still evident concern. This concern related in particular to the online content to which children might be exposed, the tone of the social world online, and the amount of time that children spend on the web (Lenhart et al.). Interestingly, concern about predators, cyberbullying and privacy is absent from the list. Observing Russian parents, Galina Soldatova and Vladimir Shlyapnikov point out that these parents, too, are more worried about the content found on the Internet than about whom their child might be communicating with online. This results in overprotection of “their children from content risks while underestimating the communication risks” (Soldatova and Shlyapnikov). Additionally, it is important to note that there are no regulations in Russia that prohibit the advertisement of adult content on child-targeted sites. Although websites might not promote such advertisements themselves, the advertisements can still be displayed on their pages without the owner’s knowledge, for instance when a virus hijacks the website’s advertising.

According to the results of the “EU Kids Online II” survey in Russia, in 2010, “children in Russia see […] [sexual images] in the pop-up format 6 times more often than their peers” in Europe (Soldatova et al.). Being exposed to content is a visible process, whereas the datafication and life-mining of children is often invisible. Neither children nor parents are aware of how the child is being quantified and tracked online, as all of this happens backstage. To some extent, this process can be made visible when users install browser extensions that block trackers (e.g. Ghostery and Privacy Badger). With such extensions installed, users can see which trackers (devices that collect user data) follow them on the websites they visit.


Another study, “The McAfee Digital Deception Study 2013”, explored the online disconnect between parents and their children. This study focused on pre-teens, teens and young adults. It found that although young people are highly aware of the risks of sharing their personal data online, they act against their own judgement. The McAfee study showed that young people between the ages of 10 and 23 post personal data online while their parents remain in the dark (Eichorn). This parental unawareness stretches further, as only 9% of parents are aware of their child witnessing cruel behavior online and only 6% are aware of their child being a victim of cruel behavior (Eichorn). Additionally, research conducted by danah boyd et al. showed that the majority of parents prefer governmental policies that provide information or guidance on age limits for the use of websites and online services over policies that create restrictions. Furthermore, 35% of parents did not want the government involved in any way (Boyd et al.).

In their article, “Faceless: Chasing the Data Shadow”, Manu Luksch and Mukul Patel argue that "[p]hysical bodies leave data traces: shadows of presence, conversation, movement" (Luksch and Patel). However, as life-mining is described as a concept that extracts information, it can be argued that the data shadow that results from it is more vivid than the one that results from surveillance alone, as mining is arguably done with a purpose. Luksch and Patel claim that these data shadows form data bodies, "whose behavior and risk are priorities for analysis and commodification, by business and by government" (Luksch and Patel). The idea behind securing the data body is allegedly to secure the human body, “either preventatively or as a forensic tool” (Luksch and Patel). However, the examples discussed above, namely Google, the research at Harvard University, the lawsuit against well-known companies (Viacom, Mattel, Hasbro and JumpStart), sharenting, the expansion of dataveillance into everyday life, and the power imbalance between user and data collector, suggest that the security of the data body is not assured, particularly not for the benefit of the individual (the data provider). On top of that, the security of children’s data bodies is a cause for alarm, especially when online privacy regulations have flaws (COPPA) or when no regulations exist at all (as in Russia). Lupton and Williamson claim that it is not only parents, caregivers, family members, friends, teachers and healthcare providers who aid in the datafication of children, but also commercial entities who seek to “capitalize on and profit from children’s personal information” (Lupton and Williamson). As a result, children are becoming increasingly datafied via mobile and wearable devices, educational software and social networking sites. Given that the online population of children has been growing, along with the time they spend online, it is interesting to observe how children are life-mined (“Home Computer Access and Internet Use”). By tracking the devices that collect user data on children’s websites, this thesis makes life-mining and the data shadow, concepts which are difficult to grasp, more concrete.


3. Methodology

More and more children rely on digital tools, social platforms and online services to learn, communicate, socialize with friends and family, play, or work (Third et al.). According to the study led by the Institute for Culture and Society at the University of Western Sydney, with support from Harvard University and UNICEF, “[c]hildren around the world increasingly think of access to digital media as a fundamental right” (Third et al.). Much of the debate surrounding children’s online participation addresses issues of “cyberbullying, stranger danger, and sexting”, whereas less attention is given to data privacy (Brown and Pecora). As discussed in Chapter 2, Internet users are under constant dataveillance by entities that track their online behavior. Given the topic of life-mining, it is interesting to see which trackers are at work on children’s websites, as data collection from this young demographic will have a great impact on their online and offline lives as they get older. Mary Madden, senior researcher for the Pew Research Center's Internet Project, stated: “[i]n many ways, teens represent the leading edge of mobile connectivity, and the patterns of their technology use often signal future changes in the adult population” (Madden et al.). Therefore, dataveillance of young children and teens is particularly interesting for companies and organizations. Conversely, bringing web tracking into view can make parents aware of how their children are being monitored when they browse (children's) websites.

Given the difference between the United States and Russia in the legal regulation of data collection from children, and the fact that these two countries have the highest percentages of young Internet users (Kaspersky Lab), it is interesting to see whether particular trackers are at work in these two nations. The wider aim of this research is to contribute to the debate on life-mining with respect to children, and to provide some insight into the ecology of online tracking of children in the United States and Russia.


3.1 Why track the trackers?

Through cookies, the referer header and other tracking technologies, third parties are able to uniquely identify users and obtain their browsing histories (Englehardt and Narayanan 1389). Additionally, third parties may have access to users’ location (the first-party websites they visit) and obtain other sensitive information (e.g. email and IP address) via the referer header (Englehardt and Narayanan 1389). However, according to Cliqz, a privacy-centric browser and search company based in Germany, trackers “do not necessarily, by default invade the users' privacy” (Cliqz). As mentioned before, trackers can play an important role for websites, as the data they return can improve a website's usability and user experience. Often, however, the implementation of trackers brings with it “undesired side-effects for users and websites” (Cliqz). Tracking is most often outsourced to third parties, for example for measurement or for reasons of convenience (Cliqz). In their “Anti-Tracking” report, Cliqz argues that "very often the information that the site itself receives is not critical on its own, however once combined across different web-sites (very often outside the scope of a single one published) it could potentially become very privacy sensitive profiling information” (Cliqz). For example, health information stored in pregnancy apps is not critical but is very sensitive, especially if it is combined across websites and linked to profiles on social networking sites. In this example, critical data can be understood as confidential data: if compromised, it could "result in significant and/or long-term harm to the [...] individual whose data it is". Arguably, the main difference between confidential and sensitive data is "the likelihood, duration and the level of harm incurred", as both data types must remain confidential by law (“What Are the Data Classifications”).

As Yu et al. argued in their study “Tracking the Trackers”, "[t]he [w]eb, as of today, has evolved to become an ecosystem in which users, site-owners, network providers and trackers coexist” (Yu et al.). Eliminating the trackers from this ecosystem would therefore have consequences, as Internet advertising revenues in the United States totaled $59.6 billion in 2015, which was $10.1 billion higher than in 2014 (Silverman). In Russia, Internet advertising revenue ($1.73 billion) is lower than in the United States, yet it is the largest in Central and Eastern Europe. PricewaterhouseCoopers (PwC), “a company that provides assurance, tax and advisory services to clients in 158 countries”, predicts that advertising revenue will reach $3.66 billion in 2020 (PricewaterhouseCoopers). Given this, “[i]f all 3rd party traffic were to disappear or be blocked overnight, there would be significant disruption to the ecosystem” (Yu et al.). Nonetheless, users' loss of privacy cannot be justified by revenues, as privacy is a fundamental human right that needs to be protected (Warren and Brandeis).

Previous studies have dealt with concerns surrounding online privacy: consent that is not clearly informed, unauthorized collection of users’ personal information, and the use of this data by third parties (Bergström; Woo; Smith, Milberg, and Burke; Wirtz, Lwin, and Williams). Moreover, studies have been conducted on the challenges of determining the boundaries of tracking children through digital devices with regard to their privacy (Gelman et al.; Simpson). Additionally, researchers have examined the practices of online tracking, profiling and targeting of children by advertising companies (Cai and Zhao; Story and French). However, little research has been conducted on tracking the devices that track children online. The ‘Tracker Tracker’ tool has been used to analyze which trackers are at work in specific issue spaces (e.g. on news sites, privacy advocate sites, adult industry, technology and science blogs, children's sites, addictions) (Beyer et al.). However, the attention Beyer et al. gave to children’s websites was very limited, as they only analyzed Alexa’s ‘Top 10 Children Pre-School sites’. In addition, the tool has been used to track trackers on Dutch governmental websites (van der Velden) and to visualize the relative presence of Facebook, Twitter and Google trackers in the top 100 Alexa websites, in order to reveal the alternative fabric of the web that these websites are creating (Helmond and Gerlitz). Another study that involved tracking the trackers was conducted at Stanford University; it focused on how user data, such as usernames and email addresses, is leaked to third parties (Mayer).

A group of Princeton researchers studied the top 1 million websites, as listed by Alexa, and conducted "the largest and most detailed measurement of online tracking […] to date" (Englehardt and Narayanan). Steven Englehardt and Arvind Narayanan, authors of the Princeton study, found that many websites use fingerprinting, a “tracking technique executed by running a code that requests a user's device to generate a unique set of information, and based on that information, creates a unique fingerprint for that particular user" (“Browser Fingerprinting Census”). With this tracking technique, anonymous users can be identified on the basis of the unique fingerprint of their hardware and software (Englehardt and Narayanan). Additionally, Englehardt and Narayanan’s research showed that tracking on websites is concentrated in a small number of large third parties. Their findings showed that "all of the top 5 third parties, as well as 12 of the top 20, are Google-owned" (Englehardt and Narayanan).
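The principle behind fingerprinting can be illustrated with a minimal sketch. It is not the measurement method of the Princeton study, and the attribute names and values (user_agent, screen, timezone, fonts) are hypothetical examples chosen for illustration. The sketch only shows how a set of device properties, none of which identifies a person on its own, can be combined into one stable identifier without storing a cookie.

```python
import hashlib
import json


def device_fingerprint(attributes: dict) -> str:
    """Combine device attributes into a single identifier.

    The attributes are serialized in a fixed key order so that the same
    device always yields the same hash, regardless of input order.
    """
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute values, assumed for illustration only.
first_visit = {
    "user_agent": "ExampleBrowser/1.0",
    "screen": "1366x768",
    "timezone": "UTC+3",
    "fonts": ["Arial", "Verdana"],
}
second_visit = dict(first_visit)                       # same device, later visit
other_device = dict(first_visit, screen="1920x1080")   # one attribute differs

same_user = device_fingerprint(first_visit) == device_fingerprint(second_visit)
different_user = device_fingerprint(first_visit) != device_fingerprint(other_device)
```

In this sketch a returning device is recognized because its attributes have not changed, while a device that differs in a single attribute receives a different identifier.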

As mentioned before, tracking the trackers gives insight into what happens in the back-end when a user visits a website. Tracking the trackers on popular websites in the United States and Russia makes visible how children are being life-mined, and by whom. Additionally, the information about the detected trackers is used to illustrate the data shadow that results from life-mining activities.

3.2 Choice of websites

In order to compare the presence of trackers on child-targeted websites in two national spheres, two URL lists were compiled from websites in the United States and Russia. The two URL lists are limited to 100 webpages each.3 For both the United States and Russia, the lists are based on the ‘Alexa top 500 sites’ on the web. Alexa Internet, Inc. is an American company which "specializes in the commercial web traffic data and analytics” (Kurian). Alexa’s traffic estimates and ranks are based on data from its global data panel, “a sample of millions of Internet users using one of many different browser extensions” (“About Us”, Alexa). Alexa's top sites rank is calculated using the average daily visitors and pageviews over the past three months (“About Us”, Alexa). Additionally, the majority of the data is collected through websites that have installed the Alexa script and certified metrics (“About Us”, Alexa). However, Alexa does allow website owners to manage the privacy settings of their certified metrics (“About Us”, Alexa).

For the compilation of the US URL list, the Kids and Teens category was selected and narrowed down to Pre-School, which contained a list of 217 websites. The Pre-School ranking list also included webpages from countries other than the United States. Therefore, for this case study the URL list was narrowed down to the top 100 results which exclude the non-US websites.

3 This is also convenient, since the ‘Tracker Tracker’ tool, which was used for analysis, can only process a maximum of 100 web pages at a time.

For the compilation of the Russian URL list, Alexa’s top sites in Russia were used. Similarly, the Kids and Teens (Дети и Подростки) category was selected, which yielded 468 results. However, the majority of the websites in the Russian Kids and Teens category were not intended for children4. Therefore, the Russian URL list was also compiled with the use of the ‘Rambler's top 100’ ranking (‘Рамблер/Топ100’) website. Rambler (Рамблер) is a popular search engine and web portal in Russia, although not as big as Yandex (Vladimirovna). Yandex (Яндекс) is Russia's largest search engine, which had 55.4% of the country’s search market share in 2016 (Gesenhues). Yet Yandex’s top website lists are not suitable for a comparison between the United States and Russia. This is because Yandex’s ranking is based on the ‘Citation Index’, which is “determined by the number and quality of your site’s backlinks, with quality being influenced by relevance of the context, geographical location of the linking site and the Quotation Index of your incoming links" (Gabdulkhakova). Like Alexa’s, Rambler’s ranking system is based on daily visitors and pageviews (“Family and Children”). Rambler’s top 100 could not be used for the compilation of the US URL list, as it only provides "the rating for various Russian-language Internet sites" (Elias and Zeltser-Shorer).

For the selection of top-ranking children's websites on Rambler, the Family and Children (Семья и дети) category was selected first, followed by the sub-category Children's entertainment (Детские развлечения). As the resulting list of webpages also included websites targeted at parents, these were filtered out of the final Russian URL list for the research. The websites retrieved from Rambler were a good addition to the list of websites from Alexa, as both lists included children’s sites that were popular in Russia at the time of research.

The URL lists were compiled on 10 April, 2017. In addition, all the URLs were checked for content manually, as some links appeared to be broken. Furthermore, it appeared that the top-ranked sites in the Children category of both Alexa and Rambler included websites that were hardly child-related and not intended for children. Given the nature of this research, it was important to select websites or webpages which children could navigate themselves. Therefore, websites targeted at parents were also excluded from the list. Although the URL lists on Alexa and Rambler included websites that were not for children, both provide lists of popular websites and use the same type of ranking system, and were thus useful for this research.

3.3 Tools

Tracker Tracker

The ‘Tracker Tracker’ tool was used to detect the presence of third-party trackers on the US and Russian websites for children. The ‘Tracker Tracker’ was developed in 2012 by the Digital Methods Initiative at the University of Amsterdam and is built on top of the privacy browser plugin Ghostery (Gerlitz and Helmond). Ghostery scans a webpage for third-party elements (trackers) and matches them against its database of over 2,200 trackers (“Ghostery”). The detected trackers are displayed in the upper right corner of the browser. In the drop-down control panel, users can click on the trackers that trace them and read a short description of each tracker's provider (“Ghostery”). Users can click the ‘learn more’ link to view details about the tracker and the data it collects (“Ghostery”). Additionally, users can choose which trackers to block.

Ghostery classifies the trackers into eight categories: Advertising, Site Analytics, Social Media, Customer Interaction, Audio Video Player, Comments, Essential and Adult Advertising (“What Are the New Tracker Categories?”). No trackers from the Adult Advertising category were detected in this research.

When a list of URLs is entered into the ‘Tracker Tracker’, the tool detects predefined ‘fingerprints’ of cloud devices. The ‘Tracker Tracker’ categorizes the trackers according to Ghostery’s classification. The tool’s output includes the number of trackers found on the given website(s), the names of the trackers and the types of trackers.
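The detection principle described here, matching page content against a database of known tracker patterns, can be sketched as follows. This is a simplified illustration and not the actual code of the ‘Tracker Tracker’ or Ghostery: the tracker names, domains and regular expressions are hypothetical, and the real database holds over 2,200 entries.

```python
import re
from collections import Counter

# Hypothetical pattern database: each entry pairs a tracker name and a
# Ghostery-style category with a pattern that identifies its script.
TRACKER_PATTERNS = [
    {"name": "ExampleAnalytics", "category": "Site Analytics",
     "pattern": r"analytics\.example\.com/collect"},
    {"name": "ExampleAds", "category": "Advertising",
     "pattern": r"ads\.example\.net/"},
]


def detect_trackers(page_source, patterns=TRACKER_PATTERNS):
    """Return (name, category) for every pattern found in the page source."""
    return [
        (entry["name"], entry["category"])
        for entry in patterns
        if re.search(entry["pattern"], page_source)
    ]


def count_categories(detected):
    """Count how many detected trackers fall into each category."""
    return Counter(category for _name, category in detected)


# Hypothetical page source containing two third-party scripts.
sample_page = (
    '<script src="https://analytics.example.com/collect.js"></script>'
    '<script src="https://ads.example.net/banner.js"></script>'
)
detected = detect_trackers(sample_page)
per_category = count_categories(detected)
```

The sketch mirrors the three elements of the tool's output: the number of trackers per page, their names and their types. It also shows the limitation noted for any database-driven approach, namely that a tracker without an entry in the pattern list goes undetected.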

Several things need to be taken into consideration when using the tool. Firstly, the ‘Tracker Tracker’ database was last updated on 24 March, 2017. Given that the tool was used for analysis on 10 April, 2017, it might not include the newest trackers. Secondly, as the tool is not able to click automatically and therefore cannot accept a cookie notification, it will not be able to retrieve trackers beyond the cookie wall (“Tracker Tracker”). However, this is more relevant to European websites, as they are required by law to notify visitors about the presence of cookies and to obtain their consent “to store or retrieve any information on a computer, smartphone or tablet” (Vassilo, Metwalley, and Giordano). Russian and US-based websites do not have to comply with this EU law. In any case, this research focuses on automated tracking devices, which do not require active participation from users through widgets and social buttons. This does not mean that these widgets and social buttons do not track users online. In his essay “Facebook Tracks and Traces Everyone: Like This!”5, Arnold Roosendaal states that the Facebook Like button, besides being a nice feature for content providers, is also “used to place cookies and to track and trace web users, regardless of whether they actually use the button” (Roosendaal). Therefore, the ‘Tracker Tracker’ may still detect trackers from these tools. However, when a user interacts with a widget or social button, this might bring forth additional trackers from different sources.

5 This paper was a work in progress. The final version was published in 2012, as “We Are All Connected to Facebook…by Facebook”, in: S. Gutwirth et al. (eds.), European Data Protection: In Good Health?, Heidelberg: Springer (2012), pp. 3-19.

Thirdly, given that “the tool is not being run on a real browser but using a scripting language called phantomjs”, it may fail to load some content accurately (“Tracker Tracker”). Fourthly, some webpages might “load different elements based on IP, device fingerprinting or randomly” (e.g. an advertisement carousel) (“Tracker Tracker”). Lastly, at the time of writing this thesis, Ghostery was acquired by Cliqz, a German start-up company which builds an anti-tracking browser with built-in private search (Lomas). So far, besides the regular tool updates, changes have only been made to the format of the ‘Tracker Tracker’ tool. However, according to Emile den Tex (a programmer for the Digital Methods Initiative), anti-tracker software is moving away from maintaining a database of trackers and looking for the presence of those trackers on websites, towards a smart (AI-algorithmic and user-collaborative) filter which tries to prevent the browser from sending personally identifying information to other servers (personal communication, March 24, 2017). This may affect how the ‘Tracker Tracker’ can be used in the future. However, neither the acquisition of Ghostery by Cliqz nor the future prospects of anti-tracker software affect the research conducted for this thesis, and the ‘Tracker Tracker’ is still useful at the time of this research.
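The database-driven approach mentioned above (a list of known trackers checked against what a page loads) can be sketched roughly as follows. This is a minimal illustration in Python and not the actual implementation of Ghostery or the ‘Tracker Tracker’; the pattern table, the function name `detect_trackers` and the example URLs are assumptions made for this sketch.

```python
import re

# Hypothetical tracker database: tracker name -> URL pattern.
# These patterns are illustrative assumptions, not Ghostery's actual entries.
TRACKER_PATTERNS = {
    "Google Analytics": re.compile(r"google-analytics\.com/(analytics|ga)\.js"),
    "Facebook Connect": re.compile(r"connect\.facebook\.net/"),
}


def detect_trackers(resource_urls):
    """Return the sorted names of trackers whose pattern matches any requested URL."""
    found = set()
    for url in resource_urls:
        for name, pattern in TRACKER_PATTERNS.items():
            if pattern.search(url):
                found.add(name)
    return sorted(found)


# URLs a page might request while loading (made-up example input).
requests_made = [
    "https://www.example.com/static/site.css",
    "https://www.google-analytics.com/analytics.js",
]
print(detect_trackers(requests_made))
```

A lookup of this kind only finds trackers that are already listed in the database and that are actually requested during the page load, which is consistent with the limitations described above.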

Prior to tracking the trackers with the DMI tool, the output settings were tested. Firstly, the ‘Tracker Tracker’ tool was set to analyze only the specified pages, i.e. the US and Russian URL list. Afterwards, the settings were changed so that the output also included subpages (one for each URL). The test results showed that subpages had fewer trackers than the specified pages (homepages). Moreover, the DMI tool did not detect any trackers on subpages that were not already present on the corresponding homepages. The only exception was the US website Thekidzpage.com, for which the output with one subpage yielded more trackers than the homepage. These additional trackers were included in the final result. Aside from this, it was decided to focus the analysis of tracker activity on the specified pages.
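The comparison between homepage and subpage output can be expressed as simple set operations. The sketch below is an illustration only: the function name `compare_trackers` and the tracker names are hypothetical and do not come from the thesis data.

```python
def compare_trackers(homepage, subpage):
    """Compare two collections of tracker names found on one website."""
    home, sub = set(homepage), set(subpage)
    return {
        # Trackers seen only on the subpage would be added to the final result.
        "only_on_subpage": sorted(sub - home),
        "only_on_homepage": sorted(home - sub),
        "combined": sorted(home | sub),
    }


# Made-up tracker names for a hypothetical website.
result = compare_trackers(
    homepage=["Tracker A", "Tracker B"],
    subpage=["Tracker B", "Tracker C"],
)
print(result["only_on_subpage"])
```

An empty `only_on_subpage` list corresponds to the common case reported above, in which the subpage adds no new trackers.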

The ‘Tracker Tracker’ output was arranged in ‘OpenRefine’, an application for data cleanup. The columns were rearranged and exported as a Microsoft Excel file. Using the Ghostery database, it was possible to obtain company information about the trackers. This gave insight into their privacy policies, data retention periods, data sharing policies with third parties, and the data they collect. By visiting the trackers’ websites and reading through their privacy policies, information was collected about whether these trackers make statements about the possible unintentional collection of data from children.

Results showed that not all websites analyzed have trackers. Therefore, websites on which no trackers were found are not included in the analysis and network visualization. The following websites do not have trackers6:

http://www.starfall.com/
http://www.Paulysplayhouse.com
http://www.Keacoloringbook.com
http://www.Goobo.com
http://meddybemps.com/funandgames.html
http://Heroesleague.ru

6 This was double checked using the Ghostery privacy browser extension for Chrome - no trackers were found on the websites mentioned.


Geo IP

The ‘Geo IP’ tool was used to resolve the URLs of the trackers found on the US and Russian websites to a geographic location. The ‘Geo IP’ tool uses Maxmind's GeoCity Lite database (last updated on February 07, 2017) to resolve URLs or IP addresses to a geo-location (“URL and/or IP to GEO”). By entering the website of the company to which the tracker belongs, the tool is able to find the site's primary IP address. For example, in order to estimate which jurisdiction the ‘Google Analytics’ tracker falls under, the tracker's webpage (https://analytics.google.com/) is used as input in the Geo IP tool. This, however, is merely an indication, as companies can host their software on different servers around the world.
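The two steps described above (URL to IP address, IP address to location) can be sketched as follows. This is a hedged illustration and not the DMI tool's code: `GEO_TABLE` is a hypothetical stand-in for a geolocation database, the function names are invented, and the country codes attached to the documentation-only IP ranges are made up.

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Hypothetical stand-in for a geolocation database: network -> country code.
# 192.0.2.0/24 and 198.51.100.0/24 are documentation-only ranges; the
# country codes assigned to them here are invented for illustration.
GEO_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): "US",
    ipaddress.ip_network("198.51.100.0/24"): "RU",
}


def hostname_of(url):
    """Extract the hostname from a URL such as https://analytics.google.com/."""
    return urlparse(url).hostname


def locate(url, resolve=socket.gethostbyname):
    """Resolve a URL to one IP address and look up its country in GEO_TABLE.

    `resolve` can be swapped for a stub so the sketch runs without network access.
    Returns (ip, country), with country None when no network in the table matches.
    """
    ip = resolve(hostname_of(url))
    address = ipaddress.ip_address(ip)
    for network, country in GEO_TABLE.items():
        if address in network:
            return ip, country
    return ip, None


# Offline usage with a stub resolver (the returned address is invented).
print(locate("https://tracker.example/", resolve=lambda host: "192.0.2.10"))
```

Because a hostname is resolved to a single address at one moment in time, the result shares the caveat noted above: it indicates where one server is registered, not where all of a company's infrastructure is located.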

3.4 Glossary

This glossary is intended to provide a useful set of definitions for the terms used in this research. As the characterization of the trackers in this research is based on the output of the ‘Tracker Tracker’ tool (and hence, Ghostery), the following glossary provides a more elaborate definition of the terms in order for the reader to better understand the different types of trackers and what they do. Additionally, the type of data that trackers collect when users visit a website is also better explained under the headers: ‘Type of Data’ and ‘Specified Data’.

Types of Trackers

Advertising The advertising (ad) web tracker is commonly used “to provide advertisement companies with information about web users” (Meuwese). The purpose of the tracker is to collect data components that might reflect a user’s intention and motivation for accessing the webpage. Moreover, the ad tracker provides “advertising or advertising-related services such as data collection, behavioral analysis or retargeting” (“What Are the New Tracker Categories?”).
