
Bachelor Informatica

Finding life trends using data analysis in Cookery

Dennis Kruidenberg

9th June 2017

Supervisor(s): Adam Belloum, Mikolaj Baranowski

Informatica
Universiteit van Amsterdam


Abstract

With the increasing amount of data, the importance of data analysis has grown. A large amount of this data has shifted to cloud-based storage. The cloud offers storage and computation power. The Cookery framework is a tool for building applications in the cloud for scientists without a complete understanding of programming. The framework does not yet have any programs that prove it works. This thesis looks at a use case to show that it is possible to build a data analysis pipeline using the Cookery framework. The use case covers topics such as cloud programming, SaaS, oAuth and machine learning. It gives a user a way to see life trends based on email data by visualising the information in graphs. Fellow students are also working on Cookery projects called 'Cookery in Amazon Lambda' and 'Cookery as an If This Then That alternative'. Together, these three projects build the foundation of the Cookery ecosystem.


Contents

1 Introduction
  1.1 Research question
  1.2 Goal of the project

2 Related Work
  2.1 The Cloud
    2.1.1 Cloud Computing
    2.1.2 Software as a Service
    2.1.3 REST
  2.2 oAuth
    2.2.1 Technical aspects
  2.3 Machine Learning
    2.3.1 Techniques
    2.3.2 Combining machine learning with cloud computing

3 Cookery
  3.1 How does Cookery work?
  3.2 How is Cookery useful?

4 The extension of Cookery
  4.1 The use case
  4.2 Implementation and Evaluation

5 Conclusion
  5.1 Conclusion
  5.2 Discussions
  5.3 Future work

Bibliography


CHAPTER 1

Introduction

The world of data is growing faster than ever, doubling every year. In 2020 it is expected that 40 trillion gigabytes of data will be created, replicated or consumed [1]. This is 5200 gigabytes per person. Hidden in this huge amount of information (usually referred to as Big Data) is a lot of information that is not apparent at first sight. This data can reveal unexpected connections, optimise business operations or even expose the habits of people. Data analysis is required to extract this information. New data analysis techniques are constantly being developed and improved. Methods such as machine learning and neural networks are regularly in the news as they tackle ever larger problems. But for the average person, data analysis is out of reach. This is in contrast with other computing topics such as setting up a network or building a website. Data analysis can and should be made easier too.

Another development in the consumer computer world is the cloud. Instead of keeping data on your personal computer, everything from photos to documents is stored in the cloud. Although most people won't know exactly what the cloud is, they are widely adopting it. The cloud is not only useful for storage; it can also be used to compute and run applications. This is called cloud computing and has many applications in science and industry. One of these applications is Cookery. Cookery is a framework for designing Domain Specific Languages for scientific applications in a cloud environment. The framework was designed by the PhD student Mikolaj Baranowski at the University of Amsterdam and is the starting point of this project.

1.1 Research question

When a scientist runs an experiment, it can generate a lot of data. This data needs to be analysed in order to answer research questions. For people without a computer science background, this can be difficult: it requires programming, which has a steep learning curve. Cookery tries to tackle this problem by lowering the barrier to data analysis. In this project Cookery will be extended to help scientists without programming skills to program.

Personal challenges of this project are understanding and applying new principles such as data analysis, cloud computing and Cookery. These three elements will be combined into a working product that scientists can use. The research focuses on how to handle data in the Cookery framework. My main research question is: "How to define a data analysis pipeline using Cookery?". A data analytics pipeline includes a number of data processing stages. It starts by reading the user's data sets and generates results, which can take various forms (graphs, text, images, etc.).


1.2 Goal of the project

The Cookery framework is only a proof of concept at this point in time. It is functional, but has only a few programs to prove this. During this project I will extend the framework with the ability to use data analytics in cloud services. This extension will focus on handling data and on how a data pipeline can be constructed. This will display the usability of Cookery and help improve its adoption by the scientific community. In order to show the functionality of the data pipeline, a use case will be implemented. The use case is about analysing Gmail in order to visualise life trends (more in chapter 4). This use case will show what is possible in Cookery and will illustrate the use of the extension developed in this project.


CHAPTER 2

Related Work

This chapter covers the technical aspects related to this project. Cookery is a cloud-based application, so cloud computing and Software as a Service are concepts that are directly connected to Cookery. The communication between the client and the server is done through HTTP commands via REST APIs. The Cookery application uses this to interact with the Google services needed in the use case: the machine learning API and the Gmail server, both of which need oAuth to verify the user.

2.1 The Cloud

The cloud is becoming increasingly popular. Due to the improved usability of the products offered in the cloud, consumers are widely adopting it. The cloud consists of servers that are accessible via the internet. These servers provide services such as reading your email in Gmail or viewing photos in Dropbox. With the adoption of the cloud, more complex services such as cloud computing are also becoming more important. Cookery is a cloud-based application: it can process and analyse data without the application running on your personal computer.

2.1.1 Cloud Computing

The cloud is not only a place for storing photos or files; it is also a place for programs to execute code. According to the National Institute of Standards and Technology, cloud computing is defined as: "Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction." [2] This means that there are resources in the network that can do computing for the user.

There are several advantages to cloud computing. The first advantage is that businesses can outsource their IT systems and let them run in the cloud [3]. The effect of outsourcing is that businesses do not have to maintain their own systems; a specialised company can do it more efficiently, because the resources can be shared and are only fully used during peak operations. The company can run its IT cheaper in the cloud than it could locally [4]. Secondly, a considerable advantage is that a simple device such as a mobile phone can be used to do more complex operations, which effectively gives the mobile phone more performance. Home assistant devices such as Google Home or Amazon's Echo are also becoming more popular [5]. These devices do only a small fraction of the computation and storage locally; everything else is done through interaction with cloud services [6]. This makes them very versatile. Finally, cloud computing offers great accessibility. The information can be accessed from anywhere with an internet connection. This makes it location independent and creates greater flexibility. However, cloud computing also has disadvantages. Since the data is outside of your reach, in the hands of another company, there needs to be trust, especially when it concerns sensitive data.


2.1.2 Software as a Service

Software as a Service (SaaS) is a principle used in cloud computing. The software is offered directly to the customer via the internet, usually through the web browser. The customer does not own the software: SaaS focuses on separating the possession and ownership of software from its use [7]. SaaS is the part of cloud computing that refers to the software that runs in the cloud [4]. The use of SaaS requires communication between the client and the service in the cloud. This is usually accomplished using the standard HTTP protocol [8]. Applications can also interact with a SaaS: for example, the email client on your computer exchanges information with the email servers via an Application Programming Interface (API).

2.1.3 REST

Communication between the client and the server is necessary to make SaaS functional. One of the most basic principles is Representational State Transfer (REST). Many applications make use of REST. REST is not a protocol like HTTP, but rather an architectural style [9]. This means that every application can have a different implementation of REST, as long as they all follow the same style. REST is stateless: every request stands on its own and no client state is stored on the server.

REST uses the standard HTTP protocol. HTTP commands such as POST, HEAD, PUT, GET and DELETE indicate what action needs to be taken. These commands map to actions such as Create, Read, Update or Delete (CRUD). What an action exactly does is not standardised in the REST style, but is defined by the developer of the service that implements it. So REST is a way for an application and a server to communicate by exchanging HTTP requests. An Application Programming Interface (API) can also use REST in order to communicate with a SaaS. The API is usually provided by the service and allows other programs to use the functionalities of the service.
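As a minimal sketch of this CRUD mapping, the Python snippet below (using the requests package) sends the four basic request types to a hypothetical service; the URL and the 'notes' resource are placeholders, not part of any real API.

import requests

BASE = 'https://api.example.com'  # hypothetical REST service

# Create a resource (POST), read it (GET), update it (PUT), delete it (DELETE)
created = requests.post(BASE + '/notes', json={'text': 'hello'})
note_id = created.json()['id']    # assumes the service returns the new id
requests.get(BASE + '/notes/' + note_id)
requests.put(BASE + '/notes/' + note_id, json={'text': 'updated'})
response = requests.delete(BASE + '/notes/' + note_id)
print(response.status_code)       # every response carries an HTTP status code

Because REST is stateless, each of these requests carries everything the server needs; no session state is kept between them.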

2.2 oAuth

In the use case there is interaction between Cookery, Gmail and the machine learning API. These services require verification before they hand out private data. This verification process is done through oAuth.

2.2.1 Technical aspects

The main priority of oAuth is security, since it is a standard for delegating access without sharing passwords. This results in a complex system to validate the user's credentials. However, not only the user needs to be checked; the application through which the user is accessing information also needs to be validated. The authorisation layer of oAuth consists of four roles: the resource owner, the resource server, the client and the authorisation server [10].

Tokens

The interaction between the client (the application) and the resource server is mediated by the resource owner (the user) and the authorisation server. The communication between these players is done with the help of tokens. The primary token is called the 'access token': the credential used to access protected resources [10]. The token represents an authorisation issued to the client. The authorisation server stores the information about the token, including the scopes and the duration of the access that was granted by the user. This information is used to check whether an access token has the privilege to access private information. The token may contain data, but is usually just an identifier for the servers. The access token provides the abstraction layer between the client and the user and replaces normal credentials such as user name and password. It also gives the user better control over which data can be accessed and for how long. The lifetime of an access token is defined by the authorisation server, typically in the order of hours up to a day. The resource server defines the format and structure of the access token.

Next to the access token there is the 'refresh token'. Refresh tokens are used to obtain new access tokens. The refresh token is sent from the authorisation server to the client after a user has granted access. This token can also be used when an access token has expired or the scope needs to be changed, although a scope change needs additional confirmation from the user. Unlike the access token, the refresh token is never sent to the resource server, only to the authorisation server. The refresh token is, just like the access token, represented as a string.

The oAuth protocol has well-defined steps to reach verification. The protocol starts at the application that wants to access private information. This application needs a client id. This client id is registered at the provider (Google, in this project) and is used to make the access token unique for the application. The application sends the user to a web page with the option to log in. After the user has logged in with the correct credentials, they see an overview of the permissions the application is asking for. When the user grants access, an access token and a refresh token are sent back from the authorisation server to the application. With these tokens, which only work for one specific client id, the application can access the resources that the user has granted access to. The tokens are stored and stay valid for days or months, depending on the application. The refresh token can be used to obtain a new access token after the validity of the access token ends. A schema of the verification steps is presented in Figure 2.1.

Figure 2.1: Flow of the oAuth protocol [11].
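To make these steps concrete, here is a minimal sketch of the authorisation code flow using the requests-oauthlib package. Every URL, the client id, the secret and the scope are placeholders; a real provider such as Google publishes its own endpoints.

from requests_oauthlib import OAuth2Session

CLIENT_ID = 'my-client-id'          # registered with the provider (placeholder)
CLIENT_SECRET = 'my-client-secret'  # placeholder
AUTH_URL = 'https://provider.example/o/authorize'  # placeholder endpoints
TOKEN_URL = 'https://provider.example/o/token'

oauth = OAuth2Session(CLIENT_ID,
                      redirect_uri='https://app.example/callback',
                      scope=['mail.readonly'])

# Step 1: send the user to the provider's login and consent page
authorization_url, state = oauth.authorization_url(AUTH_URL)
print('Please visit:', authorization_url)

# Step 2: after granting access the user is redirected back; the full
# redirect URL contains the authorization code
redirect_response = input('Paste the redirect URL here: ')

# Step 3: exchange the authorization code for access and refresh tokens
token = oauth.fetch_token(TOKEN_URL, client_secret=CLIENT_SECRET,
                          authorization_response=redirect_response)

# Later, when the access token expires, trade the refresh token for a new one
token = oauth.refresh_token(TOKEN_URL, client_id=CLIENT_ID,
                            client_secret=CLIENT_SECRET)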

2.3 Machine Learning

The use case of the project makes use of Google's machine learning API. With this API users can categorise data with the help of machine learning. Machine learning gives computers the ability to learn without being explicitly programmed. It is used to personalise ads on web pages or give recommendations on websites such as Amazon. Machine learning has become increasingly popular due to the greater availability and variety of data, and because computational processing is getting more powerful and cheaper, which makes it more feasible [12].


2.3.1 Techniques

Although the techniques behind machine learning are complex, it is good to know the basic principles for a more complete understanding of machine learning. Machine learning enables regression, classification and clustering.

Clustering is a technique where a set of objects is grouped in such a way that objects within a group are more similar to each other than to objects outside the group. A similar but different technique is classification. Classification concerns the problem of assigning a new object to a category [13]. Examples of classification are spam filtering and language recognition. Classification systems use many algorithms, such as the Naive Bayes classifier, which is based on Bayes' theorem, and Nearest Neighbour classification, which is used for pattern recognition. A big advantage of classifier systems is that they can be upgraded incrementally and thus improved without a loss of previous results.

Regression analysis is the attempt to find relationships between variables. It tries to find how a dependent variable reacts if the independent variable changes. This is used in, among others, finance and the medical industry. These and other techniques are combined into systems that make predictions based on data.
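To make the classification idea concrete, the sketch below trains a tiny Naive Bayes spam filter with the scikit-learn library (not a library used in this project; the four training sentences are made up for illustration).

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Word counts as features, Naive Bayes as the classifier
train_texts = ['win a free prize now', 'meeting at ten tomorrow',
               'free offer click now', 'lunch with the team']
train_labels = ['spam', 'ham', 'spam', 'ham']

vectorizer = CountVectorizer()
features = vectorizer.fit_transform(train_texts)
classifier = MultinomialNB().fit(features, train_labels)

# A new, unseen message is assigned the most probable category
print(classifier.predict(vectorizer.transform(['free prize tomorrow'])))

A classifier like this can also be trained incrementally (MultinomialNB offers partial_fit), which illustrates the incremental upgrading mentioned above.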

2.3.2 Combining machine learning with cloud computing

The importance of machine learning in certain applications is quite clear, but the implementation is difficult. Fortunately, there are companies that combine machine learning and cloud computing. An example of this is the Prediction API from Google [14]. This service, and others like it, makes it easier to apply machine learning to your own application. It combines all the discussed subjects: cloud computing, APIs, oAuth and machine learning. The data is uploaded to a Google server, where it is analysed and modelled. After the training, you can sample the model on new data and receive a classification. This can be used to forecast the weather or show recommendations for your web shop. There are also pre-made models that can analyse language or sentiment, or categorise tags on social media. These pre-made models are used in the use case. Although the Google API makes the implementation easier, it still requires a thorough understanding of oAuth and general programming skills. Data analytics thus remains a difficult task for scientists. Cookery tries to reduce the personal knowledge needed to apply data analysis or other scientific applications.


CHAPTER 3

Cookery

Cookery was developed to enable scientists to run scientific applications on cloud computing resources. Baranowski stated that the differences between the environments of cloud providers make them difficult to learn [15]. Cookery was therefore developed as a cross-platform and cross-cloud application model. This model had to be simple, to shorten the time needed to learn the Cookery environment. A simple model also helps users understand the application logic. The back-end of Cookery is written in Python.

The front-end of Cookery makes use of Jupyter. Jupyter is a client-server application that has gained a lot of popularity, especially among data scientists [16]. Jupyter Notebook is a web-based console that allows for interactive computing. It is suitable for capturing the whole computation process, ranging from documentation to executing code and showing the results. This means that a scientist can show the whole process from the data to the graphs and plots, along with documentation. The code for generating these graphs can be dynamic: the user can alter the code and plots to adapt them to their application. Jupyter has a built-in Python kernel to execute Python programs. The developers of Cookery also made a kernel for the communication between Cookery and Jupyter.

3.1 How does Cookery work?

The Cookery model makes use of three layers. The layers are visible in Figure 3.1. Every layer is meant for a different user.


We will start at the back-end layer of Cookery. This layer is required to establish connections and allow the other layers to access remote resources. The setup only needs to be done once to get Cookery running on a server.

The second layer concerns the domain-specific language (DSL) of Cookery. A DSL is a programming language specific to a certain area of application; this DSL is specific to Cookery. The DSL is used to define the elements that are used in layer one. The DSL for Cookery is based on the English language to make the elements understandable for people without programming knowledge. The elements in this layer are constructed from activities. Every activity consists of a variable, an action, a subject and a condition. A variable definition is an optional element and can assign a result to a label, which is a string. An action refers to an action implementation in the DSL, for example 'count' to count a subject. Subjects are optional elements that refer to a subject implementation in the DSL; they are distinguished from actions by starting with a capital letter. Examples of a subject are a data file or a variable. Conditions are also optional and use the keywords 'if' and 'with'. They refer to conditions specified in the Cookery program. The data flow of an activity is shown in Figure 3.2.

Figure 3.2: Data flow of an activity [15].

The first layer is the layer the end-user interacts with. The user combines already defined activities to create a Cookery program. As is apparent from this explanation of how Cookery works, the framework offers a separation between the creation of general functions by developers and the creation of user-specific programs by the end-user. The layers reduce the amount of code that the end-user has to write, but still allow flexibility.

3.2 How is Cookery useful?

Programming languages differ in difficulty. A low-level programming language such as C is generally more difficult than a high-level programming language. Although programming has become easier with Python or Java, it still requires a considerable amount of programming skill. In contrast, other technological processes such as setting up a router or building a website have become easier. Cookery makes programming easier through an abstraction layer: developers design and program the activities, which the end-user (i.e. a scientist without programming knowledge) combines into a working program. With Cookery it is possible to create complex programs that would otherwise require a lot of programming knowledge.


CHAPTER 4

The extension of Cookery

4.1 The use case

To answer the research question about how to define a data analysis pipeline in Cookery, a use case was needed that combines data analysis and Cookery. In 2012 Stephen Wolfram wrote a blog post called 'The Personal Analytics of My Life' [17]. In this post he analyses his personal life by looking at his personal email, calendar appointments, phone conversations, keystrokes and even his walking, measured with a pedometer. Some of this data went as far back as the 1990s. Based on plots of the data he tries to find life trends such as changes in career or sleeping habits.

The use case for this project tries to replicate his blog post by finding life trends using Cookery. The project tries to make this kind of personal analytics more open and dynamic. The personal analytics will only be based on emails, as they are the most accessible. Cookery is a cloud-based application, which everyone can access. Users can log in to their Gmail account using oAuth and retrieve life trend analysis in the form of graphs. A visual representation of the data flow for the project is shown in Figure 4.1.

This is a good scenario for the research question because of the many connections (represented by the arrows in Figure 4.1) that need to be made. This means that Cookery and the data pipeline within it need to handle data from multiple sources, some of which need to be stored to a file or kept in program memory, and give visual results back to the user.


Figure 4.1: A. Front-end communication between user and Cookery. B. Jupyter accessing Cookery with the kernel. C. Cookery requesting oAuth for access. D. Login process of the user and granting of access. E. Retrieving personal emails from Gmail. F. Analysing language and semantics with Google's Prediction API.

4.2 Implementation and Evaluation

This project depends heavily on oAuth, so it was an obvious choice to begin with the implementation of oAuth (Figure 4.1 - C). The first step was to register the application at Google in order to obtain the credentials. This was done in the Google API manager [18]. After the registration, it was possible to download the 'client secret', which needs to be stored client side. Now that the application was known at the oAuth authentication server, we could start writing code to talk to these servers. Luckily, the documentation from Google on oAuth was extremely useful [19]. With the help of the Google oAuth API, the user is sent to a web page to fill in their credentials. After correctly filling them in, the user sees the permissions needed by the program. After granting access, the user is sent back to the application, which receives the tokens. The scope for this project is specified in Listing 1.

SCOPES = ['https://www.googleapis.com/auth/gmail.readonly',
          'https://www.googleapis.com/auth/prediction']

Listing 1: Scopes used by oAuth.

The application can only read emails and access the Prediction API. The credentials were stored, so this verification only needs to be done once, unless the scopes change, the credentials expire or the user retracts the permission.
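As an indication of how this one-time verification can be implemented, here is a minimal sketch using the oauth2client package that Google's quickstart documentation used at the time [19]; the two file names are assumptions.

from oauth2client import client, tools
from oauth2client.file import Storage

SCOPES = ['https://www.googleapis.com/auth/gmail.readonly',
          'https://www.googleapis.com/auth/prediction']

# Reuse stored credentials if present, otherwise run the browser flow once
store = Storage('credentials.json')  # assumed location of the stored tokens
creds = store.get()
if creds is None or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store)  # opens the consent page in a browser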

After oAuth was working properly, we could start with retrieving emails (Figure 4.1 - E). With the help of the documentation provided by Google, we could read emails from the server. The apiclient package (the Google APIs Client Library for Python: https://developers.google.com/api-client-library/python/) was used to create a service through HTTP. This API service was needed to access the APIs. Before the service could establish a connection with the API, it first had to authorise the credentials of the user with oAuth. The retrieval of individual emails from the server was done in two steps. First, a list of emails had to be requested. In this request it is possible to specify which label the emails need to have; examples of labels are 'INBOX' or 'SENT'. The response of the first request contains the ids of the individual messages, which can then be retrieved by their id as shown in Listing 2.

# Get list of messages from the Gmail server
response = gmail.users().messages().list(
    userId='me', labelIds=['SENT']).execute()
# Retrieve the individual messages by their id
for i in response['messages']:
    message = gmail.users().messages().get(
        userId='me', id=i['id']).execute()

Listing 2: Code for requesting individual outgoing messages.

A single request returns a maximum of 100 emails. If more emails are needed, the 'next page token' has to be used. This token is returned in every request and can be used to fetch the next 100 messages for that request. The next page token is given as a parameter to the request, as shown in Listing 3.

# Execute a request based on the nextPageToken
response = service.users().messages().list(userId=userId,
                                            labelIds=labelIds,
                                            pageToken=nextPageToken).execute()

Listing 3: The use of the next page token.
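Putting Listings 2 and 3 together, collecting every message id comes down to looping until the server no longer returns a next page token. A minimal sketch (the helper name is ours, not part of the project code):

def list_all_message_ids(service, user_id='me', label_ids=('SENT',)):
    # Request pages of at most 100 ids until no nextPageToken is returned
    ids = []
    kwargs = {'userId': user_id, 'labelIds': list(label_ids)}
    while True:
        response = service.users().messages().list(**kwargs).execute()
        ids.extend(m['id'] for m in response.get('messages', []))
        if 'nextPageToken' not in response:
            return ids
        kwargs['pageToken'] = response['nextPageToken']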

The message returned from the server is in the form of a dictionary. In the emails we were interested in the body, the time and date, and the participants. The body is encoded in base64 and needed to be decoded back to ASCII to make it readable. The decoding was done by the following line of code: base64.urlsafe_b64decode(part['body']['data'].encode('ASCII')). When a message is a reply to another email, the body also contains the previous message. This is not something we want later on when analysing semantics or language, so a little parser was needed to filter out only the text that was written in the reply. The time and date are represented as a string in the dictionary received from Google. With the help of the datetime package in Python, we were able to convert it into a datetime object. The participants are formatted as 'Alias - Email address' and grouped by their address field (TO, CC, BCC) in the dictionary.
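A sketch of these two preprocessing steps is shown below; the '>' prefix used to strip quoted text is a common convention and an assumption here, not necessarily the exact parser used in the project.

import base64
from email.utils import parsedate_to_datetime

def decode_body(part):
    # Decode the base64url-encoded body back to readable text
    raw = base64.urlsafe_b64decode(part['body']['data'].encode('ASCII'))
    text = raw.decode('utf-8', errors='replace')
    # Drop quoted lines from the previous message in a reply
    return '\n'.join(line for line in text.splitlines()
                     if not line.startswith('>'))

def parse_date(date_header):
    # Turn e.g. 'Fri, 9 Jun 2017 14:03:22 +0200' into a datetime object
    return parsedate_to_datetime(date_header)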

The next step was to implement the machine learning API (Figure 4.1 - F). This was done in a similar way to the Gmail API. The API can be used to create your own model based on a data set; we were not interested in this part of the API and therefore did not implement it. We were more interested in the pre-trained models of the API. These models are used to analyse the semantics and predict the language of an email. Google built the models on its own data sets. The body of a message can be sent to the servers, analysed and returned to Cookery. The options for semantics are positive, negative or neutral. Language can be categorised as English, Spanish or French. Listing 4 shows a request to the machine learning API to analyse the semantics of an email.

# message is the body of the email that needs to be analysed
body = {"input": {"csvInstance": [message]}}
# project is the number of the pre-trained model
output = servicePrediction.hostedmodels().predict(
    body=body, hostedModelName="sample.sentiment",
    project=414649711441).execute()
return output["outputLabel"]

Listing 4: Request to the machine learning API from Google.

The user can specify how many months of email they would like to analyse. All the incoming and outgoing emails are analysed. For the incoming emails we only look at the date and time and the number of emails. For the outgoing emails we are interested in the date and time, the participants, the number of emails, the semantics and the languages. All this information was stored as an object per email in a dictionary.

After Cookery finished analysing the emails, we could start visualising the dictionary of analysed emails. The visualisation was done through Matplotlib. Which parts of the data were going to be visualised was inspired by the blog of Stephen Wolfram. He was interested in finding life trends in this data, which meant that many of his plots were time based. Examples are a time vs. day plot of when emails were sent, or the number of incoming and outgoing emails. The plots are based on the last six months of my personal emails, which contain 470 outgoing and 1200 incoming messages. The results based on the time stamp of the email and the number of emails are shown in Figure 4.2.

Figure 4.2: Results based on the number of emails and the time stamp of the email.
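A minimal sketch of how such a time vs. day plot can be produced with Matplotlib; the emails argument is assumed to be the per-email objects described above, each carrying a datetime attribute.

import matplotlib.pyplot as plt

def plot_time_of_day(emails):
    # One dot per email: date on the x-axis, hour of the day on the y-axis
    dates = [e.datetime.date() for e in emails]
    hours = [e.datetime.hour + e.datetime.minute / 60 for e in emails]
    plt.scatter(dates, hours, s=8)
    plt.xlabel('date')
    plt.ylabel('hour of day')
    plt.title('Emails per time of day')
    plt.show()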


As stated, we were also interested in the participants of an email. This made it possible to show conversations between groups through time, as you can see in Figure 4.3. Every colour represents a person or group that the emails were addressed to. People that were only addressed once were coloured black. Multiple dots of the same colour represent a conversation.

Figure 4.3: Results of participant analysis of the emails.

The semantics were used in a similar way. The time vs. day plot was coloured according to the semantics of the body of the message that was sent. This was done to see if a conversation had a good or bad sentiment, or if there was a part of the day that was more prone to bad semantics, see Figure 4.4.

Figure 4.4: Results of semantic analysis of the emails.

The pre-trained model for language from the machine learning API was less useful than we initially thought. Although the model itself does work, it can only categorise three languages. If the language of an email is not one of the three, it will result in a false positive. This is a problem when the main language of the emails is not represented in the API, and it is something the user needs to be aware of when deciding to use this part of the application. Since the main language of my emails is Dutch, the results were not correct and not useful, see Figure 4.5. My personal emails do not contain French or Spanish, but they do get classified as such. This functionality is only useful for people whose emails contain only these languages.


Figure 4.5: Results of language analysis of the emails.

The pipeline completes by reporting these graphs back to the user. The user can now draw conclusions from the data to find life trends. This is personal and becomes more interesting when more data is available. If we look at my data, it is visible that in the beginning of the year I was more active at night due to the night shifts at my work. In the conversations figure we can see that in the last three months I have only sent messages to a very small group of people, which differs from the group in the beginning of the year. This is caused by following a minor at another faculty. In the distribution we can see that the occurrence of me sending more than 5 emails per day is very low. From the semantics graph we cannot really say anything definitive: although the negative messages appear to be grouped, this is not conclusive. The number of participants in the emails sent in the last six months was limited to 1 to 3 people.


CHAPTER 5

Conclusion

This chapter contains the conclusion about the data analysis pipeline in Cookery. We discuss some limitations of the use case and talk about how this project can be used in the future.

5.1 Conclusion

The goal of this project was to answer the question "How to define a data analysis pipeline using Cookery?". We did this by creating a use case in which we find life trends based on emails. The Cookery program works, although the conclusions on life trends are very personal and cannot be generalised. The working pipeline shows that it is possible to create data analysis programs in Cookery. This means that scientists can use Cookery as it was intended and create their own data analysis pipelines. The Cookery application can be recommended to scientists who have difficulties with programming. The platform reduces the amount of code and time needed to analyse data. This shortens research time and thus increases efficiency.

However, it can be hard for a developer to get familiar with the Cookery workflow. During the project it was challenging to make the activities small and specific in order to keep them flexible. For the scientists it can be difficult to know what an activity does. Documentation is required to solve this problem.

The foundation of the Cookery ecosystem has been built by this project as well as the Cookery projects of fellow students. These projects will be added to the Github repository of Cookery and can be used by scientists. As the Cookery ecosystem progresses, more functionality will be added and the usability will increase.

5.2 Discussions

There are some limitations to the use case that was used to demonstrate the data analysis pipeline. The machine learning API has a 'user rate limit', which limits the number of requests to the API to 650. When many months or even years of email are analysed, the program will stop due to this limitation. Paying a premium to Google can prevent this. Also, it is not possible to upload batch requests to the machine learning API. Every message has to be sent and analysed individually. This creates a lot of overhead and results in a long execution time. The speed of the execution also depends on your internet connection, which will improve once this Cookery program is deployed in the cloud.

In the use case we only made use of Google products to display the possibilities of data analytics in Cookery. But there are more services available, such as Amazon AWS and Microsoft Azure. Although these are different services, the functionality is similar. If we had used these services in the use case, the approach would have been similar. Both of these other services also make use of oAuth, so the implementation and use of oAuth in the pipeline would not change. Generating the graphs would also not change. We can say that these parts can be used in a more general way. The communication between Cookery and Google is the main part that needs to be altered to make the pipeline work with the services of Amazon and Microsoft.

5.3 Future work

An important aspect of the project was to build a foundation for other applications. This project serves as a foundation that can help with the adoption of Cookery by scientists. Future work could extend the functionality with appointments from an online calendar, or with the GPS locations registered by Google's location history. Not only would this make the project more revealing about your personal life, it would also make the pipeline more complex. Jupyter Notebook also has the ability to include interactive widgets. This way the user can not only use, but also adapt the code to their specific needs. Implementing this would make the project more dynamic.

By implementing more cloud services such as Amazon AWS and Microsoft Azure, we could make Cookery more diverse and more usable. This would help adoption by scientists, who would have more options for data analytics. It would also be helpful to implement functionality to upload data to Cookery and perform statistical analysis, creating a more complete pipeline.


Bibliography

[1] John Gantz and David Reinsel. “The digital universe in 2020: Big data, bigger digital shadows, and biggest growth in the far east”. In: IDC iView: IDC Analyze the future 2007.2012 (2012), pp. 1–16.

[2] Peter Mell, Tim Grance et al. "The NIST definition of cloud computing". In: (2011).

[3] L. Ferreira Pires. "Wat is Cloud computing?" In: Computerrecht 2011.3 (2011), pp. 104–107.

[4] Michael Armbrust et al. “A view of cloud computing”. In: Communications of the ACM 53.4 (2010), pp. 50–58.

[5] Xiaojing Ye and Junwei Huang. “A framework for cloud-based smart home”. In: Computer Science and Network Technology (ICCSNT), 2011 International Conference on. Vol. 2. IEEE. 2011, pp. 894–897.

[6] Moataz Soliman et al. “Smart home: Integrating internet of things with web services and cloud computing”. In: Cloud Computing Technology and Science (CloudCom), 2013 IEEE 5th International Conference on. Vol. 2. IEEE. 2013, pp. 317–320.

[7] Mark Turner, David Budgen and Pearl Brereton. “Turning software into a service”. In: Computer 36.10 (2003), pp. 38–44.

[8] Michael Cusumano. "Cloud computing and SaaS as new computing platforms". In: Communications of the ACM 53.4 (2010), pp. 27–29.

[9] Robert Richards. "Representational State Transfer (REST)". In: Pro PHP XML and Web Services. Berkeley, CA: Apress, 2006, pp. 633–672. isbn: 978-1-4302-0139-7. doi: 10.1007/978-1-4302-0139-7_17. url: http://dx.doi.org/10.1007/978-1-4302-0139-7_17.

[10] Dick Hardt. "The OAuth 2.0 authorization framework". In: (2012).

[11] Mitchell Anicas. An Introduction to OAuth 2. url: https://www.digitalocean.com/community/tutorials/an-introduction-to-oauth-2 (visited on 08/07/2017).

[12] SAS. Machine Learning: What it is and why it matters. url: https://www.sas.com/en_us/insights/analytics/machine-learning.html (visited on 30/05/2017).

[13] David E Goldberg and John H Holland. “Genetic algorithms and machine learning”. In: Machine learning 3.2 (1988), pp. 95–99.

[14] Google. Prediction API Documentation. url: https://cloud.google.com/prediction/docs/ (visited on 01/07/2017).

[15] Mikolaj Baranowski, Adam Belloum and Marian Bubak. "Cookery: A framework for developing cloud applications". In: High Performance Computing & Simulation (HPCS), 2015 International Conference on. IEEE. 2015, pp. 635–638.

[16] Jupyter Notebook. Main Documentation. url: http://jupyter-notebook.readthedocs.io/en/latest/notebook.html (visited on 01/07/2017).

[17] Stephen Wolfram. The Personal Analytics of My Life. url: http://blog.stephenwolfram.com/2012/03/the-personal-analytics-of-my-life/ (visited on 02/07/2017).


[18] Google. API Manager. url: https://console.developers.google.com/apis/ (visited on 03/07/2017).

[19] Google. API Documentation. url: https://developers.google.com/gmail/api/quickstart/python (visited on 03/07/2017).
