A Cookery extension to simplify cloud service integrations


Bachelor Informatica


Michael van Mill

June 9, 2017

Supervisor(s): Adam Belloum and Mikolaj Baranowski

Informatica, Universiteit van Amsterdam


Abstract

In recent years cloud services have become essential in our everyday lives. Linking these cloud services allows people to build more powerful applications. The Cookery framework is a development tool that can be used to write cloud applications without a deep understanding of programming. It is developed at the UvA and currently lacks a module for integration with cloud service providers.

This thesis proposes an architecture for a Cookery extension that allows for connections with cloud services. This architecture implements the OAuth 2.0 protocol for convenient authorization and, just like the Cookery framework itself, the extension supports Jupyter notebook integration.

A working Github commit watcher is used as a proof of concept for the architecture. However, more research is needed to enhance redundant aspects of the architecture and to add more cloud services to increase the value of the Cookery framework.

The code developed in this thesis for the architecture and the proof of concept application will be available as a part of the Cookery Github.


CHAPTER 1

Introduction

These days cloud service providers are becoming a major part of our lives. Because of this increase in popularity, cloud service providers try to meet their users' demands by making it easy to integrate their services with other systems or services the user is interested in. These solutions are often called open integration protocols and work with application programming interfaces (APIs). Using these APIs often requires a deep understanding of programming, which prevents many users from tailoring programs to their needs.

This is one of the main reasons the Cookery framework is being developed [1]. The objective of this framework is to give its end-users an interface for writing complex applications in the easy to use Cookery language. The general idea of the Cookery project is to make programming feel natural, which is achieved by basing the Cookery language on the English language. Take for example the following two lines of code that count the number of words in a file: A = split File text file.txt. and count A. The first line creates a variable A and splits the file into words. The next line counts all words in A.

The focus of this work is to build a foundation for an extension of the Cookery framework that allows users to develop applications that connect features of multiple cloud service providers. It is useful for users to be able to extract, combine and exploit the best features of their favorite online services. With this ability users can develop applications that automate their tasks, extend usability or simply give them easier access to their data.

In order to achieve this goal, an architecture will be designed that is suited to combining functionalities of multiple cloud service providers. Important parts of this architecture are the cloud service provider APIs, a server to handle authentication, Jupyter Notebook for user interaction and of course the Cookery kernel implementation itself. As a proof of concept a Github commit watcher will be developed. This application will be able to trigger an email notification whenever a watched repository receives a new commit. It is suitable as a proof of concept because it implements authentication (Github as well as email), it needs to actively monitor for triggers and there are actions bound to these triggers. It also has to deal with the Simple Mail Transfer Protocol (SMTP).

1.1 Theoretical research goals

The objective of this research is to provide more insight into the problems that come with combining multiple cloud services in a framework (Cookery). These problems have a very broad scope, so this thesis focuses on three areas: authorization, Jupyter notebook integration and the Cookery framework. The focus on authorization was chosen to ensure user privacy and data security. The focus on Jupyter notebook was chosen to provide users with an optimal experience. Lastly, the focus on the Cookery framework is necessary in order to make the extension work.

1.2 Research questions

This research is focused on building an extension in the Cookery framework that allows for the combination of multiple systems. In order to do that the central question is:

How do we make open integration protocols of cloud services available for end-users in a framework that can be used to create cloud applications?

This question is interesting, because not only are the APIs to be implemented in a single system, but it is also necessary that the end-user is able to use these integrations in a convenient and advantageous way.

User privacy and data security play a crucial role in this implementation and therefore the secondary question is:

How can we use and deal with the authorization protocols attached to these cloud services while still providing a good user experience?

This question will support the design of the architecture, because the focus on user experience dictates that the authorization implementation should be abstracted from the user as much as possible.

1.3 Related work

This research builds upon the work of Mikolaj Baranowski, Adam Belloum and Marian Bubak, who started the development of the Cookery framework and began to tackle the complexity problem that comes with the integration of APIs. Their aim with the Cookery framework is to provide a cross-platform and cross cloud-environment application framework with an easy to use interface for the end-user [1].

As this research is focused on the combination of multiple systems, it has a lot of overlap in terms of functionality with IFTTT (If This Then That) [17, 9]. IFTTT is a web-based service that allows its users to create chains of conditional statements. These chains contain triggers from cloud services that can invoke actions in other cloud services. Thus, IFTTT can be used to automate web-application tasks. Examples are uploading photos that were received by email to Dropbox, or automatically uploading the same content to multiple social media streams. IFTTT is focused on simplicity and therefore only supports a single trigger/action pair. The IFTTT alternatives Zapier [19] and Microsoft Flow [13] differ from that approach and offer support for longer chains of trigger/action pairs. Apart from offering longer chains, Zapier and Microsoft Flow use the exact same concept for workflow automation as IFTTT.

Cookery differs from these services by its ability to implement any data transformation, because it can be extended with all the functionalities that Python has to offer.

1.4 Thesis outline

In chapter 2 concepts of cloud computing and open integration protocols are explained. Chapter 3 will provide information regarding the Cookery framework, project Jupyter and their role in the development of the proof of concept. Chapter 4 gives a detailed description of online authorization approaches and specifically focuses on third party access. In chapter 5 the implementation and design of the architecture will be explained. In chapter 6 we discuss the implementation of the proof of concept, draw conclusions from the research and motivate future work.


CHAPTER 2

Cloud environment and SaaS

This section aims to provide a deeper understanding of the underlying technologies which are necessary to integrate different cloud service providers.

2.1 Cloud computing

An important concept that needs to be understood in order to value the Cookery framework is the concept of cloud computing. More specifically, this concerns platform as a service and software as a service, both of which are broadly categorized as cloud computing. These services are programs or platforms for which people do not need to buy their own copy but rather subscribe to a service for a monthly fee or even for free. There are two main features of software as a service (SaaS). The first is that the client does not need to install the software but just pays for its use. The second is that SaaS applications are often hosted centrally, which enables SaaS providers to roll out updates for all their users simultaneously [6]. In this way SaaS providers can easily maintain their service with minimum hassle for their clients.

Because cloud services mainly focus on their one service and aim to cater to as many customers as possible, they cannot customize their service for the specific needs of all the individual clients. This limitation prevents customers from having their internal systems connected to their SaaS services.

Sometimes it does not matter whether services are connected to each other. For example, the service that a company uses for version control and issue tracking in software development does not need to be connected to the accounting software. In other cases, however, these links are important. An example is an online store that needs to be connected to the accounting software in order to operate efficiently. When necessary connections are absent, undesirable situations will emerge: the flow of data will not be optimal and will cause a lot of manual labor. Examples of such unnecessary labor are data entry, data duplication, monitoring for updates and performing backups.

Cloud service providers simply cannot cater custom solutions to all their users, and at the same time their users cannot modify the service because they do not run it locally but are just subscribed to a centrally hosted solution. The solution currently offered by cloud service providers is the implementation of open integration protocols. These open integration protocols often come in the form of an application programming interface (API). These APIs allow developers to connect cloud services with their own resources or internal systems. Meanwhile, the cloud service providers benefit from this approach in two different ways. First, their systems become more valuable for their customers as they automate more for them. Second, good APIs can start a positive feedback loop around the SaaS service: good APIs attract developers, who then make more applications using those APIs, and these new applications in turn attract even more developers and users [2].


2.2 Open integration protocols from a technical point of view

As mentioned in the previous section, application programming interfaces are the open integration protocols most widely adopted by cloud service providers. An application programming interface is a broad term for a set of protocols, subroutines and tools for building application software. APIs come in many different forms: there are operating system APIs, remote APIs, web APIs and more. Because cloud service providers operate in an online web-based environment, they make use of web APIs. These kinds of APIs generally work over the Hypertext Transfer Protocol (HTTP). This protocol allows easy communication between web servers and web browsers, but also between multiple web servers. However, these APIs can be implemented in different ways. Currently there are two popular approaches to building web APIs: RESTful web resources and SOAP-based web services. A third approach that is gaining popularity is the GraphQL API query language. It needs to be noted, however, that even when the same approach is used (e.g. RESTful), APIs can still differ considerably in their technical specifications.

RESTful API

Representational state transfer, or REST, is now one of the most widely adopted standards for web APIs. A REST API is a set of predefined operations. The goal of these operations is to give the client access to one or multiple web resources. A web resource can be any kind of information: a document or image, a temporal service (e.g. "today's weather in Amsterdam"), a collection of other resources, a non-virtual object (e.g. a person), and so on [5]. The operations that are available in a RESTful service usually correspond closely to the standard HTTP verbs: GET, PUT, POST and DELETE. Clients can access resources by simply sending a request to an API endpoint of the resource server. An API endpoint is essentially a Uniform Resource Identifier (URI). For example, a GET request with a URI like resource/user/17 will return the user that is identified with id 17. The returned data is usually in HTML, XML or JSON; however, other data types are also possible.
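The relation between HTTP verbs and operations on a resource can be illustrated with a minimal in-memory sketch. The function handle, the USERS store and the resource/user/<id> URI layout are assumptions made for this illustration only; a real REST API would sit behind an HTTP server.

```python
import json

# Hypothetical in-memory resource collection, keyed by user id.
USERS = {17: {"id": 17, "name": "alice"}}


def handle(method, uri, body=None):
    """Dispatch an HTTP-style verb on a URI such as 'resource/user/17'.

    Returns a (status_code, json_text) pair.
    """
    parts = uri.strip("/").split("/")
    if len(parts) != 3 or parts[:2] != ["resource", "user"] or not parts[2].isdigit():
        return 404, json.dumps({"error": "unknown resource"})
    user_id = int(parts[2])

    if method == "GET":
        if user_id not in USERS:
            return 404, json.dumps({"error": "not found"})
        return 200, json.dumps(USERS[user_id])
    if method == "PUT":
        # Create or replace the resource at this URI.
        USERS[user_id] = dict(body or {}, id=user_id)
        return 200, json.dumps(USERS[user_id])
    if method == "DELETE":
        if USERS.pop(user_id, None) is None:
            return 404, json.dumps({"error": "not found"})
        return 204, ""
    return 405, json.dumps({"error": "method not allowed"})
```

Calling handle("GET", "resource/user/17") returns status 200 together with the JSON representation of user 17.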

SOAP API

The Simple Object Access Protocol, often shortened to SOAP, is a protocol that specifies the exchange of information. Unlike REST, SOAP comes with prescribed message formats, building blocks and transport methods. The message format that is adopted by the protocol is the Extensible Markup Language Information Set (XML Infoset). This can be seen in the building blocks, as SOAP implements a standard XML message. In this message there are two required properties, the envelope and the body. The envelope identifies the document as a SOAP message and the body contains the call and response information. As for transport methods, SOAP is designed to be neutral and extensible [16]. The neutrality allows SOAP to be implemented independently of the transport protocol (e.g. HTTP, SMTP, TCP etc.). Like RESTful web APIs, SOAP APIs also make use of API endpoints. Another important aspect of SOAP is its extensibility, which allows the protocol to be extended with additional specifications such as Web Services Security (WS-Security) or Web Services Addressing (WS-Addressing).
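As an illustration of the envelope and body structure, the following sketch builds a SOAP message with the Python standard library. The operation name GetUser and its id parameter are hypothetical; the namespace is that of the SOAP 1.2 envelope.

```python
import xml.etree.ElementTree as ET

# Namespace of the SOAP 1.2 envelope.
SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"


def build_envelope(operation, params):
    """Return a SOAP message (bytes) with the required Envelope and Body."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # The call itself lives inside the body (hypothetical operation).
    call = ET.SubElement(body, operation)
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


message = build_envelope("GetUser", {"id": 17})
```

The resulting message can be sent over any transport; with HTTP it would typically be the body of a POST request to the API endpoint.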

GraphQL

GraphQL is a query language for APIs and is relatively new, as it was developed by Facebook in 2012 and publicly released in 2015. Instead of approaching web resources or services individually through multiple endpoints, GraphQL works on an entity graph basis and only needs one endpoint (usually /graphql) [7]. This basis allows clients to specify exactly which resources they would like to receive, and the server is able to handle this in a single request. This is in contrast to REST or SOAP APIs, where multiple requests are often required. At the time of writing (May 2017) Github is testing its fourth and newest API version, which is based on GraphQL, whereas its current and third version is REST based.
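A GraphQL request is commonly sent as a single JSON document containing the query and its variables. The sketch below only builds such a request body; the schema (a user field with name and nested repositories) is hypothetical and not taken from any particular provider.

```python
import json

# Hypothetical schema: a `user` field with `name` and nested `repositories`.
QUERY = """
query ($id: ID!) {
  user(id: $id) {
    name
    repositories { name }
  }
}
"""


def build_graphql_body(query, variables):
    """Serialize a GraphQL query and its variables as one JSON request body."""
    return json.dumps({"query": query, "variables": variables})


body = build_graphql_body(QUERY, {"id": "17"})
```

The client names exactly the fields it needs, so the user and the repository names arrive in one response instead of several REST calls.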


2.3 Integrations with APIs

There are multiple difficulties with integrating systems by using APIs. The focus of this work, however, is mainly on the fact that cloud environments are very heterogeneous [1, 12]. This becomes visible when different API approaches are compared on subjects such as authentication protocols, return types or script languages. As a result, developers have to adapt to each API and are not able to follow one consistent protocol. The Cookery framework aims to solve this problem by being a cross-platform and cross cloud-environment application framework.


CHAPTER 3

Jupyter and the Cookery framework

The goal of this section is to provide information about project Jupyter and the Cookery framework.

3.1 Project Jupyter

Formerly known as the IPython Notebook, project Jupyter is an open source project that allows for interactive data science and scientific computing across multiple programming languages [10]. The project comes in the form of a web application and is installed locally on the user's machine. After installation notebooks can be made [15]. A notebook is essentially a document that can contain live code, equations, visualizations and explanatory text [14].

Figure 3.1: Project Jupyter structure diagram¹.

As can be seen in figure 3.1, Project Jupyter has four key components: the kernel, the notebook server, the notebook file, and the browser.

The kernel

The kernel is essentially the interpreter of the code that is executed. It receives blocks of code from the notebook server and returns all the variables and return values. In Jupyter Notebook, users have the ability to switch between different kernels, where each kernel is an interpreter for a different language. Being able to switch kernels therefore turns Jupyter Notebook into a language-agnostic application.


The notebook server

The notebook server is responsible for creating, updating and saving notebooks. The notebook server also serves as the central messenger: it communicates with the browser/user and at the same time with the kernel. The communication with the kernel goes via the high-performance messaging library ZeroMQ [3]. For the communication with the user, the notebook server starts a web server by using the Python Tornado package. Because of this, the user can easily access the notebook server via a browser. Because the browser can use HTTP and WebSockets to communicate with the server, results can be served back to the user instantly.

The notebook file

In order to keep track of different elements (code, equations, graphs, and more) the notebook file is built up from cells. Each cell is dedicated to one kind of content, such as a block of code, a visualization or explanatory text. Apart from containing these cells, the notebook file also keeps track of variables used in the code, as well as the output of the code blocks. This gives users the ability to run their code in a stepwise manner. When a code cell is run or rerun, the notebook server sends it to the kernel, which then processes this piece of code and returns all the relevant result information. A big advantage of notebook files is that they can be easily shared, as their state is fully saved. Technically, a notebook file is saved in a JSON format with an .ipynb extension.
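The cell structure of a notebook file can be sketched as plain JSON. The field names below follow version 4 of the notebook format as far as we are aware; the cell contents are made up for this illustration, and the helper code_outputs is hypothetical.

```python
import json

# A minimal notebook with one Markdown cell and one executed code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Word count"]},
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["len('a b c'.split())"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "metadata": {},
                    "data": {"text/plain": ["3"]},
                }
            ],
        },
    ],
}

# This text is what would be written to the .ipynb file.
text = json.dumps(notebook, indent=1)


def code_outputs(nb_text):
    """Return the stored plain-text outputs of all code cells."""
    nb = json.loads(nb_text)
    return [
        "".join(out["data"]["text/plain"])
        for cell in nb["cells"] if cell["cell_type"] == "code"
        for out in cell["outputs"] if "data" in out
    ]
```

Because the outputs are stored next to the source of each cell, a shared notebook shows its results without being rerun.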

3.2 Cookery

The Cookery framework is designed to give people without a programming background the ability to make complicated applications. Cookery makes this possible by abstracting the originally complicated building blocks. On the top layer of this abstraction is a language based on the English language that can be used by the end-user in order to write applications. Although Cookery could be very useful in many fields, the goal of this research is to lay the foundation that will give Cookery the ability to connect to multiple cloud service providers. By implementing this foundation, applications in Cookery can easily use many functionalities and services that are provided by cloud service providers. That, in combination with the default data transformation capabilities of Cookery, will provide a powerful tool for making applications that now require a lot of complicated code.

The Cookery framework has the ability to run in a Jupyter notebook environment. This is because in previous research a Cookery kernel was developed for the Jupyter framework.

3.2.1 Cookery layers

Figure 3.2: Cookery layer structure².

As can be seen in figure 3.2, the three layers that make up the Cookery framework are: the Cookery language layer, the domain specific language (DSL) layer and finally the backend layer. This list of layers is ordered from easy to hard to use. The Cookery backend layer requires a very deep understanding of programming and can be used to adapt the framework to new environments or to implement new protocols. The Cookery DSL layer is also a development layer. In this layer actions, subjects and conditions can be defined such that they can be used by the end-user in the Cookery language layer. The simplest layer is the Cookery language layer, focused on users with little to no programming experience. In this layer they will be able to make a program using simple elements that are defined in the DSL layer.

² Source: http://ieeexplore.ieee.org/abstract/document/7237105/ by Mikolaj Baranowski, Adam Belloum, and Marian Bubak.

Most of the work this paper describes is done in the Cookery DSL development layer as this is the layer where actions like connecting to a cloud service provider or sending mails are defined so that they can be used by end-users through the Cookery language.

3.2.2 Cookery Elements

The Cookery language has multiple elements users can utilize to build their program. Each of the elements serves a specific purpose. The elements that are available in Cookery are:

• Actions
• Subjects
• Variables
• Conditions

An action is an operation that is defined in the underlying Cookery DSL. Actions can be given subjects, which represent remote data. This data can serve as input or output for an application. Subjects are also implemented in the underlying Cookery DSL. Results of actions can be stored in variables. These variables can be used later as a reference to the data. Lastly, conditions can serve as data transformers. They execute on data that is retrieved by the subject implementation and transform that data before it is passed to the action [1].
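The way these elements work together can be sketched in plain Python. The registries and decorators below are illustrative only and are not the actual Cookery DSL API; the sketch merely shows the data flow from subject, through condition, to action, with the result stored in a variable.

```python
# Illustrative registries; these names are NOT the Cookery DSL API.
ACTIONS, SUBJECTS, CONDITIONS = {}, {}, {}


def register(registry):
    """Return a decorator that stores a function in `registry` under its name."""
    def decorator(func):
        registry[func.__name__] = func
        return func
    return decorator


@register(SUBJECTS)
def text(source):
    """Subject: provides the data (an in-memory string stands in for remote data)."""
    return source


@register(CONDITIONS)
def lowercase(data):
    """Condition: transforms the data before it reaches the action."""
    return data.lower()


@register(ACTIONS)
def unique_words(data):
    """Action: counts the distinct words in the data."""
    return len(set(data.split()))


def run(action, subject, argument, condition=None):
    """Fetch data via the subject, transform it with the condition, pass it to the action."""
    data = SUBJECTS[subject](argument)
    if condition is not None:
        data = CONDITIONS[condition](data)
    return ACTIONS[action](data)


# Variable: the result of an action, kept for later reference.
variables = {"A": run("unique_words", "text", "One one TWO two", condition="lowercase")}
```

With the lowercase condition the action sees two distinct words; without it, the four differently capitalized words are all counted separately.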


CHAPTER 4

Authorization

The first part of this chapter explains the need for authorization and the second part reviews and examines multiple aspects of online authorization.

4.1 Need for data access

Because of the advancements in internet technologies, a great many cloud service providers have appeared. We now have cloud services for almost everything (e.g. listening to and saving music, sharing programs and code, and communication services). Bigger cloud service providers usually have access to a lot of useful and personal data, which can be very valuable to other companies. Using data of bigger cloud service providers serves two main purposes. First, the user does not need to go through the hassle of providing the same information to multiple companies. Second, it allows the company to reuse the work of other service providers. A good example of a company that uses another internet service provider's data is Tinder. Tinder makes its users connect to Tinder via their Facebook accounts. In this way the user can use his Facebook data on Tinder and does not need to re-upload all his information. At the same time Tinder does not need to worry about fake profiles, since this is already handled by Facebook.

Because of this increase in demand for data access, the user's privacy and online safety are more at risk. To maintain this privacy and safety, online authorization is necessary, because it can be used to prevent people from accessing data they are not supposed to access. Online authorization is implemented in many different ways. In the next section the most important and widely used approaches are reviewed.

4.2 Approaches of online authorization examined

Username password combination

Probably the best known method for online authorization is a login protocol where the user needs to provide his username and password in order to access his profile and/or the service in question. This protocol is fairly simple. The cloud service provider asks for the user's credentials, the user then provides the cloud service provider with his credentials, and these are then checked. This method is well suited for private use, as the user can easily authenticate himself in order to get full access to his account.

Keys

Keys are used in many ways regarding online authorization. However, in this thesis the focus is on keys that users have to obtain themselves. This is possible as cloud service providers often offer their users the option to register keys. Also, they give their users the ability to give certain permissions to these keys (e.g. read permission, read/write permission or root access).


Generated keys can be used to give third parties certain account permissions. These permissions can be used by third parties to connect their services [4].

OAuth

The OAuth 2.0 protocol combines the best of both of the previous approaches. The OAuth protocol provides a generic framework to let a resource owner authorize a third party that wants access to its resources [18]. Figure 4.1 shows how the client, the resource owner and the cloud service servers interact with each other.

Figure 4.1: OAuth 2.0 authorization flow for third party access¹.

• Step A: when using the OAuth protocol the third party (the client) that wants to access the information of the user (resource owner) sends a request for authorization to the resource owner. This request often includes a certain scope (permission level).

• Step B: the resource owner then logs in and grants the authorization. It sends this grant back to the client.

• Step C: the client then sends this authorization grant to the authorization server (the cloud provider), which then verifies the credentials.

• Step D: if the authorization grant checks out, the authorization server sends an access token to the client.

• Step E and F: the client can use this access token to request and receive resources from the resource server (the cloud provider)[8].
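The steps above can be simulated in memory to make the roles of the grant and the access token explicit. The class below is a hypothetical stand-in for the cloud provider's authorization and resource server; no network traffic, redirects or client authentication are modelled.

```python
import secrets


class AuthorizationServer:
    """Hypothetical stand-in for the provider's authorization and resource server."""

    def __init__(self, resources):
        self._resources = resources   # data owned by the resource owner
        self._grants = {}             # authorization grant -> scope
        self._tokens = {}             # access token -> scope

    def approve(self, scope):
        """Steps A-B: the resource owner logs in and approves the requested scope."""
        grant = secrets.token_hex(8)
        self._grants[grant] = set(scope)
        return grant

    def exchange(self, grant):
        """Steps C-D: verify the grant and issue an access token (single use)."""
        scope = self._grants.pop(grant)  # raises KeyError for unknown grants
        token = secrets.token_hex(16)
        self._tokens[token] = scope
        return token

    def resource(self, token, name):
        """Steps E-F: return a resource if the token's scope covers it."""
        if name not in self._tokens.get(token, set()):
            raise PermissionError(name)
        return self._resources[name]
```

A client that was granted the scope {"commits"} can exchange its grant for a token and read the commits, but a request for any other resource raises PermissionError.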

4.3 Third party access

In this section the aforementioned authorization approaches are discussed in regard to how suitable they are for third party access.

Username password combination

The username with password approach is unsuitable for third party access, because the third party would get root control over the user's account. In this way the third party would have the power to write unwanted changes or even delete the account altogether.

Keys

Keys are mostly embedded in systems for the exact purpose of supporting third party access, hence this approach is a safe way for the user to grant permissions for data access to another party. The user has the ability to restrict access to only certain scopes and can easily revoke access. However, this option is a hassle for the user, who now has to generate keys for every third party application that he wants to connect.

OAuth

The widely used OAuth protocol is not yet entirely without flaws [18, 11]; however, it does provide the user with the best combination of convenience and safety. Just like keys, it does not expose the user to the threat of third party root access and keeps the user in control of granted permissions. At the same time it offers a very convenient experience, because the only action required of the user is to log in and press an "OK" button. For this reason OAuth is the authorization approach that is implemented in the Cookery extension.


CHAPTER 5

Experiment

This chapter describes the practical part of this research and offers explanations for design decisions that were made.

5.1 Cloud server provider connections - proof of concept

The goal of this research is to design a foundation that will serve as the main architecture for connections between multiple APIs used by the Cookery framework. This architecture is necessary in order to extend the Cookery framework with methods that can be used to easily connect multiple cloud service providers. With these methods end-users will gain the ability to write their own IFTTT-like programs. This means that they could, for example, let Gmail send a message whenever a new song from their favorite band becomes available on a service like Spotify.

As a proof of concept a Github commit watcher is chosen. The goal of this experiment is therefore to design and implement an architecture that is able to obtain authorization from Github, to monitor Github for changes and to connect via SMTP to a mail provider in order to send a notification mail whenever a commit is pushed to the server. This is a suitable proof of concept because it needs to implement: (1) OAuth authentication, (2) a trigger mechanism and (3) the SMTP protocol. These components are vital to an architecture that needs to support the creation of conditional applications involving multiple cloud services. The last important requirement of this proof of concept is that it needs to be compatible with Jupyter notebook. This provides an extra challenge, as we then need to deal with the ongoing IOloop¹ of the notebook server.
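To make the trigger and action components concrete, the following sketch shows one possible shape for them. It is an assumption-laden illustration rather than the implementation developed in this thesis: the function names, the polling-by-comparison strategy and the mail layout are hypothetical, and the mail is only built, not sent.

```python
from email.message import EmailMessage


def new_commits(known_shas, latest_shas):
    """Trigger: return the commit ids that were not seen on the previous poll."""
    return [sha for sha in latest_shas if sha not in known_shas]


def build_notification(sender, recipient, repository, shas):
    """Action: build the notification mail for the newly seen commits."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"{len(shas)} new commit(s) in {repository}"
    message.set_content("\n".join(shas))
    return message

# Sending is left out here; with the standard library it would go through
# smtplib.SMTP(...), typically starttls(), login(...) and send_message(message).
```

The trigger is a pure comparison between two polls, which keeps the monitoring loop independent of how the commit list is fetched.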

¹ To handle the OAuth process, we need to implement our own Tornado webserver with IOloop. This is a problem as there already is an IOloop that handles the input and output of the notebook server. It is not possible to run the two at the same time, and hacking our webserver handles within the default notebook server is not a desirable solution.


5.2 Proof of concept design and implementation

As shown in figure 5.1, the design of this architecture was split in two parts, the first part (A) being a working proof of concept without Jupyter notebook integration and the second part (B) consisting of adding the integration capabilities for Jupyter notebook.

Figure 5.1: (A) Basic architecture, (B) Architecture with Jupyter notebook integration.

We show how the different components of the architecture interact with each other, the cloud services and with the authentication server in figure 5.2. In the next sections the interactions with the Cookery development layer components will be explained in more detail.

Figure 5.2: Interaction between the components. (A) Basic architecture, (B) Architecture with Jupyter notebook integration.

5.2.1 Proof of concept without Jupyter notebook integration

trigger mechanism and the action mechanism.

Authentication

Because of the reasons concluded in the chapter on authorization, the OAuth approach has been chosen as the solution for the authentication problem. This is not the easiest approach to implement in the Cookery development layer (that would be keys²), but will give the best user experience when using the Cookery language (the highest layer).

The first thing that needs to be taken care of in order to implement the OAuth protocol is to get a client-id³ and a client-secret⁴ from the cloud service provider. The client-id serves as a unique identifier for the Cookery application. This is necessary to provide the user with information about who wants to access their data. The client-secret serves as a key that can prove the identity of the client. When requesting the client-id and client-secret, the cloud service provider also asks for a callback URL. This callback URL will be used by the cloud service provider to notify the client whenever a new permission is granted.

The implementation of the OAuth flow in the Github commit watcher is quite similar to how it is described in figure 4.1 in the chapter on authorization. However, from a technical point of view there are a few things that need some extra explanation.

Figure 5.3: OAuth 2.0 flow part 1/3.

As can be seen in figure 5.3, in transaction (A) the client (the Cookery application) sends the resource owner an authorization request. While this sounds quite fancy, the client does nothing more than send a URL to the user. This URL looks like https://github.com/login/oauth/authorize?client_id=[client_id], where [client_id] is replaced with the received client-id. Next, the URL is opened automatically for the resource owner (the end-user), which is achieved by using the Python webbrowser package⁵. At this point the resource owner can let Github know that they want to give the [client_id] application certain permissions. The user identifies himself to Github by simply logging in with his username/password combination. Now transaction (A) is fully completed and transaction (B) starts. When Github receives the user's request to give permission to the given [client_id], it looks up that client-id, finds the registered callback URL and redirects the user there. Github does not merely redirect the user to the callback URL; it also passes an authorization grant token along.
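Transaction (A) can be sketched in a few lines of Python. The function names are hypothetical and the optional scope parameter is an addition for illustration; the URL and the use of the webbrowser package follow the description above.

```python
import webbrowser
from urllib.parse import urlencode

AUTHORIZE_URL = "https://github.com/login/oauth/authorize"


def authorization_url(client_id, scope=None):
    """Build the URL of transaction (A) for the given client-id."""
    params = {"client_id": client_id}
    if scope:
        params["scope"] = scope  # requested permission level (optional)
    return AUTHORIZE_URL + "?" + urlencode(params)


def request_authorization(client_id, scope=None):
    """Open the authorization page in the resource owner's default browser."""
    return webbrowser.open(authorization_url(client_id, scope))
```

Keeping the URL construction separate from the call to webbrowser.open makes the first part easy to check without opening a browser window.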

The next implementation step for the proof of concept is catching the user's web request to the callback URL. To handle this web request, the proof of concept application starts its own local webserver with a handler for the callback URL. This webserver is implemented using the Python tornado package⁶. With this package the authorization grant web request can be caught and parsed. After the request is parsed, the webserver is no longer needed and is closed.
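The idea of a short-lived callback server can be illustrated as follows. Note that this sketch deliberately swaps tornado for the standard library http.server module so that it runs without extra dependencies; it is not the tornado-based implementation of the proof of concept. The port number, the handler and function names, and the assumption that the grant arrives in a query parameter called code are all illustrative.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def extract_code(path):
    """Return the `code` query parameter of a callback path, or None."""
    values = parse_qs(urlparse(path).query).get("code")
    return values[0] if values else None


class CallbackHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Store the grant on the server object so the caller can read it.
        self.server.auth_code = extract_code(self.path)
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"Authorization received, you can close this tab.")

    def log_message(self, format, *args):
        pass  # keep the console quiet


def wait_for_code(port=8888):
    """Serve exactly one request on localhost, then close the server."""
    server = HTTPServer(("localhost", port), CallbackHandler)
    server.auth_code = None
    try:
        server.handle_request()  # blocks until the redirect arrives
    finally:
        server.server_close()
    return server.auth_code
```

Handling a single request and then closing the socket mirrors the behaviour described above: once the grant is parsed, the server has no further purpose.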

² When users would provide the keys themselves, there is no need for an OAuth flow with multiple requests between the client and the authorization server of the cloud service.

³ As specified by the RFC 6749 OAuth 2.0 protocol a client-id is: a unique string representing the registration information provided by the client.

⁴ As specified by the RFC 6749 OAuth 2.0 protocol a client-secret needs to be used by the client in order to verify his identity when requesting an authentication token.

⁵ Python webbrowser package: https://docs.python.org/2/library/webbrowser.html

⁶ Python tornado package: http://www.tornadoweb.org/en/stable/


In figure 5.4 the client (the Cookery application) has the authorization grant and needs to exchange it for an access token in order to be able to ask for resources. This is where transactions (C) and (D) come in. In transaction (C) the client sends the authorization grant token together with its client-id and client-secret to the authorization server of Github. Github then verifies all the tokens and, if they are correct, sends the access token back to the client. Transactions (C) and (D) are implemented as a request via the Python requests package.
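A possible shape for transactions (C) and (D) is sketched below. The token endpoint URL, the form field names and the Accept header follow Github's public OAuth documentation rather than the thesis text, and the function names and the injectable post argument are hypothetical additions that allow the logic to be exercised without network access.

```python
TOKEN_URL = "https://github.com/login/oauth/access_token"


def build_token_request(code, client_id, client_secret):
    """Return keyword arguments for the POST that trades a grant for a token."""
    return {
        "url": TOKEN_URL,
        "data": {
            "client_id": client_id,
            "client_secret": client_secret,
            "code": code,
        },
        # Ask for a JSON body instead of the default form-encoded reply.
        "headers": {"Accept": "application/json"},
    }


def exchange_code_for_token(code, client_id, client_secret, post=None):
    """Transactions (C) and (D): send the grant, receive the access token."""
    if post is None:
        import requests  # third-party package, imported only when needed
        post = requests.post
    response = post(**build_token_request(code, client_id, client_secret))
    response.raise_for_status()
    return response.json()["access_token"]
```

Because the client-secret is part of this request, the exchange has to run somewhere the secret is available, which becomes relevant once the server component is separated from the application.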

As can be seen in figure 5.5, in transactions (E) and (F) the client can now request a certain resource by sending a request, along with its access token, to the API endpoint for that resource. Using the provided access token, Github verifies the client's permissions and sends back the requested resource if the permissions check out.
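As an illustration of transactions (E) and (F), the sketch below requests the commit list of a repository directly over HTTP. The proof of concept relies on the pyGithub package for this step, so the code here is only a stand-in that makes the role of the access token visible. The endpoint and the Authorization header format are taken from the Github v3 API documentation; the function names and the injectable get argument are hypothetical.

```python
API_ROOT = "https://api.github.com"


def build_commits_request(repository, access_token):
    """Return keyword arguments for a GET on a repository's commit list.

    `repository` is expected in the form "owner/name".
    """
    return {
        "url": "{}/repos/{}/commits".format(API_ROOT, repository),
        "headers": {"Authorization": "token " + access_token},
    }


def latest_commit_sha(repository, access_token, get=None):
    """Transactions (E) and (F): fetch the newest commit identifier."""
    if get is None:
        import requests  # third-party package, imported only when needed
        get = requests.get
    response = get(**build_commits_request(repository, access_token))
    response.raise_for_status()
    commits = response.json()  # the API lists the newest commit first
    return commits[0]["sha"] if commits else None
```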

Trigger mechanism

Now that resources can be accessed, a monitoring system is needed that can initiate a trigger. For the proof of concept application we want to monitor a certain repository for new commits. To monitor these commits, three components are necessary: the name of the repository, the API resource endpoint and a mechanism that can periodically initiate an API endpoint request. The end-user supplies the repository name, which has to be specified in order to run the program. The API endpoints can be found in the Github API documentation and are abstracted in the pyGithub package, which we use in this application⁷. For the periodic request mechanism a recursive function with a timer is used. This function executes the request and then waits for a specified amount of time before executing again. Whenever the request returns a new commit value, the mail function (the action) is triggered.
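One way to realise such a self-rescheduling check is with threading.Timer, as sketched below. The class and parameter names are hypothetical, and the thesis does not state which timer facility the proof of concept uses, so this should be read as one possible arrangement rather than the actual implementation. The fetch and action functions are passed in, which keeps the trigger independent of Github and of email.

```python
import threading


class CommitWatcher:
    """Poll `fetch_latest` and call `on_new_commit` when its value changes."""

    def __init__(self, fetch_latest, on_new_commit, interval=60.0):
        self.fetch_latest = fetch_latest
        self.on_new_commit = on_new_commit
        self.interval = interval  # seconds between two checks
        self.last_seen = None
        self._timer = None

    def check_once(self):
        """Run one check; return True if the action was triggered."""
        latest = self.fetch_latest()
        # The first check only records a baseline and never triggers.
        changed = self.last_seen is not None and latest != self.last_seen
        self.last_seen = latest
        if changed:
            self.on_new_commit(latest)
        return changed

    def start(self):
        """Check now, then schedule this method again after the interval."""
        self.check_once()
        self._timer = threading.Timer(self.interval, self.start)
        self._timer.daemon = True
        self._timer.start()

    def stop(self):
        if self._timer is not None:
            self._timer.cancel()
```

Since only the newest value is compared, several commits arriving within one interval lead to a single call of the action.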

Action mechanism

The triggered action in the proof of concept is an email notification. A working mail function requires an implementation of the Simple Mail Transfer Protocol (SMTP). For this the Python smtplib package⁸ is used, which already provides a full implementation of the mail protocol.
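A notification action built on smtplib could look like the sketch below. The message wording, the function names, the default port and the use of STARTTLS with an optional login are assumptions made for illustration; the thesis only states that smtplib is used.

```python
import smtplib
from email.message import EmailMessage


def build_notification(sender, recipient, repository, sha):
    """Compose the email that reports a new commit."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = "New commit in {}".format(repository)
    message.set_content("Commit {} was pushed to {}.".format(sha, repository))
    return message


def send_notification(message, host, port=587, username=None, password=None):
    """Deliver the message through an SMTP server that supports STARTTLS."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        if username is not None:
            server.login(username, password)
        server.send_message(message)
```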

5.2.2 Adding Jupyter notebook integration

One of the biggest advantages of Jupyter notebook is that cells of code can be executed separately and multiple times. Jupyter allows for this user interaction by handling input and output via the tornado IOLoop. To implement our OAuth flow we need to spawn a new server and stop it after the flow is completed. This is not possible, because a new server would need an IOLoop and only one IOLoop can be running at a time. One solution would be to add a handler to the existing IOLoop. However, this handler cannot be removed afterwards, because removing a handler dynamically is not supported in the tornado package. If the handler cannot be removed, the cell cannot be run twice without crashing the notebook server. So in order to integrate Jupyter notebook, the whole server component of the OAuth flow has to be extracted from the application. This is done by setting up an extra Cookery authentication server. With this change the OAuth diagram changes significantly, as can be seen in figure 5.6.

⁷ Github API documentation: https://developer.github.com/v3/

Figure 5.6: OAuth 2.0 flow with Jupyter integration (Jupyter integration is highlighted in blue).

The Cookery authentication server handles the callback from the Github API. However, when it receives the callback information it also needs to know to which user the information belongs. This is handled in multiple (transaction) steps. First, in transaction (A), the user tells the Cookery authentication server that it wants an ID token, so that it can later ask for the access token connected to that ID token. In transaction (B) the Cookery authentication server responds with an ID token. The authorization request in transaction (C) is almost identical to the corresponding request without the Jupyter integration. The only difference is that it also sends the ID token to the Github server. When Github redirects to the callback URL in transaction (D), it also passes along this extra ID token. This way the Cookery authentication server knows to which ID token it should connect the access token that it will receive through transactions (E) and (F). The client then simply asks the Cookery authentication server for the access token connected to its ID token in transaction (G). Because of the cell-based structure of notebooks, the user can simply postpone executing transaction (G) until after giving permission by providing their credentials to Github.
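The bookkeeping that the Cookery authentication server performs in transactions (A), (B) and (G) can be sketched as a small in-memory store. The class and method names are hypothetical, and a real server would wrap this in HTTP endpoints and persistent storage; the sketch only shows how an ID token ties a later access token back to the client that asked for it.

```python
import secrets


class TokenStore:
    """Map ID tokens to the access tokens obtained on their behalf."""

    def __init__(self):
        self._tokens = {}

    def issue_id_token(self):
        """Transactions (A) and (B): hand out a fresh, unguessable ID token."""
        id_token = secrets.token_urlsafe(16)
        self._tokens[id_token] = None  # no access token known yet
        return id_token

    def attach_access_token(self, id_token, access_token):
        """After (D) to (F): remember the access token for this ID token."""
        if id_token not in self._tokens:
            raise KeyError("unknown ID token")
        self._tokens[id_token] = access_token

    def get_access_token(self, id_token):
        """Transaction (G): return the access token, or None if not ready."""
        return self._tokens.get(id_token)
```

Returning None while the user has not yet granted permission fits the notebook workflow: the cell performing transaction (G) can simply be run again later.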

5.3 Results

This section describes the proof of concept application that was made, as well as how the designed architecture opens the framework up to integrations with more cloud services.


5.3.1 Proof of concept application

The proof of concept application, shown in figure 5.7, implements a Github watcher that takes a repository property (in this case commits) and monitors it for changes. Whenever a change is detected, it triggers an action (an email notification). The proof of concept application also works with Jupyter notebook, which is a web application itself. With this proof of concept application users can easily automate a Github-related notification process.

Access to the repository is achieved through an OAuth authentication flow, which is shown in detail in figure 5.6. This implementation gives the user a convenient and secure authentication process. The monitoring works via periodic calls by a recursive function. This function compares the newest commit with the latest monitored commit. If these two commits differ, an action is triggered. Because the monitoring mechanism is interval-based, two commits can be made within one interval. When this happens only one notification is sent. The chance of this happening can be reduced by decreasing the interval. However, this will increase the server load⁹.

Figure 5.7: Scheme of the proof of concept application.

5.3.2 Architecture for integrations with cloud services

The proof of concept application served as a use case for designing an architecture for the Cookery framework. The goal of this architecture is to give the Cookery framework the ability to connect cloud services.

The architecture that is implemented in the experiment, shown in figure 5.2, adds an authentication component and a monitoring component to the framework. The authentication component is useful as it implements a full OAuth flow. The OAuth protocol is widely adopted by cloud service providers and thus opens the framework up for integration with other cloud services. The monitoring component adds a useful feature to the framework, because it makes it possible to actually realize IFTTT-like applications¹⁰. The architecture is also successfully extended with Jupyter notebook integration, which allows users to build applications with the Cookery framework interactively.


CHAPTER 6

Discussion and conclusion

This chapter first discusses the results of the experiment, then draws conclusions from the research and lastly motivates further research.

6.1 Experiment results in regard to the research questions

The first research question is: how do we make open integration protocols of cloud services available for end-users in a framework that can be used to create cloud applications? This question is approached from a practical point of view, as a proof of concept is implemented that connects multiple cloud services. To make this proof of concept work, three core components are designed: an authentication component, a monitoring/trigger component and a trigger/action component. When we examine these components, we find that the monitoring/trigger and the trigger/action mechanisms are vital for the connection of multiple cloud services. These are the components that enable the Cookery framework to actually connect multiple cloud services together.

The second research question is: how can we use and deal with the authorization protocols attached to these cloud services whilst still providing a good user experience? This question is researched from a theoretical point of view in chapter 4, where different authorization methods are reviewed. The conclusion of chapter 4 is that for our purpose, the integration of cloud service authentication within the Cookery framework, implementation of the OAuth authorization flow is optimal. The question is also examined practically within the experiment. As mentioned in the results, an authentication component is designed that implements an OAuth authorization flow. This component brings user security and provides the user with a convenient experience. Thus, by implementing this OAuth flow as an architecture component we can deal with authorization protocols attached to cloud services whilst still providing a good user experience.

6.2 Interactivity as a result of the Cookery server

One of the great benefits of Jupyter notebook is that it provides an interactive environment for the user. The designed architecture manages to keep this intact, thanks to the extra Cookery authentication server that operates independently. Another advantage of the separate Cookery server is that all information regarding OAuth application tokens (client-ids and client-secrets) is now safely stored in this server. Without this server, the client-ids and client-secrets would have had to be given to the end-user, because they would have had to be packaged in the Cookery kernel.

The Cookery authentication server, however, also comes with an element that is not yet optimal. Storing ID tokens together with access tokens is one of the essential components that make the server work. However, storing these tokens eventually becomes redundant, as the access token is stored with the end-user application anyway. This problem could be solved by deleting the tokens as soon as they are requested by the end-user, but that would come at the expense of the interactivity of Jupyter notebook.

6.3 Conclusion

With cloud service providers offering more services than ever before, the benefits of connecting these services are becoming more evident. The goal of this study was to design a possible architecture that makes connecting cloud service providers more accessible to people without an extensive programming background.

Investigating architectural designs for the framework extension yielded some interesting insights regarding the research questions.

How do we make open integration protocols of cloud services available for end-users in a framework that can be used to create cloud applications?

By abstracting the authorization processes and providing plug-and-play trigger mechanisms, domain-specific languages can be created that non-technical users can use to work with the open integration protocols of cloud service providers.

How can we use and deal with the authorization protocols attached to these cloud services whilst still providing a good user experience?

The research in chapter 4 about authorization concludes that OAuth is the most widely adopted approach when it comes to user experience in authorization. The implementation details of this protocol are fully abstracted from the end-user by the Cookery abstraction layers, and through the integration with Jupyter notebook the user can work in an interactive environment.

6.4 Future work

As mentioned in section 6.2, additional research could improve the architecture further, especially regarding the Cookery authentication server. However, the architecture developed does serve as a basis for the implementation of more cloud service providers. Future research can build on this and can use this architecture to extend the framework with other cloud service integrations such as Twitter, Spotify and Facebook. These extra integrations will greatly increase the usefulness of the Cookery framework, as they give the end-user the ability to easily develop complex applications that connect their most important services.


Bibliography

[1] M. Baranowski, A. Belloum, and M. Bubak. Cookery: A framework for developing cloud applications. In High Performance Computing & Simulation (HPCS), 2015 International Conference on, pages 635–638. IEEE, 2015.

[2] M. Cusumano. Cloud computing and saas as new computing platforms. Communications of the ACM, 53(4):27–29, 2010.

[3] A. Dworak, P. Charrue, F. Ehm, W. Sliwinski, and M. Sobczak. Middleware Trends And Market Leaders 2011. Conf. Proc., C111010(CERN-ATS-2011-196):FRBHMULT05. 4 p, Oct 2011.

[4] S. Farrell. Api keys to the kingdom. IEEE Internet Computing, 13(5), 2009.

[5] R. T. Fielding and R. N. Taylor. Architectural styles and the design of network-based software architectures. University of California, Irvine Doctoral dissertation, 2000.

[6] N. Gold, A. Mohan, C. Knight, and M. Munro. Understanding service-oriented software. IEEE software, 21(2):71–77, 2004.

[7] GraphQL. Serving over http - graphql. http://graphql.org/learn/serving-over-http.

[8] D. Hardt. The oauth 2.0 authorization framework. 2012.

[9] IFTTT. About ifttt. https://ifttt.com/wtf/, 2017. [Online; accessed 30-May-2017].

[10] Project Jupyter. About project jupyter. https://jupyter.org/about.html, 2017. [Online; accessed 30-May-2017].

[11] T. Lodderstedt, M. McGloin, and P. Hunt. Oauth 2.0 threat model and security considerations. 2013.

[12] E. M. Maximilien, A. Ranabahu, and K. Gomadam. An online platform for web apis and service mashups. IEEE Internet Computing, 12(5), 2008.

[13] Microsoft. Getting started with microsoft flow. https://flow.microsoft.com/en-us/documentation/getting-started/, 2017. [Online; accessed 30-May-2017].

[14] F. Pérez and B. E. Granger. Ipython: a system for interactive scientific computing. Computing in Science & Engineering, 9(3), 2007.

[15] M. Ragan-Kelley, F. Perez, B. Granger, T. Kluyver, P. Ivanov, J. Frederic, and M. Bussonnier. The Jupyter/IPython architecture: a unified view of computational research, from interactive exploration to communication and publication. AGU Fall Meeting Abstracts, December 2014.


[17] B. Ur, M. Pak Yong Ho, S. Brawner, J. Lee, S. Mennicken, N. Picard, D. Schulze, and M. L. Littman. Trigger-action programming in the wild: An analysis of 200,000 ifttt recipes. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pages 3227–3231. ACM, 2016.

[18] F. Yang and S. Manoharan. A security analysis of the oauth protocol. In Communications, Computers and Signal Processing (PACRIM), 2013 IEEE Pacific Rim Conference on, pages 271–276. IEEE, 2013.

[19] Zapier. Documentation version 2. https://zapier.com/developer/documentation/v2/#what-is-zapier, 2017. [Online; accessed 30-May-2017].
