

Cookery in AWS Lambda

Timo Dobber

June 8, 2017

Supervisor(s): Adam Belloum, Mikołaj Baranowski

Informatica
University of Amsterdam


Cookery makes developing and connecting cloud applications easier. But most of the people who can develop applications using Cookery still do not have the financial means to realise them. This is where AWS Lambda comes in. The combination of AWS Lambda and Cookery makes it possible for people who do not have any programming experience to program in the cloud at relatively low cost. In this paper, we present a way to combine AWS Lambda and Cookery. We develop two functionalities that can be embedded in Cookery and later be extended when necessary. The result opens up a new field for developers who lack the financial support or the programming experience.


1 Introduction
 1.1 Research question
 1.2 Related work
2 Theoretical background
 2.1 Cloud computing
 2.2 Software-as-a-Service
 2.3 Function-as-a-Service
3 Cookery
4 Amazon Web Services
 4.1 AWS Lambda
 4.2 AWS CloudWatch
 4.3 boto3
5 Implementation
 5.1 Deployment
 5.2 Scheduling
 5.3 Use-cases
6 Results
7 Conclusion and Discussion
 7.1 Conclusion
 7.2 Discussion
 7.3 Future work


1 Introduction

Cloud computing is becoming a more and more prominent factor in everyday life. People use the cloud almost every day, sometimes without even knowing it. The cloud comes in many varieties, such as storage services, computation services, video streaming services and much more. Dropbox [10] and Google Drive [16] are examples of widely used cloud storage services, with Dropbox claiming to have half a billion users in March 2016 [9]. At the same time, more and more people tend to use the cloud for business purposes, since the cloud makes it easier to collaborate across geographically distributed locations. The Harvard Business Review report Cloud Computing Comes of Age [25] states that cloud software greatly reduces implementation time and does not need a big up-front investment. It also states that a cloud provider could have an application up and running in five weeks, contrary to the 18 months it would take according to the IT business. On the other hand, the Harvard Business Review also shows that the security of these cloud services is the biggest barrier to adoption.

At the same time, programming in the cloud has become more difficult and complex, due to the many programming languages available and the extensive documentation that accompanies their APIs (Application Programming Interfaces). In order to make programming in the cloud easier, Cookery has been developed in the context of PhD research work at the UvA [6]. Cookery enables developers to combine multiple cloud applications in an easy way. On top of that, Cookery uses its own Domain Specific Language (DSL) to make it more accessible for people without any programming experience.

Many small start-ups cannot afford their own server; hence the cloud provides a solution. They can buy some computing time or storage somewhere in the cloud, which is initially cheaper than buying a server. Running a service in the cloud, however, comes with its own problem: you need to build a software infrastructure to handle all the requests your application will receive. This is a long and tedious job that not everybody can do, certainly not all small start-ups. This problem is solved by the introduction of Function-as-a-Service (FaaS), also known as serverless computing.

1.1 Research question

Function-as-a-Service makes it possible for smaller businesses to create an application in the cloud. Cookery, on the other hand, makes cloud programming easier for people who do not have any programming experience. These two services combined would be a powerful tool for start-ups that lack the financial support or programming knowledge to create and manage their own applications. This leads us to the following research question:

How can we develop a framework based on Cookery and running on AWS Lambda?

With this research question, we aim to develop a framework that combines the advantages of serverless computing with the advantages of Cookery, such as its ease of use for end users. This can be achieved by extending the Cookery toolkit with new implementations that connect with AWS Lambda [4].


This research question also brings some challenges with it. Cookery is a promising framework with great opportunities to make programming easier, but it is new and its toolkit does not contain much beyond some basic features, such as creating a new Cookery project, which instantiates a directory with the necessary files. So every new functionality has to be developed from scratch, while keeping in mind that it may need to be extended later. AWS Lambda also has some limitations, like its rate limits, the maximum time a function may run and the maximum allocation space. One also needs to know how AWS Lambda handles errors and failures. Another challenge is security, as mentioned earlier in the introduction: how can we guarantee security on AWS Lambda and on the cloud service providers we reach out to?

1.2 Related work

Although the principle of Function-as-a-Service and AWS Lambda have existed for a while, there is not much research on rewriting applications to run on AWS Lambda.

Spillner and Dorodko [26] have researched a tool that analyses Java code and transforms it into AWS Lambda functions. The results showed that simple and heterogeneous code can be transformed without problems, which suggests that Cookery will be able to run functions on AWS Lambda.

In Serverless Computation with OpenLambda Hendrickson et al. [18] present a new platform for building applications in the serverless computation model. They also describe some key aspects of serverless computation and show some performances of AWS Lambda.

Other studies report findings on creating services using serverless computation. Yan et al. build a chatbot using IBM OpenWhisk [19], an alternative to AWS Lambda [29], while Kiran et al. use a Lambda service to construct a data-handling backend with high throughput but low costs [22]. Malawski presents in his study an approach for combining scientific workflows with serverless computation, considering multiple architectural designs [23]. The results showed that a prototype scientific workflow on Google Cloud Functions [15], another alternative to AWS Lambda, introduced no significant delays. The paper also states that the serverless computation model can bring complications for bigger workflows that run for more than 5 minutes, and that preparing more complex applications to execute on a serverless computation service can be an issue.

In chapter 2 we discuss the theoretical background, which is focused on cloud computing and its different services. An in-depth view of Cookery is given in chapter 3. In chapter 4 we introduce AWS and the different services that are needed for this project. Our implementation is described in detail in chapter 5. The results are presented and discussed in chapter 6, while in chapter 7 we draw conclusions, discuss our project and consider what it enables for other people.


2 Theoretical background

2.1 Cloud computing

In A View of Cloud Computing [2] it is stated that "cloud computing refers to both the applications delivered as services over the Internet and the hardware and systems software in the data-centres that provide those services", where the services are referred to as Software-as-a-Service. The cloud is then the data-centre hardware and software, and the service being sold is utility computing. The authors conclude that cloud computing is the sum of Software-as-a-Service and utility computing.

While this definition is a bit vague, it does show what cloud computing is. Another definition was given by Wang et al. in the early days of cloud computing: "A computing Cloud is a set of network enabled services, providing scalable, QoS (Quality of Service) guaranteed, normally personalized, inexpensive computing infrastructures on demand, which could be accessed in a simple and pervasive way" [28].

Cloud computing is thus a collection of multiple services offered via the Internet. Cloud service offerings are divided into three main categories: SaaS (Software-as-a-Service), IaaS (Infrastructure-as-a-Service) and PaaS (Platform-as-a-Service). In the last few years a new kind of service has been developed, which came to be known as Function-as-a-Service or FaaS. One of the reasons cloud computing has become so big is that the user does not need to download any software, but instead uses it via the Internet. This makes cloud computing extremely scalable and flexible, because it takes away the concerns of geographical location, hardware performance and software configuration [28].

2.2 Software-as-a-Service

Software-as-a-Service (SaaS) is what the name implies: software offered online as a service, with or without a subscription or on-demand payment. A user does not have to worry about downloading, installing, setting up, running and updating the program, because the service provider does that. An example is Netflix, a complete video streaming service offered with a subscription and accessible via a browser. SaaS can thus be seen as the application layer of cloud computing. While this is mostly beneficial for the end user, SaaS is also a solution for the IT business, as mentioned in the introduction.

2.3 Function-as-a-Service

The more recently established Function-as-a-Service (FaaS) operates at a smaller granularity than Software-as-a-Service. Again, it is exactly what the name says: with FaaS a user only runs a function on an external server. This is also called serverless computing, because you do not need a server for your application anymore; you simply have the function running on a server in the cloud. A user of the application provides the input and the function returns the output. This makes it possible for smaller businesses to develop their own applications without buying servers. Developers also do not need a system administrator to maintain the servers, they do not need to write a complete infrastructure that can scale with the demands of the application, and they do not need to handle all the administration. In short, applications can scale up rapidly without needing to start new servers [18].

In the article Serverless Computation with OpenLambda [18] the authors show that with FaaS we have reached a new stage in the sharing model, which is shown in figure 2.1. As seen in the figure, we have progressed from sharing only the hardware, as is done with virtual machines like VMware, via sharing the hardware and the operating system, as seen with containers like Docker, to sharing the runtime of a system.

The handler is started in a container, which can only be used by the handler itself. Although multiple containers run in the same runtime, communication between containers is not possible; if it were, other functions would be able to intercept your functions and gain access to valuable information.

On the other hand, a user will have to recognise some places where performance issues can arise. The readiness latency, the time it takes to start, restart or unpause a container, can have consequences for the overall performance [18]. There are more such factors, like the number of containers per unit of memory (container density), package support, cookies and sessions.

A study has compared the cost, performance and response time of different implementation architectures: a monolithic architecture, a microservice architecture operated by the cloud customer and a microservice architecture operated by AWS Lambda. With the microservice architecture, a developer tries to develop an application as a suite of small services [11], which all run in their own process. The results of this study show that a microservice operated by AWS Lambda is up to 77.08% cheaper per million requests than the other two methods, while the response times are faster than the customer-operated microservice architecture and about the same as the monolithic architecture [27].

There are multiple online platforms that offer FaaS. Some examples are Google Cloud Functions [15], Microsoft Azure [24], AWS Lambda and IBM OpenWhisk [19]. AWS Lambda was chosen for this project because it was the only one to offer the service in combination with Python.


3 Cookery

Baranowski, in the context of his PhD work, developed Cookery, a framework that makes programming with other cloud applications a lot easier [6]. Cookery makes it possible for people to combine cloud services using a high-level language. This high-level language, or Cookery language, has an English-like syntax, which makes it possible for people without any programming experience to build applications and understand what is going on. An example of the Cookery language is: A = split File text_file.txt.

When we take a closer look at Cookery, we see that it is composed of three layers, shown in figure 3.1. The first layer is used by a user to develop Cookery applications using the Cookery language. In this layer, activities can be defined and modified. The second layer is for developers and, unlike the previous layer, makes use of the Cookery Domain Specific Language (DSL). This layer is used for defining and modifying actions, subjects and conditions. The third and last layer is the Cookery back-end, also intended for developers. Here developers can implement protocols, which are for the activities and data, and communication with execution environments.

As mentioned above, the Cookery language has a syntax based on English, which allows people not familiar with programming to understand what is happening. The sentences that one can make are called activities. These activities consist of other elements: variables, actions, subjects and conditions, which all have their own role within the Cookery language. Variables are optional; they assign the result of an activity to a label which can later be used as a reference, thus representing the data flows. An action refers to its implementation in the Cookery DSL and represents remote operations. Subjects represent the input or output of an application, also known as remote data. They can, for example, be used for retrieving data from a cloud service. Both actions and subjects can be followed by arguments, and both implementations are divided across all three layers. Conditions are used with keywords, like if or with, to separate them from the rest of an activity. They are routines defined in the Cookery DSL and are meant to transform data before it is passed to an action. This data can be retrieved in different ways, including from a remote location in a subject or from a variable.

The middle layer provides the Cookery DSL, which allows users to define actions, subjects and conditions (Cookery elements). These elements all consist of a name, a regular expression and a Python routine. Cookery comes with a toolkit to make things easy. The toolkit enables a user to execute Cookery applications, generate new projects and evaluate expressions.


Figure 3.1: User roles and layers in Cookery [6]. The first layer is for developing applications with the Cookery language, the second layer is for defining actions, subjects and conditions using the Cookery DSL, and the third layer is the back-end where protocols are implemented.


4 Amazon Web Services

Amazon responded early to the rise of cloud computing. In 2006, it started Amazon Web Services, offering IT infrastructure services to businesses in the form of web services [1]. AWS currently offers more than 70 services across many fields in the IT business, like artificial intelligence, storage, security, computing and many more. The possibility to connect most of these services makes AWS a very robust platform. Furthermore, AWS uses a pay-per-use billing model, which makes the platform very interesting from a financial point of view. The services of interest for this project are AWS Lambda and AWS CloudWatch, which are explained in more detail below, and AWS S3.

4.1 AWS Lambda

AWS Lambda is Amazon's service for serverless computation. It can be accessed via Amazon's web console, where all other AWS services can also be found, via the AWS Command Line Interface (CLI), or via the API using the boto3 package [7]. The latter allows a programmer to connect with AWS services from his own programs. Functions on AWS Lambda can be triggered in multiple ways. One way is via a website connecting with the API, which is called AWS API Gateway. Functions can also be invoked manually using the aforementioned boto3, or triggered by so-called events, like a file being uploaded to AWS S3 or a schedule that is set in AWS CloudWatch. Each invoked function spawns a container to be executed in. While this helps against security issues like the hijacking of someone else's functions, it also blocks the possibility of letting your functions communicate. The functions are also stateless, which means that results need to be returned or saved to a database in order to continue working with them. This is not a particular issue for smaller applications, but for applications that run longer than 5 minutes, the maximum time a function can run on AWS Lambda, this can be troublesome. Another thing that can be considered troublesome is the fact that only Python native libraries and boto3 can be imported on AWS Lambda. Every other library that is needed has to be included in the deployment package.
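As a sketch of the manual invocation route mentioned above, the following helper builds the keyword arguments for boto3's invoke call. The function name and payload are hypothetical, and the actual call (shown in comments) would require valid AWS credentials:

```python
import json

def build_invoke_kwargs(function_name, payload):
    """Build the keyword arguments for the Lambda client's invoke() call."""
    return {
        "FunctionName": function_name,
        "InvocationType": "RequestResponse",  # wait for the function's result
        "Payload": json.dumps(payload).encode("utf-8"),
    }

kwargs = build_invoke_kwargs("github-monitor", {"repository": "example-repo"})

# With valid credentials, the call itself would be:
# import boto3
# client = boto3.client("lambda")
# response = client.invoke(**kwargs)
# result = json.loads(response["Payload"].read())
```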

4.2 AWS CloudWatch

AWS CloudWatch monitors operational and performance metrics of AWS cloud services, including AWS Lambda. It can be used to read the logs of all functions and see whether functions terminated successfully or not. It can also be used to create rules, for periodic invocations of Lambda functions or for invocations when a certain event pattern is matched, and to set alarms, for example to get an email whenever a function runs for more than a certain number of seconds. When a rule is triggered, it creates an event that invokes a Lambda function. We will use AWS CloudWatch to create rules so we can schedule our Lambda functions. This way, we can automatically run Lambda functions every day or every 2 hours, for example. It is also possible to run a function every 5 minutes, which allows us to keep functions running the whole day.
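A periodic rule like the ones described above can be expressed with CloudWatch's rate() schedule syntax. The sketch below builds the parameters for such a rule; the rule-naming convention is our own assumption, and the put_rule call itself (shown in a comment) would go through a boto3 events client:

```python
def build_schedule_rule(number, unit):
    """Build the parameters for a periodic CloudWatch Events rule.

    `unit` is "minutes", "hours" or "days"; CloudWatch's rate() syntax
    uses the singular form when the number is 1.
    """
    singular = unit.rstrip("s")
    expr_unit = singular if number == 1 else singular + "s"
    return {
        "Name": f"{number}{expr_unit}",                  # e.g. "5minutes"
        "ScheduleExpression": f"rate({number} {expr_unit})",
        "State": "ENABLED",
    }

rule = build_schedule_rule(5, "minutes")
# With boto3: events = boto3.client("events"); events.put_rule(**rule)
```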


4.3 boto3

An extra Python library is needed to connect with AWS from Python scripts: boto3, also known as the API (Application Programming Interface) for AWS. We use this instead of the aforementioned console, which can be considered the front-end of the API. In order to connect to AWS services, you need to create a client or resource for a specific cloud service, like CloudWatch, CloudWatch Events or Lambda, as seen in listing 1. Resources represent object-oriented interfaces to AWS and provide a higher level of abstraction than the low-level clients, whose methods map close to 1:1 with the service APIs [8]. For each cloud service there are specific functions that can be invoked via the corresponding client; for example, all functions related to Lambda can be called via the Lambda client.

import boto3

client = boto3.client("lambda")
resource = boto3.resource("s3")

client = boto3.client(
    "lambda",
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY,
    region_name=REGION
)

Listing 1: Creating a boto3 client or resource for an AWS service, with and without explicit credentials.


5 Implementation

As mentioned before, we will try to run functions on AWS Lambda by creating a framework based on Cookery and extending its toolkit. The functionalities we considered for this project are the deployment and the scheduling of a Lambda function. These functionalities are first implemented in Python without Cookery to simplify the process. After that, we create a use-case that makes use of both functionalities and can be useful for future work. Everything mentioned here is implemented in Python 3, just like Cookery, so it can be effortlessly integrated and run with Cookery.

5.1 Deployment

In lambda_deploy.py you can find every method that is needed to deploy a function. All functions that make use of the boto3 library are found in the file aws_connect.py, so they are separated from the rest. In order to access the services that AWS provides via the API, a user needs an access/secret key pair, which can be acquired from the web console or the AWS CLI. These can be configured on the system itself or simply given as variables when creating the client, as seen in listing 1. Users need to think about some parameters to create a Lambda function, as seen in the create_function(...) function of the Lambda client in listing 2.

A user will have to provide a fitting function name, with which the function is recognizable. AWS Lambda stores the function under this name. Variables like which runtime to use, how much time before the function times out, how much memory to use and the description of the function are straightforward. The role variable is a role for AWS Lambda to assume when it executes a function and looks something like this: arn:aws:iam::123456789012:role/service-role/testRole. A role is used to attach policies to, which are simply permissions for invoking a Lambda function, getting full or read-only access to CloudWatch, and so on. Roles and policies are essentially the security system that AWS uses. A user has to explicitly add policies to roles, but can also easily delete them. The format of a role is the ARN (Amazon Resource Name) format, which is used to identify all resources available in AWS. The handler variable is a concatenation of two names, the name of the file where the handler is located and the handler name: file_name.handler_function_name. The handler name is the name of the method with which AWS Lambda invokes the function and it has the following signature: def handler_name(event, context). The function can also accept environment variables, which will be the same for every invocation of the function. Finally, the code of the function has to be supplied, which can be done in two ways. One way is to pass the contents of the zip file that contains the function. This method did not work for us, due to troubles with encoding and supplying the zip file. The other way is to zip the file and upload it to AWS S3 (Simple Storage Service), a cloud storage service, with a given bucket name and a key to find the zip file within that bucket. The buckets that AWS S3 uses are basically directories to store files. The zip file is deleted after it is sent.

lambda_deploy.py has to be invoked with some parameters. The directory to deploy, which must be in the same directory as lambda_deploy.py, holds the function that a user wants to deploy, with the file containing the handler in the root of that directory. Further parameters are the function name on AWS Lambda, the handler file name, the handler method name and the optional environment variables.
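The packaging steps described above can be sketched as follows. The helper names are our own; a real deployment would additionally upload the archive to S3 and call create_function as in listing 2:

```python
import os
import zipfile

def zip_directory(directory, zip_path):
    """Zip the contents of `directory`, keeping paths relative to its root
    so the handler file ends up at the root of the archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _, files in os.walk(directory):
            for name in files:
                full = os.path.join(root, name)
                zf.write(full, os.path.relpath(full, directory))

def handler_string(file_name, handler_name):
    """Build the Handler parameter, e.g. 'service.handler'."""
    return file_name + "." + handler_name
```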

response = client.create_function(
    FunctionName="string",
    Runtime="python3.6",
    Role="string",
    Handler="string",
    Code={
        # "ZipFile": b"bytes",
        "S3Bucket": "string",
        "S3Key": "string"
    },
    Description="Description",
    Timeout=300,
    MemorySize=128,
    Environment={
        "Variables": {
            "key": "value"
        }
    }
)

Listing 2: An example of the create_function(...) function of AWS Lambda from boto3, with the variables that we are interested in.

5.2 Scheduling

As mentioned above, we will use AWS CloudWatch to schedule AWS Lambda functions by creating rules. The scheduling function needs fewer parameters than the deployment function, which makes it easier to implement. All functions that are needed for the scheduling and that communicate with AWS using boto3 can also be found in aws_connect.py. A user needs to specify the period with which the function should be invoked. This period consists of a number and a period specifier, like minutes, hours or days. We have put up some constraints to make it easier to work with a period: a user can only specify periods of 1-59 minutes, 1-23 hours and 1-30 days. This makes handling rules easier, since we do not have to handle multiplications like 300 minutes, and on top of that we can easily check whether a rule already exists. We also need a name for the rule; to keep things simple we just use the concatenation of the period, like 2hours or 1day. The description of the rule is a straightforward reflection of the rule. lambda_schedule.py first checks the given period and, with the rule name, checks whether the rule already exists or not. When it does, we just add the Lambda function as a target to the existing rule.
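The period constraints above can be captured in a small validation helper. This is a sketch of the idea rather than the thesis code, and the function name is our own:

```python
# Allowed ranges per period specifier, as described in the text.
PERIOD_LIMITS = {"minutes": (1, 59), "hours": (1, 23), "days": (1, 30)}

def validate_period(number, unit):
    """Check a scheduling period and return the rule name, e.g. '2hours'.

    Raises ValueError for unknown units or out-of-range numbers, so
    multiplications like 300 minutes never reach CloudWatch.
    """
    if unit not in PERIOD_LIMITS:
        raise ValueError("unknown period specifier: " + unit)
    low, high = PERIOD_LIMITS[unit]
    if not low <= number <= high:
        raise ValueError(f"{unit} must be between {low} and {high}")
    # Use the singular form for 1, matching names like '1day'.
    name_unit = unit.rstrip("s") if number == 1 else unit
    return f"{number}{name_unit}"
```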


5.3 Use-cases

To show what this combination is capable of and the field we are enabling, we will create a use-case. A use-case can be anything and can involve multiple cloud services. An example is posting a picture on Instagram and immediately saving it to Dropbox, which is one of the possibilities offered by IFTTT (If This Then That) [20], or getting the last-minute news from a website. But a use-case in AWS Lambda can be bigger than this, because AWS Lambda has more computational power at its disposal. So, for this project we will create a more complicated use-case: a GitHub [13] monitor. This program monitors a given repository for commits and notifies people by means of an email that shows all the changes made in that commit. Another reason to choose this use-case is that we can test it whenever we want and the testing is easy. With other use-cases we would depend on, for example, news being released, or we would have to build our own website. The GitHub monitor makes use of the GitHub REST API [14] to get all the information, while being authenticated with a GitHub personal access token. The scopes of this token can be configured by the user, like giving access to all private repositories but not being able to delete them or to change user data. Making authenticated requests gives us the opportunity to send 5000 requests per hour to the API, while unauthenticated requests are limited to 60 per hour. The program first checks if the authenticated user has access to the given repository; for now, the program only checks the user's own repositories. This is done by sending a request to the REST API with the Python library urllib. The request also needs to authenticate to GitHub by adding a header with the personal access token, which can be seen in listing 3.

def make_request(url):
    request = urllib.request.Request(url)
    request.add_header("Authorization", "token " + github_token)
    response = urllib.request.urlopen(request)
    data = response.read().decode("utf-8")

    return data

Listing 3: The function to make requests to GitHub with authentication, using urllib.

When we send a GET request to url = https://api.github.com/user/repos, we get a string containing JSON1 as response, which can easily be deserialized to a Python object with the accessible repositories. This way we can easily examine every repository and check if the given one is among them. If we find the right one, we need to get all the commits of that repository. By sending another, slightly more specific request, we get a list of all the commits, sorted by time, with the last commit first.

url = "https://api.github.com/repos/" + repo["owner"]["login"] + "/" + repo["name"] + "/commits"

This list gives us a lot of information about each commit, but not the information we want: the changes that were made in that commit. It does, however, contain the commit time, which we can compare with a pre-set time that depends on how big the request interval is. A commit is considered new whenever its commit time is later than that pre-set time, which means we would like to get the changes that were made. To get that extra information, we have to send yet another request with the unique sha of the commit:

url = "https://api.github.com/repos/" + repo["owner"]["login"] + "/" + repo["name"] + "/commits/" + data[0]["sha"]
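The freshness check described above can be sketched as follows, using GitHub's ISO-8601 commit timestamps; the function name and threshold logic are our own rendering of the pre-set time idea:

```python
from datetime import datetime, timedelta, timezone

def is_new_commit(commit_time_iso, interval_seconds, now=None):
    """Return True if the commit is newer than `interval_seconds` ago.

    `commit_time_iso` uses GitHub's timestamp format, e.g. '2017-06-08T12:00:00Z'.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    commit_time = datetime.strptime(
        commit_time_iso, "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return commit_time > now - timedelta(seconds=interval_seconds)
```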

1 JSON (JavaScript Object Notation) is used to exchange and store data between a server and a browser as text, written with JavaScript object notation [21].


In the response, we can see how many additions and deletions were made, and what the changes are in every file. This information is then put into the body of an email, which is sent with Gmail [17] using the Python SMTP library. The authentication with Gmail is done with an application key, just like with GitHub. The email will look like figure 5.1, with the complete link to the commit on GitHub at the bottom.

As mentioned before, Lambda functions are stateless and terminate after 5 minutes. This means that we cannot use any variables after termination, except when we save them to a database or communicate them back. The GitHub monitor therefore works in the simplest possible way. We have the same problem with passing the repository to check: this information is lost after termination and would need to be given every time the function is invoked. This is where the environment variables come in. They can be given to a Lambda function as key-value pairs and they stay the same for every invocation once deployed. This also makes it possible to reuse the GitHub monitor to check another repository, without changing the code.
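Composing such a notification email with the standard library could look like the sketch below. The addresses and commit statistics are placeholders, and the actual smtplib send (shown in comments) would need a Gmail application key:

```python
from email.mime.text import MIMEText

def build_commit_email(repo_name, sha, additions, deletions, link):
    """Build the notification email for a new commit."""
    body = (
        f"New commit in {repo_name}: {sha}\n"
        f"{additions} additions, {deletions} deletions\n\n"
        f"Full commit: {link}\n"
    )
    msg = MIMEText(body)
    msg["Subject"] = f"New commit in {repo_name}"
    msg["From"] = "monitor@example.com"    # placeholder address
    msg["To"] = "user@example.com"         # placeholder address
    return msg

# Sending would then be (requires a Gmail application key):
# import smtplib
# with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
#     server.login("monitor@example.com", APPLICATION_KEY)
#     server.send_message(msg)
```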

Figure 5.1: An example of the email the user gets when a new commit is found. The complete link to the commit on GitHub is replaced.


6 Results

We have developed the deployment and scheduling functionalities, and they work as they should. A deployed Lambda function immediately pops up in the web console and can instantly be invoked, as can be seen in figure 6.1. The same holds for the scheduling of a Lambda function: if needed, the rule is directly created and the function is made a target of the rule. The rule immediately starts sending its periodically triggered events to invoke the targeted functions, as seen in figure 6.2.

Figure 6.1: The AWS web console showing the available functions in AWS Lambda.

Figure 6.2: The AWS web console showing the rule in AWS CloudWatch which triggers every 5 minutes, with the targeted AWS Lambda functions.

The intention is to run the GitHub monitor 24 hours a day, without any other constraints, which means that we can create different schedules and see which one performs best. At first, we let the GitHub monitor request the commits every second and check whether there has been a commit within the last 1.5 seconds. This gives the most real-time notifications, but it also makes it easy to miss a commit, especially considering the time it takes to send an email, which is about 7 to 8 seconds. On top of that, the commit time is not the time at which the commit was pushed. This means that a commit can be made well before the time we check for new commits, or that multiple commits have been made in the meantime, of which the monitor can only find the last. When we change the request interval to 59 seconds and the window for a last commit to 70 seconds, we compensate for the emailing latency and the occasional missed commit. The interval is set to 59 seconds so that the sleep and the request together do not exceed 60 seconds. On the other hand, this method makes the monitor less real-time. Another problem bound to these two intervals lies with the termination of the function. AWS Lambda states that the maximum run time of a function is 5 minutes, and also that, depending on the event source, it may retry a failed Lambda function [12]. This means that we cannot simply enter an infinite while loop; we have to terminate the function properly. We do this by letting the program sleep for 59 seconds before making another request and counting the requests we have made. We end the program after 5 requests, which adds up to roughly the required 5 minutes. Running the GitHub monitor with a request interval of 1 minute gave interesting results. Most of the time the function ran correctly, as seen in figure 6.3, but in some invocations AWS Lambda tended to get "stuck" during run time, resulting in a failed function, as seen in figure 6.4.
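The terminate-after-five-requests pattern described above can be sketched as follows. The request itself is abstracted into a callback, which is our own simplification of the monitor's GitHub call.

```python
import time

def poll_commits(check_commits, interval=59, max_requests=5):
    # Make a fixed number of requests, sleeping `interval` seconds
    # between them, so the function returns on its own before hitting
    # Lambda's 5-minute execution limit instead of being cut off
    # mid-loop (and possibly retried, depending on the event source).
    for i in range(max_requests):
        check_commits()
        if i < max_requests - 1:
            time.sleep(interval)
```

With the defaults, four 59-second sleeps plus five requests keep the total run time just under the 5-minute limit.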

Instead of letting the program sleep for the hard-coded 59 seconds as in the example above, we can use the get_remaining_time_in_millis() method of the context object that the handler method receives as a parameter. This is the best way to keep track of the remaining time, because it compensates for the email latency. We set the maximum running time of the Lambda function to 5 minutes and still check for new commits roughly every minute. After every request, we divide the remaining time the function has left by the number of checks still to make, called count, to obtain the next sleep interval, and decrease count by one. We end the function after five requests. This way, the function handles dynamic latencies better, such as request latencies or the mentioned email latency, and can finish in roughly four minutes, which can potentially save up to 288 computing minutes per day.
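A sketch of this remaining-time approach is shown below, with a stub standing in for the context object that AWS Lambda passes to the handler; the helper name and the safety reserve are assumptions made for the example.

```python
import time

def adaptive_poll(context, check_commits, count=5, reserve_ms=5000):
    # Spread the remaining checks evenly over the time the function has
    # left (minus a small safety reserve), using the Lambda context's
    # get_remaining_time_in_millis() instead of a hard-coded 59-second
    # sleep. Dynamic latencies (requests, email) shrink the budget and
    # the sleep interval adapts automatically.
    while count > 0:
        check_commits()
        count -= 1
        if count == 0:
            break
        budget_ms = context.get_remaining_time_in_millis() - reserve_ms
        time.sleep(max(budget_ms / count, 0) / 1000.0)
```

If the checks run ahead of schedule, the sleeps grow slightly; if a slow email eats into the budget, they shrink, so the function still finishes before the configured timeout.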

We can also run the program once every 5 minutes and check if there has been a commit in the last 5 minutes. This is the safest method considering the latencies, but it is also the least real-time.

Figure 6.3: GitHub monitor running correctly with a request interval of 59 seconds.


Conclusion and Discussion

7.1 Conclusion

Our research shows that it is possible to get a framework based on Cookery working on AWS Lambda, as proposed by our research question: how can we develop a framework based on Cookery and running on AWS Lambda? We can deploy functions on AWS Lambda in an easy way by providing parameters and eventually using the Cookery high-level language. These functions can then be scheduled in the same way using the scheduling function, which allows us to run functions on AWS Lambda without invoking them manually. The performance of AWS Lambda is good with the tested use-case, as is the time it takes to deploy and schedule functions. The interesting discovery of the function that got "stuck" happened only once and shows that such errors can occur even on AWS Lambda. The error is likely a crash of the container or server and could happen on any other server as well. Although AWS Lambda may retry a function if it fails, developers still need to be aware of this possibility while developing an application.

The use-case shows one of the many possibilities AWS Lambda offers. The ability to deploy and schedule a function, without having to invoke it manually, can be very helpful. AWS Lambda is also very versatile and can be used in combination with other cloud providers and AWS services. The deployed functions can also serve as a back-end for an application. The developed functionalities make it easier to deploy applications using AWS Lambda, without creating an infrastructure or being dependent on funds.

The security of AWS Lambda lies entirely in the user's own hands. A user has to add policies to roles explicitly, for example to invoke a Lambda function or to add a rule to CloudWatch. The use of the AWS access and secret key pair and the application keys of GitHub and Gmail makes authorization easier: the keys can be managed from the source itself. These keys are relatively safe compared to logins and passwords, considering that they are easily revoked when security is breached. On top of that, the GitHub key can have different scopes, managed by the user, which narrow down the functionalities it grants.

This project thus opens a field for developers who need more computational power but lack the financial support or the programming experience, enabling them to develop and deploy applications in an easy way. With this project we have created an elementary building block for the Cookery ecosystem. The code of this project will be added to Cookery.

7.2 Discussion

There are some things that need to be noted. The event and context parameters that are passed to the handler method are useful, but they are configured by AWS Lambda. This means that applications which use these parameters have to be tested on AWS Lambda, while applications without them can be tested locally. This does not mean that applications running locally can immediately run on AWS Lambda, because the configured execution time and memory still need to be considered. Another thing to note is that AWS Lambda functions can only use Python native libraries and boto3; any other library needs to be added to the deployment package. This can be an issue for applications that use many or large libraries.

FaaS and SaaS are quite different from a developer's point of view. With SaaS, the whole application has to be built from the ground up, including infrastructure, error handling and everything else. FaaS is much more modular, because it only uses functions that can run separately or together form a full back-end. So while SaaS suits full-stack applications, including front- and back-end, FaaS is more basic and can be used for the back-end of applications and to simply run functions. FaaS is thus more easily accessible than SaaS, especially considering that no infrastructure has to be written for FaaS. We can say that FaaS and SaaS serve different purposes, which means that FaaS will not replace SaaS.

7.3 Future work

In the future, Cookery can be extended with more services of AWS and other cloud providers, to create a broader framework and to enable more developers to create applications. An interesting extension would, for example, use AWS DynamoDB [3] or AWS RDS (Relational Database Service) [5] to make it easier to create and manage databases with Cookery. When we combine databases with the functions of AWS Lambda, we can create more complicated applications and deploy them using Cookery. This also means that the Cookery toolkit can be extended with more services and functionalities in future projects.


[1] About AWS. URL: https://aws.amazon.com/about-aws/ (visited on 05/10/2017).

[2] Michael Armbrust et al. "A View of Cloud Computing". In: Communications of the ACM 53.4 (2010), pp. 50–58.

[3] AWS DynamoDB. URL: https://aws.amazon.com/dynamodb/.

[4] AWS Lambda. URL: https://aws.amazon.com/lambda/.

[5] AWS RDS. URL: https://aws.amazon.com/rds/.

[6] Mikolaj Baranowski, Adam Belloum, and Marian Bubak. "Cookery: a Framework for Developing Cloud Applications". In: 2015 International Conference on High Performance Computing Simulation (HPCS). 2015, pp. 635–638.

[7] Boto3. URL: https://boto3.readthedocs.io.

[8] Boto3 Low-level Clients. URL: http://boto3.readthedocs.io/en/latest/guide/clients.html (visited on 04/20/2017).

[9] Celebrating half a billion users. URL: https://blogs.dropbox.com/dropbox/2016/03/500-million/ (visited on 05/21/2017).

[10] Dropbox. URL: https://www.dropbox.com/.

[11] Martin Fowler and James Lewis. Microservices: a definition of this new architectural term. 2014. URL: https://martinfowler.com/articles/microservices.html (visited on 05/18/2017).

[12] Function Errors (Python). URL: http://docs.aws.amazon.com/lambda/latest/dg/python-exceptions.html (visited on 05/28/2017).

[13] GitHub. URL: https://github.com.

[14] GitHub REST API. URL: https://developer.github.com/v3/ (visited on 05/25/2017).

[15] Google Cloud Functions. URL: https://cloud.google.com/functions/.

[16] Google Drive. URL: https://www.google.com/drive/.

[17] Google Mail. URL: https://mail.google.com/.

[18] Scott Hendrickson et al. "Serverless Computation with OpenLambda". In: HotCloud'16: Proceedings of the 8th USENIX Conference on Hot Topics in Cloud Computing. 2016, pp. 33–39.

[19] IBM OpenWhisk. URL: https://developer.ibm.com/openwhisk/.

[20] If This Then That. URL: http://ifttt.com.

[21] JSON. URL: https://www.w3schools.com/js/js_json_intro.asp.

[22] Mariam Kiran et al. "Lambda Architecture for Cost-effective Batch and Speed Big Data processing". In: BIG DATA '15: Proceedings of the 2015 IEEE International Conference on Big Data (Big Data). 2015, pp. 2785–2792.

[23] Maciej Malawski. "Towards Serverless Execution of Scientific Workflows: HyperFlow Case Study". In: Proceedings of the 11th Workshop on Workflows in Support of Large-Scale Science. 2016, pp. 25–33.

[24] Microsoft Azure. URL: https://azure.microsoft.com.

[25] Harvard Business Review Analytic Services. Cloud Computing Comes of Age. 2015.

[26] Josef Spillner and Serhii Dorodko. "Java Code Analysis and Transformation into AWS Lambda Functions". In: CoRR abs/1702.05510 (2017).

[27] Mario Villamizar et al. "Infrastructure Cost Comparison of Running Web Applications in the Cloud using AWS Lambda and Monolithic and Microservice Architectures". In: 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). 2016, pp. 179–182.

[28] Lizhe Wang et al. "Cloud Computing: a Perspective Study". In: New Generation Computing 28.2 (2010), pp. 137–146.

[29] Mengting Yan et al. "Building a Chatbot with Serverless Computing". In: MOTA '16: Proceedings of the 1st International Workshop on Mashups of Things and APIs. 2016, 5:1–5:4.
