Interactive Restaurant Menu Semantic Web technology for personalized meals


Interactive Restaurant Menu

Semantic Web technology for personalized meals

Petra Ormel 10607005

Bachelor thesis Credits: 18 EC

Bachelor Opleiding Kunstmatige Intelligentie

University of Amsterdam
Faculty of Science
Science Park 904
1098 XH Amsterdam

Supervisors
Dr.-Ing. A. Khalili
dr. B. Bredeweg

Knowledge Representation & Reasoning Research Group Dept. of Computer Science

Faculty of Sciences Vrije Universiteit Amsterdam

De Boelelaan 1105 1081 HV Amsterdam


Abstract

Many studies have shown that diet can affect the prevention of and recovery from diseases such as cancer, cardiovascular diseases and obesity (Willett et al., 1994). People may therefore have not only specific taste preferences but also health-related issues that influence the requirements of their meal. This project investigates the idea of an interactive restaurant menu that composes meals based on someone’s personal condition, including health. Using RDF data provided by the Semantic Web and SPARQL queries, this project built a demo that shows how knowledge from different domains can interoperate to provide a personalised dish.


1 Introduction

Many studies have shown that diet can affect the prevention of and recovery from diseases such as cancer, cardiovascular diseases and obesity (Willett et al., 1994). People may therefore have not only specific taste preferences but also health-related issues that influence the requirements of their meal. Searching through web pages to find these personal requirements, and subsequently selecting the most suitable dishes, is time consuming and sometimes not even possible. The problem is that a great deal of data has no semantic meaning to the computer, which makes it impossible to return relevant “hits” (Berners-Lee, Hendler, Lassila, et al., 2001). This project therefore investigates the idea of an interactive restaurant menu that can automatically compose meals based on someone’s personal condition.

This will be done with the help of the Semantic Web, an extension of the World Wide Web in which data is provided in the Resource Description Framework (RDF) and can therefore be semantically interpreted by a machine (Berners-Lee et al., 2001). In RDF data, every entity is identified by its own Uniform Resource Identifier (URI). The relations between the entities in a domain are defined in an ontology, which provides a flexible, domain-independent data model (Berners-Lee et al., 2001). A step further in structured data is Linked Data, where the relations between different domains are defined as well. The Linked Open Data Cloud is an example of such a collection of datasets. Figure 1 shows the Linked Open Data Cloud diagram; all of these datasets have been published in Linked Data format (Max Schmachtenberg & Cyganiak, 2014).


With this Linked Data, the question that arises is: “Is this data suitable to provide personalized meals?” Other general and specific domains already use the Semantic Web to achieve specific goals. DBpedia is an example for general knowledge and acts as a hub in the Linked Open Data Cloud (Lehmann et al., 2015). It is a community effort that extracts information from Wikipedia and provides this data in RDF format on the Web. An example of an organisation that exploits the Semantic Web in a specific domain is the “Influenza Research Database” (Squires et al., 2012). The mission of the Influenza Research Database (IRD) is to provide a resource for the influenza virus research community that will facilitate an understanding of the virus and how it interacts with the host organism, leading to new treatments and preventive actions (Squires et al., 2012). “open PHACTS” is another organisation that exploits the Semantic Web in a specific domain; its goal is to deliver and sustain an ‘open pharmacological space’ using Semantic Web standards and technologies (Williams et al., 2012). For the pharmaceutical industry, it would be helpful if search engines could return the right information for questions like: “A lead molecule is characterized by a substructure S. Retrieve all bioactivity data in serine protease assays for molecules that contain substructure S.” (Williams et al., 2012). The goal of “open PHACTS” is therefore to design a shared, interoperable platform that returns the appropriate information without manual effort (Williams et al., 2012).

To determine whether the Semantic Web can meet the needs of an interactive restaurant menu, the possibilities of an application called “chooseEAT” will be investigated. This application is able to select the best matching dish from a restaurant. Another application that already exists in this domain is IBM Chef Watson (IBM, 2015). While the focus of IBM Chef Watson lies on “what you can make with leftovers in the fridge”, this project focuses on how an application could help by serving a personal, health-related meal.

The research question in this project is therefore: “How can user information together with data about diet recommendations and restaurant menus interoperate in order to serve a personalized dish?” To approach this question, the model of the involved domains must first be defined. Thereafter, the Semantic Web must be explored to find out whether enough structured data is available in these domains; if not, this data needs to be extracted from other sources. Knowledge bases that do not exist yet must be simulated in a generic way, after which a method must be defined to extract the appropriate data from the knowledge domains. This will be used in a demo that serves as a proof of concept of the application. The next sections explain the methods used for these aspects.

2 Method and Approach

To discover details of the domain knowledge, interviews were held with experts in the fields of dietetics, pharmacy, and sports and movements. Dietitians are experts on human nutrition and the regulation of diets and could therefore deliver valuable information. Diseases and medicines were expected to have relations with foods as well, so a pharmaceutical expert was approached. Moreover, practising specific kinds of sport was expected to influence a personal diet, so an expert in the field of sports and movements was contacted. Section 3 gives an account of the interviews, including the results. Based on the model of the domains obtained from the interviews, a scenario of the final application was created, which served as a guideline in this project. The scenario is shown in section 4.

To determine which knowledge bases are already available in RDF format, the World Wide Web and the Semantic Web are utilized. All the available data is evaluated by specific criteria explained in section 5.1.

To convert non-RDF data, openRefine (openRefine, 2016) was used. This is a powerful tool for working with data that can, among other things, transform one format into another, clean data, and extend data with web services and external data. Section 5.2 explains how this was done.

To simulate generic knowledge bases for missing data, information from the interviews was used to create a dataset formatted in Turtle (Terse RDF Triple Language), a human-readable serialization of RDF that provides a number of constructs to make RDF easier to write down (Berners-Lee et al., 2001). To be compatible with other data, existing terms for specific types and properties were taken into account. This was done with the aid of schema.org and lotus.lodlaundromat.org. Results are shown in section 5.3.

For the proof of concept, a demo of the application was implemented in Python. Data from the user is obtained by means of a user interface. Relevant data from the available datasets is extracted with SPARQL, a language for querying interlinked data (DuCharme, 2013). Algorithms and computations are executed to integrate the different knowledge bases. The result is provided in section 6.

3 Interviews

3.1 Introduction

In this project the Semantic Web is used to create personal diets, which in turn are used to match the proper dish to a person’s condition. All domains that are related to creating a personal dish must therefore be specified. This makes it possible to search for appropriate knowledge bases. However, not only the related domains are important; the relations themselves are important as well. Once these are specified, it is possible to link the ontologies of the different domains, which determines how they can interoperate. In this project, this information was obtained largely through contact and interviews with experts in the domains of pharmacy, sports and movements, and dietetics. This section summarizes the interviews and presents the result in the form of a model of the domain.


3.2 Expert contact

3.2.1 Pharmaceutical expert Bita Sedaghati

The first interview, held through Skype, was with Bita Sedaghati, a senior PhD researcher with expertise in pharmacy, biochemistry and cell biology. She worked as a Pharmacist and Pharmaceutical Technology Researcher and is currently working at the University of Leipzig. Bita was involved in shaping the idea of the application in this project. After the interview, Bita was asked some further questions through email.

In the interview she was asked whether she knew of drug entities or medical constraints that could influence a personalized meal, and whether she knew of available structured databases with this information. She replied that, apart from general categories such as the weather or the preferred taste of the user, two main categories are important for this application. These are the nutrition facts, which must be stated for the whole meal and not just for the ingredients, and a category based on diseases. These aspects make it possible to create personalized meals based not only on taste but also on the person’s safety and well-being.

To fulfil this function there must be a user interaction to specify a person’s medical condition and drug intake. She gave as an example that when someone suffers from a high blood pressure disease and eats too much salt, it can result in sleeping problems. Therefore the application must be aware of the disease and filter meals on the restriction that the amount of sodium is beneath a specific value. Databases that might provide information about this are stated in the subsection “links”.

The user’s medicine intake must also be specified, since drugs can interact with specific foods. As an example, she mentioned that sedative drugs in combination with alcohol can lead to extremely long sleep of up to around 14 hours. Sources with more of this information are also listed in the section “Links”. She further mentioned that people may follow specific diets, such as vegetarian, vegan or rawtarian, that could influence a personalized meal. Finally, she clearly emphasised that the accent in this application must not lie on personal taste preferences but on providing safe dishes, based on somebody’s personal condition.

The interview made it clear that the nutrition facts of foods are relevant for determining whether a food is suitable for someone’s health condition. For the application it is necessary to specify which nutrition facts are important and how they relate to specific personal conditions. She was therefore asked this question through email. She replied that all nutrition facts are in fact important. First of all, every individual needs a complete nutritional intake daily. Secondly, membership of specific groups influences the recommended daily intake of specific nutrients: groups with a medical condition (e.g. diabetes and cardiovascular diseases), age categories (children, adults, elders), special diets (e.g. vegan and vegetarian) and special conditions (e.g. pregnant women). According to her, the most important factors are: Calories, Saturated fat, Cholesterol, Sodium, Fibers, Sugars, Protein, Calcium and Iron. Important examples she gave are listed in appendix A.


3.2.2 Sports and movements expert Jolanda Seijger

The second contact, through email, was with Jolanda Seijger. She is head of group fitness and fitness at the VU Sports Centre and an expert in the field of sports and movements. She was asked whether, from her field of expertise, she knew of aspects that could influence a meal.

She replied that the amount of calories a person needs depends on several aspects, such as gender, age, height, weight, fat percentage, muscle percentage and the total amount of movement, including the type of job, the amount of sleep and the activities somebody does in their spare time. These influence someone’s resting metabolism and the metabolism during movement, which indicate how many calories someone burns and therefore has to include in his or her meal. She said that schemas exist to calculate these values, but she did not know where to find them. In conclusion, many specific aspects determine the amount of calories a person needs.

3.2.3 Dietician Trutchka Bouterse

Trutchka Bouterse is a dietician currently working in a dietetic practice in Suriname. Before that, she completed the programme “food and diet” at the Amsterdam University of Applied Sciences, and she has 15 years of experience in hospitals. She was interviewed through Skype and asked whether, based on her experience, she knew of aspects that could influence personalized meals. The relations between these aspects and how they influence a personal diet were also discussed.

During her work as a dietician and in hospitals, Trutchka became aware that salt is a major issue in current society. High blood pressure, diabetes and a variety of cancers are aggravated by the intake of too much salt. The recommended daily intake of salt is 6 grams, but people often far exceed this amount. As an example of relations between diseases and how they influence a personal diet, she mentioned that diabetes has a negative effect on the functioning of the kidneys. The function of the kidneys is to maintain the fluid balance and the right blood pressure. When the functioning of the kidneys is disturbed, they are no longer able to excrete salt, which causes high blood pressure. High blood pressure in turn has a negative effect on kidney function, so this process can reinforce itself through the intake of too much salt. As a comment she stated that there is a difference between types of salt; for example, sodium is worse than other types such as potassium.

Trutchka mentioned that many diseases have a negative interaction with specific foods. Another example she gave is that some ingredients, such as ‘maca’, can increase oestrogen levels. This is dangerous for people who suffer from, for example, breast cancer. However, positive disease-food relations exist as well: the intake of vitamin B12 could decrease the effects of Alzheimer’s disease and vitiligo. There are more disease-food relations comparable to these examples. She named some helpful websites where this information can be found; they are listed in the section “Links”. Apart from diseases, religion, food intolerances and diets such as vegetarian are also aspects that can influence a personalized meal. Because some of these diets reduce the intake of important vitamins and minerals, this must be compensated for by supplements or by extra intake of other ingredients.

This interview made clear that a great number of diseases should influence the diet of the people who suffer from them. If the given websites supply datasets in which all the diseases and their food recommendations are well structured, these would be of great value for the application. Furthermore, the interview gave insight into new aspects and their relation to foods, such as religions and personal diets. These can not only prohibit specific ingredients but also encourage the intake of specific foods.

3.3 Conclusion

Looking back at the introduction, the interviews were first of all meant to provide insight into which domains are relevant for creating personalized diets. It can be concluded that these are: drug intake; diseases; personal information (e.g. gender, age); body information (e.g. height, weight, fat percentage); lifestyle (e.g. type of job, amount of sleep); diets (e.g. vegetarian, vegan); food intolerances; special conditions (e.g. pregnancy); and religion. The information provided by the experts made it possible to design a model with the relations between all involved domains, which can be seen in Figure 10 in appendix B. Parts of this diagram are used for the design of the scenario of the application, which is explained in the next section.

3.4 Links

Advised by Bita:
• http://www.webmed.com
• http://drugbank.ca/

Advised by Trutchka:
• http://www.food-info.net
• http://www.voedingenkankerinfo.nl
• http://www.hartstichting.nl


4 Scenario

Figure 2: Scenario, panels (a) to (h)

Figure 3: Scenario, panels (a) to (j)

The scenario is the first design of the final application. It reproduces the user interface, how user information will be used to create a personal diet and how this diet results in the application’s output. The result of the scenario is shown in Figures 2 and 3. In this section an explanation of the result is provided.

As shown in Figure 2 (a), the application, which will eventually be able to run on a tablet or mobile phone, starts by creating a user profile by means of a user interface. The profile saves time for the user, since personal information does not have to be filled in every time the app is used. Unchangeable personal information such as name, gender and birth date is asked for and stored in the user’s profile (Figure 2 (b,c)). After that, there are options for categories such as medical condition, drug intake, diets, special conditions or religion (Figure 2 (f)). Users can fill in their personal details in those domains (Figures 2 (g,h) and 3 (a, b, c, d)). This information is stored as well, in such a manner that it can easily be adjusted when the personal condition changes.

When the user is done providing personal information, the application creates a personal diet using the different food-recommendation databases it is connected to. The diet consists of two main parts. The first part contains information about the recommended nutrition values, for example that the amount of cholesterol may not exceed a specific value per dish. The second part is a list of forbidden ingredients. Causes such as drug-food interactions, age, pregnancy, food intolerances and more can make it necessary to exclude specific ingredients from a personal meal.

When the personal diet is created, the user profile is complete (Figure 3 (e)). The user can now search for a restaurant in a database (Figures 3 (f,g)). Every restaurant entry contains information about the available ingredients and recipes (Figure 3 (h)). An algorithm uses a database of food ingredients and their nutrition values to calculate the total amount of each nutrient in every dish. When the user has chosen a restaurant, another algorithm chooses the dish that best suits the user’s health condition and ranks the others accordingly (Figure 3 (i)). When the user clicks on a dish, its nutrition values are shown, together with whether they are in line with the user’s health condition. Users are always able to check on which facts their diet is based, in order to decide whether they want to order the dish (Figure 3 (j)).

5 Databases

5.1 Available databases

The databases that are valuable for this project are divided into three categories. The first is data about ingredients and their nutrition values. The second category comprises databases about diseases, drug intake, existing food allergies, religion, and different lifestyles, together with all their food relations. The last category is data about restaurants with their recipes and ingredients. All databases found in these categories were checked against criteria provided by Open Knowledge International (Pollock, 2004). The results are shown in Figure 4.

Figure 4: Database evaluation

Some of the databases were recommended by the domain experts; almost all others were obtained from the data management platform Datahub, available at datahub.io. The results show that two useful datasets are available in the category of foods and their nutrition values. The first is “Open food facts”, available at openfoodfacts.org, a free, publicly collaborative database of food products from around the world and their nutrient information. The data is available in RDF format and is therefore machine readable. Advantages of this data are that it is structured and can be queried through a SPARQL endpoint. A disadvantage is that it only covers packaged food products, such as instant pizzas, and their total nutrition contents. The other useful dataset in this category is the Composition of Foods Integrated Dataset (CoFID) from the UK government, available at www.gov.uk/government/publications/composition-of-foods-integrated-dataset-cofid. This dataset contains 2900 ingredients with their nutrition facts and is publicly available for free. Although no licence is required for the data, a disadvantage is that it is structured in Excel format instead of RDF.

In the category of conditions and their food relations, only one useful database was found. DrugBank (available at http://drugbank.com) is a freely and publicly available bioinformatics and cheminformatics resource that combines detailed drug data with comprehensive drug target information. It is a regularly updated, online, RDF-structured database containing 8206 drug entries. The database provides a SPARQL endpoint, which allows users to filter on all the properties of specific drugs. It is well suited to finding the related food interactions in this project. However, a disadvantage of DrugBank is that the server is regularly down, probably due to its open availability. The problem with the other databases was in most cases that the data was available on a web page but the underlying structured datasets could not be accessed.


5.2 Conversion of databases

It was decided to use the CoFID database for the first category, because restaurants are expected to prepare dishes mostly from fresh ingredients rather than from instant products from supermarkets. The CoFID dataset was therefore converted into RDF format with openRefine, so that it can be used to calculate the total nutrition values of recipes from various restaurants. The CoFID Excel file contains 2900 ingredients with relevant nutrition facts, but also a great deal of irrelevant information, such as the studies from which the values were obtained. All unnecessary values were therefore deleted from the data. Furthermore, not all nutrition values were given in the same unit, so some columns had to be edited; this was done with regular expressions. Documentation about regular expressions is provided on http://regexr.com.

The next step was to convert the data into RDF. The base URI of the database was set to http://chooseeat.org. All the ingredients in the file were given the type “Food” and the property “chooseeat:foodCode”, a number of up to six digits that represents the unique CoFID code and is used in the URI of every ingredient. Other properties are: “rdfs:label”, which has the ingredient name as object; “food:group”, which specifies the group of the ingredient, such as “dairy foods”; and 50 properties about nutrition facts, all formatted in the same way, such as “food:proteinPer100g”. Some objects have the value “Tr”, which means that traces of the nutrient were found, or “N”, which indicates that the nutrient is present in significant quantities but that there is no reliable information on the amount. When all the triples were defined, openRefine transformed the data into the RDF serialization “Turtle”. In order to query the converted data, it was uploaded to the local triple store Stardog, which provides a SPARQL endpoint for the data. Documentation about this tool can be found on docs.stardog.com.
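The mapping described above was carried out in openRefine; purely as an illustration, the shape of the resulting triples could be sketched in Python as follows. The URI layout under the base URI, the class name `chooseeat:Food`, the property `food:fibrePer100g` and the sample values are assumptions made for this sketch, and the prefix declarations are omitted because the thesis does not state the `food:` namespace.

```python
BASE = "http://chooseeat.org/"  # base URI from the text; the path layout below is assumed


def quote(text):
    """Return a Turtle string literal with embedded double quotes escaped."""
    return '"' + str(text).replace('"', '\\"') + '"'


def to_literal(raw):
    """Numbers become Turtle decimals; markers such as 'Tr' and 'N' stay strings."""
    text = str(raw).strip()
    try:
        return repr(float(text))
    except ValueError:
        return quote(text)


def row_to_turtle(food_code, label, group, nutrients):
    """Render one ingredient row as a Turtle block (prefix declarations omitted)."""
    pairs = [
        ("a", "chooseeat:Food"),  # assumed class name for the type "Food"
        ("chooseeat:foodCode", quote(food_code)),
        ("rdfs:label", quote(label)),
        ("food:group", quote(group)),
    ]
    pairs += [(f"food:{name}", to_literal(value)) for name, value in nutrients.items()]
    body = " ;\n".join(f"    {predicate} {obj}" for predicate, obj in pairs)
    return f"<{BASE}food/{food_code}>\n{body} .\n"


# Made-up example row, not taken from CoFID.
example = row_to_turtle(
    "12001", "Milk, whole", "dairy foods",
    {"proteinPer100g": "3.4", "fibrePer100g": "Tr"},
)
print(example)
```

Keeping “Tr” and “N” as string literals mirrors the description above; any code that later sums nutrient values has to skip these non-numeric objects.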

5.3 Simulation of knowledge bases

Figure 5: Personal Information Dataset

In order to show how knowledge from different domains can interoperate, at least two databases were necessary in the category “conditions and their food relations”. DrugBank was the only appropriate database found in this project. As a solution for the missing data, knowledge from one of the other involved domains was therefore simulated. The data was based on the information provided during the interview with Bita Sedaghati. Among other things, she explained how age is related to food. The result is a generically constructed dataset, shown in Figure 5, that can easily be replaced or extended with other data.

The design of the scenario described that food recommendations should provide statements about the maximum or minimum amount of nutrition values per dish. Web searches and literature studies were done to obtain this information; however, it was not available for every condition. This is probably because the exact values depend on too many aspects. The diet in the application therefore consists of three lists: one with nutrients that are encouraged, one with nutrients that must be avoided, and one with forbidden ingredients. In Figure 5 these lists are specified by the object values “more” and “less”, and by the property “chooseeat:forbiddenIngredient”, which takes the names of forbidden ingredients as objects. The food recommendations for a person in the age category “child” are described in a food guideline of type “chooseeat:FoodGuideline”. In this case the food guideline, as subject, has the property “chooseeat:guidelineCategory” with the object “ageNutrition” and the property “chooseeat:targetAge” with the object “child”. These properties are followed by properties about the amount of nutrients needed, whose objects can be either “less” or “more”. This dataset is used in the proof of concept, which is explained in the next section.
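As a rough illustration of the three-list diet, the following Python sketch folds one guideline into a diet object. The class `Diet`, the function `apply_guideline` and the nutrient and ingredient names are hypothetical; only the values “more” and “less” and the idea of a forbidden-ingredient list come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Diet:
    """Personal diet made of the three lists described in the text."""
    more: set = field(default_factory=set)       # nutrients to encourage
    less: set = field(default_factory=set)       # nutrients to avoid
    forbidden: set = field(default_factory=set)  # forbidden ingredients


def apply_guideline(diet, nutrient_advice, forbidden_ingredients):
    """Merge one guideline into the diet.

    nutrient_advice maps a nutrient name to "more" or "less";
    forbidden_ingredients is an iterable of ingredient names.
    """
    for nutrient, advice in nutrient_advice.items():
        if advice == "more":
            diet.more.add(nutrient)
        elif advice == "less":
            diet.less.add(nutrient)
        else:
            raise ValueError(f"unknown advice {advice!r} for {nutrient!r}")
    diet.forbidden.update(forbidden_ingredients)
    return diet


# Made-up guideline content for illustration only.
diet = apply_guideline(Diet(), {"calcium": "more", "sodium": "less"}, ["alcohol"])
print(diet)
```

Using sets means that several guidelines (for example one for age and one for drug intake) can be merged without duplicate entries. How conflicting advice for the same nutrient should be resolved is not specified in the text and is left open here.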

6 Proof of concept

6.1 Global design


A proof of concept was made to show how knowledge from different domains can interoperate. The proof of concept is a demo of the final application. The program is designed in a generic way, so that it can be extended with more data. Figure 6 shows the activity diagram of the program, which is divided into three main sections: the user, algorithms, and external data. The entire code can be accessed on http://github.com/petraormel/chooseEAT. The next section explains all steps of the diagram in Figure 6.

6.2 Implementation

6.2.1 User interface

Figure 7: User interface

The proof of concept is implemented in Python, a user-friendly programming language for which RDF libraries are available. The demo starts with the user interface shown in Figure 7. This user interface is created with “Tkinter”, Python’s de facto standard and commonly used GUI (Graphical User Interface) package. In the window the user can fill in the following information: first and last name; email address; age, which must be an integer; and the drugs the user is taking, which can be the name of any drug, provided that it is spelled correctly.

6.2.2 User data


from SPARQLWrapper import SPARQLWrapper

# The opening of this call was lost in extraction; the endpoint host is
# reconstructed from the PREFIX lines below.
sparql = SPARQLWrapper("http://wifo5-04.informatik.uni-mannheim.de/drugbank/sparql")
# `drug` holds the drug name entered by the user. It is inserted as-is, so it
# must already be a valid SPARQL term (for example a quoted literal).
sparql.setQuery("""
PREFIX drugbank: <http://wifo5-04.informatik.uni-mannheim.de/drugbank/resource/drugbank/>
PREFIX drug: <http://wifo5-04.informatik.uni-mannheim.de/drugbank/resource/drugs/>
SELECT distinct ?drug
WHERE {
OPTIONAL {?drug drugbank:genericName """ + drug + """}
OPTIONAL {?drug drugbank:synonym """ + drug + """}
OPTIONAL {?drug drugbank:label """ + drug + """}
} """)

When the user presses the “Ok” button, the user’s URI, which is made unique by a code representing the time at which the profile was created, is added with RDF type “FOAF.Person” to an RDF graph from the Python library “RDFlib”. All entries except the drug name are stored as properties of this resource. For the drug, its URI must first be determined. SPARQLWrapper, a Python interface to SPARQL endpoints, is used to query the DrugBank database with the query stated above. The retrieved drug URI is added as a “chooseeat.takeDrug” property to the user’s graph.
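The drug lookup could also be assembled by a small helper that quotes and escapes the drug name before it is placed in the query. The sketch below is an alternative formulation, not the code used in the demo: the function name `build_drug_query` is hypothetical, and `UNION` is swapped in for the `OPTIONAL` blocks so that only drugs matching at least one of the three properties are returned.

```python
DRUGBANK = "http://wifo5-04.informatik.uni-mannheim.de/drugbank/resource/drugbank/"


def build_drug_query(drug_name):
    """Return a SPARQL query that looks up a drug URI by name.

    The name is turned into a quoted string literal; backslashes and
    double quotes are escaped so user input cannot break the query.
    """
    escaped = drug_name.replace("\\", "\\\\").replace('"', '\\"')
    literal = f'"{escaped}"'
    return f"""PREFIX drugbank: <{DRUGBANK}>
SELECT DISTINCT ?drug
WHERE {{
  {{ ?drug drugbank:genericName {literal} }}
  UNION {{ ?drug drugbank:synonym {literal} }}
  UNION {{ ?drug drugbank:label {literal} }}
}}"""


print(build_drug_query("Aspirin"))
```

The resulting string can be passed to `setQuery` in the same way as the query above. Whether exact string matching is sufficient depends on how names are stored in the dataset (for example with language tags), which the text does not specify.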

6.2.3 Make diet

In the next step, the function “makeDiet” is executed. In this demo, two external databases, explained in the previous section, are connected to the program: a dataset about food recommendations related to a person’s age, and the DrugBank database. “makeDiet” therefore consists of two functions: “addDietPersonalInformation” and “addDietDrugIntake”. These functions describe how the external data relates to the personal diet; in the Linked Open Data approach this is also called “ontology linking”.

In the dataset about personal information, age is divided into categories. “addDietPersonalInformation” therefore first extracts the age from the user’s graph and determines to which age category it belongs. People under the age of 18 are called “child”, those above 64 “senior”, and those in between “adult”. Since the personal database is stored on the local triple store Stardog, it can be queried with the SPARQLWrapper. The values of the properties “needMoreOfNutrition”, “needLessOfNutrition” and “forbiddenIngredient” that belong to the relevant age category are extracted from the database and added to the user’s properties “chooseeat.more”, “chooseeat.less” and “chooseeat.forbidden”, respectively.
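The age thresholds given above can be written as a small function. This is a minimal sketch; the function name `age_category` is hypothetical and not taken from the demo code.

```python
def age_category(age):
    """Map an age in years to the category used in the personal information dataset."""
    if age < 18:
        return "child"
    if age > 64:
        return "senior"
    return "adult"


for sample_age in (17, 18, 64, 65):
    print(sample_age, age_category(sample_age))
```

The returned category is the value that would be matched against “chooseeat:targetAge” when querying the guideline dataset.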

The second function, “addDietDrugIntake”, extracts the drug URI from the user’s graph (if it has one). With this drug URI it queries DrugBank to find all the related food interactions. If one of the food interactions states “avoid alcohol”, the ingredient “alcohol” is added to the user’s property “chooseeat.forbidden”.
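The check on the retrieved food interactions can be sketched as follows, assuming the interactions arrive as plain text strings. The function name and the sample sentences are made up for illustration, and the case-insensitive substring match is an assumption about how the phrase is detected.

```python
def forbidden_from_interactions(food_interactions):
    """Return ingredients to forbid, based on free-text food interactions."""
    forbidden = set()
    for interaction in food_interactions:
        # Only the "avoid alcohol" rule is described in the text.
        if "avoid alcohol" in interaction.lower():
            forbidden.add("alcohol")
    return forbidden


# Made-up interaction texts, not copied from DrugBank.
sample = ["Avoid alcohol.", "Take with food."]
print(forbidden_from_interactions(sample))
```

Because the interactions are free text, every further rule would need its own pattern; a structured food-interaction vocabulary would make this step more robust.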

Figure 8: Restaurant data

6.2.4 Choose restaurant

For the demo, a small Turtle-formatted restaurant dataset was created, part of which is shown in Figure 8. It consists of one restaurant with five recipes, including their ingredients and nutrition values. For the eventual application, an extra algorithm can be added that calculates the nutrition values from the amounts of the ingredients with the help of the converted CoFID database. The function “chooseRestaurant” returns the URI of the restaurant in this dataset. In the final application the user will be able to influence the choice of restaurant at this point.
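The extra algorithm mentioned above is not part of the demo. A minimal sketch of how it might work, assuming ingredient amounts in grams and nutrient values per 100 g as in the converted CoFID data, is shown below; all names and numbers are hypothetical.

```python
def dish_totals(ingredient_grams, per_100g):
    """Sum nutrient amounts over all ingredients of a dish.

    ingredient_grams maps ingredient name to the amount in grams.
    per_100g maps ingredient name to {nutrient: amount per 100 g}, where
    None stands for non-numeric entries such as "Tr" or "N".
    """
    totals = {}
    for ingredient, grams in ingredient_grams.items():
        for nutrient, amount in per_100g[ingredient].items():
            if amount is None:
                continue  # no usable number for this nutrient
            totals[nutrient] = totals.get(nutrient, 0.0) + amount * grams / 100.0
    return totals


# Made-up values for illustration only.
recipe = {"pasta": 200, "cheese": 50}
table = {
    "pasta": {"protein": 10.0},
    "cheese": {"protein": 4.0, "sodium": None},
}
print(dish_totals(recipe, table))
```

Skipping entries without a reliable number slightly underestimates the totals; the text does not say how such entries should be treated in the final application.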

6.2.5 Select dishes

The function “selectDishes” checks whether the chosen restaurant offers recipes with forbidden ingredients. The output of the function consists of two lists: one with the URIs of the recipes without forbidden ingredients, and one with the URIs of the recipes that contain forbidden ingredients. The first list is the input for the next function, “rankDishes”.
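The split into two lists can be sketched as follows, assuming the recipes are available as a mapping from recipe URI to ingredient names. The function signature and the example data are hypothetical and do not reproduce the demo code.

```python
def select_dishes(recipes, forbidden):
    """Split recipe URIs into (allowed, rejected) by forbidden ingredients.

    recipes maps a recipe URI to an iterable of ingredient names;
    forbidden is the set of ingredients the user must not eat.
    """
    allowed, rejected = [], []
    for uri, ingredients in recipes.items():
        if forbidden & set(ingredients):
            rejected.append(uri)
        else:
            allowed.append(uri)
    return allowed, rejected


# Made-up menu for illustration only.
menu = {
    "recipe/tiramisu": ["mascarpone", "coffee", "alcohol"],
    "recipe/salad": ["lettuce", "tomato"],
}
print(select_dishes(menu, {"alcohol"}))
```

Returning the rejected recipes as well allows the output to explain which dishes were excluded and why.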


6.2.6 Rank dishes

“rankDishes” returns a list of the dishes ordered from most to least suitable for the user. For every nutrient in the user’s property “chooseeat.more”, the algorithm sorts all dishes by that nutrient in ascending order and puts them in a separate list. For every nutrient in the user’s property “chooseeat.less”, it sorts the dishes in descending order and puts them in separate lists as well. For each of these lists, the algorithm determines the index of every recipe. This index is the score of the dish for that nutrient and is added to its total score. In the end, the dish with the highest total score best suits the user’s diet: it contains the highest amounts of the nutrients the user is recommended to eat more of, and the lowest amounts of the nutrients the user is recommended to eat less of. This ranked list of recipes is the input for the function “createOutput”.
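The index-based scoring described above can be sketched in Python as follows. The data layout (a mapping from dish to nutrient amounts), the function signature and the sample values are assumptions for this sketch, and missing nutrient values are treated as zero.

```python
def rank_dishes(dishes, more, less):
    """Rank dishes from most to least suitable.

    dishes maps a dish URI to {nutrient: amount}; more and less are the
    nutrients the user should eat more of and less of, respectively.
    """
    scores = {uri: 0 for uri in dishes}
    for nutrient in more:
        # Ascending order: the dish with the most of this nutrient gets the highest index.
        ordered = sorted(dishes, key=lambda uri: dishes[uri].get(nutrient, 0.0))
        for index, uri in enumerate(ordered):
            scores[uri] += index
    for nutrient in less:
        # Descending order: the dish with the least of this nutrient gets the highest index.
        ordered = sorted(dishes, key=lambda uri: dishes[uri].get(nutrient, 0.0), reverse=True)
        for index, uri in enumerate(ordered):
            scores[uri] += index
    return sorted(dishes, key=lambda uri: scores[uri], reverse=True)


# Made-up nutrient values for illustration only.
menu = {
    "salad": {"calcium": 120, "sodium": 0.2},
    "soup": {"calcium": 60, "sodium": 1.5},
    "pizza": {"calcium": 200, "sodium": 2.0},
}
print(rank_dishes(menu, more=["calcium"], less=["sodium"]))
```

With these values the salad scores 1 + 2 = 3, the pizza 2 + 0 = 2 and the soup 0 + 1 = 1, so the salad is ranked first. The scheme weighs every nutrient equally and uses only the order of the dishes, not the size of the differences; how ties are broken is not specified in the text.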

6.2.7 Output

Figure 9: Output

In the demo, the output consists of four sections. The first section shows which nutrients are encouraged and discouraged given the age of the user. The second section shows the drug the user takes, combined with the related food interactions from DrugBank. The next section states which ingredients are not appropriate for the user and which dishes from the restaurant contain those ingredients. This can be based on either age or drug intake. The last section shows the user’s best matching dish and a list in which the remaining dishes are ranked from best to least suitable. Figure 9 shows a possible output for a user.

7 Conclusion

The research question in this project was: “How can user information together with data about food recommendations and restaurant menus interoperate in order to serve a personalized dish?” This question was divided into several subtasks. First of all, a model of the domains was required. This was achieved through interviews with domain experts and resulted in an appropriate model on which to base the scenario and with which to specify relevant data sources. The Semantic Web was explored to find these relevant databases, which led to DrugBank and the Composition of Foods Integrated Dataset (CoFID). The Semantic Web turned out not to be extensive enough to provide more relevant data sources.

Therefore, the next task in this project was to extract data from the databases that were not structured in RDF. This was the case for the Excel-formatted CoFID dataset, which was converted into an RDF dataset with the aid of a data-cleaning tool. As a solution for missing data, a general simulation of a relevant knowledge base was constructed, based on the information provided during the interviews.

With enough datasets from different knowledge bases available, it was necessary to determine how to integrate the data. The DrugBank database provided a SPARQL endpoint against which queries could be run. The other datasets were uploaded to a local triple store, which made it possible to query them as well. Finally, a demo of the application was created, which shows that it is possible to integrate different knowledge bases in order to serve a personalized dish, and which therefore provides an answer to the main research question. The demo not only showed the best matching dish; it also ranked all other dishes from the restaurant from best to least suitable for the person’s condition. Moreover, the application explained why the selected dish was chosen as the best match for the user.

8 Discussion and future work

The first point of discussion concerns the model of the domains. Since only three domain experts were contacted, there may be more aspects that influence a personalized meal than were identified during the interviews. Also, since not all subjects and relations in the model have been confirmed, some subjects or relations may have been neglected or misplaced. Future work could re-investigate the model of the domains involved. The next point of discussion concerns the search for data sources. The World Wide Web is very large, which means that there may be organisations that provide relevant data sources but were not found in this project. Future work could investigate this as well.

Another point of discussion is that the proof-of-concept application makes use of simulated datasets. Although the simulations were constructed in a generic way, they cannot prove that the application is also compatible with real data. Moreover, in the proof of concept the DrugBank database is only used to look up whether a drug has “avoid alcohol” as a food interaction. The real application should check for all interacting ingredients. However, this will probably not be easy to do efficiently, since the statements about food interactions describe both positive and negative interactions in a way that is easily interpreted by humans but not as straightforwardly by machines. This may cause problems not only with the DrugBank database but also with other databases. Even when a database is formatted in a structure that can be semantically interpreted by a machine, this does not imply that the objects of its properties can be semantically interpreted as well. Another discussion point about the DrugBank database is that its server is regularly down, with the consequence that the application does not work. If the real application is built, an appropriate solution for this has to be found.

In the proof of concept, the diet consists of three lists: encouraged nutrients, discouraged nutrients and forbidden ingredients. However, it would be better to have exact recommended maximum and minimum values. This would make it possible to state whether a dish really fits someone’s personal condition. In the current application, even the best matching dish could be unhealthy for the user. Future work could find out whether these exact values can be obtained by integrating different knowledge bases.

Furthermore, it would be of great value for this application if future work could investigate how more structured data could become publicly available. It should also investigate whether this application would be appreciated and used in society. If more data becomes available and that investigation indicates that the application would be used, future work should realise the application. The proof of concept could serve as a base, but it has to be extended in many respects. A few examples: it must be usable on a portable device such as a tablet or mobile phone; more data need to interoperate and be proven reliable; and restaurants must be willing to cooperate with the project and serve the dishes as they are described in their recipes. In addition, an algorithm has to be created that is capable of calculating the total nutrition values of every dish, taking the cooking method into account, since this can also influence the composition of the food.


References

Berners-Lee, T., Hendler, J., Lassila, O., et al. (2001). The semantic web. Scientific American, 284 (5), 28–37.

DuCharme, B. (2013). Learning SPARQL. O’Reilly Media, Inc.

IBM. (2015). IBM chef Watson. https://www.ibmchefwatson.com/community. (Accessed: 2016-20-04)

Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P. N., . . . others (2015). Dbpedia–a large-scale, multilingual knowledge base extracted from wikipedia. Semantic Web, 6 (2), 167–195.

Schmachtenberg, M., Bizer, C., Jentzsch, A., & Cyganiak, R. (2014). The Linking Open Data cloud diagram. http://lod-cloud.net/. (Accessed: 2016-20-04)

openRefine. (2016). openRefine. http://openrefine.org/. (Accessed: 2016-23-06)

Pollock, R. (2004). Open Knowledge International. https://okfn.org/. (Accessed: 2016-23-06)

Squires, R. B., Noronha, J., Hunt, V., García-Sastre, A., Macken, C., Baumgarth, N., . . . others (2012). Influenza research database: an integrated bioinformatics resource for influenza research and surveillance. Influenza and other respiratory viruses, 6 (6), 404–416.

Willett, W. C., et al. (1994). Diet and health: what should we eat. Science, 264 (5158), 532–537.

Williams, A. J., Harland, L., Groth, P., Pettifer, S., Chichester, C., Willighagen, E. L., . . . others (2012). Open phacts: semantic interoperability for drug discovery. Drug discovery today, 17 (21), 1188–1198.


Appendices

A

Nutrition recommendations Bita Sedaghati

Calories and serving size: clearly a crucial factor for every individual, and especially for overweight people who wish to lose some weight.

Saturated fat: This factor is mainly considered for people with hyperlipidemia (high blood fat), as well as for anyone who watches their saturated fat intake, which has been shown to be related to an increased rate of cardiovascular disease and heart attack.

Cholesterol: Another critical factor for people who suffer from hypercholesterolemia, as well as for elderly people and people with limited cholesterol intake in their diet.

Sodium: Especially for people with high blood pressure and with a history of heart attack.

Fibers: Important for maintaining gut motility and general health; in this regard they are important during pregnancy and for elderly people. Fibers, which are the healthiest form of carbohydrates, also play an important role in filling the stomach and are therefore widely used by people who wish to lose weight.

Sugars: Should be limited in the diet of diabetic people.

Protein: For general health and muscle function; mainly needs to be considered for athletes and highly active individuals, for children in their growing years and during pregnancy.

Calcium: Another element that is very important for people with low bone density (especially women after menopause), and again for children in their growing years and during pregnancy.

Iron: Especially for people who suffer from anemia and many other blood-related diseases. Iron deficiency is more likely to be seen in women (especially due to menstruation). Women’s blood iron levels need to be checked regularly and during pregnancy. Babies and children in their growing years need to take in enough iron based on daily recommendations.
