Information Graphs Modelling Patients


University of Amsterdam

Faculty of Science

Thesis Master Information Studies - Business Information Systems

Final version: August 10, 2015

Information Graphs Modelling Patients

Stefan Paap

10288279

Supervisor: Dr. H. Afsarmanesh

Daily supervisor: M. Shafahi

First examiner: Dr. H. Afsarmanesh. Signature:

...

Second examiner: Dhr. dr. T.M. van Engers. Signature:


2 Related work
3 Approach & Method
3.1 Questionnaire
3.1.1 Rules
3.1.2 Procedure
3.1.3 Implementation
3.2 Statistical analyses
3.3 Modelling
4 Results
4.1 Decisions
4.2 Demographics
5 Statistical analysis
5.1 Social Chat Model
6 Discussion
6.1 Hypotheses
7 Conclusion
References
Appendix A - Systematic Literature Review
References
Appendix B - Results
Implemented significant results
Other significant results
Tables of nationalities and study fields
Appendix C - Models
General research model
Family related situation model
Diagnosed model


Abstract

Diabetes mellitus type 2 is a disease that people can be at risk of developing without their knowledge. Either the disease does not present symptoms, or people ignore the symptoms or lack the medical knowledge to identify them. The aim of this research is to develop a data model for a smart wizard software that provides users with an indication of their current risk level of developing diabetes type 2, without offending them by asking about personal details they are not willing to share. To be able to do this, it is essential to know how willing people are to provide data. To gather data, a questionnaire was created, containing three types of questions, focusing on nine risk factors of diabetes type 2. The questionnaire was conducted among 105 respondents. The results from the questionnaire were used to discover differences between demographic groups. Several significant results have been found and modelled. In total, four models have been created, each representing the user-specific flow for a different type of scenario: a social chat scenario, a general research scenario, a family related situation scenario and a diagnosed scenario. These four models contain different user flows, based on the sex and nationality of the user.


1 Introduction

Currently, people could be at risk of developing diseases without their knowledge. A disease could present symptoms that people ignore or are not aware of, or it may not present symptoms until it is too late. A disease that develops without people’s knowledge is unwanted and harmful. But how can people understand that they are at risk when they do not experience any symptoms?

An example of a disease that can develop without symptoms is diabetes mellitus type 2. Diabetes is a chronic disease that causes higher levels of sugar (glucose) in the blood. Diabetes can result in complications such as heart disease and stroke, obesity, high blood pressure, blindness, kidney disease, nervous system related diseases and more (Zhang & Zhao, 2013). According to the World Health Organization, it is estimated that in 2014, 9% of people aged 18 or older had diabetes, and 90% of them are diagnosed with diabetes type 2 (Assal & Groop, 1999). In 2012, 1.5 million deaths were directly caused by diabetes (World Health Organization, 2014). In 2009, diabetes was listed as number 7 of the top 10 leading causes of death in the US (Center for Disease Control and Prevention, 2012). These numbers tell us that diabetes is a very serious issue. But how does someone develop diabetes type 2?

Diabetes type 2 can develop during any stage of someone’s life. Several risk factors are known, and they vary a lot: someone’s lifestyle can influence the risk of developing diabetes type 2, but DNA can also be of influence. So, people can be at risk of developing diabetes without their knowledge. Not exercising and consuming unhealthy food might not have a direct (negative) effect on someone, but it increases his/her chance of developing the disease.

This possible development of diabetes is something we want to make people aware of. It would be very useful if people could get insight into their risk level based on the information they provide about their current habits and personal details. Early detection can reduce the burden of complications of diabetes, if treated (Diabetes Prevention Program Research Group, 2002).

Currently, if someone wants to know if he or she is at risk of developing a disease, an appointment with the general practitioner has to be made. This is often a time-consuming process, which can be frustrating. So, in order to provide more comfort, it would be pleasant to provide users with an indication based on information provided online.

There are online quizzes that provide users with information about diabetes, but these are often only based on e.g. sex. If you select that you are female, you are provided with a list of all risk factors and symptoms specific to the female sex. It would be more useful to provide someone with a clear indication of his/her current risk level. We aim to develop a data model for a smart wizard software that provides users with an indication of their current risk level, without offending them by asking about personal details they are not willing to share.


answers. By doing this, the software creates a user-specific flow through the process of the software, helping the user to make the process, for example, easier, faster or clearer. For this research, the software should focus on the order in which the questions are asked as well as which questions might offend users.

To understand what data is of importance for diabetes type 2, one must know the risk factors for this disease. According to the Mayo Clinic (2014) and the Dutch National Institute for Public Health and Environment (Rijksinstituut voor Volksgezondheid en Milieu, 2013), the risk factors for diabetes mellitus type 2 include the following: age, BMI, ethnicity, family history, genetic factors, diet, smoking behavior, prediabetes, gestational diabetes (pregnancy diabetes) and (in)activity.

In order to be able to provide the user with an indication of his/her current risk level of developing diabetes type 2, we are interested in collecting biomarkers. Biomarkers are aspects of the body that can be measured in order to understand, for example, someone’s current condition. As an example, obesity is a risk factor of diabetes. However, obesity is not something that can be measured directly; the BMI (Body Mass Index), on the other hand, can be derived from two measurable quantities, namely height and weight.
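As a small illustration of how a biomarker is derived from measurable quantities, the sketch below computes the BMI with the standard definition (weight in kilograms divided by the square of height in metres). The function name and the example values are hypothetical; they are not part of the questionnaire or of the data model described in this thesis.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Return BMI = weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical example values, for illustration only.
print(round(body_mass_index(80.0, 1.80), 1))  # 24.7
```

This is why the questionnaire can ask about weight and height separately: the derived value only needs those two inputs.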

Besides biomarkers, we are also interested in socio-economic aspects. Next to focusing on where the data can be found, we focus on factors such as money, time and effort, since they can also play an important role in someone’s willingness to share data. Therefore, these socio-economic aspects are also taken into consideration.

1.1 Research question

To be able to develop a smart wizard software that creates a user-specific flow without offending users by asking for information they are not willing to share, we need to gather data about the willingness of people to provide data. The willingness of people to share data in different scenarios is also a topic of interest and is investigated. The goal of this research is to provide insight into what information can realistically be gathered and where it can be gathered, for different types of scenarios. With this information, a data model is created that can be used for the development of a smart wizard software. This research will contribute to other projects that aim to provide users with indications based on their current situation and behavior. We are interested in knowing if people are willing to share their data in order to be provided with an indication of their current risk level. And, if they are willing to share it, where can we get that data?

For this research, the following research question has been developed:

What are the requirements to construct a dynamic model of a patient based on personal data?


To be able to answer the research question, it has been divided into two sub questions and sub-sub questions, which are as follows:

1. What are the sources for gathering data related to a patient model?

2. What are the socio-economic barriers in gathering personal information?

(a) What can be asked to users without offending them?

(b) In what order should we ask for information without offending users?

For the second sub question, we want to know and understand three different matters:

1. If people are willing to share information about risk factors

2. How people are willing to share that information

3. How frequently people are willing to share (certain types of) information

Several hypotheses have been created that will be tested:

1. We expect that most respondents are willing to provide information in a family related situation and in a diagnosed situation for all covered risk factors.

2. We expect that manual submission of data through an online form is the most frequently selected means of providing information.

3. We expect no significant differences between male and female respondents regarding all responses they provide.

4. We expect that Dutch respondents have no significant difference with the rest of the world regarding all responses they provide.

5. We expect that the youth is more willing to provide information than adults.

6. We expect that people active in the field of Information Studies are less willing to provide information through Electronic Health Record or Social Network than others.

7. We expect that people who know someone close to them who is diagnosed with diabetes are more willing to provide data in family related and diagnosed situations compared to people who do not have someone close diagnosed with diabetes.


2 Related work

To create a smart wizard software as discussed in the introduction, it is essential to know what questions can be asked to the user to gather information. To be provided with an indication of their current risk level, the user needs to provide the information that the indication requires. In order to retrieve information about the diabetes-related risk factors, it is essential to know where to find information about these factors. Manual submission is one of the means to provide information, but certain types of information can be hard to provide by hand, or manual submission might be unwanted because of the larger effort it requires. Another possible means of providing information is through the user’s electronic health records. Electronic health records contain all medical and treatment information and history of patients.

To store medical data, general practitioners and hospitals maintain files about each patient, containing information such as personal data and medical conditions. These filing systems used to be paper-based and kept at one location, usually bound to the hospital where someone was treated. If that same person went to different hospitals, it was very likely that other medical files existed at those hospitals. This is obviously a very ineffective and inefficient way of keeping track of someone’s medical history.

With the introduction of electronic medical files, created and stored in digital systems, it became easier to share medical information. And with the introduction of electronic health records, it became possible for hospitals to electronically share medical information about patients, including more control for patients over their own files. Although research by Jha et al. (2009) shows that US hospitals still struggled to implement the electronic forms of health records (9.1% made use of a basic EHR system), a study by Adler-Milstein et al. (2014) shows that at least 50% of the US hospitals make use of a basic EHR system. But not only hospitals are adopting these new types of electronic record systems. Other medical professions, such as physicians, also show an increasing trend of using an EHR system (Xierali et al., 2013). These studies show that EHR systems are becoming more popular and suggest that their use will continue to rise in the coming years.

Since these electronic health records contain medical data about the patient, information about risk factors is likely to be found in these files. Hivert et al. (2009) used information stored in electronic health records to find patterns to identify people with metabolic syndrome, which puts them at larger risk of becoming obese and developing diabetes. This study shows that electronic health records can be of importance and prove themselves to be useful, especially for techniques such as data mining and big data to find new, useful patterns and risk factors of diseases.

To understand what questions can be asked to the user of the smart wizard software without offending them, we need to investigate the willingness of users to provide information about risk factors to the smart wizard software. Other studies have investigated the willingness of users to share information in general for a certain cause. Teixeira et al. (2011) have investigated the willingness of HIV patients to share personal health information with others. They found that the vast majority (84%) was willing to share this data electronically, but the individuals who would receive the shared data had to be involved in the care of the patient. Fewer patients (39%) were willing to share it electronically with non-carers. Another study by Beckjord et al. (2011) looked into the willingness of cancer patients to exchange personal information electronically. They compared participants who are diagnosed with cancer against participants who are not diagnosed, and found that people who are diagnosed are much more willing to share this information to help others. Research by Pickard & Swan (2014) concludes that there is a strong willingness of people to share personal data in order to “enable next generation healthcare services, ultimately leading to improved health outcomes for all”. These findings show us that people are willing to share personal information electronically, which is also an important requirement for the smart wizard software. Although there is no guarantee that the information required by the model is present in the user’s electronic health record, it can be useful to include this option for them. The means of providing information is therefore taken into account, to discover whether users are willing to share information through it.

Many studies have done research on individual risk factors in order to understand if there really are differences that can be related to a higher risk level of diabetes type 2, such as BMI (Goran, Ball, & Cruz, 2003; Meigs et al., 2006; Tirosh et al., 2011), diet (Hu et al., 2001; Montonen, Knekt, Järvinen, & Reunanen, 2004) and ethnicity (Lindquist, Gower, & Goran, 2000; Harris, 2001). But none of these studies focus on providing users with an indication of their current risk level of developing diabetes type 2. The studies focus on a certain type of risk factor and its possible differences. There are, however, tools that can predict someone’s risk level of diabetes, such as the Diabetes Risk Score created by Lindström & Tuomilehto (2003). This score uses information about certain risk factors such as age, BMI and diet to calculate a score that indicates the risk level of developing diabetes type 2.

What this research aims to do differently is to find out if people are actually willing to share this kind of information. Based on this information, future applications that provide such risk scores can focus on scoring users with the knowledge that certain information might not be available.

An additional systematic literature review has been performed for this research, in order to be provided with the current state-of-the-art. This systematic literature review has been attached in appendix A.


3 Approach & Method

As stated in the introduction, the goal of the research is to create a model for a smart wizard software that asks users about information on risk factors of diabetes, without offending them by asking for information that they are not willing to share. To gather data about the willingness of people to share information about risk factors of diabetes, we conducted a questionnaire. This questionnaire focused on the willingness of people to share information in different scenarios. The means through which the data can be submitted is also of interest, such as providing it manually or by granting access to his/her Electronic Health Record. Finally, for certain risk factors, the frequency with which people are willing to share this information is taken into account, in order to have the most accurate and up-to-date indication.

Besides gathering information, the questionnaire should also create awareness about diabetes and its risk factors. By providing the participant with additional information, awareness was created, which is important for the social context as well as for the contribution to society.

To ensure that the questionnaire was of sufficient quality as well as valid, clear and complete, a set of rules was created with which the questionnaire had to comply, as well as a procedure to update and improve the questionnaire. After each round, the updated questionnaire was checked against the rules. If changes were required, they were applied and the questionnaire was tested again. When the questionnaire was finished, it was rolled out and spread both online and offline. A period of 2 weeks was set as gathering time.

3.1 Questionnaire

The aim of the questionnaire was to raise public awareness among participants and others, as well as to collect data for this research. Before developing this questionnaire, a set of rules that the questionnaire had to satisfy was created.

3.1.1 Rules

• The time needed to answer all questions should be approximately 5 minutes;

• The questionnaire should not contain more than 30 questions;

• Only submissions that are complete are acceptable;

• Two languages, English and Dutch, should be available to appeal to as many people as possible;

• The questionnaire should be completely anonymous to fill in;

• Questions that allow only 1 answer should contain only 1 answer, otherwise that submission is invalid.
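The machine-checkable rules above (question limit, completeness, single-answer questions) could be enforced along the lines of the following sketch. It is a minimal illustration only: the dictionary keys `id` and `single`, the function name and the interpretation of "complete" (every presented question has a recorded answer, possibly an empty selection for multi-select questions) are assumptions, not the validation that the survey software actually performed.

```python
MAX_QUESTIONS = 30  # rule: no more than 30 questions


def is_valid_submission(questions, answers):
    """Check one submission against the rules listed above.

    questions: list of dicts with hypothetical keys "id" and "single" (bool).
    answers:   dict mapping a question id to the list of selected options.
    """
    if len(questions) > MAX_QUESTIONS:
        return False
    for question in questions:
        if question["id"] not in answers:
            return False  # incomplete submission
        selected = answers[question["id"]]
        if question["single"] and len(selected) != 1:
            return False  # single-answer question needs exactly one answer
    return True


# Hypothetical questions and answers, for illustration only.
questions = [
    {"id": "sex", "single": True},
    {"id": "age_scenarios", "single": False},
]
print(is_valid_submission(questions, {"sex": ["female"], "age_scenarios": []}))
print(is_valid_submission(questions, {"sex": ["female", "male"], "age_scenarios": []}))
```

The first call passes all checks, while the second fails because a single-answer question holds two answers.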


3.1.2 Procedure

In order to develop a logical, solid and valid questionnaire, an update cycle was used to improve the questionnaire.

For the initial version, we started with the following concept: for every risk factor of diabetes, what is the willingness of the participant to provide information in two different scenarios? And after questioning their willingness to provide their data, what effort are they willing to make to provide that data? And are they willing to provide their data through devices that they have to buy, hence pay money for the ability to share data?

The risk factors as presented in the introduction were used for this questionnaire. However, as discussed in the introduction, some risk factors cannot be measured directly and thus we are interested in the biomarkers. Instead of someone’s BMI, we are interested in the weight and height, and instead of someone’s genetic factors regarding diabetes, we are interested in their genome sequence, so that research can focus on the specific parts needed to determine the genetic factors regarding diabetes.

Someone’s active behavior is also a risk factor of diabetes. Most questions about risk factors can be answered strictly and with equal perception, but when answering a question about their activity, people can have different perceptions of what being really active means (Shephard, 2003), making their answers unreliable (Prince et al., 2008). We have investigated incorporating someone’s activity by using information about a certain activity, such as cycling. Cycling is a very popular and common means of transportation in the Netherlands and it is proven to be a valid source to measure activity in the Netherlands (C. Baan, Stolk, Grobbee, Witteman, & Feskens, 1999). However, since this study targets a more diverse population, including non-Dutch people, cycling cannot be used as a measurement for activity, since people from other countries might be active in different manners. Therefore, the risk factor (in)activity was not incorporated in the questionnaire.

In total, 9 risk factors/biomarkers were included in the questionnaire: age, weight, height, ethnicity, family history, genome sequence, diet, smoking behavior and prediabetes/pregnancy diabetes.

The two different scenarios used for the willingness to provide data were as follows:

1. How willing are you to provide data x for any general purpose?

2. How willing are you to provide data x in order to get an understanding of your risk level of developing diabetes type 2?

For each of these scenarios, we chose a means of providing this information. So, to that scenario, we added one of the following means (if applicable for that risk factor):


2. Through Social Media

3. Through your Electronic Health Record (EHR)

4. Through smart devices

For every risk factor, the means of providing information that applied to that risk factor were selected. All risk factors are linked to both manually providing information and providing it through the user’s electronic health record (EHR). This is one of the assumptions that we made: basic types of information such as age, height, weight and ethnicity are available in someone’s EHR. If other important information is available regarding other risk factors, then it is available in someone’s EHR: think of prediabetes or pregnancy diabetes. If this data is not available in someone’s EHR, we assume that the response for that risk factor is negative (e.g. no prior case of prediabetes).

Besides these two options, several risk factors had another option. Age can also be shared through social networks. Facebook, for example, requires a user to provide a date of birth for registration, which can be extracted through Facebook’s API. The same applies to the risk factor family history. Although it is very unlikely to find information about diabetes of other family members on their social network, relationships between them can be extracted from these networks. By questioning the users about diabetes in their families, a more detailed profile of the user can be created: a parent is of larger influence than a distant relative. Finally, information about pregnancy diabetes can be shared through social networks. If, for example, information about pregnancy is found on the user’s network, e.g. by analyzing their posts, a more specific question about their pregnancy can be asked.

For the risk factor weight, another means of providing information is presented: a smart scaling device. These devices, such as the Fitbit Aria¹ and the Withings smart body analyser², are able to measure someone’s weight and wirelessly transmit it to an external application.

With these means of providing information incorporated, the questions of the questionnaire were created. As an example, the risk factor age combined with one means of providing information (social media) demonstrates what questions were generated:

1. How willing are you to provide your age automatically through social media for any general purpose?

2. How willing are you to provide your age automatically through social media in order to get an understanding of your risk level of developing Diabetes Type 2?
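The way such questions follow from combining a risk factor, a means and a scenario can be sketched as below. This is an illustration only: the function name is hypothetical, and only the social media wording comes from the questions above, while the wording for manual input and for the EHR is assumed.

```python
from itertools import product

# Phrases for the two scenarios of the initial concept (taken from the text).
SCENARIOS = [
    "for any general purpose",
    "in order to get an understanding of your risk level of developing "
    "Diabetes Type 2",
]

# Means for the risk factor age. Only the social media wording appears in
# the text; the other two phrasings are assumptions made for illustration.
AGE_MEANS = [
    "manually",                                              # assumed wording
    "automatically through your Electronic Health Record",  # assumed wording
    "automatically through social media",
]


def generate_questions(factor, means, scenarios=SCENARIOS):
    """Build one question per (means, scenario) combination."""
    return [
        f"How willing are you to provide your {factor} {m} {s}?"
        for m, s in product(means, scenarios)
    ]


questions = generate_questions("age", AGE_MEANS)
print(len(questions))  # 6
print(questions[4])
```

With three means and two scenarios, a single risk factor already yields six questions, which shows how quickly this concept grows when it is applied to every risk factor.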

The idea was to answer each of these questions with a Likert scale (figure 1).

¹ https://www.fitbit.com/aria


Figure 1: visualization of the proposed Likert scale, which is checked as an example

Since age can be derived from manual input, an EHR and social media, this generated 6 questions. Doing this for all 9 risk factors and all the associated means of gathering the data brought the total number of questions to more than 50. Adding questions about the costs and effort of gathering data about these risk factors added another 30 to 40 questions to the questionnaire. Since this did not comply with the rule of having 30 questions or fewer, this concept of the questionnaire was rejected. Also, although not tested, a completion time of approximately 5 minutes would have been highly unlikely for a projected number of 90 questions.

Therefore, we chose a new approach that did not require an answer for every separate scenario. The idea was to ask the participant in which scenarios he/she was willing to share information about a certain risk factor of diabetes. The participant could select from four different scenarios that have been created for this questionnaire, with an increasing level of confidentiality. The following scenarios were used:

1. Social chatting

Suppose you are walking on the street and someone walks up to you. You chat a little with this person, and then . . .

2. (General) research

Suppose you are walking on the street and someone approaches you. He explains that he is gathering data for research, and he then ...

3. Family related situation

Suppose that someone in your family has been diagnosed with diabetes. This puts you at a higher risk of being diagnosed with diabetes. For prevention purposes you visit the general practitioner (doctor), and ...

4. Diagnosed

Suppose you have been diagnosed with diabetes and you visit the specialist ...

Each scenario was fitted with an appropriate image to visually attach the participant to the scenario. The lines of text provided with each scenario represent the current situation. The three dots at the end of the sentences represent an open ending, to which the participant of the questionnaire can provide his/her answers.

The questionnaire contained three types of questions:

Providing information

The participant is asked about providing information about one of the risk factors. Figure 2 shows an example of this question.


Figure 2: example of the ‘providing information’-type question

The participants can select none, one or more options. If one or more options are selected, a follow-up question (figure 3) appears on the screen. If the participant decides that none of the scenarios is applicable, he/she can proceed to the question(s) about the next risk factor.

Means of providing information

If the participant selects one or more scenarios, we ask him/her about how he/she is willing to provide that information. Multiple options could be selected. Figure 3 shows an example of the question.

Figure 3: example of ‘means of providing information’-type question

If applicable: frequency of providing information

For certain questions, such as the example presented in figure 3, another follow-up question could present itself. Certain risk factors, such as weight, diet and smoking behavior, can change frequently and therefore have an important influence on the current risk of developing diabetes. So, we are interested in how often participants are willing to share that information in order to get insight into their development of diabetes. Figure 4 shows an example of a follow-up question.


Figure 4: example of ‘frequency of providing information’-type question

Not every ‘means of providing information’-question had a follow-up question for frequency. For example, height, ethnicity and the genome sequence are factors that do not change. Therefore, once provided, they do not have to be provided again.

For every selected means of providing information, a separate question about the frequency of sharing information was presented. So, to use the example of figure 3, if both ‘manually filling in an online form’ and ‘through a smart scaling device’ were selected, two extra questions appeared: one question for each means. This was done because there could be a large difference between filling in an online form and transmitting data automatically with a smart scaling device, regarding effort as well as cost. Manually providing the information is more work than stepping on a scale that automatically transmits the data to the application, but buying such a scaling device is also a factor to consider. Therefore, we gave the participant the option to be as accurate with his/her answers as possible.
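The branching between the three question types described above can be sketched as follows. This is a minimal illustration, not the logic as configured in the survey software: the function name and the question labels are hypothetical, and the set of changing risk factors contains only the three examples named in the text.

```python
# Risk factors that can change over time and therefore get a frequency
# follow-up. The text names these three as examples, so this set is an
# assumption rather than the full configuration of the questionnaire.
CHANGING_FACTORS = {"weight", "diet", "smoking behavior"}


def follow_up_questions(factor, selected_scenarios, selected_means=()):
    """Return labels of the follow-up questions shown for one risk factor."""
    follow_ups = []
    if not selected_scenarios:
        # No scenario selected: continue with the next risk factor.
        return follow_ups
    follow_ups.append(f"means of providing {factor}")
    if factor in CHANGING_FACTORS:
        # One frequency question per selected means of providing information.
        for means in selected_means:
            follow_ups.append(f"frequency of providing {factor} ({means})")
    return follow_ups


means = ["manually filling in an online form", "through a smart scaling device"]
print(follow_up_questions("weight", ["diagnosed"], means))
print(follow_up_questions("height", ["diagnosed"], means[:1]))
print(follow_up_questions("weight", [], []))
```

For weight with two selected means, the sketch yields one means question and two frequency questions, while a non-changing factor such as height only yields the means question.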

Each of the nine risk factors of diabetes used for this research and the associated questions were located on separate web pages, each following in order. Besides the pages of the nine risk factors, we had two other pages that were placed before the questions. The first page contained demographic questions, such as ‘what is your sex’, ‘what is your age group’ and ‘do you know someone with diabetes’. We were interested in possible differences between demographic groups, such as the willingness of people who know someone close to them diagnosed with diabetes: are they more willing to provide information since they can experience what it is like to live with diabetes (Beckjord, Rechis, Nutt, Shulman, & Hesse, 2011)?

The ages that could be selected were divided into three groups: youth (18-24 years), adult (25-64 years) and senior (65+ years), based on the Canadian age classification standard. The minimum age to participate was therefore set at 18. After the demographics page, a page with an explanation of the scenarios used in the questionnaire followed, including a question asking if the participant had read the scenarios. In total, eleven web pages with a maximum of 30 questions were presented. Fewer questions could be presented if the participant did not select certain answers, in which case the follow-up questions did not appear.
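The boundaries of the three age groups above can be written down as a small helper. This is only an illustration of the classification: the questionnaire asked respondents to select their age group directly, and the function name is hypothetical.

```python
def age_group(age: int) -> str:
    """Map an age in years to the age groups used in the questionnaire."""
    if age < 18:
        raise ValueError("participants must be at least 18 years old")
    if age <= 24:
        return "youth"
    if age <= 64:
        return "adult"
    return "senior"


print(age_group(18), age_group(24), age_group(25), age_group(65))
```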


language of the questionnaire was English only at that moment. The purpose was to test if the setup was correct, e.g. to verify the questions, the question formulation and availability of answers. Feedback was gathered and processed to improve the questionnaire.

Table 1: overview of the response time of the sample group

               Time
Average time   6 minutes, 19 seconds
Median time    5 minutes, 21 seconds

The second step was to create a small sample group of 10 participants. The language of the questionnaire was English only at that moment. The sample group had several purposes. The first one was to verify if the answers they provided were as expected: not regarding the predictability of their personal answers, but to check if they understood the questionnaire, i.e. selected multiple scenarios as an answer if applicable. Their response time was checked in order to verify if the questionnaire satisfied the rule “the time needed to answer all questions should be approximately 5 minutes”. Table 1 provides an overview of the completion time. Two participants took almost twice as long as the other eight. When the members of the sample group were contacted, these two explained that a phone call had interrupted their session, making it longer than projected. Since the median satisfied the rule set for this questionnaire, it was agreed that the length of the questionnaire as well as the time needed to complete it was satisfactory.

Next, every member of the sample group was contacted and asked for feedback. Each question was discussed, as well as the setup of the questionnaire. Overall the feedback was positive and some participants had useful comments. For example, the term ‘genome sequence’ was not very clear, nor were prediabetes and pregnancy diabetes. A reason for this could be the English-language barrier, since the sample group contained Dutch respondents only. To be as clear as possible about the risk factors, additional links to external web pages were added with more information about genome sequences and pre- and pregnancy diabetes. The images of the scenarios were also upgraded to better quality images. Finally, members of the sample group provided grammar suggestions. After the suggestions and comments of the sample group were processed, the questionnaire was ready to be rolled out to all participants. At this moment, the questionnaire was translated into Dutch.
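The reason for relying on the median rather than the average can be illustrated with a short sketch. The completion times below are hypothetical values chosen for illustration (eight similar times and two that are roughly twice as long); they are not the measurements of the sample group and do not reproduce Table 1.

```python
from statistics import mean, median

# Hypothetical completion times in seconds for ten participants, for
# illustration only: eight similar times and two interrupted sessions.
times = [300, 310, 315, 320, 320, 325, 330, 335, 640, 650]

average = mean(times)    # pulled upwards by the two long sessions
middle = median(times)   # barely affected by the two long sessions

print(f"average: {average:.1f} s, median: {middle:.1f} s")
```

In this made-up sample the two long sessions raise the average by about a minute, while the median stays close to the typical completion time.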

3.1.3 Implementation

To create the survey, the open source software LimeSurvey³ was used. LimeSurvey provided several useful features. One of them was that the software could create an offline, paper-based version of the online questionnaire with one click of the mouse. This version included all logic, such as follow-up questions, explained as written text between the questions. An example of the English paper-based questionnaire is attached in appendix D. Other useful features were multilingual survey support, time-tracking of the duration of the questionnaire and export functions to CSV, Excel and other formats. Since the software is open source, it could be downloaded and installed where preferred. The software, a PHP web application, was installed on a private web server and made available through a personal website⁴. To ensure the privacy of the participants, connections were secured through SSL (figure 5). SSL is an encryption protocol that secures communication on the internet. Whenever a user submitted his/her answers, the data sent to the server was encrypted, so that it could not be read if intercepted on the internet.

Figure 5: example of Google Chrome’s visual representation of the secure SSL connection with the website

To make the web address easier to access, a tiny URL was created⁵. This led the participants directly to the start page of the questionnaire (figure 6). The start page provided the participant with information about the context of the questionnaire, an estimated completion time and information on how their data was processed and handled: anonymously. Participants could also choose their preferred language: English or Dutch. The default setting was English.

Figure 6: introduction page of the questionnaire

When the participants finished, they were presented a screen (figure 7) with a thank you note, contact information in case they were interested in the results, and a page with information on diabetes type 2⁶. This page was provided to create more awareness and to inform the participant about diabetes mellitus type 2 and its risk factors.

⁴ https://www.stefanpaap.nl/bis/
⁵ bit.ly/bisresearch

Figure 7: final page of the questionnaire

The questionnaire was distributed both online and offline. People were contacted by phone, email and social media and asked to fill in the questionnaire online. In addition, paper sheets with tear-off strips containing the tiny URL were spread throughout the Science Park faculty building of the University of Amsterdam. After two consecutive weeks of gathering responses, the questionnaire was closed.

3.2 Statistical analyses

The data gathered from the questionnaire was analyzed to find significant differences that can be implemented in the model for the wizard software. For the analyses, Excel and SPSS were used. Initially, the data was loaded into Excel to create bar graphs that provide a visual representation of all available data. An example of how the results were structured in the Excel sheet can be found in figure 8. Each scenario is presented as a separate column and the data is coded as a binary digit: 1 if the participant selected the scenario, 0 if he/she did not. Because of this binary outcome, we used the binary logistic regression method in SPSS to find significant differences. If a significant difference was found, the Crosstabs method was used to discover what caused the difference.

To discover important differences that can be used for the software model, different demographic groups were compared against each other using SPSS. For every column (as depicted in figure 8) that contained a binary structure, a binary logistic regression was performed. In total, 58 columns were used, providing 58 values per demographic, each indicating whether a significant difference was present. This was done for every type of demographic, such as sex and nationality.
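The analysis itself was run in SPSS. As an illustration only, the following minimal Python sketch shows what a binary logistic regression with a single binary predictor (for example sex) computes for one 0/1 column. For this special case the coefficient equals the log odds ratio of the crosstab and the standard error has a closed form. The function name and all counts are hypothetical and are not taken from the questionnaire data.

```python
import math

def wald_test_2x2(a, b, c, d):
    """Logistic regression of a 0/1 outcome on one 0/1 predictor.

    Crosstab counts (hypothetical in the example below):
        a = group 1, outcome 1    b = group 1, outcome 0
        c = group 0, outcome 1    d = group 0, outcome 0
    Returns (B, S.E., Wald, p) for the predictor.
    """
    coef = math.log((a * d) / (b * c))              # B: log odds ratio
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # S.E. of B
    wald = (coef / se) ** 2                         # Wald statistic, df = 1
    p = math.erfc(math.sqrt(wald / 2))              # chi-square(1) upper tail
    return coef, se, wald, p

# Hypothetical crosstab: outcome 1 = "scenario not selected"
# women: 30 not selected, 25 selected; men: 12 not selected, 33 selected
coef, se, wald, p = wald_test_2x2(30, 25, 12, 33)
print(f"B={coef:.3f} S.E.={se:.3f} Wald={wald:.3f} df=1 Sig.={p:.3f}")
print("significant at 0.05" if p < 0.05 else "not significant at 0.05")
```

The crosstab counts passed to the function are the same numbers that a Crosstabs table would show, which is why the crosstab can be used afterwards to see which group caused a significant difference.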


Figure 8: example of the structure of the results as provided by LimeSurvey

With the significant results found, nonparametric tests (the related-samples McNemar test) were conducted in order to find related results. For example, if someone indicated that he is willing to share information through his electronic health record, is he also willing to do so through a smart device?

A significance level of p < 0.05 was used to indicate significant differences for all methods.
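To make the related-samples McNemar test more concrete, the sketch below implements the exact (binomial) variant in Python. This is an assumption for illustration: SPSS may report an exact or an asymptotic p value depending on the data, and the counts used here are hypothetical. The test only looks at the discordant pairs, i.e. respondents who selected one option but not the other.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar test on the discordant pairs.

    b = respondents who selected option A but not option B
    c = respondents who selected option B but not option A
    Under the null hypothesis a discordant pair is equally likely to
    fall in either cell, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: nothing to test
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical example: 9 respondents chose only option A, 11 only option B
p = mcnemar_exact(9, 11)
print(f"p = {p:.3f}")
print("reject H0" if p < 0.05 else "do not reject H0")
```

A large p value, as in this hypothetical example, means that there is no evidence that one option is selected more often than the other by the same respondents.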

3.3 Modelling

To model the user specific flows, we used the modelling language Business Process Model and Notation (BPMN) 2.0. The online BPMN editor Signavio was used, which allowed us to create BPMN models and download them as PDF files.

4 Results

A total of 105 completed submissions were gathered. The questionnaire had a completion rate of 90.5%. In total 116 responses were registered, but 11 were incomplete:

• 5 times the questionnaire was opened but no answers were registered
• 5 times the questionnaire was opened, partly answered and not finished
• 1 time the questionnaire was opened, fully answered but not submitted

Table 2: overview of the response time of the respondents

Time

Average time 6 minutes, 46 seconds

Median time 5 minutes, 22 seconds

We did not include any of these incomplete responses, not even the fully answered response that was never submitted. Table 2 provides an overview of the response times. The average response time was longer than projected (5 minutes), but the time statistics revealed several responses with very high response times (>25 minutes), which inflate the average. The median, however, was 5 minutes and 22 seconds, which satisfies the rule set for this questionnaire. All responses were gathered online.
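The following short Python sketch illustrates the two calculations used above: the completion rate, and why the median is a more robust summary than the average when a few sessions take very long. The completion counts (105 of 116) come from the text; the list of completion times is hypothetical and only mimics the presence of a few long sessions.

```python
from statistics import mean, median

def completion_rate(completed, registered):
    """Percentage of registered responses that were completed."""
    return round(100 * completed / registered, 1)

def fmt(seconds):
    """Format a duration in seconds as 'M minutes, S seconds'."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes} minutes, {secs} seconds"

print(completion_rate(105, 116))  # counts reported in the text

# Hypothetical completion times in seconds; the last two values mimic
# interrupted sessions of more than 25 minutes.
times = [290, 300, 310, 322, 330, 345, 1560, 1800]
print("Average time", fmt(mean(times)))
print("Median time", fmt(median(times)))
```

In this hypothetical sample the two long sessions pull the average far above the typical completion time, while the median hardly moves.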


4.1 Decisions

Four responses were excluded from the analyses. These four participants selected only one scenario for each question, whilst the intention was to select all applicable scenarios. One could assume that they would also have selected the scenarios that are ranked as more personal, but this cannot be stated with certainty. Since these results could act as outliers, they were excluded from the analyses.

4.2 Demographics

Age

Table 3 provides an overview of the age groupings of the respondents.

Table 3: overview of number of respondents per age group

Age group Number of respondents

Youth (18-24 years) 39

Adult (25-64 years) 58

Senior (65+ years) 4

Sex

Table 4 provides an overview of the sex of the respondents.

Table 4: overview of number of respondents per sex

Sex Number of respondents

Male 45

Female 55

Not willing to share 1

Nationality

In total 17 different nationalities were registered. Because the non-Dutch respondents were spread over many nationalities, a simplified grouping by nationality was created (table 5): Dutch (75 respondents) and non-Dutch (26 respondents). The reason for the large Dutch population is that the researcher’s social environment is based in the Netherlands, and thus the larger part of the people who were approached to fill in the questionnaire was Dutch or residing in the Netherlands. A complete overview of all nationalities can be found in appendix B (table 26).


Table 5: overview of number of respondents per nationality group

Group Number of respondents

Dutch respondents 75

Non-Dutch respondents 26

Field of study

The respondents’ study fields were diverse. To allow a useful analysis, the study fields were divided into four groups (table 6). Group 1 contained all Computer & Information sciences studies (23 respondents); group 2 contained Biological & Health Sciences (24 respondents); group 3 contained Engineering, Physical & Social sciences and Psychology (22 respondents); group 4 contained all other study fields (32 respondents). A complete overview of all fields of study can be found in appendix B (table 27).

Table 6: overview of number of respondents per study field group

Group Number of respondents

Group 1 23

Group 2 24

Group 3 22

Group 4 32

Know anyone diagnosed with diabetes

63 participants know someone who is diagnosed with diabetes, 38 participants do not know someone who is diagnosed with diabetes.

Someone close diagnosed with diabetes (friends, family)

Of the 63 participants who answered that they know someone diagnosed with diabetes, 48 participants indicate that it is (also) someone close to them, such as friends and/or family. 15 participants state that this is not the case. In total, this meant that 48 respondents know someone who is close to them who is diagnosed with diabetes, while 53 respondents do not.

5 Statistical analysis

The starting point for analyzing the gathered data was to create a graph containing all the responses regarding the willingness of participants to share information about a risk factor in a certain scenario (figure 9). Every risk factor is assigned a color. The height of each colored bar represents the number of times the respondents selected the scenario as one in which they were willing to provide information about that factor. The numbers located in the bars state these counts.


Figure 9: overview of all gathered data regarding the willingness to share information in different scenarios

As expected, the number of positive responses increases as the scenarios become more personal. The diagnosed scenario is the most selected, with a 94.5% selection rate (859 times out of 909 times).

When looking at the means of providing the information, figure 10 shows that manually submitting information is by far the most selected option. Other options, such as EHR and social network, are (in most cases) selected less often. It must be noted that not every risk factor is applicable to every means of providing information. The colors of the bars represent the risk factors and the number inside each section represents the number of times the means has been selected for that risk factor. Since social network has been selected only a small number of times, its bar in the graph is rather small. Four risk factors had the option to have information shared through social networks: age (selected 8 times), family history (selected 3 times), diet (selected 9 times) and pre-/pregnancy diabetes (selected 3 times).


Figure 10: overview of the willingness to share information by using different means

However, when looking at the numbers for genome sequence, we can see that manual submission has been selected 61 times and submission through the Electronic Health Record 63 times. For all other risk factors, manual submission has been chosen substantially more often.

Finally, the frequency of providing certain types of information (weight, diet and smoking) can be found in figure 11. The option ‘Whenever I feel like’ is the most selected frequency. For the risk factors weight and diet, the second most selected frequency is monthly. For smoking, the second most selected frequency is the option ‘only once’.


Figure 11: overview of the frequencies that respondents were willing to share information

Several statistical differences have been found and some of them are implemented in the models. All of the significant differences that have been found are located in appendix B. To demonstrate how these significant values are implemented in the model, one of the developed models will be discussed in detail: the social chat model.

5.1 Social Chat Model

The model consists of three lanes, of which two are most important: the software lane and the user lane. The software lane represents the actions of the computer, such as asking a question, and the user lane represents the actions of the user, such as providing an answer. The third lane represents an external service, which is used to retrieve data from external sources.

The order in which the questions are asked in the model is based on the willingness of people to share their data. Figure 12 presents the risk factors in the order in which the respondents were most willing to share information about them in the social chat scenario, based on the results of the questionnaire. Respondents were most willing to provide their age and least willing to provide their genome sequence. To avoid offending the user by asking right away for something he/she is likely not willing to share, the smart wizard software asks its questions in the order presented in figure 12: the user is first asked the questions he/she is most likely to answer, followed by the questions he/she is probably less willing to answer.
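The ordering step can be sketched in a few lines of Python: the risk factors are sorted by the number of times they were selected for the scenario, from most to least selected. The counts below are hypothetical placeholders (the actual counts are shown in figure 12); only the positions of age (most selected) and genome sequence (least selected) are taken from the text.

```python
# Hypothetical selection counts for the social chat scenario.
# The real counts are those shown in figure 12.
social_chat_counts = {
    "genome sequence": 10,
    "weight": 40,
    "age": 70,
    "diet": 45,
    "family history": 30,
    "height": 50,
    "ethnicity": 55,
    "smoking behavior": 60,
    "pre-/pregnancy diabetes": 20,
}

def question_order(counts):
    """Return the risk factors from most to least selected."""
    return sorted(counts, key=counts.get, reverse=True)

for position, factor in enumerate(question_order(social_chat_counts), start=1):
    print(position, factor)
```

Because the order is derived from the counts, the same function could be reused for the other scenarios by passing the counts of that scenario.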

Figure 12: the risk factors ordered from most selected to least selected for the social chat scenario

Now that the order in which the software asks its questions has been set, the focus lies on what can and cannot be asked without a higher chance of offending the users. To determine this, statistical analyses were performed on the dataset gathered from the questionnaire. As stated in the method, a binary logistic regression was conducted. If a significant result was found, the Crosstabs method was used to gain insight into what caused the significant difference. Tables 7 to 11 present the significant results found for the social chat scenario, including the conclusion that we can draw from the crosstabs table. Five values are present in each table: the logistic coefficient (B), representing the expected change in the log-odds of the outcome for each unit change in the predictor; the standard error (S.E.), providing an indication of the reliability of the estimated coefficient; the Wald statistic (Wald), which is used to evaluate whether or not the logistic coefficient differs from zero; the df value, which is 1 by default for this type of analysis; and finally the p value (Sig.), which indicates whether the difference between the groups is significant.
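The relation between the first three values can be checked directly: for a single coefficient, the Wald statistic is the squared ratio of the coefficient to its standard error, Wald = (B / S.E.)². The Python sketch below applies this to the B and S.E. values reported in tables 8 to 11. Because the tabulated B and S.E. are rounded to three decimals, a small deviation from the reported Wald values is to be expected; the tolerance of 0.01 is an assumption chosen for this illustration.

```python
def wald(b, se):
    """Wald statistic for one logistic coefficient (df = 1)."""
    return (b / se) ** 2

# (B, S.E., reported Wald) as listed in tables 8 to 11
reported = {
    "height by sex (table 8)": (1.155, 0.491, 5.530),
    "weight by sex (table 9)": (1.443, 0.420, 11.809),
    "age by nationality (table 10)": (1.022, 0.519, 3.883),
    "height by nationality (table 11)": (1.232, 0.488, 6.376),
}

for name, (b, se, reported_wald) in reported.items():
    computed = wald(b, se)
    ok = abs(computed - reported_wald) < 0.01  # rounding tolerance (assumed)
    print(f"{name}: computed {computed:.3f}, reported {reported_wald:.3f}, "
          f"{'consistent' if ok else 'check'}")
```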

Demographic: sex

Age, social chat (table 7):

Significantly more female respondents are not willing to share their age

Table 7: results for age, social chat scenario

Risk factor B S.E. Wald df Sig.

Height, social chat (table 8):

Significantly more female respondents are not willing to share their height in a social chat

Table 8: results for height, social chat

Risk factor B S.E. Wald df Sig.

Height 1.155 0.491 5.530 1 0.029

Weight, social chat (table 9):

Significantly more female respondents are not willing to share their weight in a social chat

Table 9: results for weight, social chat

Risk factor B S.E. Wald df Sig.

Weight 1.443 0.420 11.809 1 0.019

Demographic: nationality

Age, social chat (table 10):

Significantly fewer non-Dutch respondents are willing to share their age in a social chat

Table 10: results for age, social chat

Risk factor B S.E. Wald df Sig.

Age 1.022 0.519 3.883 1 0.001

Height, social chat (table 11):

Significantly fewer non-Dutch respondents are willing to share their height in a social chat

Table 11: results for height, social chat

Risk factor B S.E. Wald df Sig.

Height 1.232 0.488 6.376 1 0.049

Tables 7 to 11 provide statistical information on the significant values found for three risk factors of the social chat scenario: age, height and weight. For all three risk factors (age: Wald(1)=4.765, p=0.029; height: Wald(1)=5.530, p=0.029; weight: Wald(1)=11.809, p=0.019), women are less willing than men to provide information, and for age (Wald(1)=3.883, p=0.001) and height (Wald(1)=6.376, p=0.049), significantly fewer non-Dutch respondents are willing to provide information. A significance value of less than 0.05 counts as a significant difference and is modelled into the smart wizard software model. As can be seen, all tables have a significance value of less than 0.05. The non-parametric tests were conducted on these results, but no useful results were found.

The other demographics, as discussed in the results section, did not provide any statistical differences. The demographics sex and nationality have differences that can be implemented in the smart wizard software. Therefore, the software should first ask the user for his/her sex and nationality, in order to decide which questions he or she should or should not see.

For the analyses of the sex demographic, the respondent who was not willing to share his/her sex was excluded, since this group was too small to perform analyses on.

Figure 17 shows the complete model for the social chat scenario. As can be seen, the first two questions that the software asks are about the sex and nationality of the user. After that, following the order presented in figure 12, the user is asked about his/her age. However, as can be seen in tables 7 and 10, significant differences have been found for this risk factor that can be modelled (figure 13). Before the user is asked about his/her age, two conditions are checked. The first is based on sex: a male user continues to the next step, while for a female user the question is skipped, since women are less willing to share their age in the social chat scenario. The second is based on nationality: a Dutch user is asked about his age, while for a non-Dutch user the question is skipped.

Figure 13: close up of age

After age, the user of the wizard software is asked about his/her smoking behavior and ethnicity. All users are asked about this, since no significant differences have been found for these factors.

After smoking behavior and ethnicity, the user is asked about height (figure 14). For this factor, the same conditions apply as for age (figure 13).


Figure 14: close up of height

After height, the user is asked about his/her diet. After diet comes the next factor for which significant differences have been found: weight. As can be seen in table 9, significantly more female respondents are not willing to share their weight in a social chat, and thus this is processed into the model (figure 15).

Figure 15: close up of weight

After weight, the user is asked about the remaining factors: family history, pre- and pregnancy diabetes and genome sequence. Information about pregnancy diabetes is only asked to women, since this risk factor does not apply to men. A condition to check if the sex of the user is female, is implemented into the model.
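The conditions described above can be summarised as a small Python sketch of the social chat flow. This is an illustration of the rules as described in the text, not the BPMN model itself. The function and constant names are hypothetical, and splitting pre- and pregnancy diabetes into two separate questions is an assumption made here so that the sex condition on pregnancy diabetes can be shown.

```python
# Order of the risk factor questions as described for the social chat model.
# Splitting prediabetes and pregnancy diabetes is an assumption.
QUESTION_ORDER = [
    "age", "smoking behavior", "ethnicity", "height", "diet", "weight",
    "family history", "prediabetes", "pregnancy diabetes", "genome sequence",
]

def questions_for(sex, nationality):
    """Return the risk factor questions asked after sex and nationality."""
    skipped = set()
    if sex == "female":
        # Women were significantly less willing to share these factors.
        skipped.update({"age", "height", "weight"})
    else:
        # Pregnancy diabetes is only asked to women.
        skipped.add("pregnancy diabetes")
    if nationality != "Dutch":
        # Non-Dutch respondents were less willing to share these factors.
        skipped.update({"age", "height"})
    return [q for q in QUESTION_ORDER if q not in skipped]

print(questions_for("male", "Dutch"))
print(questions_for("female", "Dutch"))
print(questions_for("male", "non-Dutch"))
```

As in the model, the user can still skip any individual question; the function only determines which questions are presented.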

Regarding the answers that the user has to provide, almost all answers have to be provided manually. As can be seen in figure 10, manual submission is the most chosen option and this has been implemented in the model accordingly. The only exception is the genome sequence: since there was almost no difference between the numbers (61 for manual submission against 63 through EHR), a nonparametric test (related-samples McNemar test) was performed on them. For the null hypothesis that the distributions of different values across genome sequence manual submission and genome sequence EHR submission are equally likely, a p value of 0.905 was calculated. This exceeds the significance level of 0.05 and thus the hypothesis was not rejected. This means that people who are willing to share their genome sequence manually are also likely to do so by allowing access to their electronic health record. However, since manual submission is a popular way to share information, the decision was made to provide the user with both options: share it manually or share it through electronic health record access. Figure 16 shows how this is incorporated into the model. An XOR gateway enables the user to select one of the options. If the user selects the option to share it through his/her electronic health record, a request is made to an external service, i.e. the service that holds the data of the electronic health record.

Figure 16: close up of genome sequence

Finally, the frequency with which data can be gathered is implemented in the model. As can be seen in figure 11, the option to share information whenever the user feels like it is the most selected option. For both diet and weight the second most selected option is monthly and therefore we modelled a reminder for these factors. This reminder is presented in figure 15. The user is sent a monthly reminder. This can be a subtle email, stating that the user can update his/her information if he or she feels like doing so. If the user does not respond, a new reminder can be sent the following month. Other ways of implementing a reminder can be used as well, depending on future choices that have to be made.

The model ends with creating an indication of the user’s risk level and presenting it to him/her. The instantiation of an indication was not part of this research.

It is important to state that the user has the ability to skip every question. This is not directly modelled in the BPMN models. However, information such as genome sequence is not very likely to be willingly provided in a social chat scenario, and thus the user should have the option not to provide this information. If the user skips questions, this will affect the accuracy of the indication of his/her current risk level. The model is used as a generic, visual representation of the recommended flow.

For each of the scenarios presented in the questionnaire, a separate model has been created. The reason for this is that the software can be encountered in different contexts. Imagine you are on Facebook and see a link to a piece of software that can provide you with an indication of your current risk level of developing diabetes type 2. You are interested, but have no idea who is behind this software, what the data is used for, how it is analyzed and so on. This can be seen as the social chat model. If, however, you are approached by a university to participate in a research project and they explain their software and what it does, it can be seen as a general research model. And if you are informed by your GP or doctor, who explains that the software they have available can provide you with an indication, it can be seen as a diagnosed scenario. You will most likely trust what they are doing with the data and be more willing to share certain information. Therefore, different models were created, based on the different scenarios. The remaining models, for the general research scenario, the family related situation scenario and the diagnosed scenario, are located in appendix C.


6 Discussion

When looking at the results of the questionnaire, four results were found where the participant only selected one scenario for each question, whilst it was the intention to select all applicable. Presumably they did not fully understand what the expected answering pattern was and thus, to avoid these situations in future research, an example could be added to the questionnaire on how to fill in the questionnaire.

Besides these four unexpected responses, 33 other responses had unexpected results: missing values where they were expected. As an example, one participant answered that he/she was willing to provide his/her weight during a social chat, for general research and in a family related situation, but not when diagnosed with diabetes. It is possible that this person deliberately chose not to select this scenario, but it is also possible that he/she misclicked the selection area of this scenario and did not select it. Since the answers of these participants do not show a recurring pattern of unexpected results (unlike those of the four participants removed from the results), the decision was made to keep these responses and use them for the analysis. For future study or use of the questionnaire, a solution for misclicking could be an auto-fill of the forms: if a participant selects, for example, social chat, the other three scenarios are automatically selected and the participant can deselect the options that he/she would not like to select. This way, if a certain scenario is not selected when it is expected to be, it can be assumed with certainty that this was a choice of the participant. One aspect to keep in mind regarding the results of the questionnaire is that people may state something on paper but not actually do it in real life. We can therefore expect that fewer personal details will be shared in practice.

Regarding the demographic data of the respondents, very few seniors (65+ years) participated: four. For this reason, no analysis was done for this age group. The same holds for the individual nationalities of the non-Dutch respondents. Since this research took place in the Netherlands and the social environment of the researcher was also located in the Netherlands, a large number of Dutch respondents took the questionnaire. The researcher could not approach large groups of foreigners, and the nationalities of the non-Dutch respondents were diverse: 16 different nationalities besides the Dutch nationality. These 16 nationalities are spread among 26 participants, so the population of each nationality was not large enough to analyze separately. For future research it would be interesting to investigate possible differences between nationalities, but also between regions. The South American population has different diet patterns and lifestyles compared to, for example, the Asian population. These differences, which can influence the risk development of a population, would be an interesting topic for future studies.

The problem that arises from the results of the questionnaire and the models is that the BMI of women and non-Dutch users cannot be calculated in the social chat model. For women, neither weight nor height is asked, since they are less willing to provide them, and for non-Dutch users height is skipped. The same holds for the other models: the general research model lacks the height of non-Dutch users and the weight of women; the family related situation model lacks the height of non-Dutch users; the diagnosed model lacks both weight and height for non-Dutch users. However, BMI is a very important factor to know, since overweight is the primary risk factor for diabetes type 2 for both adults (Field, Manson, Taylor, Willett, & Colditz, 2004; Koh-Banerjee et al., 2004; Hartemink, Boshuizen, Nagelkerke, Jacobs, & van Houwelingen, 2006) and children (Dabelea et al., 1998; Wei et al., 2003; Kitagawa, Owada, Urakami, & Yamauchi, 1998). So, in order to retrieve this information, the wizard software should state very clearly that height and weight are used to calculate someone’s BMI, that the BMI is used to determine whether someone is overweight, and that overweight is the primary risk factor and can have a large impact on someone’s risk of developing diabetes. It should also state that if the user is not willing to share this information, the indication of his/her current risk level might become less accurate.
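The dependency of the BMI on both height and weight can be made explicit with a minimal Python sketch. BMI is weight in kilograms divided by the square of height in metres, and a BMI of 25 or higher is commonly classified as overweight. The function names are hypothetical, and returning None when a value is missing is an assumption about how the wizard software could represent a skipped question.

```python
def bmi(weight_kg, height_m):
    """Body mass index, or None if weight or height was not provided."""
    if weight_kg is None or height_m is None:
        return None
    return weight_kg / height_m ** 2

def is_overweight(bmi_value):
    """True/False for a known BMI, None if the BMI is unknown."""
    if bmi_value is None:
        return None
    return bmi_value >= 25.0

# A user who shares both values
print(is_overweight(bmi(85, 1.80)))

# A user for whom the height question was skipped: the BMI, and therefore
# the overweight indication, cannot be determined.
print(is_overweight(bmi(85, None)))
```

The second call shows the problem described above: as soon as one of the two values is skipped, the primary risk factor is unknown and the risk indication loses accuracy.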

A solution for missing data, or data that people are not willing to provide, might be found in the field of artificial intelligence. Khan et al. (2012) presented a framework that uses knowledge-based and learning-based approaches to fill in gaps where information is missing, in order to support medical professionals in their decision making process. These techniques can also be used in other fields, such as providing users with an indication of their current risk level of developing diabetes mellitus type 2. Such a solution could perhaps enhance the accuracy of the software’s indication of the user’s risk level. Future research is needed to implement such an artificial intelligence approach.

One other important aspect of the software is the renewed indication of a user’s risk level. If the user is willing to share his weight and diet on a monthly basis to have an up-to-date indication, the software needs to store the other personal information of the user, so that it is available when needed. It is possible to let the user fill in the information on all risk factors again whenever he/she wants a new, updated indication, but manually providing the information can, understandably, be a time-consuming process that the user might not want to go through once a month or perhaps even more often. Therefore, the storage of personal information is an important ethical aspect of creating such sensitive wizard software based on personal information.

The results from the questionnaire for the social chat scenario show that people are not willing to provide every type of information. However, when looking at the other scenarios, such as the diagnosed scenario (appendix C), it can be concluded that the only demographic factor of influence is nationality, for providing information about height, weight and ethnicity. The participants of the questionnaire did not know that their responses would be transformed into a model for a smart wizard software that provides the user with an indication of his/her current risk level of developing diabetes type 2 without offending them. Had they known the purpose of the software, they might have been more willing to share information. Teixeira et al. (2011), Beckjord et al. (2011) and Pickard & Swan (2014) found that the willingness of people to share personal data in order to help others or improve future medical help is definitely present. In these studies, the participants knew about the intentions of the research and could take them into account. Since the participants in this research did not know the intention of the research, they might have provided answers that apply to the most general case. But if they had known that, for example, their weight and height were required to calculate their BMI and that their BMI is a very important indicator of their risk level of developing diabetes type 2, they might have been more willing to share this information.

To validate the results found in the questionnaire, we compared them to other studies. Although no other studies were found that used a questionnaire approach to investigate the users’ willingness to share different types of information in different scenarios for diabetes type 2, several other studies have used a questionnaire approach to identify users with a higher risk level of diabetes. Herman et al. (1995) developed two classification trees. The first tree identified groups at higher risk based on risk factors of diabetes. The second classification tree did the same, but incorporated prior medical evaluation. Their results showed that the first classification tree (figure 18) performed as well as the second classification tree. This validates our approach to focus on information about risk factors of diabetes that are patient specific. When looking at figure 18, risk factors are present that are also incorporated in our model: age, BMI (obesity) and family history. Activity is also included in this classification tree but, as discussed in the method, is not used in our model. Finally, giving birth to a heavy baby is incorporated into the tree model of Herman et al. as a variable. In this study, no questions related to child delivery were asked, and thus the variable “delivery of heavy babies” could not be included in our analysis. The user was asked about pregnancy diabetes, but this does not depend on the delivery of a heavy child.

Baan et al. (1999) investigated what data was essentially required to identify patients at a greater risk of developing diabetes type 2. They created three models. The first model was based on data available in the files of a general practitioner, such as age, sex and presence of obesity. The second model used additional information obtained by asking questions about family history and smoking. The third model incorporated medical data, such as diastolic and systolic blood pressure. They found very similar results for the first and second model, and rejected the third model since the medical data had no additive predictive value. The researchers recommend the first model, since it uses less information, all of which is usually available in the patient-specific notes of a general practitioner. Comparing the factors that they report to be available in these notes with the factors in our study, we find an overlap: age, sex, presence of obesity (BMI) and pregnancy diabetes. They include three more factors: the use of two types of medication and the prevalence of cardiovascular disease. These factors are not incorporated in our study, but other factors such as smoking behavior and family history are. These factors are also used in the second model of Baan et al. (1999). Overall, similar risk factors are used to identify diabetes.

Figure 18: classification tree developed by Herman et al. (1995)

When looking at the risk factor of one’s genetic factors, it must be noted that this can be complex material. Information about genome sequences is very detailed, specific medical data, and most people do not have (useful) information about it. To provide information about one’s genome sequence, one must take genetic tests. At this moment, it is possible to read entire human genome sequences. Ashley et al. (2010) studied the possibility of assessing a patient based on his or her personal genome. They found that whole-genome sequencing is a useful method for individual patients, although future research is required due to the large field of study associated with genomes.

Thus, to use genetic information for indicating someone’s risk level of developing diabetes type 2, the patient is required to take genetic tests. This can be done for the whole genome sequence, but this can be expensive: $1000 [7]. The other solution is to test parts of the genome sequence, for example to obtain information related to diabetes. The information retrieved from these tests can then be added to the patient’s electronic health record, so it can be used for other research.

[7] http://www.illumina.com/systems/hiseq-x-sequencing-system/system.html

Another issue that must be mentioned is the adoption of EHR systems. Although more and more hospitals and medical professionals are adopting these systems (Jha et al., 2009; Adler-Milstein et al., 2014; Xierali et al., 2013), it is possible that not all patients have an electronic health record, or have access to it. This makes it impossible for these users to share information through their record, that is, to allow the smart wizard software access to gather data about their genome sequence.

For future research, the questionnaire approach can be used to create smart wizard software models for diseases other than diabetes type 2, such as kidney or heart diseases. By asking about the risk factors that apply to a disease, one can get better insight into the information people are willing to share in order to receive an indication of their current risk level.

A limitation of this study is the difference in group size between Dutch and non-Dutch respondents. As discussed in appendix C, the family related situation model and the diagnosed model have their limitations. Significant differences regarding nationality for height, weight and ethnicity are present, although the selection rates of the scenarios are very high (94 to 96 in total, out of 101 responses). Because of the ratio of 75 Dutch respondents to 26 non-Dutch respondents, the non-Dutch population is a minority. We assume that this unequal ratio produced these significant results, even though high selection rates are present. To investigate this assumption and to eliminate the problem of unequally sized groups, more research is needed. Future studies can focus on conducting the questionnaire with larger, more internationally diverse populations, as well as with evenly sized groups. This can provide more insight into differences between nationalities as well as between regions of the world.
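The effect described above can be illustrated with a small worked example. The sketch below is not the analysis used in this study: it assumes a two-sided Fisher's exact test on a 2x2 table, and the split of willing and unwilling respondents per group is hypothetical. Only the group sizes (75 and 26) and the overall total of 96 selections out of 101 are taken from the text. The function name `fisher_exact_two_sided` is also an illustrative choice.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups, columns are "selected" / "not selected".
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x selections in the first group,
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables that are at most as likely
    # as the observed table (small tolerance for float rounding).
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# HYPOTHETICAL split: all 75 Dutch respondents select the scenario,
# 21 of the 26 non-Dutch respondents do (96 of 101 in total).
p_value = fisher_exact_two_sided(75, 0, 21, 5)
print(f"p = {p_value:.5f}")
```

With this hypothetical split, the p-value is about 0.0008, well below 0.05, even though 96 of the 101 respondents selected the scenario. A handful of non-selecting respondents concentrated in the smaller group is therefore enough to produce a significant difference, which is why evenly sized groups are suggested for future studies.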

The integration of the activity risk factor into the model can also be investigated. Now that more and more (smart)phones are equipped with motion sensors, more people have insight into their daily activity. This could give a uniformly measured view of someone’s activity, and thus a more accurate information source for this risk factor of diabetes.

The next step in the process of providing the user with an indication of his/her current risk level is to determine how the data from the smart wizard software can provide the user with an indication. Questions such as “What data is essentially needed?” and “What accuracy can be provided for the indication?” need to be answered and require more research.

6.1 Hypotheses

Finally, the hypotheses as introduced in the introduction are discussed. Based on the results gathered from the questionnaire, two hypotheses are confirmed: hypotheses 1 and 2.

1. We expect that most respondents are willing to provide information in a family related situation and in a diagnosed situation for all covered risk factors.

2. We expect that manual submission of data through an online form is the most selected means of providing information for all covered risk factors.

For the first hypothesis, figure 9 shows that these scenarios have the highest response rates. For both scenarios, almost every risk factor had a selection rate of more than 80%, confirming that most respondents are willing to provide information in both scenarios. The second hypothesis is confirmed by figure 10. In total, manual submission is the most selected option for sharing information, and for almost all risk factors it is the most selected means. The risk factor genome sequence is the exception, with sharing information through the user’s EHR as the slightly more preferred means.

3. We expect no significant differences between male and female respondents regarding all responses they provide.

4. We expect that Dutch respondents have no significant difference with the rest of the world regarding all responses they provide.

5. We expect that the youth is more willing to provide information than adults.

6. We expect that people active in the field of Information Studies are less willing to provide information through Electronic Health Record or Social Network than others.

7. We expect that people who know someone close to them who is diagnosed with diabetes are more willing to provide data in family related and diagnosed situations compared to people who do not have someone close diagnosed with diabetes.

The other five hypotheses are not confirmed. The third and fourth hypotheses are rejected, since differences have been found for both sex and nationality and have been implemented accordingly in the scenario-based models, as discussed in the analysis section. For the last three hypotheses (5, 6 and 7), no significant differences have been found that confirm them.

7 Conclusion

The aim of this research was to develop a data model for smart wizard software that provides users with an indication of their current risk level, without offending them by asking about personal details they are not willing to share. As a first step in this process, we have created four models for different types of software scenarios. These models can be used to create a user-specific software wizard that asks the user for information without offending him/her.
