Linked Data-as-a-Service Synthesizing experiences in a B2B domain

Author Daoud Urdu

UvA ID 10576436

VU ID 2566817

Supervisor/examiner drs. Arjan Vreeken (UvA)

Supervisor/examiner drs. Wouter Beek (VU)

Supervisor mr. Lon van den Akker (Graydon)

Second Examiner prof. dr. T.M. Tom van Engers

July, 2016

MSc Information Studies: Business Information Systems University of Amsterdam, Faculty of Science

"Don’t flatter yourself into thinking you can divine my motives or my actions. You are a mouse in a maze."

Contents

1 Introduction
1.1 Literature review
2 Problem Definition
2.1 Theoretical Underpinnings
2.2 Theoretical Propositions
3 Research Aim
3.1 Research design
3.2 Research Questions
4 Research Methodology
4.1 Research Triangulation
4.2 Methods
4.3 Design for use before use
5 Results
5.1 Literature
5.2 Interviews
5.3 Case Study Graydon
6 Discussion and Conclusion
6.1 Discussion
6.2 Implications
6.3 Future Research
6.4 Conclusion
A List of Tables
B List of Figures
C Abbreviations
D Conceptual Framework
E Thesis Planning
F Approaches for using ontologies
G Case Study Graydon
G.1 Organogram Graydon
G.2 Use Case UML Diagram
G.3 Use Case Diagram Table Description
G.4 User Stories (Dutch)
G.5 Use Case Observations
G.6 Technical Scripts
H In-depth Interviews
H.1 Table Interviewees
H.2 Interview Protocol
H.3 Network View of Interview Codes
I Interview Transcripts
I.1 Word Clouds
J Reflection

Abstract

Data plays a crucial role in the information era. In order to sustain resilience, organizations have to preserve their data assets. This thesis was designed to determine the importance of data governance for heterogeneous data. To date, managing heterogeneous data is supported by Linked Data concepts and Semantic Web technologies. Also, organizations in the B2B domain face the emerging concepts of X-as-a-Service. Therefore, this thesis sets out to determine data governance aspects of Linked Data as-a-Service (LDaaS). As a general research strategy, a triangulation of three methods was adopted to ensure generalizable results. A case study approach was used to conduct the explanatory part of this thesis. Moreover, nine in-depth interviews were conducted with subject matter experts to evaluate experiences with LDaaS. Also, a literature review was carried out to determine the factors that affect the overlying concepts within the discourse.

The results show that managing heterogeneous data generally does lead to enhanced information sharing, and that the impact of adopting LDaaS seems to be essential. Furthermore, this thesis found that several considerations should be taken into account prior to operationalizing LDaaS. Finally, this investigation resulted in several data governance heuristics for LDaaS. These are the following:

1. Communicate non-formalized aspects, which is a non-invasive approach, in an agile manner (e.g. user stories);

2. Consider the information architecture as a whole;

3. Consider aspects of data provenance, especially with regard to rapid prototyping;

4. Consider data quality. Ontology engineering seems to be important in order to establish semantics;

5. Consider a classification of data and content (heterogeneity) when formalizing guidelines;

6. Consider compliance aspects, especially for Social Network Analysis.

1 Introduction

In the new information era, data plays a crucial role for organizations in performing their business strategy. Especially in times of uncertainty, decision making is supported by retrieving knowledge from data. Within Big Data research there is a growing body of literature which recognizes the 4V’s of Big Data: (1) volume, (2) velocity, (3) variety and (4) veracity (Fan 2013; Chen et al. 2014; Kepner et al. 2014). Recently, there has been renewed interest in the last two aspects. Managing heterogeneous data, which reflects the variety aspect, plays a critical role in this development. Furthermore, veracity of data is an important aspect which would address confidentiality, integrity and scalability (Kepner et al. 2014).

Traditionally, organizations seek performance improvement in order to sustain competitive advantage in a continuously changing external business environment. The past decade has therefore seen rapid development of disciplines like Business Intelligence (BI) and data management. Within the discipline of BI, a systematic approach is applied for the process of collecting data, analysing and applying the obtained information for business purposes (Hamer 2013).

The overall structure of the thesis takes the form of six chapters, including this introductory chapter, which assesses the theoretical dimensions of this research. Chapter two lays out the problem definition of this research, followed by three theoretical propositions. Chapter three presents the research aim of this thesis, comprising the research questions and an assessment of existing related work. Chapter four is concerned with the research methodology used for this thesis. Chapter five presents the findings of this research, focusing on the process of data gathering and how the data is analysed. Finally, chapter six analyses the results of the previous chapter and tries to come up with preliminary propositions. It also tries to explain the results in a concluding discourse, followed by a discussion and a few suggestions for future research.

1.1 Literature review

Recently, researchers have shown an increased interest in emerging concepts and technologies that facilitate many IT related service aspects and empower organizational departments to improve their information management. Existing research recognizes the critical role played by (1) data governance aspects of X-as-a-Service in the domain of cloud computing (T. Winkler et al. 2011; Khatri and Brown 2010; Bijkerk 2015; Vinkenborg 2015), and (2) promising possibilities of Linked Data supported by Semantic Web technologies (Chen et al. 2014; Tim Berners-Lee 2010; Steve Battle 2014).

1.1.1 Semantic Web and Linked Data

The very essence of the Semantic Web is integrating Knowledge Representation (KR) mechanisms and the World Wide Web (WWW) in order to provide meaningful content of web pages. This is done by providing data about data, also known as metadata. Subsequently, this unification results in two important aspects: (1) reasoning: applying pre-defined rules on a dataset and (2) software-agents: exchanging information with the understanding of the Semantic Web’s unified language, specified by ontologies. In contrast to other forms of standardization, the Semantic Web is more flexible. Producer and consumer agents reach a shared understanding by exchanging ontologies (Tim Berners-Lee and Lassila 2001; Aftab, Afzal, and Khalid 2015).


Recently, Linked Data (LD) has been the subject of many studies in the discipline of the Semantic Web. In contrast to the WWW, which is recognized as a Web of Documents, LD links and connects data sources with each other. This makes it possible to find related data in a single query.

Five expectations of behaviour are presented in a manifesto by Tim Berners-Lee (2009). These are also recognized by the 5-star deployment scheme (see footnote 2). A possible interpretation of those expectations and the 5-star scheme could look like this:

1. Available on the web, with an open licence to be Open Data. For instance, by using IRIs (Internationalized Resource Identifiers, see footnote 3) as names for things.

2. Make sure people can look up those names. For instance by using HTTP IRIs.

3. Available as machine readable structured data in a non-proprietary format. For instance, CSV instead of Excel.

4. Published using open standards from the W3C (RDF and SPARQL).

5. Include links to other LOD (Linked Open Data) (e.g. using links to other IRIs).
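The cumulative nature of the list above can be sketched in a few lines of Python. This is only an illustration of the interpretation given here, not an official implementation of the 5-star scheme; the criterion names and the example dataset are hypothetical labels introduced for this sketch.

```python
# Criteria follow the five-point interpretation listed above, in order.
# The key names are hypothetical labels chosen for this sketch.
CRITERIA = [
    "open_licence_on_web",                # 1. on the web with an open licence
    "http_iris_resolvable",               # 2. names can be looked up (HTTP IRIs)
    "non_proprietary_structured_format",  # 3. e.g. CSV instead of Excel
    "w3c_open_standards",                 # 4. RDF and SPARQL
    "links_to_other_lod",                 # 5. links to other Linked Open Data
]


def star_rating(properties):
    """Count how many criteria are met in sequence, stopping at the first gap."""
    stars = 0
    for criterion in CRITERIA:
        if not properties.get(criterion, False):
            break
        stars += 1
    return stars


# Hypothetical dataset: an openly licensed CSV file with resolvable HTTP IRIs.
csv_on_web = {
    "open_licence_on_web": True,
    "http_iris_resolvable": True,
    "non_proprietary_structured_format": True,
}
print(star_rating(csv_on_web))  # prints 3
```

The rating stops at the first unmet criterion because each level is assumed to build on the previous ones.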

Resources are expressed in RDF (Resource Description Framework). A resource could be a (1) class: a resource denoting a concept, (2) instance: a resource denoting an individual (thing) and (3) property: a resource denoting a binary relationship. Furthermore, specific techniques for the use of Linked Data are called ‘follow-your-nose’ and ‘dereferencing’. The first one informally refers to determining the meaning of a discovered IRI. This is done by performing information retrievals on IRIs in order to obtain more knowledge (W3C Wiki 2007; W3C Wiki 2006).

The second one, dereferencing, refers to looking up an IRI and using what is returned. The response contains information about the thing that is identified, also called metadata (W3 Wiki 2012). With regard to Knowledge Representation, it seems that IRIs denote resources, and resources are identified by IRIs, producing semantics. Thus, the information life cycle may tentatively improve through the semantics it carries.
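To make the notions of class, instance, property and follow-your-nose more concrete, a minimal Python sketch is given here. It assumes an in-memory list of triples instead of real HTTP lookups, and all IRIs under http://example.org/ as well as the function names are hypothetical; only the rdf:type IRI is the standard one.

```python
# Each triple is (subject, predicate, object). All example.org IRIs are hypothetical.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

TRIPLES = [
    (EX + "AcmeBV", RDF_TYPE, EX + "Company"),            # instance typed by a class
    (EX + "AcmeBV", EX + "locatedIn", EX + "Amsterdam"),  # property relating two instances
    (EX + "Amsterdam", RDF_TYPE, EX + "City"),
    (EX + "Amsterdam", EX + "partOf", EX + "Netherlands"),
]


def describe(iri, triples):
    """Stand-in for dereferencing: return the triples that have `iri` as subject."""
    return [t for t in triples if t[0] == iri]


def follow_your_nose(start, triples, max_hops=2):
    """Collect triples reachable from `start` by looking up each discovered IRI."""
    seen, frontier, found = set(), [start], []
    for _ in range(max_hops):
        next_frontier = []
        for iri in frontier:
            if iri in seen:
                continue
            seen.add(iri)
            for s, p, o in describe(iri, triples):
                found.append((s, p, o))
                next_frontier.append(o)  # objects become the next IRIs to look up
        frontier = next_frontier
    return found


for triple in follow_your_nose(EX + "AcmeBV", TRIPLES):
    print(triple)
```

In a real Linked Data setting, `describe` would be replaced by an HTTP request on the IRI; the traversal logic would stay the same.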

1.1.2 Cloud based services

A key study describing the transition from products to services is that of Oliva and Kallenberg (2003), in which the authors assess (1) to what extent, (2) how and (3) with which challenges the transition to services takes place. The study identified a deliberate developmental process to build capabilities while the change occurs. Moreover, organizations alter the nature of the relationship with the end-user of a product and the focus of a service delivery. Furthermore, the changing nature of "things", including the use of a product cloud, is acknowledged by a study which sets out to determine how smart connected products are transforming competition. These "things" may make products smart, or involve the use of a product cloud. In the context of IoT (Internet of Things), the authors identify the rise of the concept, reflecting the increasing number of smart and connected products that bring new opportunities.

Similarly, this research is designed from a Design Thinking perspective. A related study (Bjögvinsson, Ehn, and Hillgren 2012) examines the phenomenon of designing "things", which will be discussed in section 3. Also, Porter and Heppelmann (2014) acknowledge the existence of a product cloud and point out:

"... coupled with a product cloud in which product data is stored and analysed and some applications are run, are driving dramatic improvements in product functionality and performance. Massive amounts of new product-usage data enable many of those improvements."

2 See: https://5stardata.info/en/

3 See: https://www.ietf.org/rfc/rfc3987.txt

Overall, a product cloud would probably run software on the supplier’s or a third party's server. It will contain a database with product data, a platform to build software applications, a rules engine and an analytics platform. While providing all these through other layers of cloud based services, identity and security structures should be taken into consideration. A gateway would enable access for external data including other business systems, like ERP (Enterprise Resource Planning) or CRM (Customer Relationship Management) systems, which are potentially also provided as a service through the cloud.

It appears that SaaS (Software as-a-Service) is changing enterprise systems landscapes. Previous findings suggest that a lower systems specificity and smaller organizational size are the main drivers for the deployment of SaaS. Recently, researchers have examined the governance aspects of IT and SaaS. Organizations encourage behaviour of IT governance by creating structures leading to achievement of business performance goals. IT governance is defined as “specifying the decision rights and accountability framework to encourage desirable behaviour in the use of IT” (Weill and Woodham 2002).

Governance of SaaS is currently dependent on organizational and system specific contingencies. Therefore, different contingency models should be applied in different organizational and system frameworks. SaaS does not necessarily need a centralized authority for efficiency purposes, as has been claimed for information systems (T. Winkler et al. 2011). However, previous studies found that governance of SaaS aims at operationalizing the allocation of decision rights and task responsibilities relevant to a SaaS system in a business (Sijsa 2015). Additionally, three factors that precede an implementation of SaaS governance are identified, being (1) origin of the application initiative, (2) scope of application use and (3) business knowledge of the IT unit (T. J. Winkler and Brown 2013).

Recent studies imply that IT governance frameworks are firmly connected to data governance. This indicates that data governance decisions should be incorporated with IT governance (Khatri and Brown 2010; Vinkenborg 2015).

Another tier of cloud based services, X as-a-Service, is DaaS (Data as-a-Service). To date, however, there has been little discussion about DaaS. A systematic understanding of how DaaS contributes to cloud services is still lacking. Although some research has been done on DaaS, there is little scientific understanding of its governance aspects.

Central to the entire discipline of DaaS is its service for remote data storage and maintenance, including a cost aspect that is potentially involved. This is done in the cloud, while providing data to the other layers and services, e.g. SaaS. Moreover, DaaS enables users to access data on demand using the Internet. The concept is often also referred to as cloud-based storage in the literature (Aftab, Afzal, and Khalid 2015; Olson 2009).

A previous study (Cai et al. 2015) reported DaaS as an enabling technology for organizations by way of information integration between existing enterprise systems and sources of heterogeneous data. However, in the same study, four challenges of a DaaS structure were presented: (1) business perspective, (2) systems perspective, (3) disconnectedness between phases of an implementation and (4) disconnectedness when trying to store business elements in data-centric applications.

Recent evidence (Stefanov et al. 2011) addresses the advantages of cloud services combined with pay-per-use models and scalable resources. This would lead to more efficient use of capital, cost-effectiveness and agility. However, in order to reap the benefits of cloud, organizations are advised to introduce cloud-specific charge-back routines, which implies allocating IT service costs to service consumers. On the other hand, a study by Dedene et al. (2004) highlights the need for ABC (Activity-Based Costing) for business IT alignment purposes. A challenge with such an approach is the translation of business metrics into IT metrics. Examples given are (1) number of orders, (2) number of processor instructions and (3) number of XML-messages. In this case the metrics should explain service resource consumption.
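As a rough illustration of a charge-back routine, the following Python sketch allocates a shared IT service cost to consumers in proportion to a single usage metric. This is a simplified assumption made for illustration and not the model of Stefanov et al. (2011) or Dedene et al. (2004); the function name, the department names and the figures are hypothetical.

```python
def allocate_costs(total_cost, usage_by_consumer):
    """Split `total_cost` over consumers in proportion to their measured usage.

    `usage_by_consumer` maps a consumer name to one usage metric,
    for example the number of XML-messages sent in a billing period.
    """
    total_usage = sum(usage_by_consumer.values())
    if total_usage == 0:
        # No measured consumption: nothing can be attributed to anyone.
        return {consumer: 0.0 for consumer in usage_by_consumer}
    return {
        consumer: total_cost * usage / total_usage
        for consumer, usage in usage_by_consumer.items()
    }


# Hypothetical figures: a service costing 1000.0 per month, metered by XML-messages.
xml_messages = {"sales": 300, "finance": 100}
print(allocate_costs(1000.0, xml_messages))  # {'sales': 750.0, 'finance': 250.0}
```

A fuller activity-based approach would combine several metrics, each tied to its own cost pool; the single-metric version only shows how a metric can explain resource consumption.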

2 Problem Definition

In section 1.1 several emerging theoretical underpinnings have been presented. The objectives of this chapter are to present a summary of the key theoretical underpinnings and, on the basis of that, to consider three theoretical propositions.

2.1 Theoretical Underpinnings

Previous studies showed the possibilities of the Semantic Web, wherein the essence is providing meaningful content for web pages (Fan 2013). However, current approaches of Semantic Web technology do not support some crucial elements: (1) ad-hoc querying, obtaining information on the fly, (2) publishing data as LOD (Linked Open Data) in the cloud and (3) consistency and reliability of the query answers (Rietveld et al. 2015).

The first discussions and analyses of cloud based services address the importance of governance for organizations. However, existing contingency models are not fully capable of accurately predicting the effectiveness of SaaS governance (T. Winkler et al. 2011; Weill and Ross 2004). Furthermore, some research addressed IT governance and SaaS governance (Bijkerk 2015; Khatri and Brown 2010; Sijsa 2015; Weill and Ross 2004). However, in this discourse, there is still little scientific understanding of the impact of DaaS governance on organizations.

Together these studies may provide meaningful insights into the effectiveness and efficiency an organization may achieve. In contrast, the evidence on Linked Data and its as-a-Service governance aspects for organizations is weak and inconclusive (Fan 2013; T. Winkler et al. 2011; Vinkenborg 2015). This may suggest broader hypotheses for further research into these two concepts. This research seeks to obtain the data that will support the assessment of the research gap of LDaaS (Linked Data as-a-Service) and its governance aspects.

2.2 Theoretical Propositions

Based on the literature as assessed in section 1.1, several theoretical propositions were developed. In Figure 1 an illustration of these propositions is presented.

Proposition 1. Managing heterogeneous data can lead to enhanced information sharing, with regard to business intelligence.

Variety of data, one of the V’s, is one of the main obstacles within the disciplines mentioned in the introduction (Fan 2013; Khatri and Brown 2010; Rietveld et al. 2015). By managing heterogeneous data, discovery of relationships across structured and unstructured datasets would be feasible (Oracle 2014). Thus, information sharing is likely to increase. This is done by providing an integrated view of different datasets, using Semantic Web technology. This entails the acceptance of KR and LD, using ontologies and vocabularies in order to establish information integration (Stuckenschmidt and Van Harmelen 2005).


Figure 1: Theoretical Propositions

However, there are some limitations. Firstly, managing heterogeneous data comprises several data governance concerns, considering that different data owners have different requirements when storing data (Aftab, Afzal, and Khalid 2015). Secondly, the emerging concept of cloud based technologies, X as-a-Service, also comprises governance challenges (Sijsa 2015).

Proposition 2. Implementation of a data governance framework may ensure a more successful usage of cloud based services i.e. ensure availability of LDaaS.

In the literature, the causes of data governance have been the subject of debate within X as-a-Service. As mentioned before, within Data as-a-Service, there is little scientific understanding of its governance aspects. Moreover, the determination of heterogeneous data is (1) technically challenging for providing Linked Data as-a-Service, and subsequently (2) there is no clear overview of which data governance aspects are involved. Also, data interoperability seems to be a shortcoming in heterogeneous data integration for DaaS (Aftab, Afzal, and Khalid 2015). Furthermore, a lack of successful web-services seems to be caused by the negative incentive model: publishing data in a standard format is currently penalized (Auer et al. 2012). Successful web-services may be defined as a systematic process of using a web-service with regard to information sharing, avoiding challenges to the greatest possible extent.

Proposition 3. Organizations with an operationalized LDaaS tend to have a more governed information provisioning.

In LD, it seems that direct gratification for the efforts of information providers is missing (T Berners-Lee et al. 2007). Furthermore, particularly in a DaaS context, there seems to be no clear distinction between the service provider and the data provider. The data providers seem to encourage the service provider to enforce any privacy requirements. Additionally, classical web service security models focus on both service provider and service consumer and less on the data provider. Also, user data access management is often not being considered (Aftab, Afzal, and Khalid 2015).

Overall, aforementioned occurrences would typically be recognized as challenges within data governance.


3 Research Aim

In chapter two a problem definition is highlighted. Moreover, theoretical propositions are presented on the basis of the literature. Taking these two into account leads to the identification of a research gap. This chapter tries to provide an overview of this gap and of how this thesis could contribute to closing it. It ends with research questions followed by existing related work.

3.1 Research design

This research traces the development of the emerging role of the Semantic Web for organizations that try to cope with data management. Additionally, organizational departments need to stay alert to new concepts and principles of cloud and X-as-a-Service.

The central method of this study is learning from experiences. While considering the current state of knowledge, the theoretical propositions are developed. Furthermore, these propositions are used for an analytical generalization while synthesizing data in a real-world scenario. In Figure 2, this inductive scientific method is illustrated.

Figure 2: Research Design. Adapted and edited from “Case Study Research” Robert K. Yin, 2014. Based on the literature, theoretical propositions are presented. These are empirically investigated against Case study findings and interviews. Based on this, presential propositions will be presented. Subsequently, Rival Theory will be constructed.

Furthermore, this study sets out to assess the aspects of data governance while using LDaaS. Concurrently with this research, a real-world operationalization of LDaaS has been performed. In this way, theory is used to generalize case study findings.


Moreover, this thesis attempts to corroborate both preliminary and theoretical propositions, and cautiously presents the data governance considerations that should be taken into account when operationalizing LDaaS in a B2B (Business-to-Business) context. Subsequently, this research tries to address several data governance heuristics for LDaaS. These are divided into perspectives of business and IT.

3.2 Research Questions

In section 2.2 three theoretical propositions are presented. In order to assess these propositions, research questions are designed. The main question tries to evaluate the impact of LDaaS, while Sub Question 1 tries to determine the data governance aspects, related to proposition 2. Sub Question 2 focuses on the business IT alignment aspects, which are not related to the propositions.

Main Question : How does Linked Data as-a-Service adoption impact organizations in a B2B context?

Sub Question 1 : What are its relevant data management and data governance aspects?

Sub Question 2 : What business IT alignment aspects should be taken into consideration?

Given the methodology used, generalizability of the aforementioned questions could be an issue. Taking this into consideration, the following conceived approaches are presented:

1. Taking experiences from other contexts into account, by conducting interviews;

2. Having a detailed real-world case that is new and insightful from both a technical and an organizational perspective;

3. From a Participatory Design perspective (Ehn 2008), actually participating in the specific development and realization of a LDaaS solution in a real world context. Data is being gathered while considering direct-participant observation (Yin 2013);

4. Considering only parts of a real-world situation, focusing on specific parts, or stopping development in a specific phase in order to manage time and planning.

Moreover, the case study seeks to examine the availability, reuse of the LOD cloud, and data as-a-Service aspects of a LDaaS. In this specific case, these concern internal Graydon data enriched with external Linked Open Data.

3.2.1 Related Work

1. DaPaaS (DaPaaS 2014; see footnote 5).

2. LINDA (LINked Data) (LinDA 2015; see footnote 6).

3. Pilod Enterprise Linked Data (Joep Creusen 2015).

4. Paper Linked Data Business Cube – Modelling Semantic Web business models (Pellegrini, Dirschl, and Eck 2014).

5. An Approach for Secure Semantic Data Integration at DaaS Layer (Aftab, Afzal, and Khalid 2015).

6. An ontology-based semantic configuration approach to constructing Data as a Service for enterprises (Cai et al. 2015).

A conceptual framework is shown in Appendix D.

5 See: http://project.dapaas.eu/dapaas-work-packages

6 See: http://linda-project.eu/wiki/index.php?title=Main_Page

4 Research Methodology

This chapter describes the methodology used to conduct this research. It begins with a high level overview of the combination of different approaches, in this context called method triangulation. Furthermore, the concept of Design Thinking and its relevance for this thesis is described. Subsequently, the three methods used for this thesis are presented.

4.1 Research Triangulation

In previous research, a variety of methods were used to assess LDaaS. However, innovations like LDaaS are difficult to predict because of delicate contextual and human social factors. Within this research design, a triangulation of methods is employed. This results in three research methods for finding implications. Accordingly, the research data is drawn from three main sources: (1) a literature review, (2) a case study and (3) in-depth interviews. Also, the synthesis of the first two methods is done according to suggestions of Yin (2013) and Robson (1993). Analysing interview data is done according to suggestions of Mack et al. (2005) and Robson (1993). Furthermore, a triangulation approach is chosen because it can help facilitate a deep understanding of LDaaS, and strengthen construct validity by using complementary sources of evidence.

Figure 3: Research Triangulation (interviews, literature, case study)

4.2 Methods

First, a literature review has been carried out in order to develop expected answers to the research questions. These preliminary elements were composed into theoretical propositions, as described in section 2.2. This was done using the procedure suggested by Yin (2013), which argues for having theoretical propositions in order to give purpose to an exploratory study. Using theory in single-case studies has thus established the domain for generalization. As a consequence, the research may have significant external validity.

A case study has been chosen as a second approach to obtain further socio-organizational data, by empirically observing the case. Socio-organizational issues comprise a combination of technology, culture and organization which are blended with intangible assets, among which are trust, confidentiality, knowledge sharing and governance (Weill and Ross 2004). Case study research could be seen as a form of qualitative research that focuses on providing detailed information on one or more cases (Hans Akkermans 2015). Moreover, a case study tries to highlight a decision or a set of decisions: why they were taken, how they were implemented and with what result (Yin 2013). Mainly, a case study requires (1) a research question, (2) its theoretical propositions, (3) its unit of analysis, (4) linking data to the aforementioned propositions (analysis of the observations) and (5) (criteria for) interpreting the findings. The first two components are briefly discussed in previous sections. The next two will be assessed in section 4.2.1. The case study is mainly prepared according to the procedure as suggested by Yin (2013).

As a third method of data gathering, several in-depth interviews with subject matter experts have been conducted. Subject matter experts are recognized as either users of LDaaS or potential users of LDaaS. This data, generally expressed in the form of individual experiences, complements the two other data sources: preliminary elements of literature and case study observations. Correspondingly, the research tries to provide a holistic view. The in-depth interviews are prepared mainly according to Mack et al. (2005) and the available functionalities in ATLAS.ti. This is further elaborated in chapter Qualitative Interviews. A time schedule concerning this thesis project is shown in Appendix E.

4.2.1 Use Case "Operationalize LDaaS"

This thesis takes the form of a case study in the context of Graydon. Graydon is selected because the organization's core business is to gather data from diverse sources and translate it to information. By receiving the right information, Graydon's clients, which are companies with a certain information need, can take better decisions and become more competitive (NL 2015). Additionally, Graydon has been operating in a B2B context.

4.2.1.1 Unit of Analysis

Within this case study, a realization of LDaaS is done within a given time frame of approximately 208 hours. The actual realization is done within the Competence Centre Data Discovery. Team Data Discovery is one of the four teams within the Division Operations (Resources 2016). Several adapted organograms are found in Appendix G.1. Team Data Discovery could be described as follows (Discovery 2015):

"The Data Discovery team has specific responsibilities for the provision and on-going maintenance of Information Systems concerned with data storage, data integration and Big Data within Graydon, and between Graydon and it’s Partners and Customers."

Moreover, rapid prototyping is used while developing and operationalizing the infrastructure for the use case. This implies a use case-driven development of software. An analysis of this process is described in chapter 5.

This single-case study is performed from two different rationales: an unusual circumstance and a common circumstance. These are assigned respectively to Data Consumer 1, Manager of team Group CRM Intelligence, and Data Consumer 2, Data Scientist of team Analytical Integration. The unusual circumstance implies a representation which deviates from everyday occurrences. In the use case, this is when Data Discovery provisions information to Data Consumer 1. A common circumstance involves capturing conditions of an everyday situation (Yin 2013). A common circumstance would be if Data Discovery provisions data to Data Consumer 2.

4.2.1.2 Case Study Report

The case study report will be constructed with the following rationale. In order to maintain its chain of evidence, a case study protocol is constructed. This ensures the reliability of the information for this case study. For this thesis, the case study is used to explore the real-life events and scenarios when realizing LDaaS in a B2B context. Prior to commencing the use case, several aspects are taken into account.


First of all, the type of evidence to be expected is associated with theoretical propositions and research questions for this thesis (Yin 2013). Subsequently, citations to specific evidentiary sources in the case study database are taken into consideration. A case study database is used to gather and store the empirical data. Potential data sources include, but are not limited to, observations, archival records, physical artefacts, interviews and documentation. For the database, MS OneNote is used. Figure 4 presents an adapted version of collecting evidence for a case study as suggested by Yin (2013).

Figure 4: Collecting Case Study Evidence. Adapted from Yin (2013).

Secondly, observations were divided into direct observations and participant-observations (Yin 2013; Mack et al. 2005; Robson 1993). For direct observations, the occurrences of certain types of real-world activities are going to be assessed. These could involve observation of meetings and sidewalk meetings, among others. Organizational culture is one of the possible outcomes of the assessment of these direct observations. On the other hand, participant-observation suggests a more active attitude of the observer. Some advantages of participant-observation are: (1) gaining access to events or groups, (2) perceiving reality from the viewpoint of “inside” and (3) manipulating minor events such as organizing meetings. Within this thesis both of the aforementioned methods are going to be employed. However, this classification will not be made when presenting the results. Moreover, observational data is going to be collected using several methods as suggested by Yin (2013).

Furthermore, UML (Unified Modeling Language) is used as an information modeling language for both communicating with Data Consumers and as a basis for technical work. For this purpose the use case diagram is selected as it illustrates the ways of usage and interaction between a system and user (Akkermans 2015b). For this research, the use case diagram and description are found in Appendix G.

Also, user story formats were prepared and provided to both data consumers. In summary, they were asked to provide one or more user stories which reflected their information need at that moment.


By chance, two of the stories comprised a need for Social Network Analysis. As seen in the Use Case Diagram, these stories needed to be interpreted and translated into an actionable task for Data Discovery/Data Provider. This requires SPARQL and RDF knowledge. Both the user stories and the user story format can be found in Appendix G.

Moreover, ATLAS.ti and MS OneNote are used to store gathered research data. This should also help to link data to preliminary propositions. These preliminary propositions will eventually support explanation building in order to interpret the findings and construct rival theory.

Furthermore, a pattern-matching strategy is adopted for the analysis, as suggested by Yin (2013). Pattern-matching is useful for both theory-developing and theory-testing. However, a pattern should already be defined prior to the actual analysis of data (Yin 2013). For this thesis, theoretical propositions are regarded as a predicted pattern. Each proposition suggests a causal relation between two variables. In this thesis, these could be defined as "nonequivalent dependent variables". A nonequivalent dependent variable is a relevant outcome of the use case.

For example, proposition 1 suggests: "Managing heterogeneous data can lead to enhanced information sharing, with regard to business intelligence." In this case, managing heterogeneous data is one variable, while enhanced information sharing is the second variable. By analysing empirical data, patterns could be confirmed according to the theoretical proposition. Likewise, it is also probable that real-world data contradicts the proposition. In this case it is possible to formulate presential propositions, which will build on "rival explanations".

Lastly, in line with the use case and theoretical propositions, in-depth interviews are conducted. This will be elaborated in the following subsection.

4.2.2 Qualitative Interviews As mentioned above, in this study in-depth interviews are being conducted. The interviews are semi-structured. This means there were mainly open questions, and questions were composed as discussion topics (Akkermans 2015a). A total of 9 interviews were conducted. Six of these interviews were conducted in Dutch, while the remaining interviews were conducted in English.

4.2.2.1 Participants For the semi-structured interviews a focus group with participants was designated. The participants include subject matter experts acting in different roles. Criteria for selecting the participants are as follows: (1) LDaaS users, (2) subject matter experts in domains concerning the main topics of this research. This focus group is chosen because of their relevant knowledge and understanding in practice.

Also, participants acting in a B2C (Business-to-Consumer) context are approached for the interview. For this study, the B2B domain could also imply the dynamics of the industry or supply chain in which the organization operates. Examples would be suppliers or organizations in the same industry. Further, the primary exclusion criteria for the participants are that the participant either held a role that is not related to the main themes of this thesis or faced time constraints. A table of the interviewees can be found in Appendix H.

4.2.2.2 In-depth Interviews Due to the different perspectives of the participants on the same research area, different sets of interview questions will be used. The question set will be sent before the interviews, in order to give the participants time to prepare. Seven interviewees will receive the "LDaaS Users" set of questions, whereas the other two interviewees will receive either the "Data Governance Specialists" or the "Enterprise Architecture" set of questions. All the question sets can be found in Appendix H.


Each interview will be recorded and subsequently nearly the whole conversation will be transcribed. For recording, two tools are going to be used: the MS Windows 7 Sound Recorder9 is being used for real-life interviews, while for the Skype conversations Amolto will be used10. However, the Skype recordings first need to be converted in order to use them in ATLAS.ti. For transcribing the interviews, ATLAS.ti functionalities are going to be used. The transcriptions will be done manually, which implies that human errors, such as spelling mistakes, might occur. For the transcription, ’anchors’ will be used in order to enter a time mark after each interview question is transcribed. Moreover, each transcription was sent to the interviewees afterwards for a quick validation.

The interview questions are set up to link data gathered in the real world with theory. The interview questions are mainly derived from the theoretical propositions. The participants were expected to have knowledge about one of the domains mentioned in paragraph 4.2.2.1. In order to identify different data governance aspects of DaaS and LDaaS, with regard to research question 1, participants were asked, for example, to mention challenges in these areas. Moreover, LDaaS users were asked what kind of deployment of LDaaS was chosen. This was done prior to identifying the Cloud and as-a-Service aspects. Another example was to emphasize specific data governance challenges for DaaS with the Data Governance specialist. The theoretical propositions were also taken into account while designing the interview question sets.

Prior to data collection, the participants acquired a justification of the project. This is done through mail communications. A table of the interviewees, the interview protocol and the respective questions can be found in Appendix H.

4.3 Design for use before use

"To invent a product, we need to design, and to design, we need to explore the material. It’s as simple as that"

Tom Armitage

This thesis tries to examine the practical realization and application of LDaaS. For this, Design Thinking (Manzini 2014; Bjögvinsson, Ehn, and Hillgren 2012) and the rather comparable Participatory Design (Ehn 2008) methods are used. For both of these methods the literature has emphasized the importance of co-design, in which a collaborative effort of stakeholders and competences is vital. In the design process, ideas have to be prototyped and explored along the way. The main line of thought is “designing for use before use” (Bjögvinsson, Ehn, and Hillgren 2012; Ehn 2008).

Moreover, it is suggested to move from designing “things” (objects) to designing “Things”, which are recognized as socio-material assemblies, especially projects (Bjögvinsson, Ehn, and Hillgren 2012). Designers should opt for a socially innovative design which goes beyond the economic bottom line. In addition, in participatory design, democracy and skill are seen as guiding values which lead users to articulate ‘tacit knowledge’ and ‘aesthetic experience’ (Ehn 2008). This approach is applied across all the segments of this research. In particular, an emphasis is noticeable in designing the use case. Stakeholders with different expertise are involved from the beginning of the research. This resulted in several communication moments, after which input was collected. These inputs are used for designing the use case. Please refer to 4.2.1 for more details and to Appendix G for a UML Use Case Diagram.

9 See: windows.microsoft.com/nl-nl/windows7/record-audio-with-sound-recorder


5 Results

As argued in chapter 3 and chapter 4, this thesis tries to corroborate both preliminary and theoretical propositions. The research questions aim to assess the impact of LDaaS, with a firm focus on the relevant data governance aspects and business IT alignment.

This chapter presents and describes the results in a systematic and detailed manner. It provides an overview of the process of analysing the gathered research data. Moreover, it describes what data is gathered and how that data has been gathered. Also, it tries to assess how the data is analysed while relating research questions and propositions to empirical evidence in an iterative way. Main themes or patterns that emerge from the analysis will be highlighted. The main goal is to learn from the data, what they try to say, and how they are related to each other in light of the research questions and theoretical propositions.

This results in preliminary elements which support either presential propositions, which construct rival theory, or theoretical propositions, which build on the theory of Data Governance on LDaaS. This chapter consists of three sub-sections, each of which discusses one of the three research data sources. First, additional relevant literature is presented, followed by a section about the conducted interviews. Finally, the analysis of the case study is presented.

Several considerations are taken into account to determine the factors that affect the design of the process. On the one hand, it is critical to know what data structure the empirical evidence has, while on the other hand it is important to know what elements are interesting. Taking this into account, the essence of raw data gathering should be significantly clear (Robson 1993).

5.1 Literature

Additional literature was found during the process of designing the use case and interviews. In this section the relevant literature is highlighted. The data from this source primarily takes the form of scientific research papers or topics and articles mentioned in interviews. It is analysed by reading the articles and drawing out relevant, preliminary elements.

5.1.1 Data Governance To date, several studies have investigated the effects of Data Governance. In one of the interviews conducted for this thesis, a relevant strategy is mentioned called "Non-Invasive Data Governance" (Robert S Seiner 2014). In this work Robert S Seiner (2014) argues that Data Governance refers to the formalization of guidelines around the management of data. Preferably, a non-invasive approach, which focuses on formalizing what already exists, is suggested in order to prevent the discipline from appearing threatening and difficult (Robert S. Seiner 2010).

Moreover, Weill and Woodham (2002) found that firms who consistently make better IT-related decisions achieve above industry average Return on Investment (ROI). The authors thus suggest that effective IT governance could be one of these IT-related decisions. In the same vein, Addaada (2016) advocates incorporating architectural disciplines to govern data. The author suggests identifying logical and physical data domains and data sets. For instance, a classification of the logical domain could be "customer" and "finance". While this research mainly aims at the semantic layer, which is the third layer, the physical and logical layers are also important to consider in order to provision governed information.

Therefore, governance on LDaaS has several limitations regarding the architectural point of view. Given that data governance comprises numerous aspects of the architectural landscape, organizations need to manage all of these aspects in order to govern LDaaS effectively. Thus, there are a number of important changes which need to be made. These heuristics will be presented in chapter 6, and may support proposition 2: implementation of a data governance framework may ensure a more successful usage of cloud based services.

5.1.1.1 Platform Linked Data Nederland Interestingly, to date, there is a Dutch community called Platform Linked Data Nederland (PLDN)11. PLDN is a network in which experts and interested parties share knowledge about Linked Data. This network includes people from business, government and research institutions (PLDN unk). Additionally, the PLDN is allowed to use resources of the lab environment at the Big Data Value Center 12.

Besides organizing symposia, the community provides several presentations, documents etc. on the topic of Linked Data. A few interesting presentations were given by NXP13 14. Moreover, NXP won the Linked Enterprise Data application15 of the Netherlands in 2015 (PLDN 2015).

Two interviewees who participated in this research are involved with Linked Data projects at NXP. Also, another interviewee coordinates the PLDN community while working at Kadaster. Additionally, one interviewee is involved as a member of the community while working at the Dutch Tax and Customs Administration. This is further discussed in subsection 5.2.

Also, the platform provides several contributions from experts in the field of Linked Data. This is done in working groups aimed at a specific topic. One noteworthy topic is called "LD in Enterprise Context"16. To date, this is still in progress.

Paragraph 5.1.1.1 may support proposition 3: organizations with a LDaaS operationalized, tend to comprehend a more governed information provisioning. This is because of the following reasons. First of all, organizations like NXP, experience the usage of Linked Data as being able to answer previously unanswerable questions. Also, as found in one of the slides, a minimal investment is needed compared to traditional BI projects (Walker 2014). Secondly, the organization uses Linked Data to define a canonical data model 17 for its marketing master data and product life cycle management. The organization has several reasons to make use of Linked Data: (1) A canonical model is a long term goal, (2) it is challenging to define RDB and XML schemas, because for example of the need to deal with heterogeneous data, (3) it is more feasible to query XML messages. These findings will be synthesized further later in this chapter.

5.1.1.2 Use of ontologies and vocabularies Information sharing has several implications in the presence of heterogeneous data. Stuckenschmidt and Van Harmelen (2005) propose ontologies as a means of dealing with semantic heterogeneity. Moreover, the authors suggest ontologies as an explanation of a shared vocabulary or conceptualization of a specific subject matter. Conceptualization can be defined as a common understanding of certain concepts. It encompasses terms from natural language, and can thus be regarded as a shared vocabulary. Furthermore, the authors present an analysis of several approaches using ontologies. Three main approaches are illustrated in figure 10 of Appendix F.

11 See: http://www.pilod.nl/wiki/Platform_Linked_Data_Nederland
12 See: http://www.bdvc.nl/
13 See: http://www.pilod.nl/w/images/d/d6/Walker_PiLOD2_20140417.pdf
14 See: http://www.pilod.nl/w/images/d/d1/20141126_Ordina_Walker.pdf
15 See: http://data.nxp.com/doc/
16 See: http://www.pilod.nl/wiki/LD_in_Enterprise_Context
17 A Canonical Data Model is generally used in system/database integration processes where data is


In a study conducted by Vandenbussche et al. (2015), it was shown that one of the major hurdles of Linked Data is the dilemma that data publishers have in determining which vocabularies to use in order to describe the semantics of data. The researchers propose a catalogue of reusable vocabularies for the description of data on the web: Linked Open Vocabularies (LOV)18. The LOV provides data publishers with information on indicators such as the interconnection between vocabularies, their version history and current and past editors. Several data access methods are provided in order to increase the reuse of vocabularies in the Linked Data ecosystem. Moreover, the same hurdle of publishing data as LOD is also addressed by Rietveld et al. (2015), as presented in the literature review in 2.1.

Taking this into consideration, LOV may significantly address the challenge of publishing data with semantics. This could thus support proposition 1, because heterogeneous data is being managed by Linked Data. Additionally, information sharing comprises several challenges regarding managing heterogeneous data.

Two relevant example ontologies are: (1) GoodRelations19, which is a vocabulary for publishing product or service information optimized for search engines (GoodRelations 2008) and (2) The Organization Ontology20, which is designed to publish information on organizations and their structures. The latter is a generic and reusable core that can be extended or specialized for a particular situation (Reynolds 2014).
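As an illustration only, the following minimal Python sketch shows how terms from both vocabularies could be combined to describe one company as N-Triples. The `http://example.org/` namespace, the company "Acme B.V." and the helper names are hypothetical and do not come from the case study; only the vocabulary terms (`org:Organization`, `org:OrganizationalUnit`, `org:hasUnit`, `gr:BusinessEntity`, `gr:legalName`) are taken from the published ontologies.

```python
# Namespaces of the reused vocabularies.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ORG = "http://www.w3.org/ns/org#"          # The Organization Ontology
GR = "http://purl.org/goodrelations/v1#"   # GoodRelations
EX = "http://example.org/"                 # hypothetical namespace (assumption)


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Serialize a plain string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def triple(subject, predicate, obj):
    """Format one N-Triples statement."""
    return f"{subject} {predicate} {obj} ."


def describe_company(slug, legal_name, unit_slugs):
    """Describe a company and its units with terms from both vocabularies."""
    company = iri(EX + slug)
    lines = [
        triple(company, iri(RDF_TYPE), iri(ORG + "Organization")),
        triple(company, iri(RDF_TYPE), iri(GR + "BusinessEntity")),
        triple(company, iri(GR + "legalName"), literal(legal_name)),
    ]
    for unit_slug in unit_slugs:
        unit = iri(EX + unit_slug)
        lines.append(triple(unit, iri(RDF_TYPE), iri(ORG + "OrganizationalUnit")))
        lines.append(triple(company, iri(ORG + "hasUnit"), unit))
    return lines


if __name__ == "__main__":
    for line in describe_company("acme", "Acme B.V.", ["acme-sales"]):
        print(line)
```

Because both vocabularies are reused rather than redefined, the same resource can be typed as an organization with structure and as a business entity with commercial properties, which is the kind of reuse LOV aims to encourage.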

5.1.1.3 Recap Literature Results Together, these studies and outcomes provide important insights into the theoretical propositions that seem to go in a particular direction. Effective IT governance is one of the IT-related decisions. Firms who take such decisions have achieved above industry average ROI. Moreover, a non-invasive approach was suggested in order to prevent the discipline of data governance from appearing threatening and difficult. Additionally, it was found that organizations need to manage all the facets of the architectural landscape in order to effectively govern LDaaS. These findings are associated with proposition 2, a successful usage of LDaaS by implementing a data governance framework.

Furthermore, a community of experts and interested parties, namely PLDN, is actively involved in Linked Data related challenges. This platform contributes to the community with knowledge and experiences. An organization particularly acting in the B2B domain involved with Linked Data projects is NXP. NXP experiences Linked Data as an enabling technology in order to answer previously unanswerable questions. The investment seems to be lower than a traditional BI project for information sharing purposes. Also, the organization uses Linked Data to define a canonical data model for marketing and product life cycle domain. This is one of the reasons why NXP uses Linked Data. Two other reasons are dealing with heterogeneous data and flexible querying of XML messages. These findings are associated with proposition 1, managing heterogeneous data for an enhanced information sharing and, respectively 3, a more governed information provisioning with an operationalized LDaaS.

Lastly, it was found that data publishers face the challenge of determining which ontology or vocabulary to use. Ontologies were proposed as a means of dealing with semantic heterogeneity and as an explanation of a shared vocabulary. LOV could address this challenge by providing a catalogue of reusable vocabularies. These findings are related to proposition 1, as managing heterogeneous data comprises the challenge of using ontologies and vocabularies in order to establish an enhanced information sharing culture.

18 See: http://lov.okfn.org/dataset/lov/
19 See: http://wiki.goodrelations-vocabulary.org/Quickstart

5.2 Interviews

As mentioned before, several semi-structured interviews were conducted in order to gain insights from experts who are connected to the topic of LDaaS. While transcribing, data was instinctively already interpreted. This made the essence of gathering ’raw’ data considerably clear.

Preliminary analyses of the transcripts are done through word clouds, which were generated with TagCrowd21. These were merely used as an indication of starting points for further analysis. For each interview a word cloud is generated on the basis of word frequency. A stoplist of words is used in order to keep only the relevant words. The word clouds are used to construct codes and to provide an overview of the context of each interviewee. The word clouds of each interview, including a global stoplist, can be found in Appendix I.1.
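The word counting that underlies such a word cloud can be sketched in a few lines of Python. This is a minimal illustration, not the TagCrowd implementation; the stoplist and the sample sentence are hypothetical, and the actual global stoplist is the one in Appendix I.1.

```python
import re
from collections import Counter

# Hypothetical stoplist (assumption); the thesis's global stoplist is in Appendix I.1.
STOPLIST = {"the", "a", "an", "and", "of", "to", "is", "in", "that", "it", "we", "you"}


def word_frequencies(transcript, stoplist=STOPLIST, top_n=5):
    """Return the top_n most frequent words that are not on the stoplist."""
    words = re.findall(r"[a-z]+", transcript.lower())
    counts = Counter(word for word in words if word not in stoplist)
    return counts.most_common(top_n)


# Hypothetical sample text, not taken from an interview transcript.
sample = "Linked Data is data that we link. The data is shared, and the links matter."

if __name__ == "__main__":
    print(word_frequencies(sample, top_n=3))
```

The most frequent remaining words then serve as candidate starting points for codes, in the same indicative way the word clouds are used here.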

5.2.0.1 Quotations and Codes Transcripts were coded on the basis of the theoretical propositions. Propositions were classified as categories which consist of subcategories. For example, proposition 1 comprises two categories: Heterogeneous Data and Information Sharing. Sub-categories of Heterogeneous Data are Semantic Web and Linked Data. For each sub-category one particular code was developed. In this case code P1_Heterogeneous_Data refers to proposition 1, sub-category Heterogeneous Data. This is done for all six sub-categories of the three theoretical propositions. Subsequently, quotations are mapped to the related code. A network overview of the codes and their intercorrelations is shown in Figure 5. Intercorrelations are drawn manually after an initial interpretation of the data.

From Figure 5 one notices that the code P3_LDaaS_Operationalization seems to have a central position. This could imply the importance of taking several considerations when operationalizing LDaaS, because of its many intercorrelations with other codes.

Also, four broad themes emerged from the analysis. These support building on either existing theory or rival theory, and are defined as the following codes:

1. (LDaaS)_Impact - number of times quoted: 313
2. Business_IT_Alignment - number of times quoted: 78
3. B2B_Domain - number of times quoted: 155
4. Architecture - number of times quoted: 102

The first one, (LDaaS)_Impact, is related to the main research question of this thesis. Each quotation related to the impact of using LDaaS is mapped to this code. The second emerged theme, Business_IT_Alignment, is related to research sub question 2. The last two codes, B2B_Domain and Architecture, are associated with neither the theoretical propositions nor the research questions. The code B2B_Domain implies the context or B2B domain in which Linked Data is mentioned. An example for this thesis could be a profit or non-profit organization. However, quotations related to the inter-organizational context are also mapped to this code, for example the marketing domain or product development. For this thesis frequencies are counted for each code. In this example, a high frequency could signify the variety of domains in which Linked Data is applied. The last one, Architecture, is mapped to quotations mainly regarding Enterprise Architecture. These quotes mainly put LDaaS in the bigger picture of the architecture in an organization. A firm focus has been put on the classification of technical and functional areas of architecture.

Figure 5: A network view of used codes related to each theoretical proposition. Each code has one or more relations with another code. Also, a short description of each code is presented below the name.

In addition, the first 6 codes, derived from the theoretical propositions, together with the 4 emerged themes, result in the 10 codes used for this thesis. In Appendix H an overview is provided of the intercorrelations among these ten codes. These intercorrelations are drawn manually in ATLAS.ti after conducting initial analyses. The intercorrelations are described as follows (ATLAS.ti 1993):

1. is associated with: relates concepts without subsumption.

2. is part of: the part-of relation links objects, not concepts of different abstractional level (as does ISA).

3. is cause of: used for representing causal links, processes, etc.

4. is a: the ISA relation links specific concepts to general concepts.

5. is property of: a meta-relation between a concept and its attributes.

In addition to the aforementioned intercorrelations, the quotations within an interview are often also linked with each other in ATLAS.ti. These linkages are made according to the relations "explains", "expands", "justifies", "discusses", "criticizes" and "supports". An example would be when an interviewee spoke a sentence and elaborated on it by explaining the sentence. However, links between these quotations are not visible in a report. They merely support the process of analysing the interview transcripts.

In order to conduct further analyses, other functions of ATLAS.ti are used. The first set of analyses examined the impact of quotations per interviewee. Figure 6 shows an overview of the number of quotations per code per interviewee. The interviewees are stated with their initials.


Figure 6: The number of quotations that have been linked to a code across all transcripts.
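The kind of tally shown in Figure 6 can be reproduced outside ATLAS.ti with a simple count. The sketch below is an illustration under assumptions: the interviewee initials and the coded quotations are hypothetical placeholders, not data from the transcripts; only the code names come from this thesis.

```python
from collections import Counter

# Hypothetical coded quotations as (interviewee initials, code) pairs.
# These are placeholders for illustration, not thesis data.
coded_quotations = [
    ("AA", "P1_Heterogeneous_Data"),
    ("AA", "P1_Information_Sharing"),
    ("BB", "P1_Heterogeneous_Data"),
    ("BB", "P3_LDaaS_Operationalization"),
    ("BB", "P1_Heterogeneous_Data"),
]


def tally(quotations):
    """Count quotations per (interviewee, code) cell and per code overall."""
    per_cell = Counter(quotations)
    per_code = Counter(code for _, code in quotations)
    return per_cell, per_code


if __name__ == "__main__":
    per_cell, per_code = tally(coded_quotations)
    for code, total in per_code.most_common():
        print(f"{code}: {total}")
```

The per-code totals correspond to the "number of times quoted" figures, while the per-cell counts correspond to one cell of the code-by-interviewee overview.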

What is already interesting in this data is that Business_IT_Alignment is quoted less, with 78 quotes, and is thus mentioned less. This could indicate the low relevance of Business IT Alignment regarding LDaaS. However, contradicting evidence was found when talking about this issue in the following conversation (Folmer 2016) (Translated from Dutch):

"Business and IT Alignment is a big issue regarding Linked Data. The business does not understand the technology, and IT turns it into a tech party. The alignment between those is drama. There are few people, looking at our own team, who understand both a little bit. That is a big issue. Within Kadaster we do it, like I state it in a lecture, how we’re not supposed to do it. This is the business building, and the other building is IT. They divided each other physically. With a clear client and supplier role. Wherein innovation is more on the business side and less on the IT. In my opinion this is a challenge, also regarding Linked Data. So in general, in my opinion Business and IT alignment is a huge problem for Linked Data."

This seems contradictory with the frequency with which Business and IT alignment is quoted. Furthermore, (LDaaS)_Impact is quoted the most, with 313 quotes across all the transcripts. This could imply the importance of considering the impact of LDaaS. In this thesis, a distinction between ‘positive’ and ‘negative’ impact is not made while coding. There were some suggestions that operationalizing LDaaS would impact the thinking of decision makers in the organization. On the negative side, data is an abstract concept, and a decision maker has trouble coping with the abstraction and technique. A change in mindset is required in order to successfully operationalize LDaaS. One interviewee provided an example of the process of using LDaaS with regard to business analysts (Mackelenbergh 2016) (Translated from Dutch):

"So, that's different from other analytical environments, where often a request is made by the fraud analyst to the IT department. They make a network drawing, and give such a network drawing back to the fraud analyst. Every time a drawing raises new questions, an analyst has to go back. We skipped that step. In our environment the analyst has access to the data and is let loose. This also brings a problem, because a fraud analyst finds it difficult to focus. He just has to focus on the question; that's a learning curve for the fraud analyst."

In the same conversation, the interviewee puts it into the context of legal concepts: "The lawyers are living in the past, they cannot help us if it is about privacy matters. So we have to solve that by ourselves and that is challenging. The temptation is big to use the data. That is not only for the Dutch Tax and Customs Administration, but for the whole government. So it is really difficult to cope with privacy issues."

Also, another interviewee provided experiences with regard to the impacts at senior management (Boer 2016) (Translated from Dutch):

"In senior management in many companies. So people who do not know exactly how it works. They know generally two things: They know that a few people became billionaire with smart exploitation of a database. That is the ’opportunity’ aspect. But that is not giving them concrete ideas how to become billionaire by themselves. If they have the feeling that a database is stolen from them, from which they potentially become rich, they will say no to a project. The other aspect is the ’threats’ aspect. Everybody sees a lot of risks in it, also in senior management."

Positively, new, unforeseen possibilities emerge when using Linked Data. As one interviewee commented (Groth 2016):

"And also this power, of Linked Data principles, a lot of people in the past have looked at "we have this data set for this one product". But if you can integrate data, if you can tie that data together, you can start thinking about "Oh, actually we have this potential service by looking across these two data sets". And that's becoming a really powerful thing and you can much more quickly create that service. You stop worrying about getting the data right. You start worrying about getting the data conceptually right, and linked. And then you do views on it for different applications, versus defining the data and then having to retransform it for a new product."

Furthermore, IT Governance and Information Sharing are the next most quoted. Quoted 304 and 301 times respectively, these seem to be crucial. There was a sense of IT Governance amongst interviewees. For example, one interviewee, when asked what the data governance challenges of cloud based services may be, said (Rissen 2016):

"Yes. Data quality is definitely one. So, that people are producing data, take a lot of care and in the HTML they produce. In the human readable data they produce. They are very careful around statements of fact or statements in Pin Yin or anything like that. But when it comes to Linked Data, or any kind of machine readable data. People don’t seem to care about the quality any more so much. That’s true in most metadata things unfortunately."

Information Sharing was often quoted when the interviewee was actually talking about sharing information in its abstract form. A firm focus was on business intelligence and the challenge of coping with variety of data in order to share information. Talking about this subject an interviewee said (Dijkstra 2016) (Translated from Dutch):


"Yes. You see that a lot of knowledge could be gained if you bring data together. Seeing patterns, making relations which were before only available in a system or function. You see that there is an increasing need to do that. To have an overview of the lifecycle."

5.2.0.2 Summary of Quotations and Codes A total of 10 codes used for this thesis are presented. Six of these are directly related to the theoretical propositions and their sub-categories. Moreover, four emerging themes were identified, which are also coded. Two of them, (LDaaS) Impact and Business IT Alignment, are related to the main research question and sub question 2 respectively. Additionally, word clouds are used in order to provide an overview of the context of each interviewee. The code B2B domain entails the context, either organizational or inter-organizational, of the interviewee. Moreover, the code Architecture involves quotations regarding the bigger picture in which an LDaaS should operate. Interestingly, Architecture is mentioned neither in the research questions nor in the theoretical propositions. Also, quotations are linked to each other in order to analyse the conversation more effectively. Furthermore, a network view was provided in order to show the intercorrelations of the codes.

Overall, the results indicate that several considerations should be taken into account regarding LDaaS Operationalization, because it has a central position within the network view comprising the other codes and thus the theoretical propositions. Further analysis is needed in order to determine its relation with information provisioning concerning theoretical proposition 3.

Additionally, a contradicting finding was presented concerning Business IT Alignment. Despite its low frequency of occurring in quotations, one of the interviewees emphasized the challenge and its importance in his context. This finding already supports research sub question 2 to some extent, which tries to examine the Business IT Alignment aspects of LDaaS.

Furthermore, the impact of LDaaS is mentioned most often, with both positive and negative sentiments. A change in the mindset of analysts, decision makers and senior management seems critical, which appears to relate to the main research question. IT Governance and Information Sharing were the next most frequently quoted codes and also seem important. IT Governance covers part of theoretical proposition 2, while Information Sharing covers part of theoretical proposition 1.

Interestingly, within the propositions, Cloud Based Services and Organizational Challenges seem less relevant, with a total of 180 and 267 quotations respectively. Heterogeneous Data, with a total of 282 quotations, deserves further analysis.

5.2.0.3 Co-occurrences A single quotation can be associated with more than one code. Throughout the transcripts, quotations were selected and assigned to one or more of the existing codes, so that each quotation of a participant carries mappings to the selected codes. Further analyses were conducted on the basis of these relations. For example, a quotation like the one below (Walker 2016) is related to multiple codes. In this case:

P1_Heterogeneous_Data, P1_Information_Sharing and P3_Organizational_Challenges.

This is mainly because the interviewee (1) comments in the context of a Data Provider and Data Consumer (P3 Organizational Challenges), (2) has to cope with a variety of data in order to share information (P1 Information Sharing) and (3) compares Linked Data principles and Semantic Web technologies with other modes of database modelling, in this case XML, in order to manage heterogeneous data (P1 Heterogeneous Data).
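The mapping from one quotation to several codes can be sketched as a small data structure. This is an illustrative sketch only, not the ATLAS.ti data model: the `Quotation` class, the `code_frequencies` helper and the second quotation are hypothetical, while the three code names and the opening sentence of the Walker quotation are taken from this section.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Quotation:
    """One coded fragment of an interview transcript (hypothetical model)."""
    interviewee: str
    text: str
    codes: frozenset = field(default_factory=frozenset)


def code_frequencies(quotations):
    """Count how many quotations each code is attached to."""
    counts = Counter()
    for quotation in quotations:
        counts.update(quotation.codes)
    return counts


# The Walker quotation discussed in this section, with its three codes.
walker = Quotation(
    interviewee="Walker 2016",
    text="In terms how to model the data, you get a tree structure "
         "rather than a table structure.",
    codes=frozenset({
        "P1_Heterogeneous_Data",
        "P1_Information_Sharing",
        "P3_Organizational_Challenges",
    }),
)

# A second, purely hypothetical quotation, added only to show counting.
other = Quotation(
    interviewee="hypothetical",
    text="placeholder",
    codes=frozenset({"P1_Information_Sharing"}),
)

frequencies = code_frequencies([walker, other])
```

Because the codes are stored as a set per quotation, a code is counted at most once per quotation, which matches the idea of a quotation being "grounded" in a code.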


"In terms how to model the data, you get a tree structure rather than a table structure. Maybe also with XML you get a more global identity. You get cue names to name thing, rather to just arbitrary text, a name for a column. Then you see that those documents and document store are a separate thing. Sure, you can query across them and do things like that. But its not get that relational aspect to it."

This leads to a further set of quantitative analyses. To assess the degree to which codes co-occur with one another, a co-occurrence matrix is used. Figure 7 compares the number of times a code co-occurs with another one: each cell contains the number of times one code co-occurs with another code. Next to this number, a correlation coefficient is presented. This measure indicates how strongly two codes are associated, and its values range between 0 and 1.
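How such a matrix can be derived from coded quotations is sketched below. The sketch assumes that the reported coefficient is the c-coefficient offered by ATLAS.ti, c = n12 / (n1 + n2 - n12), where n1 and n2 are the numbers of quotations carrying each code and n12 is the number carrying both. The text does not state the formula, so this is an assumption. The function names and the toy data are hypothetical and do not reproduce the thesis data.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_table(coded_quotations):
    """Return per-code counts and per-pair co-occurrence counts.

    Each element of coded_quotations is the collection of codes
    attached to one quotation.
    """
    code_counts = Counter()
    pair_counts = Counter()
    for codes in coded_quotations:
        unique = sorted(set(codes))  # sorted so pair keys are stable
        code_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return code_counts, pair_counts


def c_coefficient(n_a, n_b, n_ab):
    """Assumed measure: n_ab / (n_a + n_b - n_ab), between 0 and 1."""
    union = n_a + n_b - n_ab
    return n_ab / union if union else 0.0


# Hypothetical toy data: four quotations with their codes.
toy = [
    {"P1_Heterogeneous_Data", "P1_Information_Sharing"},
    {"P1_Heterogeneous_Data", "P3_LDaaS_Operationalization"},
    {"P1_Heterogeneous_Data", "P1_Information_Sharing",
     "P3_Organizational_Challenges"},
    {"P1_Information_Sharing"},
]

code_counts, pair_counts = cooccurrence_table(toy)
pair = ("P1_Heterogeneous_Data", "P1_Information_Sharing")
coefficient = c_coefficient(
    code_counts[pair[0]], code_counts[pair[1]], pair_counts[pair]
)
```

In the toy data the two codes each occur in three quotations and share two of them, so the coefficient is 2 / (3 + 3 - 2) = 0.5. A value of 0 means the codes never share a quotation, and a value of 1 means they always appear together.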

Interestingly, as seen in the matrix, P1_Heterogeneous_Data and P3_LDaaS_Operationalization co-occur notably often, 214 times, with a correlation coefficient of 0.65. This could support an earlier finding in paragraph 5.2.0.1, namely that several considerations should be taken into account regarding LDaaS Operationalization. These considerations could be directed at Heterogeneous Data. Nevertheless, it is not surprising that operationalizing LDaaS involves considerations of Linked Data and the Semantic Web. Another interesting aspect of the data is the strong association, with a frequency of 212, between P1_Heterogeneous_Data and P1_Information_Sharing. These are both categories of proposition 1, which could indicate that managing heterogeneous data leads to enhanced information sharing. As one interviewee puts it (Gort 2016) (Translated from Dutch):

"Formerly, we had centralized databases. Then we moved to decentralized databases. Then there was the Internet. Organized on documents. Subsequently, e-commerce was resolved, so you can have security. Now you will have distributed network, on data, Linked Data. What is not in the proportion of pillarized administrative body responsible."

Moreover, regarding proposition 2, a clear benefit of a data governance framework for a successful usage of cloud based services could not be identified in this analysis. This is mainly because of the low frequency of occurrence in general for cloud based services.

Regarding proposition 3, LDaaS Operationalization does not seem to co-occur significantly often with Organizational Challenges (151 times). However, Organizational Challenges does seem to co-occur significantly often with Information Sharing, with a frequency of 204 and a correlation coefficient of 0.56. A relevant quotation, mapped to both of these codes, is given below; it was the answer to the question whether managing heterogeneous data would lead to enhanced information sharing (Heijboer 2016) (Translated from Dutch):

"It is actually the cause of Big Data. There are many different types of data which you cannot integrate with each other, so you have to do other things, you need other approaches in order to gain insights. I think so, definitely. It is not that obvious if you endorse this statement, that you are also successful with it. That is a challenge in itself."

5.2.0.4 Summary Co-occurrences The results in the previous paragraph suggest that there are some interesting co-occurrences to consider. However, some expected co-occurrences did not appear. It was found that operationalizing LDaaS requires several considerations, in particular regarding heterogeneous data. This comprises concepts of Linked Data and technologies of the Semantic Web. Moreover, a strong co-occurrence was found between managing heterogeneous data and information sharing. Surprisingly, these are both categories of proposition 1. A causal relation might cautiously be suggested for this proposition.

Figure 7: A comparison of the number of times one code co-occurs with another code

Furthermore, no significant co-occurrence was found between IT Governance and Cloud Based Services. This relates to theoretical proposition 2, which hypothesized that implementing a data governance framework leads to more successful cloud based services. Theoretical proposition 3 also appears weakened, since LDaaS Operationalization did not co-occur significantly often with Organizational Challenges.

Nevertheless, Organizational Challenges did co-occur significantly often with Information Sharing. This could point to the everyday challenges organizations face when sharing information in a variety of ways. A relevant quotation was given earlier.

5.2.0.5 Memos Several additional topics were identified and captured with a link to the quotation in ATLAS.ti as "Memos". A list is given below with the number of times it is grounded:

1. Grounded: 52 times. Application LD. Domain or application in which Linked Data is used.
2. Grounded: 13 times. Smart Contracts.²² Mentioning Smart Contracts.
3. Grounded: 12 times. (Process) (Out) Sourcing. For example Business Process Outsourcing.
4. Grounded: 12 times. Common vs. Graph. Other modes of database modelling are mentioned/compared.
5. Grounded: 3 times. Persistence.²³ Persistent data is important in order to establish your model.

It is apparent from this list that different types of applications of Linked Data are mentioned very often. As one interviewee puts it in the medical domain (Groth 2016):

"We have lots of ontologies, so large scale taxonomies. One is called EMMeT. That stands for Elsevier Merged Medical Taxonomy. That is using SKOS, and it has millions of concepts in it. And we have other ontologies that are developed."

Also, it is interesting that Smart Contracts is mentioned, either intentionally or unintentionally. The comment below illustrates a concrete example (Boer 2016)(Translated from Dutch):

²² See: http://searchcompliance.techtarget.com/definition/smart-contract
²³