RISK-BASED DATA CLASSIFICATION

Towards a Contingency Approach

by

M. BOONSTRA

S2176815

University of Groningen

Faculty of Economics and Business

July 2015

Supervisor: Prof. dr. E.W. Berghout

Word Count: 15,676

ABSTRACT

The information era brings with it an increasing amount of personal and sensitive data that is collected, stored, processed and exchanged within and between organizations. However, this increasing volume of sensitive information is also subject to a growing number of risks and threats. This information is valuable to organizations and therefore requires protection. Data classification is a method that many organizations use, or consider using, to address these risks and simultaneously identify the security requirements for their sensitive information. Although data classification is widely used, many organizations still apply the method inefficiently and ineffectively. In addition, little research has been conducted on data classification and on how the method can be optimized for effective use and implementation. The purpose of this thesis is therefore to improve the understanding of data classification and its importance among both scientists and practitioners. Through a literature review and a case study, the importance of the concept is addressed by identifying strong drivers for organizations to use data classification. The main contribution of this research is that it extends the current methodologies for data classification with insights and a sequence of steps that organizations need to take for an optimal use and implementation of data classification.

TABLE OF CONTENTS

1. INTRODUCTION
2. RESEARCH DESIGN
2.1 Controllability, Reliability and Validity
2.1.1 Controllability
2.1.2 Reliability
2.1.3 Validity
2.2 Research Method
2.2.1 Data collection
2.2.2 Initial Understanding
2.2.3 Literature Review
2.2.4 Case Study
3. LITERATURE REVIEW
3.1 Concept of Information
3.2 Information Security
3.2.1 Information Security and Risk Management
3.2.2 Information Security Costs and Data classification
3.3 Data Quality Management and Data Governance
3.3.1 Data Classification and Contingency Factors
3.4 Roles for Data Classification
3.5 Methodologies
3.5.1 NIST SP800-60/FIPS 199
3.5.2 ISO/IEC 27002
3.5.3 Comparison of Methodologies
3.6 Analysis of the Literature
4. DRIVERS FOR DATA CLASSIFICATION
4.1 Drivers derived from the Literature
4.2 Obligations and Regulations
5. CASE STUDY
5.1 STANLEY Security NL
5.2 Applying Theoretical Findings
5.3 Results Case Study
5.4 Optimal Solution for Stanley
6. CONCLUSION AND DISCUSSION
REFERENCES
Literature
Notes
APPENDIX I Questionnaire Risk-Based Data Classification

1. INTRODUCTION

The information era has created many opportunities for organizations to conduct business in new ways, and information now forms the lifeblood of many contemporary organizations (Gerber and von Solms, 2005; Tankard, 2015). This era brings with it an increasing amount of personal and sensitive data that is collected, stored, processed and exchanged within and between organizations. However, despite the opportunities, this increasing volume of sensitive information is also subject to an increase in risks and threats¹. The risks to information include, among others, data fraud or theft and cyber-attacks, because sensitive information is valuable not only to the organization itself but also to anyone who intends to do harm. The Global Risk Report 2015² ranked those risks among the top 10 global risks likely to occur in 2015. These risks are not unfamiliar to organizations, and many have experienced their impact. It has become obvious that organizations face severe risks to their information, and protecting that information has hence become one of their most important duties in the information age (Otero, Otero and Qureshi, 2010). To help organizations address these risks and threats, scientists have developed risk categorizations that are meaningful to organizations (Sherer and Alter, 2004), and entire information security risk management approaches have emerged, e.g. ISO 27005 and NIST SP 800-30, to assist organizations in enhancing their risk analysis and information security (Fenz, Neubauer and Pechstein, 2014). Despite these efforts, organizations still suffer tremendous losses because of security breaches. The Information Security Breaches Survey 2014³ of PricewaterhouseCoopers, performed in the UK, shows that while the number of security breaches decreased, the cost of each individual breach increased. The average cost of one breach was £875k for a large organization and £90k for a small business.
The global costs of cybercrime and security breaches in 2014⁴ were approximately $400 billion, which also includes the costs of stolen intellectual property. Organizations currently face major challenges in dealing with the risks related to information security, since many risks can have serious consequences, i.e. corporate liability, loss of credibility, and monetary damage (Cavusoglu, Cavusoglu and Raghunathan, 2004).

vulnerabilities and opposed threats in order to identify security controls to reduce and mitigate risk (Gerber and von Solms, 2001). Data classification also concerns the identification of information assets and the risks to which they are subject. In general, classification can be defined as follows: “classification can be seen as a management tool that enables a systematic arrangement of objects into groups or categories” (Franks, 2013). Data classification is in line with this definition but is often complex, since the information is systematically arranged into different security categories based on the risk to which it is subject. In addition, information is intangible and therefore harder to identify and to value. Identifying the information assets is thus a rather daunting task, even though it is a vital part of risk analysis (Gerber and von Solms, 2001; Gerber and von Solms, 2005), and it therefore too often results in an inefficient and ineffective use of data classification⁶.

Despite the interest of organizations in using data classification, little research can be found in the scientific literature on data classification and on the method's contribution to the risk analysis or information security of an organization. This thesis therefore focusses on improving the understanding of this method and its contribution to the field of information security among both scientists and practitioners. To do so, a literature review and a case study are conducted. The purpose of the case study is to validate the theoretical findings, to show the importance of data classification in practice, and to provide the case organization with a starting point for identifying its critical information assets for its risk analysis and security requirements. From the literature and the case study, this thesis aims to provide scientists and practitioners with a solution for implementing data classification so that it is used efficiently and effectively. Consequently, the main research question of this thesis is:

‘How to optimize risk based data classification?’

To answer this question the following sub questions need to be answered:

- Which methodologies for data classification are available in the literature?

- What is the theoretical basis of information security and data classification in the literature?

classification will be discussed in Section 4. After that, the case study and its results are discussed in Section 5. Finally, this thesis ends with a conclusion and discussion in Section 6.

2. RESEARCH DESIGN

This chapter describes how the research in this thesis is conducted, which quality criteria are applied and which methods are used. To answer the research question, this thesis combines a literature review with case-study research. The research question concerns a rather unexplored area of research. A qualitative research approach is therefore chosen, because qualitative research methods help to develop a new understanding of a concept and may lead to the discovery of new information (Corbin & Strauss, 2008). Qualitative research also includes an understanding of the environmental surroundings, i.e. the organizational context, which is more easily overlooked in quantitative studies (Karami, Rowley & Analoui, 2006). This thesis also strives to meet certain research quality criteria: controllability, reliability and validity (Yin, 1994).

2.1 Controllability, Reliability and Validity

2.1.1 Controllability

Controllability is a precondition for the other quality criteria and enables other researchers to replicate the study and check whether they obtain the same outcome (van Aken, Berends & van der Bij, 2012). Even if other researchers do not replicate the study, they can use the description of the research to evaluate the reliability and validity of this thesis. To ensure the controllability of this thesis, a detailed description of how the research is conducted is provided in section 2.2.

2.1.2 Reliability

instruments to collect data. This is called triangulation (Yin, 2003). First, a literature review is conducted; second, observations and semi-structured interviews are used as data collection methods. To enhance the reliability of the respondents, they are chosen from different departments within the organization, because employees of the same organization can still have different opinions, beliefs or perspectives. To increase the context reliability, the interviews are held at different moments in time, i.e. early or late in the week and early or late in the day; however, this is difficult given the time frame of this study.

2.1.3 Validity

The results of this thesis can only be valid if they can be justified by the way they are generated (van Aken, Berends & van der Bij, 2012). Hence, this thesis requires an assessment of the relationship between the research results or conclusions and how these results are generated. This thesis applies three types of validity as quality criteria, i.e. construct, internal and external validity. Construct validity is an assessment of whether the instruments measure what they are intended to measure. To ensure construct validity, this thesis uses triangulation, i.e. multiple instruments for data collection (Yin, 2003), and a serious effort has been made to reduce possible biases. Internal validity is concerned with the relationship between phenomena. In this thesis theoretical triangulation, i.e. the use of multiple theoretical disciplines, is used. However, theoretical triangulation is used here not to explain the relationship between phenomena but to explain the relationship between multiple concepts. Data classification is therefore approached from the research fields of information security, risk management, and data management. Concerning the external validity of this thesis, i.e. its generalizability, a case study was performed. Even though data classification is widely used in practice, few case studies have been conducted to optimize its use; the case study therefore aims to provide insights that can be applied by many organizations to enhance their use of data classification.

2.2 Research Method

2.2.1 Data collection

2.2.2 Initial Understanding

The initial understanding of this subject was formed during conversations with the thesis supervisor. To gain a better understanding, (non-scientific) literature was read that gave a proper initial overview of the concept of data classification.

2.2.3 Literature Review

To find the theoretical basis of information security and data classification, a systematic search for relevant articles was performed. EBSCOhost was used as the data collection instrument, with Business Source Premier as the electronic database. Literature was searched using the key words ‘data classification’ OR ‘information classification’. The Boolean operator ‘OR’ was necessary to retrieve a comprehensive set of literature, because the terms data and information are often used interchangeably. To reduce the set of articles, only articles that included data and/or information classification in their title were selected. A further analysis was done by reading the abstract of every article. Only two articles found with these keywords were relevant for this research. To provide a sufficient theoretical foundation for this thesis, a broader perspective on data classification was therefore taken. The two articles showed that data classification contributes to information security. Literature on information security research was therefore found using the keyword ‘information security’ and by performing a backward and forward search. The exception is the paper by El Aoufi (2009), which was recommended by the thesis supervisor. The initial understanding indicated that data classification, from a broader perspective, falls under data (quality) management and data governance. Using the key words ‘data management’ OR ‘information management’ AND ‘data governance’ OR ‘information governance’, articles were found in the same database. Additional literature was found by performing a backward and forward search, i.e. in the references of each article or in the articles citing it. For an overview of the research method for the literature review see figure 1.
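The title-screening step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample records are hypothetical placeholders, not actual search results, and the real search was run in the database interface rather than in code.

```python
import re

# The two search phrases are taken from the text; everything else is illustrative.
TITLE_PATTERN = re.compile(r"\b(data|information) classification\b", re.IGNORECASE)


def title_matches(title: str) -> bool:
    """Return True if the title contains 'data classification' or 'information classification'."""
    return TITLE_PATTERN.search(title) is not None


def screen_by_title(records: list[dict]) -> list[dict]:
    """Keep only records whose title matches; abstracts are then read manually."""
    return [record for record in records if title_matches(record["title"])]


# Hypothetical placeholder records, not real search results.
sample = [
    {"title": "A Placeholder Study of Data Classification"},
    {"title": "Placeholder: Information Classification in Practice"},
    {"title": "A Placeholder Paper on Cloud Storage"},
]
shortlist = screen_by_title(sample)
print([record["title"] for record in shortlist])
```

The sketch only covers the mechanical title filter; the subsequent abstract reading and the backward and forward search remain manual steps.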

2.2.4 Case Study

3. LITERATURE REVIEW

3.1 Concept of Information

As stated in the introduction, the volume of information is growing and nowadays it forms the lifeblood of organizations (Tankard, 2015). Information is both an end-product and an instrument or input for organizations in the creation of other goods, decisions, and information (Rafaeli, 2003). According to Choo (1996) information serves three main purposes for organizations, i.e. organizations “use information to make sense of the changing environment, to create knowledge for innovation and to make decisions about the courses of action”. Hence information can be seen as a (strategic) asset, which is valuable to an organization. The real value of information is, however, rather subjective: information that is valuable or useful to one person can be useless to another (Kooper, Maes and Lindgreen, 2011).

Computer science distinguishes data, information and knowledge as three different aspects (Alavi and Leidner, 2001). According to Bhatt (2001) defining those aspects is difficult and can only be done from the perspective of the user. Finne (2000) also notes that many researchers define data, information and knowledge differently and that it is hard to define information because humans interact with or perceive information differently. Trkman and DeSouza (2012) state that data is considered as raw and unanalyzed symbols, whereas information relates to meaning and therefore results from the aggregation of data. In addition, El Aoufi (2009) states that different definitions of the concept of information can be found in the literature; for example, data, information and knowledge can be defined as follows: “Data are considered as raw facts, information is regarded as an organized set of data, and knowledge is perceived as meaningful information” (p. 3). So information is an organized set of data, i.e. information can

The above illustrates that information can be viewed from different perspectives and that the literature even makes distinctions within a single aspect. It is hard to reduce the concept of information to a single definition, and discussing all the views on information is outside the scope of this thesis. Information is a concept that goes beyond the particular information processed, stored or transmitted by information systems. However, given the scope and nature of this thesis, the term information or data, whenever used in this thesis, refers to all the data, information and knowledge that is processed, stored and transmitted within and between organizations. The terms data and information will be used interchangeably throughout this thesis.

3.2 Information Security

As stated in the previous section, organizations use information for different purposes and information can be seen as a strategic asset (Choo, 1996). According to ISO 27002 an asset is anything that has value to an organization, which by definition requires protection. Hence information, being an organizational asset, requires protection (Misra, Kumar & Kumar, 2007). Nowadays one of the primary goals of organizations is managing their information security (Dhillon and Torkzadeh, 2006; Herath and Rao, 2009b; Siponen and Vance, 2010). The management of information security is necessary because information assets are subject to risk. There are many definitions of risk in the literature, but there is common ground that a risk exists because of a combination of a threat, a vulnerability and the value of an asset (Gerber and von Solms, 2002). In this combination the vulnerability is a weakness in the system and the threat is the source that has the potential to exploit this vulnerability to cause harm or loss (Gerber and von Solms, 2002). Organizations therefore require formal, informal and technical security controls to address these risks in order to preserve their information security and asset value (Ahmad and Maynard, 2013). These security controls are the steps organizations take to reduce the probability of specific threats and to mitigate risk (El Aoufi, 2009). Together these controls form the security of information.
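The idea that a risk only exists where a threat, a vulnerability and an asset of value come together can be sketched as a simple matching procedure. The following Python fragment is a minimal illustration, not a method taken from the cited literature: the class names, the ordinal value scale and the example entries are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Asset:
    name: str
    value: int  # hypothetical ordinal value to the organization; 0 means no value


@dataclass(frozen=True)
class Vulnerability:
    asset_name: str  # the asset that has the weakness
    weakness: str


@dataclass(frozen=True)
class Threat:
    source: str
    exploits: str  # the weakness this threat source could exploit


def identify_risks(assets, vulnerabilities, threats) -> Iterator[tuple]:
    """Yield (asset, vulnerability, threat) triples where all three elements coincide."""
    assets_by_name = {asset.name: asset for asset in assets}
    for vuln in vulnerabilities:
        asset = assets_by_name.get(vuln.asset_name)
        if asset is None or asset.value <= 0:
            continue  # nothing of value is exposed, so no risk is recorded
        for threat in threats:
            if threat.exploits == vuln.weakness:
                yield asset, vuln, threat


# Hypothetical example entries.
assets = [Asset("customer database", 3), Asset("canteen menu", 0)]
vulnerabilities = [
    Vulnerability("customer database", "unpatched server"),
    Vulnerability("canteen menu", "unpatched server"),
]
threats = [Threat("external attacker", "unpatched server")]

found = list(identify_risks(assets, vulnerabilities, threats))
for asset, vuln, threat in found:
    print(f"{threat.source} -> {vuln.weakness} -> {asset.name}")
```

In this sketch a security control would act by removing a vulnerability or a threat from the lists, after which the corresponding triple no longer appears.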

The security of information is assessed on three requirements, which are the confidentiality, integrity and availability of information (Finne, 2000). These requirements are also referred to as the CIA triad or information security triad and can be defined as follows:

- Confidentiality

sensitive information will not fall into the wrong hands, while on the other hand making sure the right people have access to it. It protects information from unauthorized disclosure.

- Integrity

Integrity or quality means that the data stays the same over its entire life-cycle. This includes maintaining the consistency, accuracy and trustworthiness of the data. Data should therefore not be altered improperly or modified without authorization.

- Availability

The availability of information concerns the accessibility of the data at all times, within a reasonable response time, when requested by authorized users.

Following these requirements, information security can be defined as the “process of controlling and securing information from inadvertent or malicious changes or deletions or unauthorized disclosure” (Gerber, von Solms & Overbeek, 2001). The main purpose of information security is to preserve these requirements or properties of information.

3.2.1 Information Security and Risk Management

Information security is necessary to protect the information assets from risks. As mentioned earlier, data classification is a method that enables organizations to inventory their information assets and concurrently categorize them according to their sensitivity in order to set up security requirements for their information security (Cohen, 1997). When exploring the literature on risk management and information security, it is hard to determine where risk management ends and where information security policy and information classification begin. The domains appear to be closely related, and information security can be seen as a risk management discipline (El Aoufi, 2009; Gerber and von Solms, 2005; Taubenberger and Jürjens, 2008). Most risk analysis approaches are structured as follows (Herrmann and Herrmann, 2006; Taubenberger and Jürjens, 2008):

1. Identification of business processes and their actors

2. Identification and valuation of assets

3. Identification of security requirements respectively vulnerabilities and threats

4. Assessments of risks

Data classification is a method that follows the first three steps of a risk analysis. It identifies the business processes and their actors in order to identify and value the information assets. Once the information assets have been identified and valued, data classification allows one to place each information asset in a security category based on the security requirements it needs, by looking at the vulnerabilities and threats to which the asset is subject. Data classification thus contributes to the overall information security and risk analysis of an organization. From this it is possible to derive a definition of data classification and its relation to information security: “it is the identification and classification of data according to its sensitivity into the right security category, in order to preserve the confidentiality, integrity and availability of the data (CIA)”. The security of each information asset is assessed on the CIA triad and on the impact that compromising each individual property, i.e. confidentiality, integrity, and availability, will have on the organization. In the definition above the sensitivity of information is often used as a synonym for the value of information. As will be explained later, the higher the value of the information to the organization or to the organization's suppliers and partners, the more severe the impact will be when the CIA properties of the information asset are compromised. Note that, as will also be discussed later, each security category has a security baseline with predefined security controls for preserving the CIA properties of the information assets within that category. Hence, identifying the security requirements for an information asset, i.e. classifying it into a security category, means determining what the impact of each compromised CIA property of that asset will be on the organization. This thesis, however, focusses on optimizing the use of data classification for an organization and not on the complete identification of the security baseline of each category.
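To make the definition concrete, the assignment of an information asset to a security category from its three impact ratings can be sketched as follows. Two assumptions are made that the text above does not prescribe: the three-level scale (low, moderate, high) and the "high-water mark" aggregation rule, in which the highest of the three ratings determines the category, are borrowed from the FIPS 199 / FIPS 200 style of categorization. The function and type names are hypothetical.

```python
from enum import IntEnum


class Impact(IntEnum):
    """Assumed three-level impact scale in the style of FIPS 199."""
    LOW = 1
    MODERATE = 2
    HIGH = 3


def security_category(confidentiality: Impact,
                      integrity: Impact,
                      availability: Impact) -> Impact:
    """Return the highest of the three impact ratings (high-water mark, an assumption)."""
    return max(confidentiality, integrity, availability)


# Hypothetical asset: disclosure would be severe, loss of availability only a nuisance.
category = security_category(
    confidentiality=Impact.HIGH,
    integrity=Impact.MODERATE,
    availability=Impact.LOW,
)
print(category.name)
```

Other aggregation rules are conceivable, for example keeping the three ratings separate so that controls can be chosen per property; the high-water mark is used here only because it yields a single category per asset.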

3.2.2 Information Security Costs and Data classification

(Neubauer, Ekelhart & Fenz, 2008). These are the costs of having information security. See figure 2 for an illustration of this theory.

Figure 2 Security Cost Trade-Off (Gordon & Loeb, 2002)

This also applies to data classification. The highest security category, containing the most valuable information, will incur higher operational costs because more security controls are assigned to that category. However, more security is not always better. Storing highly sensitive data on an external hard drive and placing it in a safe can be very secure and rather inexpensive, but it drastically decreases the usability and availability of the data when it is requested (Tallon et al., 2013). This shows that all the requirements of the CIA triad need to be considered for an optimal classification of data. According to ISO 27002, data classification can decrease these operational costs because each category has a security baseline, which eliminates the need for a case-by-case risk analysis and custom-designed security per information asset. So, in addition to contributing to the risk analysis of information assets and the overall information security of an organization, as stated previously, data classification can also contribute to a decrease in the costs of having information security.
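The cost argument, that a baseline per category replaces a case-by-case analysis per asset, can be illustrated with a simple lookup. The category names, the controls in each baseline and the inventory below are hypothetical placeholders and are not taken from ISO 27002 or from the case organization.

```python
# Hypothetical security baselines: each category carries a predefined set of controls.
BASELINES = {
    "low": frozenset({"access logging"}),
    "moderate": frozenset({"access logging", "role-based access"}),
    "high": frozenset({"access logging", "role-based access", "encryption at rest"}),
}


def controls_for(category: str) -> frozenset:
    """Look up the predefined controls for a security category."""
    return BASELINES[category]


# Hypothetical inventory: asset name -> assigned security category.
inventory = {"price list": "low", "customer contracts": "high"}

# Once an asset is classified, its controls follow from the category alone.
required = {asset: controls_for(category) for asset, category in inventory.items()}
for asset, controls in required.items():
    print(asset, sorted(controls))
```

The design choice this sketch highlights is that the expensive analysis is done once per category rather than once per asset; adding a new asset only requires classifying it.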

3.3 Data Quality Management and Data Governance

Knowing that the identification and valuation of information are the first steps in classifying information is not enough to optimize its use. Information does not value and classify itself but needs to be managed by employees. For the identification and management of information, much can be learned from data quality management (DQM) research, because it concerns both the quality and the management of organizational data (Otto et al., 2010). The management and quality of data have been researched scientifically since the 1990s (Otto, 2010). Over the past decade, however, researchers have become interested in a comprehensive framework concerning information called ‘data governance’. It is intended to cover every aspect of and question related to the quality and management of data, and many organizations consider data governance a promising approach to ensure that the value and quality of the information assets are maintained (Otto, 2010; Kooper et al., 2011). According to Wende (2007) data governance is part of DQM and is an upcoming governance framework for both the quality and management of data, because the business requirements that organizational data is expected to meet are increasing (Otto, 2010).

Organizational assets by definition require protection and management, and besides information organizations have other assets to manage. According to Weill and Ross (2004) organizations have multiple key assets that need to be governed, i.e. human assets, financial assets, physical assets, IP assets, relationship assets, and information and IT assets. Of these, the information assets and IT assets relate to IT governance. A distinction can be made between the two: “IT assets refer to the technologies (computers, communications and databases) and information assets are defined as facts having value or potential value that are documented” (Khatri and Brown, 2010, p. 148). Hence there is a clear distinction between IT assets and information assets: IT assets are the tangible assets and information assets are the intangible assets of IT governance. This thesis focusses on the management of these intangible assets to gather insights for the optimization of data classification. These intangible assets are extensively researched in the DQM literature (Wang et al., 1998). Data governance is concerned with the governance of the information assets and not the IT assets of an organization and can therefore be seen as the information (intangible) artifact in IT governance (Tallon, Ramirez, & Short, 2013). According to Khatri and Brown (2010) the acknowledgement of organizational data as an asset and the realization that data requires governance are of growing interest to the information systems research and practice community. At this moment there is no clear definition of data governance (Wende, 2007). Khatri and Brown (2010), however, define it as follows: “Data governance refers to who holds the decision rights and is held accountable for an organization’s decision-making about its data assets”. Tallon et al. (2013), based on the work of Khatri and Brown (2010), define data governance as “a collection of capabilities or practices for the creation, capture, valuation,

This definition does not surpass that of Khatri and Brown (2010) but does a better job of capturing the different domains of data classification, which will be discussed in section 3.4. From the last definition it can be deduced that data classification can gain valuable insights from data governance, because data governance is concerned with the capturing of the information assets (inventory), the valuation of the information assets (sensitivity) and the access control of this information (i.e. a security method to limit access to information and information processing facilities: it prevents unauthorized usage, disclosure or modification and therefore preserves the CIA triad). Even though the classification of data into security categories is not literally mentioned in the literature on DQM and data governance, the above illustrates that data classification requires governance and that this governance will influence the way it is implemented in an organization. As will be explained in section 3.4, DQM and data governance prescribe certain roles and accountabilities for the management of data that will be filled in differently by each organization; these roles influence the inventory and valuation process of information assets. When it comes to implementing data governance there is no one-size-fits-all approach (Weber, Otto and Österle, 2009; Otto, 2010). Every organization needs to implement it based on its needs and strategy, and the same holds for data classification.

3.3.1 Data Classification and Contingency Factors

the current IT governance and Data Governance literature in combination with 37 interviews with top IT executives and identified six antecedents that enable data governance and three antecedents that inhibit data governance, which also cover the contingency factors of Weber et al. (2013). With this knowledge Tallon et al. (2013) built a theoretical model, which is illustrated in figure 3.

Figure 3 Antecedents for Data Governance (adapted from Tallon et al. 2013)

Tallon et al.'s (2013) conception of data governance also includes the classification of data by its value. Hence, the antecedents should also influence the way data classification is implemented in an organization. Understanding the influence of these antecedents on data classification is important because, according to the model in figure 3, it can increase firm performance and risk mitigation (Barua, Kriebel and Mukhopadhyay, 1995).

Note that data classification can be seen as a component of data governance and is therefore less comprehensive than data governance itself. These antecedents can therefore only be examined for whether they apply to data classification, so that they can inform an optimal implementation of data classification in an organization.

3.4 Roles for Data Classification

decision domains for information assets, i.e. data principles, data quality, metadata, data access and data lifecycle. See table 1 for the data governance framework and its domains.

Table 1 Framework for Data Decision Domains (Source: Khatri and Brown, 2010)

can best be approached by using the domain ‘Data Access’. This is especially true if one takes an information security perspective. The questions to ask and decisions to make in this domain concern the value or sensitivity of the data, as can be deduced from the question “what is the business value of data?”. Eventually this valuation, in combination with the questions “What are the data access standard and procedures?” and “how will risk assessment be conducted on an ongoing basis?”, will result in the assignment of roles and responsibilities for the decisions about data access and security policies, in other words about which information is more sensitive and needs additional protection. In addition, Khatri and Brown (2010) state that part of this domain is identifying the data needs of the business and putting safeguards in place to ensure the confidentiality, integrity and availability of the data. This is in line with data classification; as explained in section 3.2, these properties are referred to as the CIA triad and concern information security. As identified by Cohen (1997) and Tankard (2015), data classification can be used as a method to lay the foundation of information security and can therefore be seen as a safeguard to preserve the CIA triad.

In order to classify data according to its sensitivity, one must identify the information assets and valuate them; this is part of a risk analysis (Suh and Han, 2003; Gerber and von Solms, 2005) and is also a component of the domain ‘data access’. However, according to Gerber and von Solms (2005), we are currently in the information-centric era, and the bottom-up risk analyses previously used to identify security requirements are no longer valuable because they only assess the risks related to IT assets and infrastructure and not those related to the information assets. Nowadays a top-to-bottom approach is more suitable to determine the right amount of security requirements; it involves actors from multiple departments rather than only the IT department (Gerber and von Solms, 2005). Data classification is a method that follows this approach in order to identify the security requirements. This conclusion from Gerber and von Solms (2005) is in line with the fact that information needs governance and that roles and responsibilities need to be assigned for the management of information. Using the roles and associated responsibilities identified in the DQM and data governance literature, it is possible to assign roles to data classification for the identification and valuation of data. This is necessary because the first step of data classification is identifying the business processes and their actors, who are responsible for the identification and valuation of data.

made, these can be placed on a continuum from decentralized to centralized throughout the organization. Each organization can implement the roles and the locus of accountability differently (Wende, 2007). In the data quality management (DQM) and data governance literature much attention has been given to the roles and the decision making about data. Wende (2007) performed a literature review on DQM and the associated roles. However, most of these roles are not associated with the decisions that need to be made for implementing a data classification scheme, because DQM and data governance are approaches designed to cover all the decisions about data.

this is the step after data classification. Thus for data classification it is key to identify the data owners and the users of the data for an optimal identification and valuation of the information assets.

3.5 Methodologies

Because data classification is widely used among organizations, including hospitals, universities, and public and private organizations, the practitioners’ community has developed different guidelines for data classification. Many companies use these guidelines to classify their data. The guidelines are, however, practical in nature, have little theoretical basis and assume that each organization can implement data classification. Two commonly used data classification methods or guidelines are those of NIST and ISO. These methodologies are discussed below, and a comparison is made at the end of this section. Consequently this section answers the first sub question.

3.5.1 NIST SP800-60/FIPS 199

The NIST is responsible for developing information security standards and guidelines. Initially the standards and guidelines are developed for federal information systems to be compliant with the Federal Information Security Management Act (FISMA), but the NIST does find its audience in the private sector as well7. NIST published a series of special publications that get revised and updated through the years. All the special publications are connected and are useful in the security life cycle of information and information systems; its overall purpose is to guide federal organizations in implementing a risk management framework. Figure 4 shows which guideline can be used at a certain moment in time.

Figure 4 Risk Management Framework (adapted from NIST SP 800-60)8

high) based on the impact level when the information belonging to that SC is compromised. NIST is in line with the roles identified previously: data owners can assess the impact level of information. NIST also stresses, however, that for data classification to succeed the overall data classification policy needs to be formally approached and supported by the government agency (i.e. management) itself.
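As an illustration of the security category (SC) described above, the following minimal Python sketch records one impact level per CIA property and prints it in the set-of-pairs notation that FIPS 199 uses. The class names, the `notation` helper and the example asset with its impact values are assumptions made for this illustration; they are not taken from NIST or from the case study.

```python
from enum import IntEnum
from typing import NamedTuple


class Impact(IntEnum):
    """Potential impact levels used by FIPS 199."""
    LOW = 1
    MODERATE = 2
    HIGH = 3


class SecurityCategory(NamedTuple):
    """One impact level per security objective (the CIA triad)."""
    confidentiality: Impact
    integrity: Impact
    availability: Impact

    def notation(self) -> str:
        # Render the category as a set of (objective, impact) pairs.
        pairs = ", ".join(
            f"({objective}, {impact.name})"
            for objective, impact in self._asdict().items()
        )
        return f"SC = {{{pairs}}}"


# Hypothetical information type; the impact values are illustrative only.
maintenance_record = SecurityCategory(
    confidentiality=Impact.MODERATE,
    integrity=Impact.HIGH,
    availability=Impact.MODERATE,
)
print(maintenance_record.notation())
```

Keeping the three impact levels separate, rather than storing a single label, preserves the information a data owner needs when the availability or integrity of an asset matters more than its confidentiality.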

Table 2 Security categories based on the impact level (adapted from FIPS 199)12

3.5.2 ISO/IEC 27002

applicable to all organizations regardless of their type, size or nature. Given this freedom of choice, the type, size and nature of the organization will influence the number of levels of classification. This number also depends on the amount and types of information within the organization and on the regulations the organization falls under. It seems that there are some ‘contingency’ factors that influence the levels of classification and their use. Organizations need to set up a classification scheme for themselves, i.e. decide on the number of levels of classification and on which security objectives to use as security requirements. ISO mentions only the confidentiality of information as a security objective rather than all the CIA triad properties. To prevent confusion: the levels of classification are synonymous with what NIST SP 800-60 calls security categories.

As an example, ISO suggests that an organization can choose four classification levels based on the impact on and disruption of strategic objectives and operations:

- Top confidential level: disclosure has serious impact on long term strategic objectives or puts the survival of the organization at risk (Restricted)

- Medium confidential level: disclosure has a significant short term impact on operations or tactical objectives (Confidential)

- Low confidential level: disclosure causes minor embarrassment or minor operational inconvenience (Internal Use)

- Disclosure causes no harm (Public)

By creating groups of information with similar protection needs, security procedures can be set up that apply to all the information in a group. This decreases the amount of time and money spent on case-by-case risk assessment and on the custom design of security controls per information asset. Overall, data classification provides people dealing with information with proper guidance on how to use and protect it.

3.5.3 Comparison of Methodologies

The methods have some commonalities but are also quite different in nature; see table 3 for an overview. Organizations can choose one of the standards based on the desired amount of guidance; both are adequate standards developed by leading practitioners. The prescriptive nature of NIST can be out of line with the needs of a private organization. However, its comprehensive description of the security objectives is more useful than the ISO standard, which lacks such a description. This is especially true if an organization wants to use all CIA properties as security objectives, which is recommended in section 3.2.2 for an optimal data classification. Both methodologies have their own strengths and complement each other; hence an organization can use both simultaneously to realize an optimal use of data classification. For example, an organization can set up a classification scheme containing four levels of classification (ISO) and use all three CIA properties to assign the information to the categories (NIST). The methodologies are in line with the roles and responsibilities identified in the DQM and data governance literature.
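The combined use of both standards can be sketched as follows: the impact on each CIA property is assessed separately (NIST), the highest of the three is taken, and that value selects one of four labels (ISO). The one-to-one mapping between impact levels and labels, the extra `NOT_APPLICABLE` level and the function name `classify` are assumptions for this sketch; neither standard prescribes this exact mapping.

```python
from enum import IntEnum


class Impact(IntEnum):
    NOT_APPLICABLE = 0  # assumed extra level for "no harm"
    LOW = 1
    MODERATE = 2
    HIGH = 3


# Assumed mapping onto the four labels of the ISO example in section 3.5.2.
LABEL_BY_IMPACT = {
    Impact.NOT_APPLICABLE: "Public",
    Impact.LOW: "Internal Use",
    Impact.MODERATE: "Confidential",
    Impact.HIGH: "Restricted",
}


def classify(confidentiality: Impact, integrity: Impact, availability: Impact) -> str:
    """Pick the label that matches the highest impact among the CIA properties."""
    overall = max(confidentiality, integrity, availability)
    return LABEL_BY_IMPACT[overall]


# Illustrative assessment: low confidentiality impact, high integrity impact.
print(classify(Impact.LOW, Impact.HIGH, Impact.MODERATE))
```

Taking the maximum means that a single highly rated property is enough to place an asset in the strictest level, which errs on the side of protection.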

3.6 Analysis of the Literature

Data classification can be viewed from multiple research disciplines and covers principles of three main research domains, i.e. DQM research, risk management research and information security research. Analyzing these three disciplines yields valuable insights on data classification, and consequently the first two sub questions of this thesis are answered through the literature review. In addition, the literature review can provide a preliminary answer to the main research question, i.e. how to optimize risk-based data classification, based on the insights that are gathered. Data classification concerns the identification and classification of data according to its sensitivity in the appropriate security category (Cohen, 1997; Tankard, 2015) and can simultaneously be seen as the foundation of information security (Tankard, 2015). The method requires a top-to-bottom approach and needs to include personnel from all the departments and levels in the organization rather than only the IT or security department (Gerber and von Solms, 2005; Bernard, 2007). Senior management needs to decide on how data needs to be classified (i.e. the number of security categories and the criteria for the CIA triad) to preserve the organizational vision on information security. They also need to provide guidance by setting up a formal plan, i.e. a policy, that data owners and users can execute (Bernard, 2007). Assigning roles and responsibilities in the organization is necessary for the governance of data, which in this case is the identification and valuation of information assets (Wende, 2007). Data owners need to be assigned and users need to be identified because these roles are key in the identification and valuation of information assets, can impose security controls and requirements, and can apply local knowledge to classify the information (Khatri and Brown, 2010; El Aoufi, 2009; Gerber and von Solms, 2005; Bernard, 2007).

4. DRIVERS FOR DATA CLASSIFICATION

The optimization of data classification is only relevant for organizations if the business drivers for implementing data classification are identified. The drivers indicate the importance of data classification for the protection of organizational information assets and how organizations can benefit if they use and implement data classification effectively. There are numerous drivers for organizations to use the method. In general there are two main kinds of drivers: organizations classify data because they have to and because they want to. This chapter first discusses the drivers mentioned in the literature and provides an overview of what is covered in the literature review. Secondly the regulations and obligations are discussed, and after that additional beneficial drivers for data classification are mentioned.

4.1 Drivers derived from the Literature

Table 4 Overview of the main methodologies and literature researched

4.2 Obligations and Regulations

Organizations have to secure their information because of regulations and obligations. The regulations can differ by industry and by the country in which an organization operates. Common regulations are the Health Insurance Portability and Accountability Act (HIPAA) in the US and the Personal Information Protection and Electronic Documents Act in Canada (Fung, Wang & Philips, 2007). Other examples are the European Union General Data Protection Regulation (GDPR) and the Generally Accepted Privacy Principles (GAPP). Remember that the NIST special publications are all developed in order to comply with the FISMA. All of the regulations require organizations to protect their information by, among other things, implementing controls for the privacy, access and storage of data. Failing to comply with the regulations can result in severe fines (Tankard, 2015). Based on the industry and country it operates in, an organization must identify under which regulations it falls. Besides regulations, organizations can also have contractual agreements with customers or business partners who impose security requirements and require the organization to protect their information18.

4.3 Business Drivers

an organization (Tankard, 2015). Data classification can also be of economic benefit: it has the potential to provide work and cost savings. It can minimize administrative overhead when organizations formalize who the data owners are and who is responsible for the protection of the data. It also shows a commitment of the organization to protect e.g. customer information, and when strategically presented it can provide a competitive advantage (e.g. being ISO 27001 certified)19. However, despite the drivers mentioned, the most important driver for organizations to use data classification is that a formal approach to data classification and access rights ensures that the organizational goals for information security are satisfied rather than the goals of individual employees.

5. CASE STUDY

This chapter presents the case study, which serves to validate the findings from the literature review and to find additional answers on how data classification can be optimized for the case organization and for other organizations. First the choice for the case site is explained, and after that the antecedents identified in the literature are discussed. Finally the case results and an answer on how to optimize data classification for the case organization are provided.

5.1 STANLEY Security NL

customer. This makes Stanley an ideal case site because data classification plays an important role in this process: a lot of information within a customer dossier has a high sensitivity level, e.g. the total amount of security services a customer obtains or the installation maps for security cameras. These information assets also require protection by, among other things, access rights. In addition, STANLEY Security NL is already an ISO 9001 certified organization and aims to obtain ISO 27001 certification. The results of the case study provide Stanley with suggestions for an optimal solution for data classification and provide an answer to the main research question; in doing so they give Stanley a start towards this certification.

5.1.1 The Case study

How the case study is performed and what its goals are is explained in section 2.2.4. Due to the time frame of this thesis, interviews were held with the sales director, the operations director, one sales manager and one project manager (see Appendix I for the questionnaire) to identify and valuate the information assets that require protection and to set up the access controls. In doing so the antecedents identified in section 3.3.1 are kept in mind to see whether they influence the design and implementation process of data classification at Stanley. The other insights from the literature review are kept in mind as well. For setting up a classification scheme at Stanley both NIST and ISO are used as guidance because they can complement each other (see section 3.5.3). In addition, the corporation Stanley Black & Decker has a data classification policy available that is based on ISO. This policy is not yet tailored to and implemented in the security division in the Netherlands; nonetheless it can be used as a starting point.

5.2 Applying Theoretical Findings

use of data classification. Table 5 shows the enabling antecedents, the description of these antecedents, when it enables data governance according to Tallon et al. (2013) and how the antecedents apply to Stanley for data classification.

Table 5 Enabling antecedents identified by Tallon et al. (2013) applied to Stanley

that it is possible to use data classification. These antecedents are only concerned with data governance.

When the enabling antecedents are applied to Stanley, some do not seem to be concerned with enabling data classification. Data classification can be implemented regardless of the alignment between IT strategy and organizational strategy, because data classification is not associated with data analytics as implied by Tallon et al. (2013). Concerning IT standardization, Stanley is behind schedule. All sites in Europe are scheduled to receive SAP as ERP system in the coming years. At this moment, however, Stanley still uses its old systems, i.e. Salesforce (CRM), field assist (support system for technicians and field engineers) and Navision (ERP system). Even though this could inhibit Stanley from using data classification, it does not do so entirely. The information assets of each business unit that are processed, stored and transmitted within and between these systems are stored on the same file server of Stanley. Each employee can, however, receive access rights to the file server and can therefore have access to the information assets of each business unit. In addition, storing all the information assets on the same file server makes it easier to identify the assets. Thus a lack of IT standardization does not influence the use of data classification for Stanley. The IT culture of Stanley is one that will inhibit Stanley from adopting data governance, because data governance requires most employees to consider data a strategic asset. For the use of data classification employees do not need to consider an asset strategic, because not all sensitive information assets are of strategic importance. However, employees do need to know how to classify data and need to understand the purpose, i.e. the necessity, of data classification and its contribution to the protection of the information assets. The IT culture therefore neither enables nor inhibits the use of data classification at Stanley, regardless of whether the culture at Stanley supports IT.

most employees; therefore Stanley is digitizing all the customer files into a digital archive for efficient use and storage. This process involves the identification of information assets and also their valuation, because access rights to the digital archive need to be assigned to the employees who should have access and withheld from those who should not. The three antecedents mentioned above all enable the use of data classification at Stanley.

Consequently it can be said that the antecedents that are initially concerned with the implementation of data governance can be used to a certain extent for data classification. Some antecedents do not favor data governance but have no effect on the use of data classification. Only the industry regulations, the growth rate of information, and the organizational structure and IT structure enable data classification. The antecedents thus provide Stanley with an initial understanding of what can and will enable or inhibit the use of data classification and consequently of how to approach its implementation. Deriving from the above, the first step to optimize data classification at Stanley would be to increase the awareness of employees about information security risks and their possible impact on the organization; hence the importance of information security needs to be addressed. The employees do not need to consider information to be of strategic importance, because not all information is strategic in nature, but the users need to know how and why to classify it. Next, the role and contribution of data classification as a method to address these security risks and to identify the security requirements for the information assets need to be explained. This will enable its use and contribute to an effective implementation. To do so, and as derived from the literature review, a formal approach to data classification is necessary: senior management needs to set up a policy that data owners and users can use, so that all employees across the organization classify information according to the same requirements.

5.3 Results Case Study

classifying information but exceptions need to be made for the access rights for the continuity of the business. Setting up a data classification scheme does require a contingency approach, just like a data governance or DQM framework. For an optimal solution of data classification at Stanley it can be concluded that some steps need to be taken for its implementation and use. Senior management needs to set up a general policy for data classification, which needs to include a description of the levels of classification and guidance for employees on classifying data into categories. This policy should then be adapted and modified by the data owners of the different business units, because each business unit processes and possesses different information assets and has different information needs. The adapted policy should finally be communicated to all the users of information within the business unit in order to classify all information and to realize access rights that do not inhibit business continuity. These results are clarified in the next section, where an optimal solution for data classification at Stanley is discussed. An additional result is that during the interviews issues about the information’s life-cycle were raised: “how long does it need to be stored and does the information stay in the same category over time?” (project manager). This is a valuable insight because information has a life-cycle and its value may change over time. For an up-to-date data classification scheme it is therefore necessary to perform periodic revisions of the current classification scheme and the identified information assets. These questions are also related to other domains of the data governance model, e.g. the domain ‘data quality’ or ‘data lifecycle’ (see table 1). Data classification may therefore be a good starting point for an overall implementation of a data governance framework, because the method already identifies most of the information assets.
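The need for periodic revisions can be supported by a simple register that records when each information asset was last reviewed. The sketch below flags the assets that are due for re-assessment. The one-year interval, the register entries and the label of the maintenance record are assumptions for illustration; the case study does not specify a review period.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumed review interval; the thesis only states that revisions should be periodic.
REVIEW_INTERVAL = timedelta(days=365)


@dataclass
class InformationAsset:
    name: str
    label: str            # current classification level
    last_reviewed: date   # date of the most recent revision


def due_for_review(assets, today):
    """Return the assets whose last revision is at least one interval old."""
    return [a for a in assets if today - a.last_reviewed >= REVIEW_INTERVAL]


# Hypothetical register entries, for illustration only.
register = [
    InformationAsset("installation map", "Restricted", date(2014, 3, 1)),
    InformationAsset("maintenance record", "Confidential", date(2015, 5, 1)),
]
for asset in due_for_review(register, today=date(2015, 7, 1)):
    print(f"Re-assess '{asset.name}' (currently {asset.label})")
```

A data owner could run such a check at a fixed moment each year and re-valuate only the flagged assets, which keeps the revision effort proportional to what has actually aged.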

5.4 Optimal Solution for Stanley

have a designated data owner, since the data owner is responsible for determining the appropriate sensitivity categorizations and security requirements for its business unit, making decisions about who can access the information, and ensuring that appropriate controls are utilized in the storage, handling and regular use of information. The business unit managers and/or process owners are identified through the organizational chart of Stanley and through documentation that states who owns which process, because these employees also need to be assigned as the data owner for their business unit or process. For Stanley these are the sales director, sales managers, operations director, service managers, project managers, technical director and the business unit manager of the security operations center (SOC). According to the SBD policy it is also their responsibility to choose an appropriate information categorization label for each classification level. The users or personnel (data custodians) are those who ensure the protection of Stanley’s information in accordance with the policy of the data owner on access control, information sensitivity categories and information security requirements. As explained in section 3.2 and expressed by NIST, for an optimal classification scheme the valuation of information assets needs to be based on all the CIA properties. The policy therefore requires a clear description of the CIA properties and of when the impact of unauthorized disclosure, modification or disruption of access is considered high, moderate or low. Like NIST, Stanley should take the FIPS 199 definition of the impacts for the CIA properties in table 2 as guidance, because ISO and the SBD policy lack a comprehensive description of all properties. ISO and SBD only use confidentiality as a security requirement for the information assets. Using all the CIA properties as security requirements yields a more thorough evaluation. This is necessary because, as will become clear later, availability for example is also a property that is very useful for assessing how information is used by the employees. Table 6 provides Stanley with an optimal data classification scheme. The definition of the CIA triad in table 2 is used to assess the sensitivity, and the SBD policy, which is based on ISO, is used to determine the number of classification levels and their labels.

SBD already used four classification levels and had also labeled them. The number of levels and the labels were maintained. The operations director stated: “I should keep the same amount of categories as (SBD) in this scheme, in that way we will always be in line with the policy that is given to us... Based on the policies we use and how we conduct business we are evaluated by HQ and therefore need to be in line with SBD”. The information was found i.a. by asking

they use. As derived from the literature review, the data owner or user must consider all the CIA triad properties when classifying information. Occasionally information cannot be assigned to a specific level of classification based on the requirements. Therefore a ‘default categorization’ needs to be defined. All information that cannot be labeled should by default be labeled as private, i.e. internal use only. In this way it will at least be protected against unauthorized disclosure, modification or disruption of access. Information should never be made public without approval by management.
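The default categorization rule can be expressed as a small guard around the labeling step. In the sketch below the label names follow the ISO example from section 3.5.2 and are an assumption, as are the function name `assign_label` and its parameters; the labels actually used by Stanley are those of table 6.

```python
from typing import Optional

# Assumed label names, taken from the ISO example in section 3.5.2.
LABELS = ("Public", "Internal Use", "Confidential", "Restricted")
DEFAULT_LABEL = "Internal Use"


def assign_label(proposed: Optional[str], approved_by_management: bool = False) -> str:
    """Apply the default categorization and the rule on publishing information."""
    if proposed not in LABELS:
        # Information that cannot be labeled falls back to internal use only.
        return DEFAULT_LABEL
    if proposed == "Public" and not approved_by_management:
        # Nothing becomes public without management approval.
        return DEFAULT_LABEL
    return proposed


print(assign_label(None))                                   # no label could be determined
print(assign_label("Public"))                               # not approved for publication
print(assign_label("Public", approved_by_management=True))  # approved for publication
```

The guard only ever moves information to a stricter label than proposed, so applying it cannot weaken the protection of an asset.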

Table 6 Classification Scheme for Stanley based on ISO and NIST

Stanley to identify and document all their information assets for eventually an effective governance of data. As stated in section 5.3 data classification can be a good starting point to implement a data governance framework.

Table 7 Illustration for the valuation of information assets

Table 7 is used as an illustration of how to valuate information assets and contains the main customer information assets of Stanley. Interviewees were asked to assess the impact of each

Stanley because of their organizational type (e.g. government). Only the sales director and operations director, or employees with additional screening (of good behavior), may access these files because of their high level of sensitivity. Due to the contractual agreement this information may not be stored electronically and needs to be stored in a safe. Note that not all information assets at Stanley are customer information assets; even though only customer information assets are illustrated in table 7, other information assets were identified during the case study (see table 6). It can be concluded that the optimal classification scheme for Stanley uses four levels of classification, in which information is classified based on the security requirements (i.e. the impact on all the CIA properties) and on regulations and/or contractual agreements.

According to ISO the access control policy needs to be in line with the classification scheme. The access rights could also be identified from the case study (see table 8) by asking the interviewees which information they have access to and whether other employees also need access to this information. Note that there is a hierarchy: individuals who have access to information in the level restricted also have access to the lower levels. Access rights for each employee should be given and approved by the data owner. The IT manager or the HR department of Stanley needs to function as an administrator who assigns the authentication and authorization rights to the employees. Each new employee requires a screening by the HR department, and based on his or her job requirements and associated information needs access rights can be assigned by HR, however only with approval of the data owner.
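The hierarchy of access rights described above, in which access to a level implies access to all lower levels and every clearance must be approved by the data owner, can be sketched as an ordered comparison. The level names follow the ISO example from section 3.5.2, and the `Clearance` record and `can_access` function are hypothetical names for this illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Classification levels ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL_USE = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class Clearance:
    employee: str
    level: Level                   # highest level the employee may read
    approved_by_data_owner: bool   # a clearance counts only after approval


def can_access(clearance: Clearance, label: Level) -> bool:
    """A clearance for one level also covers every lower level."""
    if label == Level.PUBLIC:
        return True
    return clearance.approved_by_data_owner and clearance.level >= label


# Hypothetical employee, for illustration only.
director = Clearance("sales director", Level.RESTRICTED, approved_by_data_owner=True)
print(can_access(director, Level.CONFIDENTIAL))
```

Because the levels are ordered integers, adding or removing a level only changes the enumeration and leaves the access check untouched.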

Even though ISO states that the data classification policy needs to be in line with the access policy, this does not fully apply to Stanley. Exceptions need to be made since, for example, the maintenance records and installation maps need to be available when technicians request them for maintenance or other duties at the customer site. The same applies to account managers. One of the project managers mentioned: “If one of our technician needs to perform a maintenance check he needs to take a look at the maintenance record of last year”. In addition to the project manager, one of the sales managers stated: “If one of our customers demands an increase in the amount of security services, e.g. cameras, my account manager needs to look at the contractual agreement and the installation map to become aware of the current amount of security services the customer obtains from us… To sell additional security services the account manager needs to know what the customer already has to see what is possible”.

So when setting up a classification scheme the high water mark can be applied; however, organizations need to realize how the information is being used by the owners and users. Sometimes a trade-off needs to be made between the security of information and the continuity of the business, and an exception is required. The quotes above illustrate that even though some information is categorized as restricted, e.g. installation maps, some employees do need access to it to perform their jobs effectively. These exceptions, however, need to be granted on an individual basis and approved by the data owner, since the data owner is also responsible for the access rights.
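Individually granted exceptions can be recorded next to the regular access rights, so that an employee without the required clearance still gets access to one specific asset once the data owner has approved it. In the sketch below the record type, the owner table and the employee names are hypothetical; only the rule itself (per person, per asset, approved by the data owner) comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessException:
    employee: str
    asset: str
    approved_by: str   # must be the data owner of the asset


# Hypothetical assignment of data owners to assets.
DATA_OWNERS = {"installation map": "operations director"}


def may_open(employee, asset, has_clearance, exceptions):
    """Allow access through regular clearance or an owner-approved exception."""
    if has_clearance:
        return True
    owner = DATA_OWNERS.get(asset)
    return any(
        e.employee == employee and e.asset == asset and e.approved_by == owner
        for e in exceptions
    )


# One technician receives an exception for one restricted asset.
granted = [AccessException("technician A", "installation map", "operations director")]
print(may_open("technician A", "installation map", False, granted))
print(may_open("technician B", "installation map", False, granted))
```

Keeping exceptions as explicit records, instead of lowering the classification of the asset, leaves the label intact and makes every deviation traceable to the data owner who approved it.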

6. CONCLUSION AND DISCUSSION

influence. For an effective implementation of data classification it is crucial to follow a sequence of steps. Simply following a predefined guideline, e.g. NIST or ISO, for data classification does not work. Therefore it is essential that senior management sets up a policy that can be used as guidance by the data owners. The policy needs to contain a clear description of the minimum number of levels of classification, each with an obvious label. In addition, the policy needs to include a clear description of all the CIA security properties and of when the impact of a compromise is high, medium, low or not applicable, i.e. when unauthorized disclosure, modification or disruption of the availability of the information asset will have a serious impact or not. Using all the CIA properties as security requirements provides a more thorough assessment of the sensitivity of the information asset. In addition, by using all the CIA properties an organization also assesses how or when users need data for their day-to-day activities, which is essential for identifying the access rights. The next step is to assign roles and responsibilities for the further implementation and use of data classification. Data owners need to be assigned, and senior management needs to communicate the policy to them. Then the policy needs to be tailored by the data owners based on the needs of their business unit and communicated to the users. Next the data owners and users need to classify the information. For an optimal classification scheme the high water mark concept needs to be applied, but exceptions to the access rights need to be made in order for the business to continue. When the method is fully implemented, periodic revisions need to take place in order to keep the classification of information up to date, because its value may change over time. One single optimal classification scheme for the whole organization is far-fetched because it involves different departments with different information assets and needs. By taking this contingency approach to implementing data classification, an optimal data classification scheme does exist for each business unit in the organization that will still be in line with the policy set up by senior management.

classification at an organization. Even though this thesis provides organizations with an approach to optimize data classification, it has its limitations. First of all, only one case study was performed; even though it resulted in valuable insights, this limits the generalizability of the thesis. Secondly, due to the time frame of this thesis not all data owners could be interviewed, which left the case organization with a starting point rather than a complete scheme. A third limitation was the lack of scientific literature available on data classification, which made it necessary to approach data classification from different research domains. This indicates that it is an understudied concept and made it more difficult to apply the theoretical findings in practice. However, it also illustrates its potential to grow as a subject for scientists to research. Hence further research is necessary to see if the real performance contribution of data classification to risk analysis and information security can be measured. Data classification also has the potential to be a good starting point for the implementation of a data governance framework, and therefore more research needs to be done on the best way to implement such a framework with the use of data classification.
