Probabilistic Risk Analysis in Business Continuity Management


University of Amsterdam, MSc Stochastics and Financial Mathematics, Master Thesis.
Probabilistic Risk Analysis in Business Continuity Management.
Author: Yu Xu (10311009), bluefishxy@gmail.com.
Supervisor: ing. Egbert Smit, ABN AMRO Bank N.V.
Examiner: Dr. Bert van Es, Universiteit van Amsterdam.
August 27, 2014.

To my parents and my girlfriend.

Abstract
This thesis explores the business structure and dependencies within a bank using Business Impact Analysis (BIA) and Business Continuity Risk Analysis (BCRA), with the aim of investigating the criticality and vulnerability to damage of its components. In the first stage, we query the connections between the processes of interest and the buildings or applications on which damages and attacks are observed; these connections are stored in a business graph database. The structure is then transformed into a quantitative network in which centralities are computed to describe its properties. In the second stage, we build a Bayesian network for the probabilistic analysis of business continuity management (BCM). The risk probabilities are estimated, and the dependencies between business components are represented by conditional probabilities. To answer probability queries given evidence, i.e. the observed values of certain components, Bayesian inference algorithms are proposed. To validate the accuracy of the input parameters, we present a sensitivity analysis that examines their interactions. An application of the probabilistic model is Value at Risk (VaR), which combines risk probability distributions and loss distributions to calculate the maximum loss that is not exceeded at a given confidence level.
Key words: Business Impact Analysis, Business Continuity Risk Analysis, Business Continuity Management, Bayesian network, Value at Risk.

Acknowledgement
At the beginning of 2014, I was wondering what I would do for my final thesis. Half a year later, I am sitting in a large, bright and busy office at Foppingadreef, enjoying my last month working at ABN AMRO Bank N.V. I did not have any working experience before this internship, but I have now fully adapted to daily working life. I have spent an amazing time at the bank and will never forget how this precious experience stimulated and enriched me. My daily supervisor at the bank, Egbert Smit, who was also my interviewer, gave me a lot of help with the research project. He taught me about business continuity management and guided me throughout my work. I would like to express my gratitude to my other colleagues in the BCM department: Marc van Doorenmaalen, Wim Hut, Marga de Lange, Irene Lewis, Robert Dieben and Nazima Guman. They never hesitated to help me when I asked. My thanks also go to the whole CISO MT, where I met many interesting people and learned a lot from them. I would especially like to thank Bert van Es, my supervisor at the university, who gave me helpful lectures as well as lecture notes. I had inspiring discussions with him during our meetings, and I appreciated his opinions and recommendations on my research. I would also like to thank Peter Spreij and Erik Winands, who gave me a lot of guidance in looking for an internship opportunity. Last but not least, I would like to thank my parents. We had video chats every week, and they always cared about me and encouraged me in my work, study and daily life. Thanks are also due to my girlfriend; we have been in a relationship for almost four years, two of which we spent living apart. She means a lot to me, and I apologize that I could not be with her when she needed me during these years. I love my family and all the friends who shared an exciting time with me in Amsterdam.
Yu Xu

Contents
Abstract
List of Figures
List of Tables
1 Introduction
  1.1 Business Continuity Management
  1.2 Literatures and Research Methods
  1.3 Thesis Structure
2 Business Impact Analysis
  2.1 Business Graph Database
    2.1.1 Business Structure in a Bank
    2.1.2 An Application of Query in Neo4j
  2.2 Business Network Analysis
    2.2.1 Basic Network Concept
    2.2.2 Centrality
    2.2.3 An Application of Centrality in Gephi
3 BCRA: Model Description
  3.1 Graphs
  3.2 Bayesian Network
    3.2.1 Independencies
    3.2.2 Factorization
  3.3 Local Probabilistic Models
    3.3.1 Tabular CPDs
    3.3.2 Noisy-OR CPDs
  3.4 Application of a Bayesian Network in Business Structure
4 BCRA: Risk Probability Analysis
  4.1 Risk Data
    4.1.1 Natural Disaster
    4.1.2 Other Risks
  4.2 Distribution Fitting
    4.2.1 Goodness of Fit
    4.2.2 Pareto Distribution
    4.2.3 Application
  4.3 Parameter Uncertainty
    4.3.1 Bootstrap Confidence Interval
    4.3.2 Application
  4.4 Parameter Learning
    4.4.1 Dirichlet Distribution
    4.4.2 Bayesian Estimation
    4.4.3 Application
5 BCRA: Bayesian Network Analysis
  5.1 Bayesian Inference
    5.1.1 Inference Algorithms
    5.1.2 Inference Scenario Tests
  5.2 Sensitivity Analysis
    5.2.1 Functional Relationship
    5.2.2 Application
6 BCRA: Value at Risk
  6.1 Definition
  6.2 Application
7 Conclusion and Suggestions
  7.1 Conclusion
  7.2 Suggestions
    7.2.1 Bayesian Network
    7.2.2 Probability Analysis
    7.2.3 VaR
A Maximum Likelihood Estimation of α
B Sample Results of Other Risks
  B.1 Wind Speed Data
  B.2 Water Level Data

List of Figures
1.1 Bank Risk Taxonomy
1.2 Business Structure In A Bank
1.3 Phases of BIA
1.4 Phases of BCRA
2.1 Graph Elements and Properties
2.2 Overview Nodes and Relationships of Business Structure
2.3 Business Processes Nodes with Names and IDs
2.4 Overview of Output Result in Neo4j under the Restriction of 10 Applications
2.5 Undirected and Directed Graph
2.6 Node with High Betweenness Centrality and Low Degree Centrality
2.7 Data Importing from Neo4j
2.8 Grouped, Colored Network and Layout in Good Visualization
2.9 Nodes Properties and Metrics in Data Laboratory
2.10 Out-degree of “Business Process” and in-degree of “Application”. “BP24” has the smallest out-degree among the “Business Process” nodes (one process and one RTO). “APP1045” and “APP1033” have the largest in-degree among the “Application” nodes; both are used by 21 processes
2.11 Betweenness centrality of “Process”. The most influential node under this measure is “P352”
2.12 In-degree and eigenvector centrality of “Building”. Both metrics indicate that “BU10016” has more impact on business lines than the other buildings
2.13 Out-degree, betweenness and eigenvector centrality of “Business Line”. “BL5011” has the lowest out-degree (it is located in only one building), and the largest values of betweenness and eigenvector centrality belong to “BL5017” and “BL5016”
3.1 An Example of Directed Graph G
3.2 Flow of Probabilistic Influence in Bayesian Network
3.3 Noisy-OR Model
3.4 Building Risk Model
3.5 CPD of Risk Nodes, Building Nodes and Business Process Nodes in Different Types
3.6 Application Risk Model
4.1 Netherlands earthquake magnitude data from KNMI, 1990 to 2014
4.2 Water level and flow strength reference example of Katerveer, the Netherlands
4.3 ECDF and Fitted CDF
4.4 Density of Four Fitted Distributions with the Empirical Data
4.5 ECDF and Fitted CDF with Limit
4.6 Log-log Scale Plot and Fitted Distributions
4.7 Bootstrap Replications of α
4.8 QQ-Plot of α and log(α)
4.9 Ratio Fluctuation of (α_i + X_i)/(α + X) During Learning
4.10 Random Numbers Generated from Dirichlet Distributions
4.11 Surface of Dirichlet(95, 4, 1), 1000 samples
5.1 P(Process | Hack) vs P(Hack)
5.2 P(fail P124 | yes Hack) vs P(fail P232 | fail P124)
5.3 P(fail P232 | yes Hack) vs P(fail P232 | fail P124)
5.4 (a) P(fail P124 | yes Hack) vs P(fail P121 | fail P124); (b) P(fail P124 | yes Hack) vs P(fail P236 | fail P124); (c) P(fail P124 | yes Hack) vs P(fail P234 | fail P124)
6.1 Aggregated Loss Distribution
6.2 Loss Distribution in Log-log Scale
6.3 Simulated Loss Data
6.4 Aggregated Loss Distribution: (a) Small Loss Amounts (b) Large Loss Amounts
7.1 VaR, CVaR Deviation
B.1 Netherlands Wind Speed Data and Fitted Density
B.2 Pareto Fitting with Best Fit but Too Large x_m = 17.3 (close to 20)
B.3 Pareto Fitting with Good Fit and Appropriate x_m = 10.3 (far from 20)
B.4 CDF of Arnhem and Culemborg brug
B.5 CDF of Den Helder and Deventer
B.6 CDF of Eemshaven and IJmuiden
B.7 CDF of Katerveer and Roermond boven
B.8 CDF of Rotterdam and Westkapelle

List of Tables
3.1 A summary of whether a trail is active, depending on whether W belongs to the evidence set Z
3.2 An example of the CPT of Business Process 2 in Figure 3.2
3.3 Parameter table of the Noisy-OR model when n = 3
3.4 CPT of an effect variable with 3 parents, transformed by the Noisy-OR model
3.5 Noisy-OR model of the example in Figure 3.2
3.6 CPT including the leak state in the Noisy-OR model
3.7 An example of a Noisy-MAX distribution
3.8 Transformation of a Noisy-MAX distribution
4.1 Details of Netherlands earthquake data in 1993
4.2 Details of flood data from KNMI, Katerveer, the Netherlands, 12/10/2000
4.3 Details of wind data (in 0.1 m/s) from USGS, New York, the USA, 08/01/2001 to 10/01/2001
4.4 Fitting Results of Four Distributions
4.5 Comparison of K-S Statistic Values, Original vs Limited
4.6 Parameter combinations of the Pareto distribution with different K-S statistic values
4.7 Probability of Earthquake with Different Magnitude Intervals
4.8 95% Confidence intervals of α
4.9 Lower bound and upper bound for the 1 in X years earthquake
4.10 Learning Results of Different α
5.1 Sampling Algorithm Efficiency
5.2 Inference Results: Evidence on Risk
5.3 Inference Results: Evidence on Business Process
5.4 Sensitivity between “Hack” and Processes
5.5 Sensitivity between “Power outage” and Processes
5.6 Sensitivity analysis, nodes of interest are in the set of observed nodes
6.1 VaR of Different Confidence Levels
6.2 Probability that Loss Exceeds Certain Threshold
B.1 Pareto and Generalized Pareto K-S statistics in 10 water stations

Chapter 1. Introduction
In the first chapter, we motivate and introduce the subject of this thesis. The introduction starts with some background information on Business Continuity Management (BCM) and the strategies that are applied. Thereafter, the relevant literature and research methods are explained briefly. Finally, the structure of the thesis is presented.

1.1 Business Continuity Management
BCM is a management process that identifies potential threats to an organisation and the impacts to business operations those threats, if manifest, might cause, and which provides a framework for building organisational resilience with the capability for an effective response that safeguards the interests of its stakeholders, reputation, brand and value creating activities [1]. The process includes moving (recovering) operations to another location if a disaster occurs at a worksite or data centre. The scope of planning we focus on should include recovery from disasters of different levels, ranging from short, localized disasters, to problems lasting several days across several buildings, to the permanent loss of a building [2]. The risks that BCM is concerned with are part of operational risk (see the risk taxonomy in Figure 1.1).

The analysis phase consists of business impact analysis (BIA) and business continuity risk analysis (BCRA). The BIA is an essential component of a bank's business continuity management; it focuses on understanding the criticality and vulnerability of the bank's processes and their dependencies [3]. Some processes are more crucial than others and require a greater allocation of funds, both for measures that prevent a disaster from occurring and for additional measures that mitigate its impacts. In this thesis, we use a graph database to describe the properties and relationships of the internal business structure, which enables us to execute queries under specific conditions.
In addition, social network indicators are applied to evaluate the criticality and vulnerability of the different groups of business components.

Business continuity risk analysis (BCRA) investigates possible threats in detail, the probability of those threats, and experiments on different scenarios [2]. Common threats include earthquakes, floods, hurricanes or other major storms, power outages, pandemics, fire, cyber-attacks and random failures of systems. These threats are observed on buildings or applications. To investigate the links between risks and critical processes, we build a Bayesian network over the business structure, in which the dependencies between components are expressed as conditional distributions.

Figure 1.1: Bank Risk Taxonomy

A typical method to assess threats is to fit risk data to a probability distribution and to set thresholds based on severity levels, so that the probability of exceeding each critical point can be read from the cumulative distribution function. Another approach is to learn the parameters of the Bayesian network directly from the available data, which yields a posterior distribution when prior information with a particular level of confidence is available. Both approaches estimate the occurrence of incidents while taking mitigation measures into account. To investigate the effect of inaccurate assessments, a sensitivity analysis is performed, which describes quantitatively how variations in the parameters affect the output. Moreover, Bayesian inference allows us to compute the conditional probabilities of interest when threats are observed.

An important application of quantitative analysis in BCM is Value at Risk (VaR). In BCM, VaR is defined as the maximum loss that an organization will not exceed over a given time horizon at a given confidence level. It is a useful estimate that indicates how much capital an organization should reserve each year for potential risks. VaR in BCM usually considers the probability of an event happening within one year instead of its frequency distribution. A loss distribution is combined with a risk probability distribution into an aggregated loss distribution, from which VaR is calculated. Expected and unexpected losses can then be identified, which provides an estimate of provisions and capital requirements.

1.2 Literatures and Research Methods
The ideas behind the BIA are described widely in the literature, and many banks are also working on it [4]. In practice, the BIA is an important process that examines business processes to determine and list the critical processes that are vital to keep the business going. It is necessary to understand the business environment, gather data and information, identify the critical processes needed to carry out vital business operations, and finally prepare a BIA report listing the findings, to be submitted to top management.

Figure 1.2: Business Structure In a Bank

Potential impacts of an outage are identified, and the corresponding recovery time objectives (RTOs) are determined. Furthermore, the financial impacts of process outages for specific functions are considered. Figure 1.3 gives an overview of the flow of the BIA.

Figure 1.3: Phases of BIA

BCRA includes risk assessment: it lists the types of outages a bank is likely to suffer in a year and its vulnerability to specific outages such as power failures, building fires and so forth. In the risk assessment, we identify the most probable threats to the organization, as determined under Basel II [2], and analyze the related vulnerabilities. H. Chen and A. Pollino [5] expressed the basic ideas of a Bayesian analysis of BCRA. In this method, causes and effects in BCRA are represented in a Bayesian network, where the business structure of an organization is reflected in the interactions between variables by a graph with nodes and arcs. The strength of these relationships is defined in the conditional probability tables (CPTs) attached to each node.

Figure 1.4: Phases of BCRA

CPTs specify the degree of belief (probabilities) that a node will be in a particular state given the states of its parent nodes (the nodes that directly affect it). The probabilities of risk nodes are obtained from risk data, which are usually fitted to probability distributions. In addition, the foundations of Bayesian networks and their probabilistic analysis are well established by D. Koller and N. Friedman [6], who also introduce well-known inference algorithms. M.H. Coupe and L.C. van der Gaag gave an approach to sensitivity analysis within a Bayesian network [7]. Finally, loss distributions and VaR solutions for BCRA can be found in the paper by E. Navarrete [8].

In this thesis, we extend the probabilistic analysis of previous research in several ways. First, we introduce social network metrics such as centralities, which are good indicators of criticality and vulnerability, and apply them to the business world. Furthermore, statistical techniques such as Monte Carlo simulation and the bootstrap method are used extensively in the analysis. Finally, for heavy-tailed events, VaR remains a useful measure of loss at a particular confidence level in BCM, which extends its range of application.

1.3 Thesis Structure
The outline of the thesis is as follows:
• In Chapter 2, we build a business graph database that contains the business components and their dependencies. We can query the database to find the influence of business process outages within the business structure. Subsequently, the network structure is transferred to Gephi so that centralities can be calculated. The analysis is illustrated by an example. This network is based on the existing network structure of ABN AMRO.
• In Chapter 3, we introduce graph theory and construct Bayesian networks based on its fundamental concepts. We give a detailed derivation of the properties of Bayesian networks in terms of mathematical definitions and theorems. Furthermore, two types of conditional probability tables, tabular and Noisy-OR (Noisy-MAX), are introduced and compared. Building risk and application risk models are presented as an application, based on the bank's business structure.

• In Chapter 4, we fit the risk data to suitable probability distributions and estimate event probabilities given thresholds. Goodness-of-fit techniques help us to select an appropriate distribution. In particular, the Pareto distribution is suitable for events whose density is monotonically non-increasing but decays slowly (a heavy tail). Another method to estimate risk probabilities is Bayesian estimation: using a Dirichlet distribution as the prior, the probability distribution is updated with incoming data.
• In Chapter 5, Bayesian inference algorithms are used to calculate the conditional probability of every node given particular evidence. We discuss the differences between the algorithms in terms of calculation speed and accuracy. Thereafter, sensitivity analysis methods are defined and applied to the Bayesian network. This allows us to assess the effect of inaccurate parameter inputs and to find the functional relationship between the conditional probability of a node of interest and the parameter under study.
• In Chapter 6, we apply VaR in BCM analysis, combining the event distribution and the loss distribution to obtain an aggregated loss distribution. VaR values are obtained as percentiles at the confidence levels of interest. In particular, Monte Carlo simulation is again used to generate loss samples and to validate the accuracy of the mean and percentiles.
• In Chapter 7, we give a short conclusion and put forward some suggestions for future research that is not covered in this thesis.
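The aggregated-loss VaR described in this chapter can be illustrated with a minimal Monte Carlo sketch. It rests on assumptions that are not taken from the thesis: each threat occurs at most once per year with a fixed annual probability, threats are independent (the thesis instead models dependencies with a Bayesian network), and the loss given occurrence is lognormal. The function names and all numbers in the usage block are hypothetical and illustrative, not ABN AMRO data. VaR is read off as an empirical quantile of the simulated yearly totals.

```python
import math
import random


def simulate_annual_losses(events, n_years, seed=0):
    """Monte Carlo sample of the aggregated yearly loss.

    events: list of (p_year, mu, sigma). Each event happens at most once per
    year with probability p_year; if it happens, its loss is drawn from a
    lognormal distribution with log-mean mu and log-sd sigma (an assumption).
    Events are treated as independent in this sketch.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_years):
        total = 0.0
        for p_year, mu, sigma in events:
            if rng.random() < p_year:
                total += rng.lognormvariate(mu, sigma)
        totals.append(total)
    return totals


def value_at_risk(losses, level):
    """Empirical VaR: the smallest sampled loss L with P(loss <= L) >= level."""
    ordered = sorted(losses)
    k = math.ceil(level * len(ordered)) - 1
    return ordered[max(k, 0)]


if __name__ == "__main__":
    # Hypothetical annual probabilities and loss parameters (illustrative only).
    events = [
        (0.02, math.log(5e6), 1.0),  # e.g. a building becomes unavailable
        (0.10, math.log(2e5), 1.5),  # e.g. a power outage
        (0.30, math.log(5e4), 2.0),  # e.g. a cyber-attack on an application
    ]
    losses = simulate_annual_losses(events, n_years=100_000, seed=42)
    print(f"expected yearly loss: {sum(losses) / len(losses):,.0f}")
    for level in (0.95, 0.99, 0.999):
        print(f"VaR at {level:.1%}: {value_at_risk(losses, level):,.0f}")
```

Because the estimate is an empirical quantile, its precision at high levels such as 99.9% depends on the number of simulated years; rerunning with different seeds gives a rough check of its stability.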

Chapter 2. Business Impact Analysis
In this chapter, we analyze the interconnections of process elements in different groups and describe their criticality and vulnerability using social network characteristics. Two tools are used to implement the models: Neo4j and Gephi. We then present the modeling procedures and some example results.

2.1 Business Graph Database
Relational database systems are generally efficient unless the data contain many relationships, which require joins over large relationship tables. In general, a graph database performs better than a relational database on structural queries. A graph database stores data in a graph, the most generic of data structures, which can represent any kind of data in a highly accessible way. As shown in Figure 2.1, a graph consists of nodes (vertices) with labels, which are used to group the nodes and to restrict queries to subsets of the graph. Relationships (arcs) organize nodes into arbitrary structures and carry properties that can identify different types of dependencies [9].

Figure 2.1: Graph Elements and Properties

2.1.1 Business Structure in a Bank
In the existing business structure of a bank (Figure 2.2), nodes represent business processes, sub-processes, RTOs (recovery time objectives), applications, business lines and buildings in different locations; each node carries its corresponding name and unique ID; relationships are defined as 1-to-n mappings such as ‘located in’, ‘contains’ and ‘has’, or m-to-n mappings such as ‘use’.

Figure 2.2: Overview Nodes and Relationships of Business Structure

At the level of interest, about 20 business processes are involved in the business structure, with approximately 200 sub-processes, 70 locations, 150 applications, 6 RTOs and 20 business lines. All of these elements can be displayed in a graph network once the necessary information is imported. To do this, we use the data, collected in Excel, to create Cypher query statements and run them in Neo4j (see Figure 2.3). Neo4j is an open-source graph database implemented in Java [10]. It supports the Cypher query language, a declarative graph query language that allows expressive and efficient querying and updating of the graph store, to create or retrieve information according to our interests [11]. Results can be obtained simply and intuitively in a visual interface.

Figure 2.3: Business Processes Nodes with Names and IDs

2.1.2 An Application of Query in Neo4j
Let us consider an example. Suppose we want to know “Which applications are used by business processes that have an RTO of 0-2 hrs and that are located in Building 1?”. Looking at the structure in Figure 2.2, we can trace the interconnections of the nodes we are interested in. First, applications are used by processes, which are included in business processes. Subsequently, those business processes are used by certain business lines, some of which are located in Building 1. Finally, only the business processes that have an RTO of 0-2 hrs meet our requirement. This leads directly to the following statement:

    // Business processes with an RTO of 0-2 hrs, the target building and candidate applications
    MATCH (rto:RTO {name: "0-2hrs"})<-[:BUSINESSPROCESS_HAS_RTO]-(bp:BusinessProcess),
          (bul:Building {name: "Building1"}),
          (a:Application)
    // Keep combinations connected within 3 hops (building) and 2 hops (application)
    MATCH p1 = shortestPath((bp)-[*..3]-(bul)),
          p2 = shortestPath((bp)-[*..2]-(a))
    RETURN p1, p2, rto
    LIMIT 10;

In Neo4j, we enter the statement in the command window and execute it. After a short time, the nodes and paths that meet our restrictions are displayed in the output area (Figure 2.4). For further investigation, we can right-click a node to see its properties and double-click it to expand the nodes connected to it. In the output network, we limit the result to 10 applications (in yellow) that fit our requirement. The output shows clearly how the applications and the target building (in cyan) are interconnected. Damage observed on Building 1 can affect some applications through the related business lines (in pink) and business processes (in red), which have an RTO of 0-2 hrs (in white). The results can also be exported as a table to Excel, which enables further analysis. If the business structure changes, the network can be updated with standard operations: we can add or delete nodes and their labels, and modify the relationships between them if necessary.
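The semantics of the Neo4j query above can be made explicit with a minimal Python sketch that reproduces its logic on a small in-memory graph. A breadth-first search ignores edge direction, as the undirected pattern -[*..k]- does, and keeps only nodes within the hop bound. The node names, relationship types and the tiny graph itself are hypothetical stand-ins for the bank's data, not the actual Neo4j schema, and unlike Cypher the sketch returns application names rather than paths.

```python
from collections import deque

# Hypothetical miniature business graph: node -> label.
NODES = {
    "BP1": "BusinessProcess", "BP2": "BusinessProcess",
    "RTO_0_2": "RTO", "RTO_2_8": "RTO",
    "P1": "Process", "P2": "Process", "P3": "Process",
    "BL1": "BusinessLine", "BL2": "BusinessLine",
    "Building1": "Building", "Building2": "Building",
    "APP1": "Application", "APP2": "Application", "APP3": "Application",
}
RTO_NAME = {"RTO_0_2": "0-2hrs", "RTO_2_8": "2-8hrs"}
# Directed, typed relationships: (source, type, target). Types are hypothetical.
EDGES = [
    ("BP1", "HAS_RTO", "RTO_0_2"), ("BP2", "HAS_RTO", "RTO_2_8"),
    ("BP1", "CONTAINS", "P1"), ("BP1", "CONTAINS", "P2"), ("BP2", "CONTAINS", "P3"),
    ("P1", "USES", "APP1"), ("P2", "USES", "APP2"), ("P3", "USES", "APP3"),
    ("BL1", "USES", "BP1"), ("BL2", "USES", "BP2"),
    ("BL1", "LOCATED_IN", "Building1"), ("BL2", "LOCATED_IN", "Building2"),
]


def undirected_adjacency(edges):
    """Neighbour sets that ignore direction, like the pattern -[*..k]- in Cypher."""
    adj = {node: set() for node in NODES}
    for src, _rel_type, dst in edges:
        adj[src].add(dst)
        adj[dst].add(src)
    return adj


def within_hops(start, adj, max_hops):
    """Breadth-first search: shortest hop count to every node at most max_hops away."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop bound
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def applications_for(rto_name, building, limit=10):
    """Applications within 2 hops of a business process that has the given RTO
    and lies within 3 hops of the given building."""
    adj = undirected_adjacency(EDGES)
    found = []
    for bp, label in NODES.items():
        if label != "BusinessProcess":
            continue
        has_rto = any(src == bp and rel == "HAS_RTO" and RTO_NAME.get(dst) == rto_name
                      for src, rel, dst in EDGES)
        if not has_rto or building not in within_hops(bp, adj, 3):
            continue
        near = within_hops(bp, adj, 2)
        found.extend(sorted(n for n in near if NODES[n] == "Application"))
    return found[:limit]


print(applications_for("0-2hrs", "Building1"))  # ['APP1', 'APP2']
```

Breadth-first search is used because it finds shortest hop counts in an unweighted graph, which is what the hop-bounded shortest-path patterns in the query rely on.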

Figure 2.4: Overview of Output Result in Neo4j under the Restriction of 10 Applications

2.2 Business Network Analysis
2.2.1 Basic Network Concept

Figure 2.5: Undirected and Directed Graph ((a) undirected, (b) directed)

We now look at further characteristics used in network analysis. First we need some basic mathematical concepts for networks. The definitions of the network metrics are provided by I. Robinson, J. Webber and E. Eifrem [9]. In the mathematical literature, a network is referred to as a “graph”. A graph is an ordered pair G = (V, E) comprising a set V of nodes together with a set E of edges (or lines), which are 2-element subsets of V (i.e., an edge is associated with two nodes and is represented as the pair of nodes it connects). The most common graphs are undirected graphs and directed graphs [12][13]. An undirected graph is a graph, i.e., a set of nodes connected together, in which all the edges are bidirectional; it is sometimes called an undirected network. In contrast, a graph whose edges point in a direction is called a directed graph. A mathematical explanation of directed graphs is given in Section 3.1.

Consider an undirected network with n nodes. Let us label the nodes with the integers 1, ..., n and denote an edge between nodes i and j by (i, j); the complete network can then be specified by the value of n and a list of all the edges. A convenient representation of a network is the adjacency matrix. The adjacency matrix A of a simple graph is the matrix with elements A_{ij} such that

  A_{ij} = \begin{cases} 1, & \text{if there is an edge between nodes } i \text{ and } j, \\ 0, & \text{otherwise.} \end{cases} \qquad (2.1)

For example, the undirected network in Figure 2.5(a) can be represented as

  A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}. \qquad (2.2)

The matrix is symmetric, since the connection between two nodes is reciprocal. Similarly, the adjacency matrix of a directed network has elements

  A_{ij} = \begin{cases} 1, & \text{if there is an edge from node } i \text{ to node } j, \\ 0, & \text{otherwise.} \end{cases} \qquad (2.3)

As an example, the adjacency matrix of the directed network in Figure 2.5(b) is

  A = \begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}. \qquad (2.4)

Note that this matrix is not symmetric; in general, the adjacency matrix of a directed network is asymmetric.

In the undirected network of Figure 2.5(a), each node can reach every other node along a closed path, like a circle. A cycle in a directed network is a closed loop of edges with the arrows on each of the edges pointing the same way around the loop. Some directed networks have no cycles; these are called acyclic networks, while graphs with cycles are called cyclic. The directed network in Figure 2.5(b) is acyclic: by (2.4), every edge points from a lower-numbered node to a higher-numbered one, so no path can return to its starting node. A typical directed acyclic network is a Bayesian network, which can represent probabilistic relationships between random variables. We will see more formal explanations and applications of Bayesian networks in Chapter 3. To measure the connection properties within a graph, we use the degree of a node, which is the number of edges connected to it.
We denote the degree of node $i$ by $k_i$. For an undirected graph of $n$ nodes, the degree can be written in terms of the adjacency matrix as

$$k_i = \sum_{j=1}^{n} A_{ij}. \tag{2.5}$$

Every edge in an undirected graph has two ends, so if there are $m$ edges in total there are $2m$ ends of edges. The number of ends of edges is also equal to the sum of the degrees of all the nodes, so

$$m = \frac{1}{2} \sum_{i=1}^{n} k_i. \tag{2.6}$$

The mean degree $c$ of a node in an undirected graph is

$$c = \frac{2m}{n}. \tag{2.7}$$

Node degrees are more complicated in directed networks, where each node has two degrees. The in-degree is the number of incoming edges connected to a node and the out-degree is the number of outgoing edges. Recalling that the adjacency matrix of a directed network has element $A_{ij} = 1$ if there is an edge from $i$ to $j$, the in- and out-degrees can be written as

$$k_i^{\text{in}} = \sum_{j=1}^{n} A_{ji}, \qquad k_j^{\text{out}} = \sum_{i=1}^{n} A_{ji}. \tag{2.8}$$

The number of edges $m$ in a directed network equals the total number of incoming ends of edges at all nodes, or equivalently the total number of outgoing ends, so

$$m = \sum_{i=1}^{n} k_i^{\text{in}} = \sum_{j=1}^{n} k_j^{\text{out}}. \tag{2.9}$$

Thus the mean in-degree $c_{\text{in}}$ and the mean out-degree $c_{\text{out}}$ are equal in every directed network:

$$c_{\text{in}} = \frac{m}{n} = c_{\text{out}}. \tag{2.10}$$

Another important property is the path. A path in a network is any sequence of nodes such that every consecutive pair of nodes in the sequence is connected by an edge in the network. In a directed network, each edge traversed by a path must be traversed in the direction of that edge; in an undirected network, edges can be traversed in either direction. Paths in this sense may revisit nodes and edges. A more formal treatment of paths is given in Chapter 3. The length of a path is the number of edges traversed along it (not the number of nodes).

It is straightforward to count the paths of a given length $r$ in a network. For either a directed or an undirected simple graph, the element $A_{ij}$ is 1 if there is an edge from node $i$ to node $j$ and 0 otherwise (for undirected graphs $A_{ij} = A_{ji}$). Then the product $A_{ik}A_{kj}$ is 1 if there is a path of length 2 from $i$ to $j$ via $k$, and 0 otherwise. The total number of paths of length two from $i$ to $j$, via any other node, is

$$N_{ij}^{(2)} = \sum_{k=1}^{n} A_{ik} A_{kj} = [A^2]_{ij}, \tag{2.11}$$

where $[\dots]_{ij}$ denotes the $ij$-th element of a matrix. Generalizing from $r = 2$ to paths of arbitrary length $r$, we obtain

$$N_{ij}^{(r)} = [A^r]_{ij} = \sum_{k_1, \dots, k_{r-1}} A_{i k_1} A_{k_1 k_2} \cdots A_{k_{r-1} j}. \tag{2.12}$$
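The quantities in (2.1)-(2.12) can be checked on the two networks of Figure 2.5. The following minimal sketch, in plain Python with helper names of our own choosing, builds the adjacency matrices (2.2) and (2.4), computes degrees, and counts paths of length $r$ through matrix powers. It also illustrates a standard fact not stated in the text: a directed network is acyclic exactly when some power of its adjacency matrix is the zero matrix, and here the cube of (2.4) vanishes.

```python
# Adjacency matrices of Figure 2.5, copied from (2.2) (undirected) and (2.4) (directed).
A_UND = [[0, 1, 1],
         [1, 0, 1],
         [1, 1, 0]]
A_DIR = [[0, 1, 1],
         [0, 0, 1],
         [0, 0, 0]]


def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def matpow(A, r):
    """A^r; by (2.11)-(2.12), entry [i][j] counts paths of length r from i to j."""
    n = len(A)
    R = [[int(i == j) for j in range(n)] for i in range(n)]  # identity matrix
    for _ in range(r):
        R = matmul(R, A)
    return R


def degrees(A):
    """Undirected degrees k_i = sum_j A_ij, as in (2.5)."""
    return [sum(row) for row in A]


def in_out_degrees(A):
    """Directed case (2.8): in-degree = column sum, out-degree = row sum."""
    n = len(A)
    k_in = [sum(A[j][i] for j in range(n)) for i in range(n)]
    k_out = [sum(A[i][j] for j in range(n)) for i in range(n)]
    return k_in, k_out


k = degrees(A_UND)
m_und = sum(k) // 2                 # (2.6): each edge has two ends
c = 2 * m_und / len(A_UND)          # (2.7): mean degree
k_in, k_out = in_out_degrees(A_DIR)
print(k, m_und, c)                  # degree sequence, edge count, mean degree
print(k_in, k_out)                  # in- and out-degrees of the directed network
print(matpow(A_UND, 2))             # numbers of length-2 paths, as in (2.11)
```

The diagonal of the squared undirected matrix equals each node's degree, since a length-2 path may go out along an edge and straight back, which is exactly why such "paths" are allowed to revisit nodes.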

As a consequence, we can calculate the number of paths between any two nodes of a network.

2.2.2 Centrality

One of the most important metrics in network analysis is centrality [12]. The centrality of a node measures its relative importance within a graph. The concept was first developed in social network analysis, and many of the terms used to measure centrality reflect this sociological origin; nonetheless, the methods are now widely used outside the social sciences [12]. Applications of centrality include how influential a person is within a social network, how important a room is within a building, how critical a hub is in the Internet, and how well-used a road is within an urban network. Here we apply it to quantify our business structure. In this section three measures of centrality are introduced: degree, betweenness and eigenvector centrality [14]. We review their definitions and their application in business network analysis.

The simplest centrality measure is just the degree of a node, the number of edges connected to it (see Section 2.2.1). Degree is sometimes called degree centrality in the social networks literature, to emphasize its use as a centrality measure. In directed networks, nodes have both an in-degree and an out-degree, and both may be useful as centrality measures in the appropriate circumstances. Although degree centrality is simple, it can be very illuminating. In the business structure of a bank, for instance, edges from "Process" nodes contribute to the in-degree of a "Business Line", while edges from a "Business Line" to "Building" nodes are counted in its out-degree. Business lines with a higher in-degree (critical index) are more critical, because an incident affecting them may impact more processes.
On the other hand, business lines with a lower out-degree (vulnerability index) are more vulnerable, since the failure of only one or a few buildings may cause an outage.

A different concept is betweenness centrality, which measures the extent to which a node lies on paths between other nodes [15]. Nodes with high betweenness centrality may have considerable influence within a network by virtue of their control over information passing between others. Betweenness is measured along shortest (geodesic) paths. Let $n_{st}^i$ be 1 if node $i$ lies on the shortest path from $s$ to $t$, and 0 if it does not or if there is no such path. Then the betweenness centrality $x_i$ is given by

$$x_i = \sum_{st} n_{st}^i. \tag{2.13}$$

In general there may be several shortest paths between $s$ and $t$. We then redefine $n_{st}^i$ to be the number of shortest paths from $s$ to $t$ that pass through $i$, and define $g_{st}$ to be the total number of shortest paths from $s$ to $t$. The betweenness centrality of node $i$ is then

$$x_i = \sum_{st} \frac{n_{st}^i}{g_{st}}, \tag{2.14}$$

with the convention that terms with $g_{st} = 0$ are zero.

A node can have quite a low degree, be connected to other nodes that have a low degree, even be a long way from others on average, and still have high betweenness. Figure 2.6 is an example: node A lies on a bridge between two groups within the network. Since any path between a node in one group and a node in the other must pass along this bridge, A acquires very high betweenness even though its degree centrality is only 2, and it may have a lot of influence in the network because it controls the flow of information between the groups. Such a node is a single point of failure: the failure of A can cause an outage of its corresponding processes. The normalized betweenness centrality is the ordinary betweenness centrality (2.14) divided by $(n-1)(n-2)$, where $n$ is the total number of nodes in the network; when the sum in (2.14) runs over ordered pairs $(s, t)$ of distinct nodes other than $i$, this is the number of such pairs. The normalized betweenness therefore lies in the interval $[0, 1]$; it equals 1 only for a node that lies on every shortest path between every pair of other nodes, such as the centre of a star. A single point of failure such as A scores high, but generally below 1.

Figure 2.6: Node with High Betweenness Centrality and Low Degree Centrality

A natural extension of degree centrality is eigenvector centrality [16]. In many circumstances a node's criticality in a network is increased by having connections to other nodes that are themselves important. Instead of awarding a node one point for each neighbour, eigenvector centrality gives each node a score proportional to the sum of the scores of its neighbours. Let us make an initial guess about the centrality $x_i$ of each node $i$, for instance $x_i = 1$ for all $i$. We can use this guess to calculate a better one, defined as the sum of the centralities of $i$'s neighbours:

$$x_i' = \sum_{j \neq i} A_{ji} x_j, \tag{2.15}$$

where $A_{ji}$ is an element of the adjacency matrix of the undirected graph, so both incoming and outgoing neighbours are counted. In matrix notation this reads $\mathbf{x}' = A\mathbf{x}$, where $\mathbf{x}$ is the vector with elements $x_j$.
Repeating this process to make better estimates, after $t$ steps we have the vector of centralities

$$\mathbf{x}(t) = A^t \mathbf{x}(0). \tag{2.16}$$

Now write $\mathbf{x}(0)$ as a linear combination of the eigenvectors $\mathbf{v}_i$ of the adjacency matrix,

$$\mathbf{x}(0) = \sum_i c_i \mathbf{v}_i \tag{2.17}$$

for some appropriate choice of constants $c_i$. Then

$$\mathbf{x}(t) = A^t \sum_i c_i \mathbf{v}_i = \sum_i c_i \lambda_i^t \mathbf{v}_i = \lambda_1^t \sum_i c_i \left(\frac{\lambda_i}{\lambda_1}\right)^t \mathbf{v}_i, \tag{2.18}$$

where the $\lambda_i$ are the eigenvalues of $A$ and $\lambda_1$ is the largest of them. Since the adjacency matrix $A$ of an undirected graph is real and symmetric, all its eigenvalues are real and it has a full set of eigenvectors, so the expansion (2.17) exists [17]. Provided $|\lambda_i / \lambda_1| < 1$ for all $i \neq 1$ (that is, $\lambda_1$ is strictly larger in absolute value than every other eigenvalue), all terms in the sum other than the first decay exponentially as $t$ grows, and in the limit $t \to \infty$ we get $\mathbf{x}(t) \to c_1 \lambda_1^t \mathbf{v}_1$; that is, the direction of $\mathbf{x}(t)$ converges to that of $\mathbf{v}_1$. In other words, the limiting vector of centralities is simply proportional to the leading eigenvector of the adjacency matrix. Equivalently, the centrality $\mathbf{x}$ satisfies

$$A\mathbf{x} = \lambda_1 \mathbf{x}. \tag{2.19}$$

This is the eigenvector centrality. By the Perron-Frobenius theorem [12], the leading eigenvector can be chosen with non-negative entries, and, as promised, the centrality $x_i$ of node $i$ is proportional to the sum of the centralities of $i$'s neighbours:

$$x_i = \lambda_1^{-1} \sum_j A_{ij} x_j, \tag{2.20}$$

which gives eigenvector centrality the useful property that it can be large either because a node has many neighbours or because it has important neighbours (or both). A "Business Line" in a business network, for instance, can be important by this measure because it is linked to many processes (even though those processes may be less important) or because it is connected to processes that are critical. Similarly, an "RTO" node related to more critical processes may have a higher eigenvector centrality, showing that the "RTO" itself is also critical.

2.2.3 An Application of Centrality in Gephi

We now describe the application of centrality to the business data model shown in Figure 2.2, using Gephi. Gephi is an open-source network analysis and visualization software package written in Java on the NetBeans platform [18]. It makes it convenient to build a visualized network with a clear layout, and data structures from Neo4j can be imported into it easily. After importing, metrics of the network, its nodes and its edges can be computed on the graph. The procedure and results are described next.

Figure 2.7: Data Importing from Neo4j

Figure 2.8: Grouped, Colored Network and Layout in Good Visualization.

Figure 2.9: Nodes Properties and Metrics in Data Laboratory.

In Figure 2.7 and Figure 2.8, we show our business data model before and after applying the Force Atlas 2 layout algorithm [19]. Different colors represent different groups of nodes in the network, and node sizes depend on degree centrality, ranging from size 2 to 10. More specific details are displayed in the data laboratory (Figure 2.9), where all properties of nodes and edges can be viewed, added, deleted and modified, and where network metrics are shown once they have been computed. In the calculation of eigenvector centrality, we set the number of iterations to 100. More iterations bring the computed values closer to the converged values, but the calculation takes longer.

We then export the data to Excel and separate it into groups for analysis. "Business Process" nodes have only out-degree and no in-degree, and their betweenness and eigenvector centrality are both zero, so these two metrics carry no information for this group. We therefore estimate the vulnerability of a "Business Process" purely from its out-degree centrality. The opposite holds for "Application" nodes, which have only in-degree and no out-degree. There are many "Process" nodes, and they have similar in- and out-degrees. However, a "Process" is linked to "Business Process", "RTO", "Application" and "Business Line" nodes, so processes are crucial bridges between the other groups of nodes. To evaluate their criticality, the best metric is betweenness centrality: the larger it is, the more vital the role of the "Process" within the network. Next we turn to "Building" nodes, which all have the same out-degree (an n-to-1 mapping to cities) but differing in-degrees (they host different numbers of business lines).
Furthermore, their eigenvector centrality depends strongly on the importance of the "Business Line" nodes, so buildings that host critical business lines may have larger eigenvector centrality. "Business Line" nodes are the most complicated case. They have different numbers of incoming edges from "Process" nodes and may be located in various buildings. As discussed in Section 2.2.2, a high in-degree implies more influence on processes, while a low out-degree indicates greater vulnerability because fewer buildings are linked to the business line. Betweenness centrality additionally shows how important a business line is as a bridge in the network, and eigenvector centrality refines degree centrality by measuring node criticality more accurately. Note that this analysis is only an approximate exploration of the internal dependencies and influences within the business structure. The data we use may not perfectly reflect the real structure, and the structure changes over time, so we should keep track of those changes and update the model accordingly.
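The eigenvector centralities above come from an iterative computation whose iteration count is a parameter (100 here). As an illustration of (2.15)-(2.19), the plain-Python power iteration below rescales $\mathbf{x}(t)$ to unit length at every step, which changes only its length and not its direction. This is a sketch, not Gephi's implementation; the function name, the stopping tolerance and the example graph are our own assumptions.

```python
import math


def eigenvector_centrality(A, iterations=100, tol=1e-12):
    """Power iteration for the eigenvector centrality of an undirected graph.

    Starts from x(0) proportional to (1, ..., 1), repeatedly applies
    x'_i = sum_j A_ji x_j as in (2.15), and rescales to unit length so the
    entries neither blow up nor vanish. Returns (x, lam), where lam is the
    Rayleigh-quotient estimate of the leading eigenvalue lambda_1.
    """
    n = len(A)
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        y = [sum(A[j][i] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            raise ValueError("A x(t) vanished; centrality is undefined")
        y = [v / norm for v in y]
        done = max(abs(a - b) for a, b in zip(x, y)) < tol
        x = y
        if done:
            break
    # Rayleigh quotient x^T A x (x has unit length) estimates lambda_1, cf. (2.19).
    lam = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    return x, lam


# Hypothetical 4-node graph: a triangle 0-1-2 with node 3 attached to node 2.
A = [[0] * 4 for _ in range(4)]
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    A[i][j] = A[j][i] = 1
x, lam = eigenvector_centrality(A)
print([round(v, 3) for v in x], round(lam, 3))
```

Node 2, which has the most neighbours and the best-connected ones, receives the highest score. The example graph deliberately contains a triangle: on a bipartite graph $-\lambda_1$ is also an eigenvalue, the condition $|\lambda_i/\lambda_1| < 1$ fails, and this plain iteration can oscillate instead of converging.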

Figure 2.10: Out-degree of "Business Process" and in-degree of "Application". "BP24" has the smallest out-degree among business processes, with one process and one RTO. "APP1045" and "APP1033" have the largest in-degree among applications; each is used by 21 processes.

Figure 2.11: Betweenness centrality of "Process". Under this measure, the most influential process is "P352".

Figure 2.12: In-degree and eigenvector centrality of "Building". Both metrics indicate that "BU10016" has more impact on business lines than the other buildings.

Figure 2.13: Out-degree, betweenness and eigenvector centrality of "Business Line". "BL5011" has the lowest out-degree (it is located in only one building), and "BL5017" and "BL5016" have the largest values of betweenness and eigenvector centrality.
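The betweenness values in Figures 2.11 and 2.13 were computed in Gephi. To make definition (2.14) concrete, the sketch below computes it from scratch with Brandes' algorithm, a standard method for unweighted graphs; we do not claim it matches Gephi's code line for line. The helper names and the example graph, two triangles joined through a bridge node "A" in the spirit of Figure 2.6, are hypothetical. Sums run over ordered pairs $(s, t)$, so dividing by $(n-1)(n-2)$ gives the normalized value.

```python
from collections import deque


def betweenness(adj):
    """Betweenness (2.14) of every node in an unweighted, undirected graph,
    summed over ordered pairs (s, t) with s != i != t, via Brandes' algorithm."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s; sigma[w] = number of shortest s-w paths.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


def normalized(bc):
    """Divide by (n - 1)(n - 2), the number of ordered pairs of other nodes."""
    n = len(bc)
    return {v: b / ((n - 1) * (n - 2)) for v, b in bc.items()}


def undirected(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


# Hypothetical graph in the spirit of Figure 2.6: two triangles joined through "A".
bridge = undirected([(1, 2), (1, 3), (2, 3), (3, "A"), ("A", 4), (4, 5), (4, 6), (5, 6)])
bc = betweenness(bridge)
degree = {v: len(nbrs) for v, nbrs in bridge.items()}
print(degree["A"], bc["A"], normalized(bc)["A"])
```

Node "A" has degree 2 but the largest betweenness: all 18 ordered pairs between the two triangles pass through it. Its normalized value is $18/30 = 0.6$, whereas the centre of a star graph reaches exactly 1.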

(28) Chapter 3. BCRA: Model Description

We want to analyze BCRA with a probabilistic graphical model, specifically a Bayesian network. In this chapter we therefore describe the mathematical background of Bayesian networks and conditional probability distributions, and provide an application to the business structure of a bank; this example is also used in the following chapters. Most of the theory below follows [6].

3.1 Graphs

In Section 2.1.1, we have already defined some basic concepts of graphs. To lay the foundations of a Bayesian network, we now look at representing a probability distribution using a graph as a data structure, and survey further concepts from graph theory more formally. A graph is a data structure $\mathcal{K}$ consisting of a set of nodes and a set of edges. Throughout most of what follows, we use a set of discrete random variables $\mathcal{X} = \{X_1, \dots, X_n\}$ as the set of nodes, in which a pair of nodes $X_i, X_j$ can be connected by a directed edge $X_i \to X_j$. We are only interested in directed graphs here, since Bayesian networks are directed. Thus the set of edges $\mathcal{E}$ is a set of pairs, where each pair is one of $X_i \to X_j$ or $X_j \to X_i$, for $X_i, X_j \in \mathcal{X}$, $i < j$. A graph is called directed if all its edges are of the form $X_i \to X_j$ or $X_j \to X_i$; a directed graph is denoted by $\mathcal{G}$ (see also Section 2.2.1).

Definition 3.1.1. (Directed Graph) Given a graph $\mathcal{K} = (\mathcal{X}, \mathcal{E})$, its directed version is a graph $\mathcal{G} = (\mathcal{X}, \mathcal{E}')$, where $\mathcal{E}' = \{X_i \to X_j : X_i, X_j \in \mathcal{X}\}$. Whenever $(X_i \to X_j) \in \mathcal{E}'$, we say that $X_j$ is a child of $X_i$ in $\mathcal{K}$, and that $X_i$ is a parent of $X_j$ in $\mathcal{K}$. We use $Pa_X$ to denote the parents of $X$ and $Ch_X$ to denote its children.

Figure 3.1 shows an example of a directed graph $\mathcal{G}$. There, A is the parent of B and C, and E is their common child. In many cases we want to consider only the part of the graph associated with a particular subset of the nodes.
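We can then focus on a sub-graph of a particular graph.

Definition 3.1.2. (Sub-graph) Let $\mathcal{K} = (\mathcal{X}, \mathcal{E})$ and let $\boldsymbol{X} \subset \mathcal{X}$. We define the sub-graph $\mathcal{K}[\boldsymbol{X}]$ to be the graph $(\boldsymbol{X}, \mathcal{E}')$, where $\mathcal{E}'$ consists of all edges $X_i \to X_j \in \mathcal{E}$ with $X_i, X_j \in \boldsymbol{X}$. The sub-graph over $\boldsymbol{X}$ is complete if every two nodes in $\boldsymbol{X}$ are connected by an edge.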

Figure 3.1: An Example of Directed Graph G

Although the subset of nodes $\boldsymbol{X}$ can be arbitrary, we are often interested in sets of nodes that preserve certain aspects of the graph structure. Using the basic notion of edges, we can define different types of longer-range connections in the graph. We first give a mathematical definition of paths and trails, and then define ancestors and descendants, cycles and loops.

Definition 3.1.3. (Path and Trail) We say that $X_1, \dots, X_k$ form a directed path in the graph $\mathcal{K} = (\mathcal{X}, \mathcal{E})$ if, for every $i = 1, \dots, k-1$, we have $X_i \to X_{i+1}$. Furthermore, $X_1, \dots, X_k$ form a trail in $\mathcal{K}$ if, for every $i = 1, \dots, k-1$, we have $X_i \rightleftharpoons X_{i+1}$, that is, $X_i \to X_{i+1}$ or $X_{i+1} \to X_i$. A graph is connected if for every $X_i, X_j$ there is a trail between $X_i$ and $X_j$.

In the graph of Figure 3.1, A, B, D form a trail that is also a path, whereas B, E, C is a trail but not a path. We can now define long-range relationships in the graph.

Definition 3.1.4. (Ancestor and Descendant) We say that $X$ is an ancestor of $Y$ in $\mathcal{K} = (\mathcal{X}, \mathcal{E})$, and that $Y$ is a descendant of $X$, if there exists a directed path $X_1, \dots, X_k$ with $X_1 = X$ and $X_k = Y$. We use $Descendants_X$ to denote $X$'s descendants and $Ancestors_X$ to denote $X$'s ancestors.

In Figure 3.1, A is an ancestor of D, E and F, and D, E and F are descendants of A.

Definition 3.1.5. (Cycle and Loop) A cycle in $\mathcal{K}$ is a directed path $X_1, \dots, X_k$ where $X_1 = X_k$. A graph is acyclic if it contains no cycles. A loop in $\mathcal{K}$ is a trail $X_1, \dots, X_k$ where $X_1 = X_k$.

One of the most important concepts in this thesis is the directed acyclic graph (DAG), as DAGs are the basic graphical representation underlying Bayesian networks. The graph in Figure 3.1 is a DAG. Adding an edge from D to E would create a loop through B, D and E, and adding an edge from E to A would create a cycle through A, B and E.
A final useful notion is an ordering of the nodes in a directed graph that is consistent with the directionality of its edges.

Definition 3.1.6. (Topological Ordering) Let $\mathcal{K} = (\mathcal{X}, \mathcal{E})$ be a graph. An ordering of the nodes $X_1, \dots, X_n$ is a topological ordering relative to $\mathcal{K}$ if, whenever we have $X_i \to X_j$ for $X_i, X_j \in \mathcal{X}$, then $i < j$.
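A topological ordering can be computed mechanically. The sketch below uses Kahn's algorithm, a standard method that we introduce here (the text does not prescribe one): it repeatedly outputs a node with no remaining incoming edges, and if some nodes are never output, the graph must contain a directed cycle and so is not a DAG. The function name is our own; the example edges are those of the five-node network of Figure 3.2 as implied by factorization (3.4).

```python
from collections import deque


def topological_order(nodes, edges):
    """Kahn's algorithm: return the nodes so that every edge u -> v has u before v
    (Definition 3.1.6); raise ValueError if the graph has a directed cycle."""
    indeg = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for u, v in edges:
        children[u].append(v)
        indeg[v] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)  # nodes without parents
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in children[u]:
            indeg[v] -= 1          # edge u -> v is now "used up"
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph contains a directed cycle, so it is not a DAG")
    return order


# Network of Figure 3.2 (edges read off factorization (3.4)).
NODES = ["F", "B1", "B2", "BP1", "BP2"]
EDGES = [("F", "B1"), ("F", "B2"), ("B1", "BP1"), ("B1", "BP2"), ("B2", "BP2")]
print(topological_order(NODES, EDGES))
```

Adding an edge from BP2 back to F would close the cycle F, B1, BP2, F, and the function would then raise an error instead of returning an ordering.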

Topological orderings are used by many Bayesian inference algorithms, such as Variable Elimination (VE). We investigate Bayesian inference in detail in Chapter 5.

3.2 Bayesian Network

As pointed out in Chapter 2, one of the most typical applications of probabilistic graphical models is the Bayesian network (BN), which represents a set of random variables and their conditional dependencies via a DAG. The core properties of Bayesian networks are their conditional parameterization and their independencies. In this section we explore the representation of a Bayesian network and its structural properties.

3.2.1 Independencies

Before introducing the representation of a Bayesian network, we recall Bayes' theorem [20], which is central to the mathematical manipulation of conditional probabilities.

Definition 3.2.1. Let $H$ and $E$ be events with $P(E) \neq 0$. The conditional probability of $H$ given $E$ is defined as

$$P(H \mid E) = \frac{P(E, H)}{P(E)},$$

where $P(E, H)$ is the joint probability of $H$ and $E$.

Theorem 3.2.2. (Bayes' Theorem)
(i) (Simple form) Let $H$ and $E$ be events with $P(E) \neq 0$. Then the probability of $H$ given $E$ is

$$P(H \mid E) = \frac{P(H) P(E \mid H)}{P(E)}.$$

(ii) (Extended form) Let $H_1, \dots, H_n$ be a partition of the sample space (the events are mutually exclusive and together exhaust the sample space), and let $E$ be an event with $P(E) \neq 0$. Then for $i = 1, \dots, n$,

$$P(H_i \mid E) = \frac{P(H_i) P(E \mid H_i)}{\sum_j P(H_j) P(E \mid H_j)}.$$

In a Bayesian network, the joint probability distribution of the whole network can be written as $P(X_1, \dots, X_n)$, where $X_1, \dots, X_n$ are the nodes of the BN in a topological ordering. Applying Definition 3.2.1 repeatedly (the chain rule of probability), the joint probability factorizes into a product of $n$ conditional probabilities:

$$P(X_1, \dots, X_n) = P(X_1) P(X_2 \mid X_1) P(X_3 \mid X_1, X_2) \cdots P(X_n \mid X_1, \dots, X_{n-1}) = P(X_1) \prod_{i=2}^{n} P(X_i \mid X_1, \dots, X_{i-1}). \tag{3.1}$$
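Definition 3.2.1, Theorem 3.2.2(ii) and the chain rule (3.1) can be verified exactly on a small example. The sketch below uses a hypothetical joint distribution over three binary variables (the weights are arbitrary illustrative numbers, not data from this thesis) and exact rational arithmetic from Python's `fractions` module, so equalities hold without rounding error. The helper names are our own.

```python
from fractions import Fraction
from itertools import product

# A HYPOTHETICAL joint distribution over three binary variables (X1, X2, X3),
# used only to illustrate Definition 3.2.1, Theorem 3.2.2 and equation (3.1).
WEIGHTS = [3, 1, 2, 2, 1, 4, 2, 5]            # unnormalized, one per assignment
TOTAL = sum(WEIGHTS)
JOINT = {s: Fraction(w, TOTAL) for s, w in zip(product((0, 1), repeat=3), WEIGHTS)}


def marginal(fixed):
    """Probability of a partial assignment {variable index: value}."""
    return sum(p for s, p in JOINT.items() if all(s[i] == v for i, v in fixed.items()))


def cond(i, value, given):
    """P(X_i = value | given), computed via Definition 3.2.1."""
    return marginal({**given, i: value}) / marginal(given)


def chain_rule_holds():
    """Check (3.1): P(x1, x2, x3) = P(x1) P(x2 | x1) P(x3 | x1, x2) for every assignment."""
    return all(
        cond(0, x1, {}) * cond(1, x2, {0: x1}) * cond(2, x3, {0: x1, 1: x2}) == p
        for (x1, x2, x3), p in JOINT.items()
    )


def bayes_posterior(e):
    """Theorem 3.2.2(ii): the events {X1 = 0} and {X1 = 1} partition the sample
    space, and the evidence is E = {X3 = e}."""
    prior = {h: marginal({0: h}) for h in (0, 1)}
    likelihood = {h: cond(2, e, {0: h}) for h in (0, 1)}
    evidence = sum(prior[h] * likelihood[h] for h in (0, 1))
    return {h: prior[h] * likelihood[h] / evidence for h in (0, 1)}


print(chain_rule_holds(), bayes_posterior(1))
```

The posterior from the extended form coincides with conditioning the joint directly on $X_3 = 1$; with these weights it is $1/4$ for $X_1 = 0$ and $3/4$ for $X_1 = 1$.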

If $X_1, \dots, X_n$ are independent random variables, then (3.1) can be written as

$$P(X_1, \dots, X_n) = P(X_1) P(X_2) \cdots P(X_n) = \prod_{i=1}^{n} P(X_i). \tag{3.2}$$

A factorization of a distribution $P$ indicates that independencies exist in $P$, and vice versa. In Bayesian networks, we exploit conditional independencies between the nodes to simplify the factorized joint distribution (3.1). Before looking at the independencies in Bayesian networks, we first consider the flow of probabilistic influence within the network, which lets us find independencies between ancestor and descendant nodes.

Consider a simple Bayesian network in which two buildings have equal probability of fire occurrence (Figure 3.2). The edges in Figure 3.2 point in the opposite direction to those in Figure 2.2: the relationships in Figure 2.2 illustrate dependencies between business components, whereas those in Figure 3.2 represent the flow of influence caused by threats. Business Process 1 is related to Building 1, and Business Process 2 is related to Building 1 and Building 2. Therefore the influence of a fire on Building 1 may be passed on to Business Process 1. Conversely, observing whether Business Process 2 has failed gives information that can flow back through Building 1 to Fire. Moreover, an outage in Business Process 2 indicates that an incident has occurred in Building 1 or Building 2.

Figure 3.2: Flow of Probabilistic Influence in Bayesian Network

Another question is whether influence from Building 1 can pass through Business Process 2 to Building 2. Observing a fire in Building 1 tells us that Business Process 2 may be affected, but, along this trail, it does not tell us whether there is a fire in Building 2. Thus information cannot flow from Building 1 to Building 2 through Business Process 2.

If we observe the intermediate node, the conclusions are reversed. In Figure 3.2, given evidence on Building 1, information cannot flow from Fire to Business Process 1 or from Business Process 1 to Fire. Similarly, Business Process 1 cannot influence Business Process 2 via Building 1 if information on Building 1 is given. In contrast, once Business Process 2 is observed (for example, an outage), learning that a fire has happened in Building 1 already explains the outage, so along this trail an incident in Building 2 becomes less likely. In this case the trail becomes active once evidence has been set.

When influence can flow from $X$ to $Y$ via $W$, we say that the trail $X \rightleftharpoons W \rightleftharpoons Y$ is active. The results of our analysis for two-edge trails are summarized in Table 3.1.

| Trail | $W \notin Z$ | $W \in Z$ |
|---|---|---|
| $X \to W \to Y$ | Active | Inactive |
| $Y \to W \to X$ | Active | Inactive |
| $X \leftarrow W \to Y$ | Active | Inactive |
| $X \to W \leftarrow Y$ | Inactive | Active |

Table 3.1: Whether a two-edge trail is active depends on whether $W$ belongs to the evidence set $Z$

The structure $X \to W \leftarrow Y$ is also called a v-structure. For a longer trail $X_1 \rightleftharpoons \cdots \rightleftharpoons X_n$, influence can flow from $X_1$ to $X_n$ if every two-edge trail $X_{i-1} \rightleftharpoons X_i \rightleftharpoons X_{i+1}$ along it is active. We summarize this intuition in the following definition.

Definition 3.2.3. (Active Trail) Let $\mathcal{G}$ be a BN structure, $X_1 \rightleftharpoons \cdots \rightleftharpoons X_n$ a trail in $\mathcal{G}$, and $\boldsymbol{Z}$ a subset of observed variables. The trail is active given $\boldsymbol{Z}$ if
(i) whenever there is a v-structure $X_{i-1} \to X_i \leftarrow X_{i+1}$, then $X_i$ or one of its descendants is in $\boldsymbol{Z}$;
(ii) no other node along the trail is in $\boldsymbol{Z}$.
Note that if $X_1$ or $X_n$ is in $\boldsymbol{Z}$, the trail is not active.

There may be more than one trail between two nodes, and one node can influence another if there is any active trail between them along which influence can flow. Combining this with the definition above, we obtain the notion of d-separation, which provides a notion of separation between nodes in a directed graph.

Definition 3.2.4. (D-separation) Let $\boldsymbol{X}, \boldsymbol{Y}, \boldsymbol{Z}$ be three sets of nodes in $\mathcal{G}$. We say that $\boldsymbol{X}$ and $\boldsymbol{Y}$ are d-separated given $\boldsymbol{Z}$, denoted $\text{d-sep}_{\mathcal{G}}(\boldsymbol{X}; \boldsymbol{Y} \mid \boldsymbol{Z})$, if there is no active trail between any node $X \in \boldsymbol{X}$ and $Y \in \boldsymbol{Y}$ given $\boldsymbol{Z}$. We use $\mathcal{I}(\mathcal{G})$ to denote the set of independencies that correspond to d-separation:

$$\mathcal{I}(\mathcal{G}) = \{(\boldsymbol{X} \perp \boldsymbol{Y} \mid \boldsymbol{Z}) : \text{d-sep}_{\mathcal{G}}(\boldsymbol{X}; \boldsymbol{Y} \mid \boldsymbol{Z})\}.$$

We now relate this to the independencies of probability distributions in a Bayesian network. Let $P$ be a distribution over $\mathcal{X}$; we define $\mathcal{I}(P)$ to be the set of independence assertions of the form $(\boldsymbol{X} \perp \boldsymbol{Y} \mid \boldsymbol{Z})$ that hold under $P$.
In particular, every node $X$ is d-separated from its non-descendants given its parents, so $\mathcal{I}(\mathcal{G})$ contains the local independencies $(X \perp NonDescendants_X \mid Pa_X)$. The following definition relates $\mathcal{G}$ to its probability distribution $P$.

Definition 3.2.5. (I-map) If $P$ satisfies $\mathcal{I}(\mathcal{G})$, that is, $\mathcal{I}(\mathcal{G}) \subseteq \mathcal{I}(P)$, then we say that $\mathcal{G}$ is an I-map (independency map) of $P$.
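D-separation can be tested without enumerating every trail. The sketch below swaps in a known equivalent criterion, the moralized ancestral graph test: keep only the query and evidence nodes and their ancestors, connect ("marry") parents that share a child, drop edge directions, delete the evidence nodes, and check whether the two sets are still connected. The function and helper names are our own, and the edges are those of Figure 3.2 as implied by factorization (3.4).

```python
from itertools import combinations

# Edges of Figure 3.2 as (parent, child), as implied by factorization (3.4).
EDGES = [("F", "B1"), ("F", "B2"), ("B1", "BP1"), ("B1", "BP2"), ("B2", "BP2")]


def parents_of(edges):
    pa = {}
    for u, v in edges:
        pa.setdefault(u, set())
        pa.setdefault(v, set()).add(u)
    return pa


def d_separated(xs, ys, zs, edges=EDGES):
    """Test d-sep(xs; ys | zs) with the moralized ancestral graph criterion,
    which is equivalent to the active-trail definition (Definitions 3.2.3-3.2.4)."""
    pa = parents_of(edges)
    # 1. Keep only xs, ys, zs and their ancestors.
    keep, stack = set(), list(set(xs) | set(ys) | set(zs))
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(pa[node])
    # 2. Moralize: drop edge directions and marry parents that share a child.
    nbrs = {node: set() for node in keep}
    for child in keep:
        for p in pa[child]:
            nbrs[child].add(p)
            nbrs[p].add(child)
        for p, q in combinations(sorted(pa[child]), 2):
            nbrs[p].add(q)
            nbrs[q].add(p)
    # 3. Remove the evidence nodes and search for any remaining path from xs to ys.
    seen = set(xs) - set(zs)
    frontier = list(seen)
    while frontier:
        node = frontier.pop()
        if node in ys:
            return False
        for nb in nbrs[node] - set(zs) - seen:
            seen.add(nb)
            frontier.append(nb)
    return True


print(d_separated({"B1"}, {"B2"}, {"F"}), d_separated({"B1"}, {"B2"}, {"F", "BP2"}))
```

Given Fire, the two buildings are d-separated, but adding Business Process 2 to the evidence activates the v-structure $B_1 \to BP_2 \leftarrow B_2$. Likewise, $BP_1$ and $F$ are not d-separated given $B_2$ alone, because the trail $BP_1 \leftarrow B_1 \leftarrow F$ remains active.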

3.2.2 Factorization

Return to the example of Figure 3.2. We denote by $F$ the "Fire" node, by $B_1$ and $B_2$ "Building 1" and "Building 2", and by $BP_1$ and $BP_2$ "Business Process 1" and "Business Process 2". By the chain rule (3.1), the joint probability distribution of this BN is

$$P(F, B_1, B_2, BP_1, BP_2) = P(F) P(B_1 \mid F) P(B_2 \mid F, B_1) P(BP_1 \mid F, B_1, B_2) P(BP_2 \mid F, B_1, B_2, BP_1). \tag{3.3}$$

By d-separation (Definition 3.2.4), the independencies $(B_2 \perp B_1 \mid F)$, $(BP_1 \perp F, B_2 \mid B_1)$ and $(BP_2 \perp F, BP_1 \mid B_1, B_2)$ hold, so we can simplify (3.3) to obtain

$$P(F, B_1, B_2, BP_1, BP_2) = P(F) P(B_1 \mid F) P(B_2 \mid F) P(BP_1 \mid B_1) P(BP_2 \mid B_1, B_2). \tag{3.4}$$

Notice that the local conditional probability distribution (CPD) of each node depends only on its parent nodes. Hence any entry in the joint probability distribution can be computed as a product of factors, one for each variable, each representing the conditional probability of that variable given its parents in the network. This factorization applies to any distribution $P$ for which $\mathcal{G}$ is an I-map. We now state and prove this fundamental result more formally.

Definition 3.2.6. (Chain Rule) Let $\mathcal{G} = (\mathcal{X}, \mathcal{E})$ be a BN graph. It represents a joint probability $P$ via the chain rule

$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa_{X_i}),$$

where $X_1, \dots, X_n$ are the nodes of the BN and $Pa_{X_i}$ are the parent nodes of $X_i$. The individual factors $P(X_i \mid Pa_{X_i})$ are the CPDs.

Based on this, we can define factorization over a Bayesian network graph.

Definition 3.2.7. (Factorization) Let $\mathcal{G} = (\mathcal{X}, \mathcal{E})$ be a BN graph over $X_1, \dots, X_n$. A probability distribution $P$ factorizes over $\mathcal{G}$ if

$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa_{X_i}).$$

Using factorization and the previous definitions, we can formally define a Bayesian network as follows.

Definition 3.2.8. (Bayesian Network) A Bayesian network is a pair $\mathcal{B} = (\mathcal{G}, P)$ where $P$ factorizes over $\mathcal{G}$ and is specified as a set of CPDs associated with $\mathcal{G}$'s nodes. The distribution $P$ is often denoted $P_{\mathcal{B}}$.

We can now prove that the phenomenon we observed in Figure 3.2 holds more generally.

Theorem 3.2.9. Let $\mathcal{G}$ be a BN structure over a set of random variables $\mathcal{X}$, and let $P$ be a joint probability distribution over the same space. If $\mathcal{G}$ is an I-map for $P$, then $P$ factorizes according to $\mathcal{G}$.

Proof. Assume, without loss of generality, that $X_1, \dots, X_n$ is a topological ordering of the variables in $\mathcal{X}$ relative to $\mathcal{G}$. As in our example, we first use the chain rule for probabilities:

$$P(X_1, \dots, X_n) = P(X_1) \prod_{i=2}^{n} P(X_i \mid X_1, \dots, X_{i-1}).$$

Now consider one of the factors $P(X_i \mid X_1, \dots, X_{i-1})$. As $\mathcal{G}$ is an I-map for $P$, the local independence $(X_i \perp NonDescendants_{X_i} \mid Pa_{X_i})$ is in $\mathcal{I}(P)$. By the topological ordering, all of $X_i$'s parents are in the set $\{X_1, \dots, X_{i-1}\}$, and none of $X_i$'s descendants can be in this set. Hence $\{X_1, \dots, X_{i-1}\} = Pa_{X_i} \cup \boldsymbol{Z}$, where $\boldsymbol{Z} \subseteq NonDescendants_{X_i}$. From the local independence for $X_i$ and the decomposition property of conditional independence, it follows that $(X_i \perp \boldsymbol{Z} \mid Pa_{X_i})$. Hence

$$P(X_i \mid X_1, \dots, X_{i-1}) = P(X_i \mid Pa_{X_i}).$$

Applying this transformation to all of the factors in the chain rule decomposition, the result follows.

Thus the conditional independence assumptions implied by a BN structure $\mathcal{G}$ allow us to factorize a distribution $P$ for which $\mathcal{G}$ is an I-map into small CPDs. Note that the proof is constructive, providing a precise algorithm for constructing the factorization given the distribution $P$ and the graph $\mathcal{G}$.

Theorem 3.2.9 shows one direction of the fundamental connection between the conditional independencies encoded by the BN structure and the factorization of the distribution into local probability models: conditional independencies imply factorization. The converse also holds: factorization according to $\mathcal{G}$ implies the associated conditional independencies.

Theorem 3.2.10. Let $\mathcal{G}$ be a BN structure over a set of random variables $\mathcal{X}$, and let $P$ be a joint probability distribution over the same space. If $P$ factorizes according to $\mathcal{G}$, then $\mathcal{G}$ is an I-map for $P$.

Proof. Assume again that $X_1, \dots, X_n$ is a topological ordering of the variables in $\mathcal{X}$ relative to $\mathcal{G}$. Since $P$ factorizes according to $\mathcal{G}$ (Definition 3.2.7), $P(X_1, \dots, X_n) = \prod_{j=1}^{n} P(X_j \mid Pa_{X_j})$. Consider an individual variable $X_i$. As in Theorem 3.2.9, $\{X_1, \dots, X_{i-1}\} = Pa_{X_i} \cup \boldsymbol{Z}$ with $\boldsymbol{Z} \subseteq NonDescendants_{X_i}$, and we want to prove that $P(X_i \mid Pa_{X_i} \cup \boldsymbol{Z}) = P(X_i \mid Pa_{X_i})$. This gives the local independence $(X_i \perp NonDescendants_{X_i} \mid Pa_{X_i})$, which we use to apply Definition 3.2.4. Summing the factorized joint over $X_n, X_{n-1}, \dots, X_{i+1}$ in turn removes their CPDs, because each CPD sums to one over its own variable and, in a topological ordering, a variable is never a parent of an earlier one; hence $P(X_1, \dots, X_k) = \prod_{j=1}^{k} P(X_j \mid Pa_{X_j})$ for every $k$. By Definition 3.2.1 we then obtain

$$P(X_i \mid Pa_{X_i} \cup \boldsymbol{Z}) = \frac{P(X_i, Pa_{X_i} \cup \boldsymbol{Z})}{P(Pa_{X_i} \cup \boldsymbol{Z})} = \frac{P(X_1, \dots, X_i)}{P(X_1, \dots, X_{i-1})} = \frac{\prod_{j=1}^{i} P(X_j \mid Pa_{X_j})}{\prod_{j=1}^{i-1} P(X_j \mid Pa_{X_j})} = P(X_i \mid Pa_{X_i}).$$

Applying this result to all of the factors in the chain rule decomposition, by Definition 3.2.5, it follows that $P$ satisfies $\mathcal{I}(\mathcal{G})$.
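Factorization (3.4) and Theorem 3.2.10 can be illustrated numerically. In the sketch below, the CPD of Business Process 2 is taken from Table 3.2, while all other probabilities are hypothetical numbers chosen only for illustration; the helper names are our own. The joint is built purely from the factorization, and the code then checks that it sums to one and that $P(BP_1 \mid F, B_1, B_2)$ equals $P(BP_1 \mid B_1)$ in every context, one of the local independencies the theorem guarantees.

```python
from fractions import Fraction
from itertools import product

# States: 1 = fire / fail, 0 = no fire / normal.
# HYPOTHETICAL CPDs for F, B1, B2 and BP1 (illustrative numbers only);
# P(BP2 | B1, B2) is taken from Table 3.2.
P_FIRE = Fraction(1, 10)                                   # P(F = 1)
P_B_GIVEN_F = {1: Fraction(1, 2), 0: Fraction(1, 100)}     # P(B = fail | F), both buildings
P_BP1_GIVEN_B1 = {1: Fraction(3, 10), 0: Fraction(1, 50)}  # P(BP1 = fail | B1)
P_BP2_GIVEN_B = {(1, 1): Fraction(36, 100), (1, 0): Fraction(2, 10),
                 (0, 1): Fraction(2, 10), (0, 0): Fraction(0)}  # Table 3.2


def bern(p_one, x):
    """Probability of binary state x when P(x = 1) = p_one."""
    return p_one if x == 1 else 1 - p_one


def joint(f, b1, b2, bp1, bp2):
    """Factorization (3.4): P(F) P(B1|F) P(B2|F) P(BP1|B1) P(BP2|B1,B2)."""
    return (bern(P_FIRE, f)
            * bern(P_B_GIVEN_F[f], b1)
            * bern(P_B_GIVEN_F[f], b2)
            * bern(P_BP1_GIVEN_B1[b1], bp1)
            * bern(P_BP2_GIVEN_B[(b1, b2)], bp2))


STATES = list(product((0, 1), repeat=5))  # assignments to (F, B1, B2, BP1, BP2)


def prob(event):
    """Sum the joint over all assignments for which event(F, B1, B2, BP1, BP2) holds."""
    return sum(joint(*s) for s in STATES if event(*s))


def bp1_depends_only_on_parent():
    """Check P(BP1 = 1 | F, B1, B2) == P(BP1 = 1 | B1) in every context,
    one of the local independencies implied by Theorem 3.2.10."""
    for f, b1, b2 in product((0, 1), repeat=3):
        num = prob(lambda F, B1, B2, BP1, BP2: (F, B1, B2, BP1) == (f, b1, b2, 1))
        den = prob(lambda F, B1, B2, BP1, BP2: (F, B1, B2) == (f, b1, b2))
        if num / den != P_BP1_GIVEN_B1[b1]:
            return False
    return True


print(prob(lambda *s: True), bp1_depends_only_on_parent())
print(float(prob(lambda F, B1, B2, BP1, BP2: BP2 == 1)))  # marginal P(BP2 = fail)
```

Exact fractions make the equality checks strict; with floating-point numbers the same checks would need a tolerance.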

In summary, we have two equivalent views of BN graph structures:
• I-map to factorization (Theorem 3.2.9): if the independence assumptions associated with a directed graph $G$ hold in $P$, then $P$ can be represented compactly as the product of the local CPDs $P(X_i \mid \mathrm{Pa}_{X_i})$.
• Factorization to I-map (Theorem 3.2.10): a directed graph $G$, annotated with a set of conditional probability distributions $P(X_i \mid \mathrm{Pa}_{X_i})$, defines a distribution via the chain rule for Bayesian networks, and $G$ is an I-map for that distribution.
These two complementary views are the foundation of a Bayesian network. They let us describe the connections among nodes either locally, through conditional independencies, or globally, through the factorization, and reason with whichever representation is more convenient.

3.3 Local Probabilistic Models

In this section, we examine CPDs in more detail, describe a range of representations, and consider the additional regularities each of them lets us exploit. To choose suitable models, we also have to check them against our BN structures.

3.3.1 Tabular CPDs

The most common CPD is the tabular representation, where we encode $P(X \mid \mathrm{Pa}_X)$ as a table that contains an entry for each joint assignment to $X$ and $\mathrm{Pa}_X$. For this table to be a proper CPD, we require that all values are nonnegative and that, for each value $pa_X$,

$$\sum_{x \in Val(X)} P(x \mid pa_X) = 1, \qquad (3.5)$$

where $Val(X)$ is the set of states of $X$. This representation is as general as possible: every discrete CPD can be represented by such a table. As we will also see, table-CPDs can be used in a natural way in inference algorithms. These advantages often lead to the perception that table-CPDs, also known as conditional probability tables (CPTs), are an inherent part of the Bayesian network representation.

Building 1      | Fail | Fail   | Normal | Normal
Building 2      | Fail | Normal | Fail   | Normal
BP 2 = Fail     | 0.36 | 0.2    | 0.2    | 0
BP 2 = Normal   | 0.64 | 0.8    | 0.8    | 1

Table 3.2: An example of the CPT of Business Process 2 in Figure 3.2

We take Figure 3.2 again as an example.
Let us consider the CPT of the node Business Process 2, which has two parents, Building 1 and Building 2, each with two states. If Business Process 2 also has two states, the table has $2^2 = 4$ parent configurations (hence 4 free parameters) and $2 \times 2 \times 2 = 8$ entries. Denote by $BP2$ the normal state of Business Process 2 and by $\overline{BP2}$ its fail state; the same notation is used for $B1$ and $B2$. Then we can read from the table that $P(BP2 \mid B1, B2) = 1$, $P(\overline{BP2} \mid B1, B2) = 0$, $P(BP2 \mid \overline{B1}, B2) = 0.8$, $P(\overline{BP2} \mid \overline{B1}, \overline{B2}) = 0.36$, etc.

Suppose $P(B1, B2) = 0.8$, $P(\overline{B1}, B2) = 0.08$, $P(B1, \overline{B2}) = 0.08$ and $P(\overline{B1}, \overline{B2}) = 0.04$. The marginal probability of each state of Business Process 2 can then be calculated from the table:

$$\begin{aligned} P(BP2) &= P(BP2 \mid B1, B2)P(B1, B2) + P(BP2 \mid \overline{B1}, B2)P(\overline{B1}, B2) \\ &\quad + P(BP2 \mid B1, \overline{B2})P(B1, \overline{B2}) + P(BP2 \mid \overline{B1}, \overline{B2})P(\overline{B1}, \overline{B2}) \\ &= 1 \times 0.8 + 0.8 \times 0.08 + 0.8 \times 0.08 + 0.64 \times 0.04 = 0.9536. \end{aligned} \qquad (3.6)$$

Similarly, if the fail state of Building 1 is set as evidence, then $P(\overline{B1}) = 0.08 + 0.04 = 0.12$, so $P(B2 \mid \overline{B1}) = 0.08/0.12 = 2/3$ and $P(\overline{B2} \mid \overline{B1}) = 1/3$. The updated probability of the normal state of Business Process 2 becomes

$$P(BP2 \mid \overline{B1}) = P(BP2 \mid \overline{B1}, B2)P(B2 \mid \overline{B1}) + P(BP2 \mid \overline{B1}, \overline{B2})P(\overline{B2} \mid \overline{B1}) = 0.8 \times \tfrac{2}{3} + 0.64 \times \tfrac{1}{3} \approx 0.7467, \qquad (3.7)$$

which is lower than without evidence. When the CPT becomes very large and the Bayesian network is complicated, however, inference algorithms are a better way to answer such questions. We will discuss this in Chapter 5.

The tabular representation also has several significant disadvantages. First, a CPT cannot store a continuous random variable directly; such variables are usually discretized into several intervals. Even in the discrete setting, we encounter difficulties. The number of entries in a table-CPD is the number of joint assignments to $X$ and $\mathrm{Pa}_X$, that is, $|Val(X)| \cdot |Val(\mathrm{Pa}_X)|$, of which $(|Val(X)| - 1) \cdot |Val(\mathrm{Pa}_X)|$ are free parameters because of (3.5). This number grows exponentially with the number of parents. For example, if a binary variable $X$ has 5 binary parents, we need to specify $2^5 = 32$ parameters; with 10 parents, we need $2^{10} = 1024$ parameters. Clearly, the tabular representation rapidly becomes large and unwieldy as the number of parents grows. Huge tables not only require a great deal of effort to fill in (possibly thousands of conditional probabilities), but also strain the memory of our computers.
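Returning to the example of Table 3.2, the calculations (3.6) and (3.7) can be reproduced with a short Python sketch that stores the CPT as a dictionary, checks condition (3.5), and sums over the parent configurations. The names `cpt_bp2`, `p_parents`, `check_cpt` and `marginal` are hypothetical, and evidence is handled by brute-force enumeration rather than by the inference algorithms of Chapter 5. Note that conditioning on a failed Building 1 with the joint $P(B1, B2)$ assumed in the text gives $P(B2 \mid \overline{B1}) = 0.08/0.12 = 2/3$, so the evidence case evaluates to about 0.7467.

```python
# Table 3.2: P(BP2 = state | B1, B2), keyed by the parent configuration (B1, B2).
cpt_bp2 = {
    ("fail", "fail"):     {"fail": 0.36, "normal": 0.64},
    ("fail", "normal"):   {"fail": 0.2,  "normal": 0.8},
    ("normal", "fail"):   {"fail": 0.2,  "normal": 0.8},
    ("normal", "normal"): {"fail": 0.0,  "normal": 1.0},
}

# Joint distribution of the two parents assumed in the text: P(B1, B2).
p_parents = {
    ("normal", "normal"): 0.8,
    ("fail", "normal"):   0.08,
    ("normal", "fail"):   0.08,
    ("fail", "fail"):     0.04,
}


def check_cpt(cpt, tol=1e-9):
    """Condition (3.5): entries are non-negative and each column sums to one."""
    for parents, dist in cpt.items():
        assert all(p >= 0 for p in dist.values()), parents
        assert abs(sum(dist.values()) - 1.0) < tol, parents


def marginal(state, evidence_b1=None):
    """P(BP2 = state), optionally conditioned on an observed state of Building 1."""
    numerator = denominator = 0.0
    for (b1, b2), p in p_parents.items():
        if evidence_b1 is not None and b1 != evidence_b1:
            continue  # drop parent configurations inconsistent with the evidence
        numerator += cpt_bp2[(b1, b2)][state] * p
        denominator += p
    # denominator is 1 without evidence and P(B1 = evidence_b1) otherwise
    return numerator / denominator


check_cpt(cpt_bp2)
print(round(marginal("normal"), 4))                      # 0.9536, as in (3.6)
print(round(marginal("normal", evidence_b1="fail"), 4))  # 0.7467
```

Enumerating every parent configuration is exactly the exponential cost described above, which is why this approach only suits small tables.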
When learning parameters from data or running inference algorithms, a huge CPT can make the computation very slow or even exhaust the available memory. Furthermore, the parameters may contain regularities, with different parent configurations giving the same probabilities. For instance, in Table 3.2, the probabilities of "fail" and "normal" in Business Process 2 are the same under the parent configurations (Building 1 fail, Building 2 normal) and (Building 1 normal, Building 2 fail). Thus we may want to look for a better representation that reduces the number of parameters needed to specify a CPD.

3.3.2 Noisy-OR CPDs

A practical solution to the problems of Section 3.3.1 exploits independence of causal influence (ICI): ICI gates reduce the number of parameters required to specify a conditional probability distribution from exponential to linear in the number of parents. The two most widely applied ICI models are the binary Noisy-OR model and its generalization, the Noisy-MAX model [21]. Because they use a small number of parameters to represent an entire CPT, they significantly reduce the effort of filling in probability values, improve the quality of distributions learned from data, and reduce the running time and complexity of algorithms for Bayesian networks. In this section, we introduce the Noisy-OR model (and the Noisy-MAX model as its extension to multi-valued variables) to see how it simplifies CPDs and how it relates to the CPT.

Moreover, since a Bayesian network may not perfectly match our real business structure, a leak state can be added to account for causes that are not included in the model [22]. The Noisy-OR model with a leak state was first introduced by M. Henrion [22], and J. Diez [23] gives an alternative representation of it.

In the Noisy-OR model, multiple causes influence an effect independently, and their combination is specified by an OR gate. Figure 3.3 shows the model graphically.

Figure 3.3: Noisy-OR model

As in Figure 3.3, we define $Z$ as the effect variable, the $X_i$ as cause variables, and the $Y_i$ as hidden variables. Suppose $X_i$ has two states, 1 (target state) and 0 (non-target state), and the same holds for $Y_i$ and $Z$. We define the probability that cause $X_i$, acting alone, puts its hidden variable $Y_i$ in the target state as the noise parameter $\lambda_i$, for $i = 1, \dots, n$:

$$P(Y_i = 1 \mid X_i) = \begin{cases} 0, & X_i = 0, \\ \lambda_i, & X_i = 1. \end{cases} \qquad (3.8)$$

The leak variable $Y_0$ is in its target state with probability $\lambda_0$, which is often set extremely small if we believe that our model is approximately complete. The effect is the deterministic OR of the hidden variables: $Z = 1$ if and only if at least one of $Y_0, Y_1, \dots, Y_n$ equals 1. Since $Y_0, Y_1, \dots, Y_n$ are mutually independent given $X_1, \dots, X_n$, we immediately obtain the conditional probability distribution of $Z$:

$$P(Z = 0 \mid X_1, \dots, X_n) = (1 - P(Y_0 = 1)) \prod_{i:\, X_i = 1} (1 - P(Y_i = 1 \mid X_i)) = (1 - \lambda_0) \prod_{i:\, X_i = 1} (1 - \lambda_i) \qquad (3.9)$$

and

$$P(Z = 1 \mid X_1, \dots, X_n) = 1 - P(Z = 0 \mid X_1, \dots, X_n). \qquad (3.10)$$

Formula (3.9) states that the probability that $Z$ is not in its target state, given $X_1, \dots, X_n$, is the product of the probabilities that each hidden variable (including the leak) is not in its target state, each a simple function of its $\lambda_i$. This decomposition of the CPD explains why the model is called Noisy-OR: the interaction between causes and effect is an OR relationship in which each cause variable influences the effect variable independently through its hidden variable.
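Equations (3.8) to (3.10) can be sketched in a few lines of Python. The helpers `noisy_or_p_z0`, `noisy_or_cpt` and `parameter_counts` are hypothetical names written for this illustration; states are coded 1 (target) and 0 (non-target), and parent configurations are enumerated from $(1, \dots, 1)$ down to $(0, \dots, 0)$. Expanding the parameters into all $2^n$ configurations recovers an ordinary CPT, which makes the parameter saving easy to see.

```python
from itertools import product


def noisy_or_p_z0(x, lambdas, leak=0.0):
    """Equation (3.9): P(Z = 0 | x) = (1 - leak) * prod over {i : x_i = 1} of (1 - lambda_i)."""
    p = 1.0 - leak                  # the leak Y_0 must stay off
    for xi, lam in zip(x, lambdas):
        if xi == 1:                 # only causes in their target state can switch Z on
            p *= 1.0 - lam          # Y_i stays off with probability 1 - lambda_i
    return p


def noisy_or_cpt(lambdas, leak=0.0):
    """Expand noisy-OR parameters into the full CPT over all 2^n parent configurations.

    Returns {x: (P(Z = 1 | x), P(Z = 0 | x))}, x running from (1, ..., 1) to (0, ..., 0).
    """
    cpt = {}
    for x in product((1, 0), repeat=len(lambdas)):
        p0 = noisy_or_p_z0(x, lambdas, leak)
        cpt[x] = (1.0 - p0, p0)     # equation (3.10) gives the target-state entry
    return cpt


def parameter_counts(n):
    """Free parameters for a binary effect with n binary parents: (full CPT, noisy-OR with leak)."""
    return 2 ** n, n + 1


# Business Process 2 with "fail" as target state: lambda_1 = lambda_2 = 0.2, no leak.
for x, (p_fail, p_normal) in noisy_or_cpt([0.2, 0.2]).items():
    print(x, round(p_fail, 4), round(p_normal, 4))

print(parameter_counts(20))  # (1048576, 21)
```

Up to floating-point rounding, the loop prints fail probabilities 0.36, 0.2, 0.2 and 0 for the configurations (fail, fail), (fail, normal), (normal, fail) and (normal, normal), which is the fail row of Table 3.2.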

Each link is noisy, however: a cause in its target state produces the effect only with probability $\lambda_i$. We can summarize this model in a parameter table as follows ($n = 3$):

Column   | X1 = 1 | X1 = 0 | X2 = 1 | X2 = 0 | X3 = 1 | X3 = 0 | Leak
Z = 1    | λ1     | 0      | λ2     | 0      | λ3     | 0      | λ0
Z = 0    | 1 − λ1 | 1      | 1 − λ2 | 1      | 1 − λ3 | 1      | 1 − λ0

Table 3.3: Parameter table of the Noisy-OR model when n = 3

In practice, the columns for $X_i = 0$ are often omitted, because they are all identical. Here we have $3 \times (2 - 1) + 1 = 4$ parameters and $4 \times 2 = 8$ values in the table. To see how this model specifies a CPD, we transform Table 3.3 into a CPT:

X1 | X2 | X3 | P(Z = 1 | x)    | P(Z = 0 | x)
1  | 1  | 1  | 1 − φ(1, 1, 1)  | φ(1, 1, 1)
1  | 1  | 0  | 1 − φ(1, 1, 0)  | φ(1, 1, 0)
1  | 0  | 1  | 1 − φ(1, 0, 1)  | φ(1, 0, 1)
1  | 0  | 0  | 1 − φ(1, 0, 0)  | φ(1, 0, 0)
0  | 1  | 1  | 1 − φ(0, 1, 1)  | φ(0, 1, 1)
0  | 1  | 0  | 1 − φ(0, 1, 0)  | φ(0, 1, 0)
0  | 0  | 1  | 1 − φ(0, 0, 1)  | φ(0, 0, 1)
0  | 0  | 0  | 1 − φ(0, 0, 0)  | φ(0, 0, 0)

Table 3.4: CPT of an effect variable with 3 parents, obtained from the Noisy-OR model

where, by (3.9), $\varphi(x_1, x_2, x_3) = (1 - \lambda_0) \prod_{i:\, x_i = 1} (1 - \lambda_i)$, that is,

$$\begin{aligned}
\varphi(1,1,1) &= (1-\lambda_0)(1-\lambda_1)(1-\lambda_2)(1-\lambda_3), & \varphi(1,1,0) &= (1-\lambda_0)(1-\lambda_1)(1-\lambda_2), \\
\varphi(1,0,1) &= (1-\lambda_0)(1-\lambda_1)(1-\lambda_3), & \varphi(0,1,1) &= (1-\lambda_0)(1-\lambda_2)(1-\lambda_3), \\
\varphi(1,0,0) &= (1-\lambda_0)(1-\lambda_1), & \varphi(0,1,0) &= (1-\lambda_0)(1-\lambda_2), \\
\varphi(0,0,1) &= (1-\lambda_0)(1-\lambda_3), & \varphi(0,0,0) &= 1-\lambda_0.
\end{aligned}$$

Here the CPT has $2^3 = 8$ free parameters and $8 \times 2 = 16$ values. When $n$ becomes large, say 20, the number of parameters in an ordinary CPT grows to $2^{20} = 1{,}048{,}576$, whereas the Noisy-OR model needs only $20 \times (2 - 1) + 1 = 21$ parameters.

In the example of Figure 3.2, we can use a Noisy-OR model with "fail" as the target state to represent the CPD, as in Table 3.5:

Column          | Building 1 = Fail | Building 1 = Normal | Building 2 = Fail | Building 2 = Normal | Leak
BP 2 = Fail     | 0.2               | 0                   | 0.2               | 0                   | 0
BP 2 = Normal   | 0.8               | 1                   | 0.8               | 1                   | 1

Table 3.5: Noisy-OR model of the example in Figure 3.2

Here $\lambda_0 = 0$.
Using (3.9) and (3.10) with $\lambda_1 = \lambda_2 = 0.2$ and $\lambda_0 = 0$, where a bar denotes the fail state, it is easy to read from the table that $P(BP2 \mid B1, B2) = 1 - \lambda_0 = 1$, $P(\overline{BP2} \mid B1, B2) = 1 - (1 - 0) = 0$, $P(BP2 \mid \overline{B1}, B2) = (1 - 0)(1 - 0.2) = 0.8$ and $P(\overline{BP2} \mid \overline{B1}, \overline{B2}) = 1 - (1 - 0)(1 - 0.2)(1 - 0.2) = 0.36$, which is the same result as in Section 3.3.1. If we set $\lambda_0 = 0.001$ in Table 3.6, the resulting conditional probability distribution is almost identical to Table 3.2 for most purposes.

As an extension of Noisy-OR, the Noisy-MAX model deals with an effect variable that has multiple states. The resulting value of the effect variable is the maximum of the states produced by each of its cause variables. The probability distribution of an effect variable $Z$ given its causes $X_1, \dots, X_n$, without a leak state, can be expressed as in Table 3.7. We obtain a CPT in Table 3.8 using Table 3.4.
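Tables 3.7 and 3.8 are not reproduced here, so the following Python sketch of a leak-free Noisy-MAX CPD rests on an assumed formulation: each cause $X_i$ drives a hidden variable $Y_i$ over ordered effect states $0 < 1 < \dots < m-1$ (with $Y_i = 0$ when $X_i$ is in its normal state), the $Y_i$ are independent given the causes, and $Z = \max(Y_1, \dots, Y_n)$. Then $P(Z \le z \mid x) = \prod_i P(Y_i \le z \mid x_i)$, and the CPT follows by differencing this cumulative distribution. The function name `noisy_max_cpt`, the parameter layout and the three-state example values are hypothetical.

```python
from itertools import product


def noisy_max_cpt(y_dists):
    """Leak-free noisy-MAX CPD under the assumed formulation described above.

    y_dists[i] maps each state x_i of cause X_i to the distribution of the hidden
    variable Y_i over the ordered effect states 0, 1, ..., m - 1.
    Returns {x: [P(Z = z | x) for z = 0, ..., m - 1]}.
    """
    m = len(next(iter(y_dists[0].values())))        # number of effect states
    cpt = {}
    for x in product(*(list(d) for d in y_dists)):  # every parent configuration
        cdf = []
        for z in range(m):
            prob = 1.0
            for d, xi in zip(y_dists, x):
                prob *= sum(d[xi][: z + 1])         # P(Y_i <= z | x_i)
            cdf.append(prob)                        # P(Z <= z | x), since Z = max_i Y_i
        # Differencing the cumulative distribution gives P(Z = z | x).
        cpt[x] = [cdf[0]] + [cdf[z] - cdf[z - 1] for z in range(1, m)]
    return cpt


# Hypothetical example: effect states 0 = normal, 1 = degraded, 2 = fail;
# cause states 0 = normal, 1 = fail. Values are illustrative only.
y_dists = [
    {0: [1.0, 0.0, 0.0], 1: [0.5, 0.3, 0.2]},  # hidden variable of cause X_1
    {0: [1.0, 0.0, 0.0], 1: [0.6, 0.3, 0.1]},  # hidden variable of cause X_2
]
for x, dist in noisy_max_cpt(y_dists).items():
    print(x, [round(p, 4) for p in dist])
```

With two states per variable the sketch reduces to Noisy-OR: taking $P(Y_i = 1 \mid X_i = 1) = 0.2$ for both causes gives $P(Z = 1 \mid 1, 1) = 1 - 0.8 \times 0.8 = 0.36$, the value obtained for Business Process 2 above.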
