
Two Problems for CaR Calculation in Operational Risk Modeling

Abstract

Ever since it was singled out from credit risk, operational risk has drawn more and more attention from financial institutions. Operational losses are often considered to follow a compound counting process, whose impact is modeled by a severity distribution. In the first part, this thesis shows that Buch-Larsen's semi-parametric (BLS) approach can be an alternative for modeling the severity: it uses the cdf of the favorite candidate of the LDA as the transformation function, and provides a better goodness of fit than the LDA. In the second part, we revisit the shortcomings of linear correlation as a dependence measure: correlation is influenced by the marginal behavior, and it does not perform well in the non-Gaussian case. We try to show that the Iman-Conover (IC) approach is a more reasonable approach than the Cholesky Decomposition (CD) approach.

Acknowledgement


Table of Contents

Chapter 1 Introduction
  1.1 Scope
  1.2 Objectives
  1.3 Outline
Chapter 2 Operational Risk in Basel II Accord
  Section 2.1 The Definition and the Concepts
  Section 2.2 Three Proposed Approaches
Chapter 3 Three AMAs
Chapter 4 A Semi-Parametric Approach
  Section 4.1 Problems with sbAMA and LDA
  Section 4.2 BLS Approach
  Section 4.3 Simulation Study
    Algorithm
    Visual Result
    Empirical Result
    Conclusion
Chapter 5 Constituting Dependent Operational Risks
  Section 5.1 Cholesky Decomposition (CD) Method
  Section 5.2 Iman-Conover Method
    Copula Concept
    Normal Copula Approach
    IC Approach: Doing the Same as the Normal Copula Approach
  Section 5.3 Measuring Dependence
  Section 5.4 Case Study
Chapter 6 Conclusion
Bibliography
Appendix 1 Theorem 4.25 and Its Extension
Appendix 2 Algorithm of the Iman-Conover (IC) Approach
Appendix 3 CARMEN: A Scenario-Based AMA
Appendix 4 Example of the Data from Experts' Opinions

Chapter 1 Introduction

The economic crises of the 1970s and 1980s made the central bank governors of the G-10 realize the importance of supervising internationally active banks. The Basel Committee was established as a result. In order to make sure banks survive worst-case scenarios, the Committee introduced a capital measurement system, known as the Basel Capital Accord. One of the main tasks of the Basel Capital Accord is to require a minimum capital (or Capital-at-Risk) as a "cushion" against unexpected losses. The original Accord, focusing on credit risk, is often referred to as the Basel I Accord. In the banking sector, there are mainly three types of risk: market risk, credit risk and operational risk.

Until 1999, operational risk was often implicitly included in the class of credit risk and referred to as "other risk". Both the increasing demand for sophistication in risk measurement and developing bank practices such as outsourcing and securitization made it necessary to single out this "other risk". In 1999, the Committee issued a proposal for a New Capital Adequacy Framework, also known as the Basel II Accord. In its section 1.3.1, operational risk is for the first time introduced as a new risk class for which banks are required to set aside regulatory capital. The Basel Committee on Banking Supervision (2004) defines operational risk as "the risk of loss resulting from inadequate or failed internal processes, people and systems or from external events". The scope of operational risk is quite wide, ranging from small typing errors to huge internal frauds and from the front office to the back office. One of the most notorious cases in the history of operational risk losses is the one that busted Barings Bank in the 1990s. A rogue trader, Nick Leeson, took unauthorized speculative positions in futures and options, and used an error account to hide his losses. As the losses grew, he bought more than 20,000 futures trying to extricate himself from the mess. In the end, Barings lost around three quarters of the $1.3 billion from these trades, and went bankrupt in 1995.

Operational risk ranges from 15% to 25% for most banks. Basel II defines three approaches for the calculation of the CaR: the Basic Indicator Approach (BIA), the Standardized Approach (SA) and the Advanced Measurement Approaches (AMA). The BIA is the most straightforward approach: the CaR is determined by multiplying gross income by a fixed percentage. Under the SA, banks divide their activities into a number of business lines; within each business line, the CaR is calculated by multiplying the gross income of the business line by a fixed percentage, which usually differs across business lines. The BIA and SA are for banks with moderate exposure to operational risk losses, while for internationally active banks, like Bank F, Basel II requires an AMA. Basel II gives banks a certain degree of freedom for modeling their operational risk, given some boundary conditions. This thesis mainly focuses on two AMAs: the scenario-based AMA (sb-AMA) and the Loss Distribution Approach (LDA). They are adopted by Bank F to calculate and benchmark the CaR.

1.1 Scope

CaR calculation is always the first and foremost task of operational risk modeling. Both sb-AMA and LDA are widely adopted for this purpose, and both can be divided into two steps: modeling the marginal operational losses and constituting the joint operational losses. Since every loss event caused by operational risk can be described from two perspectives, impact and frequency (per year), sb-AMA and LDA estimate distribution functions for the impact and the frequency, giving the severity distribution and the frequency distribution respectively. As a result, the marginal operational losses can be obtained after aggregating the r.v.'s from the severity and frequency distributions.

The sb-AMA is also called a "qualitative" approach. Cooke [1991] gives more detail on how to calibrate the subjective opinions and combine them into a final distribution. Under the LDA approach, banks estimate, for each business line/event type cell, the distributions of the severity and of the frequency; its estimates are made at the cell level. Another difference between LDA and sb-AMA is that LDA relies on fitting internal historical data to estimate the distributions.

One distribution may fit the body of the data but not the tail, and another may fit the tail but not the body. Only a model that is close to the observed data is trustworthy; in other words, a sufficient GOF is essential.

After deciding on the marginal operational loss, how to constitute the jointly distributed operational losses is another essential step in the CaR calculation. A natural question is which method can be used to do so, given that both the linear correlations and the marginal distributions of the losses are known. One of the possibilities is the Cholesky Decomposition (CD) approach; McNeil [2005] gives an example of the CD approach. The precondition for applying the CD approach is that the normality assumption about the operational losses holds, i.e. the marginal operational loss should follow a normal distribution, and the dependence structure (also called the "reference distribution") should fit the normal copula. First, the marginal operational loss is unlikely to be normally distributed, due to the heavy-tailed feature of the severity distribution. Mildenhall [2005] believes the features of the severity distribution would be diversified away only if the frequency is high enough and the severity is tame. Yet the internal data from experts' opinions suggest that three quarters of incidents happen at a low frequency level: less than twice a year. Secondly, the reference distribution may instead follow a copula with strong tail dependence; one big incident often influences several business lines, so the operational losses they report tend to cluster together.

1.2 Objectives

As a parametric method, LDA restricts its choice of severity distribution to several simple distributions, which may provide an insufficient model fit. It seems a natural idea to add distributions with more parameters as candidates. Panjer [2006] does not think this is a good idea, because it violates the parsimony principle.

As the first objective of the thesis, we resort to a semi-parametric method for this issue. No operational severity data will be shaped exactly like a parametric distribution. Estimating a purely parametric severity distribution may introduce modeling error at a very early stage; the later simulated results are influenced by this initial decision, which is made as if the data were a realization of the chosen distribution. To capture the model uncertainty, Thuring [2008] shows that a widely accepted approach first transforms the raw data with the cumulative distribution function (cdf) of the chosen distribution, estimates the density function by a non-parametric method (kernel density estimation), and then transforms back with the same cdf. In this way, the non-parametric modeling method allows the prior information from the LDA to "guide" it. Buch-Larsen [2005] first came up with a "one-model-fits-all" approach to model the severity of insurance data, using the cdf of the generalized Champernowne distribution. We believe it is more reasonable to use the cdf of the distribution chosen by the LDA instead of the cdf of a fixed distribution. We call this the modified Buch-Larsen semi-parametric (BLS) method in what follows.

In the first part of the thesis, we put both the benchmark approach and the (modified) Buch-Larsen approach to the test, and answer the question: "Does the BLS approach outperform the LDA benchmark approach by providing a density function with a better GOF?"

As the second objective, we try to show that the IC approach is theoretically stronger than the CD approach. We hope to answer another question in the end: "Does the IC method empirically outperform the CD method?"

1.3 Outline

After preparing the reader with basic concepts in operational risk, Chapter 2 summarizes the contents of Basel II relevant to operational risk and the three fundamental approaches for the minimal capital calculation. Chapter 3 discusses three AMAs: one quantitative and two qualitative methods. Chapter 4 starts by analyzing the shortcomings of these methods, then introduces the semi-parametric method (the BLS approach), and finally compares it with the parametric method (LDA) in a simulation study. We turn to the second issue in Chapter 5. After introducing the copula concept, Chapter 5 shows that the IC method is theoretically more robust than its counterpart. At the end of Chapter 5, we check the empirical performance of the IC method in a case study. Chapter 6 concludes the thesis.

Chapter 2 Operational Risk in Basel II Accord

This chapter starts with the definition of operational risk and the important concepts in Section 2.1. Section 2.2 introduces the three proposed approaches for the minimal capital calculation in ascending order of complexity: the Basic Indicator Approach (BIA), the Standardized Approach (SA), and the Advanced Measurement Approaches (AMA). As the thesis focuses on modeling operational risk with an AMA, Section 2.2 also gives the quantitative standards for AMAs according to Basel II.

Section 2.1 The Definition and the Concepts

In the document "Basel Committee on Banking Supervision 2004", operational risk is defined as "the risk of loss resulting from inadequate or failed internal processes, people and systems or from external events. This definition includes legal risk, but excludes strategic and reputational risk." Typical operational risks are the results of various events such as internal (or external) fraud, losses due to IT system failures, losses caused by natural catastrophes, etc.

Event Types

Each (loss) event is associated with a specific event type. There are normally eight basic event types:

Internal Fraud; External Fraud; Malicious Damage; Employee Practices & Workplace Safety; Client, Products & Business Practices; Disaster and Public Safety; Technology and Infrastructure Failure; Execution, Delivery and Process Management.

Banks' activities can generally be divided into eight business lines4: Corporate Finance; Trading & Sales; Retail Banking; Commercial Banking; Payment & Settlement; Agency Services; Asset Management; and Retail Brokerage.

Loss Event

A loss event is an actual operational risk event that triggers a monetary loss; it is the aftermath of an operational incident. In the following parts of the thesis, the concept "event" always refers to a loss event. Not all operational events necessarily lead to a monetary loss in the end; some events may even bring a profit. We only consider loss events in this thesis.

Incident

An incident is a conceptual operational risk event that has happened before and may well happen again. For example, "power breakdown" is an incident (i.e. a conceptual operational risk event) that can translate into one or several Loss Events (i.e. actual operational risk events). For instance, the electric power of one of Bank F's buildings was cut on 21/2/2003 and again on 6/12/2003 due to ground works in the neighborhood: the same incident caused two Loss Events.

Cells

One distinct business line and one distinct event type combine to form a cell. Given the numbers of business lines and event types above, we obtain 64 cells5. During a certain period (one year), several incidents may occur in a given cell6, i.e. they share the same event type and business line. The main purpose of assigning incidents to cells is that it is easier to consider the dependence (correlation) between cells than between the operational losses caused by individual incidents.

4 They are often called top-level business lines. In practice, banks usually detail business lines into several levels.

5 They are often called "Basel cells". In practice, one business unit of a bank can be located along more than one dimension (business line): sub-business lines, a geographical axis, and a legal entity axis. The combinations of the three axes with the event types are called "basic cells". To simplify the problem, we consider the simplest version of the basic cells, namely the combination of top-level business lines and event types.

Aggregate Loss

The aggregate loss is a sum of loss amounts, representing the risk exposure. There are two types of aggregate loss: the aggregate loss of an incident and the aggregate loss of a cell. As we only consider dependence at the cell level, the aggregate losses of the incidents in one cell are considered independent of each other. As a result, the aggregate loss of a cell is simply the sum of the aggregate losses of the incidents that occur in this cell. Note that "aggregate loss" in what follows means the aggregate loss of a cell, unless mentioned otherwise.

Section 2.2 Three Proposed Approaches

According to Pillar 1 of Basel II, the Basel Committee wants banks to set aside a minimum capital to cover operational losses, depending on their operational risk exposure. The Committee proposed three approaches for calculating the operational capital requirement (or Capital at Risk), in a continuum of increasing sophistication and risk sensitivity: the Basic Indicator Approach (BIA), the Standardized Approach (SA) and the Advanced Measurement Approach (AMA). The BIA and SA are designed for small banks and banks facing relatively low exposure to operational risk. The AMA is more complex and risk-sensitive, and is targeted at internationally active banks and banks facing high exposure to operational risk. A bank is permitted to use the BIA or the SA for some entities and an AMA for others. However, it will not be allowed to switch back from the AMA to either the BIA or the SA once it has been approved to use the AMA.

Basic Indicator Approach (BIA)

The BIA requires banks to hold minimum capital for operational risk equal to a fixed percentage $\alpha$ of the average of positive annual gross income (GI). We use capital at risk (CaR) to refer to this minimum capital.

Standardized Approach (SA)

According to the SA, the bank is segmented into standardized business lines. The CaR for each business line is then computed by multiplying the gross income of the business line by a fixed percentage $\beta$:

$$\mathrm{CaR} = \sum_{i=1}^{N} \mathrm{CaR}^{(i)} = \sum_{i=1}^{N} \beta_i \times GI^{(i)},$$

where $N$ is the number of business lines and $GI^{(i)}$ is the average of positive annual gross income for business line $i$.
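
To make the two formulas concrete, here is a minimal Python sketch of the BIA and SA calculations; the 15% $\alpha$, the $\beta$ values and the gross income figures are illustrative placeholders, not Bank F's actual numbers.

```python
import numpy as np

def bia_car(gross_income, alpha=0.15):
    """Basic Indicator Approach: alpha times the average of positive annual GI."""
    gi = np.asarray(gross_income, dtype=float)
    positive = gi[gi > 0]                       # years with non-positive GI are excluded
    return alpha * positive.mean()

def sa_car(gi_by_business_line, betas):
    """Standardized Approach: sum over business lines of beta_i * average positive GI_i."""
    total = 0.0
    for line, gi_history in gi_by_business_line.items():
        gi = np.asarray(gi_history, dtype=float)
        positive = gi[gi > 0]
        total += betas[line] * positive.mean()
    return total

# Illustrative three-year gross income figures (in millions) -- made-up numbers.
gi_bank = [120.0, 135.0, 150.0]
gi_lines = {"retail_banking": [60.0, 70.0, 75.0], "trading_sales": [60.0, 65.0, 75.0]}
betas = {"retail_banking": 0.12, "trading_sales": 0.18}   # assumed beta values

print("BIA CaR:", bia_car(gi_bank))
print("SA  CaR:", sa_car(gi_lines, betas))
```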

Advanced Measurement Approach (AMA)

Under the AMA, the regulatory capital requirement equals the risk measure generated by the bank's internal operational risk measurement system (such as CARMEN for Bank F), which must satisfy the quantitative and qualitative criteria given by the New Basel Capital Accord. There are six quantitative standards proposed by the Basel Committee on Banking Supervision in Basel II (2003): Soundness Standard; Detailed Criteria; Internal Data; External Data; Scenario Analysis; and Business Environment and Internal Control Factors. The following clauses from the above standards are related to this thesis.

Section 667: "Given the continuing evolution of analytical approaches for operational risk, the Committee is not specifying the approach or distributional assumptions used to generate the operational risk measure for regulatory capital purposes. However, a bank must be able to demonstrate that its approach captures potentially severe 'tail' loss events. Whatever approach is used, a bank must demonstrate that its operational risk measure meets a soundness standard comparable to a one year holding period and a 99.9% confidence interval."

--from Soundness Standard

Section 669 (d): "Risk measures for different operational risk estimates must be added for purposes of calculating the regulatory minimum capital requirement. However, the bank may be permitted to use internally determined correlations in operational risk losses across individual operational risk estimates, provided it can demonstrate that its systems for determining correlations are sound, and the correlation assumption must be validated."

--from Detailed Criteria

Section 675: "A bank must use scenario analysis of expert opinion in conjunction with external data to evaluate its exposure to high severity events. ..."

Chapter 3 Three AMAs

For internationally active banks like Bank F, it is essential to have an operational risk model based on an AMA for the CaR calculation. Although the Committee has given qualitative and quantitative criteria, it leaves banks the freedom to formulate their own AMA. In this chapter, we introduce the three most popular AMA modeling methods: the Loss Distribution Approach (LDA), the scorecard approach and the scenario-based approach.

A Quantitative Method

The LDA is the most frequently used quantitative approach under the AMA, and it relies entirely on fitting historical data to estimate the aggregate loss density. Thus, it is considered a backward-looking method. For each cell, the bank models the impact and the frequency of loss events on an annual basis, then compounds the severity and frequency distributions into the distribution of the aggregate loss.

The LDA approach calculates the aggregated capital at risk (CaR) as the simple sum of the cell-level CaR:

$$\mathrm{CaR} = \sum_{i=1}^{N} \sum_{j=1}^{M} \mathrm{CaR}^{(i,j)},$$

where N is the number of business lines and M is the number of event types.

The Basel Committee on Banking Supervision proposes that the CaR is the difference between the $\alpha$-level quantile of the cdf $G$ of the aggregate loss and the Expected Loss (EL): $\mathrm{CaR}(\alpha) = G^{-1}(\alpha) - \mathrm{EL}$.

LDA considers the losses at cell level to be a compound counting process:

$$X^{(i,j)} = \sum_{k=1}^{R^{(i,j)}} Z_k^{(i,j)},$$

where $R^{(i,j)}$ is the random variable for the number of operational risk events of type $j$ on business line $i$, and $Z_k^{(i,j)}$ is the random variable representing the severity of the $k$-th loss event, with cdf $F^{(i,j)}$. The cdf of $X^{(i,j)}$ can then be written as

$$G^{(i,j)}(x) = \begin{cases} \sum_{r=1}^{\infty} p^{(i,j)}(r)\, F_{(i,j)}^{r*}(x), & x > 0, \\ p^{(i,j)}(0), & x = 0, \end{cases}$$

where $p^{(i,j)}(r) = P[R^{(i,j)} = r]$ and $F_{(i,j)}^{r*}$ is the $r$-fold convolution of $F^{(i,j)}$. The cdf $G^{(i,j)}$ often does not have an analytical expression. Therefore, Monte Carlo (MC) simulation is usually the favorite option for banks to simulate the loss amount at cell level, $X^{(i,j)}$.
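
As an illustration of this Monte Carlo step, the following sketch simulates one cell's aggregate loss as a compound Poisson–lognormal process and reads off $\mathrm{CaR}(\alpha) = G^{-1}(\alpha) - \mathrm{EL}$ at the 99.9% level; the frequency and severity parameters are arbitrary placeholders, not calibrated values.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_cell_aggregate_loss(lam, mu, sigma, n_sims=200_000):
    """Compound Poisson(lam) frequency with lognormal(mu, sigma) severities."""
    counts = rng.poisson(lam, size=n_sims)                    # R(i,j): events per simulated year
    severities = rng.lognormal(mu, sigma, size=counts.sum())  # Z_k(i,j)
    # Sum the severities belonging to each simulated year.
    year_index = np.repeat(np.arange(n_sims), counts)
    return np.bincount(year_index, weights=severities, minlength=n_sims)  # draws from G(i,j)

losses = simulate_cell_aggregate_loss(lam=5.0, mu=10.0, sigma=2.0)
alpha = 0.999
quantile = np.quantile(losses, alpha)     # G^{-1}(alpha)
expected_loss = losses.mean()             # EL
print("CaR(99.9%) =", quantile - expected_loss)
```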

The LDA approach is risk-sensitive in the sense that banks select their own loss severity distribution to estimate the CaR. In other words, it is possible to choose the favorite severity distribution according to the tail behavior. However, the LDA approach is still far from perfect. It gives no insight into the causes of the operational risks. Nyström (2002) claims that, under the LDA approach, operational risk is exogenous and out of the control of risk managers. Besides, the LDA approach relies heavily on data quality, which is obviously a problem here. For these reasons, banks often combine the LDA approach with another, qualitative, approach to estimate the CaR. For example, Bank F uses the LDA approach as its benchmark approach, but uses the scenario-based self-assessment approach as its main approach.

Qualitative Methods

Given poor historical operational data, many banks use qualitative approaches to measure operational risks. The qualitative approaches are usually based on experts' opinions instead of fitting historical data. Two of the most popular are the Scorecard Approach and the Scenario-based Approach.

Rather than a capital calculation approach, the scorecard approach is often used to allocate and adjust operational risk capital across business lines on a top-down basis10, and to recognize the bank's risk exposure in a forward-looking way.

The scenario-based approach (also known as sbAMA) is also a forward-looking approach. It models the (severity and frequency) distributions at the incident level by expert judgment instead of by fitting data. Specifically, experts express their opinions about the distribution (of severity and frequency) of the loss events caused by incidents through a few quantiles; by giving the quantiles qa and qb, a < b, an expert expresses the opinion that there is a probability of (b − a) that the loss amount (severity) lies in the interval (qa, qb). We should notice that not all experts' opinions are of the same quality, in the sense that some estimate the distribution more precisely than others. A calibration is needed in order to determine the influence of each expert on the final distribution. Cooke (1991)'s Classical model estimates the distribution as a convex combination of the quantiles given by the experts, with weights obtained in a calibration phase.

Both of the above qualitative approaches contain a great deal of "guess work", which makes them less reliable. Section 675 requires Bank F to use the sb-AMA as an approach to calculate the CaR. Bank F's Capital Requirement Modeling Engine (CARMEN) uses the scenario-based approach to estimate the severity and frequency distributions at the incident level, under the assumption that the severity of a loss event follows a lognormal distribution and the frequency follows a Poisson distribution. Appendix 3 shows how CARMEN applies the sb-AMA.

10 Some banks like Bank F allocate the operational risk capital on a top-down basis by gross income weights.


Chapter 4 A Semi-Parametric Approach

Operational risk losses come from many different sources, and the severity distributions can have a variety of shapes. The severity data can come from unknown and complex distributions, which makes it less reasonable to fit them with one simple distribution. LDA makes progress by preparing a group of simple distributions as candidates. However, this may still not be enough to provide a sufficient fit, and the severity estimate is then less trustworthy. As a parametric modeling approach, LDA may introduce a model error at an early stage by assuming the observations are realizations of the estimated severity distribution. Besides, there is another shortcoming of LDA: since it uses MLE to estimate the parameters of the severity distribution, it ignores the issue of "parameter uncertainty".

Bank F uses the LDA approach as the benchmark approach and the sb-AMA (used by CARMEN) to calculate the CaR. In the introduction, we stressed the importance of the estimation of the severity distribution for the CaR calculation. In Section 4.1, we discuss the problems with these two approaches and show that a semi-parametric approach can provide a further improvement. Section 4.2 provides the theoretical derivation and the asymptotic properties of the BLS approach. In Section 4.3, we compare the results of the BLS approach with those of the LDA benchmark approach in a simulation study.

Section 4.1 Problems with sbAMA and LDA


Distribution        Tail behavior             Number of parameters
Lognormal           Medium-tailed             2
Weibull             Light-tailed              2
Loglogistic         Slightly heavy-tailed     2
Pareto              Heavy-tailed              2
Loggamma            Heavy-tailed              2
Lognormal-Gamma     Medium to heavy-tailed    3
Burr                Light to heavy-tailed     3

Table 4.1: Candidates in Benchmark Approach

Compared to the sb-AMA of CARMEN, the LDA benchmark approach makes an obvious improvement in deciding the severity distribution: it no longer ignores the distinct tail behaviors, but prepares seven distribution candidates with a variety of tail behaviors. The seven candidates are lognormal, Weibull, loglogistic, Pareto, loggamma, lognormal-gamma and Burr; they are listed in Table 4.1. After estimating the parameters of each candidate by MLE, LDA selects, for each cell, the candidate that fits the loss data best; the model with the best GOF "score" wins. Panjer [2006] describes three GOF tests: the Kolmogorov-Smirnov (KS), Anderson-Darling (AD) and Chi-square GOF tests. It should be noticed that different GOF tests measure the overall fit in different ways, and one should not expect them to produce the same ranking for the set of candidate distributions. In our case, the fit in the upper tail is more important than the rest for the estimation of $G^{-1}(\alpha)$.

The (adjusted) AD test is the only one of the three whose test statistic is more sensitive to differences in the (upper) tail. The (adjusted) AD test statistic is

$$\mathrm{AD}^2_{up} = n \int \frac{\bigl(F_e(x) - F^*(x)\bigr)^2}{\bigl(1 - F^*(x)\bigr)^2}\, dF^*(x),$$

where $F_e(x)$ is the empirical cdf, $F^*(x)$ is the estimated cdf and $n$ is the number of observations.
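
One possible numerical implementation of this statistic is sketched below; it evaluates the integral after the substitution $u = F^*(x)$ on a fine grid, and the toy data and lognormal fit are purely illustrative.

```python
import numpy as np
from scipy import stats

def ad2_up(sample, fitted, n_grid=10_000):
    """AD2_up = n * integral (F_e(x) - F*(x))^2 / (1 - F*(x))^2 dF*(x),
    evaluated numerically with the substitution u = F*(x)."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    u = (np.arange(n_grid) + 0.5) / n_grid               # grid on (0, 1), avoiding u = 1
    q = fitted.ppf(u)                                    # x-values with F*(x) = u
    f_emp = np.searchsorted(x, q, side="right") / n      # empirical cdf F_e at those points
    integrand = (f_emp - u) ** 2 / (1.0 - u) ** 2
    return n * np.trapz(integrand, u)

# Toy usage: heavy-tailed data fitted with a lognormal by MLE.
rng = np.random.default_rng(0)
data = np.concatenate([rng.lognormal(10, 1.5, 250), rng.lognormal(12, 2.5, 50)])
shape, _, scale = stats.lognorm.fit(data, floc=0)        # lognormal MLE
print("AD2_up =", ad2_up(data, stats.lognorm(shape, loc=0, scale=scale)))
```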

Unless the severity data really come from one of these candidates, which is almost impossible in the real world, this kind of approach is rather likely to fail.

In order to improve the GOF, some suggest modifying LDA by fitting a two-component spliced model such as "lognormal + Generalized Pareto Distribution (GPD)". It seems a good idea to have one simple distribution fit the left body of the data and a GPD fit the right tail. However, there are two problems. The first comes from the ever-present issue in GPD estimation: the selection of the threshold value. It still relies on visual methods such as the sample mean excess plot to select the threshold, and in the case of operational risk losses, where the data size is often small, the result can be very unstable. The second problem actually stems from the first: Coleman [2002] shows that the estimated shape parameter becomes rather volatile for different threshold values. More importantly, he warns that the CaR is very sensitive to the value of the shape parameter. Another shortcoming of LDA relates to the size of the data. The size of the severity data differs from one event type to another; for example, the sample for heavy-impact/low-frequency scenarios is often smaller than that for small events. The LDA benchmark approach produces point estimates (MLE) for the unknown parameters; the MLE are the parameter values that maximize the likelihood of all observed losses. There is a degree of uncertainty in these estimated parameters, since we cannot access the whole population of losses. Shevchenko [2007] warns that the issue of parameter uncertainty tends to be more serious when the sample size is small; LDA does not take it into account. A popular way to account for parameter uncertainty is the Bayesian framework. In this thesis, however, we will not touch this issue, but focus on the model uncertainty of LDA.

Last but not least, both the sb-AMA and LDA are purely parametric modeling approaches. One of the major shortcomings of parametric modeling is that it may introduce model errors at an early stage by assuming the severity observations are realizations of the chosen distribution. In contrast, Nielsen [2007] points out that non-parametric modeling approaches make no assumption about the shape of the severity distribution. Instead, the estimate of the severity distribution is influenced purely by the observations; as a result, the non-parametric estimate is entirely agnostic about the shape. Therefore, a semi-parametric modeling approach can be the answer: it uses a non-parametric (kernel smoothing) technique but allows itself to be guided by prior knowledge of the distributional shape.

Having made clear the problems with the sb-AMA and LDA, we switch our attention to this semi-parametric approach. Buch-Larsen [2005] developed such a semi-parametric approach, which provides a better GOF and avoids introducing model errors too early.

Section 4.2 BLS Approach

There is more than one non-parametric approach to estimating the severity density; popular approaches include the histogram, the classical kernel density estimator, etc. This section shows how to derive the BLS density estimator from the classical kernel density approach; it also uses the asymptotic properties of BLS to show why BLS is a distinct but better non-parametric approach.

The histogram can be considered an estimator of the unknown density function $f(x)$. However, it has three undesirable drawbacks: it is not smooth, it depends on the end points of the bins, and it depends on the bin width. Compared to the histogram, the classical kernel density estimator overcomes the first two drawbacks by using a smooth kernel without end points instead of blocks, though it still depends on the bandwidth. We denote the kernel density estimator by $\hat f_h(x)$. Using a second-order Taylor expansion of $f(u)$ around $x$, the bias of $\hat f_h(x)$ is approximated as

$$\mathrm{Bias}\{\hat f_h(x)\} \approx \frac{h^2}{2}\, \mu_2(K)\, f''(x), \qquad \mu_2(K) = \int s^2 K(s)\, ds, \quad s = \frac{u - x}{h},$$

where $K(\cdot)$ is the kernel function and $h$ is the bandwidth. We see that the bias is proportional to $h^2/2$ and depends on the curvature $f''(x)$.

Panjer [2006] classifies the classical kernel density model as a data-dependent distribution. A data-dependent modeling method constructs the model without considering any prior information about the possible shape of the true distribution; the estimate depends completely on the observations and is therefore entirely uncertain about the shape. Many scholars believe it is necessary to combine prior information to "guide" this purely data-dependent approach. A widely accepted approach first transforms the raw data with the cumulative distribution function (cdf) of a chosen distribution, estimates the density function by a non-parametric method (kernel density estimation), and then transforms back with the same cdf. The BLS approach is an example.

According to Buch-Larsen [2005], the cdf of the modified Champernowne distribution is used as the transformation function. In this thesis, we use the cdf of the favorite distribution chosen by the LDA benchmark approach as the transformation function, denoted $\hat T_{\hat\Theta}(\cdot)$. The BLS algorithm is as follows:

(i) Apply the LDA benchmark method: choose the favorite distribution and estimate its parameter vector $\hat\Theta$ by MLE.

(ii) Transform the data set $X_i$, $i = 1, \dots, n$, with the transformation function $\hat T_{\hat\Theta}(\cdot)$: $Y_i = \hat T_{\hat\Theta}(X_i)$, $i = 1, \dots, n$.

(iii) Calculate the classical kernel density estimator based on the transformed data $Y_i$, $i = 1, \dots, n$:
$$\hat f_{tran}(y) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{y - Y_i}{h}\right),$$
where $K(\cdot)$ is the kernel function and $h$ is the bandwidth.

(iv) The estimator of the density of the original data set $X_i$, $i = 1, \dots, n$, is
$$\hat f(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{\hat T_{\hat\Theta}(x) - \hat T_{\hat\Theta}(X_i)}{h}\right) \hat T'_{\hat\Theta}(x).$$

Here $\hat T'_{\hat\Theta}(x)$ stands for the first derivative of $\hat T_{\hat\Theta}(\cdot)$ with respect to $x$; it is therefore the density function of the favorite distribution. From step (iii) to step (iv), the kernel density estimator of the transformed data $Y_i$ is transformed back to that of the raw data $X_i$; this step can be justified by extending Theorem 4.25 in Panjer [2006].

Does the kernel density estimation perform better after being "guided" by LDA? According to Buch-Larsen [2005], the variance of the kernel density estimator can be approximated as

$$\mathrm{Var}\{\hat f_h(x)\} = \frac{1}{nh}\, \|K\|_2^2\, f(x) + o\!\left(\frac{1}{nh}\right), \qquad \|K\|_2^2 = \int K^2(s)\, ds.$$

L. Yang [2000] shows that the kernel density estimator asymptotically follows a normal distribution:

$$\sqrt{nh}\,\bigl(\hat f_h(x) - E\{\hat f_h(x)\}\bigr) \sim N\bigl(0,\ \|K\|_2^2\, f(x)\bigr).$$

By applying a cdf as the transformation function, L. Yang shows a more stabilized $\hat f_h(x)$: the kernel density estimator converges to a normal distribution with a smaller standard error,

$$\sqrt{nh}\,\bigl(\hat f_h(x) - E\{\hat f_h(x)\}\bigr) \sim N\!\bigl(0,\ \|K\|_2^2\, f(x)\, T'(x)\bigr).$$

Therefore, BLS provides a more stable density estimate than the classical kernel density estimator. As to the question whether the BLS approach outperforms the LDA benchmark approach, we provide empirical evidence in the next section.

Section 4.3 Simulation Study

As we discussed in previous sections, LDA may introduce a model error too early by assuming the data are a realization of the chosen candidate. BLS is a semi-parametric approach that works under prior knowledge of the distributional shape; this knowledge is contained in the cdf of the severity distribution estimated by LDA. The data transformation makes BLS a different and better non-parametric method than classical kernel density estimation. Is BLS a better approach than LDA? We apply a GOF test to check the quality of the fit. Besides, we should notice that it is not enough to look at the GOF test of a single loss sample. The GOF of one loss sample may indicate that one approach provides a better fit than another, but this may not hold for the entire population; if the GOF test statistic is considered a parameter, there is uncertainty in its estimate. Therefore, the stability of the "score" is also very important, besides the "score" itself.

There are two major obstacles in comparing the LDA benchmark approach with the BLS approach: (1) parameter uncertainty exists due to the limited size of the severity data, and it causes uncertainty in the GOF test statistic as a result; (2) the true severity distribution is unknown. In the first part of this section, we frame an algorithm to overcome these two obstacles. We give the visual and empirical results in the later parts.

Algorithm

To deal with the parameter uncertainty, resampling is a common way. This method generates many data samples of the same size from the true distribution (or its estimate, the empirical cdf) and calculates the parameter estimate for each sample in order to obtain the distribution of the estimates. Given the severity observations $x_1, x_2, \dots, x_n$ as realizations of independent severity random variables with common distribution $S$, we want to investigate the variability and sampling distribution of the $\mathrm{AD}^2_{up}$ test statistic. For a targeted method (say the LDA method), we denote the corresponding $\mathrm{AD}^2_{up}$ test statistic by $\hat\theta$, which is a function of the random variables $X_1, X_2, \dots, X_n$. Its distribution often has a complicated form. Thanks to the Monte Carlo approach, we can approximate it by simulation: we generate a large number, $NN$, of samples of size $n$ from $S$, and for each sample we compute the value of $\hat\theta$, giving a sequence $\theta_1^*, \theta_2^*, \dots, \theta_{NN}^*$. The empirical distribution of $\theta_1^*, \dots, \theta_{NN}^*$ is an approximation of the sampling distribution of $\hat\theta$, and the sample mean $\bar\theta^*$ and sample standard deviation $\mathrm{std}(\theta^*)$ of the resulting values are good estimates of its location and spread. How well a method fits $S$ relies on these two quantities: the sample mean $\bar\theta^*$ indicates how small the $\mathrm{AD}^2_{up}$ test statistic is (i.e. how good the "score" is), and $\mathrm{std}(\theta^*)$ shows the volatility of the $\mathrm{AD}^2_{up}$ test statistic (i.e. whether the "score" is stable). A good estimation method should report both a lower $\bar\theta^*$ and a lower $\mathrm{std}(\theta^*)$. This solves the first obstacle.
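
The resampling scheme might be coded as follows; the $\mathrm{AD}^2_{up}$ routine is inlined in a crude numerical form and a lognormal MLE stands in for "the targeted method", so the numbers are only indicative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def ad2_up_crude(sample, fitted, n_grid=2_000):
    """Numerical AD2_up = n * int (F_e - F*)^2 / (1 - F*)^2 dF* (same statistic as above)."""
    x = np.sort(sample)
    u = (np.arange(n_grid) + 0.5) / n_grid
    f_emp = np.searchsorted(x, fitted.ppf(u), side="right") / x.size
    return x.size * np.trapz((f_emp - u) ** 2 / (1 - u) ** 2, u)

def bootstrap_scores(population, n, NN=200):
    """Resample NN samples of size n and return mean and std of the AD2_up 'score'
    for a lognormal fitted by MLE (the LDA benchmark stand-in)."""
    scores = np.empty(NN)
    for b in range(NN):
        sample = rng.choice(population, size=n, replace=True)   # bootstrap resample
        shape, _, scale = stats.lognorm.fit(sample, floc=0)     # refit on each resample
        scores[b] = ad2_up_crude(sample, stats.lognorm(shape, loc=0, scale=scale))
    return scores.mean(), scores.std(ddof=1)

population = np.exp(rng.standard_normal(10_000))   # placeholder "true" severity population
print(bootstrap_scores(population, n=300))
```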

After solving the parameter uncertainty issue, we face the second obstacle: the unknown true severity distribution. There are three circumstances in estimating the severity distribution. In circumstance 1, the true severity distribution is exactly the same as one of the distribution candidates of the Benchmark. In circumstance 2, the chosen candidate is close, but not identical, to the true severity distribution. In circumstance 3, the chosen candidate is only relatively close to the true distribution. To simplify the problem, we fix "lognormal" as the chosen candidate. We propose two distributions as the true distributions for circumstances 2 and 3 respectively: a Champernowne$(M, \alpha)$ distribution and a two-point mixture $p \cdot \mathrm{lognormal}(\mu, \sigma) + (1-p) \cdot \mathrm{Pareto}(\alpha, \theta)$. The Champernowne$(M, \alpha)$ distribution has the favorable characteristic of being quite close to the lognormal in the body while converging to a Pareto distribution in the tail. By controlling the parameter values, we make the Champernowne$(M, \alpha)$ close to the chosen candidate lognormal$(0, 1)$, and the two-point mixture less close to lognormal$(0, 1)$. As a result, we obtain the two true severity distributions "Champernowne$(M = 0.9985008, \alpha = 1.741511)$" and "$p \cdot \mathrm{lognormal} + (1-p) \cdot \mathrm{Pareto}$ with $(p, \mu, \sigma, \alpha, \theta) = (0.7, 0, 1, 1, 1)$".

Two-point mixture $p \cdot \mathrm{lognormal}(\mu, \sigma) + (1-p) \cdot \mathrm{Pareto}(\alpha, \theta)$:
$$f(x) = p\, \frac{1}{x \sigma \sqrt{2\pi}}\, e^{-\frac{(\log x - \mu)^2}{2\sigma^2}} + (1-p)\, \frac{\alpha\, \theta^{\alpha}}{(x + \theta)^{\alpha + 1}}, \qquad (p, \mu, \sigma, \alpha, \theta) = (0.7,\, 0,\, 1,\, 1,\, 1).$$

Champernowne$(M, \alpha)$:
$$f(x) = \frac{\alpha\, x^{\alpha - 1} M^{\alpha}}{\bigl(x^{\alpha} + M^{\alpha}\bigr)^2}, \qquad (M, \alpha) = (0.9985008,\, 1.741511).$$

Table 4.3: The two true distributions
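
The two "true" populations can be generated by inverse-transform sampling; the Champernowne draw inverts $T(x) = x^{\alpha}/(x^{\alpha} + M^{\alpha})$ and the Pareto component uses the Lomax form of Table 4.3. A sketch with the parameter values of Table 4.3:

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_mixture(size, p=0.7, mu=0.0, sigma=1.0, alpha=1.0, theta=1.0):
    """Two-point mixture: lognormal(mu, sigma) with prob p, Pareto(alpha, theta) with prob 1-p."""
    from_lognormal = rng.random(size) < p
    lognormal_part = rng.lognormal(mu, sigma, size)
    pareto_part = theta * ((1.0 - rng.random(size)) ** (-1.0 / alpha) - 1.0)  # inverse cdf
    return np.where(from_lognormal, lognormal_part, pareto_part)

def sample_champernowne(size, M=0.9985008, alpha=1.741511):
    """Invert T(x) = x^alpha / (x^alpha + M^alpha): x = M * (u / (1 - u))^(1/alpha)."""
    u = rng.random(size)
    return M * (u / (1.0 - u)) ** (1.0 / alpha)

population_c2 = sample_champernowne(10_000)   # circumstance 2: close to lognormal(0, 1)
population_c3 = sample_mixture(10_000)        # circumstance 3: less close to lognormal(0, 1)
```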

There are also three situations in terms of the size of the observation data: small samples (size = 30), medium samples (size = 300), and large samples (size = 1000). We randomly generate a population of 10000 observations from each of the two aforementioned distributions. By bootstrap resampling NN times from each population, we artificially create NN small samples (size = 30), NN medium samples (size = 300) and NN large samples (size = 1000). For each resampled sample, we apply both the modified Buch-Larsen approach (with the lognormal cdf as the transformation function) and the LDA benchmark approach and obtain the adjusted Anderson-Darling test statistic. Consequently, we obtain NN pairs of test statistics under each situation and circumstance.

              Circumstance 2                                        Circumstance 3
Situation 1   candidate close to true distribution, size = 30       candidate not close to true distribution, size = 30
Situation 2   candidate close to true distribution, size = 300      candidate not close to true distribution, size = 300
Situation 3   candidate close to true distribution, size = 1000     candidate not close to true distribution, size = 1000

Table 4.4: Situations and Circumstances

A better approach is one that reports a lower sample mean and a lower standard deviation of the GOF test statistics; in other words, it should consistently provide better "scores" across the different situations and circumstances.

Visual Result ---Circumstance 1

When the true severity distribution is exactly the chosen candidate, the Benchmark gives a very precise estimation and clearly outperforms BLS. However, we will not be so lucky with the Benchmark in practice, since a situation like this rarely happens in real life.

Figure 4.5: The density function of the true distribution is in red; the estimated density function by the Benchmark approach is in blue; that by Buch-Larsen's approach (with the cdf of the estimated lognormal as transformation function) is in green.

---Circumstance 2 and 3

In real life, it is almost impossible to have the raw severity data fitted by one simple distribution; it is often the case that one distribution fits the body of the data but not the tail, while another fits the tail but not the body. Suppose the true distribution is either the Champernowne or the two-point mixture distribution. The Champernowne distribution converges to the lognormal distribution in the body and to the Pareto distribution in the right tail; the two-point mixture distribution has 70% of its mass inherited from lognormal(0, 1). These features lead the LDA benchmark approach to choose "lognormal" to estimate the density function. In other words, the severity distribution is wrongly decided.

Figure 4.6: GOF by BLS and LDA (circumstance 2). The density function of the true distribution, Champernowne(0.9985008, 1.741511), is in red; the estimated density by the Benchmark approach (pure parametric lognormal) is in blue; that by the BLS approach (lognormal transformation) is in green.

Figure 4.7: GOF by BLS and LDA (circumstance 3). The density function of the true distribution, 0.7·lognormal + 0.3·Pareto, is in red; the estimated density by the Benchmark approach (pure parametric lognormal) is in blue; that by the BLS approach (lognormal transformation) is in green.

Although the results of the visual tests favor the BLS approach, it is not safe, based on these results alone, to claim that BLS outperforms LDA. To draw such a conclusion, we also need to consider the stability of its performance, i.e. whether BLS consistently reports a better "score". Therefore, we follow the algorithm from the beginning of this section.

Empirical Result

The results are in the table below. When the size of the observation data is small and the chosen candidate is close to the true distribution, the difference in GOF between the BLS and LDA benchmark approaches is very small; under both approaches, the GOF test statistics are reasonably low and stable (see the row Situation 1 / Circumstance 2). However, BLS shows a better GOF than the LDA benchmark approach when the chosen candidate is less close to the true distribution (see the row Situation 1 / Circumstance 3).

                               mean_BLS   std_BLS   mean_LDA   std_LDA
Situation 1 / Circumstance 2      2.958     1.980      2.858     2.197
Situation 1 / Circumstance 3      2.251     1.174      3.585     1.987
Situation 2 / Circumstance 2      5.773     3.638      8.665     7.170
Situation 2 / Circumstance 3      3.691     0.613     18.107     9.674
Situation 3 / Circumstance 2      9.586     2.746     16.424    10.139
Situation 3 / Circumstance 3     13.472     2.591         NA        NA

Table 4.7: Results of the AD2up test statistics

As the size of the observation data increases from 30 to 300 and then to 1000, the GOF test statistics under both approaches deteriorate, but to different extents: the "scores" deteriorate faster under LDA than under BLS. Whether or not the chosen candidate is close to the true distribution, the BLS approach keeps outperforming the LDA benchmark approach by providing lower and more stable GOF test statistics.

Conclusion

To sum up, we give the following two comments based on the above results:

1. When the sample size is small, both methods provide a similar GOF. Yet, when the sample size gets larger, BLS starts outperforming the Benchmark method, providing a better and more stable fit to the data in the general case.


Chapter 5 Constituting Dependent Operational Risks

After obtaining the marginal distributions of the aggregate losses, there are two ways to calculate the minimum regulatory capital according to Section 669. The first way takes $\mathrm{CaR}_{bank}$ as the simple sum of the $\mathrm{CaR}_{i,j}$; Frachot [2001] shows that this is equivalent to assuming perfect positive dependence: the worst scenarios hit the bank hand in hand. The second way recognizes the diversification effect arising from the imperfect dependence between the incidents. Banks are allowed to use an internally determined dependence measure to capture it.

Banks tend to take the normality assumption for granted when modeling operational risk. The Cholesky Decomposition (CD) method is a method under the normality assumption, and it uses linear correlation to capture the dependence structure. We believe the normality assumption may be problematic due to the heavy-tailed feature of the marginal distributions and the strong upper tail dependence. When this assumption is not satisfied, linear correlation should not be employed; a better dependence measure, rank correlation, should be used in the general situation. The Iman-Conover (IC) method is such a method, applying rank correlation. However, it is still a question whether the IC method is able to empirically outperform the CD method.

In Section 5.1, we show that the CD method amounts to playing the Cholesky trick on the linear (covariance) correlation. In order to explain the IC method more clearly, Section 5.2 introduces the copula concept and Wang's normal copula approach, and shows that the IC method does the same as the normal copula approach. Section 5.3 discusses how the dependence is measured and why rank correlation is a theoretically more robust dependence measure than linear correlation. In Section 5.4, we apply the two methods in a case study and make an empirical comparison.

Section 5.1 Cholesky Decomposition (CD) Method


covariance matrix of the standardization results, CARMEN actually applies the CD method given the values of the (sample) mean vector and correlation matrix.

Suppose $X = (X_1, \dots, X_d)'$ follows a multivariate normal distribution such that $X = \mu + AZ$, where $Z = (Z_1, \dots, Z_k)'$ is a vector of iid univariate standard normal rvs, $A \in \mathbb{R}^{d \times k}$ and $\mu \in \mathbb{R}^d$. Denote the covariance matrix by $\Sigma = AA'$, which is both symmetric and positive definite. Then there is exactly one lower triangular matrix $L = (l_{ij})$ with $l_{jj} > 0$ for all $j$ such that $\Sigma = LL'$.

We give an example of how the lower triangular matrix can be solved by setting $d = 3$, with

$$\Sigma = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.$$

It is then possible to solve the following equations for the $l_{ij}$:

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{pmatrix} \cdot \begin{pmatrix} l_{11} & l_{21} & l_{31} \\ 0 & l_{22} & l_{32} \\ 0 & 0 & l_{33} \end{pmatrix}.$$

McNeil [2005] gives the following method for generating a multivariate normal distribution with the CD method:

Algorithm 3.2
(1) Perform a Cholesky decomposition of $\Sigma$ to obtain the Cholesky factor $\Sigma^{1/2} = L$.
(2) Generate a vector $Z = (Z_1, \dots, Z_d)'$ of independent standard normal variables.
(3) Set $X = \mu + \Sigma^{1/2} Z$.
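
Algorithm 3.2 translates directly into a few lines of numpy; the mean vector and covariance matrix below are arbitrary examples.

```python
import numpy as np

rng = np.random.default_rng(5)

def multivariate_normal_cholesky(mu, cov, n_sims):
    """McNeil's Algorithm 3.2: X = mu + L Z with L the Cholesky factor of the covariance."""
    L = np.linalg.cholesky(cov)                      # step (1): lower triangular, cov = L L'
    Z = rng.standard_normal((len(mu), n_sims))       # step (2): iid standard normals
    return (np.asarray(mu)[:, None] + L @ Z).T       # step (3): X = mu + Sigma^{1/2} Z

mu = [0.0, 0.0, 0.0]
cov = np.array([[1.0, 0.5, 0.2],
                [0.5, 1.0, 0.3],
                [0.2, 0.3, 1.0]])
X = multivariate_normal_cholesky(mu, cov, 50_000)
print(np.corrcoef(X, rowvar=False).round(2))         # recovers the target correlation
```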

Given the value of the correlation matrix $T$ and the simulated aggregate losses, CARMEN combines the target correlation matrix $T$ with the sample standard deviations $\sigma_i$, $i = 1, \dots, NM$, of the simulated aggregate losses. As a result, CARMEN obtains the covariance matrix $\Sigma^*$ and is ready to follow Algorithm 3.2. By standardizing the simulated aggregate losses, CARMEN obtains the vector $Z^*$. However, $Z^*$ may not be a standard normal vector, due to the heavy-tailed feature of the simulated aggregate losses.

The whole CD method is founded on the linear transformation property of the normal distribution: any normally distributed r.v. can be expressed as a linear combination of the mean, the variance and a standard normally distributed r.v. However, this is no longer the case when the r.v. does not follow the normal distribution!

Section 5.2 Iman-Conover Method

The normality assumption for operational losses has been doubted for two reasons: the heavy-tailed feature of the marginal losses and the strong upper tail dependence. The heavy-tailed feature is inherited from the severity distribution, which we discussed in the previous chapter. The upper tail dependence is determined by the dependence structure. In order to get a clear idea of the dependence, we introduce the copula concept in this section. We also show that the IC method is in fact no different from Wang's normal copula method; both apply the normal copula as the dependence structure.

Copula Concept

The joint distribution of the aggregate losses implicitly contains two pieces of information: one describes the marginal distributions; the other describes the dependence structure. As noted in the introduction, one of the main flaws of correlation is that it is influenced by the marginal distributions. A good dependence measure should isolate the dependence structure from the marginal distributions. The copula approach does so by expressing the dependence on a quantile scale.

Theorem 5.3 (Sklar 1959)

Let $F$ be a joint distribution function with margins $F_1, \dots, F_d$. Then there exists a copula $C: [0,1]^d \to [0,1]$ such that, for all $x_1, \dots, x_d$ in $\overline{\mathbb{R}} = [-\infty, \infty]$,

$$F(x_1, \dots, x_d) = C\bigl(F_1(x_1), \dots, F_d(x_d)\bigr).$$

If the margins are continuous, then $C$ is unique; otherwise $C$ is uniquely determined on $\mathrm{Ran}\,F_1 \times \mathrm{Ran}\,F_2 \times \cdots \times \mathrm{Ran}\,F_d$, where $\mathrm{Ran}\,F_i = F_i(\overline{\mathbb{R}})$ denotes the range of $F_i$.

There are three basic categories of copulas: fundamental copulas, implicit copulas and explicit copulas. The implicit copulas, including the normal and t copulas, are implied by well-known multivariate distribution functions and do not have analytical forms. In market risk modeling, the copula of jointly distributed log returns can be estimated empirically from the well-recorded data of every trading day. In operational risk modeling, one big incident often influences several business lines, so the operational losses they report tend to cluster together; the reference distribution therefore tends to follow a copula with strong tail dependence. However, it is impossible to estimate the copula of the jointly distributed aggregate losses in the same way as in market risk modeling, since operational risk events do not occur every day and are badly reported. Although we are quite sure about the strong tail dependence between some operational losses, we choose the normal copula as the reference distribution. This is another major limitation of this thesis.

Normal Copula approach

Under the normal copula assumption, Wang [2005] shows how to use the concept of the normal copula to generate jointly distributed rvs, given their marginal distributions and the rank correlations between them. The following theorem plays the core role in Wang's normal copula approach.

Theorem 6.1 Assume that $(Z_1, \dots, Z_k)$ has a multivariate normal joint probability density function given by

$$f(z_1, \dots, z_k) = \frac{1}{\sqrt{(2\pi)^k |\Sigma|}} \exp\!\left(-\tfrac{1}{2} z' \Sigma^{-1} z\right), \qquad (1.1)$$

with $z = (z_1, \dots, z_k)'$ and correlation coefficients $\Sigma_{ij} = \rho_{ij} = \rho(Z_i, Z_j)$. Let $H(z_1, \dots, z_k)$ be their joint cumulative distribution function. Then

$$C(u_1, \dots, u_k) = H\bigl(\Phi^{-1}(u_1), \dots, \Phi^{-1}(u_k)\bigr) \qquad (1.2)$$

defines a multivariate uniform cumulative distribution function called the normal copula.

For any set of given marginal cumulative distribution functions $F_1, \dots, F_k$, the set of variables

$$X_1 = F_1^{-1}\bigl(\Phi(Z_1)\bigr), \ \dots, \ X_k = F_k^{-1}\bigl(\Phi(Z_k)\bigr) \qquad (1.3)$$

has joint cumulative distribution function

$$F_{X_1, \dots, X_k}(x_1, \dots, x_k) = H\bigl(\Phi^{-1}(F_1(x_1)), \dots, \Phi^{-1}(F_k(x_k))\bigr) \qquad (1.4)$$

with marginal cumulative distribution functions $F_1, \dots, F_k$. The multivariate variables $(X_1, \dots, X_k)$ have Kendall's tau

$$\tau(X_i, X_j) = \tau(Z_i, Z_j) = \frac{2}{\pi} \arcsin(\rho_{ij}) \qquad (1.5)$$

and Spearman's rank correlation coefficients

$$\mathrm{rkCorr}(X_i, X_j) = \mathrm{rkCorr}(Z_i, Z_j) = \frac{6}{\pi} \arcsin(\rho_{ij}/2). \qquad (1.6)$$

Theorem 6.1 defines the concept of the normal copula in (1.2). More importantly, by inserting (1.3) into (1.4), we obtain jointly distributed rvs $(X_1, \dots, X_k)$ sharing the same rank correlation as the known multivariate normal rvs $(Z_1, \dots, Z_k)$.

Although the normal copula does not have an analytic expression, Wang shows that it can be generated by a simple Monte Carlo algorithm with the help of the Cholesky decomposition. Suppose we are given the values of the rank correlations and the cdfs of the marginal distributions of $(X_1, \dots, X_k)$. First, we convert the rank correlations into linear correlation coefficients according to (1.5), and play the Cholesky decomposition trick to obtain the lower triangular matrix $B$. Then we generate a column vector of independent standard normal variables $Y = (Y_1, \dots, Y_k)'$. Taking the matrix product of $B$ and $Y$, we obtain a column vector $Z = (Z_1, \dots, Z_k)' = BY$ with the desired linear correlation. Finally, we apply (1.3) to obtain $(X_1, \dots, X_k)$.
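
A sketch of Wang's recipe in Python, assuming Spearman rank correlations as the input (converted to linear correlations via (1.6)) and lognormal marginals chosen purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)

def normal_copula_sample(rank_corr, marginal_ppfs, n_sims):
    """Wang's construction: rank correlation -> linear correlation via (1.6),
    Cholesky factor B, Z = B Y, then X_i = F_i^{-1}(Phi(Z_i)) as in (1.3)."""
    rho = 2.0 * np.sin(np.pi * np.asarray(rank_corr) / 6.0)   # invert rkCorr = (6/pi) arcsin(rho/2)
    B = np.linalg.cholesky(rho)
    Y = rng.standard_normal((rho.shape[0], n_sims))           # independent standard normals
    Z = B @ Y                                                 # correlated standard normals
    U = stats.norm.cdf(Z)                                     # uniform margins via Phi
    return np.vstack([ppf(U[i]) for i, ppf in enumerate(marginal_ppfs)]).T

rank_corr = np.array([[1.0, 0.4],
                      [0.4, 1.0]])                            # assumed Spearman matrix
marginals = [stats.lognorm(1.0, scale=np.exp(10)).ppf,        # illustrative cell-level losses
             stats.lognorm(2.0, scale=np.exp(9)).ppf]
X = normal_copula_sample(rank_corr, marginals, 100_000)
print(stats.spearmanr(X[:, 0], X[:, 1]).correlation)          # close to the 0.4 target
```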


IC approach: doing the same as Normal Copula approach

The algorithm of the IC approach from Mildenhall can be found in Appendix 2. The core of the IC method is also built on Theorem 6.1 in Wang [1998]. Mildenhall has shown that the IC method is essentially the same as the normal copula method. The normal copula method simulates $H$ and then inverts using (1.3). In contrast, the IC method first produces a sample from $H$ such that the $\Phi(z_i)$ are equally scattered between 0 and 1, and then makes the $j$-th order statistic of the input sample correspond to $\Phi(z) = j/(n+1)$, where $n$ is the input sample size. Because the $j$-th order statistic of a sample of $n$ observations from a distribution $F$ approximates $F^{-1}(j/(n+1))$, the IC method does the same as the normal copula method. However, we prefer the IC method to the normal copula method because it does not require an analytical expression for $F$, which is hard to obtain in this case.

Figure 5.2: Flow Chart of IC method


As Figure 5.2 shows, the output $X = (X_1^*, \dots, X_k^*)$ is the reordered input sample: its marginal distributions are unchanged, and its rank correlation coefficients equal those of a multivariate normal distribution $Z = (Z_1, \dots, Z_k)$. More importantly, this output has approximately the desired linear correlation coefficients, since the rank correlation is generally very close to the linear correlation. The key to whether the IC method returns the desired linear correlation is the closeness between the linear correlation and the rank correlation of the losses: the more symmetric the marginal distributions are, the closer the rank correlation is to the linear correlation. The two figures below show how close the rank correlation of two r.v.'s can be to their linear correlation: the left figure shows that they are almost equal in the case of the normal distribution, a symmetric distribution; the right one shows that they can differ when the distribution is heavy-tailed.
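
A compact sketch of the reordering itself, assuming the marginal loss samples have already been simulated and a target correlation matrix is given; the refinement that corrects for the sampling correlation of the normal scores is omitted.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(13)

def iman_conover(samples, target_corr):
    """Reorder each column of `samples` so that the marginals stay unchanged but the
    output inherits the rank order of a correlated multivariate normal score matrix."""
    n, k = samples.shape
    # Normal "scores" with the desired correlation (via Cholesky, as in the normal copula).
    L = np.linalg.cholesky(target_corr)
    scores = rng.standard_normal((n, k)) @ L.T
    ranks = scores.argsort(axis=0).argsort(axis=0)        # rank of each score within its column
    out = np.empty_like(samples)
    for j in range(k):
        out[:, j] = np.sort(samples[:, j])[ranks[:, j]]   # j-th order statistic placed by rank
    return out

# Toy usage: two heavy-tailed marginal loss samples, target correlation 0.4.
samples = np.column_stack([rng.lognormal(10, 2, 50_000), rng.pareto(1.5, 50_000) * 1e4])
target = np.array([[1.0, 0.4], [0.4, 1.0]])
reordered = iman_conover(samples, target)
print(stats.spearmanr(reordered[:, 0], reordered[:, 1]).correlation)
```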


Section 5.3 Measuring Dependence

Both the CD method and the IC method need linear correlations as input. However, the unstructured linear correlation between the losses of the cells is hard to estimate quantitatively, due to the lack of historical data and the large number of cells: there are too few data but too many parameters to estimate. By assuming the real linear correlation (covariance) matrix to be a Kronecker product of two smaller correlation matrices, we cut down the number of estimates.

Although both use linear correlation, the CD method uses it as the dependence measure, while the IC method regards it just as a "target": the IC method uses rank correlation as the dependence measure, relying on the fact that the numeric value of the linear correlation is often close to that of the rank correlation.

In this section, we first show how Bank F estimates the correlation by a qualitative method. Then we analyze the flaws of linear correlation as a dependence measure. In contrast, the advantages of using rank correlation make the IC method theoretically more reasonable.

To frame the target correlation by qualitative method

Most banks are still using linear correlation as the dependence measure. Lacking historical operational loss data, banks are forced to come up with the correlation through qualitative methods. A rather popular way is to construct a structured correlation matrix as a Kronecker product of two unstructured matrices, in order to decrease the number of parameters to be estimated. Given an NM-dimensional, normally distributed random vector X, estimating its covariance matrix $\Lambda$ requires $\tfrac{1}{2}NM(NM+1)$ parameters. As an alternative, we could consider $\Lambda$ a Kronecker product:

$$\Lambda = \Psi \otimes \Sigma, \qquad \Psi = (\psi_{ij}): M \times M, \quad \Sigma = (\sigma_{ij}): N \times N.$$


By doing so, banks only need to estimate $\tfrac{1}{2}M(M+1) + \tfrac{1}{2}N(N+1) - 1$ parameters.
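As a quick illustration of this reduction (the dimensions N = 8 and M = 7 correspond to the standard 56-cell Basel matrix; the sketch itself is ours, not Bank F's procedure), the counts and the construction of the structured matrix in R are:

```r
# Parameter count: unrestricted NM x NM covariance vs. Kronecker structure
N <- 8; M <- 7                                        # e.g. 8 business lines, 7 event types
full_params <- (N * M) * (N * M + 1) / 2              # 1596 free parameters
kron_params <- M * (M + 1) / 2 + N * (N + 1) / 2 - 1  # 63 (one parameter lost to the scale constraint)
c(full = full_params, kronecker = kron_params)

# The structured matrix itself is assembled with kronecker()
Psi    <- diag(M)                                     # placeholder for the estimated M x M matrix
Sigma  <- diag(N)                                     # placeholder for the estimated N x N matrix
Lambda <- kronecker(Psi, Sigma)                       # (NM) x (NM)
dim(Lambda)                                           # 56 x 56
```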

Bank F distinguishes two types of correlation in operational risk: Functional correlation and Timing correlation. Functional correlation relates to dependencies within one value chain; a single event can result in multiple losses. Timing correlation concerns those types of operational losses that have a tendency to occur simultaneously, or those types of losses where the frequency and/or severity show some dependence during a certain period. Under some managerial guidelines, Bank F applies the zero-correlation assumption as far as the Functional correlation is concerned, i.e. the target correlation matrix relies only on the Timing correlation.


Figure 5.1: Framing the target correlation T

The target correlation T is the Kronecker product between the correlation matrix of Business Lines and the correlation matrix of Event Types

Besides the convenience of having far fewer parameters to estimate, this qualitative approach provides the bank with a forward-looking target correlation. In market risk modeling, empirical evidence shows that the correlation between contemporaneous stock returns varies over time. This may also be the case when modeling the aggregate losses of operational risks.

Flaws of Linear Correlation as the Dependence Measure

We should already be well aware that correlation is not a perfect measure of dependence: independence implies zero correlation, but the converse is not true. McNeil [2005] gives an example.

$$X = \mu + W A Z,$$
where (i) $Z \sim N_k(0, I_k)$; (ii) $W \ge 0$ is a non-negative, scalar-valued random variable, independent of $Z$; (iii) $A \in \mathbb{R}^{d \times k}$ and $\mu \in \mathbb{R}^d$.

Given cov(X1, X2) = 0, Lemma 3.5 states that X1 and X2 are independent if and only if W is almost surely constant; i.e. zero covariance does not necessarily mean independence. The second flaw is that correlations may not be defined when the second moment E(X^2) is undefined. This may become a problem when we are dealing with aggregate losses from heavy-tailed distributions, which may have infinite higher moments.
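A quick simulation in the spirit of McNeil's example, with our own choice W^2 ~ Exp(1), A = I and μ = 0, shows components that are uncorrelated yet clearly dependent:

```r
set.seed(3)
n <- 100000
Z <- matrix(rnorm(n * 2), n, 2)   # Z ~ N_2(0, I): independent components
W <- sqrt(rexp(n))                # non-negative scalar mixing variable, independent of Z
X <- W * Z                        # normal variance mixture with A = I, mu = 0

cor(X[, 1], X[, 2])               # approximately 0: zero linear correlation
cor(abs(X[, 1]), abs(X[, 2]))     # clearly positive: X1 and X2 are not independent
```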

Last but not least, a good dependence measure should isolate the dependence structure from the marginal behavior: given strictly increasing functions (T1, ..., Td), a good dependence measure assigns to a random vector (X1, ..., Xd) the same dependence structure as to the random vector (T1(X1), ..., Td(Xd)). However, linear correlation is invariant only under strictly increasing linear transformations, and it works well only in the case of elliptical distributions. Unless the individual aggregate loss X(i,j) follows a normal distribution, the correlation matrix tends to work poorly. McNeil [2005] even claims that correlation on its own says little about dependence: he shows an example in which several distinct dependence structures share the same value of the linear correlation coefficient.

Linear correlation is simple to understand and easy to calculate, which is why it is still used in the banking sector as a dependence measure. Although Basel II does not mention any alternative measure, a better dependence measure exists.


As a desirable alternative, rank correlation is a simple scalar measure of dependence. There are three advantages of using rank correlation. First, rank correlation depends only on the copula of the jointly distributed aggregate losses, not on the marginal distributions. There are two main types of rank correlation, Kendall's tau ($\rho_\tau$) and Spearman's rho ($\rho_s$), and this feature is clear from McNeil [2005]: both are functions of the copula only.

Proposition 5.29 Suppose X1 and X2 have continuous marginal distributions and unique copula C. Then the rank correlations are given by:

$$\rho_\tau(X_1, X_2) = 4 \int_0^1 \!\! \int_0^1 C(u_1, u_2)\, dC(u_1, u_2) - 1,$$
$$\rho_s(X_1, X_2) = 12 \int_0^1 \!\! \int_0^1 \bigl( C(u_1, u_2) - u_1 u_2 \bigr)\, du_1\, du_2.$$

The second advantage is that rank correlation provides a way to calibrate a copula to empirical data. Although both Kendall's tau and Spearman's rho have complicated definitions, the standard empirical estimators of rank correlation can be calculated by looking at the ranks of the data. The third advantage is that rank correlation is not constrained by the issue of an undefined second moment.
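A brief R illustration of the second advantage (the data are simulated for illustration): the empirical Spearman coefficient is simply the linear correlation of the ranks, and Kendall's tau is available through the same interface.

```r
set.seed(4)
x <- rlnorm(1000)
y <- x + rlnorm(1000)              # dependent, heavy-tailed data

cor(rank(x), rank(y))              # Spearman's rho: linear correlation of the ranks ...
cor(x, y, method = "spearman")     # ... identical to the built-in estimator (no ties here)
cor(x, y, method = "kendall")      # Kendall's tau from the same interface
```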

These advantages make rank correlation a better dependence measure than linear correlation. Therefore, the IC method, which employs the former, is theoretically more reasonable than the CD method, which employs the latter.

Section 5.4 Case Study

After simulating the aggregate loss of each cell by the Monte Carlo approach, we can use the two methods to calculate the minimum regulatory capital, taking into account the imperfect dependence between the cells. The dependence is pre-specified in terms of the target correlation of Section 5.1. Although we have argued that the IC method is theoretically stronger than the CD method, we are not sure whether this is still the case empirically.


The internal data we have are based on experts' opinions and report 1470 incidents during a year. (See Appendix 4 for how these data are constructed.) To compare the IC method with the CD method, we single out 16 of them, from two business lines (named BL1 and BL2) and two event types (with ET_IDs EL0701 and EL0601). Consequently, we obtain four cells: cells 1, 2, 3 and 4.

(Table 5.4 from the appendix 5)

To further simplify the problem, we set the correlation coefficient between the business lines to 0.3 and the correlation coefficient between the event types to 0.55. Following the same approach as in Section 5.1, we obtain the target correlation matrix T:

Figure 5.5: Framing the target correlation matrix T
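For reference, a short R sketch of how this T is assembled from the two 2x2 blocks; the ordering of the four cells inside the Kronecker product is our assumption, and Figure 5.5 fixes the layout actually used.

```r
BL <- matrix(c(1, 0.30, 0.30, 1), 2, 2)   # correlation between the two business lines
ET <- matrix(c(1, 0.55, 0.55, 1), 2, 2)   # correlation between the two event types

T_target <- kronecker(BL, ET)             # 4 x 4 target correlation of the cells
T_target
#       [,1]  [,2]  [,3]  [,4]
# [1,] 1.000 0.550 0.300 0.165
# [2,] 0.550 1.000 0.165 0.300
# [3,] 0.300 0.165 1.000 0.550
# [4,] 0.165 0.300 0.550 1.000
```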

We shall make the judgment based on the following two criteria:

1) a good method should preserve the marginal distributions of the aggregate losses of the cells;


2) needless to say, a good method should produce a correlation close to our target correlation matrix.

Results

As we expected, the Imann-Conover (IC) method keeps the original simulation results of each cell exactly; all features of the marginal distributions are thus preserved in the joint aggregate loss of the cells. The Cholesky Decomposition (CD) method distorts the original densities quite badly (see the figure below). This is simply because the IC method only re-ranks the simulated results of each cell, whereas the CD method, under the assumption of normality, linearly transforms the simulated results in the way described in Section 5.2. However, the results of the Jarque-Bera test reject this normality assumption.

[Figure: kernel density plots of the aggregate losses of Cells 1-4; for each cell, the original simulated density, the density after the IC re-ordering, and the density after the CD transformation.]
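The two criteria above can also be checked mechanically. The sketch below assumes n x 4 matrices losses (the original simulations per cell), losses_ic and losses_cd (the outputs of the two methods) and the target matrix T_target; these names are hypothetical, and the Jarque-Bera test is taken from the tseries package.

```r
library(tseries)   # provides jarque.bera.test()

check_results <- function(losses, losses_ic, losses_cd, T_target) {
  # 1) marginals: IC only re-orders, so the sorted columns must be identical
  marginals_kept <- all(apply(losses, 2, sort) == apply(losses_ic, 2, sort))

  # 2) dependence: largest deviation of the achieved correlation from the target
  dev_ic <- max(abs(cor(losses_ic) - T_target))
  dev_cd <- max(abs(cor(losses_cd) - T_target))

  # normality per cell (the assumption underlying the CD method)
  jb_pvals <- apply(losses, 2, function(x) jarque.bera.test(x)$p.value)

  list(marginals_kept = marginals_kept,
       max_dev_ic = dev_ic,
       max_dev_cd = dev_cd,
       jb_pvalues = jb_pvals)
}
```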
