Copula Modeling for World’s Biggest Competitors

Alina Buike

Master’s Thesis to obtain the degree in Actuarial Science and Mathematical Finance

University of Amsterdam

Faculty of Economics and Business

Amsterdam School of Economics

Author: Alina Buike

Student nr: 11388633

Email: alinabuike@gmail.com

Date: January 15, 2018

Supervisor: Prof. Dr. R.J.A. (Roger) Laeven

Second reader: Dr. S. Umut Can


Statement of Originality

This document is written by Alina Buike who declares to take full responsibility for the contents of this document.

I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it.

The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


Abstract

Copula models are widely used in quantitative finance to describe dependence structures. Some copula families are applied in financial modelling more often than others. The well-known Gaussian copula tends to underestimate tail values, so other copulas must be studied. Copula modeling is intuitive in theory but still complex to apply in practice. There is a wide range of applications, for instance modelling Value-at-Risk, estimating the probability of credit defaults and constructing the asset dependence structure in portfolio analysis. This research considers selected industry competitors and tries to find the most applicable parametric copulas using R software.

Keywords: copula modeling, copula fitting, R, Gaussian copula, Student’s t-copula, Frank copula, Clayton copula, Gumbel copula, Joe copula, Clayton-Gumbel copula (BB1), Joe-Gumbel copula (BB6), Joe-Clayton copula (BB7), Joe-Frank copula (BB8).


1 Introduction
2 Dependence structures
2.1 Association measures
2.2 Defining Copulas
2.3 Families of Copulas
2.3.1 Gaussian copula
2.3.2 Joe copula
2.3.3 Student’s t-copula
2.3.4 Frank copula
2.3.5 Gumbel copula
2.3.6 Clayton copula
2.3.7 Clayton-Gumbel copula (BB1)
2.3.8 Joe-Gumbel copula (BB6)
2.3.9 Joe-Clayton copula (BB7)
2.3.10 Joe-Frank (BB8)
2.4 Meta Distributions
3 Data
3.1 Time dependence in stock returns
3.2 Choice of competitors
3.2.1 Marginal distributions
3.2.2 Association measures for competitors
4 Copula modeling using R
4.1 Model estimation and selection
4.2 R functions in copula modelling
4.2.1 Additional simulations
4.3 Competitors’ dependence structures
4.3.1 Copula modeling for Nike and Adidas
4.3.2 Copula modeling for Pepsi and Coca-Cola (Coke)
4.3.3 Copula modeling for Philip Morris and British American Tobacco
4.4 Resulting copulas for each competitor pair
4.5 Using Goodness-of-Fit tests
4.6 Remarks about empirical and theoretical copulas
4.7 Copulas in real life applications
5 Conclusion
A Appendix
B Appendix
C Appendix


1 Introduction

Our world is full of corporations creating a wide range of products and providing many services to their customers. Almost every company faces competition within its industry. Marketing, prices and innovation can be seen as the features defining the value of a firm. Consequently, any advance leads to competition with other companies in the industry. Ideally, brands want their businesses to grow so that they can beat the competition. One of the most fascinating observations is that some industries have only a few real competitors. Intuitively, this competitive atmosphere creates pressure not only for the leading brands themselves, but also for less popular brands within the same industry. The goal is to focus on the biggest competitors and understand how they are related to each other. If one company develops new products with extra features, the other brand is forced to build better products or services as well. This requires not only attention to a company's own product developments, but also constant monitoring of competitors’ strategies. As mentioned, this creates high pressure, and one could claim that the top companies are never independent; hence it is useful to understand their relation. Note that risks related to other brands’ strategies are as dangerous as risks within a company's own brand, because failures create a poor image, resulting in client dissatisfaction. However, risks coming from outside, such as general economic distress, would affect both firms no matter how they relate to each other.

Consequently, there are many risks to take into account, but not all of them can be properly defined. One approach to risk modelling is therefore to describe precisely the possible dependence structure between the most popular brands within an industry. If the relation between the two biggest corporations can be understood, then the behavior of the risks can be recognized without naming each risk explicitly. To describe the dependence structure between the world’s biggest competitors, the concept of copula modeling is introduced later.

Furthermore, this research assumes that the biggest competitors in different industries have unique dependence structures. Moreover, the dependence structure for each pair can be summarized in a parametric copula. One of the goals is therefore to test these hypotheses. There are many industries to think of, but this study is mostly based on three competitor pairs. The starting point for the choice of competitors is: Nike and Adidas (sports and clothing), Apple and Samsung (mobile phones and computers), Pepsi and Coke (non-alcoholic beverages), Novartis and Pfizer (pharmaceuticals), and Philip Morris and British American Tobacco (tobacco). Another objective is to look at the so-called Gaussian copula. Many experts tend to use the Gaussian copula, which in reality is very limited for financial data. Therefore, a comparison between the Gaussian copula and other copulas must be made.

The analysis is organized as follows. First, copulas are defined, with examples of several copula families. Second, the data are described, with additional comments about time dependence in stock returns. Next, copula modeling using R software is discussed, with concrete steps for working with parametric copulas and an explanation of goodness-of-fit tests. Some general remarks about empirical and theoretical copulas follow. To illustrate the benefit of copula models, some other real-life applications are introduced. The conclusion presents general notes and motivation for further copula developments.


2 Dependence structures

Association measures are one of the most common ways of describing the dependence structure between two or more random variables using rather simple intuitions. This section presents some association measures and then defines copulas with detailed explanations.

2.1 Association measures

A natural first step in assessing the dependence between random variables is to visualize them in a plot. Although a plot does not give concrete statistics and is mostly limited to bivariate cases, it is a simple way of showing common patterns and can provide some intuition. Statistically sound dependence measures can be obtained through correlations. According to Kaas [11], there are several types of correlation measures, namely Pearson’s correlation coefficient r, Spearman’s ρ, Kendall’s τ and Blomqvist’s β. Each of them can be calculated using the following formulas.

\[ r = \frac{E[XY] - E[X]\,E[Y]}{\sigma_X \sigma_Y} \tag{1} \]

\[ \rho = r(F(X), G(Y)) \tag{2} \]

\[ \tau = 2 \Pr[(X - X^*)(Y - Y^*) > 0] - 1, \quad \text{with } (X^*, Y^*) \sim (X, Y) \text{ independent} \tag{3} \]

\[ \beta = 2 \Pr[(X - \bar{x})(Y - \bar{y}) > 0] - 1, \quad \text{with } \bar{x} \text{ and } \bar{y} \text{ being medians} \tag{4} \]

Here $F$ and $G$ denote the marginal distribution functions of $X$ and $Y$. Notice that each of these measures produces a single number, so they are rather simplistic ways of describing dependence. Each rests on a different idea: linear correlation (Pearson’s correlation coefficient r), non-parametric rank correlation (Spearman’s ρ), ordinal association (Kendall’s τ) and concordance with the medians (Blomqvist’s β).
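As an illustration, the four measures in equations (1) to (4) can be estimated from a sample with their empirical counterparts. The sketch below is a minimal example in Python rather than the R workflow used later; the function names and the sample values are hypothetical, and the rank function assumes there are no ties.

```python
from statistics import mean, median, pstdev

def pearson(x, y):
    # Sample version of equation (1): covariance over product of std devs.
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))

def ranks(v):
    # Ranks 1..n; assumes no ties (simplification for this sketch).
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    # Equation (2): Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))

def kendall(x, y):
    # Equation (3): share of concordant pairs among all pairs.
    n, concordant, pairs = len(x), 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            pairs += 1
            if (x[i] - x[j]) * (y[i] - y[j]) > 0:
                concordant += 1
    return 2 * concordant / pairs - 1

def blomqvist(x, y):
    # Equation (4): concordance with the sample medians.
    mx, my = median(x), median(y)
    same_side = sum((a - mx) * (b - my) > 0 for a, b in zip(x, y))
    return 2 * same_side / len(x) - 1

# Hypothetical sample, for illustration only.
x = [0.1, 0.4, 0.2, 0.9, 0.5, 0.7]
y = [0.3, 0.5, 0.1, 0.8, 0.6, 0.4]
print(pearson(x, y), spearman(x, y), kendall(x, y), blomqvist(x, y))
```

The four numbers generally differ for the same data, which is the point made in the text: each measure summarizes a different aspect of the dependence.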

The next section introduces a concept that takes into account the shapes of the underlying distributions and therefore describes the dependence structure as a whole, instead of producing a single number.

2.2 Defining Copulas

In a statistical framework, every pair of two (or more) random variables forms a joint distribution that depends on their marginals and on the dependence between those marginals. Framing this the other way round: given the marginals and the dependence structure, one can determine the joint distribution. In risk management, the main concern is the dependence structure rather than the joint distribution. Financial analysts care about the dependencies between stocks in a portfolio, or about the probability of one stock losing value given that the other stocks have already dropped in value. The key is to understand how the chosen variables depend on each other. Informally, a copula can be seen as the dependence structure.

A d-dimensional copula $C(\mathbf{u}) = C(u_1, u_2, \dots, u_d)$ is a distribution function on $[0, 1]^d$ with standard uniform marginal distributions. Three basic properties characterize copulas:

(1) $C(u_1, \dots, u_d) = 0$ if $u_i = 0$ for any $i$.

(2) $C(1, \dots, 1, u_i, 1, \dots, 1) = u_i$ for all $i \in \{1, \dots, d\}$, $u_i \in [0, 1]$ (this requires that the marginal distributions are uniform).

(3) For all $(a_1, \dots, a_d), (b_1, \dots, b_d) \in [0, 1]^d$ with $a_i \le b_i$,

\[ \sum_{i_1=1}^{2} \cdots \sum_{i_d=1}^{2} (-1)^{i_1 + \dots + i_d}\, C(u_{1 i_1}, \dots, u_{d i_d}) \ge 0, \]

where $u_{j1} = a_j$ and $u_{j2} = b_j$ for all $j \in \{1, \dots, d\}$. This is called the rectangle inequality; it ensures that a random vector with df $C$ assigns non-negative probability to every rectangle.
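The rectangle inequality in property (3) can be checked numerically in the bivariate case. The sketch below evaluates the alternating sum for two example copulas: the independence copula and a Clayton copula (defined in Section 2.3.6). The function names, the grid and the parameter value are hypothetical choices for illustration.

```python
def independence(u1, u2):
    return u1 * u2

def clayton(u1, u2, theta=2.0):
    # Bivariate Clayton copula; returns 0 on the lower boundary.
    if u1 == 0.0 or u2 == 0.0:
        return 0.0
    return (u1 ** -theta + u2 ** -theta - 1.0) ** (-1.0 / theta)

def rectangle_volume(C, a, b):
    # Sum over i1, i2 in {1, 2} of (-1)^(i1+i2) * C(u_{1,i1}, u_{2,i2}),
    # with u_{j1} = a_j and u_{j2} = b_j.
    u = [(a[0], b[0]), (a[1], b[1])]
    total = 0.0
    for i1 in (1, 2):
        for i2 in (1, 2):
            total += (-1) ** (i1 + i2) * C(u[0][i1 - 1], u[1][i2 - 1])
    return total

grid = [i / 10 for i in range(11)]
smallest = min(
    rectangle_volume(clayton, (a1, a2), (b1, b2))
    for a1 in grid for b1 in grid if a1 <= b1
    for a2 in grid for b2 in grid if a2 <= b2
)
print(smallest >= -1e-12)
```

For the independence copula the volume is simply the area of the rectangle, $(b_1 - a_1)(b_2 - a_2)$, which gives a quick sanity check of the sign convention.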

To work with copulas, one needs to be familiar with two tools, namely the quantile and probability transformations.

(1) Quantile transform. If $U \sim U(0, 1)$ has a standard uniform distribution, then $P(F^{\leftarrow}(U) \le x) = F(x)$, where $F^{\leftarrow}$ denotes the generalized inverse of $F$.

(2) Probability transform. If $X$ has a continuous univariate df $F$, then $F(X) \sim U(0, 1)$.

A useful theorem in the copula setting is Sklar’s theorem. It states that if $F$ is a joint distribution function with marginals $F_1, F_2, \dots, F_d$, then there exists a copula $C$ such that for all $x_1, \dots, x_d \in \mathbb{R}$

\[ F(x_1, \dots, x_d) = C(F_1(x_1), \dots, F_d(x_d)), \]

and $C$ is unique if the marginals are continuous.

There are a few so-called fundamental copulas, namely the comonotonicity, countermonotonicity and independence copulas. Comonotonicity means perfect positive dependence and countermonotonicity means perfect negative dependence. Intuitively, any other copula lies somewhere between these two extremes. Therefore, every copula $C(u_1, \dots, u_d)$ satisfies the bounds

\[ \max\Big(\sum_{i=1}^{d} u_i + 1 - d,\; 0\Big) \le C(\mathbf{u}) \le \min(u_1, \dots, u_d), \]

called the Fréchet bounds for copulas. Hence the comonotonicity copula is the Fréchet upper bound copula, while the countermonotonicity copula is the lower bound copula (in the bivariate case).
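The Fréchet bounds can be verified numerically for any candidate copula. The sketch below, with hypothetical function names, checks them for the three-dimensional independence copula on a coarse grid; a small tolerance absorbs floating-point rounding.

```python
from itertools import product

def frechet_lower(u):
    # max(sum(u_i) + 1 - d, 0)
    return max(sum(u) + 1 - len(u), 0.0)

def frechet_upper(u):
    # min(u_1, ..., u_d)
    return min(u)

def independence(u):
    p = 1.0
    for ui in u:
        p *= ui
    return p

grid = [i / 10 for i in range(11)]
tol = 1e-12
violations = [
    u for u in product(grid, repeat=3)
    if not (frechet_lower(u) - tol <= independence(u) <= frechet_upper(u) + tol)
]
print(len(violations))
```

An empty list of violations is what the bounds predict, since the independence copula is a valid copula.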

The fundamental copulas above describe three types of dependence structure. Another way of classifying copulas is the distinction between implicit and explicit copulas. Implicit copulas are extracted from well-known multivariate distributions using Sklar’s theorem, but they do not necessarily have closed-form expressions. In contrast, explicit copulas have closed-form expressions and follow mathematical constructions known to yield copulas (McNeil A.J. (et al.), 2015 [17]).

2.3 Families of Copulas

This section defines certain implicit and explicit copulas that can be used in a quantitative setting. Although the set of parametric copulas is large, only a few of these copulas are useful in practice. The most common copulas and their characteristics follow, with brief comments about their practicality. The very last subsection explains the concept of meta distributions and why it can be essential for reaching the final results. Some of the copulas explained belong to the class of Archimedean copulas [17]. Archimedean copulas share a common assumption: they can be written in the form

\[ C(u_1, \dots, u_d) = \psi(\psi^{-1}(u_1) + \dots + \psi^{-1}(u_d)), \tag{5} \]

where $\psi$ is a generator function and $\psi^{-1}$ is its generalized inverse. The generator functions have these properties:

(1) $\psi: [0, \infty) \to [0, 1]$ with $\psi(0) = 1$ and $\lim_{x \to \infty} \psi(x) = 0$

(2) $\psi$ must be continuous

(3) $\psi$ is strictly decreasing on $[0, \psi^{-1}(0)]$

(4) $\psi^{-1}$ is given by $\psi^{-1}(x) = \inf\{u : \psi(u) \le x\}$

The generator function is particularly important, because it allows the upper and lower tail dependence coefficients, $\lambda_U$ and $\lambda_L$, to be derived. The next sections explain the copulas that are most useful in a financial framework. Note that the application to competitor pairs involves two random variables, so the following sections are restricted to the bivariate setting for simplicity.
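To make the Archimedean form (5) concrete, the sketch below builds a bivariate copula from a generator and its inverse. It uses the Clayton generator $\psi(t) = (1 + t)^{-1/\theta}$ as an example (the Clayton copula itself is covered in Section 2.3.6); the function names and the parameter value are hypothetical.

```python
def clayton_generator(t, theta):
    # psi(t) = (1 + t)^(-1/theta); psi(0) = 1 and psi(t) -> 0 as t grows.
    return (1.0 + t) ** (-1.0 / theta)

def clayton_generator_inverse(u, theta):
    # psi^{-1}(u) = u^(-theta) - 1 for u in (0, 1].
    return u ** -theta - 1.0

def archimedean(us, psi, psi_inv):
    # Equation (5): C(u_1, ..., u_d) = psi(psi^{-1}(u_1) + ... + psi^{-1}(u_d)).
    return psi(sum(psi_inv(u) for u in us))

theta = 2.0
C = lambda u1, u2: archimedean(
    [u1, u2],
    lambda t: clayton_generator(t, theta),
    lambda u: clayton_generator_inverse(u, theta),
)

# The construction reproduces the closed-form Clayton copula.
closed_form = (0.3 ** -theta + 0.7 ** -theta - 1.0) ** (-1.0 / theta)
print(C(0.3, 0.7), closed_form)
```

Because $\psi^{-1}(1) = 0$, setting one argument to 1 returns the other argument, which is the uniform-margin property of a copula.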


2.3.1 Gaussian copula

If $Y \sim N_d(\mu, \Sigma)$ is a multivariate normal random vector, then its copula is a Gaussian copula. The Gaussian copula is an implicit copula and can be expressed as an integral over the density of $X$, where $X$ is the standardized version of $Y$ with correlation matrix $P$. In the bivariate case, with correlation coefficient $\rho$,

\[ C^{Ga}_{\rho}(u_1, u_2) = \int_{-\infty}^{\Phi^{-1}(u_1)} \int_{-\infty}^{\Phi^{-1}(u_2)} \frac{1}{2\pi (1 - \rho^2)^{1/2}} \exp\left( \frac{-(s_1^2 - 2\rho s_1 s_2 + s_2^2)}{2(1 - \rho^2)} \right) ds_1\, ds_2 \tag{6} \]

The Gaussian copula covers dependence structures ranging from negative to positive dependence, where the coefficient $\rho$ represents the strength of the dependence. If $P = I_d$, where $I_d$ is the identity matrix, the result is the independence copula (no covariance terms). If $P = J_d$, where $J_d$ is a matrix consisting only of ones, the comonotonicity copula is obtained (McNeil A.J. (et al.), 2015 [17]).

Even though the Gaussian distribution seems to be the traditional way of modeling dependence, it has some issues in financial applications. For instance, the Gaussian copula concentrates dependence around its center and shows almost no dependence in the tails. Many professionals tend to use it because it is intuitive and based on the normal distribution. However, financial data are usually skewed and show more extreme outcomes (heavy tails) than the Gaussian distribution (Kole E. (et al.), 2006 [14]). Therefore, the Gaussian copula rarely works well in financial analysis, although with proper assumptions the normal copula might still be valid.

2.3.2 Joe copula

The Joe copula belongs to the Archimedean copula family, so it has a generator of a certain form; in the notation of equation (5), $\psi^{-1}(u) = -\ln(1 - (1 - u)^{\theta})$. In the bivariate setting the Joe copula is given by

\[ C^{J}_{\theta}(u_1, u_2) = 1 - \big( (1 - u_1)^{\theta} + (1 - u_2)^{\theta} - (1 - u_1)^{\theta} (1 - u_2)^{\theta} \big)^{1/\theta}, \quad \theta > 1, \tag{7} \]

where $\theta$ is the copula parameter in the interval $(1, \infty)$. The Joe copula is a well-studied model, since it highlights upper tail dependence. Hence, if two random variables are expected to be highly correlated when both take large positive values, this might be the copula to consider [17].

2.3.3 Student’s t -copula

The Student’s t-copula is an improvement over the Gaussian copula, since it has more mass in the tails. Like the Gaussian copula, it has no closed-form expression, hence it is an implicit copula. The two-dimensional t-copula is defined as

\[ C^{t}_{\nu, P}(u_1, u_2) = t_{\nu, P}(t_{\nu}^{-1}(u_1), t_{\nu}^{-1}(u_2)) \tag{8} \]

Here $t_{\nu}$ is the df of a standard univariate t distribution with $\nu$ degrees of freedom, $t_{\nu, P}$ is the joint df of the vector $X \sim t_2(\nu, 0, P)$, and $P$ is a correlation matrix [17].

One of the advantages was already mentioned: this copula gives insight when extreme outcomes occur, in particular when extreme positive outcomes are as likely as extreme negative ones. Moreover, the t-distribution allows for flexible dependence around its center, so it could be one of the main approaches in a financial setting.

Note that both the Gaussian and the Student’s t distributions are elliptical distributions, with elliptical contours and symmetric tails. In fact, the t distribution can be seen as an extension of the Gaussian distribution: as $\nu \to \infty$ it converges to the Gaussian distribution [14].


In comparison to the Joe copula, the Student’s t-copula shows high dependence not only for positive outcomes but also for negative ones. This is one of the reasons why it might be favorable in a financial setting: it captures both upper and lower tail dependence as well as the dependence around the center.

2.3.4 Frank copula

Another interesting copula from the Archimedean family is the Frank copula. Its parameter $\theta$ is unrestricted apart from $\theta \ne 0$, and the copula is given by

\[ C^{F}_{\theta}(u_1, u_2) = -\frac{1}{\theta} \ln\left[ \frac{(1 - e^{-\theta}) - (1 - e^{-\theta u_1})(1 - e^{-\theta u_2})}{1 - e^{-\theta}} \right], \quad -\infty < \theta < \infty,\ \theta \ne 0. \tag{9} \]

With no restrictions on $\theta$, the generator becomes, in the notation of equation (5), $\psi^{-1}(u) = -\ln\frac{e^{-\theta u} - 1}{e^{-\theta} - 1}$. The Frank copula can be seen as a rather flexible copula, because it can accommodate both negative and positive dependence [17]. This flexibility implies that the Frank copula will always be able to provide a fit, but the resulting shape can be worse than that of a copula with parameter restrictions. With this in mind, flexibility is not necessarily a good characteristic.
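The flexibility of the Frank copula can be illustrated by evaluating it for positive, negative and near-zero values of $\theta$. The sketch below implements the bivariate Frank copula in its standard closed form; the function name and the evaluation points are hypothetical.

```python
import math

def frank(u1, u2, theta):
    # Bivariate Frank copula, defined for theta != 0.
    denom = 1.0 - math.exp(-theta)
    num = denom - (1.0 - math.exp(-theta * u1)) * (1.0 - math.exp(-theta * u2))
    return -math.log(num / denom) / theta

u1, u2 = 0.5, 0.5
positive = frank(u1, u2, 5.0)     # above u1 * u2: positive dependence
negative = frank(u1, u2, -5.0)    # below u1 * u2: negative dependence
near_zero = frank(u1, u2, 1e-3)   # close to u1 * u2: near independence
print(positive, negative, near_zero)
```

Comparing each value with the independence benchmark $u_1 u_2 = 0.25$ shows on which side of independence the chosen $\theta$ places the copula.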

2.3.5 Gumbel copula

One example of an explicit copula is the Gumbel copula. Its range lies between independence and perfect positive dependence, meaning it cannot capture negative dependence.

\[ C^{Gu}_{\theta}(u_1, u_2) = \exp\big( -\big( (-\ln u_1)^{\theta} + (-\ln u_2)^{\theta} \big)^{1/\theta} \big), \quad 1 \le \theta < \infty. \tag{10} \]

The equation above represents the bivariate Gumbel copula. It becomes the independence copula when $\theta = 1$, because then $C^{Gu}_{\theta}(u_1, u_2) = u_1 \cdot u_2$. In contrast, if $\theta \to \infty$ then $C^{Gu}_{\theta}(u_1, u_2)$ approaches the two-dimensional comonotonicity copula (perfect positive dependence) [17].

Still, the Gumbel copula on its own is of limited practical use in a financial context, because it puts most of its mass in the upper tail and shows no dependence in the lower tail. In fact, it does not even allow for negative dependence.
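The limiting cases and the upper tail behavior of the Gumbel copula can be checked numerically. The sketch below evaluates the copula at $\theta = 1$ and at a large $\theta$, and approximates the upper tail dependence coefficient using the standard definition $\lambda_U = \lim_{q \to 1} (1 - 2q + C(q, q))/(1 - q)$, which is an assumption here since the text names $\lambda_U$ without defining it. For the Gumbel copula the known closed form is $\lambda_U = 2 - 2^{1/\theta}$. Function names are hypothetical.

```python
import math

def gumbel(u1, u2, theta):
    # Bivariate Gumbel copula for u1, u2 in (0, 1] and theta >= 1.
    s = (-math.log(u1)) ** theta + (-math.log(u2)) ** theta
    return math.exp(-(s ** (1.0 / theta)))

def upper_tail_approx(C, q):
    # Finite-q approximation of lambda_U = lim (1 - 2q + C(q, q)) / (1 - q).
    return (1.0 - 2.0 * q + C(q, q)) / (1.0 - q)

indep = gumbel(0.3, 0.6, 1.0)        # equals 0.3 * 0.6 at theta = 1
comonotone = gumbel(0.3, 0.6, 50.0)  # approaches min(0.3, 0.6)
lam_u = upper_tail_approx(lambda a, b: gumbel(a, b, 2.0), 0.9999)
print(indep, comonotone, lam_u, 2.0 - 2.0 ** 0.5)
```

The last two printed values should be close, and the same approximation applied at $\theta = 1$ gives a value near zero, consistent with independence.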

2.3.6 Clayton copula

Another example of an explicit copula is the Clayton copula, which behaves in the opposite way to the Gumbel copula, meaning it has more mass in the lower (negative) tail [17].

\[ C^{Cl}_{\theta}(u_1, u_2) = (u_1^{-\theta} + u_2^{-\theta} - 1)^{-1/\theta}, \quad 0 \le \theta < \infty. \tag{11} \]

The dependence structure of this copula ranges from the independence copula (in the limit $\theta = 0$) to the two-dimensional comonotonicity copula, which it approaches as $\theta \to \infty$.
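The claim that the Clayton copula concentrates dependence in the lower tail can be made precise with a short derivation. It assumes the standard definition of the lower tail dependence coefficient, $\lambda_L = \lim_{q \to 0^+} C(q, q)/q$, which the text names but does not spell out, and it takes $\theta > 0$.

```latex
\begin{aligned}
\lambda_L
  &= \lim_{q \to 0^+} \frac{C^{Cl}_{\theta}(q, q)}{q}
   = \lim_{q \to 0^+} \frac{(2 q^{-\theta} - 1)^{-1/\theta}}{q} \\
  &= \lim_{q \to 0^+} \frac{\big(q^{-\theta} (2 - q^{\theta})\big)^{-1/\theta}}{q}
   = \lim_{q \to 0^+} (2 - q^{\theta})^{-1/\theta}
   = 2^{-1/\theta}.
\end{aligned}
```

So $\lambda_L$ increases from 0 towards 1 as $\theta$ grows, which matches the range between independence and comonotonicity described above.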

Notice that the Frank, Clayton and Gumbel copulas are all rather extreme, forcing the outcome into a particular range. With this in mind, it is useful to look at mixes of these copulas, which allow asymmetries between the left and right tails and define parameters for both mixing copulas, as in the stand-alone cases.

2.3.7 Clayton-Gumbel copula (BB1)

The next sections on the BB1, BB6, BB7 and BB8 copulas introduce the concept of mixed copulas, which are referred to mainly by these short forms (BB1, BB6, BB7, BB8). If two copulas are mixed, each requires its own parameter estimate in the mix. The BB1 copula is very useful, since it combines the two extreme cases of the Clayton and Gumbel copulas. The resulting BB1 copula is

\[ C^{BB1}_{\theta, \delta}(u_1, u_2) = \Big( 1 + \big[ (u_1^{-\theta} - 1)^{\delta} + (u_2^{-\theta} - 1)^{\delta} \big]^{1/\delta} \Big)^{-1/\theta}, \quad \delta \ge 1,\ \theta > 0. \tag{12} \]

Notice that the parameter restrictions stay the same as for the stand-alone copula models and that the copula is a mix of the two. One may wonder what the differences are between the Student’s t-copula and BB1. The t-copula has a very strong dependence structure around the center, while the BB1 copula does not have to show this kind of dependence. Therefore, if there is solid dependence mainly in the tails, the BB1 copula might be considered [5].
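One way to see that BB1 combines the Clayton and Gumbel copulas is to check its special cases numerically: with $\delta = 1$ it reduces to the Clayton copula, and as $\theta \to 0$ it approaches the Gumbel copula with parameter $\delta$. The sketch below uses hypothetical function names and evaluation points.

```python
import math

def bb1(u1, u2, theta, delta):
    # Bivariate BB1 copula, theta > 0, delta >= 1, u1, u2 in (0, 1].
    s = (u1 ** -theta - 1.0) ** delta + (u2 ** -theta - 1.0) ** delta
    return (1.0 + s ** (1.0 / delta)) ** (-1.0 / theta)

def clayton(u1, u2, theta):
    return (u1 ** -theta + u2 ** -theta - 1.0) ** (-1.0 / theta)

def gumbel(u1, u2, theta):
    s = (-math.log(u1)) ** theta + (-math.log(u2)) ** theta
    return math.exp(-(s ** (1.0 / theta)))

u1, u2 = 0.3, 0.6
clayton_gap = abs(bb1(u1, u2, 2.0, 1.0) - clayton(u1, u2, 2.0))
gumbel_gap = abs(bb1(u1, u2, 1e-6, 1.5) - gumbel(u1, u2, 1.5))
print(clayton_gap, gumbel_gap)
```

Both gaps should be negligible, the first exactly up to rounding and the second up to an error of order $\theta$.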

2.3.8 Joe-Gumbel copula (BB6)

The mix between the Joe and Gumbel copulas, or BB6, is given by equation (13). It can detect asymmetric tail dependence, which is useful in a financial framework, since it highlights upper tail dependence [16].

\[ C^{BB6}_{\theta, \delta}(u_1, u_2) = 1 - \Big( 1 - \exp\Big\{ -\big[ (-\log(1 - (1 - u_1)^{\theta}))^{\delta} + (-\log(1 - (1 - u_2)^{\theta}))^{\delta} \big]^{1/\delta} \Big\} \Big)^{1/\theta}, \quad \theta \ge 1,\ \delta \ge 1. \tag{13} \]

2.3.9 Joe-Clayton copula (BB7)

This copula is useful for modeling tail dependence as well as for quantile regression, because it allows both positive and negative quantile curves. The paper by Li F. (2016) [15] briefly captures the main elements of this copula. The BB7 copula is given by

\[ C^{BB7}_{\theta, \delta}(u_1, u_2) = 1 - \Big[ 1 - \big( (1 - (1 - u_1)^{\theta})^{-\delta} + (1 - (1 - u_2)^{\theta})^{-\delta} - 1 \big)^{-1/\delta} \Big]^{1/\theta}, \quad \theta \ge 1,\ \delta > 0. \tag{14} \]

The copula can be written in Archimedean form as $C^{BB7}_{\theta, \delta}(u_1, u_2) = \eta(\eta^{-1}(u_1) + \eta^{-1}(u_2))$, with the generator satisfying $\eta^{-1}(t) = (1 - (1 - t)^{\theta})^{-\delta} - 1$.

2.3.10 Joe-Frank (BB8)

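Another mixture copula, combining the Joe and Frank copulas, is the BB8 copula. In the notation of equation (5), its generator satisfies $\psi^{-1}(t) = -\log\left[ \frac{1 - (1 - \delta t)^{\theta}}{1 - (1 - \delta)^{\theta}} \right]$, and the copula is

\[ C^{BB8}_{\theta, \delta}(u_1, u_2) = \frac{1}{\delta} \Big( 1 - \Big[ 1 - \frac{1}{1 - (1 - \delta)^{\theta}} (1 - (1 - \delta u_1)^{\theta})(1 - (1 - \delta u_2)^{\theta}) \Big]^{1/\theta} \Big), \quad \theta \ge 1,\ 0 < \delta \le 1. \tag{15} \]

Recall that the Joe copula is an extreme copula with clear upper tail dependence, while the Frank copula provides either strong positive or strong negative dependence [28].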

2.4 Meta Distributions

Some may think that Gaussian marginals lead to a Gaussian copula, but in practice this might not be the case. According to McNeil A.J. (et al.), 2015 [17], there is no clear way of determining the copula from the marginals alone. For example, two Gaussian marginal distributions may be joined by a Gaussian copula, but they do not have to be. This also works the other way round: if two random variables have a Gaussian copula, no rule says that the underlying marginals have to be Gaussian as well. Moreover, the resulting copula still has to be analyzed regardless of the underlying (possibly unknown) marginals. The term meta refers to the idea of arbitrary marginals combined with a known parametric copula. For instance, a Gaussian copula combined with non-Gaussian marginals gives a so-called meta-Gaussian distribution.
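A meta-Gaussian distribution can be simulated by combining the probability and quantile transforms from Section 2.2: draw correlated normals, map them to uniforms with the normal df, then apply the quantile function of the desired marginals. The sketch below uses exponential marginals as an arbitrary, hypothetical choice; the function name, rates, correlation and seed are illustration values only.

```python
import math
import random
from statistics import NormalDist

def sample_meta_gaussian(n, rho, rate1, rate2, seed=0):
    # Gaussian copula with correlation rho and exponential marginals.
    rng = random.Random(seed)
    phi = NormalDist().cdf
    sample = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        u1, u2 = phi(z1), phi(z2)          # probability transform
        x1 = -math.log(1.0 - u1) / rate1   # exponential quantile transform
        x2 = -math.log(1.0 - u2) / rate2
        sample.append((x1, x2))
    return sample

draws = sample_meta_gaussian(5000, rho=0.6, rate1=2.0, rate2=0.5)
mean_x1 = sum(x for x, _ in draws) / len(draws)
print(mean_x1)  # should be near 1 / rate1 for exponential marginals
```

The marginals of the simulated pairs are exponential and clearly non-Gaussian, while the dependence between them is still governed by the Gaussian copula.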


3 Data

The modeling part requires data series for all the competitor pairs. Recall that the competitors form the following pairs: (1) Nike [19] and Adidas [2]; (2) Coca-Cola (Coke) [12] and Pepsi [22]; (3) Apple [1] and Samsung [27]; (4) Novartis [20] and Pfizer [23]; (5) Philip Morris (PM) [24] and British American Tobacco (BAT) [3]. Even though data are available at several frequencies (daily, weekly, monthly, yearly), it is crucial to choose the most suitable one. The sampling interval is extremely important, since higher frequencies, for instance daily data, tend to show stronger time dependence. The aim is therefore to have the smallest possible time dependence while keeping the samples large enough for copula models. On the one hand, the best solution would be monthly data, since the time dependence could be almost invisible. On the other hand, some data series cover relatively short time spans, so monthly data would provide too few observations. The rational choice is therefore weekly data.

\[ r_i = \ln\left( \frac{p_i}{p_{i-1}} \right) = \ln\left( 1 + \frac{p_i - p_{i-1}}{p_{i-1}} \right) \tag{16} \]

Formula (16) makes it possible to look at comparable quantities rather than the stock prices $p_i$ themselves [17]. The weekly stock prices are taken from the Yahoo! Finance website. Figure 1 gives a first impression of how the log-returns resulting from equation (16) behave for each competitor pair.
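A log-return series can be computed from a price series in a few lines. The sketch below assumes the usual definition of a log-return as the logarithm of the ratio of consecutive prices; the function name and the prices are hypothetical.

```python
import math

def log_returns(prices):
    # r_i = ln(p_i / p_{i-1}) for consecutive prices.
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

# Hypothetical weekly closing prices, for illustration only.
prices = [100.0, 110.0, 99.0, 104.0]
returns = log_returns(prices)
print(returns)
```

A convenient property of log-returns is that they add up over time: the sum of the weekly log-returns equals the log-return over the whole period.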

3.1 Time dependence in stock returns

As already mentioned, weekly stock returns are preferred because they should show less time dependence than daily log-returns while still providing large enough samples. Daily stock returns, for instance, are strongly time dependent and not iid (independent and identically distributed), which causes problems for copula modelling. In theory, copula modelling should not involve time dependence. Another approach is to fit an ARMA or ARIMA model to each data series and fit copulas on the residuals (Patton A.J. (2015) [21]). However, determining the right ARMA/ARIMA structure might be a challenge in itself, since it requires deeper knowledge of time-series modelling. Hence, this research does not provide an ARMA/ARIMA analysis, but instead focuses on series with little or no time dependence.

Since the ultimate goal is to fit copula models, weekly data are taken and tests for time dependence are run. Notice that if one of the competitors in a pair has significant time dependence while the other shows none, the copula cannot be determined, since the pair involves contrasting time structures.

\[ r_t = \alpha_1 + \alpha_2 r_{t-1} + \varepsilon_t \tag{17} \]

Equation (17) is a simple regression model in which each log-return series is regressed on its own past-period (lagged) values. Table 1 presents the results for all data series.
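The coefficients in equation (17) can be estimated by ordinary least squares. The sketch below is a minimal Python illustration of that regression, not the R code used for Table 1; the function name and the input series are hypothetical, and it returns only the point estimates, not the p-values.

```python
def ar1_ols(r):
    # Regress r_t on r_{t-1} with an intercept: r_t = a1 + a2 * r_{t-1} + e_t.
    y, x = r[1:], r[:-1]
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    a2 = sxy / sxx
    a1 = my - a2 * mx
    return a1, a2

# Hypothetical weekly log-returns, for illustration only.
series = [0.012, -0.008, 0.004, 0.015, -0.011, 0.003, 0.007, -0.002]
alpha1, alpha2 = ar1_ols(series)
print(alpha1, alpha2)
```

A slope estimate close to zero suggests little dependence on the previous week, while a clearly negative or positive slope corresponds to the significant entries in the regression table.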

Notice that in every pair at least one competitor shows significant dependence on its lagged value. To better understand how strongly the weekly series depend on time, the next step is to look at more lags and assess the overall dependence. The assumptions made next determine which competitor pairs are treated as not time dependent; in other words, their severity determines which pairs are assumed iid and analyzed further. The search for the best copula is carried out only for pairs assumed to have no time dependence. Therefore, let us assume that one significant lagged value is not enough to conclude that a time series has strong time dependence, so adding more lags is crucial. The best option is to look at the ACF (autocorrelation function), which gives the correlation coefficients up to a pre-determined order (number of lags) [17]. The assumption is that if the dependence dies out after the first lag, the series can be treated as having no significant time dependence. In other words, if the log-returns depend only on the log-returns one period before, this can be seen as a momentum effect rather than real time dependence. This assumption is made to avoid complicated time series modeling and to focus on the main goal.

Figure 1: Log-return series for all the competitor pairs.

Appendix A shows the ACF for each time series. The first lag has the same interpretation as in Equation (17). Notice that the autocorrelation does not strictly die out and might even signal seasonality. A more careful analysis would adjust for seasonality, but for this analysis it is assumed that seasonality can be ignored.

Now consider the time series with significant first-lag dependence, namely Nike, Coke, Samsung, Novartis, Pfizer and British American Tobacco. The next step is to decide whether these series are really time dependent, which is based purely on assumptions. If the next lag (order 2) is also significant, the series is assumed to be time dependent. Looking at the results in Appendix A, Samsung seems to have time dependence not only at the first and second lags but also at higher orders. It is harder to tell whether Novartis is time dependent, but since its third lag appears stronger than the first and several later lags exceed the significance bound, it is reasonable to assume time dependence for Novartis as well.
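The decision rule described above can be sketched with a sample autocorrelation function and the usual approximate 95% significance bound of $\pm 1.96/\sqrt{n}$. That bound is an assumption here, since the text does not state which bound the ACF plots use; the function names and the input series are hypothetical.

```python
import math

def acf(r, max_lag):
    # Sample autocorrelations for lags 1..max_lag.
    n = len(r)
    m = sum(r) / n
    c0 = sum((v - m) ** 2 for v in r)
    return [
        sum((r[t] - m) * (r[t - k] - m) for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]

def significant_lags(r, max_lag):
    # Lags whose autocorrelation exceeds the approximate 95% bound.
    bound = 1.96 / math.sqrt(len(r))
    return [k for k, rho in enumerate(acf(r, max_lag), start=1) if abs(rho) > bound]

def treated_as_time_dependent(r):
    # Rule from the text: time dependent if lags 1 and 2 are both significant.
    lags = significant_lags(r, 2)
    return 1 in lags and 2 in lags

# Hypothetical, strongly alternating series, for illustration only.
alternating = [1.0, -1.0] * 50
print(acf(alternating, 2), treated_as_time_dependent(alternating))
```

A series whose only significant lag is the first would be kept for copula fitting under this rule, while a series like the alternating example would be excluded.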


Competitor (r_t)             α1           α2            p-value (α2)
Nike                         0.003794     -0.153825     0.000245 ***
Adidas                       0.002528      0.038831     0.358
Apple                        0.003706      0.007625     0.8676
Samsung                      0.003442     -0.131702     0.0039 **
Pepsi                        0.0017025    -0.0240852    0.4719
Coke                         0.0009346    -0.0773327    0.0205 *
Pfizer                       0.000681     -0.113084     0.000712 ***
Novartis                     0.0014467    -0.0912864    0.00623 **
Philip Morris                0.002695     -0.079243     0.0884 .
British American Tobacco     0.002264     -0.135550     0.0033 **

Table 1: Regression for time dependence with significance levels: *** 0.001, ** 0.01, * 0.05, . 0.1.

3.2 Choice of competitors

The previous section examined the time dependence in each pair and concluded that Samsung and Novartis are highly time dependent and cannot be analyzed in a copula pair. Notice that, from Equation (17), Pfizer has significant time dependence at the first lag (the same as Novartis), but the ACF suggests that Novartis is even more time dependent than Pfizer. One option would be to work with residuals as in the paper by Patton A.J. (2015) [21], since both series are time dependent.

Consequently, copula modeling will be based on three competitor pairs, namely, (1) Nike and Adidas; (2) Pepsi and Coke; (3) Philip Morris (PM) and British American Tobacco (BAT).

3.2.1 Marginal distributions

One of the biggest mistakes in financial practice is to assume a Gaussian distribution without proper analysis. To see how wrong this assumption can be, let us assume each competitor's returns follow a normal distribution with mean and variance estimated from its own values.

Normality can be assessed with the Jarque-Bera test [17]:

\[ T = \frac{n}{6} \left( b^2 + \frac{1}{4}(k - 3)^2 \right) \tag{18} \]

\[ b = \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^3}{\left( \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2 \right)^{3/2}} \tag{19} \]

\[ k = \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^4}{\left( \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2 \right)^{2}} \tag{20} \]

The JB (Jarque-Bera) test belongs to the class of omnibus moment tests. It tests simultaneously whether the skewness (b) and kurtosis (k) are consistent with a normal distribution. For a visual representation of the sample distribution, the best option is a Q-Q plot. Even though it is not a quantitative measure, it is informative enough to show how the sample distribution differs from the normal distribution (the 45 degree line). For large enough samples (assuming at least 300 observations), the points should lie more or less on the line if the underlying distribution is indeed normal. The samples in the analysis are Nike and Adidas (n = 565); Pepsi and Coke (n = 898); Philip Morris and British American Tobacco (n = 469). In theory, these samples are large enough to determine whether the series are normally distributed.
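Equations (18) to (20) translate directly into code. The sketch below computes the sample skewness, kurtosis and test statistic; the p-value uses the asymptotic chi-square distribution with two degrees of freedom, whose survival function is $e^{-T/2}$. This is a minimal Python illustration with a hypothetical function name and input, not the R routine used for the reported results.

```python
import math

def jarque_bera(x):
    # Returns (T, skewness b, kurtosis k, asymptotic p-value).
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    b = m3 / m2 ** 1.5
    k = m4 / m2 ** 2
    T = n / 6.0 * (b ** 2 + (k - 3.0) ** 2 / 4.0)
    p_value = math.exp(-T / 2.0)  # chi-square(2) survival function
    return T, b, k, p_value

# Hypothetical symmetric two-point sample, for illustration only.
sample = [-1.0, 1.0] * 6
print(jarque_bera(sample))
```

Large values of T, and hence small p-values, indicate that the skewness or kurtosis is inconsistent with normality.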


Nevertheless, both representations suggest non-Gaussian distributions: the p-values are very small (rejecting the null hypothesis of normality), and the Q-Q plots show tails that lie far from the normal distribution and are heavier than normal. All data series are negatively skewed, implying a longer left tail (Table 2). Additionally, the series have very high kurtosis compared to the normal distribution (kurtosis = 3). This is typical for financial data and could be even more extreme for daily stock returns [17].

Given the Q-Q plots and the p-values from the JB test (Table 2), it is clear that none of the competitors can be assumed to be normally distributed. However, recalling meta distributions, the resulting copulas can still show Gaussian copula behaviour, although this seems less realistic given the number of extreme outcomes compared to the Gaussian distribution.


Competitor                   p-value (JB)    skewness      kurtosis
Nike                         < 2.2e-16       -0.2440266    6.95751
Adidas                       < 2.2e-16       -0.2747773    5.591435
Pepsi                        < 2.2e-16       -0.664329     9.944061
Coke                         < 2.2e-16       -1.17481      11.33444
Philip Morris                < 2.2e-16       -0.9059962    10.59178
British American Tobacco     < 2.2e-16       -1.765731     19.08703

Table 2: Jarque-Bera (JB) test for competitors, including the results for skewness and kurtosis.

3.2.2 Association measures for competitors

Before developing the copula models further, recall the association measures from the previous section. Table 3 reports the association measures obtained from equations (1), (2), (3) and (4) for each competitor pair. The measures differ considerably, owing to their contrasting definitions. For Nike and Adidas as well as for Philip Morris and British American Tobacco, the highest resulting correlation is Pearson’s, while for Pepsi and Coke it is Spearman’s. Depending on the association measure applied, the data series can show high or low positive correlation. Consequently, it is more beneficial to look at copula models, which may explain when the correlation is low and when it is high.

Competitor pairs r ρ τ β

Nike and Adidas 0.4371026 0.417404 0.2913399 0.2788632

Pepsi and Coke 0.5062582 0.5107042 0.3766063 0.4342984

Philip Morris and British American Tobacco 0.6093244 0.5557805 0.3948572 0.4089936
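As an illustration of how the four columns of Table 3 differ, the following Python sketch computes sample versions of Pearson's r, Spearman's ρ, Kendall's τ and Blomqvist's β. The function names and the toy data are assumptions for illustration, ties are not handled, and the Blomqvist estimator shown (concordance about the medians) is one common sample version rather than necessarily the one used in the thesis.

```python
import math
import statistics

def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(x):
    """Ranks 1..n of the observations (no ties assumed)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0] * len(x)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    # Spearman's rho is Pearson's correlation of the ranks.
    return pearson(ranks(x), ranks(y))

def kendall(x, y):
    # (concordant pairs - discordant pairs) / number of pairs
    n, s = len(x), 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))

def blomqvist(x, y):
    # Concordance of the points relative to the two medians.
    mx, my = statistics.median(x), statistics.median(y)
    concordant = discordant = 0
    for a, b in zip(x, y):
        prod = (a - mx) * (b - my)
        if prod > 0:
            concordant += 1
        elif prod < 0:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

# Toy data (hypothetical, not thesis data).
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]
measures = (pearson(x, y), spearman(x, y), kendall(x, y), blomqvist(x, y))
```

Even on this small example the four measures disagree, which is the same effect seen across the columns of Table 3.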


4 Copula modeling using R

This section takes a deeper look at how the copula parameters are estimated and at the criteria used to compare several copula models for the chosen competitor pairs. There is a wide range of model selection criteria, but not all of them are useful and intuitive in financial practice.

4.1 Model estimation and selection

Several methods have been proposed for selecting the most appropriate model. Some authors argue that the best approach is to use information criteria, since there is limited information on other ways to test and compare copula models [16]. Additionally, some copula model selection criteria are hard to use and interpret. It is therefore best to look at the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), which are also commonly used for non-copula models.

R software will be used to investigate the most suitable copulas. To identify them, the information criteria and the log-likelihoods will be summarized and presented. Recall that the copula fits are compared by calculating AIC and BIC.

AIC = 2k − 2 ln(L) (21)

BIC = ln(n)k − 2 ln(L) (22)

Here k is the number of parameters, n is the sample size and L is the maximized likelihood, so ln(L) is the maximized log-likelihood. The AIC penalizes an increased number of parameters, while the BIC penalty additionally grows with the sample size. For a given log-likelihood, adding parameters increases both criteria, and a larger sample size increases the BIC. Therefore, choosing the most suitable model means searching for the lowest information criterion. Even though the two criteria look very similar, they give different intuitions depending on the number of parameters in the model and the sample size. In principle, this does not make much difference for the copula modelling in this research, since the copula models considered have at most two parameters to estimate, so AIC and BIC differ only slightly. Also, within each competitor pair the models differ in the choice of copula family, not in the sample size.
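The two criteria can be checked numerically with a short Python sketch. It assumes the standard definitions AIC = 2k − 2·loglik and BIC = k·ln(n) − 2·loglik and uses the Nike and Adidas log-likelihoods from Table 4 with n = 565 (Section 4.3.2); the function names are illustrative.

```python
import math

def aic(loglik, k):
    """Akaike information criterion from a maximized log-likelihood."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion from a maximized log-likelihood."""
    return k * math.log(n) - 2 * loglik

# Gaussian copula for Nike and Adidas: one parameter, n = 565.
gauss_aic = aic(60.54, 1)
gauss_bic = bic(60.54, 1, 565)

# Student's t-copula: two parameters (correlation and degrees of freedom).
t_aic = aic(74.88, 2)
t_bic = bic(74.88, 2, 565)
```

With these inputs the sketch gives approximately −119.08 and −114.74 for the Gaussian copula and −145.76 and −137.09 for the t-copula, matching the corresponding rows of Table 4.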

The other important ingredient for AIC and BIC is the log-likelihood. Some of the likelihoods will be estimated as pseudo log-likelihoods [4]. The pseudo log-likelihood is evaluated at rank-based pseudo-observations instead of observations with known margins, so it concerns only the dependence structure. This is more convenient than full maximum likelihood estimation, but still problematic. Therefore, some of the information criteria will be calculated from the pseudo rather than the original log-likelihoods. Details about pseudo log-likelihoods will not be explained any further.

R software is very useful because of its existing copula modelling packages. Every copula is estimated by maximizing the log-likelihood, which gives the parameter estimates together with the (pseudo) log-likelihood. Given the resulting likelihoods and the number of parameters, AIC and BIC can then be determined. One drawback of using R software is that the exact calculations behind these copula functions are not visible to the user. Another is that only a certain set of parametric copulas is available, so a better copula that is not implemented in R will be missed. Nevertheless, this research tries to find the best fitting copulas and to show how the Gaussian copula can be wrongly assigned in the financial framework. Recall that many experts use the Gaussian copula and completely ignore tail dependence. If tail dependence does not play a role in the financial analysis, or if the Gaussian copula fits almost as well as any other copula, there is no real gain in using the other suggested copulas, because they can be harder to understand. Consequently, the goal is to find copulas that perform better than the Gaussian while remaining interpretable: more advanced copulas can solve the problem, but without a meaningful explanation or intuition they can cause more problems in the end. To summarize, the aim of this research is to see the difference between the Gaussian copula and all the other suggested copulas and hence, depending on the financial goal, to distinguish the usefulness of copulas.

The next sections give more insight into certain functions in R software. Each competitor pair will be analyzed by fitting nine copula models using maximized log-likelihoods and the functions explained below. Some basic comments about empirical and theoretical copulas will be discussed, and a comparison between the Gaussian copula and the other copulas will be presented.

4.2 R functions in copula modelling

Copula modeling is a rather complex task, since there is no clear-cut separation between the assumptions made and the modeling techniques used. This section explains a step-by-step approach to determining possible copula fits using R software.

The most important R packages for copula modelling are copula, VineCopula and CDVine [25]. Starting from the log-returns of each competitor pair, the next step is to map these values to uniform margins, which serve as the input for determining the copula. For this purpose pobs() can be used. The function works as follows: given n observations x_i of a random vector X, the pseudo-observations are defined as u_ij = r_ij/(n + 1) for i ∈ {1, ..., n} and j ∈ {1, ..., d}, where r_ij denotes the rank of x_ij among the observations of the j-th component. The asymptotically negligible scaling factor n/(n + 1) is used to force the values into the open unit hypercube [9]. Applying pobs() to a competitor pair transforms the log-returns to a probability scale, and plotting these values gives the empirical copula, referred to here as the true copula.
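The rank transformation behind the pseudo-observations can be sketched in a few lines of Python. This is an illustration of the formula u = rank/(n + 1) for one margin, not the R implementation; it assumes no ties, and the sample log-returns are hypothetical.

```python
def pobs(x):
    """Pseudo-observations rank(x_i) / (n + 1) for a single margin."""
    n = len(x)
    # Indices of the observations in increasing order of value.
    order = sorted(range(n), key=lambda i: x[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u

# Hypothetical weekly log-returns of one stock.
returns = [0.012, -0.034, 0.005, 0.021, -0.008]
u = pobs(returns)
```

Dividing by n + 1 rather than n keeps every value strictly between 0 and 1, which matters later when copula densities are evaluated at these points.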

The empirical copulas for all competitor pairs are presented below. Empirical copulas are interesting to look at, but they are non-parametric: there are no parameters to estimate, so they cannot be summarized or reproduced through a small set of estimates. This is one of the reasons why the aim is to find the most suitable parametric copulas, given the true copulas in Figure 3.

Figure 3: Empirical distributions for each of the competitor pairs: Nike and Adidas; Coke and Pepsi; Philip Morris and British American Tobacco.

The next step is to consider several candidate copulas for each competitor pair. This research is based on the copulas introduced in the previous sections: Gaussian, Joe, Student's t, Frank, Gumbel, Clayton-Gumbel (BB1), Joe-Gumbel (BB6), Joe-Clayton (BB7) and Joe-Frank (BB8). Even though Section 2 includes an overview of the Clayton copula, it will not be considered, since it has a parameter restriction that is not satisfied for any of the competitor pairs.

Depending on the choice of copula, fitCopula() can be used. This function requires three inputs: the copula family (for instance Gaussian), the data, and the estimation method (maximum log-likelihood). The data must be the pseudo-observations, that is, the output of pobs() for each competitor pair. The function returns the estimated parameter(s) together with the maximized log-likelihood (since this is the chosen estimation method); recall that this can be the pseudo log-likelihood. Another important task is to validate the parameters, since some of them must lie in certain ranges according to the copula definitions. For instance, for the Clayton copula the resulting parameters are close to 1 (≈ 0.98), while the parameter has to be strictly larger than 1. Therefore the Clayton copula cannot be used, since the parameters do not satisfy this restriction for any of the competitor pairs. Again, this is the reason why the Clayton copula is not considered.
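To make the estimation step concrete, the following Python sketch fits a bivariate Gaussian copula by maximizing the pseudo log-likelihood over the correlation parameter. It is a simplified stand-in for what a fitting routine does, not the R function itself: the golden-section search, the function names and the simulated data are assumptions for illustration, and the search presumes a single maximum on (−0.99, 0.99).

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def pobs(x):
    """Rank-based pseudo-observations rank / (n + 1); ties are not handled."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u

def gaussian_loglik(rho, zx, zy):
    """Gaussian copula log-likelihood, given normal scores zx and zy."""
    one_minus = 1.0 - rho * rho
    total = 0.0
    for x, y in zip(zx, zy):
        total += (-0.5 * math.log(one_minus)
                  - (rho * rho * (x * x + y * y) - 2.0 * rho * x * y)
                  / (2.0 * one_minus))
    return total

def fit_gaussian_copula(u, v, tol=1e-6):
    """Maximize the pseudo log-likelihood over rho by golden-section search."""
    zx = [STD_NORMAL.inv_cdf(a) for a in u]
    zy = [STD_NORMAL.inv_cdf(b) for b in v]
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = -0.99, 0.99
    while hi - lo > tol:
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
        if gaussian_loglik(c, zx, zy) > gaussian_loglik(d, zx, zy):
            hi = d      # the maximum lies in [lo, d]
        else:
            lo = c      # the maximum lies in [c, hi]
    rho_hat = (lo + hi) / 2.0
    return rho_hat, gaussian_loglik(rho_hat, zx, zy)

# Hypothetical data: 500 draws with true correlation 0.5.
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(500)]
y = [0.5 * a + math.sqrt(0.75) * rng.gauss(0.0, 1.0) for a in x]
rho_hat, loglik = fit_gaussian_copula(pobs(x), pobs(y))
```

The returned pair mirrors the two outputs discussed above: a parameter estimate and the maximized (pseudo) log-likelihood that later enters AIC and BIC.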

4.2.1 Additional simulations

A copula with its estimated parameters gives each competitor pair a unique dependence structure. At the same time, it can be useful (although not necessary) to look at some additional simulation results, in this case maximized log-likelihoods. Given the estimated parameters for a specific copula, 100 samples are simulated and the resulting log-likelihoods are averaged. The inputs for the simulations are the estimated parameters and the same sample size as the original data. This gives an insight into how far the estimated copula is from a simulated one with the same underlying assumptions or inputs. Although this is not the main purpose of this research, it is still essential to see whether the copula for a competitor pair has a better likelihood than a randomly generated one. The BiCopSim() function is used for this analysis. It requires several input variables, such as the number of simulations, the chosen copula family and the estimated parameters.

These additional results will be reported next to the AIC and BIC values, which are calculated from the estimated log-likelihoods using equations (21) and (22). The results are shown in later subsections for each competitor pair and copula family.
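The simulation exercise described above can be sketched in Python for the Gaussian copula. This is only an illustration under stated assumptions: the parameter value 0.44 is hypothetical, the log-likelihood of each simulated sample is evaluated at the fixed input parameter rather than re-estimated, and the function names are not those of any R package.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()

def pobs(x):
    """Rank-based pseudo-observations rank / (n + 1); ties are not handled."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u

def gaussian_copula_loglik(rho, u, v):
    """Gaussian copula log-likelihood at pseudo-observations (u, v)."""
    one_minus = 1.0 - rho * rho
    total = 0.0
    for a, b in zip(u, v):
        x, y = STD_NORMAL.inv_cdf(a), STD_NORMAL.inv_cdf(b)
        total += (-0.5 * math.log(one_minus)
                  - (rho * rho * (x * x + y * y) - 2.0 * rho * x * y)
                  / (2.0 * one_minus))
    return total

def simulated_avg_loglik(rho, n, n_sim=100, seed=0):
    """Average log-likelihood over n_sim samples drawn from the copula."""
    rng = random.Random(seed)
    logliks = []
    for _ in range(n_sim):
        z1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z2 = [rho * a + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
              for a in z1]
        logliks.append(gaussian_copula_loglik(rho, pobs(z1), pobs(z2)))
    return fmean(logliks)

# Hypothetical parameter; sample size as for Nike and Adidas (n = 565).
avg_loglik = simulated_avg_loglik(rho=0.44, n=565)
```

Comparing such an average with the log-likelihood obtained on the real data indicates how typical the observed fit is for data that truly follow the assumed copula.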

4.3 Competitors’ dependence structures

Figure 4: Scatter plots for each competitor pair: Nike and Adidas; Coke and Pepsi; Philip Morris and British American Tobacco. Orange dots represent the real log-returns, while black crosses represent one of the simulations assuming normally distributed log-returns.

The scatter plots in Figure 4 show the real dependence structure and the simulated de-pendence structure, assuming the competitors follow normal distribution. Since none of the competitors can be assumed to be normally distributed, these plots are purely informative.


4.3.1 Copula modeling for Nike and Adidas

Given the empirical copula for Nike and Adidas, the next step is to assign the most applicable parametric copula. The parameter estimates depend on which copula is being considered. Figure 5 shows several simulations based on the estimated copula parameters. For instance, if a Gaussian copula is assumed, its parameter estimates enter as the input for the simulation. Even though many scenarios are possible, only one scenario per copula is shown in the graphical representation. All these cases are almost indistinguishable, since the sample size for Nike and Adidas is rather small. It is also very hard to compare these simulated copulas with the empirical copula for Nike and Adidas from Figure 3. This shows that plots alone cannot distinguish between the copula families and that statistical inference is really necessary. Moreover, Appendix B presents copula model simulations with sample sizes of 5000 for the same set of estimated parameters. The differences between the copula families become more obvious there, but not enough to choose the most appropriate one.

Figure 5: Different copula simulation models for Nike and Adidas.


Copulas   AIC   BIC   Log-Lik.   Sim. Log-Lik.

Gaussian   -119.08   -114.74   60.54   62.6

Joe   -71.04   -66.89   36.52   47.38

Student's t   -145.76   -137.09   74.88   79.95

Frank   -112.36   -108.2   57.18   58.73

Gumbel   -110.18   -105.84   56.09   64.22

BB1   -147.88   -139.2   75.94   81.18

BB6   -108.1   -99.43   56.01   67.89

BB7   -149.43   -140.76   76.71   84.18

BB8   -102.82   -94.15   53.04   54.05

Table 4: Copula models with their estimated AIC, BIC, maximum likelihood and average maximum log-likelihood with n = 100 simulations for Nike and Adidas.

The results in Table 4 suggest that the best fitting copulas are Student's t, BB1 and BB7, based on both AIC and BIC. The criteria for the Student's t-copula are AIC_t = −145.76 and BIC_t = −137.09, with slightly lower criteria for BB1 (AIC_BB1 = −147.88 and BIC_BB1 = −139.2) and for BB7 (AIC_BB7 = −149.43 and BIC_BB7 = −140.76). Since the log-likelihoods enter AIC and BIC, they also support these choices. Notice that the simulated log-likelihoods are always larger than the estimated log-likelihoods. Intuitively, the best performing copulas should have higher likelihoods, and therefore also higher likelihoods for the simulated copulas. Furthermore, the difference should be small, since there should be no distinction between the real copula and the simulated one: both have the same parameters and the same shape of dependence. Conversely, the worst performing copulas should have lower estimated likelihoods and comparatively higher simulated likelihoods, which results in much larger differences. Consider the Joe copula, which the results suggest is the worst fit. The empirical copula is far from the assumed (Joe) copula, because the parameters cannot capture the actual dependence structure. When the same parameters are used to simulate from a Joe copula, the likelihood is therefore much higher. Again, if the real copula is well described by the parametric copula, the difference between estimated and simulated log-likelihoods must be small. Otherwise, when the parametric copula cannot capture the real copula structure, the difference between the estimated and simulated log-likelihoods becomes substantial. Similar problems arise when the Gumbel copula is used.
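The gap between estimated and simulated log-likelihoods discussed above can be tabulated directly from the two log-likelihood columns of Table 4. The short Python sketch below does this; the dictionary layout and variable names are illustrative choices, while the numbers are taken from the table.

```python
# (estimated log-likelihood, simulated log-likelihood) from Table 4.
table4 = {
    "Gaussian": (60.54, 62.6),
    "Joe": (36.52, 47.38),
    "Student's t": (74.88, 79.95),
    "Frank": (57.18, 58.73),
    "Gumbel": (56.09, 64.22),
    "BB1": (75.94, 81.18),
    "BB6": (56.01, 67.89),
    "BB7": (76.71, 84.18),
    "BB8": (53.04, 54.05),
}

# Difference: simulated minus estimated log-likelihood.
gaps = {name: round(sim - est, 2) for name, (est, sim) in table4.items()}

# Copulas ordered from the largest to the smallest gap.
ranked = sorted(gaps, key=gaps.get, reverse=True)
```

On these numbers the largest gaps belong to BB6 (11.88), Joe (10.86) and Gumbel (8.13), while Gaussian, Frank and BB8 show gaps of about 2 or less.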

Notice that even the best performing copulas can show differing log-likelihoods for the actual and the simulated copulas, suggesting that the dependence structure would show a higher likelihood if it were the real copula. The BB1, BB7 and Student's t-copulas have almost coinciding log-likelihoods and still show some small differences between estimation and simulation, but on a larger scale such small differences can be regarded as almost no difference.

Note that the fourth best fit is provided by the Gaussian copula. Even if the fit is not as good as for the BB1, BB7 or t-copula, it still has potential. Moreover, smaller samples may contain fewer and less extreme outliers, which leads to higher likelihoods for the Gaussian copula; this can be the case in this setting. However, in this particular setup the favored copulas are the Student's t, BB1 and BB7 copulas, based on AIC and BIC, with slightly lower information criteria for the BB7 copula.


4.3.2 Copula modeling for Pepsi and Coca-Cola (Coke)

In a similar fashion, copula models are simulated for Coke and Pepsi, based on the parameter estimates of each copula model. Notice that the sample size is larger (n = 898) than in the previous case of Nike and Adidas (n = 565), which intuitively gives a slightly clearer indication of the suitable copula models.

Figure 6: Different copula simulation models for Coke and Pepsi.

To explain the Coke and Pepsi competitor pair, each parametric copula is simulated in Figure 6. Recall the empirical copula for Coke and Pepsi in Figure 3. Comparing Figure 3 and Figure 6, the Student's t-copula might be the copula to pay attention to, since it has a similar pattern around the center and in the tails. The larger sample size could play a big role in finding the best copula fit. The results in Table 5 highlight the rather bad fit of the Gaussian and Joe copulas, which indeed suggests using the t-copula. Table 5 supports the choice of the Student's t-copula as the parametric copula with the highest log-likelihood as well as the lowest information criteria, AIC_t = −381 and BIC_t = −371.4. The Student's


Copulas   AIC   BIC   Log-Lik.   Sim. Log-Lik.

Gaussian   -252.2   -247.4   127.1   126.69

Joe   -71.04   -66.7   108.4   136.1

Student's t   -381   -371.4   192.5   218.93

Frank   -298.2   -293.4   150.1   163.11

Gumbel   -288.4   -283.6   145.2   172.28

BB1   -318.39   -308.79   161.2   186.09

BB6   -286.23   -276.63   145.11   170.66

BB7   -310.28   -300   157.14   181.2

BB8   -290.6   -281   147.3   156.9

Table 5: Copula models with their estimated AIC, BIC, maximum likelihood and average maximum log-likelihood with n = 100 simulations for Pepsi and Coke.

the estimation. The differences are larger for the worse performing copulas, with one exception: the Gaussian copula. This could be explained by the good fit of the Gaussian copula around the center, which could also be the reason why its estimated and simulated log-likelihoods almost coincide. However, the tails are long, so the Student's t-copula, as an extension of the Gaussian copula, is strongly advised. Similarly to the previous competitor pair, Nike and Adidas, the Coke and Pepsi pair also favors the BB1 and BB7 copulas. Nevertheless, given its high log-likelihood and low BIC_t and AIC_t, this competitor pair should be characterized by the t-copula.

4.3.3 Copula modeling for Philip Morris and British American Tobacco

The final analysis, for the Philip Morris and British American Tobacco competitor pair, follows the same logic as before. This competitor pair has the smallest set of observations (n = 469) of the three pairs. Small sample sizes can be dangerous, since they might give indistinguishable outcomes when choosing the right copula. For this competitor pair there is a clear preference for the t-copula, similarly to the previous analysis.

Copulas   AIC   BIC   Log-Lik.   Sim. Log-Lik.

Gaussian   -173.24   -169.09   87.62   90.46

Joe   -71.04   -66.7   59.47   74.11

Student's t   -195.24   -186.94   99.62   103.49

Frank   -174.64   -170.49   88.32   89.82

Gumbel   -163.22   -159.07   82.61   93.67

BB1   -188.58   -180.28   96.3   102.7

BB6   -161.10   -157.86   82.6   94.72

BB7   -184.64   -176.34   94.32   101.7

BB8   -167.26   -158.96   85.63   85.43

Table 6: Copula models with their estimated AIC, BIC, maximum likelihood and average maximum log-likelihood with n = 100 simulations for Philip Morris and British American Tobacco.


Figure 7: Different copula simulation models for Philip Morris and British American Tobacco.

4.4 Resulting copulas for each competitor pair

The previous analysis for each competitor pair favors three possible parametric copulas, namely the Student's t, BB1 and BB7 copulas, depending on the true underlying copula. To compare these copulas, AIC and BIC were calculated. Some of the copulas show log-likelihoods similar to those of other copulas, and likewise some dependence structures have similar AIC and BIC outcomes. Since one of the goals is to compare the Gaussian copula with the other copulas, this analysis must be developed further.

The Nike and Adidas competitor pair has almost identical information criteria for the Student's t, BB1 and BB7 copulas, so it is not clear which copula provides the most suitable fit. Let us assume the t-copula is as good as the BB7 copula and later check the results by applying one of the goodness-of-fit tests. For the other competitor pairs the Student's t-copula must be chosen, since it provides the lowest AIC and BIC for Coke and Pepsi, as well as for Philip Morris and British American Tobacco.


transformed log-returns. However, every competitor pair has almost coinciding estimated and simulated log-likelihoods for the Gaussian copula, which reflects a well explained center of the distribution. At the same time the Student's t-copula is always favored, since it gives more insight into the tails. Also, if the parametric copula is a good estimate of the empirical copula, then larger samples should provide even stronger results, meaning higher log-likelihoods and even clearer preferences based on AIC and BIC.

4.5 Using Goodness-of-Fit tests

In addition to the previous work, the fit of a copula can be tested using gofCopula() in R. Kojadinovic and Yan explain how recent large-scale simulations indicate that a powerful goodness-of-fit test for copulas can be obtained by comparing the empirical copula with a parametric estimate of the copula derived under the null hypothesis [13]. The best way to compute approximate p-values for statistics derived from this method is the parametric bootstrap procedure. One drawback of such a test is the computational time, which increases with the sample size and becomes even longer when implicit copulas are tested. Each bootstrap iteration requires random number generation from the copula hypothesized under the null, with the parameters estimated for the parametric copula. The gofCopula() function in R is based on this research. It requires several input variables, and the resulting output is the p-value and the Cramér-von Mises statistic (23), from which the approximate p-value (24) is computed.

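\[ S_n = \int_{[0,1]^d} n \left( C_n(u) - C_{\theta_n}(u) \right)^2 \, dC_n(u) \tag{23} \]

\[ \frac{1}{N} \sum_{k=1}^{N} \mathbf{1}\left( S_n^{(k)} \ge S_n \right) \tag{24} \]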

If the resulting p-value is below 0.05, the null hypothesis can be rejected. Otherwise, when the p-value is larger, the hypothesized parametric copula with its estimated parameters can be taken as the overall dependence structure for the empirical copula. By applying the gofCopula() test to each parametric copula, the outcome gives the estimated parameters as well as the p-values for each competitor pair. The results should be close to those in Section 4.3 in terms of copula choice. This test is a useful complement to the previous analysis, because it does not compare copulas with each other. Instead it answers the question of whether the parametric copula is a good estimate of the empirical one. Notice that each bootstrap iteration randomly generates numbers from the hypothesized distribution, so running the gofCopula() function several times gives slightly different p-values.
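The Cramér-von Mises statistic and its parametric-bootstrap p-value can be sketched in Python. The sketch below is a simplified illustration, not the R implementation: it uses the Clayton copula only because its distribution function and sampler have closed forms, it estimates the parameter by inverting Kendall's tau (an assumption; other estimators are possible), and all function names and the simulated data are hypothetical.

```python
import random

def pobs(x):
    """Rank-based pseudo-observations rank / (n + 1); ties are not handled."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u

def kendall_tau(u, v):
    n, s = len(u), 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (u[i] - u[j]) * (v[i] - v[j])
            s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))

def clayton_cdf(a, b, theta):
    return (a ** -theta + b ** -theta - 1.0) ** (-1.0 / theta)

def fit_clayton(u, v):
    # Inversion of Kendall's tau: theta = 2 tau / (1 - tau).
    # tau is clipped so that theta stays positive and finite.
    tau = min(max(kendall_tau(u, v), 0.01), 0.95)
    return 2.0 * tau / (1.0 - tau)

def cvm_statistic(u, v, theta):
    """S_n as the sum of squared distances between the empirical
    copula and the fitted copula at the pseudo-observations."""
    n = len(u)
    s = 0.0
    for i in range(n):
        emp = sum(1 for j in range(n) if u[j] <= u[i] and v[j] <= v[i]) / n
        s += (emp - clayton_cdf(u[i], v[i], theta)) ** 2
    return s

def simulate_clayton(n, theta, rng):
    """Draw n pairs from a Clayton copula by conditional inversion."""
    xs, ys = [], []
    for _ in range(n):
        a = 1.0 - rng.random()   # value in (0, 1]
        w = 1.0 - rng.random()
        b = ((w ** (-theta / (1.0 + theta)) - 1.0) * a ** -theta + 1.0) ** (-1.0 / theta)
        xs.append(a)
        ys.append(b)
    return xs, ys

def gof_clayton(x, y, n_boot=50, seed=0):
    """Return (S_n, bootstrap p-value) for the Clayton null hypothesis."""
    rng = random.Random(seed)
    u, v = pobs(x), pobs(y)
    theta = fit_clayton(u, v)
    s_obs = cvm_statistic(u, v, theta)
    exceed = 0
    for _ in range(n_boot):
        xb, yb = simulate_clayton(len(x), theta, rng)
        ub, vb = pobs(xb), pobs(yb)
        # Re-estimate the parameter on every bootstrap sample.
        exceed += cvm_statistic(ub, vb, fit_clayton(ub, vb)) >= s_obs
    return s_obs, exceed / n_boot

# Hypothetical data drawn from a Clayton copula with theta = 2.
x, y = simulate_clayton(100, 2.0, random.Random(42))
s_obs, p_value = gof_clayton(x, y, n_boot=50, seed=7)
```

The p-value is the share of bootstrap statistics that are at least as large as the observed one, so it changes slightly with the seed, in line with the remark above about repeated runs.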

Another way to find the best fitting copula is to use the function BiCopSelect(). Given a selection criterion (for example AIC or BIC) and an estimation method (maximum log-likelihood), this function returns the best fitting copula for the chosen competitor pair together with the estimated parameters. The only drawback of this function is that it provides only one outcome, namely the best one given the input. Recall the Nike and Adidas case, where several parametric copulas have similar information criteria. BiCopSelect() would return the BB7 copula as the best estimated copula based on AIC or BIC. In practice, it would be much easier to deal with the Student's t-copula instead of BB7, but BiCopSelect() presents only the best result. Moreover, this method can be seen as a comparison method (similar to the approach in Section 4.3). Consequently, it is best to choose the candidate parametric copulas, calculate the log-likelihoods with AIC and BIC manually, and then test whether these copulas are indeed suitable by applying the gofCopula() test.
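The difference between returning a single winner and inspecting the full ranking can be illustrated with the Nike and Adidas criteria from Table 4. The Python sketch below is a plain selection over tabulated values, not a re-implementation of the R function; the names `select_copula` and `ranking` are hypothetical.

```python
# (AIC, BIC) for Nike and Adidas, taken from Table 4.
criteria = {
    "Gaussian": (-119.08, -114.74),
    "Joe": (-71.04, -66.89),
    "Student's t": (-145.76, -137.09),
    "Frank": (-112.36, -108.2),
    "Gumbel": (-110.18, -105.84),
    "BB1": (-147.88, -139.2),
    "BB6": (-108.1, -99.43),
    "BB7": (-149.43, -140.76),
    "BB8": (-102.82, -94.15),
}

COLUMN = {"AIC": 0, "BIC": 1}

def select_copula(table, criterion="AIC"):
    """Return only the family with the lowest criterion value."""
    idx = COLUMN[criterion]
    return min(table, key=lambda name: table[name][idx])

def ranking(table, criterion="AIC"):
    """Return all families ordered from best (lowest) to worst."""
    idx = COLUMN[criterion]
    return sorted(table, key=lambda name: table[name][idx])

best_aic = select_copula(criteria, "AIC")
best_bic = select_copula(criteria, "BIC")
top_four = ranking(criteria, "AIC")[:4]
```

The single best answer is BB7 under both criteria, while the ranking shows how close BB1 and the Student's t-copula are, which is the information lost when only one result is returned.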


To be consistent with the analysis in Section 4.3, the resulting p-values for the chosen copulas are presented in Table 7. Note that gofCopula() runs very slowly for the Student's t-copula. The test is based on a large set of simulations, so the resulting p-value changes if the test is run again, although it does not change much.

Competitor pair Copula p-value

Nike and Adidas Student’s t 0.004

Coke and Pepsi Student’s t 0.227

Philip Morris and British American Tobacco Student’s t 0.301

Table 7: Applying Goodness-of-Fit test for all competitors with estimated p-values.

Unfortunately, the choice of the t-copula for Nike and Adidas is not supported by the p-value. When the same function is applied to the BB7 copula, which has a slightly higher log-likelihood than the Student's t-copula for Nike and Adidas, the resulting p-value is 0.1653 (not shown in Table 7). For the other competitor pairs the p-values are larger than the 0.10 significance level. The Student's t-copula therefore not only has the lowest information criteria but also provides an appropriate fit in general, especially for Philip Morris and British American Tobacco.

4.6 Remarks about empirical and theoretical copulas

The previous sections explain the copula modeling for the chosen competitors using R. The idea is to represent the empirical, or true, copula by a well defined parametric copula. The only way to reproduce these empirical copulas is to find the most appropriate parametric copula. This is not a straightforward task, since a theoretical copula generalizes the true copula through its estimated parameters. These parameters are necessary in order to know the overall dependence structure and to be able to simulate these patterns whenever needed. If the parametric copula does not fit the true copula well, it can give the wrong intuition and consequently have a huge impact when applied in practice. For instance, if a small sample has relatively few outliers, the Gaussian copula might seem convenient, while in larger samples the same copula can produce wrong forecasts because of the increasing number of outliers. With this in mind, the sample size plays a crucial role in estimation. Moreover, the empirical copulas are the real copulas with their own distributions, which cannot be assigned to parametric copulas without losing some information.

4.7 Copulas in real life applications

There are many fields where copulas can be introduced. Any field that needs to describe a general dependence structure can consider copula modelling. Recent medical articles show a copula modelling approach for drug sensitivity [7]. Copulas can be useful in engineering, for analyzing the reliability of complex systems of machine components with competing failure modes [8]. It is also becoming more and more popular to use copula models for insurance claims. A paper by Shi P. et al. [26] applies the Gaussian copula to claims management. Even though claims can be interpreted using different data-driven solutions, there is a demand for claims models. Copulas are particularly interesting in the financial framework, where tail dependencies must be acknowledged. One application is Value-at-Risk (VaR), a statistical technique used to measure and quantify the level of financial risk within a firm or investment portfolio over a specific time interval. This metric is mostly used by banks to determine the extent of potential losses in their portfolios. VaR can be seen as a quantile of a distribution [10]. Before copulas became an essential tool, VaR was calculated using variance-covariance matrices of the random variables. However, such calculations require normally distributed random variables. In this approach the correlations between the assets in the investment portfolio give the approximate variance and standard deviation of the entire portfolio. Knowing the portfolio standard deviation and the required quantile, there is a straightforward way to calculate VaR. The problem with this approach is the use of the quantile of a normal distribution, which can underestimate the potential risk. Therefore, some experts advocate using copulas for VaR calculations. The value of VaR may differ quite significantly between the assumption of joint normality and the case of heavy-tailed margins [16].
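The variance-covariance calculation described above can be sketched for a two-asset portfolio in Python. The weights, volatilities, correlation and portfolio value below are hypothetical, returns are assumed to have zero mean, and the function name is illustrative.

```python
import math
from statistics import NormalDist

def parametric_var(weights, vols, corr, value, alpha=0.99):
    """Variance-covariance VaR for two assets under joint normality."""
    w1, w2 = weights
    s1, s2 = vols
    # Portfolio variance from the individual volatilities and the correlation.
    variance = (w1 * s1) ** 2 + (w2 * s2) ** 2 + 2.0 * w1 * w2 * corr * s1 * s2
    sigma = math.sqrt(variance)
    # Quantile of the standard normal distribution at level alpha.
    z = NormalDist().inv_cdf(alpha)
    return value * z * sigma

# Hypothetical portfolio: equal weights, 3% weekly volatility each.
var_99 = parametric_var((0.5, 0.5), (0.03, 0.03), 0.44, 1_000_000)
var_99_perfect = parametric_var((0.5, 0.5), (0.03, 0.03), 1.0, 1_000_000)
```

The only dependence input is a single correlation and the only tail input is a normal quantile, which is exactly why this approach can understate risk when returns are heavy-tailed or tail-dependent.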

Copulas offer further applications in risk management, such as modeling joint defaults for credit risk. According to the European Banking Authority, most European banks are regulated under Basel III, a set of reform measures in prudential banking regulation. Without going into too much detail, one of the requirements of Basel III is to calculate the probability of default, which gives insight into how large the capital should be in order to absorb financial shocks [6]. Since banks lend money to individuals or other banks, they must assess their own credibility. Additionally, they should be able to determine whether potential clients are able to repay their debts, hence all risks must be taken into account. Banks would like to have as few defaulting clients as possible. These risks can be assessed from past data or expert opinion, but they are not certain outcomes and must therefore be modelled accurately. For credit defaults banks must hold additional capital, which should provide enough money in case defaults occur. Copulas can therefore be very useful for understanding the overall pattern in credit portfolios and the dependence structure in times of crisis. In fact, the underlying probability of default model under Basel III is based on the first order derivative of the Gaussian copula [18].

The analysis of the biggest competitor pairs can be useful for portfolio investment strategies. Understanding the dependence structure under the t-copula can provide better intuition about stock performance in different economic environments. Due to tail dependence, joint extreme portfolio returns are more likely under the t-copula than under the Gaussian copula. From a financial point of view, the securities in a portfolio should vary across industries. Consequently, including stocks from the same industry can be very risky and limits the diversification of the portfolio.
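The effect of tail dependence can be illustrated by simulation. The Python sketch below draws from a bivariate normal distribution and from a bivariate Student's t distribution with the same correlation and counts how often both components fall among their lowest 1% of values. The correlation 0.5, the 3 degrees of freedom and the sample size are hypothetical choices; because only ranks are used, the count depends on the copula and not on the margins.

```python
import math
import random

def simulate_pairs(n, rho, df=None, seed=0):
    """Bivariate normal pairs, or bivariate Student's t pairs if df is given."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        if df is not None:
            # A chi-square(df) variable is Gamma(shape = df / 2, scale = 2).
            scale = math.sqrt(rng.gammavariate(df / 2.0, 2.0) / df)
            z1, z2 = z1 / scale, z2 / scale
        xs.append(z1)
        ys.append(z2)
    return xs, ys

def joint_lower_tail_count(x, y, q=0.01):
    """Number of observations in the lowest q-fraction of both margins."""
    n = len(x)
    k = int(q * n)
    low_x = set(sorted(range(n), key=lambda i: x[i])[:k])
    low_y = set(sorted(range(n), key=lambda i: y[i])[:k])
    return len(low_x & low_y)

n = 20000
gauss_count = joint_lower_tail_count(*simulate_pairs(n, 0.5, seed=3))
t_count = joint_lower_tail_count(*simulate_pairs(n, 0.5, df=3, seed=3))
```

In such a simulation the t-copula typically produces clearly more joint extreme losses than the Gaussian copula at the same correlation, which is the portfolio risk the paragraph above refers to.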


5 Conclusion

The original hypothesis states that every competitor pair has a unique parametric copula. This research analyzes three competitor pairs based on weekly stock returns. The t-copula is a good approximation for Pepsi and Coke, and for Philip Morris and British American Tobacco. For Nike and Adidas the most suitable copula is the BB7 (Joe-Clayton) copula, although its information criteria are close to those of the t-copula. Some of the concluding remarks concern the complexity of the assumptions. Time dependence is still a problem, especially for Novartis and Pfizer, a pair that was not considered within this work. The time frequency is the key to understanding whether the series are time dependent, and using models that deal with time dependence could increase the sample size and give better insight into the possible parametric copula. Intuitively, larger samples should give a better idea of the parametric copula. However, the smallest sample in the analysis had the highest p-value in the goodness-of-fit test, so more asset log-returns must be considered before drawing conclusions about sample sizes. In general, models involving time variation cannot be ignored and should perhaps be developed in future work.

The R software is very handy for copula modeling. The copula functions help to reach results faster, but they must be studied in detail because of the complex copula theory. Moreover, the R software offers a rather limited choice of parametric copulas, although it has so far been sufficient for financial applications. For future copula models, it could be wise to construct new copulas and study their goodness-of-fit. Future developments could also involve multivariate copulas by adding more industry-related competitors. Adding more dimensions will increase the copula complexity: with extra dimensions copulas become less intuitive and impossible to visualize. The goodness-of-fit tests will then become essential, and although problematic to interpret, definitely valuable in many fields.

Nevertheless, copulas are very helpful and should be applied in the quantitative framework. Given the results of this research, practitioners should pay more attention to copulas other than the Gaussian. The Gaussian copula can be used for explaining the non-extreme outcomes, but otherwise different copulas must be investigated. A more sophisticated analysis of the choice of copula models will require more knowledge and hands-on experience. The demanding copula environment will require more experts in copula modeling, and may call for more contrasting copula structures and rational interpretations in the future.


A Appendix


B Appendix


C Appendix


D Appendix

Philip Morris and British American Tobacco copula simulations

for sample size n = 50000


References

[1] Apple Inc. (AAPL). Yahoo!Finance. (2017). url: https://finance.yahoo.com/quote/AAPL/history?p=AAPL. (consulted: 10.04.2017).

[2] Adidas AGG. (ADDYY). Yahoo!Finance. (2017). url: https://finance.yahoo.com/quote/ADDYY/history?p=ADDYY. (consulted: 10.04.2017).

[3] British American Tobacco p.l.c. (BTI). Yahoo!Finance. (2017). url: https://finance.yahoo.com/quote/BTI/history?p=BTI. (consulted: 10.04.2017).

[4] X. Chen and Y. Fan. Pseudo-likelihood ratio tests for semiparametric multivariate copula model selection. (2000). url: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.331.5882&rep=rep1&type=pdf. (consulted: 20.09.2017).

[5] M. Constantino, M. Larran, and C. A. Brebbia. Computational Finance and its Applications III. (2008).

[6] European Banking Authority. (2018). url: http://www.eba.europa.eu/regulation-and-policy/implementing-basel-iii-europe. (consulted: 10.01.2018).

[7] S. Haider et al. A Copula Based Approach for Design of Multivariate Random Forests for Drug Sensitivity Prediction. (2015). url: https://doi.org/10.1371/journal.pone.0144490.

[8] W. Han, J. Zhou, and K. Sun. Copula analysis of structural systems reliability with correlated failure mode. Vol. 22. 2011, pp. 278–282.

[9] M. Hofert et al. Multivariate Dependence with Copulas. url: https://cran.r-project.org/web/packages/copula/index.html. (consulted: 05.05.2017).

[10] Investopedia: Value-at-Risk. (2017). url: https://www.investopedia.com/terms/v/var.asp. (consulted: 01.10.2018).

[11] R. Kaas et al. Modern Actuarial Risk Theory - Using R. Springer, (2008). isbn: 978-3-642-03407-7.

[12] The Coca-Cola Company (KO). Yahoo!Finance. (2017). url: https://finance.yahoo.com/quote/KO/history?p=KO. (consulted: 10.04.2017).

[13] I. Kojadinovic and J. Yan. A Goodness-of-Fit Test for Multivariate Multiparameter Copulas Based on Multiplier Central Limit Theorems. (2012). url: https://pdfs.semanticscholar.org/b372/310a3470bc17715afc934fac45ff2aa7234a.pdf. (consulted: 01.08.2018).

[14] E. Kole, K. Koedijk, and M. Verbeek. Selecting Copulas for Risk Management. (2006). url: http : / / www . risknet . de / fileadmin / eLibrary / Copula Risk Management -Kole.pdf.

[15] F. Li. Modeling Covariate-Contingent Correlation and Tail-Dependence with Copulas. (2016). url: https://arxiv.org/pdf/1401.0100.pdf. (consulted: 01.07.2017).

[16] _{H. Manner. Modelling Assymetric and Time-Varying Dependence. (2010). url:} https: //cris.maastrichtuniversity.nl/portal/files/667227/guid-ae8195ad-cf0b%20-4744-8bb1-6a44fbe10fe7-ASSET1.0. (consulted: 20.09.2018).

[17] A.J. McNeil, R. Frey, and P. Embrechts. Quantitative Risk Management. Princeton Uni-versity Press, (2015). isbn: 978-0-691-03407-16627-8.

[18] _{F. Moreira. Copulas and credit risk models: some potential developments. (2015). url:}

https://tinker.uebs.ed.ac.uk/waf/mdb_event_v2/get_file.php?event_file_id= 219. (consulted: 01.10.2018).

[19] _{Nike Inc. (NKE). Yahoo!Finance. (2017). url:} https://finance.yahoo.com/quote/ NKE/history?p=NKE. (consulted: 10.04.2017).

(34)

[20] _{Novartis AG (NVS). Yahoo!Finance. (2017). url:}https://finance.yahoo.com/quote/ NVS/history?p=NVS. (consulted: 10.04.2017).

[21] _{A. J. Patton. Copula Methods for Forecasting Multivariate Time Series. (2015). url:}

https://pdfs.semanticscholar.org/38cf/b828dcfc3eb4bff3f50504%20ecb9e5ee5f7b53. pdf.

[22] _{Pepsico Inc. (PEP). Yahoo!Finance. (2017). url:} https://finance.yahoo.com/quote/ PEP/history?p=PEP. (consulted: 10.04.2017).

[23] _{Pfizer Inc. (PFE). Yahoo!Finance. (2017). url:} https://finance.yahoo.com/quote/ PFE/history?p=PFE. (consulted: 10.04.2017).

[24] _{Philip Morris International Inc. (PM). Yahoo!Finance. (2017). url:} https://finance. yahoo.com/quote/PM/history?p=PM. (consulted: 10.04.2017).

[25] R Development Core Team. R Foundation for Statistical Computing. (2012).

[26] P. Shi, X. Feng, and J. P. Boucher. Multilevel Modeling of Insurance Claims Using Copula.

(2015). url:https://pdfs.semanticscholar.org/90bf/55a66c44c07521d0e511a3f35eae90bbf44e. pdf. (consulted: 10.01.2018).

[27] _{Samsung Electronics Co. (SSNLF). Yahoo!Finance. (2017). url:} https : / / finance . yahoo.com/quote/SSNLF/history?p=SSNLF. (consulted: 10.04.2017).

[28] Y. Tang, V.N. Huynh, and J. Lawry. Integrated Uncertainty in Knowledge Modelling and Decision Making. (2015).
