• No results found

Parameterizing non-normality : the five moments distribution

N/A
N/A
Protected

Academic year: 2021

Share "Parameterizing non-normality : the five moments distribution"

Copied!
76
0
0

Bezig met laden.... (Bekijk nu de volledige tekst)

Hele tekst

(1)

Parameterizing Non-Normality

− The Five Moments Distribution −

Steve Robert Schmidt University of Amsterdam

Abstract

I present a formal framework to parameterize a probability function that matches five finite moments of a data set. The calibrated model is closer to what is being called “well-specified”.

Supervisor: Prof. Dr. Rob Kaas Second reader: Dr. Sami Umut Can

(2)

Contents

1 Introduction . . . 1

2 Ansatz . . . 2

3 Implicit functions . . . 5

4 Case analysis . . . 13

4.1 Positivity of the implicit non-central second moments . . . 13

4.2 Positivity of the implicit variances . . . 15

4.3 Implicit inequalities . . . 17

4.4 A narrow range . . . 20

4.5 The solution set . . . 28

5 Uniqueness of the solution . . . 29

6 The formal framework to fit four moments − An algebraic summary . . . 33

6.1 Numerical solution . . . 34

7 Parametric choices and their implications − Graphical illustrations . . . 34

8 Fitting a fifth moment . . . 39

8.1 Marginal changes in described and implied moments . . . 39

8.2 Numerical implementation . . . 46

9 An actuarial application . . . 46

9.1 Empirical and theoretical risk measures . . . 48

9.2 Model validation . . . 49

10 Conclusion . . . 56

A Appendix - Matlab codes . . . 58

A.1 Required implicit and nested functions . . . 58

A.2 Loop for plotting a family of PDFs . . . 60

A.3 Two variable solver . . . 63

(3)

1 Introduction

Modern statistics is in large parts grounded on the far-reaching legacy of Carl Friedrich Gauss (1777-1855). His popularity as a young mathematician and astronomer arises, among others, from his works on the calculation of planetary orbits. These derived its origin from the discovery of Ceres, a supposedly new planet (actually an asteroid, which is located between Mars and Jupiter), at New Year’s day in 1801 by the Italian astronomer Giuseppe Piazzi. Being observable for few weeks, Ceres eventually disappeared behind the sun. Gauss, and his European colleagues, aimed at calculating its reemergence in the sky. In late 1801, he was able to predict where its position would be. Gauss finished the manuscript, explaining his methodology, in 1806 and published it in 1809 under the title Theoria motus corporum coelestium in sectionibus conicis solem ambientium(Theory of the motion of the heavenly bodies about the sun in conic sections). Therein, he had employed a probabilistic framework to minimize measurement errors, i.e. the method of least squares (LS) as well as the maximum likelihood estimation (MLE), and introduced the normal distribution. It is disputable whether LS was first known to Adrien-Marie Legendre in 1805. Yet, these concepts have endured and been refined to draw statistical inference.

The density shape of the normal distribution is completely characterized by two parameters: the mean and variance of the sample. The so-called “bell curve” assumes symmetry of measurements (or errors) around their most likely value, i.e. the average, whereby large deviations are less likely than small deviations. In contrast to the error curve proposed by Pierre-Simon Laplace in 1774, the Gaussian distribution is continuously differentiable and does not exhibit excess kurtosis. The latter is, besides symmetry, a somewhat unfavorable feature of the normal distribution, as we do observe asymmetries and extremes in reality. Subbotin (1923) united both distributions and proposed the generalized error distribution (GED); also called exponential power distribution. The probability density function (PDF) is given by

h(x) = β

2σ · Γ(1/β )· e

−|x−µσ,

where β is an additional shape parameter. Particular values for β include the set of Laplace distributions (β = 1) as well as the normal distributions (β = 2). That way, the GED allows to model heavy tails (positive excess kurtosis) in a more flexible way. The parameters are calculated using MLE, such that the resulting density aligns the data as close as possible. This parameterization, however, does not exactly reflect the fourth moment, and it also assumes symmetry, i.e. all central odd moments are jointly equal to zero. Extensions and variations regarding the skewness and kurtosis have been suggested since the beginning of the 20th century. New probabilities have been constructed, for which higher moments are often not defined or tend to infinity, as these evolve progressively for leptokurtic variables. A prominent example is the class of α-stable distributions, specified by Paul Lévy in 1925.

The more recent literature promotes further generalizations of the Gaussian and Laplace laws of probability. Often, these curves are low-parameterized, and involve rather complex evolutions for the higher moments, especially on account of the Gamma function, Γ(x). Even though this might be justifiable, there is no reason to believe that the

(4)

evolution of moments, dictated by a handful of parameters under a certain probability function, indeed comes close to that of the real data. The very same can be said for my model (in a higher dimension).

To be more specific, I propose a probability model that is composed of two Gaussian distributions, where parameters are calibrated in such a way that five finite moments can be captured. Starting at the beginning, I will first explain my formal approach (“Ansatz”), and then I give an overview of how the remainder of this thesis is structured.

2 Ansatz

The k-th non-central moment, mk, of a (continuous) variable Y with y ∈ R, under a (presumed) theoretical

distribu-tion P(y), is given by

mk:= Z +∞ −∞ ykp(y) dy =: EhYk| Pi , (1) where m0 = 1 , since P(y) := Z y −∞p(z) dz ,

defines a probability function for Y that monotonically increases from zero to one for y ∈ (−∞ , +∞) .

The empirical mean and variance are defined by µY:= m1and σY2:= m2− m21, respectively. For k ≥ 3, I define the

k-th central moment of Y under P(y) by

µk:= Z +∞ −∞ (y − m1) k p(y) dy = Eh(Y − m1)k| P i = E " k

i=0 k i  Yi(−m1)k−i P # = k

i=0 k i  (−m1)k−iE h Yi P i (1) = k

i=0 k i  (−m1)k−imi = k

i=0 k i  mk−i(−m1)i, k≥ 3 , (2)

(5)

Let the PDF of Y , i.e. p(y), be composed of two normal density functions, π1(y) and π2(y), with parameters µ1,

µ2, σ1and σ2, that are mixed by a weight w ∈ (0 , 1) such that

p(y) = wπ1(y) + (1 − w)π2(y)

= w · 1 σ1 √ 2πe −(y−µ1) 2 2σ 2 1 + (1 − w) · 1 σ2 √ 2πe −(y−µ2) 2 2σ 2 2 (3)

satisfies Kolmogorov’s axioms of probability. I then claim that there exists a real-valued solution vector within the five-dimensional parameter space that matches five empirical moments of Y via p y ; w∗, µ1∗, µ2∗, (σ2

1)∗, (σ22)∗.

The existence of such a solution and its possible restrictions will be analyzed in what follows.

A Gaussian random variable U ∼ N(µ, σ ) is formally represented by U= µ + σV , where V ∼ N(0, 1)

follows a standard normal distribution. Applying the binomial theorem, the higher non-central moments of U can be calculated by E[Uk] = k

i=0 k i  µk−iσiE[Vi] .

Thereby, the (central) moments of the incorporated standard normal random variable V are determined by a deter-ministic expression for i ∈ N0, and evolve as follows:

E[Vi] =          (i − 1)!! = (i − 1) · (i − 3) . . . 5 · 3 · 1 if i even (−1)!! = 1 if i = 0 0 if i odd = 1 + (−1) i 2  (i − 1)!! , such that E[Uk] = k

i=0 k i  µk−iσi 1 + (−1) i 2  (i − 1)!! , (4)

where (i − 1)!! denotes the double factorial. The first six non-central moments of U thus compute to E[U ] = µ E[U2] = µ2+ σ2 E[U3] = µ3+ 3µσ2 E[U4] = µ4+ 6µ2σ2+ 3σ4 E[U5] = µ5+ 10µ3σ2+ 15µσ4 E[U6] = µ6+ 15µ4σ2+ 45µ2σ4+ 15σ6.

(6)

Using (1) and (3), the empirical (or observed) non-central moments of Y , i.e. mk, can be modelled by a probability

function that consists of two component normal distributions, defined by Π1and Π2, such that

mk= Z +∞ −∞ ykp(y) dy = Z +∞ −∞ yk  wπ1(y) + (1 − w)π2(y)  dy = w Z +∞ −∞ ykπ1(y) dy + (1 − w) Z +∞ −∞ ykπ2(y) dy = wEhyk| Π1 i + (1 − w)Ehyk| Π2 i .

Under the evolution of moments of normal random variables in (4), the first six non-central moments of Y can thus be expressed by the following non-linear system of equations:

m0 = w + (1 − w) m1 = w(µ1) + (1 − w)(µ2) (5a) m2 = w(µ12+ σ12) + (1 − w)(µ22+ σ22) (5b) m3 = w(µ13+ 3µ1σ12) + (1 − w)(µ23+ 3µ2σ22) (5c) m4 = w(µ14+ 6µ112+ 3σ14) + (1 − w)(µ24+ 6µ222+ 3σ24) (5d) m5 = w(µ15+ 10µ13σ12+ 15µ1σ14) + (1 − w)(µ25+ 10µ222+ 15µ2σ24) (5e) m6 = w(µ16+ 15µ14σ12+ 45µ12σ14+ 15σ16) + (1 − w)(µ26+ 15µ24σ22+ 45µ22σ24+ 15σ26) . (5f)

To solve this system, I will first express µ2, σ12 and σ22as implicit functions of µ1for some fixed w ∈ (0 , 1) and

given empirical moments, mk, in the subsequent section. In section 4, a case analysis with respect to the component

variances is performed. The focus will be the verification of a solution set w.r.t. w and µ1for which both variances

exist. In section 5, I prove that it is possible to find a unique solution that exactly matches the first four empirical moments of Y based on previously elaborated thoughts. Section 6 provides a summary of equations that need to be considered in an algorithm to fit these moments. The choice of w in a four moment fitting procedure not only has vital implications for the existence of component variances and corresponding densities π1(y), π2(y), and p(y), but

also for the magnitude of implied higher moments (beyond m4), which is dictated by the evolution of moments of

the fitted Gaussian distribution parameters, discussed in section 7. Families of PDFs will illustrate this graphically. As will become apparent, the set-up will leave us with one free parameter, which can be used to ultimately capture a fifth moment, examined in section 8. Subsequently, I will apply several five moment curve fitting procedures to a real data set in section 9, including theoretical in comparison to empirical risk measures and test results for the goodness of fit. Section 10 concludes the thesis.

(7)

3 Implicit functions

In the following, I derive expressions for µ2, σ12and σ22in terms of µ1for some fixed w ∈ (0 , 1) using the system

of equations (5a) to (5f). The goal is to reduce the dimension of this non-linear problem as far as possible.

From (5a) we immediately find an implicit function for µ2, i.e.

m1= w(µ1) + (1 − w)(µ2)

⇔ µ2∗= m1− wµ1

(1 − w) . (6)

Regarding the derivation of implicit expressions for σ12and σ22, I will present two approaches. The first will only have a motivational character and will be disregarded for reasons stated below.

By (5b) we have m2= w(µ12+ σ12) + (1 − w)(µ22+ σ22) ⇔ σ2 2 = m2− w(µ12+ σ12) (1 − w) − µ 2 2 . (7)

Consider first (5b) and (5d) to derive an implicit function for σ12:

m4− 3m22= w(µ14+ 6µ12σ12+ 3σ14) + (1 − w)(µ24+ 6µ22σ22+ 3σ24) − 3w2(µ12+ σ12)2− 6w(1 − w)(µ12+ σ12)(µ22+ σ22) − 3(1 − w)2(µ22+ σ22)2 = w(3µ14+ 6µ112+ 3σ14) − 2wµ14+ (1 − w)(3µ24+ 6µ222+ 3σ24) − 2(1 − w)µ24 − 3w2(µ12+ σ12)2− 6w(1 − w)(µ12+ σ12)(µ22+ σ22) − 3(1 − w)2(µ22+ σ22)2 = 3(w − w2)(µ12+ σ12)2− 6w(1 − w)(µ12+ σ12)(µ22+ σ22) + 3 (1 − w) − (1 − w)2 (µ22+ σ22)2 − 2wµ14− 2(1 − w)µ24 = 3(w − w2) (µ12+ σ12) − (µ22+ σ22)2− 2wµ14− 2(1 − w)µ24 (7) = 3(w − w2)  (µ12+ σ12) −  µ22+m2− w(µ 2 1+ σ12) (1 − w) − µ 2 2 2 − 2wµ4 1− 2(1 − w)µ24 = 3(w − w2) (µ 2 1+ σ12) − m2 (1 − w) 2 − 2wµ4 1− 2(1 − w)µ24 = 3 w 1 − w µ 2 1+ σ12− m2 2 − 2wµ14− 2(1 − w)µ24 ⇔ σ2 1 = m2± r (1 − w) 3w  m4− 3m22+ 2wµ14+ 2(1 − w)µ24  − µ2 1 , (8)

(8)

for which I define τ1:= r (1 − w) 3w  m4− 3m22+ 2wµ14+ 2(1 − w)µ24  such that σ12= (m2± τ1) − µ12, or σ12= (m2)1− µ12,

respectively. Using (7) and (8), it follows that

σ22= m2− w  µ12+ m2± r (1−w) 3w  m4− 3m2 2+ 2wµ14+ 2(1 − w)µ24  − µ2 1  (1 − w) − µ 2 2 = m2∓ w 1 − w r (1 − w) 3w  m4− 3m2 2+ 2wµ14+ 2(1 − w)µ24  − µ22 = m2∓ r w 3(1 − w)  m4− 3m2 2+ 2wµ14+ 2(1 − w)µ24  − µ22, (9)

for which I define

τ2:= r w 3(1 − w)  m4− 3m2 2+ 2wµ14+ 2(1 − w)µ24  such that σ22= (m2∓ τ2) − µ22, or σ22= (m2)2− µ22.

As can be seen from the implicit variance expressions in (8) and (9), the non-negative square roots τ1and τ2 shift

the non-central second moment of Y , i.e. m2, either in a positive or a negative direction resulting in non-central

second moments of the two component normal distributions of (m2)1:= m2± τ1and (m2)2:= m2∓ τ2. For both

variances to exist simultaneously, either τ1is unrestrictedly positive and τ2 has an upper bound of m2− µ22, or τ2

is unrestrictedly positive and τ1has an upper bound of m2− µ12. The implications regarding the locations and the

degree of the dispersions are, however, entirely different in both cases; we either have

(m2)2 < (m2)Y < (m2)1, or

(9)

For instance, let us assume that w ∈ (0 , 1) and |µ1| < |µ2|. Then it follows for the first solution that

(m2)2− (µ2)2 < (m2)Y− (wµ1+ (1 − w)µ2)2 < (m2)1− (µ1)2

⇔ σ22 < σY2 < σ12,

whereas the second solution implies (under the same assumption) that

(m2)1− (µ2)2 < (m2)Y− (wµ1+ (1 − w)µ2)2 < (m2)2− (µ1)2

⇔ σ2

1+ (µ1)2− (µ2)2 < σY2 < σ22+ (µ2)2− (µ1)2,

which leads to

σ12− σ22 < 2(µ2)2− 2(µ1)2.

In the first case, the first density component, with a lower mean in absolute terms, also exhibits a higher dispersion in comparison to the second component. In the second case, however, it is still possible for the first component to simultaneously have a lower variance, i.e. σ12< σ22(see lhs), since the right-hand side is positive by construction (|µ1| < |µ2|). Apparently, each solution pair for the implicit variances is subject to a restriction w.r.t. the skewness

of the mixed Gaussian distribution that is to be parameterized. In this setting, the first solution likely corresponds to a left-skewed variable (higher lower-quantile dispersion), whereas the second solution would be applicable to right-skewed variables (higher upper-quantile dispersion).

The existence of two implicit solutions for either component variance and the limited capability to unrestrainedly capture the skewness, for each solution pair on its own, lead to half-blind computations, if the algorithm bases only on one solution pair, i.e. if either σ12= (m2+τ1)− µ12and σ22= (m2−τ2)− µ22, or alternatively σ12= (m2−τ1)− µ12

and σ22= (m2+ τ2) − µ22was chosen to model the empirical characteristics of Y . On top of that, it unnecessarily

complicates the case analysis.

For these reasons, I will henceforth consider equation (5b) and (5c) to derive the implicit functions for σ12and σ22 from a pair of equations that is linear in the component variances to rule out the existence of multiple solutions for either of them.

Writing (5b) and (5c) in matrix notation,   m2 m3   =   wµ12+ (1 − w)µ2 2 wµ13+ (1 − w)µ23  +   w (1 − w) 3wµ1 3(1 − w)µ2     σ12 σ22  

for which I define on both sides of the equation

m := a + Aσ2,

(10)

such that the solution to this linear system of equations is given by σ2 = A−1b = 1 det(A)adj(A) b ⇔   (σ12)∗ (σ2 2)∗   = 1 a11a22− a12a21   a22 −a12 −a21 a11     b1 b2   ⇔   (σ12)∗ (σ22)∗   = 1 3w(1 − w)(µ2− µ1)   3(1 − w)µ2 −(1 − w) −3wµ1 w     m2− (wµ12+ (1 − w)µ22) m3− (wµ13+ (1 − w)µ23)   ,

from which we obtain the following expressions:

12)∗ = (3(1 − w)µ2)  m2− (wµ2 1+ (1 − w)µ22)  − (1 − w)m3− (wµ3 1+ (1 − w)µ23)  3w(1 − w)(µ2− µ1) = 3µ2  m2− (wµ2 1+ (1 − w)µ22)  −m3− (wµ3 1+ (1 − w)µ23)  3w(µ2− µ1) = 3µ2  m2− (1 − w)µ22  −m3− (−2wµ13+ (1 − w)µ23)  − 3wµ2µ12+ 3wµ13 3w(µ2− µ1) = 3µ2  m2− (1 − w)µ22  −m3− ((1 − w)µ23− 2wµ13)  − 3w(µ2− µ1)µ12 3w(µ2− µ1) =  3m2µ2− m3− 2 wµ13+ (1 − w)µ23  3w(µ2− µ1) − µ12, (10) and (σ22)∗ = (−3wµ1)  m2− (wµ2 1+ (1 − w)µ22)  + wm3− (wµ3 1+ (1 − w)µ23)  3w(1 − w)(µ2− µ1) =  m3− (wµ3 1+ (1 − w)µ23)  − 3µ1  m2− (wµ2 1+ (1 − w)µ22)  3(1 − w)(µ2− µ1) =  m3− (wµ13− 2(1 − w)µ23)  − 3µ1  m2− wµ12  − 3(1 − w)µ3 2+ 3(1 − w)µ1µ22 3(1 − w)(µ2− µ1) =  m3− (wµ13− 2(1 − w)µ23)  − 3µ1  m2− wµ12  − 3(1 − w)(µ2− µ1)µ22 3(1 − w)(µ2− µ1) =  m3− 3m2µ1+ 2 wµ13+ (1 − w)µ23  3(1 − w)(µ2− µ1) − µ22. (11)

(11)

It is easy to show that the implicit variances given under (10) and (11) match the variance of Y (by construction): σY2:= m2− m21 = w(µ12+ σ12) + (1 − w)(µ22+ σ22) − m21 = w    3m2µ2− m3− 2 wµ13+ (1 − w)µ23  3w(µ2− µ1)  + (1 − w)    m3− 3µ1m2+ 2 wµ13+ (1 − w)µ23  3(1 − w)(µ2− µ1)   − m21 =  3m2µ2− m3− 2 wµ13+ (1 − w)µ23  +m3− 3µ1m2+ 2 wµ13+ (1 − w)µ23  3(µ2− µ1) − m2 1 = m2 − m21.

However, for the component variances to exist two conditions have to be satisfied. First, A must be a non-singular matrix, which requires that

w6= {0, 1} ∧ µ16= µ2.

Second, the variances of the two normal density components need to be positive, i.e.

12)∗ =  3m2µ2− m3− 2 wµ13+ (1 − w)µ23  3w(µ2− µ1) − µ12 = (m2)1− µ12 > 0 , (σ22)∗ =  m3− 3m2µ1+ 2 wµ13+ (1 − w)µ23  3(1 − w)(µ2− µ1) − µ2 2 = (m2)2− µ22 > 0 .

That is, the factor pairs of the non-central second moments, (m2)1and (m2)2, must jointly exhibit a positive sign,

and exceed the individual component means. This is influenced by the sign of the difference in means, i.e. µ2− µ1,

as well as the magnitude of the empirical moments up to m3, and leaves us with two distinct cases, assuming that

w∈ (0 , 1). This will be examined in the next section.

With respect to the impact of the third moment on the implicit variances, we find the following (static) relations: ∂ (σ12)∗ ∂ m3 = − 1 3w(µ2− µ1) , ∂ (σ22)∗ ∂ m3 = 1 3(1 − w)(µ2− µ1) .

Let w ∈ (0, 1), then a gradual increase of m3 leads to a lower dispersion of the first normal density component if

µ1< µ2, as well as a respective increase of the variance of the second component. The opposite is true if µ1> µ2.

This demonstrates that this unique solution for the variances as functions of µ1(after replacing µ2for its implicit

expression using (6)) is able to capture both higher lower-quantile and upper-quantile dispersion, being universally applicable for left- and right-skewed variables, respectively. In other words, it gives us an idea of how the location of the means and the magnitude of component variances accommodate the empirical skewness.

(12)

The derived implicit expressions for µ2, σ12and σ22, given in (6), (10), and (11), will now be inserted in the fourth

moment equation (5d). In doing so, the resulting function (of µ1) will reflect the empirical moments of Y up to m4.

m4 = w(µ14+ 6µ12σ12+ 3σ14) + (1 − w)(µ24+ 6µ22σ22+ 3σ24) = w(3µ14+ 6µ112+ 3σ14) + (1 − w)(3µ24+ 6µ222+ 3σ24) − 2wµ14− 2(1 − w)µ24 = 3w(µ12+ σ12)2 + 3(1 − w)(µ22+ σ22)2− 2wµ4 1− 2(1 − w)µ24 =  3m2µ2− 2(1 − w)µ23− 2wµ13− m3 2 3w(µ2− µ1)2 +  m3+ 2wµ13+ 2(1 − w)µ23− 3µ1m2 2 3(1 − w)(µ2− µ1)2 − 2wµ14− 2(1 − w)µ24,

for which I define

c:= 2wµ13+ 2(1 − w)µ23+ m3, such that m4= 3m2µ2− c 2 3w(µ2− µ1)2 + c− 3µ1m2 2 3(1 − w)(µ2− µ1)2 − 2wµ14− 2(1 − w)µ24, = (1 − w)9m22µ22− 6c m2µ2+ c2  + wc2− 6c m2µ1+ 9m22µ12  3w(1 − w)(µ2− µ1)2 − 2wµ4 1− 2(1 − w)µ24 = 9m 2 2 wµ12+ (1 − w)µ22 − 6c m2 wµ1+ (1 − w)µ2 + c2 3w(1 − w)(µ2− µ1)2 − 2wµ4 1− 2(1 − w)µ24 = 9m 2 2 wµ12+ (1 − w)µ22 − 6c m2m1+ c2 3w(1 − w)(µ2− µ1)2 − 2wµ14− 2(1 − w)µ24 = 9m 2 2 wµ12+ (1 − w)µ22 ± 9m22m21− 6c m2m1+ c2 3w(1 − w)(µ2− µ1)2 − 2wµ14− 2(1 − w)µ24 = 9m 2 2 wµ12+ (1 − w)µ22 − 9m22 wµ1+ (1 − w)µ2 2 + 3m2m1− c 2 3w(1 − w)(µ2− µ1)2 − 2wµ14− 2(1 − w)µ24 = 9m22  (w − w2)µ12− 2w(1 − w)µ1µ2+ (1 − w) − (1 − w)2  µ22  + 3m2m1− c 2 3w(1 − w)(µ2− µ1)2 − 2wµ14− 2(1 − w)µ24 = 9m22w(1 − w)µ12− 2w(1 − w)µ1µ2+ w(1 − w)µ22  3w(1 − w)(µ2− µ1)2 + 3m2m1− c 2 3w(1 − w)(µ2− µ1)2 − 2wµ4 1− 2(1 − w)µ24 = 9m 2 2w(1 − w)(µ1− µ2)2 3w(1 − w)(µ2− µ1)2 + 3m2m1− c 2 3w(1 − w)(µ2− µ1)2 − 2wµ4 1− 2(1 − w)µ24 = 3m22+ 3m2m1− m3− 2wµ 3 1− 2(1 − w)µ23 2 3w(1 − w) (µ2− µ1)2 − 2wµ14− 2(1 − w)µ24,

(13)

which is equivalent to 0 = 2 wµ14+ (1 − w)µ24 −  3m2m1− m3− 2 wµ13+ (1 − w)µ23 2 3w(1 − w) (µ2− µ1)2 + m4− 3m22.

Inserting (6) and further rearrangements

0 = 2(wµ14+ (1 − w)µ24) − 3m2m1− m3− 2(wµ 3 1+ (1 − w)µ23) 2 3w(1 − w)  µY−wµ1 (1−w) − µ1 2 + m4− 3m 2 2 ⇔ 0 = 2(wµ14+ (1 − w)µ24) − 1 − w 3w  3m 2m1− m3− 2(wµ13+ (1 − w)µ23) 2 (µY− µ1)2 ! + m4− 3m22 ⇔ 0 = 3w m4− 3m22+ 2(wµ14+ (1 − w)µ24) (µY− µ1)2− (1 − w) 3m2m1− m3− 2(wµ13+ (1 − w)µ23) 2 ⇔ 0 = 3w  m4− 3m22+ 2  wµ14+(µY−wµ1)4 (1−w)3  (µY− µ1)−2 − (1 − w)  3m2m1− m3− 2  wµ13+(µY− wµ1) 3 (1 − w)2 2 ,

lead to the function

f(µ1) = 3w  (m4− 3m22+ 2wµ14)(1 − w)3+ 2(µY− wµ1)4  (µY− µ1)−2 −(3m2m1− m3− 2wµ13)(1 − w)2− 2(µY− wµ1)3 2 , (12) whose root µ1∗ will guarantee that the first four empirical moments, i.e. mk (k = 1, . . . , 4), are exactly fitted via

incorporated implicit expressions for µ2, σ12, σ22and some chosen weight w ∈ (0 , 1).

Equations (10), (11) and (12) can be written in terms of the (standardized) central moments of Y . Applying the binomial representation of central moments given in (2) and µY = m1we find

µ3= m3− 3m2m1+ 3m1m21− m31 = m3− 3m2m1+ 2m31 ⇔ 3m2m1− m3= 2m31− µ3 = σY3  2µ3 Y− µ3 σY3  = σY3 2  µY σY 3 − µ3 σY3 ! = σY3 2  µY σY 3 − γ1 ! =: δ1,

(14)

where γ1is the skewness of Y , and µ4= m4− 4m3m1+ 6m2m21− 4m1m31+ m41 = m4− 4m3m1+ 6m2m21− 3m41 ⇔ µ4− 3σY4= m4− 4m3m1+ 6m2m21− 3m41− 3(m2− m21)2 = m4− 3m22− 4m3m1+ 12m2m21− 6m41 ⇔ µ4− 3σY4+ 4µ3µY= m4− 3m22− 4m3m1+ 12m2m21− 6m41+ 4(m3− 3m2m1+ 2m31)m1 = m4− 3m22+ 2m41 ⇔ m4− 3m22= µ4− 3σY4+ 4µ3µY− 2µY4 = σY4  µ4− 3σY4+ 4µ3µY− 2µY4 σY4  = σY4 µ4 σY4− 3 + 2 µY σY 2µ3 σY3 −  µY σY 3!! = σY4 γ2+ 2 µY σY 2γ1−  µY σY 3!! =: δ2,

where γ2is the excess kurtosis of Y .

Both δ1 and δ2are parametric expressions of observed empirical moments of Y : mean, standard deviation,

skew-ness, and excess kurtosis.

Inserting δ1and δ2in (10), (11) and (12) leads to

12)∗=  3m2µ2− m3− 2 wµ13+ (1 − w)µ23  3w(µ2− µ1) − µ12 =  3m2µ2± 3m2w(µ1− µ2) − m3− 2 wµ13+ (1 − w)µ23  3w(µ2− µ1) − µ12 =  3m2 wµ1+ (1 − w)µ2 − 3m2w(µ1− µ2) − m3− 2 wµ13+ (1 − w)µ23  3w(µ2− µ1) − µ12 =  3m2m1− m3− 3(σY2+ µY2)w(µ1− µ2) − 2 wµ13+ (1 − w)µ23  3w(µ2− µ1) − µ12 = σY2+ µY2+  δ1− 2 wµ13+ (1 − w)µ23  3w(µ2− µ1) − µ2 1 , (13)

(15)

and (σ22)∗=  m3− 3m2µ1+ 2 wµ13+ (1 − w)µ23  3(1 − w)(µ2− µ1) − µ22 =  m3− 3m2µ1± 3m2(1 − w)(µ1− µ2) + 2 wµ13+ (1 − w)µ23  3(1 − w)(µ2− µ1) − µ22 =  m3− 3m2 wµ1+ (1 − w)µ2 − 3m2(1 − w)(µ1− µ2) + 2 wµ13+ (1 − w)µ23  3(1 − w)(µ2− µ1) − µ22 =  m3− 3m2m1− 3(σY2+ µY2)(1 − w)(µ1− µ2) + 2 wµ13+ (1 − w)µ23  3(1 − w)(µ2− µ1) − µ2 2 = σY2+ µY2−  δ1− 2 wµ13+ (1 − w)µ23  3(1 − w)(µ2− µ1) − µ2 2 , (14) and f(µ1) := 3w  (δ2+ 2wµ14)(1 − w)3+ 2(µY− wµ1)4  (µY− µ1)2−  (δ1− 2wµ13)(1 − w)2− 2(µY− wµ1)3 2 . (15) So far, I have assumed that the implicit variances exist for a fixed weight w ∈ (0 , 1). Before I present an approach to solve for µ1∗ using the moment matching function (15), and prove the uniqueness of this solution, I will first provide the conditions w.r.t. w satisfying real-valued component variances. This is the goal of the subsequent case analysis.

4 Case analysis

In the following, I distinguish between case (i) and (ii), each corresponding to a different location of the component means. To be more specific, case (i) corresponds to µ1< µ2, and (ii) to µ1> µ2, whereby a wide range w ∈ (0 , 1)

is assumed in either case. For both cases, I derive relations that have to be satisfied to guarantee that the implicit component variances are positive. Of particular interest will be the determination of a narrower range for the weight was a subset of the initially assumed interval.

4.1 Positivity of the implicit non-central second moments (i) Consider (m2)∗1and (m2)∗2for w ∈ (0 , 1) and µ1< µ2:

(m2)∗1 =  3m2µ2− m3− 2 wµ13+ (1 − w)µ23  3w(µ2− µ1) > 0 ⇔ 3m2µ2− m3> 2 wµ13+ (1 − w)µ23 ,

(16)

(m2)∗2 =  m3− 3m2µ1+ 2 wµ13+ (1 − w)µ23  3(1 − w)(µ2− µ1) > 0 ⇔ 2 wµ13+ (1 − w)µ23 > 3m2µ1− m3.

Combined we find the relation

3m2µ2− m3> 2 wµ13+ (1 − w)µ23 > 3m2µ1− m3, which leads to µ2> µ1 ⇔ (µY− wµ1) (1 − w) > µ1 ⇔ µY> µ1.

(ii) Consider (m2)∗1and (m2)∗2for w ∈ (0 , 1) and µ1> µ2:

(m2)∗1 =  3m2µ2− m3− 2 wµ13+ (1 − w)µ23  3w(µ2− µ1) > 0 ⇔ 3m2µ2− m3< 2 wµ13+ (1 − w)µ23 , (m2)∗2 =  m3− 3m2µ1+ 2 wµ13+ (1 − w)µ23  3(1 − w)(µ2− µ1) > 0 ⇔ 2 wµ13+ (1 − w)µ23 < 3m2µ1− m3.

Combined we find the relation

3m2µ2− m3< 2 wµ13+ (1 − w)µ23 < 3m2µ1− m3, which leads to µ2< µ1 ⇔ (µY− wµ1) (1 − w) < µ1 ⇔ µY< µ1.

(17)

It is worth noting that the slack conditions µY > µ1 and µY < µ1 on their own do not guarantee that the second

non-central component moments, (m2)1and (m2)2, are positive for all w ∈ (0 , 1) in case (i) and (ii), respectively.

Their sign, or more specifically the range for their existence, is also influenced by the magnitude of the empirical moments of Y up to m3. Apart from that and arguing from a plain analytical point of view regarding the set of all

real-valued points (m1, m2), a positive non-central second moment does not represent a sufficient condition for a

positive variance; the opposite, however, does. This can be seen from the following implications:

σ2= m2− m21> 0 ⇒ m2> 0 , but m2> 0 6⇒ σ2= m2− m21> 0 .

The determination of a range for w under which both component variances exist thus has to be grounded on the variances themselves, yet I will make use of the slack conditions in each case.

4.2 Positivity of the implicit variances (i) Consider (σ2

1)∗and (σ22)∗as given by (13) and (14) for w ∈ (0 , 1) and µ1< µ2:

12)∗ = σY2+ µY2+  δ1− 2 wµ13+ (1 − w)µ23  3w(µ2− µ1) − µ12> 0 ⇔ 3w(µ2− µ1) µ12− σY2− µY2 + 2 wµ13+ (1 − w)µ23 < δ1, (σ22)∗ = σY2+ µY2−  δ1− 2 wµ13+ (1 − w)µ23  3(1 − w)(µ2− µ1) − µ22> 0 ⇔ −3(1 − w)(µ2− µ1) µ22− σY2− µY2 + 2 wµ13+ (1 − w)µ23 > δ1. Combined we find 3w(µ2− µ1) µ12− σY2− µY2 < −3(1 − w)(µ2− µ1) µ22− σY2− µY2  ⇔ w µ12− σY2− µY2 < −(1 − w) µ22− σY2− µY2  ⇔ wµ12+ (1 − w)µ22< σY2+ µY2.

By Jensen’s inequality it follows that

µY2 = (wµ1+ (1 − w)µ2)2 ≤ wµ12+ (1 − w)µ22< σY2+ µY2

⇔ σY2> 0 ,

(18)

(ii) Consider (σ12)∗and (σ22)∗as given by (13) and (14) for w ∈ (0 , 1) and µ1> µ2: (σ12)∗ = σY2+ µY2+  δ1− 2 wµ13+ (1 − w)µ23  3w(µ2− µ1) − µ2 1> 0 ⇔ 3w(µ2− µ1) µ12− σY2− µY2 + 2 wµ13+ (1 − w)µ23 > δ1, (σ22)∗ = σY2+ µY2−  δ1− 2 wµ13+ (1 − w)µ23  3(1 − w)(µ2− µ1) − µ22> 0 ⇔ −3(1 − w)(µ2− µ1) µ22− σY2− µY2 + 2 wµ13+ (1 − w)µ23 < δ1. Combined we find 3w(µ2− µ1) µ12− σY2− µY2 > −3(1 − w)(µ2− µ1) µ22− σY2− µY2  ⇔ w µ12− σY2− µY2 < −(1 − w) µ22− σY2− µY2  ⇔ wµ12+ (1 − w)µ22< σY2+ µY2.

By Jensen’s inequality it follows that

µY2 = (wµ1+ (1 − w)µ2)2 ≤ wµ12+ (1 − w)µ22< σY2+ µY2

⇔ σY2> 0 ,

which is true by definition.

To sum up case (i) and (ii) for the implicit variances intermediately, it was shown that the solutions (σ12)∗ and (σ22)∗are positive for some weight w ∈ (0 , 1), irrespective of the location of individual means of the two normal density components. Consequently, p(y) = wπ1(y) + (1 − w)π2(y) will be a PDF. Although we may be tempted

to conclude that the component variances are positive for all w ∈ (0 , 1), it is possible to find counterexamples relatively easily. For instance, a set of chosen parameters w, µ1and µ2satisfying (5a) for given µY, σY2and γ1may

result in implicit variances of opposite signs, or being jointly negative.

By expressing µ2, σ12and σ22 in terms of µ1 and w as demonstrated in the previous section, the original problem

in five dimensions could be reduced to a two-dimensional problem, i.e. the moment matching equation arising from (15) and another higher moment equation in its reduced from, e.g. (5e) or (5f) after incorporating the implicit functions. The remaining equation(s) will merely base on unknown µ1and w, whereby the latter is either arbitrarily

chosen (four-moment fitting) or explicitly computed (five-moment fitting). Based on these thoughts, it is (for now) sufficient to generally describe the set, for which a real-valued solution exists by

(19)

representing a vector field within the x-y-plane that contains all points (µ1, w) for which the implicit component

variances are jointly positive.

The remainder of this section deals with the derivation and the verification of a narrower range for w as a subset between zero and one. For this purpose, I intend to express w itself in terms of µ1 using the implicit expressions

for both component variances and derive relations w.r.t. w that assure their existence. As will become apparent, a further reduction of the dimension is not possible without introducing an additional parameter, thus reinforcing the validity of (16). Yet, the derived results may lead to a better understanding, or intuition, regarding the location of the bounds for w as well as the compactness of a closer solution set.

4.3 Implicit inequalities

Using (13) and (14), I derive two implicit expressions for w in terms of µ1in case (i), and subsequently summarize

the respective conditions for case (ii).

Reconsider the condition

3w(µ2− µ1) µ12− σY2− µY2 + 2 wµ13+ (1 − w)µ23 < δ1,

which has to hold for (σ12)∗> 0 in case (i) (see section 4.2).

Inserting (6) and some algebra

3w  µY− wµ1 1 − w  − µ1  µ12− σY2− µY2 + 2 wµ13+ (1 − w)  µY− wµ1 1 − w 3! < δ1 ⇔ 3w  µY− µ1 1 − w  µ12− σY2− µY2 + 2  w(1 − w)2 µ13+ (µY− wµ1)3 (1 − w)2  < δ1 ⇔ 3w(1 − w) (µY− µ1) µ12− σY2− µY2 + 2 w(1 − w)2µ13+ (µY− wµ1)3 < (1 − w)2δ1,

for which I define (for reasons of space)

ε := − (µY− µ1) σY2+ −µY3+ µY2µ1+ µYµ12− µ13 ,

lead to the following relations:

3w(1 − w)ε + 2 w(1 − 2w + w2)µ13+ (µY3− 3wµ1µY2+ 3w2µ12µY− w3µ13) < (1 − w)2δ1

⇔ 3w(1 − w)ε + 2 µY3− 3wµ1µY2+ 3w2µ12µY+ w(1 − 2w)µ13 < (1 − w)2δ1

⇔ 3(w − w2)ε − 6wµ1µY2+ 6w2µ12µY+ 2w(1 − 2w)µ13< (1 − w)2δ1− 2µY3

(20)

I define the factor terms in the last inequality as follows 3ε − 6µ1µY2+ 2µ13 = −3 (µY− µ1) σY2− 3µY3− 3µY2µ1+ 3µYµ12− µ13 = −3 (µY− µ1) σY2− 4µY3+ (µY− µ1)3 =: ε1, and − 3ε + 6µ2 1µY− 4µ13 = 3 (µY− µ1) σY2+ 3µY3− 3µY2µ1+ 3µYµ12− µ13 = 3 (µY− µ1) σY2+ 2µY3+ (µY− µ1)3 =: ε2,

such that (σ12)∗> 0 holds, if the following quadratic inequality w.r.t. w is satisfied

ε2w2+ ε1w< (1 − w)2δ1− 2µY3

⇔ (ε2− δ1)w2+ (ε1+ 2δ1)w < δ1− 2µY3.

For (σ22)∗> 0 in case (i) it simultaneously has to hold that

−3(1 − w)(µ2− µ1) µ22− σY2− µY2 + 2 wµ13+ (1 − w)µ23 > δ1,

(see section 4.2). Inserting (6) and some rearrangements

−3(1 − w)  µY− wµ1 1 − w  − µ1   µY− wµ1 1 − w 2 − σ2 Y− µY2 ! + 2 wµ13+ (1 − w)  µY− wµ1 1 − w 3! > δ1 ⇔ −3 (µY− µ1)  (µY− wµ1)2− (1 − w)2(σY2+ µY2) (1 − w)2  + 2 w(1 − w) 2 µ13+ (µY− wµ1)3 (1 − w)2  > δ1 ⇔ −3 (µY− µ1) (µY− wµ1)2− (1 − w)2(σY2+ µY2) + 2 w(1 − w)2µ13+ (µY− wµ1)3 − (1 − w)2δ1> 0 ,

for which I define

η := (µY− µ1) (µY− wµ1)2− (1 − w)2(σY2+ µY2) , lead to −3η + 2 w(1 − 2w + w2)µ13+ (µY3− 3wµ1µY2+ 3w2µ12µY− w3µ13) − (1 − w)2δ1> 0 ⇔ −3η + 2 µ3 Y− 3wµ1µY2+ 3w2µ12µY+ w(1 − 2w)µ13 − (1 − w)2δ1> 0 ⇔ −3η − 6wµ1µY2+ 6w2µ12µY+ 2w(1 − 2w)µ13− (1 − w)2δ1+ 2µY3> 0 ⇔ −3η + w − 6µ1µY2+ 2µ13 + w2 6µ12µY− 4µ13 − (1 − w)2δ1+ 2µY3> 0 ,

(21)

which, again, is quadratic in w, where −3η = −3 (µY− µ1) (µY− wµ1)2− (1 − w)2(σY2+ µY2)  = −3 (µY− µ1) (µY2− 2wµ1µY+ w2µ12) − (−2w + w2)(σY2+ µY2) + 3 (µY− µ1) (σY2+ µY2) = −3 (µY− µ1) 2w(σY2+ µY2− µYµ1) + w2(µ12− σY2− µY2) + 3 (µY− µ1) σY2 = −6w σY2(µY− µ1) + µY3− 2µY2µ1+ µYµ12 − 3w2 µYµ12− µ13− µY3+ µ1µY2− (µY− µ1) σY2  + 3 (µY− µ1) σY2 = w −6σ2 Y(µY− µ1) − 6µY3+ 12µY2µ1− 6µYµ12 + w2 −3µYµ12+ 3µ13+ 3µY3− 3µ1µY2+ 3 (µY− µ1) σY2  + 3 (µY− µ1) σY2.

Inserting the last expression, we see that (σ22)∗> 0 holds, if the following inequality w.r.t. w is satisfied −3η + w − 6µ1µY2+ 2µ13 + w2 6µ12µY− 4µ13 > (1 − w)2δ1− 2µY3 ⇔ 3 (µY− µ1) σY2+ w −6µ1µY2+ 2µ13− 6σY2(µY− µ1) − 6µY3+ 12µY2µ1− 6µYµ12  +w2 6µ12µY− 4µ13− 3µYµ12+ 3µ13+ 3µY3− 3µ1µY2+ 3 (µY− µ1) σY2 > (1 − w)2δ1− 2µY3 ⇔ 3 (µY− µ1) σY2+ w −6σY2(µY− µ1) − 6µY3+ 6µY2µ1− 6µYµ12+ 2µ13 +w2 3µY3− 3µ1µY2+ 3µ12µY− µ13+ 3 (µY− µ1) σY2 > (1 − w)2δ1− 2µY3 ⇔ 3 (µY− µ1) σY2− 2w 3σY2(µY− µ1) + 2µY3+ (µY− µ1)3  +w2 2µY3+ (µY− µ1)3+ 3 (µY− µ1) σY2 > (1 − w)2δ1− 2µY3

in which we can insert ε2, such that

⇔ ε2w2− 2ε2w+ 3 (µY− µ1) σY2> (1 − w)2δ1− 2µY3.

⇔ (ε2− δ1) w2− 2(ε2− δ1)w + 3 (µY− µ1) σY2> δ1− 2µY3.

Summarizing the positivity requirements for the component variances in case (i) it jointly has to hold that (ε2− δ1)w2+ (ε1+ 2δ1)w < δ1− 2µY3, and

(ε2− δ1) w2− 2(ε2− δ1)w + 3 (µY− µ1) σY2> δ1− 2µY3.

In case (ii), the condition µ1> µ2leads to a mere inversion of the inequality sign for w ∈ (0 , 1), such that

(ε2− δ1)w2+ (ε1+ 2δ1)w > δ1− 2µY3, and

(22)

4.4 A narrow range

Subsequently, I examine the solution sets implied by the inequalities in case (i); again, case (ii) will be summarized at last.

Let us first assume that ε2> δ1 (the inversion ε2< δ1 will be briefly discussed at the end of the case analysis for

(i)), i.e. ε2> δ1 ⇔ 3 (µY− µ1) σY2+ 2µY3+ (µY− µ1)3> (2µY3− σY3γ1) ⇔ (µY− µ1)  3σY2+ (µY− µ1)2  + σY3γ1> 0 ,

This statement is generally true if γ1> 0, since µY > µ1holds in case (i) (see positivity of the non-central second

moments as outlined in 4.1). Using the implicit inequalities in (i) as derived in 4.3, it follows that

w2+(ε1+ 2δ1) (ε2− δ1) w−(δ1− 2µ 3 Y) (ε2− δ1) < 0 (17) and w2− 2w +3 (µY− µ1) σ 2 Y− (δ1− 2µY3) (ε2− δ1) > 0 . (18)

Consider first (17), for which I define the coefficients

a1:= (ε1+ 2δ1) (ε2− δ1) and a2:= − (δ1− 2µY3) (ε2− δ1) , such that w2+ a1w+ a2< 0 , with roots w1/2= −a1 2 ± r a1 2 2 − a2, and domain w∈ −a1 2 − r a1 2 2 − a2 , −a1 2 + r a1 2 2 − a2 ! := (wa, ¯wa) .

For the roots (or bounds) and thus for (σ12)∗to exist, it has to hold that a1

2 2

− a2> 0 .

Presuming that this is the case, we desire (in accord with the initial assumption w ∈ (0 , 1)) that both bounds are strictly positive, i.e. w1/2> 0, or more specifically wa> 0. On top of the condition for the roots to exist, this holds

(23)

if additionally a1< 0 and a2> 0 is satisfied. For a2> 0 we can rewrite ⇔ a2 = − (δ1− 2µY3) (ε2− δ1) > 0 ⇔ − δ1− 2µY3 > 0 ⇔ − σY3 2  µY σY 3 − γ1 ! − 2µ3 Y ! > 0 ⇔ σY3γ1> 0 ⇔ γ1> 0 ,

i.e. a2> 0 indeed corresponds to a positive skewness in case (i) as motivated by ε2> δ1(see above).

The condition a1< 0 can be rewritten as

a1 = (ε1+ 2δ1) (ε2− δ1) < 0 ⇔ ε1+ 2δ1< 0 ⇔ −3 (µY− µ1) σY2− 4µY3+ (µY− µ1)3+ 2 2µY3− σY3γ1 < 0 ⇔ −3 (µY− µ1) σY2+ (µY− µ1)3− 2σY3γ1< 0 ⇔ −3 2(µY− µ1) σ 2 Y+ 1 2(µY− µ1) 3 < σY3γ1,

which can be used to show that a1has a lower bound:

a1 = (ε1+ 2δ1) (ε2− δ1) > −1 ⇔ ε1+ 2δ1> −(ε2− δ1) ⇔ ε1+ ε2> −δ1 ⇔ −3 (µY− µ1) σY2− 4µY3+ (µY− µ1)3+ 3 (µY− µ1) σY2+ 2µY3+ (µY− µ1)3> − 2µY3− σY3γ1  ⇔ 2 (µY− µ1)3> σY3γ1, as −3 2(µY− µ1) σ 2 Y+ 1 2(µY− µ1) 3 < σY3γ1,

we can imply that

⇒ 3 2(µY− µ1) σ 2 Y+ 3 2(µY− µ1) 3> 0 ⇔ (µY− µ1)  σY2+ (µY− µ1)2  > 0 ,

(24)

The condition a1> −1 implies a (non-binding) upper bound for a2of 1 4 > a1 2 2 > a2 > 0 . In summary, we have a1∈ (−1 , 0) and a2∈  0 , a1 2 2 , such that the roots of

w2+ a1w+ a2= 0

lie between zero and one (for given µ1∗).

The upper bound of the domain in (17) confirms this statement. It holds that ¯ wa = − a1 2 + r a1 2 2 − a2< 1 ⇔ r a1 2 2 − a2< 1 + a1 2 ⇔ a1 2 2 − a2< 1 + a1+ a1 2 2 ⇔ 0 < 1 + a1+ a2 ⇔ 0 < 1 +(ε1+ 2δ1) (ε2− δ1) −δ1− 2µ 3 Y (ε2− δ1) ⇔ 0 < (ε2− δ1) + (ε1+ 2δ1) − (δ1− 2µY3) ⇔ 0 < ε2+ ε1+ 2µY3 = 2(µY− µ1)3,

which is true by µY > µ1in case (i). For the lower bound in (17) we find that

wa = −a1 2 − r a1 2 2 − a2> 0 ⇔ −a1 2 > r a1 2 2 − a2 ⇔ a2> 0 ,

which has been initially assumed. We can thus conclude that a narrower domain of the quadratic inequality (17) is given by

w∈ (wa, ¯wa) ⊂ (0 , 1) ,

being a cut set of the original interval.

It has to be emphasized that arbitrary parametric choices for w in a wide range of (0 , 1) will not always guarantee that (σ12)∗exists. This will only be true, if the chosen weight w lies in its narrow range, i.e. between waand ¯wa. To

(25)

Using (18), I define w2− 2w + b > 0 , with b := 3 (µY− µ1) σ 2 Y (ε2− δ1) −(δ1− 2µ 3 Y) (ε2− δ1) =: a3+ a2.

It can be shown that the intercept b lies between zero and one:

b = 3 (µY− µ1) σ 2 Y− (δ1− 2µY3) (ε2− δ1) > 0 ⇔ 3 (µY− µ1) σY2− 2µY3− σY3γ1− 2µY3 > 0 ⇔ 3 (µY− µ1) + σYγ1> 0 , b = 3 (µY− µ1) σ 2 Y− (δ1− 2µY3) (ε2− δ1) < 1 ⇔ 3 (µY− µ1) σY2− δ1+ 2µY3< (ε2− δ1) ⇔ 3 (µY− µ1) σY2+ 2µY3< ε2 ⇔ 3 (µY− µ1) σY2+ 2µY3< 3 (µY− µ1) σY2+ 2µY3+ (µY− µ1)3 ⇔ 0 < (µY− µ1)3 .

Both statements are true, since µY> µ1and γ1> 0 hold in case (i) as shown above.

The (non-negative) roots of the quadratic equation arising from (18) are given by

w1,2= 1 ±√1 − b with b∈ (0 , 1) ,

and the domain of this inequality computes to

w∈ (1 −/ √1 − b , 1 +√1 − b) := (wb, ¯wb) .

Regarding the upper bound ¯wb> 1 and the initial assumption w ∈ (0 , 1), the relevant domain of w in (18) thus

reduces to

w∈ (0 , wb) ⊂ (0 , 1) ,

again representing a cut set of the original interval.

Combining this result with the domain in (17), we find the closest possible range for w as a cut set of the domains of both inequalities, i.e.

(26)

Henceforth, I compare the location of the bounds. Since ¯wb> 1, it remains to be shown for the lower bound in (17)

that

wa< wb

and for the upper bound in (17) either ¯

wa< wb, or w¯a> wb

is true, such that the narrow range always exists.

Proof We have b = 3 (µY− µ1) σ 2 Y (ε2− δ1) −(δ1− 2µ 3 Y) (ε2− δ1) = a3+ a2, with a3:= 3 (µY− µ1) σ 2 Y (ε2− δ1) > 0 in (i) , such that 1 − b = 1 − a3− a2 = 1 − 3 (µY− µ1) σY2− (δ1− 2µY3) (ε2− δ1) = (ε2− δ1) − 3 (µY− µ1) σ 2 Y+ (δ1− 2µY3) (ε2− δ1) = (µY− µ1) 3 (ε2− δ1) , so we can express a1as a1 = (ε1+ 2δ1) (ε2− δ1) = −3 (µY− µ1) σ 2 Y− 4µY3+ (µY− µ1)3+ 2δ1 (ε2− δ1) = −3 (µY− µ1) σ 2 Y (ε2− δ1) +(µY− µ1) 3 (ε2− δ1) +2(δ1− 2µ 3 Y) (ε2− δ1) = −a3+ 1 − a3− a2− 2a2 = 1 − 2a3− 3a2, or equivalently a3 = 1 − a1− 3a2 2 ,

which allows us to rewrite b in terms of parameters of inequality (17):

b = 1 − a1− a2

(27)

Thus, we can verify for the lower bound in (17) that wa = −a1 2 − r a1 2 2 − a2 < 1 2− r 1 4− a2 < 2 2− r 4 4− a2 < 1 −p1 − a3− a2 = 1 − √ 1 − b = wb,

which is a true statement since a1∈ (−1 ; 0) and a3> 0.

Now we regard the point of intersection of the quadratic functions in (17) and (18) w2+ a1w+ a2= w2− 2w + 1 − a1− a2 2 ⇔ (a1+ 2)w = 1 − a1− 3a2 2 ⇔ w˜ =1 − a1− 3a2 2(a1+ 2) = 3 2 (1 − a2) (2 + a1) −1 2 , having a function value of

q( ˜w) = ( ˜w)2− 2 ˜w+ b = ( ˜w− 1)2+ b − 1 = 3 2 (1 − a2) (2 + a1) −3 2 2 +1 − a1− a2 2 − 1 =9 4  (1 + a1+ a2) (2 + a1) 2 −1 + a1+ a2 2 .

It is of interest to identify the sign of this value, i.e. we ask

q( ˜w) T 0 ⇔ 9 4  (1 + a1+ a2) (2 + a1) 2 T1 + a1+ a2 2 ⇔ 9(1 + a1+ a2) T 2(2 + a1)2. If a1→ 0 then 9(1 + a2) > 8 ,

holds for all a2∈ (0 , 14).

However, if a1→ −1 then

9a2< 2 ,

(28)

That is, depending on the magnitude of a1, the function value q( ˜w) is either positive or negative. This means that

the point of intersection of the two upward opening parabolas lies either to the left or to the right of wb, implying

that either

¯

wa< wb, or w¯a> wb.



Thus we can conclude that the closest possible range for w is given by a cut set, for which (17) and (18) are jointly satisfied, i.e.

w∈ (wa, min{ ¯wa; wb}) ⊂ (0 , 1) ,

with bounds wa , ¯waand wb as implicit functions of µ1 only (for given empirical moments up to m3), which can

be obtained via substitution using ε1and ε2via a1, a2and a3. For reasons stated in the next subsection, an explicit

expression for this narrow range is not intended.

Let us now, still in case (i), discuss if we would have assumed that ε2< δ1. First of all, this leads to an inversion of

the inequality signs in (17) and (18) such that

w2+(ε1+ 2δ1) (ε2− δ1) w−(δ1− 2µ 3 Y) (ε2− δ1) > 0 and w2− 2w +3 (µY− µ1) σ 2 Y− (δ1− 2µY3) (ε2− δ1) < 0 , or alternatively w2+ a1w+ a2> 0 , and w2− 2w + b < 0 .

Starting by the second inequality, we find a lower bound for the intercept of

b = 3 (µY− µ1) σ 2 Y− (δ1− 2µY3) (ε2− δ1) > 1 ⇔ 3 (µY− µ1) σY2+ 2µY3< ε2 ⇔ 3 (µY− µ1) σY2+ 2µY3< 3 (µY− µ1) σY2+ 2µY3+ (µY− µ1)3 ⇔ 0 < (µY− µ1)3 ,

(29)

We immediately see that the second inequality will never be satisfied in this instance, as

w2− 2w + b = (w − 1)2+ (b − 1) < 0

does not hold for b > 1 .

Consequently, it is impossible to determine a real-valued solution to the implicit functions as given above in this sub-case (ε2< δ1), since (σ22)∗will never be defined, let a alone the desired PDF.

Turning to case (ii) with w ∈ (0 , 1) and µ1> µ2, these requirements for positive variances were derived in 4.3:

(ε2− δ1)w2+ (ε1+ 2δ1)w > δ1− 2µY3, and

(ε2− δ1) w2− 2(ε2− δ1)w + 3 (µY− µ1) σY2< δ1− 2µY3.

We again consider two sub-cases: ε2> δ1and ε2< δ1. In sub-case ε2> δ1we find

w2+(ε1+ 2δ1) (ε2− δ1) w−(δ1− 2µ 3 Y) (ε2− δ1) > 0 , and w2− 2w +3 (µY− µ1) σ 2 Y− (δ1− 2µY3) (ε2− δ1) < 0 , or alternatively w2+ a1w+ a2> 0 , and w2− 2w + b < 0 .

The examination of the intercept b leads again to a contradiction. It holds that

b := 3 (µY− µ1) σ 2 Y− (δ1− 2µY3) (ε2− δ1) > 1 ⇔ 3 (µY− µ1) σY2+ 2µY3> ε2 ⇔ 3 (µY− µ1) σY2+ 2µY3> 3 (µY− µ1) σY2+ 2µY3+ (µY− µ1)3 ⇔ 0 > (µY− µ1)3 ,

which is a true statement, since µY< µ1holds in case (ii) (see above), such that

w2− 2w + b = (w − 1)2+ (b − 1) < 0

(30)

To derive bounds for weight w that do not a priori encompass violations regarding the requirements (σ12)∗> 0 and (σ2

2)∗> 0 it is vital to assume that in case (ii) holds

ε2< δ1 ⇔ 3 (µY− µ1) σY2+ 2µY3+ (µY− µ1)2< (2µY3− σY3γ1) ⇔ (µY− µ1)  3σY2+ (µY− µ1)2  + σY3γ1< 0 ,

which is generally true if γ1< 0 .

This leads us to the same initial setting as in sub-case ε2> δ1of (i) with

w2+(ε1+ 2δ1) (ε2− δ1) w−(δ1− 2µ 3 Y) (ε2− δ1) < 0 and w2− 2w +3 (µY− µ1) σ 2 Y− (δ1− 2µY3) (ε2− δ1) > 0 .

Solving these inequalities reproduces the same calculus as already presented in case (i), and reconfirms the same narrow range for w, i.e. w ∈ (wa, min{ ¯wa; wb}), however with one important difference, namely, case (ii) reflects

all instances for which γ1< 0, compared to γ1> 0 in case (i).

In conclusion, we can safely say that the component variances always exist, if w is chosen, or calibrated, in a real subset between zero and one. This holds irrespective of the sign of the difference in component means. Thus, for all w, being elements of the narrow range, the distribution parameters µ1and implicit µ2, σ1and σ2are unrestrainedly

capable to match the empirical skewness.

4.5 The solution set

As motivated above, the narrow range w was implicitly expressed in terms of µ1for given empirical characteristics

of Y , captured in µY, σY, δ1and δ2. The width of this range is given by the difference between the upper and the

lower bound, i.e.

min{ ¯wa; wb} − wa,

and is influenced by the magnitude of the empirical moments up to m3. The bounds themselves cannot be attained,

as this would imply that either (σ12)∗or (σ22)∗is equal to zero, such that one normal density component as well as the desired PDF are not defined then.

(31)

The explicit choice of w assuring that it ranges in (wa, min{ ¯wa; wb}) ⊂ (0 , 1) requires the introduction of an

additional parameter, which links the two implicit bounds in a functional form. There are multiple ways to achieve this. For instance, for an s ∈ (0 , 1) we can think of an weighted arithmetic average of the two bounds, such that we can define our parametric choice as

w:= g(µ1; s)

= s · wa + (1 − s) · min{ ¯wa; wb} .

An alternative could be a weighted geometric average in the form of

w:= g(µ1; s)

= (wa) s

· (min{ ¯wa; wb})1−s .

By construction, both parameterized functions, g(µ1; s), increase monotonously from the lower bound, wa, to the

upper bound, min{ ¯wa; wb}, when s gradually increases. The exact functional form, however, is unknown.

At this point, the question that arises is whether it is actually necessary to introduce an additional parameter. The answer is no. Expressing both bounds implicitly reduces the dimension: w is lost. If we were to make use of these bounds explicitly, it is inevitable to connect them using an additional parameter, such that the original solution set remains intact. Both steps are superfluous when it comes to practical implementations. Apart from that, the high functional complexity in w := g(µ1; s) considerably slows down the efficieny of the algorithm.

The purpose of the last two subsections was to verify that component variances exist for a real w in an open subset between zero and one, which I call W in what follows. By comparing the location of the implicit bounds, we have learned that this narrow range always exists. The bounds and width of the corresponding interval are determined by the empirical moments of Y . Since the dimension cannot be further reduced, the solution set is fully represented by the vector field

(µ1, w) | (σ12)∗(µ1, w ; µY, σY2, δ1) > 0 ∧ (σ22)∗(µ1, w ; µY, σY2, δ1) > 0 ,

where (σ2

1)∗and (σ22)∗are given in (13) and (14), and w ∈ W ⊂ (0 , 1) .

5 Uniqueness of the solution

In the following, I investigate the moment matching function (15), which incorporates the implicit expressions for µ2, σ1and σ2: f(µ1) = 3w  (δ2+ 2wµ14)(1 − w)3+ 2(µY− wµ1)4  (µY− µ1)2 −  (δ1− 2wµ13)(1 − w)2− 2(µY− wµ1)3 2 .

(32)

The goal is to verify that there exists a unique solution in the x-y-plane to the equation f (µ1; w) = 0, such that

component means and variances match the first four empirical moments of Y . We are dealing with a polynomial of the sixth degree in µ1for given µY, σY, γ1 and γ2, and some chosen w assuring the existence of the component

variances. Strictly speaking, the latter is a variable in the context of the non-linear system of equations (5a) to (5f). For the purpose of the proof, I treat it as a constant, merely satisfying w ∈ W ⊂ (0 , 1), such that (σ12)∗, (σ22)∗and not least f exist, which itself bases on the fact that the implicit variances are defined. I allow w to vary in section 7.

Looking at the term with highest exponent in f , we find a positive coefficient for µ16: 3w 2wµ14(1 − w)3+ 2w4µ14µ12− − 2wµ13(1 − w)2+ 2w3µ132> 0 ⇔ 3w2 2(1 − w)3+ 2w3µ16− w2µ16 − 2(1 − w)2+ 2w22> 0 ⇔ 3 2(1 − w)3+ 2w3 − − 2(1 − w)2+ 2w22> 0 ⇔ 3 2 − 6w + 6w2 − − 2 + 4w2> 0 ⇔ 3 − 9w + 9w2− 2 + 8w − 8w2> 0 ⇔ 1 − w + w2> 0 ⇔ w+ (1 − w)2> 0 ,

which is true for all w ∈ W ⊂ (0 , 1). That is, f is an upward opening function.

We will start by discussing the conditions under which f has exactly one real-valued solution. According to the fundamental theorem of algebra, a polynomial of order n has n complex roots and up to n real-valued roots. For a unique solution, f must have a “multiple” root at µ1∗, in the sense that the other (n − 1) = 5 roots are collapsing in one single point. Since f is upward opening, a unique solution here also implies that f must have a global minimum at µ1∗, such that f (µ1) > 0 for all µ16= µ1∗, and f (µ1∗) = 0 for µ1= µ1∗, since the highest exponent is even. Both

conditions, i.e. a unique root representing the global minimum, will be sufficiently satisfied, if we are able to show that f is a convex function (or − f is concave). This behavior (or shape) implies that the intercept of horizontally shifted f (µ1− µ1∗) is equal to zero, such that the root of the latter function coincides with the origin of ordinates.

Proof

To verify convexity, I set f equal to zero and rearrange f(µ1) = 0 0 = 3w  (δ2+ 2wµ14)(1 − w)3+ 2(1 − w)4 (µY− wµ1)4 (1 − w)4  ((wµ1+ (1 − w)µ2) − µ1)2 −  (δ1− 2wµ13)(1 − w)2− 2(1 − w)3 (µY− wµ1)3 (1 − w)3 2 = 3w(1 − w)4 δ2+ 2wµ14+ 2(1 − w)µ24 (µ2− µ1)2 − (1 − w)4 δ1− 2wµ13− 2(1 − w)µ23 2 .

(33)

Further, I redefine f as f =: z1f1 − z2f2, with z1:= 3w(1 − w)4(µ2− µ1)2 > 0 , w∈ (0 , 1) z2:= (1 − w)4 > 0 , f1:= (δ2+ 2wµ14+ 2(1 − w)µ24) , f2:= δ1− 2wµ13− 2(1 − w)µ23 2 .

Since z1is a positive factor, we can verify that f1is convex. Define ˜f1(µ) := 2µ4+ δ2(being convex). By Jensen’s

inequality it follows that

f1= wδ2+ (1 − w)δ2+ 2wµ14+ 2(1 − w)µ24

=: w ˜f1(µ1) + (1 − w) ˜f1(µ2)

≥ ˜f1(wµ1+ (1 − w)µ2)

= ˜f1(µY) .

Since z2is positive, f2is convex. Define f2:= q ˜f2(µ) = ˜f2(µ)

2

with ˜f2(µ) := −2µ3+ δ1(the latter is neither

convex nor concave). We find that

f2= wδ1+ (1 − w)δ1− 2wµ13− 2(1 − w)µ23 2 =: w ˜f2(µ1) + (1 − w) ˜f2(µ2) 2 =: qw ˜f2,1+ (1 − w) ˜f2,2  ≤ wq ˜f2,1 + (1 − w)q ˜f2,2 ,

where q (or equivalently f2) is a quadratic function (being convex) evaluated at points ˜f2,1and ˜f2,2, which represent

cubic transformations of µ1and µ2, respectively.

Based on the fact that f is upward opening, we can imply that the scaled difference between convex functions f1

and f2, i.e. f = z1f1− z2f2, is also convex. This is also supported by the symmetry of ˜f1(µ) and ˜f2(µ), which are

axis-symmetric and point-symmetric at zero, respectively. This feature will be preserved under multiplication with z1, z2, w and (1 − w), such that the linear transformation µ2= (µY− wµ1)/(1 − w) should allow the conclusion that

f is axis-symmetric at µ1∗; or at least its parent function(s) at zero.

(34)

Now, I examine the intercept of f shifted by µ1∗. Using (15), I define the centered function as f(µ1− µ1∗) = 3w  δ2+ 2w(µ1− µ1∗)4(1 − w)3+ 2 µY− w(µ1− µ1∗) 4 µY− (µ1− µ1∗) 2 − δ1− 2w(µ1− µ1∗)3(1 − w)2− 2 µY− w(µ1− µ1∗) 32 =: c(w) + ˜f(µ1− µ1∗) .

A unique solution for f (µ1) exits, if the intercept c(w) of f (µ1− µ1∗), representing the terms that are independent

of µ1, is equal to zero, i.e.

c(w) = 3w δ2+ 2w(µ1∗)4(1 − w)3+ 2 µY+ wµ1∗ 4 (µY+ µ1∗)2 − δ1+ 2w(µ1∗)3(1 − w)2− 2 µY+ wµ1∗ 32 ! = 0 ,

The question that arises at this point is whether the intercept c(w), a function of w ∈ W ⊂ (0 , 1), intersects the x-axis after all. This will be proven next. As a remark, even if c(w) exhibits multiple roots, w∗, the solution to (15) will be unique in any case, since f (µ1− µ1∗) will touch the x-axis in its global minimum at zero, and so will f (µ1)

at µ1∗. Yet, multiple roots would each involve a different set of parameters µ1∗, as well as implicitly defined µ2∗, σ1∗, and σ2∗, which by construction match the first four moments.

Proof

We regard the domain of co-ordinates c(w). Taking limits w.r.t. w in a wide range of (0 , 1) leads to limw→0 c(w) = −(δ1− 2µY3)2 = −σY6γ12 < 0 , if γ16= 0 , and

limw→1 c(w) = 2(µY+ µ1∗)6 > 0 , since µ1∗ 6= µ2∗ 6= µY ,

which allows us to conclude that w.l.o.g.

∃ w∗∈ W ⊂ (0 , 1) s.t. c(w∗) = 0 ,

since c is continuous in w.

 That is, there exists a w∗ within the open subset, for which the intercept of convex f (µ1− µ1∗) is equal to zero,

such that the root µ1∗of f (µ1) can be determined for any skewed parametrization regarding the first four empirical

moments of Y . Note that for non-skewed variables the calibrated component means µ1∗and µ2∗would be equal; this is, however, not possible in this algorithm, as the implicit variances are not defined then (matrix A is singular then; see section 3).

(35)

6 The formal framework to fit four moments − An algebraic summary

This section summarizes the central results and equations from previous sections that need to be considered in a four moment fitting procedure.

The root µ1∗of the function f(µ1) := 3w  (δ2+ 2wµ14)(1 − w)3+ 2(µY− wµ1)4  (µY− µ1)2−  (δ1− 2wµ13)(1 − w)2− 2(µY− wµ1)3 2 ,

with observed parameters (calculated from empirical data)

δ1= σY3 2  µY σY 3 − γ1 ! , δ2= σY4 γ2+ 2 µY σY 2γ1−  µY σY 3!! ,

and some chosen w ∈ W ⊂ (0 , 1) such that component variances are positive, i.e.

(µ∗

1, w) | (σ12)∗(µ1∗, w ; µY, σY2, δ1) > 0 ∧ (σ22)∗(µ1∗, w ; µY, σY2, δ1) > 0

assures via implicit expressions

µ2∗= µY− wµ ∗ 1 (1 − w) , (σ12)∗= σY2+ µY2+  δ1− 2 w(µ1∗)3+ (1 − w)(µ2∗)3  3w(µ2∗− µ∗ 1) − (µ1∗)2, (σ22)∗= σY2+ µY2−  δ1− 2 w(µ1∗)3+ (1 − w)(µ2∗)3  3(1 − w)(µ2∗− µ∗ 1) − (µ2∗)2, that the PDF p(y) = w · 1 σ1∗ √ 2πe −(y−µ∗1) 2 2(σ 21)∗ + (1 − w) · 1 σ2∗ √ 2πe −(y−µ∗2) 2 2(σ 22)∗ with probability

P(y) = w · N(y ; µ1∗, σ1∗) + (1 − w) · N(y ; µ2∗, σ2∗)

(36)

6.1 Numerical solution

It is easy to see that the derivation of a closed-form solution in this framework is not trivial. For practical purposes, I will focus on a numerical solution, whereby the root µ1∗ is approximated in an iterative procedure. This can be done by the Newton-Raphson method. According to this, a closer approximation to the root is given by

µ1,n+1= µ1,n−

f(µ1,n; w)

f0(µ1,n; w)

in the (n + 1)-th iteration, where

µ1,1= µ1,0−

f(µ1,0; w)

f0 1,0; w)

is calculated from a chosen start value µ1,0, each for a fixed w as well as given empirical moments. The procedure

is repeated until convergence is reached.

To get an idea of the narrow range, the procedure is performed for all w in a wide range between zero and one in steps of one-hundredth, thereby disregarding complex solutions. This allows us to generate a family of PDFs with shapes that vary in w ∈ W ⊂ (0 , 1), yet each matching the first four empirical moments of Y . The next section takes a closer look at this.

7 Parametric choices and their implications − Graphical illustrations

We have seen that the component variances exist, and that it is possible to find a real-valued solution to (15), which incorporates (5a) to (5d), if and only if weight w is an element of the vector field given in (16). Thereby, the root µ1∗and implicit expressions for µ2∗, (σ12)∗ and (σ12)∗, evaluated at µ1∗and an arbitrarily chosen w ∈ W ⊂ (0 , 1), allows us to capture the first four empirical moments.

The choice of w has two crucial implications. First, w ∈ W guarantees that the implicit standard deviations can be evaluated. Second, each w ∈ W corresponds to a different set of implied higher moments for each fitted density. This is caused by the evolution of moments of the normal distribution given by

E[Uk] = k

i=0 k i  µk−iσi 1 + (−1) i 2  (i − 1)!! .

In particular, a marginal change of w causes f to shift (see (15)), such that we find another µ1∗and thus different values for implicit µ2∗, (σ12)∗and (σ12)∗. That is, the location and the dispersion of either normal density component changes in such a way that the marginal effects on the distribution parameters do not cancel each other out for the implied higher moments beyond m4. The latter are functions of w for fitted four-moment parameters µ1∗, µ2∗, (σ12)∗

(37)

and (σ22)∗. Each w ∈ W thus corresponds to a different PDF that matches the first four moments, yet every curve of the array has different theoretical higher moments. In this sense w is the only free parameter left, and it can be used to fit a fifth moment.

In order to obtain a family of PDFs, we require seven separate functions in Matlab: implicit mean (6) and variances (13) and (14), the moment matching function (15), nested empirical parameters δ1 and δ2, as well as the density

p(y). The functions are appended in A.1. Now, we let w move from zero to one in steps of one-hundredth. In each step, we approximate the root of f , evaluate the implicit functions at µ1∗and w, redefine the PDF and calculate the moments up to m6numerically. To distinguish graphs, a color-shift can be programmed. The loop can be found in

appendix A.2.

To illustrate the impact of the higher moments, I present three families of PDFs for each different combinations of the mean, standard deviation, skewness and excess kurtosis. In particular, I plot graphs of an almost non-skewed, leptokurtic family, an array of curves with negative skewness and no excess kurtosis, as well as a skewed, platykurtic family, corresponding to Figure 1, 2, and 3, respectively. Some general points can be made here. First, the bendiness of the mixed Gaussian distribution is able to reflect various different shapes. This is a natural consequence of the five-dimensional parameter space. Second, there appears to be a connection between the moments and the modality of a distribution, e.g. uni-modal curves were obtained for non-skewed variables with high excess kurtosis, whereas skewed variables with low excess kurtosis resulted in bi-modal densities. This demonstrates that the algorithm can be applied to various data sets in economics and finance, such as income and return distributions. Last and most notably, the high variability in either family of curves indicates that even slightest changes in the higher moments, caused by extreme observations, can have a dramatic impact on the shape of the density.

The implied evolution of higher moments is not only a reflection of the specific choice of weight w, but also the location and width of the interval W . The latter, and thus the freedom by which the two densities can be combined, is determined by the data itself, namely, the first four empirical moments. Incidentally, for leptokurtic variables I noticed that w tended to be closer to zero; for platykurtic variables w was closer to1/2. Also, very high or low values

for skewness and/or excess kurtosis resulted in a narrower range for w; occasionally, the width of the interval was less than1/100. Thus, the choice of w in an a priori wider subset should naturally be accompanied by a wider set of

possible density shapes, even though the first four moments are properly represented.

To limit the ambiguity arising from parametric choices, it is necessary to align w with the data, i.e. a fifth moment needs to be captured. This will be the goal of the next two sections. Section 8 analyzes possible constraints and the behavior of the implied higher moments when w changes. Further, I outline how the algorithm can be modified in order to find solutions to both µ1∗and w∗. In section 9, I calibrate probabilities to a real data set using a five-moment fitting procedure and test the results.

(38)
(39)
(40)
(41)

8 Fitting a fifth moment

The requirement to fit a fifth moment is owed to the fact that parametric choices for w entail unrealistic assumptions regarding the implied evolution of higher moments. In what follows, I analyze the capability of the algorithm to reflect more information of the data, primarily captured in the tails. The focus will be the fifth and sixth moment of Y , or in standardized terms: the hyper-skewness and hyper-flatness.

The formal definition of the four moments distribution (and its implied data generating process), with PDF

p(y) = w · 1 σ1∗ √ 2πe −(y−µ∗1) 2 2(σ 21)∗ + (1 − w) · 1 σ2∗ √ 2πe −(y−µ∗2) 2 2(σ 22)∗ ,

does not impose restrictions on higher moments. This can be easily visualized when keeping four parameters fixed, e.g. w, µ2, σ12 and σ22. Since µ1 can be any real number, any possible value for an odd moment can be obtained,

once we let µ1move from minus to plus infinity. The same holds for even moments in R+. This is a consequence of

the normal distribution, which has a support y ∈ R. Knowing that w ∈ W ⊂ (0 , 1) will not make this any different, as W is not closed. If w approaches a bound of the narrow range either σ12or σ22converges to a real number or to zero. This partial reduction will be compensated by a respective change of the remaining parameters when fitting four moments, whereby the implied higher moments change accordingly. In other words, it is possible to find µ1∗ and w∗such that the first four and a fifth moment can be described by p(y).

8.1 Marginal changes in described and implied moments

Now, I examine the behavior of the implicit functions as well as the modelled moments when w changes. Starting by the implicit mean given in (6), the first derivative w.r.t. w computes to

∂ µ2 ∂ w = (−µ1− w∂ µ∂ w1)(1 − w) + (µY− wµ1) (1 − w)2 = − w (1 − w) ∂ µ1 ∂ w + (wµ1+ (1 − w)µ2− µ1) (1 − w)2 . = − w (1 − w) ∂ µ1 ∂ w + (µ2− µ1) (1 − w) . (19)

The empirical mean as modelled in (5a) is invariant in w. This can be shown by applying the total derivative: ∂ µY(w, µ1, µ2) ∂ w = ∂ µY ∂ w ∂ w ∂ w+ ∂ µY ∂ µ1 ∂ µ1 ∂ w + ∂ µY ∂ µ2 ∂ µ2 ∂ w = (µ1− µ2) · 1 + w · ∂ µ1 ∂ w + (1 − w) ·  − w (1 − w) ∂ µ1 ∂ w + (µ2− µ1) (1 − w)  = 0 .

(42)

Regarding the component variances, I define τ :=  δ1− 2 w(µ1)3+ (1 − w)(µ2)3  3w(µ2− µ1) ,

such that their implicit expressions, given in (13) and (14), can be written as

σ12= σY2+ µY2+ τ − (µ1)2, and

σ22= σY2+ µY2− w

(1 − w)τ − (µ2)

2.

Taking derivatives w.r.t. w for the first component variance leads to ∂ σ12 ∂ w = ∂ τ ∂ w− 2µ1 ∂ µ1 ∂ w =  − 2 (µ1)3− (µ2)3+ 3w(µ1)2 ∂ µ∂ w1+ 3(1 − w)(µ2)2 ∂ µ∂ w2  3w(µ2− µ1) (3w(µ2− µ1))2 −  δ1− 2 w(µ1)3+ (1 − w)(µ2)3   3(µ2− µ1) + 3w  ∂ µ1 ∂ w + ∂ µ2 ∂ w  (3w(µ2− µ1))2 − 2µ1 ∂ µ1 ∂ w =  − 2 (µ1)3− (µ2)3+ 3w(µ1)2 ∂ µ∂ w1+ 3(1 − w)(µ2)2  − w (1−w) ∂ µ1 ∂ w + (µ2−µ1) (1−w)  (3w(µ2− µ1)) − τ ·  3(µ2− µ1) + 3w  − w (1−w) ∂ µ1 ∂ w + (µ2−µ1) (1−w) − ∂ µ1 ∂ w  (3w(µ2− µ1)) − 2µ1 ∂ µ1 ∂ w =  − 2 (µ1)3− (µ2)3+ 3w (µ1)2− (µ2)2 ∂ µ1 ∂ w + 3(µ2) 2 2− µ1)  (3w(µ2− µ1)) − τ ·  3(µ2− µ1) + 3w  2−µ1) (1−w) − 1 (1−w) ∂ µ1 ∂ w  (3w(µ2− µ1)) − 2µ1 ∂ µ1 ∂ w =  − 2 (µ1)3− (µ2)3+ 3(µ2)2(µ2− µ1)  (3w(µ2− µ1)) − τ ·  3(µ2− µ1) + 3w  2−µ1) (1−w)  (3w(µ2− µ1)) −6w (µ1) 2− (µ 2)2 ∂ µ1 ∂ w (3w(µ2− µ1)) + τ · 3w (1−w) ∂ µ1 ∂ w (3w(µ2− µ1)) − 2µ1 ∂ µ1 ∂ w =  − 2 (µ1)3− (µ2)3+ 3(µ2)2(µ2− µ1)  (3w(µ2− µ1)) − τ ·  1 +(1−w)w  w + 2(µ1+ µ2) ∂ µ1 ∂ w + τ · 1 (1 − w)(µ2− µ1) ∂ µ1 ∂ w − 2µ1 ∂ µ1 ∂ w ,

Referenties

GERELATEERDE DOCUMENTEN

Experiments show that in different cases, with different matching score distributions, the hybrid fu- sion method is able to adapt itself for improved performance over the two levels

Als er verdenkingen zijn op andere oorzaken kunnen er ook nog monsters door het Centrum voor Schelpdieronderzoek (IMARES) worden genomen.. De monsters voor

Figuur 3 Schematische voorstelling van het model INITIATOR voor het berekenen van de concentraties in grond- en oppervlaktewater en ammoniakemissie. Figuur 4 Nitraat concentratie

Een periodieke torsiebelasting van het cilindrisch oppervlak van een rechthoekige koker (gemodificeerde Vlasov-theorie) Citation for published version (APA):..

Muslims are less frequent users of contraception and the report reiterates what researchers and activists have known for a long time: there exists a longstanding suspicion of

The seabird monitoring program executed by the Research Institute for Nature and Forest (INBO) is designed to determine local changes in seabird densities following the construction

The present text seems strongly to indicate the territorial restoration of the nation (cf. It will be greatly enlarged and permanently settled. However, we must

Analyzing the technology and fixed sales structure transformation models and their applications involving impact analysis and multipliers of factor inputs or environmental