Parameterizing non-normality : the five moments distribution

(1)

Parameterizing Non-Normality

− The Five Moments Distribution −

Steve Robert Schmidt University of Amsterdam

Abstract

I present a formal framework to parameterize a probability function that matches five finite moments of a data set. The calibrated model is closer to what is being called “well-specified”.

Supervisor: Prof. Dr. Rob Kaas Second reader: Dr. Sami Umut Can

(2)

Contents

1 Introduction . . . 1

2 Ansatz . . . 2

3 Implicit functions . . . 5

4 Case analysis . . . 13

4.1 Positivity of the implicit non-central second moments . . . 13

4.2 Positivity of the implicit variances . . . 15

4.3 Implicit inequalities . . . 17

4.4 A narrow range . . . 20

4.5 The solution set . . . 28

5 Uniqueness of the solution . . . 29

6 The formal framework to fit four moments − An algebraic summary . . . 33

6.1 Numerical solution . . . 34

7 Parametric choices and their implications − Graphical illustrations . . . 34

8 Fitting a fifth moment . . . 39

8.1 Marginal changes in described and implied moments . . . 39

8.2 Numerical implementation . . . 46

9 An actuarial application . . . 46

9.1 Empirical and theoretical risk measures . . . 48

9.2 Model validation . . . 49

10 Conclusion . . . 56

A Appendix - Matlab codes . . . 58

A.1 Required implicit and nested functions . . . 58

A.2 Loop for plotting a family of PDFs . . . 60

A.3 Two variable solver . . . 63

(3)

1 Introduction

Modern statistics is in large parts grounded on the far-reaching legacy of Carl Friedrich Gauss (1777-1855). His popularity as a young mathematician and astronomer arises, among others, from his works on the calculation of planetary orbits. These derived its origin from the discovery of Ceres, a supposedly new planet (actually an asteroid, which is located between Mars and Jupiter), at New Year’s day in 1801 by the Italian astronomer Giuseppe Piazzi. Being observable for few weeks, Ceres eventually disappeared behind the sun. Gauss, and his European colleagues, aimed at calculating its reemergence in the sky. In late 1801, he was able to predict where its position would be. Gauss finished the manuscript, explaining his methodology, in 1806 and published it in 1809 under the title Theoria motus corporum coelestium in sectionibus conicis solem ambientium(Theory of the motion of the heavenly bodies about the sun in conic sections). Therein, he had employed a probabilistic framework to minimize measurement errors, i.e. the method of least squares (LS) as well as the maximum likelihood estimation (MLE), and introduced the normal distribution. It is disputable whether LS was first known to Adrien-Marie Legendre in 1805. Yet, these concepts have endured and been refined to draw statistical inference.

The density shape of the normal distribution is completely characterized by two parameters: the mean and variance of the sample. The so-called “bell curve” assumes symmetry of measurements (or errors) around their most likely value, i.e. the average, whereby large deviations are less likely than small deviations. In contrast to the error curve proposed by Pierre-Simon Laplace in 1774, the Gaussian distribution is continuously differentiable and does not exhibit excess kurtosis. The latter is, besides symmetry, a somewhat unfavorable feature of the normal distribution, as we do observe asymmetries and extremes in reality. Subbotin (1923) united both distributions and proposed the generalized error distribution (GED); also called exponential power distribution. The probability density function (PDF) is given by

h(x) = β

2σ · Γ(1/β )· e

−|x−µ_σ |β _,

where β is an additional shape parameter. Particular values for β include the set of Laplace distributions (β = 1) as well as the normal distributions (β = 2). That way, the GED allows to model heavy tails (positive excess kurtosis) in a more flexible way. The parameters are calculated using MLE, such that the resulting density aligns the data as close as possible. This parameterization, however, does not exactly reflect the fourth moment, and it also assumes symmetry, i.e. all central odd moments are jointly equal to zero. Extensions and variations regarding the skewness and kurtosis have been suggested since the beginning of the 20th century. New probabilities have been constructed, for which higher moments are often not defined or tend to infinity, as these evolve progressively for leptokurtic variables. A prominent example is the class of α-stable distributions, specified by Paul Lévy in 1925.

The more recent literature promotes further generalizations of the Gaussian and Laplace laws of probability. Often, these curves are low-parameterized, and involve rather complex evolutions for the higher moments, especially on account of the Gamma function, Γ(x). Even though this might be justifiable, there is no reason to believe that the

(4)

evolution of moments, dictated by a handful of parameters under a certain probability function, indeed comes close to that of the real data. The very same can be said for my model (in a higher dimension).

To be more specific, I propose a probability model that is composed of two Gaussian distributions, where parameters are calibrated in such a way that five finite moments can be captured. Starting at the beginning, I will first explain my formal approach (“Ansatz”), and then I give an overview of how the remainder of this thesis is structured.

2 Ansatz

The k-th non-central moment, mk, of a (continuous) variable Y with y ∈ R, under a (presumed) theoretical

distribu-tion P(y), is given by

mk:= Z +∞ −∞ ykp(y) dy =: EhYk| Pi , (1) where m₀ = 1 , since P(y) := Z y −∞p(z) dz ,

defines a probability function for Y that monotonically increases from zero to one for y ∈ (−∞ , +∞) .

The empirical mean and variance are defined by µY:= m1and σY2:= m2− m21, respectively. For k ≥ 3, I define the

k-th central moment of Y under P(y) by

µk:= Z +∞ −∞ (y − m1) k p(y) dy = Eh(Y − m1)k| P i = E " k

∑

i=0 k i Yi(−m1)k−i P # = k

∑

i=0 k i (−m1)k−iE h Yi P i (1) = k

∑

i=0 k i (−m1)k−imi = k

∑

i=0 k i mk−i(−m1)i, k≥ 3 , (2)

(5)

Let the PDF of Y , i.e. p(y), be composed of two normal density functions, π1(y) and π2(y), with parameters µ1,

µ2, σ1and σ2, that are mixed by a weight w ∈ (0 , 1) such that

p(y) = wπ1(y) + (1 − w)π2(y)

= w · 1 σ1 √ 2πe −(y−µ1) 2 2σ 2 1 + (1 − w) · 1 σ2 √ 2πe −(y−µ2) 2 2σ 2 2 (3)

satisfies Kolmogorov’s axioms of probability. I then claim that there exists a real-valued solution vector within the five-dimensional parameter space that matches five empirical moments of Y via p y ; w∗, µ₁∗, µ₂∗, (σ2

1)∗, (σ22)∗.

The existence of such a solution and its possible restrictions will be analyzed in what follows.

A Gaussian random variable U ∼ N(µ, σ ) is formally represented by U= µ + σV , where V ∼ N(0, 1)

follows a standard normal distribution. Applying the binomial theorem, the higher non-central moments of U can be calculated by E[Uk] = k

∑

i=0 k i µk−iσiE[Vi] .

Thereby, the (central) moments of the incorporated standard normal random variable V are determined by a deter-ministic expression for i ∈ N0, and evolve as follows:

E[Vi] =          (i − 1)!! = (i − 1) · (i − 3) . . . 5 · 3 · 1 if i even (−1)!! = 1 if i = 0 0 if i odd = 1 + (−1) i 2 (i − 1)!! , such that E[Uk] = k

∑

i=0 k i µk−iσi 1 + (−1) i 2 (i − 1)!! , (4)

where (i − 1)!! denotes the double factorial. The first six non-central moments of U thus compute to E[U ] = µ E[U2] = µ2+ σ2 E[U3] = µ3+ 3µσ2 E[U4] = µ4+ 6µ2σ2+ 3σ4 E[U5] = µ5+ 10µ3σ2+ 15µσ4 E[U6] = µ6+ 15µ4σ2+ 45µ2σ4+ 15σ6.

(6)

Using (1) and (3), the empirical (or observed) non-central moments of Y , i.e. mk, can be modelled by a probability

function that consists of two component normal distributions, defined by Π1and Π2, such that

mk= Z +∞ −∞ ykp(y) dy = Z +∞ −∞ yk wπ1(y) + (1 − w)π2(y) dy = w Z +∞ −∞ ykπ1(y) dy + (1 − w) Z +∞ −∞ ykπ2(y) dy = wEhyk| Π1 i + (1 − w)Ehyk| Π2 i .

Under the evolution of moments of normal random variables in (4), the first six non-central moments of Y can thus be expressed by the following non-linear system of equations:

m0 = w + (1 − w) m₁ = w(µ1) + (1 − w)(µ2) (5a) m2 = w(µ12+ σ12) + (1 − w)(µ22+ σ22) (5b) m3 = w(µ13+ 3µ1σ12) + (1 − w)(µ23+ 3µ2σ22) (5c) m₄ = w(µ₁4+ 6µ₁2σ₁2+ 3σ₁4) + (1 − w)(µ₂4+ 6µ₂2σ₂2+ 3σ₂4) (5d) m5 = w(µ15+ 10µ13σ12+ 15µ1σ₁4) + (1 − w)(µ₂5+ 10µ₂3σ₂2+ 15µ2σ₂4) (5e) m6 = w(µ16+ 15µ14σ12+ 45µ12σ14+ 15σ16) + (1 − w)(µ26+ 15µ24σ22+ 45µ22σ24+ 15σ26) . (5f)

To solve this system, I will first express µ2, σ12 and σ22as implicit functions of µ1for some fixed w ∈ (0 , 1) and

given empirical moments, mk, in the subsequent section. In section 4, a case analysis with respect to the component

variances is performed. The focus will be the verification of a solution set w.r.t. w and µ1for which both variances

exist. In section 5, I prove that it is possible to find a unique solution that exactly matches the first four empirical moments of Y based on previously elaborated thoughts. Section 6 provides a summary of equations that need to be considered in an algorithm to fit these moments. The choice of w in a four moment fitting procedure not only has vital implications for the existence of component variances and corresponding densities π1(y), π2(y), and p(y), but

also for the magnitude of implied higher moments (beyond m4), which is dictated by the evolution of moments of

the fitted Gaussian distribution parameters, discussed in section 7. Families of PDFs will illustrate this graphically. As will become apparent, the set-up will leave us with one free parameter, which can be used to ultimately capture a fifth moment, examined in section 8. Subsequently, I will apply several five moment curve fitting procedures to a real data set in section 9, including theoretical in comparison to empirical risk measures and test results for the goodness of fit. Section 10 concludes the thesis.

(7)

3 Implicit functions

In the following, I derive expressions for µ2, σ12and σ22in terms of µ1for some fixed w ∈ (0 , 1) using the system

of equations (5a) to (5f). The goal is to reduce the dimension of this non-linear problem as far as possible.

From (5a) we immediately find an implicit function for µ2, i.e.

m1= w(µ1) + (1 − w)(µ2)

⇔ µ₂∗= m1− wµ1

(1 − w) . (6)

Regarding the derivation of implicit expressions for σ₁2and σ₂2, I will present two approaches. The first will only have a motivational character and will be disregarded for reasons stated below.

By (5b) we have m2= w(µ12+ σ12) + (1 − w)(µ22+ σ22) ⇔ σ2 2 = m2− w(µ12+ σ12) (1 − w) − µ 2 2 . (7)

Consider first (5b) and (5d) to derive an implicit function for σ₁2:

m4− 3m22= w(µ14+ 6µ12σ12+ 3σ14) + (1 − w)(µ24+ 6µ22σ22+ 3σ24) − 3w2(µ₁2+ σ₁2)2− 6w(1 − w)(µ₁2+ σ₁2)(µ₂2+ σ₂2) − 3(1 − w)2(µ₂2+ σ₂2)2 = w(3µ₁4+ 6µ₁2σ₁2+ 3σ₁4) − 2wµ₁4+ (1 − w)(3µ₂4+ 6µ₂2σ₂2+ 3σ₂4) − 2(1 − w)µ₂4 − 3w2(µ₁2+ σ₁2)2− 6w(1 − w)(µ₁2+ σ₁2)(µ₂2+ σ₂2) − 3(1 − w)2(µ₂2+ σ₂2)2 = 3(w − w2)(µ₁2+ σ₁2)2− 6w(1 − w)(µ₁2+ σ₁2)(µ₂2+ σ₂2) + 3 (1 − w) − (1 − w)2 (µ₂2+ σ₂2)2 − 2wµ₁4− 2(1 − w)µ₂4 = 3(w − w2) (µ₁2+ σ₁2) − (µ₂2+ σ₂2)2− 2wµ₁4− 2(1 − w)µ₂4 (7) = 3(w − w2) (µ₁2+ σ₁2) − µ₂2+m2− w(µ 2 1+ σ12) (1 − w) − µ 2 2 2 − 2wµ4 1− 2(1 − w)µ24 = 3(w − w2) (µ 2 1+ σ12) − m2 (1 − w) 2 − 2wµ4 1− 2(1 − w)µ24 = 3 w 1 − w µ 2 1+ σ12− m2 2 − 2wµ₁4− 2(1 − w)µ₂4 ⇔ σ2 1 = m2± r (1 − w) 3w m4− 3m22+ 2wµ14+ 2(1 − w)µ24 − µ2 1 , (8)

(8)

for which I define τ1:= r (1 − w) 3w m4− 3m2₂+ 2wµ₁4+ 2(1 − w)µ₂4 such that σ₁2= (m2± τ1) − µ12, or σ₁2= (m2)1− µ12,

respectively. Using (7) and (8), it follows that

σ₂2= m₂− w µ₁2+ m2± r (1−w) 3w m₄− 3m2 2+ 2wµ14+ 2(1 − w)µ24 − µ2 1 (1 − w) − µ 2 2 = m2∓ w 1 − w r (1 − w) 3w m₄− 3m2 2+ 2wµ14+ 2(1 − w)µ24 − µ₂2 = m2∓ r w 3(1 − w) m₄− 3m2 2+ 2wµ14+ 2(1 − w)µ24 − µ₂2, (9)

for which I define

τ2:= r w 3(1 − w) m₄− 3m2 2+ 2wµ14+ 2(1 − w)µ24 such that σ₂2= (m2∓ τ2) − µ22, or σ₂2= (m2)2− µ22.

As can be seen from the implicit variance expressions in (8) and (9), the non-negative square roots τ1and τ2 shift

the non-central second moment of Y , i.e. m2, either in a positive or a negative direction resulting in non-central

second moments of the two component normal distributions of (m2)1:= m2± τ1and (m2)2:= m2∓ τ2. For both

variances to exist simultaneously, either τ1is unrestrictedly positive and τ2 has an upper bound of m2− µ22, or τ2

is unrestrictedly positive and τ1has an upper bound of m2− µ₁2. The implications regarding the locations and the

degree of the dispersions are, however, entirely different in both cases; we either have

(m2)2 < (m2)Y < (m2)1, or

(9)

For instance, let us assume that w ∈ (0 , 1) and |µ1| < |µ2|. Then it follows for the first solution that

(m2)2− (µ2)2 < (m2)Y− (wµ1+ (1 − w)µ2)2 < (m2)1− (µ1)2

⇔ σ₂2 < σ_Y2 < σ₁2,

whereas the second solution implies (under the same assumption) that

(m2)1− (µ2)2 < (m2)Y− (wµ1+ (1 − w)µ2)2 < (m2)2− (µ1)2

⇔ σ2

1+ (µ1)2− (µ2)2 < σY2 < σ22+ (µ2)2− (µ1)2,

which leads to

σ₁2− σ₂2 < 2(µ2)2− 2(µ1)2.

In the first case, the first density component, with a lower mean in absolute terms, also exhibits a higher dispersion in comparison to the second component. In the second case, however, it is still possible for the first component to simultaneously have a lower variance, i.e. σ₁2< σ₂2(see lhs), since the right-hand side is positive by construction (|µ1| < |µ2|). Apparently, each solution pair for the implicit variances is subject to a restriction w.r.t. the skewness

of the mixed Gaussian distribution that is to be parameterized. In this setting, the first solution likely corresponds to a left-skewed variable (higher lower-quantile dispersion), whereas the second solution would be applicable to right-skewed variables (higher upper-quantile dispersion).

The existence of two implicit solutions for either component variance and the limited capability to unrestrainedly capture the skewness, for each solution pair on its own, lead to half-blind computations, if the algorithm bases only on one solution pair, i.e. if either σ₁2= (m2+τ1)− µ12and σ22= (m2−τ2)− µ22, or alternatively σ12= (m2−τ1)− µ12

and σ₂2= (m2+ τ2) − µ₂2was chosen to model the empirical characteristics of Y . On top of that, it unnecessarily

complicates the case analysis.

For these reasons, I will henceforth consider equation (5b) and (5c) to derive the implicit functions for σ₁2and σ₂2 from a pair of equations that is linear in the component variances to rule out the existence of multiple solutions for either of them.

Writing (5b) and (5c) in matrix notation,   m2 m3   =   wµ₁2+ (1 − w)µ2 2 wµ₁3+ (1 − w)µ₂3  +   w (1 − w) 3wµ1 3(1 − w)µ2     σ₁2 σ₂2  

for which I define on both sides of the equation

m := a + Aσ2,

(10)

such that the solution to this linear system of equations is given by σ2 = A−1b = 1 det(A)adj(A) b ⇔   (σ₁2)∗ (σ2 2)∗   = 1 a11a22− a12a21   a22 −a12 −a₂₁ a11     b1 b2   ⇔   (σ₁2)∗ (σ₂2)∗   = 1 3w(1 − w)(µ2− µ1)   3(1 − w)µ2 −(1 − w) −3wµ1 w     m2− (wµ₁2+ (1 − w)µ₂2) m3− (wµ₁3+ (1 − w)µ₂3)   ,

from which we obtain the following expressions:

(σ₁2)∗ = (3(1 − w)µ2) m₂− (wµ2 1+ (1 − w)µ22) − (1 − w)m₃− (wµ3 1+ (1 − w)µ23) 3w(1 − w)(µ2− µ1) = 3µ2 m₂− (wµ2 1+ (1 − w)µ22) −m₃− (wµ3 1+ (1 − w)µ23) 3w(µ2− µ1) = 3µ2 m2− (1 − w)µ₂2 −m3− (−2wµ₁3+ (1 − w)µ₂3) − 3wµ2µ₁2+ 3wµ₁3 3w(µ2− µ1) = 3µ2 m2− (1 − w)µ22 −m3− ((1 − w)µ₂3− 2wµ₁3) − 3w(µ2− µ1)µ12 3w(µ2− µ1) = 3m2µ2− m3− 2 wµ₁3+ (1 − w)µ₂3 3w(µ2− µ1) − µ₁2, (10) and (σ₂2)∗ = (−3wµ1) m₂− (wµ2 1+ (1 − w)µ22) + wm₃− (wµ3 1+ (1 − w)µ23) 3w(1 − w)(µ2− µ1) = m₃− (wµ3 1+ (1 − w)µ23) − 3µ1 m₂− (wµ2 1+ (1 − w)µ22) 3(1 − w)(µ2− µ1) = m3− (wµ₁3− 2(1 − w)µ₂3) − 3µ1 m2− wµ₁2 − 3(1 − w)µ3 2+ 3(1 − w)µ1µ22 3(1 − w)(µ2− µ1) = m3− (wµ₁3− 2(1 − w)µ₂3) − 3µ1 m2− wµ₁2 − 3(1 − w)(µ2− µ1)µ₂2 3(1 − w)(µ2− µ1) = m3− 3m2µ1+ 2 wµ₁3+ (1 − w)µ₂3 3(1 − w)(µ2− µ1) − µ₂2. (11)

(11)

It is easy to show that the implicit variances given under (10) and (11) match the variance of Y (by construction): σ_Y2:= m2− m21 = w(µ₁2+ σ₁2) + (1 − w)(µ₂2+ σ₂2) − m2₁ = w   3m2µ2− m3− 2 wµ13+ (1 − w)µ23 3w(µ2− µ1)  + (1 − w)   m₃− 3µ1m2+ 2 wµ13+ (1 − w)µ23 3(1 − w)(µ2− µ1)   − m2₁ = 3m2µ2− m3− 2 wµ13+ (1 − w)µ23 +m3− 3µ1m2+ 2 wµ13+ (1 − w)µ23 3(µ2− µ1) − m2 1 = m2 − m21.

However, for the component variances to exist two conditions have to be satisfied. First, A must be a non-singular matrix, which requires that

w6= {0, 1} ∧ µ16= µ2.

Second, the variances of the two normal density components need to be positive, i.e.

(σ₁2)∗ = 3m2µ2− m3− 2 wµ13+ (1 − w)µ23 3w(µ2− µ1) − µ₁2 = (m2)1− µ12 > 0 , (σ₂2)∗ = m₃− 3m2µ1+ 2 wµ13+ (1 − w)µ23 3(1 − w)(µ2− µ1) − µ2 2 = (m2)2− µ22 > 0 .

That is, the factor pairs of the non-central second moments, (m2)1and (m2)2, must jointly exhibit a positive sign,

and exceed the individual component means. This is influenced by the sign of the difference in means, i.e. µ2− µ1,

as well as the magnitude of the empirical moments up to m3, and leaves us with two distinct cases, assuming that

w∈ (0 , 1). This will be examined in the next section.

With respect to the impact of the third moment on the implicit variances, we find the following (static) relations: ∂ (σ₁2)∗ ∂ m3 = − 1 3w(µ2− µ1) , ∂ (σ₂2)∗ ∂ m3 = 1 3(1 − w)(µ2− µ1) .

Let w ∈ (0, 1), then a gradual increase of m3 leads to a lower dispersion of the first normal density component if

µ1< µ2, as well as a respective increase of the variance of the second component. The opposite is true if µ1> µ2.

This demonstrates that this unique solution for the variances as functions of µ1(after replacing µ2for its implicit

expression using (6)) is able to capture both higher lower-quantile and upper-quantile dispersion, being universally applicable for left- and right-skewed variables, respectively. In other words, it gives us an idea of how the location of the means and the magnitude of component variances accommodate the empirical skewness.

(12)

The derived implicit expressions for µ2, σ12and σ22, given in (6), (10), and (11), will now be inserted in the fourth

moment equation (5d). In doing so, the resulting function (of µ1) will reflect the empirical moments of Y up to m4.

m4 = w(µ14+ 6µ12σ12+ 3σ14) + (1 − w)(µ24+ 6µ22σ22+ 3σ24) = w(3µ₁4+ 6µ₁2σ₁2+ 3σ₁4) + (1 − w)(3µ₂4+ 6µ₂2σ₂2+ 3σ₂4) − 2wµ₁4− 2(1 − w)µ₂4 = 3w(µ₁2+ σ₁2)2 + 3(1 − w)(µ₂2+ σ₂2)2− 2wµ4 1− 2(1 − w)µ24 = 3m2µ2− 2(1 − w)µ23− 2wµ13− m3 2 3w(µ2− µ1)2 + m₃+ 2wµ₁3+ 2(1 − w)µ₂3− 3µ1m2 2 3(1 − w)(µ2− µ1)2 − 2wµ₁4− 2(1 − w)µ₂4,

for which I define

c:= 2wµ₁3+ 2(1 − w)µ₂3+ m3, such that m₄= 3m2µ2− c 2 3w(µ2− µ1)2 + c− 3µ1m2 2 3(1 − w)(µ2− µ1)2 − 2wµ₁4− 2(1 − w)µ₂4, = (1 − w)9m2₂µ₂2− 6c m2µ2+ c2 + wc2− 6c m2µ1+ 9m22µ12 3w(1 − w)(µ2− µ1)2 − 2wµ4 1− 2(1 − w)µ24 = 9m 2 2 wµ12+ (1 − w)µ22 − 6c m2 wµ1+ (1 − w)µ2 + c2 3w(1 − w)(µ2− µ1)2 − 2wµ4 1− 2(1 − w)µ24 = 9m 2 2 wµ12+ (1 − w)µ22 − 6c m2m1+ c2 3w(1 − w)(µ2− µ1)2 − 2wµ₁4− 2(1 − w)µ₂4 = 9m 2 2 wµ12+ (1 − w)µ22 ± 9m22m21− 6c m2m1+ c2 3w(1 − w)(µ2− µ1)2 − 2wµ₁4− 2(1 − w)µ₂4 = 9m 2 2 wµ12+ (1 − w)µ22 − 9m22 wµ1+ (1 − w)µ2 2 + 3m2m1− c 2 3w(1 − w)(µ2− µ1)2 − 2wµ₁4− 2(1 − w)µ₂4 = 9m2₂ (w − w2)µ₁2− 2w(1 − w)µ1µ2+ (1 − w) − (1 − w)2 µ₂2 + 3m2m1− c 2 3w(1 − w)(µ2− µ1)2 − 2wµ₁4− 2(1 − w)µ₂4 = 9m2₂w(1 − w)µ₁2− 2w(1 − w)µ1µ2+ w(1 − w)µ22 3w(1 − w)(µ2− µ1)2 + 3m2m1− c 2 3w(1 − w)(µ2− µ1)2 − 2wµ4 1− 2(1 − w)µ24 = 9m 2 2w(1 − w)(µ1− µ2)2 3w(1 − w)(µ2− µ1)2 + 3m2m1− c 2 3w(1 − w)(µ2− µ1)2 − 2wµ4 1− 2(1 − w)µ24 = 3m2₂+ 3m2m1− m3− 2wµ 3 1− 2(1 − w)µ23 2 3w(1 − w) (µ2− µ1)2 − 2wµ₁4− 2(1 − w)µ₂4,

(13)

which is equivalent to 0 = 2 wµ₁4+ (1 − w)µ₂4 − 3m2m1− m3− 2 wµ13+ (1 − w)µ23 2 3w(1 − w) (µ2− µ1)2 + m4− 3m22.

Inserting (6) and further rearrangements

0 = 2(wµ₁4+ (1 − w)µ₂4) − 3m2m1− m3− 2(wµ 3 1+ (1 − w)µ23) 2 3w(1 − w) µY−wµ1 (1−w) − µ1 2 + m4− 3m 2 2 ⇔ 0 = 2(wµ₁4+ (1 − w)µ₂4) − 1 − w 3w _3m 2m1− m3− 2(wµ13+ (1 − w)µ23) 2 (µY− µ1)2 ! + m4− 3m22 ⇔ 0 = 3w m4− 3m22+ 2(wµ14+ (1 − w)µ24) (µY− µ1)2− (1 − w) 3m2m1− m3− 2(wµ13+ (1 − w)µ23) 2 ⇔ 0 = 3w m4− 3m22+ 2 wµ₁4+(µY−wµ1)4 (1−w)3 (µY− µ1)−2 − (1 − w) 3m2m1− m3− 2 wµ₁3+(µY− wµ1) 3 (1 − w)2 2 ,

lead to the function

f(µ1) = 3w (m4− 3m2₂+ 2wµ₁4)(1 − w)3+ 2(µY− wµ1)4 (µY− µ1)−2 −(3m2m1− m3− 2wµ13)(1 − w)2− 2(µY− wµ1)3 2 , (12) whose root µ₁∗ will guarantee that the first four empirical moments, i.e. mk (k = 1, . . . , 4), are exactly fitted via

incorporated implicit expressions for µ2, σ12, σ22and some chosen weight w ∈ (0 , 1).

Equations (10), (11) and (12) can be written in terms of the (standardized) central moments of Y . Applying the binomial representation of central moments given in (2) and µY = m1we find

µ3= m3− 3m2m1+ 3m1m21− m31 = m3− 3m2m1+ 2m31 ⇔ 3m2m1− m3= 2m31− µ3 = σY3 2µ3 Y− µ3 σ_Y3 = σ_Y3 2 µY σY 3 − µ3 σ_Y3 ! = σ_Y3 2 µY σY 3 − γ1 ! =: δ1,

(14)

where γ1is the skewness of Y , and µ4= m4− 4m3m1+ 6m2m21− 4m1m31+ m41 = m4− 4m3m1+ 6m2m21− 3m41 ⇔ µ4− 3σY4= m4− 4m3m1+ 6m2m21− 3m41− 3(m2− m21)2 = m4− 3m22− 4m3m1+ 12m2m21− 6m41 ⇔ µ4− 3σY4+ 4µ3µY= m4− 3m22− 4m3m1+ 12m2m21− 6m41+ 4(m3− 3m2m1+ 2m31)m1 = m4− 3m22+ 2m41 ⇔ m4− 3m22= µ4− 3σY4+ 4µ3µY− 2µY4 = σ_Y4 µ4− 3σY4+ 4µ3µY− 2µY4 σ_Y4 = σY4 µ4 σ_Y4− 3 + 2 µY σY 2µ3 σ_Y3 − µY σY 3!! = σY4 γ2+ 2 µY σY 2γ1− µY σY 3!! =: δ2,

where γ2is the excess kurtosis of Y .

Both δ1 and δ2are parametric expressions of observed empirical moments of Y : mean, standard deviation,

skew-ness, and excess kurtosis.

Inserting δ1and δ2in (10), (11) and (12) leads to

(σ₁2)∗= 3m2µ2− m3− 2 wµ₁3+ (1 − w)µ₂3 3w(µ2− µ1) − µ₁2 = 3m2µ2± 3m2w(µ1− µ2) − m3− 2 wµ₁3+ (1 − w)µ₂3 3w(µ2− µ1) − µ₁2 = 3m2 wµ1+ (1 − w)µ2 − 3m2w(µ1− µ2) − m3− 2 wµ₁3+ (1 − w)µ₂3 3w(µ2− µ1) − µ₁2 = 3m2m1− m3− 3(σY2+ µY2)w(µ1− µ2) − 2 wµ13+ (1 − w)µ23 3w(µ2− µ1) − µ₁2 = σY2+ µY2+ δ1− 2 wµ13+ (1 − w)µ23 3w(µ2− µ1) − µ2 1 , (13)

(15)

and (σ₂2)∗= m3− 3m2µ1+ 2 wµ₁3+ (1 − w)µ₂3 3(1 − w)(µ2− µ1) − µ₂2 = m₃− 3m2µ1± 3m2(1 − w)(µ1− µ2) + 2 wµ13+ (1 − w)µ23 3(1 − w)(µ2− µ1) − µ₂2 = m₃− 3m2 wµ1+ (1 − w)µ2 − 3m2(1 − w)(µ1− µ2) + 2 wµ13+ (1 − w)µ23 3(1 − w)(µ2− µ1) − µ₂2 = m₃− 3m2m1− 3(σY2+ µY2)(1 − w)(µ1− µ2) + 2 wµ13+ (1 − w)µ23 3(1 − w)(µ2− µ1) − µ2 2 = σ_Y2+ µ_Y2− δ1− 2 wµ13+ (1 − w)µ23 3(1 − w)(µ2− µ1) − µ2 2 , (14) and f(µ1) := 3w (δ2+ 2wµ14)(1 − w)3+ 2(µY− wµ1)4 (µY− µ1)2− (δ1− 2wµ13)(1 − w)2− 2(µY− wµ1)3 2 . (15) So far, I have assumed that the implicit variances exist for a fixed weight w ∈ (0 , 1). Before I present an approach to solve for µ₁∗ using the moment matching function (15), and prove the uniqueness of this solution, I will first provide the conditions w.r.t. w satisfying real-valued component variances. This is the goal of the subsequent case analysis.

4 Case analysis

In the following, I distinguish between case (i) and (ii), each corresponding to a different location of the component means. To be more specific, case (i) corresponds to µ1< µ2, and (ii) to µ1> µ2, whereby a wide range w ∈ (0 , 1)

is assumed in either case. For both cases, I derive relations that have to be satisfied to guarantee that the implicit component variances are positive. Of particular interest will be the determination of a narrower range for the weight was a subset of the initially assumed interval.

4.1 Positivity of the implicit non-central second moments (i) Consider (m2)∗1and (m2)∗2for w ∈ (0 , 1) and µ1< µ2:

(m2)∗1 = 3m2µ2− m3− 2 wµ13+ (1 − w)µ23 3w(µ2− µ1) > 0 ⇔ 3m2µ2− m3> 2 wµ13+ (1 − w)µ23 ,

(16)

(m2)∗2 = m3− 3m2µ1+ 2 wµ13+ (1 − w)µ23 3(1 − w)(µ2− µ1) > 0 ⇔ 2 wµ₁3+ (1 − w)µ₂3 > 3m2µ1− m3.

Combined we find the relation

3m2µ2− m3> 2 wµ13+ (1 − w)µ23 > 3m2µ1− m3, which leads to µ2> µ1 ⇔ (µY− wµ1) (1 − w) > µ1 ⇔ µY> µ1.

(ii) Consider (m2)∗1and (m2)∗2for w ∈ (0 , 1) and µ1> µ2:

(m2)∗1 = 3m2µ2− m3− 2 wµ₁3+ (1 − w)µ₂3 3w(µ2− µ1) > 0 ⇔ 3m2µ2− m3< 2 wµ13+ (1 − w)µ23 , (m2)∗2 = m3− 3m2µ1+ 2 wµ₁3+ (1 − w)µ₂3 3(1 − w)(µ2− µ1) > 0 ⇔ 2 wµ₁3+ (1 − w)µ₂3 < 3m2µ1− m3.

Combined we find the relation

3m2µ2− m3< 2 wµ13+ (1 − w)µ23 < 3m2µ1− m3, which leads to µ2< µ1 ⇔ (µY− wµ1) (1 − w) < µ1 ⇔ µY< µ1.

(17)

It is worth noting that the slack conditions µY > µ1 and µY < µ1 on their own do not guarantee that the second

non-central component moments, (m2)1and (m2)2, are positive for all w ∈ (0 , 1) in case (i) and (ii), respectively.

Their sign, or more specifically the range for their existence, is also influenced by the magnitude of the empirical moments of Y up to m3. Apart from that and arguing from a plain analytical point of view regarding the set of all

real-valued points (m1, m2), a positive non-central second moment does not represent a sufficient condition for a

positive variance; the opposite, however, does. This can be seen from the following implications:

σ2= m2− m21> 0 ⇒ m2> 0 , but m2> 0 6⇒ σ2= m2− m21> 0 .

The determination of a range for w under which both component variances exist thus has to be grounded on the variances themselves, yet I will make use of the slack conditions in each case.

4.2 Positivity of the implicit variances (i) Consider (σ2

1)∗and (σ22)∗as given by (13) and (14) for w ∈ (0 , 1) and µ1< µ2:

(σ₁2)∗ = σY2+ µY2+ δ1− 2 wµ13+ (1 − w)µ23 3w(µ2− µ1) − µ₁2> 0 ⇔ 3w(µ2− µ1) µ12− σY2− µY2 + 2 wµ13+ (1 − w)µ23 < δ1, (σ₂2)∗ = σY2+ µY2− δ1− 2 wµ₁3+ (1 − w)µ₂3 3(1 − w)(µ2− µ1) − µ₂2> 0 ⇔ −3(1 − w)(µ2− µ1) µ22− σY2− µY2 + 2 wµ13+ (1 − w)µ23 > δ1. Combined we find 3w(µ2− µ1) µ12− σY2− µY2 < −3(1 − w)(µ2− µ1) µ22− σY2− µY2 ⇔ w µ₁2− σY2− µY2 < −(1 − w) µ22− σY2− µY2 ⇔ wµ₁2+ (1 − w)µ₂2< σY2+ µY2.

By Jensen’s inequality it follows that

µY2 = (wµ1+ (1 − w)µ2)2 ≤ wµ12+ (1 − w)µ22< σY2+ µY2

⇔ σY2> 0 ,

(18)

(ii) Consider (σ₁2)∗and (σ₂2)∗as given by (13) and (14) for w ∈ (0 , 1) and µ1> µ2: (σ₁2)∗ = σ_Y2+ µ_Y2+ δ1− 2 wµ13+ (1 − w)µ23 3w(µ2− µ1) − µ2 1> 0 ⇔ 3w(µ2− µ1) µ12− σY2− µY2 + 2 wµ13+ (1 − w)µ23 > δ1, (σ₂2)∗ = σY2+ µY2− δ1− 2 wµ13+ (1 − w)µ23 3(1 − w)(µ2− µ1) − µ₂2> 0 ⇔ −3(1 − w)(µ2− µ1) µ22− σY2− µY2 + 2 wµ13+ (1 − w)µ23 < δ1. Combined we find 3w(µ2− µ1) µ12− σY2− µY2 > −3(1 − w)(µ2− µ1) µ22− σY2− µY2 ⇔ w µ₁2− σY2− µY2 < −(1 − w) µ22− σY2− µY2 ⇔ wµ₁2+ (1 − w)µ₂2< σY2+ µY2.

By Jensen’s inequality it follows that

µ_Y2 = (wµ1+ (1 − w)µ2)2 ≤ wµ12+ (1 − w)µ22< σY2+ µY2

⇔ σY2> 0 ,

which is true by definition.

To sum up case (i) and (ii) for the implicit variances intermediately, it was shown that the solutions (σ₁2)∗ and (σ₂2)∗are positive for some weight w ∈ (0 , 1), irrespective of the location of individual means of the two normal density components. Consequently, p(y) = wπ1(y) + (1 − w)π2(y) will be a PDF. Although we may be tempted

to conclude that the component variances are positive for all w ∈ (0 , 1), it is possible to find counterexamples relatively easily. For instance, a set of chosen parameters w, µ1and µ2satisfying (5a) for given µY, σY2and γ1may

result in implicit variances of opposite signs, or being jointly negative.

By expressing µ2, σ12and σ22 in terms of µ1 and w as demonstrated in the previous section, the original problem

in five dimensions could be reduced to a two-dimensional problem, i.e. the moment matching equation arising from (15) and another higher moment equation in its reduced from, e.g. (5e) or (5f) after incorporating the implicit functions. The remaining equation(s) will merely base on unknown µ1and w, whereby the latter is either arbitrarily

chosen (four-moment fitting) or explicitly computed (five-moment fitting). Based on these thoughts, it is (for now) sufficient to generally describe the set, for which a real-valued solution exists by

(19)

representing a vector field within the x-y-plane that contains all points (µ1, w) for which the implicit component

variances are jointly positive.

The remainder of this section deals with the derivation and the verification of a narrower range for w as a subset between zero and one. For this purpose, I intend to express w itself in terms of µ1 using the implicit expressions

for both component variances and derive relations w.r.t. w that assure their existence. As will become apparent, a further reduction of the dimension is not possible without introducing an additional parameter, thus reinforcing the validity of (16). Yet, the derived results may lead to a better understanding, or intuition, regarding the location of the bounds for w as well as the compactness of a closer solution set.

4.3 Implicit inequalities

Using (13) and (14), I derive two implicit expressions for w in terms of µ1in case (i), and subsequently summarize

the respective conditions for case (ii).

Reconsider the condition

3w(µ2− µ1) µ12− σY2− µY2 + 2 wµ13+ (1 − w)µ23 < δ1,

which has to hold for (σ₁2)∗> 0 in case (i) (see section 4.2).

Inserting (6) and some algebra

3w µY− wµ1 1 − w − µ1 µ₁2− σY2− µY2 + 2 wµ13+ (1 − w) µY− wµ1 1 − w 3! < δ1 ⇔ 3w µY− µ1 1 − w µ₁2− σY2− µY2 + 2 w(1 − w)2 µ₁3+ (µY− wµ1)3 (1 − w)2 < δ1 ⇔ 3w(1 − w) (µY− µ1) µ12− σY2− µY2 + 2 w(1 − w)2µ13+ (µY− wµ1)3 < (1 − w)2δ1,

for which I define (for reasons of space)

ε := − (µY− µ1) σY2+ −µY3+ µY2µ1+ µYµ12− µ13 ,

lead to the following relations:

3w(1 − w)ε + 2 w(1 − 2w + w2)µ₁3+ (µ_Y3− 3wµ1µY2+ 3w2µ12µY− w3µ₁3) < (1 − w)2δ1

⇔ 3w(1 − w)ε + 2 µ_Y3− 3wµ1µY2+ 3w2µ12µY+ w(1 − 2w)µ13 < (1 − w)2δ1

⇔ 3(w − w2)ε − 6wµ1µY2+ 6w2µ12µY+ 2w(1 − 2w)µ13< (1 − w)2δ1− 2µY3

(20)

I define the factor terms in the last inequality as follows 3ε − 6µ1µY2+ 2µ13 = −3 (µY− µ1) σY2− 3µY3− 3µY2µ1+ 3µYµ12− µ13 = −3 (µY− µ1) σY2− 4µY3+ (µY− µ1)3 =: ε1, and − 3ε + 6µ2 1µY− 4µ13 = 3 (µY− µ1) σY2+ 3µY3− 3µY2µ1+ 3µYµ12− µ13 = 3 (µY− µ1) σY2+ 2µY3+ (µY− µ1)3 =: ε2,

such that (σ₁2)∗> 0 holds, if the following quadratic inequality w.r.t. w is satisfied

ε2w2+ ε1w< (1 − w)2δ1− 2µY3

⇔ (ε2− δ1)w2+ (ε1+ 2δ1)w < δ1− 2µY3.

For (σ₂2)∗> 0 in case (i) it simultaneously has to hold that

−3(1 − w)(µ2− µ1) µ22− σY2− µY2 + 2 wµ13+ (1 − w)µ23 > δ1,

(see section 4.2). Inserting (6) and some rearrangements

−3(1 − w) µY− wµ1 1 − w − µ1 µY− wµ1 1 − w 2 − σ2 Y− µY2 ! + 2 wµ₁3+ (1 − w) µY− wµ1 1 − w 3! > δ1 ⇔ −3 (µY− µ1) (µY− wµ1)2− (1 − w)2(σY2+ µY2) (1 − w)2 + 2 w(1 − w) 2 µ₁3+ (µY− wµ1)3 (1 − w)2 > δ1 ⇔ −3 (µY− µ1) (µY− wµ1)2− (1 − w)2(σY2+ µY2) + 2 w(1 − w)2µ13+ (µY− wµ1)3 − (1 − w)2δ1> 0 ,

for which I define

η := (µY− µ1) (µY− wµ1)2− (1 − w)2(σY2+ µY2) , lead to −3η + 2 w(1 − 2w + w2)µ₁3+ (µY3− 3wµ1µY2+ 3w2µ12µY− w3µ₁3) − (1 − w)2δ1> 0 ⇔ −3η + 2 µ3 Y− 3wµ1µY2+ 3w2µ12µY+ w(1 − 2w)µ₁3 − (1 − w)2δ1> 0 ⇔ −3η − 6wµ1µY2+ 6w2µ12µY+ 2w(1 − 2w)µ13− (1 − w)2δ1+ 2µY3> 0 ⇔ −3η + w − 6µ1µY2+ 2µ13 + w2 6µ12µY− 4µ₁3 − (1 − w)2δ1+ 2µY3> 0 ,

(21)

which, again, is quadratic in w, where −3η = −3 (µY− µ1) (µY− wµ1)2− (1 − w)2(σY2+ µY2) = −3 (µY− µ1) (µY2− 2wµ1µY+ w2µ₁2) − (−2w + w2)(σY2+ µY2) + 3 (µY− µ1) (σY2+ µY2) = −3 (µY− µ1) 2w(σY2+ µY2− µYµ1) + w2(µ12− σY2− µY2) + 3 (µY− µ1) σY2 = −6w σY2(µY− µ1) + µY3− 2µY2µ1+ µYµ₁2 − 3w2 µYµ₁2− µ₁3− µY3+ µ1µY2− (µY− µ1) σY2 + 3 (µY− µ1) σY2 = w −6σ2 Y(µY− µ1) − 6µY3+ 12µY2µ1− 6µYµ₁2 + w2 −3µYµ₁2+ 3µ₁3+ 3µY3− 3µ1µY2+ 3 (µY− µ1) σY2 + 3 (µY− µ1) σY2.

Inserting the last expression, we see that (σ₂2)∗> 0 holds, if the following inequality w.r.t. w is satisfied −3η + w − 6µ1µY2+ 2µ13 + w2 6µ12µY− 4µ13 > (1 − w)2δ1− 2µY3 ⇔ 3 (µY− µ1) σY2+ w −6µ1µY2+ 2µ13− 6σY2(µY− µ1) − 6µY3+ 12µY2µ1− 6µYµ12 +w2 6µ₁2µY− 4µ13− 3µYµ12+ 3µ13+ 3µY3− 3µ1µY2+ 3 (µY− µ1) σY2 > (1 − w)2δ1− 2µY3 ⇔ 3 (µY− µ1) σY2+ w −6σY2(µY− µ1) − 6µY3+ 6µY2µ1− 6µYµ₁2+ 2µ₁3 +w2 3µY3− 3µ1µY2+ 3µ12µY− µ₁3+ 3 (µY− µ1) σY2 > (1 − w)2δ1− 2µY3 ⇔ 3 (µY− µ1) σY2− 2w 3σY2(µY− µ1) + 2µY3+ (µY− µ1)3 +w2 2µY3+ (µY− µ1)3+ 3 (µY− µ1) σY2 > (1 − w)2δ1− 2µY3

in which we can insert ε2, such that

⇔ ε2w2− 2ε2w+ 3 (µY− µ1) σY2> (1 − w)2δ1− 2µY3.

⇔ (ε2− δ1) w2− 2(ε2− δ1)w + 3 (µY− µ1) σY2> δ1− 2µY3.

Summarizing the positivity requirements for the component variances in case (i) it jointly has to hold that (ε2− δ1)w2+ (ε1+ 2δ1)w < δ1− 2µY3, and

(ε2− δ1) w2− 2(ε2− δ1)w + 3 (µY− µ1) σY2> δ1− 2µY3.

In case (ii), the condition µ1> µ2leads to a mere inversion of the inequality sign for w ∈ (0 , 1), such that

(ε2− δ1)w2+ (ε1+ 2δ1)w > δ1− 2µY3, and

(22)

4.4 A narrow range

Subsequently, I examine the solution sets implied by the inequalities in case (i); again, case (ii) will be summarized at last.

Let us first assume that ε2> δ1 (the inversion ε2< δ1 will be briefly discussed at the end of the case analysis for

(i)), i.e. ε2> δ1 ⇔ 3 (µY− µ1) σY2+ 2µY3+ (µY− µ1)3> (2µY3− σY3γ1) ⇔ (µY− µ1) 3σ_Y2+ (µY− µ1)2 + σ_Y3γ1> 0 ,

This statement is generally true if γ1> 0, since µY > µ1holds in case (i) (see positivity of the non-central second

moments as outlined in 4.1). Using the implicit inequalities in (i) as derived in 4.3, it follows that

w2+(ε1+ 2δ1) (ε2− δ1) w−(δ1− 2µ 3 Y) (ε2− δ1) < 0 (17) and w2− 2w +3 (µY− µ1) σ 2 Y− (δ1− 2µY3) (ε2− δ1) > 0 . (18)

Consider first (17), for which I define the coefficients

a1:= (ε1+ 2δ1) (ε2− δ1) and a2:= − (δ1− 2µY3) (ε2− δ1) , such that w2+ a1w+ a2< 0 , with roots w_1/2= −a1 2 ± r a₁ 2 2 − a₂, and domain w∈ −a1 2 − r a₁ 2 2 − a₂ , −a1 2 + r a₁ 2 2 − a₂ ! := (w_a, ¯wa) .

For the roots (or bounds) and thus for (σ₁2)∗to exist, it has to hold that a₁

2 2

− a₂> 0 .

Presuming that this is the case, we desire (in accord with the initial assumption w ∈ (0 , 1)) that both bounds are strictly positive, i.e. w1/2> 0, or more specifically wa> 0. On top of the condition for the roots to exist, this holds

(23)

if additionally a1< 0 and a2> 0 is satisfied. For a2> 0 we can rewrite ⇔ a2 = − (δ1− 2µY3) (ε2− δ1) > 0 ⇔ − δ1− 2µY3 > 0 ⇔ − σY3 2 µY σY 3 − γ1 ! − 2µ3 Y ! > 0 ⇔ σY3γ1> 0 ⇔ γ1> 0 ,

i.e. a2> 0 indeed corresponds to a positive skewness in case (i) as motivated by ε2> δ1(see above).

The condition a1< 0 can be rewritten as

a1 = (ε1+ 2δ1) (ε2− δ1) < 0 ⇔ ε1+ 2δ1< 0 ⇔ −3 (µY− µ1) σY2− 4µY3+ (µY− µ1)3+ 2 2µY3− σY3γ1 < 0 ⇔ −3 (µY− µ1) σY2+ (µY− µ1)3− 2σY3γ1< 0 ⇔ −3 2(µY− µ1) σ 2 Y+ 1 2(µY− µ1) 3 < σY3γ1,

which can be used to show that a1has a lower bound:

a1 = (ε1+ 2δ1) (ε2− δ1) > −1 ⇔ ε1+ 2δ1> −(ε2− δ1) ⇔ ε1+ ε2> −δ1 ⇔ −3 (µY− µ1) σY2− 4µY3+ (µY− µ1)3+ 3 (µY− µ1) σY2+ 2µY3+ (µY− µ1)3> − 2µY3− σY3γ1 ⇔ 2 (µY− µ1)3> σY3γ1, as −3 2(µY− µ1) σ 2 Y+ 1 2(µY− µ1) 3 < σY3γ1,

we can imply that

⇒ 3 2(µY− µ1) σ 2 Y+ 3 2(µY− µ1) 3_{> 0} ⇔ (µY− µ1) σY2+ (µY− µ1)2 > 0 ,

(24)

The condition a1> −1 implies a (non-binding) upper bound for a2of 1 4 > a₁ 2 2 > a2 > 0 . In summary, we have a₁∈ (−1 , 0) and a₂∈ 0 , a1 2 2 , such that the roots of

w2+ a1w+ a2= 0

lie between zero and one (for given µ₁∗).

The upper bound of the domain in (17) confirms this statement. It holds that ¯ wa = − a1 2 + r a₁ 2 2 − a2< 1 ⇔ r a₁ 2 2 − a2< 1 + a1 2 ⇔ a1 2 2 − a₂< 1 + a1+ a₁ 2 2 ⇔ 0 < 1 + a1+ a2 ⇔ 0 < 1 +(ε1+ 2δ1) (ε2− δ1) −δ1− 2µ 3 Y (ε2− δ1) ⇔ 0 < (ε2− δ1) + (ε1+ 2δ1) − (δ1− 2µY3) ⇔ 0 < ε2+ ε1+ 2µY3 = 2(µY− µ1)3,

which is true by µY > µ1in case (i). For the lower bound in (17) we find that

w_a = −a1 2 − r a₁ 2 2 − a₂> 0 ⇔ −a1 2 > r a₁ 2 2 − a₂ ⇔ a2> 0 ,

which has been initially assumed. We can thus conclude that a narrower domain of the quadratic inequality (17) is given by

w∈ (w_a, ¯wa) ⊂ (0 , 1) ,

being a cut set of the original interval.

It has to be emphasized that arbitrary parametric choices for w in a wide range of (0 , 1) will not always guarantee that (σ₁2)∗exists. This will only be true, if the chosen weight w lies in its narrow range, i.e. between waand ¯wa. To

(25)

Using (18), I define w2− 2w + b > 0 , with b := 3 (µY− µ1) σ 2 Y (ε2− δ1) −(δ1− 2µ 3 Y) (ε2− δ1) =: a3+ a2.

It can be shown that the intercept b lies between zero and one:

b = 3 (µY− µ1) σ 2 Y− (δ1− 2µY3) (ε2− δ1) > 0 ⇔ 3 (µY− µ1) σY2− 2µY3− σY3γ1− 2µY3 > 0 ⇔ 3 (µY− µ1) + σYγ1> 0 , b = 3 (µY− µ1) σ 2 Y− (δ1− 2µY3) (ε2− δ1) < 1 ⇔ 3 (µY− µ1) σY2− δ1+ 2µY3< (ε2− δ1) ⇔ 3 (µY− µ1) σY2+ 2µY3< ε2 ⇔ 3 (µY− µ1) σY2+ 2µY3< 3 (µY− µ1) σY2+ 2µY3+ (µY− µ1)3 ⇔ 0 < (µY− µ1)3 .

Both statements are true, since µY> µ1and γ1> 0 hold in case (i) as shown above.

The (non-negative) roots of the quadratic equation arising from (18) are given by

w_1,2= 1 ±√1 − b with b∈ (0 , 1) ,

and the domain of this inequality computes to

w∈ (1 −/ √1 − b , 1 +√1 − b) := (wb, ¯wb) .

Regarding the upper bound ¯wb> 1 and the initial assumption w ∈ (0 , 1), the relevant domain of w in (18) thus

reduces to

w∈ (0 , wb) ⊂ (0 , 1) ,

again representing a cut set of the original interval.

Combining this result with the domain in (17), we find the closest possible range for w as a cut set of the domains of both inequalities, i.e.

(26)

Henceforth, I compare the location of the bounds. Since ¯wb> 1, it remains to be shown for the lower bound in (17)

that

w_a< wb

and for the upper bound in (17) either ¯

wa< wb, or w¯a> wb

is true, such that the narrow range always exists.

Proof We have b = 3 (µY− µ1) σ 2 Y (ε2− δ1) −(δ1− 2µ 3 Y) (ε2− δ1) = a3+ a2, with a₃:= 3 (µY− µ1) σ 2 Y (ε2− δ1) > 0 in (i) , such that 1 − b = 1 − a3− a2 = 1 − 3 (µY− µ1) σY2− (δ1− 2µY3) (ε2− δ1) = (ε2− δ1) − 3 (µY− µ1) σ 2 Y+ (δ1− 2µY3) (ε2− δ1) = (µY− µ1) 3 (ε2− δ1) , so we can express a1as a₁ = (ε1+ 2δ1) (ε2− δ1) = −3 (µY− µ1) σ 2 Y− 4µY3+ (µY− µ1)3+ 2δ1 (ε2− δ1) = −3 (µY− µ1) σ 2 Y (ε2− δ1) +(µY− µ1) 3 (ε2− δ1) +2(δ1− 2µ 3 Y) (ε2− δ1) = −a3+ 1 − a3− a2− 2a2 = 1 − 2a3− 3a2, or equivalently a3 = 1 − a1− 3a2 2 ,

which allows us to rewrite b in terms of parameters of inequality (17):

b = 1 − a1− a2

(27)

Thus, we can verify for the lower bound in (17) that w_a = −a1 2 − r a₁ 2 2 − a₂ < 1 2− r 1 4− a2 < 2 2− r 4 4− a2 < 1 −p1 − a3− a2 = 1 − √ 1 − b = wb,

which is a true statement since a1∈ (−1 ; 0) and a3> 0.

Now we regard the point of intersection of the quadratic functions in (17) and (18) w2+ a1w+ a2= w2− 2w + 1 − a1− a2 2 ⇔ (a1+ 2)w = 1 − a1− 3a2 2 ⇔ w˜ =1 − a1− 3a2 2(a1+ 2) = 3 2 (1 − a2) (2 + a1) −1 2 , having a function value of

q( ˜w) = ( ˜w)2− 2 ˜w+ b = ( ˜w− 1)2_{+ b − 1} = 3 2 (1 − a2) (2 + a1) −3 2 2 +1 − a1− a2 2 − 1 =9 4 (1 + a1+ a2) (2 + a1) 2 −1 + a1+ a2 2 .

It is of interest to identify the sign of this value, i.e. we ask

q( ˜w) T 0 ⇔ 9 4 (1 + a1+ a2) (2 + a1) 2 T1 + a1+ a2 2 ⇔ 9(1 + a1+ a2) T 2(2 + a1)2. If a1→ 0 then 9(1 + a2) > 8 ,

holds for all a2∈ (0 , 1₄).

However, if a1→ −1 then

9a2< 2 ,

(28)

That is, depending on the magnitude of a1, the function value q( ˜w) is either positive or negative. This means that

the point of intersection of the two upward opening parabolas lies either to the left or to the right of wb, implying

that either

¯

wa< wb, or w¯a> wb.

Thus we can conclude that the closest possible range for w is given by a cut set, for which (17) and (18) are jointly satisfied, i.e.

w∈ (w_a, min{ ¯wa; wb}) ⊂ (0 , 1) ,

with bounds wa , ¯waand wb as implicit functions of µ1 only (for given empirical moments up to m3), which can

be obtained via substitution using ε1and ε2via a1, a2and a3. For reasons stated in the next subsection, an explicit

expression for this narrow range is not intended.

Let us now, still in case (i), discuss if we would have assumed that ε2< δ1. First of all, this leads to an inversion of

the inequality signs in (17) and (18) such that

w2+(ε1+ 2δ1) (ε2− δ1) w−(δ1− 2µ 3 Y) (ε2− δ1) > 0 and w2− 2w +3 (µY− µ1) σ 2 Y− (δ1− 2µY3) (ε2− δ1) < 0 , or alternatively w2+ a1w+ a2> 0 , and w2− 2w + b < 0 .

Starting by the second inequality, we find a lower bound for the intercept of

b = 3 (µY− µ1) σ 2 Y− (δ1− 2µY3) (ε2− δ1) > 1 ⇔ 3 (µY− µ1) σY2+ 2µY3< ε2 ⇔ 3 (µY− µ1) σY2+ 2µY3< 3 (µY− µ1) σY2+ 2µY3+ (µY− µ1)3 ⇔ 0 < (µY− µ1)3 ,

(29)

We immediately see that the second inequality will never be satisfied in this instance, as

w2− 2w + b = (w − 1)2_{+ (b − 1) < 0}

does not hold for b > 1 .

Consequently, it is impossible to determine a real-valued solution to the implicit functions as given above in this sub-case (ε2< δ1), since (σ22)∗will never be defined, let a alone the desired PDF.

Turning to case (ii) with w ∈ (0 , 1) and µ1> µ2, these requirements for positive variances were derived in 4.3:

(ε2− δ1)w2+ (ε1+ 2δ1)w > δ1− 2µY3, and

(ε2− δ1) w2− 2(ε2− δ1)w + 3 (µY− µ1) σY2< δ1− 2µY3.

We again consider two sub-cases: ε2> δ1and ε2< δ1. In sub-case ε2> δ1we find

w2+(ε1+ 2δ1) (ε2− δ1) w−(δ1− 2µ 3 Y) (ε2− δ1) > 0 , and w2− 2w +3 (µY− µ1) σ 2 Y− (δ1− 2µY3) (ε2− δ1) < 0 , or alternatively w2+ a1w+ a2> 0 , and w2− 2w + b < 0 .

The examination of the intercept b leads again to a contradiction. It holds that

b := 3 (µY− µ1) σ 2 Y− (δ1− 2µY3) (ε2− δ1) > 1 ⇔ 3 (µY− µ1) σY2+ 2µY3> ε2 ⇔ 3 (µY− µ1) σY2+ 2µY3> 3 (µY− µ1) σY2+ 2µY3+ (µY− µ1)3 ⇔ 0 > (µY− µ1)3 ,

which is a true statement, since µY< µ1holds in case (ii) (see above), such that

w2− 2w + b = (w − 1)2+ (b − 1) < 0

(30)

To derive bounds for weight w that do not a priori encompass violations regarding the requirements (σ₁2)∗> 0 and (σ2

2)∗> 0 it is vital to assume that in case (ii) holds

ε2< δ1 ⇔ 3 (µY− µ1) σY2+ 2µY3+ (µY− µ1)2< (2µY3− σY3γ1) ⇔ (µY− µ1) 3σY2+ (µY− µ1)2 + σ_Y3γ1< 0 ,

which is generally true if γ1< 0 .

This leads us to the same initial setting as in sub-case ε2> δ1of (i) with

w2+(ε1+ 2δ1) (ε2− δ1) w−(δ1− 2µ 3 Y) (ε2− δ1) < 0 and w2− 2w +3 (µY− µ1) σ 2 Y− (δ1− 2µY3) (ε2− δ1) > 0 .

Solving these inequalities reproduces the same calculus as already presented in case (i), and reconfirms the same narrow range for w, i.e. w ∈ (wa, min{ ¯wa; wb}), however with one important difference, namely, case (ii) reflects

all instances for which γ1< 0, compared to γ1> 0 in case (i).

In conclusion, we can safely say that the component variances always exist, if w is chosen, or calibrated, in a real subset between zero and one. This holds irrespective of the sign of the difference in component means. Thus, for all w, being elements of the narrow range, the distribution parameters µ1and implicit µ2, σ1and σ2are unrestrainedly

capable to match the empirical skewness.

4.5 The solution set

As motivated above, the narrow range w was implicitly expressed in terms of µ1for given empirical characteristics

of Y , captured in µY, σY, δ1and δ2. The width of this range is given by the difference between the upper and the

lower bound, i.e.

min{ ¯wa; wb} − wa,

and is influenced by the magnitude of the empirical moments up to m3. The bounds themselves cannot be attained,

as this would imply that either (σ₁2)∗or (σ₂2)∗is equal to zero, such that one normal density component as well as the desired PDF are not defined then.

(31)

The explicit choice of w assuring that it ranges in (w_a, min{ ¯wa; wb}) ⊂ (0 , 1) requires the introduction of an

additional parameter, which links the two implicit bounds in a functional form. There are multiple ways to achieve this. For instance, for an s ∈ (0 , 1) we can think of an weighted arithmetic average of the two bounds, such that we can define our parametric choice as

w:= g(µ1; s)

= s · wa + (1 − s) · min{ ¯wa; wb} .

An alternative could be a weighted geometric average in the form of

w:= g(µ1; s)

= (wa) s

· (min{ ¯wa; wb})1−s .

By construction, both parameterized functions, g(µ1; s), increase monotonously from the lower bound, wa, to the

upper bound, min{ ¯wa; wb}, when s gradually increases. The exact functional form, however, is unknown.

At this point, the question that arises is whether it is actually necessary to introduce an additional parameter. The answer is no. Expressing both bounds implicitly reduces the dimension: w is lost. If we were to make use of these bounds explicitly, it is inevitable to connect them using an additional parameter, such that the original solution set remains intact. Both steps are superfluous when it comes to practical implementations. Apart from that, the high functional complexity in w := g(µ1; s) considerably slows down the efficieny of the algorithm.

The purpose of the last two subsections was to verify that component variances exist for a real w in an open subset between zero and one, which I call W in what follows. By comparing the location of the implicit bounds, we have learned that this narrow range always exists. The bounds and width of the corresponding interval are determined by the empirical moments of Y . Since the dimension cannot be further reduced, the solution set is fully represented by the vector field

(µ1, w) | (σ12)∗(µ1, w ; µY, σY2, δ1) > 0 ∧ (σ22)∗(µ1, w ; µY, σY2, δ1) > 0 ,

where (σ2

1)∗and (σ22)∗are given in (13) and (14), and w ∈ W ⊂ (0 , 1) .

5 Uniqueness of the solution

In the following, I investigate the moment matching function (15), which incorporates the implicit expressions for µ2, σ1and σ2: f(µ1) = 3w (δ2+ 2wµ14)(1 − w)3+ 2(µY− wµ1)4 (µY− µ1)2 − (δ1− 2wµ13)(1 − w)2− 2(µY− wµ1)3 2 .

(32)

The goal is to verify that there exists a unique solution in the x-y-plane to the equation f (µ1; w) = 0, such that

component means and variances match the first four empirical moments of Y . We are dealing with a polynomial of the sixth degree in µ1for given µY, σY, γ1 and γ2, and some chosen w assuring the existence of the component

variances. Strictly speaking, the latter is a variable in the context of the non-linear system of equations (5a) to (5f). For the purpose of the proof, I treat it as a constant, merely satisfying w ∈ W ⊂ (0 , 1), such that (σ₁2)∗, (σ₂2)∗and not least f exist, which itself bases on the fact that the implicit variances are defined. I allow w to vary in section 7.

Looking at the term with highest exponent in f , we find a positive coefficient for µ₁6: 3w 2wµ₁4(1 − w)3+ 2w4µ₁4µ₁2− − 2wµ₁3(1 − w)2+ 2w3µ₁32> 0 ⇔ 3w2 2(1 − w)3+ 2w3µ₁6− w2µ₁6 − 2(1 − w)2+ 2w22> 0 ⇔ 3 2(1 − w)3+ 2w3 − − 2(1 − w)2+ 2w22> 0 ⇔ 3 2 − 6w + 6w2 − − 2 + 4w2> 0 ⇔ 3 − 9w + 9w2− 2 + 8w − 8w2_{> 0} ⇔ 1 − w + w2> 0 ⇔ w+ (1 − w)2> 0 ,

which is true for all w ∈ W ⊂ (0 , 1). That is, f is an upward opening function.

We will start by discussing the conditions under which f has exactly one real-valued solution. According to the fundamental theorem of algebra, a polynomial of order n has n complex roots and up to n real-valued roots. For a unique solution, f must have a “multiple” root at µ₁∗, in the sense that the other (n − 1) = 5 roots are collapsing in one single point. Since f is upward opening, a unique solution here also implies that f must have a global minimum at µ₁∗, such that f (µ1) > 0 for all µ16= µ₁∗, and f (µ₁∗) = 0 for µ1= µ₁∗, since the highest exponent is even. Both

conditions, i.e. a unique root representing the global minimum, will be sufficiently satisfied, if we are able to show that f is a convex function (or − f is concave). This behavior (or shape) implies that the intercept of horizontally shifted f (µ1− µ1∗) is equal to zero, such that the root of the latter function coincides with the origin of ordinates.

Proof

To verify convexity, I set f equal to zero and rearrange f(µ1) = 0 0 = 3w (δ2+ 2wµ14)(1 − w)3+ 2(1 − w)4 (µY− wµ1)4 (1 − w)4 ((wµ1+ (1 − w)µ2) − µ1)2 − (δ1− 2wµ13)(1 − w)2− 2(1 − w)3 (µY− wµ1)3 (1 − w)3 2 = 3w(1 − w)4 δ2+ 2wµ14+ 2(1 − w)µ24 (µ2− µ1)2 − (1 − w)4 δ1− 2wµ13− 2(1 − w)µ23 2 .

(33)

Further, I redefine f as f =: z1f1 − z2f2, with z1:= 3w(1 − w)4(µ2− µ1)2 > 0 , w∈ (0 , 1) z2:= (1 − w)4 > 0 , f1:= (δ2+ 2wµ14+ 2(1 − w)µ24) , f2:= δ1− 2wµ13− 2(1 − w)µ23 2 .

Since z1is a positive factor, we can verify that f1is convex. Define ˜f1(µ) := 2µ4+ δ2(being convex). By Jensen’s

inequality it follows that

f₁= wδ2+ (1 − w)δ2+ 2wµ14+ 2(1 − w)µ24

=: w ˜f1(µ1) + (1 − w) ˜f1(µ2)

≥ ˜f1(wµ1+ (1 − w)µ2)

= ˜f1(µY) .

Since z2is positive, f2is convex. Define f2:= q ˜f2(µ) = ˜f2(µ)

2

with ˜f2(µ) := −2µ3+ δ1(the latter is neither

convex nor concave). We find that

f2= wδ1+ (1 − w)δ1− 2wµ13− 2(1 − w)µ23 2 =: w ˜f2(µ1) + (1 − w) ˜f2(µ2) 2 =: qw ˜f2,1+ (1 − w) ˜f2,2 ≤ wq ˜f_2,1 + (1 − w)q ˜f2,2 ,

where q (or equivalently f2) is a quadratic function (being convex) evaluated at points ˜f2,1and ˜f2,2, which represent

cubic transformations of µ1and µ2, respectively.

Based on the fact that f is upward opening, we can imply that the scaled difference between convex functions f1

and f2, i.e. f = z1f1− z2f2, is also convex. This is also supported by the symmetry of ˜f1(µ) and ˜f2(µ), which are

axis-symmetric and point-symmetric at zero, respectively. This feature will be preserved under multiplication with z1, z2, w and (1 − w), such that the linear transformation µ2= (µY− wµ1)/(1 − w) should allow the conclusion that

f is axis-symmetric at µ₁∗; or at least its parent function(s) at zero.

(34)

Now, I examine the intercept of f shifted by µ₁∗. Using (15), I define the centered function as f(µ1− µ1∗) = 3w δ2+ 2w(µ1− µ1∗)4(1 − w)3+ 2 µY− w(µ1− µ1∗) 4 µY− (µ1− µ1∗) 2 − δ1− 2w(µ1− µ1∗)3(1 − w)2− 2 µY− w(µ1− µ1∗) 32 =: c(w) + ˜f(µ1− µ1∗) .

A unique solution for f (µ1) exits, if the intercept c(w) of f (µ1− µ1∗), representing the terms that are independent

of µ1, is equal to zero, i.e.

c(w) = 3w δ2+ 2w(µ1∗)4(1 − w)3+ 2 µY+ wµ₁∗ 4 (µY+ µ₁∗)2 − δ1+ 2w(µ1∗)3(1 − w)2− 2 µY+ wµ1∗ 32 ! = 0 ,

The question that arises at this point is whether the intercept c(w), a function of w ∈ W ⊂ (0 , 1), intersects the x-axis after all. This will be proven next. As a remark, even if c(w) exhibits multiple roots, w∗, the solution to (15) will be unique in any case, since f (µ1− µ1∗) will touch the x-axis in its global minimum at zero, and so will f (µ1)

at µ₁∗. Yet, multiple roots would each involve a different set of parameters µ₁∗, as well as implicitly defined µ₂∗, σ₁∗, and σ₂∗, which by construction match the first four moments.

Proof

We regard the domain of co-ordinates c(w). Taking limits w.r.t. w in a wide range of (0 , 1) leads to limw→0 c(w) = −(δ1− 2µY3)2 = −σY6γ12 < 0 , if γ16= 0 , and

limw→1 c(w) = 2(µY+ µ₁∗)6 > 0 , since µ₁∗ 6= µ₂∗ 6= µY ,

which allows us to conclude that w.l.o.g.

∃ w∗∈ W ⊂ (0 , 1) s.t. c(w∗) = 0 ,

since c is continuous in w.

That is, there exists a w∗ within the open subset, for which the intercept of convex f (µ1− µ1∗) is equal to zero,

such that the root µ₁∗of f (µ1) can be determined for any skewed parametrization regarding the first four empirical

moments of Y . Note that for non-skewed variables the calibrated component means µ₁∗and µ₂∗would be equal; this is, however, not possible in this algorithm, as the implicit variances are not defined then (matrix A is singular then; see section 3).

(35)

6 The formal framework to fit four moments − An algebraic summary

This section summarizes the central results and equations from previous sections that need to be considered in a four moment fitting procedure.

The root µ₁∗of the function f(µ1) := 3w (δ2+ 2wµ14)(1 − w)3+ 2(µY− wµ1)4 (µY− µ1)2− (δ1− 2wµ13)(1 − w)2− 2(µY− wµ1)3 2 ,

with observed parameters (calculated from empirical data)

δ1= σY3 2 µY σY 3 − γ1 ! , δ2= σY4 γ2+ 2 µY σY 2γ1− µY σY 3!! ,

and some chosen w ∈ W ⊂ (0 , 1) such that component variances are positive, i.e.

(µ∗

1, w) | (σ12)∗(µ1∗, w ; µY, σY2, δ1) > 0 ∧ (σ22)∗(µ1∗, w ; µY, σY2, δ1) > 0

assures via implicit expressions

µ₂∗= µY− wµ ∗ 1 (1 − w) , (σ₁2)∗= σY2+ µY2+ δ1− 2 w(µ1∗)3+ (1 − w)(µ2∗)3 3w(µ₂∗− µ∗ 1) − (µ₁∗)2, (σ₂2)∗= σY2+ µY2− δ1− 2 w(µ1∗)3+ (1 − w)(µ2∗)3 3(1 − w)(µ₂∗− µ∗ 1) − (µ₂∗)2, that the PDF p(y) = w · 1 σ₁∗ √ 2πe −(y−µ∗1) 2 2(σ 2₁)∗ _{+ (1 − w) ·} 1 σ₂∗ √ 2πe −(y−µ∗2) 2 2(σ 2₂)∗ with probability

P(y) = w · N(y ; µ₁∗, σ₁∗) + (1 − w) · N(y ; µ₂∗, σ₂∗)

(36)

6.1 Numerical solution

It is easy to see that the derivation of a closed-form solution in this framework is not trivial. For practical purposes, I will focus on a numerical solution, whereby the root µ₁∗ is approximated in an iterative procedure. This can be done by the Newton-Raphson method. According to this, a closer approximation to the root is given by

µ1,n+1= µ1,n−

f(µ1,n; w)

f0(µ1,n; w)

in the (n + 1)-th iteration, where

µ1,1= µ1,0−

f(µ1,0; w)

f0_(µ 1,0; w)

is calculated from a chosen start value µ1,0, each for a fixed w as well as given empirical moments. The procedure

is repeated until convergence is reached.

To get an idea of the narrow range, the procedure is performed for all w in a wide range between zero and one in steps of one-hundredth, thereby disregarding complex solutions. This allows us to generate a family of PDFs with shapes that vary in w ∈ W ⊂ (0 , 1), yet each matching the first four empirical moments of Y . The next section takes a closer look at this.

7 Parametric choices and their implications − Graphical illustrations

We have seen that the component variances exist, and that it is possible to find a real-valued solution to (15), which incorporates (5a) to (5d), if and only if weight w is an element of the vector field given in (16). Thereby, the root µ₁∗and implicit expressions for µ₂∗, (σ₁2)∗ and (σ₁2)∗, evaluated at µ₁∗and an arbitrarily chosen w ∈ W ⊂ (0 , 1), allows us to capture the first four empirical moments.

The choice of w has two crucial implications. First, w ∈ W guarantees that the implicit standard deviations can be evaluated. Second, each w ∈ W corresponds to a different set of implied higher moments for each fitted density. This is caused by the evolution of moments of the normal distribution given by

E[Uk] = k

∑

i=0 k i µk−iσi 1 + (−1) i 2 (i − 1)!! .

In particular, a marginal change of w causes f to shift (see (15)), such that we find another µ₁∗and thus different values for implicit µ₂∗, (σ₁2)∗and (σ₁2)∗. That is, the location and the dispersion of either normal density component changes in such a way that the marginal effects on the distribution parameters do not cancel each other out for the implied higher moments beyond m4. The latter are functions of w for fitted four-moment parameters µ1∗, µ2∗, (σ12)∗

(37)

and (σ₂2)∗. Each w ∈ W thus corresponds to a different PDF that matches the first four moments, yet every curve of the array has different theoretical higher moments. In this sense w is the only free parameter left, and it can be used to fit a fifth moment.

In order to obtain a family of PDFs, we require seven separate functions in Matlab: implicit mean (6) and variances (13) and (14), the moment matching function (15), nested empirical parameters δ1 and δ2, as well as the density

p(y). The functions are appended in A.1. Now, we let w move from zero to one in steps of one-hundredth. In each step, we approximate the root of f , evaluate the implicit functions at µ₁∗and w, redefine the PDF and calculate the moments up to m6numerically. To distinguish graphs, a color-shift can be programmed. The loop can be found in

appendix A.2.

To illustrate the impact of the higher moments, I present three families of PDFs for each different combinations of the mean, standard deviation, skewness and excess kurtosis. In particular, I plot graphs of an almost non-skewed, leptokurtic family, an array of curves with negative skewness and no excess kurtosis, as well as a skewed, platykurtic family, corresponding to Figure 1, 2, and 3, respectively. Some general points can be made here. First, the bendiness of the mixed Gaussian distribution is able to reflect various different shapes. This is a natural consequence of the five-dimensional parameter space. Second, there appears to be a connection between the moments and the modality of a distribution, e.g. uni-modal curves were obtained for non-skewed variables with high excess kurtosis, whereas skewed variables with low excess kurtosis resulted in bi-modal densities. This demonstrates that the algorithm can be applied to various data sets in economics and finance, such as income and return distributions. Last and most notably, the high variability in either family of curves indicates that even slightest changes in the higher moments, caused by extreme observations, can have a dramatic impact on the shape of the density.

The implied evolution of higher moments is not only a reflection of the specific choice of weight w, but also the location and width of the interval W . The latter, and thus the freedom by which the two densities can be combined, is determined by the data itself, namely, the first four empirical moments. Incidentally, for leptokurtic variables I noticed that w tended to be closer to zero; for platykurtic variables w was closer to1_/₂_{. Also, very high or low values}

for skewness and/or excess kurtosis resulted in a narrower range for w; occasionally, the width of the interval was less than1_/₁₀₀_{. Thus, the choice of w in an a priori wider subset should naturally be accompanied by a wider set of}

possible density shapes, even though the first four moments are properly represented.

To limit the ambiguity arising from parametric choices, it is necessary to align w with the data, i.e. a fifth moment needs to be captured. This will be the goal of the next two sections. Section 8 analyzes possible constraints and the behavior of the implied higher moments when w changes. Further, I outline how the algorithm can be modified in order to find solutions to both µ₁∗and w∗. In section 9, I calibrate probabilities to a real data set using a five-moment fitting procedure and test the results.

(38)

(39)

(40)

(41)

8 Fitting a fifth moment

The requirement to fit a fifth moment is owed to the fact that parametric choices for w entail unrealistic assumptions regarding the implied evolution of higher moments. In what follows, I analyze the capability of the algorithm to reflect more information of the data, primarily captured in the tails. The focus will be the fifth and sixth moment of Y , or in standardized terms: the hyper-skewness and hyper-flatness.

The formal definition of the four moments distribution (and its implied data generating process), with PDF

p(y) = w · 1 σ₁∗ √ 2πe −(y−µ∗1) 2 2(σ 2₁)∗ _{+ (1 − w) ·} 1 σ₂∗ √ 2πe −(y−µ∗2) 2 2(σ 2₂)∗ _,

does not impose restrictions on higher moments. This can be easily visualized when keeping four parameters fixed, e.g. w, µ2, σ12 and σ22. Since µ1 can be any real number, any possible value for an odd moment can be obtained,

once we let µ1move from minus to plus infinity. The same holds for even moments in R+. This is a consequence of

the normal distribution, which has a support y ∈ R. Knowing that w ∈ W ⊂ (0 , 1) will not make this any different, as W is not closed. If w approaches a bound of the narrow range either σ₁2or σ₂2converges to a real number or to zero. This partial reduction will be compensated by a respective change of the remaining parameters when fitting four moments, whereby the implied higher moments change accordingly. In other words, it is possible to find µ₁∗ and w∗such that the first four and a fifth moment can be described by p(y).

8.1 Marginal changes in described and implied moments

Now, I examine the behavior of the implicit functions as well as the modelled moments when w changes. Starting by the implicit mean given in (6), the first derivative w.r.t. w computes to

∂ µ2 ∂ w = (−µ1− w∂ µ_{∂ w}1)(1 − w) + (µY− wµ1) (1 − w)2 = − w (1 − w) ∂ µ1 ∂ w + (wµ1+ (1 − w)µ2− µ1) (1 − w)2 . = − w (1 − w) ∂ µ1 ∂ w + (µ2− µ1) (1 − w) . (19)

The empirical mean as modelled in (5a) is invariant in w. This can be shown by applying the total derivative: ∂ µY(w, µ1, µ2) ∂ w = ∂ µY ∂ w ∂ w ∂ w+ ∂ µY ∂ µ1 ∂ µ1 ∂ w + ∂ µY ∂ µ2 ∂ µ2 ∂ w = (µ1− µ2) · 1 + w · ∂ µ1 ∂ w + (1 − w) · − w (1 − w) ∂ µ1 ∂ w + (µ2− µ1) (1 − w) = 0 .

(42)

Regarding the component variances, I define τ := δ1− 2 w(µ1)3+ (1 − w)(µ2)3 3w(µ2− µ1) ,

such that their implicit expressions, given in (13) and (14), can be written as

σ₁2= σ_Y2+ µ_Y2+ τ − (µ1)2, and

σ₂2= σ_Y2+ µ_Y2− w

(1 − w)τ − (µ2)

2_.

Taking derivatives w.r.t. w for the first component variance leads to ∂ σ₁2 ∂ w = ∂ τ ∂ w− 2µ1 ∂ µ1 ∂ w = − 2 (µ1)3− (µ2)3+ 3w(µ1)2 ∂ µ_{∂ w}1+ 3(1 − w)(µ2)2 ∂ µ_{∂ w}2 3w(µ2− µ1) (3w(µ2− µ1))2 − δ1− 2 w(µ1)3+ (1 − w)(µ2)3 3(µ2− µ1) + 3w ∂ µ1 ∂ w + ∂ µ2 ∂ w (3w(µ2− µ1))2 − 2µ1 ∂ µ1 ∂ w = − 2 (µ1)3− (µ2)3+ 3w(µ1)2 ∂ µ_{∂ w}1+ 3(1 − w)(µ2)2 − w (1−w) ∂ µ1 ∂ w + (µ2−µ1) (1−w) (3w(µ2− µ1)) − τ · 3(µ2− µ1) + 3w − w (1−w) ∂ µ1 ∂ w + (µ2−µ1) (1−w) − ∂ µ1 ∂ w (3w(µ2− µ1)) − 2µ1 ∂ µ1 ∂ w = − 2 (µ1)3− (µ2)3+ 3w (µ1)2− (µ2)2 ∂ µ1 ∂ w + 3(µ2) 2_(µ 2− µ1) (3w(µ2− µ1)) − τ · 3(µ2− µ1) + 3w _(µ 2−µ1) (1−w) − 1 (1−w) ∂ µ1 ∂ w (3w(µ2− µ1)) − 2µ1 ∂ µ1 ∂ w = − 2 (µ1)3− (µ2)3+ 3(µ2)2(µ2− µ1) (3w(µ2− µ1)) − τ · 3(µ2− µ1) + 3w _(µ 2−µ1) (1−w) (3w(µ2− µ1)) −6w (µ1) 2_{− (µ} 2)2 ∂ µ1 ∂ w (3w(µ2− µ1)) + τ · 3w (1−w) ∂ µ1 ∂ w (3w(µ2− µ1)) − 2µ1 ∂ µ1 ∂ w = − 2 (µ1)3− (µ2)3+ 3(µ2)2(µ2− µ1) (3w(µ2− µ1)) − τ · 1 +_(1−w)w w + 2(µ1+ µ2) ∂ µ1 ∂ w + τ · 1 (1 − w)(µ2− µ1) ∂ µ1 ∂ w − 2µ1 ∂ µ1 ∂ w ,