Bias of the maximum likelihood estimator of the generalized Rayleigh distribution

by

Xiao Ling

B.Sc., Beijing Normal University, 2007

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of

MASTER OF ARTS

in the Department of Economics

© Xiao Ling, 2011
University of Victoria

All rights reserved. This thesis may not be reproduced in whole or in part, by photocopying or other means, without the permission of the author.


Bias of the Maximum Likelihood Estimator of the

Generalized Rayleigh Distribution

by

Xiao Ling

B.Sc., Beijing Normal University, 2007

Supervisory Committee

Dr. David E.A. Giles, Supervisor (Department of Economics)

Dr. Kenneth G. Stewart, Departmental Member (Department of Economics)


ABSTRACT

We derive analytic expressions for the biases, to O(n⁻¹), of the maximum likelihood estimators of the parameters of the generalized Rayleigh distribution family. Using these expressions to bias-correct the estimators is found to be extremely effective in terms of bias reduction, and generally results in a small reduction in relative mean squared error. In general, the analytic bias-corrected estimators are also found to be superior to the alternative of bias-correction via the bootstrap.


Contents

Supervisory Committee ii
Abstract iii
Table of Contents iv
List of Tables vi
List of Figures vii
Acknowledgements viii

1 Introduction 1
1.1 The Generalized Rayleigh Distribution 1
1.2 Estimation Issues 5

2 Theoretical Results 10
2.1 Notation and Definition 10
2.2 Motivating Examples 13
2.3 Bias of the MLE for Parameters in Generalized Rayleigh Distribution 15

3 Numerical Evaluations 20
3.1 Simulation design 21
3.2 Simulation results 29

4 Illustrative Example 38

5 Conclusions 42

References 44

A The Rayleigh Distribution 46

B R code for Bias-Adjusted Estimators of the Generalized Rayleigh Distribution 49


List of Tables

Table 3.1 Parameter setting 25
Table 3.2 Parameter setting for bootstrapping 27
Table 3.3 Percentage biases and MSE's of the Rayleigh distribution 30
Table 3.4 Percentage biases and MSE's of the Maxwell distribution 31
Table 3.5 Percentage biases and MSE's of the Half-Normal distribution 32
Table 3.6 Percentage biases and MSE's of the Chi distribution 33
Table 4.1 Total Expenditure on Alcohol and Conditional Budget Shares for Alcohol: United Kingdom, 1955-1985 39
Table 4.2 Estimates of Generalized Regression Model with Half Normal distribution 40
Table 4.3 Mean Marginal Effects 41


List of Figures

Figure 1.1 Four density plots illustrating specific distributions in the Generalized Rayleigh distribution family: (a) the Rayleigh distribution with parameter λ = 2, (b) the Maxwell distribution with parameter λ = 2, (c) the Half-Normal distribution with σ = 2, and (d) the Chi distribution with parameters a = 1 and τ = 1. 3
Figure 3.1 The Rayleigh probability density plot with different parameter values: λ = 0.5, 1, 2 and 4. 22
Figure 3.2 The Maxwell probability density plot with different parameter values: λ = 0.5, 1 and 2. 23
Figure 3.3 The Half Normal probability density plot with different parameter values: σ = 0.5, 1, 2 and 3. 24
Figure 3.4 The Chi probability density plot with different parameter values: τ = 0.5, 1, 2 and 3 in both (a) and (b); a = 1 in (a) and a = 3 in (b). 24
Figure C.1 The plot of c.d.f for Rayleigh distribution with scale parameter


ACKNOWLEDGEMENTS

I would first like to express my gratitude to my supervisor, Dr. David Giles, who guided me into the field of bias correction. During my study at the University of Victoria, he has provided infinite help and patience. I would also like to thank my committee members, who have been very helpful in assisting in the completion of this thesis. Finally, I would like to thank my family for their support.


Chapter 1

Introduction

1.1

The Generalized Rayleigh Distribution

Voda (1976) derived a generalized version of the Rayleigh distribution, called the Generalized Rayleigh distribution (GRD). Its density function, which differs from another version introduced by Burr, is given as follows:

f(x; θ, k) = (2θ^(k+1)/Γ(k+1)) x^(2k+1) exp{−θx²},   (1.1)

with x > 0, θ > 0, and k ≥ 0, where Γ(u) is the well-known gamma function: Γ(u) = ∫₀^∞ t^(u−1) e^(−t) dt. A wide range of probability distributions emerge as particular cases of the distribution given in (1.1), as follows:

(a) For k = 0 and θ = 1/(2λ²) we obtain the one-parameter Rayleigh distribution, with density function:

f(x; λ) = (x/λ²) exp{−x²/(2λ²)},   x > 0, λ > 0;   (1.2)


(b) For k = 1/2 and θ = 1/(2λ²) we obtain the Maxwell distribution, with density function:

f(x; λ) = (2/(λ³(2π)^(1/2))) x² exp{−x²/(2λ²)},   x > 0, λ > 0;   (1.3)

(c) For k = a/2 − 1 and θ = 1/(2τ²) we obtain the Chi distribution with "a" degrees of freedom, with density function:

f(x; τ, a) = (x^(a−1)/(2^(a/2−1) τ^a Γ(a/2))) exp{−x²/(2τ²)},   x > 0, a ∈ N, τ > 0;   (1.4)

(d) If we drop the positivity requirement for k and take k = −1/2 and θ = 1/(2σ²), we obtain the Half-Normal distribution, with density function:

f_HN(x; σ) = (2/(σ(2π)^(1/2))) exp{−x²/(2σ²)},   x > 0, σ > 0.   (1.5)

Figure 1.1 shows some illustrative density functions for each distribution in the Generalized Rayleigh family. Panel (a) shows the density of the Rayleigh distribution with parameter λ = 2; (b) the Maxwell distribution with parameter λ = 2; (c) the Half-Normal distribution with σ = 2; and (d) the Chi distribution with parameters a = 1 and τ = 1.

Given these special cases that emerge under different parameter values, a model based on the Generalized Rayleigh distribution family is very useful. For example, it is more flexible than the widely used Weibull model for statistical modeling of reliability, since the latter includes only the Rayleigh distribution as a special case, while the former also encompasses the Maxwell distribution. In addition, other members of the family, such as the Half-Normal distribution in (1.5), are widely applied in social science research.

Figure 1.1: Four density plots illustrating specific distributions in the Generalized Rayleigh distribution family: (a) the Rayleigh distribution with parameter λ = 2, (b) the Maxwell distribution with parameter λ = 2, (c) the Half-Normal distribution with σ = 2, and (d) the Chi distribution with parameters a = 1 and τ = 1.

More precisely, let us look at some application examples for each distribution in the Generalized Rayleigh distribution family. A Rayleigh distribution has many applications in the life testing of electro-vacuum devices (Polovko, 1968), and in communication engineering (Dyer and Whisenand, 1973). In addition, it also arises when the overall magnitude of a vector is related to its directional components. "One example where the Rayleigh distribution naturally arises is when wind speed is analyzed in terms of its orthogonal 2-dimensional vector components. Assuming that the magnitude of each component is uncorrelated and normally distributed with equal variance, then the overall wind speed (vector magnitude) will be characterized by a Rayleigh distribution. A second example of this distribution arises in the case of random complex numbers whose real and imaginary components are i.i.d. (independently and identically distributed) Gaussian. In that case, the absolute value (modulus) of the complex number is Rayleigh-distributed. These random variables play an important role in land mobile radio because they accurately describe the instantaneous amplitude and power, respectively, of a multipath fading signal" (Wikipedia, 2011a).
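This complex-Gaussian construction is easy to check numerically. The sketch below is illustrative Python (the thesis's own code, in Appendix B, is in R): it draws complex numbers with i.i.d. Gaussian real and imaginary parts and compares the empirical distribution of the modulus against the Rayleigh c.d.f. F(x) = 1 − exp{−x²/(2λ²)} implied by (1.2).

```python
import numpy as np

def rayleigh_cdf(x, lam):
    # c.d.f. implied by the Rayleigh density (1.2)
    return 1.0 - np.exp(-x**2 / (2.0 * lam**2))

rng = np.random.default_rng(0)
lam, n = 2.0, 200_000

# Real and imaginary parts: i.i.d. N(0, lam^2)
z = rng.normal(0.0, lam, n) + 1j * rng.normal(0.0, lam, n)
r = np.abs(z)  # modulus of each complex Gaussian draw

# The empirical c.d.f. of the modulus should track the Rayleigh c.d.f.
for x in (1.0, 2.0, 4.0):
    assert abs(np.mean(r <= x) - rayleigh_cdf(x, lam)) < 0.01
```

The assertions pass to within Monte Carlo noise, confirming the modulus is Rayleigh-distributed with the same λ as the components.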

The Maxwell distribution describes how the speed of a mixture of moving particles varies at a particular temperature (Wikipedia, 2011b). In any particular mixture of moving molecules, the speed will vary a great deal, from very slow particles (low energy) to very fast particles (high energy). Most of the particles, however, will be moving at a speed very close to the average. In other words, this distribution describes particle speeds in gases, where the particles do not constantly interact with each other but move freely between short collisions. It describes the probability of a particle’s speed (the magnitude of its velocity vector) being near a given value as a function of the temperature of the system, the mass of the particle, and that speed value.


The Half-Normal distribution is an extension of the Normal distribution. Whenever a difference or deviation is measured and the algebraic sign is unknown, disregarded, lost, or otherwise absent, the resulting distribution of these absolute measurements can range in shape from Half-Normal to the normal distribution as the limit. Typical examples arise in industrial practice, such as quality control.

The Chi distribution usually arises when a k-dimensional vector's orthogonal components are independent and each follows a standard normal distribution. The length of the vector then has a Chi distribution. The most familiar example is the Maxwell distribution of (normalized) molecular speeds, which is a Chi distribution with 3 degrees of freedom.

The variety of applications described above motivates the investigation of the Generalized Rayleigh distribution in this thesis. Most research on the Generalized Rayleigh distribution concerns moment properties or confidence intervals, not goodness of fit. This thesis focuses on the bias and mean squared error of the maximum likelihood estimators of the parameters of the Generalized Rayleigh distribution family.

1.2 Estimation Issues

There are several general estimation strategies that can be used in a wide variety of situations, such as the generalized method of moments, maximum likelihood, simulation-based estimation, and Bayesian methods. Among these, the maximum likelihood estimator (MLE) is very well known and popular, and it is applied in this thesis to estimate the two unknown parameters of the Generalized Rayleigh distribution given in (1.1). It should be noted that these MLEs cannot be expressed in closed form, but we will still be able to obtain an analytic expression for their bias, to O(n⁻¹).

In statistical analysis, an estimator is a rule for calculating an estimate of a particular unknown parameter, based on observed data. Econometric theory is concerned, among other things, with the properties of estimators; that is, with defining properties that can be used to compare different estimators (different rules for creating estimates) for the same quantity, based on the same data. There are two kinds of properties: finite-sample properties, including mean squared error, variance, bias, and so on; and asymptotic properties, including consistency, asymptotic normality, asymptotic efficiency, and so on. Such properties can be used to determine the best rules to use under given circumstances. In this thesis, we focus on one particular issue: the bias of the MLE for the parameters of the Generalized Rayleigh distribution, in finite samples.

In statistics, the bias (or bias function) of an estimator is the difference between the estimator's expected value and the true value of the parameter being estimated. Suppose that ξ̂ is an estimator of the parameter ξ. Then Bias(ξ̂) = E(ξ̂) − ξ. An estimator with zero bias, Bias(ξ̂) = 0, is called unbiased; otherwise the estimator is said to be biased. As is well known, the MLE of the location parameter of the normal distribution is unbiased, but the MLE of the scale parameter is not. To correct the latter, we first need the expectation of the MLE of the scale parameter. Fortunately, for the normal distribution the MLE of the scale parameter has a closed form, so we can compute its bias and eliminate it exactly. For many other distributions, however, the MLEs are not available in closed form, which means their bias cannot easily be determined, or eliminated.

One of the early considerations of analytic approximations to the bias of an MLE was provided by Bartlett (1953a): the derivation of the O(n⁻¹) bias of the MLE of a one-dimensional parameter. To illustrate the basic strategy of this derivation, let l(θ) be the log-likelihood function for a single parameter, θ. Assume that l(θ) is regular with respect to all derivatives up to and including the third order. If θ̂ is the MLE, then l′(θ̂) ≡ (∂l/∂θ)|_(θ=θ̂) = 0. Another very important property is that the expectation of the first derivative of the log-likelihood function, evaluated at the true parameter value θ₀, is zero: E(l′(θ₀)) = 0. Then if we consider the

Taylor expansion of the first derivative of the log-likelihood function about θ₀, evaluated at θ = θ̂, we get the following:

l′(θ̂) = l′(θ₀) + (θ̂ − θ₀)l′′(θ₀) + (1/2)(θ̂ − θ₀)² l′′′(θ₀) + · · · = 0.

If we take the expectation of both sides of the equation above, we get the following:

E(θ̂ − θ₀)E(l′′(θ₀)) + cov(θ̂ − θ₀, l′′(θ₀)) + (1/2)E((θ̂ − θ₀)²)E(l′′′(θ₀)) + (1/2)cov((θ̂ − θ₀)², l′′′(θ₀)) ≈ 0.   (1.6)

Then we can solve for a closed-form expression for E(θ̂ − θ₀), which is the bias of the MLE, to O(n⁻¹). Note, however, that we do not need a closed-form expression for θ̂ itself in order to derive this bias expression. The tractability of this analysis inspired many researchers to study this area during the latter half of the twentieth century.

Haldane and Smith (1956) derived expressions for the first four cumulants to this same order of accuracy, and Shenton and Bowman (1963) obtained higher-order approximations for the first four moments of an MLE. Bartlett (1953b) and Haldane (1953) explored the bias of an MLE when p = 2, and Shenton and Wallington (1962) and Cox and Snell (1968) derived formulae for the O(n⁻¹) bias of an MLE in the multi-parameter case. In this thesis, we apply the theory developed by Cox and Snell.


On the other hand, there are several other methods to reduce bias, for example jackknife bias reduction, bootstrap bias reduction, and the analytic correction introduced by Firth (1993). The idea of jackknife bias reduction is to recompute the statistic repeatedly, leaving out one or more observations at a time from the sample; from this set of replicates of the statistic, an estimate of its bias can be calculated. The idea of bootstrap bias reduction is to recompute the statistic from a large number of equal-sized samples drawn with replacement from the original data; an estimate of the bias of the statistic can then be calculated from those resamples. Although there are substantial theoretical differences between these two methods, the main practical difference for researchers is that the bootstrap gives different results when repeated on the same data, whereas the jackknife gives exactly the same result each time.
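The two resampling ideas just described can be sketched in a few lines. The following is an illustrative Python sketch (the thesis's own code is in R; the helper names are mine), estimating the bias of the normal variance MLE, whose true O(n⁻¹) bias is −σ²/n, by both the jackknife and the bootstrap.

```python
import numpy as np

def sigma2_mle(x):
    # MLE of the normal variance (divides by n, hence biased downward)
    return np.mean((x - np.mean(x)) ** 2)

def jackknife_bias(x, stat):
    # Leave-one-out replicates; bias estimate (n-1)*(mean of replicates - stat)
    n = len(x)
    reps = np.array([stat(np.delete(x, i)) for i in range(n)])
    return (n - 1) * (reps.mean() - stat(x))

def bootstrap_bias(x, stat, B, rng):
    # Resample with replacement; bias estimate mean(replicates) - stat
    n = len(x)
    reps = np.array([stat(rng.choice(x, size=n, replace=True)) for _ in range(B)])
    return reps.mean() - stat(x)

rng = np.random.default_rng(1)
x = rng.normal(0.0, 2.0, size=50)   # sigma^2 = 4, so the true bias is about -4/50
bias_jack = jackknife_bias(x, sigma2_mle)
bias_boot = bootstrap_bias(x, sigma2_mle, B=2000, rng=rng)
# Both estimates should be negative, near -sigma2_mle(x)/50
```

Rerunning with a different seed changes bias_boot slightly while bias_jack is unchanged for a given sample, which is exactly the practical difference noted above.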

Firth considered reducing bias in a different way: the O(n⁻¹) bias may be removed from the maximum likelihood estimator by introducing an appropriate bias term into the score function. If the target parameter is the canonical parameter of an exponential family, the method simply penalizes the likelihood by the Jeffreys invariant prior. For other parameterizations of exponential family models, and for non-exponential families, a choice is available between corrections using the observed and the expected information. Outside the exponential family, use of the expected information results in a loss of second-order efficiency. The difference between Firth's method and that of Cox and Snell is that Firth modifies the score function itself, while Cox and Snell correct the MLE obtained from the original score function.

We will focus on the method introduced by Cox and Snell and on the bootstrap; the theory of Cox and Snell is the most important method discussed in this thesis. The basic idea is to improve the MLE by subtracting the estimated bias, working from an appropriate bias term derived from (1.6), introduced by Bartlett. Given the early papers on this expansion for multiple parameters discussed above, the bias estimates for the two parameters of the Generalized Rayleigh distribution are straightforward to work out. The behavior of the bias-corrected estimates of the parameters of the Generalized Rayleigh distribution is then explored in a simulation study. We also examine the effectiveness of reducing the bias of the MLE for the Generalized Rayleigh distribution by using the bootstrap.

This thesis investigates the bias properties of the MLE for the generalized Rayleigh distribution step by step. The rest of the thesis is organized as follows. In Chapter 2, we review the important concepts and tools of Cox and Snell's approach. We first present the notation and definitions of Cox and Snell (1968). We then work through some examples with well-known distributions, such as the binomial and normal distributions, to see how the idea of Cox and Snell works. We also derive the analytic expressions for the biases of the MLEs in the Generalized Rayleigh distribution, and present the corresponding bias-adjusted MLEs.

In Chapter 3, a Monte Carlo experiment is used to compare the performance of the latter estimators with that of bias-adjusted estimators that use the bootstrap to determine the finite-sample bias. We start by describing the experimental designs for both the Monte Carlo simulation and the bootstrap. We then discuss the sampling properties of the bias of the MLE for different sample sizes. An empirical example using real data is included in Chapter 4.

Finally, in Chapter 5 we provide some general discussion of the advantages and disadvantages of Cox and Snell’s method and briefly discuss future work.


Chapter 2

Theoretical Results

To prepare for the discussion of our bias-correction methods for the estimators of the two parameters of the Generalized Rayleigh distribution, we review the important bias derivation introduced by Cox and Snell (1968) in this chapter. In Section 2.1, we present Cox and Snell's approach, with the basic notation and definitions. In Section 2.2, we apply the procedure described by Cox and Snell to two well-known distributions and compare the resulting biases with those obtained directly from the definition of bias. In Section 2.3, we calculate the derivatives of the Generalized Rayleigh distribution that are needed in this thesis.

2.1 Notation and Definition

First of all, we set up the notation before describing the bias-estimation procedure used in this thesis. Let l(θ) be the log-likelihood function based on a sample of n observations, with p-dimensional parameter vector θ. Assume that l(θ) is regular with respect to all derivatives up to and including the third order. Recall the basic strategy described in Chapter 1. The bias analysis we use involves the joint cumulants of the derivatives of l(θ), and the derivatives of those cumulants. The joint cumulants of the derivatives of l(θ) are denoted as follows:

k_ij = E(∂²l/∂θ_i∂θ_j);   i, j = 1, 2, …, p
k_ijl = E(∂³l/∂θ_i∂θ_j∂θ_l);   i, j, l = 1, 2, …, p   (2.1)
k_ij,l = E((∂²l/∂θ_i∂θ_j)(∂l/∂θ_l));   i, j, l = 1, 2, …, p.

On the other hand, the derivatives of the cumulants are denoted as follows:

k_ij^(l) = ∂k_ij/∂θ_l;   i, j, l = 1, 2, …, p.   (2.2)

All of the expressions in (2.1) and (2.2) are assumed to be O(n). Extending earlier work by Tukey (1949), Bartlett (1953a, 1953b), Haldane (1953), Haldane and Smith (1956), Shenton and Wallington (1962) and Shenton and Bowman (1963), Cox and Snell (1968) showed that when the sample data are independent (but not necessarily identically distributed) the bias of the sth element of the MLE of θ, θ̂, is:

Bias(θ̂_s) = Σᵢ₌₁ᵖ Σⱼ₌₁ᵖ Σₗ₌₁ᵖ k^(si) k^(jl) ((1/2)k_ijl + k_ij,l) + O(n⁻²),   s = 1, 2, …, p,

where k^(ij) is the (i, j)th element of the inverse of the (expected) information matrix, K = {−k_ij}. Cordeiro and Klein (1994) note that this bias expression also holds if the data are non-independent, provided that all of the k terms are O(n), and show that it can be re-written as:


Bias(θ̂_s) = Σᵢ₌₁ᵖ k^(si) Σⱼ₌₁ᵖ Σₗ₌₁ᵖ (k_ij^(l) − (1/2)k_ijl) k^(jl) + O(n⁻²),   s = 1, 2, …, p.

The computational advantage of this expression is that it does not involve terms of the form defined in the third equation of (2.1). Now, let a_ij^(l) = k_ij^(l) − (1/2)k_ijl, for i, j, l = 1, 2, …, p, and define the following matrices:

A^(l) = {a_ij^(l)};   i, j, l = 1, 2, …, p

A = [A^(1) | A^(2) | · · · | A^(p)].

Cordeiro and Klein (1994) show that the expression for the O(n−1) bias of ˆθ can be re-written as:

Bias(θ̂) = K⁻¹ A vec(K⁻¹) + O(n⁻²).   (2.3)

A "bias-corrected" MLE for θ can then be obtained as:

θ̃ = θ̂ − K̂⁻¹ Â vec(K̂⁻¹),   (2.4)

where K̂ = (K)|_(θ=θ̂) and Â = (A)|_(θ=θ̂). It can be shown that the bias of θ̃ is O(n⁻²).

It is crucial to note that (2.3) and (2.4) can be evaluated even when the likelihood equation does not admit a closed-form analytic solution, so that the MLE has to be obtained numerically. For this reason, this methodology is very useful for the Generalized Rayleigh distribution. It is helpful to first apply the procedure to establish the bias of the MLEs for some well-known distributions, such as the normal distribution; such applications are illustrated in the next section.
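The matrix computation in (2.3) can be coded directly. The sketch below is an illustration in Python (not the thesis's R code): it uses the information matrix K and the matrix A = [A⁽¹⁾ | A⁽²⁾] for the normal distribution, derived in Section 2.2, and recovers the familiar biases (0, −σ²/n) for (µ̂, σ̂²).

```python
import numpy as np

def cox_snell_bias(K, A):
    # O(1/n) bias from (2.3): Bias(theta_hat) = K^{-1} A vec(K^{-1})
    K_inv = np.linalg.inv(K)
    return K_inv @ A @ K_inv.flatten(order="F")  # vec() stacks columns

n, sigma2 = 50, 2.0

# Normal distribution, theta = (mu, sigma^2)
K = np.array([[n / sigma2, 0.0],
              [0.0, n / (2.0 * sigma2**2)]])
A1 = np.array([[0.0, -n / (2.0 * sigma2**2)],
               [-n / (2.0 * sigma2**2), 0.0]])
A2 = np.array([[n / (2.0 * sigma2**2), 0.0],
               [0.0, 0.0]])
A = np.hstack([A1, A2])

bias = cox_snell_bias(K, A)
# bias[0] is 0 (mu_hat is unbiased); bias[1] equals -sigma2/n
```

The generic `cox_snell_bias` helper works for any p-parameter model once K and A have been assembled, which is how the correction is used for the Generalized Rayleigh case.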


2.2 Motivating Examples

First of all, we would like to apply the Cox-Snell methodology in the context of a distribution with a single parameter, the binomial distribution for instance. Suppose there are n independent Bernoulli trials and the probability of success is θ. Then the likelihood function of this binomial distribution is

Pr(X = x) = (n choose x) θ^x (1 − θ)^(n−x) = L(θ|x);   x = 0, 1, 2, …, n;   0 < θ < 1.

The log-likelihood function is

l(θ) = const. + x ln(θ) + (n − x) ln(1 − θ).

According to the methodology introduced by Cox and Snell, we need to find the first, the second and the third derivatives of the log-likelihood function, which are:

∂l/∂θ = (x/θ) − (n − x)/(1 − θ)
∂²l/∂θ² = −(x/θ²) − (n − x)/(1 − θ)²
∂³l/∂θ³ = (2x/θ³) − 2(n − x)/(1 − θ)³.


Next the information matrix K and the matrix A are worked out:

k₁₁ = −n/[θ(1 − θ)]
K = −k₁₁ = n/[θ(1 − θ)]
K⁻¹ = θ(1 − θ)/n
k₁₁₁ = 2n(1 − 2θ)/[θ²(1 − θ)²]
k₁₁^(1) = n(1 − 2θ)/[θ²(1 − θ)²]
a₁₁ = n(1 − 2θ)/[θ²(1 − θ)²] − (1/2)·2n(1 − 2θ)/[θ²(1 − θ)²] = 0.

Since the matrix A = {a₁₁} = 0, the bias of the MLE θ̂ is Bias(θ̂) = K⁻¹ A vec(K⁻¹) = 0, to O(n⁻¹). On the other hand, as is well known, the MLE of θ in the binomial distribution is θ̂ = x/n, and this MLE is exactly unbiased: E(θ̂) = E(x)/n = nθ/n = θ, so Bias = E(θ̂) − θ = 0. This agrees exactly with the bias calculated by the method of Cox and Snell.
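The exact unbiasedness of θ̂ = x/n is also easy to confirm by simulation; the following is a quick Monte Carlo sanity check (an illustrative Python sketch, not part of the thesis's R code).

```python
import numpy as np

rng = np.random.default_rng(2)
n, theta, reps = 30, 0.3, 200_000

# MLE theta_hat = x/n for each simulated binomial sample
theta_hat = rng.binomial(n, theta, size=reps) / n

emp_bias = theta_hat.mean() - theta
# emp_bias lies within Monte Carlo noise of zero, matching A = {a11} = 0 above
```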

As a second example, we investigate the bias of the MLEs of the parameters of a normal distribution. Suppose that X is normally distributed and that the data are independently and identically distributed. Solving (2.1) for the log-likelihood function

l(µ, σ²) = −(n/2)ln(2π) − (n/2)ln(σ²) − Σᵢ₌₁ⁿ (xᵢ − µ)²/(2σ²),

we get the information matrix

K = [ n/σ²   0
      0      n/(2σ⁴) ].


A^(1) = [ 0         −n/(2σ⁴)
          −n/(2σ⁴)   0       ],   A^(2) = [ n/(2σ⁴)   0
                                            0         0 ],

and

A = [A^(1) | A^(2)] = [ 0         −n/(2σ⁴)   n/(2σ⁴)   0
                        −n/(2σ⁴)   0          0         0 ].

Through the expression for the bias of θ̂ to O(n⁻¹) given by Cox and Snell, as well as Cordeiro and Klein, we obtain:

Bias(µ̂, σ̂²)′ = K⁻¹ A vec(K⁻¹) = (0, −σ²/n)′.

From the result above, we can see that the bias of µ̂ is zero and that of σ̂² is −σ²/n, which coincides with the known properties of the MLEs for the normal distribution. Again, this result was obtained without needing closed-form expressions for the MLEs themselves. The "bias-adjusted" estimator of σ² is σ̃² = σ̂² − (−σ̂²/n) = (n + 1)σ̂²/n, and Bias(σ̃²) = −σ²/n². Correcting for the O(n⁻¹) bias thus yields an estimator whose bias is O(n⁻²). On the other hand, of course, in this particular example we also know how to eliminate the bias of σ̂² completely: we just use the estimator nσ̂²/(n − 1).
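Both biases, −σ²/n for the MLE and −σ²/n² after the O(n⁻¹) correction, can be checked by simulation. The following is an illustrative Python sketch (the thesis's own code is in R).

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma2, reps = 10, 4.0, 400_000

x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s2_mle = ((x - x.mean(axis=1, keepdims=True)) ** 2).mean(axis=1)  # divides by n

bias_mle = s2_mle.mean() - sigma2                   # theory: -sigma2/n   = -0.40
bias_adj = ((n + 1) / n * s2_mle).mean() - sigma2   # theory: -sigma2/n^2 = -0.04
```

With n = 10 the correction shrinks the bias by a factor of n, in line with the analytic result.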

2.3 Bias of the MLE for Parameters in the Generalized Rayleigh Distribution

The preceding discussion was concerned with obtaining a closed-form solution for the O(n⁻¹) bias function introduced by Cox and Snell, and it works well for the binomial and normal distributions discussed in Section 2.2. We now turn to our own problem: the bias function for the Generalized Rayleigh distribution. Suppose X is a random variable following the Generalized Rayleigh distribution.

From Section 1.1, the likelihood function, based on a sample of n independent observations, is:

L = Πᵢ₌₁ⁿ f(xᵢ; θ, k) = (2θ^(k+1)/Γ(k+1))ⁿ (Πᵢ₌₁ⁿ xᵢ)^(2k+1) exp{−θ Σᵢ₌₁ⁿ xᵢ²}.

Then the corresponding log-likelihood function is:

l = n log(2) + n(k+1) log θ − n log Γ(k+1) + (2k+1) Σᵢ₌₁ⁿ log xᵢ − θ Σᵢ₌₁ⁿ xᵢ².   (2.5)

Recalling the procedure for deriving the O(n⁻¹) bias expression in Section 2.1, the next step is to obtain the joint cumulants of the derivatives, and the derivatives of the cumulants, shown in (2.1) and (2.2). First we work out the higher-order derivatives of (2.5). These are:

∂l/∂θ = n(k+1)/θ − Σᵢ₌₁ⁿ xᵢ²
∂l/∂k = n log θ − n(1/k + Ψ(k)) + 2 Σᵢ₌₁ⁿ log xᵢ
∂²l/∂θ² = −n(k+1)/θ²
∂²l/∂k² = −n(−1/k² + Ψ⁽¹⁾(k))   (2.6)
∂³l/∂θ³ = 2n(k+1)/θ³
∂³l/∂k³ = −n(2/k³ + Ψ⁽²⁾(k)),


where

Ψ(k) = d log Γ(k)/dk = −γ − Σⱼ₌₁^∞ ζ(j+1)(−(k−1))^j

is the digamma function, ζ(s) = Σₙ₌₁^∞ n^(−s) is the Riemann zeta function, and γ = limₙ→∞ [(Σᵤ₌₁ⁿ 1/u) − log(n)] = 0.57721… is the Euler-Mascheroni constant. In what follows we also need the trigamma and tetragamma functions, these being Ψ⁽ⁱ⁾(k) = dⁱΨ(k)/dkⁱ; i = 1, 2.
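Although the score equations implied by (2.6) have no closed-form solution in k, the first equation gives θ̂ = n(k+1)/Σxᵢ² for any fixed k, so the MLE can be found by a one-dimensional numerical search over k. The following is an illustrative Python sketch (the thesis's own implementation is in R); the data-generation step uses the fact, a standard change of variables under (1.1), that X² ~ Gamma(shape k+1, rate θ).

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

def grd_mle(x):
    # Profile out theta: dl/dtheta = 0 gives theta(k) = n(k+1)/sum(x^2);
    # then maximize the log-likelihood (2.5) over k alone.
    n, s2, slog = len(x), np.sum(x**2), np.sum(np.log(x))

    def neg_profile_loglik(k):
        theta = n * (k + 1) / s2
        return -(n * np.log(2.0) + n * (k + 1) * np.log(theta)
                 - n * gammaln(k + 1) + (2 * k + 1) * slog - theta * s2)

    res = minimize_scalar(neg_profile_loglik, bounds=(1e-6, 20.0), method="bounded")
    k_hat = res.x
    return n * (k_hat + 1) / s2, k_hat

# Maxwell data: k = 1/2 and lambda = 2, i.e. theta = 1/(2*lambda^2) = 0.125
rng = np.random.default_rng(4)
theta_true, k_true = 0.125, 0.5
x = np.sqrt(rng.gamma(k_true + 1, 1.0 / theta_true, size=20_000))

theta_hat, k_hat = grd_mle(x)  # both should be close to the true values
```

Profiling reduces the two-dimensional maximization to a bounded one-dimensional search, which is also how convergence problems are most easily avoided in practice.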

Second, we need to determine the cross derivatives with respect to the two parameters of the generalized Rayleigh distribution. These are as follows:

∂²l/∂θ∂k = n/θ ;   ∂²l/∂k∂θ = n/θ ;
∂³l/∂θ²∂k = −n/θ² ;   ∂³l/∂k²∂θ = 0 ;   (2.7)
∂³l/∂θ∂k² = 0 ;   ∂³l/∂k∂θ² = −n/θ² ;
∂³l/∂θ∂k∂θ = −n/θ² ;   ∂³l/∂k∂θ∂k = 0 .

Inspecting the higher-order derivatives in (2.6) and (2.7), we note that none of the second- and third-order derivatives involves the observations, the x's; i.e., we avoid the calculation of expectations. It follows that the joint cumulants of the derivatives, and the derivatives of the cumulants, shown in (2.1) and (2.2) coincide with the higher-order derivatives in (2.6) and (2.7), as follows:


k₁₁ = −n(k+1)/θ²
k₁₂ = k₂₁ = n/θ
k₂₂ = −n(−1/k² + Ψ⁽¹⁾(k))
k₁₁₁ = k₁₁^(1) = 2n(k+1)/θ³
k₁₁₂ = k₁₂₁ = k₂₁₁ = k₁₂^(1) = k₂₁^(1) = k₁₁^(2) = −n/θ²
k₁₂₂ = k₂₁₂ = k₂₂₁ = k₂₂^(1) = k₁₂^(2) = k₂₁^(2) = 0
k₂₂₂ = k₂₂^(2) = −n(2/k³ + Ψ⁽²⁾(k)).

The information matrix is

K = [ n(k+1)/θ²   −n/θ
      −n/θ        n(−1/k² + Ψ⁽¹⁾(k)) ],

and

A = (n/2) [ 2(k+1)/θ³   −1/θ²  |  −1/θ²   0
            −1/θ²        0     |   0      −(2/k³ + Ψ⁽²⁾(k)) ].

So, to O(n⁻¹),

Bias(θ̂) = θ{[2(k+1)(−1/k² + Ψ⁽¹⁾(k)) − 3](−1/k² + Ψ⁽¹⁾(k)) − (k+1)(2/k³ + Ψ⁽²⁾(k))} / {2n[(k+1)(−1/k² + Ψ⁽¹⁾(k)) − 1]²},   (2.8)

Bias(k̂) = {(k+1)[−1/k² + Ψ⁽¹⁾(k) − (k+1)(2/k³ + Ψ⁽²⁾(k))] − 2} / {2n[(k+1)(−1/k² + Ψ⁽¹⁾(k)) − 1]²}.   (2.9)


The corresponding bias-adjusted estimators, obtained by evaluating (2.8) and (2.9) at the MLEs and subtracting, are:

θ̃ = θ̂ − θ̂{[2(k̂+1)(−1/k̂² + Ψ⁽¹⁾(k̂)) − 3](−1/k̂² + Ψ⁽¹⁾(k̂)) − (k̂+1)(2/k̂³ + Ψ⁽²⁾(k̂))} / {2n[(k̂+1)(−1/k̂² + Ψ⁽¹⁾(k̂)) − 1]²},   (2.10)

k̃ = k̂ − {(k̂+1)[−1/k̂² + Ψ⁽¹⁾(k̂) − (k̂+1)(2/k̂³ + Ψ⁽²⁾(k̂))] − 2} / {2n[(k̂+1)(−1/k̂² + Ψ⁽¹⁾(k̂)) − 1]²}.   (2.11)
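Equations (2.8)-(2.11) translate directly into code. The following Python sketch is illustrative only (Appendix B gives the thesis's actual R implementation); it evaluates the O(n⁻¹) biases and the bias-adjusted estimators, using scipy's `polygamma` for the trigamma Ψ⁽¹⁾ and tetragamma Ψ⁽²⁾ functions.

```python
import numpy as np
from scipy.special import polygamma

def grd_biases(theta, k, n):
    # O(1/n) biases (2.8)-(2.9) of the MLEs of (theta, k)
    g = -1.0 / k**2 + polygamma(1, k)   # -1/k^2 + trigamma(k)
    h = 2.0 / k**3 + polygamma(2, k)    # 2/k^3 + tetragamma(k)
    denom = 2.0 * n * ((k + 1) * g - 1.0) ** 2
    bias_theta = theta * ((2.0 * (k + 1) * g - 3.0) * g - (k + 1) * h) / denom
    bias_k = ((k + 1) * (g - (k + 1) * h) - 2.0) / denom
    return bias_theta, bias_k

def bias_corrected(theta_hat, k_hat, n):
    # Bias-adjusted estimators (2.10)-(2.11): subtract the estimated bias
    bt, bk = grd_biases(theta_hat, k_hat, n)
    return theta_hat - bt, k_hat - bk

# Example: Maxwell case, with theta_hat = 0.125, k_hat = 0.5, n = 50
theta_tilde, k_tilde = bias_corrected(0.125, 0.5, 50)
```

Note that both biases are proportional to 1/n, so doubling the sample size halves each correction.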


Chapter 3

Numerical Evaluations

In this chapter, we investigate the sampling properties of the bias-adjusted estimators based on (2.8) and (2.9) within the Generalized Rayleigh distribution family. Our primary objective is to compare the biases and mean squared errors (MSEs) of the maximum likelihood estimators and the bias-corrected maximum likelihood estimators. The analysis of bias and MSE is developed for four special cases of the Generalized Rayleigh distribution family: the Rayleigh distribution, the Maxwell distribution, the Chi distribution, and the Half-Normal distribution, described in (1.2), (1.3), (1.4), and (1.5). To assess the performance of the bias-adjusted estimators, we apply two well-known simulation methods: the Monte Carlo experiment and the bootstrap technique.

The rest of this chapter is organized as follows. In Section 3.1, we provide detailed algorithm designs for both the Monte Carlo experiment and the bootstrap experiment. In Section 3.2, we analyze the differences between the biases and MSEs of the maximum likelihood estimators and the bias-corrected maximum likelihood estimators in a Monte Carlo experiment. Those estimators are also compared with bias-adjusted estimators constructed using the bootstrap estimate of the bias.

3.1 Simulation design

We now describe the algorithms for both the Monte Carlo experiment and the bootstrap experiment. The basic idea is to compare, in two different simulation experiments, the bias and MSE of the MLE and of the bias-corrected estimators for each of the four distributions in the Generalized Rayleigh distribution family described in Section 1.1. We start with the Monte Carlo experiment.

The objective here is to apply the analytic bias-corrected estimator to samples randomly generated from a given distribution in the Monte Carlo simulation. In addition, we are interested in how sensitive the estimation results are to the parameter values used to generate the samples, as well as to the sample size. To implement this, we first set the sample sizes at 20, 30, 50, 100, and 200 observations. We then select the parameter values for each of the four distributions in the Generalized Rayleigh distribution family used to generate the random samples.

We start with the Rayleigh distribution. Figure 3.1 shows the density of the Rayleigh distribution for the parameter values λ = 0.5, 1, 2 and 4. In the Generalized Rayleigh distribution family, λ corresponds to θ = 1/(2λ²), with k = 0. As λ increases, the mode of the density falls and the shape becomes flatter. Based on this, we restrict λ to at most four (and, by the definition of the Rayleigh distribution in (1.2), λ > 0); for larger λ the mode drops below 0.2 and the density is too flat for the differences to be visible. We therefore take λ = 0.5, 1, 2 and 4 as the true parameter values for generating the Rayleigh random samples in the Monte Carlo simulations. In the notation of the Generalized Rayleigh distribution family, the corresponding parameter settings are θ = 2, 0.5, 0.125, and 0.03125. We then ran the simulation to test the sensitivity of the results to the parameter values. To reduce computational cost and obtain more reliable comparisons, the sensitivity test is constructed only for sample size 50; the same applies to the other three distributions in the Generalized Rayleigh distribution family.
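For the data generation itself, note that if X has density (1.1), the change of variables Y = X² gives Y ~ Gamma(shape k+1, rate θ), so samples can be drawn as square roots of gamma variates. The following is an illustrative Python sketch of such a generator (the thesis's simulations are coded in R).

```python
import numpy as np

def rgrd(n, theta, k, rng):
    # X^2 ~ Gamma(shape k+1, rate theta) when X has density (1.1);
    # numpy's gamma generator is parameterized by scale = 1/rate
    return np.sqrt(rng.gamma(k + 1.0, 1.0 / theta, size=n))

rng = np.random.default_rng(5)

# Rayleigh case: k = 0 and lambda = 2, i.e. theta = 1/(2*lambda^2) = 0.125
x = rgrd(50, 0.125, 0.0, rng)
# Moment check: E(X^2) = (k+1)/theta = 8 = 2*lambda^2
```

The same generator covers all four special cases by choosing (θ, k) as in Table 3.1.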


Figure 3.1: The Rayleigh probability density plot with different parameter values: λ = 0.5, 1, 2 and 4.

Similarly, we decided on the parameter values of the other three distributions in the Generalized Rayleigh distribution family in the same way as for the Rayleigh distribution. For example, in Figure 3.2 the shape of the Maxwell density differs substantially across λ = 0.5, 1, and 2, i.e. θ = 2, 0.5, and 0.125. When θ = 2 the density is sharply peaked, while it becomes flatter, with its peak moving to the right, when θ = 0.5 and 0.125; when θ = 0.125 the density is very flat. We choose the parameter values on the basis of the same considerations as in the case of the half normal distribution.


Figure 3.2: The Maxwell probability density plot with different parameter values: λ = 0.5, 1 and 2.

For the Chi distribution we have to decide on values for two parameters. One is the degrees-of-freedom parameter a (corresponding to k = a/2 − 1 in the earlier notation of the Generalized Rayleigh distribution family). The other is τ (corresponding to θ = 1/(2τ²)). When a = 1 the density functions are monotonically decreasing, while for a > 1 they are hump-shaped, with the mode moving to the right as a increases. On the other hand, the density becomes flatter as τ increases; we take the largest value of τ to be the point at which the density is almost flat, leading to the choices τ = 0.5, 1, 2 and 3. Further details of the parameter settings and sample sizes are given in Table 3.1 for each of the four distributions in the Generalized Rayleigh distribution family that we consider.



Figure 3.3: The Half Normal probability density plot with different parameter values: σ = 0.5, 1, 2 and 3.

Figure 3.4: The Chi probability density plot with different parameter values: τ = 0.5, 1, 2 and 3 in both (a) and (b); a = 1 in (a) and a = 3 in (b).


Table 3.1: Parameter setting

Distribution   k             θ             nobs   scale parameter values
Rayleigh       k = 0         θ = 1/(2λ²)   20     λ = 1
                                           30     λ = 1
                                           50     λ = 0.5, 1, 2 and 4
                                           100    λ = 1
                                           200    λ = 1
Maxwell        k = 1/2       θ = 1/(2λ²)   20     λ = 1
                                           30     λ = 1
                                           50     λ = 0.5, 1, 2
                                           100    λ = 1
                                           200    λ = 1
Half-normal    k = −1/2      θ = 1/(2σ²)   20     σ = 1
                                           30     σ = 1
                                           50     σ = 0.5, 1, 2 and 3
                                           100    σ = 1
                                           200    σ = 1
Chi            k = a/2 − 1   θ = 1/(2τ²)   20     a = 1, τ = 1
                                           30     a = 1, τ = 1
                                           50     a = 1 or 3, τ = 0.5, 1, 2 or 3
                                           100    a = 1, τ = 1
                                           200    a = 1, τ = 1

Table 3.1 reports the k and θ values (the parameters of the Generalized Rayleigh distribution family) and the sample sizes (nobs).

Then we need to compute the bias and the mean squared error (MSE) for each parameter setting and sample size in each Monte Carlo simulation. The MLEs are obtained using the maxLik package (Toomet, 2008) for the R statistical software environment (R, 2008). The analytic bias corrections are computed by applying (2.8) and (2.9). Note that the bias expressions in (2.8) and (2.9) are valid only to O(n⁻¹).

As shown in Table 3.1, there are thirty-five Monte Carlo simulations. Each simulation is designed as follows:

Step 1: Parameter setting: choose the values of the parameters θ and k for the particular distribution in the Generalized Rayleigh distribution family.

Step 2: Generate a sample of size nobs from the particular distribution using the inversion method in R, as described in Appendix C.

Step 3: Compute the MLEs θ̂ and k̂ using the maxLik package, as well as the squared errors MSE_θ̂ = (θ̂ − θ)² and MSE_k̂ = (k̂ − k)².

Step 4: Apply (2.8) and (2.9), evaluated at the maximum likelihood estimates from Step 3, to calculate the biases of k̂ and θ̂. The corresponding analytic bias-corrected maximum likelihood estimators, denoted θ̃ and k̃, are then computed as in (2.10) and (2.11). The MSEs of the analytic bias-corrected estimators are the squared differences between these estimators and the true parameters specified in Step 1, i.e. MSE_θ̃ = (θ̃ − θ)² and MSE_k̃ = (k̃ − k)².

Step 5: Repeat Steps 2 to 4 a total of 200,000 times.

Step 6: Calculate the simulated bias of the MLE as the difference between the average of the MLEs from Step 3 and the true parameter values from Step 1, and calculate the MSE of the MLE as the average of the squared errors from Step 3.
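For concreteness, Steps 1 to 6 can be sketched for the Rayleigh case, where the MLE of λ has a closed form and its O(n⁻¹) bias is −λ/(8n), as derived in Appendix A. The thesis's simulations are written in R; the following Python sketch is illustrative only, and the function names are our own.

```python
import math
import random

def rayleigh_sample(lam, n, rng):
    # Step 2: inversion method. If U ~ Uniform(0,1), then
    # x = lam * sqrt(-2 * log(U)) follows the Rayleigh(lam) distribution.
    return [lam * math.sqrt(-2.0 * math.log(1.0 - rng.random())) for _ in range(n)]

def mle_lambda(x):
    # Step 3: closed-form MLE of the Rayleigh scale parameter.
    return math.sqrt(sum(v * v for v in x) / (2.0 * len(x)))

def bias_corrected_lambda(lam_hat, n):
    # Step 4: Bias(lam_hat) = -lam/(8n) to O(1/n) (Appendix A), evaluated
    # at the MLE, so the corrected estimator is lam_hat + lam_hat/(8n).
    return lam_hat + lam_hat / (8.0 * n)

def monte_carlo(lam, n, nrep, seed=123):
    # Steps 5-6: average over nrep replications and report the simulated
    # biases of the MLE and of the analytically corrected estimator.
    rng = random.Random(seed)
    sum_hat = sum_tilde = 0.0
    for _ in range(nrep):
        lam_hat = mle_lambda(rayleigh_sample(lam, n, rng))
        sum_hat += lam_hat
        sum_tilde += bias_corrected_lambda(lam_hat, n)
    return sum_hat / nrep - lam, sum_tilde / nrep - lam
```

With λ = 1 and n = 20, the simulated bias of the MLE should be near −1/160 ≈ −0.00625, while the corrected estimator's bias is smaller by an order of magnitude or more.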

Besides the Monte Carlo experiment, we want to build a comprehensive interpretation of the bias reduction introduced by Cox and Snell. A good way to do this is to compare the results of the Monte Carlo simulation with those derived


Table 3.2: Parameter setting for bootstrapping

Distribution   sample size   k                    θ
Rayleigh       20, 50        k = 0                θ = 1/(2λ²), λ = 1
Maxwell        20, 50        k = 1/2              θ = 1/(2λ²), λ = 1
Half-normal    20, 50        k = −1/2             θ = 1/(2σ²), σ = 1
Chi            20, 50        k = a/2 − 1, a = 1   θ = 1/(2τ²), τ = 1

by bootstrapping simulation. Bootstrapping allows one to gather many alternative versions of the single statistic that would ordinarily be calculated from one sample.

The basic idea is to use the bootstrap to calculate the finite-sample biases and MSEs for the four distributions in the Generalized Rayleigh distribution family, with the parameter values and sample sizes chosen on the same basis as before, so that the comparisons are meaningful and reliable. To simplify the comparisons and reduce the computational expense, the sample sizes for each distribution are only 20 and 50, and the scale parameter of each is set to unity, implying θ = 0.5. In particular, the degrees-of-freedom parameter for the Chi distribution is taken as unity, implying k = −0.5. The details are provided in Table 3.2. The key point is computing the bias by recomputing the statistic over a number of equal-size resamples drawn with replacement from the original dataset. Each of the bootstrapping simulations is designed as follows:

Step 1: Generate a sample from the relevant distribution using the inversion method in R, with the parameter values given in Table 3.2.


Step 2: Estimate the MLEs θ̂ and k̂ using the maxLik package, and compute the squared errors, as in Step 3 of the Monte Carlo simulation.

Step 3: Draw a bootstrap sample, with replacement, from the sample generated in Step 1.

Step 4: Estimate the MLEs for the bootstrap sample drawn in Step 3, i.e. θ̂′ and k̂′.

Step 5: Repeat Steps 3 and 4 nboot = 1000 times.

Step 6: Compute the bootstrap biases, Bias_θ,bs = (1/nboot) Σ θ̂′ − θ̂ and Bias_k,bs = (1/nboot) Σ k̂′ − k̂, where the sums run over the nboot bootstrap estimates. The bootstrap-bias-corrected estimators are then calculated as θ̆ = θ̂ − Bias_θ,bs and k̆ = k̂ − Bias_k,bs. The MSEs of the bootstrap-bias-corrected estimators are calculated as MSE_θ̆ = (θ̆ − θ)² and MSE_k̆ = (k̆ − k)².

Step 7: Repeat Steps 1 to 6 a total of 200,000 times.

Step 8: Calculate the bias of the bootstrap-bias-corrected estimator as the difference between the average of θ̆ (k̆) across replications and the true parameter values chosen in Step 1. In addition, calculate its MSE as the average of the MSEs from Step 6.
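The inner bootstrap loop (Steps 3 to 6) can be sketched for the Rayleigh case, again using the closed-form MLE. This is an illustrative Python sketch (the thesis uses R with maxLik); the function names are our own.

```python
import math
import random

def mle_lambda(x):
    # Closed-form Rayleigh MLE, re-estimated on each bootstrap resample.
    return math.sqrt(sum(v * v for v in x) / (2.0 * len(x)))

def bootstrap_bias_correct(x, nboot=1000, seed=42):
    # Steps 3-6: resample the data with replacement, re-estimate on each
    # resample, and take Bias_bs = mean(lam_hat') - lam_hat as the bias
    # estimate; the corrected estimator is lam_hat - Bias_bs.
    rng = random.Random(seed)
    lam_hat = mle_lambda(x)
    boot_sum = 0.0
    for _ in range(nboot):
        resample = [rng.choice(x) for _ in x]
        boot_sum += mle_lambda(resample)
    bias_bs = boot_sum / nboot - lam_hat
    lam_breve = lam_hat - bias_bs  # bootstrap-bias-corrected estimator
    return lam_hat, bias_bs, lam_breve
```

Note that lam_breve equals 2·lam_hat minus the average of the bootstrap estimates, the usual form of the bootstrap bias correction.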

Of these two experiments, the Monte Carlo experiment is analytic, using a closed-form expression for the bias, while the bootstrapping experiment is numerical, estimating the bias by resampling the original dataset. The bootstrapping experiment is computationally more expensive, but it is the only choice when no analytical expression for the bias is available. It also provides a benchmark against which to judge the absolute performance of the analytic correction evaluated in the Monte Carlo experiment.


3.2 Simulation results

Section 3.1 described the design of the two numerical experiments, the Monte Carlo experiment and the bootstrapping experiment. In the present section, we report the findings of these two experiments. The simulation results are shown in separate tables for each distribution. To make the results easy to compare, the results of both the Monte Carlo simulation and the bootstrapping simulation for the same distribution are displayed in one table. In addition, since we are interested in the sensitivity of the results to the parameter values for each distribution in the Generalized Rayleigh distribution family, results computed with different parameter values for the same distribution are also displayed in a single table.

We begin with the Rayleigh distribution. The parameter values are determined as in Tables 3.1 and 3.2 for the Monte Carlo experiment and the bootstrapping experiment. The analytic bias-adjusted estimators obtained in the Monte Carlo experiment are computed from a total of 200,000 replications, and are denoted θ̃. The bootstrap bias-adjusted estimators obtained in the bootstrapping experiment are computed from a total of 200,000 replications with 1,000 resamples in each replication, and are denoted θ̆. The MLE is denoted θ̂. Recall that the parameter k in the Rayleigh distribution equals zero, as in (1.2). Several joint cumulants of the derivatives, or derivatives of cumulants, would then be undefined; for example, the first-order partial derivative with respect to k in (2.6) is undefined when k equals zero. Therefore, we cannot apply (2.10) and (2.11) to compute the bias-corrected estimator θ̃ for the Rayleigh distribution. A separate derivation, explained in Appendix A, yields the bias-corrected estimator in this special case, and those equations are applied in the Monte Carlo simulation for the Rayleigh distribution. Table 3.3 shows the percentage biases and MSEs (in square brackets) of the MLE, the analytic bias-adjusted estimator and the bootstrap bias-adjusted estimator of the


Table 3.3: Percentage biases and MSE's of the Rayleigh distribution

  n      θ̂          θ̃          θ̆
(a): λ = 0.5 or θ = 2
  50    -0.2801    -0.0308     0.0312
        [0.4991]   [0.5008]   [0.5762]
(b): λ = 1 or θ = 0.5
  20    -0.6255    -0.0126    -0.0173
        [1.2521]   [1.2638]   [1.3237]
  30    -0.3758     0.0393    -0.0342
        [0.8339]   [0.8394]   [0.8407]
  50    -0.2244     0.0251    -0.0354
        [0.4992]   [0.5012]   [0.5271]
 100    -0.1292    -0.0043     0.0180
        [0.2480]   [0.2485]   [0.2512]
 200    -0.0664    -0.0039     0.1247
        [0.1252]   [0.1254]   [0.4976]
(c): λ = 2 or θ = 0.125
  50    -0.2176     0.0318     0.0268
        [0.5013]   [0.5033]   [0.5003]
(d): λ = 4 or θ = 0.03125
  50    -0.2668    -0.0175     0.0085
        [0.4978]   [0.4996]   [0.4907]


Table 3.4: Percentage biases and MSE's of the Maxwell distribution

  n      k̂           k̃          k̆           θ̂          θ̃          θ̆
(a): k = 0.5, λ = 0.5 or θ = 2
  50    16.7331      0.07116   -0.4637      6.9467    -0.0299    -0.2062
       [40.07451]  [33.0095]  [33.0029]    [6.3404]   [5.1692]   [5.1912]
(b): k = 0.5, λ = 1 or θ = 0.5
  20    46.1811      0.1335    -4.9164     19.3819     0.0539    -2.1584
      [155.1545]   [97.2173] [105.0411]   [25.4475]  [15.6251]  [16.4881]
  30    29.1399      0.1363    -0.2486     12.2450     0.0806     1.0504
       [81.3852]   [59.2491]  [62.5967]   [13.0656]   [9.3493]  [10.6324]
  50    16.5934     -0.0603     0.1374      6.96158   -0.0164     0.0199
       [39.9668]   [32.9556]  [32.8900]    [6.3250]   [5.1539]   [5.2091]
 100     8.0952      0.0217    -0.1536      3.4034     0.0225    -0.0684
       [17.2298]   [15.6133]  [15.5437]    [2.7152]   [2.4441]   [2.4406]
 200     4.0092      0.0333     0.0936      1.6897     0.0253    -0.0096
        [8.0070]    [7.6173]   [7.6588]    [1.2543]   [1.1889]   [1.2001]
(c): k = 0.5, λ = 2 or θ = 0.125
  50    16.60228    -0.0519    -0.5138      6.9521    -0.0251    -0.2228
       [39.6036]   [32.6314]  [32.9174]    [6.3073]   [5.1391]   [5.1857]


Table 3.5: Percentage biases and MSE's of the Half-Normal distribution

  n      k̂           k̃          k̆           θ̂          θ̃          θ̆
(a): k = −0.5, σ = 0.5 or θ = 2
  50    -4.6084     -0.0121     0.2085      8.9605     0.0062    -0.5042
        [3.5173]    [2.9658]   [2.9424]    [9.6729]   [7.6740]   [7.6818]
(b): k = −0.5, σ = 1 or θ = 0.5
  20   -12.5300      0.0692     1.6159     24.9904    -0.1130    -4.2726
       [12.9837]    [8.5404]   [8.4679]   [41.4694]  [24.2286]  [24.9102]
  30    -8.0008     -0.0249     0.4314     15.7146     0.0270    -1.2662
        [6.9887]    [5.2710]   [5.2178]   [20.7703]  [14.3319]  [14.0965]
  50    -4.5916      0.0039    -0.1817      8.8967    -0.0527    -0.1817
        [3.5124]    [2.9627]   [7.7405]    [9.6484]   [7.6661]   [7.7405]
 100    -2.2525     -0.0196     0.0837      4.2993    -0.0185    -0.1406
        [1.5457]    [1.4178]   [1.4084]    [4.0349]   [3.5847]   [3.5852]
 200    -1.1169     -0.0162     0.0042      2.1283     0.0065     0.0048
        [0.7285]    [0.6975]   [0.6987]    [1.8427]   [1.7348]   [1.7523]
(c): k = −0.5, σ = 2 or θ = 0.125
  50    -4.6569     -0.0579     0.2085      9.0187     0.0616    -0.5042
        [3.5429]    [2.9847]   [2.9421]    [9.7361]   [7.7241]   [7.6813]
(d): k = −0.5, σ = 3 or θ = 1/18
  50    -4.6425     -0.0443     0.2085      8.9994     0.0472    -0.5041
        [3.5242]    [2.9692]   [2.9423]    [9.7866]   [7.7700]   [7.6817]


Table 3.6: Percentage biases and MSE's of the Chi distribution

  n      k̂           k̃          k̆           θ̂          θ̃          θ̆
(a): a = 1 or k = −0.5, λ = 0.5 or θ = 2
  50    -4.5563      0.0373     0.1199      8.9727     0.0163    -0.3094
        [3.4970]    [2.9519]   [2.9679]    [9.7082]   [7.7049]   [7.7230]
(a): a = 3 or k = 0.5, λ = 0.5 or θ = 2
  50    16.5948     -0.0589    -0.4748      7.0059     0.0251    -0.2162
       [39.6932]   [32.7129]  [33.2130]    [6.3204]   [5.1442]   [5.2102]
(b): a = 1 or k = −0.5, λ = 1 or θ = 0.5
  20   -12.6474     -0.0326     1.4878     25.0599    -0.0483    -3.8618
       [13.0473]    [8.5651]   [8.4769]   [41.5974]  [24.2933]  [25.1288]
  30    -8.0419     -0.0631     0.5362     15.7766     0.0829    -1.4112
        [6.9467]    [5.2307]   [5.1906]   [20.6236]  [14.1990]  [14.2530]
  50    -4.6496     -0.0511     0.0794      9.1072     0.1429    -0.1768
        [3.5207]    [2.9654]   [2.9729]    [9.8263]   [7.7870]   [7.7815]
 100    -2.2866     -0.0527    -0.0394      4.2989    -0.0181     0.0189
        [1.5475]    [1.4181]   [1.4278]    [4.0359]   [3.5858]   [3.5942]
 200    -1.0849      0.0155     0.0161      2.0831    -0.0381     0.0300
        [0.7280]    [0.6977]   [0.6989]    [1.8415]   [1.7354]   [1.7425]
(b): a = 3 or k = 0.5, λ = 1 or θ = 0.5
  50    16.8518      0.1829     0.3279      7.0863     0.1009     0.1605
       [40.13]     [33.0236]  [32.9918]    [6.3887]   [5.1946]   [5.1907]
(c): a = 1 or k = −0.5, λ = 2 or θ = 0.125
  50    -4.5785      0.0163     0.1019      9.0437     0.0821    -0.2595
        [3.5103]    [2.9620]   [2.9615]    [9.7865]   [7.7620]   [7.8086]
(c): a = 3 or k = 0.5, λ = 2 or θ = 0.125
  50    16.7661      0.1022    -0.7921      7.0316     0.0495    -0.2855
       [40.3018]   [33.2009]  [32.8446]    [6.4147]   [5.2243]   [5.1769]
(d): a = 1 or k = −0.5, λ = 3 or θ = 1/18
  50    -4.6888     -0.0882    -0.0915      9.0061     0.0503     0.0596
        [3.5385]    [2.9782]   [2.9954]    [9.6563]   [7.6562]   [7.7047]
(d): a = 3 or k = 0.5, λ = 3 or θ = 1/18
  50    16.5731     -0.0793    -0.5064      6.9514    -0.0258    -0.2199
       [39.5854]   [32.6238]  [32.9079]    [6.3260]   [5.1557]   [5.1846]


Rayleigh distribution. The biases are the differences between the average bias-adjusted estimates computed from the simulations and the parameter values specified in Tables 3.1 or 3.2. The percentage biases are these biases expressed as percentages of the true parameter values. The percentage MSEs are the MSEs computed from the simulations expressed as percentages of the squared parameter values specified in Tables 3.1 or 3.2, i.e. 100 × (MSE/k²) or 100 × (MSE/θ²).
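The percentage measures just defined can be stated compactly in code. The following Python helpers (names are ours) implement the definitions used in Tables 3.3 through 3.6:

```python
def pct_bias(estimate_mean, true_value):
    # 100 x (average estimate - true value) / true value
    return 100.0 * (estimate_mean - true_value) / true_value

def pct_mse(mse, true_value):
    # 100 x MSE / true_value^2
    return 100.0 * mse / true_value ** 2
```

For example, an average estimate of 0.495 for a true θ = 0.5 gives a percentage bias of −1, and an MSE of 0.00125 for the same θ gives a percentage MSE of 0.5.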

In Table 3.3, we can see that the percentage biases of the MLEs are much larger than those of both the analytic bias-adjusted estimates and the bootstrapped bias-adjusted estimates, θ̃ and θ̆, even though all the percentage biases of the MLEs for the Rayleigh distribution are less than 1 percent. The percentage biases of both bias-adjusted estimators are generally smaller by at least one order of magnitude in all cases. In case (b), the absolute percentage biases of the MLEs decline monotonically as the sample size increases, reflecting the consistency of the MLEs.

On the other hand, the percentage MSEs reported in Table 3.3 are very close to one another. The percentage MSEs of both the analytic bias-adjusted estimates and the bootstrapped bias-adjusted estimates are slightly larger than those of the MLEs. Recalling the decomposition MSE = variance + bias², the dramatic decrease in bias discussed above comes at the expense of increased variance for the analytic bias-adjusted estimates and the bootstrapped bias-adjusted estimates.
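The decomposition MSE = variance + bias² explains how the bias can fall sharply while the MSE barely moves. A minimal Python sketch (illustrative names) makes the identity explicit for a set of simulated estimates:

```python
def moments(estimates, true_value):
    # Decompose the simulated MSE of an estimator into variance + bias^2.
    n = len(estimates)
    mean = sum(estimates) / n
    bias = mean - true_value
    var = sum((e - mean) ** 2 for e in estimates) / n
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return bias, var, mse
```

For any sample of estimates, the returned mse equals var + bias² up to rounding, so an estimator whose bias is driven to near zero can still have an MSE close to (or above) that of the MLE if its variance is larger.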


For the other three distributions in the Generalized Rayleigh distribution family, the values of the parameters are also determined as in Tables 3.1 and 3.2. The procedures for the Monte Carlo experiment and the bootstrapping experiment follow the steps described in Section 3.1. The corresponding MLEs, analytic bias-adjusted estimates and bootstrapped bias-adjusted estimates are again denoted k̂, k̃ and k̆ (θ̂, θ̃ and θ̆). Tables 3.4, 3.5 and 3.6 display the percentage biases and MSEs of the MLEs, the bias-adjusted estimators calculated by (2.10) and (2.11), and the bias-adjusted estimators calculated by the bootstrapping method, for the Maxwell, Half-Normal, and Chi distributions.

From Tables 3.4, 3.5 and 3.6 we reach conclusions similar to those for the Rayleigh distribution. First, the analytic bias correction and the bootstrap bias correction perform extremely well in all cases, in most cases reducing the percentage biases by at least an order of magnitude; in particular, the analytic bias-adjusted estimates decline by as much as two orders of magnitude in most cases. Second, these gains come at the cost of increases in variance, as evidenced by the very small differences in the percentage MSEs. Even though the MSEs of both the analytic bias-adjusted estimates and the bootstrapped bias-adjusted estimates are somewhat smaller than those of the MLEs in most cases, the variance must be increasing, since the bias component is negligible in percentage terms.

Among these three distributions, the percentage bias of the MLE of k in the Maxwell distribution with sample size 20 is the largest, at around 46 percent. The biases of the analytic bias-adjusted estimates are smaller by three orders of magnitude for k̃ with sample size 50, and by two orders of magnitude for θ̃ at every parameter value. In case (b) of Table 3.4, as the sample size increases from 20 to 200, the percentage biases of θ̂ are more stable than those of k̂. Those of k̃ are much more stable than those of k̂, being smaller by at least two orders of magnitude, and those of θ̃ are slightly more stable than those of k̃.

In Table 3.5, we can see that the situation for the Half-Normal distribution is the opposite of that for the Maxwell distribution. As the sample size increases from 20 to 200, the percentage biases of k̂ are more stable than those of θ̂. The absolute percentage biases of k̃ range from 0.0039 to 0.0692, while those of θ̃ range from 0.0065 to 0.1130, so those of k̃ are more stable than those of θ̃.

In Table 3.6, we can see that with sample size 50 the percentage biases of k̂ are much larger when k = 0.5 than when k = −0.5. However, the differences among the percentage biases of k̃ are very small for both values of k. On the other hand, for a given k, the percentage biases of k̂ at sample size 50 are very close to each other across the different values of θ, as are those of θ̃. Turning to the sensitivity with respect to sample size in case (b), we reach a conclusion similar to that for the Half-Normal distribution. The percentage biases of θ̂ are much larger than those of k̂ when the sample size equals 20, and the percentage biases of k̂ are much more stable than those of θ̂. The absolute percentage biases of k̃ are very close to each other, ranging from 0.0155 to 0.0631, as are those of θ̃, ranging from 0.0181 to 0.1429.

Next, we compare the percentage biases and MSEs of maximum likelihood estimation, the Cox and Snell methodology, and the bootstrapping methodology across Tables 3.3, 3.4, 3.5 and 3.6. Overall, bias-adjusting using the bootstrap also reduces the biases of the MLEs. Often an order-of-magnitude improvement can be obtained in percentage terms for small sample sizes, but again this comes at the expense of increased variability in the estimators. In addition, the bootstrapping simulations run in this section are computationally expensive: a simulation for one of these distributions takes almost a week to complete. The bootstrap is valuable as a benchmark against which to compare results obtained by another method, but it is impractical when results are needed quickly.

Generally, the original MLEs and both the analytically bias-adjusted and bootstrap bias-adjusted MLEs have similar percentage MSEs, so any choice between the two methodologies may be based on the resultant biases. The overall thrust of the results in Tables 3.3 through 3.6 is that the analytical Cox-Snell/Cordeiro-Klein methodology outperforms the use of the bootstrap for bias reduction in almost all of the cases considered, especially when the sample size is less than n = 100.


Chapter 4

Illustrative Example

In this chapter, we use a real data set to investigate the performance of bias-corrected estimates obtained using the method introduced by Cox and Snell. We consider UK consumption of beer, wine and spirits, 1955-1985, in liters per capita (Selvanathan and Clements, 1995, p. 130).

The data on total expenditure on alcohol are given in Table 4.1. We are interested in how the expenditure on each beverage changes when total alcohol expenditure changes. Rather than the linear regression model one might usually apply, we use a logistic regression model, one of the generalized linear models. The responses are the conditional budget shares of each beverage, which are also given in Table 4.1. These conditional budget shares are the expenditures on each beverage expressed as fractions of total alcohol expenditure. Expenditures are expressed in current pounds per capita.

The link function of this generalized linear regression analysis is the following:

g(µ) = α + βx,  with  g(µ) = log(µ/(1 − µ)),

where x is the total


Table 4.1: Total Expenditure on Alcohol and Conditional Budget Shares for Alcohol: United Kingdom, 1955-1985

Year   Beer     Wine     Spirits   Total Alcohol
1955   0.6382   0.1022   0.2596    16.33
1956   0.6316   0.1005   0.2679    16.92
1957   0.6347   0.1038   0.2616    17.62
1958   0.6245   0.1046   0.2709    17.58
1959   0.6022   0.1109   0.287     17.71
1960   0.5884   0.1154   0.2963    18.37
1961   0.5805   0.1154   0.3042    20.36
1962   0.5912   0.1076   0.3012    21.8
1963   0.582    0.1299   0.2882    22.97
1964   0.5698   0.1374   0.2928    25.75
1965   0.5884   0.1314   0.2802    27.58
1966   0.5849   0.1365   0.2786    29.76
1967   0.5871   0.142    0.2709    31.64
1968   0.5706   0.1519   0.2775    33.87
1969   0.5919   0.1518   0.2563    36.58
1970   0.5891   0.1448   0.2661    41.34
1971   0.5885   0.1531   0.2584    46.38
1972   0.5711   0.1619   0.267     51.89
1973   0.5279   0.1788   0.2933    60.9
1974   0.529    0.1837   0.2874    69.63
1975   0.5512   0.172    0.2768    86.24
1976   0.5665   0.1726   0.2609    101.66
1977   0.5575   0.1722   0.2703    116.5
1978   0.5457   0.1799   0.2745    129.67
1979   0.5275   0.1893   0.2832    154.11
1980   0.5344   0.1924   0.2732    176.78
1981   0.5354   0.2038   0.2608    197.8
1982   0.5374   0.2125   0.2502    213.06
1983   0.5339   0.2201   0.246     237.15
1984   0.536    0.2214   0.2426    255.45
1985   0.5332   0.2218   0.245     278.64


Table 4.2: Estimates of Generalized Regression Model with Half Normal distribution

                   Beer                  Wine                  Spirits
                   α         β           α         β           α         β
MLE             -5.3399   -0.0005     -6.9225    0.0026     -6.0890   -0.0005
Bias-corrected  -5.3266   -0.0003     -6.9093    0.0028     -6.0758   -0.0002

expenditure on alcohol, and µ is the mean value of the half normal distribution of y. Besides the choice of the logistic link function in the generalized linear regression model, we also need to decide upon the distribution of the responses. Since the responses are all non-negative, we consider the half normal distribution, a member of the Generalized Rayleigh distribution family. We first estimate by the maximum likelihood method as usual, and then the bias-corrected estimates are calculated by the method of Cox and Snell. The MLEs are obtained using R with the maxLik package. Note that in finding the MLE, starting values need to be supplied; here we use the least-squares estimates. The bias-corrected estimates are worked out by the functions shown in the Appendix. These results are shown in Table 4.2.
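The logit link maps a budget share in (0, 1) onto the whole real line, and its inverse maps the linear predictor α + βx back to a share. A minimal Python sketch (the thesis's estimation is done in R with maxLik; function names here are ours):

```python
import math

def logit(mu):
    # link: g(mu) = log(mu / (1 - mu)), defined for 0 < mu < 1
    return math.log(mu / (1.0 - mu))

def inv_logit(eta):
    # inverse link: mu = exp(eta) / (1 + exp(eta))
    return math.exp(eta) / (1.0 + math.exp(eta))

def fitted_share(alpha, beta, x):
    # fitted conditional budget share at total expenditure x
    return inv_logit(alpha + beta * x)
```

Because the inverse link always returns a value in (0, 1), fitted shares are automatically constrained to the unit interval, which is the reason for using this link with budget-share responses.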

Then we calculated the marginal effects for the continuous variable based on the estimates in Table 4.2, i.e. the marginal change in the expected probability,

∂E(y|x)/∂x = β̂ exp(α̂ + β̂x) / (1 + exp(α̂ + β̂x))².
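This formula can be implemented directly and checked against a numerical derivative of the inverse-logit mean. The following Python sketch is illustrative (the thesis's calculations use R), and uses the beer MLEs from Table 4.2 only by way of example:

```python
import math

def prob(alpha, beta, x):
    # expected probability (fitted share) under the logit link
    e = math.exp(alpha + beta * x)
    return e / (1.0 + e)

def marginal_effect(alpha, beta, x):
    # dE(y|x)/dx = beta * exp(alpha + beta*x) / (1 + exp(alpha + beta*x))^2
    e = math.exp(alpha + beta * x)
    return beta * e / (1.0 + e) ** 2
```

With the beer MLEs (α̂ = −5.3399, β̂ = −0.0005) and x in the range of Table 4.1, the marginal effects are negative and of order 10⁻⁶, the same scale as the entries in Table 4.3.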

Table 4.3 shows the mean of the marginal effects across observations for each beverage. The first row shows the mean marginal effects calculated from the MLEs, while the second row shows those from the bias-corrected estimates. The entries in the first row sum to approximately zero, which makes sense, since the data in Table 4.1 show that total alcohol is divided among the three beverages, beer, wine and spirits. On the other


Table 4.3: Mean Marginal Effects

Beer Wine Spirits

MLE −2.4961 × 10−6 3.2660 × 10−6 −1.0300 × 10−6

Bias-corrected −1.4957 × 10−6 3.6858 × 10−6 −5.4753 × 10−7

hand, the second row does not sum to approximately zero, because the bias correction does not impose this constraint. In general, the marginal effects in Table 4.3 are very small, of order 10⁻⁶, reflecting the very small estimates in Table 4.2. In percentage terms, however, the mean marginal effects calculated from the bias-corrected estimates differ considerably from those obtained from the MLEs. The methodology introduced by Cox and Snell thus provides a workable way of obtaining bias-improved estimators.


Chapter 5

Conclusions

We have developed the details of bias-corrected estimation for the Generalized Rayleigh distribution using a method introduced by Cox and Snell, based on analytic expressions for the O(n⁻¹) biases of the maximum likelihood estimators of the parameters. The resulting bias-adjusted estimators have biases of O(n⁻²). We investigated the performance of these bias-corrected estimates for each member of the Generalized Rayleigh family through a combination of theoretical and empirical approaches, and used the bootstrap to bias-correct the maximum likelihood estimator as an alternative, to obtain comparable estimators.

The Generalized Rayleigh distribution is a convenient family to work with. It has only two parameters, which simplifies the analysis compared with the many parameter settings, conditions, and combinations that arise when there are more. Moreover, this family includes several important and useful distributions, such as the Rayleigh, Half-Normal, and Maxwell distributions. The Rayleigh distribution is very useful in survival analysis, while the Half-Normal distribution is useful in many economic applications. For example, a simple income and expenditure


model could be improved by considering the Half-Normal distribution, since all the observations are non-negative. By constructing the hypothesis test for each parame-ter, we could find out which Generalized Rayleigh distribution is appropriate.

To obtain better estimators in analyses using the Generalized Rayleigh distribution, we considered reducing the bias of the estimator, thereby improving its "unbiasedness" property. The estimators calculated in this thesis are still not completely unbiased, but their remaining bias is of order O(n⁻²).

The derivation of the bias-corrected estimators in this thesis was simple. First, it starts from the widely applicable maximum likelihood estimator, and it does not require a closed-form expression for the MLE, which is very difficult to obtain for many distributions; in practice the Newton-Raphson method is typically used to approximate the MLE. Second, the random variables are eliminated from the second derivatives by taking expectations, which simplifies the subsequent calculations.

In general, this analytic bias correction, which has not previously been evaluated, is found to be superior to the alternative of bias-correction via the bootstrap, generally by an order of magnitude in percentage terms. Importantly, these gains are usually obtained with only a small increase in relative mean squared error, at least for sample sizes of the magnitude likely to be encountered in practice. The net effect is that the original MLEs and their bias-corrected counterparts all have similar percentage mean squared errors. The use of the Cox-Snell bias correction is strongly recommended when estimating the parameters of the Generalized Rayleigh distribution by maximum likelihood.


References

Bartlett, M. S. (1953a), "Approximate confidence intervals", Biometrika, 40, 12-19.

Bartlett, M. S. (1953b), "Approximate confidence intervals II. More than one unknown parameter", Biometrika, 40, 306-317.

Cordeiro, G. M. and R. Klein (1994), “Bias correction in ARMA models”, Statistics and Probability Letters, 19, 169-176.

Cox, D. R. and E. J. Snell (1968), “A general definition of residuals”, Journal of the Royal Statistical Society, B, 30, 248-275.

Dyer, D. D. and Whisenand, C. W. (1973), “Best linear unbiased estimator of the parameter of the Rayleigh distribution-Part I: Small sample theory for censored order statistics”, IEEE Transactions on Reliability, 22, 27-34.

Firth, D. (1993), “Bias reduction of maximum likelihood estimates”, Biometrika, 80, 27-38.

Fisher, R. A. (1922), "On the mathematical foundations of theoretical statistics", Philosophical Transactions of the Royal Society of London, A, 222, 309-368.

Haldane, J. B. S. (1953), "The estimation of two parameters from a sample", Sankhya, 12, 313-320.

Haldane, J. B. S. and S. M. Smith (1956), “The sampling distribution of a maximum likelihood estimate ”, Biometrika, 43, 96-103.

Polovko, A. M. (1968), Fundamentals of Reliability Theory, Academic Press, New York.


Shenton, L. R. and K. Bowman (1963), “Higher moments of a maximum-likelihood estimate”, Journal of the Royal Statistical Society, B, 25, 305-317.

Shenton, L. R. and P. A. Wallington (1962), "The bias of moment estimators with an application to the negative binomial distribution", Biometrika, 49, 193-240.

Toomet, O. and Henningsen, A. (2008), maxLik: Maximum likelihood estimation. http://CRAN.R-project.org; http://maxLik.org

Voda, V. Gh. (1976), “Inferential procedures on a generalized Rayleigh variate (I)”, Aplikace Matematiky, 21, 395-412.

Wikipedia (2011a), http://en.wikipedia.org/wiki/Rayleigh_distribution.


Appendix A

The Rayleigh Distribution

Recall that the parameter k in the Rayleigh distribution equals zero, as in (1.2). Several of the general joint cumulants of the derivatives, or derivatives of cumulants, that we use would be undefined in this case. Therefore, we need to re-derive the bias-corrected estimator for the Rayleigh distribution using the methodology introduced by Cox and Snell, as illustrated in Section 2.1. The following results show how we construct the bias-corrected estimator in this special case. The log-likelihood function from (1.2) is:

l(λ; x) = −2n log λ + Σ log xᵢ − (Σ xᵢ²)/(2λ²),

where the sums run from i = 1 to n. Then,


∂l/∂λ = −2n/λ + (Σ xᵢ²)/λ³;
∂²l/∂λ² = 2n/λ² − 3(Σ xᵢ²)/λ⁴;   k₁₁ = E[∂²l/∂λ²] = −4n/λ²;
∂³l/∂λ³ = −4n/λ³ + 12(Σ xᵢ²)/λ⁵;   k₁₁₁ = E[∂³l/∂λ³] = 20n/λ³;
k₁₁⁽¹⁾ = ∂k₁₁/∂λ = 8n/λ³;
a₁₁ = k₁₁⁽¹⁾ − 0.5 k₁₁₁ = −2n/λ³;
K = −k₁₁ = 4n/λ²;   K⁻¹ = λ²/(4n).

So,

Bias(λ̂) = K⁻¹ A vec(K⁻¹) = (λ²/(4n)) (−2n/λ³) (λ²/(4n)) = −λ/(8n),

which is unambiguously negative. The R code to plot the density function for the Rayleigh distribution is as follows:

library(VGAM)

x = seq(0, 10, 0.001)
a1 = 0.5
a2 = 1
a3 = 2
a4 = 4
leg.txt = c("lambda=0.5", "lambda=1", "lambda=2", "lambda=4")
plot(x, drayleigh(x, a1, log = FALSE), type = "l", lwd = 2,
     main = "Rayleigh Probability Density", xlab = "", ylab = "")
lines(x, drayleigh(x, a2, log = FALSE), col = 2, lwd = 2)
lines(x, drayleigh(x, a3, log = FALSE), col = 3, lwd = 2)
lines(x, drayleigh(x, a4, log = FALSE), col = 4, lwd = 2)
mtext("x", side = 1, line = 2.5, cex = 1.5)
mtext("pdf(x)", side = 2, line = 2.5, cex = 1.5)
legend("topright", legend = leg.txt, col = 1:4, lwd = 2)
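As a numerical check of the bias expression derived above, the following simulation sketch (an illustration added here, written in Python with NumPy rather than R) uses the closed-form MLE obtained by setting the first derivative of the log-likelihood to zero, namely \(\hat{\lambda} = \sqrt{\sum_i x_i^2 / (2n)}\), and compares its simulated bias with \(-\lambda/(8n)\):

```python
import numpy as np

# Simulation sketch (not part of the thesis) checking the O(1/n) bias
# Bias(lambda_hat) = -lambda/(8n) for the Rayleigh MLE
# lambda_hat = sqrt(sum(x_i^2) / (2n)).
rng = np.random.default_rng(0)
lam, n, reps = 1.0, 20, 200_000

x = rng.rayleigh(scale=lam, size=(reps, n))
lam_hat = np.sqrt((x ** 2).sum(axis=1) / (2 * n))

bias_sim = lam_hat.mean() - lam      # simulated bias of the MLE
bias_analytic = -lam / (8 * n)       # Cox-Snell O(1/n) bias: -0.00625

# Bias-corrected estimator: subtract the estimated bias,
# lambda_bc = lambda_hat + lambda_hat / (8n).
lam_bc = lam_hat + lam_hat / (8 * n)
print(bias_sim, bias_analytic, lam_bc.mean() - lam)
```

With these settings the simulated bias is negative and close to the analytic value, and the corrected estimator is very nearly unbiased, in line with the Monte Carlo results reported in the body of the thesis.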


Appendix B

R Code for Bias-Adjusted Estimators of the Generalized Rayleigh Distribution

The R code to calculate the bias-corrected estimators in (2.10) and (2.11) for the Maxwell distribution, and to evaluate the properties of these estimators, is as follows:

library(maxLik)
library(VGAM)

kkk = 0.5
lam = 1
eee = 1/(2*lam^2)
nobs = 20
nrep = 200000

sumkhat = sumehat = sumkhat_bc = sumehat_bc = 0
msekhat = mseehat = msekhat_bc = mseehat_bc = 0
A1 = array(, c(2,2))
A2 = array(, c(2,2))
Km = array(, c(2,2))
VECKINV = array(, c(4,1))

for (i in 1:nrep) {
  x = rmaxwell(nobs, a = 1/(lam^2))
  loglik = function(param) {
    kk = param[1]
    theta = param[2]
    ll = nobs*(kk+1)*log(theta) - nobs*log(gamma(kk+1)) +
      (2*kk+1)*sum(log(x)) - theta*sum(x^2)
    ll
  }
  res = maxLik(loglik, method = "nm", start = c(kkk, eee))$estimate
  sumkhat = sumkhat + res[1]
  sumehat = sumehat + res[2]
  msekhat = msekhat + (res[1] - kkk)^2
  mseehat = mseehat + (res[2] - eee)^2
  k = res[1]
  e = res[2]
  k11 = -nobs*(k+1)/(e^2)
  k12 = nobs/e
  k22 = -nobs*(-1/k^2 + trigamma(k))
  k111 = 2*nobs*(k+1)/(e^3)
  k222 = -nobs*(2/k^3 + psigamma(k, 2))
  k112 = k121 = k211 = -nobs/(e^2)
  k122 = k212 = k221 = 0
  k11e = 2*nobs*(k+1)/(e^3)
  k12e = -nobs/(e^2)
  k22e = 0
  k11k = -nobs/(e^2)
  k12k = 0
  k22k = -nobs*(2/k^3 + psigamma(k, 2))
  Km[1,1] = -k11
  Km[1,2] = Km[2,1] = -k12
  Km[2,2] = -k22
  KINV = solve(Km)
  A1[1,1] = k11e - 0.5*k111
  A1[1,2] = A1[2,1] = k12e - 0.5*k121
  A1[2,2] = k22e - 0.5*k221
  A2[1,1] = k11k - 0.5*k112
  A2[1,2] = A2[2,1] = k12k - 0.5*k122
  A2[2,2] = k22k - 0.5*k222
  A = cbind(A1, A2)
  VECKINV[1] = KINV[1,1]
  VECKINV[2] = KINV[2,1]
  VECKINV[3] = KINV[1,2]
  VECKINV[4] = KINV[2,2]
  B = KINV %*% A %*% VECKINV
  e_bc = e - B[1]
  k_bc = k - B[2]
  sumkhat_bc = sumkhat_bc + k_bc
  sumehat_bc = sumehat_bc + e_bc
  msekhat_bc = msekhat_bc + (k_bc - kkk)^2
  mseehat_bc = mseehat_bc + (e_bc - eee)^2
}

biaskhat = sumkhat/nrep - kkk
biasehat = sumehat/nrep - eee
biaskhat_bc = sumkhat_bc/nrep - kkk
biasehat_bc = sumehat_bc/nrep - eee
pbiaskhat = 100*biaskhat/kkk
pbiasehat = 100*biasehat/eee
pbiaskhat_bc = 100*biaskhat_bc/kkk
pbiasehat_bc = 100*biasehat_bc/eee
msekhat = msekhat/nrep
mseehat = mseehat/nrep
msekhat_bc = msekhat_bc/nrep
mseehat_bc = mseehat_bc/nrep
pmsekhat = 100*msekhat/(kkk^2)
pmseehat = 100*mseehat/(eee^2)
pmsekhat_bc = 100*msekhat_bc/(kkk^2)
pmseehat_bc = 100*mseehat_bc/(eee^2)

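The central step in the code above is the Cox–Snell computation B = K⁻¹ A vec(K⁻¹). As a sanity check, this matrix step can be run in the scalar (one-parameter Rayleigh) case, where Appendix A gives the closed-form answer −λ/(8n). The following sketch (an illustration added here, translated to Python with NumPy) reproduces that:

```python
import numpy as np

# Scalar specialization of the Cox-Snell step used in the R code above:
# Bias = K^{-1} A vec(K^{-1}), with A = k11^(1) - 0.5*k111 (Appendix A).
lam, n = 2.0, 20

k11 = -4 * n / lam ** 2       # second-order cumulant
k111 = 20 * n / lam ** 3      # third-order cumulant
k11_1 = 8 * n / lam ** 3      # derivative of k11 with respect to lambda

K = np.array([[-k11]])                  # information "matrix" (1x1)
A = np.array([[k11_1 - 0.5 * k111]])
Kinv = np.linalg.inv(K)
bias = (Kinv @ A @ Kinv.reshape(-1, 1)).item()

print(bias)   # agrees with -lam / (8 * n) = -0.0125
```

The same reshaping of K⁻¹ into a column vector is what the VECKINV array accomplishes in the two-parameter R code.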

Appendix C

Inversion Method to Generate a Random Variable

In Figure C.1, we can see that each observation x0 corresponds to a unique value F(x0). The function F can be any c.d.f.; in this plot it is the c.d.f. of the Rayleigh distribution. We can therefore generate the random variable x by applying the function in reverse. By the properties of a c.d.f., F(x) lies between zero and one, so the random variable y = F(x) can be generated from the uniform distribution on (0, 1). One way to generate a random variable that follows the Rayleigh distribution is as follows:

Step 1: Generate a random variable, y, from the Uniform(0, 1) distribution.

Step 2: Compute the inverse c.d.f., \(x = F^{-1}(y)\). For the Rayleigh distribution, \(x = F^{-1}(y) = \sqrt{-2\sigma^2 \log(1 - y)}\).


The R code for generating 20 random variables from the Rayleigh distribution, with σ = 1, is as follows:

sigma = 1
y = runif(20, 0, 1)
x = sqrt(-2*sigma^2*log(1-y))

Figure C.1: The c.d.f. of the Rayleigh distribution with scale parameter equal to one.
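The two steps above can also be sketched in Python (an illustration added here, using only the standard library; the check against the known Rayleigh mean σ√(π/2) is an added sanity test, not part of the thesis):

```python
import math
import random

# Inversion method for the Rayleigh distribution, mirroring the R recipe:
# if y ~ Uniform(0,1), then x = F^{-1}(y) = sqrt(-2*sigma^2*log(1-y))
# follows a Rayleigh distribution with scale sigma.
random.seed(42)
sigma = 1.0
n = 200_000

xs = [math.sqrt(-2 * sigma ** 2 * math.log(1 - random.random()))
      for _ in range(n)]

sample_mean = sum(xs) / n
# The Rayleigh(sigma) mean is sigma * sqrt(pi/2), about 1.2533.
print(sample_mean)
```

Because random.random() returns values in [0, 1), the argument 1 − y stays in (0, 1], so the logarithm is always defined.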
