Sample size and the accuracy of a consistent estimator
Citation for published version (APA):
Laan, van der, P., & Eeden, van, C. (2000). Sample size and the accuracy of a consistent estimator. (SPOR-Report : reports in statistics, probability and operations research; Vol. 200001). Technische Universiteit Eindhoven.
Document status and date: Published: 01/01/2000
Document Version:
Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)
TU/e
Technische Universiteit Eindhoven, Department of Mathematics and Computing Science
SPOR-Report 2000-01
Sample size and the accuracy of a consistent estimator
Paul VAN DER LAAN and Constance VAN EEDEN
Eindhoven University of Technology and The University of British Columbia
Key words and phrases: Peakedness, consistency, logconcave densities, strong unimodality.
AMS 1991 subject classifications: 62F10, 62F11, 60E15.
ABSTRACT
Birnbaum (1948) introduced the notion of peakedness about $\theta$ of a random variable $T$, defined by $P(|T - \theta| < \varepsilon)$, $\varepsilon > 0$. What seems to be not well-known is that, for a consistent estimator $T_n$ of $\theta$, its peakedness does not necessarily converge to 1 monotonically in $n$. In this article some known results on how the peakedness of the sample mean behaves as a function of $n$ are recalled. Also, new results concerning the peakedness of the median and the midrange are presented.
1 Introduction

Suppose $X_1, \ldots, X_n$ are a sample from a distribution with finite variance and one wants to estimate $\mu = \mathcal{E}X_1$ based on $(X_1, \ldots, X_n)$. Then it is, of course, well-known that $\bar{X}_n = (\sum_{i=1}^{n} X_i)/n$ is a consistent estimator of $\mu$, i.e., for all $\varepsilon > 0$,

$$p_{\bar{X}_n}(\varepsilon) = P(|\bar{X}_n - \mu| < \varepsilon) \to 1 \quad \text{as } n \to \infty. \tag{1.1}$$

What seems to be less well-known is that $p_{\bar{X}_n}(\varepsilon)$ does not necessarily converge to one monotonically in $n$. Thus, judging the accuracy of $\bar{X}_n$ by $p_{\bar{X}_n}(\varepsilon)$, $\varepsilon > 0$, a larger $n$ might give a worse estimator.

In this article we first recall in Section 2 some known results on how $p_{\bar{X}_n}(\varepsilon)$ behaves as a function of $n$. Then, in Section 3, we present new results on this question for the case where the median or the midrange is used to estimate the median or the mean of $X_1$.
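As an illustration of (1.1) in a case where the convergence is monotone, the following Python sketch evaluates the peakedness of the sample mean in closed form for a normal sample. The normal model, the function name `peakedness_normal_mean` and the parameter `sigma` are assumptions made for this illustration and do not appear in the article.

```python
import math


def peakedness_normal_mean(n, eps, sigma=1.0):
    """P(|mean of n i.i.d. N(mu, sigma^2) draws - mu| < eps), in closed form.

    Assumption for this sketch: the sample is normal, so the sample mean is
    N(mu, sigma^2 / n) and the probability equals
    2 * Phi(eps * sqrt(n) / sigma) - 1 = erf(eps * sqrt(n) / (sigma * sqrt(2))).
    """
    return math.erf(eps * math.sqrt(n) / (sigma * math.sqrt(2.0)))


if __name__ == "__main__":
    # Peakedness at eps = 0.5 for a few sample sizes.
    for n in (1, 2, 5, 10, 50):
        print(n, round(peakedness_normal_mean(n, eps=0.5), 4))
```

In this special case the peakedness is an increasing function of $n$ for every fixed $\varepsilon > 0$, because `math.erf` is increasing in its argument; this monotonicity is what can fail for other distributions.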
2 Results for $\bar{X}_n$ and some generalizations

Birnbaum (1948) calls

$$p_T(\varepsilon) = P(|T - \theta| < \varepsilon), \quad \varepsilon > 0,$$

the peakedness (with respect to $\theta$) of $T$ and calls $T$ more peaked than $S$ when $p_T(\varepsilon) \ge p_S(\varepsilon)$ for all $\varepsilon > 0$. He proves several properties of the peakedness and gives, e.g., conditions under which, for the same $\theta$ and the same sample size, one of two sample means is more peaked than the other.

Proschan (1965) gives several results on the behaviour of $p_{T_n}(\varepsilon)$ as a function of $n$ where $T_n$ is a convex combination of $X_1, \ldots, X_n$, a sample from a distribution $F$. He supposes that $F$ has a density which is symmetric with respect to $\theta$ and is logconcave on the support of $F$. In particular, Proschan shows that for such a distribution $p_{\bar{X}_n}(\varepsilon)$ is, for each $\varepsilon > 0$, strictly increasing in $n$ (i.e., of course, for those $\varepsilon > 0$ which are in the interior of the support of $X_1 - \theta$).

Proschan also gives an example where $p_{\bar{X}_n}(\varepsilon)$ is not increasing in $n$. In fact, he gives a distribution for which $X_1$ is more peaked about 0 than $(X_1 + X_2)/2$. This distribution is the convolution of a distribution with a symmetric (about zero) logconcave density and a Cauchy distribution with median zero. Then, for $\phi$ strictly increasing and convex on $(0, \infty)$ with $\phi(x) = \phi(-x)$ for all $x$, $\phi(X_1)$ is more peaked with respect to zero than $(\phi(X_1) + \phi(X_2))/2$. Of course, for this case $\bar{X}_n$ does not converge to zero in probability, so the result might not be too surprising. However, Dharmadhikari and Joag-Dev (1988, p. 171-172) show that, e.g., for the density

$$f(x) = \frac{1}{3} I(|x| \le 1) + \frac{1}{18} I(1 < |x| \le 4),$$

$X_1$ is more peaked with respect to zero than $(X_1 + X_2)/2$. And for this distribution (1.1) clearly holds.
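The comparison for the Dharmadhikari and Joag-Dev density can be checked numerically at a given $\varepsilon$. The Python sketch below assumes the density reads $f(x) = \frac{1}{3} I(|x| \le 1) + \frac{1}{18} I(1 < |x| \le 4)$ (this form integrates to one); the function names and the midpoint-rule integration are choices made for this illustration, not part of the article. It uses $P(|X_1 + X_2| < 2\varepsilon) = \int f(x)\,[F(2\varepsilon - x) - F(-2\varepsilon - x)]\,dx$.

```python
def density(x):
    """Assumed density: 1/3 on |x| <= 1, 1/18 on 1 < |x| <= 4, zero elsewhere."""
    a = abs(x)
    if a <= 1.0:
        return 1.0 / 3.0
    if a <= 4.0:
        return 1.0 / 18.0
    return 0.0


def cdf(x):
    """Distribution function of the density above (piecewise linear)."""
    if x <= -4.0:
        return 0.0
    if x <= -1.0:
        return (x + 4.0) / 18.0
    if x <= 1.0:
        return 1.0 / 6.0 + (x + 1.0) / 3.0
    if x <= 4.0:
        return 5.0 / 6.0 + (x - 1.0) / 18.0
    return 1.0


def peakedness_single(eps):
    """P(|X_1| < eps)."""
    return cdf(eps) - cdf(-eps)


def peakedness_mean_of_two(eps, steps=20000):
    """P(|(X_1 + X_2)/2| < eps) by the midpoint rule on [-4, 4]."""
    h = 8.0 / steps
    total = 0.0
    for k in range(steps):
        x = -4.0 + (k + 0.5) * h
        total += density(x) * (cdf(2.0 * eps - x) - cdf(-2.0 * eps - x))
    return total * h


if __name__ == "__main__":
    for eps in (0.5, 1.0):
        print(eps, peakedness_single(eps), peakedness_mean_of_two(eps))
```

Under this assumed form of the density, $\varepsilon = 1$ gives $P(|X_1| < 1) = 2/3$ against $52/81 \approx 0.642$ for the mean of two observations, so at this $\varepsilon$ the single observation has the larger peakedness. At $\varepsilon = 0.5$ the same computation gives $1/3$ against $65/162 \approx 0.401$, so the sketch should be read as a check at individual values of $\varepsilon$ and not as a verification of the ordering for all $\varepsilon > 0$.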
The results of Proschan (1965) have been extended to the multivariate case by Olkin and Tong (1988) (see also Dharmadhikari and Joag-Dev (1988, Theorem 7.11)). Further, Ma (1998) generalized Proschan's (1965) result to the case where the random variables $X_1, \ldots, X_n$ are independent but not necessarily identically distributed.
3 The case of the median and the midrange

Assume that $X_1, \ldots, X_n$ is a sample from a distribution with a density and that $n$ is odd. Let $M_n$ be the median of $X_1, \ldots, X_n$. For this case, Karlin (1992) proved that, when the density of $X_1$ is symmetric around $\mu$, $M_{n+2}$ is more peaked around $\mu$ than $M_n$. We give, in Theorem 3.1 below, a more general and more precise form of this result with a different proof.

Theorem 3.1 Let $X_1, \ldots, X_n$ be a sample from a distribution $F$ with density $f$. Let $\mathcal{M} = \{x \mid F(x) = 1/2\}$ be the set of medians of $F$. Then, for $n$ odd and $m \in \mathcal{M}$, the peakedness $P(|M_n - m| < \varepsilon)$ of $M_n$ with respect to $m$ is strictly increasing in $n$ when $F(m - \varepsilon) < F(m + \varepsilon)$. For $\varepsilon$ such that $F(m - \varepsilon) = F(m + \varepsilon)\ (= 1/2)$ the peakedness of $M_n$ is independent of $n$ and equal to 0.
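Proof. Assume without loss of generality that $m = 0$. First note that, for $x \in (-\infty, \infty)$,

$$P(M_n > x) = \sum_{i=0}^{(n-1)/2} \binom{n}{i} F(x)^i (1 - F(x))^{n-i} = 1 - \frac{1}{B\left(\frac{n+1}{2}, \frac{n+1}{2}\right)} \int_{0}^{F(x)} t^{\frac{n-1}{2}} (1 - t)^{\frac{n-1}{2}} \, dt.$$

So, as a function of $y = F(x)$, $0 < y < 1$,

$$\frac{d}{dy} P(M_n > x) = - \frac{y^{\frac{n-1}{2}} (1 - y)^{\frac{n-1}{2}}}{B\left(\frac{n+1}{2}, \frac{n+1}{2}\right)}.$$

Putting $Q_n(y) = P(M_n > x) - P(M_{n+2} > x)$, this gives

$$\frac{d}{dy} Q_n(y) = \frac{(n+2)!}{\left(\left(\frac{n+1}{2}\right)!\right)^2} \, y^{\frac{n+1}{2}} (1 - y)^{\frac{n+1}{2}} - \frac{n!}{\left(\left(\frac{n-1}{2}\right)!\right)^2} \, y^{\frac{n-1}{2}} (1 - y)^{\frac{n-1}{2}}$$

$$= y^{\frac{n-1}{2}} (1 - y)^{\frac{n-1}{2}} \, \frac{n!}{\left(\left(\frac{n+1}{2}\right)!\right)^2} \left( (n+1)(n+2) \, y (1 - y) - \left(\frac{n+1}{2}\right)^2 \right).$$

This last expression is, for $0 < y < 1$, $> 0$, $= 0$, $< 0$ if and only if

$$\frac{1}{4} - \frac{n+1}{4(n+2)} = \frac{1}{4(n+2)} \ \gtreqless \ \left(y - \tfrac{1}{2}\right)^2,$$

which is equivalent to

$$\left| y - \tfrac{1}{2} \right| \ \lesseqgtr \ c = \tfrac{1}{2} (n+2)^{-1/2}.$$

So, $Q_n(y)$ is increasing on $(\tfrac{1}{2} - c, \tfrac{1}{2} + c)$ and decreasing on $(0, \tfrac{1}{2} - c)$ and on $(\tfrac{1}{2} + c, 1)$. Combining this with the fact that, for all $n$,

$$P(M_n > x) = \begin{cases} 1 & \text{for } y = 0 \\ \tfrac{1}{2} & \text{for } y = \tfrac{1}{2} \\ 0 & \text{for } y = 1, \end{cases}$$

shows that

$$P(M_n > x) - P(M_{n+2} > x) \ \begin{cases} > 0 & \text{for } x \text{ such that } \tfrac{1}{2} < F(x) < 1 \\ = 0 & \text{for } x \text{ such that } F(x) = 1/2 \\ < 0 & \text{for } x \text{ such that } 0 < F(x) < \tfrac{1}{2}. \end{cases}$$

This shows that for $x \ge 0$, i.e. for $x$ such that $F(-x) \le 1/2 \le F(x)$,

$$P(|M_{n+2}| < x) - P(|M_n| < x) = P(M_n > x) - P(M_{n+2} > x) - \left[ P(M_n > -x) - P(M_{n+2} > -x) \right] \ \begin{cases} > 0 & \text{if } F(-x) < F(x) \\ = 0 & \text{if } F(-x) = F(x), \end{cases}$$

which proves the result. $\Box$

Note, from Theorem 3.1, that the conditions on $F$ for the median to have increasing peakedness in $n$ are much weaker than those for the mean. Other than the obvious condition that not both $m + \varepsilon$ and $m - \varepsilon$ are medians of $F$, all one needs for the median to have increasing peakedness with respect to an $m \in \mathcal{M}$ is a density, while for the mean a logconcave symmetric density is needed in the proofs. But in order for the median to be a consistent estimator of the population median, one needs a unique median $m$ and a density which is positive in a neighbourhood of $m$. Under this condition the peakedness of the median with respect to $m$ is strictly increasing in $n$ for all $\varepsilon > 0$.

We do not know whether Theorem 3.1 holds for $n$ even.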
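The statement of Theorem 3.1 and the sign pattern of $Q_n$ used in its proof can be explored numerically through the binomial expression for $P(M_n > x)$. The Python sketch below is only an illustration under assumptions: the exponential distribution with rate 1 (median $\log 2$) is chosen here as an example of a non-symmetric density, and the function names are hypothetical.

```python
import math


def median_tail(n, y):
    """P(M_n > x) as a function of y = F(x), for odd n.

    M_n > x exactly when at most (n - 1)/2 observations are <= x.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd")
    return sum(
        math.comb(n, i) * y**i * (1.0 - y) ** (n - i)
        for i in range((n - 1) // 2 + 1)
    )


def median_peakedness(n, cdf, m, eps):
    """P(|M_n - m| < eps) for a continuous distribution function cdf."""
    return median_tail(n, cdf(m - eps)) - median_tail(n, cdf(m + eps))


def exponential_cdf(x):
    """Distribution function of the exponential distribution with rate 1."""
    return 1.0 - math.exp(-x) if x > 0.0 else 0.0


if __name__ == "__main__":
    m = math.log(2.0)  # median of the rate-1 exponential distribution
    for n in (1, 3, 5, 7, 9):
        print(n, median_peakedness(n, exponential_cdf, m, eps=0.3))
```

For this example the printed values increase with $n$, and `median_tail(n, y) - median_tail(n + 2, y)` is negative for $y < 1/2$ and positive for $y > 1/2$ at the points tried, in line with the proof. The sketch covers odd $n$ only.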
Now take the case of a sample $X_1, \ldots, X_n$ from a uniform distribution on the interval $[\theta - 1, \theta + 1]$ and let $S_n$ be the midrange of this sample, i.e. $S_n = \frac{1}{2} \left( \min_{1 \le i \le n} X_i + \max_{1 \le i \le n} X_i \right)$. Then the following theorem holds.

Theorem 3.2 The peakedness of $S_n$ with respect to $\theta$ is strictly increasing in $n$ for $n \ge 2$ and each $\varepsilon \in (0, 1)$.

Proof. Suppose, without loss of generality, that $\theta = 0$. Then the joint density of $\min_{1 \le i \le n} X_i$ and $\max_{1 \le i \le n} X_i$ at $(x, y)$, $-1 \le x \le y \le 1$, is, for $n \ge 2$, given by

$$\frac{n(n-1)}{4} \left( \frac{y - x}{2} \right)^{n-2}.$$

So, for $-1 \le t \le 0$,

$$P\left( \min_{1 \le i \le n} X_i + \max_{1 \le i \le n} X_i \le 2t \right) = \frac{n(n-1)}{2^n} \int_{-1}^{t} dx \int_{x}^{2t - x} (y - x)^{n-2} \, dy = \frac{(1 + t)^n}{2}$$

and, for $0 < t \le 1$,

$$P\left( \min_{1 \le i \le n} X_i + \max_{1 \le i \le n} X_i \le 2t \right) = 1 - P\left( \min_{1 \le i \le n} X_i + \max_{1 \le i \le n} X_i \le -2t \right) = 1 - \frac{(1 - t)^n}{2},$$

which gives, for $0 < t < 1$,

$$P(|S_n| < t) = 1 - (1 - t)^n,$$

from which the result follows immediately. $\Box$
We have not been able to prove or disprove increasing peakedness in n of the midrange for distributions other than the uniform.
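The closed form $P(|S_n - \theta| < t) = 1 - (1 - t)^n$ from the proof of Theorem 3.2 can be compared with a simulation. The following Python sketch is a rough check only; the function names, the seed and the number of replications are arbitrary choices made for this illustration.

```python
import random


def midrange_peakedness_uniform(n, t):
    """Closed form 1 - (1 - t)^n for the uniform case, 0 < t < 1."""
    return 1.0 - (1.0 - t) ** n


def simulate_midrange_peakedness(n, t, theta=0.0, reps=50_000, seed=0):
    """Monte Carlo estimate of P(|S_n - theta| < t) for uniform samples
    on [theta - 1, theta + 1]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.uniform(theta - 1.0, theta + 1.0) for _ in range(n)]
        midrange = 0.5 * (min(sample) + max(sample))
        if abs(midrange - theta) < t:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    for n in (2, 3, 5, 10):
        exact = midrange_peakedness_uniform(n, 0.2)
        approx = simulate_midrange_peakedness(n, 0.2)
        print(n, round(exact, 4), round(approx, 4))
```

Since $0 < 1 - t < 1$, the closed form is strictly increasing in $n$, which is the content of the theorem; the simulated values should agree with it up to Monte Carlo error.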
Remark

Note that, in quoting Proschan's (1965) results, we ask for the distribution function $F$ to have a density $f$ which is logconcave on the support of $F$, while Proschan asks for this density to be a Pólya frequency function of order 2 (PF$_2$). However, it was shown by Schoenberg (1951) that

$$f \text{ is PF}_2 \iff f \text{ is logconcave on the support of } F,$$

so the two conditions are equivalent.

Further note that Ibragimov (1956) showed that, for a distribution function $F$ with a density $f$,

$$f \text{ is strongly unimodal} \iff f \text{ is logconcave on the support of } F,$$

where a density is strongly unimodal if its convolution with all unimodal densities is unimodal. So, the condition of logconcavity of $f$ can also be replaced by the condition of its strong unimodality. For more results on Pólya frequency functions see e.g. Marshall and Olkin (1979, Chapter 18) and Karlin (1968).

ACKNOWLEDGEMENTS
The authors thank Chunsheng Ma for pointing out the Ma (1998) and the Karlin (1992) references.
References
Birnbaum, Z.W. (1948). On random variables with comparable peakedness. Ann. Math. Statist., 19, 76-81.
Dharmadhikari, S. and Joag-Dev, K. (1988). Unimodality, Convexity, and Applications, Academic Press.
Ibragimov, I.A. (1956). On the composition of unimodal distributions. Theor. Probab. Appl., 1, 255-260.
Karlin, S. (1968). Total Positivity, Vol. I, Stanford University Press.
Karlin, S. (1992). Stochastic comparisons between means and medians for i.i.d. random variables. The Art of Statistical Science, K.V. Mardia, Ed., John Wiley & Sons Ltd, p. 261-274.
Ma, C. (1998). On the peakedness of distributions of convex combinations. J. Statist. Plann. Inference, 56, 51-56.
Marshall, A.W. and Olkin, I. (1979). Inequalities: Theory of Majorization and its Applications, Academic Press.
Olkin, I. and Tong, Y. L. (1988). Peakedness in multivariate distributions. Statistical Decision Theory and Related Topics IV, S. S. Gupta and J. O. Berger, Eds. Vol. II, Springer-Verlag, New York, p. 373-383.
Proschan, F. (1965). Peakedness of distributions of convex combinations. Ann. Math. Statist., 36, 1703-1706.
Schoenberg, I. J. (1951). On Pólya frequency functions I. J. Anal. Math., 1, 331-374.

Department of Mathematics and Computing Science
Eindhoven University of Technology
P.O. Box 513
5600 MB
The Netherlands
e-mail: pvdlaan@win.tue.nl

Moerland 19
1151 BH Broek in Waterland
The Netherlands
e-mail: cve@xs4all.nl