Identification and informative sample size
Citation for published version (APA):
Tigelaar, H. H. (1981). Identification and informative sample size. Stichting Mathematisch Centrum.
Document status and date:
Published: 01/01/1981
Document Version:
Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)
IDENTIFICATION
AND
INFORMATIVE SAMPLE SIZE
PROEFSCHRIFT (doctoral thesis)

for obtaining the degree of doctor in the technical sciences at the Technische Hogeschool Eindhoven, on the authority of the rector magnificus, prof. ir. J. Erkelens, to be defended in public before a committee appointed by the college of deans on Tuesday 13 October 1981 at 16.00 hours
by

HARM HILGARD TIGELAAR

born in Meppel
1981
MATHEMATISCH CENTRUM AMSTERDAM
"
DOOR DE PROMOTOREN
Prof. dr. B.B. van der Genugten
en
.
/..
CONTENTS

INTRODUCTION 1

List of notations and abbreviations 5

I  A GENERAL APPROACH TO IDENTIFICATION
1.1  Classes of identifiable statistical statements 6
1.2  Conditional identifiability and informational independence 10
1.3  Identification and statistical procedures 18
1.4  Identification in statistical decision theory 19
1.5  Identification in Bayesian inference 22
1.6  Finite informative samples from stochastic processes and the problem of minimum informative sample size 25
1.7  Identification and prediction 28
1.8  Weak concepts of observational equivalence and strongly informative samples 31
1.9  Local identifiability 34

II  INFORMATIVE SAMPLE SIZES IN UNIVARIATE STATIONARY MODELS
2.1  Introduction and preliminary results 37
2.2  The minimum informative sample size for MA(q) processes 43
2.3  Informative samples from AR(p) processes 47
2.4  Informative samples from ARMA(p,q) processes 49
2.5  Informative samples for the spectral measure; predictability 54
2.6  Application to linear regression with MA-errors and stationary lagged explanatory variables 58

III  INFORMATIVE SAMPLE SIZES IN MULTIVARIATE STATIONARY MODELS
3.1  Introduction 61
3.2  The fundamental lemma and some special matrix theory 63
3.3  Informative samples from multivariate MA(q) processes 76
3.4  Informative samples from multivariate AR(p) processes 81
3.5  Informative samples from multivariate ARMA(p,q) processes; predictability 83

IV  DYNAMIC SIMULTANEOUS EQUATIONS WITH MA-ERRORS
4.1  Introduction 95
4.2  Informative samples for the reduced form 99
4.3  Informative samples for the structural form 109
4.4  The non homogeneous case 116

APPENDIX
A.1  Univariate spectral theory and stochastic difference equations 124
A.2  Some multivariate spectral theory 134

REFERENCES 139
INTRODUCTION

One of the fundamental problems in models of mathematical statistics is that of identifiability, that is, the occurrence of observationally equivalent parameter values. Two parameter values are called observationally equivalent if they correspond to the same probability distribution. Clearly, one cannot distinguish between two such values on the basis of observations, and any attempt to do so is a priori meaningless. For example, in a coin-tossing experiment it does not make sense to say something about the value of the coin (the unknown parameter) on the basis of the outcome head or tail (the observation). We shall refer to the definition of identifiability given above as the classical definition, in contrast with more recent concepts of identifiability (for a survey see SCHONFELD [31]). The problem is often to see whether there exist observationally equivalent parameter values. In spite of its fundamental nature, only little attention had been paid to this kind of problem until 1950, when KOOPMANS and REIERSOL ([23] and [27]) tackled the problem for relatively simple linear relationships. Further, BOSE introduced the concept of estimability ([3]), a concept closely related to identifiability, but less fundamental. However, when the models under consideration became more complicated, the identifiability problems became - from a mathematical point of view - more interesting, and often more difficult. Therefore it is not surprising that the most difficult identifiability problems arise in multivariate analysis, as for example in factor analysis and in econometric models (simultaneous equations). It is a remarkable fact that FISHER, who treated the latter identifiability problem in 1966 [13], defines observational equivalence (and thus identifiability) in a way that is only valuable for the specific model under consideration. This is a dangerous approach, as it may suggest that this definition can be generalized in a trivial way to models with lagged (dependent) variables (stochastic difference equations). This, however, is definitely not the case, and one of our goals is to make this point clear.
Another class of statistical problems where difficult identifiability problems arise is the statistical analysis of time series. Most important and widely used are stationary time series. The special identifiability problems were first recognized by HANNAN, who in a fundamental paper ([18]) in 1968 treats the mixed autoregressive moving average model (ARMA):

Σ_{k=0}^{p} A_k x_{t-k} = Σ_{j=0}^{q} B_j ε_{t-j} ,   t = 0, ±1, ... ,

where the m-variate random process {x_t} is observable. (Throughout this thesis random variables will be denoted by underlined (lower case) letters.)

In 1971 HANNAN, one of the leading authors in the field of identifiability in time series, also treated the multiple equation model with moving average errors ([17]). We also refer to DEISTLER, who treated models with stationary explanatory variables ([4] and [5]).
Although most authors refer to the fundamental paper of HANNAN, and consider the ARMA case as completely solved, there is one important but unrecognized problem left unsolved. To see what this problem is, it should be noted that instead of "probability distribution" in the classical definition it is more realistic to read: "probability distribution of the observed sample". Now the basic tool in the papers of HANNAN and DEISTLER is unique factorization of spectral densities, and in this approach one has to study the probability law of the whole (observable) process rather than that of some finite sample. Since in practice one always has a finite sample, the identifiability problems have, in fact, only partially been solved. As far as we know, problems of this kind are not treated in the literature. Only recently MARAVALL [25] proved to be aware of it in the summary of his thesis. MARAVALL studied local identifiability in dynamic shock error models, in contrast to the classical definition, which is sometimes called global identifiability.
We shall not pay much attention to local identifiability in this thesis. Furthermore we shall restrict our attention to stochastic processes in discrete time. Identifiability problems for processes in continuous time are hardly found in the literature; we refer to WESTCOTT [36].
Fairly general approaches to the theory of identifiability have been made by SCHONFELD [31] and more recently by van der GENUGTEN [14]. Following SCHONFELD one can easily get the impression that there is a close connection between identification and estimation, and therefore that identifiability problems are part of estimation theory. This, however, is rather misleading. As van der GENUGTEN points out, identifiability problems may arise in other statistical problems such as hypothesis testing.
In Chapter I we shall present a general approach to identifiability, which enables us to recognize identifiability problems in all kinds of statistical problems, in particular in statistical prediction problems.
Although we shall not be concerned with Bayesian inference and statistical decision theory, we shall make one excursion into those fields. KADANE [22] says:

"One general question unresolved in this literature is, whether Bayesian theory requires a different definition of identification from the classical one".

Or ROTHENBERG ([29] p. 14):

"We leave unanswered the question of an appropriate Bayesian definition of identification".

MORALES ([26] p. 20) reports:

"The concept of identification in a Bayesian context is not altogether clear. We shall adopt the view of considering a structure 'identified' if the posterior density of the parameters of the model is not 'flat' on a subspace of the parameter space. This point of view may not be entirely satisfactory".
Except for a remark made by LINDLEY ([24] p. 46, footnote 34), the only author dealing with this problem is KADANE, who presented a Bayesian approach in 1975 using the classical definition. As this in our opinion is not quite satisfactory, we present an alternative approach in § 1.5. KADANE also pays some attention to the role of identification in statistical decision theory, a topic hardly treated in the literature. However, the question what identifiability really means in a decision-theoretic setting remains unanswered. A few ideas are presented in § 1.4.
In Chapter II univariate stationary models are treated, and in Chapter III the corresponding multivariate models. We treat them separately, not only for the sake of clarity but also because most multivariate problems are essentially more difficult than the corresponding univariate ones, and the 'obvious' generalization may be false. The results of these two chapters may have some interest outside the probabilistic setting, as they can be seen as results in the theory of matrices with rational functions of a complex variable as elements.
In Chapter IV we shall deal with dynamic simultaneous equations with moving average errors, using results of Chapter III.
LIST OF NOTATIONS AND ABBREVIATIONS

∅            empty set
V^c          complement of the set V
1_V          indicator function of the set V
ℝ            set of real numbers
ℝ_+          set of positive real numbers
ℂ            set of complex numbers
ℂ(m)         set of complex m × m matrices
ℂ_+(m)       set of hermitian positive definite m × m matrices
ℂ(m)^k       k-fold cartesian product of ℂ(m) with itself
z̄            complex conjugate of z
I_m          m × m unit matrix
A'           transpose of the matrix A
A*           complex conjugate transpose of A
tr A         trace of the square matrix A
r[A]         rank of the matrix A
A ≥ 0        A ∈ ℂ(m) and A is positive semidefinite
ker A        nullspace of the matrix A
<A>          linear space spanned by the columns of A
L^⊥          orthogonal complement of the linear subspace L
x, y, z_t, ...  random variables or vectors (underlined)
E{·}         expectation
V{·}         covariance matrix
i.i.d.       independently identically distributed
l.i.m.       limit in the mean, i.e. l.i.m. x_n = x iff lim_{n→∞} E{‖x_n − x‖²} = 0
‖·‖          euclidean norm of a vector or matrix
δ_t          Kronecker delta, i.e. δ_0 = 1 and δ_t = 0 for t ≠ 0
A := B       A is defined by B
iff          if and only if
□            end of proof
CHAPTER I

A GENERAL APPROACH TO IDENTIFICATION
1.1 CLASSES OF IDENTIFIABLE STATISTICAL STATEMENTS

In (non-sequential) statistical inference the observational material (the sample) is considered to be a realization of some random vector or process x that takes its values in a measurable space (X, ℬ) (the sample space). The only thing the statistician knows about the true distribution of x is that it belongs to a given class 𝒫 of probability distributions on (X, ℬ). In most statistical problems the class 𝒫 admits a natural and simple parametric representation. More precisely, a mapping P is given from a known parameter space Θ into a given class of probability distributions on (X, ℬ). The range of this mapping is 𝒫, and if P_θ denotes the image of θ ∈ Θ under P, then we can shortly write 𝒫 = {P_θ | θ ∈ Θ}. The corresponding statistical problem will be denoted by the triple (x, 𝒫, Θ). The goal of a statistician is to know something more about the true parameter value than that it belongs to Θ. Thus it is natural to consider subsets of Θ and to identify them with statistical statements. This leads to the following definition.

DEFINITION 1.1.1  A statistical statement is a subset Θ_0 ⊂ Θ.

REMARK. A possible interpretation is that a statement is true iff the unknown parameter value belongs to it.
The parametric formulation is very attractive because of the direct interpretation of the parameter. However, it can introduce the problem of identification. Suppose there exist θ_0 ∈ Θ_0 and θ_1 ∈ Θ_0^c with P_{θ_0} = P_{θ_1}. We say that θ_0 and θ_1 are observationally equivalent if P_{θ_0} = P_{θ_1}. The statistician should then refuse to make the statement Θ_0 (or Θ_0^c), because it discriminates between the observationally equivalent values θ_0 and θ_1, while Θ_0 indicates that θ_0 is true and θ_1 that it is false. Therefore a natural concept in statistical inference is the identifiability of statements.
DEFINITION 1.1.2  The statistical statement Θ_0 is called identifiable w.r.t. 𝒫, or equivalently x is said to be informative for Θ_0, if for all θ_1, θ_2 ∈ Θ we have the implication

θ_1 ∈ Θ_0 , θ_2 ∈ Θ_0^c  ⟹  P_{θ_1} ≠ P_{θ_2}.

It should be noted that observational equivalence is an equivalence relation on Θ and therefore induces a dissection of Θ into equivalence classes, called observational equivalence classes. Thus, if we accept the axiom of choice, it is formally always possible to avoid identifiability problems by defining a new parameter space consisting of one element out of each equivalence class. Of course such a reduction may be difficult to perform in practice, but that would not be a fundamental objection. This reduction is in general not reasonable, because it may destroy the simple and natural form of the parameter space, in which case the parameter loses its natural interpretation.

From a mathematical point of view it is interesting to consider classes of statements.
THEOREM 1.1.3  Let 𝒥 be the class of all identifiable statements. Then we have

a) Θ_0 ∈ 𝒥 ⟹ Θ_0^c ∈ 𝒥;
b) Θ_ν ∈ 𝒥, ν ∈ N ⟹ ∩_{ν∈N} Θ_ν ∈ 𝒥 for an arbitrary index set N.

The simple proof is omitted.

REMARK. a) and b) imply

c) Θ_ν ∈ 𝒥, ν ∈ N ⟹ ∪_{ν∈N} Θ_ν ∈ 𝒥;
d) ∅ ∈ 𝒥 and Θ ∈ 𝒥.

In most statistical problems the statistician is not interested in all statements but merely in a certain class of statements.
If {Θ_ν}_{ν∈N} is a class of statements the statistician is interested in, which means that he is willing to say whether Θ_ν is true or false for all ν ∈ N, then he should also be interested in all statements that can be formed from them by taking complements and/or intersections. Thus the statistician is in fact interested in a class of statements with the properties a) and b). Therefore we define

DEFINITION 1.1.4  A class of statements with the properties a) and b) is called an informational class.
REMARK 1. Statements of an informational class are not necessarily identifiable.

REMARK 2. If 𝒥_0 is the smallest informational class that contains a given set of statements {Θ_ν}_{ν∈N}, then 𝒥_0 is said to be generated by {Θ_ν}_{ν∈N}.
Two simple examples will illustrate the ideas.
EXAMPLE 1.1.5  If the statistician is interested in point estimation he will consider all one-point subsets (singletons). The smallest informational class that contains all singletons is the class of all statements.
EXAMPLE 1.1.6  If the statistician is dealing with a hypothesis testing problem, he will consider only two complementary subsets Θ_0 and Θ_0^c. The smallest informational class that contains Θ_0 (and Θ_0^c) is {Θ_0, Θ_0^c, ∅, Θ} and will be denoted by 𝒥_{Θ_0}.
Both examples are special cases of the more general situation where the statistician is primarily interested in the value taken by a given mapping φ : Θ → Λ from Θ into some space Λ. In such cases attention is restricted to statements that can be formulated in terms of φ. Formally:

DEFINITION 1.1.7  A statistical statement Θ_0 is said to be in terms of φ : Θ → Λ if there exists a subset Λ_0 ⊂ Λ such that Θ_0 = φ^{-1}(Λ_0).

LEMMA 1.1.8  The class of all statements in terms of φ is an informational class.

The proof is very simple and will be omitted. The informational class is said to be generated by φ and will be denoted by 𝒥_φ.
EXAMPLE 1.1.9 (see also examples 1.1.5 and 1.1.6)
a) If φ : Θ → Θ is the identity map, then 𝒥_φ is the class of all subsets.
b) If φ = 1_{Θ_0} is the indicator function of a subset Θ_0 of Θ, then 𝒥_φ = {Θ_0, Θ_0^c, ∅, Θ} = 𝒥_{Θ_0}.
DEFINITION 1.1.10  The mapping φ is called identifiable w.r.t. 𝒫, or equivalently x is said to be informative for φ, if every statement in terms of φ is identifiable.

REMARK. It follows from the remark on theorem 1.1.3 that for φ to be identifiable it is sufficient that φ^{-1}({λ}) is identifiable for all λ ∈ Λ.

EXAMPLE 1.1.11 (see also example 1.1.9). The statement Θ_0 is identifiable iff its indicator function 1_{Θ_0} is identifiable.
The following lemma shows that a mapping φ is identifiable iff it is constant on observational equivalence classes.

LEMMA 1.1.12  x is informative for φ : Θ → Λ iff for all θ_1, θ_2 ∈ Θ the implication φ(θ_1) ≠ φ(θ_2) ⟹ P_{θ_1} ≠ P_{θ_2} holds.

PROOF. Let x be informative for φ and φ(θ_1) ≠ φ(θ_2). Then the statements φ^{-1}({φ(θ_1)}) and φ^{-1}({φ(θ_2)}) are identifiable, since φ is, and φ^{-1}({φ(θ_1)}) ∩ φ^{-1}({φ(θ_2)}) = ∅ since φ(θ_1) ≠ φ(θ_2). Hence P_{θ_1} ≠ P_{θ_2}.
Conversely, let the implication hold and let λ ∈ Λ be such that φ^{-1}({λ}) ≠ ∅ and φ^{-1}({λ}) ≠ Θ. It follows immediately that φ^{-1}({λ}) is identifiable, and since λ was arbitrary the result follows from the remark following def. 1.1.10. □

REMARK. From the lemma it follows that x is informative for φ iff there exists a mapping α : 𝒫 → Λ such that φ = α ∘ P. In particular it follows that x is informative for its moments.

So far the consideration of classes of statements does not give new results. However, it turns out to be a fruitful basis for the presentation of new ideas and for extending the theory to Bayesian statistics and statistical decision theory. This will be done in the next sections.
1.2 CONDITIONAL IDENTIFIABILITY AND INFORMATIONAL INDEPENDENCE

Let Θ_0 be an arbitrary statement. Then we shall denote the dissection {Θ_0, Θ_0^c} of Θ by 𝒟_{Θ_0}. More generally we consider arbitrary dissections {D_ν}_{ν∈N} of Θ. A mapping φ : Θ → Λ induces a dissection 𝒟_φ = {D_λ}_{λ∈Λ} of Θ, where D_λ := φ^{-1}({λ}). (Every dissection can be generated in this way by a function.) We call two parameter values θ_1 and θ_2 𝒟-equivalent if they belong to the same D_ν, and write θ_1 ~_𝒟 θ_2. It is often easy to see that for some 𝒟, 𝒟-equivalent values of θ are not observationally equivalent. Therefore we define

DEFINITION 1.2.1  The statement Θ_1 is said to be identifiable conditional on the dissection 𝒟, or equivalently x is called informative for Θ_1 conditional on 𝒟, if for all θ_1, θ_2 ∈ Θ the following implication holds:

θ_1 ∈ Θ_1 , θ_2 ∈ Θ_1^c , θ_1 ~_𝒟 θ_2  ⟹  P_{θ_1} ≠ P_{θ_2}.

The mapping φ_1 : Θ → Λ is said to be identifiable conditional on 𝒟 if every statement in terms of φ_1 is. The statement Θ_1 or the mapping φ_1 is called identifiable conditional on φ if 𝒟 = 𝒟_φ.
In the same way as lemma 1.1.8 we have

LEMMA 1.2.2  The mapping φ_1 : Θ → Λ is identifiable conditional on φ_2 : Θ → Ω iff for all θ_1, θ_2 ∈ Θ the following implication holds:

φ_1(θ_1) ≠ φ_1(θ_2) , φ_2(θ_1) = φ_2(θ_2)  ⟹  P_{θ_1} ≠ P_{θ_2}.

REMARK. It follows from this lemma that x is informative for φ_1 : Θ → Λ conditional on φ_2 : Θ → Ω iff there exists a mapping α : Ω × 𝒫 → Λ such that φ_1 = α ∘ ψ, where ψ : Θ → Ω × 𝒫 is defined by ψ(θ) = (φ_2(θ), P_θ), θ ∈ Θ.
THEOREM 1.2.3 (Conditional identification theorem). If x is informative for φ_1 conditional on φ_2, and x is informative for φ_2, then x is informative for φ_1.

PROOF. Suppose φ_1(θ_1) ≠ φ_1(θ_2). If φ_2(θ_1) = φ_2(θ_2) we have P_{θ_1} ≠ P_{θ_2} by lemma 1.2.2. If φ_2(θ_1) ≠ φ_2(θ_2) we have P_{θ_1} ≠ P_{θ_2} since x is informative for φ_2. □
EXAMPLE 1.2.4  Consider the following simple model

x = α(μ + ε),

where E{ε} = 0, V{ε} = 1, and where α ∈ A ⊂ ℝ_+ and μ ∈ M ⊂ ℝ are unknown constants. The random variable ε is unobservable and x is observable. If ζ ∈ Z is a parameter that characterizes the distribution of ε, then we may put θ := (α, μ, ζ), and the natural parameter space is Θ = A × M × Z. Let the functions φ_1 and φ_2 be defined by φ_1(θ) = α, θ ∈ Θ, and φ_2(θ) = μ, θ ∈ Θ. Since A ⊂ ℝ_+, different α-values correspond to different variances of x and therefore to different distributions of x. Hence x is informative for φ_1. On the other hand, if α is held fixed, different μ-values correspond to different expectations of x, and thus x is informative for φ_2 conditional on φ_1. By the conditional identification theorem it follows that x is informative for φ_2. Note that μ_1 ≠ μ_2 alone does not imply different expectations for x.
Intuitively one could expect that if the mappings φ_1 and φ_2 are in some sense 'independent', conditional identifiability should imply identifiability. We now develop such a concept of independence.

Let Θ_0 ⊂ Θ be a statistical statement. Then there surely exist identifiable statements that have Θ_0 as a subset (e.g. Θ), and it is easily seen that there exists a uniquely determined smallest identifiable statement with this property (take the union of all observational equivalence classes that have nonempty intersection with Θ_0).
DEFINITION 1.2.5  The identifiable hull 𝒦(Θ_0) of Θ_0 is the smallest identifiable statement that has Θ_0 as a subset, i.e. 𝒦(Θ_0) = {θ ∈ Θ | P_θ = P_{θ_0} for some θ_0 ∈ Θ_0}. Clearly, 𝒦(Θ_0) = Θ_0 iff Θ_0 is identifiable.

REMARK. Since 𝒦(Θ_0) ∩ 𝒦(Θ_1) is an identifiable statement as intersection of identifiable statements, we always have

𝒦(Θ_0 ∩ Θ_1) ⊂ 𝒦(Θ_0) ∩ 𝒦(Θ_1).

DEFINITION 1.2.6  The informational classes 𝒥_0 and 𝒥_1 are called informationally independent if for all Θ_0 ∈ 𝒥_0 and Θ_1 ∈ 𝒥_1 we have 𝒦(Θ_0 ∩ Θ_1) = 𝒦(Θ_0) ∩ 𝒦(Θ_1). Two mappings φ_1 and φ_2 are informationally independent if the informational classes 𝒥_{φ_1} and 𝒥_{φ_2} are. Two statements Θ_0 and Θ_1 are informationally independent if 𝒥_{Θ_0} and 𝒥_{Θ_1} are.
Before we can establish the relation between conditional identifiability and informational independence we need the following lemma.
LEMMA 1.2.7  Let x be informative for φ_2 conditional on φ_1. If Θ_0 ∈ 𝒥_{φ_2}, θ_1 ∈ Θ_0 and Θ_1 := φ_1^{-1}({φ_1(θ_1)}), then

𝒦(Θ_0 ∩ Θ_1) ∩ Θ_1 = Θ_0 ∩ Θ_1.

PROOF. One way (⊃) being trivial, we only have to prove 𝒦(Θ_0 ∩ Θ_1) ∩ Θ_1 ⊂ Θ_0 ∩ Θ_1. If Θ_0 ∩ Θ_1 = ∅ this is trivial, so suppose Θ_0 ∩ Θ_1 ≠ ∅. Let θ_0 ∈ 𝒦(Θ_0 ∩ Θ_1) ∩ Θ_1 be arbitrary. Then θ_0 ∈ Θ_1, and thus we have to prove θ_0 ∈ Θ_0. Suppose θ_0 ∈ Θ_0^c. Then by the definition of 𝒦(Θ_0 ∩ Θ_1) there exists θ_2 ∈ Θ_0 ∩ Θ_1 such that P_{θ_0} = P_{θ_2}. We also have θ_0 ≠ θ_2, and φ_1(θ_0) = φ_1(θ_2) = φ_1(θ_1) since θ_0, θ_2 ∈ Θ_1. But then we have P_{θ_0} ≠ P_{θ_2}, since φ_2 (and thus Θ_0) is identifiable conditional on φ_1. Thus we have a contradiction and the lemma is proved. □

THEOREM 1.2.8  If x is informative for φ_2 conditional on φ_1, and φ_1 and φ_2 are informationally independent, then x is informative for φ_2.

PROOF. Let Θ_0 ∈ 𝒥_{φ_2} be arbitrary and θ_1 ∈ Θ_0. If θ_2 ∈ Θ_0^c then we have to prove that P_{θ_1} ≠ P_{θ_2}. Put

Θ_1 := φ_1^{-1}({φ_1(θ_1)}).

Then θ_1 ∈ Θ_1, and since x is informative for φ_2 conditional on φ_1 we have by lemma 1.2.7

(1.2.1)  𝒦(Θ_0 ∩ Θ_1) ∩ Θ_1 = Θ_0 ∩ Θ_1.

By informational independence we also have

(1.2.2)  𝒦(Θ_0 ∩ Θ_1) = 𝒦(Θ_0) ∩ 𝒦(Θ_1),

and since Θ_1 ⊂ 𝒦(Θ_1), (1.2.1) and (1.2.2) imply

𝒦(Θ_0) ∩ Θ_1 = Θ_0 ∩ Θ_1.

Thus Θ_0 ∩ Θ_1 is an identifiable statement with θ_1 ∈ Θ_0 ∩ Θ_1 and θ_2 ∈ (Θ_0 ∩ Θ_1)^c. Hence P_{θ_1} ≠ P_{θ_2} and the theorem is proved. □
We shall consider one important case more closely. Suppose Θ is of the form Θ ⊂ U × V, and let θ = (φ_1(θ), φ_2(θ)), θ ∈ Θ, where u := φ_1(θ) ∈ U and v := φ_2(θ) ∈ V. Thus φ_1(θ) and φ_2(θ) are projections of θ on U and V, respectively, and the question arises when φ_1 and φ_2 are informationally independent. We have

THEOREM 1.2.9  The projections φ_1 and φ_2 are informationally independent iff all classes of observationally equivalent values of θ are of the form U_0 × V_0, U_0 ⊂ U, V_0 ⊂ V.

PROOF. (If) Let U_0 and V_0 be two arbitrary subsets of U and V respectively. Then we have to prove

𝒦(U_0 × V_0) = 𝒦(U_0 × V) ∩ 𝒦(U × V_0).

[Figure: the rectangle U × V with the hulls 𝒦(U_0 × V) and 𝒦(U × V_0) and the points θ_1, θ_2.]

One inclusion (⊂) being trivial, we only have to prove the other (⊃). Let θ_0 ∈ 𝒦(U_0 × V) ∩ 𝒦(U × V_0). Because θ_0 ∈ 𝒦(U × V_0) there exists θ_1 ∈ U × V_0 with P_{θ_0} = P_{θ_1}, and since the observational equivalence class to which θ_0 belongs is of the form U_1 × V_1, θ_1 can be chosen such that φ_1(θ_1) = φ_1(θ_0) and θ_1 ∈ U × V_0. In the same way there exists θ_2 ∈ U_0 × V_0 such that P_{θ_1} = P_{θ_2} and φ_2(θ_2) = φ_2(θ_1). But then we have P_{θ_0} = P_{θ_2}, and so θ_0 ∈ 𝒦(U_0 × V_0).
(Only if) Let φ_1 and φ_2 be informationally independent and suppose there exists an equivalence class S ⊂ Θ which is not the cartesian product of subsets of U and V. Then there exist θ_1 = (u_1, v_1) ∈ S and θ_2 = (u_2, v_2) ∈ S such that either θ_3 := (u_2, v_1) ∉ S or θ_4 := (u_1, v_2) ∉ S. Suppose θ_3 ∉ S (see figure). Choose U_0 := {u_2} and V_0 := {v_1}. Then U_0 × V_0 = {θ_3}, and so 𝒦(U_0 × V_0) ∩ S = ∅ since S is an equivalence class and θ_3 ∉ S. We also have 𝒦(U_0 × V) ⊃ S since θ_2 ∈ (U_0 × V) ∩ S. Similarly 𝒦(U × V_0) ⊃ S. Hence S ⊂ 𝒦(U_0 × V) ∩ 𝒦(U × V_0). Since φ_1 and φ_2 are informationally independent we have 𝒦(U_0 × V) ∩ 𝒦(U × V_0) = 𝒦(U_0 × V_0), and so S ⊂ 𝒦(U_0 × V_0). This contradicts 𝒦(U_0 × V_0) ∩ S = ∅ and proves the theorem. □

EXAMPLE 1.2.10  Consider the standard univariate linear regression model

y = Xβ + ε ,  E{ε} = 0,

where y is the n-vector of observations, X is a known n × k matrix of k explanatory variables, β is a k-vector of unknown regression coefficients and ε is an n-vector of (unobservable) errors. If ψ ∈ V is a parameter that characterizes the distribution of ε and β ∈ U ⊂ ℝ^k, we may put θ := (β, ψ), and the natural choice for Θ is U × V. Note that the distribution P_θ of y depends on β through E{y} = Xβ. Thus if θ_1 = (β_1, ψ_1) and θ_2 = (β_2, ψ_2) are observationally equivalent we must have Xβ_1 = Xβ_2. But then (β_1, ψ_1) and (β_2, ψ_1) are observationally equivalent, and also (β_1, ψ_2) and (β_2, ψ_2). Thus the observational equivalence classes are of the form U_0 × V_0, U_0 ⊂ U, V_0 ⊂ V, and so β and ψ are informationally independent by theorem 1.2.9. It follows from theorem 1.2.8 that in order to investigate identifiability of ψ we may consider y − Xβ as observable, and for identification of β we may consider y − ε = Xβ as observable. The latter implies the well-known results that if Θ = ℝ^k × V, n ≥ k, then a necessary and sufficient condition for identifiability of β is r[X] = k, and that for identifiability of d'β, d ∈ ℝ^k, a necessary and sufficient condition is d ∈ <X'>.
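The two conditions at the end of example 1.2.10 are easy to check numerically. The sketch below uses a hypothetical design matrix X (not from the thesis); membership d ∈ <X'> is tested by the least-squares residual of d on the columns of X', one standard way of testing column-space membership.

```python
import numpy as np

# Hypothetical design matrix X (n = 4, k = 3) whose third column is
# the sum of the first two, so r[X] = 2 < k.
X = np.array([[1., 0., 1.],
              [1., 1., 2.],
              [1., 2., 3.],
              [1., 3., 4.]])
n, k = X.shape

# beta identifiable iff r[X] = k:
print(np.linalg.matrix_rank(X) == k)             # False: rank 2 < 3

def in_row_space(d, X, tol=1e-10):
    """Test d in <X'> (the row space of X): the least-squares
    residual of d against the columns of X' must vanish."""
    c, *_ = np.linalg.lstsq(X.T, d, rcond=None)
    return np.linalg.norm(X.T @ c - d) < tol

print(in_row_space(np.array([0., 1., 1.]), X))   # True:  X beta fixes b2 + b3
print(in_row_space(np.array([0., 1., 0.]), X))   # False: b2 alone is not fixed
```

Here the kernel of X is spanned by (−1, −1, 1), so exactly the linear functions d'β with d orthogonal to that vector are identifiable.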
EXAMPLE 1.2.11 (Errors in variables model)  Let the variables w_t and v_t be related through

w_t = α + β v_t ,  t = 1, 2, ...

Suppose we can only observe w_t and v_t with observational error, i.e. at t = 1, ..., n we observe

y_t = w_t + η_t ,  x_t = v_t + ξ_t ,

where E{η_t} = E{ξ_t} = 0. Let ψ ∈ V be a parameter characterizing the distribution of (ξ_t, η_t), t = 1, 2, ..., n. Put θ := (α, β, v_1, ..., v_n, ψ). Obviously, if Θ is such that v_t is allowed to be constant over time, (α, β) is in general not identifiable. We shall show that if Θ is a subset of ℝ^{n+2} × V such that v_t is not constant over time, then (α, β) is identifiable. The model can be put in the form of a linear regression model with unknown regressor v_t:

y_t = α + β v_t + η_t ,  t = 1, 2, ..., n.

Since v_t is not constant over time, it follows from example 1.2.10 that (α, β) is identifiable conditional on (v_1, ..., v_n). However, (v_1, ..., v_n) is identifiable since it is the expectation of the observable vector (x_1, ..., x_n). Thus it follows by theorem 1.2.3 that (α, β) is identifiable.
1.3 IDENTIFICATION AND STATISTICAL PROCEDURES

In this section we present some new ideas on identification that, roughly speaking, tell the statistician what statistical procedures should be forbidden in the presence of observationally equivalent θ's.
Let 𝒥_0 denote the informational class of statements the statistician is interested in. Note that since Θ ∈ 𝒥_0 there is no value of θ that is excluded a priori by the statistician from being the true value. Although not all statements in 𝒥_0 need to be identifiable, there exist identifiable statements in 𝒥_0 (e.g. Θ), and there exists a largest informational class 𝒥_0* of identifiable statements in 𝒥_0. Suppose 𝒥_0 is endowed with a σ-field 𝒮 of subclasses such that 𝒥_0* ∈ 𝒮.

DEFINITION 1.3.1  A statistical procedure is a measurable mapping d : (X, ℬ) → (𝒥_0, 𝒮), with the interpretation that if x is observed, then the statistician makes the statement d(x).

Before a statistician answers the question what 'good' statistical procedures are, he should answer the question what procedures he will consider as a priori meaningless. Intuitively it seems reasonable to ignore procedures that can produce unidentifiable statements with positive probability for some θ ∈ Θ. This motivates the following definition.

DEFINITION 1.3.2  A statistical procedure d is called ignorable if P_θ{d(x) ∈ 𝒥_0*} < 1 for some θ ∈ Θ.

Thus the statistician will only consider statistical procedures d with P_θ{d(x) ∈ 𝒥_0*} = 1, θ ∈ Θ.

EXAMPLE 1.3.3  Consider the case where 𝒥_0 is the class of all statements, i.e. the statistician is interested in point estimates of θ. Then definition 1.3.2 implies that a non-ignorable estimator takes its values almost surely in one-point observational equivalence classes or, equivalently, with probability one (∀θ) it does not discriminate between observationally equivalent values of θ. In the case that all observational equivalence classes contain more than one element (or: there is no identifiable singleton) the statistician should refuse to produce point estimates. It is important to mention, however, that this does not imply that every other statistical statement, such as region estimates or acceptance of a hypothesis, is a priori meaningless.
1.4 IDENTIFIC~TION IN STATISTICAL DECISION THEORY
In this section we extend the statistical problem
...
(~
,
:f>
,
a)
to the statistical decision problem (~ , 'f> ,a,
'
A, L) ,where -A is the space of actions (or strategies) for the statistician.
-L is a mapping from
a
x A into a space C, called the space of consequences. In most cases L takes real values and is then called the Loss function or pay-off function, and L (0 , a) is interpreted as the penalty for the statistician for taking action a if 0 is the true parameter.Let La(e) :
=
L (0 , a) , e Ea
denote the section of L at a E A. Thus L is a mapping froma
into the space of consequen~a ces C.
When the statistician has to make up his mind whether he shall take action a or not, he will base his decision on the consequence L_a(θ). This is, however, impossible if L_a(θ) is unknown, but he may hope that, if x is informative for L_a, he can make a choice which is not a pure gamble.
Let A_0 denote the set of actions a such that x is informative for L_a. Suppose further that A is endowed with a σ-field 𝒜 of subsets of A such that A_0 ∈ 𝒜. A measurable mapping

d : (X, 𝔅) → (A, 𝒜),

with the interpretation that action d(x) is taken if x is observed, is called a decision rule.
DEFINITION 1.4.1 A decision rule d is called ignorable if P_θ{d(x) ∈ A_0} < 1 for some θ ∈ Θ.
Thus the statistician should only consider decision rules d with

P_θ{d(x) ∈ A_0} = P_θ{L_{d(x)} identifiable} = 1,   θ ∈ Θ.
In fact, any statistical problem can be considered as a special case of a statistical decision problem. To see this we take A = J_0, the informational class of statements the statistician is interested in, and for L a mapping into a set consisting of two consequences c_0 and c_1 (c_0 ≠ c_1), to be interpreted as 'true' and 'false' respectively:

(1.4.1)   L(θ, a) := c_0 if θ ∈ a,  c_1 if θ ∈ a^c,   θ ∈ Θ, a ∈ A.
A decision rule is now a statistical procedure, and the following theorem shows that then definitions 1.3.2 and 1.4.1 are equivalent.
THEOREM 1.4.2 If A is a class of statements and L is given by (1.4.1), then a decision rule d is ignorable iff it is ignorable as a statistical procedure.
PROOF. The theorem follows if we can prove: L_{d(x)} identifiable a.s. iff d(x) identifiable a.s. Let a ∈ A be arbitrary and suppose that L_a is identifiable. Then L_a^{-1}({c_0}) = a is identifiable. On the other hand, suppose a is identifiable. Then also L_a^{-1}({c_1}) is identifiable, being the complement of a = L_a^{-1}({c_0}). Thus, by the remark following definition 1.1.10, L_a is identifiable. □
If all actions are statistical statements, then a decision rule is a statistical procedure, and the question arises for which loss functions the definitions of ignorability are equivalent for all rules d. Thus we are looking for functions L_a : Θ → C such that L_a is identifiable iff a is identifiable. Necessary and sufficient for L_a to have this property is

L_a(θ1) = L_a(θ2)   ⟺   (θ1, θ2 ∈ a  ∨  θ1, θ2 ∈ a^c),   θ1, θ2 ∈ Θ.

Thus L_a must be constant on a and on a^c. Hence L is of the form

(1.4.2)   L(θ, a) = c_0(a) if θ ∈ a,  c_1(a) if θ ∈ a^c.
EXAMPLE 1.4.3 (Estimation) Let Θ = A = ℝ and consider the usual quadratic loss function

L(θ, a) := (θ − a)²,   θ ∈ Θ, a ∈ A.

Since L_a is not constant for θ ≠ a, this loss function is clearly not of the form (1.4.2).
EXAMPLE 1.4.4 (Hypothesis testing) Let Θ = ℝ^k and consider the problem of testing H_0 : θ ∈ Θ_0 against H_1 : θ ∈ Θ_0^c, where Θ_0 is a (measurable) subset of Θ. Let A := {Θ_0, Θ_0^c} and take

L(θ, a) := 1 − 1_a(θ),   a ∈ A, θ ∈ Θ.

Obviously it is of the form (1.4.2). Note that a decision rule for this problem is not ignorable iff the statement Θ_0 is identifiable.
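The contrast between examples 1.4.3 and 1.4.4 can be checked mechanically on a finite parameter set. The sketch below (an illustration with assumed names, not part of the text) tests whether a loss function is constant on each statement a and on its complement, i.e. whether it has the form (1.4.2):

```python
def has_form_142(L, Theta, actions):
    """True iff L(., a) is constant on a and on its complement for every
    action a, i.e. iff L has the form (1.4.2)."""
    for a in actions:
        inside = {L(theta, a) for theta in Theta if theta in a}
        outside = {L(theta, a) for theta in Theta if theta not in a}
        if len(inside) > 1 or len(outside) > 1:
            return False
    return True

Theta = [-2.0, -1.0, 0.0, 1.0, 2.0]
H0 = frozenset({-2.0, -1.0, 0.0})                   # testing theta <= 0
tests = [H0, frozenset(Theta) - H0]                 # A = {Theta_0, Theta_0^c}
points = [frozenset({t}) for t in Theta]            # point estimation

zero_one = lambda theta, a: 0 if theta in a else 1          # as in example 1.4.4
quadratic = lambda theta, a: (theta - next(iter(a))) ** 2   # as in example 1.4.3

print(has_form_142(zero_one, Theta, tests))    # True: form (1.4.2)
print(has_form_142(quadratic, Theta, points))  # False: not of form (1.4.2)
```

The 0-1 loss passes the test, while the quadratic loss fails because it varies over the complement of each singleton action.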
1.5 IDENTIFICATION IN BAYESIAN INFERENCE
Let 𝒟 be a σ-field of subsets of Θ, and τ some measure on (Θ, 𝒟), called the prior measure. Following KADANE [22] we shall interpret τ as an opinion on θ adopted by the statistician before x is observed. It is clear that subsets of Θ of τ-measure zero cannot play a role anymore, and that the classical concept of identification becomes rather meaningless. Furthermore, when the statistician has the opinion τ on (Θ, 𝒟), this implies that he will consider only 𝒟-measurable statistical statements. It should be noted that there exist identifiable statements in 𝒟 (e.g. Θ), and a good Bayesian definition of identifiability is of course such that these statements are also identifiable in the Bayesian sense. This leads to the following definition.
DEFINITION 1.5.1 A statistical statement Θ_0 ∈ 𝒟 is called identifiable w.r.t. τ, or equivalently x is said to be informative for Θ_0 w.r.t. τ, if for some N ∈ 𝒟 with τ{N} = 0 and for all θ1, θ2 ∈ Θ − N we have the implication:

θ1 ∈ Θ_0,  θ2 ∈ Θ_0^c   ⟹   P_{θ1} ≠ P_{θ2}.
REMARK. By choosing for τ a measure that assigns positive mass to all points of Θ, identifiability w.r.t. τ becomes equivalent to identifiability. However, in most practical situations this would imply that the measure τ is not σ-finite. Since such models are analytically not very attractive, the statistician will prefer σ-finite prior measures.
Obviously all statements of τ-measure 0 and their complements are identifiable w.r.t. τ, and the analogue of theorem 1.1.3 is that the class of all statements which are identifiable w.r.t. τ is a sub-σ-field of 𝒟.
Since functions of θ are now functions on the measurable space (Θ, 𝒟), it will be clear that the only functions φ the statistician can be interested in are measurable functions. Let (A, ℒ) be some measurable space, where ℒ is a σ-field of subsets of A.
DEFINITION 1.5.2 The measurable function φ : (Θ, 𝒟) → (A, ℒ) is called identifiable w.r.t. τ, or equivalently x is said to be informative for φ w.r.t. τ, if x is informative for φ^{-1}(A_0) w.r.t. τ for all A_0 ∈ ℒ.
Although definition 1.5.2 is the obvious generalization of def. 1.1.10, it gives rise to a remarkable difference between classical and Bayesian theory, since the obvious analogue of lemma 1.1.12 does not hold. More precisely we have
LEMMA 1.5.3 If there exists a set N ∈ 𝒟 with τ{N} = 0 such that for all θ1, θ2 ∈ Θ − N we have the implication φ(θ1) ≠ φ(θ2) ⟹ P_{θ1} ≠ P_{θ2}, then x is informative for φ w.r.t. τ.

PROOF. Let A_0 ∈ ℒ be arbitrary with τ{φ^{-1}(A_0)} > 0 and τ{φ^{-1}(A_0^c)} > 0. (If φ^{-1}(A_0) or φ^{-1}(A_0^c) has τ-measure zero, nothing remains to prove.) Let θ1 ∈ φ^{-1}(A_0) − N and θ2 ∈ φ^{-1}(A_0^c) − N. Then φ(θ1) ≠ φ(θ2), since φ(θ1) ∈ A_0 and φ(θ2) ∈ A_0^c. Hence P_{θ1} ≠ P_{θ2}, and so x is informative for φ^{-1}(A_0) w.r.t. τ. □
That the converse does not hold in general can be seen as follows. Let A be a non-denumerable set and ℒ a σ-field of subsets of A that contains all one-point subsets. Suppose the measurable function φ : (Θ, 𝒟) → (A, ℒ) is identifiable w.r.t. some prior measure τ. Let λ ∈ A be arbitrary and let N_λ ∈ 𝒟 be a null set such that for all θ1, θ2 ∈ Θ − N_λ the implication

θ1 ∈ φ^{-1}({λ}),  θ2 ∈ (φ^{-1}({λ}))^c   ⟹   P_{θ1} ≠ P_{θ2}

holds. Then a set N such that for all θ1, θ2 ∈ Θ − N we have φ(θ1) ≠ φ(θ2) ⟹ P_{θ1} ≠ P_{θ2} is equal to ∪_{λ∈A} N_λ. Since A was non-denumerable, N need not be a null set, or may even be nonmeasurable.
In order to obtain the equivalence in lemma 1.5.3 we need a stronger concept of identifiability of functions w.r.t. τ.
DEFINITION 1.5.4 The measurable function φ : (Θ, 𝒟) → (A, ℒ) is called uniformly identifiable w.r.t. τ, or equivalently x is said to be uniformly informative for φ w.r.t. τ, if there exists a set N ∈ 𝒟 with τ{N} = 0 such that for all A_0 ∈ ℒ and θ1, θ2 ∈ Θ − N we have the implication

θ1 ∈ φ^{-1}(A_0),  θ2 ∈ φ^{-1}(A_0^c)   ⟹   P_{θ1} ≠ P_{θ2}.
The difference between definitions 1.5.2 and 1.5.4 is that in the former definition the set N may depend on A_0. It is now easily seen that the function φ is uniformly identifiable w.r.t. τ iff there exists a null set N ∈ 𝒟 such that for all θ1, θ2 ∈ Θ − N we have φ(θ1) ≠ φ(θ2) ⟹ P_{θ1} ≠ P_{θ2}. Although the concept of identification introduced in definitions 1.5.1 and 1.5.2 is different from that in KADANE [22] (who in fact uses the classical concept), the next theorem will show that the concept of identification introduced in this section is just what we intuitively want it to be.
For terminological convenience we shall restrict ourselves to the case where τ is a probability measure. It is then called the prior distribution, and the identity on (Θ, 𝒟) can be considered as a random object θ. To avoid problems with conditional probabilities we shall suppose that θ is a random vector and that x takes its values in a complete separable metric space. Suppose further that for all X_0 ∈ 𝔅 the function Ψ : (Θ, 𝒟) → ([0,1], 𝔇), where 𝔇 is the Borel field on [0,1], defined by Ψ(θ) := P_θ{X_0}, is 𝒟-measurable. (For details on conditional probabilities see ASH [2], p. 264-265.) Then P_θ can be interpreted as the conditional distribution of x given θ, and we may write down the posterior distribution τ_x, the distribution of θ conditional on x. It is interpreted as the opinion on θ after x has been observed, and roughly speaking one could expect the (non-degenerate) opinion on θ to be changed with positive probability if x is informative for Θ_0 w.r.t. τ. Let P denote the joint distribution of x and θ. We have
THEOREM 1.5.5 If 0 < τ{Θ_0} < 1 for some Θ_0 ∈ 𝒟, and if x is informative for Θ_0 w.r.t. τ, then P{τ_x ≠ τ} > 0.
PROOF. We have τ_x = τ a.e. (x) iff the random variables x and θ are independent or, equivalently, P_θ = F a.e. (θ), where F denotes the marginal distribution of x. Since x is informative for Θ_0 w.r.t. τ, there exists a null set N ∈ 𝒟 such that P_{θ1} ≠ P_{θ2} whenever θ1 ∈ Θ_0 − N and θ2 ∈ Θ_0^c − N. Since Θ_0 − N and Θ_0^c − N have positive τ-measure (as 0 < τ{Θ_0} < 1), we cannot have P_θ = F a.e. (θ). Hence {x | τ_x ≠ τ} must have positive probability (unconditional on θ). □
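A two-point toy model (assumed here for illustration; it is not from the text) shows the mechanism of the proof: when the two parameter values have different sampling distributions and both carry prior mass, the posterior differs from the prior for every realisation of x:

```python
# Assumed toy model: theta in {0.3, 0.7}, tau uniform, one Bernoulli trial x.
prior = {0.3: 0.5, 0.7: 0.5}

def posterior(x):
    """tau_x by Bayes' rule, with likelihood P_theta{x} = theta^x (1-theta)^(1-x)."""
    lik = {t: t ** x * (1 - t) ** (1 - x) for t in prior}
    norm = sum(prior[t] * lik[t] for t in prior)
    return {t: prior[t] * lik[t] / norm for t in prior}

# x is informative for the statement {0.3} w.r.t. tau (the two sampling
# distributions differ), and indeed tau_x != tau for both realisations:
print(posterior(1))   # mass shifts towards theta = 0.7
print(posterior(0))   # mass shifts towards theta = 0.3
```

Had the two parameter values induced the same Bernoulli distribution, the posterior would have coincided with the prior for every x, in line with the independence argument of the proof.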
REMARK 1. If all Θ_0 ∈ 𝒟 have τ-measure 0 or 1, every Θ_0 is identifiable w.r.t. τ regardless of what P_θ is. Of course such an opinion τ is not a very realistic one.
REMARK 2. Conversely, if P{τ_x ≠ τ} > 0, then x and θ are not independent, and so there exist θ_0, θ_1 ∈ Θ with P_{θ_0} ≠ P_{θ_1}. Hence there exist nontrivial identifiable statements (i.e. statements Θ_0 ⊂ Θ with Θ_0 ≠ ∅ and Θ_0^c ≠ ∅). It is, however, not clear whether they are 𝒟-measurable.

1.6 FINITE INFORMATIVE SAMPLES FROM STOCHASTIC PROCESSES AND THE PROBLEM OF MINIMUM INFORMATIVE SAMPLE SIZE
Let {x_t ; t = 1, 2, ...} be a (possibly vector-valued) observable stochastic process in discrete time. If the distribution of the process is characterized by some unknown parameter θ ∈ Θ, we put

𝒫^{(∞)} = {P_θ^{(∞)} | θ ∈ Θ},

where P_θ^{(∞)} denotes the (infinite-dimensional) distribution of the process {x_t}.
Suppose we observe the process {x_t} at t = 1, 2, ..., n. Then our actual sample is (x_1, x_2, ..., x_n). The joint distribution P_θ^{(n)} of this sample is a marginal distribution of P_θ^{(∞)}. Let

𝒫^{(n)} = {P_θ^{(n)} | θ ∈ Θ}.

It will be clear now that inferences about a function φ : Θ → A based on the sample (x_1, x_2, ..., x_n) only make sense if φ is identifiable w.r.t. 𝒫^{(n)} rather than w.r.t. 𝒫^{(∞)}; the sample must be informative for φ. Obviously identifiability with respect to 𝒫^{(n)} implies identifiability with respect to 𝒫^{(∞)}, but the converse does not hold in general. Although it follows from Kolmogorov's extension theorem that P_θ^{(∞)} is determined by the sequence P_θ^{(n)}, n = 1, 2, ..., there need not exist a finite sequence P_θ^{(1)}, ..., P_θ^{(N)} that determines P_θ^{(∞)} uniquely for all θ ∈ Θ. To be more precise: let θ1, θ2 ∈ Θ be such that P_{θ1}^{(∞)} ≠ P_{θ2}^{(∞)}, and let N = N(θ1, θ2) be the smallest value of n such that P_{θ1}^{(n)} ≠ P_{θ2}^{(n)}. Since N need not be bounded in θ1 and θ2, it will be clear that in general identification w.r.t. 𝒫^{(∞)} is an essentially weaker concept than identification w.r.t. 𝒫^{(n)} for some finite n. This fact is important because, e.g. in estimation theory, it implies that the existence of consistent estimators for φ(θ) does not guarantee that a finite sample is informative for φ(θ). The following example may clarify this.

EXAMPLE 1.6.1 Let x_1, x_2, ... be a sequence of independent Bernoulli trials with P{x_t = 1} = θ_t, P{x_t = 0} = 1 − θ_t, 0 ≤ θ_t ≤ 1, such that the sequence θ = (θ_1, θ_2, ...) has a well-defined Cesàro limit

φ(θ) := lim_{n→∞} (1/n) Σ_{t=1}^{n} θ_t.

Suppose the θ_t are unknown, and the statistician wants to estimate φ(θ), so that

Θ = {θ = (θ_1, θ_2, ...) | lim_{n→∞} (1/n) Σ_{t=1}^{n} θ_t exists}.

The obvious estimator of φ(θ) based on the sample x_1, x_2, ..., x_n is
x̄_n := (1/n) Σ_{t=1}^{n} x_t.

It is easily seen to be asymptotically unbiased. Furthermore, for its variance we have

var(x̄_n) = (1/n²) Σ_{t=1}^{n} θ_t(1 − θ_t) ≤ 1/(4n) → 0  (n → ∞).

Thus x̄_n is a (weakly) consistent estimator for φ(θ). Nevertheless, it is easily seen that no finite sample is informative for φ(θ).
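A numerical sketch of the example (illustrative code; the value of n and the tail values are arbitrary choices): two parameter sequences that agree in their first n coordinates induce the same distribution of (x_1, ..., x_n), yet their Cesàro limits differ, so no sample size n is informative for φ:

```python
def cesaro_mean(head, tail_value, m=200_000):
    """(1/m) * sum of the first m terms of the sequence that starts with
    `head` and continues with the constant `tail_value`; as m -> infinity
    this converges to tail_value."""
    return (sum(head) + (m - len(head)) * tail_value) / m

n = 10
head = [0.5] * n   # theta^(1) and theta^(2) agree on the first n coordinates,
                   # so P^(n) is the same under both parameter sequences
phi1 = cesaro_mean(head, 0.2)   # theta^(1) = (0.5, ..., 0.5, 0.2, 0.2, ...)
phi2 = cesaro_mean(head, 0.9)   # theta^(2) = (0.5, ..., 0.5, 0.9, 0.9, ...)

# phi(theta^(1)) is close to 0.2 and phi(theta^(2)) close to 0.9: phi differs
# although the first n marginals coincide, whatever n was chosen.
print(round(phi1, 3), round(phi2, 3))
```

Since the construction works for every n, the implication φ(θ1) ≠ φ(θ2) ⟹ P_{θ1}^{(n)} ≠ P_{θ2}^{(n)} fails for every finite sample size.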
The previous discussion motivates the following definition.

DEFINITION 1.6.2 The sample size n is called informative for the function φ : Θ → A if for all θ1, θ2 ∈ Θ we have the implication

φ(θ1) ≠ φ(θ2)   ⟹   P_{θ1}^{(n)} ≠ P_{θ2}^{(n)}.

REMARK. If the process {x_t} is strictly stationary, and the sample size n is informative for φ, then any sample of the process taken at n consecutive time points is informative for φ.
If the sample size n is informative for φ, then any sample size N > n is informative for φ, because of the implication

P_{θ1}^{(n)} ≠ P_{θ2}^{(n)}   ⟹   P_{θ1}^{(n+1)} ≠ P_{θ2}^{(n+1)},   θ1, θ2 ∈ Θ.

Therefore an interesting but often very difficult problem is to find the minimum informative sample size, or at least an upper bound for it. In the case of a sample from a moving-average process the minimum can be found, and in mixed autoregressive moving-average models an upper bound can be found (Chapter II univariate, Chapter III multivariate).
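For orientation, here is a sketch of why very small samples can already be informative in moving-average models. It is an assumed illustration for a Gaussian MA(1) process y_t = e_t + β e_{t-1} under the invertibility restriction |β| < 1, not the thesis's derivation, and the function names are hypothetical. In that setting the distribution of a sample of size 2 is determined by the autocovariances (γ_0, γ_1), and under the restriction these in turn determine (β, σ²) uniquely:

```python
import math

def acov_ma1(beta, sigma2):
    """Autocovariances (gamma_0, gamma_1) of y_t = e_t + beta * e_{t-1},
    with var(e_t) = sigma2."""
    return sigma2 * (1 + beta ** 2), sigma2 * beta

def recover_ma1(gamma0, gamma1):
    """Invertible MA(1) parameters (|beta| < 1) from (gamma_0, gamma_1).
    Uses gamma_1 / gamma_0 = beta / (1 + beta^2) and picks the root in (-1, 1)."""
    if gamma1 == 0:
        return 0.0, gamma0
    r = gamma1 / gamma0                               # |r| <= 1/2
    beta = (1 - math.sqrt(1 - 4 * r * r)) / (2 * r)   # invertible root
    return beta, gamma1 / beta

g0, g1 = acov_ma1(0.5, 2.0)
print(recover_ma1(g0, g1))   # recovers (0.5, 2.0) up to rounding
```

Without the invertibility restriction the map is not one-to-one: (β, σ²) and (1/β, β²σ²) produce the same autocovariances, which is the classical identification failure in MA models.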
1.7 IDENTIFICATION AND PREDICTION
In this section we consider the statistical prediction problem, that is, the problem of predicting the random variable (or vector) y on the basis of the observable vector x, where the joint distribution of x and y depends on some unknown parameter θ ∈ Θ. The problem needs some special attention, since it does not fit the framework of § 1.1 - § 1.3 in a trivial way.
Let P_θ denote the joint distribution of x and y, and put 𝒫 = {P_θ | θ ∈ Θ}. Then the triple ((x, y), 𝒫, Θ) is not a relevant statistical problem, since only x is observable and not (x, y). The relevant statistical problem is represented by (x, 𝒫_1, Θ), where 𝒫_1 = {P_{1,θ} | θ ∈ Θ} and P_{1,θ} denotes the marginal distribution of x.
If the statistician is interested in prediction of y on the basis of x, he is in fact interested in statements in terms of the mapping Ψ : Θ → 𝒫 defined by Ψ(θ) = P_θ, θ ∈ Θ. This leads to the following definition.

DEFINITION 1.7.1 y is said to be predictable w.r.t. x if x is informative for the mapping Ψ.
Thus y is predictable w.r.t. x iff we have the following implication for all θ1, θ2 ∈ Θ:

P_{θ1} ≠ P_{θ2}   ⟹   P_{1,θ1} ≠ P_{1,θ2}

(see lemma 1.1.12). Two examples may clarify the ideas.
EXAMPLE 1.7.2 Let x and y be independent normal variables with unknown mean θ and unit variances. Then for all θ1, θ2 ∈ Θ we have the implications

P_{θ1} ≠ P_{θ2}   ⟹   θ1 ≠ θ2   ⟹   P_{1,θ1} ≠ P_{1,θ2}.

Hence y is predictable w.r.t. x.

EXAMPLE 1.7.3 Let (x, y) be binormal with zero means, unit variances and unknown correlation coefficient θ. Then the marginal distribution of x is standard normal whatever the value of θ, so that P_{1,θ1} = P_{1,θ2} for all θ1, θ2 ∈ Θ, whereas P_{θ1} ≠ P_{θ2} whenever θ1 ≠ θ2. Hence y is not predictable w.r.t. x. It should be noted that if θ were known, then y would be predictable in a trivial way. However, in that case we don't have a real statistical prediction problem, but merely a probabilistic prediction problem, which can be defined as a triple ((x, y), 𝒫, Θ) for which the conditional distribution of y given x does not depend on θ (a.s. P_{1,θ}, ∀θ). Clearly observational equivalence of values of θ is irrelevant for such prediction problems.
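A simulation sketch of example 1.7.3 (illustrative code; the sample size and seed are arbitrary): the marginal law of x is N(0,1) whatever θ, while the joint law clearly depends on θ:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_binormal(rho, size):
    """Draw from the binormal with zero means, unit variances, correlation rho."""
    cov = [[1.0, rho], [rho, 1.0]]
    return rng.multivariate_normal([0.0, 0.0], cov, size=size)

xy_indep = sample_binormal(0.0, 100_000)
xy_corr = sample_binormal(0.9, 100_000)

# The marginal of x has variance close to 1 for both values of theta ...
print(xy_indep[:, 0].var(), xy_corr[:, 0].var())
# ... while the joint distribution clearly depends on theta:
print(np.corrcoef(xy_indep.T)[0, 1], np.corrcoef(xy_corr.T)[0, 1])
```

The first line illustrates P_{1,θ1} = P_{1,θ2}; the second illustrates P_{θ1} ≠ P_{θ2}, which together are exactly the failure of predictability of y w.r.t. x.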
Let x = (x_1, x_2, ..., x_n) be a sample taken from a stochastic process {x_t ; t = 1, 2, ...} with probability law P_θ^{(∞)}, θ ∈ Θ. Let N ≥ n+1. Then y = x_N is predictable if the sample size n is informative for P_θ^{(N)} (def. 1.6.2) or, equivalently, if for all θ1, θ2 ∈ Θ we have the implication

(1.7.1)   P_{θ1}^{(N)} ≠ P_{θ2}^{(N)}   ⟹   P_{θ1}^{(n)} ≠ P_{θ2}^{(n)}.

Usually the index t represents time; therefore {x_{n+1}, x_{n+2}, ...} is called the future of the process. Thus the future is predictable iff (1.7.1) holds for all N ≥ n+1. In analogy with def. 1.6.2 we put
DEFINITION 1.7.4 The sample size n is called predictive if (1.7.1) holds for all N ≥ n+1.

Of course, if the sample size n is predictive, any sample size m ≥ n is predictive, and so the problem of the minimum predictive sample size makes sense. The problem is closely related to that of minimum informative sample sizes. It differs insofar as in (1.7.1) the set {P_θ^{(N)} | N = n+1, n+2, ...} depends on n. The relation to the minimum informative sample size (if it exists) is given in the next theorem.

THEOREM 1.7.5 If N_φ is the minimum informative sample size for φ : Θ → A, and n_0 is the minimum predictive sample size, then we have n_0 ≥ N_φ. If φ is 1-1, then n_0 = N_φ.
PROOF. Since N_φ is the minimal informative sample size for φ, there exist θ1, θ2 ∈ Θ with φ(θ1) ≠ φ(θ2) such that

P_{θ1}^{(N_φ)} ≠ P_{θ2}^{(N_φ)}   while   P_{θ1}^{(N_φ−1)} = P_{θ2}^{(N_φ−1)};

hence n_0 ≥ N_φ. If φ is 1-1 we also have, for all θ1, θ2 ∈ Θ and all N ≥ N_φ + 1,

P_{θ1}^{(N)} ≠ P_{θ2}^{(N)}   ⟹   θ1 ≠ θ2   ⟹   φ(θ1) ≠ φ(θ2)   ⟹   P_{θ1}^{(N_φ)} ≠ P_{θ2}^{(N_φ)},

which implies that N_φ is predictive, and so n_0 = N_φ. □
EXAMPLE 1.7.6 Consider the linear regression model of example 1.2.10. We rewrite it as

y_t = x_t'β + ε_t,   t = 1, 2, ...

Suppose the ε_t are i.i.d. variables with common distribution function F_ν, ν ∈ V. We shall prove that y := (y_1, ..., y_n)' is predictive for y_{n+1} iff y is informative for x_{n+1}'β.
PROOF. (Only if) Let x_{n+1}'β1 ≠ x_{n+1}'β2. Then we also have P_{θ1}^{(n+1)} ≠ P_{θ2}^{(n+1)}, and since y is predictive for y_{n+1} this implies P_{θ1}^{(n)} ≠ P_{θ2}^{(n)}. Hence y is informative for x_{n+1}'β (lemma 1.1.12).

(If) Let P_{θ1}^{(n+1)} ≠ P_{θ2}^{(n+1)} and assume P_{θ1}^{(n)} = P_{θ2}^{(n)}. Then we have F_{ν1} = F_{ν2} and X_n β1 = X_n β2, where X_n denotes the n × k matrix of regressors. Since P_θ^{(n+1)} is completely determined by F_ν and X_{n+1} β, we must then have x_{n+1}'β1 ≠ x_{n+1}'β2. But y is informative for x_{n+1}'β, so this implies P_{θ1}^{(n)} ≠ P_{θ2}^{(n)}. Thus we have a contradiction, proving that y is predictive for y_{n+1}. □

1.8 WEAK CONCEPTS OF OBSERVATIONAL EQUIVALENCE AND STRONGLY INFORMATIVE SAMPLES
The concept of identification introduced in the preceding sections is in fact based on the classical concept of observational equivalence, that is, on equality of distributions. As SCHÖNFELD [31] already pointed out, any other equivalence relation on 𝒫 can serve as a basis for a (weaker) alternative concept of observational equivalence, and so for a stronger notion of informativeness. Let ~ denote an arbitrary equivalence relation on 𝒫. Then the values θ1 ∈ Θ and θ2 ∈ Θ are called weakly ~-observationally equivalent if P_{θ1} ~ P_{θ2}. It is easily seen that all results of the preceding sections remain valid if (in)equality of distributions is replaced by (non-)~-equivalence.
We shall consider two possibilities for ~, where the first one (M_r-equivalence) is the most important (particularly from a practical point of view) and the second (t-equivalence) is of some theoretical importance, since it enables us to see a link between the classical concept of identification and sufficient statistics.
Let M_r(θ) denote the set of all moments up to order r of the distribution P