Least squares parameter estimation in a dynamic model from
noisy observations
Citation for published version (APA):
Vregelaar, ten, J. M. (1988). Least squares parameter estimation in a dynamic model from noisy observations.
Technische Universiteit Eindhoven. https://doi.org/10.6100/IR293672
DOI:
10.6100/IR293672
Document status and date:
Published: 01/01/1988
Document Version:
Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)
LEAST SQUARES PARAMETER ESTIMATION
IN A DYNAMIC MODEL
FROM NOISY OBSERVATIONS
THESIS

submitted in order to obtain the degree of doctor at the Technische Universiteit Eindhoven, on the authority of the Rector Magnificus, prof.ir. M. Tels, for a committee appointed by the Board of Deans, to be defended in public on Tuesday, 13 December 1988, at 16.00 hours

by

JOHANNES MARTINUS TEN VREGELAAR

Promotors:
prof.dr.ir. M.L.J. Hautus and
prof.dr. R. Doornbos
PREFACE

CHAPTER 1. INTRODUCTION
1.1. Statement of the problem
1.2. Estimation method
1.3. Notation for model and observations
1.4. Definition of symbols

CHAPTER 2. ALGORITHM FOR COMPUTING ESTIMATES
2.1. Object function and derivatives
2.2. Computation of object function and gradient
2.3. The Q-R decomposition
2.4. Discussion

CHAPTER 3. CONSISTENCY
3.1. Preliminaries
3.2. Random noise
3.3. Identifiability
3.4. Main result

CHAPTER 4. ASYMPTOTIC NORMALITY OF QUADRATIC FORMS
4.1. Convergence in distribution
4.2. Asymptotic normality
4.3. Quadratic forms in normal random variables
4.4. Quadratic forms in independent random variables

CHAPTER 5. ASYMPTOTICALLY NORMAL ESTIMATORS
5.1. Taylor expansion
5.2. Asymptotic normality of the gradient
5.3. Main result
5.4. On estimating the covariance matrix

CHAPTER 6.
6.2. Asymptotics for constrained optimization
6.3. Partial input noise
6.4. Asymptotics for the partial input noise case

CHAPTER 7. SPECIAL CASES
7.1. No dynamics
7.2. Normal errors
7.3. No dynamics and normal errors

CHAPTER 8. A SIMULATION STUDY
8.1. Model and asymptotic analysis
8.2. Simulation results

REFERENCES
SAMENVATTING
CURRICULUM VITAE
Let us consider a dynamic black box model with input ξ and output η, which may be vectors.

[figure: input ξ → black box → output η]

In order to control such models, one must be able to predict the output for known inputs. To this end, we specify the model to be a member of the fairly general class of ARMA models, consisting of an Auto Regressive part and a Moving Average part.

In many practical situations, both inputs and outputs cannot be observed without error: we assume that observed values of outputs and inputs are given by

$$y_t = \eta_t + \varepsilon_t, \qquad x_t = \xi_t + \delta_t,$$

where ε_t and δ_t are the noise terms. The object is to estimate the parameter matrices α_1, …, α_p, β_0, β_1, …, β_q from a set of observations for inputs and outputs. This is what this monograph is about.
On the one hand, we can view our problem as an extension of the so-called errors-in-variables models, see Linssen (1980), Hillegers (1986) and Johansen (1984). The extension consists of taking dynamics into account.
On the other hand, the present problem is a generalization of the problem described in Aoki and Yue (1970b): the authors study the dynamic model without noise on the input observations.
We do not allow for shocks or model errors, as is done in Maravall (1979): the identification in dynamic models with shocks and noisy observations is discussed there.
A survey of the contents of the thesis is given now.
In Chapter 1, we formulate the problem and define the least squares estimation method for this problem. This method is the topic of the subsequent chapters.
In Chapter 2, an algorithm for computing least squares estimates is given: the structure in the model is used in an optimal way.
Assuming a random noise process, we treat the asymptotic analysis of our method. In Chapter 3, strong consistency is proved under a number of assumptions. A key role is played by the concept of almost surely uniform convergence of random functions.
The asymptotic normality property is the subject of Chapter 5. The essential issue here is the limit distribution of certain quadratic forms in random variables. Therefore, the preliminary Chapter 4 is included.
Two generalizations of the original problem are discussed in Chapter 6: (equality) constraints on the parameters and partial input noise. We deal with the consequences for the algorithm as well as for the asymptotic analysis.
For special cases, it is often possible to reduce or to check the assumptions which are made. This is investigated in Chapter 7 for the non-dynamic model and the case of normally distributed errors, respectively.
In this chapter we formulate the parameter-estimation problem that will serve as the starting point for all later chapters of this thesis. More or less the same problem was discussed before by e.g. Aoki and Yue (1970a) and Eising, Linssen and Rietbergen (1983).

After a description of the problem in Section 1.1, in Section 1.2 we introduce a method for estimating the unknown parameters. In fact, this estimation method is the subject of this monograph. Section 1.3 is devoted to defining a compact notation for the problem and estimation method. Some notations and symbols to be used throughout the thesis are collected in Section 1.4.
1.1. Statement of the problem
We consider a linear, time-invariant and causal discrete-time system defined by an ARMA (autoregressive-moving average) description of its inputs ξ_t and outputs η_t (t representing time):

$$\eta_t = \sum_{i=1}^{p} \alpha_i \eta_{t-i} + \sum_{j=0}^{q} \beta_j \xi_{t-j}, \qquad t \in \mathbb{Z}. \tag{1.1a}$$

The inputs and outputs may be vectors in ℝ^r and ℝ^s, respectively, so for r,s > 1 (1.1a) defines a MIMO-model (multiple input - multiple output). The orders p and q are supposed to be known. We define

$$m := \max(p,q). \tag{1.1b}$$
At time t = 1, we start measuring inputs and outputs of this ongoing process. Both inputs and outputs are observed with noise (see figure):

$$y_t = \eta_t + \varepsilon_t, \qquad x_t = \xi_t + \delta_t, \qquad t = 1,2,\ldots,m+N, \tag{1.2}$$

m+N denoting the number of observations y_t and x_t available.

[figure: the ARMA system with true input ξ and true output η; the observations x and y arise by adding the noises δ and ε]
The problem is to estimate the unknown parameter matrices α_1, α_2, …, α_p, β_0, β_1, …, β_q from the set of data {(x_1,y_1), (x_2,y_2), …, (x_{m+N},y_{m+N})}. Unlike Eising, Linssen and Rietbergen (1983), we do not make any assumptions on initial conditions, by taking t = m+1, m+2, …, m+N in (1.1a). Assumptions on the noise processes δ_t and ε_t are deferred to Chapter 3, where asymptotic properties of estimators are discussed.
1.2. Estimation method
Throughout this thesis we consider the least squares estimation method for obtaining estimates for the s × [ps + (q+1)r] matrix of unknown parameters

$$\theta := [\alpha_1 \;\cdots\; \alpha_p \;\; \beta_0 \;\cdots\; \beta_q]. \tag{1.3}$$

By definition of this method (abbreviated in the sequel as lse method), we minimize

$$\sum_{t=1}^{m+N} \left( \|y_t - \eta_t\|^2 + \|x_t - \xi_t\|^2 \right) \tag{1.4}$$

with respect to θ and the unknown 'true' inputs and outputs ξ_1, …, ξ_{m+N}, η_1, …, η_{m+N}, subject to the model equations (1.1a) for t = m+1, …, m+N, in order to obtain an estimate for θ. The norm ‖·‖ in (1.4) denotes the Euclidean vector norm. In the literature, this method is sometimes called orthogonal distance regression, see Boggs, Byrd and Schnabel (1987).
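As a concrete illustration of the criterion (1.4), the following Python sketch evaluates it for a scalar model with p = q = 1 (so r = s = 1). All names and the data-generating settings are illustrative, and the true latent sequences are used as candidate minimizers, which of course is not available in practice:

```python
import numpy as np

def lse_criterion(y, x, eta, xi):
    """Sum of squared deviations between observations and candidate
    'true' outputs/inputs: sum ||y_t - eta_t||^2 + ||x_t - xi_t||^2."""
    return np.sum((y - eta) ** 2) + np.sum((x - xi) ** 2)

def model_residual(a1, b0, b1, eta, xi):
    """Residuals of the model equations (1.1a), here with p = q = 1:
    eta_t - a1*eta_{t-1} - b0*xi_t - b1*xi_{t-1}."""
    return eta[1:] - a1 * eta[:-1] - b0 * xi[1:] - b1 * xi[:-1]

rng = np.random.default_rng(0)
a1, b0, b1 = 0.5, 1.0, 0.3
xi = rng.normal(size=50)                 # true inputs
eta = np.zeros(50)                       # true outputs from (1.1a)
for t in range(1, 50):
    eta[t] = a1 * eta[t-1] + b0 * xi[t] + b1 * xi[t-1]
y = eta + 0.1 * rng.normal(size=50)      # noisy output observations
x = xi + 0.1 * rng.normal(size=50)      # noisy input observations

# The true latent sequences satisfy the model constraint exactly ...
assert np.allclose(model_residual(a1, b0, b1, eta, xi), 0.0)
# ... and the criterion then measures only the observation noise.
print(lse_criterion(y, x, eta, xi))
```

The lse method minimizes this criterion jointly over θ and the latent input/output sequences, subject to the model equations.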
As will be seen in Chapter 2, elimination of inputs and outputs gives rise to an unconstrained minimization problem with respect to θ alone. First, we introduce for convenience a compact notation for the model (1.1a) and the observations (1.2).
1.3. Notation for model and observations

We write the model equations (1.1a) for t = m+1, …, m+N as a matrix vector equation, in which a freedom of choice for their explicit forms exists:

$$D\zeta = 0. \tag{1.5}$$

Here, D is an sN × (s+r)(m+N) matrix containing the entries of θ, and the column vector ζ ∈ ℝ^{(s+r)(m+N)} consists of the true inputs and outputs. Let z and e denote the vectors of observations and noises, respectively, corresponding to ζ. Then

$$z = \zeta + e$$

is a short-hand notation for (1.2).
The following choices for D and ζ will be useful for our purposes:

(i) ζ collects all outputs followed by all inputs, in reverse time order,

$$\zeta := (\eta_{m+N}^T,\ldots,\eta_2^T,\eta_1^T,\;\xi_{m+N}^T,\ldots,\xi_2^T,\xi_1^T)^T, \tag{1.6}$$

and D = [A  B], where A (sN × s(m+N)) and B (sN × r(m+N)) are block-Toeplitz band matrices built from α_0, α_1, …, α_m and β_0, β_1, …, β_m, respectively; the transpose D^T of this choice is displayed in (2.28). (1.7)

(ii) outputs and inputs are interleaved per time instant,

$$\zeta_i := \begin{bmatrix} \eta_i \\ \xi_i \end{bmatrix}, \qquad i = 1,2,\ldots,m+N,$$

and D is the block-Toeplitz band matrix with s × (s+r) blocks γ_0, …, γ_m, where γ_i := [α_i  β_i]. (1.8)

Here α_0 := −I (I is the s × s identity matrix), and we define α_k := 0 for k > p and β_k := 0 for k > q. The empty spaces in the matrices D represent zero entries.

For both forms (1.7) and (1.8), the matrix D is row regular for all θ. Consequently, the Moore-Penrose inverse of D is

$$D^+ = D^T (D D^T)^{-1}.$$
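The stated formula for the Moore-Penrose inverse of a row-regular matrix can be checked numerically; the sketch below uses a random full-row-rank matrix as a stand-in for D (it does not reproduce the structured D of (1.7)):

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.normal(size=(3, 7))           # full row rank with probability one

# D+ = D^T (D D^T)^{-1}, valid whenever D has full row rank
Dplus = D.T @ np.linalg.inv(D @ D.T)
assert np.allclose(Dplus, np.linalg.pinv(D))

# D+ is then a right inverse: D D+ = I
assert np.allclose(D @ Dplus, np.eye(3))
```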
Furthermore, we give a state-space description for the model equation (1.5) (the vector κ_t ∈ ℝ^{sm} denoting the state):

$$\kappa_{t+1} = \varphi \kappa_t + b \xi_t, \tag{1.9}$$
$$\eta_t = h \kappa_t + \beta_0 \xi_t, \qquad t = 1,2,\ldots,N, \tag{1.10}$$

with the sm × sm matrix

$$\varphi = \begin{bmatrix} 0 & \cdots & \cdots & 0 & \alpha_m \\ I & & & & \alpha_{m-1} \\ & \ddots & & & \vdots \\ & & I & & \alpha_2 \\ & & & I & \alpha_1 \end{bmatrix},$$

the sm × r matrix

$$b = \begin{bmatrix} \beta_m + \alpha_m \beta_0 \\ \vdots \\ \beta_1 + \alpha_1 \beta_0 \end{bmatrix},$$

and the s × sm matrix h = [0 ⋯ 0 I]. This is the so-called observable canonical form.

Remark
An expression for the state vector κ_t in (1.10) is

$$\kappa_t = \begin{bmatrix} \vdots \\ \eta_{t+1} - \alpha_1 \eta_t - \beta_0 \xi_{t+1} - \beta_1 \xi_t \\ \eta_t - \beta_0 \xi_t \end{bmatrix}, \qquad t = 1,2,\ldots,N-1. \tag{1.11}$$
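The observable canonical form can be checked numerically. The sketch below assumes a scalar model (s = r = 1) with p = q = m = 2 and zero initial conditions, and compares the ARMA recursion (1.1a) with the state-space recursion (1.10); all names are illustrative:

```python
import numpy as np

a1, a2 = 0.5, -0.2
b0, b1, b2 = 1.0, 0.4, 0.1

phi = np.array([[0.0, a2],
                [1.0, a1]])            # sm x sm companion matrix (here 2 x 2)
b = np.array([b2 + a2 * b0,
              b1 + a1 * b0])           # sm x r vector
h = np.array([0.0, 1.0])               # s x sm row

rng = np.random.default_rng(2)
xi = rng.normal(size=30)

# ARMA recursion (1.1a), zero initial conditions
eta = np.zeros(30)
for t in range(30):
    eta[t] = b0 * xi[t]
    if t >= 1:
        eta[t] += a1 * eta[t-1] + b1 * xi[t-1]
    if t >= 2:
        eta[t] += a2 * eta[t-2] + b2 * xi[t-2]

# State-space recursion (1.10), kappa_1 = 0
kappa = np.zeros(2)
eta_ss = np.zeros(30)
for t in range(30):
    eta_ss[t] = h @ kappa + b0 * xi[t]
    kappa = phi @ kappa + b * xi[t]

assert np.allclose(eta, eta_ss)        # both descriptions agree
```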
1.4. Definition of symbols
Entries of a matrix A are written as (A)_{i,j} or a_{ij}. Row i and column j of A are denoted as A_{i*} and A_{*j}, respectively.

The Kronecker product of an m × n matrix A and any matrix B is defined by

$$A \otimes B := \begin{bmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & & & \vdots \\ \vdots & & & \vdots \\ a_{m1}B & \cdots & \cdots & a_{mn}B \end{bmatrix}.$$

Here, as throughout this dissertation, empty space in a matrix represents zero entries.
For symmetric matrices A and B, A ≥ B and B ≤ A stand for A − B ≥ 0, i.e. A − B is positive semidefinite. Likewise, A > B (or B < A) means A − B > 0: A − B is positive definite.
By ‖·‖ we denote the Euclidean vector norm or the Frobenius matrix norm:

$$\|X\|^2 = \operatorname{tr} X^T X \quad \text{for any matrix } X.$$

The abbreviations det A and tr A are used for the determinant and trace of any square matrix A;

$$\sigma(A) := \{\lambda \in \mathbb{C} : \lambda \text{ is an eigenvalue of } A\}$$

is called the spectrum of A. We denote the identity matrix of arbitrary dimension by I.

The object function to be defined in the next chapter will be denoted by J_N(θ). Then J_N'(θ) is the (column) vector of derivatives with respect to the entries of θ. It is a vector in ℝ^µ, where µ := ps² + (q+1)sr is the number of unknown parameters, see (1.3). We shall write the components of J_N'(θ) as J_N^i, i = 1, …, µ.

Furthermore, J_N'' is the µ × µ matrix of second derivatives of J_N: its (i,j)-th entry

$$(J_N'')_{i,j} = \frac{\partial^2 J_N}{\partial \theta_i \,\partial \theta_j}$$

is also denoted by J_N^{ij}.

Let M be a matrix depending on θ. When no confusion is likely, we use M_i and M_{ij} for ∂M/∂θ_i and ∂²M/∂θ_i ∂θ_j, respectively, which are the matrices obtained from M by differentiating all its entries.
When dealing with random variables, ℙ and 𝔼 are the symbols for the probability and the mean or expectation operator. The latter is applied to scalars as well as vectors. Likewise, we use the abbreviation var for variance; if x is a random vector, var x denotes its covariance matrix. If x and y are random vectors in ℝ^m and ℝ^n, respectively, cov(x,y) is the m × n matrix whose (i,j)-th entry equals cov(x_i, y_j) for i = 1, …, m and j = 1, …, n.

More definitions and notations are introduced later on, in particular in Sections 3.1 and 4.1.
In this chapter we introduce a method for computing least squares estimates (as defined in Section 1.2) for the parameters from the set of observations. In Section 2.1, expressions for the object function and its derivatives up to order three are derived. Since the minimization problem involved has no closed-form solution in general, we use an iterative procedure in order to find minimizing points.

In Section 2.2, we introduce a Q-R decomposition for the matrix Dᵀ which is suitable for the computation of the object function and its gradient. In Section 2.3, the Q-R decomposition is discussed in detail. For the matrix D, defined by (1.7), it can be performed very efficiently.

Some remarks on the complexity of this method, and on other methods, are contained in the final Section 2.4.
2.1. Object function and derivatives

In terms of the notations (1.6) and (1.7), the lse method becomes (min abbreviates minimizing)

$$\min_{\theta,\zeta} \|z - \zeta\|^2 \quad \text{subject to } D\zeta = 0. \tag{2.1}$$

We view the problem as

$$\min_{\theta} \left( \min_{\zeta} \|z - \zeta\|^2 \ \text{subject to } D\zeta = 0 \right)$$

and we optimize for ζ keeping θ fixed.
The result will be a function of θ alone, to be denoted by J_N(θ). Then θ has to be chosen such as to minimize J_N(θ), without any restriction on θ.
Application of the Lagrange multiplier method to the inner optimization gives rise to the Lagrangian

$$\Lambda(\zeta,\lambda) := \|z - \zeta\|^2 + 2\lambda^T D\zeta. \tag{2.2}$$

It follows from the stationary point equations

$$\zeta + D^T \lambda = z, \qquad D\zeta = 0,$$

that

$$\lambda = (D^+)^T z, \qquad \zeta = z - D^T \lambda. \tag{2.3}$$
Then the object function J_N(θ) becomes

$$J_N(\theta) = z^T P z, \tag{2.4}$$

where

$$P := D^+ D \tag{2.5}$$

is the (s+r)(m+N) × (s+r)(m+N) matrix representing the orthogonal projection on the column space of Dᵀ.
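The projection formulation can be illustrated numerically: for a random stand-in for the row-regular matrix D, the constrained minimizer is ζ = (I − P)z, and the attained minimum equals zᵀPz:

```python
import numpy as np

rng = np.random.default_rng(3)
D = rng.normal(size=(4, 10))            # stand-in for a row-regular D
z = rng.normal(size=10)

P = D.T @ np.linalg.inv(D @ D.T) @ D    # P = D+ D, orthogonal projection
zeta = (np.eye(10) - P) @ z             # constrained minimizer
assert np.allclose(D @ zeta, 0.0)       # feasibility: D zeta = 0

JN = z @ P @ z
assert np.isclose(JN, np.sum((z - zeta) ** 2))   # J_N = ||z - zeta||^2 = z^T P z
```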
The object function is infinitely often differentiable with respect to the entries of θ. For our purposes, expressions for the first, second and third derivatives will be sufficient. Let Ȧ or A^i denote ∂A/∂θ_i for any entry θ_i of θ and any matrix A. Using

$$P = D^T (D D^T)^{-1} D, \tag{2.6}$$

we can easily verify that

$$P^i = Q^i + (Q^i)^T, \qquad Q^i := P^\perp (D^i)^T (D D^T)^{-1} D, \tag{2.7}$$

where

$$P^\perp := I - P$$

represents the orthogonal projection on the kernel of D. For P^{ij} := ∂²P/∂θ_j ∂θ_i we obtain

$$P^{ij} = Q^{ij} + (Q^{ij})^T, \tag{2.8}$$

where

$$Q^{ij} := \frac{\partial Q^i}{\partial \theta_j}. \tag{2.9a}$$

We have used here that ∂²D/∂θ_j ∂θ_i = 0 for all i and j, since D depends linearly on the elements of θ, cf. (1.7), (1.8). Likewise, it follows that

$$P^{ijk} := \frac{\partial^3 P}{\partial \theta_k \,\partial \theta_j \,\partial \theta_i} = Q^{ijk} + (Q^{ijk})^T, \qquad Q^{ijk} := \frac{\partial Q^{ij}}{\partial \theta_k}, \tag{2.10a,b}$$

which can be expanded by repeated application of the product rule, using

$$\{(D D^T)^{-1}\}^k = -(D D^T)^{-1} \left( D^k D^T + D (D^k)^T \right) (D D^T)^{-1}. \tag{2.11}$$

Expressions for J_N^i (any component of J_N'), J_N^{ij} (the (i,j) element of J_N'') and J_N^{ijk} now follow:

$$J_N^i = z^T P^i z, \tag{2.12a}$$
$$J_N^{ij} = z^T P^{ij} z, \tag{2.12b}$$
$$J_N^{ijk} = z^T P^{ijk} z. \tag{2.12c}$$

2.2. Computation of object function and gradient
The minimization problem

$$\min_{\theta} J_N(\theta) \tag{2.13}$$

has no closed-form solution, as was already noticed by Aoki and Yue (1970a). Therefore, an iterative algorithm is applied in order to obtain least squares estimates for θ; an initial estimate is needed. The algorithm we choose is based on the so-called Broyden-Fletcher-Goldfarb-Shanno (BFGS) formula, see Scales (1985) pp. 89-90, which has good numerical properties. This quasi-Newton method needs function and first derivative evaluations, which will be discussed now.
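A minimal sketch of this approach uses scipy's BFGS implementation on the object function J_N(θ) = zᵀP(θ)z, for a toy scalar model η_t = a₁η_{t−1} + b₀ξ_t. For brevity the gradient is obtained here by finite differences, whereas the thesis evaluates the exact gradient via the Q-R decomposition; all names and data-generating settings are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

T = 40
rng = np.random.default_rng(4)
a1_true, b0_true = 0.6, 1.2
xi = rng.normal(size=T)
eta = np.zeros(T)
for t in range(1, T):
    eta[t] = a1_true * eta[t-1] + b0_true * xi[t]
# observation vector z = (eta-part, xi-part) plus noise
z = np.concatenate([eta, xi]) + 0.05 * rng.normal(size=2 * T)

def D_of(theta):
    """D(theta): each row encodes eta_t - a1*eta_{t-1} - b0*xi_t = 0."""
    a1, b0 = theta
    D = np.zeros((T - 1, 2 * T))
    for t in range(1, T):
        D[t-1, t] = 1.0
        D[t-1, t-1] = -a1
        D[t-1, T + t] = -b0
    return D

def JN(theta):
    """Object function z^T P(theta) z with P = D^T (D D^T)^{-1} D."""
    D = D_of(theta)
    lam = np.linalg.solve(D @ D.T, D @ z)
    return z @ D.T @ lam

res = minimize(JN, x0=np.array([0.0, 0.0]), method="BFGS")
print(res.x)    # estimate of (a1, b0); should lie near (0.6, 1.2)
```

The quasi-Newton iteration only needs JN evaluations (and gradients); no second derivatives are computed explicitly.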
From (2.4) we recall

$$J_N = z^T P z. \tag{2.14a}$$

Then (2.12a) and (2.7) yield

$$\dot J_N = 2 z^T D^+ \dot D P^\perp z. \tag{2.14b}$$

Definition (1.7) for D and ζ is most convenient for describing an algorithm for the computation of J_N and its gradient. This is a consequence of the Q-R decomposition for Dᵀ that we introduce as method of computation. Recall that Dᵀ has full column rank.

In the next section, a detailed description of this decomposition for the matrix Dᵀ of (1.7) is given. Here, we discuss its use for computing the object function and the first derivative vector.

There exists an orthogonal (s+r)(m+N) square matrix Q and a lower-triangular regular sN square matrix R, such that

$$Q^T D^T = \begin{bmatrix} R \\ 0 \end{bmatrix}. \tag{2.15}$$

Due to definition (1.7) for D, it is obvious that we are looking for a lower-triangular rather than an upper-triangular matrix R, see (2.30b) in the next section.
Define Q₁ as the (s+r)(m+N) × sN matrix consisting of the first sN columns of Q (write Q = [Q₁ Q₂]). Then

$$D^T = Q_1 R. \tag{2.16}$$

Due to the orthogonality of Q,

$$P = Q_1 Q_1^T, \tag{2.17}$$

hence

$$J_N = \|Q_1^T z\|^2. \tag{2.18}$$

Furthermore, λ = (D⁺)ᵀz (as before, see (2.3)) implies

$$D^T \lambda = P z. \tag{2.19}$$

It follows from (2.14b) that

$$\dot J_N = 2 \lambda^T \dot D (z - D^T \lambda). \tag{2.20}$$

Premultiplying (2.19) with Q₁ᵀ gives, in view of (2.16) and (2.17),

$$R \lambda = u, \tag{2.21}$$

with u ∈ ℝ^{sN} defined by

$$u := Q_1^T z. \tag{2.22}$$
Summing up, we have seen so far that by extending the matrix Dᵀ with the column vector z, the matrix R and the vector u are computed from

$$Q^T [D^T \mid z] = \begin{bmatrix} R & u \\ 0 & * \end{bmatrix}. \tag{2.23}$$

Then J_N and any component J̇_N of J_N' follow from R and u, by

$$J_N = u^T u, \tag{2.24a}$$
$$\dot J_N = 2 \lambda^T \dot D (z - D^T \lambda), \tag{2.24b}$$
$$R \lambda = u. \tag{2.24c}$$

Equation (2.24c) can easily be solved for λ, since R is lower-triangular.
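The relations (2.23)-(2.24) can be checked with a generic (unstructured) Q-R decomposition. Note that a standard library routine yields an upper-triangular factor, whereas the thesis works with a lower-triangular R adapted to the band structure of Dᵀ:

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

rng = np.random.default_rng(5)
sN, n = 4, 10                   # sN rows of D, n = (s+r)(m+N) columns
D = rng.normal(size=(sN, n))
z = rng.normal(size=n)

# Q-R decomposition of the extended matrix [D^T | z], cf. (2.23)
Q, Rfull = qr(np.column_stack([D.T, z]))
R = Rfull[:sN, :sN]             # triangular factor of D^T
u = Rfull[:sN, sN]              # u = Q1^T z

JN = u @ u                      # (2.24a): J_N = u^T u
lam = solve_triangular(R, u)    # (2.24c): R lam = u  (upper-triangular here)

# check against the projection formulas of Section 2.1
P = D.T @ np.linalg.inv(D @ D.T) @ D
assert np.isclose(JN, z @ P @ z)
assert np.allclose(lam, np.linalg.inv(D @ D.T) @ D @ z)   # lam = (D+)^T z
```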
Suppose the sN components of λ are arranged as

$$\lambda = [\lambda_{1,1} \; \lambda_{1,2} \cdots \lambda_{1,s} \;\; \lambda_{2,1} \cdots \lambda_{2,s} \;\cdots\; \lambda_{N,1} \cdots \lambda_{N,s}]^T. \tag{2.25}$$

Then we obtain the following expressions for the ps² + (q+1)sr components of J_N' (using η and ξ in accordance with (2.3)):

$$\frac{\partial J_N}{\partial (\alpha_k)_{i,j}} = 2 \sum_{n=1}^{N} \lambda_{n,i}\, \eta_{n+k,j}, \qquad k = 1,2,\ldots,p, \quad i,j = 1,2,\ldots,s, \tag{2.26a}$$

$$\frac{\partial J_N}{\partial (\beta_k)_{i,j}} = 2 \sum_{n=1}^{N} \lambda_{n,i}\, \xi_{n+k,j}, \qquad k = 0,1,\ldots,q, \quad i = 1,2,\ldots,s, \quad j = 1,2,\ldots,r, \tag{2.26b}$$

where (note α₀ = −I)

$$\eta_{i,j} = y_{m+N+1-i,j} - \sum_{k=\max(0,i-N)}^{\min(i-1,p)} \sum_{\ell=1}^{s} (\alpha_k)_{\ell,j}\, \lambda_{i-k,\ell}, \qquad i = 1,\ldots,p+N, \quad j = 1,2,\ldots,s, \tag{2.27a}$$

$$\xi_{i,j} = x_{m+N+1-i,j} - \sum_{k=\max(0,i-N)}^{\min(i-1,q)} \sum_{\ell=1}^{s} (\beta_k)_{\ell,j}\, \lambda_{i-k,\ell}, \qquad i = 1,\ldots,q+N, \quad j = 1,2,\ldots,r \tag{2.27b}$$

(here x_{i,j} and y_{i,j} indicate the j-th component of the vectors x_i ∈ ℝ^r and y_i ∈ ℝ^s).
2.3. The Q-R decomposition

In this section we discuss the transformation of the matrix [Dᵀ | z] into

$$\begin{bmatrix} R & u \\ 0 & * \end{bmatrix}$$

by means of an orthogonal matrix Q, see (2.23). Using the special structure of Dᵀ, one can do this efficiently.
by means of an orthogonal matrix Q, see (2.23). Using the special structure of nT, one can do this efficiently.Introducing a. :=a! and b. :=
s'.f,
i=
0,1, ... ,m, matrix DT defined byl l l l (1.7) is the (s+r)(m+N) x sN matrix ao al a m ao al a DT
=
m (2.28) bo bl b m bo bl b mconsisting of two block-Toeplitz band matrices.
The process of transforming [Dᵀ | z] into the right-hand side of (2.23) is split up into two steps,

$$\tilde Q\, [D^T \mid z] = \begin{bmatrix} S & v \\ 0 & * \end{bmatrix} \tag{2.29a}$$

and

$$\tilde \Psi \begin{bmatrix} S & v \\ 0 & * \end{bmatrix} = \begin{bmatrix} R & u \\ 0 & * \end{bmatrix}, \tag{2.29b}$$

with Q̃ and Ψ̃ orthogonal. The s(m+N) × sN matrix S and the sN × sN matrix R have similar shapes: both are lower-triangular block band matrices,

S with s × s blocks s_{i,j}, i = 1,…,m+N, j = 1,…,N, (2.30a)

R with s × s blocks r_{i,j}, i,j = 1,…,N (2.30b)

(the s_{i,j} and r_{i,j} are s × s blocks).
Step 1: Q̃ [Dᵀ | z] = [S v; 0 *].

Orthogonal matrices Q₁, Q₂, …, Q_{m+N} are chosen with

$$\tilde Q := Q_{m+N} Q_{m+N-1} \cdots Q_2 Q_1.$$

[Scheme (2.31): the matrices Q₁[Dᵀ | z] and Q₂Q₁[Dᵀ | z], showing how the lower (b) part of Dᵀ is annihilated block-diagonal by block-diagonal while the blocks of S, the vectors v_i and the transformed observation column build up.]

By premultiplying Dᵀ by Q̃, the lower part of Dᵀ becomes equal to zero. This is done block-diagonal by block-diagonal in order to preserve the block-Toeplitz structure. Matrix Q_i corresponds to making block-diagonal i equal to zero. The number of nonzero block-diagonals in Q_i Q_{i−1} ⋯ Q₁ Dᵀ is equal to m for i = 2,3,…,N. Therefore, we need m+N matrices Q_i to transform Dᵀ into [S; 0], see (2.29a). After premultiplication of [Dᵀ | z] by Q_i, block-row m+N+1−i of S (see (2.30a)) and v_i ∈ ℝ^s are determined, for i = 1,2,…,m+N.

Let us consider a matrix Q_i in detail. In order to make any column of b₀ (say column ℓ, and write (b₀)_{*ℓ}) equal to zero, we apply an (r+1) × (r+1) Householder matrix h_Q: we take the nonzero diagonal element of column ℓ of a₀, denoted by (a₀)_{ℓ,ℓ}, to obtain

$$h_Q \begin{bmatrix} (a_0)_{\ell,\ell} \\ (b_0)_{*\ell} \end{bmatrix} = \begin{bmatrix} * \\ 0 \end{bmatrix}, \tag{2.32}$$

* representing some nonzero scalar. Now, the block-Toeplitz structure is preserved by applying this h_Q to all corresponding pairs

$$\begin{bmatrix} (a_0)_{\ell *} \\ b_0 \end{bmatrix}$$

along the diagonals and to all other corresponding pairs

$$\begin{bmatrix} (a_j)_{\ell *} \\ b_j \end{bmatrix}, \qquad j = 1,\ldots,m.$$
1, ••• ,m.We start by making column s of b
0 equal to zero, proceeding with cclumn s-1, etc. Consequently, a0 remains lower triangular during the transformation process and hence the sk,k bloks in (2.30a) are also lower triangular
(k
=
1, ••• ,N). As already indicated in (2.31),matrices[:~]
transform intor:~
]
for j=
0,1, .•• ,m by premultiplication by Qi. We note thatJ-1
bm 0 for i ~ 2, and, of course, b_1 =
O.
For completeness, we give an expression for Q_i which, however, is not computed explicitly in the algorithm. For given a₀ and b₀ and fixed ℓ ∈ {1,2,…,s} we choose for h_Q in (2.32) the Householder matrix

$$h_Q = I - \frac{2\, w w^T}{w^T w}, \qquad w := \begin{bmatrix} (a_0)_{\ell,\ell} \\ (b_0)_{*\ell} \end{bmatrix} + \operatorname{sign}\bigl((a_0)_{\ell,\ell}\bigr) \,\Bigl\| \begin{bmatrix} (a_0)_{\ell,\ell} \\ (b_0)_{*\ell} \end{bmatrix} \Bigr\|\, e_1, \tag{2.33}$$

see e.g. Golub and Van Loan (1983), Section 3.3. Partition h_Q as

$$h_Q = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix},$$

with h₁₁ a scalar, h₁₂ a 1 × r row, h₂₁ an r × 1 column and h₂₂ an r × r matrix, and let H_ℓ be the (s+r)(m+N) × (s+r)(m+N) orthogonal matrix that embeds h₁₁ at the s × s positions (ℓ,ℓ) of the relevant identity blocks and h₁₂, h₂₁, h₂₂ at the corresponding coupling positions. The number of untouched I_s and I_r blocks is i−1, whereas H_ℓ contains m+N+1−i copies of the blocks h₁₁, h₁₂, h₂₁ and h₂₂. Now Q_i can be written as a product of s orthogonal matrices,

$$Q_i = H_1 H_2 \cdots H_s.$$

When dealing with H_{s−1} we take in (2.32) the b₀ which has already been transformed by H_s, etc.
Summing up, we have established the following. The effect of an arbitrary Q_i on the (transformed) matrix [Dᵀ | z] is known by only computing, s times successively (ℓ = s,…,1), an (r+1) × (r+1) Householder matrix h_Q and the result of postmultiplying h_Q by, on the one hand, all matrices

$$\begin{bmatrix} (a_j)_{\ell *} \\ b_j \end{bmatrix}, \qquad j = 0,1,\ldots,\min(m,\, m+N-i),$$

and, on the other hand, all vectors

$$\begin{bmatrix} y_{m+N-j,\ell} \\ x_{m+N+1-i-j} \end{bmatrix}, \qquad j = 0,1,\ldots,m+N-i.$$
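The elementary operation (2.32), zeroing a column below a nonzero pivot with an (r+1) × (r+1) Householder matrix, can be sketched as follows; the reflector construction follows the standard recipe, and all names are illustrative:

```python
import numpy as np

def householder(v):
    """Orthogonal H with H v = (±||v||, 0, ..., 0)^T
    (standard construction, cf. Golub & Van Loan, Section 3.3)."""
    w = v.astype(float).copy()
    w[0] += np.sign(w[0] if w[0] != 0 else 1.0) * np.linalg.norm(v)
    return np.eye(len(v)) - 2.0 * np.outer(w, w) / (w @ w)

# (2.32): stack the pivot (a0)_{l,l} on top of column l of b0 and reflect
rng = np.random.default_rng(6)
r = 3
pivot = np.array([2.0])                  # nonzero diagonal element of a0
col = rng.normal(size=r)                 # column l of b0
v = np.concatenate([pivot, col])

hQ = householder(v)                      # (r+1) x (r+1)
out = hQ @ v
assert np.isclose(abs(out[0]), np.linalg.norm(v))   # "*": nonzero scalar
assert np.allclose(out[1:], 0.0)                    # column of b0 zeroed
assert np.allclose(hQ @ hQ.T, np.eye(r + 1))        # orthogonality
```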
Step 2: Ψ̃ [S v; 0 *] = [R u; 0 *].

Now orthogonal s(m+N) × s(m+N) matrices Ψ₁, Ψ₂, …, Ψ_N are chosen with

$$\Psi := \Psi_N \Psi_{N-1} \cdots \Psi_2 \Psi_1$$

and

$$\tilde\Psi := \operatorname{diag}(\Psi, I).$$

[Scheme (2.34): the matrices Ψ₁[S | v] and Ψ₂Ψ₁[S | v]; here S is as defined in (2.30a), the last sm rows of S are annihilated block-column by block-column, and the block-rows of R and the blocks of u build up.]

As a result of the multiplication by Ψ, the last sm rows of S become zero. Proceeding block-column by block-column, we distinguish N multiplications denoted by Ψ₁, Ψ₂, …, Ψ_N. After application of Ψ_i, block-row N+1−i of R and block u_{m+i} of

$$u = \begin{bmatrix} u_{m+N} \\ \vdots \\ u_{m+2} \\ u_{m+1} \end{bmatrix} \in \mathbb{R}^{sN}$$

are known, for i = 1,2,…,N.
1,2, ••• ,N. ~Let us discuss multiplication by ~i' for arbitrary i E {1,2, ••• ,N} in detail. Again, Householder transformations are used to make the columns of
length sm in (2.34) equal to zero. Considering column £of the sm x s block
SN+1,N+1-i
sN+m,N+1-i
we compute an (sm+1) x (sm+1) Householder matrix~ from that column and from the nonzero diagonal element of the corresponding column of
sN+l-i N+l-i (analogous to (2.33)) in order to obtain
(sN+1-i,N+1-i)i,i * 0
h'l' (sN+1,N+1-i)*i (2.35)
(sN+m,N+l-i)*i 0
Again, * denotes a nonzero scalar.
By starting with the last column of the block (i • s in (2.35)) and pro-ceeding with column s-1, etc., we obtain that the sN+l-i N+l-i blocks
'
remain lower-triangular. The size of the nonzero square appearing in the lower part of S (cf. (2. 34)) remains sm x sm, for i = 2, .. .,N-m.
The result of multiplication by ~i is obtained by only computing, s times successively (i
=
s, ... ,2, 1), an (sm+l) x (sm+ 1) Householder matrix~
and the postmultiplication of h: by, on the one hand,(s N+l-1,N+l-:i.-J ,,,* . . ·)" sN+1,N+1-i-j
= 0,1, ••• ,min(m,N-i)
sN+m,N+l-i-j
and, on the other hand,
The transformed blocks retain their names in (2.34).
Since any diagonal block r . . of R in (2.30b) is a transformed s . . block,
:i.,1 :i.,:i.
it is lower-triangular and regular. Therefore, R itself is lower-triangular and regular.
The computation of the object function and its gradient from R and u has been given in (2.24)-(2.27), so the algorithm for computing estimates is now complete.
2.4. Discussion

The following necessary and sufficient conditions concerning the minimization problem (2.13) are well-known from optimization theory, cf. Scales (1985).

Necessary conditions: if the object function J_N has a minimum in θ*, then

$$J_N'(\theta^*) = 0 \quad \text{(stationary point)}, \qquad J_N''(\theta^*) \geq 0.$$

Sufficient conditions for a local minimum: the conditions

$$J_N'(\theta^*) = 0, \qquad J_N''(\theta^*) > 0$$

are sufficient for θ* to be a local minimum point of J_N. Here, z and N are assumed to be fixed.

A computer program based on the algorithm described above is available. In Chapter 8, dealing with a simulation study, we shall discuss numerical results. The conditions above will be verified there.
A Q-R algorithm for an m × n matrix (m ≥ n), based on Householder matrices, takes in general about n²(m − n/3) operations, see Golub and Van Loan (1983) p. 148. In view of the exposition in the previous section, the number of operations to obtain R in (2.30b) from Dᵀ is of order O(N). In Eising, Linssen and Rietbergen (1983), matrix inversion is proposed, giving an algorithm of O(N²) operations.

Our algorithm for obtaining the object function and gradient is of O(N²), because of the computation of u (see the concluding remarks of step 1 in the previous section). However, by somewhat modifying the algorithm, we even obtain one of O(N). In this approach, the vector u is not computed by extending the matrix Dᵀ with z as described in (2.23). From (2.16) it follows that Q₁ = DᵀR⁻¹. Hence, u defined by (2.22) as u = Q₁ᵀz can also be written as u = R^{−T} D z. Once R is determined, u is solved from

$$R^T u = D z, \tag{2.36}$$

which takes only O(N) operations because of the special form of R, cf. (2.30b). Then, as before, J_N and J̇_N are computed from R and u, see (2.24). If N ≫ ps² + (q+1)sr, we solve u from (2.36) rather than computing it from (2.23).
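The equivalence of the two ways of computing u, directly as Q₁ᵀz or by the triangular solve (2.36), can be checked with a generic Q-R decomposition (upper-triangular here, so Rᵀ is lower-triangular):

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

rng = np.random.default_rng(7)
D = rng.normal(size=(4, 9))            # stand-in for the structured D
z = rng.normal(size=9)

Q1, R = qr(D.T, mode='economic')       # D^T = Q1 R
u_direct = Q1.T @ z                    # u = Q1^T z, as in (2.22)
u_solved = solve_triangular(R.T, D @ z, lower=True)   # (2.36): R^T u = D z
assert np.allclose(u_direct, u_solved)
```

For the banded R of (2.30b), the triangular solve costs only O(N) operations, which is the point of the modification.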
The matrix DDᵀ, which is the same for both choices (1.7) and (1.8) for D, is a symmetric, block-Toeplitz band matrix:

$$D D^T = \begin{bmatrix}
d_0 & d_1^T & \cdots & d_m^T & & \\
d_1 & d_0 & \ddots & & \ddots & \\
\vdots & \ddots & \ddots & & & d_m^T \\
d_m & & & & & \vdots \\
 & \ddots & & & & d_1^T \\
 & & d_m & \cdots & d_1 & d_0
\end{bmatrix}, \tag{2.37a}$$

where

$$d_k = \sum_{i=0}^{m-k} \left( \alpha_{i+k} \alpha_i^T + \beta_{i+k} \beta_i^T \right), \qquad k = 0,1,\ldots,m. \tag{2.37b}$$

In the literature, one can find some algorithms for inverting such matrices, cf. Trench (1974) and Meek (1983). The Q-R decomposition introduced here is superior to these algorithms with regard to the number of operations. Moreover, Q-R decomposition is known to be a numerically stable procedure.
If we assume zero initial conditions, as is done in Eising, Linssen and Rietbergen (1983), we can write the model equations as D̄ζ̄ = 0 with the sN × (s+r)N matrix D̄ = [Ā  B̄], where Ā is the sN × sN lower block-triangular Toeplitz band matrix with blocks −I, α₁, …, α_m down its first block column, and B̄ is the sN × rN matrix of the same form with blocks β₀, β₁, …, β_m. The Q-R decomposition of D̄ᵀ is much alike the decomposition of Dᵀ in (2.28), except that the second step drops out now, see Ten Vregelaar (1985).
When the input is scalar (r = 1), we can use Givens transformations instead of Householder transformations in step 1 of Section 2.3.
Finally, we may obtain an initial guess for the parameter vector by assuming equation errors instead of noisy observations, cf. Strejc (1980). It is possible to write

$$y_t = \sum_{i=1}^{p} \alpha_i y_{t-i} + \sum_{j=0}^{q} \beta_j x_{t-j} + \text{error}, \qquad t = m+1,\ldots,m+N,$$

as a linear regression in the unknown parameters, giving θ_initial as its ordinary least squares solution, cf. Goodwin and Payne (1977) p. 83.
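A sketch of this naive (equation-error) least squares initial guess for a scalar model with p = 1, q = 1: the regressors are the observed, noisy lagged outputs and inputs, so the resulting estimate is biased, but it is usually adequate as a starting point. All settings are illustrative:

```python
import numpy as np

rng = np.random.default_rng(8)
T, a1, b0, b1 = 200, 0.7, 1.0, 0.5
xi = rng.normal(size=T)
eta = np.zeros(T)
for t in range(1, T):
    eta[t] = a1 * eta[t-1] + b0 * xi[t] + b1 * xi[t-1]
y = eta + 0.05 * rng.normal(size=T)    # noisy outputs
x = xi + 0.05 * rng.normal(size=T)     # noisy inputs

# regress y_t on (y_{t-1}, x_t, x_{t-1}), ignoring the observation noise
Phi = np.column_stack([y[:-1], x[1:], x[:-1]])
theta0, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
print(theta0)   # rough initial guess for (a1, b0, b1), biased by the noise
```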
See also Chapter 8 for a discussion of the so-called naive least squares approach.
From now on, the noise is assumed to be random.
A minimal requirement for any estimation method is consistency of the estimators: if the number of measurements tends to infinity, estimates for the parameter vector should converge (in a sense to be specified) to the true value.
In Section 3.1, some convergence concepts are introduced for random variables and random functions. Useful results from probability theory are given for later reference.
In Section 3.2 we give formal definitions of least squares estimators and consistency. Some assumptions on parameter space, inputs and model are discussed.
A crucial convergence assumption is the topic of Section 3.3. It has an interpretation for the SISO-model. Then the consistency result follows in Section 3.4 after making an assumption on the noise.
Consistency results for approximate lse methods and the lse method for the special case of no input noise, were given earlier in the papers of Aoki and Yue (1970a, 1970b).
3.1. Preliminaries
In this section, we give some preliminary results from probability theory, which constitute the tools for proving asymptotic properties of the least squares estimators. A mathematically profound treatment of asymptotics of estimators can be found in Bierens (1981) in the context of nonlinear regression models. Some of his results are convenient for us to apply.

We start by introducing two well-known convergence concepts. Let {x_n} be a sequence of random vectors on a probability space {Ω, F, ℙ}.

Definition 3.1
a) The sequence {x_n} converges in probability to a random vector x, notation x_n →p x, if

$$\lim_{n\to\infty} \mathbb{P}\{\omega \in \Omega : \|x_n(\omega) - x(\omega)\| > \varepsilon\} = 0 \quad \text{for every } \varepsilon > 0$$

(for short: lim_{n→∞} ℙ(‖x_n − x‖ > ε) = 0 for every ε > 0).

b) The sequence {x_n} converges almost surely to a random vector x, notation x_n →a.s. x, if

$$\mathbb{P}\{\omega \in \Omega : \lim_{n\to\infty} x_n(\omega) = x(\omega)\} = 1$$

(for short: ℙ(x_n → x) = 1).
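The two concepts can be illustrated (not proved) by simulation: for sample means of iid variables, the probability of a deviation larger than a fixed ε shrinks as n grows, as required in Definition 3.1a. The setup below is illustrative:

```python
import numpy as np

rng = np.random.default_rng(9)
eps, reps = 0.1, 2000

def deviation_prob(n):
    """Monte Carlo estimate of P(|mean of n Uniform(0,1) draws - 0.5| > eps)."""
    means = rng.uniform(0, 1, size=(reps, n)).mean(axis=1)
    return np.mean(np.abs(means - 0.5) > eps)

p10, p1000 = deviation_prob(10), deviation_prob(1000)
assert p1000 <= p10       # deviations become rarer as n grows
assert p1000 < 0.01       # essentially zero for n = 1000
```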
Almost sure convergence is stronger than convergence in probability. However, the following converse holds:

Theorem 3.2
Let x, x₁, x₂, … be random vectors. Then x_n →p x if and only if every subsequence {x_{n_k}} of {x_n} contains a subsequence {x_{n_{k_j}}} converging almost surely to x.

Proof. Tucker (1967) p. 103. □
Theorem 3.3
Let f : ℝ^k → ℝ^ℓ be a Borel-measurable function which is continuous on a Borel set B with ℙ(x ∈ B) = 1. If x, x₁, x₂, … are random vectors in ℝ^k with x_n → x (a.s. or p), then f(x_n) → f(x) (a.s. or p).

Proof. Tucker (1967) p. 104. □
Now, we define the concept of uniform convergence of random functions, see e.g. Bierens (1981) Section 2.3.

Definition 3.4
Let f(θ), f₁(θ), f₂(θ), … be random vector functions on a parameter space Θ ⊂ ℝ^k, i.e. F-measurable functions for every (fixed) θ ∈ Θ, where {Ω, F, ℙ} is the probability space involved.

a) f_n(θ) converges almost surely to f(θ) uniformly on Θ, notation f_n(θ) →a.s. f(θ) uniformly on Θ, if

$$\sup_{\theta \in \Theta} \|f_n(\theta) - f(\theta)\| \xrightarrow{a.s.} 0,$$

or equivalently, there exists a null set N (ℙ(N) = 0) such that for every ω ∈ Ω∖N, f_n(θ,ω) → f(θ,ω) uniformly on Θ holds; see Rudin (1964) p. 134.

b) f_n(θ) converges in probability to f(θ) uniformly on Θ, notation f_n(θ) →p f(θ) uniformly on Θ, if

$$\sup_{\theta \in \Theta} \|f_n(\theta) - f(\theta)\| \xrightarrow{p} 0.$$
Remark
We will consider only random functions for which sup_{θ∈Θ} ‖f_n(θ) − f(θ)‖ is indeed a random variable for n = 1,2,…, in the sense that it is F-measurable.

A useful result is the generalization of Theorem 3.3 for random functions.
Theorem 3.5
Let {f_n(θ)} be a sequence of random vector functions on a Borel subset Θ ⊂ ℝ^k. Furthermore, assume that f(θ) is a random vector function that is a.s. continuous on Θ, i.e. continuous on Θ for all ω ∈ Ω∖N with ℙ(N) = 0. If θ_n and θ are random vectors in Θ a.s., then

a) f_n(θ) →a.s. f(θ) uniformly on Θ and θ_n →a.s. θ imply f_n(θ_n) →a.s. f(θ);

b) f_n(θ) →p f(θ) uniformly on Θ and θ_n →p θ imply f_n(θ_n) →p f(θ).

Proof. Write f_n(θ_n) − f(θ) = (f_n(θ_n) − f(θ_n)) + (f(θ_n) − f(θ)). □

In Section 3.4, the strong consistency of least squares estimators is proved by applying
Theorem 3.6 (Bierens (1981), Lemmas 3.1.3 and 3.1.8)
Let {f_n(θ)} be a sequence of (real) random functions on a compact set Θ. Suppose f_n(θ) → f(θ) (a.s. or p) uniformly on Θ, where f is a continuous non-random function on Θ. If f has a unique minimum θ₀ on Θ, then for any sequence {θ_n}, a.s. in Θ and satisfying f_n(θ_n) = inf_{θ∈Θ} f_n(θ), it follows that θ_n → θ₀ (a.s. or p). □

Remark
In the context of this thesis, a function f(θ,ω) is called non-random if it does not depend on ω.
Uniform convergence on a compact set is obtained from pointwise convergence and equicontinuity, see Lemma 3.9 below.
Definition 3.7 (Rudin (1964) p. 144)
A sequence of real functions {f_n} is equicontinuous on E ⊂ ℝ^k if for every ε > 0 there exists a δ > 0 such that |f_n(x) − f_n(y)| < ε for every n ∈ ℕ whenever ‖x − y‖ < δ for x, y ∈ E.
A sufficient condition for equicontinuity is given in

Lemma 3.8
Let {f_n} be a sequence of differentiable real functions on a convex set E ⊂ ℝ^k for which there exists a positive number c such that

$$\|f_n'(x)\| \leq c \quad \text{for all } n \in \mathbb{N} \text{ and all } x \in E.$$

Then {f_n} is equicontinuous on E.

Proof. Let ε > 0 be arbitrary. For every x, y ∈ E,

$$|f_n(x) - f_n(y)| = |f_n'(z_n)^T (x - y)| \leq c \|x - y\|$$

holds, by application of the mean value theorem (z_n is a convex combination of x and y). Taking δ := ε/c gives the result, according to Definition 3.7. □

Lemma 3.9
If E is a compact subset of ℝ^k and {f_n} is a sequence of equicontinuous functions on E (for sufficiently large n), then f_n(x) → f(x) for all x ∈ E implies f_n → f uniformly on E.

Proof
Dieudonné (1969) p. 142. See also Rudin (1964) p. 156. □
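As a small numerical illustration of Lemmas 3.8 and 3.9 (the example function is our own choice, not from the thesis): f_n(x) = √(x² + 1/n) on [0,1] has derivatives bounded by 1, so it is equicontinuous, and it converges pointwise to f(x) = x; the supremum of |f_n − f| is attained at x = 0 and equals 1/√n.

```python
import numpy as np

# Our own example for Lemmas 3.8 and 3.9: f_n(x) = sqrt(x^2 + 1/n) on [0, 1].
# |f_n'(x)| = x / sqrt(x^2 + 1/n) <= 1, so {f_n} is equicontinuous (Lemma 3.8),
# and f_n -> f pointwise with f(x) = x; Lemma 3.9 then gives uniform convergence.
x = np.linspace(0.0, 1.0, 2001)

def f_n(n, x):
    return np.sqrt(x**2 + 1.0 / n)

sup_err = [np.max(np.abs(f_n(n, x) - x)) for n in (10, 100, 1000, 10000)]
print(sup_err)  # sup |f_n - f| = f_n(0) = 1/sqrt(n), decreasing to 0
```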
Remark
Conversely, if {f_n} is a uniformly convergent sequence of continuous functions on the compact E, then {f_n} is equicontinuous on E, see Rudin (1964) p. 144.
For random functions a similar result can be proved.
Definition 3.10a
A sequence of random vector functions {g_n(θ)} is a.s. bounded, uniformly on a set Θ ⊂ ℝ^k, if there exist a d > 0 and a null set N ⊂ Ω such that for every ω ∈ Ω\N an integer n₀ exists, satisfying
   ‖g_n(θ,ω)‖ ≤ d   for all n > n₀ and for all θ ∈ Θ .
Theorem 3.10b
Suppose {f_n(θ)} is a sequence of real, a.s. differentiable (with respect to θ) random functions on a convex and compact set Θ ⊂ ℝ^k. If {f_n′} is a.s. bounded, uniformly on Θ, then
   f_n(θ) →a.s. 0   for all θ ∈ Θ
implies that
   f_n →a.s. 0 , uniformly on Θ .
Proof
By definition of f_n(θ) →a.s. 0 for all θ ∈ Θ, we have: for every θ ∈ Θ there is a set V₁ with P(V₁) = 1 such that for every ω ∈ V₁ and ε > 0 an integer n₁(ω) exists with |f_n(θ,ω)| < ε/2 for all n > n₁(ω). We rewrite the definition of '{f_n′} is a.s. bounded, uniformly on Θ' as: there exist a d > 0 and a set V₂ with P(V₂) = 1 such that for every ω ∈ V₂ an integer n₂(ω) exists with ‖f_n′(θ,ω)‖ ≤ d for all n > n₂(ω) and all θ ∈ Θ. By virtue of the mean value theorem, we obtain
   |f_n(θ′,ω) − f_n(θ,ω)| ≤ d ‖θ′ − θ‖   for ω ∈ V₂ ∩ V₃ and n > n₂(ω) .
Here, V₃ expresses the a.s. differentiability. Now it follows from the statements above that
   |f_n(θ′,ω)| < ε   for all θ′ ∈ U_θ(ε), all ω ∈ V and all n > n₀(ω) := max(n₁(ω), n₂(ω)) ,   (*)
by defining
   U_θ(ε) := {θ′ ∈ Θ | ‖θ′ − θ‖ < ε/(2d)}   and   V := ∩_{i=1}^3 V_i .
Obviously, Θ ⊂ ∪_{θ∈Θ} U_θ(ε) holds. Since Θ is compact, it follows that there exist a finite k and θ₁,…,θ_k ∈ Θ such that Θ ⊂ ∪_{i=1}^k U_i, where U_i := U_{θ_i}(ε). Let θ_i, U_i, V_i and n_{0,i} correspond to θ, U, V, n₀, respectively, for i = 1,…,k in the sense of (*). Then, choosing θ ∈ {θ₁,…,θ_k} in (*), one obtains
   sup_{θ′∈Θ} |f_n(θ′,ω)| < ε   for all ω ∈ V and n > n₀(ω) ,   (**)
by defining
   V := ∩_{i=1}^k V_i   and   n₀ := max_{i=1,…,k} n_{0,i} .
Taking ε = 1/m with m ∈ ℕ and using (**), we find corresponding sets V(m) with P(V(m)) = 1. Introduction of W := ∩_{m=1}^∞ V(m), so that P(W) = 1, gives
   sup_{θ′∈Θ} |f_n(θ′,ω)| < 1/m   for all ω ∈ W and n sufficiently large, m = 1,2,… .
This is the definition of f_n →a.s. 0, uniformly on Θ, see Definition 3.4a (for arbitrary ε > 0, take m = ⌊1/ε⌋ + 1). □

By introducing uniform Cauchy sequences, one can extend the result to non-zero random limit functions for f_n.
A third convergence concept, convergence in distribution, is discussed in Chapter 4.
Lemma 3.11 (Chebyshev's inequality, see Tucker (1967) p. 39)
Let x denote a random variable, with E|x|^t < ∞ for some t > 0. Then, for every ε > 0,
   P(|x| ≥ ε) ≤ E|x|^t / ε^t .
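A quick Monte Carlo check of Lemma 3.11 (our own example; the choices t = 2 and a standard normal variable are arbitrary):

```python
import numpy as np

# Monte Carlo check of Chebyshev's inequality P(|x| >= eps) <= E|x|^t / eps^t,
# for t = 2 and a standard normal variable (example choices, not from the text).
rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
t, eps = 2, 2.0
lhs = np.mean(np.abs(x) >= eps)            # empirical P(|x| >= eps)
rhs = np.mean(np.abs(x)**t) / eps**t       # empirical E|x|^t / eps^t
print(lhs, rhs)                            # the bound holds with plenty of slack here
```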
Lemma 3.12
Suppose x, x₁, x₂, … are random variables. If for every ε > 0
   Σ_{n=1}^∞ P(|x_n − x| > ε) < ∞ ,
then x_n →a.s. x.

Lemma 3.13 (Strong law of large numbers, see Tucker (1967) p. 124)
Let x₁, x₂, … be independent random variables with means μ₁, μ₂, … and variances σ₁², σ₂², … . If
   Σ_{i=1}^∞ σ_i² / i² < ∞ ,
then
   (1/n) Σ_{i=1}^n (x_i − μ_i) →a.s. 0 .
3.2. Random noise

We return to our estimation problem. Recalling the previous section, we assume from now on that the noise vector e, defined in (1.6), is a sequence of random variables on a probability space {Ω,F,P}. Moreover, the noise is supposed to satisfy
   E e = 0 ,   (3.1a)
   the (s+r)(m+N) scalar components of e are assumed to be independent,   (3.1b)
   var e = σ² I , with σ² unknown.   (3.1c)
Condition (3.1b) says that all noise terms are independent, so independence in time and place holds; it implies a diagonal variance-covariance matrix var e.
Generalizations of (3.1c) will be discussed in Section 3.4. The object function defined by (2.4) is a random function now. For any realization of the random vector z, i.e. z(ω) = ζ + e(ω) for any ω ∈ Ω, we obtain least squares estimates for the parameters by minimizing J_N(θ) = z(ω)^T P(θ) z(ω) as described in the previous chapter. For convenience, the factor 1/sN is introduced in the object function:
   J_N(θ) := (1/sN) z^T P(θ) z .   (3.2)
The parameter matrix θ, defined in (1.3), can be seen as a vector in ℝ^μ after rearranging, see (3.6). Here,
   μ := ps² + (q+1)sr   (3.3)
is the total number of unknown scalar parameters. The unknown true value of the parameter vector is denoted by θ₀. In order to be able to prove the consistency property for the least squares estimators to be defined below, we make 5 assumptions. Three of them are stated in this section. The first assumption guarantees the existence of a minimizing solution.
Assumption 3.1
The parameter space Θ is a known compact and convex subset of ℝ^μ, containing θ₀.

The convexity of Θ enables us to use Taylor expansions.
Now, least squares estimators are defined in accordance with the introduction of the lse method in Chapters 1 and 2 (cf. formula (2.4)).

Definition 3.14
A sequence {θ̂_N} is called a (sequence of) least squares estimator(s) for θ₀ if θ̂_N is a minimizing solution:
   J_N(θ̂_N) = min_{θ∈Θ} J_N(θ) .   (3.4)

As noticed already in the previous chapter, J_N is (a.s.) a C^∞-function with respect to θ. Hence it is continuous, so the minimization problem is well-defined. Furthermore, due to the compactness of Θ and the continuity of J_N on Θ, θ̂_N can be chosen to be random in the sense of the probability space definition, see Jennrich (1969).

Definition 3.15
The sequence of estimators {θ̂_N} is said to be (strongly) consistent for the true value parameter vector θ₀, if
   θ̂_N →a.s. θ₀   as N → ∞ .
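Definition 3.14 can be made concrete in a toy instance of this framework (our own construction, with s = r = 1, m = 0 and a static constraint η_t = θ₀ξ_t instead of the dynamic model of this chapter): taking D(θ) = (I | −θI) gives DD^T = (1+θ²)I, so the objective z^T P(θ) z / N reduces to the orthogonal-regression criterion, whose minimizer over a compact Θ containing θ₀ behaves consistently when input and output noise have equal variance.

```python
import numpy as np

# Toy instance of (3.4) (our illustration, not the thesis model): constraint
# D(theta) = [I | -theta I], so P(theta) = D'(DD')^{-1}D and
# J_N(theta) = z' P(theta) z / N = ||y - theta*x||^2 / (N * (1 + theta^2)).
rng = np.random.default_rng(2)
theta0, sigma, N = 0.7, 0.3, 50_000
xi = rng.uniform(-2.0, 2.0, N)              # true inputs
eta = theta0 * xi                           # true outputs: D(theta0) zeta = 0
x = xi + sigma * rng.standard_normal(N)     # noisy observed inputs
y = eta + sigma * rng.standard_normal(N)    # noisy observed outputs

def J(theta):
    return np.sum((y - theta * x)**2) / (N * (1.0 + theta**2))

grid = np.linspace(-2.0, 2.0, 4001)         # compact parameter space Theta
theta_hat = grid[np.argmin([J(t) for t in grid])]
print(theta_hat)                            # close to theta0 = 0.7 for large N
```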
In order to prove consistency, Dζ ∈ ℝ^{sN}, which is the same for both representations (1.7) and (1.8), is rewritten as
   Dζ = (H + K) θ − (η_{m+1}^T, …, η_{m+N}^T)^T ,   (3.5)
where θ ∈ ℝ^μ is redefined by rearranging the parameter matrix into a vector, see (3.6). Furthermore, H and K are sN × μ matrices:
   H := ((S⊗I_s)η̃ ⋯ (S^p⊗I_s)η̃  ξ̃  (S⊗I_s)ξ̃ ⋯ (S^q⊗I_s)ξ̃) ,   (3.7)
and K, which collects the initial values (blocks of the form I_s⊗η_i^T, i = m,…,m+1−p, and I_s⊗ξ_i^T, i = m,…,m+1−q, lost by the shifts in (3.7)), contains zero blocks of sizes s(N−p) × ps² and s(N−q) × (q+1)sr below its first block rows.   (3.8)
In (3.7), S, η̃ and ξ̃ are defined as
   the N × N shift matrix S ,   (3.9a)
   the sN × s² matrix η̃, obtained by stacking the blocks I_s⊗η_t^T, t = m+1,…,m+N ,   (3.9b)
   the sN × sr matrix ξ̃, obtained by stacking the blocks I_s⊗ξ_t^T, t = m+1,…,m+N ,   (3.9c)
respectively. It follows from Dζ = 0 for θ = θ₀ and (3.5) that
   Dζ = (H + K)(θ − θ₀) .   (3.10)
Hence
   ζ^T P(θ) ζ = (θ − θ₀)^T (H + K)^T (DD^T)^{-1} (H + K)(θ − θ₀) .   (3.11)
The matrix DD^T in (3.11) is the same for both choices (1.7) and (1.8) for D, cf. (2.37).
A usual assumption is the stability of the system. To this extent, we define the polynomial matrices
   A(λ) := −I + Σ_{i=1}^p α_i λ^i ,   (3.12a)
   B(λ) := Σ_{j=0}^q β_j λ^j .   (3.12b)
If λ represents the shift-back operator (λη_t = η_{t−1}), (1.1a) can be written as
   A(λ) η_t + B(λ) ξ_t = 0 .   (3.13)

Assumption 3.2
For all θ ∈ Θ, A(λ) is stable, i.e. det A(λ) has all zeros outside the closed unit disk.
Definition 3.16
The spectral radius of the matrix φ in (1.10), denoted by r(φ), is
   r(φ) := max {|λ| ; λ ∈ σ(φ)} .
The next lemma is a consequence of Assumptions 3.1 and 3.2.

Lemma 3.17
   r̄ := max_{θ∈Θ} r(φ) < 1 .

Proof
The characteristic polynomial of φ is
   λ^{s(m−p)} det (I_s λ^p − α₁ λ^{p−1} − ⋯ − α_{p−1} λ − α_p)
because of the block-companion form of φ^T. For λ ≠ 0,
   det (I_s λ^p − α₁ λ^{p−1} − ⋯ − α_p) = λ^{ps} det (−A(1/λ))
holds, hence the nonzero eigenvalues of φ coincide with the reciprocals of the zeros of det A(λ). For any fixed θ ∈ Θ, r(φ) < 1 is now obvious from Assumption 3.2. The proof is completed by noticing that r(φ) depends continuously on θ on the compact Θ. □

Remark
By definition, matrix φ is stable if r(φ) < 1.
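The reciprocal-eigenvalue argument in the proof can be checked numerically for s = 1, p = 2 (the coefficient values here are our own example):

```python
import numpy as np

# Scalar check (s = 1, p = 2; example values) of the proof of Lemma 3.17:
# the nonzero eigenvalues of the companion matrix phi coincide with the
# reciprocals of the zeros of det A(l), A(l) = -1 + a1*l + a2*l^2.
a1, a2 = 0.5, 0.2
phi = np.array([[a1, a2],
                [1.0, 0.0]])
eigs = np.sort_complex(np.linalg.eigvals(phi))
roots = np.roots([a2, a1, -1.0])          # zeros of A(lambda)
recips = np.sort_complex(1.0 / roots)
print(eigs, recips)                       # the two sets coincide
spectral_radius = np.abs(eigs).max()      # < 1 iff all zeros of A lie outside the unit disk
print(spectral_radius)
```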
Another consequence of the assumptions is the next lemma, which is important for the sequel. We introduce matrices A and B (of size sN × sN and sN × rN, respectively) associated with the polynomial matrices (3.12):
   A := Σ_{k=0}^p S^k ⊗ α_k ,   B := Σ_{k=0}^q S^k ⊗ β_k ,   (3.15)
where S is defined in (3.9a) and α₀ := −I as before.
Lemma 3.18
The matrix AA^T is 'bounded' in the sense that there exist some numbers ρ₁ and ρ₂ with 0 < ρ₁ < ρ₂ < ∞, such that
   ρ₁ I_{sN} ≤ AA^T ≤ ρ₂ I_{sN}
for all θ ∈ Θ and N ≥ p+1.
Proof
A sketch of the proof for the case s = 1 can be found in Aoki and Yue (1970b), appendix B. We deal with the general case: s is an arbitrary and fixed positive integer. The stability Assumption 3.2 is used to obtain the 'lower bound' ρ₁ I_{sN}.

1. AA^T ≤ ρ₂ I_{sN}.
For any vector x ∈ ℝ^{sN},
   x^T AA^T x = ‖A^T x‖²
holds. From (3.15) it follows
   A^T x = −x + Σ_{k=1}^p ((S^k)^T ⊗ α_k^T) x ,
hence
   ‖A^T x‖ ≤ (1 + Σ_{k=1}^p ‖α_k‖) ‖x‖ .
Define
   ρ₂ := max_{θ∈Θ} (1 + Σ_{k=1}^p ‖α_k‖)² ,
which exists, since it is defined as the maximum of a continuous function on a compact set.

2. AA^T ≥ ρ₁ I_{sN}.
The constant ρ₁ will be defined such that (AA^T)^{-1} ≤ (1/ρ₁) I. For the vector x considered above we have now
   x^T (AA^T)^{-1} x = ‖A^{-1} x‖² .
It is easy to show that
   A^{-1} = Σ_{k=0}^{N−1} S^k ⊗ g_k ,
with g₀ = −I and
   g_k = Σ_{i=1}^{min(k,p)} α_i g_{k−i} ,   k = 1,2,…,N−1 .
Then
   ‖A^{-1} x‖ ≤ (1 + Σ_{k=1}^{N−1} ‖g_k‖) ‖x‖ .   (3.16)
Define for k = p, p+1, … the ps × s matrix
   G_k := (g_k^T  g_{k−1}^T  ⋯  g_{k−(p−1)}^T)^T .
We obtain
   G_{k+1} = ψ G_k ,   k = p, p+1, … ,   (3.17)
where ψ is the ps × ps block-companion matrix with first block row (α₁ ⋯ α_{p−1} α_p) and (I_{(p−1)s} 0) below, with characteristic polynomial
   p_ψ(λ) = λ^{ps} det (−A(1/λ))   for λ ≠ 0 ,
cf. Chen (1985) p. 49. The spectral radius of ψ, r(ψ), is equal to that of φ according to the proof of Lemma 3.17, so Lemma 3.17 implies
   r̄ := max_{θ∈Θ} r(ψ) < 1 .
Define γ := (r̄+1)/2 < 1 and write
   ψ^k = (1/2πi) ∮_C (zI − ψ)^{-1} z^k dz ,   k = 0,1,2,… ,   (3.18)
where C is the circle {z ∈ ℂ | |z| = γ}. Consequently,
   ‖ψ^k‖ ≤ M γ^{k+1}   for some M > 0, all k = 0,1,… and all θ ∈ Θ ,   (3.19)
where M can be defined as
   M := max_{(z,θ)∈C×Θ} f(z,θ) ,
since f(z,θ) := ‖(zI − ψ)^{-1}‖ is a continuous function of its arguments z and θ on the compact set C × Θ.
Let us return to (3.16), which is rewritten as
   ‖A^{-1} x‖ ≤ (1 + Σ_{k=1}^{p−1} ‖g_k‖ + Σ_{k=p}^{N−1} ‖g_k‖) ‖x‖ ,
since the number ρ₁ may not depend on N. From (3.17),
   G_{p+k} = ψ^k G_p ,   k = 0,1,2,… ,
holds, with
   ‖G_{p+k}‖ ≤ M γ^{k+1} ‖G_p‖
by virtue of (3.19). Via ‖g_k‖ ≤ ‖G_k‖, k ≥ p, this yields
   ‖A^{-1} x‖ ≤ (1 + Σ_{k=1}^{p−1} ‖g_k‖ + ‖G_p‖ M γ/(1−γ)) ‖x‖ .
Therefore
   ρ₁ := [max_{θ∈Θ} (1 + Σ_{k=1}^{p−1} ‖g_k‖ + ‖G_p‖ M γ/(1−γ))]^{-2} > 0 .
Again, the maximum exists since the function involved depends continuously on θ. □
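The bounds of Lemma 3.18 can be observed numerically for s = 1, p = 2 (our own stable coefficients): the extreme eigenvalues of AA^T stay within a fixed interval as N grows.

```python
import numpy as np

# Numerical check of Lemma 3.18 for s = 1, p = 2 (example coefficients, stable):
# A = sum_k alpha_k * S^k with alpha_0 = -1 and S the N x N shift matrix.
alpha = {0: -1.0, 1: 0.5, 2: 0.2}     # A(l) = -1 + 0.5 l + 0.2 l^2, zeros outside unit disk

def build_A(N):
    S = np.eye(N, k=-1)               # shift matrix
    return sum(a * np.linalg.matrix_power(S, k) for k, a in alpha.items())

mins, maxs = [], []
for N in (10, 40, 160):
    w = np.linalg.eigvalsh(build_A(N) @ build_A(N).T)
    mins.append(w.min())
    maxs.append(w.max())
print(mins, maxs)  # bounded away from 0 and from infinity, uniformly in N
```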
Corollary 3.19
Let ρ₁ and ρ₂ be the numbers defined in the proof of Lemma 3.18. There exist numbers ρ₃, ρ₄ such that for all θ ∈ Θ and N ≥ p+1
   ρ₃ I_{sN} ≤ AA^T + BB^T ≤ ρ₄ I_{sN}   and   ρ₃ I_{sN} ≤ DD^T ≤ ρ₄ I_{sN} ,
and hence
   (1/ρ₄) I_{sN} ≤ (DD^T)^{-1} ≤ (1/ρ₃) I_{sN}   (3.20)
hold.

Proof
The bounds for AA^T + BB^T are obvious, since the upper bound for BB^T is obtained analogously to that of AA^T. From
   DD^T = AA^T + BB^T + ΓΓ^T ,   (3.21a)
with Γ the sN × (s+r)m matrix containing the blocks Y₁, …, Y_m and zeros elsewhere,   (3.21b)
the second statement follows, which implies immediately (3.20). □
Especially (3.20) turns out to be a useful property of the matrix (DD^T)^{-1}, whose entries cannot be given in closed form in general. For later reference (Chapter 5), another property is stated here already.
By definition, ‖M‖_∞ := max_{i=1,…,n} Σ_{j=1}^n |M_{ij}| for any n × n matrix M.
Lemma 3.20
There exists some positive number ρ, such that
   ‖(DD^T)^{-1}‖_∞ ≤ ρ
for all θ ∈ Θ and N ≥ μ/s.
Proof
Matrix DD^T is a block-Toeplitz and band matrix, which is positive definite for all θ ∈ Θ and N ≥ μ/s, see (2.37) and Corollary 3.19. Thereby, it can be interpreted as the covariance matrix Σ_MA of an s-variate Moving Average (with order m) process (up to a scalar):
   y_t = Σ_{i=0}^m b_i ε_{t−i} .
Here, b₀,…,b_m are s × s matrices and {ε_t} is s-variate white noise with var ε_t = σ² I, see Anderson (1971) Section 5.7.1 for the SISO-model. Moreover, if B(λ) = Σ_{j=0}^m b_j λ^j, then det B(λ) ≠ 0 for |λ| ≤ 1, due to the positive definiteness of DD^T.
Now, consider the corresponding Autoregressive process
   Σ_{i=0}^m b_i x_{t−i} = ε_t
with covariance matrix Σ_AR of (x₁,…,x_N); then Σ_MA and Σ_AR^{-1} only differ in the components in the two ms × ms submatrices located at both endpoints of the main diagonal, see Wise (1955) or Mentz (1976). Compare also Shaman (1975). Therefore, for N ≫ m, we may approximate (DD^T)^{-1} = (Σ_MA)^{-1} by Σ_AR. According to Brockwell and Davis (1987) p. 85, the Autoregressive process is causal. Therefore (Problem 11.5 p. 444 by the same authors), if Σ_AR is the block-Toeplitz matrix with s × s blocks γ(i−j), i.e.
   Σ_AR = (γ(i−j))_{i,j=1,…,N} ,   γ(h): s × s ,
then there exist a number ε ∈ (0,1) and a constant K, such that for all i, j, h and θ
   |γ_{ij}(h)| ≤ K ε^{|h|} .
Consequently,
   ‖Σ_AR‖_∞ ≤ 2sK / (1−ε) =: ρ . □
Remark 1
For the special case m = s = 1, DD^T is tridiagonal. We can give explicit expressions for ((DD^T)^{-1})_{i,j}, i ≤ j, in terms of
   x := (1/(2κ)) (−1 + √(1 − 4κ²)) ,
where κ is the ratio of off-diagonal to diagonal entries of DD^T, cf. Mentz (1976) p. 430 and (2.37b). The positive definiteness of DD^T for all N implies |κ| < ½, hence |x| < 1. This yields easily a bound of the form
   ‖(DD^T)^{-1}‖_∞ ≤ const / ((1 − x²)(1 − |x|)) ,
with the constant determined by (2.37b).
Remark 2
The result (DD^T)^{-1} ≤ ρ I is a consequence of Lemma 3.20, since
   max_i |λ_i| ≤ ‖(DD^T)^{-1}‖_∞ ,
where λ_i, i = 1,…,sN, are the eigenvalues of (DD^T)^{-1}, see Wilkinson (1965) p. 59. However, the proof of Lemma 3.20 relies upon DD^T ≥ ρ₁ I.
Next, the input sequence {ξ_i}_{i=1}^∞ is assumed to be bounded, formally:

Assumption 3.3
There exists an M₁ < ∞ such that ‖ξ_i‖ ≤ M₁ for i = 1,2,… .

Corollary 3.21
There exists an M₂ < ∞ such that ‖η_i‖ ≤ M₂ for i = 1,2,… .

Proof
The proof is similar to that of the well-known BIBO-stability result, cf. Zadeh and Desoer (1963) p. 483. From the state-space description (1.10), (1.11), we obtain
   η_t = h φ^{t−1} x₁ + Σ_{i=1}^{t−1} h φ^{t−1−i} b ξ_i + β₀ ξ_t ,   t = 1,2,… ,   (3.22)
hence
   ‖η_t‖ ≤ ‖h‖ ‖φ^{t−1}‖ ‖x₁‖ + ‖h‖ ‖b‖ M₁ Σ_{i=1}^{t−1} ‖φ^{t−1−i}‖ + ‖β₀‖ M₁ ,   t = 1,2,… .
Since r̄ < 1 (Lemma 3.17) it follows, as in the proof of Lemma 3.18, that
   ‖φ^t‖ ≤ M γ^{t+1}   for some M > 0, all t = 0,1,… and all θ ∈ Θ, where γ := (r̄+1)/2 < 1 ,
so ‖φ^{t−1}‖ → 0 (t → ∞) and
   Σ_{i=1}^{t−1} ‖φ^{t−1−i}‖ = Σ_{i=0}^{t−2} ‖φ^i‖ ≤ M Σ_{i=0}^∞ γ^{i+1} = M γ/(1−γ) . □
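The BIBO argument of Corollary 3.21 can be simulated for a scalar example (coefficients are our own choice, with A(λ) = −1 + 0.5λ + 0.2λ² stable): a bounded input produces a bounded output, and ‖φ^t‖ decays geometrically.

```python
import numpy as np

# Scalar example of Corollary 3.21 (our own values):
# eta_t = a1*eta_{t-1} + a2*eta_{t-2} + b0*xi_t with |xi_t| <= M1 = 1.
rng = np.random.default_rng(3)
a1, a2, b0 = 0.5, 0.2, 1.0
T = 5000
xi = rng.uniform(-1.0, 1.0, T)            # bounded input, M1 = 1
eta = np.zeros(T)
for t in range(T):
    # zero initial conditions: eta[-1], eta[-2] are still 0 when read at t = 0, 1
    eta[t] = a1 * eta[t-1] + a2 * eta[t-2] + b0 * xi[t]
print(np.abs(eta).max())                  # bounded (here by b0*M1/(1-a1-a2) = 10/3)

phi = np.array([[a1, a2], [1.0, 0.0]])
powers = [np.linalg.norm(np.linalg.matrix_power(phi, t), 2) for t in (5, 10, 20, 40)]
print(powers)                             # geometric decay since r(phi) < 1
```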
3.3. Identifiability

For the true but unknown inputs and outputs, collected in ζ, see (1.7) or (1.8), we have
   Dζ = 0 ,
or equivalently, see (3.5),
   (η_{m+1}^T, …, η_{m+N}^T)^T = (H + K) θ₀ .
The true value of the parameter vector θ₀ is uniquely determined from ζ if and only if (H+K)^T(H+K) is regular. A necessary condition for the latter property is N ≥ μ/s.
Asymptotically, the identifiability assumption below makes sure that θ₀ is uniquely determined.

Assumption 3.4
The μ × μ matrix
   (1/sN) H^T (DD^T)^{-1} H
converges as N → ∞, for all θ ∈ Θ. The limit matrix, say G, is positive definite on Θ.

Now, we can prove the next lemma, which is one of the tools for proving consistency.
Lemma 3.22
E J_N(θ) → J(θ) as N → ∞, for all θ ∈ Θ, where
   J(θ) := σ² + (θ − θ₀)^T G(θ) (θ − θ₀) .

Proof
From E z = ζ and var z = σ² I we obtain
   E z^T P z = tr (P var z) + (E z)^T P (E z) = sN σ² + ζ^T P ζ ,   (3.23)
from a result to be proved in the next chapter, see (4.9). Hence, using (3.11),
   E J_N(θ) = σ² + (1/sN)(θ − θ₀)^T (H + K)^T (DD^T)^{-1} (H + K)(θ − θ₀) .   (3.24)
As N → ∞, the effect of K vanishes since it has a finite number of nonzero elements, see (3.8):
   (1/N) ‖K‖² → 0   as N → ∞, for all θ ∈ Θ ,
and, with x := θ − θ₀,
   (1/sN) |x^T K^T (DD^T)^{-1} (H + K) x + x^T H^T (DD^T)^{-1} K x| → 0
for all θ ∈ Θ, using (3.20), the Cauchy-Schwarz inequality and Assumption 3.4. Then the lemma follows from Assumption 3.4 again. □
Before proving consistency we give an interpretation of the identifiability Assumption 3.4 for the SISO-case s = r = 1. It is similar to that given in Aoki and Yue (1970a). Definitions (3.6) and (3.9) reduce to
   η̃ = (η_{m+1}, …, η_{m+N})^T   and   ξ̃ = (ξ_{m+1}, …, ξ_{m+N})^T .
Let us rewrite Dζ, which is the same for both representations (1.7) and (1.8), as
   Dζ = A η̃ + B ξ̃ + v ,   (3.25)
where A and B are the N × N band Toeplitz matrices of (3.15) with scalar entries −1, α₁,…,α_p and β₀,…,β_q, respectively,   (3.26)
and v is the vector containing the initial-value terms, built from α₁,…,α_p, β₁,…,β_q and η_m,…,η_{m+1−p}, ξ_m,…,ξ_{m+1−q}.
For θ = θ₀ (notation: superscript ⁰), A⁰η̃ = −B⁰ξ̃ − v⁰ holds. Therefore, since A S^k = S^k A, k = 1,2,…,m, (3.7) implies
   A⁰ H = Ξ F⁰ + w⁰ ,   (3.27)
where w⁰ := −(S v⁰ ⋯ S^p v⁰ 0 ⋯ 0), with
   Ξ := (ξ̃  S ξ̃  ⋯  S^{p+q} ξ̃) ,   N × (p+q+1) ,   (3.28)
and F⁰ the (p+q+1) × (p+q+1) Sylvester-type matrix formed from the coefficients −β₀⁰,…,−β_q⁰ and −1, α₁⁰,…,α_p⁰ of B⁰(λ) and A⁰(λ).   (3.29)
The effect of w⁰ in (1/N) H^T (DD^T)^{-1} H vanishes, hence
   lim_{N→∞} (1/N) H^T (DD^T)^{-1} H = lim_{N→∞} (1/N) F⁰^T Ξ^T ((A⁰)^{-1})^T (DD^T)^{-1} (A⁰)^{-1} Ξ F⁰ .
From Lemma 3.18 and (3.20) we see that the matrices (A⁰(A⁰)^T)^{-1} and (DD^T)^{-1} can be 'bounded'. Hence, provided the existence of the limits, G is positive definite if and only if two conditions hold:
   (i) lim_{N→∞} (1/N) Ξ^T Ξ > 0 ,
   (ii) F⁰ is regular.
We interpret Condition (i) as the input sequence {ξ_i}_{i=1}^∞ being persistently exciting of order p+q. Condition (ii) is equal to: the polynomials
   A(λ) = −1 + Σ_{i=1}^p α_i λ^i   and   B(λ) = Σ_{j=0}^q β_j λ^j
are relatively prime and B(λ) ≢ 0, see Wolovich (1974) p. 234-236. The former expresses the controllability of the pair (φ,b) in (1.10).
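Conditions (i) and (ii) can be checked numerically for s = r = 1, p = q = 1 (the coefficient values a1, b0, b1 below are our own): a white-noise input is persistently exciting, and coprimeness of A and B shows up as a nonzero resultant; for two degree-1 polynomials the Sylvester matrix is 2 × 2.

```python
import numpy as np

# Our own check of conditions (i) and (ii) for s = r = 1, p = q = 1.
# (i) persistent excitation: Xi = (xi, S xi, S^2 xi) has Xi'Xi/N -> positive definite.
rng = np.random.default_rng(4)
N = 100_000
xi = rng.standard_normal(N)               # white-noise input
Xi = np.column_stack([xi, np.roll(xi, 1), np.roll(xi, 2)])
Xi[0, 1] = 0.0                            # shift matrix S, not a circular shift
Xi[:2, 2] = 0.0
min_eig = np.linalg.eigvalsh(Xi.T @ Xi / N).min()
print(min_eig)                            # bounded away from zero

# (ii) coprimeness of A(l) = -1 + a1*l and B(l) = b0 + b1*l via the resultant.
a1, b0, b1 = 0.5, 1.0, 0.3
R = np.array([[a1, -1.0],
              [b1, b0]])                  # Sylvester matrix of the two polynomials
res = np.linalg.det(R)
print(res)                                # nonzero iff A and B have no common zero
```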
For the MIMO-case, this interpretation seems not to be straightforward. We can obtain
   A H = ((S⊗I_s) A η̃ ⋯ (S^p⊗I_s) A η̃  A ξ̃  (S⊗I_s) A ξ̃ ⋯ (S^q⊗I_s) A ξ̃)
and define v⁰ := −(A⁰ η̃ + B⁰ ξ̃). Unlike the SISO-case, the effect of v⁰ and the corresponding w⁰ cannot be neglected now.

3.4. Main result

In this section we prove the strong consistency of any sequence {θ̂_N} defined by (3.4). Apart from (3.1), we impose a condition on the noise process.
Assumption 3.5
Let e_i denote (scalar) component i of the noise vector e in (1.6); then
   E e_i⁴ ≤ M₃ < ∞   for all i .
Let us start by proving the useful

Lemma 3.23
As before, let Ṗ denote ∂P/∂θ_i for any element θ_i of θ. Then, there exists a number k₁ > 0 such that
   ‖Ṗ‖ ≤ √k₁
and hence
   |x^T Ṗ x| ≤ √k₁ x^T x   for all x ∈ ℝ^{(s+r)(m+N)} ,
for all θ ∈ Θ and N ≥ μ/s.

Proof
Since (for both choices (1.7) and (1.8) for D) the entries of D and Ḋ are bounded, uniformly in N and θ,   (3.30)
the first result follows easily from (3.20) and an evident 'upper bound' for D^T D. Furthermore, for any x ∈ ℝ^{(s+r)(m+N)},
   |x^T Ṗ x| ≤ ‖x‖ ‖Ṗ x‖ ≤ √k₁ x^T x ,
which proves the second part of the claim. □
Now, the uniform convergence of E J_N is obtained, where uniform convergence refers to convergence as N → ∞, uniformly with respect to θ on the compact Θ.

Proposition 3.24
E J_N(θ) → J(θ) , uniformly.

Proof
First, E J_N = σ² + ζ^T P ζ / sN (see (3.24)) implies
   (∂/∂θ_i) E J_N = ζ^T Ṗ ζ / sN   (Ṗ as in Lemma 3.23).
Then, by virtue of Lemma 3.23,
   |(∂/∂θ_i) E J_N| ≤ √k₁ ζ^T ζ / sN .
Due to Assumption 3.3 and Corollary 3.21 we have
   ζ^T ζ / sN ≤ ((N+m)/sN)(M₁² + M₂²) .   (3.31)
Hence, a positive constant c exists such that ‖(d/dθ) E J_N(θ)‖ ≤ c for all θ ∈ Θ and N ≥ μ/s. By application of Lemma 3.8, {E J_N}_{N=μ/s}^∞ is equicontinuous on Θ. The pointwise convergence of E J_N (see Lemma 3.22) implies uniform convergence, cf. Lemma 3.9. □
Remark
The uniform limit J is continuous now and, since Θ is compact, J is even uniformly continuous on Θ (cf. Rudin (1964) p. 78).

We define the random function
   L_N(θ) := J_N(θ) − E J_N(θ) .   (3.32)
Lemma 3.25
   L_N(θ) →a.s. 0 , for all θ ∈ Θ .

Proof
From (3.2) and z = ζ + e we obtain
   L_N(θ) = (2/sN) e^T P(θ) ζ + (1/sN)(e^T P(θ) e − σ² sN) .   (3.33)
Due to Lemma 4.21a there exist some constants c₁ and c₂ such that
   E (e^T P ζ)⁴ ≤ c₁ ‖P ζ‖⁴   (3.34a)
and
   var (e^T P e) ≤ c₂ M₃ ‖P‖² ,   (3.34b)
using Assumption 3.5.
In view of Lemma 3.11, from ‖Pζ‖ ≤ ‖ζ‖, (3.31) and (3.34a) it follows that
   P(|(2/sN) e^T P ζ| ≥ ε) = O(1/N²)
for any ε > 0. Therefore
   Σ_{N=1}^∞ P(|(2/sN) e^T P ζ| ≥ ε) < ∞   for all ε > 0 .
Lemma 3.12 yields
   (2/sN) e^T P(θ) ζ →a.s. 0   for all θ ∈ Θ .
By virtue of Lemma 3.11, ‖P‖² = tr P = sN, and (3.34b) we obtain for the remaining terms in the right-hand side of (3.33) that
   P(|e^T P e / sN − σ²| ≥ ε) ≤ c₂ M₃ / (ε² sN)   for all ε > 0 .
Due to Corollary 3 in Varberg (1968), P/sN ≥ 0 and tr P/sN = 1 imply that
   e^T P(θ) e / sN →a.s. σ²   for all θ ∈ Θ . □

Lemma 3.26
The sequence {L̇_N} is a.s. bounded, uniformly on Θ.
Proof
For a definition of this notion, see Definition 3.10a. If ˙ denotes ∂/∂θ_i again, then
   L̇_N(θ) = (2/sN) e^T Ṗ ζ + (1/sN) e^T Ṗ e .   (3.35)
As a consequence of Lemma 3.23,
   |e^T Ṗ e| / sN ≤ √k₁ e^T e / sN
holds, with
   e^T e / sN →a.s. ((s+r)/s) σ²
by application of Lemma 3.13. Here, the condition of that lemma is satisfied due to Assumption 3.5. Furthermore,
   |e^T Ṗ ζ| / sN ≤ √k₁ (ζ^T ζ / sN)^{1/2} (e^T e / sN)^{1/2} ≤ √k₁ (((N+m)/sN)(M₁² + M₂²))^{1/2} (e^T e / sN)^{1/2} ,   (3.36)
using the Cauchy-Schwarz inequality, Lemma 3.23 and (3.31).
Hence, |L̇_N| and therefore ‖L̇_N‖ are bounded by an a.s. convergent random variable not depending on θ, with non-random limit. □

As an implication of Lemmas 3.25 and 3.26 we obtain

Proposition 3.27
   L_N(θ) →a.s. 0 , uniformly.

Proof
Application of Theorem 3.10b. See also Ten Vregelaar (1988a). □
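The key step in the proof of Lemma 3.25 — that e^T P e / sN settles at σ² for projection matrices P with tr P = sN — can be illustrated by a small Monte Carlo (our own construction, using a random k-dimensional projection in place of the model's P(θ)):

```python
import numpy as np

# Our own Monte Carlo for the key step of Lemma 3.25: if P is an orthogonal
# projection with tr P = k and e has i.i.d. components with variance sigma^2,
# then e'Pe / k concentrates at sigma^2 as k grows.
rng = np.random.default_rng(5)
sigma = 0.5
vals = []
for n, k in ((100, 50), (400, 200), (1600, 800)):
    Q, _ = np.linalg.qr(rng.standard_normal((n, k)))   # orthonormal basis, P = QQ'
    e = sigma * rng.standard_normal(n)
    vals.append((Q.T @ e) @ (Q.T @ e) / k)             # e'Pe / k
print(vals)  # the values approach sigma^2 = 0.25
```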