Least squares parameter estimation in a dynamic model from
noisy observations
Citation for published version (APA):
Vregelaar, ten, J. M. (1988). Least squares parameter estimation in a dynamic model from noisy observations.
Technische Universiteit Eindhoven. https://doi.org/10.6100/IR293672
DOI:
10.6100/IR293672
Document status and date:
Published: 01/01/1988
Document Version:
Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)
LEAST SQUARES PARAMETER ESTIMATION
IN A DYNAMIC MODEL
FROM NOISY OBSERVATIONS
THESIS

submitted in order to obtain the degree of doctor at the Technische Universiteit Eindhoven, on the authority of the Rector Magnificus, prof.ir. M. Tels, for a committee appointed by the Board of Deans, to be defended in public on Tuesday, 13 December 1988, at 16.00 hours

by

JOHANNES MARTINUS TEN VREGELAAR

Promotors:
prof.dr.ir. M.L.J. Hautus and
prof.dr. R. Doornbos
PREFACE

CHAPTER 1. INTRODUCTION
1.1. Statement of the problem
1.2. Estimation method
1.3. Notation for model and observations
1.4. Definition of symbols

CHAPTER 2. ALGORITHM FOR COMPUTING ESTIMATES
2.1. Object function and derivatives
2.2. Computation of object function and gradient
2.3. The Q-R decomposition
2.4. Discussion

CHAPTER 3. CONSISTENCY
3.1. Preliminaries
3.2. Random noise
3.3. Identifiability
3.4. Main result

CHAPTER 4. ASYMPTOTIC NORMALITY OF QUADRATIC FORMS
4.1. Convergence in distribution
4.2. Asymptotic normality
4.3. Quadratic forms in normal random variables
4.4. Quadratic forms in independent random variables

CHAPTER 5. ASYMPTOTICALLY NORMAL ESTIMATORS
5.1. Taylor expansion
5.2. Asymptotic normality of the gradient
5.3. Main result
5.4. On estimating the covariance matrix

CHAPTER 6.
6.2. Asymptotics for constrained optimization
6.3. Partial input noise
6.4. Asymptotics for the partial input noise case

CHAPTER 7. SPECIAL CASES
7.1. No dynamics
7.2. Normal errors
7.3. No dynamics and normal errors

CHAPTER 8. A SIMULATION STUDY
8.1. Model and asymptotic analysis
8.2. Simulation results

REFERENCES
SAMENVATTING
CURRICULUM VITAE
Let us consider a dynamic black box model with input ξ and output η, which may be vectors.

[figure: input ξ → black box → output η]

In order to control such models, one must be able to predict the output for known inputs. To this end, we specify the model to be a member of the fairly general class of ARMA models, consisting of an Auto Regressive part and a Moving Average part.

In many practical situations, both inputs and outputs cannot be observed without error: we assume that observed values of outputs and inputs are given by

$$y_t = \eta_t + \varepsilon_t, \qquad x_t = \xi_t + \delta_t,$$

where ε_t and δ_t are the noise terms. The object is to estimate the parameter matrices α_1, …, α_p, β_0, β_1, …, β_q from a set of observations for inputs and outputs. This is what this monograph is about.
On the one hand, we can view our problem as an extension of the so-called errors-in-variables models, see Linssen (1980), Hillegers (1986) and Johansen (1984). The extension consists of taking dynamics into account.
On the other hand, the present problem is a generalization of the problem described in Aoki and Yue (1970b): the authors study the dynamic model without noise on the input observations.
We do not allow for shocks or model errors, as is done in Maravall (1979): the identification in dynamic models with shocks and noisy observations is discussed there.
A survey of the contents of the thesis is given now.
In Chapter 1, we formulate the problem and define the least squares estimation method for this problem. This method is the topic of the subsequent chapters.
In Chapter 2, an algorithm for computing least squares estimates is given: the structure in the model is used in an optimal way.
Assuming a random noise process, we treat the asymptotic analysis of our method. In Chapter 3, strong consistency is proved under a number of assumptions. A key role is played by the concept of almost surely uniform convergence of random functions.
The asymptotic normality property is the subject of Chapter 5. The essential issue here is the limit distribution of certain quadratic forms in random variables. Therefore, the preliminary Chapter 4 is included.
Two generalizations of the original problem are discussed in Chapter 6: (equality) constraints on the parameters and partial input noise. We deal with the consequences for the algorithm as well as for the asymptotic analysis.
For special cases, it is often possible to reduce or to check the assumptions which are made. This is investigated in Chapter 7 for the non-dynamic model and the case of normally distributed errors, respectively.
In this chapter we formulate the parameter-estimation problem that will serve as the starting point for all later chapters of this thesis. More or less the same problem was discussed before by e.g. Aoki and Yue (1970a) and Eising, Linssen and Rietbergen (1983).

After a description of the problem in Section 1.1, in Section 1.2 we introduce a method for estimating the unknown parameters. In fact, this estimation method is the subject of this monograph. Section 1.3 is devoted to defining a compact notation for the problem and estimation method. Some notations and symbols to be used throughout the thesis are collected in Section 1.4.
1.1. Statement of the problem
We consider a linear, time-invariant and causal discrete-time system defined by an ARMA (autoregressive-moving average) description of its inputs ξ_t and outputs η_t (t representing time):

$$\eta_t = \sum_{i=1}^{p} \alpha_i \eta_{t-i} + \sum_{j=0}^{q} \beta_j \xi_{t-j}, \qquad t \in \mathbb{Z}. \tag{1.1a}$$

The inputs and outputs may be vectors in ℝ^r and ℝ^s, respectively, so for r,s > 1 (1.1a) defines a MIMO-model (multiple input - multiple output). The orders p and q are supposed to be known. We define

$$m := \max(p,q). \tag{1.1b}$$
At time t = 1, we start measuring inputs and outputs of this ongoing process. Both inputs and outputs are observed with noise (see figure):

$$y_t = \eta_t + \varepsilon_t, \qquad x_t = \xi_t + \delta_t, \qquad t = 1,2,\ldots,m+N, \tag{1.2}$$

m+N denoting the number of observations y_t and x_t available.

[figure: the ARMA system with true input ξ and true output η; the observations x and y arise by adding the noises δ and ε]
The problem is to estimate the unknown parameter matrices α_1, α_2, …, α_p, β_0, β_1, …, β_q from the set of data {(x_1,y_1), (x_2,y_2), …, (x_{m+N},y_{m+N})}. Unlike Eising, Linssen and Rietbergen (1983), we do not make any assumptions on initial conditions, by taking t = m+1, m+2, …, m+N in (1.1a). Assumptions on the noise processes δ_t and ε_t are deferred to Chapter 3, where asymptotic properties of estimators are discussed.
1.2. Estimation method
Throughout this thesis we consider the least squares estimation method for obtaining estimates for the s × [ps + (q+1)r] matrix of unknown parameters

$$\theta := [\alpha_1 \;\cdots\; \alpha_p \;\; \beta_0 \;\cdots\; \beta_q]. \tag{1.3}$$

By definition of this method (abbreviated in the sequel as lse method), we minimize

$$\sum_{t=1}^{m+N} \left( \|y_t - \eta_t\|^2 + \|x_t - \xi_t\|^2 \right) \tag{1.4}$$

with respect to θ and the unknown 'true' inputs and outputs ξ_1, …, ξ_{m+N}, η_1, …, η_{m+N}, subject to the model equations (1.1a) for t = m+1, …, m+N, in order to obtain an estimate for θ. The norm ‖·‖ in (1.4) denotes the Euclidean vector norm. In the literature, this method is sometimes called orthogonal distance regression, see Boggs, Byrd and Schnabel (1987).
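As a concrete illustration of the criterion (1.4), the following Python sketch evaluates it for a scalar model with p = q = 1 (so r = s = 1). All names and the data-generating settings are illustrative, and the true latent sequences are used as candidate minimizers, which of course is not available in practice:

```python
import numpy as np

def lse_criterion(y, x, eta, xi):
    """Sum of squared deviations between observations and candidate
    'true' outputs/inputs: sum ||y_t - eta_t||^2 + ||x_t - xi_t||^2."""
    return np.sum((y - eta) ** 2) + np.sum((x - xi) ** 2)

def model_residual(a1, b0, b1, eta, xi):
    """Residuals of the model equations (1.1a), here with p = q = 1:
    eta_t - a1*eta_{t-1} - b0*xi_t - b1*xi_{t-1}."""
    return eta[1:] - a1 * eta[:-1] - b0 * xi[1:] - b1 * xi[:-1]

rng = np.random.default_rng(0)
a1, b0, b1 = 0.5, 1.0, 0.3
xi = rng.normal(size=50)                 # true inputs
eta = np.zeros(50)                       # true outputs from (1.1a)
for t in range(1, 50):
    eta[t] = a1 * eta[t-1] + b0 * xi[t] + b1 * xi[t-1]
y = eta + 0.1 * rng.normal(size=50)      # noisy output observations
x = xi + 0.1 * rng.normal(size=50)      # noisy input observations

# The true latent sequences satisfy the model constraint exactly ...
assert np.allclose(model_residual(a1, b0, b1, eta, xi), 0.0)
# ... and the criterion then measures only the observation noise.
print(lse_criterion(y, x, eta, xi))
```

The lse method minimizes this criterion jointly over θ and the latent input/output sequences, subject to the model equations.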
As will be seen in Chapter 2, elimination of inputs and outputs gives rise to an unconstrained minimization problem with respect to θ alone. First, we introduce for convenience a compact notation for the model (1.1a) and the observations (1.2).
1.3. Notation for model and observations

We write the model equations (1.1a) for t = m+1, …, m+N as a matrix vector equation, in which a freedom of choice for their explicit forms exists:

$$D\zeta = 0. \tag{1.5}$$

Here, D is an sN × (s+r)(m+N) matrix containing the entries of θ, and the column vector ζ ∈ ℝ^{(s+r)(m+N)} consists of the true inputs and outputs. Let z and e denote the vectors of observations and noises, respectively, corresponding to ζ. Then

$$z = \zeta + e$$

is a short-hand notation for (1.2).
The following choices for D and ζ will be useful for our purposes:

(i) ζ collects all outputs followed by all inputs, in reverse time order,

$$\zeta := (\eta_{m+N}^T,\ldots,\eta_2^T,\eta_1^T,\;\xi_{m+N}^T,\ldots,\xi_2^T,\xi_1^T)^T, \tag{1.6}$$

and D = [A  B], where A (sN × s(m+N)) and B (sN × r(m+N)) are block-Toeplitz band matrices built from α_0, α_1, …, α_m and β_0, β_1, …, β_m, respectively; the transpose D^T of this choice is displayed in (2.28). (1.7)

(ii) outputs and inputs are interleaved per time instant,

$$\zeta_i := \begin{bmatrix} \eta_i \\ \xi_i \end{bmatrix}, \qquad i = 1,2,\ldots,m+N,$$

and D is the block-Toeplitz band matrix with s × (s+r) blocks γ_0, …, γ_m, where γ_i := [α_i  β_i]. (1.8)

Here α_0 := −I (I is the s × s identity matrix), and we define α_k := 0 for k > p and β_k := 0 for k > q. The empty spaces in the matrices D represent zero entries.

For both forms (1.7) and (1.8), the matrix D is row regular for all θ. Consequently, the Moore-Penrose inverse of D is

$$D^+ = D^T (D D^T)^{-1}.$$
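The stated formula for the Moore-Penrose inverse of a row-regular matrix can be checked numerically; the sketch below uses a random full-row-rank matrix as a stand-in for D (it does not reproduce the structured D of (1.7)):

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.normal(size=(3, 7))           # full row rank with probability one

# D+ = D^T (D D^T)^{-1}, valid whenever D has full row rank
Dplus = D.T @ np.linalg.inv(D @ D.T)
assert np.allclose(Dplus, np.linalg.pinv(D))

# D+ is then a right inverse: D D+ = I
assert np.allclose(D @ Dplus, np.eye(3))
```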
Furthermore, we give a state-space description for the model equation (1.5) (the vector κ_t ∈ ℝ^{sm} denoting the state):

$$\kappa_{t+1} = \varphi \kappa_t + b \xi_t, \tag{1.9}$$
$$\eta_t = h \kappa_t + \beta_0 \xi_t, \qquad t = 1,2,\ldots,N, \tag{1.10}$$

with the sm × sm matrix

$$\varphi = \begin{bmatrix} 0 & \cdots & \cdots & 0 & \alpha_m \\ I & & & & \alpha_{m-1} \\ & \ddots & & & \vdots \\ & & I & & \alpha_2 \\ & & & I & \alpha_1 \end{bmatrix},$$

the sm × r matrix

$$b = \begin{bmatrix} \beta_m + \alpha_m \beta_0 \\ \vdots \\ \beta_1 + \alpha_1 \beta_0 \end{bmatrix},$$

and the s × sm matrix h = [0 ⋯ 0 I]. This is the so-called observable canonical form.

Remark
An expression for the state vector κ_t in (1.10) is

$$\kappa_t = \begin{bmatrix} \vdots \\ \eta_{t+1} - \alpha_1 \eta_t - \beta_0 \xi_{t+1} - \beta_1 \xi_t \\ \eta_t - \beta_0 \xi_t \end{bmatrix}, \qquad t = 1,2,\ldots,N-1. \tag{1.11}$$
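The observable canonical form can be checked numerically. The sketch below assumes a scalar model (s = r = 1) with p = q = m = 2 and zero initial conditions, and compares the ARMA recursion (1.1a) with the state-space recursion (1.10); all names are illustrative:

```python
import numpy as np

a1, a2 = 0.5, -0.2
b0, b1, b2 = 1.0, 0.4, 0.1

phi = np.array([[0.0, a2],
                [1.0, a1]])            # sm x sm companion matrix (here 2 x 2)
b = np.array([b2 + a2 * b0,
              b1 + a1 * b0])           # sm x r vector
h = np.array([0.0, 1.0])               # s x sm row

rng = np.random.default_rng(2)
xi = rng.normal(size=30)

# ARMA recursion (1.1a), zero initial conditions
eta = np.zeros(30)
for t in range(30):
    eta[t] = b0 * xi[t]
    if t >= 1:
        eta[t] += a1 * eta[t-1] + b1 * xi[t-1]
    if t >= 2:
        eta[t] += a2 * eta[t-2] + b2 * xi[t-2]

# State-space recursion (1.10), kappa_1 = 0
kappa = np.zeros(2)
eta_ss = np.zeros(30)
for t in range(30):
    eta_ss[t] = h @ kappa + b0 * xi[t]
    kappa = phi @ kappa + b * xi[t]

assert np.allclose(eta, eta_ss)        # both descriptions agree
```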
1.4. Definition of symbols
Entries of a matrix A are written as (A)_{i,j} or a_{ij}. Row i and column j of A are denoted as A_{i*} and A_{*j}, respectively.

The Kronecker product of an m × n matrix A and any matrix B is defined by

$$A \otimes B := \begin{bmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & & & \vdots \\ \vdots & & & \vdots \\ a_{m1}B & \cdots & \cdots & a_{mn}B \end{bmatrix}.$$

Here, as throughout this dissertation, empty space in a matrix represents zero entries.
For symmetric matrices A and B, A ≥ B and B ≤ A stand for A − B ≥ 0, i.e. A − B is positive semidefinite. Likewise, A > B (or B < A) means A − B > 0: A − B is positive definite.
By ‖·‖ we denote the Euclidean vector norm or the Frobenius matrix norm:

$$\|X\|^2 = \operatorname{tr} X^T X \quad \text{for any matrix } X.$$

The abbreviations det A and tr A are used for the determinant and trace of any square matrix A;

$$\sigma(A) := \{\lambda \in \mathbb{C} : \lambda \text{ is an eigenvalue of } A\}$$

is called the spectrum of A. We denote the identity matrix of arbitrary dimension by I.

The object function to be defined in the next chapter will be denoted by J_N(θ). Then J_N'(θ) is the (column) vector of derivatives with respect to the entries of θ. It is a vector in ℝ^µ, where µ := ps² + (q+1)sr is the number of unknown parameters, see (1.3). We shall write the components of J_N'(θ) as J_N^i, i = 1, …, µ.

Furthermore, J_N'' is the µ × µ matrix of second derivatives of J_N: its (i,j)-th entry

$$(J_N'')_{i,j} = \frac{\partial^2 J_N}{\partial \theta_i \,\partial \theta_j}$$

is also denoted by J_N^{ij}.

Let M be a matrix depending on θ. When no confusion is likely, we use M_i and M_{ij} for ∂M/∂θ_i and ∂²M/∂θ_i ∂θ_j, respectively, which are the matrices obtained from M by differentiating all its entries.
When dealing with random variables, ℙ and 𝔼 are the symbols for the probability and the mean or expectation operator. The latter is applied to scalars as well as vectors. Likewise, we use the abbreviation var for variance; if x is a random vector, var x denotes its covariance matrix. If x and y are random vectors in ℝ^m and ℝ^n, respectively, cov(x,y) is the m × n matrix whose (i,j)-th entry equals cov(x_i, y_j) for i = 1, …, m and j = 1, …, n.

More definitions and notations are introduced later on, in particular in Sections 3.1 and 4.1.
In this chapter we introduce a method for computing least squares estimates (as defined in Section 1.2) for the parameters from the set of observations. In Section 2.1, expressions for the object function and its derivatives up to order three are derived. Since the minimization problem involved has no closed-form solution in general, we use an iterative procedure in order to find minimizing points.

In Section 2.2, we introduce a Q-R decomposition for the matrix Dᵀ which is suitable for the computation of the object function and its gradient. In Section 2.3, the Q-R decomposition is discussed in detail. For the matrix D, defined by (1.7), it can be performed very efficiently.

Some remarks on the complexity of this method, and on other methods, are contained in the final Section 2.4.
2.1. Object function and derivatives

In terms of the notations (1.6) and (1.7), the lse method becomes (min abbreviates minimizing)

$$\min_{\theta,\zeta} \|z - \zeta\|^2 \quad \text{subject to } D\zeta = 0. \tag{2.1}$$

We view the problem as

$$\min_{\theta} \left( \min_{\zeta} \|z - \zeta\|^2 \ \text{subject to } D\zeta = 0 \right)$$

and we optimize for ζ keeping θ fixed.
The result will be a function of θ alone, to be denoted by J_N(θ). Then θ has to be chosen such as to minimize J_N(θ), without any restriction on θ.
Application of the Lagrange multiplier method to the inner optimization gives rise to the Lagrangian

$$\Lambda(\zeta,\lambda) := \|z - \zeta\|^2 + 2\lambda^T D\zeta. \tag{2.2}$$

It follows from the stationary point equations

$$\zeta + D^T \lambda = z, \qquad D\zeta = 0,$$

that

$$\lambda = (D^+)^T z, \qquad \zeta = z - D^T \lambda. \tag{2.3}$$
Then the object function J_N(θ) becomes

$$J_N(\theta) = z^T P z, \tag{2.4}$$

where

$$P := D^+ D \tag{2.5}$$

is the (s+r)(m+N) × (s+r)(m+N) matrix representing the orthogonal projection on the column space of Dᵀ.
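The projection formulation can be illustrated numerically: for a random stand-in for the row-regular matrix D, the constrained minimizer is ζ = (I − P)z, and the attained minimum equals zᵀPz:

```python
import numpy as np

rng = np.random.default_rng(3)
D = rng.normal(size=(4, 10))            # stand-in for a row-regular D
z = rng.normal(size=10)

P = D.T @ np.linalg.inv(D @ D.T) @ D    # P = D+ D, orthogonal projection
zeta = (np.eye(10) - P) @ z             # constrained minimizer
assert np.allclose(D @ zeta, 0.0)       # feasibility: D zeta = 0

JN = z @ P @ z
assert np.isclose(JN, np.sum((z - zeta) ** 2))   # J_N = ||z - zeta||^2 = z^T P z
```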
The object function is infinitely often differentiable with respect to the entries of θ. For our purposes, expressions for the first, second and third derivatives will be sufficient. Let Ȧ or A^i denote ∂A/∂θ_i for any entry θ_i of θ and any matrix A. Using

$$P = D^T (D D^T)^{-1} D, \tag{2.6}$$

we can easily verify that

$$P^i = Q^i + (Q^i)^T, \qquad Q^i := P^\perp (D^i)^T (D D^T)^{-1} D, \tag{2.7}$$

where

$$P^\perp := I - P$$

represents the orthogonal projection on the kernel of D. For P^{ij} := ∂²P/∂θ_j ∂θ_i we obtain

$$P^{ij} = Q^{ij} + (Q^{ij})^T, \tag{2.8}$$

where

$$Q^{ij} := \frac{\partial Q^i}{\partial \theta_j}. \tag{2.9a}$$

We have used here that ∂²D/∂θ_j ∂θ_i = 0 for all i and j, since D depends linearly on the elements of θ, cf. (1.7), (1.8). Likewise, it follows that

$$P^{ijk} := \frac{\partial^3 P}{\partial \theta_k \,\partial \theta_j \,\partial \theta_i} = Q^{ijk} + (Q^{ijk})^T, \qquad Q^{ijk} := \frac{\partial Q^{ij}}{\partial \theta_k}, \tag{2.10a,b}$$

which can be expanded by repeated application of the product rule, using

$$\{(D D^T)^{-1}\}^k = -(D D^T)^{-1} \left( D^k D^T + D (D^k)^T \right) (D D^T)^{-1}. \tag{2.11}$$

Expressions for J_N^i (any component of J_N'), J_N^{ij} (the (i,j) element of J_N'') and J_N^{ijk} now follow:

$$J_N^i = z^T P^i z, \tag{2.12a}$$
$$J_N^{ij} = z^T P^{ij} z, \tag{2.12b}$$
$$J_N^{ijk} = z^T P^{ijk} z. \tag{2.12c}$$

2.2. Computation of object function and gradient
The minimization problem

$$\min_{\theta} J_N(\theta) \tag{2.13}$$

has no closed-form solution, as was already noticed by Aoki and Yue (1970a). Therefore, an iterative algorithm is applied in order to obtain least squares estimates for θ; an initial estimate is needed. The algorithm we choose is based on the so-called Broyden-Fletcher-Goldfarb-Shanno (BFGS) formula, see Scales (1985) pp. 89-90, which has good numerical properties. This quasi-Newton method needs function and first derivative evaluations, which will be discussed now.
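A minimal sketch of this approach uses scipy's BFGS implementation on the object function J_N(θ) = zᵀP(θ)z, for a toy scalar model η_t = a₁η_{t−1} + b₀ξ_t. For brevity the gradient is obtained here by finite differences, whereas the thesis evaluates the exact gradient via the Q-R decomposition; all names and data-generating settings are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

T = 40
rng = np.random.default_rng(4)
a1_true, b0_true = 0.6, 1.2
xi = rng.normal(size=T)
eta = np.zeros(T)
for t in range(1, T):
    eta[t] = a1_true * eta[t-1] + b0_true * xi[t]
# observation vector z = (eta-part, xi-part) plus noise
z = np.concatenate([eta, xi]) + 0.05 * rng.normal(size=2 * T)

def D_of(theta):
    """D(theta): each row encodes eta_t - a1*eta_{t-1} - b0*xi_t = 0."""
    a1, b0 = theta
    D = np.zeros((T - 1, 2 * T))
    for t in range(1, T):
        D[t-1, t] = 1.0
        D[t-1, t-1] = -a1
        D[t-1, T + t] = -b0
    return D

def JN(theta):
    """Object function z^T P(theta) z with P = D^T (D D^T)^{-1} D."""
    D = D_of(theta)
    lam = np.linalg.solve(D @ D.T, D @ z)
    return z @ D.T @ lam

res = minimize(JN, x0=np.array([0.0, 0.0]), method="BFGS")
print(res.x)    # estimate of (a1, b0); should lie near (0.6, 1.2)
```

The quasi-Newton iteration only needs JN evaluations (and gradients); no second derivatives are computed explicitly.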
From (2.4) we recall

$$J_N = z^T P z. \tag{2.14a}$$

Then (2.12a) and (2.7) yield

$$\dot J_N = 2 z^T D^+ \dot D P^\perp z. \tag{2.14b}$$

Definition (1.7) for D and ζ is most convenient for describing an algorithm for the computation of J_N and its gradient. This is a consequence of the Q-R decomposition for Dᵀ that we introduce as method of computation. Recall that Dᵀ has full column rank.

In the next section, a detailed description of this decomposition for the matrix Dᵀ of (1.7) is given. Here, we discuss its use for computing the object function and the first derivative vector.

There exists an orthogonal (s+r)(m+N) square matrix Q and a lower-triangular regular sN square matrix R, such that

$$Q^T D^T = \begin{bmatrix} R \\ 0 \end{bmatrix}. \tag{2.15}$$

Due to definition (1.7) for D, it is obvious that we are looking for a lower-triangular rather than an upper-triangular matrix R, see (2.30b) in the next section.
Define Q₁ as the (s+r)(m+N) × sN matrix consisting of the first sN columns of Q (write Q = [Q₁ Q₂]). Then

$$D^T = Q_1 R. \tag{2.16}$$

Due to the orthogonality of Q,

$$P = Q_1 Q_1^T, \tag{2.17}$$

hence

$$J_N = \|Q_1^T z\|^2. \tag{2.18}$$

Furthermore, λ = (D⁺)ᵀz (as before, see (2.3)) implies

$$D^T \lambda = P z. \tag{2.19}$$

It follows from (2.14b) that

$$\dot J_N = 2 \lambda^T \dot D (z - D^T \lambda). \tag{2.20}$$

Premultiplying (2.19) with Q₁ᵀ gives, in view of (2.16) and (2.17),

$$R \lambda = u, \tag{2.21}$$

with u ∈ ℝ^{sN} defined by

$$u := Q_1^T z. \tag{2.22}$$
Summing up, we have seen so far that by extending the matrix Dᵀ with the column vector z, the matrix R and the vector u are computed from

$$Q^T [D^T \mid z] = \begin{bmatrix} R & u \\ 0 & * \end{bmatrix}. \tag{2.23}$$

Then J_N and any component J̇_N of J_N' follow from R and u, by

$$J_N = u^T u, \tag{2.24a}$$
$$\dot J_N = 2 \lambda^T \dot D (z - D^T \lambda), \tag{2.24b}$$
$$R \lambda = u. \tag{2.24c}$$

Equation (2.24c) can easily be solved for λ, since R is lower-triangular.
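The relations (2.23)-(2.24) can be checked with a generic (unstructured) Q-R decomposition. Note that a standard library routine yields an upper-triangular factor, whereas the thesis works with a lower-triangular R adapted to the band structure of Dᵀ:

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

rng = np.random.default_rng(5)
sN, n = 4, 10                   # sN rows of D, n = (s+r)(m+N) columns
D = rng.normal(size=(sN, n))
z = rng.normal(size=n)

# Q-R decomposition of the extended matrix [D^T | z], cf. (2.23)
Q, Rfull = qr(np.column_stack([D.T, z]))
R = Rfull[:sN, :sN]             # triangular factor of D^T
u = Rfull[:sN, sN]              # u = Q1^T z

JN = u @ u                      # (2.24a): J_N = u^T u
lam = solve_triangular(R, u)    # (2.24c): R lam = u  (upper-triangular here)

# check against the projection formulas of Section 2.1
P = D.T @ np.linalg.inv(D @ D.T) @ D
assert np.isclose(JN, z @ P @ z)
assert np.allclose(lam, np.linalg.inv(D @ D.T) @ D @ z)   # lam = (D+)^T z
```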
Suppose the sN components of λ are arranged as

$$\lambda = [\lambda_{1,1} \; \lambda_{1,2} \cdots \lambda_{1,s} \;\; \lambda_{2,1} \cdots \lambda_{2,s} \;\cdots\; \lambda_{N,1} \cdots \lambda_{N,s}]^T. \tag{2.25}$$

Then we obtain the following expressions for the ps² + (q+1)sr components of J_N' (using η and ξ in accordance with (2.3)):

$$\frac{\partial J_N}{\partial (\alpha_k)_{i,j}} = 2 \sum_{n=1}^{N} \lambda_{n,i}\, \eta_{n+k,j}, \qquad k = 1,2,\ldots,p, \quad i,j = 1,2,\ldots,s, \tag{2.26a}$$

$$\frac{\partial J_N}{\partial (\beta_k)_{i,j}} = 2 \sum_{n=1}^{N} \lambda_{n,i}\, \xi_{n+k,j}, \qquad k = 0,1,\ldots,q, \quad i = 1,2,\ldots,s, \quad j = 1,2,\ldots,r, \tag{2.26b}$$

where (note α₀ = −I)

$$\eta_{i,j} = y_{m+N+1-i,j} - \sum_{k=\max(0,i-N)}^{\min(i-1,p)} \sum_{\ell=1}^{s} (\alpha_k)_{\ell,j}\, \lambda_{i-k,\ell}, \qquad i = 1,\ldots,p+N, \quad j = 1,2,\ldots,s, \tag{2.27a}$$

$$\xi_{i,j} = x_{m+N+1-i,j} - \sum_{k=\max(0,i-N)}^{\min(i-1,q)} \sum_{\ell=1}^{s} (\beta_k)_{\ell,j}\, \lambda_{i-k,\ell}, \qquad i = 1,\ldots,q+N, \quad j = 1,2,\ldots,r \tag{2.27b}$$

(here x_{i,j} and y_{i,j} indicate the j-th component of the vectors x_i ∈ ℝ^r and y_i ∈ ℝ^s).
2.3. The Q-R decomposition

In this section we discuss the transformation of the matrix [Dᵀ | z] into

$$\begin{bmatrix} R & u \\ 0 & * \end{bmatrix}$$

by means of an orthogonal matrix Q, see (2.23). Using the special structure of Dᵀ, one can do this efficiently.
by means of an orthogonal matrix Q, see (2.23). Using the special structure of nT, one can do this efficiently.Introducing a. :=a! and b. :=
s'.f,
i=
0,1, ... ,m, matrix DT defined byl l l l (1.7) is the (s+r)(m+N) x sN matrix ao al a m ao al a DT
=
m (2.28) bo bl b m bo bl b mconsisting of two block-Toeplitz band matrices.
The process of transforming [Dᵀ | z] into the right-hand side of (2.23) is split up into two steps,

$$\tilde Q\, [D^T \mid z] = \begin{bmatrix} S & v \\ 0 & * \end{bmatrix} \tag{2.29a}$$

and

$$\tilde \Psi \begin{bmatrix} S & v \\ 0 & * \end{bmatrix} = \begin{bmatrix} R & u \\ 0 & * \end{bmatrix}, \tag{2.29b}$$

with Q̃ and Ψ̃ orthogonal. The s(m+N) × sN matrix S and the sN × sN matrix R have similar shapes: both are lower-triangular block band matrices,

S with s × s blocks s_{i,j}, i = 1,…,m+N, j = 1,…,N, (2.30a)

R with s × s blocks r_{i,j}, i,j = 1,…,N (2.30b)

(the s_{i,j} and r_{i,j} are s × s blocks).
Step 1: Q̃ [Dᵀ | z] = [S v; 0 *].

Orthogonal matrices Q₁, Q₂, …, Q_{m+N} are chosen with

$$\tilde Q := Q_{m+N} Q_{m+N-1} \cdots Q_2 Q_1.$$

[Scheme (2.31): the matrices Q₁[Dᵀ | z] and Q₂Q₁[Dᵀ | z], showing how the lower (b) part of Dᵀ is annihilated block-diagonal by block-diagonal while the blocks of S, the vectors v_i and the transformed observation column build up.]

By premultiplying Dᵀ by Q̃, the lower part of Dᵀ becomes equal to zero. This is done block-diagonal by block-diagonal in order to preserve the block-Toeplitz structure. Matrix Q_i corresponds to making block-diagonal i equal to zero. The number of nonzero block-diagonals in Q_i Q_{i−1} ⋯ Q₁ Dᵀ is equal to m for i = 2,3,…,N. Therefore, we need m+N matrices Q_i to transform Dᵀ into [S; 0], see (2.29a). After premultiplication of [Dᵀ | z] by Q_i, block-row m+N+1−i of S (see (2.30a)) and v_i ∈ ℝ^s are determined, for i = 1,2,…,m+N.

Let us consider a matrix Q_i in detail. In order to make any column of b₀ (say column ℓ, and write (b₀)_{*ℓ}) equal to zero, we apply an (r+1) × (r+1) Householder matrix h_Q: we take the nonzero diagonal element of column ℓ of a₀, denoted by (a₀)_{ℓ,ℓ}, to obtain

$$h_Q \begin{bmatrix} (a_0)_{\ell,\ell} \\ (b_0)_{*\ell} \end{bmatrix} = \begin{bmatrix} * \\ 0 \end{bmatrix}, \tag{2.32}$$

* representing some nonzero scalar. Now, the block-Toeplitz structure is preserved by applying this h_Q to all corresponding pairs

$$\begin{bmatrix} (a_0)_{\ell *} \\ b_0 \end{bmatrix}$$

along the diagonals and to all other corresponding pairs

$$\begin{bmatrix} (a_j)_{\ell *} \\ b_j \end{bmatrix}, \qquad j = 1,\ldots,m.$$
1, ••• ,m.We start by making column s of b
0 equal to zero, proceeding with cclumn s-1, etc. Consequently, a0 remains lower triangular during the transformation process and hence the sk,k bloks in (2.30a) are also lower triangular
(k
=
1, ••• ,N). As already indicated in (2.31),matrices[:~]
transform intor:~
]
for j=
0,1, .•• ,m by premultiplication by Qi. We note thatJ-1
bm 0 for i ~ 2, and, of course, b_1 =
O.
For completeness, we give an expression for Q_i which, however, is not computed explicitly in the algorithm. For given a₀ and b₀ and fixed ℓ ∈ {1,2,…,s} we choose for h_Q in (2.32) the Householder matrix

$$h_Q = I - \frac{2\, w w^T}{w^T w}, \qquad w := \begin{bmatrix} (a_0)_{\ell,\ell} \\ (b_0)_{*\ell} \end{bmatrix} + \operatorname{sign}\bigl((a_0)_{\ell,\ell}\bigr) \,\Bigl\| \begin{bmatrix} (a_0)_{\ell,\ell} \\ (b_0)_{*\ell} \end{bmatrix} \Bigr\|\, e_1, \tag{2.33}$$

see e.g. Golub and Van Loan (1983), Section 3.3. Partition h_Q as

$$h_Q = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix},$$

with h₁₁ a scalar, h₁₂ a 1 × r row, h₂₁ an r × 1 column and h₂₂ an r × r matrix, and let H_ℓ be the (s+r)(m+N) × (s+r)(m+N) orthogonal matrix that embeds h₁₁ at the s × s positions (ℓ,ℓ) of the relevant identity blocks and h₁₂, h₂₁, h₂₂ at the corresponding coupling positions. The number of untouched I_s and I_r blocks is i−1, whereas H_ℓ contains m+N+1−i copies of the blocks h₁₁, h₁₂, h₂₁ and h₂₂. Now Q_i can be written as a product of s orthogonal matrices,

$$Q_i = H_1 H_2 \cdots H_s.$$

When dealing with H_{s−1} we take in (2.32) the b₀ which has already been transformed by H_s, etc.
Summing up, we have established the following. The effect of an arbitrary Q_i on the (transformed) matrix [Dᵀ | z] is known by only computing, s times successively (ℓ = s,…,1), an (r+1) × (r+1) Householder matrix h_Q and the result of postmultiplying h_Q by, on the one hand, all matrices

$$\begin{bmatrix} (a_j)_{\ell *} \\ b_j \end{bmatrix}, \qquad j = 0,1,\ldots,\min(m,\, m+N-i),$$

and, on the other hand, all vectors

$$\begin{bmatrix} y_{m+N-j,\ell} \\ x_{m+N+1-i-j} \end{bmatrix}, \qquad j = 0,1,\ldots,m+N-i.$$
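The elementary operation (2.32), zeroing a column below a nonzero pivot with an (r+1) × (r+1) Householder matrix, can be sketched as follows; the reflector construction follows the standard recipe, and all names are illustrative:

```python
import numpy as np

def householder(v):
    """Orthogonal H with H v = (±||v||, 0, ..., 0)^T
    (standard construction, cf. Golub & Van Loan, Section 3.3)."""
    w = v.astype(float).copy()
    w[0] += np.sign(w[0] if w[0] != 0 else 1.0) * np.linalg.norm(v)
    return np.eye(len(v)) - 2.0 * np.outer(w, w) / (w @ w)

# (2.32): stack the pivot (a0)_{l,l} on top of column l of b0 and reflect
rng = np.random.default_rng(6)
r = 3
pivot = np.array([2.0])                  # nonzero diagonal element of a0
col = rng.normal(size=r)                 # column l of b0
v = np.concatenate([pivot, col])

hQ = householder(v)                      # (r+1) x (r+1)
out = hQ @ v
assert np.isclose(abs(out[0]), np.linalg.norm(v))   # "*": nonzero scalar
assert np.allclose(out[1:], 0.0)                    # column of b0 zeroed
assert np.allclose(hQ @ hQ.T, np.eye(r + 1))        # orthogonality
```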
Step 2: Ψ̃ [S v; 0 *] = [R u; 0 *].

Now orthogonal s(m+N) × s(m+N) matrices Ψ₁, Ψ₂, …, Ψ_N are chosen with

$$\Psi := \Psi_N \Psi_{N-1} \cdots \Psi_2 \Psi_1$$

and

$$\tilde\Psi := \operatorname{diag}(\Psi, I).$$

[Scheme (2.34): the matrices Ψ₁[S | v] and Ψ₂Ψ₁[S | v]; here S is as defined in (2.30a), the last sm rows of S are annihilated block-column by block-column, and the block-rows of R and the blocks of u build up.]

As a result of the multiplication by Ψ, the last sm rows of S become zero. Proceeding block-column by block-column, we distinguish N multiplications denoted by Ψ₁, Ψ₂, …, Ψ_N. After application of Ψ_i, block-row N+1−i of R and block u_{m+i} of

$$u = \begin{bmatrix} u_{m+N} \\ \vdots \\ u_{m+2} \\ u_{m+1} \end{bmatrix} \in \mathbb{R}^{sN}$$

are known, for i = 1,2,…,N.
1,2, ••• ,N. ~Let us discuss multiplication by ~i' for arbitrary i E {1,2, ••• ,N} in detail. Again, Householder transformations are used to make the columns of
length sm in (2.34) equal to zero. Considering column £of the sm x s block
SN+1,N+1-i
sN+m,N+1-i
we compute an (sm+1) x (sm+1) Householder matrix~ from that column and from the nonzero diagonal element of the corresponding column of
sN+l-i N+l-i (analogous to (2.33)) in order to obtain
(sN+1-i,N+1-i)i,i * 0
h'l' (sN+1,N+1-i)*i (2.35)
(sN+m,N+l-i)*i 0
Again, * denotes a nonzero scalar.
By starting with the last column of the block (i • s in (2.35)) and pro-ceeding with column s-1, etc., we obtain that the sN+l-i N+l-i blocks
'
remain lower-triangular. The size of the nonzero square appearing in the lower part of S (cf. (2. 34)) remains sm x sm, for i = 2, .. .,N-m.
The result of multiplication by ~i is obtained by only computing, s times successively (i
=
s, ... ,2, 1), an (sm+l) x (sm+ 1) Householder matrix~
and the postmultiplication of h: by, on the one hand,(s N+l-1,N+l-:i.-J ,,,* . . ·)" sN+1,N+1-i-j
= 0,1, ••• ,min(m,N-i)
sN+m,N+l-i-j
and, on the other hand,
The transformed blocks retain their names in (2.34).
Since any diagonal block r . . of R in (2.30b) is a transformed s . . block,
:i.,1 :i.,:i.
it is lower-triangular and regular. Therefore, R itself is lower-triangular and regular.
The computation of the object function and its gradient from R and u has been given in (2.24)-(2.27), so the algorithm for computing estimates is now complete.
2.4. Discussion

The following necessary and sufficient conditions concerning the minimization problem (2.13) are well-known from optimization theory, cf. Scales (1985).

Necessary conditions: if the object function J_N has a minimum in θ*, then

$$J_N'(\theta^*) = 0 \quad \text{(stationary point)}, \qquad J_N''(\theta^*) \geq 0.$$

Sufficient conditions for a local minimum: the conditions

$$J_N'(\theta^*) = 0, \qquad J_N''(\theta^*) > 0$$

are sufficient for θ* to be a local minimum point of J_N. Here, z and N are assumed to be fixed.

A computer program based on the algorithm described above is available. In Chapter 8, dealing with a simulation study, we shall discuss numerical results. The conditions above will be verified there.
A Q-R algorithm for an m × n matrix (m ≥ n), based on Householder matrices, takes in general about n²(m − n/3) operations, see Golub and Van Loan (1983) p. 148. In view of the exposition in the previous section, the number of operations to obtain R in (2.30b) from Dᵀ is of order O(N). In Eising, Linssen and Rietbergen (1983), matrix inversion is proposed, giving an algorithm of O(N²) operations.

Our algorithm for obtaining the object function and gradient is of O(N²), because of the computation of u (see the concluding remarks of step 1 in the previous section). However, by somewhat modifying the algorithm, we even obtain one of O(N). In this approach, the vector u is not computed by extending the matrix Dᵀ with z as described in (2.23). From (2.16) it follows that Q₁ = DᵀR⁻¹. Hence, u defined by (2.22) as u = Q₁ᵀz can also be written as u = R^{−T} D z. Once R is determined, u is solved from

$$R^T u = D z, \tag{2.36}$$

which takes only O(N) operations because of the special form of R, cf. (2.30b). Then, as before, J_N and J̇_N are computed from R and u, see (2.24). If N ≫ ps² + (q+1)sr, we solve u from (2.36) rather than computing it from (2.23).
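The equivalence of the two ways of computing u, directly as Q₁ᵀz or by the triangular solve (2.36), can be checked with a generic Q-R decomposition (upper-triangular here, so Rᵀ is lower-triangular):

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

rng = np.random.default_rng(7)
D = rng.normal(size=(4, 9))            # stand-in for the structured D
z = rng.normal(size=9)

Q1, R = qr(D.T, mode='economic')       # D^T = Q1 R
u_direct = Q1.T @ z                    # u = Q1^T z, as in (2.22)
u_solved = solve_triangular(R.T, D @ z, lower=True)   # (2.36): R^T u = D z
assert np.allclose(u_direct, u_solved)
```

For the banded R of (2.30b), the triangular solve costs only O(N) operations, which is the point of the modification.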
The matrix DDᵀ, which is the same for both choices (1.7) and (1.8) for D, is a symmetric, block-Toeplitz band matrix:

$$D D^T = \begin{bmatrix}
d_0 & d_1^T & \cdots & d_m^T & & \\
d_1 & d_0 & \ddots & & \ddots & \\
\vdots & \ddots & \ddots & & & d_m^T \\
d_m & & & & & \vdots \\
 & \ddots & & & & d_1^T \\
 & & d_m & \cdots & d_1 & d_0
\end{bmatrix}, \tag{2.37a}$$

where

$$d_k = \sum_{i=0}^{m-k} \left( \alpha_{i+k} \alpha_i^T + \beta_{i+k} \beta_i^T \right), \qquad k = 0,1,\ldots,m. \tag{2.37b}$$

In the literature, one can find some algorithms for inverting such matrices, cf. Trench (1974) and Meek (1983). The Q-R decomposition introduced here is superior to these algorithms with regard to the number of operations. Moreover, Q-R decomposition is known to be a numerically stable procedure.
If we assume zero initial conditions, as is done in Eising, Linssen and Rietbergen (1983), we can write the model equations as D̄ζ̄ = 0 with the sN × (s+r)N matrix D̄ = [Ā  B̄], where Ā is the sN × sN lower block-triangular Toeplitz band matrix with blocks −I, α₁, …, α_m down its first block column, and B̄ is the sN × rN matrix of the same form with blocks β₀, β₁, …, β_m. The Q-R decomposition of D̄ᵀ is much alike the decomposition of Dᵀ in (2.28), except that the second step drops out now, see Ten Vregelaar (1985).
When the input is scalar (r = 1), we can use Givens transformations instead of Householder transformations in step 1 of Section 2.3.
Finally, we may obtain an initial guess for the parameter vector by assuming equation errors instead of noisy observations, cf. Strejc (1980). It is possible to write

$$y_t = \sum_{i=1}^{p} \alpha_i y_{t-i} + \sum_{j=0}^{q} \beta_j x_{t-j} + \text{error}, \qquad t = m+1,\ldots,m+N,$$

as a linear regression in the unknown parameters, giving θ_initial as its ordinary least squares solution, cf. Goodwin and Payne (1977) p. 83.
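A sketch of this naive (equation-error) least squares initial guess for a scalar model with p = 1, q = 1: the regressors are the observed, noisy lagged outputs and inputs, so the resulting estimate is biased, but it is usually adequate as a starting point. All settings are illustrative:

```python
import numpy as np

rng = np.random.default_rng(8)
T, a1, b0, b1 = 200, 0.7, 1.0, 0.5
xi = rng.normal(size=T)
eta = np.zeros(T)
for t in range(1, T):
    eta[t] = a1 * eta[t-1] + b0 * xi[t] + b1 * xi[t-1]
y = eta + 0.05 * rng.normal(size=T)    # noisy outputs
x = xi + 0.05 * rng.normal(size=T)     # noisy inputs

# regress y_t on (y_{t-1}, x_t, x_{t-1}), ignoring the observation noise
Phi = np.column_stack([y[:-1], x[1:], x[:-1]])
theta0, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
print(theta0)   # rough initial guess for (a1, b0, b1), biased by the noise
```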
See also Chapter 8 for a discussion of the so-called naive least squares approach.
From now on, the noise is assumed to be random.
A minimal requirement for any estimation method is consistency of the estimators: if the number of measurements tends to infinity, estimates for the parameter vector should converge (in a sense to be specified) to the true value.
In Section 3.1, some convergence concepts are introduced for random variables and random functions. Useful results from probability theory are given for later reference.
In Section 3.2 we give formal definitions of least squares estimators and consistency. Some assumptions on parameter space, inputs and model are discussed.
A crucial convergence assumption is the topic of Section 3.3. It has an interpretation for the SISO-model. Then the consistency result follows in Section 3.4 after making an assumption on the noise.
Consistency results for approximate lse methods and the lse method for the special case of no input noise, were given earlier in the papers of Aoki and Yue (1970a, 1970b).
3.1. Preliminaries
In this section, we give some preliminary results from probability theory, which constitute the tools for proving asymptotic properties of the least squares estimators. A mathematically profound treatment of asymptotics of estimators can be found in Bierens (1981) in the context of nonlinear regression models. Some of his results are convenient for us to apply.

We start by introducing two well-known convergence concepts. Let {x_n} be a sequence of random vectors on a probability space {Ω, F, ℙ}.

Definition 3.1
a) The sequence {x_n} converges in probability to a random vector x, notation x_n →p x, if

$$\lim_{n\to\infty} \mathbb{P}\{\omega \in \Omega : \|x_n(\omega) - x(\omega)\| > \varepsilon\} = 0 \quad \text{for every } \varepsilon > 0$$

(for short: lim_{n→∞} ℙ(‖x_n − x‖ > ε) = 0 for every ε > 0).

b) The sequence {x_n} converges almost surely to a random vector x, notation x_n →a.s. x, if

$$\mathbb{P}\{\omega \in \Omega : \lim_{n\to\infty} x_n(\omega) = x(\omega)\} = 1$$

(for short: ℙ(x_n → x) = 1).
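The two concepts can be illustrated (not proved) by simulation: for sample means of iid variables, the probability of a deviation larger than a fixed ε shrinks as n grows, as required in Definition 3.1a. The setup below is illustrative:

```python
import numpy as np

rng = np.random.default_rng(9)
eps, reps = 0.1, 2000

def deviation_prob(n):
    """Monte Carlo estimate of P(|mean of n Uniform(0,1) draws - 0.5| > eps)."""
    means = rng.uniform(0, 1, size=(reps, n)).mean(axis=1)
    return np.mean(np.abs(means - 0.5) > eps)

p10, p1000 = deviation_prob(10), deviation_prob(1000)
assert p1000 <= p10       # deviations become rarer as n grows
assert p1000 < 0.01       # essentially zero for n = 1000
```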
Almost sure convergence is stronger than convergence in probability. However, the following converse holds:

Theorem 3.2
Let x, x₁, x₂, … be random vectors. Then x_n →p x if and only if every subsequence {x_{n_k}} of {x_n} contains a subsequence {x_{n_{k_j}}} converging almost surely to x.

Proof. Tucker (1967) p. 103. □
Theorem 3.3
Let f : ℝ^k → ℝ^ℓ be a Borel-measurable function which is continuous on a Borel set B with ℙ(x ∈ B) = 1. If x, x₁, x₂, … are random vectors in ℝ^k with x_n → x (a.s. or p), then f(x_n) → f(x) (a.s. or p).

Proof. Tucker (1967) p. 104. □
Now, we define the concept of uniform convergence of random functions, see e.g. Bierens (1981) Section 2.3.

Definition 3.4
Let f(θ), f₁(θ), f₂(θ), … be random vector functions on a parameter space Θ ⊂ ℝ^k, i.e. F-measurable functions for every (fixed) θ ∈ Θ, where {Ω, F, ℙ} is the probability space involved.

a) f_n(θ) converges almost surely to f(θ) uniformly on Θ, notation f_n(θ) →a.s. f(θ) uniformly on Θ, if

$$\sup_{\theta \in \Theta} \|f_n(\theta) - f(\theta)\| \xrightarrow{a.s.} 0,$$

or equivalently, there exists a null set N (ℙ(N) = 0) such that for every ω ∈ Ω∖N, f_n(θ,ω) → f(θ,ω) uniformly on Θ holds; see Rudin (1964) p. 134.

b) f_n(θ) converges in probability to f(θ) uniformly on Θ, notation f_n(θ) →p f(θ) uniformly on Θ, if

$$\sup_{\theta \in \Theta} \|f_n(\theta) - f(\theta)\| \xrightarrow{p} 0.$$
Remark
We will consider only random functions for which sup_{θ∈Θ} ‖f_n(θ) − f(θ)‖ is indeed a random variable for n = 1,2,…, in the sense that it is F-measurable.

A useful result is the generalization of Theorem 3.3 for random functions.
Theorem 3.5
Let {f_n(θ)} be a sequence of random vector functions on a Borel subset Θ ⊂ ℝ^k. Furthermore, assume that f(θ) is a random vector function that is a.s. continuous on Θ, i.e. continuous on Θ for all ω ∈ Ω∖N with ℙ(N) = 0. If θ_n and θ are random vectors in Θ a.s., then

a) f_n(θ) →a.s. f(θ) uniformly on Θ and θ_n →a.s. θ imply f_n(θ_n) →a.s. f(θ);

b) f_n(θ) →p f(θ) uniformly on Θ and θ_n →p θ imply f_n(θ_n) →p f(θ).

Proof. Write f_n(θ_n) − f(θ) = (f_n(θ_n) − f(θ_n)) + (f(θ_n) − f(θ)). □

In Section 3.4, the strong consistency of least squares estimators is proved by applying
Theorem 3.6 (Bierens (1981), Lemmas 3.1.3 and 3.1.8)
Let {f_n(θ)} be a sequence of (real) random functions on a compact set Θ. Suppose f_n(θ) → f(θ) (a.s. or p) uniformly on Θ, where f is a continuous non-random function on Θ. If f has a unique minimum θ₀ on Θ, then for any sequence {θ_n}, a.s. in Θ and satisfying f_n(θ_n) = inf_{θ∈Θ} f_n(θ), it follows that θ_n → θ₀ (a.s. or p). □

Remark
In the context of this thesis, a function f(θ,ω) is called non-random if it does not depend on ω.
Uniform convergence on a compact set is obtained from pointwise convergence and equicontinuity, see Lemma 3.9 below.
Definition 3.7 (Rudin (1964) p. 144)
A sequence of real functions {f_n} is equicontinuous on E ⊂ ℝ^k if for every ε > 0 there exists a δ > 0 such that |f_n(x) − f_n(y)| < ε for every n ∈ ℕ whenever ‖x − y‖ < δ for x, y ∈ E.
A sufficient condition for equicontinuity is given in

Lemma 3.8
Let {f_n} be a sequence of differentiable real functions on a convex set E ⊂ ℝ^k for which there exists a positive number c such that

$$\|f_n'(x)\| \leq c \quad \text{for all } n \in \mathbb{N} \text{ and all } x \in E.$$

Then {f_n} is equicontinuous on E.

Proof. Let ε > 0 be arbitrary. For every x, y ∈ E,

$$|f_n(x) - f_n(y)| = |f_n'(z_n)^T (x - y)| \leq c \|x - y\|$$

holds, by application of the mean value theorem (z_n is a convex combination of x and y). Taking δ := ε/c gives the result, according to Definition 3.7. □

Lemma 3.9
If E is a compact subset of ℝ^k and {f_n} is a sequence of equicontinuous functions on E (for sufficiently large n), then f_n(x) → f(x) for all x ∈ E implies f_n → f uniformly on E.

Proof
Dieudonné (1969) p. 142. See also Rudin (1964) p. 156. □
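As a small numerical illustration of Lemmas 3.8 and 3.9 (the example function is our own choice, not from the thesis): f_n(x) = √(x² + 1/n) on [0,1] has derivatives bounded by 1, so it is equicontinuous, and it converges pointwise to f(x) = x; the supremum of |f_n − f| is attained at x = 0 and equals 1/√n.

```python
import numpy as np

# Our own example for Lemmas 3.8 and 3.9: f_n(x) = sqrt(x^2 + 1/n) on [0, 1].
# |f_n'(x)| = x / sqrt(x^2 + 1/n) <= 1, so {f_n} is equicontinuous (Lemma 3.8),
# and f_n -> f pointwise with f(x) = x; Lemma 3.9 then gives uniform convergence.
x = np.linspace(0.0, 1.0, 2001)

def f_n(n, x):
    return np.sqrt(x**2 + 1.0 / n)

sup_err = [np.max(np.abs(f_n(n, x) - x)) for n in (10, 100, 1000, 10000)]
print(sup_err)  # sup |f_n - f| = f_n(0) = 1/sqrt(n), decreasing to 0
```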
Remark
Conversely, if {f_n} is a uniformly convergent sequence of continuous functions on the compact E, then {f_n} is equicontinuous on E, see Rudin (1964) p. 144.
For random functions a similar result can be proved.
Definition 3.10a
A sequence of random vector functions {g_n(θ)} is a.s. bounded, uniformly on a set Θ ⊂ ℝ^k, if there exist a d > 0 and a null set N ⊂ Ω such that for every ω ∈ Ω\N an integer n₀ exists, satisfying
   ‖g_n(θ,ω)‖ ≤ d   for all n > n₀ and for all θ ∈ Θ .
Theorem 3.10b
Suppose {f_n(θ)} is a sequence of real, a.s. differentiable (with respect to θ) random functions on a convex and compact set Θ ⊂ ℝ^k. If {f_n′} is a.s. bounded, uniformly on Θ, then
   f_n(θ) →a.s. 0   for all θ ∈ Θ
implies that
   f_n →a.s. 0 , uniformly on Θ .
Proof
By definition of f_n(θ) →a.s. 0 for all θ ∈ Θ, we have: for every θ ∈ Θ there is a set V₁ with P(V₁) = 1 such that for every ω ∈ V₁ and ε > 0 an integer n₁(ω) exists with |f_n(θ,ω)| < ε/2 for all n > n₁(ω). We rewrite the definition of '{f_n′} is a.s. bounded, uniformly on Θ' as: there exist a d > 0 and a set V₂ with P(V₂) = 1 such that for every ω ∈ V₂ an integer n₂(ω) exists with ‖f_n′(θ,ω)‖ ≤ d for all n > n₂(ω) and all θ ∈ Θ. By virtue of the mean value theorem, we obtain
   |f_n(θ′,ω) − f_n(θ,ω)| ≤ d ‖θ′ − θ‖   for ω ∈ V₂ ∩ V₃ and n > n₂(ω) .
Here, V₃ expresses the a.s. differentiability. Now it follows from the statements above that
   |f_n(θ′,ω)| < ε   for all θ′ ∈ U_θ(ε), all ω ∈ V and all n > n₀(ω) := max(n₁(ω), n₂(ω)) ,   (*)
by defining
   U_θ(ε) := {θ′ ∈ Θ | ‖θ′ − θ‖ < ε/(2d)}   and   V := ∩_{i=1}^3 V_i .
Obviously, Θ ⊂ ∪_{θ∈Θ} U_θ(ε) holds. Since Θ is compact, it follows that there exist a finite k and θ₁,…,θ_k ∈ Θ such that Θ ⊂ ∪_{i=1}^k U_i, where U_i := U_{θ_i}(ε). Let θ_i, U_i, V_i and n_{0,i} correspond to θ, U, V, n₀, respectively, for i = 1,…,k in the sense of (*). Then, choosing θ ∈ {θ₁,…,θ_k} in (*), one obtains
   sup_{θ′∈Θ} |f_n(θ′,ω)| < ε   for all ω ∈ V and n > n₀(ω) ,   (**)
by defining
   V := ∩_{i=1}^k V_i   and   n₀ := max_{i=1,…,k} n_{0,i} .
Taking ε = 1/m with m ∈ ℕ and using (**), we find corresponding sets V(m) with P(V(m)) = 1. Introduction of W := ∩_{m=1}^∞ V(m), so that P(W) = 1, gives
   sup_{θ′∈Θ} |f_n(θ′,ω)| < 1/m   for all ω ∈ W and n sufficiently large, m = 1,2,… .
This is the definition of f_n →a.s. 0, uniformly on Θ, see Definition 3.4a (for arbitrary ε > 0, take m = ⌊1/ε⌋ + 1). □

By introducing uniform Cauchy sequences, one can extend the result to non-zero random limit functions for f_n.
A third convergence concept, convergence in distribution, is discussed in Chapter 4.
Lemma 3.11 (Chebyshev's inequality, see Tucker (1967) p. 39)
Let x denote a random variable, with E|x|^t < ∞ for some t > 0. Then, for every ε > 0,
   P(|x| ≥ ε) ≤ E|x|^t / ε^t .
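A quick Monte Carlo check of Lemma 3.11 (our own example; the choices t = 2 and a standard normal variable are arbitrary):

```python
import numpy as np

# Monte Carlo check of Chebyshev's inequality P(|x| >= eps) <= E|x|^t / eps^t,
# for t = 2 and a standard normal variable (example choices, not from the text).
rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
t, eps = 2, 2.0
lhs = np.mean(np.abs(x) >= eps)            # empirical P(|x| >= eps)
rhs = np.mean(np.abs(x)**t) / eps**t       # empirical E|x|^t / eps^t
print(lhs, rhs)                            # the bound holds with plenty of slack here
```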
Lemma 3.12
Suppose x, x₁, x₂, … are random variables. If for every ε > 0
   Σ_{n=1}^∞ P(|x_n − x| > ε) < ∞ ,
then x_n →a.s. x.

Lemma 3.13 (Strong law of large numbers, see Tucker (1967) p. 124)
Let x₁, x₂, … be independent random variables with means μ₁, μ₂, … and variances σ₁², σ₂², … . If
   Σ_{i=1}^∞ σ_i² / i² < ∞ ,
then
   (1/n) Σ_{i=1}^n (x_i − μ_i) →a.s. 0 .
3.2. Random noise

We return to our estimation problem. Recalling the previous section, we assume from now on that the noise vector e, defined in (1.6), is a sequence of random variables on a probability space {Ω,F,P}. Moreover, the noise is supposed to satisfy
   E e = 0 ,   (3.1a)
   the (s+r)(m+N) scalar components of e are assumed to be independent,   (3.1b)
   var e = σ² I , with σ² unknown.   (3.1c)
Condition (3.1b) says that all noise terms are independent, so independence in time and place holds; it implies a diagonal variance-covariance matrix var e.
Generalizations of (3.1c) will be discussed in Section 3.4. The object function defined by (2.4) is a random function now. For any realization of the random vector z, i.e. z(ω) = ζ + e(ω) for any ω ∈ Ω, we obtain least squares estimates for the parameters by minimizing J_N(θ) = z(ω)^T P(θ) z(ω) as described in the previous chapter. For convenience, the factor 1/sN is introduced in the object function:
   J_N(θ) := (1/sN) z^T P(θ) z .   (3.2)
The parameter matrix θ, defined in (1.3), can be seen as a vector in ℝ^μ after rearranging, see (3.6). Here,
   μ := ps² + (q+1)sr   (3.3)
is the total number of unknown scalar parameters. The unknown true value of the parameter vector is denoted by θ₀. In order to be able to prove the consistency property for the least squares estimators to be defined below, we make 5 assumptions. Three of them are stated in this section. The first assumption guarantees the existence of a minimizing solution.
Assumption 3.1
The parameter space Θ is a known compact and convex subset of ℝ^μ, containing θ₀.

The convexity of Θ enables us to use Taylor expansions.
Now, least squares estimators are defined in accordance with the introduction of the lse method in Chapters 1 and 2 (cf. formula (2.4)).

Definition 3.14
A sequence {θ̂_N} is called a (sequence of) least squares estimator(s) for θ₀ if θ̂_N is a minimizing solution:
   J_N(θ̂_N) = min_{θ∈Θ} J_N(θ) .   (3.4)

As noticed already in the previous chapter, J_N is (a.s.) a C^∞-function with respect to θ. Hence it is continuous, so the minimization problem is well-defined. Furthermore, due to the compactness of Θ and the continuity of J_N on Θ, θ̂_N can be chosen to be random in the sense of the probability space definition, see Jennrich (1969).

Definition 3.15
The sequence of estimators {θ̂_N} is said to be (strongly) consistent for the true value parameter vector θ₀, if
   θ̂_N →a.s. θ₀   as N → ∞ .
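Definition 3.14 can be made concrete in a toy instance of this framework (our own construction, with s = r = 1, m = 0 and a static constraint η_t = θ₀ξ_t instead of the dynamic model of this chapter): taking D(θ) = (I | −θI) gives DD^T = (1+θ²)I, so the objective z^T P(θ) z / N reduces to the orthogonal-regression criterion, whose minimizer over a compact Θ containing θ₀ behaves consistently when input and output noise have equal variance.

```python
import numpy as np

# Toy instance of (3.4) (our illustration, not the thesis model): constraint
# D(theta) = [I | -theta I], so P(theta) = D'(DD')^{-1}D and
# J_N(theta) = z' P(theta) z / N = ||y - theta*x||^2 / (N * (1 + theta^2)).
rng = np.random.default_rng(2)
theta0, sigma, N = 0.7, 0.3, 50_000
xi = rng.uniform(-2.0, 2.0, N)              # true inputs
eta = theta0 * xi                           # true outputs: D(theta0) zeta = 0
x = xi + sigma * rng.standard_normal(N)     # noisy observed inputs
y = eta + sigma * rng.standard_normal(N)    # noisy observed outputs

def J(theta):
    return np.sum((y - theta * x)**2) / (N * (1.0 + theta**2))

grid = np.linspace(-2.0, 2.0, 4001)         # compact parameter space Theta
theta_hat = grid[np.argmin([J(t) for t in grid])]
print(theta_hat)                            # close to theta0 = 0.7 for large N
```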
In order to prove consistency, Dζ ∈ ℝ^{sN}, which is the same for both representations (1.7) and (1.8), is rewritten as
   Dζ = (H + K) θ − (η_{m+1}^T, …, η_{m+N}^T)^T ,   (3.5)
where θ ∈ ℝ^μ is redefined by rearranging the parameter matrix into a vector, see (3.6). Furthermore, H and K are sN × μ matrices:
   H := ((S⊗I_s)η̃ ⋯ (S^p⊗I_s)η̃  ξ̃  (S⊗I_s)ξ̃ ⋯ (S^q⊗I_s)ξ̃) ,   (3.7)
and K, which collects the initial values (blocks of the form I_s⊗η_i^T, i = m,…,m+1−p, and I_s⊗ξ_i^T, i = m,…,m+1−q, lost by the shifts in (3.7)), contains zero blocks of sizes s(N−p) × ps² and s(N−q) × (q+1)sr below its first block rows.   (3.8)
In (3.7), S, η̃ and ξ̃ are defined as
   the N × N shift matrix S ,   (3.9a)
   the sN × s² matrix η̃, obtained by stacking the blocks I_s⊗η_t^T, t = m+1,…,m+N ,   (3.9b)
   the sN × sr matrix ξ̃, obtained by stacking the blocks I_s⊗ξ_t^T, t = m+1,…,m+N ,   (3.9c)
respectively. It follows from Dζ = 0 for θ = θ₀ and (3.5) that
   Dζ = (H + K)(θ − θ₀) .   (3.10)
Hence
   ζ^T P(θ) ζ = (θ − θ₀)^T (H + K)^T (DD^T)^{-1} (H + K)(θ − θ₀) .   (3.11)
The matrix DD^T in (3.11) is the same for both choices (1.7) and (1.8) for D, cf. (2.37).
A usual assumption is the stability of the system. To this extent, we define the polynomial matrices
   A(λ) := −I + Σ_{i=1}^p α_i λ^i ,   (3.12a)
   B(λ) := Σ_{j=0}^q β_j λ^j .   (3.12b)
If λ represents the shift-back operator (λη_t = η_{t−1}), (1.1a) can be written as
   A(λ) η_t + B(λ) ξ_t = 0 .   (3.13)

Assumption 3.2
For all θ ∈ Θ, A(λ) is stable, i.e. det A(λ) has all zeros outside the closed unit disk.
Definition 3.16
The spectral radius of the matrix φ in (1.10), denoted by r(φ), is
   r(φ) := max {|λ| ; λ ∈ σ(φ)} .
The next lemma is a consequence of Assumptions 3.1 and 3.2.

Lemma 3.17
   r̄ := max_{θ∈Θ} r(φ) < 1 .

Proof
The characteristic polynomial of φ is
   λ^{s(m−p)} det (I_s λ^p − α₁ λ^{p−1} − ⋯ − α_{p−1} λ − α_p)
because of the block-companion form of φ^T. For λ ≠ 0,
   det (I_s λ^p − α₁ λ^{p−1} − ⋯ − α_p) = λ^{ps} det (−A(1/λ))
holds, hence the nonzero eigenvalues of φ coincide with the reciprocals of the zeros of det A(λ). For any fixed θ ∈ Θ, r(φ) < 1 is now obvious from Assumption 3.2. The proof is completed by noticing that r(φ) depends continuously on θ on the compact Θ. □

Remark
By definition, matrix φ is stable if r(φ) < 1.
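The reciprocal-eigenvalue argument in the proof can be checked numerically for s = 1, p = 2 (the coefficient values here are our own example):

```python
import numpy as np

# Scalar check (s = 1, p = 2; example values) of the proof of Lemma 3.17:
# the nonzero eigenvalues of the companion matrix phi coincide with the
# reciprocals of the zeros of det A(l), A(l) = -1 + a1*l + a2*l^2.
a1, a2 = 0.5, 0.2
phi = np.array([[a1, a2],
                [1.0, 0.0]])
eigs = np.sort_complex(np.linalg.eigvals(phi))
roots = np.roots([a2, a1, -1.0])          # zeros of A(lambda)
recips = np.sort_complex(1.0 / roots)
print(eigs, recips)                       # the two sets coincide
spectral_radius = np.abs(eigs).max()      # < 1 iff all zeros of A lie outside the unit disk
print(spectral_radius)
```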
Another consequence of the assumptions is the next lemma, which is important for the sequel. We introduce matrices A and B (of size sN × sN and sN × rN, respectively) associated with the polynomial matrices (3.12):
   A := Σ_{k=0}^p S^k ⊗ α_k ,   B := Σ_{k=0}^q S^k ⊗ β_k ,   (3.15)
where S is defined in (3.9a) and α₀ := −I as before.
Lemma 3.18
The matrix AA^T is 'bounded' in the sense that there exist some numbers ρ₁ and ρ₂ with 0 < ρ₁ < ρ₂ < ∞, such that
   ρ₁ I_{sN} ≤ AA^T ≤ ρ₂ I_{sN}
for all θ ∈ Θ and N ≥ p+1.
Proof
A sketch of the proof for the case s = 1 can be found in Aoki and Yue (1970b), appendix B. We deal with the general case: s is an arbitrary and fixed positive integer. The stability Assumption 3.2 is used to obtain the 'lower bound' ρ₁ I_{sN}.

1. AA^T ≤ ρ₂ I_{sN}.
For any vector x ∈ ℝ^{sN},
   x^T AA^T x = ‖A^T x‖²
holds. From (3.15) it follows
   A^T x = −x + Σ_{k=1}^p ((S^k)^T ⊗ α_k^T) x ,
hence
   ‖A^T x‖ ≤ (1 + Σ_{k=1}^p ‖α_k‖) ‖x‖ .
Define
   ρ₂ := max_{θ∈Θ} (1 + Σ_{k=1}^p ‖α_k‖)² ,
which exists, since it is defined as the maximum of a continuous function on a compact set.

2. AA^T ≥ ρ₁ I_{sN}.
The constant ρ₁ will be defined such that (AA^T)^{-1} ≤ (1/ρ₁) I. For the vector x considered above we have now
   x^T (AA^T)^{-1} x = ‖A^{-1} x‖² .
It is easy to show that
   A^{-1} = Σ_{k=0}^{N−1} S^k ⊗ g_k ,
with g₀ = −I and
   g_k = Σ_{i=1}^{min(k,p)} α_i g_{k−i} ,   k = 1,2,…,N−1 .
Then
   ‖A^{-1} x‖ ≤ (1 + Σ_{k=1}^{N−1} ‖g_k‖) ‖x‖ .   (3.16)
Define for k = p, p+1, … the ps × s matrix
   G_k := (g_k^T  g_{k−1}^T  ⋯  g_{k−(p−1)}^T)^T .
We obtain
   G_{k+1} = ψ G_k ,   k = p, p+1, … ,   (3.17)
where ψ is the ps × ps block-companion matrix with first block row (α₁ ⋯ α_{p−1} α_p) and (I_{(p−1)s} 0) below, with characteristic polynomial
   p_ψ(λ) = λ^{ps} det (−A(1/λ))   for λ ≠ 0 ,
cf. Chen (1985) p. 49. The spectral radius of ψ, r(ψ), is equal to that of φ according to the proof of Lemma 3.17, so Lemma 3.17 implies
   r̄ := max_{θ∈Θ} r(ψ) < 1 .
Define γ := (r̄+1)/2 < 1 and write
   ψ^k = (1/2πi) ∮_C (zI − ψ)^{-1} z^k dz ,   k = 0,1,2,… ,   (3.18)
where C is the circle {z ∈ ℂ | |z| = γ}. Consequently,
   ‖ψ^k‖ ≤ M γ^{k+1}   for some M > 0, all k = 0,1,… and all θ ∈ Θ ,   (3.19)
where M can be defined as
   M := max_{(z,θ)∈C×Θ} f(z,θ) ,
since f(z,θ) := ‖(zI − ψ)^{-1}‖ is a continuous function of its arguments z and θ on the compact set C × Θ.
Let us return to (3.16), which is rewritten as
   ‖A^{-1} x‖ ≤ (1 + Σ_{k=1}^{p−1} ‖g_k‖ + Σ_{k=p}^{N−1} ‖g_k‖) ‖x‖ ,
since the number ρ₁ may not depend on N. From (3.17),
   G_{p+k} = ψ^k G_p ,   k = 0,1,2,… ,
holds, with
   ‖G_{p+k}‖ ≤ M γ^{k+1} ‖G_p‖
by virtue of (3.19). Via ‖g_k‖ ≤ ‖G_k‖, k ≥ p, this yields
   ‖A^{-1} x‖ ≤ (1 + Σ_{k=1}^{p−1} ‖g_k‖ + ‖G_p‖ M γ/(1−γ)) ‖x‖ .
Therefore
   ρ₁ := [max_{θ∈Θ} (1 + Σ_{k=1}^{p−1} ‖g_k‖ + ‖G_p‖ M γ/(1−γ))]^{-2} > 0 .
Again, the maximum exists since the function involved depends continuously on θ. □
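The bounds of Lemma 3.18 can be observed numerically for s = 1, p = 2 (our own stable coefficients): the extreme eigenvalues of AA^T stay within a fixed interval as N grows.

```python
import numpy as np

# Numerical check of Lemma 3.18 for s = 1, p = 2 (example coefficients, stable):
# A = sum_k alpha_k * S^k with alpha_0 = -1 and S the N x N shift matrix.
alpha = {0: -1.0, 1: 0.5, 2: 0.2}     # A(l) = -1 + 0.5 l + 0.2 l^2, zeros outside unit disk

def build_A(N):
    S = np.eye(N, k=-1)               # shift matrix
    return sum(a * np.linalg.matrix_power(S, k) for k, a in alpha.items())

mins, maxs = [], []
for N in (10, 40, 160):
    w = np.linalg.eigvalsh(build_A(N) @ build_A(N).T)
    mins.append(w.min())
    maxs.append(w.max())
print(mins, maxs)  # bounded away from 0 and from infinity, uniformly in N
```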
Corollary 3.19
Let ρ₁ and ρ₂ be the numbers defined in the proof of Lemma 3.18. There exist numbers ρ₃, ρ₄ such that for all θ ∈ Θ and N ≥ p+1
   ρ₃ I_{sN} ≤ AA^T + BB^T ≤ ρ₄ I_{sN}   and   ρ₃ I_{sN} ≤ DD^T ≤ ρ₄ I_{sN} ,
and hence
   (1/ρ₄) I_{sN} ≤ (DD^T)^{-1} ≤ (1/ρ₃) I_{sN}   (3.20)
hold.

Proof
The bounds for AA^T + BB^T are obvious, since the upper bound for BB^T is obtained analogously to that of AA^T. From
   DD^T = AA^T + BB^T + ΓΓ^T ,   (3.21a)
with Γ the sN × (s+r)m matrix containing the blocks Y₁, …, Y_m and zeros elsewhere,   (3.21b)
the second statement follows, which implies immediately (3.20). □
Especially (3.20) turns out to be a useful property of the matrix (DD^T)^{-1}, whose entries cannot be given in closed form in general. For later reference (Chapter 5), another property is stated here already.
By definition, ‖M‖_∞ := max_{i=1,…,n} Σ_{j=1}^n |M_{ij}| for any n × n matrix M.
Lemma 3.20
There exists some positive number ρ, such that
   ‖(DD^T)^{-1}‖_∞ ≤ ρ
for all θ ∈ Θ and N ≥ μ/s.
Proof
Matrix DD^T is a block-Toeplitz and band matrix, which is positive definite for all θ ∈ Θ and N ≥ μ/s, see (2.37) and Corollary 3.19. Thereby, it can be interpreted as the covariance matrix Σ_MA of an s-variate Moving Average (with order m) process (up to a scalar):
   y_t = Σ_{i=0}^m b_i ε_{t−i} .
Here, b₀,…,b_m are s × s matrices and {ε_t} is s-variate white noise with var ε_t = σ² I, see Anderson (1971) Section 5.7.1 for the SISO-model. Moreover, if B(λ) = Σ_{j=0}^m b_j λ^j, then det B(λ) ≠ 0 for |λ| ≤ 1, due to the positive definiteness of DD^T.
Now, consider the corresponding Autoregressive process
   Σ_{i=0}^m b_i x_{t−i} = ε_t
with covariance matrix Σ_AR of (x₁,…,x_N); then Σ_MA and Σ_AR^{-1} only differ in the components in the two ms × ms submatrices located at both endpoints of the main diagonal, see Wise (1955) or Mentz (1976). Compare also Shaman (1975). Therefore, for N ≫ m, we may approximate (DD^T)^{-1} = (Σ_MA)^{-1} by Σ_AR. According to Brockwell and Davis (1987) p. 85, the Autoregressive process is causal. Therefore (Problem 11.5 p. 444 by the same authors), if Σ_AR is the block-Toeplitz matrix with s × s blocks γ(i−j), i.e.
   Σ_AR = (γ(i−j))_{i,j=1,…,N} ,   γ(h): s × s ,
then there exist a number ε ∈ (0,1) and a constant K, such that for all i, j, h and θ
   |γ_{ij}(h)| ≤ K ε^{|h|} .
Consequently,
   ‖Σ_AR‖_∞ ≤ 2sK / (1−ε) =: ρ . □
Remark 1
For the special case m = s = 1, DD^T is tridiagonal. We can give explicit expressions for ((DD^T)^{-1})_{i,j}, i ≤ j, in terms of
   x := (1/(2κ)) (−1 + √(1 − 4κ²)) ,
where κ is the ratio of off-diagonal to diagonal entries of DD^T, cf. Mentz (1976) p. 430 and (2.37b). The positive definiteness of DD^T for all N implies |κ| < ½, hence |x| < 1. This yields easily a bound of the form
   ‖(DD^T)^{-1}‖_∞ ≤ const / ((1 − x²)(1 − |x|)) ,
with the constant determined by (2.37b).
Remark 2
The result (DD^T)^{-1} ≤ ρ I is a consequence of Lemma 3.20, since
   max_i |λ_i| ≤ ‖(DD^T)^{-1}‖_∞ ,
where λ_i, i = 1,…,sN, are the eigenvalues of (DD^T)^{-1}, see Wilkinson (1965) p. 59. However, the proof of Lemma 3.20 relies upon DD^T ≥ ρ₁ I.
Next, the input sequence {ξ_i}_{i=1}^∞ is assumed to be bounded, formally:

Assumption 3.3
There exists an M₁ < ∞ such that ‖ξ_i‖ ≤ M₁ for i = 1,2,… .

Corollary 3.21
There exists an M₂ < ∞ such that ‖η_i‖ ≤ M₂ for i = 1,2,… .

Proof
The proof is similar to that of the well-known BIBO-stability result, cf. Zadeh and Desoer (1963) p. 483. From the state-space description (1.10), (1.11), we obtain
   η_t = h φ^{t−1} x₁ + Σ_{i=1}^{t−1} h φ^{t−1−i} b ξ_i + β₀ ξ_t ,   t = 1,2,… ,   (3.22)
hence
   ‖η_t‖ ≤ ‖h‖ ‖φ^{t−1}‖ ‖x₁‖ + ‖h‖ ‖b‖ M₁ Σ_{i=1}^{t−1} ‖φ^{t−1−i}‖ + ‖β₀‖ M₁ ,   t = 1,2,… .
Since r̄ < 1 (Lemma 3.17) it follows, as in the proof of Lemma 3.18, that
   ‖φ^t‖ ≤ M γ^{t+1}   for some M > 0, all t = 0,1,… and all θ ∈ Θ, where γ := (r̄+1)/2 < 1 ,
so ‖φ^{t−1}‖ → 0 (t → ∞) and
   Σ_{i=1}^{t−1} ‖φ^{t−1−i}‖ = Σ_{i=0}^{t−2} ‖φ^i‖ ≤ M Σ_{i=0}^∞ γ^{i+1} = M γ/(1−γ) . □
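The BIBO argument of Corollary 3.21 can be simulated for a scalar example (coefficients are our own choice, with A(λ) = −1 + 0.5λ + 0.2λ² stable): a bounded input produces a bounded output, and ‖φ^t‖ decays geometrically.

```python
import numpy as np

# Scalar example of Corollary 3.21 (our own values):
# eta_t = a1*eta_{t-1} + a2*eta_{t-2} + b0*xi_t with |xi_t| <= M1 = 1.
rng = np.random.default_rng(3)
a1, a2, b0 = 0.5, 0.2, 1.0
T = 5000
xi = rng.uniform(-1.0, 1.0, T)            # bounded input, M1 = 1
eta = np.zeros(T)
for t in range(T):
    # zero initial conditions: eta[-1], eta[-2] are still 0 when read at t = 0, 1
    eta[t] = a1 * eta[t-1] + a2 * eta[t-2] + b0 * xi[t]
print(np.abs(eta).max())                  # bounded (here by b0*M1/(1-a1-a2) = 10/3)

phi = np.array([[a1, a2], [1.0, 0.0]])
powers = [np.linalg.norm(np.linalg.matrix_power(phi, t), 2) for t in (5, 10, 20, 40)]
print(powers)                             # geometric decay since r(phi) < 1
```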
3.3. Identifiability

For the true but unknown inputs and outputs, collected in ζ, see (1.7) or (1.8), we have
   Dζ = 0 ,
or equivalently, see (3.5),
   (η_{m+1}^T, …, η_{m+N}^T)^T = (H + K) θ₀ .
The true value of the parameter vector θ₀ is uniquely determined from ζ if and only if (H+K)^T(H+K) is regular. A necessary condition for the latter property is N ≥ μ/s.
Asymptotically, the identifiability assumption below makes sure that θ₀ is uniquely determined.

Assumption 3.4
The μ × μ matrix
   (1/sN) H^T (DD^T)^{-1} H
converges as N → ∞, for all θ ∈ Θ. The limit matrix, say G, is positive definite on Θ.

Now, we can prove the next lemma, which is one of the tools for proving consistency.
Lemma 3.22
E J_N(θ) → J(θ) as N → ∞, for all θ ∈ Θ, where
   J(θ) := σ² + (θ − θ₀)^T G(θ) (θ − θ₀) .

Proof
From E z = ζ and var z = σ² I we obtain
   E z^T P z = tr (P var z) + (E z)^T P (E z) = sN σ² + ζ^T P ζ ,   (3.23)
from a result to be proved in the next chapter, see (4.9). Hence, using (3.11),
   E J_N(θ) = σ² + (1/sN)(θ − θ₀)^T (H + K)^T (DD^T)^{-1} (H + K)(θ − θ₀) .   (3.24)
As N → ∞, the effect of K vanishes since it has a finite number of nonzero elements, see (3.8):
   (1/N) ‖K‖² → 0   as N → ∞, for all θ ∈ Θ ,
and, with x := θ − θ₀,
   (1/sN) |x^T K^T (DD^T)^{-1} (H + K) x + x^T H^T (DD^T)^{-1} K x| → 0
for all θ ∈ Θ, using (3.20), the Cauchy-Schwarz inequality and Assumption 3.4. Then the lemma follows from Assumption 3.4 again. □
Before proving consistency we give an interpretation of the identifiability Assumption 3.4 for the SISO-case s = r = 1. It is similar to that given in Aoki and Yue (1970a). Definitions (3.6) and (3.9) reduce to
   η̃ = (η_{m+1}, …, η_{m+N})^T   and   ξ̃ = (ξ_{m+1}, …, ξ_{m+N})^T .
Let us rewrite Dζ, which is the same for both representations (1.7) and (1.8), as
   Dζ = A η̃ + B ξ̃ + v ,   (3.25)
where A and B are the N × N band Toeplitz matrices of (3.15) with scalar entries −1, α₁,…,α_p and β₀,…,β_q, respectively,   (3.26)
and v is the vector containing the initial-value terms, built from α₁,…,α_p, β₁,…,β_q and η_m,…,η_{m+1−p}, ξ_m,…,ξ_{m+1−q}.
For θ = θ₀ (notation: superscript ⁰), A⁰η̃ = −B⁰ξ̃ − v⁰ holds. Therefore, since A S^k = S^k A, k = 1,2,…,m, (3.7) implies
   A⁰ H = Ξ F⁰ + w⁰ ,   (3.27)
where w⁰ := −(S v⁰ ⋯ S^p v⁰ 0 ⋯ 0), with
   Ξ := (ξ̃  S ξ̃  ⋯  S^{p+q} ξ̃) ,   N × (p+q+1) ,   (3.28)
and F⁰ the (p+q+1) × (p+q+1) Sylvester-type matrix formed from the coefficients −β₀⁰,…,−β_q⁰ and −1, α₁⁰,…,α_p⁰ of B⁰(λ) and A⁰(λ).   (3.29)
The effect of w⁰ in (1/N) H^T (DD^T)^{-1} H vanishes, hence
   lim_{N→∞} (1/N) H^T (DD^T)^{-1} H = lim_{N→∞} (1/N) F⁰^T Ξ^T ((A⁰)^{-1})^T (DD^T)^{-1} (A⁰)^{-1} Ξ F⁰ .
From Lemma 3.18 and (3.20) we see that the matrices (A⁰(A⁰)^T)^{-1} and (DD^T)^{-1} can be 'bounded'. Hence, provided the existence of the limits, G is positive definite if and only if two conditions hold:
   (i) lim_{N→∞} (1/N) Ξ^T Ξ > 0 ,
   (ii) F⁰ is regular.
We interpret Condition (i) as the input sequence {ξ_i}_{i=1}^∞ being persistently exciting of order p+q. Condition (ii) is equal to: the polynomials
   A(λ) = −1 + Σ_{i=1}^p α_i λ^i   and   B(λ) = Σ_{j=0}^q β_j λ^j
are relatively prime and B(λ) ≢ 0, see Wolovich (1974) p. 234-236. The former expresses the controllability of the pair (φ,b) in (1.10).
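Conditions (i) and (ii) can be checked numerically for s = r = 1, p = q = 1 (the coefficient values a1, b0, b1 below are our own): a white-noise input is persistently exciting, and coprimeness of A and B shows up as a nonzero resultant; for two degree-1 polynomials the Sylvester matrix is 2 × 2.

```python
import numpy as np

# Our own check of conditions (i) and (ii) for s = r = 1, p = q = 1.
# (i) persistent excitation: Xi = (xi, S xi, S^2 xi) has Xi'Xi/N -> positive definite.
rng = np.random.default_rng(4)
N = 100_000
xi = rng.standard_normal(N)               # white-noise input
Xi = np.column_stack([xi, np.roll(xi, 1), np.roll(xi, 2)])
Xi[0, 1] = 0.0                            # shift matrix S, not a circular shift
Xi[:2, 2] = 0.0
min_eig = np.linalg.eigvalsh(Xi.T @ Xi / N).min()
print(min_eig)                            # bounded away from zero

# (ii) coprimeness of A(l) = -1 + a1*l and B(l) = b0 + b1*l via the resultant.
a1, b0, b1 = 0.5, 1.0, 0.3
R = np.array([[a1, -1.0],
              [b1, b0]])                  # Sylvester matrix of the two polynomials
res = np.linalg.det(R)
print(res)                                # nonzero iff A and B have no common zero
```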
For the MIMO-case, this interpretation seems not to be straightforward. We can obtain
   A H = ((S⊗I_s) A η̃ ⋯ (S^p⊗I_s) A η̃  A ξ̃  (S⊗I_s) A ξ̃ ⋯ (S^q⊗I_s) A ξ̃)
and define v⁰ := −(A⁰ η̃ + B⁰ ξ̃). Unlike the SISO-case, the effect of v⁰ and the corresponding w⁰ cannot be neglected now.

3.4. Main result

In this section we prove the strong consistency of any sequence {θ̂_N} defined by (3.4). Apart from (3.1), we impose a condition on the noise process.
Assumption 3.5
Let e_i denote (scalar) component i of the noise vector e in (1.6); then
   E e_i⁴ ≤ M₃ < ∞   for all i .
Let us start by proving the useful

Lemma 3.23
As before, let Ṗ denote ∂P/∂θ_i for any element θ_i of θ. Then, there exists a number k₁ > 0 such that
   ‖Ṗ‖ ≤ √k₁
and hence
   |x^T Ṗ x| ≤ √k₁ x^T x   for all x ∈ ℝ^{(s+r)(m+N)} ,
for all θ ∈ Θ and N ≥ μ/s.

Proof
Since (for both choices (1.7) and (1.8) for D) the entries of D and Ḋ are bounded, uniformly in N and θ,   (3.30)
the first result follows easily from (3.20) and an evident 'upper bound' for D^T D. Furthermore, for any x ∈ ℝ^{(s+r)(m+N)},
   |x^T Ṗ x| ≤ ‖x‖ ‖Ṗ x‖ ≤ √k₁ x^T x ,
which proves the second part of the claim. □
Now, the uniform convergence of E J_N is obtained, where uniform convergence refers to convergence as N → ∞, uniformly with respect to θ on the compact Θ.

Proposition 3.24
E J_N(θ) → J(θ) , uniformly.

Proof
First, E J_N = σ² + ζ^T P ζ / sN (see (3.24)) implies
   (∂/∂θ_i) E J_N = ζ^T Ṗ ζ / sN   (Ṗ as in Lemma 3.23).
Then, by virtue of Lemma 3.23,
   |(∂/∂θ_i) E J_N| ≤ √k₁ ζ^T ζ / sN .
Due to Assumption 3.3 and Corollary 3.21 we have
   ζ^T ζ / sN ≤ ((N+m)/sN)(M₁² + M₂²) .   (3.31)
Hence, a positive constant c exists such that ‖(d/dθ) E J_N(θ)‖ ≤ c for all θ ∈ Θ and N ≥ μ/s. By application of Lemma 3.8, {E J_N}_{N=μ/s}^∞ is equicontinuous on Θ. The pointwise convergence of E J_N (see Lemma 3.22) implies uniform convergence, cf. Lemma 3.9. □
Remark
The uniform limit J is continuous now and, since Θ is compact, J is even uniformly continuous on Θ (cf. Rudin (1964) p. 78).

We define the random function
   L_N(θ) := J_N(θ) − E J_N(θ) .   (3.32)
Lemma 3.25
   L_N(θ) →a.s. 0 , for all θ ∈ Θ .

Proof
From (3.2) and z = ζ + e we obtain
   L_N(θ) = (2/sN) e^T P(θ) ζ + (1/sN)(e^T P(θ) e − σ² sN) .   (3.33)
Due to Lemma 4.21a there exist some constants c₁ and c₂ such that
   E (e^T P ζ)⁴ ≤ c₁ ‖P ζ‖⁴   (3.34a)
and
   var (e^T P e) ≤ c₂ M₃ ‖P‖² ,   (3.34b)
using Assumption 3.5.
In view of Lemma 3.11, from ‖Pζ‖ ≤ ‖ζ‖, (3.31) and (3.34a) it follows that
   P(|(2/sN) e^T P ζ| ≥ ε) = O(1/N²)
for any ε > 0. Therefore
   Σ_{N=1}^∞ P(|(2/sN) e^T P ζ| ≥ ε) < ∞   for all ε > 0 .
Lemma 3.12 yields
   (2/sN) e^T P(θ) ζ →a.s. 0   for all θ ∈ Θ .
By virtue of Lemma 3.11, ‖P‖² = tr P = sN, and (3.34b) we obtain for the remaining terms in the right-hand side of (3.33) that
   P(|e^T P e / sN − σ²| ≥ ε) ≤ c₂ M₃ / (ε² sN)   for all ε > 0 .
Due to Corollary 3 in Varberg (1968), P/sN ≥ 0 and tr P/sN = 1 imply that
   e^T P(θ) e / sN →a.s. σ²   for all θ ∈ Θ . □

Lemma 3.26
The sequence {L̇_N} is a.s. bounded, uniformly on Θ.
Proof
For a definition of this notion, see Definition 3.10a. If ˙ denotes ∂/∂θ_i again, then
   L̇_N(θ) = (2/sN) e^T Ṗ ζ + (1/sN) e^T Ṗ e .   (3.35)
As a consequence of Lemma 3.23,
   |e^T Ṗ e| / sN ≤ √k₁ e^T e / sN
holds, with
   e^T e / sN →a.s. ((s+r)/s) σ²
by application of Lemma 3.13. Here, the condition of that lemma is satisfied due to Assumption 3.5. Furthermore,
   |e^T Ṗ ζ| / sN ≤ √k₁ (ζ^T ζ / sN)^{1/2} (e^T e / sN)^{1/2} ≤ √k₁ (((N+m)/sN)(M₁² + M₂²))^{1/2} (e^T e / sN)^{1/2} ,   (3.36)
using the Cauchy-Schwarz inequality, Lemma 3.23 and (3.31).
Hence, |L̇_N| and therefore ‖L̇_N‖ are bounded by an a.s. convergent random variable not depending on θ, with non-random limit. □

As an implication of Lemmas 3.25 and 3.26 we obtain

Proposition 3.27
   L_N(θ) →a.s. 0 , uniformly.

Proof
Application of Theorem 3.10b. See also Ten Vregelaar (1988a). □
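The key step in the proof of Lemma 3.25 — that e^T P e / sN settles at σ² for projection matrices P with tr P = sN — can be illustrated by a small Monte Carlo (our own construction, using a random k-dimensional projection in place of the model's P(θ)):

```python
import numpy as np

# Our own Monte Carlo for the key step of Lemma 3.25: if P is an orthogonal
# projection with tr P = k and e has i.i.d. components with variance sigma^2,
# then e'Pe / k concentrates at sigma^2 as k grows.
rng = np.random.default_rng(5)
sigma = 0.5
vals = []
for n, k in ((100, 50), (400, 200), (1600, 800)):
    Q, _ = np.linalg.qr(rng.standard_normal((n, k)))   # orthonormal basis, P = QQ'
    e = sigma * rng.standard_normal(n)
    vals.append((Q.T @ e) @ (Q.T @ e) / k)             # e'Pe / k
print(vals)  # the values approach sigma^2 = 0.25
```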