On estimating the parameters of a dynamic model from noisy input and output measurements

(1)

On estimating the parameters of a dynamic model from noisy

input and output measurements

Citation for published version (APA):

Vregelaar, ten, J. M. (1988). On estimating the parameters of a dynamic model from noisy input and output measurements. (Memorandum COSOR; Vol. 8802). Technische Universiteit Eindhoven.

Document status and date: Published: 01/01/1988

Document Version:

Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)

Please check the document version of this publication:

• A submitted manuscript is the version of the article upon submission and before peer-review. There can be important differences between the submitted version and the official published version of record. People interested in the research are advised to contact the author for the final version of the publication, or visit the DOI to the publisher's website.

• The final author version and the galley proof are versions of the publication after peer review.

• The final published version features the final layout of the paper including the volume, issue and page numbers.

Link to publication

General rights

Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. • Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain

• You may freely distribute the URL identifying the publication in the public portal.

If the publication is distributed under the terms of Article 25fa of the Dutch Copyright Act, indicated by the “Taverne” license above, please follow below link for the End User Agreement:

www.tue.nl/taverne

Take down policy

If you believe that this document breaches copyright please contact us at:

openaccess@tue.nl

providing details and we will investigate your claim.

(2)

Department of Mathematics and Computing Science

Memorandum COSOR 88-02

On estimating the parameters of a dynamic model from noisy input and output measurements

by

J.M. ten Vregelaar

Eindhoven University of Technology

Department of Mathematics and Computing Science P.O. Box 513

5600 MB Eindhoven The Netherlands

Eindhoven, January 1988 The Netherlands

(3)

from noisy input and output measurements

Jan ten Vregelaar

Department of Mathematics and Computing Science Eindhoven University of Technology

P.O. Box 513 5600 MB Eindhoven

The Netherlands

ABSTRACf

In this paper an algorithm is provided to compute least squares estimates for the parameters of a dynamic model from noisy measurements of inputs and outputs. Furthermore, we prove the consistency property for the corresponding estimators under some assumptions.

Keywords: Parameter estimation, Dynamic model, Noisy measurements, Least squares esti-mation, Consistency.

(4)

In [5], Eising et al discuss an identification method for a prototype situation of an ARMA-mode1 with noisy measurements on inputs and outputs. In order to obtain estimates for the unknown parameters, they propose to use only object function evaluations which are calcu-lated by inverting some matrix with a lot of "structure". Furthennore it is stated in [5] that the corresponding estimators should be consistent. To our knowledge no proof of consistency has appeared up to now.

For a much more general situation of the model (MIMO system, possible different autoregressive and moving average orders, not assuming zero initial conditions) we present in this paper a new algorithm to calculate both object function and its first derivative evaluations. Furthennore, estimators as minimizing solutions are proved to be consistent under some mild assumptions.

We consider the ARMA model

p q

1\, =

LCX;1\/-i

+

L

~j~'-j , t =m + I,m + 2, . .. ,

i=l j=()

(1.1)

with known orders p and q and m := max(p,q). The inputs ~, and outputs 1\" which may be vectors say of length r and s respectively (MIMO system when r, s > 1), are supposed to be measured with noise

Y, =1\, +£, ,

t = 1,2, ... , m + N . x, = ~, +S, ,

The problem is to estimate the unknown parameter matrix (with sizes x [ps +(q +l)r])

e

= [ala2'"

ap

13o~1··· ~q]

from the set of data {(Xl,Yl),(X2'Y~" . . ,(Xm+N,Ym+N))'

A least squares estimate for

e

is obtained as solution of the minimization problem

m+N

min

L

(IIY, - 1\,f+IIx, - ~,1I2]

,=1

withrespect to

e,

~h . . . ,~m+N'1\h . . . ,1\m+N ,

subject to the model equation (1.1)for t =m +1, ... ,m +N.

(1.2)

(1.3)

(1.4)

The nonn II·nis the Euclidean vector nonn. Inorder to use a compact notation we rewrite (1.1) for t = m +1, ... ,m +N as

D(e) ~=0 ,

where(I denoting the identity matrix)

(5)

I

-I

~l ~

'.' (X"'. , D= -I (Xl • • . (X'" ~O' •• ~'" ~O' • , ~'" (1.5b)

(has size sN x (s +r) (m +N), consists of two block-Toeplitz submatrices; empty space represents zero elements whereas we define (Xl :=0 for k >p, ~l :=0 fork >q), and

fl",+N

~l

Letz and e denote the corresponding vectors of observations and noises such that

z='+e,

is a shorthand notation for (1.2).

Then the minimization problem(1.4) can be written as

min liz -

CII

2 subject toD

C

= 0 . 9.~

For the stationary point

(t,

A) of the Lagrangian

L

(C,

A)=

t

(z -

Cf

(z -

0

+ATD , we obtain ~+DT).=Z D ~=O • hence

A

=

(D+fZ ~= (I - D+D)z •

withD+

=

DT(DDTrl being the Moore-Penrose inverse ofD sinceD is offull row rank.

Therefore (1.7) reduces to (1.5c) (1.6) (1.7) (1.8) (1,9)

As pointed out by Aoki et al [1] for the SISO model, solutions of (1.9) correspond to maximum likelihood estimates of

e

when we assume Gaussian white noise. Algorithms and convergence results are givenin [1] for approximate maximum likelihood methods, whereas in a companion paper [2] they prove convergence results for the true maximum likelihood estima-tors in the special case of no input noise.

(6)

2. Algorithm

Since (1.9) has in general no closed-form solution, we need an iterative algorithm to cal-culate estimates. We propose the Broyden-Fletcher-Goldfarb-Shanno formula (cf. [7], pp. 89-90) which has good numerical properties. It uses object function and gradient evaluations. LetIN denote the object function,

then for any component of its gradientIN ',

jN = 2zTD+iJpi z

(2.1)

(2.2)

holds, since

P

=D+iJpi +piiJT(D+l

(. representing

a:.

for any elementSj from S).

I

Here, both P and pi :=I - P are orthogonal projection matrices. To evaluate IN and IN ' we

prefer performing a Q - R decomposition ofDT rather than invertingDDT. The matrixDT has full columnrank, so there exist orthogonalQ and regularR (sN xsN) such that

QTDT=

[~]

. (2.3)

Because of the special form of DT, R will be lower triangular. IfQ1 is the submatrix of Q

consisting of its firstsN columns, then

holds.

Due to the orthogonality ofQ , p = QIQ?

hence

Furthermore, when A.:= (D~Tz (cf. (1.8» then

DTA.= pz and (2.2) implies

jN = 2'AT

iJ

(z - DTA.)

Premultiplying (2.7) by QIT gives via (2.4) and (2.5)

R'A=QITZ .

Summarizing, when matrixR and vectoru :=Q1TZ (lengthsN) are computed from (2.4) (2.5) (2.6) (2.7) (2.8) (2.9)

(7)

then, from (2.6), (2.8) and (2.9) we obtain

IN = lIull2

and

where A, is easily solved from RA,=u ,

sinceR is lower triangular.

(2.10)

(2.11a)

(2.11b)

(2.l1c)

The matrix Q in (2.10) is not computed explicitly: by means of Householder matrices DT is

transfonned into

[~].

Using the special structure this canbe done very efficiently. For details we refer to [10]. The computation of R takes 0 (N) operations, whereas matrix inversion

o

(N2),cf. [5].

(8)

3. Assumptions

From now on the vector of errors e in (1.6) is assumed to be random with zero mean, all its scalar components are stochastically independent and the covariance matrix of e is vare

=

rrI

where cr is unknown. The latter assumption is discussed later, see also [5].

For convenience, the object function which is random now, is multiplied by the term

~:

(3.1) It willbe suitable to rewrite 9 as a vector of length Il:=ps2+(q +1)sr,

(a.[).! (a.[).s (aD.! 9= (3.2) (a;).s (~J)*! (~~)*s

whereM._i denotes columnj of any matrixM.

We make 5 assumptions which turn out to be sufficient conditions for consistency of any

eN defined by

(3.3) Assumption 1

The set

e

is a known compact subset of RJI., which contains the unknown true parameter

vector 90•

According to Bierens [3], p. 53,

e

is compact and IN is continuous imply that eN defined by (3.3) is indeed a random vector.

Inorder to state the next assumption we introduce the polynomial matrices

p .

A(A.)= -I +

L

ail..'

i=l

ForArepresenting the shift-back operator, (1.1) canbewritten as

A(1..)'111 +B(1..)1;1 = 0, t = m +I,m +2, . .. .

(3.4)

(9)

Assumption 2

Forall

e

E

e,

A ().,) is stable, Le. the zeros of detA ().,) lie outside the closed unit disk.

We associate with (3.4) matrices A and B (of size sN xsN and sN xrN respectively) defined by

(3.6) whereS is theN xN shiftmatrix

and 0 denotes the Kronecker product for matrices. From (1.5b) it follows that

D = [A CI I B C

il

with (3.7) (3.8) (sN x sm matrix) al ... 'am and (sN x rm matrix) . ~l . . . . ~m

Now an important consequence of Assumption2is given by Lemma 1.

The matrix AAT has "uniform bounds": there exist some constants Pl and pz with

o

<PI <Pz<00 suchthat

Pl I ~ AAT~ pzI for all

e

E

e

andN ~ p + 1 (BY definition: MI~ Mz ifxTM1X ~ X TMzx for allx).

(10)

Proof Appendix.

Corollary 1

For all

e

E

e

andN ~ P+ 1,

PI1~ DDT~ P31 holds, with pz~ P3 <00.

Proof

The "lower bound" is obvious from DDT =AA T +BB T+CIC? +

czcl

and how to define P3

is evident from the proof of Lemma 1.

0

Assumption 3

The sequence of true inputs (~dt:1 is bounded:

there exists a constantMI such thatlI~iII~ MI for i = 1,2, ...

As a consequence of the well-known BIBO-stability result Assumptions 2 and 3 imply:

Corollary 2

The sequence of true outputs (1'\i) i';,l is bounded.

Inorder to state the next assumption we rewrite the vectorD~in (1.5) as

[ 1'\m+N

DC

=

(H +K)

e -

:

1'\",+1

where

e

as defined in (3.2) andH and K are sN xf.1 matrices given by H = [(S 0 Is)1'\ ... (SP 0 [s)1'\ I ~ (S 0 Is)~ ... (sq 0 Is)~]

(3.9)

(3. lOa)

(11)

and

The model equation (1.1) holds for the nue value 00, so

DC

=

0 for 0

=

00• Therefore (3.9) implies

DC

= (H +K) (0 - 00) . (3.11)

Assumption 4

Th_{e matrtx}. HT(DDTrlH_sN _{converges as} _N _~ _00, _urn°fi₀rm1_{y on}

e

_. Th_{e urn}°fi_OIm

r '

_{lUllt, say} G ,is positive definite on

e.

As will be seen in the next section this assumption implies a convergence result for IE INCO),

which is one of the tools for proving consistency.

Generalizing Aoki et al [:I], we can give an interpretation for the convergence of H

T

(D~

Ttl

H to a positive definite matrix in the SISO-case s

=

r

=

1. The above defined

O,H,11 and ~ reduce to

o

= [(Xl •• (Xp

Po . . .

~q

f ,

H =[S11 ... SP111 ~ S~ ... sq~] , 11= [11m+N 0 0 • 11m+lf and Defining V=DC-A11-B~ , (1.5) and (3.8) imply

For 0

=

00 (notation sup index 0) AIl.rl

=

-Bo~ - VO holds. Then, using ASk

=

SkA for

k = 1,2,.o,m,

AOH=[-SBO~ " 0 -SPBo~ IAo~ ... SqAo~]+CJ)o

where CJ):=-[Sv ... SPv I 0 ... 0].

(12)

with

and

-1

E= '-I _, _(p ₊_q ₊₁₎_{x (p} ₊_q ₊ _{1) ,}

HT(DDTr1H

The effect ofroO in N vanishes, whence

lim HT(DD Tr1H = (E~T lim ::::.T (A~-T(DDTr1(A~-l::::, EO

N~oo N N-+oo N (3.12)

The matrices (A~-T(A~-l and (DDTrl canbe bounded in the sense of Lemma 1 and its Corol-lary. Therefore, provided the existence of the limits, the limitin the left hand side of (3.12) is positive definite if and only if

(i) (ii) -T-lim ..=...:=:. >0 N-+oo N EOisregular ,

Condition (i) could be a definition of persistency of excitation of order p + q for the input sequence (~L":l (see [1], p, 544), whereas the second condition is equivalent to the statement that the polynomialsA ().)and B ().)in (3.4) arecoprime (see [9], p. 133).

The last assumption requires the fourth moment of the noises tobe uniformly bounded.

Assumption5

Letej denote (scalar) componenti of the noise vectore, There exists

a

constantM2 such that

(13)

4. Consistency

Consistency is obtained by using a result of Bierens [3], p. 54, 65: when the object func-tion converges in some sense unifOImly on a compact set to a continuous limit function which is uniquely minimal in the true value of the parameter vector then any minimizing solution converges in that sense to the true value.

In the sequel uniform convergence refers to convergence with respect to

e

on the compact

e.

Let us start by proving two lemmas.

Lemma 2

IE IN(e) H J(e) (N ~ 00),unifonnly ,

where

Proof

Observe that the object function defined by (3.1) has mean

IE I N

=

0'2+

~ ~Tp~

= 0'2 + (e _

eol

(H +K)T (DDTrl(H +K) (e - eo)

sN

(4.1)

by (3.11). The lemma follows from Assumption 4, sinceK has a finite number of nonzero

ele-~~

0

Lemma 3 var IN ~ 0 (N ~ 00), unifonnly . Proof From we obtain Hence

J < 1 vareTpe 4 var;Tpe 4

_I

Tp _/varYTpe

var N - 2' 2 +""""'2 2 +2' .

'J

vare e

'J

~

s N s N S N2 N 2

by virtue of Cauchy-Schwarz.

(14)

vareTpe ~ k max lEej411PII2

j=I,2,..,(s+r)(m+N)

withk is some constant.

Obviously, var ~TPe = cr2

_11P

_l;1I2_holds.

The orthogonal projection matrixP obeys liP

f

= sN andliP

01

2_~

₁₁₀₁

2 vareTpe

Assumptions 5 and 3 and Corollary 2 imply now that N₂ zero, uniformly, as N ~ 00,proving the lemma.

Remark. As one can see from the proof, Assumption 5 may be weakened to

max lEej4= 0(N) . j=1;2,.. .,(s+r )(m +N)

The functionJ defined by (4.1) is continuous with respect to 9 as it is a uniform limit of continuous functions. Now the convergence of the sequence of object functions JN to I will be

shown.

Proposition

P

IN HI, uniformly ,

(p meaning convergence in probability), Le.

JP(~C I IN(9) - 1(9) I>e) ~

°

asN ~ 00

for any

e

>0. Proof

Since IN and I are continuous on the compact

e,

there exists a sequence 9N with 9NE

e,

such

that

JP(~c I IN(9) - 1(9) I >e)= IP( I _{I N}(9N) - 1(eN) I >e)

IE(IN (eN) - 1(eN)i varIN (eN) +(IE _{IN (eN) - l(eN))2}

<

=-...;.--..,;...;...;.-,....;-;...,;;.;,...;...;.~-- e2 _e2

for any

e

>0, applying Chebyshev's inequality. Lemmas 2 and 3 complete the proof.

0

We are able to give the main result.

Theorem

Under Assumptions 1-5, any least squares estimator

eN

(defined by (3.3)) is (weakly) con-sistent for the true parameter vector

eo,

Le.

(15)

Proof

According to Lemma 3.1.8 (p. 65) in [3] the result is immediate from the Proposition and two properties ofJ: it is continuous and has on 8 a unique minimum in

eo.

which is obvious from

(4.1) and Assumption 4.

0

Remarks.

1. An estimate for the unknown variance

cr

is given by IN

(9

N ): as a consequence of the

A p 2

Theorem and the Proposition_IN(eN) H J(e_o)= cr holds.

2. To discuss the covariance matrix assumption vare =

cr

I, we consider the very special case of no dynamics (p

=

q

=

0) and SISO-model (s

=

r

=

1). The model and measure-ments read 111 = ~~I , YI

=

111 +£1 • XI = ~I +~I , t = 1,2,· .. ,N t = 1,2, ...,N .

MatricesD andP reduce to

D = [-I

1JI1,

r

=

[:~2

[!IJI :,;]

whereas ,

=

~]

and 9

=

~.

When we assume now e.g. vare = diag(cr;l ,crlI) then the object function

1 1

F-TF-IN(~)

=

2

IIY -

~x1I2 has mean IE IN(~)

=

- - 2 [cr;+~2crl+ (~o - ~)2 .2...:i.] which

(1+~)N 1+~ N

:=:w~e

: :

~

=':f;

r

+

~2Gl

+

~

-

~f

G]

by

Assumption 4, where

~o

denotes

For

~o

'#0,

J'(~o)

=

2:~

2 (crl- cr;)'#0 when crl)'#cr_e. Therefore J is not minimal in

~o

(1+ 0)

and consistency is not obtained for any minimizing solution ofIN(~)'

Solari [8] has shown that the corresponding likelihood has no maximum, whence addi-tional information is needed (see also Linssen [6], p. 3-4).

(16)

S. Conclusion

Estimating the unknown parameters of a dynamic model with errors in all variables by means of the least squares method, gives an object function which contains some inverse matrix. We propose a Q - R decomposition to evaluate object function and gradient. Those are used in an iterative procedure in order to obtain estimates.

Furthermore, though in general the object function can not be written as a sum of independent random variables (cf. [2]), we are able to prove weak consistency under some mild conditions on input, system and noise.

In future we will report on results of simulations by applying a computer program based on the algorithm described in section 2 and on a proof for asymptotic normality. Generaliza-tions as contraints on the parameters and partial input noise will have our attention as well.

(17)

Appendix

Proof of Lemma 1

1. AAT ~ P21

For any sN vectorx, xTAAT X =IIATxll2holds. By definition ofA (see(3.6)) we have

p

AT X = -x +

L

[(Sly @ ak T]x , i=1

hence

P

IIATxII~ (l +

L

Ilakl!) IIxll

k=1 Therefore, define

2. AAT ~ PI I

It will be proved that(AATtl ~

.l...

I. PI

Generalizing AoId et al [2] (appendixB), we obtain

xT(AATtlx = IIA-Ix 112

with

N-I

A-I =

L

Sk @ gk

k=O

and thes x s matricesgk satisfy

go=-I min(k.p) gk =

L

ai gA;-j, i=1 Consequently k = 1,2, ... ,N - 1 . N-I

UA-Ixll~ (1 +

L

Ilgkl!) Ilxll

k=l

Define fork

=

p,p + I, ... theps x s matrix

ObViously

(AI)

(18)

Gk+1

=

'I'Gk , k

=

P,p +1, ...

holds, where'I' is the ps xps matrix

_I~I.

<Xz •.. <Xp]

'1'- .

' / 0

with characteristic polynomial

pvCA-) =PVTCA.)= ).,lS det(-A(A-I» for A::1=0

(withA(A) as defined in (3.4), cf. Chen [4], Section 2.3). Let r('I') denote the spectral radius of'1', Le.

r('I')= max {IAI; Aeigenvalue of'I'} , then Assumptions 1 and 2 imply

r :=max r('I') < 1 , ge8

since the nonzero eigenvalues of'I' coincide with the reciprocals of zeros of detA(A) and r('I') depends continuously on

e

on the compact

e.

Define'(=

r;l,

then

r

<'« 1and Cauchy's formula (cf. zadeh et al [12], p. 606) gives

whereC is the circle {z E q:; IzI

='(}.

Therefore,

n~1I ~ M '(hI for allk

=

0,1, . .. and all

e

E

e

_(A3)

whereM:= max

f

(z,e) with

f

(z, e)= lI(z I - 'l'rlll is a continuous function on the compact (z.8)eCxe

set C x 9.

By virtue of (AI), (A2) and (A3) we have now

00

IIA-Ixll~ (1 +

L

IlgklD Ilxli

k=1

with

(19)

gives

(20)

References

[1] M. Aoki and P.e. Yue, On a priori error estimates of some identification methods, IEEE

Trans. Automatic ControlAC-15 (1970) 541-548.

[2] M. Aoki and P.C. Yue, On certain convergence questions in system identification, SIAM

J. Control 8 (2) (1970) 239-256.

[3] H.J. Bierens, Robust Methods and Asymptotic Theory in Nonlinear Econometrics, Lect. Notes Econom. Math. No. 192 (Springer, Berlin, 1981).

[4] H.P. Chen, Recursive Estimation and Control for Stochastic Systems (John Wiley, New York, 1985).

[5] F. Eising, H.N. Linssen and H. Rietbergen, System identification from noisy measure-ments of inputs and outputs, Systems & Control Letters 2 (1983) 348-353.

[6] H.N. Linssen, Functional Relationships and Minimum Sum Estimation, Ph.D. Thesis, Eindhoven University of Technology (1980).

[7] L.E. Scales, Introduction to Non-linear Optimization (MacMillan, London, 1985).

[8] M.E. Solari, The "maximum likelihood solution" of the problem of estimating a linear functional relationship,J. R. Statist. Soc. B 31 (2) (1969) 372-375.

[9] R.M. Staley and P.e. Yue, On system parameter identifiability, Inform. Sciences 2 (1) (1970) 127-138.

[10] J.M. ten Vregelaar, An algorithm for computing estimates for parameters of an ARMA-model from noisy measurements of inputs and outputs, CaSaR-memorandum 87-13, Eindhoven University of Technology (1987).

[11] P. Whittle, Bounds for the moments of linear and quadratic forms in independent vari-ables, Theory Prob. Applications 5 (1960) 302-305.

[12] L.A. zadeh and C.A. Desoer, Linear Systems Theory, the State-Space Approach (McGraw-Hill, New York, 1963).