On estimating the parameters of a dynamic model from noisy
input and output measurements
Citation for published version (APA):
Vregelaar, ten, J. M. (1988). On estimating the parameters of a dynamic model from noisy input and output measurements. (Memorandum COSOR; Vol. 8802). Technische Universiteit Eindhoven.
Document status and date: Published: 01/01/1988
Document Version:
Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)
Please check the document version of this publication:
• A submitted manuscript is the version of the article upon submission and before peer-review. There can be important differences between the submitted version and the official published version of record. People interested in the research are advised to contact the author for the final version of the publication, or visit the DOI to the publisher's website.
• The final author version and the galley proof are versions of the publication after peer review.
• The final published version features the final layout of the paper including the volume, issue and page numbers.
Link to publication
General rights
Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. • Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain
• You may freely distribute the URL identifying the publication in the public portal.
If the publication is distributed under the terms of Article 25fa of the Dutch Copyright Act, indicated by the “Taverne” license above, please follow below link for the End User Agreement:
www.tue.nl/taverne
Take down policy
If you believe that this document breaches copyright please contact us at:
openaccess@tue.nl
providing details and we will investigate your claim.
Department of Mathematics and Computing Science
Memorandum COSOR 88-02
On estimating the parameters of a dynamic model from noisy input and output measurements
by
J.M. ten Vregelaar
Eindhoven University of Technology
Department of Mathematics and Computing Science P.O. Box 513
5600 MB Eindhoven The Netherlands
Eindhoven, January 1988 The Netherlands
from noisy input and output measurements
Jan ten Vregelaar
Department of Mathematics and Computing Science Eindhoven University of Technology
P.O. Box 513 5600 MB Eindhoven
The Netherlands
ABSTRACf
In this paper an algorithm is provided to compute least squares estimates for the parameters of a dynamic model from noisy measurements of inputs and outputs. Furthermore, we prove the consistency property for the corresponding estimators under some assumptions.
Keywords: Parameter estimation, Dynamic model, Noisy measurements, Least squares esti-mation, Consistency.
In [5], Eising et al discuss an identification method for a prototype situation of an ARMA-mode1 with noisy measurements on inputs and outputs. In order to obtain estimates for the unknown parameters, they propose to use only object function evaluations which are calcu-lated by inverting some matrix with a lot of "structure". Furthennore it is stated in [5] that the corresponding estimators should be consistent. To our knowledge no proof of consistency has appeared up to now.
For a much more general situation of the model (MIMO system, possible different autoregressive and moving average orders, not assuming zero initial conditions) we present in this paper a new algorithm to calculate both object function and its first derivative evaluations. Furthennore, estimators as minimizing solutions are proved to be consistent under some mild assumptions.
We consider the ARMA model
p q
1\, =
LCX;1\/-i
+L
~j~'-j , t =m + I,m + 2, . .. ,i=l j=()
(1.1)
with known orders p and q and m := max(p,q). The inputs ~, and outputs 1\" which may be vectors say of length r and s respectively (MIMO system when r, s > 1), are supposed to be measured with noise
Y, =1\, +£, ,
t = 1,2, ... , m + N . x, = ~, +S, ,
The problem is to estimate the unknown parameter matrix (with sizes x [ps +(q +l)r])
e
= [ala2'"ap
13o~1··· ~q]from the set of data {(Xl,Yl),(X2'Y~" . . ,(Xm+N,Ym+N))'
A least squares estimate for
e
is obtained as solution of the minimization problemm+N
min
L
(IIY, - 1\,f+IIx, - ~,1I2],=1
withrespect to
e,
~h . . . ,~m+N'1\h . . . ,1\m+N ,subject to the model equation (1.1)for t =m +1, ... ,m +N.
(1.2)
(1.3)
(1.4)
The nonn II·nis the Euclidean vector nonn. Inorder to use a compact notation we rewrite (1.1) for t = m +1, ... ,m +N as
D(e) ~=0 ,
where(I denoting the identity matrix)
I
-I~l ~
'.' (X"'. , D= -I (Xl • • . (X'" ~O' •• ~'" ~O' • , ~'" (1.5b)(has size sN x (s +r) (m +N), consists of two block-Toeplitz submatrices; empty space represents zero elements whereas we define (Xl :=0 for k >p, ~l :=0 fork >q), and
fl",+N
~l
Letz and e denote the corresponding vectors of observations and noises such that
z='+e,
is a shorthand notation for (1.2).
Then the minimization problem(1.4) can be written as
min liz -
CII
2 subject toDC
= 0 . 9.~For the stationary point
(t,
A) of the LagrangianL
(C,
A)=t
(z -Cf
(z -0
+ATD , we obtain ~+DT).=Z D ~=O • henceA
=
(D+fZ ~= (I - D+D)z •withD+
=
DT(DDTrl being the Moore-Penrose inverse ofD sinceD is offull row rank.Therefore (1.7) reduces to (1.5c) (1.6) (1.7) (1.8) (1,9)
As pointed out by Aoki et al [1] for the SISO model, solutions of (1.9) correspond to maximum likelihood estimates of
e
when we assume Gaussian white noise. Algorithms and convergence results are givenin [1] for approximate maximum likelihood methods, whereas in a companion paper [2] they prove convergence results for the true maximum likelihood estima-tors in the special case of no input noise.2. Algorithm
Since (1.9) has in general no closed-form solution, we need an iterative algorithm to cal-culate estimates. We propose the Broyden-Fletcher-Goldfarb-Shanno formula (cf. [7], pp. 89-90) which has good numerical properties. It uses object function and gradient evaluations. LetIN denote the object function,
then for any component of its gradientIN ',
jN = 2zTD+iJpi z
(2.1)
(2.2)
holds, since
P
=D+iJpi +piiJT(D+l(. representing
a:.
for any elementSj from S).I
Here, both P and pi :=I - P are orthogonal projection matrices. To evaluate IN and IN ' we
prefer performing a Q - R decomposition ofDT rather than invertingDDT. The matrixDT has full columnrank, so there exist orthogonalQ and regularR (sN xsN) such that
QTDT=
[~]
. (2.3)Because of the special form of DT, R will be lower triangular. IfQ1 is the submatrix of Q
consisting of its firstsN columns, then
holds.
Due to the orthogonality ofQ , p = QIQ?
hence
Furthermore, when A.:= (D~Tz (cf. (1.8» then
DTA.= pz and (2.2) implies
jN = 2'AT
iJ
(z - DTA.)Premultiplying (2.7) by QIT gives via (2.4) and (2.5)
R'A=QITZ .
Summarizing, when matrixR and vectoru :=Q1TZ (lengthsN) are computed from (2.4) (2.5) (2.6) (2.7) (2.8) (2.9)
then, from (2.6), (2.8) and (2.9) we obtain
IN = lIull2
and
where A, is easily solved from RA,=u ,
sinceR is lower triangular.
(2.10)
(2.11a)
(2.11b)
(2.l1c)
The matrix Q in (2.10) is not computed explicitly: by means of Householder matrices DT is
transfonned into
[~].
Using the special structure this canbe done very efficiently. For details we refer to [10]. The computation of R takes 0 (N) operations, whereas matrix inversiono
(N2),cf. [5].3. Assumptions
From now on the vector of errors e in (1.6) is assumed to be random with zero mean, all its scalar components are stochastically independent and the covariance matrix of e is vare
=
rrI
where cr is unknown. The latter assumption is discussed later, see also [5].For convenience, the object function which is random now, is multiplied by the term
~:
(3.1) It willbe suitable to rewrite 9 as a vector of length Il:=ps2+(q +1)sr,(a.[).! (a.[).s (aD.! 9= (3.2) (a;).s (~J)*! (~~)*s
whereM.i denotes columnj of any matrixM.
We make 5 assumptions which turn out to be sufficient conditions for consistency of any
eN defined by
(3.3) Assumption 1
The set
e
is a known compact subset of RJI., which contains the unknown true parametervector 90•
According to Bierens [3], p. 53,
e
is compact and IN is continuous imply that eN defined by (3.3) is indeed a random vector.Inorder to state the next assumption we introduce the polynomial matrices
p .
A(A.)= -I +
L
ail..'i=l
ForArepresenting the shift-back operator, (1.1) canbewritten as
A(1..)'111 +B(1..)1;1 = 0, t = m +I,m +2, . .. .
(3.4)
Assumption 2
Forall
e
Ee,
A ().,) is stable, Le. the zeros of detA ().,) lie outside the closed unit disk.We associate with (3.4) matrices A and B (of size sN xsN and sN xrN respectively) defined by
(3.6) whereS is theN xN shiftmatrix
and 0 denotes the Kronecker product for matrices. From (1.5b) it follows that
D = [A CI I B C
il
with (3.7) (3.8) (sN x sm matrix) al ... 'am and (sN x rm matrix) . ~l . . . . ~mNow an important consequence of Assumption2is given by Lemma 1.
The matrix AAT has "uniform bounds": there exist some constants Pl and pz with
o
<PI <Pz<00 suchthatPl I ~ AAT~ pzI for all
e
Ee
andN ~ p + 1 (BY definition: MI~ Mz ifxTM1X ~ X TMzx for allx).Proof Appendix.
Corollary 1
For all
e
Ee
andN ~ P+ 1,PI1~ DDT~ P31 holds, with pz~ P3 <00.
Proof
The "lower bound" is obvious from DDT =AA T +BB T+CIC? +
czcl
and how to define P3is evident from the proof of Lemma 1.
0
Assumption 3
The sequence of true inputs (~dt:1 is bounded:
there exists a constantMI such thatlI~iII~ MI for i = 1,2, ...
As a consequence of the well-known BIBO-stability result Assumptions 2 and 3 imply:
Corollary 2
The sequence of true outputs (1'\i) i';,l is bounded.
Inorder to state the next assumption we rewrite the vectorD~in (1.5) as
[ 1'\m+N
DC
=
(H +K)e -
:
1'\",+1where
e
as defined in (3.2) andH and K are sN xf.1 matrices given by H = [(S 0 Is)1'\ ... (SP 0 [s)1'\ I ~ (S 0 Is)~ ... (sq 0 Is)~](3.9)
(3. lOa)
and
The model equation (1.1) holds for the nue value 00, so
DC
=
0 for 0=
00• Therefore (3.9) impliesDC
= (H +K) (0 - 00) . (3.11)Assumption 4
The matrtx. HT(DDTrlHsN converges as N ~ 00, urn°fi0rm1y on
e
. The urn°fiOImr '
lUllt, say G ,is positive definite one.
As will be seen in the next section this assumption implies a convergence result for IE INCO),
which is one of the tools for proving consistency.
Generalizing Aoki et al [:I], we can give an interpretation for the convergence of H
T
(D~
Ttl
H to a positive definite matrix in the SISO-case s=
r=
1. The above definedO,H,11 and ~ reduce to
o
= [(Xl •• (XpPo . . .
~qf ,
H =[S11 ... SP111 ~ S~ ... sq~] , 11= [11m+N 0 0 • 11m+lf and Defining V=DC-A11-B~ , (1.5) and (3.8) implyFor 0
=
00 (notation sup index 0) AIl.rl=
-Bo~ - VO holds. Then, using ASk=
SkA fork = 1,2,.o,m,
AOH=[-SBO~ " 0 -SPBo~ IAo~ ... SqAo~]+CJ)o
where CJ):=-[Sv ... SPv I 0 ... 0].
with
and
-1
E= '-I , (p +q +1)x (p +q + 1) ,
HT(DDTr1H
The effect ofroO in N vanishes, whence
lim HT(DD Tr1H = (E~T lim ::::.T (A~-T(DDTr1(A~-l::::, EO
N~oo N N-+oo N (3.12)
The matrices (A~-T(A~-l and (DDTrl canbe bounded in the sense of Lemma 1 and its Corol-lary. Therefore, provided the existence of the limits, the limitin the left hand side of (3.12) is positive definite if and only if
(i) (ii) -T-lim ..=...:=:. >0 N-+oo N EOisregular ,
Condition (i) could be a definition of persistency of excitation of order p + q for the input sequence (~L":l (see [1], p, 544), whereas the second condition is equivalent to the statement that the polynomialsA ().)and B ().)in (3.4) arecoprime (see [9], p. 133).
The last assumption requires the fourth moment of the noises tobe uniformly bounded.
Assumption5
Letej denote (scalar) componenti of the noise vectore, There exists
a
constantM2 such that4. Consistency
Consistency is obtained by using a result of Bierens [3], p. 54, 65: when the object func-tion converges in some sense unifOImly on a compact set to a continuous limit function which is uniquely minimal in the true value of the parameter vector then any minimizing solution converges in that sense to the true value.
In the sequel uniform convergence refers to convergence with respect to
e
on the compacte.
Let us start by proving two lemmas.Lemma 2
IE IN(e) H J(e) (N ~ 00),unifonnly ,
where
Proof
Observe that the object function defined by (3.1) has mean
IE I N
=
0'2+~ ~Tp~
= 0'2 + (e _
eol
(H +K)T (DDTrl(H +K) (e - eo)sN
(4.1)
by (3.11). The lemma follows from Assumption 4, sinceK has a finite number of nonzero
ele-~~
0
Lemma 3 var IN ~ 0 (N ~ 00), unifonnly . Proof From we obtain HenceJ < 1 vareTpe 4 var;Tpe 4
_I
Tp _/varYTpevar N - 2' 2 +""""'2 2 +2' .
'J
vare e'J
~s N s N S N2 N 2
by virtue of Cauchy-Schwarz.
vareTpe ~ k max lEej411PII2
j=I,2,..,(s+r)(m+N)
withk is some constant.
Obviously, var ~TPe = cr2
11P
l;1I2holds.The orthogonal projection matrixP obeys liP
f
= sN andliP01
2~1101
2 vareTpeAssumptions 5 and 3 and Corollary 2 imply now that N2 zero, uniformly, as N ~ 00,proving the lemma.
Remark. As one can see from the proof, Assumption 5 may be weakened to
max lEej4= 0(N) . j=1;2,.. .,(s+r )(m +N)
The functionJ defined by (4.1) is continuous with respect to 9 as it is a uniform limit of continuous functions. Now the convergence of the sequence of object functions JN to I will be
shown.
Proposition
P
IN HI, uniformly ,
(p meaning convergence in probability), Le.
JP(~C I IN(9) - 1(9) I>e) ~
°
asN ~ 00for any
e
>0. ProofSince IN and I are continuous on the compact
e,
there exists a sequence 9N with 9NEe,
suchthat
JP(~c I IN(9) - 1(9) I >e)= IP( I I N(9N) - 1(eN) I >e)
IE(IN (eN) - 1(eN)i varIN (eN) +(IE IN (eN) - l(eN))2
<
=-...;.--..,;...;...;.-,....;-;...,;;.;,...;...;.~-- e2 e2
for any
e
>0, applying Chebyshev's inequality. Lemmas 2 and 3 complete the proof.0
We are able to give the main result.
Theorem
Under Assumptions 1-5, any least squares estimator
eN
(defined by (3.3)) is (weakly) con-sistent for the true parameter vectoreo,
Le.Proof
According to Lemma 3.1.8 (p. 65) in [3] the result is immediate from the Proposition and two properties ofJ: it is continuous and has on 8 a unique minimum in
eo.
which is obvious from(4.1) and Assumption 4.
0
Remarks.
1. An estimate for the unknown variance
cr
is given by IN(9
N ): as a consequence of theA p 2
Theorem and the PropositionIN(eN) H J(eo)= cr holds.
2. To discuss the covariance matrix assumption vare =
cr
I, we consider the very special case of no dynamics (p=
q=
0) and SISO-model (s=
r=
1). The model and measure-ments read 111 = ~~I , YI=
111 +£1 • XI = ~I +~I , t = 1,2,· .. ,N t = 1,2, ...,N .MatricesD andP reduce to
D = [-I
1JI1,
r
=[:~2
[!IJI :,;]
whereas ,
=~]
and 9
=~.
When we assume now e.g. vare = diag(cr;l ,crlI) then the object function
1 1
F-TF-IN(~)
=
2IIY -
~x1I2 has mean IE IN(~)=
- - 2 [cr;+~2crl+ (~o - ~)2 .2...:i.] which(1+~)N 1+~ N
:=:w~e
: :
~
=':f;
r
+~2Gl
+~
-~f
G]
byAssumption 4, where
~o
denotes
For
~o
'#0,J'(~o)
=2:~
2 (crl- cr;)'#0 when crl)'#cre. Therefore J is not minimal in~o
(1+ 0)
and consistency is not obtained for any minimizing solution ofIN(~)'
Solari [8] has shown that the corresponding likelihood has no maximum, whence addi-tional information is needed (see also Linssen [6], p. 3-4).
S. Conclusion
Estimating the unknown parameters of a dynamic model with errors in all variables by means of the least squares method, gives an object function which contains some inverse matrix. We propose a Q - R decomposition to evaluate object function and gradient. Those are used in an iterative procedure in order to obtain estimates.
Furthermore, though in general the object function can not be written as a sum of independent random variables (cf. [2]), we are able to prove weak consistency under some mild conditions on input, system and noise.
In future we will report on results of simulations by applying a computer program based on the algorithm described in section 2 and on a proof for asymptotic normality. Generaliza-tions as contraints on the parameters and partial input noise will have our attention as well.
Appendix
Proof of Lemma 1
1. AAT ~ P21
For any sN vectorx, xTAAT X =IIATxll2holds. By definition ofA (see(3.6)) we have
p
AT X = -x +
L
[(Sly @ ak T]x , i=1hence
P
IIATxII~ (l +
L
Ilakl!) IIxllk=1 Therefore, define
2. AAT ~ PI I
It will be proved that(AATtl ~
.l...
I. PIGeneralizing AoId et al [2] (appendixB), we obtain
xT(AATtlx = IIA-Ix 112
with
N-I
A-I =
L
Sk @ gkk=O
and thes x s matricesgk satisfy
go=-I min(k.p) gk =
L
ai gA;-j, i=1 Consequently k = 1,2, ... ,N - 1 . N-IUA-Ixll~ (1 +
L
Ilgkl!) Ilxllk=l
Define fork
=
p,p + I, ... theps x s matrixObViously
(AI)
Gk+1
=
'I'Gk , k=
P,p +1, ...holds, where'I' is the ps xps matrix
_I~I.
<Xz •.. <Xp]'1'- .
' / 0
with characteristic polynomial
pvCA-) =PVTCA.)= ).,lS det(-A(A-I» for A::1=0
(withA(A) as defined in (3.4), cf. Chen [4], Section 2.3). Let r('I') denote the spectral radius of'1', Le.
r('I')= max {IAI; Aeigenvalue of'I'} , then Assumptions 1 and 2 imply
r :=max r('I') < 1 , ge8
since the nonzero eigenvalues of'I' coincide with the reciprocals of zeros of detA(A) and r('I') depends continuously on
e
on the compacte.
Define'(=
r;l,
thenr
<'« 1and Cauchy's formula (cf. zadeh et al [12], p. 606) giveswhereC is the circle {z E q:; IzI
='(}.
Therefore,
n~1I ~ M '(hI for allk
=
0,1, . .. and alle
Ee
(A3)whereM:= max
f
(z,e) withf
(z, e)= lI(z I - 'l'rlll is a continuous function on the compact (z.8)eCxeset C x 9.
By virtue of (AI), (A2) and (A3) we have now
00
IIA-Ixll~ (1 +
L
IlgklD Ilxlik=1
with
gives
References
[1] M. Aoki and P.e. Yue, On a priori error estimates of some identification methods, IEEE
Trans. Automatic ControlAC-15 (1970) 541-548.
[2] M. Aoki and P.C. Yue, On certain convergence questions in system identification, SIAM
J. Control 8 (2) (1970) 239-256.
[3] H.J. Bierens, Robust Methods and Asymptotic Theory in Nonlinear Econometrics, Lect. Notes Econom. Math. No. 192 (Springer, Berlin, 1981).
[4] H.P. Chen, Recursive Estimation and Control for Stochastic Systems (John Wiley, New York, 1985).
[5] F. Eising, H.N. Linssen and H. Rietbergen, System identification from noisy measure-ments of inputs and outputs, Systems & Control Letters 2 (1983) 348-353.
[6] H.N. Linssen, Functional Relationships and Minimum Sum Estimation, Ph.D. Thesis, Eindhoven University of Technology (1980).
[7] L.E. Scales, Introduction to Non-linear Optimization (MacMillan, London, 1985).
[8] M.E. Solari, The "maximum likelihood solution" of the problem of estimating a linear functional relationship,J. R. Statist. Soc. B 31 (2) (1969) 372-375.
[9] R.M. Staley and P.e. Yue, On system parameter identifiability, Inform. Sciences 2 (1) (1970) 127-138.
[10] J.M. ten Vregelaar, An algorithm for computing estimates for parameters of an ARMA-model from noisy measurements of inputs and outputs, CaSaR-memorandum 87-13, Eindhoven University of Technology (1987).
[11] P. Whittle, Bounds for the moments of linear and quadratic forms in independent vari-ables, Theory Prob. Applications 5 (1960) 302-305.
[12] L.A. zadeh and C.A. Desoer, Linear Systems Theory, the State-Space Approach (McGraw-Hill, New York, 1963).