
PARAMETER ESTIMATION IN A GENERAL STATE SPACE MODEL FROM SHORT OBSERVATION DATA: A SMC BASED APPROACH

S. Saha, P. K. Mandal, A. Bagchi

Department of Applied Mathematics

University of Twente

The Netherlands

Y. Boers, H. Driessen

THALES Nederland BV

Haaksbergerstraat 49, 7554 PA

Hengelo, The Netherlands

ABSTRACT

In this article, we propose a SMC based method for estimating the static parameter of a general state space model. The proposed method is based on maximizing the joint likelihood of the observation and unknown state sequence with respect to both the unknown parameters and the unknown state sequence. This, in turn, casts the problem into simultaneous estimation of state and parameter. We show the efficacy of this method by numerical simulation results.

Index Terms— parameter estimation, sequential Monte Carlo, particle filter

1. INTRODUCTION

In recent times, starting with Gordon's seminal paper [1], the particle based sequential Monte Carlo (SMC) method has been getting increasing attention for solving nonlinear and/or non-Gaussian estimation problems. In this method, the posterior is approximated by a cloud of N weighted particles, whose empirical measure closely approximates the true posterior for large N [2]. Standard SMC algorithms assume perfect knowledge of the system parameters whereas, in many practical situations, the model parameters of the dynamic system are not known a priori and their estimation is also of interest. However, it is well known that, despite their generality, standard SMC based techniques have limitations in estimating unknown fixed parameters. Generic solutions using this method for parameter estimation, applicable to any model, are still limited in performance, thus opening up the possibility of further research in this direction.
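To make the mechanism concrete, the following is a minimal Python sketch of a generic bootstrap particle filter in the style of [1]: the posterior is represented by a cloud of N weighted particles that are propagated through the state transition and reweighted by the observation likelihood. The callables sample_x0, transition_sample and obs_loglik are hypothetical placeholders for the model densities; this is an illustration, not the exact implementation used in the experiments reported later.

```python
import numpy as np

def bootstrap_particle_filter(y, n_particles, sample_x0, transition_sample, obs_loglik, rng=None):
    """Minimal bootstrap particle filter (SMC) with multinomial resampling.

    sample_x0(n, rng)            -> array of initial particles, shape (n,)
    transition_sample(x, k, rng) -> particles propagated through the state transition
    obs_loglik(y_k, x)           -> log p(y_k | x) evaluated per particle
    Returns the particle clouds for t = 0..T and the normalized weights for t = 1..T.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = sample_x0(n_particles, rng)
    clouds, weights = [x.copy()], []
    for k, y_k in enumerate(y, start=1):
        x = transition_sample(x, k, rng)       # proposal = state transition density
        logw = obs_loglik(y_k, x)              # weight by the observation likelihood
        w = np.exp(logw - np.max(logw))
        w /= np.sum(w)
        clouds.append(x.copy())
        weights.append(w.copy())
        idx = rng.choice(n_particles, size=n_particles, p=w)   # multinomial resampling
        x = x[idx]
    return clouds, weights
```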

Among the existing approaches, the classical remedy is to augment the parameter as an additional state with artificial dynamics and then take the filtered estimate of the additional state vector as the estimate of the parameters. The artificial evolution, however, in effect renders the fixed parameter into a slowly varying one [3]. As a result, the variance of the estimate of the parameter increases with time. Since in this framework the initial augmented state vector is free of any

This work is supported by a research grant from THALES Nederland B.V.

artificial dynamics, the authors in [4] consider the marginal smoother of the initial augmented state vector, and the corresponding estimate of the additional states is taken as the estimated parameters. However, the computational burden with the growing memory requirement is a stumbling block here. Another proposed scheme is to marginalize the static parameters out of the posterior either analytically [5] or by Monte Carlo procedures [6]. However, such methods are strictly model dependent. The authors in [7] also proposed an online estimation method for static parameters under the assumption that the state space models are stationary and ergodic. Among others, the particle based maximum likelihood (ML) estimator has also been developed recently. In this framework, when the number of parameters is small, one can consider the direct particle approximation of the log-likelihood evaluated on a grid of parameter values [8]. However, when the dimension of the parameter vector is large, optimizing the log-likelihood through a grid based approximation becomes unwieldy and calls for a more structured and efficient optimization strategy, such as gradient based optimization or Expectation-Maximization (EM) (see [9] and references therein). Though the estimate based on ML is asymptotically optimal, the particle based implementation is rather complicated and the convergence is known to require a substantial amount of data [10].

In this article, we consider a different approach to dealing with the non-dynamic nature of the unknown parameters, using limited observation data. Here we cast the problem in a joint state estimation and model parameter identification framework. In our approach, rather than maximizing the likelihood of the observed data with respect to the parameters (as done in ML estimation), we maximize the joint likelihood of the observation and unobserved state sequence with respect to both the unknown parameters and the unobserved state sequence. This criterion was first considered by [11] for the linear-Gaussian case. See also [12], [13] for similar approaches. However, the optimization steps for estimating the joint state sequence for a general nonlinear and/or non-Gaussian model are not trivial and, as a result, a similar study involving a general state space model is missing in the literature.


For the special cases where closed form solutions for the optimal state sequence can be obtained (for example, when both the state and observation sequences are jointly Gaussian), it is known that the estimate is biased, but for short observation data this method outperforms the ML estimate in terms of mean-squared error (MSE) [14]. MSE is a direct measure of estimation error which takes both bias and variance into account and, in many cases, a biased estimate may result in an MSE smaller than the Cramer-Rao lower bound (CRLB), which characterizes the smallest achievable variance of any unbiased estimator. In fact, biased estimation methods are used extensively in different signal processing applications, especially with short/limited observation data and/or low signal to noise ratios (SNRs) [15]. Moreover, an unbiased estimator may not even exist in many cases, or the unbiasedness requirement can lead to nonsensical results. Thus, even though ML is an asymptotically efficient estimator (unbiased with minimum variance), for short data records, where ML is indeed incapable of achieving its asymptotic optimality, an estimator based on minimum MSE may be preferable. To implement our proposed method for a general state space model, the crucial step is, as mentioned before, the estimation of the optimal joint state sequence, which is in general analytically intractable. We approximate this optimal state sequence using a SMC based method developed in the recent past by [16]. Thus, our contribution extends the existing results to a general state space problem. We describe the formulation of our approach in the next section.

2. PROBLEM FORMULATION

Consider the discrete time state space model

$$x_t \sim p(x_t \mid x_{t-1}; \theta) \qquad (1)$$

$$y_t \sim p(y_t \mid x_t; \theta) \qquad (2)$$

where, at time $t$, $y_t$ contains the observations and $x_t$ is the unobserved state. $\theta \in \mathbb{R}^m$ is an $m$-dimensional unknown static parameter vector and $p(\cdot \mid \cdot)$ is a generic conditional probability density function (pdf). Now, given a relatively small set of observations $y_{1:T}$, our main objective here is to extract the unknown $\theta$. One conventional way to achieve this is the classical ML criterion, where one maximizes $p(y_{1:T}; \theta)$ (also known as the likelihood) with respect to $\theta$ to obtain $\theta_{ML}$, as

$$\arg\max_{\theta} p(y_{1:T}; \theta) = \arg\max_{\theta} \int p(x_{0:T}, y_{1:T}; \theta)\, dx_{0:T}. \qquad (3)$$

In practice, however, the above marginalization step is, in general, analytically intractable. On the other hand, the joint likelihood of the observations and the unobserved state sequence (also known as the complete likelihood) is relatively easy to construct due to the Markovian nature of the model considered:

$$p(x_{0:T}, y_{1:T}; \theta) = p(x_0; \theta) \prod_{t=1}^{T} p(x_t \mid x_{t-1}; \theta)\, p(y_t \mid x_t; \theta). \qquad (4)$$

Consequently, the complete log-likelihood is then given by

$$\log p(x_{0:T}, y_{1:T}; \theta) = \log p(x_0; \theta) + \sum_{t=1}^{T} \log p(x_t \mid x_{t-1}; \theta) + \sum_{t=1}^{T} \log p(y_t \mid x_t; \theta). \qquad (5)$$
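As an illustration of how (5) is used later, the short Python sketch below evaluates the complete log-likelihood for a candidate $\theta$ along a given state trajectory; log_prior, log_trans and log_obs are hypothetical placeholders for the log-densities of the model (1)-(2).

```python
def complete_loglik(theta, x, y, log_prior, log_trans, log_obs):
    """Complete log-likelihood (5) for a candidate theta.

    x : state trajectory x_0, ..., x_T  (length T + 1)
    y : observations     y_1, ..., y_T  (length T)
    log_prior(x0, theta), log_trans(x_t, x_prev, theta) and log_obs(y_t, x_t, theta)
    are placeholders for the log-densities of the model (1)-(2).
    """
    ll = log_prior(x[0], theta)
    for t in range(1, len(x)):
        ll += log_trans(x[t], x[t - 1], theta)   # transition term of (5)
        ll += log_obs(y[t - 1], x[t], theta)     # observation term of (5)
    return ll
```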

In this article, we maximize the complete likelihood with respect to both the unknown parameter ($\theta$) and the unobserved state sequence ($x_{0:T}$), rather than maximizing the likelihood of the observed data with respect to the parameters. This leads to dividing the problem into two interconnected sub problems: sub problem A (state estimation) and sub problem B (parameter estimation). The details are described below.

2.1. Sub problem A

Estimation of the (smoothed) state assuming the parameter is known: first assume that $\theta = \theta^{old}$. The state estimation problem is then finding

$$\hat{x}_{0:T} = \arg\max_{x_{0:T}} p(x_{0:T}, y_{1:T}; \theta^{old}). \qquad (6)$$

Since

$$p(x_{0:T} \mid y_{1:T}; \theta) = \frac{p(x_{0:T}, y_{1:T}; \theta)}{p(y_{1:T}; \theta)} \qquad (7)$$

and the denominator in equation (7) is independent of $x_{0:T}$, the state estimation problem in (6) can be cast as estimating the usual maximum a posteriori (MAP) sequence of $x_{0:T}$ conditioned on the observed $y_{1:T}$ and the (assumed) $\theta^{old}$. For a general nonlinear and/or non-Gaussian state space model, this problem is nontrivial. However, one can approximate this MAP sequence using the particle based method developed in [16]. The memory requirement of this MAP sequence algorithm is $O(NT)$ and the computational complexity is $O(N^2 T)$. However, since we are dealing with a short data regime, memory requirement or complexity is not a serious issue here.
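A minimal sketch of such a particle based MAP sequence search is given below: a Viterbi-style dynamic program over the particle supports produced by a filter such as the bootstrap filter sketched in the introduction, which also accounts for the $O(N^2 T)$ cost mentioned above. The density callables are the same hypothetical placeholders as before; this is a sketch in the spirit of [16], not a definitive reproduction of that algorithm.

```python
import numpy as np

def particle_map_sequence(clouds, y, theta, log_prior, log_trans, log_obs):
    """Viterbi-style search for the MAP state sequence over particle supports,
    in the spirit of [16].

    clouds : list of particle arrays for t = 0, ..., T, each of shape (N,)
    y      : observations y_1, ..., y_T
    The density callables are assumed to broadcast over particle arrays.
    Memory is O(N T); each step costs O(N^2), i.e. O(N^2 T) overall.
    """
    T, N = len(y), len(clouds[0])
    delta = log_prior(clouds[0], theta)            # best log-joint ending at each particle, t = 0
    backptr = np.zeros((T, N), dtype=int)
    for t in range(1, T + 1):
        # trans[i, j] = log p(x_t = clouds[t][j] | x_{t-1} = clouds[t-1][i]; theta)
        trans = log_trans(clouds[t][None, :], clouds[t - 1][:, None], theta)
        scores = delta[:, None] + trans
        backptr[t - 1] = np.argmax(scores, axis=0)
        delta = scores[backptr[t - 1], np.arange(N)] + log_obs(y[t - 1], clouds[t], theta)
    path = [int(np.argmax(delta))]                 # backtrack the arg-max trajectory
    for t in range(T - 1, -1, -1):
        path.append(int(backptr[t][path[-1]]))
    path.reverse()
    return np.array([clouds[t][path[t]] for t in range(T + 1)])
```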

2.2. Sub problem B

Estimation of the parameter assuming the state is known: now, given all the observation data $y_{1:T}$ and the estimate of the state, $\hat{x}_{0:T} \equiv \hat{x}_{0:T}(y_{1:T}, \theta^{old})$, from sub problem A, one can obtain a new estimate of $\theta$ as

$$\theta^{new} = \hat{\theta} = \arg\max_{\theta} p(\hat{x}_{0:T}, y_{1:T}; \theta). \qquad (8)$$

This maximization problem can be translated into finding the zeros of the gradient of the complete log-likelihood. Define $L(\theta) \triangleq \log p(\hat{x}_{0:T}, y_{1:T}; \theta)$. Then $\hat{\theta}$ is a solution to

$$\nabla L(\theta) = 0 \qquad (9)$$

where $\nabla$ is the gradient vector (w.r.t. $\theta$). From (5) we get

$$\nabla L(\theta) = \left. \frac{\nabla p(x_0; \theta)}{p(x_0; \theta)} \right|_{\hat{x}_{0:T},\, y_{1:T}} + \sum_{t=1}^{T} \left. \frac{\nabla p(x_t \mid x_{t-1}; \theta)}{p(x_t \mid x_{t-1}; \theta)} \right|_{\hat{x}_{0:T},\, y_{1:T}} + \sum_{t=1}^{T} \left. \frac{\nabla p(y_t \mid x_t; \theta)}{p(y_t \mid x_t; \theta)} \right|_{\hat{x}_{0:T},\, y_{1:T}}. \qquad (10)$$


Equations (9) and (10) lead to a system of nonlinear equations in $\theta$, and $\theta^{new}$ can be obtained by solving them.
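For models where (9)-(10) do not reduce to a tractable system, one option is to hand the problem to a general purpose optimizer. The sketch below reuses the complete_loglik helper sketched after (5), with the same placeholder densities, and maximizes the complete log-likelihood (8) numerically instead of explicitly solving the gradient equations; this is an illustrative substitute, not the route taken in the example of Section 3.

```python
import numpy as np
from scipy.optimize import minimize

def update_theta(theta_old, x_hat, y, log_prior, log_trans, log_obs):
    """Sub problem B: theta_new = arg max_theta p(x_hat, y_{1:T}; theta), cf. (8),
    obtained here by direct numerical maximization of the complete log-likelihood."""
    objective = lambda th: -complete_loglik(th, x_hat, y, log_prior, log_trans, log_obs)
    result = minimize(objective, x0=np.asarray(theta_old, dtype=float), method="Nelder-Mead")
    return result.x
```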

Thus, starting with an initial value $\theta^{(0)}$, the final solution is obtained by iterating between these two sub problems until a pre-specified stopping criterion is reached. The stopping criterion can be reached, for example, when either of the following is satisfied:

C1: for a pre-determined $\Delta$, when $\|\theta^{(k)} - \theta^{(k-1)}\| / \|\theta^{(k-1)}\| \leq \Delta$;

C2: when the iteration stage $k$ reaches the (pre-defined) maximum allowable stage ($K_{Max}$). This signifies that the iteration does not converge and one has to start afresh with new initial values of the parameters.

This is essentially a batch method of estimating the parameters. However, estimates obtained through our method using limited observations can be used, for example, as initial values of the parameters in any online parameter estimation method. This (batch) method provides a means to obtain parameter estimates without adding any artificial process noise. Therefore, it can be expected that our method will provide more accurate parameter estimates than the augmented particle filter. A detailed comparison between these two methods (in terms of accuracy as well as computational load) will be the subject of our future work. A sketch of the overall iteration is given below; in the following section, we illustrate our method by means of examples.
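The sketch below puts the pieces together: a batch iteration alternating between sub problems A and B with the stopping criteria C1 and C2. The callables estimate_map_states and update_theta stand for any concrete implementations of the two sub problems (for instance, the sketches given earlier); they are placeholders, not fixed interfaces.

```python
import numpy as np

def estimate_parameters(y, theta_init, estimate_map_states, update_theta,
                        delta=0.005, k_max=40):
    """Batch iteration between sub problem A (state estimation) and
    sub problem B (parameter estimation) with stopping criteria C1 / C2.

    estimate_map_states(y, theta) -> approximate MAP state sequence x_hat_{0:T}
    update_theta(theta, x_hat, y) -> new parameter estimate theta_new
    Returns (theta, number of iterations used, converged flag)."""
    theta = np.asarray(theta_init, dtype=float)
    for k in range(1, k_max + 1):
        x_hat = estimate_map_states(y, theta)                               # sub problem A
        theta_new = np.asarray(update_theta(theta, x_hat, y), dtype=float)  # sub problem B
        # C1: relative change of the parameter estimate below Delta
        if np.linalg.norm(theta_new - theta) <= delta * np.linalg.norm(theta):
            return theta_new, k, True
        theta = theta_new
    # C2: K_Max reached without convergence; restart with new initial values
    return theta, k_max, False
```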

3. SIMULATION RESULTS

We demonstrate the proposed method on the benchmark time series model [1] given by

$$x_k = f(x_{k-1}; \theta_1, \theta_2) + w_k, \qquad (11)$$

$$y_k = \frac{x_k^2}{20} + v_k, \qquad (12)$$

where $f(x_{k-1}; \theta_1, \theta_2) = \theta_1 x_{k-1} + \frac{\theta_2 x_{k-1}}{1 + x_{k-1}^2} + 8 \cos(1.2k)$ and $\theta_1$ and $\theta_2$ are unknown parameters. We assume $p(x_0) \sim \mathcal{N}(0, 5)$, $w_k \sim \mathcal{N}(0, 10)$ and $v_k \sim \mathcal{N}(0, 1)$, which are mutually independent. Note that here $\frac{\partial f(\cdot)}{\partial \theta_1} = x_{k-1}$ and $\frac{\partial f(\cdot)}{\partial \theta_2} = \frac{x_{k-1}}{1 + x_{k-1}^2}$. Subsequently, use of (10) reduces (9) to

$$\sum_{k=1}^{T} \left. \{x_k - f(\cdot)\} \frac{\partial f(\cdot)}{\partial \theta_1} \right|_{\hat{x}_{0:T}} = 0, \qquad \sum_{k=1}^{T} \left. \{x_k - f(\cdot)\} \frac{\partial f(\cdot)}{\partial \theta_2} \right|_{\hat{x}_{0:T}} = 0.$$
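Because $f$ is linear in $\theta_1$ and $\theta_2$, these two equations form an ordinary least-squares problem in the parameters once the state sequence has been estimated. A possible implementation of this parameter update (sub problem B for the benchmark model) is sketched below; it is an illustration of the reduction above, with the remaining experimental details as described in the text.

```python
import numpy as np

def update_theta_benchmark(x_hat):
    """Solve the two equations above for (theta_1, theta_2), given the estimated
    state sequence x_hat = (x_0, ..., x_T) from sub problem A. Since f is linear
    in theta, this is an ordinary least-squares problem."""
    x_prev, x_curr = x_hat[:-1], x_hat[1:]
    k = np.arange(1, len(x_hat))                                   # k = 1, ..., T
    phi = np.column_stack([x_prev, x_prev / (1.0 + x_prev ** 2)])  # columns: df/dtheta_1, df/dtheta_2
    r = x_curr - 8.0 * np.cos(1.2 * k)                             # part of x_k not depending on theta
    theta, *_ = np.linalg.lstsq(phi, r, rcond=None)
    return theta                                                   # array [theta_1, theta_2]
```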

For the simulations, the true values are taken to be $\theta_1^* = 0.5$ and $\theta_2^* = 25$, respectively. We use $T = 50$, $N = 500$ particles and the so-called "Exact Moment Matching (EMM) proposal" [17]. We start with $\theta_1^{(0)} = -0.3$ and $\theta_2^{(0)} = 10$. Here we perform 40 iterations regardless of the stopping criteria, and the estimated parameters at different iteration stages are shown in Figure 1 and Figure 2. If working with $\Delta = 0.005$, we would have stopped at iteration 7. We should note that both estimates show a distinct bias, even with an increasing number of iterations. This has already been observed in the linear-Gaussian case [14]. We further compare our approach with the augmented state method (ASM), where the variance of the artificial noise (AN) is reduced over time as in [18]. We assume $\theta_2$ fixed at its true value and estimate $\theta_1$. Our estimate of $\theta_1$ is similar to the previous example and we do not include it here. For ASM, the selection of the initial variance of the AN is not obvious and the convergence of the additional state (parameter) depends heavily on this selection. Our method does not have this difficulty. For the subsequent implementation of ASM, we take $p(\theta_1^{(0)}) \sim \mathrm{uniform}(-0.5, 1.5)$, initial variance of AN $= 1$, $T = 1000$, $N = 2000$ and the state transition density as proposal. As seen from Figure 3, the estimate of $\theta_1$ using ASM keeps oscillating even after a large number of time steps. On the other hand, our method converges quite rapidly, albeit with a distinct bias. If this bias can somehow be theoretically ascertained, then our method will decidedly give better results.
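For reference, a minimal sketch of the ASM comparison is given below: a bootstrap particle filter on the state augmented with $\theta_1$ (with $\theta_2$ held at its true value 25), where the artificial noise variance is reduced over time. The geometric decay of the AN variance is an illustrative assumption, not necessarily the schedule of [18].

```python
import numpy as np

def augmented_state_filter(y, n_particles=2000, an_var0=1.0, decay=0.99, rng=None):
    """ASM sketch for the benchmark model (11)-(12) with theta_2 fixed at 25:
    theta_1 is appended to the state and perturbed by artificial noise whose
    variance shrinks over time (the decay rate here is an illustrative choice)."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.normal(0.0, np.sqrt(5.0), n_particles)            # p(x_0) ~ N(0, 5)
    theta1 = rng.uniform(-0.5, 1.5, n_particles)              # p(theta_1^(0)) ~ uniform(-0.5, 1.5)
    an_var = an_var0
    estimates = []
    for k, y_k in enumerate(y, start=1):
        theta1 = theta1 + rng.normal(0.0, np.sqrt(an_var), n_particles)   # artificial dynamics
        an_var *= decay                                                   # reduce AN variance over time
        x = (theta1 * x + 25.0 * x / (1.0 + x ** 2)
             + 8.0 * np.cos(1.2 * k) + rng.normal(0.0, np.sqrt(10.0), n_particles))
        logw = -0.5 * (y_k - x ** 2 / 20.0) ** 2              # log N(y_k; x_k^2/20, 1), up to a constant
        w = np.exp(logw - np.max(logw))
        w /= np.sum(w)
        estimates.append(np.sum(w * theta1))                  # filtered estimate of theta_1
        idx = rng.choice(n_particles, size=n_particles, p=w)
        x, theta1 = x[idx], theta1[idx]
    return np.array(estimates)
```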

Fig. 1. Estimate of $\theta_1$ w.r.t. iteration stage (true value and estimate versus the number of iterations).

Fig. 2. Estimate of $\theta_2$ w.r.t. iteration stage (true value and estimate versus the number of iterations).


4. CONCLUSIONS

In this paper, we present a SMC based method for estimating the static parameter of a general state space model. The method provides simultaneous estimates of the unknown states and parameters in a batch fashion. We illustrate the performance of this method through numerical simulations. Comparisons with other methods as well as a rigorous assessment of convergence issues are topics for further research.

Fig. 3. Estimate of $\theta_1$ using the augmented state method (true value and estimate versus time step).

5. REFERENCES

[1] N. J. Gordon, D. J. Salmond, and A. F. M. Smith, "Novel approach to nonlinear/non-Gaussian Bayesian state estimation," IEE Proceedings F - Radar and Signal Processing, vol. 140, no. 2, pp. 107–113, April 1993.

[2] S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, "A tutorial on particle filters for on-line non-linear/non-Gaussian Bayesian tracking," IEEE Transactions on Signal Processing, vol. 50, no. 2, pp. 174–188, 2002.

[3] J. Liu and M. West, "Combined parameter and state estimation in simulation-based filtering," in Sequential Monte Carlo Methods in Practice (A. Doucet, N. de Freitas, and N. Gordon, Eds.), Springer, 2001.

[4] S. Saha, P. K. Mandal, and A. Bagchi, "A new approach to particle based smoothed marginal MAP," in Proceedings of EUSIPCO 2008, 2008.

[5] P. M. Djuric and J. Miguez, "Sequential particle filtering in the presence of additive Gaussian noise with unknown parameters," in Proceedings of IEEE ICASSP, 2002, vol. II, pp. 803–806.

[6] G. Storvik, "Particle filters for state-space models with the presence of unknown static parameters," IEEE Transactions on Signal Processing, vol. 50, no. 2, pp. 281–289, February 2002.

[7] C. Andrieu, A. Doucet, and V. B. Tadic, "Online simulation-based methods for parameter estimation in nonlinear non Gaussian state space models," in Proceedings of the IEEE Conference on Decision and Control, 2005.

[8] J. Olsson and T. Ryden, "Asymptotic properties of particle filter-based maximum likelihood estimators for state space models," Stochastic Processes and their Applications, vol. 118, no. 4, pp. 649–680, April 2008.

[9] O. Cappe, S. J. Godsill, and E. Moulines, "An overview of existing methods and recent advances in sequential Monte Carlo," Proceedings of the IEEE, vol. 95, no. 5, pp. 899–924, 2007.

[10] A. Doucet and V. B. Tadic, "Parameter estimation in general state-space models using particle methods," Annals of the Institute of Statistical Mathematics, vol. 55, no. 2, pp. 409–422, June 2003.

[11] Y. Bar-Shalom, "Optimal simultaneous state estimation and parameter identification in linear discrete-time systems," IEEE Transactions on Automatic Control, vol. 17, no. 3, pp. 308–319, June 1972.

[12] J. S. Lim and A. V. Oppenheim, "All-pole modeling of degraded speech," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 26, pp. 197–210, May 1978.

[13] A. Bagchi and P. ten Brummelhuis, "Parameter identification in tidal models with uncertain boundaries," Automatica, vol. 30, no. 5, pp. 745–759, May 1994.

[14] A. Yeredor, "The joint MAP-ML criterion and its relation to ML and to Extended Least-Squares," IEEE Transactions on Signal Processing, vol. 48, no. 12, pp. 3484–3492, December 2000.

[15] S. Kay and Y. C. Eldar, "Rethinking biased estimation," IEEE Signal Processing Magazine, vol. 25, no. 3, pp. 133–136, May 2008.

[16] S. Godsill, A. Doucet, and M. West, "Maximum a posteriori sequence estimation using Monte Carlo particle filters," Annals of the Institute of Statistical Mathematics, vol. 53, pp. 82–96, 2001.

[17] S. Saha, P. K. Mandal, Y. Boers, H. Driessen, and A. Bagchi, "Gaussian proposal density using moment matching in SMC methods," Statistics and Computing, vol. 19, no. 2, pp. 203–208, June 2009.

[18] F. Gustafsson and P. Hriljac, "Particle filters for system identification with application to chaos prediction," in Proceedings of SYSID 2003, Rotterdam, The Netherlands, 2003.
