
PARAMETER ESTIMATION IN A GENERAL STATE SPACE MODEL FROM SHORT OBSERVATION DATA: A SMC BASED APPROACH

S. Saha, P. K. Mandal, A. Bagchi

Department of Applied Mathematics

University of Twente

The Netherlands

Y. Boers, H. Driessen

THALES Nederland BV

Haaksbergerstraat 49, 7554 PA

Hengelo, The Netherlands

ABSTRACT

In this article, we propose a SMC based method for estimating the static parameter of a general state space model. The proposed method is based on maximizing the joint likelihood of the observation and unknown state sequence with respect to both the unknown parameters and the unknown state sequence. This, in turn, casts the problem into simultaneous estimation of state and parameter. We show the efficacy of this method by numerical simulation results.

Index Terms— parameter estimation, sequential Monte Carlo, particle filter

1. INTRODUCTION

In recent times, starting with Gordon's seminal paper [1], the particle based sequential Monte Carlo (SMC) method has been getting increasing attention for solving nonlinear and/or non-Gaussian estimation problems. In this method, the posterior is approximated by a cloud of N weighted particles, whose empirical measure closely approximates the true posterior for large N [2]. Standard SMC algorithms assume perfect knowledge of the system parameters whereas, in many practical situations, the model parameters of the dynamic system are not known a priori and their estimation is also of interest. However, it is well known that, despite their generality, standard SMC based techniques have limitations in estimating unknown fixed parameters. Generic solutions using this method for parameter estimation, applicable to any model, are still limited in performance, thus opening up the possibility of further research in this direction.
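To make the mechanism concrete, the following is a minimal Python sketch of a generic bootstrap particle filter in the style of [1]: the posterior is represented by a cloud of N weighted particles that are propagated through the state transition and reweighted by the observation likelihood. The callables sample_x0, transition_sample and obs_loglik are hypothetical placeholders for the model densities; this is an illustration, not the exact implementation used in the experiments reported later.

```python
import numpy as np

def bootstrap_particle_filter(y, n_particles, sample_x0, transition_sample, obs_loglik, rng=None):
    """Minimal bootstrap particle filter (SMC) with multinomial resampling.

    sample_x0(n, rng)            -> array of initial particles, shape (n,)
    transition_sample(x, k, rng) -> particles propagated through the state transition
    obs_loglik(y_k, x)           -> log p(y_k | x) evaluated per particle
    Returns the particle clouds for t = 0..T and the normalized weights for t = 1..T.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = sample_x0(n_particles, rng)
    clouds, weights = [x.copy()], []
    for k, y_k in enumerate(y, start=1):
        x = transition_sample(x, k, rng)       # proposal = state transition density
        logw = obs_loglik(y_k, x)              # weight by the observation likelihood
        w = np.exp(logw - np.max(logw))
        w /= np.sum(w)
        clouds.append(x.copy())
        weights.append(w.copy())
        idx = rng.choice(n_particles, size=n_particles, p=w)   # multinomial resampling
        x = x[idx]
    return clouds, weights
```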

Among the existing approaches, the classical remedy is to augment the parameter as an additional state with artificial dynamics and then take the filtered estimate of the additional state vector as the estimate of the parameters. The artificial evolution, however, in effect renders the fixed parameter into a slowly varying one [3]. As a result, the variance of the estimate of the parameter increases with time. Since in this framework the initial augmented state vector is free of any

This work is supported by a research grant from THALES Nederland B.V.

artificial dynamics, the authors in [4] consider the marginal smoother of the initial augmented state vector, and the corresponding estimate of the additional states is taken as the estimated parameters. However, the computational burden with the growing memory requirement is a stumbling block here. Another proposed scheme is to marginalize the static parameters out of the posterior either analytically [5] or by Monte Carlo procedures [6]. However, such methods are strictly model dependent. The authors in [7] also proposed an online estimation method for static parameters under the assumption that the state space models are stationary and ergodic. Among others, the particle based maximum likelihood (ML) estimator has also been developed recently. In this framework, when the number of parameters is small, one can consider the direct particle approximation of the log-likelihood evaluated on a grid of parameter values [8]. However, when the dimension of the parameter vector is large, optimizing the log-likelihood through a grid based approximation becomes unwieldy and calls for a more structured and efficient optimization strategy, such as gradient based optimization or Expectation-Maximization (EM) (see [9] and references therein). Though the estimate based on ML is asymptotically optimal, the particle based implementation is rather complicated and the convergence is known to require a substantial amount of data [10].

In this article, we consider a different approach to dealing with the non-dynamic nature of the unknown parameters, using limited observation data. Here we cast the problem in a joint state estimation and model parameter identification framework. In our approach, rather than maximizing the likelihood of the observed data with respect to the parameters (as done in ML estimation), we maximize the joint likelihood of the observation and unobserved state sequence with respect to both the unknown parameters and the unobserved state sequence. This criterion was first considered by [11] for the linear-Gaussian case. See also [12], [13] for similar approaches. However, the optimization steps for estimating the joint state sequence for a general nonlinear and/or non-Gaussian model are not trivial and, as a result, a similar study involving a general state space model is missing in the literature.


For the special cases where closed form solutions for the optimal state sequence can be obtained (for example, when both the state and observation sequences are jointly Gaussian), it is known that the estimate is biased, but for short observation data this method outperforms the ML estimate in terms of mean-squared error (MSE) [14]. MSE is a direct measure of estimation error which takes both bias and variance into account and, in many cases, a biased estimate may result in an MSE smaller than the Cramer-Rao lower bound (CRLB), which characterizes the smallest achievable variance of any unbiased estimator. In fact, biased estimation methods are used extensively in different signal processing applications, especially with short/limited observation data and/or low signal to noise ratios (SNRs) [15]. Moreover, an unbiased estimator may not even exist in many cases, or the unbiasedness requirement can lead to nonsensical results. Thus, even though ML is an asymptotically efficient estimator (unbiased with minimum variance), for short data records, where ML is indeed incapable of achieving its asymptotic optimality, an estimator based on minimum MSE may be preferable. To implement our proposed method for a general state space model, the crucial step is, as mentioned before, the estimation of the optimal joint state sequence, which is in general analytically intractable. We approximate this optimal state sequence using a SMC based method developed in the recent past by [16]. Thus, our contribution extends the existing results to a general state space problem. We describe the formulation of our approach in the next section.

2. PROBLEM FORMULATION

Consider the discrete time state space model

$$x_t \sim p(x_t \mid x_{t-1}; \theta) \qquad (1)$$

$$y_t \sim p(y_t \mid x_t; \theta) \qquad (2)$$

where, at time $t$, $y_t$ contains the observations and $x_t$ is the unobserved state. $\theta \in \mathbb{R}^m$ is an $m$-dimensional unknown static parameter vector and $p(\cdot \mid \cdot)$ is a generic conditional probability density function (pdf). Now, given a relatively small set of observations $y_{1:T}$, our main objective here is to extract the unknown $\theta$. One conventional way to achieve this is the classical ML criterion, where one maximizes $p(y_{1:T}; \theta)$ (also known as the likelihood) with respect to $\theta$ to obtain $\theta_{ML}$, as

$$\arg\max_{\theta} p(y_{1:T}; \theta) = \arg\max_{\theta} \int p(x_{0:T}, y_{1:T}; \theta)\, dx_{0:T}. \qquad (3)$$

In practice, however, the above marginalization step is, in general, analytically intractable. On the other hand, the joint likelihood of the observations and the unobserved state sequence (also known as the complete likelihood) is relatively easy to construct due to the Markovian nature of the model considered:

$$p(x_{0:T}, y_{1:T}; \theta) = p(x_0; \theta) \prod_{t=1}^{T} p(x_t \mid x_{t-1}; \theta)\, p(y_t \mid x_t; \theta). \qquad (4)$$

Consequently, the complete log-likelihood is then given by

$$\log p(x_{0:T}, y_{1:T}; \theta) = \log p(x_0; \theta) + \sum_{t=1}^{T} \log p(x_t \mid x_{t-1}; \theta) + \sum_{t=1}^{T} \log p(y_t \mid x_t; \theta). \qquad (5)$$
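As an illustration of how (5) is used later, the short Python sketch below evaluates the complete log-likelihood for a candidate $\theta$ along a given state trajectory; log_prior, log_trans and log_obs are hypothetical placeholders for the log-densities of the model (1)-(2).

```python
def complete_loglik(theta, x, y, log_prior, log_trans, log_obs):
    """Complete log-likelihood (5) for a candidate theta.

    x : state trajectory x_0, ..., x_T  (length T + 1)
    y : observations     y_1, ..., y_T  (length T)
    log_prior(x0, theta), log_trans(x_t, x_prev, theta) and log_obs(y_t, x_t, theta)
    are placeholders for the log-densities of the model (1)-(2).
    """
    ll = log_prior(x[0], theta)
    for t in range(1, len(x)):
        ll += log_trans(x[t], x[t - 1], theta)   # transition term of (5)
        ll += log_obs(y[t - 1], x[t], theta)     # observation term of (5)
    return ll
```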

In this article, we maximize the complete likelihood with respect to both the unknown parameter ($\theta$) and the unobserved state sequence ($x_{0:T}$), rather than maximizing the likelihood of the observed data with respect to the parameters. This leads to dividing the problem into two interconnected sub problems: sub problem A (state estimation) and sub problem B (parameter estimation). The details are described below.

2.1. Sub problem A

Estimation of the (smoothed) state assuming the parameter is known: first assume that $\theta = \theta^{old}$. The state estimation problem is then finding

$$\hat{x}_{0:T} = \arg\max_{x_{0:T}} p(x_{0:T}, y_{1:T}; \theta^{old}). \qquad (6)$$

Since

$$p(x_{0:T} \mid y_{1:T}; \theta) = \frac{p(x_{0:T}, y_{1:T}; \theta)}{p(y_{1:T}; \theta)} \qquad (7)$$

and the denominator in equation (7) is independent of $x_{0:T}$, the state estimation problem in (6) can be cast as estimating the usual maximum a posteriori (MAP) sequence of $x_{0:T}$ conditioned on the observed $y_{1:T}$ and the (assumed) $\theta^{old}$. For a general nonlinear and/or non-Gaussian state space model, this problem is nontrivial. However, one can approximate this MAP sequence using the particle based method developed in [16]. The memory requirement of this MAP sequence algorithm is $O(NT)$ and the computational complexity is $O(N^2 T)$. However, since we are dealing with a short data regime, memory requirement or complexity is not a serious issue here.
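A minimal sketch of such a particle based MAP sequence search is given below: a Viterbi-style dynamic program over the particle supports produced by a filter such as the bootstrap filter sketched in the introduction, which also accounts for the $O(N^2 T)$ cost mentioned above. The density callables are the same hypothetical placeholders as before; this is a sketch in the spirit of [16], not a definitive reproduction of that algorithm.

```python
import numpy as np

def particle_map_sequence(clouds, y, theta, log_prior, log_trans, log_obs):
    """Viterbi-style search for the MAP state sequence over particle supports,
    in the spirit of [16].

    clouds : list of particle arrays for t = 0, ..., T, each of shape (N,)
    y      : observations y_1, ..., y_T
    The density callables are assumed to broadcast over particle arrays.
    Memory is O(N T); each step costs O(N^2), i.e. O(N^2 T) overall.
    """
    T, N = len(y), len(clouds[0])
    delta = log_prior(clouds[0], theta)            # best log-joint ending at each particle, t = 0
    backptr = np.zeros((T, N), dtype=int)
    for t in range(1, T + 1):
        # trans[i, j] = log p(x_t = clouds[t][j] | x_{t-1} = clouds[t-1][i]; theta)
        trans = log_trans(clouds[t][None, :], clouds[t - 1][:, None], theta)
        scores = delta[:, None] + trans
        backptr[t - 1] = np.argmax(scores, axis=0)
        delta = scores[backptr[t - 1], np.arange(N)] + log_obs(y[t - 1], clouds[t], theta)
    path = [int(np.argmax(delta))]                 # backtrack the arg-max trajectory
    for t in range(T - 1, -1, -1):
        path.append(int(backptr[t][path[-1]]))
    path.reverse()
    return np.array([clouds[t][path[t]] for t in range(T + 1)])
```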

2.2. Sub problem B

Estimation of the parameter assuming the state is known: now, given all the observation data $y_{1:T}$ and the estimate of the state, $\hat{x}_{0:T} \equiv \hat{x}_{0:T}(y_{1:T}, \theta^{old})$, from sub problem A, one can obtain a new estimate of $\theta$ as

$$\theta^{new} = \hat{\theta} = \arg\max_{\theta} p(\hat{x}_{0:T}, y_{1:T}; \theta). \qquad (8)$$

This maximization problem can be translated into finding the zeros of the gradient of the complete log-likelihood. Define $L(\theta) \triangleq \log p(\hat{x}_{0:T}, y_{1:T}; \theta)$. Then $\hat{\theta}$ is a solution to

$$\nabla L(\theta) = 0 \qquad (9)$$

where $\nabla$ is the gradient vector (w.r.t. $\theta$). From (5) we get

$$\nabla L(\theta) = \left. \frac{\nabla p(x_0; \theta)}{p(x_0; \theta)} \right|_{\hat{x}_{0:T},\, y_{1:T}} + \sum_{t=1}^{T} \left. \frac{\nabla p(x_t \mid x_{t-1}; \theta)}{p(x_t \mid x_{t-1}; \theta)} \right|_{\hat{x}_{0:T},\, y_{1:T}} + \sum_{t=1}^{T} \left. \frac{\nabla p(y_t \mid x_t; \theta)}{p(y_t \mid x_t; \theta)} \right|_{\hat{x}_{0:T},\, y_{1:T}}. \qquad (10)$$


Equations (9) and (10) lead to a system of nonlinear equations in $\theta$, and $\theta^{new}$ can be obtained by solving them.
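For models where (9)-(10) do not reduce to a tractable system, one option is to hand the problem to a general purpose optimizer. The sketch below reuses the complete_loglik helper sketched after (5), with the same placeholder densities, and maximizes the complete log-likelihood (8) numerically instead of explicitly solving the gradient equations; this is an illustrative substitute, not the route taken in the example of Section 3.

```python
import numpy as np
from scipy.optimize import minimize

def update_theta(theta_old, x_hat, y, log_prior, log_trans, log_obs):
    """Sub problem B: theta_new = arg max_theta p(x_hat, y_{1:T}; theta), cf. (8),
    obtained here by direct numerical maximization of the complete log-likelihood."""
    objective = lambda th: -complete_loglik(th, x_hat, y, log_prior, log_trans, log_obs)
    result = minimize(objective, x0=np.asarray(theta_old, dtype=float), method="Nelder-Mead")
    return result.x
```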

Thus, starting with an initial value $\theta^{(0)}$, the final solution is obtained by iterating between these two sub problems until a pre-specified stopping criterion is reached. The stopping criterion can be reached, for example, when either of the following is satisfied:

C1: for a pre-determined $\Delta$, when $\|\theta^{(k)} - \theta^{(k-1)}\| / \|\theta^{(k-1)}\| \leq \Delta$;

C2: when the iteration stage $k$ reaches the (pre-defined) maximum allowable stage ($K_{Max}$). This signifies that the iteration does not converge and one has to start afresh with new initial values of the parameters.

This is essentially a batch method of estimating the parameters. However, estimates obtained through our method using limited observations can be used, for example, as initial values of the parameters in any online parameter estimation method. This (batch) method provides a means to obtain parameter estimates without adding any artificial process noise. Therefore, it can be expected that our method will provide more accurate parameter estimates than the augmented particle filter. A detailed comparison between these two methods (in terms of accuracy as well as computational load) will be the subject of our future work. A sketch of the overall iteration is given below; in the following section, we illustrate our method by means of examples.
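The sketch below puts the pieces together: a batch iteration alternating between sub problems A and B with the stopping criteria C1 and C2. The callables estimate_map_states and update_theta stand for any concrete implementations of the two sub problems (for instance, the sketches given earlier); they are placeholders, not fixed interfaces.

```python
import numpy as np

def estimate_parameters(y, theta_init, estimate_map_states, update_theta,
                        delta=0.005, k_max=40):
    """Batch iteration between sub problem A (state estimation) and
    sub problem B (parameter estimation) with stopping criteria C1 / C2.

    estimate_map_states(y, theta) -> approximate MAP state sequence x_hat_{0:T}
    update_theta(theta, x_hat, y) -> new parameter estimate theta_new
    Returns (theta, number of iterations used, converged flag)."""
    theta = np.asarray(theta_init, dtype=float)
    for k in range(1, k_max + 1):
        x_hat = estimate_map_states(y, theta)                               # sub problem A
        theta_new = np.asarray(update_theta(theta, x_hat, y), dtype=float)  # sub problem B
        # C1: relative change of the parameter estimate below Delta
        if np.linalg.norm(theta_new - theta) <= delta * np.linalg.norm(theta):
            return theta_new, k, True
        theta = theta_new
    # C2: K_Max reached without convergence; restart with new initial values
    return theta, k_max, False
```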

3. SIMULATION RESULTS

We demonstrate the proposed method on the benchmark time series model [1] given by

$$x_k = f(x_{k-1}; \theta_1, \theta_2) + w_k, \qquad (11)$$

$$y_k = \frac{x_k^2}{20} + v_k, \qquad (12)$$

where $f(x_{k-1}; \theta_1, \theta_2) = \theta_1 x_{k-1} + \frac{\theta_2 x_{k-1}}{1 + x_{k-1}^2} + 8 \cos(1.2k)$ and $\theta_1$ and $\theta_2$ are unknown parameters. We assume $p(x_0) \sim \mathcal{N}(0, 5)$, $w_k \sim \mathcal{N}(0, 10)$ and $v_k \sim \mathcal{N}(0, 1)$, which are mutually independent. Note that here $\frac{\partial f(\cdot)}{\partial \theta_1} = x_{k-1}$ and $\frac{\partial f(\cdot)}{\partial \theta_2} = \frac{x_{k-1}}{1 + x_{k-1}^2}$. Subsequently, use of (10) reduces (9) to

$$\sum_{k=1}^{T} \left. \{x_k - f(\cdot)\} \frac{\partial f(\cdot)}{\partial \theta_1} \right|_{\hat{x}_{0:T}} = 0, \qquad \sum_{k=1}^{T} \left. \{x_k - f(\cdot)\} \frac{\partial f(\cdot)}{\partial \theta_2} \right|_{\hat{x}_{0:T}} = 0.$$
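Because $f$ is linear in $\theta_1$ and $\theta_2$, these two equations form an ordinary least-squares problem in the parameters once the state sequence has been estimated. A possible implementation of this parameter update (sub problem B for the benchmark model) is sketched below; it is an illustration of the reduction above, with the remaining experimental details as described in the text.

```python
import numpy as np

def update_theta_benchmark(x_hat):
    """Solve the two equations above for (theta_1, theta_2), given the estimated
    state sequence x_hat = (x_0, ..., x_T) from sub problem A. Since f is linear
    in theta, this is an ordinary least-squares problem."""
    x_prev, x_curr = x_hat[:-1], x_hat[1:]
    k = np.arange(1, len(x_hat))                                   # k = 1, ..., T
    phi = np.column_stack([x_prev, x_prev / (1.0 + x_prev ** 2)])  # columns: df/dtheta_1, df/dtheta_2
    r = x_curr - 8.0 * np.cos(1.2 * k)                             # part of x_k not depending on theta
    theta, *_ = np.linalg.lstsq(phi, r, rcond=None)
    return theta                                                   # array [theta_1, theta_2]
```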

For the simulations, the true values are taken to be $\theta_1^* = 0.5$ and $\theta_2^* = 25$, respectively. We use $T = 50$, $N = 500$ particles and the so-called "Exact Moment Matching (EMM) proposal" [17]. We start with $\theta_1^{(0)} = -0.3$ and $\theta_2^{(0)} = 10$. Here we perform 40 iterations regardless of the stopping criteria, and the estimated parameters at different iteration stages are shown in Figure 1 and Figure 2. If working with $\Delta = 0.005$, we would have stopped at iteration 7. We should note that both estimates show a distinct bias, even with an increasing number of iterations. This has already been observed in the linear-Gaussian case [14]. We further compare our approach with the augmented state method (ASM), where the variance of the artificial noise (AN) is reduced over time as in [18]. We assume $\theta_2$ fixed at its true value and estimate $\theta_1$. Our estimate of $\theta_1$ is similar to the previous example and we do not include it here. For ASM, the selection of the initial variance of the AN is not obvious and the convergence of the additional state (parameter) depends heavily on this selection. Our method does not have this difficulty. For the subsequent implementation of ASM, we take $p(\theta_1^{(0)}) \sim \mathrm{uniform}(-0.5, 1.5)$, initial variance of AN $= 1$, $T = 1000$, $N = 2000$ and the state transition density as proposal. As seen from Figure 3, the estimate of $\theta_1$ using ASM keeps oscillating even after a large number of time steps. On the other hand, our method converges quite rapidly, albeit with a distinct bias. If this bias can somehow be theoretically ascertained, then our method will decidedly give better results.
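For reference, a minimal sketch of the ASM comparison is given below: a bootstrap particle filter on the state augmented with $\theta_1$ (with $\theta_2$ held at its true value 25), where the artificial noise variance is reduced over time. The geometric decay of the AN variance is an illustrative assumption, not necessarily the schedule of [18].

```python
import numpy as np

def augmented_state_filter(y, n_particles=2000, an_var0=1.0, decay=0.99, rng=None):
    """ASM sketch for the benchmark model (11)-(12) with theta_2 fixed at 25:
    theta_1 is appended to the state and perturbed by artificial noise whose
    variance shrinks over time (the decay rate here is an illustrative choice)."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.normal(0.0, np.sqrt(5.0), n_particles)            # p(x_0) ~ N(0, 5)
    theta1 = rng.uniform(-0.5, 1.5, n_particles)              # p(theta_1^(0)) ~ uniform(-0.5, 1.5)
    an_var = an_var0
    estimates = []
    for k, y_k in enumerate(y, start=1):
        theta1 = theta1 + rng.normal(0.0, np.sqrt(an_var), n_particles)   # artificial dynamics
        an_var *= decay                                                   # reduce AN variance over time
        x = (theta1 * x + 25.0 * x / (1.0 + x ** 2)
             + 8.0 * np.cos(1.2 * k) + rng.normal(0.0, np.sqrt(10.0), n_particles))
        logw = -0.5 * (y_k - x ** 2 / 20.0) ** 2              # log N(y_k; x_k^2/20, 1), up to a constant
        w = np.exp(logw - np.max(logw))
        w /= np.sum(w)
        estimates.append(np.sum(w * theta1))                  # filtered estimate of theta_1
        idx = rng.choice(n_particles, size=n_particles, p=w)
        x, theta1 = x[idx], theta1[idx]
    return np.array(estimates)
```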

Fig. 1. Estimate of $\theta_1$ w.r.t. iteration stage (true value and estimate versus the number of iterations).

Fig. 2. Estimate of $\theta_2$ w.r.t. iteration stage (true value and estimate versus the number of iterations).


4. CONCLUSIONS

In this paper, we present a SMC based method for estimating the static parameter of a general state space model. The method provides simultaneous estimates of the unknown states and parameters in a batch fashion. We illustrate the performance of this method through numerical simulations. Comparisons with other methods as well as a rigorous assessment of convergence issues are topics for further research.

Fig. 3. Estimate of $\theta_1$ using the augmented state method (true value and estimate versus time step).

5. REFERENCES

[1] N. J. Gordon, D. J. Salmond, and A. F. M. Smith, "Novel approach to nonlinear/non-Gaussian Bayesian state estimation," IEE Proceedings F - Radar and Signal Processing, vol. 140, no. 2, pp. 107–113, April 1993.

[2] S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, "A tutorial on particle filters for on-line non-linear/non-Gaussian Bayesian tracking," IEEE Transactions on Signal Processing, vol. 50, no. 2, pp. 174–188, 2002.

[3] J. Liu and M. West, "Combined parameter and state estimation in simulation-based filtering," in Sequential Monte Carlo Methods in Practice (A. Doucet, N. de Freitas, and N. Gordon, Eds.), Springer, 2001.

[4] S. Saha, P. K. Mandal, and A. Bagchi, "A new approach to particle based smoothed marginal MAP," in Proceedings of EUSIPCO 2008, 2008.

[5] P. M. Djuric and J. Miguez, "Sequential particle filtering in the presence of additive Gaussian noise with unknown parameters," in Proceedings of IEEE ICASSP, 2002, vol. II, pp. 803–806.

[6] G. Storvik, "Particle filters for state-space models with the presence of unknown static parameters," IEEE Transactions on Signal Processing, vol. 50, no. 2, pp. 281–289, February 2002.

[7] C. Andrieu, A. Doucet, and V. B. Tadic, "Online simulation-based methods for parameter estimation in nonlinear non Gaussian state space models," in Proceedings of the IEEE Conference on Decision and Control, 2005.

[8] J. Olsson and T. Ryden, "Asymptotic properties of particle filter-based maximum likelihood estimators for state space models," Stochastic Processes and their Applications, vol. 118, no. 4, pp. 649–680, April 2008.

[9] O. Cappe, S. J. Godsill, and E. Moulines, "An overview of existing methods and recent advances in sequential Monte Carlo," Proceedings of the IEEE, vol. 95, no. 5, pp. 899–924, 2007.

[10] A. Doucet and V. B. Tadic, "Parameter estimation in general state-space models using particle methods," Annals of the Institute of Statistical Mathematics, vol. 55, no. 2, pp. 409–422, June 2003.

[11] Y. Bar-Shalom, "Optimal simultaneous state estimation and parameter identification in linear discrete-time systems," IEEE Transactions on Automatic Control, vol. 17, no. 3, pp. 308–319, June 1972.

[12] J. S. Lim and A. V. Oppenheim, "All-pole modeling of degraded speech," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 26, pp. 197–210, May 1978.

[13] A. Bagchi and P. ten Brummelhuis, "Parameter identification in tidal models with uncertain boundaries," Automatica, vol. 30, no. 5, pp. 745–759, May 1994.

[14] A. Yeredor, "The joint MAP-ML criterion and its relation to ML and to Extended Least-Squares," IEEE Transactions on Signal Processing, vol. 48, no. 12, pp. 3484–3492, December 2000.

[15] S. Kay and Y. C. Eldar, "Rethinking biased estimation," IEEE Signal Processing Magazine, vol. 25, no. 3, pp. 133–136, May 2008.

[16] S. Godsill, A. Doucet, and M. West, "Maximum a posteriori sequence estimation using Monte Carlo particle filters," Annals of the Institute of Statistical Mathematics, vol. 53, pp. 82–96, 2001.

[17] S. Saha, P. K. Mandal, Y. Boers, H. Driessen, and A. Bagchi, "Gaussian proposal density using moment matching in SMC methods," Statistics and Computing, vol. 19, no. 2, pp. 203–208, June 2009.

[18] F. Gustafsson and P. Hriljac, "Particle filters for system identification with application to chaos prediction," in Proceedings of SYSID 2003, Rotterdam, The Netherlands, 2003.
