An adaptive feature-based tracking system

An Adaptive Feature-based Tracking System

by Eugene Pretorius

Thesis presented at the University of Stellenbosch in partial fulfilment of the requirements for the degree of Master of Science in Applied Mathematics, Department of Mathematical Sciences, University of Stellenbosch, Private Bag X1, 7602 Matieland, South Africa.

Supervisors: Prof B.M. Herbst, Dr K.M. Hunter

2008

Copyright © 2008 University of Stellenbosch. All rights reserved.

Declaration

I, the undersigned, hereby declare that the work contained in this thesis is my own original work and that I have not previously, in its entirety or in part, submitted it at any university for a degree.

Signature: ..........................  E. Pretorius
Date: ..........................

Abstract

An Adaptive Feature-based Tracking System

E. Pretorius
Department of Mathematical Sciences, University of Stellenbosch, Private Bag X1, 7602 Matieland, South Africa

Thesis: MSc (Applied Mathematics), 2008

In this thesis, tracking tools are developed to robustly track an object by its features using particle filtering. Automatic on-line initialisation techniques use motion detection and dynamic background modelling to extract the features of moving objects. Automatic adaptation of the feature models during tracking is implemented and tested.

Opsomming

In this thesis, video tracking tools are developed and tested. By making use of an object's features, it is possible to track the object robustly with particle filtering techniques. The system is initialised automatically, using motion detection and background modelling to identify and extract the features of objects. Automatic updating of the feature models during tracking is implemented and tested.

Acknowledgements

Special thanks to my supervisors, Professor Ben Herbst and Dr Karin Hunter, for their insights and guidance. And to my family, friends and girlfriend for their support and understanding.

Contents

Declaration
Abstract
Opsomming
Acknowledgements
Contents
List of Figures
List of Abbreviations
List of Symbols

1 Introduction
  1.1 Problem statement: Tracking
  1.2 Literature study
  1.3 Objective of the study
  1.4 Dissertation structure

2 Particle filter theory
  2.1 Introduction to Bayesian estimation
  2.2 Dynamic system representation
  2.3 Bayesian filter
  2.4 Monte Carlo (MC) integration
  2.5 Particle filter algorithm
  2.6 Summary

3 Feature vectors
  3.1 Particles and features
  3.2 Colour-based feature
  3.3 Histogram of oriented gradients feature
  3.4 Motion model
  3.5 Feature adaptivity
  3.6 Finding an object and detecting a loss
  3.7 Perspective adaption
  3.8 Algorithm
  3.9 Feature tracking experiment
  3.10 Summary

4 System implementation
  4.1 System design and goals
  4.2 Background modelling
  4.3 Motion tracking
  4.4 User defined parameters
  4.5 Object detection experimental results
  4.6 Particle filter tracking
  4.7 Automatic initialisation tracking
  4.8 Summary and conclusion

5 Conclusion
  5.1 Future challenges
  5.2 Restrictions
  5.3 Goals achieved
  5.4 Recommendations

A Project files
  A.1 Required software
  A.2 Installing tracking tools
  A.3 Running an example
  A.4 Directory structure and file description
  A.5 Hardware configuration used

References

List of Figures

2.1 Resampling
2.2 Particle filter iteration

3.1 Colour weighting function
3.2 Colour feature example
3.3 Example HOG descriptor
3.4 Example HOG images
3.5 HOG feature vectors at different cell sizes
3.6 Model images used to obtain the similarity results of Figure 3.7
3.7 HOG features comparison
3.8 Simulated tracking example using colour and HOG features
3.9 Simulated tracking example: HOG adaption

4.1 Feature-based tracking modules
4.2 System module design
4.3 Motion tracking
4.4 Background modelling
4.5 Colour feature tracking of soccer players
4.6 Face image used to initialise tracking
4.7 HOG feature tracking of a face
4.8 Tracking a face using combined features
4.9 Tracking snooker balls

A.1 GUI main window
A.2 Project files

List of Abbreviations

EM   Expectation Maximisation
FGD  Foreground object detection
HOG  Histogram of oriented gradients
HSV  Hue Saturation Value
KLT  Kanade-Lucas-Tomasi
MC   Monte Carlo
MOG  Mixture of Gaussians
pdf  probability density function
RGB  Red Green Blue
SIR  Sequential importance resampling
SIS  Sequential importance sampling
SMC  Sequential Monte Carlo

List of Symbols

ρ    Bhattacharyya coefficient
f    Transition function
h    Observation function
I    Image
k    Time
M    Motion image
R    Image region
s    Particle samples
S_σ  Sample set deviation
T    Threshold
v    Speed
w    Weight
x    State
z    Observation

Chapter 1

Introduction

Suppose a robot is given a video stream and, through it, interacts with the world around it. In this scenario, computer vision techniques would have to be programmed so that the robot can track, and maybe even recognise, certain objects. Object tracking comes naturally to humans, since it is instinctive to observe the world around us. Computers, however, need sophisticated techniques in order to mimic our tracking ability and to automate tedious tasks. These techniques attempt to solve object tracking through cluttered scenes with noisy measurements. Specific algorithms each have inherent shortcomings due to the nature of the problem, while successful approaches use object appearances that indeed mimic the way humans would track objects.

1.1 Problem statement: Tracking

Tracking of objects in a video sequence is one of the most fundamental problems in computer vision. It forms the basis of applications as diverse as surveillance, traffic monitoring, gesture recognition and sport analysis such as soccer. Some of the most widely used tracking algorithms include the Kanade-Lucas-Tomasi (KLT) feature tracker, which is an optical flow method. These algorithms track objects by comparing consecutive pairs of frames; no dynamic information about the moving object is used. The problem is that as soon as the motion of the object itself is used, some information about the object is required. On the other hand, using dynamic information leads to more robust tracking algorithms, allowing, for example, tracking through occlusions. It turns out that it is not hard to incorporate dynamic information into the tracking algorithms by using a particle filter. That leaves finding information about the object itself that is to be tracked.

There are several choices to obtain information about the object. Isard and Blake [13] track the shape of an object, allowing, but also restricting, shape deformations. This is known as active contours. Another choice is to track features that describe the colour or texture of the objects. It is also possible to combine all of these, which resulted in the so-called Active Appearance Models (AAM) [3]. Although robust, AAMs are computationally expensive algorithms. A simple fact is that clients requiring systems based on computer vision techniques, such as surveillance, often cannot afford the necessary CPU power. It is therefore of considerable interest to explore light-weight alternatives. That takes us back to colour and texture tracking using a particle filter.

1.2 Literature study

A study of the most recent developments in tracking has shown that colour-based particle filtering is used successfully to track non-rigid objects [14]. A colour distribution model is built in RGB space and a similarity measure is employed for the object model. The authors compare this technique with the mean-shift algorithm, which tries to minimise the distance between the theoretical mean and the observed ones. It is shown that the mean-shift algorithm fails when the object's position in successive frames does not overlap, whereas the particle filter has no such problems. The colour-based particle filter would, however, fail if lighting conditions change to an extent where the similarity between the object model and measurements is indistinguishable from the background.

A self-adapting histogram model is used in [6] to adjust to lighting changes. This is possible since the colour model uses the HSV instead of the RGB colour space. The adaptivity allows for small illumination changes as well as partial rotation in 3D space. A confidence measure is calculated from the probability distribution that describes how well objects are being tracked. Adaptation uses this confidence measure, so that the model only adapts to the actual target when confidence is high. Also, the implementation is done on a smart camera (a camera with a CPU) and runs in real time. Since all the processing is done on the camera itself, no images need to be sent over the network. This is a very important property in security applications where client privacy might be an issue.

Blob tracking [7] is also a successful feature-based tracker. A multi-resolution graph for tracked regions is built from connected components (blobs). The point is made by the authors that robust tracking cannot be handled by only one algorithm. Modules need to be built up that solve problems robustly at each step of a semantic ladder, the first step being segmentation, and the next step tracking. The algorithm handles larger slow-moving blobs that are easy to track, and fast-moving small blobs that are much more difficult to track, equally well. In cases where the algorithm failed, the segmentation step produced unsatisfactory results: either a region of interest is not segmented, two separate regions merge and form one blob, or the relationship between a blob in consecutive frames has a low likelihood at a low resolution in the multi-resolution scale. This is handled at the next semantic step.

Multi-camera systems are implemented in [21] and [20] for tracking football players and for surveillance purposes, respectively. Cameras with overlapping fields of view are used. In the case of the football players, each camera's processing is done separately and then combined. A Kalman filter is used for each camera to track players. Measurement data is used whenever available to minimise estimation errors. For the surveillance application the Kanade-Lucas-Tomasi (KLT) [18] feature tracking algorithm is used. KLT tries to estimate the motion at every pixel position using concurrently available frames.

Contour features are used in [13], implementing the condensation (particle filter) algorithm. Spline curves are fitted to an object's shape and high-contrast features are extracted at intervals along the curve. Object contours (splines) are descriptive features and are successfully used to track curves through clutter. This is known as contour-based tracking. An impressive experiment is done tracking a falling leaf against a background filled with similar leaves. Contour-based tracking has the disadvantage of being computationally expensive.

In [4] edges and the pixel gradients are considered as feature models. Images are broken into cells, each a histogram of oriented gradients (HOG). Combined, these cells represent the feature model. This approach has been successfully implemented in object recognition type problems. The technique suffers from expensive calculations and slow execution on less simplistic scene compositions.

Some of the most popular tracking algorithms were shown here. An important factor for each of these algorithms is its computational cost. For any tracker to be useful it should be robust and light-weight, and should be cheap to build and use.

1.3 Objective of the study

In this thesis several light-weight trackers are studied. More specifically, a colour-based tracker is implemented and a texture-based feature using an adapted HOG descriptor is developed. The tracking "engine", a particle filter, is implemented. The project objectives are:

- Design a tracking implementation to solve the problem statement
- Build a particle filter for the design
- Create a colour-based feature model
- Investigate and implement other types of feature models, such as texture
- Test the features' effectiveness and robustness
- Make improvements to the original implementation based on learnt shortcomings
- Automate the tracking process after initialisation
- Automatic initialisation of the tracker
- Feature adaption when the object's appearance changes

HOG was developed to detect objects in a scene at different image scales. Its success, when used as an object detector, sparked our interest for use in a tracking context. An adaption of the HOG texture feature is developed and combined with the colour feature to improve tracking robustness. The HOG feature is used to find a similarity measure between the target object and samples. Failure as a robust tracking feature is discovered and adjustments to the HOG construction are developed. The developed feature descriptor can successfully track objects using only texture information (when texture is available), and tracking improves when it is combined with a colour feature.

1.4 Dissertation structure

This thesis builds on the theory in Chapter 2 to a working implementation in Chapter 4. After first covering the basic particle filter concepts, object features are investigated in Chapter 3. Features are adapted through new observations and the process is automated to adapt independently of user interaction. Features with complementary characteristics, that contain sufficient information, are investigated. A motion tracker is built to help with automating a tracking system at initialisation. Background modelling techniques are also investigated as part of the tracker initialisation and feature extraction. Each system module is discussed as implemented and results are shown.

Chapter 2

Particle filter theory

A particle filter is a non-linear, sub-optimal model estimation technique based on simulation. It is an implementation of the formal recursive Bayesian filter that performs sequential Monte Carlo (SMC) estimation based on a weighted representation of probability densities [16]. Random sampled approximations of the probability density function (pdf) are called the weighted particles. In general, more particles lead to better approximations of the pdf. Particle filters propagate a finite number of these samples according to the dynamics of the system and update the pdf using the observed measurements [1].

2.1 Introduction to Bayesian estimation

The Bayesian approach aims to construct the posterior pdf based on all available previous information and current measurements. In such a case, where the pdf is constructed from all available information, the solution is complete and an optimal estimate (in a minimising-of-a-cost-function sense) of the state is possible. A recursive approach is considered that allows for a new estimate whenever new measurements are obtained. Recursively, predictions and updates form the two main steps for most Bayesian estimators. The prediction step propagates the state pdf forward according to a dynamic system model. The update step, using Bayes' theorem, uses the latest measurements to correct the predicted pdf. The recursive Bayesian estimator, or filter, therefore provides a formal mechanism for propagating and updating the posterior pdf as new information is received [17].

The following sections develop the background theory of particle filtering. Firstly, a dynamic system is represented by a dynamics model and a measurement model in a probabilistic form, so that a Bayesian approach may be adopted. Then the recursive estimation in Bayesian filtering [5], with its prediction and update steps, fits this dynamic representation. Integration difficulties in Bayesian filtering are handled by Monte Carlo (MC) estimation [16], presented in Section 2.4. Finally, the implemented particle filtering algorithm is discussed.

2.2 Dynamic system representation

A sequence of evolving probability distributions π(x_k), indexed by discrete time k = 0, 1, 2, ..., is called a probabilistic dynamic system [12]. A dynamic system is generally represented by a state space x_k. Two models are required for analysis of a dynamic system: a dynamic model and a measurement model.

Firstly, a dynamic model describing system evolution, the change in the state over time, is defined. The state sequence is a Markov random process and the state equation is written as

    x_k = f_{k-1}(x_{k-1}, v_{k-1}),    (2.2.1)

where x_k is the state vector at time step k and f_{k-1} is the (possibly non-linear) state transition function that propagates the system from time step k-1 to time step k. Process noise is modelled by v_k and its pdf is assumed known.

Secondly, a measurement model, where noisy measurements are related to the state, is needed. The observation equation is of the form

    z_k = h_k(x_k, w_k),    (2.2.2)

where z_k is the observation vector at time step k, h_k is the observation function that relates the state space to the observations, and the observation noise, w_k, has a known pdf.

The state and observation equations can also be represented by probability densities. Note that (2.2.1) is a first order Markov process and that the state equation is equivalent to p(x_k | x_{k-1}), also known as the transition density. Similarly, the observation equation (2.2.2) is equivalent to p(z_k | x_k). In summary, a dynamical system formulated in this probabilistic way fits the Bayesian estimation approach, as described in the next section.

2.3 Bayesian filter

Bayesian filtering attempts to construct the posterior pdf from all available information. The state vector, x_k, contains information describing the system. This true state, x_k, is assumed to be a Markov process which cannot be observed directly, and the measurements z_i, where the set Z_k = {z_i, i = 1, ..., k}, are the observations of the state. A Markov assumption is made about the state space: the current state depends only on the immediately preceding state,

    p(x_k | x_{k-1}) = p(x_k | x_0, ..., x_{k-1}).    (2.3.1)

Similarly, the measurement at the k-th time step depends only on the current state and is independent of all other states given the current state,

    p(z_k | x_k) = p(z_k | x_0, ..., x_k).    (2.3.2)

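To make the state-space representation concrete, the two models (2.2.1) and (2.2.2) can be sketched in code. The one-dimensional constant-velocity model and the noise levels below are hypothetical illustrations only, not the models used later in this thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, v):
    # State transition (2.2.1): hypothetical constant-velocity model.
    # The state is x = [position, velocity]; v is the process noise v_k.
    A = np.array([[1.0, 1.0],
                  [0.0, 1.0]])
    return A @ x + v

def h(x, w):
    # Observation (2.2.2): only the (noisy) position is measured.
    return x[0] + w

x = np.array([0.0, 1.0])              # initial state x_0
states, observations = [], []
for k in range(20):
    v = rng.normal(0.0, 0.1, size=2)  # process noise v_k with known pdf
    w = rng.normal(0.0, 0.5)          # observation noise w_k with known pdf
    x = f(x, v)
    states.append(x)
    observations.append(h(x, w))
```

Because both noise terms have known (Gaussian) pdfs here, the transition density p(x_k | x_{k-1}) and likelihood p(z_k | x_k) are available in closed form, which is exactly what the Bayesian filter of the next section requires.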
From the Markov assumptions made, the formulation of (2.3.1) and (2.3.2) is equivalent to the dynamic system state representation of Section 2.2. Given the posterior pdf at time k-1, p(x_{k-1} | Z_{k-1}), the idea is to find p(x_k | Z_k). This is achieved by means of a prediction and an update step. First, p(x_k | Z_{k-1}), the prior pdf, is obtained using the transition density p(x_k | x_{k-1}),

    p(x_k | Z_{k-1}) = ∫ p(x_k | x_{k-1}) p(x_{k-1} | Z_{k-1}) dx_{k-1}.    (2.3.3)

When the new observation z_k is obtained, it is used to update the posterior pdf via Bayes' rule,

    p(x_k | Z_k) = p(z_k | x_k) p(x_k | Z_{k-1}) / p(z_k | Z_{k-1}).    (2.3.4)

The normalisation factor is given as usual by

    p(z_k | Z_{k-1}) = ∫ p(z_k | x_k) p(x_k | Z_{k-1}) dx_k.

Bayesian filtering is defined by the prediction step (2.3.3) and the update step (2.3.4), with initial condition p(x_0 | z_0) = p(x_0) obtained from assumed or given data. Analytical evaluation of the pdfs in (2.3.3) and (2.3.4) is impossible except in cases such as the Kalman filter and hidden finite-state space Markov chains, where linearisation (Gaussian pdfs) simplifies the equations. Monte Carlo (MC) integration, on the other hand, is not limited by linear-Gaussian assumptions and is described in the following section.

2.4 Monte Carlo (MC) integration

Monte Carlo (MC) integration methods use pseudo-random numbers to numerically approximate multi-dimensional, definite integrals, and they form the basis of sequential Monte Carlo (SMC) methods. Pseudo-random numbers are generally used for computational convenience. By the law of large numbers¹, the MC estimate approaches the exact solution as N → ∞. MC integration is used to evaluate the integral (2.3.3) of the optimal Bayesian filter.

Consider a multi-dimensional, definite integral of g(x). Writing g(x) = f(x)π(x), its integral becomes

    I = ∫ g(x) dx = ∫ f(x)π(x) dx.    (2.4.1)

The integrand g(x) is factorised such that π(x) is a density, so I is interpreted as the mean of f(x). In a Bayesian context π(x) is realised as the posterior pdf. With {x_i; i = 1, ..., N} the samples drawn from π(x), the MC estimate of I is the sample mean

    I_N = (1/N) Σ_{i=1}^{N} f(x_i),    (2.4.2)

which converges to I if N is chosen large enough.

Unfortunately, effective sampling from π(x) is not possible, due to the distribution being multi-variate, non-Gaussian and only known up to a proportionality constant. Importance sampling instead samples from a known density distribution q(x) that approaches π(x) when N is increased. This proposed pdf q(x) is referred to as the importance or proposal pdf. Since q(x) is a weighted density of the sample set, MC estimation is possible. The integral (2.4.1) is written as

    I = ∫ f(x)π(x) dx = ∫ f(x) (π(x)/q(x)) q(x) dx,    (2.4.3)

¹ Jacob Bernoulli first described the law of large numbers as so simple that even the stupidest man instinctively knows it is true. (http://en.wikipedia.org/wiki/Law_of_large_numbers)

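The difference between the direct MC estimate (2.4.2) and the importance-sampling weighting of (2.4.3) can be illustrated with a toy integral. The densities below, a standard normal target π and a wider normal proposal q, are chosen purely for illustration; both estimates should approach the true value E[x²] = 1:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000

def f(x):
    return x ** 2

# Direct MC estimate (2.4.2): draw from pi(x) = N(0, 1) itself.
x = rng.normal(0.0, 1.0, N)
I_direct = np.mean(f(x))                # E[x^2] = 1 for N(0, 1)

# Importance sampling: draw from a proposal q(x) = N(0, 2) and
# weight each sample by pi(x)/q(x), normalising the weights to sum to one.
def log_pdf_normal(x, sigma):
    return -0.5 * (x / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

xs = rng.normal(0.0, 2.0, N)
w = np.exp(log_pdf_normal(xs, 1.0) - log_pdf_normal(xs, 2.0))
w /= w.sum()
I_is = np.sum(w * f(xs))
```

Working with log-densities before exponentiating is a standard precaution against underflow; it does not change the estimate.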
and the MC estimate is calculated, by drawing N ≫ 1 samples, as

    I_N = (1/N) Σ_{i=1}^{N} f(x_i) w(x_i),    (2.4.4)

where

    w(x) ∝ π(x)/q(x)    (2.4.5)

are the normalised importance weights, so that Σ_{i=1}^{N} w_i = 1. Therefore, from (2.4.5), samples drawn from the known importance density q(x) have weights

    w(x_k) ∝ p(x_k | Z_k) / q(x_k).    (2.4.6)

The choice of the importance density q(x) is crucial when designing the SMC. In this case a suboptimal choice is made. Choosing the transitional prior as the proposal, q(x_k | x_{k-1}, z_k) = p(x_k | x_{k-1}), the weights are updated by

    w(x_k^i) ∝ w(x_{k-1}^i) p(z_k | x_k^i).

This indicates that the weight at time k can only be computed after the observation has been made and the particles have been propagated at time k. SMC is also known as particle filtering. Other names by which SMC is known include bootstrap filtering, the condensation algorithm, interacting particle approximation and survival of the fittest. The SMC method implements the recursive Bayesian filter of Section 2.3, using the (sub-optimal) MC integration method to evaluate the integrals.

2.5 Particle filter algorithm

The particle filter algorithm is a direct implementation of the recursive Bayesian filter using MC methods. The basic particle filter, the sequential importance sampling (SIS) algorithm, is described in this section, as well as the sampling importance resampling (SIR) algorithm.

Given the posterior p(x_{k-1} | Z_{k-1}), N samples x_{k-1}^i, i = 1, ..., N, are randomly drawn. In the prediction phase, samples are passed from time step k-1 and propagated using a dynamic model to generate the prior sample set at time step k. These prior samples x_k^i, i = 1, ..., N, produced by the dynamics model are samples from the prior pdf p(x_k | Z_{k-1}).

In the update step, a new measurement z_k is obtained. The measurement is used to update the prior according to each particle's weight w_k^i. The new weight value is calculated as the measurement likelihood evaluated at the prior sample: w_k^i = p(z_k | x_k^i). The weights must sum to one after normalisation.

In the SIR algorithm a further step is added to resample these normalised weights. The resampling algorithm chooses particles from the prior set with a probability equal to each particle's weight. This new set of particles is considered to be samples from the required pdf p(x_k | Z_k). The particle filter algorithm repeats the prediction and update phases at each time step to obtain the posterior pdf at the next time step [17].

2.5.1 Sequential importance sampling (SIS) algorithm

This is the most basic implementation of the particle filter. Sampling is done from the prior pdf and weights are assigned to the particles. The pdf is recursively updated or propagated using measurements at each time step (sequentially). A serious problem arises when applying the SIS algorithm: after a few iterations the pdf collapses around a single particle and all other particles have negligible weight. This phenomenon is called degeneracy. Propagating these negligible particles is computationally costly and fails to represent the true pdf accurately. A possible solution requires resampling of the particles.

The SIS algorithm is shown in Algorithm 1. The algorithm notation used in this section is similar to [1]. The state space samples x_{k-1} have corresponding weights w_{k-1} at time k-1. The new observation at time k is denoted z_k.

    input:  x_{k-1}, w_{k-1}, z_k
    output: x_k, w_k
    1  for i ← 1 to N do
    2      draw x_k^i ∼ q(x_k | x_{k-1}^i, z_k)   // generate samples
    3      assign particle weight w_k^i = p(z_k | x_k^i)
    4  end

    Algorithm 1: SIS algorithm

2.5.2 Resampling algorithm

The SIR algorithm is an extension of the SIS algorithm, incorporating the resampling step described here. Degeneracy occurs when only a few particles have a large weight and the rest of the particles have weights that are almost zero. In such a situation the prior pdf is not accurately represented. To reduce the effects of degeneracy on the particle filter, a resampling step is added. Resampling is done by choosing particles with larger weights more frequently than those with smaller weights. Different methods of resampling exist, such as multinomial, residual, stratified and systematic resampling [2]. A systematic resampling scheme is considered here, with complexity O(N), where N is the number of particles.

Resampling steps. The resampling process is shown in Algorithm 2 and the steps are explained as follows. Figure 2.1 illustrates the resampling algorithm.

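A minimal runnable sketch of Algorithm 1, with the transition prior as proposal so that each weight is simply multiplied by the measurement likelihood. The one-dimensional random-walk model, the Gaussian likelihood and the observation sequence are assumptions made for illustration only:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1000   # number of particles

def sis_step(particles, weights, z, sigma_v=0.5, sigma_w=1.0):
    """One SIS iteration (cf. Algorithm 1) for a hypothetical 1D random walk.

    The proposal q is the transition prior, so each weight is updated
    with the measurement likelihood: w_k ∝ w_{k-1} * p(z_k | x_k).
    """
    # Draw x_k^i ~ q(x_k | x_{k-1}^i, z_k) = p(x_k | x_{k-1}^i)
    particles = particles + rng.normal(0.0, sigma_v, N)
    # Weight by the Gaussian likelihood p(z_k | x_k^i)
    likelihood = np.exp(-0.5 * ((z - particles) / sigma_w) ** 2)
    weights = weights * likelihood
    weights /= weights.sum()        # normalise so the weights sum to one
    return particles, weights

particles = rng.normal(0.0, 1.0, N)      # samples from an assumed prior
weights = np.full(N, 1.0 / N)
for z in [0.2, 0.4, 0.7, 1.1]:           # made-up observations
    particles, weights = sis_step(particles, weights, z)

estimate = np.sum(weights * particles)   # weighted posterior-mean estimate
```

Running this for many more steps without resampling would exhibit exactly the degeneracy described above: the normalised weights collapse onto a handful of particles.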
Step 1 computes the cumulative sum of the N particle weights, C^i = Σ_{j=0}^{i} w^j, i = 1, ..., N. Note that the weights w represent a pdf and that C^N = 1. C is an index of the cumulative weights and it is divided into equally spaced intervals of 1/N.

Step 2 sets where the index should start, namely at the first particle's weight index.

In Step 3 a random offset value λ ∈ [0, 1/N] is generated from a uniform distribution.

Step 4 ensures that all the particles' weights are considered whilst moving up the index.

Step 5, starting at the offset value, moves up along the index values.

Step 6 draws samples by comparing the value λ to C^i. In Step 7, if λ > C^i, the weight of particle i is small and the particle is not sampled; i is increased instead. This effectively skips past a few particles with small weights. Otherwise, the weight at index i is sampled repeatedly in Step 9, for as long as the condition λ > C^i remains false. It is clear that larger weights are sampled more often whilst moving along the indexed values of C, and smaller weights are ignored.

2.6 Summary

The basic particle filter has an elegant and simple algorithm that can be applied in general to most non-linear estimation problems. It is important to be aware of the dangers, such as the choice of dynamic model, the sample set size and impoverishment of the sample set.

    input:  x_k, w_k
    output: x_k^*, w_k^*
    1  C^i = Σ_{j=0}^{i} w^j     // create the cumulative values index
    2  i = 0                     // start offset index
    3  λ_1 ← random[0, 1/N]
    4  for j ← 1 to N do
    5      λ_j = λ_{j-1} + 1/N   // moving up C
    6      while λ_j > C^i do
    7          i = i + 1
    8      end
    9      x_k^{j*} = x_k^i
    10     w_k^{j*} = 1/N
    11 end

    Algorithm 2: Resampling algorithm

Figure 2.1: Resampling of 6 sample weights.

In Figure 2.2 a possible iteration of the particle filter algorithm is shown at particle level. The graphical representation visually summarises the main idea behind particle filtering. The three main steps as described in this chapter are shown, namely, selection and prediction of the particles, the resampling step, and the observational update of the pdf.

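Algorithm 2 translates almost line for line into code. The sketch below follows the systematic scheme described above; the small six-particle example at the end, with one particle carrying most of the weight, is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)

def systematic_resample(particles, weights):
    """Systematic resampling (cf. Algorithm 2), O(N).

    Walks a comb of N evenly spaced pointers, offset by one uniform
    draw in [0, 1/N), along the cumulative weight index C.
    """
    N = len(weights)
    C = np.cumsum(weights)               # Step 1: cumulative weight index
    C[-1] = 1.0                          # guard against rounding error
    i = 0                                # Step 2: start at the first weight
    offset = rng.uniform(0.0, 1.0 / N)   # Step 3: random offset
    new_particles = np.empty_like(particles)
    for j in range(N):                   # Steps 4-5: move up the index
        u = offset + j / N
        while u > C[i]:                  # Steps 6-7: skip small weights
            i += 1
        new_particles[j] = particles[i]  # Step 9: sample this particle
    new_weights = np.full(N, 1.0 / N)    # resampled set is equally weighted
    return new_particles, new_weights

# A degenerate set: one particle carries almost all the weight.
particles = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
weights = np.array([0.01, 0.01, 0.9, 0.05, 0.02, 0.01])
resampled, rw = systematic_resample(particles, weights)
```

After resampling, the dominant particle (value 2.0) appears several times in the new set, while the negligible ones are mostly dropped, which is exactly the behaviour illustrated in Figure 2.1.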
Figure 2.2: Particle filter iteration.

Chapter 3

Feature vectors

Feature vectors, such as colour, contour, texture, edge and intensity, describe an object's appearance. Features are collected in a state, and the state is represented by a pdf. The pdf is known through its samples, as described in the previous chapter. Sample measurements represented by these features are needed to update the particle filter's posterior pdf. Samples are compared and weighted according to these appearance similarities.

This chapter first describes the implementation of a colour- and a texture-based feature vector, while Sections 3.5 and 3.6 describe improvements for a robust tracker. The last section illustrates the feature-based algorithm.

3.1 Particles and features

Particles s are vectors s_i = [x_i, y_i, dx_i, dy_i, F], i = 1, ..., N, where (x, y) is the particle's position, (dx, dy) are the velocity components, F is the set of one or more features and N is the number of samples. A particle's features are obtained (a sample) at (x, y). The feature is extracted from a smaller region in the image which may contain the object. If the target feature is known, the samples are compared and a weight, directly proportional to their similarity, can be assigned to each particle. Particles can in general contain any number of features.

3.2 Colour-based feature

Good features are essential if an object is to be tracked successfully. Colour histograms model an object's colour distribution. Colour histograms have the advantage that objects can have non-rigid shapes or rotate in an environment and still be detectable, provided the colour distribution describing the object remains the same.

3.2.1 Colour model

Colour image samples are obtained in a red-green-blue (RGB) representation and converted to a hue-saturation-value (HSV) colour space. An HSV histogram model allows the intensity, V, to be handled separately. The advantage is that reflections and shadows, mostly present in V space, can be handled more robustly. A 2D hue-saturation (HS) and a 1D intensity (V) histogram represent the object's colour feature.

Weighted histogram

Non-rigid objects rarely have a rectangular shape. A kernel function is used to weigh specific positions in an image region differently. Defining a kernel function, for example, as

    k(µ) = { 1 − µ²  if µ < 1
           { 0       otherwise,          (3.2.1)

where µ is the normalised distance of a pixel to its region's centre, weighs the colour distribution of pixels on the edges less than those at the centre. Kernels such as the Epanechnikov, quartic (biweight), tricube (triweight) or Gaussian kernels could also be employed. In Figure 3.1, the change in radius µ illustrates how the kernel (3.2.1) weighs the image regions. Assuming that the most important information is contained around the centre of an object, this function is adequate and allows for partial occlusion at the edges. An image region, R_i, has a user-defined height and width, respectively, H_x and H_y.
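The kernel (3.2.1) can be sketched as follows; the function name is illustrative, and the normalisation by the region's circular radius anticipates the definitions given with (3.2.2).

```python
import math

def kernel_weight(x, y, Hx, Hy):
    """Kernel (3.2.1): k(mu) = 1 - mu^2 for mu < 1, else 0, where mu is
    the pixel's distance to the region centre, normalised by the region's
    circular radius a = sqrt((Hx^2 + Hy^2) / 4)."""
    d = math.hypot(0.5 * Hx - x, 0.5 * Hy - y)   # distance to region centre
    a = math.sqrt((Hx * Hx + Hy * Hy) / 4.0)     # circular radius of the region
    mu = d / a
    return 1.0 - mu * mu if mu < 1.0 else 0.0
```

Pixels at the centre of the region receive weight 1, while pixels at the corners receive weight 0, which is what makes the histogram tolerant of partial occlusion at the edges.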

The image region R_i is centred at (x_i + H_x/2, y_i + H_y/2). Note that if the entire image is used as a region, H_x and H_y describe the entire image.

Figure 3.1: Description of the weighting function calculation: (left) input image mask, (middle) distances from centre, (right) output weighting function.

An image histogram is built using the pixel values of the image patch, weighted by (3.2.1). Every pixel r = (x, y) in an image region R_i is binned in the histogram

    p_i[b] = f Σ_{r ∈ R_i} k(‖r − c_i‖ / a) δ[I(r) − b],          (3.2.2)

where c_i is the region centre, so that the distance from pixel (x, y) to the centre of region R_i is d_i = √((H_x/2 − x)² + (H_y/2 − y)²). Scaling d_i by the region's circular radius a = √((H_x² + H_y²)/4) ensures that the kernel function assigns the largest weights to pixels at the region's centre. Image I represents the weighted HS- and V-components. The δ function bins the pixels, for intensities in image I, into bins b.

The 2D HS-histogram is represented as an image, as illustrated in Figure 3.2. The HS-histogram image is divided into rectangles of equal size that represent the histogram bins. A high bin value in the image is proportional to a high colour intensity (white) and a low bin value is black. Representing the colour model in histogram space using (3.2.2), it is possible to compare

Figure 3.2: (left) input image, (middle) 2D hue-saturation histogram image using 50x50 bins, (right) V-histogram image using 50 bins.

feature samples while tracking. To preserve the recursive nature of the algorithm when new observations are introduced, a similarity measure is needed in the tracking estimate, as explained below.

Particle weight update

Given an object's colour feature histogram, the target appearance is known. The target appearance needs to be compared with the particle samples that represent the observations z_k in order to update p(x_k | Z_k). To compare the target model and the samples, the Bhattacharyya similarity, which measures the similarity of two discrete probability distributions, is used. The 2D HS-histogram and 1D V-histogram similarity values, ρ_hs and ρ_v respectively, are obtained by comparing a sample's histogram with the target's histogram model using the discrete Bhattacharyya coefficient

    ρ[p_i, q] = Σ_{b=1}^{B} √(p_i[b] q[b]),          (3.2.3)

where p_i are the sample histograms, q is the model histogram and B is the number of bins. Note that for the colour feature vector, F_i = p_i. Both p_i and q are treated as pdfs and normalised to sum to unity. When p_i and q are identical, the similarity is maximised, ρ = 1. The more similar the appearance of the target and model histograms, the higher the similarity measure ρ. The ρ_hs and ρ_v similarity values are combined using

alpha blending to weigh the histograms according to their importance,

    ρ = α × ρ_hs + (1 − α) × ρ_v.

To reduce the influence of lighting changes, the V-histogram is weighted less in the experiments and α = 0.7 is used. This value was suggested by [6] and confirmed by trial and error. A smaller value for α usually reduces accuracy while tracking, and α = 0.5 usually fails to track an object successfully. The Bhattacharyya distance is calculated as

    d_i = √(1 − ρ[p_i, q]).          (3.2.4)

This distance is used when calculating the particle weights w using a Gaussian. The weights of the sample set s are then

    w = (1 / (√(2π) σ)) e^(−d² / 2σ²),          (3.2.5)

where σ is a user-defined variable. When the variance is low, particles with high ρ are favoured when propagated to the next step. Choosing σ too small results in a degeneracy of the pdf.

3.3 Histogram of oriented gradients feature

The histogram of oriented gradients (HOG) was developed as a successful human detector [4]. The idea is that the gradients of an object contain shape and texture information that can be used to distinguish it from other objects. In this way, HOG captures an object's structure and texture in a feature vector that can be used to detect humans in a scene. The goal here is to use HOG features to track an object by comparing sample HOG features with a target appearance.
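The similarity-to-weight pipeline of (3.2.3)–(3.2.5) can be sketched as follows; sigma = 0.2 is an illustrative value, not one fixed by the thesis.

```python
import math

def bhattacharyya(p, q):
    """Discrete Bhattacharyya coefficient (3.2.3) between two normalised
    histograms: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(math.sqrt(pb * qb) for pb, qb in zip(p, q))

def particle_weight(rho_hs, rho_v, alpha=0.7, sigma=0.2):
    """Blend the HS and V similarities, convert to the Bhattacharyya
    distance (3.2.4), then to a Gaussian-shaped weight (3.2.5)."""
    rho = alpha * rho_hs + (1.0 - alpha) * rho_v
    d = math.sqrt(1.0 - rho)
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return norm * math.exp(-d * d / (2.0 * sigma * sigma))
```

The weight is monotone in the blended similarity, so particles whose sampled histograms resemble the target model receive proportionally larger weights before resampling.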

3.3.1 HOG model description

An image I is divided into uniformly spaced cell regions I_c. Cells may overlap and have a user-defined size C_x, C_y. The number of non-overlapping cells in an image region is then L = (H_x/C_x) × (H_y/C_y), where H_x and H_y are the dimensions of image I. For each cell I_c^i, i = 1, ..., L, a histogram of gradients is calculated. Gradients are detected by convolving with a filter mask [−1 0 1]. When dealing with colour images the gradients are calculated for each colour plane. The gradients are reduced to a single plane by selecting the pixel gradient value with the largest magnitude from each plane. Each cell bins the gradient values, weighted according to their magnitude. Combined, these cells form the HOG model's feature vector. The process is illustrated for a single cell in the following example, and the results are shown in Figures 3.4 and 3.5 for an entire image.

3.3.2 HOG illustrative example

A chequered board matrix function

             ( 1 0 1 )
    f(x,y) = ( 0 1 0 ),          (3.3.1)
             ( 1 0 1 )

represented as an image, is constructed. Differentiating f(x, y) is done in practice by convolution with the kernel functions

    K_x = [ −1 0 1 ]          (3.3.2)

and

    K_y = [ −1 0 1 ]ᵀ.          (3.3.3)

For clarity, in this example, we differentiate f(x, y) using

    ∂f(x, y)/∂x ≈ (f(x + 1, y) − f(x − 1, y)) / 2

and

    ∂f(x, y)/∂y ≈ (f(x, y + 1) − f(x, y − 1)) / 2.

Boundary cases are handled by padding the edges with the boundary values. Applying the filters above to f(x, y) we obtain, respectively,

    ( −1/2  0   1/2 )
    (  1/2  0  −1/2 )
    ( −1/2  0   1/2 )

in the x-direction and

    ( −1/2  1/2  −1/2 )
    (  0    0     0   )
    (  1/2 −1/2   1/2 )

in the y-direction. Viewing the components ∂f/∂x, ∂f/∂y in polar coordinates, a magnitude, shown in Table 3.1,

    |∇f| = √((∂f/∂x)² + (∂f/∂y)²),

and an angle, shown in Table 3.2,

    θ = arctan(∇f_y / ∇f_x),

are calculated, given here in degrees. A histogram of these calculated gradients, weighted by their magnitude, is constructed as shown in Figure 3.3. Each of these bins can also be represented as a vector with angle equal

Table 3.1: Magnitude values and corresponding image representation.

    ( 1/√2  1/2  1/√2 )
    ( 1/2    0   1/2  )
    ( 1/√2  1/2  1/√2 )

Table 3.2: Angle values (in degrees) and corresponding image representation.

    (  45  270  135 )
    ( 180    0    0 )
    ( 315   90  225 )

to the bin index and magnitude directly related to its bin value. Interesting results are observed when calculating a HOG for shapes with uniform colour and no texture, such as a filled rectangle or circle. Gradient information, available only on the edges of these shapes, creates a double edge (two neighbouring pixels contain gradient and magnitude information) in the image that results from the convolution using (3.3.2) and (3.3.3).

Figure 3.3: Example HOG descriptor for image f(x, y) using 3 bins. Each magnitude and corresponding angle is shown in every bin.

Figure 3.5 shows the HOG descriptor in vector form, for the input image in Figure 3.4, at different cell size selections. When the cell size is small, 2×2 pixels, detail is high and the edge information is clearly noticeable in cells

Figure 3.4: Example HOG feature steps: (left) entire input image, (middle) magnitude image, (right) angle image.

with dense gradient information. Note that selecting a small cell size such as 2 × 2 allows the feature to be compared at different scales by combining neighbouring cells into larger cells. For example, 10 × 10 cells can be combined from 5 × 5 groups of 2 × 2 cells without recalculation from the source image. Histogram bins can also be reduced by summing neighbouring bins. This less accurate representation might be necessary to calculate a feature more quickly to maintain real-time speeds. It is also important to note that each cell vector is associated with a position in the image. Note that the normalised HOG descriptor can be interpreted as a pdf with the following useful properties: in this normalised form the HOG is scale invariant and less dependent on the magnitude of the gradients. Special care should be taken when normalising the pdf for a uniform region where no gradients are present (all histogram bins equal zero). This is more likely to happen with smaller cell sizes. This situation is handled separately to ensure that the similarity comparison with such features is zero. Also, note that the HOG descriptor as described is not rotationally invariant. This is explained by the fact that a rotated object's edge gradient values are binned into different histogram bins, and usually not in the same cell. Note that the HOG cells prevent rotation from being detected by

Figure 3.5: HOG features at different cell sizes, using 36-bin histograms. The images show each cell's HOG graphically as vectors: (a) cell size 2x2, 60x80 histograms, (b) cell size 6x8, 20x20 histograms, (c) cell size 12x16, 10x10 histograms.

a linear shift of each histogram. Comparing two HOG descriptors quickly becomes a challenging problem. This observation is explained in the next section.

3.3.3 Similarity measure between HOG features

Comparison between two HOG vectors v_1, v_2 is done in a similar way to that of the colour-based feature vector. Representing each HOG vector cell I_c^i as a probability distribution, the Bhattacharyya similarity measure can be used. The cell similarity measures are combined into a single similarity value by taking the average over the cells,

    ρ = (1/L) Σ_{i=1}^{L} Σ_{b=1}^{B} √(v_1^i[b] v_2^i[b]),          (3.3.4)

where b indexes the gradient histogram bins, L is the number of cells and B is the number of bins. A comparison between the trained HOG model and the particle sample HOGs allows the similarity value ρ to be used to update the particle weights using (3.2.4) and (3.2.5). In practice, however, this approach fails to be an accurate measure for tracking an object, as discussed in the next section.

3.3.3.1 HOG similarity used in tracking

Dividing an image into cells ensures that changes in small parts of the image do not affect the entire feature. This is an advantage when using HOG for detection, as presented in [4]. When viewed in a particle filter tracking context, two challenging problems arise. Firstly, sampling at predicted locations does not, in general, sample at exactly the correct position. Consider the situation where a sample is taken just left of the actual object location. Then each of the cell histograms contains gradient information that is unaligned to the right of the model histogram, resulting in a low similarity. Secondly, histograms are binned using an image's gradient angles. Cell histograms are not rotationally invariant in such a situation. Again, samples

might be misaligned due to object rotation. This is not easy to deal with if multiple cells are used. Thus, instead of using multiple cells, we use a single cell for each region. Using the tracker's predictions of possible object locations, the single-cell HOG is used to find a similarity value. The advantage of this is threefold. Firstly, samples at inexact predicted locations that contain only part of the target might still have a large similarity. Secondly, object rotation can easily be handled by a linear shift. And thirdly, a much faster implementation is possible assuming that object rotation between frames is small: correlation then reduces to shifting the histogram bins one bin position left or right, respectively. For example, using 36 bins, an object can rotate 10 degrees without affecting the similarity value.

3.3.3.2 HOG similarity comparison

An experiment is done to determine how similar objects appear using the single-histogram HOG and the general HOG with different cell sizes. A subset of the ETH-80 dataset [10] is used to test how well HOG descriptors compare objects at different bin and cell sizes. Figure 3.7 shows the effects of bin and cell selection when comparing the objects in Figure 3.6, centred in an image. A pear image is chosen as a model in Figure 3.7 (a) and compared with other pear images. Each of the pear images is then compared with a tomato image and the results are shown in Figure 3.7 (b). The test is repeated with a cup image compared with each pear image. The similarity results for a range of different bin sizes are shown in Figure 3.7 (c). For each of these tests the single-vector (1-cell) HOG results are shown in Figure 3.7 (d). These results show a 5% better similarity when comparing pears with pears than when comparing tomatoes with pears, and a 10% better similarity than when comparing a cup with pears. Also, note that changing the number of bins does not affect these results.
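The single-cell HOG and the shift-tolerant comparison described above can be sketched as follows; this is a minimal pure-Python illustration (36 bins of 10 degrees each, central differences with replicated borders), not the thesis implementation.

```python
import math

def hog_cell(gray):
    """Single-cell HOG over a 2D grayscale list-of-lists: bin gradient
    angles (36 bins of 10 degrees), weighted by gradient magnitude,
    and normalise the histogram to a pdf."""
    h, w = len(gray), len(gray[0])
    hist = [0.0] * 36
    for y in range(h):
        for x in range(w):
            gx = (gray[y][min(x + 1, w - 1)] - gray[y][max(x - 1, 0)]) / 2.0
            gy = (gray[min(y + 1, h - 1)][x] - gray[max(y - 1, 0)][x]) / 2.0
            mag = math.hypot(gx, gy)
            if mag > 0.0:
                ang = math.degrees(math.atan2(gy, gx)) % 360.0
                hist[int(ang // 10.0) % 36] += mag
    total = sum(hist)
    return [v / total for v in hist] if total > 0 else hist

def shifted_similarity(h1, h2, max_shift=1):
    """Best Bhattacharyya similarity over small circular shifts of h2,
    so a small in-plane rotation (one bin = 10 degrees) is tolerated."""
    n = len(h1)
    best = 0.0
    for s in range(-max_shift, max_shift + 1):
        rolled = [h2[(i - s) % n] for i in range(n)]
        best = max(best, sum(math.sqrt(a * b) for a, b in zip(h1, rolled)))
    return best
```

A histogram rotated by one bin still scores a perfect similarity under the one-bin circular shift, which is the property exploited for cheap rotation handling.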

The results show that the similarity measurements suffer greatly when the number of cells is increased. A single-cell representation achieves the best results for all bin sizes tested. Tracking using a single-cell HOG vector improves performance, and an experiment is described in Section 3.9. In that experiment it is shown that a single-cell HOG feature is more robust, allowing for small translation and rotation errors in the tracker predictions.

Figure 3.6: Model images used to obtain the similarity results of Figure 3.7.

3.4 Motion model

Particles are propagated to the next step according to a dynamic motion model. A constant velocity model is used, without acceleration. Acceleration, handled by noise, is not considered since the state space becomes too high-dimensional and requires far too many samples, which is computationally expensive. Particles are propagated using

    x_k = A x_{k−1} + w_{k−1},          (3.4.1)

where A defines the deterministic parameters, w_{k−1} the stochastic part and k the time. We remind the reader that x_k is the state space representing the dynamics. Using (3.4.1) to propagate a particle with the motion model, its

Figure 3.7: HOG feature comparison at different bin and cell sizes. (a) Comparison results for the pear image set. (b) Comparison results of a tomato with pears. (c) Comparison results of a cup with pears. (d) Comparison results of cup, tomato and pears using only one cell.

(x, y) coordinate is updated using the velocity v,

    ( x_k )     ( x_{k−1} )
    (     ) = A (         ) + w_{k−1}.          (3.4.2)
    ( v_k )     ( v_{k−1} )

The value of A is user-defined to be either a random position model,

    A = ( 1 0 )          (3.4.3)
        ( 0 0 ),

or a constant velocity model, also used in [6],

    A = ( 1 1 )          (3.4.4)
        ( 0 1 ).

3.5 Feature adaptivity

Changes in lighting and in the shape of the object result in a bad representation of the histogram describing the object. Appearance changes of an object can be handled by adapting the model, which increases tracking robustness. Adaptivity as implemented and tested here is presented in [6], [8], [15]. When tracking an object in real-time, adapting the target model needs to be done automatically. The model q_k is adapted using

    q_k = α × s_k^j + (1 − α) × q_{k−1},          (3.5.1)

where s_k^j is the most likely object position at time k. Each target bin is blended with mixing factor α ∈ [0, 1] with sample j, the sample having the highest appearance similarity of all the samples. This is done for the HS- and V-histogram colour features and for the HOG feature. The choice of α is directly related to the confidence measure described in the following sections.
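The propagation step (3.4.2) with the two choices of A can be sketched per particle; the function signature and the noise value are illustrative, not taken from the thesis code.

```python
import random

def propagate(x, y, dx, dy, model="constant-velocity", noise=2.0):
    """Propagate one particle with x_k = A x_{k-1} + w_{k-1} (3.4.1).
    Constant velocity (3.4.4): A = [[1, 1], [0, 1]] per axis, so the
    velocity is added to the position. Random position (3.4.3):
    A = [[1, 0], [0, 0]], so the velocity is dropped and only the
    Gaussian noise w moves the particle."""
    if model == "constant-velocity":
        x, y = x + dx, y + dy
    else:
        dx = dy = 0.0
    x += random.gauss(0.0, noise)
    y += random.gauss(0.0, noise)
    return x, y, dx, dy
```

With the noise set to zero the deterministic part of each model is easy to verify by hand.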

Selective adaption is necessary to avoid adapting the target feature models in cases where the tracked object is lost. If the loss is undetected, the models will be incorrectly updated and become corrupted. This is clearly an undesirable effect. Automatic adaption is possible using a confidence measure, so that the model is only adapted if the system has high confidence that the object is being tracked. A slow adaption rate handles occlusion better, since the target model changes less over time. Fast appearance changes are handled when the rate of adaption is quick. Note that the rate at which adaption is applied affects the situations that can be handled by the tracker. Consider the situation where the tracked object moves behind a structure and the adaption is fast. While the object is lost from view, the target model is adapted incorrectly using the best predicted location. When the object reappears it might not be tracked correctly due to a bad representation of the target model.

In cases where the object is being tracked with high precision, the tracking pdf has a high peak and most samples are grouped together. On the other hand, a low certainty of object position is shown by a uniform pdf. In [6] the confidence is measured directly from the tracking pdf. The confidence measure is described by the degree of unimodality of the resulting pdf p(x | Z). A low confidence is measured when the pdf has a very uniform distribution. The particle weights approximate the confidence of the tracked pdf. This computationally simple confidence measure works well. However, failure can occur. When the background region's colours or textures are similar to the target model, or the size of the particles' image patches is considerably smaller than the region being tracked (which might have a uniform colour), confidence is low. In both cases the pdf becomes more uniform, and the confidence measure incorrectly signals a tracking loss.
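The selective adaptation of (3.5.1), gated on the confidence measure, can be sketched per histogram bin; the threshold value below is illustrative, not one fixed by the thesis.

```python
def adapt_model(q_prev, best_sample, confidence, threshold=0.6):
    """Selective model adaptation (3.5.1): blend the target histogram q
    with the best-matching sample's histogram only when the tracking
    confidence is high, using alpha = confidence."""
    if confidence < threshold:
        return list(q_prev)   # low confidence: leave the model untouched
    a = confidence
    return [a * s + (1.0 - a) * qb for s, qb in zip(best_sample, q_prev)]
```

Using the run-time confidence as alpha means a very confident tracker adapts quickly while an uncertain one barely moves the model, which is the behaviour Section 3.5 asks for.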
Experiments using different values for σ in (3.2.5) illustrated that the confidence measure is related to the choice of σ. Note that σ determines the variance in the position of the particles. A confidence value is obtained from a

threshold defined by

    S_σ < K (1/σ),

where σ is the user-defined value from (3.2.5), K a normalisation constant and S_σ the standard deviation of the tracking pdf. As mentioned previously, the choice of α is directly related to the confidence measure. Since the confidence value can be calculated at run-time, it is used as the value for α in (3.5.1). It is now clear that the target model is only adapted when the confidence is high. The next section describes how to recover from tracking failure. Both methods described above are used to calculate a confidence when testing whether to adapt the histogram model. These methods can also be used to determine whether an object is being tracked correctly.

3.6 Finding an object and detecting a loss

Assume that the object to be tracked is known. Then its features, available as a pdf, are also known. Finding the object's position in an image is then possible. Using the prior knowledge of the target histogram, a search for the object in the first frame can be done. The Bhattacharyya similarity measure (3.2.3) is used to compare the target model with every image region. These regions will have a low similarity when the object is not present and a high similarity when the object appears in the frames. A mean value µ and a standard deviation σ of the similarities over all the regions are calculated:

    µ = (1/M) Σ_{i=1}^{M} ρ[s_{R_i}, q],          (3.6.1)

    σ² = (1/M) Σ_{i=1}^{M} (ρ[s_{R_i}, q] − µ)²,          (3.6.2)

where s_{R_i} are samples calculated at each of the M regions in the image. Assuming a Gaussian distribution¹, an appearance threshold is defined in [14] as

    ρ[s_{R_i}, q] > µ + 2σ.          (3.6.3)

The appearance threshold indicates with 95% confidence that the region R_i is not part of the background. The particle filter is initialised in the region if more than a user-defined fraction of the sample set s meets the appearance threshold. The same rule is applied to detect when the tracker loses the object. When the object leaves the frame or becomes occluded for a couple of frames, condition (3.6.3) fails and the initialisation phase is entered again.

3.7 Perspective adaption

Adjusting the region size according to the object's perceived size is necessary to robustly track objects in a 3D environment. An object moving away from the camera changes size relative to the camera's perspective. Detecting whether an object is moving closer to or further away from the camera is done by sampling at different region sizes. The region size is sampled at ±2% of the region size (H_x, H_y) at the current best predicted location and compared with the target model. If a comparison is found to have a higher similarity to the feature models, the region size is adjusted. Note that features such as colour and HOG are scale invariant, so an adjustment of the region size does not affect the features. Also, it is useful to only adapt when there is a significant difference in the similarity value, to minimise computation. This adaption is not directly related to the confidence measure, but results in a higher confidence if the object is sampled at the correct region size, avoiding the inclusion of background, which leads to bad feature model representations.

¹The empirical rule states that for a normal distribution, about 68% of the values lie within 1 standard deviation of the mean, about 95% within two standard deviations and about 99.7% within 3 standard deviations.
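The appearance threshold of (3.6.1)–(3.6.3) and the initialisation/loss test can be sketched as follows; the function names and the 10% fraction (matching the f > 0.1 N test of Algorithm 3) are illustrative.

```python
import math

def appearance_threshold(similarities):
    """Compute mu + 2*sigma over the per-region similarities
    (3.6.1-3.6.3); regions scoring above it are, with roughly 95%
    confidence, not part of the background."""
    m = len(similarities)
    mu = sum(similarities) / m
    var = sum((r - mu) ** 2 for r in similarities) / m
    return mu + 2.0 * math.sqrt(var)

def object_found(sample_similarities, threshold, fraction=0.1):
    """Initialise (or keep) tracking only if more than a user-defined
    fraction of the sample set exceeds the appearance threshold."""
    hits = sum(1 for r in sample_similarities if r > threshold)
    return hits > fraction * len(sample_similarities)
```

The same test, evaluated every frame, doubles as the loss detector: when it fails, the initialisation phase is entered again.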

3.8 Algorithm

The implementation of Algorithm 3 follows the same steps as the basic particle filter of Algorithm 1, using the resampling step described in Section 2.5.2. The specialisation of the feature-based steps uses the models and rules described throughout this chapter.

1   # Initialisation step
2   q_k = get observation model at time k = 0
3   while true do
4       µ, σ from eqs. 3.6.1 and 3.6.2
5       f = Σ_{i=0}^{N} [ρ(p_k^i, q_k) > µ + 2σ]
6       if f > 0.1 N then
7           objectfound = true
8       end
9       # Measurement step
10      for i ← 1 to N do
11          s_{k−1}^i ← get particle samples, eq. 3.2.2
12          π_{k−1}^i ← assign particle weight, eq. 3.2.5
13      end
14      normalise π_{k−1}
15      # Robustness improvements: confidence measure
16      if objectfound and confidence > threshold then
17          adapt_sample_size()
18          q_{k+1} = adapt histogram(p_{k−1}, q_k), eq. 3.5.1
19      end
20      π_k ← resample pdf π_{k−1} using Algorithm 2
21      # Prediction step
22      s_k ← apply motion model, eq. 3.4.1
23  end

Algorithm 3: Feature-based particle filter algorithm
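Algorithm 3's overall control flow (measure, resample, predict; the adaptation and initialisation tests are omitted) can be illustrated with a toy 1D tracker. Everything here is a simplified sketch with a scalar "feature" (distance to a noisy measurement), not the thesis system.

```python
import math
import random

def track(measurements, n=100, sigma=5.0):
    """Toy 1D analogue of Algorithm 3: particles track a scalar position
    from noisy measurements via the measure/resample/predict loop."""
    random.seed(42)
    particles = [random.uniform(0.0, 100.0) for _ in range(n)]
    estimates = []
    for z in measurements:
        # Measurement step: Gaussian weight from distance to the observation
        weights = [math.exp(-(p - z) ** 2 / (2.0 * sigma ** 2)) for p in particles]
        total = sum(weights)
        weights = [w / total for w in weights] if total > 0 else [1.0 / n] * n
        estimates.append(sum(p * w for p, w in zip(particles, weights)))
        # Resampling step (systematic, as in Algorithm 2)
        cum, acc = [], 0.0
        for w in weights:
            acc += w
            cum.append(acc)
        lam, i, new = random.uniform(0.0, 1.0 / n), 0, []
        for _ in range(n):
            while i < n - 1 and lam > cum[i]:
                i += 1
            new.append(particles[i])
            lam += 1.0 / n
        # Prediction step: random position model plus Gaussian noise
        particles = [p + random.gauss(0.0, 1.0) for p in new]
    return estimates
```

Running it on a constant measurement shows the particle cloud contracting onto the observed position over a few iterations, which is the qualitative behaviour the feature-based tracker exhibits on real image features.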

3.9 Feature tracking experiment

The implementation of Algorithm 3 is tested using each of the feature types: colour, texture, and a combination of both. In the latter case, a user-defined weighting value is used to combine the features using alpha blending. For generality, an arbitrary number of features can be handled in this manner.

Experiment 1

A simulated test is done to accomplish the following:

- Colour object tracking using an HS-, V-histogram descriptor
- Texture object tracking using a HOG-histogram descriptor
- Combined feature tracking
- Tracking through a cluttered/noisy background
- Correct tracking with partial occlusion
- Correct tracking with full occlusion

As shown in Figure 3.8, the simulated test places four simple rigid shapes, two triangles and two rectangles, each following a circular path. All objects have constant movement and maintain their circular motion in their own direction. Each of the rectangle and triangle colour shapes intersects and overlaps with other shapes of the same colour. The white rectangle is the object being tracked. Figure 3.8 shows the rectangles at interesting positions, as well as the particles' X, Y movement along the circular path. The sequence runs for 620 frames. The particles' positions and weights are shown as red circles on the image, where the size of a circle is directly related to its weight.

Colour object tracking. It is clear that the colour-based tracking is likely to fail at some point due to background colours and other shapes with the same colour. The particle movement in the Y-direction, Figure 3.8 (c), of this sequence shows that at frame 80 the direction changes as the tracker

Figure 3.8: (a,b,c) Colour tracking, frame 80; (d,e,f) HOG tracking, frame 310; (g,h,i) combined features, frame 250. (a,d,g) Tracked particles' X-movement, (c,f,i) tracked particles' Y-movement. 50 particles are used with a zero velocity motion model.

confuses objects moving in opposite directions. The failure is the result of a higher similarity value for the white triangle than for the white square when the two objects cross (a partial occlusion).

HOG object tracking. The HOG tracker fails at frame 310 when the two rectangle shapes, coloured green and white, overlap (full occlusion). From Figures 3.8 (d, f) we see the sudden change of direction in particle movement at frame 310. From Figure 3.8 (e) it can be seen that the particles are distributed across both rectangles, each having a high similarity value as they move past each other.

Combined object tracking. When the features are combined, it is clear from the X- and Y-direction graphs, Figures 3.8 (g, i), that the particles track the correct object successfully throughout the sequence, through both partial occlusion (the rectangle moves under a triangle) and full occlusion (the white rectangle moves under the green rectangle).

Experiment 2. The goal of this experiment is to illustrate HOG feature adaption, as described in Section 3.5, to accurately track a rotating object. As described previously, a single-cell HOG feature is not rotationally invariant; however, object rotation can be detected by a linear shift of the histogram bins. This experiment tracks a rotating square object. The object rotates around its centre while following a circular path. The HOG feature is represented as a vector where the angle describes the histogram bin and the bin value the vector's magnitude. In Figure 3.9 the HOG feature is shown in each of the frames at the bottom right corner. Adapting the feature model is only done when the tracking confidence is high. A low tracking confidence is measured when only HOG is used. The reason for this is that the noisy background is similar to the object. The result is that adaption to the rotating object is not done, and tracking fails when the object has rotated more than the linear shift allows.
To increase the confidence, the colour feature is also used. From the particles' X and

Figure 3.9: Tracking experiment 2: HOG adaption. Tracking a square over 400 frames. Selected frames 1, 50, 100, 200, 300, 400 are shown. (Bottom left) X-position of particles. (Bottom right) Y-position of particles.

Y positions it is clear that tracking using the HOG adaption is successful when the colour feature is used to increase confidence. When the confidence is high, notice how the noisy edges in the background also become part of the model, seen as vectors at right angles. Feature adaption is successful, and the rotating object is tracked accurately.

3.10 Summary

Both colour and texture information are modelled as features that can be used to describe an object. The histogram methods used allow objects undergoing small shape, rotation, size or colour changes to be handled effectively. These features are clearly useful for tracking purposes, and combining features significantly improves results. It is also important to note that the process runs in real-time when the region sizes are small. Performance is mostly affected by the number of particles and the image region sizes that need to be processed to extract features.

Chapter 4

System implementation

Robustly tracking objects relies heavily on accurate features. The feature-based particle filter is most effective when the object is rigid, can only rotate in a 2D plane, and has constant colour and texture histograms. The self-adapting histogram components and confidence measure are added to handle realistic tracking scenarios more effectively. The next challenge is to obtain prior knowledge of the object's features and dynamic information about its movement. This chapter describes these challenges and presents an approach to integrate automatic feature extraction of moving objects and feature-based particle filtering into a single system.

4.1 System design and goals

Automatic object tracking relies heavily on robust object detection and, in our case, initialisation of motion and features. These different challenges are implemented in self-contained modules that need to be integrated into a system. A modular approach described in [7] is used, where a semantic ladder, built up from feature extraction to action recognition, describes the challenges as well as the system implementation. This intuitive design approach allows models to be created to handle a specific problem, where each step up the ladder relies on the previous step. In this way the system is dynamic, in that all components can be replaced as improvements

to technology and algorithms become available. Using this high-level design methodology, an automatic tracking system has been developed. Each of the following sections describes a module that, combined with the others, takes the system from initialisation to the tracking of an object. The design and implementation of the feature models from Chapter 3 are shown in Figure 4.1. Note that at each level the design is modular and easily extendable.

Figure 4.1: Feature-based tracking modules

An overview of the system modules is shown in Figure 4.2 and described in the following sections. Note that the focus of these sections is to detect an object of interest in a scene. Detection can also be used to track objects by means of repeated detection in every frame. This is very time-consuming, and it is computationally much quicker to track an object using prediction methods such as the particle filter. Real-time tracking is considered to

be at least 5 frames per second (fps), although speeds of more than 10 fps are achieved when using a single module. Most web cameras can theoretically perform at up to 30 fps, but realistically speeds of 5 to 20 fps are normal.

Figure 4.2: System module design

4.2 Background modelling

A background region contains objects that stay in the same place or in a bounded region over time, while foreground regions, or regions of interest, move around more freely. Information about a scene's background is useful to minimise noise during tracking or when extracting features. A background model can be defined as a reference structure that describes the background of a scene, the simplest structure being a time-averaged reference image from which

concurrent frames are subtracted. Concurrent subtraction of frames results in very noisy images that need to be cleaned, usually using thresholds. Obtaining an accurate background model using these computationally simple algorithms requires a training period within a controlled environment free of movement, foreground objects and illumination changes. Any change to the scene requires a re-estimation of the background, so this type of solution requires that the background be updated constantly. Background modelling is a separate field of research, and two popular background modelling techniques are investigated in this section and compared with a time-averaging method: a foreground object detector (FGD) [11] and an adaptive background mixture model [9]. These methods are chosen for their ability to dynamically model complex scenes and for their real-time execution. Both the FGD and the background mixture model algorithms are implemented in the OpenCV library.

4.2.1 Foreground Object Detection (FGD)

In a complex scene, possibly containing dynamic moving objects such as trees, background pixels can have multiple values. FGD integrates multiple features, whereas most other background modelling techniques use only one type of feature to model both static and dynamic parts. The FGD's focus is to model different parts of the background using different types of features; feature models for both static and dynamic background pixels are used. Extracting foreground objects from a complex scene is done using a Bayes decision rule which has been extended to deal with general features. It is then possible to classify both background and foreground pixels using multiple features.

Classification rule. A classification rule is formulated in general to classify a pixel as foreground or background. Following the notation in [11], let vk be a feature at time k located at position r = (x, y), where r is possibly

a background or a foreground pixel. Using Bayes' theorem, the posterior probability that the pixel r with feature vk belongs to the background b or the foreground f is

    P(C | vk, r) = P(vk | C, r) P(C | r) / P(vk | r),    (4.2.1)

where C = f or b. Classification of a pixel as foreground using Bayes' rule is then given by

    P(f | vk, r) > P(b | vk, r).    (4.2.2)

To classify a pixel at run-time as part of the foreground or background, the probabilities P(b | vk, r), P(vk | r) and P(vk | b, r) need to be trained. A table structure is used to store these statistics for every pixel in the image.

Table of feature statistics. In [11] a histogram of feature vectors is used to approximate P(vk | r) and P(vk | b, r), which are not known in general. A background pixel only takes a limited number of values, so these values are considered to be concentrated in a small subspace of the feature histogram. This indicates that, with a good feature selection, a background pixel can effectively be covered by a small number of histogram bins. Foreground pixel values, on the other hand, will not be as concentrated in these histogram bins and will in general be spread more widely. We therefore let P(v^i_k | r), i = 1, ..., N, be the first N bins of the feature histogram describing the multiple background values. A table of feature statistics is created to store the different feature histograms. The table S^{r,k}_{v_k} of feature statistics maintains three components for every pixel in an image,

    S^{r,k,i}_{v_k} = { p^{k,i}_v = P(v^i_k | r),
                        p^{k,i}_{v,b} = P(v^i_k | b, r),
                        v^i_k = [a_{i1}, ..., a_{in}]^T },

where a_{ij} are the different states that a feature can have. For a feature vk the table maintains the most significant portion, where there is the

highest concentration of pixel values, of the feature histogram. The table is maintained at each update of the background model. The probabilities needed to classify a pixel as foreground, P(b | vk, r), P(vk | r) and P(vk | b, r), are then known for each pixel from the feature table statistics.

Feature vectors. For a pixel classified as part of a static background, the colour is chosen as the feature to be stored in the table, and vk is substituted in (4.2.1) by c_k = [r_k g_k b_k]^T. Static backgrounds, where pixel values do not change over time, are simple to handle. The feature c_k is chosen if the first N entries in the feature table do not vary.

A moving background's pixel values change between frames. The colour co-occurrence of the change in pixel values between frames is then chosen as the feature vector, and vk is substituted in (4.2.1) by o_k = [r_{k-1} g_{k-1} b_{k-1} r_k g_k b_k]^T. Selecting the colour co-occurrence feature is based on the observation that, for a moving background, the pixel values vary greatly at the same location in an image. Both tables S^{r,k,i}_{c_k} and S^{r,k,i}_{o_k} are stored for every pixel to represent the multiple states. Representing the background using multiple states allows for alternating pixel values without noisy interference from foreground objects. In [11] the complete algorithm is discussed in detail.

4.2.2 Mixture of Gaussians background modelling

The adaptive mixture of Gaussians (MOG) method models the variation in pixel values using a Gaussian mixture model (GMM) consisting of up to K Gaussians, where 3 ≤ K ≤ 5. Each pixel in an image is modelled by a mixture of Gaussian distributions, with different Gaussians representing different colours. Recall that background pixels are present in a scene for longer periods; a weight w is therefore applied to each Gaussian, proportional to the time those colours stay in the scene. The idea is that a pixel is drawn from a GMM, allowing for multi-modal distributions of pixel values.
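As an illustration of this idea, the following minimal sketch evaluates the mixture probability of a single grayscale pixel value and applies a common matching heuristic (a value within 2.5 standard deviations of a high-weight component is treated as background). This is not the OpenCV implementation, and the component parameters below are hypothetical.

```python
import math

def gaussian(x, mu, sigma):
    # Normal density N(x | mu, sigma^2)
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def pixel_probability(x, components):
    # Mixture probability: p(x) = sum_i w_i * N(x | mu_i, sigma_i)
    return sum(w * gaussian(x, mu, sigma) for w, mu, sigma in components)

def is_background(x, components, n_sigma=2.5):
    # A value matching a high-weight component within n_sigma standard
    # deviations is treated as background (a common heuristic).
    return any(abs(x - mu) < n_sigma * sigma
               for w, mu, sigma in components if w > 0.5)

# Hypothetical trained model: a dominant colour and a rarer one.
components = [(0.8, 100.0, 5.0), (0.2, 180.0, 10.0)]
```

With this model, a pixel value near 100 is classified as background, while an outlier such as 250 matches no high-weight component and is treated as foreground.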
The first N most frequent occurrences of a specific colour are considered to represent the background model. The adaptive background mixture model is developed in [9] and builds on previous work done by Grimson and Stauffer [19]; it improves the update speed (learning time) of the background model.

MOG model. The K Gaussians at pixel r = (x, y) model the probability of the colour values c_k = [r_k g_k b_k]^T at time k, and we write

    p(c_k) = \sum_{i=1}^{K} w_i \eta(c_k | \mu_i, \Sigma_i),    (4.2.3)

    \sum_{i=1}^{K} w_i = 1,    (4.2.4)

as a one-dimensional GMM. Here w_i is the weight of the i-th Gaussian and \eta(c_k | \mu_i, \Sigma_i) is its normal distribution. Training is needed to find w_i, \mu_i and \Sigma_i, and the standard EM algorithms are used. The method is improved upon in [9] to speed up the learning time: a two-step process is used in the optimised equations. First, an estimate of the mixture model is computed with the EM algorithms. After this initial estimate, the updating step only considers the last L frames, allowing current changes in the scene to have a higher priority. This improved adaptive MOG adapts more quickly and has a learning time much shorter than that of [19].

4.3 Motion tracking

Motion tracking picks up constant motion based on repeated detection in every frame. Motion detection is used to find regions of interest, which are defined as regions consistently present in consecutive frames, so as to minimise noise. Since noise is considered random, it is assumed not to have constant motion and to be present in a sequence of frames only for short periods. Scenes composed of objects in constant movement in front of static backgrounds are assumed. Note that motion tracking refers to a

simple method for detecting objects in a scene, and the tracking only refers to the matching (keeping track) of these regions between frames.

4.3.1 Motion Detection

Motion detection is the first step in processing input frames. Connected components (blobs) in the motion image M are segmented into rectangular regions. From experiments it is found that many rectangular regions of the same moving object overlap. These overlapping rectangles are combined to form a larger rectangular region that completely bounds the object. Filtering out small regions is done after the overlapping rectangular regions have been combined. These regions are processed as described in the following section.

Motion detection builds up a motion image that captures the pixels that change between frames. Motion is detected by maintaining a sequence of the last consecutive frames in grayscale. A silhouette image S is calculated as the absolute difference between the frame at time k and its preceding frame at time k − 1, and then thresholded to remove small and isolated noisy regions. The motion image M is constructed and maintained by updating M using S:

    M(x, y) = { k,        if S(x, y) ≠ 0,
                0,        if S(x, y) = 0 and M(x, y) < k − D,
                M(x, y),  otherwise,

where D is the duration that pixels are allowed to be present in the scene. D is a user-defined constant value in milliseconds. A large value of D increases the time a pixel is present in M and usually results in a delayed shadow or ghost effect. A small value of D decreases the likelihood that slow motion is detected.
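The per-pixel update rule above can be sketched as follows. This is a minimal illustration, not the thesis code: images are stored as flat lists, and the timestamps k and duration D are assumed to be in the same units.

```python
def update_motion_image(M, S, k, D):
    # Apply the three-case update rule per pixel: stamp moving pixels
    # with the current time k, clear entries older than k - D, and
    # keep everything else unchanged.
    out = []
    for m, s in zip(M, S):
        if s != 0:
            out.append(k)        # motion detected now
        elif m < k - D:
            out.append(0)        # stale motion, older than duration D
        else:
            out.append(m)        # recent motion, keep its timestamp
    return out

# Hypothetical 4-pixel example at time k=100 with D=30: one pixel moves
# now, one holds a stale timestamp, one is recent, one is already clear.
M = [90, 50, 80, 0]
S = [1, 0, 0, 0]
M = update_motion_image(M, S, k=100, D=30)   # -> [100, 0, 80, 0]
```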

4.3.2 Motion tracking implementation

Regions detected by the motion algorithm are processed and tracked. Not all of these regions are consistent in their motion over a time period, and such regions need to be discarded. The process is divided into logical sections, described in each of the following steps: detecting tentative regions, confirming a tentative region, and preparing a region for initialisation of the feature extraction in the particle filter.

Step 1: Tentatives. Motion tracking keeps track of regions that have been detected. Newly detected regions, blobs of motion pixels grouped together, are labelled as tentative when they first appear. Matching of these regions to previously detected regions is done in a nearest-neighbour fashion: only regions within the tentative region's neighbourhood are tested for a match. Regions are matched by their width and breadth, and are allowed to change in size in consecutive frames. If there is no match to previous regions, the new region is given a timestamp and linked to a tentative list of regions.

Step 2: Confirmed. Continuously detected tentative regions are upgraded to confirmed regions if motion is present for a minimum time limit. Any region in the tentative or confirmed list is removed if it is not detected within that minimum time limit. This step has the advantage that background noise is quickly removed before a region is confirmed. In addition, a confirmed region is removed if, in consecutive frames, no newly detected region in its close proximity matches its size and velocity.

Step 3: Initialisation. Each region in this tracker has a history vector h describing its position (x, y), width and height (w, h) and speed components (vx, vy), such that h = (x, y, w, h, vx, vy) over a period of sequential frames. These parameters are used when confirmed regions are passed to the particle filter to automatically initialise the dynamics and extract features.
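The nearest-neighbour matching of Step 1 can be sketched as follows. This is a minimal illustration in which regions are (x, y, w, h) tuples; the distance and size-change tolerances are hypothetical parameters, not values taken from the thesis.

```python
def match_region(region, previous, max_dist=30.0, max_size_change=0.3):
    # Return the closest previously tracked region whose width and
    # height are within a relative tolerance, or None if nothing in the
    # neighbourhood matches.
    x, y, w, h = region
    best, best_d = None, max_dist
    for px, py, pw, ph in previous:
        d = ((x - px) ** 2 + (y - py) ** 2) ** 0.5
        similar = (abs(w - pw) <= max_size_change * pw
                   and abs(h - ph) <= max_size_change * ph)
        if similar and d < best_d:
            best, best_d = (px, py, pw, ph), d
    return best

# Hypothetical example: the new detection matches the nearby region of
# similar size, not the distant one.
previous = [(10, 10, 40, 80), (200, 50, 40, 80)]
match = match_region((15, 12, 42, 78), previous)   # -> (10, 10, 40, 80)
```

A detection left unmatched by this test would, as described above, be timestamped and appended to the tentative list.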
