
6.7 Conclusions and recommendations

The problem of estimating the motion of a feature point has two aspects: the accuracy of the result and the convergence of the tracker. The accuracy is well addressed by the standard feature point detectors: corner-like points can be matched accurately, and the Harris corner operator is a good example.
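For reference, a minimal sketch of the standard Harris response in Python with numpy/scipy; the constant k = 0.04 and the Gaussian window scale are conventional choices, not necessarily the exact settings used in this chapter:

import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(img, sigma=1.0, k=0.04):
    """Harris corner response R = det(Z) - k * trace(Z)^2, where Z is the
    structure matrix of the image gradients averaged over a Gaussian window."""
    img = img.astype(np.float64)
    gy, gx = np.gradient(img)                # gradients along rows and columns
    Zxx = gaussian_filter(gx * gx, sigma)    # elements of the structure matrix Z,
    Zyy = gaussian_filter(gy * gy, sigma)    # smoothed over a window of scale sigma
    Zxy = gaussian_filter(gx * gy, sigma)
    return Zxx * Zyy - Zxy ** 2 - k * (Zxx + Zyy) ** 2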

A well-conditioned matrix Z assures low sensitivity to noise, but only if the tracking converges. Our new goodness measures are related to the convergence of the tracker. Consequently, the new measures should be used as an additional check to improve the selection. In practice, the effect is that 'clean' corners, like L-corners from a structure in a man-made environment, which are usually fairly stable during tracking, are preferred. Feature points from a local area with rapidly changing and inconsistent gradients are usually unreliable and are discarded by the new criteria. An important case is areas with repetitive structures (an example is discussed in the previous section).

Tracking can easily switch to a similar structure in the neighborhood, and such errors would be difficult to detect.

Having no prior knowledge about the scene, we select small patches (here 7 × 7 pixels) for tracking and assume no deformation between successive frames. For the feature point candidates (local maxima of $IR_{Harris}$, or possibly $IR_{SUSAN}$) we compute the additional $IR_{SCR}$ or $IR_{Harris}(\sigma)$ (blurred $IR_{SUSAN}(\sigma)$ is also possible) and discard the problematic ones. By blurring the images we consider a larger neighborhood of a feature point. When calculating $IR_{SCR}$ we use only the initial image and therefore also consider a larger neighborhood. Although the 'no deformation' assumption between frames then has less chance of being valid, using the larger neighborhood in this way seems to be useful in practice. The AUC results (table 6.1) show large improvements for the Harris detector $IR_{Harris}$ with the additional check using $IR_{SCR}$ or the blurred $IR_{Harris}(\sigma)$. The empirical combination of the new measures leads to even better results for our data set.
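The two-stage selection can be sketched as follows. The sketch reuses harris_response from above and substitutes the blurred Harris measure for the additional check, since the exact $IR_{SCR}$ formula is not restated here; the window radius and the median threshold are illustrative choices:

import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def select_features(img, n_points=100, sigma_blur=2.5, radius=3):
    """Stage 1: candidates are positive local maxima of the Harris response.
    Stage 2: candidates whose response on the blurred image is weak are
    discarded (standing in for the chapter's additional checks)."""
    img = img.astype(np.float64)
    R = harris_response(img)                                   # candidate measure
    R_blur = harris_response(gaussian_filter(img, sigma_blur)) # check on a larger neighborhood
    is_max = (R == maximum_filter(R, size=2 * radius + 1)) & (R > 0)
    ys, xs = np.nonzero(is_max)
    scores = R_blur[ys, xs]
    keep = scores > np.median(scores)          # arbitrary illustrative threshold
    ys, xs, scores = ys[keep], xs[keep], scores[keep]
    order = np.argsort(-scores)[:n_points]     # strongest surviving candidates
    return list(zip(ys[order].tolist(), xs[order].tolist()))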

Figure 6.2: Improvement using SCR. (Two ROC plots, true positive vs. false positive rate: the Harris panel compares Harris+SCR against Harris, the SUSAN panel compares SUSAN+SCR against SUSAN; both include the theoretical random-guess line.)

Figure 6.3: Improvement with blurring. (Left, 'IR-Harris with blurring': min(λ1, λ2) as a function of the blurring scale σ for points #4 and #12. Middle, 'Harris with blurring': ROC curves, true positive vs. false positive rate, for Harris+Harris(σ=3.5), Harris+Harris(σ=2.5), Harris+Harris(σ=1.0) and Harris(σ=1.5). Right, 'combination': ROC curves for Harris+SCR, Harris+Harris(σ=2.5) and Harris+(SCR+log(Harris(σ=2.5))).)

Corner Selection + Additional check        AUC (Standard Error)
Harris + (SCR + log(Harris(σ = 2.5)))      0.77 (0.010)
Harris + SCR                               0.73 (0.011)
Harris + Harris(σ = 2.5)                   0.72 (0.011)
Harris + Harris(σ = 3.5)                   0.70 (0.011)
SUSAN + SCR                                0.67 (0.012)
Harris + Harris(σ = 1.0)                   0.63 (0.012)
Harris(σ = 1.5)                            0.58 (0.013)
Harris                                     0.56 (0.013)
SUSAN                                      0.50 (0.013)

Table 6.1: Area under the ROC curve, comparison
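The AUC values above summarize each ROC curve in a single number. As a point of reference, a minimal sketch of how an AUC can be computed as the Mann-Whitney statistic over the goodness scores of convergent vs. non-convergent features; this is a generic computation, not necessarily the exact procedure used for table 6.1:

import numpy as np

def roc_auc(scores_good, scores_bad):
    """AUC equals the probability that a randomly chosen 'good' (convergent)
    feature scores higher than a 'bad' one; ties count as one half."""
    sg = np.asarray(scores_good, dtype=float)
    sb = np.asarray(scores_bad, dtype=float)
    wins = (sg[:, None] > sb[None, :]).sum()
    ties = (sg[:, None] == sb[None, :]).sum()
    return (wins + 0.5 * ties) / (sg.size * sb.size)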


Figure 6.4: Image sequences (artichoke, backyard, building01, building02, charge, cil-forwardL, hotel, house, marbled block, unmarked rocks).

Bibliography

[1] A. Azarbayejani and A. Pentland. Recursive estimation of motion, structure, and focal length. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(6), 1995.

[2] S. S. Beauchemin and J. L. Barron. The computation of optical flow. ACM Computing Surveys, 27(3):433–467, September 1995.

[3] D. Beymer, P. McLauchlan, B. Coifman, and J. Malik. A real-time computer vision system for measuring traffic parameters. In Proceedings of the IEEE Conf. on Computer Vision and Pattern Recognition, 1997.

[4] A. P. Bradley. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30(7):1145–1159, 1997.

[5] R. Fletcher. Practical Methods of Optimization. J. Wiley, 1987.

[6] A. Fusiello, E. Trucco, T. Tommasini, and V. Roberto. Improving feature tracking with robust statistics. Pattern Analysis and Applications, 2(4):312–320, 1999.

[7] G. D. Hager and P. N. Belhumeur. Efficient region tracking with parametric models of geometry and illumination. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(10):1025–1039, 1998.

[8] C. Harris and M. Stephens. A combined corner and edge detector. In Proceedings of the 4th Alvey Vision Conference, pages 147–151, 1988.

[9] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2000.

[10] B. K. P. Horn and B. G. Schunck. Determining optical flow: a retrospective. Artificial Intelligence, 59(1-2):81–87, January 1993.

[11] H. Jin, P. Favaro, and S. Soatto. Real-time feature tracking and outlier rejection with changes in illumination. In Proceedings of ICCV, pages 684–689, 2001.

[12] A. Kuijper and L. M. J. Florack. Understanding and modeling the evolution of critical points under Gaussian blurring. In Proceedings of the 7th European Conference on Computer Vision, pages 143–157, 2002.

[13] T. Lindeberg. Feature detection with automatic scale selection. International Journal of Computer Vision, 30(2):77–116, 1998.

[14] B. D. Lucas. Generalized Image Matching by the Method of Differences. PhD thesis, Dept. of Computer Science, Carnegie-Mellon University, 1984.

[15] B. D. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proceedings of IJCAI, pages 674–679, 1981.

[16] M. Otte and H.-H. Nagel. Estimation of optical flow based on higher-order spatiotemporal derivatives in interlaced and non-interlaced image sequences. Artificial Intelligence, 78(1/2):5–43, November 1995.

[17] C. Schmid, R. Mohr, and C. Bauckhage. Evaluation of interest point detectors. International Journal of Computer Vision, 37(2):151–172, 2000.

[18] J. Shi and C. Tomasi. Good features to track. In Proceedings of the IEEE Conf. on Computer Vision and Pattern Recognition, pages 593–600, 1994.

[19] S. M. Smith and J. M. Brady. SUSAN - a new approach to low level image processing. International Journal of Computer Vision, 23(1):45–78, 1997.

[20] Y. Song, L. Goncalves, and P. Perona. Unsupervised learning of human motion models. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, 2002.

[21] C. Tomasi and T. Kanade. Detection and tracking of point features. Technical Report CMU-CS-91-132, Carnegie Mellon University, 1991.

[22] C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization approach. International Journal of Computer Vision, 9(2):137–154, 1992.

[23] M. Trajkovic and M. Hedley. Fast corner detection. Image and Vision Computing, 16(2):75–87, 1998.

[24] M. Vidyasagar. Nonlinear Systems Analysis. Prentice-Hall, 1993.

Chapter 7

3D Object Tracking

A simple method is presented for 3D head pose estimation and tracking in monocular image sequences. A generic geometric model is used. The initialization consists of aligning the perspective projection of the geometric model with the subject's head in the initial image. After the initialization, the gray levels from the initial image are mapped onto the visible side of the head model to form a textured object. Only a limited number of points on the object is used, allowing real-time performance even on low-end computers. The appearance changes caused by movement under the complex lighting conditions of a real scene present a serious problem for fitting the textured model to the data from new images. With real human-computer interfaces in mind, we propose a simple adaptive appearance-changes model that is updated by the measurements from the new images. To stabilize the model, we constrain it to a neighborhood of the initial gray values. The neighborhood is defined using some simple heuristics.
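The exact update rule and the heuristics are given later in the chapter; as a rough illustration only, the following sketch shows the basic idea of blending new measurements into the per-point gray values and clamping them to a band around the initial values. The function name and the parameters alpha and delta are illustrative assumptions, not the chapter's actual model:

import numpy as np

def update_appearance(model, measured, initial, alpha=0.1, delta=20.0):
    """One adaptive-appearance step: blend the current per-point gray values
    toward the new measurements, then clamp them to +/- delta around the
    initial values so the model cannot drift arbitrarily far."""
    blended = (1.0 - alpha) * model + alpha * measured
    return np.clip(blended, initial - delta, initial + delta)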

7.1 Introduction

The reconstruction of the 3D position and orientation of objects in monocular image sequences is an important task in the computer vision community. This chapter concentrates on 3D human head tracking. The applications we have in mind are: model-based coding for video conferencing, view stabilization for facial expression recognition, and various possible human-computer interface applications. However, the approach proposed here can be applied in general to rigid object tracking in 3D.

In the initialization procedure we align our generic geometric head model with the observed subject's head. This can be done manually, or automatically by using some other algorithm [12]. For new images in the sequence, tracking consists of estimating the human head pose with respect to this initial pose.

Because of the perspective projection of standard cameras, it is possible to estimate the 3D pose from the 2D image data. We use an initially aligned generic geometric 3D head model. Therefore, as described later, the 3D pose is estimated only up to a scaling factor. However, this is of no importance for the applications we are considering.
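The scale ambiguity follows directly from the perspective projection equations. With focal length $f$, a point $(X, Y, Z)$ in camera coordinates projects to
\[
x = \frac{fX}{Z}, \qquad y = \frac{fY}{Z},
\]
and scaling the whole configuration by any $s > 0$ leaves the image unchanged:
\[
\frac{f \, sX}{sZ} = \frac{fX}{Z}.
\]
A model that is s times larger and s times farther away produces the identical image, so the pose can only be recovered up to this common factor.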

This chapter is organized as follows. Related work is presented in the next section. Then the geometric part of our model-based approach is described. The adaptive radiometric model is presented in section 7.6. Finally, the whole algorithm is described and some experimental results are discussed.