Semi-interactive construction of 3D event logs for scene investigation


UvA-DARE is a service provided by the library of the University of Amsterdam (https://dare.uva.nl)

UvA-DARE (Digital Academic Repository)

Dang, T.K.

Publication date 2013


Citation for published version (APA):

Dang, T. K. (2013). Semi-interactive construction of 3D event logs for scene investigation.



Chapter 7

Summary and Conclusion

7.1 Summary

The aim of this thesis is to enhance scene investigation processes by developing automated techniques that support the investigator in capturing and reviewing the scene. Our contribution is a complete framework for building 3D event logs that capture the scene as well as the investigation process itself. We have studied the capabilities and limitations of existing methods and incorporated their successful components into the framework. From there, we have developed new automatic and semi-interactive methods. The research leading to the framework has been described in five chapters.

Core to the framework is the use of 3D models of the scene, as many tasks in scene investigation require or can benefit from such a model. For instance, objects can be measured off-site, without deciding in advance, while capturing, what needs to be measured. Complicated tasks like hypothesis validation absolutely require a 3D model of the scene. It would therefore be ideal to have an efficient method for modeling a scene from a set of images or, even better, from a video log of the investigation process. In Chapter 2, we show how to apply automatic methods to reconstruct a 3D model of a scene from videos. 3D reconstruction from images or video has been studied extensively in the literature. While good results have been shown in controlled environments using high-quality still images, we have identified several challenges of using video input in an uncontrolled environment. This chapter gives an overview of the complete process and reviews the related work for each step. By doing so, we identify the opportunities to apply 3D reconstruction from images and videos in scene investigation.

One aspect that determines the usefulness of a 3D model in scene investigation is its accuracy. In Chapter 3, we look into one of the factors affecting the accuracy of a reconstructed 3D model, namely the feature location accuracy. Many modern feature detectors, such as SIFT, SURF, and Hessian-Affine, detect blob-like structures. Thanks to their robustness, these blob detectors have become very popular in many applications. We focus on 3D reconstruction, an application where, next to robustness, accuracy is an important factor. We identify three factors influencing the accuracy, namely lens distortion errors, stochastic errors, and perspective errors. The first two are well studied in the literature, so we consider the perspective errors. Our analysis is based on the observation that blob detectors detect the centroids of the projections of representative blobs of structures as features, but because of perspective distortion, the centroid of a projection does not coincide with the projection of the centroid of the representative blob. We analyze the resulting erroneous localization using a simplified theoretical model, and show that the effect is small, but systematic and measurable, for six modern detectors. In addition, we predict two effects in 3D reconstruction, which we confirm in a typical experimental setup. In practical settings, the random errors and lens distortion errors have a higher impact. We conclude that in such practical cases priority should be given to reducing these other kinds of errors before finding ways to correct for the perspective errors.
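The perspective error can be illustrated with a toy geometric example. The sketch below is not the simplified theoretical model of Chapter 3. It assumes an ideal pinhole camera, a planar circular blob centred on the optical axis, and hypothetical function names. It compares the area centroid of the projected blob outline with the projection of the blob's true centre.

```python
import math


def project(point, focal=1.0):
    """Pinhole projection of a 3D point (camera at origin, looking along +z)."""
    x, y, z = point
    return (focal * x / z, focal * y / z)


def polygon_centroid(pts):
    """Area centroid of a simple polygon, computed with the shoelace formula."""
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross                 # accumulates twice the signed area
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return (cx / (3.0 * area2), cy / (3.0 * area2))


def perspective_drift(radius, depth, tilt_deg, n=2000, focal=1.0):
    """Offset between the centroid of a projected disc and its projected centre.

    The disc is centred at (0, 0, depth) and rotated by tilt_deg about the
    y axis, so a positive tilt moves its -x edge towards the camera.
    """
    t = math.radians(tilt_deg)
    boundary = []
    for k in range(n):
        phi = 2.0 * math.pi * k / n
        # Disc spanned by e1 = (cos t, 0, sin t) and e2 = (0, 1, 0).
        boundary.append((
            radius * math.cos(phi) * math.cos(t),
            radius * math.sin(phi),
            depth + radius * math.cos(phi) * math.sin(t),
        ))
    cx, cy = polygon_centroid([project(p, focal) for p in boundary])
    px, py = project((0.0, 0.0, depth), focal)
    return (cx - px, cy - py)


if __name__ == "__main__":
    for tilt in (0.0, 15.0, 30.0, 45.0):
        dx, dy = perspective_drift(radius=1.0, depth=5.0, tilt_deg=tilt)
        print(f"tilt {tilt:4.1f} deg: drift = ({dx:+.5f}, {dy:+.5f}) focal lengths")
```

In this configuration the drift vanishes for a fronto-parallel disc and grows with the tilt, always pointing toward the edge of the disc that is nearer to the camera. It stays small relative to the projected blob size, which is consistent with the qualitative conclusion above that the effect is small but systematic.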

Chapter 4 studies the intrinsic difficulties that occur when capturing indoor scenes, which are commonly encountered in scene investigation. Many methods exist for reconstructing outdoor scenes, but indoor scenes are more difficult. Due to the limited space for movement, the input is very often close to being degenerate. For such input, making a model is impossible without smart input processing. This chapter presents a framework for modeling indoor scenes. We first analyze the video to segment it into general and degenerate parts. Rather than ignoring degenerate segments, we utilize them to build a model that is far more complete than the ones obtained with traditional frameworks.

Though we have improved completeness when automatically reconstructing 3D models from videos of a scene, the result is still far from what is needed in practice. The accuracy of 3D models built by automatic methods is limited, and the visual quality varies. In Chapter 5, we present a semi-interactive method for 3D reconstruction, specialized for indoor scenes, which combines computer vision techniques with efficient interaction. As a starting point we use panoramas, which are popular for visualizing indoor scenes because of their large field of view, but which clearly cannot show depth. Exploiting user-defined knowledge, in terms of a rough sketch of orthogonality and parallelism in the scene, we design smart interaction techniques to semi-automatically reconstruct a scene from coarse to fine level. The framework is flexible and efficient. Users can build a coarse, textured walls-and-floor model in five mouse clicks, or a detailed model showing all furniture in a couple of minutes of interaction. We show reconstruction results for four different scenes. The accuracy of the reconstructed models is quite high, with around one percent error at full room scale. Thus, our framework is a good choice for applications requiring accuracy as well as applications requiring a mere 3D impression of the scene.
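To make concrete why a panorama alone carries no depth and how a little extra knowledge restores it, the following minimal sketch maps one clicked pixel to a 3D point on the floor. It is not the interaction technique of Chapter 5. It assumes an equirectangular panorama, a flat floor, and a known camera height, and all function names and the axis convention are hypothetical choices made for this illustration.

```python
import math


def pixel_to_ray(u, v, width, height):
    """Unit viewing ray for pixel (u, v) of an equirectangular panorama.

    Assumed convention: x right, y up, z forward; the image centre looks
    along +z and the top row of the image looks straight up.
    """
    lon = (u / width - 0.5) * 2.0 * math.pi   # longitude in [-pi, pi]
    lat = (0.5 - v / height) * math.pi        # latitude in [-pi/2, pi/2]
    return (
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
        math.cos(lat) * math.cos(lon),
    )


def floor_point(u, v, width, height, camera_height):
    """3D point where the ray through (u, v) meets the floor y = -camera_height.

    Returns None for pixels on or above the horizon, whose rays never
    reach the floor.
    """
    dx, dy, dz = pixel_to_ray(u, v, width, height)
    if dy >= 0.0:
        return None
    t = -camera_height / dy                   # scale factor along the ray
    return (t * dx, -camera_height, t * dz)


if __name__ == "__main__":
    # A pixel 45 degrees below the horizon, straight ahead, camera 1.5 m high.
    print(floor_point(1000, 750, 2000, 1000, camera_height=1.5))
```

A pixel only fixes a viewing direction. The assumed floor plane supplies the missing constraint that turns that direction into a position, which is the same role the user-provided orthogonality and parallelism sketch plays in the text above.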

Chapter 6 presents the core result of the thesis, a complete framework that uses video logs and 3D models to enhance scene investigation processes. In scene investigation, creating a work log using a handheld camera is more convenient and more complete than using photos and notes. By introducing video analysis and computer vision techniques, it is possible to build a system that enables users to navigate through the logs of an investigation in time and space. To complete the framework developed in the previous chapters, we develop methods for processing video logs and present an interface for navigating the result. The processing includes (i) segmenting a log into events using a novel structure and motion feature, so that it is more accessible in the time dimension, and (ii) mapping video frames to a 3D model of the scene, so that the log can be navigated in space. Our results show that, using our proposed features, we can recognize more than 70 percent of all frames correctly, and that we find all the events. From there, we provide a method to semi-interactively map those events to a 3D model of the scene, with which we can map more than 80 percent of the events. The result is a spatio-temporal representation of the investigation that supports applications such as revisiting the scene, examining the investigation itself, or hypothesis testing.
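The gap between frame-level accuracy and event-level recall can be illustrated with a small sketch. It does not reproduce the structure and motion feature or the classifier of Chapter 6. It assumes that some classifier has already produced one label per frame, and it uses a plain majority-vote filter (an assumption made here, not the thesis's procedure) to show how isolated frame errors can be absorbed when frames are grouped into events. The function names and the labels "A" and "B" are hypothetical.

```python
from collections import Counter
from itertools import groupby


def smooth_labels(labels, window=5):
    """Majority-vote filter over a sliding window centred on each frame."""
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        votes = Counter(labels[max(0, i - half): i + half + 1])
        smoothed.append(votes.most_common(1)[0][0])
    return smoothed


def frames_to_events(labels):
    """Collapse runs of identical labels into (label, start, end) events.

    The end index is exclusive, so an event covers frames start..end-1.
    """
    events, start = [], 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        events.append((label, start, start + length))
        start += length
    return events


if __name__ == "__main__":
    # Two true events with two misclassified frames (indices 4 and 13).
    noisy = list("AAAABAAAABBBBABBBB")
    print(frames_to_events(noisy))                  # six fragmented runs
    print(frames_to_events(smooth_labels(noisy)))   # two events
```

With this toy input, the raw labels fragment into six runs, while the smoothed labels yield the two underlying events. This is one way that a moderate per-frame accuracy can still be compatible with finding every event.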

In summary, Chapters 2, 3, and 4 lead up to our main contribution in Chapters 5 and 6: a solution for constructing 3D event logs for scene investigation.

7.2 Conclusion and Future Work

To find out to what extent automatic methods can be used for accurately reconstructing a 3D model from video logs, we have accomplished three things: a thorough review of the literature, a quantitative analysis of feature location accuracy, and an improved method for 3D modeling from video in indoor scenes. All elements for an automatic 3D reconstruction system are in principle available. For some steps there are even several options to choose from, but how well those elements work together to meet the requirements of scene investigation, in terms of accuracy and completeness, is an unanswered question. Searching for an improvement in accuracy led us to a previously unstudied source of feature location uncertainty, the perspective drift in blob detectors. Fortunately, through theoretical and experimental study, we found that in practice the errors caused by perspective drift are small. This implies that we can hardly improve the accuracy of automatic 3D reconstruction by looking at the feature processing step. To improve the completeness of 3D models reconstructed automatically from videos, instead of throwing away degenerate input, we detect it and use it in the reconstruction. In short, though we can improve automatic methods to reconstruct a scene from video more completely, the limited accuracy of the result is still an obstacle. As fully manual model construction is cumbersome, this implies that any practical solution requires a semi-interactive approach.

We have shown that semi-interactive methods indeed yield a practical approach to constructing 3D models for use in scene investigation. Our panorama-based semi-interactive 3D reconstruction system for indoor scenes requires only simple and intuitive interaction to effectively build accurate 3D models. It is suitable for a broad range of applications, from a coarse model created in a few seconds for a quick presentation, to a detailed model for measurement in crime scene investigation. There are still limitations, such as the inability to model complex objects. This, however, could be counteracted by other, more expensive techniques or portable equipment, as objects that have fine details are usually small.

Detecting and understanding investigation events can be done automatically using suitable features in combination with machine learning. We obtain more than 70 percent accuracy when classifying frames into events. Quality depends on both scene content and user intention, but current results are promising. While the proposed solution is shown to be effective in locating important events, recovering the investigation story accurately is still an issue. As the framework for identifying investigation events is quite standard, we think that more advanced machine learning and interaction techniques could further improve the purity of the resulting investigation stories.

To uncover the spatial relations between investigation events and 3D models, we again combined computer vision techniques with user interaction. This semi-interactive approach allows us to map all events to a 3D model of the scene, making it possible to build a 3D event log of a scene. Such a log gives users the new ability to navigate investigation logs in both space and time. This qualitatively changes the way users interact with investigation logs. Naturally, to bring this 3D event log into practice, the next step would be to quantify its benefit by evaluating user experience when navigating 3D event logs.

The framework we presented has the potential to drastically change the way scene investigation is performed, improving both its quality and its completeness. With the resolution and quality of head-mounted cameras and mobile phones rapidly increasing, the whole process of capturing the scene will become easy and unobtrusive. With limited interaction effort, 3D event logs can be created, which opens up several new applications and possibilities for any professional performing scene investigation.