Enhancing Projected Augmented Reality Interactions using Haptic Technology


Nicolas Legrand

Student number: 01710212

Supervisors: Prof. dr. ir. Filip De Turck, Dr. ir. Femke De Backere

Counsellors: Ir. Joris Heyse, Dr. ir. Maria Torres Vega

Master's dissertation submitted in order to obtain the academic degree of Master of Science in Information Engineering Technology

Academic year 2019-2020


Acknowledgements

I would like to express my deep gratitude to ir. Joris Heyse for providing me with persistent help and guidance as a mentor throughout the past year. I extend my immense gratitude to my supervisor, dr. ir. Femke De Backere, for her valuable tips and feedback. In addition, I would like to thank the rest of my thesis committee members, prof. dr. ir. Filip De Turck and dr. ir. Maria Torres Vega, who also made this dissertation possible.

I am extremely grateful to my parents, Jerry and Anne Marie Legrand, who have supported me throughout my entire life. Their love, care and sacrifices enabled my education and prepared me for the future. I am also very thankful to my girlfriend Latifa Obda for her constant support over the past two years.

Special thanks to all the teachers from whom I have learned so many things.


Permission for use of content

“The author(s) gives (give) permission to make this master dissertation available for consultation and to copy parts of this master dissertation for personal use.

In all cases of other use, the copyright terms have to be respected, in particular with regard to the obligation to state explicitly the source when quoting results from this master dissertation.”


Enhancing Projected Augmented Reality Interactions using Haptic Technology

by

Nicolas Legrand

Master’s dissertation submitted in order to obtain the academic degree of Master of Science in Information Engineering Technology

Academic year 2019–2020

Supervisors: Prof. dr. ir. Filip De Turck, dr. ir. Femke De Backere

Counsellors: Ir. Joris Heyse, dr. ir. Maria Torres Vega

Faculty of Engineering and Architecture

Ghent University

Abstract

Augmented Reality (AR) is a versatile technology with a lot of potential that enables the integration of virtual objects into 3D environments in real-time. However, a common problem with AR systems is the lack of tactile feedback when interacting with computer-generated objects. In this paper, the focus lies on projection-based AR. Not much research has been done about the addition of haptic feedback to Projected Augmented Reality (PAR) specifically, which could potentially improve user experience, accuracy and speed while interacting with projected virtual objects. A haptic-enabled PAR system is implemented and evaluated in terms of latency, performance and accuracy in this study. The results affirm that it is a suitable framework with a lot of potential.

Keywords


Improved Projected Augmented Reality Interaction through Haptic Technology

by Nicolas Legrand

Dissertation submitted in order to obtain the academic degree of Master of Science in Information Engineering Technology

Academic year 2019–2020

Supervisors: Prof. dr. ir. Filip De Turck, dr. ir. Femke De Backere

Counsellors: Ir. Joris Heyse, dr. ir. Maria Torres Vega

Faculty of Engineering and Architecture

Ghent University

Summary

Augmented Reality (AR) is a versatile technology with a lot of potential that enables the integration of virtual objects into 3D environments in real-time. A common problem with AR systems is the lack of tactile feedback when interacting with computer-generated objects. In this paper, the focus lies on projection-based AR. Research into the addition of haptic feedback to Projected Augmented Reality (PAR) is limited, even though it could improve user experience, accuracy and speed during interaction with projected virtual objects. A haptic-enabled PAR system is implemented and evaluated in terms of latency, performance and accuracy in this study. The results confirm that it is a suitable framework with a lot of potential.

Keywords


Enhancing Projected Augmented Reality Interactions using Haptic Technology

Nicolas Legrand

Supervisor(s): Prof. dr. ir. Filip De Turck, dr. ir. Femke De Backere, ir. Joris Heyse, dr. ir. Maria Torres Vega

Abstract: Augmented Reality (AR) is a versatile technology with a lot of potential that enables the integration of virtual objects into 3D environments in real-time. However, a common problem with AR systems is the lack of tactile feedback when interacting with computer-generated objects. In this paper, the focus lies on projection-based AR. Not much research has been done about the addition of haptic feedback to Projected Augmented Reality (PAR) specifically, which could potentially improve user experience, accuracy and speed while interacting with projected virtual objects. A haptic-enabled PAR system is implemented and evaluated in terms of latency, performance and accuracy in this study. The results affirm that it is a suitable framework with a lot of potential.

Keywords: Projected augmented reality, table-top, haptic gloves

I. INTRODUCTION

AUGMENTED REALITY (AR) is one of today’s most enchanting and futuristic technologies in the field of computer science and application development [1]. It enables the integration of computer-generated information in views of the real world, which is useful in a wide array of applications in various fields such as healthcare, entertainment and architecture [2]. One of the most commonly accepted definitions of AR describes it as a technology that possesses three key characteristics [2]: (i) a display allowing the combination of real and virtual images; (ii) the ability to generate interactive graphics and respond to user input in real-time; (iii) the ability to make the virtual image appear fixed in the real world. AR should not be mistaken for Virtual Reality (VR), in which the view of the real world is completely replaced by computer-generated graphics. AR systems usually rely heavily on computer vision techniques. Augmentation not only refers to adding objects to a real environment but potentially to removing objects as well [2].

A common problem with AR is the lack of tactile feedback when interacting with computer-generated objects [3]. While this does not render AR systems unusable, enhancing AR experiences using haptics can be a major added value [4]. The word haptics is derived from the Greek word haptikos, which means “tactile” or “pertaining to the sense of touch”. It is a technology in the world of computer interface devices and looks promising, as many think it will change the way people communicate ideas and interact with information [5]. Even though research has been conducted on the use of Head-Mounted Displays (HMDs) in combination with haptic technology [6], as of today, not much can be found about projection-based AR systems where virtual haptic stimuli are provided during interactions.

The goal of this research is to design a framework for the study and evaluation of the effects of haptic feedback on Projected Augmented Reality (PAR). In PAR, physical objects are “augmented” by overlaying virtual view images directly on their surfaces. A haptic-enabled PAR system, as shown in Figure 1, consists of a surface area in front of the user above which a projector and a camera are placed. The user interacts with virtual objects projected on the surface area using his/her hands. The camera tracks the position of the physical objects as well as the user’s movements, gestures and interactions such as hovering, touching and grasping. Haptic feedback is provided to the user while wearing haptic gloves.

Fig. 1: PAR system scheme (labelled elements: projector, camera, user, virtual object, haptic gloves, surface)

The expected effect of haptic feedback on PAR is formulated in four hypotheses which the designed PAR system can be used to verify. Each hypothesis suggests an improvement while using a haptic glove over not using one:

Hypothesis 1 (H1): Receiving haptic feedback reduces the amount of time taken by the user to perform tasks compared to performing the same tasks without receiving haptic feedback.

Hypothesis 2 (H2): Receiving haptic feedback while performing tasks improves the user’s error recovery time compared to performing the same tasks without receiving haptic feedback.

Hypothesis 3 (H3): Receiving haptic feedback while performing tasks improves the user’s accuracy compared to performing the same tasks without receiving haptic feedback.

Hypothesis 4 (H4): Receiving haptic feedback while performing tasks improves the user’s perceived usability compared to performing the same tasks without receiving haptic feedback.

The rest of this paper is structured as follows. First, related work concerning PAR is covered in Section II. Second, a high-level overview of the PAR system implemented in this study is provided in Section III, including use cases, software architecture, and hardware components. Implementation details of the PAR system such as processing steps and challenges faced are described in Section IV before proceeding to an evaluation of the system in terms of accuracy, performance and latency in Section V. A conclusion is finally given in Section VI.

II. RELATED WORK

Xiao et al. [7] aim to achieve the vision of ubiquitous computing. To this end, they designed WorldKit, a library that leverages a depth camera paired with a projector to instantly convert ordinary surfaces into interactive ones. The users are able to “paint” interactive elements onto the environment with their hands. The framework is interesting for developers as it offers a set of abstractions that facilitates the creation of interfaces. The system is able to recognize only a very limited set of objects; expanding the set of everyday objects it can recognize would introduce considerable new possibilities.

A more portable implementation of PAR was developed by Harrison et al. [8] at Microsoft Research. The team created a wearable projection system named OmniTouch which consists of a depth camera and a pico-projector and is worn on the user’s shoulders. The virtual images are projected on the user’s hand or on surfaces in front of the user. Their work is different from most other PAR systems as it shows that PAR can be a quite mobile and flexible technology.

Punpongsanon et al. [9] developed SoftAR, a PAR system which provides pseudo-haptic feedback. The sense of softness perceived by the user is manipulated using visual techniques. By projecting the surface deformation and change in appearance when the user touches a physical object, they succeed in making him/her believe it is softer than it actually is. The manipulation of senses other than sight using PAR holds a lot of potential.

Mousavi et al. [10] show an example of how useful PAR can be for patients during post-stroke rehabilitation using a table-top home-based PAR system. Patients interact with the system to practice basic motions such as touching, pushing, pulling, grasping, lifting and pouring while wearing colored markers on their fingertips to facilitate the computer vision aspect of the system. As future work, the researchers suggested the addition of haptic feedback to make the interaction process more realistic and improve the effectiveness of the rehabilitation system [10].

As mentioned earlier, while research has been conducted about the relevance of haptics in AR and VR in general [4], [11], [12], not much was found concerning PAR specifically. This led to the main question this paper sets out to answer: Does the addition of haptic technology improve PAR?

III. PROJECTED AUGMENTED REALITY SYSTEM

A high-level overview of the PAR system implemented for this study is provided in this section. First, the system’s use cases are described in Section III-A. Second, the software architecture is discussed in Section III-B before finally covering the hardware components in Section III-C.

A. Use Cases

Three use cases are designed to verify the hypotheses described in the introduction.

A.1 Trace the Line

The user is asked to trace a line with his/her fingertip. The user has to reach the other end of the line while staying as close as possible to it. While wearing a haptic glove, haptic stimuli in the form of vibrations are delivered to the user when he/she deviates from the line. The intensity of the vibrations is based on the distance from the line. Figure 2a shows an example of a line generated by the application and the user’s index finger location tracked during a test run.
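The distance-based feedback described above can be illustrated with a minimal sketch. It assumes the line is available as a list of 2D points in image coordinates and that the vibration intensity grows linearly with the deviation; the function names and the `tolerance` and `max_distance` thresholds are hypothetical and are not taken from the implemented system.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b (2D tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Projection of p onto the segment, clamped to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_polyline(p, points):
    """Shortest distance from p to a polyline given as a list of points."""
    return min(point_segment_distance(p, a, b)
               for a, b in zip(points, points[1:]))


def vibration_intensity(distance, tolerance=5.0, max_distance=50.0):
    """Map a deviation in pixels to an intensity in [0, 1].

    Assumption: no vibration within `tolerance` pixels of the line and
    full intensity from `max_distance` pixels onwards.
    """
    if distance <= tolerance:
        return 0.0
    if distance >= max_distance:
        return 1.0
    return (distance - tolerance) / (max_distance - tolerance)


line = [(0, 0), (100, 0)]
deviation = distance_to_polyline((50, 30), line)
print(deviation, vibration_intensity(deviation))
```

A linear mapping is only one possible choice; a stepped or exponential mapping would fit the same description.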

A.2 Obstacle Course

The user is asked to drag an object from one end of the canvas to the other and drop it without letting it collide with obstacles along the way. Upon collision, the user receives haptic stimuli in the form of vibrations when wearing the haptic gloves. Slight pulses are also delivered based on the distance between the dragged object and the closest obstacle. Figure 2b provides an example of a generated obstacle course as well as the tracked fingertips and virtual objects during a test run.

A.3 Simon Says

Three colored blocks are projected in front of the user. Instructions are projected about which one of the colored blocks the user should touch. The goal is for the user to touch the right colored block during five rounds. The gloves vibrate for a short duration upon touching the correct colored block and vibrate for a longer duration upon touching a wrong block. Figure 2c illustrates how the user is given instructions to touch colored blocks and the tracked fingertips during a test run.

B. Architecture

The system consists of multiple software components that need to interact with each other. Figure 3 provides an overview of the structure of the application, on which the Orchestration component can be seen. The architecture is based on the Mediator Pattern, wherein the Orchestration component acts as Mediator and encapsulates the interaction between most other components. The pattern was chosen to avoid tight coupling between the set of interacting components and to be able to easily change the interaction between them independently.

The Camera component is responsible for everything related to the Red, Green, Blue - Depth/Distance (RGB-D) camera, from basic configuration to fetching color and depth frames from the device. The Display component contains projector and display related code and is responsible for all visual output of the system, i.e. everything the user sees. The system undergoes a calibration phase before the user is able to interact with it, and all code related to that can be found in the Calibration component. The main goal is to make sure the input from the camera is interpreted correctly and that the projection is accurate. Two kinds of models are used in the application. First, the Physical Objects component contains software counterparts of physical objects in the real world recognized by the system. They are used to process interactions with the system. Second, the Virtual Objects component contains computer-generated objects with which the user is able to interact. Virtual objects are mostly basic shapes possessing basic attributes such as location and size. They all expose an interface which allows the object to be drawn on an image at the right location to be projected afterwards.

The Detection component is responsible for the detection of physical objects: hands and fingertips. The detection is based on the system’s main input, the color and/or depth frames captured by the camera. Two detection methods are implemented, a markerless and a marker-based approach. Detected objects need to be tracked to get an idea of how fast they are moving. The Tracking component contains tracking code to monitor physical objects across frames and build a history of their locations. The Interaction component is used to detect and process interactions between virtual objects and physical objects as well as virtual objects interacting with other virtual objects. The Haptics component is responsible for providing haptic stimuli to the user. This component exposes an interface to control the vibration of the haptic gloves. The Experiments component contains code to control the flow of experiments. It makes use of virtual objects and event handlers attached to them to do so. Finally, the Orchestration component coordinates the main application loop and leverages most of the other components to form a processing pipeline from frame capture to image projection.

Fig. 2: Use cases of the PAR system: (a) Trace the Line, (b) Obstacle Course, (c) Simon Says

Fig. 3: Software components (Camera, Calibration, Detection, Tracking, Physical Objects, Virtual Objects, Interaction, Haptics, Display, Experiments and Orchestration, connected by the operations Capture Frames, Calibrate System, Detect Physical Objects, Track Physical Objects, Create, Update, Process Interactions, Vibrate, Draw, Project Image and Control Flow)
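The role of the Orchestration component as Mediator can be illustrated with a small sketch. The class names mirror the component names above, but every method, the frame representation and the 100 ms pulse are hypothetical stand-ins rather than the actual implementation. The point is only that the components never reference each other; the mediator alone wires them into a pipeline.

```python
class Detection:
    """Stand-in detector: frames are dicts with pre-labelled fingertips."""

    def detect(self, frame):
        return frame["fingertips"]


class Interaction:
    """Stand-in interaction check: a touch is an exact position match."""

    def touched(self, fingertips, target):
        return target in fingertips


class Haptics:
    """Stand-in haptics: records the requested pulses instead of vibrating."""

    def __init__(self):
        self.pulses = []

    def vibrate(self, duration_ms):
        self.pulses.append(duration_ms)


class Orchestration:
    """Mediator: the only class that knows about the other components."""

    def __init__(self, detection, interaction, haptics):
        self.detection = detection
        self.interaction = interaction
        self.haptics = haptics

    def step(self, frame, target):
        """Run one iteration of the (simplified) processing pipeline."""
        fingertips = self.detection.detect(frame)
        if self.interaction.touched(fingertips, target):
            self.haptics.vibrate(100)  # hypothetical 100 ms pulse
            return True
        return False


haptics = Haptics()
pipeline = Orchestration(Detection(), Interaction(), haptics)
print(pipeline.step({"fingertips": [(10, 20)]}, target=(10, 20)))
```

Because each component only exposes its own interface, swapping the detector or the haptic device would, under this design, only require changes in the mediator.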

C. Hardware

The setup consists of a desktop computer, an RGB-D camera, a projector, a haptic glove and a white canvas. The desktop computer has an i5-4690 CPU, an Nvidia GeForce GTX 970 GPU and 8 GB of RAM, and runs Windows 10. As for the projector, the LG PB60G was chosen for its price-quality ratio and convenience. The Intel RealSense D435 was chosen as RGB-D camera as it seemed to be a solid option with good developer support. It offers simple out-of-the-box integration and quality depth for a variety of applications, including AR. Finally, the Noitom Hi5 VR Glove was chosen to provide haptic feedback to the user as it possesses an easily programmable vibration rumbler at each wrist. The camera and the projector are mounted next to each other on tripods behind the canvas at a height of 1 meter.

IV. IMPLEMENTATION

The implementation of the PAR system is covered in more detail in this section. First, the calibration process needed for the correct registration of virtual objects in the real world is described in Section IV-A. Second, the main processing steps enabling interaction between the user and virtual objects are described. Fingertip detection and tracking are covered in Section IV-B and Section IV-C respectively. Section IV-D then provides information about how the system processes interactions between virtual and physical objects. An overview of essential libraries and frameworks is given in Section IV-E before finally covering the challenges faced during development in Section IV-F.

A. Calibration

The system first undergoes a calibration phase when it is launched, before the user is able to interact with it. The calibration process has two main goals: (i) obtaining a transformation matrix and (ii) obtaining a depth map of the background.

The transformation matrix is used to map camera frames to the projection image. This is necessary because the camera and the projector operate in different image coordinate systems. A homography can be estimated that maps each point of the camera image to the projector image. A homography is a mapping between two structures of the same type in projective geometry [13]. Figure 4 shows an example of images from two different viewpoints related by homography. This transformation matrix can then be used later on to convert all incoming camera frames to match the projection image coordinate system.

The transformation matrix is a 3 x 3 matrix that can be written as follows:

$$H = \begin{bmatrix} h_{00} & h_{01} & h_{02} \\ h_{10} & h_{11} & h_{12} \\ h_{20} & h_{21} & h_{22} \end{bmatrix} \quad (1)$$

Fig. 4: Images of the same plane from two viewpoints related by homography (a point P on the real world plane is seen as p(x1, y1) in image 1 from viewpoint O1 and as p'(x2, y2) in image 2 from viewpoint O2)

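Considering a first set of corresponding points, $(x_1, y_1)$ in the projection image and $(x_2, y_2)$ in the camera image, the transformation matrix maps them in such a way that:

$$\begin{bmatrix} x_2 \\ y_2 \\ 1 \end{bmatrix} = H \begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix} = \begin{bmatrix} h_{00} & h_{01} & h_{02} \\ h_{10} & h_{11} & h_{12} \\ h_{20} & h_{21} & h_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix} \quad (2)$$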
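As an illustration of Equation (2), the sketch below applies a 3 x 3 matrix H to a single point using homogeneous coordinates. The example matrix is made up for the demonstration and is not a calibration result. The division by the third component is the usual normalization step, which Equation (2) leaves implicit by writing the third component as 1. Estimating H from point correspondences is a separate step; OpenCV provides `cv2.findHomography` for that purpose, although the text does not state which function was used.

```python
def apply_homography(H, point):
    """Map a 2D point through the 3 x 3 matrix H (list of three rows)."""
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    # Normalize so that the third homogeneous component becomes 1.
    return (xh / w, yh / w)


# Hypothetical matrix: scale by 2, then translate by (10, 5).
H_example = [
    [2.0, 0.0, 10.0],
    [0.0, 2.0, 5.0],
    [0.0, 0.0, 1.0],
]

print(apply_homography(H_example, (3, 4)))
```

Multiplying every entry of H by the same non-zero factor leaves the mapped point unchanged, which is why the normalization matters in general.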

The depth data from the camera is used to detect when users touch the canvas. Each depth frame obtained from the camera is a two-dimensional matrix representing the distance of objects from the viewpoint. During the calibration process, the second goal is to store a mapping of how far away the canvas (background) is, in order to differentiate background and foreground objects and support interaction between the user and virtual objects.

B. Fingertip Detection

Two approaches are implemented for fingertip detection and tracking: a marker-based and a markerless approach. For the marker-based implementation, the user wears fluorescent stickers on his/her fingertips, which helps the system to easily detect them with color filtering. The markerless approach is based on the Convex Hull Algorithm described in Gurav’s paper [14]. This method also makes use of color filtering, but on skin and glove color instead.
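To give an idea of what a convex hull computation involves, the sketch below computes the hull of a small set of 2D contour points with Andrew's monotone chain algorithm. It is not the method from [14] and not the implemented detector; it only illustrates the geometric building block, under the assumption that the hand contour is already available as a list of points (for instance after color filtering). In hull-based approaches, the hull vertices typically serve as fingertip candidates.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull of 2D points; vertices start at the smallest (x, y)
    point and follow increasing x along the first chain."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


contour = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
print(convex_hull(contour))
```

Interior points such as (2, 2) and (1, 3) are discarded, leaving only the four corner points of the square.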

C. Fingertip Tracking

Markerless and marker-based fingertip detection only provide information about the location and shape of fingertips in individual frames. To calculate a fingertip’s speed or displacement across consecutive frames, it needs to be identified and tracked, for which Centroid Tracking can be used. Centroid Tracking makes use of the Euclidean distance between new centroids and old centroids from previous frames to identify them and build a location history for each fingertip [15].
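A simplified version of this idea is sketched below. The class name, the greedy nearest-neighbour matching, and the `max_distance` and `history_length` parameters are assumptions made for the illustration; the tracker described in [15] and the implemented one may differ, for example in how disappeared objects are handled.

```python
import math


class CentroidTracker:
    """Greedy centroid tracker keeping a short location history per ID."""

    def __init__(self, max_distance=50.0, history_length=30):
        self.max_distance = max_distance      # maximum jump between frames
        self.history_length = history_length  # positions kept per track
        self.next_id = 0
        self.tracks = {}                      # track ID -> list of (x, y)

    def update(self, centroids):
        """Assign the centroids of a new frame to existing or new tracks."""
        unmatched = list(centroids)
        for history in self.tracks.values():
            if not unmatched:
                break
            last = history[-1]
            nearest = min(unmatched, key=lambda c: math.dist(last, c))
            if math.dist(last, nearest) <= self.max_distance:
                history.append(nearest)
                # Keep only the most recent positions.
                del history[:-self.history_length]
                unmatched.remove(nearest)
        # Any centroid left over starts a new track.
        for centroid in unmatched:
            self.tracks[self.next_id] = [centroid]
            self.next_id += 1
        return self.tracks


tracker = CentroidTracker()
tracker.update([(10, 10), (100, 100)])
print(tracker.update([(102, 101), (12, 11)]))
```

With a location history per fingertip, speed can be estimated from the distance between the last two positions divided by the frame interval.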

D. Processing Interactions

Virtual objects are projected, and the user can interact with them using his/her fingertips. The implemented system supports hovering over, touching, pinching and dragging the projected virtual objects. Callback functions can be attached to virtual objects and are called when the objects are hovered over, touched, or pinched and dragged. These functions are also used to provide haptic feedback to the user where necessary.
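The callback mechanism can be sketched as follows. The `VirtualObject` class, its circular shape, the event names and the `dispatch` method are hypothetical and only illustrate how handlers attached to an object might trigger haptic feedback.

```python
import math


class VirtualObject:
    """Circular virtual object with callbacks per interaction type."""

    EVENTS = ("hover", "touch", "drag")

    def __init__(self, x, y, radius):
        self.x, self.y, self.radius = x, y, radius
        self._callbacks = {event: [] for event in self.EVENTS}

    def on(self, event, callback):
        """Attach a callback taking (virtual_object, fingertip_position)."""
        self._callbacks[event].append(callback)

    def contains(self, point):
        return math.hypot(point[0] - self.x, point[1] - self.y) <= self.radius

    def dispatch(self, event, point):
        """Call the handlers for `event` if the fingertip is on the object."""
        if not self.contains(point):
            return False
        for callback in self._callbacks[event]:
            callback(self, point)
        return True


log = []
button = VirtualObject(50, 50, 10)
# In a real system this handler would ask the haptics layer to vibrate.
button.on("touch", lambda obj, point: log.append(("vibrate", point)))
button.dispatch("touch", (52, 51))
print(log)
```

Keeping the haptic call inside a callback means that use cases can change their feedback without touching the interaction detection code.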

E. Essential Libraries & Frameworks

PAR relies heavily on Computer Vision techniques, for which the Open source Computer Vision (OpenCV) library is leveraged. OpenCV is a cross-platform library written in C++ containing a large number of programming functions, mainly aimed at real-time computer vision, and it is used extensively around the world [16].

The Intel RealSense D435 camera comes with the Intel RealSense Software Development Kit (SDK) 2.0¹ and cross-platform support. A Python wrapper of the SDK is available under the name pyrealsense2 and is included in the main application. The SDK offers extensive control over the camera and various options to configure and capture color and depth images.

As for the haptic gloves, Noitom makes SDKs² available for Unity and Unreal Engine only. A workaround is applied to control the haptic gloves: the system is split into two applications. It consists of a main Python application, supported by a smaller Unity application in which the Hi5 Interaction SDK is included. The Python application and the Unity application communicate using ZeroMQ³, which is a high-performance asynchronous cross-platform messaging library available for Python as well as C#. That way, the main Python application can be used to trigger haptic stimuli by sending a simple message to the Unity application.
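The Python side of such a bridge could look roughly like the sketch below. The message format, the endpoint, the request-reply socket type and both function names are assumptions; the text only states that a simple message is sent to the Unity application via ZeroMQ. The pyzmq package is imported inside the sending function so that the message construction can be tried without it.

```python
import json


def build_vibration_message(hand, duration_ms):
    """Build a JSON command for the Unity side (hypothetical format)."""
    if hand not in ("left", "right"):
        raise ValueError("hand must be 'left' or 'right'")
    return json.dumps({
        "command": "vibrate",
        "hand": hand,
        "duration_ms": int(duration_ms),
    })


def send_vibration(hand, duration_ms, endpoint="tcp://localhost:5555"):
    """Send a vibration command and wait up to one second for a reply.

    Assumes a ZeroMQ REP socket is listening at `endpoint` in the Unity
    application; both the endpoint and the pattern are illustrative.
    """
    import zmq  # pyzmq, imported lazily

    context = zmq.Context.instance()
    socket = context.socket(zmq.REQ)
    socket.setsockopt(zmq.LINGER, 0)       # do not block on close
    socket.setsockopt(zmq.RCVTIMEO, 1000)  # reply timeout in milliseconds
    socket.connect(endpoint)
    try:
        socket.send_string(build_vibration_message(hand, duration_ms))
        return socket.recv_string()
    finally:
        socket.close()


print(build_vibration_message("left", 150))
```

A fire-and-forget pattern such as publish-subscribe would avoid waiting for a reply, at the cost of not knowing whether the command arrived.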

F. Challenges

A few challenges surfaced during the implementation of the PAR system. The processing steps performed can be quite resource-intensive. To ensure a smooth experience for the user, the aim is to process as close as possible to 30 Frames per Second (FPS). To achieve this, the processing pipeline had to be optimized. By making use of cProfile, which is a Python profiler, the bottlenecks were identified. The bitmap operations and depth image post-processing needed to detect interactions were taking the most time.

The camera input resolution was reduced from 1280 x 720 pixels to 640 x 480 pixels to shorten the time spent waiting for the post-processing of depth frames, which indeed reduced the delay considerably. The time taken by bitmap operations in the application had to be reduced as well. The projector outputs images at 1280 x 800 pixels, and all incoming frames from the camera were being transformed to match this output resolution. However, the processing time of bitmap operations scales quadratically with the resolution. To reduce it, incoming frames were transformed to 40 % of the projector output size instead. After performing the processing steps, the output image was scaled up for projection. The number of bitmap operations was reduced as well by replacing them with mathematical calculations wherever possible. The post-processing of incoming depth frames also took a toll on performance, mainly the aligning and spatial filtering applied to each incoming depth frame. To remedy the issue, the pipeline was modified to process depth frames only every 5 frames. A summary of the optimizations made can be found in Table I.

¹ https://www.intelrealsense.com/sdk-2/

² https://hi5vrglove.com/


TABLE I: Optimization Details

Setting                 Unoptimized          Optimized
RGB resolution          1280 x 720 pixels    640 x 480 pixels
Depth resolution        1280 x 720 pixels    640 x 480 pixels
Projection resolution   1280 x 800 pixels    512 x 320 pixels
Depth processing rate   Every frame          Every 5 frames
Collision detection     Binary masks         Enclosing circle distances and binary masks
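The projection resolution in Table I follows from the 40 % scale factor, and the arithmetic can be checked with a few lines. The helper names are hypothetical, and the frame-skipping function is only one simple way to process depth frames every 5 frames.

```python
def scaled_resolution(width, height, scale):
    """Resolution after scaling both dimensions by `scale`."""
    return round(width * scale), round(height * scale)


def pixel_ratio(original, scaled):
    """Fraction of the original pixel count left after scaling."""
    return (scaled[0] * scaled[1]) / (original[0] * original[1])


def should_process_depth(frame_index, every_n=5):
    """True for the frames on which depth post-processing is run."""
    return frame_index % every_n == 0


projector = (1280, 800)
working = scaled_resolution(*projector, 0.4)
print(working, pixel_ratio(projector, working))
print([i for i in range(12) if should_process_depth(i)])
```

Scaling both dimensions to 40 % leaves 16 % of the pixels, which is why a moderate reduction in resolution has a large effect on per-pixel operations.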

V. SYSTEM EVALUATION

The implemented PAR system is assessed to confirm whether it is reliable enough to perform user tests. An evaluation of the system’s accuracy, performance and latency is covered in Section V-A, Section V-B and Section V-C respectively. The markerless and marker-based approaches are compared to each other, and the impact of the optimizations in terms of FPS is discussed.

A. Accuracy

The accuracy of the system is important as it can impact the results of user testing later on. The marker-based and the markerless implementations are compared to each other based on detection rate and on how accurate the position detection is.

The system does not always detect the user’s fingertips. A few factors come into play, such as lighting, motion blur and occlusion. The detection rate is defined here as the ratio of frames where the fingertips are correctly detected by the system to the total number of frames where the fingertips are visible. A test is designed to determine the detection rate of the marker-based approach versus the markerless one. A start and an end point are projected onto the canvas and the user drags his/her hand across the canvas from one point to the other, while assuming a “pinching” stance with his/her fingers. The test is performed thirty times in both normal and low lighting conditions to obtain sufficient sample sizes. Figure 5a shows how both implementations scored in normal and low lighting conditions. The markerless implementation scored an average of 93.70 % correct detections while the marker-based implementation scored 99.31 %, which is nearly perfect. In low lighting conditions, the markerless implementation’s detection rate decreased to 61.59 % while the marker-based approach performed perfectly. The marker-based approach is clearly less sensitive to changes in lighting conditions. Another notable observation is that the variance is much higher for the markerless approach.

The system is able to pinpoint the user’s fingertip location with a certain margin of error. The aim is to determine how far off the detection is from the actual fingertip location. Accuracy is defined as the difference between the detected fingertip position and the actual fingertip position. A test is implemented to compare the markerless and marker-based implementations. During the evaluation, thirty random frames in which visible fingertips are detected are selected for each run. The frames are manually annotated with fingertip locations, which form the ground truth. The distance from the ground truth is calculated for each finger in each selected frame. The results of the fingertip position accuracy experiments are shown in Figure 5b. The markerless implementation averages 2.58 and 2.92 pixels of deviation from the ground truth for single-finger and two-finger detection respectively. The marker-based approach scored better here as well, with 1.31 and 1.74 pixels of deviation for single-finger and two-finger detection respectively. Here again, a higher variance can be observed for the markerless approach, whether for the detection of one or two fingers, meaning it is less precise. Those values are based on a scaled-down projection image with a resolution of 512 x 320 pixels. The image is scaled back up to 1280 x 800 pixels for projection. The final size of the image projected on the canvas is about 38.4 x 24 cm, which translates to margins of error of 0.10 to 0.13 cm for the marker-based implementation.
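The conversion from pixel deviation to centimeters can be reproduced with the figures given above (a 512 pixel wide working image projected at a width of about 38.4 cm). The function name is hypothetical, and the markerless values in centimeters are derived here for comparison; they are not reported in the text.

```python
def pixels_to_cm(deviation_px, image_width_px=512, projected_width_cm=38.4):
    """Convert a deviation in working-image pixels to centimeters."""
    return deviation_px * projected_width_cm / image_width_px


deviations_px = {
    "marker-based, 1 finger": 1.31,
    "marker-based, 2 fingers": 1.74,
    "markerless, 1 finger": 2.58,
    "markerless, 2 fingers": 2.92,
}

for label, pixels in deviations_px.items():
    print(f"{label}: {pixels_to_cm(pixels):.2f} cm")
```

Each working-image pixel corresponds to 0.075 cm on the canvas, in both the horizontal (38.4 / 512) and the vertical (24 / 320) direction.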

Fig. 5: Marker-based and markerless approach comparison: (a) detection rate (%) in normal and low light, (b) deviation (pixels) for 1 finger and 2 fingers

B. Performance

A jump from 9.45 FPS to 23.36 FPS is noted owing to the optimizations described in Table I. The execution time of the processing pipeline for the optimized implementation is also analysed in more detail. The processing pipeline is split in three: frame capturing, processing and image projection. The resulting data can be seen in Table II. Capturing frames takes barely any CPU time, which indicates that frames are captured and buffered on the camera itself. The processing of the input frames takes about 31.89 ms and projecting the output takes roughly 12.25 ms on average.

TABLE II: Execution Time and Latency

Metric                  Mean
Execution Time
  Capturing frames      0.01 ± 0.04 ms
  Processing            31.89 ± 0.61 ms
  Projecting image      12.25 ± 0.58 ms
Latency
  Monitor-projector     41.67 ± 10.67 ms
  Camera-projector      179.67 ± 10.48 ms
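As a rough cross-check of the execution times in Table II, the mean stage durations can be summed and converted into a frame rate. The sketch assumes that the three stages run strictly one after another, which is a simplification; the variable and function names are hypothetical.

```python
# Mean execution times from Table II, in milliseconds.
STAGE_MS = {
    "capturing": 0.01,
    "processing": 31.89,
    "projecting": 12.25,
}


def total_time_ms(stage_ms):
    """Time for one pass through the pipeline if stages run sequentially."""
    return sum(stage_ms.values())


def frames_per_second(stage_ms):
    """Frame rate implied by the sequential pipeline time."""
    return 1000.0 / total_time_ms(stage_ms)


print(f"{total_time_ms(STAGE_MS):.2f} ms per frame")
print(f"{frames_per_second(STAGE_MS):.2f} FPS")
```

Under this assumption the stages add up to 44.15 ms, or roughly 22.65 FPS, which is close to, though not identical to, the measured 23.36 FPS.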


C. Latency

Latency turns out to be quite noticeable with the implemented PAR system. Even though the processing time is improved by the optimizations made, the delay between the user’s action and the system’s reaction persists. Unfortunately, neither the projector nor the camera used provides any information concerning latency or input lag. First, a rudimentary method is used to get an idea of the camera-projector latency. A small application is written to re-project the input the RGB sensor captures. A stopwatch is placed in front of the camera, and another camera is used to capture the elapsed time on the stopwatch and on the projected stopwatch simultaneously. A second latency measurement is made to estimate the projector latency. A virtual stopwatch is displayed simultaneously on the projector and on a monitor with known latency (5 ms), and thirty pictures in which the elapsed time is visible for both the monitor and the projector are taken with a camera. Table II shows the measured latency between the image captured by the camera and the corresponding projected virtual image as well as the results concerning the projector latency in relation to the monitor. The average latency between capturing and projecting frames is 179.67 ms, while the latency of the projector relative to the monitor is 41.67 ms. To put those values into perspective, execution times from Table II are taken into account. The processing time is only a small fraction of the total latency observed, meaning the hardware might be responsible for the latency.

D. Discussion

As expected, the marker-based approach fared better than the markerless approach in terms of both accuracy and detection rate. Latency remains a minor issue. Zero latency is not achievable, as there is always a temporal delay between the captured view of the real world and the projected virtual view based on it. The delay is only noticeable when the user moves his/her finger too fast. The optimizations did, however, improve the FPS considerably, resulting in a system that runs smoothly enough (23.36 FPS). The evaluation confirms that the performance and accuracy of the implemented system are suitable for user tests involving the combination of haptics and PAR. The modular architecture offers flexibility in terms of software and hardware components and facilitates the design of use cases.

VI. CONCLUSION AND FUTURE WORK

A haptic-enabled PAR system was successfully implemented with the goal of evaluating the effects of haptics on PAR. A few challenges surfaced, such as the low FPS caused by processing time, and latency. The optimizations succeeded in obtaining a suitable frame rate. A slight latency persists, but this minor delay should not pose much of a problem during user testing, as it is only noticeable when moving fast. Replacing the RGB-D camera might be the next step in resolving the issue. A marker-based and a markerless approach to fingertip detection were implemented. Both implementations were compared, and the marker-based approach was found to be more accurate and more suitable for user tests. The system could also be improved by integrating a haptic glove that provides haptic feedback to individual fingers. Three use cases were designed to verify hypotheses regarding the influence of haptic technology on PAR systems. The system was implemented in such a way that new use cases can easily be added if needed. User tests will hopefully be performed at a later date, as all the tools to do so have been implemented and made available.

REFERENCES

[1] D. Schmalstieg and T. Höllerer, Augmented Reality: Principles and Practice. Boston: Addison-Wesley, 2016.

[2] M. Billinghurst, A. Clark, and G. Lee, “A survey of augmented reality,” Found. Trends Hum.-Comput. Interact., vol. 8, pp. 73–272, Mar. 2015.

[3] L. Meli, C. Pacchierotti, G. Salvietti, F. Chinello, M. Maisto, A. De Luca, and D. Prattichizzo, “Combining wearable finger haptics and augmented reality: User evaluation using an external camera and the microsoft hololens,” IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 4297–4304, 2018.

[4] A. H. Mason, M. A. Walji, E. J. Lee, and C. L. MacKenzie, “Reaching movements to augmented and graphic objects in virtual environments,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’01, (New York, NY, USA), pp. 426–433, Association for Computing Machinery, 2001.

[5] A. El Saddik, “The potential of haptics technologies,” IEEE Instrumentation & Measurement Magazine, vol. 10, pp. 10–17, Mar. 2007.

[6] M. Maisto, C. Pacchierotti, F. Chinello, G. Salvietti, A. De Luca, and D. Prattichizzo, “Evaluation of wearable haptic systems for the fingers in augmented reality applications,” IEEE Transactions on Haptics, vol. 10, no. 4, pp. 511–522, 2017.

[7] R. Xiao, C. Harrison, and S. E. Hudson, “Worldkit: Rapid and easy creation of ad-hoc interactive applications on everyday surfaces,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’13, (New York, NY, USA), pp. 879–888, Association for Computing Machinery, 2013.

[8] C. Harrison, H. Benko, and A. D. Wilson, “Omnitouch: Wearable multitouch interaction everywhere,” in Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, UIST ’11, (New York, NY, USA), pp. 441–450, Association for Computing Machinery, 2011.

[9] P. Punpongsanon, D. Iwai, and K. Sato, “Softar: Visually manipulating haptic softness perception in spatial augmented reality,” IEEE Transactions on Visualization and Computer Graphics, vol. 21, no. 11, pp. 1279–1288, 2015.

[10] H. Mousavi Hondori, M. Khademi, L. Dodakian, S. Cramer, and C. Lopes, “A spatial augmented reality rehab system for post-stroke hand rehabilitation,” Studies in Health Technology and Informatics, vol. 184, pp. 279–285, Feb. 2013.

[11] M. Sarac, A. M. Okamura, and M. D. Luca, “Effects of haptic feedback on the wrist during virtual manipulation,” 2019.

[12] J. Perret and E. Vander Poorten, “Touching virtual reality: a review of haptic gloves,” in Conference: ACTUATOR 18, Jun. 2018.

[13] M. Berger, Geometry I. Springer-Verlag, 2009.

[14] R. Gurav and P. Kadbe, “Real time finger tracking and contour detection for gesture recognition using opencv,” 2015 International Conference on Industrial Instrumentation and Control, ICIC 2015, pp. 974–977, Jul. 2015.

[15] F. Bashir and F. Porikli, “Performance evaluation of object detection and tracking systems,” IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS), Jan. 2006.

[16] I. Culjak, D. Abram, T. Pribanic, H. Dzapo, and M. Cifrek, “A brief introduction to opencv,” in 2012 Proceedings of the 35th International Convention MIPRO, pp. 1725–1730, 2012.


Enhancing Projected Augmented Reality Interactions using Haptic Technology

Nicolas Legrand

Supervisor(s): prof. dr. ir. Filip De Turck, dr. ir. Femke De Backere, ir. Joris Heyse, dr. ir. Maria Torres Vega

Abstract: Augmented Reality (AR) is a versatile technology with great potential that enables the integration of virtual objects into 3D environments in real time. A common problem with AR systems is the lack of tactile feedback when interacting with computer-generated objects. This paper focuses on projection-based AR. Research on adding haptic feedback to Projected Augmented Reality (PAR) is limited, even though it could improve user experience, accuracy and speed during interaction with projected virtual objects. In this research, a haptic-enabled PAR system is implemented and evaluated in terms of latency, performance and accuracy. The results confirm that it is a suitable framework with great potential.

Keywords: Projected Augmented Reality, table-top, haptic gloves

I. INTRODUCTION

Augmented Reality (AR) is one of the most captivating and futuristic technologies of today in the field of computer science and application development [1]. It enables the integration of computer-generated information into the real world, which is useful in a wide range of applications in various fields, such as healthcare, entertainment and architecture [2]. The most widely accepted definition describes AR as a technology with three key characteristics [2]: (i) a display that allows real and virtual images to be combined; (ii) the ability to generate interactive graphics and respond to user input in real time; and (iii) the ability to make a virtual image appear in the real world. AR should not be confused with Virtual Reality (VR), in which the real world is completely replaced by computer-generated images. AR systems usually make extensive use of computer vision techniques. Augmentation makes it possible both to add objects to a real environment and to remove objects from it [2].

A common problem with AR is the lack of tactile feedback when interacting with computer-generated objects [3]. Although this does not make AR systems unusable, adding haptics can be of great added value [4]. The word haptics is derived from the Greek word haptikos, which means “tactile” or “pertaining to the sense of touch”. Haptic technology is promising, as many believe it will change the way people communicate ideas and interact with information [5]. The use of a Head-Mounted Display (HMD) in combination with haptic technology has already been studied [6], but little is known about Projected Augmented Reality (PAR) systems in which virtual haptic stimuli are provided during interactions.

The goal of this research is to design a framework for the study and evaluation of the effects of haptic feedback on PAR. In PAR, physical objects are “augmented” by projecting virtual images directly onto a surface. A PAR system that provides haptic feedback is shown in Figure 1. It consists of a surface above which a projector and a camera are placed, and the user can interact by hand with virtual objects that are projected onto the surface. The camera tracks the position of physical objects and the movements, gestures and interactions of the user. The user receives haptic feedback while wearing haptic gloves.

[Figure 1 labels: Projector, Camera, User, Virtual object, Haptic gloves, Surface]

Fig. 1: Schematic of the PAR system

The expected effect of haptic feedback on PAR is formulated in four hypotheses. Each hypothesis suggests an improvement when using a haptic glove compared to a scenario without one:

Hypothesis 1 (H1) Receiving haptic feedback reduces the amount of time the user needs to perform tasks compared to performing the same tasks without receiving haptic feedback.

Hypothesis 2 (H2) Receiving haptic feedback while performing tasks improves the user's error recovery time compared to performing the same tasks without receiving haptic feedback.

Hypothesis 3 (H3) Receiving haptic feedback while performing tasks improves the user's accuracy compared to performing the same tasks without receiving haptic feedback.

Hypothesis 4 (H4) Receiving haptic feedback while performing tasks improves usability compared to performing the same tasks without receiving haptic feedback.


This paper is structured as follows. First, existing research related to PAR is discussed in Section II. Next, an overview of the PAR system is given in Section III. Implementation details of the PAR system are described in Section IV, followed by an evaluation of the system in terms of accuracy, performance and latency in Section V, together with a discussion of the findings. Finally, a conclusion is given in Section VI.

II. RELATED WORK

Xiao et al. [7] aim for ubiquitous computer vision. To this end they designed WorldKit, a library that uses a depth camera in combination with a projector to instantly turn ordinary surfaces into interactive surfaces. The implemented framework is interesting for developers because it offers a set of abstractions that simplify the creation of interfaces. The system can only recognize a very limited set of objects.

A more portable implementation of PAR was developed by Harrison et al. [8] at Microsoft Research. The team created a wearable projection system called OmniTouch, which consists of a depth camera and a pico projector worn on the user's shoulders. Their work differs from most other PAR systems because it shows that PAR can be a fairly mobile and flexible technology.

Punpongsanon et al. [9] developed a PAR system called SoftAR that provides pseudo-haptic feedback. By projecting the surface deformation and change in appearance when the user touches a physical object, they manage to make him/her believe that the object is softer than it really is. Using PAR to manipulate senses other than sight appears fascinating and has a lot of potential.

Mousavi et al. [10] show how useful PAR can be for patients during rehabilitation after a stroke, using a home-based “table-top” PAR system. Patients interact with the system to practise basic movements while wearing colored markers on their fingertips to simplify the computer vision aspect of the system. As future work, the researchers proposed adding haptic feedback to make the interaction process more realistic and to improve the effectiveness of rehabilitation systems [10].

As mentioned earlier, the relevance of haptics in AR and VR in general has been studied [4], [11], [12], but not much was found about PAR specifically. This led to the main question that this paper aims to answer: does the addition of haptic technology improve PAR?

III. PROJECTED AUGMENTED REALITY SYSTEM

This section gives an overview of the PAR system that was implemented for this research.

A. Use Cases

Three use cases were designed to verify the hypotheses described in the introduction.

A.1 Follow the Line

The user is asked to follow a line with his/her fingertip. The user has to reach the other end of the line while keeping his/her hand as close to the line as possible. While wearing a haptic glove, the user receives haptic stimuli in the form of vibrations when he/she deviates from the line. The intensity of the vibrations is based on the distance to the line. Figure 2a shows an example of a line generated by the application and the location of the user's index finger during a test.

A.2 Obstacle Course

The user is asked to drag an object from one end of the canvas to the other and release it there without it colliding with obstacles along the way. In case of a collision, the user receives haptic stimuli in the form of vibrations through the haptic gloves. Small pulses are also received depending on how far the dragged object is from the nearest obstacle. Figure 2b shows an example of a generated obstacle course, the detected fingertips and the virtual objects during a test.

A.3 Simon Says

Three colored blocks are projected in front of the user. Instructions are projected to indicate which of the colored blocks the user has to touch. The goal is for the user to touch the correct colored block for five rounds. The gloves vibrate briefly when the correct colored block is touched and vibrate for a longer time when a wrong block is touched. Figure 2c illustrates how the user is instructed to touch colored blocks and shows the detected fingertips during a test.

B. Architecture

The system consists of several software components that have to communicate with each other. Figure 3 gives an overview of the structure of the application. The architecture is based on the Mediator Pattern, in which the Orchestration component is the Mediator and encapsulates the interaction between most of the other components. The pattern was chosen to avoid tight coupling between the set of interacting components and to make it easy to change how the components interact. The Camera component is responsible for everything related to the Red, Green, Blue - Depth/Distance (RGB-D) camera, from basic configuration to retrieving color and depth frames from the device. The Display component contains projector- and display-related code and is responsible for all visual output of the system. The system goes through a calibration phase before the user can interact with it, and all related code can be found in the Calibration component. This component is responsible for correctly interpreting the camera input and for an accurate projection.
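To illustrate the idea behind the Mediator Pattern, here is a minimal sketch in which two components communicate only through a mediator. The class names mirror the component names from the text, but the event name, method names and collision rule are hypothetical and do not come from the actual implementation.

```python
class Mediator:
    """Routes named events between components that do not know each other."""

    def __init__(self):
        self._handlers = {}

    def register(self, event, handler):
        self._handlers.setdefault(event, []).append(handler)

    def notify(self, event, payload):
        # Deliver the payload to every handler registered for this event.
        for handler in self._handlers.get(event, []):
            handler(payload)


class Interaction:
    """Hypothetical stand-in for the Interaction component."""

    def __init__(self, mediator):
        self._mediator = mediator

    def check(self, fingertip, obstacle):
        # Toy collision rule: identical coordinates count as a collision.
        if fingertip == obstacle:
            self._mediator.notify("collision", fingertip)


class Haptics:
    """Hypothetical stand-in for the Haptics component."""

    def __init__(self, mediator):
        self.commands = []
        mediator.register("collision", self.on_collision)

    def on_collision(self, position):
        self.commands.append(("vibrate", position))


orchestration = Mediator()
haptics = Haptics(orchestration)
interaction = Interaction(orchestration)
interaction.check(fingertip=(120, 80), obstacle=(300, 40))  # no collision
interaction.check(fingertip=(300, 40), obstacle=(300, 40))  # collision
print(haptics.commands)
```

Neither Interaction nor Haptics holds a reference to the other, so either one could be replaced without touching the other.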

Two kinds of models are used in the application. The Physical Objects component contains software counterparts of physical objects in the real world that are recognized by the system. They are used to process interactions with the system. The Virtual Objects component contains computer-generated objects with which the user can interact. Virtual objects are mostly simple shapes with properties such as location and size. They provide an interface for drawing the object at the correct location on an image, which is then projected. The Detection component is responsible for detecting physical objects: hands and fingertips. Detection is based on the main inputs of the system, namely the color and/or depth frames captured by the camera. Two detection methods are implemented: a markerless and a marker-based approach.

(a) Follow the Line use case (b) Obstacle Course use case (c) Simon Says use case

Fig. 2: Use cases of the PAR system

Detected objects have to be tracked to get an idea of how fast they move. The Tracking component contains tracking code to monitor physical objects across frames and build up a history of their location. The Interaction component is used to detect and process interactions between virtual objects and physical objects, as well as interactions among virtual objects. The Haptics component is responsible for delivering haptic stimuli to the user. It provides an interface to control the vibration of the haptic gloves. The Experiments component contains code to control the flow of experiments. To do so, it uses virtual objects and the event handlers attached to them. Finally, the Orchestration component coordinates the application loop and uses most of the other components to form a processing pipeline from frame capture to final image projection.
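The interplay between virtual objects, event handlers and haptics can be sketched for the Follow the Line use case, where the vibration intensity depends on the distance to the line. The linear mapping, the 50-pixel saturation distance and all names below are assumptions made for illustration; the text does not specify how the system maps distance to intensity.

```python
import math


def distance_to_segment(p, a, b):
    """Shortest Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def vibration_intensity(distance, max_distance=50.0):
    """Map a distance in pixels to an intensity in [0, 1] (assumed linear)."""
    return max(0.0, min(1.0, distance / max_distance))


class VirtualLine:
    """Hypothetical virtual object with an attached event handler."""

    def __init__(self, start, end, on_deviation):
        self.start, self.end = start, end
        self.on_deviation = on_deviation

    def update(self, fingertip):
        distance = distance_to_segment(fingertip, self.start, self.end)
        if distance > 0:
            self.on_deviation(vibration_intensity(distance))


sent = []  # stands in for the commands sent to the glove
line = VirtualLine((0, 100), (400, 100), on_deviation=sent.append)
line.update((200, 100))  # on the line: no vibration
line.update((200, 125))  # 25 px away: half intensity
line.update((200, 300))  # far away: clamped to full intensity
print(sent)
```

Passing the handler in as a callback keeps the virtual object unaware of how the vibration is actually delivered.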

[Figure 3: component diagram. Components: Orchestration, Camera, Calibration, Detection, Tracking, Physical Objects, Virtual Objects, Interaction, Haptics, Experiments, Display. Labelled relations: Capture Frames, Calibrate System, Detect Physical Objects, Track Physical Objects, Process Interactions, Control Flow, Create, Update, Draw, Vibrate, Project Image]

Fig. 3: Software components

C. Hardware

The setup consists of a desktop computer, an RGB-D camera, a projector, a haptic glove and a white canvas. The desktop computer has an i5-4690 CPU, an Nvidia GeForce GTX 970 GPU and 8 GB of RAM, and runs Windows 10. The LG PB60G projector was chosen for its price-quality ratio and convenience. The Intel Realsense D435 was chosen as RGB-D camera because it is a fairly recent device that offers good quality and good developer support. It offers simple out-of-the-box integration for a variety of applications, including AR. Finally, the Noitom Hi5 VR gloves were chosen to provide the user with haptic feedback. The camera and the projector are mounted on tripods side by side behind the canvas at a height of 1 meter.

IV. IMPLEMENTATION

The implementation of the PAR system is discussed in more detail in this section. First, the calibration process is described in Section IV-A. Sections IV-B and IV-C then describe the main processing steps that enable interaction between the user and virtual objects. Section IV-D discusses interactions between virtual and physical objects. An overview of essential libraries and frameworks is given in Section IV-E, after which the challenges that emerged during the implementation are discussed in Section IV-F.

A. Calibration

At startup, the system first goes through a calibration phase before the user can interact with it. The calibration process has two main goals: (i) obtaining a transformation matrix and (ii) creating a background depth map.

The transformation matrix is used to align camera frames with the projection image. This is necessary because the camera and the projector work in different image coordinate systems. A homography can be estimated that maps each point of the camera image to the projector image [13]. Figure 4 shows an example of images from two different viewpoints related by a homography. This transformation matrix can then be used to transform all incoming camera frames so that they match the coordinate system of the projection image.

[Figure 4 labels: real world plane with point P; image 1 with centre O1 and point p(x1, y1); image 2 with centre O2 and point p'(x2, y2); homography between the two images]

Fig. 4: Images of the same plane from two viewpoints related by a homography

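The transformation matrix is a 3 x 3 matrix that can be written as follows:

$$H = \begin{bmatrix} h_{00} & h_{01} & h_{02} \\ h_{10} & h_{11} & h_{12} \\ h_{20} & h_{21} & h_{22} \end{bmatrix} \tag{1}$$

Given a set of corresponding points, $(x_1, y_1)$ in the projection image and $(x_2, y_2)$ in the camera image, the transformation matrix maps them such that:

$$\begin{bmatrix} x_2 \\ y_2 \\ 1 \end{bmatrix} = H \begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix} = \begin{bmatrix} h_{00} & h_{01} & h_{02} \\ h_{10} & h_{11} & h_{12} \\ h_{20} & h_{21} & h_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix} \tag{2}$$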
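Applying Equation (2) to a single point can be sketched in a few lines. The sketch assumes the matrix is given as nested lists and adds the usual division by the third homogeneous coordinate, since a homography is only defined up to scale. The example matrices are made up for illustration and are not calibration results.

```python
def apply_homography(h, point):
    """Map (x1, y1) to (x2, y2) with a 3 x 3 matrix given as nested lists."""
    x1, y1 = point
    x = h[0][0] * x1 + h[0][1] * y1 + h[0][2]
    y = h[1][0] * x1 + h[1][1] * y1 + h[1][2]
    w = h[2][0] * x1 + h[2][1] * y1 + h[2][2]
    # Homogeneous coordinates are defined up to scale, so divide by w.
    return x / w, y / w


# Hypothetical matrix: scale by 2 and shift by (10, 20), no perspective term.
H = [
    [2.0, 0.0, 10.0],
    [0.0, 2.0, 20.0],
    [0.0, 0.0, 1.0],
]
print(apply_homography(H, (100, 50)))
```

When the last row of the matrix is (0, 0, 1), as in this example, the division has no effect and Equation (2) holds exactly as written.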

The depth data from the camera is used to detect when users touch the canvas. Each depth frame obtained from the camera is a two-dimensional matrix that represents the distance of objects from the viewpoint. During the calibration process, the distance to the canvas (the background) is mapped, so that background and foreground objects can be distinguished in order to support interaction between the user and virtual objects.
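One way to use such a background depth map is to compare every incoming depth value with the stored background value and keep the pixels that lie just above the canvas. This is a minimal sketch of that idea; the height thresholds and the millimetre unit are illustrative assumptions, not values taken from the system.

```python
def touch_mask(depth, background, min_mm=5, max_mm=20):
    """Mark pixels whose height above the canvas lies in [min_mm, max_mm].

    depth and background are equally sized 2D lists of distances in mm.
    The thresholds are illustrative assumptions, not values from the system.
    """
    mask = []
    for depth_row, background_row in zip(depth, background):
        row = []
        for d, b in zip(depth_row, background_row):
            height = b - d  # how far the surface point is above the canvas
            row.append(min_mm <= height <= max_mm)
        mask.append(row)
    return mask


background = [[1000, 1000, 1000], [1000, 1000, 1000]]
frame = [[1000, 990, 900], [998, 985, 1000]]
print(touch_mask(frame, background))
```

The lower bound suppresses sensor noise on the bare canvas, while the upper bound excludes a hand hovering well above the surface.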

B. Fingertip Detection

Two approaches are implemented for fingertip detection and tracking: a markerless and a marker-based approach. For the marker-based implementation, the user wears fluorescent stickers on his/her fingertips, which allows the system to detect them easily with color filtering. The implemented markerless approach is based on the Convex Hull Algorithm as described in [14]. This method also uses color filtering, but on skin and glove color.
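The color filtering step of the marker-based approach can be sketched without any image library by thresholding pixels in HSV space and taking the centroid of the matches. The hue, saturation and value thresholds below are assumed example values for a bright green marker, and the function name is hypothetical. A real implementation would more likely rely on OpenCV.

```python
import colorsys


def marker_centroid(image, hue_range=(0.25, 0.42), min_sat=0.6, min_val=0.6):
    """Return the (x, y) centroid of pixels inside the HSV range, or None.

    image is a 2D list of (r, g, b) tuples with channels in 0..255. The
    default range targets a bright green and is an assumed example value.
    """
    xs, ys = [], []
    for y, row in enumerate(image):
        for x, (r, g, b) in enumerate(row):
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            if hue_range[0] <= h <= hue_range[1] and s >= min_sat and v >= min_val:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return sum(xs) / len(xs), sum(ys) / len(ys)


WHITE, GREEN = (255, 255, 255), (40, 255, 40)
image = [
    [WHITE, WHITE, WHITE, WHITE],
    [WHITE, GREEN, GREEN, WHITE],
    [WHITE, GREEN, GREEN, WHITE],
]
print(marker_centroid(image))
```

Filtering on saturation as well as hue is what keeps the white canvas out of the mask, since white pixels have zero saturation.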

C. Fingertip Tracking

Markerless and marker-based fingertip detection only provide information about the location and shape of fingertips in individual frames. To compute the speed or displacement of a fingertip over consecutive frames, it has to be identified and tracked, for which Centroid Tracking can be used. Centroid Tracking uses the Euclidean distance between new centroids and old centroids from earlier frames to identify them and to build up a location history for each fingertip [15].
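The matching step of Centroid Tracking can be sketched as follows. This is a simplified greedy variant: each existing track takes the nearest unmatched centroid within a distance threshold, which is not guaranteed to give the best overall assignment. The class name and the 50-pixel threshold are assumptions for illustration.

```python
import math


class CentroidTracker:
    """Greedy nearest-neighbour tracker that keeps a history per object."""

    def __init__(self, max_distance=50.0):
        self.max_distance = max_distance  # assumed matching threshold in pixels
        self.next_id = 0
        self.history = {}  # object id -> list of centroids, oldest first

    def update(self, centroids):
        unmatched = list(centroids)
        for track in self.history.values():
            if not unmatched:
                break
            last = track[-1]
            nearest = min(unmatched, key=lambda c: math.dist(last, c))
            if math.dist(last, nearest) <= self.max_distance:
                track.append(nearest)
                unmatched.remove(nearest)
        # Any centroid left over starts a new track.
        for centroid in unmatched:
            self.history[self.next_id] = [centroid]
            self.next_id += 1
        return self.history


tracker = CentroidTracker()
tracker.update([(10, 10), (200, 50)])
tracker.update([(205, 52), (14, 12)])
print(tracker.history)
```

Although the second frame lists the fingertips in a different order, each one is appended to the track it belongs to, which is what makes a per-fingertip displacement or speed estimate possible.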

D. Interaction Processing

Virtual objects are projected and the user can interact with them using his/her fingertips. Callback functions can be attached to virtual objects and are invoked when the objects are hovered over, touched, or pinched and dragged. The functions are also used to give haptic feedback to the user where needed. The implemented system supports hovering over, touching, pinching and dragging the projected virtual objects.

E. Essential Libraries & Frameworks

The Open Source Computer Vision (OpenCV) library is often used for computer vision techniques. OpenCV is a cross-platform library written in C++ with a large number of programming functions, mainly aimed at real-time computer vision [16].

For the Intel Realsense D435 camera, the Intel RealSense Software Development Kit (SDK) 2.0 is available with cross-platform support. A Python wrapper of the SDK is available under the name pyrealsense2. The SDK offers extensive control over the camera and various options to configure and capture color and depth images.

As for the haptic gloves, Noitom only provides SDKs for Unity and Unreal Engine. The system is therefore split into two applications. It consists of a Python application, supported by a smaller Unity application in which the Hi5 Interaction SDK is included. The Python application and the Unity application communicate using ZeroMQ, a powerful asynchronous cross-platform messaging library that is available for both Python and C#. This way, the Python application can trigger haptic stimuli by sending a simple message to the Unity application.

F. Challenges

A few challenges emerged during the implementation of the PAR system. The processing steps can be quite resource intensive. To guarantee usability, the aim is to process frames at a rate as close to 30 Frames per Second (FPS) as possible. To achieve this, the processing pipeline had to be optimized. The bottlenecks were identified using cProfile, a Python profiler. The bitmap operations and the depth post-processing of images, which are needed to detect interactions, took up most of the time.
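A minimal sketch of how cProfile can be used to find such bottlenecks follows. The profiled functions are hypothetical stand-ins, since the actual pipeline code is not shown in the text.

```python
import cProfile
import io
import pstats


def slow_bitmap_step(width=320, height=200):
    """Hypothetical stand-in for a per-pixel processing step."""
    return sum((x * y) % 7 for y in range(height) for x in range(width))


def process_frame():
    return slow_bitmap_step()


profiler = cProfile.Profile()
profiler.enable()
for _ in range(5):
    process_frame()
profiler.disable()

# Sort by cumulative time and show the five most expensive entries.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

Sorting by cumulative time puts the functions that dominate a frame at the top of the report, which is usually enough to decide where optimization effort should go first.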

The resolution of the camera input was lowered from 1280 x 720 pixels to 640 x 480 pixels to reduce the time spent on depth frame post-processing, and the delay was indeed reduced considerably. The time taken by bitmap operations also had to be reduced. The projector outputs images of 1280 x 800 pixels. All incoming frames from the camera were transformed to match this output resolution. However, the processing time of bitmap operations scales quadratically with the resolution. To reduce the amount of time they needed, incoming frames were scaled to 40 % of the projector output resolution. After the processing steps have been carried out, the output image is upscaled for projection. The number of bitmap operations is also reduced by replacing them with mathematical calculations where possible. The post-processing of incoming depth frames also took its toll on performance, mainly the alignment and spatial filtering applied to each incoming depth frame. To solve this problem, the pipeline was changed to process depth frames only every five frames. An overview of the optimizations is given in Table I.

TABLE I: Optimization Details

Setting                        Unoptimized          Optimized
RGB resolution                 1280 x 720 pixels    640 x 480 pixels
Depth resolution               1280 x 720 pixels    640 x 480 pixels
Projection resolution          1280 x 800 pixels    512 x 320 pixels
Depth frame processing rate    Every frame          Every 5 frames
Collision detection            Binary masks         Enclosing circle distances and binary masks
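The arithmetic behind two of these optimizations can be made explicit. The sketch below computes the working resolution at a 40 % scale, the fraction of pixels that remains, and the frame indices on which depth post-processing would run. The function names are illustrative, and starting the depth schedule at frame 0 is an assumption.

```python
def scaled_resolution(width, height, factor):
    """Resolution after scaling both sides by the same factor."""
    return round(width * factor), round(height * factor)


def pixel_ratio(factor):
    """Fraction of pixels left after scaling both sides by factor."""
    return factor * factor


def should_process_depth(frame_index, interval=5):
    """True for the frames on which the depth post-processing runs."""
    return frame_index % interval == 0


print(scaled_resolution(1280, 800, 0.4))
print(f"{pixel_ratio(0.4):.2f}")
print([i for i in range(12) if should_process_depth(i)])
```

Scaling both sides to 40 % leaves 16 % of the pixels, which is why per-pixel bitmap operations benefit so strongly from the smaller working resolution.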

V. EVALUATION OF THE SYSTEM

The implemented PAR system is evaluated to confirm whether it is reliable enough to perform user tests. This section covers an evaluation of the accuracy, performance and latency of the system.

A. Accuracy

The accuracy of the system is important because it can influence the results of later user tests. The marker-based and the markerless implementation are compared with each other on the basis of detection rate and the accuracy of the position detection.

The system does not always detect the user's fingertips. A number of factors play a role, such as lighting, motion blur and occlusion. The detection rate is defined here as the ratio between the number of frames in which the fingertips are correctly detected by the system and the total number of frames in which the fingertips are visible. A test was designed to determine the detection rate of the marker-based approach relative to the markerless one. A start and end point are projected on the canvas and the user drags his hand across the canvas from one point to the other while holding the fingers in a “pinching” posture. Normal and low light conditions are compared. The test is performed thirty times under both normal and low light conditions to obtain a sufficient sample size. Figure 5a shows how both implementations scored under normal and low light conditions. The markerless implementation scored an average of 93.70 % correct detections, while the marker-based implementation scored 99.31 %, which is nearly perfect. In low light, the detection rate of the markerless implementation dropped to 61.59 %, while the marker-based approach performed perfectly. The marker-based approach is clearly less sensitive to changes in lighting conditions.
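The detection rate as defined above can be computed from per-frame annotations. In this minimal sketch each frame is represented by a pair of booleans; both the representation and the sample run are hypothetical and are not the recorded data.

```python
def detection_rate(frames):
    """Percentage of frames with visible fingertips that were detected.

    frames is a list of (visible, detected) booleans, one pair per frame.
    """
    visible = [detected for is_visible, detected in frames if is_visible]
    if not visible:
        raise ValueError("no frame shows a fingertip")
    return 100.0 * sum(visible) / len(visible)


# Hypothetical run: 8 frames with visible fingertips, 7 of them detected,
# plus 2 frames without a visible fingertip that do not count.
run = [(True, True)] * 7 + [(True, False)] + [(False, False)] * 2
print(f"{detection_rate(run):.2f} %")
```

Frames in which no fingertip is visible are excluded from the denominator, in line with the definition.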

The system can determine the user's fingertip location within a certain margin of error. The goal is to determine how far the detection is from the actual fingertip location. Accuracy is defined as the difference between the detected fingertip position and the actual fingertip position. A test is performed to compare the markerless and marker-based implementations. During the evaluation, thirty random frames in which visible fingertips are detected are selected for each run. The frames are manually annotated with fingertip locations, which form the ground truth. The distance to the ground truth is computed for each finger in each selected frame. The results of the fingertip position accuracy experiments are shown in Figure 5b. The markerless implementation deviates from the ground truth by 2.58 and 2.92 pixels on average for detection with one and two fingers respectively. The marker-based approach scored better here as well, with deviations of 1.31 and 1.74 pixels for detection with one and two fingers respectively. A larger variance can also be observed for the markerless approach, for both one and two fingers, which means that it is less precise. These values are based on a downscaled projection image with a resolution of 512 x 320 pixels. The image is scaled back up to 1280 x 800 pixels for projection. The final size of the image projected on the canvas is approximately 38.4 x 24 cm, so one pixel of the 512 x 320 image covers about 0.075 cm. This translates into an error margin of 0.10 - 0.13 cm for the marker-based implementation.

B. Performance

The optimizations shown in Table I raised the frame rate from 9.45 FPS to 23.36 FPS. The execution time of the processing pipeline for the optimized implementation is also analysed in more detail. The processing pipeline is split into three stages: frame capture, processing and image projection. The resulting data can be seen in Table II. Capturing frames takes barely any CPU time, which suggests that frames are captured and buffered on the camera itself. Processing the input frames takes 31.89 ms on average and projecting the output takes 12.25 ms on average.

[Figure 5: (a) Detection rate (%) under normal and low light; (b) Deviation (pixels) for 1 finger and 2 fingers; series: Marker-based, Markerless]

Fig. 5: Comparison of the marker-based and markerless approaches


TABLE II: Execution Time and Latency

Metric                   Mean
Execution Time
  Frame capture          0.01 ± 0.04 ms
  Processing             31.89 ± 0.61 ms
  Image projection       12.25 ± 0.58 ms
Latency
  Monitor-projector      41.67 ± 10.67 ms
  Camera-projector       179.67 ± 10.48 ms

C. Latency

Latency turns out to be quite noticeable in the implemented PAR system. Although the optimizations improve the processing time, the delay between the user's action and the system's response remains. Unfortunately, neither the projector nor the camera used provides any information about latency or input lag. First, a rudimentary method is used to get an idea of the camera-projector latency. A short application is written to re-project the input captured by the Red, Green, Blue (RGB) camera. A stopwatch is placed in front of the camera, and another camera is used to capture the elapsed time on the stopwatch and on the projected stopwatch simultaneously. A second latency measurement is carried out to estimate the latency of the projector. A virtual stopwatch is displayed simultaneously on the projector and on a monitor with known latency (5 ms), and thirty photos are taken in which the elapsed time is visible on both the monitor and the projector. Table II shows the measured latency between the image captured by the camera and the corresponding projected virtual image, as well as the latency of the projector relative to the monitor. The average latency between capturing and projecting frames is 179.67 ms, while the latency of the projector relative to the monitor is 41.67 ms. To put these values in perspective, the execution times shown in Table II are taken into account. The processing time is only a small part of the total observed latency, which suggests that the hardware may be responsible for most of the latency.

D. Discussion

The marker-based implementation performed better than the markerless one. Latency remains a limited problem, but it is only noticeable when users move their finger too quickly. The optimizations improved the FPS considerably, resulting in a system that runs fast enough. The evaluation confirms that the performance and accuracy of the implemented system are suitable for user tests. The modular architecture offers flexibility and facilitates the design of use cases.
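To put the latency remark in perspective, the sketch below shows how the stopwatch photos could be reduced to a mean and standard deviation, and how much of the camera-projector latency the measured software stages explain. It assumes that each photo yields one pair of stopwatch readings in milliseconds and that the processing and projection times simply add up. The function names are hypothetical.

```python
from statistics import mean, stdev


def latency_stats(reference_ms, delayed_ms):
    """Mean and sample standard deviation of the per-photo stopwatch differences."""
    diffs = [ref - late for ref, late in zip(reference_ms, delayed_ms)]
    return mean(diffs), stdev(diffs)


def software_share(total_latency_ms, processing_ms, projection_ms):
    """Fraction of the end-to-end latency explained by the measured software stages."""
    return (processing_ms + projection_ms) / total_latency_ms


# Values taken from Table II (ms).
share = software_share(179.67, 31.89, 12.25)  # roughly a quarter
unexplained_ms = 179.67 - (31.89 + 12.25)     # roughly 135.5 ms
```

Under these assumptions about a quarter of the camera-projector latency is accounted for by the measured software stages. The remaining 135 ms or so is consistent with the suggestion that the camera and projector hardware contribute most of the delay.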

VI. CONCLUSION AND FUTURE WORK

A PAR system that supports haptic feedback has been successfully implemented with the aim of evaluating the effects of haptics on PAR. A few challenges emerged, such as latency and the low FPS caused by the processing time. The optimizations succeeded in achieving a suitable frame rate. A small latency remains, but this delay should not cause problems during user tests. Replacing the RGB-D camera is a possible next step to solve this problem. A marker-based and a markerless approach to fingertip detection were implemented. Both implementations were compared, and the marker-based approach proved more accurate and better suited for user tests. The system can also be improved by integrating a haptic glove that provides haptic feedback to individual fingers. The system is implemented in such a way that new experiments can easily be added when needed.

REFERENCES

[1] Schmalstieg and Tobias Hollere, Augmented reality : principles and prac-tice. Boston:Addison-Wesley, 2016.

[2] M. Billinghurst, A. Clark, and G. Lee, “A survey of augmented reality,” Found. Trends Hum.-Comput. Interact., vol. 8, pp. 73–272, Mar. 2015. [3] L. Meli, C. Pacchierotti, G. Salvietti, F. Chinello, M. Maisto, A. De Luca,

and D. Prattichizzo, “Combining wearable finger haptics and augmented reality: User evaluation using an external camera and the microsoft holo-lens,” IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 4297–4304, 2018.

[4] A. H. Mason, M. A. Walji, E. J. Lee, and C. L. MacKenzie, “Reaching movements to augmented and graphic objects in virtual environments,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’01, (New York, NY, USA), p. 426–433, Association for Computing Machinery, 2001.

[5] A. El Saddik, “The potential of haptics technologies,” Instrumentation & Measurement Magazine, IEEE, vol. 10, pp. 10 – 17, 03 2007.

[6] M. Maisto, C. Pacchierotti, F. Chinello, G. Salvietti, A. De Luca, and D. Prattichizzo, “Evaluation of wearable haptic systems for the fingers in augmented reality applications,” IEEE Transactions on Haptics, vol. 10, no. 4, pp. 511–522, 2017.

[7] R. Xiao, C. Harrison, and S. E. Hudson, “Worldkit: Rapid and easy cre-ation of ad-hoc interactive appliccre-ations on everyday surfaces,” in Pro-ceedings of the SIGCHI Conference on Human Factors in Computing Sys-tems, CHI ’13, (New York, NY, USA), p. 879–888, Association for Com-puting Machinery, 2013.

[8] C. Harrison, H. Benko, and A. D. Wilson, “Omnitouch: Wearable mul-titouch interaction everywhere,” in Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, UIST ’11, (New York, NY, USA), p. 441–450, Association for Computing Machinery, 2011.

[9] P. Punpongsanon, D. Iwai, and K. Sato, “Softar: Visually manipulating haptic softness perception in spatial augmented reality,” IEEE Transacti-ons on Visualization and Computer Graphics, vol. 21, no. 11, pp. 1279– 1288, 2015.

[10] H. Mousavi Hondori, M. Khademi, L. Dodakian, S. Cramer, and C. Lopes, “A spatial augmented reality rehab system for post-stroke hand rehabilita-tion,” Studies in health technology and informatics, vol. 184, pp. 279–85, 02 2013.

[11] M. Sarac, A. M. Okamura, and M. D. Luca, “Effects of haptic feedback on the wrist during virtual manipulation,” 2019.

[12] J. Perret and E. Vander Poorten, “Touching virtual reality: a review of haptic gloves,” in Conference: ACTUATOR 18, 06 2018.

[13] Berger and Marcel, Geometry I. Springer-Verlag, 2009.

[14] R. Gurav and P. Kadbe, “Real time finger tracking and contour detection for gesture recognition using opencv,” 2015 International Conference on Industrial Instrumentation and Control, ICIC 2015, pp. 974–977, 07 2015. [15] F. Bashir and F. Porikli, “Performance evaluation of object detection and tracking systems,” IEEE International Workshop on Performance Evalua-tion of Tracking and Surveillance (PETS),, 01 2006.

[16] I. Culjak, D. Abram, T. Pribanic, H. Dzapo, and M. Cifrek, “A brief in-troduction to opencv,” in 2012 Proceedings of the 35th International Con-vention MIPRO, pp. 1725–1730, 2012.

(20)

TABLE OF CONTENTS

3 Projected Augmented Reality System
  3.1 System Requirements
  3.2 Hardware
    3.2.1 RGB-D Camera
    3.2.2 Haptic Devices
    3.2.3 Projector
    3.2.4 Conclusion
  3.3 Software Architecture
  3.4 Conclusion

4 Implementation
  4.1 Code
  4.2 Essential Software Libraries
  4.3 Calibration
    4.3.2 Background Depth Map
  4.4 Marker-based Processing Steps
    4.4.1 Color Frame Processing
    4.4.2 Depth Frame Processing
    4.4.3 Processing Interactions
  4.5 Markerless Fingertip Detection
  4.6 Haptic Feedback
  4.7 Challenges
    4.7.1 Performance
    4.7.2 Latency
    4.7.3 Markerless Tracking Accuracy
  4.8 Conclusion

5 Evaluation of the system
  5.1 Experiments
    5.1.1 Accuracy of Fingertip Detection
    5.1.2 Performance
    5.1.3 Latency
  5.2 Results
    5.2.1 Fingertip Detection Rate
    5.2.2 Fingertip Position Accuracy
    5.2.3 Performance
    5.2.4 Latency
  5.3 Conclusion

  6.1 Hypotheses
  6.2 Tasks
    6.2.1 Trace the Line
    6.2.2 Obstacle Course
    6.2.3 Simon Says
  6.3 Experiment
    6.3.1 Participants
    6.3.2 Test Procedure
  6.4 Evaluation
    6.4.1 Collected Data
    6.4.2 Hypothesis 1: Task Duration
    6.4.3 Hypothesis 2: Error Recovery Time
    6.4.4 Hypothesis 3: User Accuracy
    6.4.5 Hypothesis 4: User Experience
  6.5 Software Guide
    6.5.1 Installation
    6.5.2 Running the Application
    6.5.3 Experiment Data

7 Conclusion and Future Work

Bibliography

Appendix A SUS Test

Appendix B User Instructions
