
Exploring User Interaction Techniques for Head Mounted Devices

to Control Content on Multiple Displays

Thesis - Master Information Studies (Human Centered Multimedia)

Barry Kollee (10349863)

Supervisors: Sven Kratz and Frank Nack

A thesis presented for the degree of Master of Science

Faculty of Science University of Amsterdam

The Netherlands 01/08/2014


Exploring User Interaction Techniques for Head Mounted Devices

to Control Content on Multiple Displays

Barry Kollee (10349863)

ABSTRACT

We introduce several interaction techniques that can be performed on a head mounted device (HMD). We explored these techniques by means of two user studies to find out whether hand tracking and hand gestural interaction in an ego-centric manner, inside a smart office environment, outperforms the interaction techniques that a commercial HMD already offers. The user studies show that the proposed modalities outperform voice command control in terms of task completion time, number of errors, intuitiveness, comfort and effectiveness. We report qualitative and quantitative data from both user studies, which gives insight into the difference in performance between a selection of interaction techniques that can be performed on a head mounted device.

Keywords

Head Mounted Device (HMD), smart glasses, multi display environment, interaction techniques, modalities, ego-centric, smart office, hand gestures, close to body interaction, wearables

1. INTRODUCTION

Wearable technology is currently gaining in popularity, as shown by the increasing number of head-mounted devices (HMDs) and other wearables - such as smart watches - that are becoming commercially available.

Next to the innovation in wearable technology, many spaces within office buildings are becoming 'smart': spaces such as meeting rooms are being equipped with a proliferation of devices, such as displays, wireless speakerphones and remotely-controlled lighting, that can be interconnected via wireless or wired network connections.

In this paper, we explore the use of wearable devices, specifically HMDs, for interacting with such a smart space. For this study we used one of the most recent HMDs available on the market: Google Glass. Since this device has a relatively limited array of input options (i.e. voice command control, head orientation information (gyroscope, accelerometer), and the rim-mounted touchpad), developers and researchers are trying to find out whether there is a more intuitive, productive and quicker way to use these devices for some applications. Therefore, researchers and developers attach different types of sensors (see [5] and [4]) to the HMD to see how these sensors work in their specific test environment and whether they outperform the existing interaction techniques the device already has. This type of research might give hardware manufacturers insight into embedding these new sensor components into the device itself, giving all developers the chance to create applications with them, which could eventually result in a better user experience for consumers. When adding such additional sensors to an HMD, some design choices need to be considered. Colaco et al. [5] mention three characteristics an additional sensor should have for close-to-body interaction (see Related Work).

Figure 1: An experimental setup of a "smart" office environment. Users, such as the person on the right (1), can use their HMD to interact with the displays (2, 3).

In the context of smart spaces it seems that this limited set of interaction techniques poses usability problems; for instance, it can be difficult to select external devices in the environment. Previous work by Chen et al. [4] has already tried to address this problem, although the interactions presented there still seem cumbersome (i.e. [4, p. 9]: "The shortcoming of the IR mode was that you had to be a certain distance away in order for it to detect the appliance"). A further issue, which we touch on tangentially, is the question whether smart spaces should be instrumented to (externally) track users, or if the users themselves should be instrumented for input tracking in an ego-centric manner. Although, in this paper, we do not make a direct comparison between ego-centric and external tracking, we highlight some of the interaction possibilities for ego-centrically tracked users working in smart spaces.

We propose the use of in-air hand gestures to increase input possibilities for HMDs, and to make interaction with external devices and displays in a smart space scenario easier. However, since there are numerous modalities available to user interface designers of HMDs or other types of wearables, e.g. touchpad, voice commands, head gestures and hand gestures, we are also interested in determining the usability of these different modalities for input tasks on HMDs and for HMD-based interaction with smart spaces.

Specifically, we contribute a prototype HMD that uses ego-centric tracking to enable in-air hand gestures as input, in addition to the other modalities mentioned previously. We also present the results of two user studies centered around a smart-space interaction scenario. In the studies, we examine the usability of a range of input techniques available through our prototype for smart space interaction.

In the first study, we gather qualitative feedback on the usability of the proposed input techniques for an interaction scenario involving an HMD and a smart space. In the second study, we measure user performance on a prototype system that implements the interaction scenario, in order to examine whether the qualitative results from the first study are reflected in quantitative performance measures.


The second study also allows us to determine which interaction technique fits the tasks of the prototyped application best. At the outset of our work, a hypothesis was established, which addresses the problem/goal statement of the research (i.e. interaction techniques for HMDs in a smart space environment). The following main research question was set up, where by manipulating we mean moving visual items from one display to another:

”Does hand tracking and hand gestural input outperform the head orientation, audio-based or list-based interaction on HMDs, when selecting and manipulating visual items on multiple displays?”

This paper gives insight into the proposed interaction techniques, voice command and the touchpad, and how they perform compared to each other in an experimental smart environment. First, this paper discusses previous related work, which gives an overview of the types of interaction that have been demonstrated in HMD applications. Secondly, the setup and results of the first user study are discussed. Thirdly, a description is given of the prototype setup (i.e. the hardware and software components used for the prototype). Fourthly, the setup and results of the second user study are discussed. Finally, this paper presents the findings and conclusions of the user studies and discusses how further research can be conducted with respect to the usage of HMDs in a smart office environment.

2. RELATED WORK

The first developments regarding HMDs originate from the 1960s, when the first head mounted stereoscopic television display (called the 'Telesphere Mask') was created and patented by Morton Heilig [8, p. 4], consisting of "a hollow casing, a pair of optical units, a pair of television tube units, a pair of ear phones and pair of air discharge nozzles, all coacting to cause the user to comfortably see the images...". In the following, we provide an overview of related work, with a focus on the interaction techniques we study in this paper.

2.1 Head Orientation

Chen et al. [4, p. 9] showed how physical home appliances could be controlled by using head orientation as input for their prototype. They compared head orientation selection versus the device's touchpad (i.e. list-view selection) to control these home appliances. They found that if more than 6 home appliances are selectable, head orientation input outperformed the list view of the device's touchpad, meaning that home appliances could be selected and controlled faster with head orientation input than with touchpad input.

Still, head orientation is less precise than hand tracking, as Patel et al. [12] showed with their laser prototype mounted on a mobile phone. Since hand tracking has not yet been adopted by big companies, such as Google, research on this interaction technique is still being conducted, as seen in [5].

2.2 Hand Tracking and Gestures

Hand gestural input on a mobile device is seen as the futuristic approach for consumers to interact with their glasses. Since these devices are moving towards consumer usage, researchers want to find ways to embed new technologies into commercial products. However, given the design characteristics from Colaco et al. [5], a scalable hardware technology is needed that suits an HMD for consumer usage.

Figure 2: Classified hand gestures on MIME [5]

When additional hardware components are attached to HMDs, Colaco et al. [5] describe 3 design considerations that should be taken into account. In their case this concerned a Google Glass device, which was able to track one hand of the user by means of an additional hardware component (i.e. a 3D sensor). Colaco et al. [5, p. 3] list these 3 close-to-body interaction design characteristics as follows:

• ”Technical: High accuracy, low power, low latency, small size, daylight insensitivity, and robust performance in cluttered, noisy and fast-changing environment.

• User experience: Interacting with the HMD should be intuitive and should not induce fatigue upon prolonged use. The input device must be able to support both motion- and positioned-controlled gestures in 2D and 3D.

• User Convenience: The sensor should be embedded within the HMD to enable unencumbered user interaction...”[5]

From a technical point of view we need to take into account that the 3D/2D sensor should be capable of tracking hand positions in various environments and under shifts in these environments (i.e. when using the sensor indoors and outdoors). Due to a lack of processing power it is not yet possible to perform image processing on 2D images in arbitrary environments without losing performance or draining the battery. Therefore, researchers try to find ways to invent new types of sensors suitable for HMDs, such as the time-of-flight module used by Colaco et al. [5], all with the goal of outperforming regular image processing techniques.

Secondly, Colaco et al. [5] state that the scalability of the sensor is an issue. It should be embedded into the HMD to obtain unencumbered interaction and be small in size. Also, visual markers, such as the colored glove used by Keskin et al. [9], should be avoided. They might replace additional hardware technology on the device, but from a user-friendliness point of view they are not desirable.

3D sensors

Not only tracking hand positions but also capturing hand gestures (i.e. the shape of the hand) can be used as input for HMDs, as Colaco et al. [5] showed. Since the built-in camera cannot be used on its own as input for hand gestures, [5] found ways to use 3D sensors in addition to the RGB camera that most HMDs have.

A lot of 3D sensors are already on the market, but most of them consume a lot of power, which does not comply with the design considerations from Colaco et al. [5]. Still, with sensors such as the time-of-flight module, Colaco et al. [5] showed how to recognize low-level hand gestures while maintaining low battery consumption. Further advancement in 3D camera technology will most likely enable us to recognize higher-level hand gestures.

2.3 Eye Tracking and Gaze Gestures

Since an HMD is worn on the user's head, it might seem obvious to take the user's eye position (i.e. where the user is looking) into account. Ferrari et al. [6] already showed how they were able to classify certain eye positions and recognize these on a mobile device.

These eye positions could individually act as action handlers, but certain gaze patterns (e.g. first up, then down, then left, then right) could also be used for more in-depth interaction with a smartphone, or even for typing purposes as shown with the EyeS system from Porta et al. [14]. This type of interaction is called gaze interaction and can easily be developed with an off-the-shelf RGB camera as input. The gaze interaction technique suits an HMD since the RGB camera does not use too much power, which complies with the design characteristics from Colaco et al. [5].

Mardanbegi et al. [11] showed how not only eye position at various intervals, but also continuous eye position tracking could be achieved. Here, users can select home appliances listed on a display and, by blinking their eyes, interact with them and perform actions.

2.4 Touch control

A commonly used interface for computers is the touchpad, which provides screen cursor/pointer movement control (Ferrari et al. [6]). It works by mapping the user's finger position to the device's cursor/pointer.

Tracking is not the only common touch technique; swiping (e.g. left, right, up, down) and tapping are as well, as shown in one of the popular HMDs, Google Glass, whose timeline interface is swipe and tap based [7]. The scalability and low power consumption of touchpads make them ideally suitable for HMDs.

2.5 Voice control

As HMDs are meant for mobile and unencumbered usage, interacting with the device can also be performed without any physical actions: by means of voice command control, users can interact with HMDs. Since voice command control is based on fixed vocabularies, it remains hard for HMDs to completely understand what people are saying. Optimizing the recognition vocabularies (or lexicons) is therefore a common research field in information technology.

Still, voice control has some disadvantages. Interacting with devices in public might feel peculiar since people around you can hear you speaking out loud, which also brings along privacy issues. Finally, the accuracy of voice command control can vary greatly between users and their accents.

Therefore, it does not seem a good idea to control HMDs solely with voice commands.

2.6 Wearables

Next to attaching new sensors to an HMD as embedded hardware components, wearables can be used in addition to HMDs as well. A wearable is a mini-computer that is able to sense the user's physical and/or emotional state; the term was first coined in the field of information technology by Steve Mann, who is a pioneer in the field of wearables [10].

Most wearables do not act on their own but are connected to some kind of external device, where the wearable's sensor data, such as heartbeat and sweat measurements, can be fetched and inspected. Other wearables are devices that are able to show information to the user (such as the Pebble Watch [13]) or even let the user interact with the information the wearable shows (e.g. the Razer Nabu [15] wristband, which detects hand movement).

2.7 Devices

HMDs can also be connected to other devices via a wired or wireless connection, such as mini-computers (e.g. the XBee radio from Chen et al. [4]) and smartphones. Since some HMDs are equipped with Wi-Fi and Bluetooth receivers, they are able to communicate with these devices. Therefore, these devices can be used as input for the HMD as well. Nonetheless, using an external device in combination with an HMD does not comply with the User Convenience guidelines from Colaco et al. [5], since we are then talking about external rather than embedded hardware components.

3. ENVIRONMENT AND SCENARIO

Modern work environments contain spaces such as conference rooms that are equipped with a large amount of technology, such as displays, projectors, audio and telecommunication systems, climate control and lighting systems. However, the interface to this technology is often cumbersome and heterogeneous. Thus, we believe that new and improved ways of interacting with infrastructure that is present in such smart spaces need to be found. Therefore, we devised a scenario that allows us to prototype and evaluate HMD-based gestural interactions in smart work spaces.

The smart office environment in our experimental setup consists of a conference room wherein multiple displays are placed throughout the space, as seen in Figure 1. Each display depicts several image thumbnails, which can be manipulated by the user (i.e. the user has the option to transfer these image thumbnails from one screen to another). The manipulation of these images is done using the proposed interaction techniques and the device's default interaction techniques (i.e. voice command and the touchpad).

Boring et al. [2] already showed how visual content could be selected, controlled and transferred from one screen to another by using a mobile device (i.e. an iPhone 3G) and its embedded camera. For our user study we use the same setup; however, we have replaced the mobile device with an HMD. We investigated whether the same tasks can be accomplished with an HMD and what hardware components are needed to achieve the best user interaction experience within a smart office environment.


(a) Head Gestures

(b) Hand Tracking

(c) Hand Gesture

Figure 3: Proposed gestural techniques for gestural interaction with external devices using an HMD with ego-centric tracking.

3.1 Interaction Techniques and Tasks

The task (i.e. moving an image thumbnail from one screen to another) that needs to be performed by the user consists of three steps.

1. Select the display which holds the image thumbnail that needs to be transferred.

2. Select the image thumbnail on the selected display.

3. Select the display where the image thumbnail needs to be transferred to.

These tasks are accomplished by means of the proposed HMD interaction techniques, voice command and touchpad.

Head gestures

Within the scenario of head gestures the user gazes towards the display that needs to be selected or the image thumbnail that needs to be transferred. By means of a ’nod’-gesture (i.e. moving the head up and down) the selection takes place as shown in figure 3(a).

Hand tracking

This interaction technique consists of tracking the hand's position with respect to the HMD. By means of a push movement in space (see figure 3(b)), i.e. a rapid shift in z-axis position, a selection takes place.

Hand gestures

With hand gestures we mean the pose of the hand. As shown in figure 2, Colaco et al. [5] already used this paradigm. For our application we chose a grasp gesture (as shown in figure 3(c)), performed in the following order, where steps 2 and 3 emphasize a natural user interface (i.e. grab and let go); a sketch of this interaction flow follows the list.

1. The user selects the display by looking at the display and grasping it in space.

2. The user opens up his or her hand and hovers over the thumbnails. When the desired image thumbnail is in the view of the user, the user grasps the thumbnail by closing the hand into a fist.

3. The user looks towards the display where the image thumbnail needs to be transferred to and opens his or her hand again.
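To make this flow concrete, the following is a minimal Java sketch of the three-step grasp interaction modeled as a small state machine. It is purely illustrative: the class, method and variable names are hypothetical and not taken from the prototype's code, and the inputs are assumed to come from the QR-code and hand-pose detection described later in section 5.

public class GraspTransferSketch {

    enum State { IDLE, DISPLAY_SELECTED, THUMBNAIL_GRABBED }

    private State state = State.IDLE;
    private boolean handReopened;        // the hand must open again before step 2
    private String sourceDisplay;
    private String grabbedThumbnail;

    // One call per tracking frame. gazedDisplay / hoveredThumbnail are null when
    // nothing is in view; handClosed comes from the hand-pose (openness) estimate.
    public void onFrame(String gazedDisplay, String hoveredThumbnail, boolean handClosed) {
        switch (state) {
            case IDLE:
                // Step 1: look at a display and grasp it in space to select it.
                if (gazedDisplay != null && handClosed) {
                    sourceDisplay = gazedDisplay;
                    handReopened = false;
                    state = State.DISPLAY_SELECTED;
                }
                break;
            case DISPLAY_SELECTED:
                // Step 2: open the hand, hover over the thumbnails, then close it
                // into a fist to grab the hovered thumbnail.
                if (!handClosed) {
                    handReopened = true;
                } else if (handReopened && hoveredThumbnail != null) {
                    grabbedThumbnail = hoveredThumbnail;
                    state = State.THUMBNAIL_GRABBED;
                }
                break;
            case THUMBNAIL_GRABBED:
                // Step 3: look at the target display and open the hand to release.
                if (!handClosed && gazedDisplay != null && !gazedDisplay.equals(sourceDisplay)) {
                    transfer(grabbedThumbnail, sourceDisplay, gazedDisplay);
                    state = State.IDLE;
                }
                break;
        }
    }

    private void transfer(String thumbnail, String from, String to) {
        System.out.println("Transfer " + thumbnail + " from " + from + " to " + to);
    }
}

In the prototype, the corresponding event detection is performed by the Java backend described in section 5.2, which receives the hand data over the network.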

Within the scenario the HMD, specifically the Google Glass, is used as the input device for all interaction techniques. In addition to input, its embedded display is used to give the user visual feedback on the currently selected item or the item that is currently in the user's view direction.

4. PRESTUDY

Before implementing the previously discussed scenario as a software prototype, we conducted a prestudy to elicit qualitative user feedback on the general input techniques discussed in the previous section, i.e. head gestures, hand tracking and hand gestures. The goal of this study was to see whether there is user acceptance of the proposed interaction techniques and to elicit qualitative feedback regarding three different types of gestural input modalities. The participants of the user study watched three demonstration videos (the videos can be found on www.fxpal.com), wherein all the proposed interaction techniques were shown (i.e. hand tracking, hand gestures and head gestures). After that they were asked to fill in a questionnaire (see Appendix A). The scenario behind the demo videos consists of a brainstorm with architects, who want to decide what type of stone to use for one of their projects. They organize the types they want by placing images of the stone textures onto another display.

The prestudy questionnaire (see Appendix A) consisted of three parts:

1. Five Likert-scale questions with respect to effectiveness, learnability, comfort, practicalness and intuitiveness

2. Eliciting emotional responses to the shown interaction examples using a custom variant of the method of Agarwal et al. [1]. In this method, we allowed the test subjects to distribute up to three "points" to any emotion on the EmoCard scale. We believe that this approach allows users to either express a wider range of emotions or strongly express a single emotion while rating a technique.

3. Providing three ranking preferences for each of the proposed interaction techniques vs. the interaction techniques that HMDs already offer (i.e. voice command and touchpad).


(a) Intuitiveness (b) Comfort (c) Effectiveness

Figure 4: Box plots of the results of the prestudy Likert scale questionnaires rating the intuitiveness, comfort and effectiveness of the interaction techniques. A lower rating is better.

(a) Frequency slice chart of EmoCard ratings (b) Mean EmoCard ratings (head gestures 166.9°, hand tracking 107.8°, hand gestures 113.4°) (c) Technique ranking frequency

Figure 5: (a) Frequency slice chart of ratings on EmoCard sectors. (b) Mean EmoCard ratings plotted as angles on the EmoCard scale. (c) Ranking frequency of gestural techniques vs. baseline techniques.

The questionnaire was filled in by 16 participants from an industrial research lab, aged 26 to 55. 4 participants were female and 10 were male.

4.1 Findings

Based on the results from the Likert scale questionnaire (Figure 4) with respect to intuitiveness, comfortability and effectiveness, head gestures are less favorable than hand tracking and hand gestures. A nonparametric Kruskal-Wallis test on the rating data shows a significant difference within these measures, with p = 0.005 for intuitiveness and comfort and p = 0.002 for effectiveness. The most critical feedback given by a participant regarding head gestures was: "Head gesture is going to make my neck sore and seems embarrassing in a professional situation".

We received a total of 48 emotion ratings for head gestures using the custom method described previously (i.e., 16 participants × 3 distributed emotions for head gestures). 28 ratings (58%) were given in the "Average Unpleasant" to "Calm Neutral" sectors (Figure 5(a)). The EmoCard results (Figure 5) are consistent with the results for intuitiveness, comfort and effectiveness of the Likert scale questionnaires.

Figure 5(a) shows a frequency slice chart that presents a histogram of the test subjects' ratings plotted on the EmoCard sectors. We can see that head gestures (green color) were rated more towards the neutral/unpleasant side of the EmoCard scale.

In an alternative analysis, we assigned an angle value to each EmoCard emotion (e.g., "Excited Neutral" = 0, "Calm Neutral" = 180, etc.) and calculated the average angle. Figure 5(b) shows the average EmoCard angles plotted in polar coordinates. It is clear that head gestures (green) are centered mostly around the "Calm Neutral" sector, whereas the hand gesture techniques are on average in the sector between "Calm Pleasant" and "Average Pleasant". An ANOVA on the converted EmoCard ratings shows a significant difference (F(1,44) = 4.666, p = 0.036). A Bonferroni post-hoc comparison shows a significant difference between head gestures and hand tracking (p = 0.030) as well as a marginally significant difference between head gestures and hand gestures (p = 0.056).
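As an aside, this sector-to-angle conversion can be illustrated with the sketch below. It is not the analysis code used for the study (which is not described in the paper); it assumes the eight EmoCard sectors are spaced 45 degrees apart, consistent with the anchor values quoted above, and it uses a vector (circular) mean to avoid wrap-around effects, which is our own choice.

import java.util.List;
import java.util.Map;

/** Illustrative sketch of the EmoCard sector-to-angle conversion (hypothetical code). */
public class EmoCardAngles {

    // Eight EmoCard sectors, assumed 45 degrees apart, anchored at the values from the text.
    static final Map<String, Double> SECTOR_ANGLE = Map.of(
            "Excited Neutral", 0.0,      "Excited Pleasant", 45.0,
            "Average Pleasant", 90.0,    "Calm Pleasant", 135.0,
            "Calm Neutral", 180.0,       "Calm Unpleasant", 225.0,
            "Average Unpleasant", 270.0, "Excited Unpleasant", 315.0);

    /** Mean direction of a list of sector ratings, using a vector mean (our assumption). */
    static double meanAngle(Iterable<String> ratings) {
        double sumSin = 0, sumCos = 0;
        for (String sector : ratings) {
            double rad = Math.toRadians(SECTOR_ANGLE.get(sector));
            sumSin += Math.sin(rad);
            sumCos += Math.cos(rad);
        }
        double deg = Math.toDegrees(Math.atan2(sumSin, sumCos));
        return (deg + 360) % 360;   // normalize to [0, 360)
    }

    public static void main(String[] args) {
        // Example: three hypothetical ratings clustered around the pleasant sectors.
        System.out.println(meanAngle(List.of("Calm Pleasant", "Average Pleasant", "Calm Neutral")));
    }
}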

A final comparison for the prestudy is the ranking of the proposed interaction techniques against the device's default interaction techniques (i.e. voice command interaction and touchpad interaction), as depicted in Figure 5(c).


[System diagram components: Camera Forwarder/Publisher, Backend, Glass App/Forwarder. Message types: hand location, gesture info, QR info, camera preview, received input, voice command, list item, selected display, selected item.]

Figure 6: This system diagram shows the direction and type of messages that are sent between the three software components that comprise the prototype.

This chart shows that voice interaction is disliked the most and that the proposed gestural technique is preferred by the participants of the prestudy.

The prestudy tells us that head gestural input for the prototype application is disliked more than hand tracking or hand gestural input. Therefore, we decided to exclude head gestural input from the prototype, since it seems that users would not like to use this type of interaction in the first place.

5. TECHNICAL OVERVIEW OF PROTOTYPE

This section describes all the hardware and software components that are used for the second user study's prototype application.

In Figure 6 the entire system overview is depicted. Each component is discussed in turn.

5.1 Depth camera

To be able to fetch the hand position in space (i.e. the hand's X, Y and Z position) and its pose, a depth camera is needed. The current generation of HMDs, specifically Google Glass (as used in the prototype), does not have the option to fetch visual information with respect to depth. Therefore, a separate hardware component is needed that is able to fetch this information. Within the prototype application a Creative SENZ3D¹ is used. This device has two cameras, enabling it to capture RGB images and depth images, which can be processed by a C++ application.

This C++ application processes the raw depth images using the Intel Perceptual Computing SDK² (PCSDK), OpenCV³ and ZBar⁴ libraries to publish hand tracking information, a low-resolution camera preview image stream and QR code information for use by the backend application.

The capturing device has been customized so that it can be worn by the user by means of a head strap (see figure 7). This substitutes for the scenario wherein the depth camera is actually embedded into the HMD. We designed a custom 3D-printed coupler to attach the capturing device to the head strap.

¹ The maximum IR depth resolution of this sensor is 320x240 and its diagonal field of view is 73°. The sensor captures depth and RGB images at a rate of 30 Hz.
² https://software.intel.com/en-us/vcsource/tools/perceptual-computing-sdk/home
³ http://opencv.org/
⁴ http://zbar.sourceforge.net/
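As background on how a metric hand position can be obtained from such a sensor, the sketch below back-projects a depth pixel to a camera-centered 3D point using a simple pinhole model derived from the specifications in footnote 1 (320x240 resolution, 73° diagonal field of view). It is illustrative only: the actual prototype relies on the PCSDK's own projection facilities (and is written in C++), and the ideal pinhole assumptions (principal point at the image centre, no distortion) are ours.

/**
 * Illustrative back-projection of a depth pixel to a 3D point, assuming an
 * ideal pinhole camera with the specs quoted above (320x240, 73 degree
 * diagonal FOV). This is a sketch, not the PCSDK code used in the prototype.
 */
public class DepthBackProjection {

    static final int WIDTH = 320, HEIGHT = 240;
    static final double DIAGONAL_FOV_DEG = 73.0;

    // Focal length in pixels derived from the diagonal field of view.
    static final double FOCAL_PX =
            (Math.hypot(WIDTH, HEIGHT) / 2.0) / Math.tan(Math.toRadians(DIAGONAL_FOV_DEG / 2.0));

    /** Returns {X, Y, Z} in the same unit as depth, camera-centered. */
    static double[] toCameraSpace(int u, int v, double depth) {
        double x = (u - WIDTH / 2.0) * depth / FOCAL_PX;
        double y = (v - HEIGHT / 2.0) * depth / FOCAL_PX;
        return new double[] { x, y, depth };
    }

    public static void main(String[] args) {
        // Example: a hand detected at pixel (200, 120) at 0.5 m depth.
        double[] p = toCameraSpace(200, 120, 0.5);
        System.out.printf("X=%.3f Y=%.3f Z=%.3f (focal ~%.1f px)%n", p[0], p[1], p[2], FOCAL_PX);
    }
}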

Figure 7: Depth camera head strap as a substitution for embedded hand tracking in the HMD.

Figure 8: Screenshot of the HMD application, showing the camera preview, a preview of the selected item and a text description of the selection.

5.2 Java Backend

The Java backend has been set up using the Processing libraries [20]. The backend receives the hand position and pose from the Visual Studio project and uses this information to detect selection events made by the user. It is also able to receive click events made by the HMD. These selection events consist of the following (a minimal sketch of the event detection is given after the list):

• Hand tracking: rapid shifts in the Z position of the hand.

• Hand gestures: shifts in hand openness (open vs. closed).

• Touchpad selection: tap events from a list view on the HMD.

• Voice command selection: voice commands from the HMD.
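A minimal sketch of how the first two event types could be detected from the streamed hand samples is shown below. It is illustrative only: the field names, the consecutive-sample comparison and the push threshold are assumptions on our part, not values taken from the prototype's source.

/**
 * Illustrative sketch of selection-event detection from streamed hand samples.
 * Thresholds and structure are hypothetical, not the prototype's actual values.
 */
public class SelectionDetector {

    /** One hand sample as published by the camera forwarder (assumed format). */
    public static class HandSample {
        public final double z;          // distance from the camera, in meters
        public final boolean handOpen;  // openness estimate from the hand pose
        public HandSample(double z, boolean handOpen) { this.z = z; this.handOpen = handOpen; }
    }

    private HandSample previous;
    private static final double PUSH_THRESHOLD_M = 0.08; // rapid Z shift between samples

    /** Call once per incoming sample; prints the detected event, if any. */
    public void onSample(HandSample current) {
        if (previous != null) {
            // Hand tracking: a rapid shift towards the camera along Z is a "push".
            if (previous.z - current.z > PUSH_THRESHOLD_M) {
                System.out.println("push selection");
            }
            // Hand gestures: a change in openness is a "grasp" or a "release".
            if (previous.handOpen && !current.handOpen) {
                System.out.println("grasp");
            } else if (!previous.handOpen && current.handOpen) {
                System.out.println("release");
            }
        }
        previous = current;
    }
}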

Next to monitoring selection events, the backend produces the visualization on the displays in the smart office space, and it depicts visual identifiers (i.e. QR codes) on the displays that can be picked up by the depth camera. When selection events are produced by the backend, it adapts the user interface of the display to the received selection.

5.3 HMD

The HMD is used to give visual feedback to the user while using the prototype with hand tracking and hand gestural input, as depicted in figure 8. A square live image feed is transferred from the Visual Studio project to the Google Glass. It is used to give the user a better sense of the current view direction of the depth and RGB camera.

Figure 9: User study in progress

The voice command option in the prototype uses Google's voice recognition software, which is also used in the system's default voice command environment. To improve the detection accuracy for the command set, a Hamming distance measure is incorporated for finer-grained string matching. This makes the voice command environment less strict: a recognized string that differs from a command by up to 2 characters is still accepted, so that the intended command is forwarded to the Java backend. Finally, the HMD has a list view option wherein a vertical list is presented. The user can swipe through this list, and on each tap event a message containing the selected item is forwarded to the Java backend.
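The idea behind the Hamming-distance matching described above can be illustrated with the following sketch. The command set, the equal-length restriction of the Hamming distance and the surrounding structure are assumptions for illustration; only the tolerance of a 2-character difference is taken from the prototype description.

import java.util.List;

/** Illustrative sketch of Hamming-distance based voice command matching. */
public class VoiceCommandMatcher {

    // Hypothetical command set; the prototype's actual vocabulary is not listed in the paper.
    static final List<String> COMMANDS =
            List.of("select display one", "select display two", "transfer item");
    static final int MAX_DISTANCE = 2; // up to two differing characters are tolerated

    /** Hamming distance is only defined for equal-length strings. */
    static int hamming(String a, String b) {
        if (a.length() != b.length()) return Integer.MAX_VALUE;
        int d = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) d++;
        }
        return d;
    }

    /** Returns the best-matching command, or null if nothing is within the tolerance. */
    static String match(String recognized) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String command : COMMANDS) {
            int d = hamming(recognized.toLowerCase(), command);
            if (d < bestDistance) {
                bestDistance = d;
                best = command;
            }
        }
        return bestDistance <= MAX_DISTANCE ? best : null;
    }

    public static void main(String[] args) {
        // "select displai one" differs from "select display one" in one character.
        System.out.println(match("select displai one"));
    }
}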

5.4 ZeroMQ frame forwarder

The last component of the system is the ZeroMQ [21] frame forwarder. This piece of software sends information from one actor to another and handles the queuing of data packets automatically. With all actors connected to a local hotspot, ZeroMQ uses publish/subscribe semantics, enabling it to push these data packets from one socket to another. ZeroMQ has been implemented in all three software components.
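As an illustration of this publish/subscribe forwarding, the sketch below uses the JeroMQ Java binding of ZeroMQ to relay messages between publishers and subscribers through an XSUB/XPUB proxy. The socket addresses and the choice of an XSUB/XPUB proxy are assumptions for illustration; the paper does not detail how the prototype's forwarder is configured.

import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;

/**
 * Illustrative ZeroMQ frame forwarder: publishers (e.g. the camera app and the
 * Glass app) would connect to the XSUB side, subscribers (e.g. the backend) to
 * the XPUB side. Addresses and topics are assumptions, not the prototype's.
 */
public class FrameForwarder {
    public static void main(String[] args) {
        try (ZContext context = new ZContext()) {
            ZMQ.Socket frontend = context.createSocket(SocketType.XSUB);
            frontend.bind("tcp://*:5559");   // publishers connect here
            ZMQ.Socket backend = context.createSocket(SocketType.XPUB);
            backend.bind("tcp://*:5560");    // subscribers connect here

            // Blocks and relays messages (and subscriptions) in both directions.
            ZMQ.proxy(frontend, backend, null);
        }
    }
}

A publisher would then connect a PUB socket to the XSUB endpoint and send topic-prefixed frames (e.g. a hypothetical "handLocation" topic), while a subscriber connects a SUB socket to the XPUB endpoint and subscribes to the topics it needs.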

In the future, hardware manufacturers might consider embedding this type of camera into existing HMDs to give developers the chance to optimize the user experience of these devices.

6. USER STUDY

The second user study was set up to provide quantitative usability information on an actual prototype (see figure 9) implementing the scenario discussed in section 4 (i.e. the prestudy).

The participants of the user study were asked to use each interaction technique in turn. They needed to perform 2 tasks with each interaction technique. These tasks consisted of:

1. Move a visual item from display 1 to display 2.

2. Move a visual item from display 1 to display 2 and transfer it back to display 1.

While performing the tasks we measured the total task completion time, the selection speed per subtask (see 3.1) and the errors (i.e. incorrect selections).

When the participants were done using an interaction technique, they were asked to fill in a questionnaire (see Appendix B for the user study questionnaire), which was used to see whether the prestudy results were also reflected in the actual user study. This questionnaire consisted of three parts.

1. Likert scale questions in the format of the SUS (System Usability Scale) from Brooke [3]

2. Likert scale questions with respect to effectiveness, learnability, comfort, practicalness and intuitiveness (i.e. just as in the prestudy)

3. Ranking of interaction techniques per task.

Before we measured the total task completion time, selection speed per subtask and errors, the participants received instructions on how to use the system. They were instructed to practise with the system for 15 minutes to get a feeling for how to interact with it. After they got to know the interaction technique, the measurement started.

To recreate the scenario from the prestudy, the test environment consisted of two big LCD screens (on which the prototype's visualization was depicted) in an office conference room. Each participant sat at the table in front of both screens and conducted both tasks for each interaction technique. 12 participants from an industrial research lab, aged 23 to 44, participated in the user study.

6.1 Findings

The total task times from the user study (see figures 10(a) and 10(b)) tell us that the proposed modalities (i.e. hand tracking and hand gestures) outperformed voice command control in terms of task completion time. An ANOVA shows that there was a significant difference in the task completion times for task (2) (F(3,44) = 5.516, p = 0.003).

For task (2), list view was the technique with the lowest total execution time of 49.92 s, followed by hand gestures at 69.33 s and hand tracking at 75.92 s. Voice command had the highest total execution time at 95.92 s. However, a Bonferroni pairwise comparison showed only a significant difference between the touchpad and voice command techniques.

The numbers of errors from the user study are shown in figures 10(c) and 10(d). These graphs tell us that the participants made more mistakes while using the voice command interaction technique. For task 1, the participants made an average of 1 error per task against an average of 0.33 for both hand gestures and hand tracking. No errors were observed for the list view modality. Task 2 shows an average of 0.75 errors for voice command, 0.25 for hand tracking and 0.17 for hand gestures. The list view technique showed an average of 0.09 errors.

We conducted a Kruskal-Wallis test on the error count measure. The only observable statistically significant difference was for task (1), where touchpad and voice command differed significantly (p = 0.006).

We obtained an average SUS score (see figure 10(e)) of 51 for list view, voice command and hand tracking, and 49 for hand gestures. Not surprisingly, an ANOVA showed that there was no statistically significant difference in the SUS scores. We should note, however, that the score for all techniques was rather low on the SUS scale (which ranges from 0 to 100). The somewhat diffuse SUS results suggest that this questionnaire might not be entirely suitable to evaluate the usability of the post-desktop interactions we discuss in this paper.

(a) Total time, task 1 (b) Total time, task 2 (c) Errors, task 1 (d) Errors, task 2 (e) SUS score

Figure 10: Quantitative results of our main user study. Note: the error bars on boxplots (a) and (b) show the 95% confidence interval, whereas the bars on plots (c) and (d) show one standard deviation from the mean.

Finally, the Likert scale questions with respect to effectiveness, learnability, comfortability, practicalness and intuitiveness, which were also asked in the prestudy questionnaire, showed no significant difference between the different interaction techniques the participants used. Therefore, we exclude the results from this part of the questionnaire.

7. FURTHER APPLICATIONS OF HMD-BASED SPATIAL INTERACTION

Apart from the application and scenario we have already discussed in this paper, there are numerous possibilities for leveraging the spatial input properties afforded by our prototype. With ego-centric tracking, users can perform spatial interactions with any type of connected device (i.e., reachable via TCP/IP or another communication channel) in the environment, without requiring further instrumentation of the device to track user inputs. In the following, we describe two demonstration applications we have developed based on our prototype.

As our prototype allows full 3D capture of the user's hand, we can, for example, implement user interfaces for 3D scene viewing on ubiquitous displays very easily (see the demo application in Figure 11(a)). A further advantage of ego-centric tracking here is that, e.g., for very large displays (e.g., in public spaces) supporting interaction by larger numbers of users, there is no sensor scalability or coverage problem, as each user is equipped with a personal sensing device.

(a) Interaction with 3D content on a large display.

(b) Ambient light control through in-air gestures.

Figure 11: Two demonstration applications we implemented using our prototype.

Another application example for our prototype is controlling smart room appliances such as ambient lighting (see Figure 11(b)). In contrast to previous work in this domain [4], we believe that interacting directly through gestures can be more effective than using GUI-based controls on the HMD. Gestures, for example, offer more possibilities to fine-tune parameters that are of interest when controlling ambient lighting, for instance the lights' brightness and hue settings. Also, targeting the ambient devices can be accomplished by visual selection, i.e., turning the head towards the device, or by direct pointing via a gesture.

8. CONCLUSIONS

We showed several methods for interacting with an HMD in a smart space environment where users can select and manipulate visual items on multiple displays in an ego-centric manner. We proposed the following main research question: "Does hand tracking and hand gestural input outperform the head orientation, audio-based or list-based interaction on HMDs, when selecting and manipulating visual items on multiple displays?"

The prestudy tells us that the hand tracking and hand gestural interaction techniques, based on intuitiveness, comfortability and effectiveness, outperformed the head gesture interaction technique. Also, the participants of the prestudy disliked the head gesture interaction technique more than the hand tracking and hand gesture interaction techniques. The user study found that there is a clear difference in the completion time of the tasks that needed to be performed with the prototype setup. It took the participants significantly longer to move visual items from one screen to another while using the voice command interaction technique. Also, based on the number of errors the participants made, we see that hand tracking and hand gestural input outperformed the voice command interaction technique. However, touchpad interaction was not outperformed by the proposed interaction techniques.

Since there is no significant difference in the SUS results, the proposed types of interaction are not disliked more than the existing modalities (i.e. voice command input and touchpad input). Therefore, it seems that there is a future for the proposed interaction techniques.

Still, we cannot conclude that hand tracking and hand gestural input outperformed all the default interaction techniques, since the list view (i.e. touchpad interface) gave similar results to the proposed interaction techniques. However, the user studies did tell us that the proposed modalities outperform the voice command interaction technique. Therefore, embedding a depth camera inside an HMD, which enables hand gestural input, can be valuable since it can enrich the user experience of these devices.

9. FUTURE WORK

We presented interaction techniques for HMDs in an ego-centric manner. However, further research might be conducted in the field of tracking the user in an exocentric manner and whether this would outperform the ego-centric interaction techniques.

Next to the scenario of a multi-display smart office environment, the interaction techniques can also be prototyped for different types of applications. This might give better insight into the types of environment in which the proposed interaction would (not) work. We already contributed two examples of HMD-based spatial interaction, but there are many others to think of.

Finally, the research could gain new insights once the depth camera is embedded into the HMD. This would make using the proposed interaction techniques less cumbersome for participants. They might feel more at ease when using the prototype and could be more positive about using it.

10. REFERENCES

[1] Anshu Agarwal and Andrew Meyer. Beyond usability: evaluating emotional response as an integral part of the user experience. In CHI’09 Extended Abstracts on Human Factors in Computing Systems, pages 2919–2930. ACM, 2009.

[2] Sebastian Boring, Dominikus Baur, Andreas Butz, Sean Gustafson, and Patrick Baudisch. Touch projector: mobile interaction through video. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 2287–2296. ACM, 2010.

[3] John Brooke. SUS - a quick and dirty usability scale. Usability Evaluation in Industry, 189:194, 1996.

[4] Yu-Hsiang Chen, Ben Zhang, Claire Tuna, Yang Li, et al. A context menu for the real world: Controlling physical appliances through head-worn infrared targeting. Technical report, DTIC Document, 2013.

[5] Andrea Colaço, Ahmed Kirmani, Hye Soo Yang, Nan-Wei Gong, Chris Schmandt, and Vivek K. Goyal. Mime: compact, low power 3d gesture sensing for interaction with head mounted displays. In Proceedings of the 26th Annual ACM Symposium on User Interface Software and Technology, pages 227–236. ACM, 2013.

[6] Alberto Ferrari and Marco Tartagni. Touchpad providing screen cursor/pointer movement control, May 21 2002. US Patent 6,392,636.

[7] Google. Google Glass - http://www.google.com/glass/start/, July 2014.

[8] Morton L. Heilig. Stereoscopic-television apparatus for individual use, October 4 1960. US Patent 2,955,156.

[9] C. Keskin, A. Erkan, and L. Akarun. Real time hand tracking and 3d gesture recognition for interactive interfaces using hmm. ICANN/ICONIPP, 2003:26–29, 2003.

[10] Steve Mann. “smart clothing”: wearable multimedia computing and “personal imaging” to restore the technological balance between people and their environments. In Proceedings of the fourth ACM international conference on Multimedia, pages 163–174. ACM, 1997.

[11] Diako Mardanbegi and Dan Witzner Hansen. Mobile gaze-based screen interaction in 3d environments. In Proceedings of the 1st Conference on Novel Gaze-Controlled Applications, page 2. ACM, 2011.

[12] Shwetak N. Patel and Gregory D. Abowd. A 2-way laser-assisted selection scheme for handhelds in a physical environment. In UbiComp 2003: Ubiquitous Computing, pages 200–207. Springer, 2003.

[13] Pebble. Pebble watch - https://getpebble.com/, July 2014.

[14] Marco Porta and Matteo Turina. Eye-S: a full-screen input modality for pure eye-based communication. In Proceedings of the 2008 Symposium on Eye Tracking Research & Applications, pages 27–34. ACM, 2008.

[15] Razer. Razer nabu - http://www.razerzone.com/nabu,


APPENDIX B: USER STUDY QUESTIONNAIRE

Hand tracking Input

Effectiveness

This interaction technique would work effective for me.

Strongly Disagree Disagree Disagree somewhat Undecided Agree somewhat Agree Strongly Agree

Learnability

This interaction technique would be easy to learn for me.

Strongly Disagree Disagree Disagree somewhat Undecided Agree somewhat Agree Strongly Agree

Comfortability

This interaction technique would feel comfortable to me.

Strongly Disagree Disagree Disagree somewhat Undecided Agree somewhat Agree Strongly Agree

Practical

This interaction technique would be practical in use for me.

Strongly Disagree Disagree Disagree somewhat Undecided Agree somewhat Agree Strongly Agree

Intuitivity

This interaction technique would be intuitive in use for me.

Strongly Disagree Disagree Disagree somewhat Undecided Agree somewhat Agree Strongly Agree

I think that I would like to use this system frequently.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I found the system unnecessarily complex.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I thought the system was easy to use.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I think that I would need the support of a technical person to be able to use this system.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I found the various functions in this system were well integrated.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I thought there was too much inconsistency in this system.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I would imagine that most people would learn to use this system very quickly.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I found the system very cumbersome to use.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I felt very confident using the system.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I needed to learn a lot of things before I could get going with this system.

Strongly Disagree Disagree Undecided Agree Strongly Agree

Hand gestural Input

Effectiveness

This interaction technique would work effective for me.

Strongly Disagree Disagree Disagree somewhat Undecided Agree somewhat Agree Strongly Agree

Learnability

This interaction technique would be easy to learn for me.

Strongly Disagree Disagree Disagree somewhat Undecided Agree somewhat Agree Strongly Agree

Comfortability

This interaction technique would feel comfortable to me.

Strongly Disagree Disagree Disagree somewhat Undecided Agree somewhat Agree Strongly Agree

Practical

This interaction technique would be practical in use for me.

Strongly Disagree Disagree Disagree somewhat Undecided Agree somewhat Agree Strongly Agree

Intuitivity

This interaction technique would be intuitive in use for me.

Strongly Disagree Disagree Disagree somewhat Undecided Agree somewhat Agree Strongly Agree

I think that I would like to use this system frequently.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I found the system unnecessarily complex.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I thought the system was easy to use.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I think that I would need the support of a technical person to be able to use this system.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I found the various functions in this system were well integrated.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I thought there was too much inconsistency in this system.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I would imagine that most people would learn to use this system very quickly.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I found the system very cumbersome to use.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I felt very confident using the system.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I needed to learn a lot of things before I could get going with this system.

Strongly Disagree Disagree Undecided Agree Strongly Agree

Voice Command Input

Effectiveness

This interaction technique would work effective for me.

Strongly Disagree Disagree Disagree somewhat Undecided Agree somewhat Agree Strongly Agree

Learnability

This interaction technique would be easy to learn for me.

Strongly Disagree Disagree Disagree somewhat Undecided Agree somewhat Agree Strongly Agree

Comfortability

This interaction technique would feel comfortable to me.

Strongly Disagree Disagree Disagree somewhat Undecided Agree somewhat Agree Strongly Agree

Practical

This interaction technique would be practical in use for me.

Strongly Disagree Disagree Disagree somewhat Undecided Agree somewhat Agree Strongly Agree

Intuitivity

This interaction technique would be intuitive in use for me.

Strongly Disagree Disagree Disagree somewhat Undecided Agree somewhat Agree Strongly Agree

I think that I would like to use this system frequently.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I found the system unnecessarily complex.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I thought the system was easy to use.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I think that I would need the support of a technical person to be able to use this system.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I found the various functions in this system were well integrated.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I thought there was too much inconsistency in this system.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I would imagine that most people would learn to use this system very quickly.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I found the system very cumbersome to use.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I felt very confident using the system.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I needed to learn a lot of things before I could get going with this system.

Strongly Disagree Disagree Undecided Agree Strongly Agree

List view input

Effectiveness

This interaction technique would work effective for me.

Strongly Disagree Disagree Disagree somewhat Undecided Agree somewhat Agree Strongly Agree

Learnability

This interaction technique would be easy to learn for me.

Strongly Disagree Disagree Disagree somewhat Undecided Agree somewhat Agree Strongly Agree

Comfortability

This interaction technique would feel comfortable to me.

Strongly Disagree Disagree Disagree somewhat Undecided Agree somewhat Agree Strongly Agree

Practical

This interaction technique would be practical in use for me.

Strongly Disagree Disagree Disagree somewhat Undecided Agree somewhat Agree Strongly Agree

Intuitivity

This interaction technique would be intuitive in use for me.

Strongly Disagree Disagree Disagree somewhat Undecided Agree somewhat Agree Strongly Agree

I think that I would like to use this system frequently.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I found the system unnecessarily complex.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I thought the system was easy to use.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I think that I would need the support of a technical person to be able to use this system.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I found the various functions in this system were well integrated.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I thought there was too much inconsistency in this system.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I would imagine that most people would learn to use this system very quickly.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I found the system very cumbersome to use.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I felt very confident using the system.

Strongly Disagree Disagree Undecided Agree Strongly Agree

I needed to learn a lot of things before I could get going with this system.

Strongly Disagree Disagree Undecided Agree Strongly Agree

Rank your favorite interaction technique per task (1 is preferred the most and 4 is disliked the most).

Select a display

Voice Command Input List view/ touchpad Input Hand Tracking Input Hand Gestures Input

Select a visual item/thumbnail on screen

Voice Command Input List view/ touchpad Input Hand Tracking Input Hand Gestures Input

Transfer the visual item/thumbnail by selecting a display

Voice Command Input List view/ touchpad Input Hand Tracking Input Hand Gestures Input

If you have further comments or remarks regarding the interaction techniques, please list them below (i.e. which interaction technique would you prefer/dislike the most and why?):
