User-Centered 3D Manipulation Model
for Scientific Visualization
Nian Liu
nian.liu@hotmail.com
August 11, 2015
Host Organization: Centrum Wiskunde & Informatica Company Supervisor: Prof. Dr. Robert van Liere
Abstract
3D interaction with scientific data has been a research topic in the human-computer interaction community for many years. A popular approach to interaction has been to employ the 3D Widget Model. However, research has shown that the 3D Widget Model is neither intuitive nor efficient to use with 3D controllers. In this project, the author proposes a user-centered approach, called the Triangle Model, as an alternative to the object-centered 3D Widget Model. With this approach, users can manipulate scientific data in a more intuitive and efficient manner using their bare hands. A feasibility study of the Triangle Model has been conducted, and the manipulation techniques of translation, rotation and scaling have been specified in detail.
Table of Contents

Introduction
  Background
    3D Widget Model
    Leap Motion
  Research Definition
  Research Question
  Thesis Structure
Related Work
  Hand Tracking Techniques
    Glove-Based Techniques
    Vision-Based Techniques
  Gesture Techniques
    Static Single-Hand Gestures
    Dynamic Double-Hand Gestures
  3D Widget Model
Experiment
  Design
    Translation Task
    Rotation Task
    Scaling Task
  Setup
  Participants
  Procedure
  Analysis
  Results
    Translation Tasks
    Rotation Tasks
    Scaling Tasks
  Discussion
    Single-hand vs. Two-hand
    Inside vs. Outside Interaction Box
    Triangle Size
  Lessons Learned
Triangle Model
  State Diagram
  Reference Frame
  Operations
    Translation
    Rotation
    Scaling
Conclusion and Future Work
Chapter 1
Introduction
The research project presented in this thesis concerns the development of a user-centered model for the 3D manipulation of objects in interactive scientific visualization applications.
Background
According to McCormick’s definition, scientific visualization (SV) makes use of computer graphics to create visual images which aid in the understanding of complex, often massive scientific data [1, 2]. It is primarily concerned with three-dimensional visualization, especially in fields like architecture, meteorology, biology and medical science [3]. An interactive SV environment provides insight into the data by giving users control over how the data is presented and explored. The 3D Widget Model is a popular model for object interaction in SV environments. In Virtual Reality (VR), 3D controllers (e.g. wands and trackers) can be used to control the behavior of the widget model.
3D Widget Model
As an interaction technique, the 3D Widget Model encapsulates the geometry and behavior of an application object into a corresponding widget that provides control and displays information [4]. An example is shown in Figure 1.1. Different types of manipulation, like translation, rotation and scaling, are encapsulated into handles attached to the object. The user performs the desired manipulation by operating the corresponding handle.
However, the 3D Widget Model has some drawbacks. Wiegers’ research [5] has shown that the widget model is not efficient when used with 3D controllers. Moreover, as an object-centered model, the 3D Widget Model requires the user to go to the objects to perform an operation, whereas a user-centered approach brings the objects to the user. In this respect, the 3D Widget Model is not intuitive to use.
The definition of 3D Widget Model and the evidence of its drawbacks will be discussed in detail in Section 2.3.
Figure 1.1: 3D widget¹ [4]    Figure 1.2: Leap Motion controller²
(1) Object-centered (2) User-centered
Figure 1.3: Object-centered vs. User-centered
Leap Motion
The Leap Motion controller (see Figure 1.2) is a camera-based hand tracking device released in 2013. It can track all ten fingers with an accuracy of up to 1/100th of a millimeter. This new technology has brought many new possibilities to human-computer interaction and has gained a lot of attention from the Virtual Reality community in recent years.
¹ Snibbe, S.S., accessed 15 June 2015, http://www.snibbe.com/research/widgets
² RobotShop, accessed 17 June 2015,
There are many choices of 6DOF (degrees of freedom) controllers, for instance data gloves, the VR Cave and the Personal Space Station. The author chose the Leap Motion because it does not require the user to wear any extra devices and is easy to set up.
However, camera-based tracking suffers from one common drawback: the occlusion problem, demonstrated in Figure 1.3. When hands or fingers overlap during tracking, it can be difficult for the camera to determine their gestures and positions. In such scenarios, the camera can lose track of the hands, which affects the accuracy and robustness of the retrieved data.
Figure 1.3: A demonstration of occlusion problem
Research Definition
As mentioned in the previous section, the current 3D Widget Model has drawbacks that make it neither intuitive nor efficient to use. For these reasons, we propose a user-centered model, called the Triangle Model, as an alternative to the 3D Widget Model. With the Triangle Model, users can manipulate 3D scientific data in an intuitive and efficient manner using their hands. Our initial idea is to use three fingers of one hand to perform the manipulation. We need three fingers because three points form a triangle (see Figure 1.4), and with this triangle we can achieve operations in six degrees of freedom. The author will discuss the specification of the Triangle Model in detail in Chapter 4.
Figure 1.4: The triangle formed by three fingers
Because camera-based tracking suffers from the occlusion problem, and because we had limited knowledge about how robustly the Leap Motion handles this problem, the author decided to conduct a robustness study at the beginning of the research.
Research Question
With all the concerns mentioned above, the research questions can be formulated as follows:
1. Can the Leap Motion detect the Triangle Model accurately and robustly?
   a. Can the Leap Motion detect the triangles of both hands at the same time?
   b. Can the Leap Motion detect small triangles?
2. How to specify the manipulation techniques of the Triangle Model?
   a. How to specify the translation technique?
   b. How to specify the rotation technique?
   c. How to specify the scaling technique?
Thesis Structure
Chapter 2 provides an overview of related research.
Chapter 3 presents a docking experiment that the author conducted to study the robustness of the Leap Motion and to verify the feasibility of our concept.
Chapter 4 describes the specification of our Triangle Model in detail.
Chapter 5 concludes this research and gives some recommendations for future work.
Chapter 2
Related Work
In this chapter, an overview of related research will be presented from three perspectives. Firstly, we will look at different sensing techniques for gesture recognition. Secondly, we will present two gesture techniques. Finally, we will discuss an interaction technique, 3D Widget Model, in detail.
Hand Tracking Techniques
The current hand tracking techniques can be classified into various categories.
Glove-Based Techniques
Glove-based techniques (see Figure 2.1) require users to wear a glove-type instrument as the input device, fitted with sensors that collect data about the user’s hand. The data provided by this device contains information on the position and orientation of the palm and fingers. Such techniques have been applied in fields like sign language understanding, robotics, healthcare, education and training.
In general, the drawbacks of glove-based techniques are [7, 8, 9]:
• High cost
• Limited range of motion
• Wearing the glove can make this technique cumbersome
• Need for calibration for new users
Vision-Based Techniques
Vision-based techniques (see Figure 2.2) use cameras to recognize hand gestures. In contrast to glove-based techniques, vision-based techniques do not require users to wear any extra devices; thus, they can interact with the computer directly using their bare hands. Such techniques can be further divided into two categories: appearance-based and model-based approaches.
Figure 2.1: Glove-based techniques [7] Figure 2.2: Vision-based techniques [8]
Appearance-Based Approaches
Appearance-based approaches use image features (e.g. edges and color blobs) to learn a direct mapping from the input image to the user’s hand space [11]. This type of approach does not use a spatial representation; therefore, there is no need to extract information by searching through the whole configuration space, as the information can be derived directly from the 2D images.
One popular approach is to look for skin-colored regions in the image [12, 13]. Although this approach has been widely used, it does have some limitations. First, it is very sensitive to lighting conditions. Second, adapting a flexible skin model can be challenging.
Another classical approach is to use the eigenspace. Barhate et al. [14] developed a robust shape-based tracker using an eigenspace approach (the so-called Eigentracker). This tracker can successfully track the motion of both hands across all possible dynamics during occlusion, and under a wide range of illumination conditions.
The appearance-based approach has the advantage of less expensive computation, because of the easier feature extraction in 2D images, but has the disadvantage of high sensitivity to noise and partial occlusion [11].
Model-Based Approaches
Model-based approaches rely on a 3D kinematic hand model with considerable DOFs, and search the large configuration space to find the best matching hypothesis between the hand model and the input image [11].
Rehg and Kanade proposed one of the earliest model-based approaches in 1994 [15]. They developed a tracking system, called DigitEyes, which can recover the state of a 27 DOF hand model from ordinary grayscale images at the speed of up to 10 Hz.
A new technology, the depth camera (e.g. Kinect), became available in the past few years and has often been adopted in motion recognition techniques. It provides area and edge information from the images, with which the user’s hands can be more easily distinguished from other objects [11]. Oikonomidis et al. [16] proposed a model-based approach that integrated the Kinect sensor with their hand tracking algorithm. This method can track the full pose of a hand in complex articulation robustly and efficiently.
Model-based approaches can model arbitrary hand poses and robustly handle partial occlusion, but are computationally more expensive [11].
Gesture Techniques
Gesture techniques specify predefined gestures based on the information gathered from the tracking techniques. Such gestures can be static, dynamic, single-hand or double-hand gestures.
Static Single-Hand Gestures
Kim et al. [17] built an immersive 3D modeling system that allows the user to perform non-trivial 3D tasks using a few static gestures. For the system to recognize the user’s head and hands, he or she needs to wear white paper-rimmed polarized glasses and a fingertip marker. As shown in Figure 2.3, five single-hand gestures were introduced in this system:
• Pause: Open the whole hand.
• Point: Close the hand except for the thumb and index finger. The object is picked based on the direction of the index finger.
• Grab: Close the hand while pinching the thumb and index finger, and then move it.
• Rotate: Open the hand while pinching the thumb and index finger, and then rotate it.
• Scale: Open the hand while pinching the thumb and middle finger, and then move it.
(1) Pause (2) Point (3) Grab (4) Rotate (5) Scale
Figure 2.3: Gestures proposed by Kim et al. [17]
Dynamic Double-Hand Gestures
Bettio et al. [18] proposed a practical approach for spatial manipulation. They used the vision-based markerless technique for hand tracking and several two-hand gestures for interaction. Figure 2.4 demonstrates the gestures defined in this approach:
• Idle: Closing both hands initiates the gesture.
• Translation: Moving two hands in parallel.
• Rotation: Rotating two hands around their barycenter.
• Scaling: Moving two hands apart, or moving a hand closer to the other one.
(1) Idle (2) Translate (3) Rotate (4) Scaling
Figure 2.4: Gestures proposed by Bettio et al. [18]
3D Widget Model
3D Widget Model is a popular approach for high-level interaction. A widget can be defined as an encapsulation of geometry and behavior used to control or
display information about application objects [4]. This model was mainly designed for 2D controllers, such as standard PC mice, to perform manipulations with six degrees of freedom (6DOF). Since a 2D controller can only manipulate two of the three dimensions at a time, two operations are required to manipulate three variables. 3D controllers, in contrast, can manipulate all three dimensions simultaneously.
However, with 3D controllers, some drawbacks can be identified as follows:

1. Not efficient
Wiegers et al. conducted a study [5] on the performance of the widget model, comparing 2D and 3D controllers. Their study showed that the efficiency of using a 3D controller was not very high. One of the major problems with interaction in three dimensions is the lack of depth perception [6]: determining which object is closer than the others can often be very hard, which makes selecting objects difficult.
2. Not intuitive
The 3D Widget Model is an object-centered model that requires the user to go to the objects. In contrast, a user-centered approach brings the objects to the user. Comparing the two, the object-centered approach requires the user to change their behavior to accommodate the object being operated on, while the user-centered one does not. Therefore, the 3D Widget Model is not intuitive to use.
In summary, hand tracking techniques are closely related to the hardware and vary based on the technology chosen for hand recognition. Such techniques provide fundamental information at a very low level. Gesture techniques use this information to define the gestures to be used in a system; the gestures can be static, dynamic, single-handed or two-handed. The 3D Widget Model determines the way of interaction at a high level. In this project, the objective was to propose an alternative (the Triangle Model) to this model using hand gestures. As a vision-based technique, the Leap Motion was chosen for tracking the user’s hands. The model and its gestures will be specified in Chapter 4, Triangle Model.
Chapter 3
Experiment
This experiment was conducted to answer Research Question 1 by studying the robustness of the Triangle Model when triangles are sensed with the Leap Motion. The experiment was designed with three types of docking tasks (translation, rotation and scaling), based on the initial concept of the Triangle Model.
Design
When the Leap Motion recognizes a user's hand, a triangle is drawn on the screen based on the positions of that hand’s thumb, index, and middle fingers. The blue triangle represents the left hand; the yellow triangle represents the right hand.
Figure 3.1: Forming a triangle based on the fingers’ positions
For each docking task, the user is asked to manipulate the source object (in red) to match one of the attributes of position, orientation and size of the target object (in green). The manipulation starts when a triangle intersects the source object. The task completes once the source object matches the target object, and the experiment switches to the next task if any tasks remain.
Interaction techniques for the 3D manipulation include three fundamental tasks: object translation, object rotation and object scaling [19]. Therefore, the author decided to design docking tasks for each of them.
Translation Task
Translation task (see Figure 3.2) requires the user to translate the Red Square to match the position of the Green Square with the triangle formed by the thumb, index, and middle fingers.
Figure 3.2: Translation task
Rotation Task
Rotation task (see Figure 3.3) requires the user to rotate the Red Pie to match the orientation of the Green Pie with the triangle formed by the thumb, index, and middle fingers.
Scaling Task
Scaling task (see Figure 3.4) requires the user to change the size of Red Square to match the size of the Green Square with the triangle formed by the thumb, index, and middle fingers.
Figure 3.4: Scaling task
Software Implementation
The author spent two and a half weeks implementing the program for this experiment. The program was written in Java, and the GUI library used for this development was JavaFX. The source code comprises around 3192 lines and can be accessed via https://github.com/NianLiu89/UserCentered3D. As shown in Figure 3.5, the Model-View-Controller pattern was adopted.
Model
The Model package consists of three classes: Triangle, Record, and Logging. Triangle Class contains the coordinates of the triangle formed by user’s three fingers. Record Class contains the result of each experiment. There are two responsibilities of Logging Class: one is to store the experiment records into the database; the other one is to write the raw data gathered from Leap Motion into separate data files.
View
The View package contains the FXML files that implement the user interface for all types of experiment tasks: translation, rotation and scaling.
Controller
The Controller package contains the LeapMotionController and the controller classes corresponding to the views. The LeapMotionController sets up the connection with the Leap Motion device and reads the coordinates of the three fingers at run-time. The controller classes of the views define how the system reacts to user input and manipulate the Triangle defined in the Model package.
Utility
The Utility package consists of two classes: CoordinateMapper and Mathematics. The CoordinateMapper Class creates a mapping between the coordinate system of the Leap Motion and that of the application. With this class, a coordinate in one system can easily be converted to the other. The Mathematics Class contains methods to calculate the cross product, dot product, normal vector, etc.
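To illustrate the role of the CoordinateMapper, a minimal sketch of such a mapping is given below. The class name mirrors the one described above, but the Leap Motion working range and the linear mapping are assumptions for illustration, not the actual code from the repository.

```java
// Illustrative sketch of a coordinate mapper between the Leap Motion
// coordinate system (millimeters above the device) and the application
// window (pixels). The working range is an assumed value.
public class CoordinateMapper {
    // Assumed horizontal working range of the Leap Motion, in mm.
    static final double LEAP_MIN_X = -120.0, LEAP_MAX_X = 120.0;
    // Screen width used in the experiment setup (1600 x 900).
    static final double APP_WIDTH = 1600.0;

    // Linearly map a Leap x-coordinate (mm) to an application x-coordinate (px).
    public static double mapX(double leapX) {
        double t = (leapX - LEAP_MIN_X) / (LEAP_MAX_X - LEAP_MIN_X);
        return t * APP_WIDTH;
    }
}
```

With this mapping, the center of the Leap Motion range (x = 0 mm) lands at the middle of the screen, and the range endpoints land at the screen edges.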
Setup
The computer used for running the experiment program was a Lenovo ThinkPad T420. The screen size was 14 inches and the resolution was set to 1600 x 900. The Leap Motion controller, serving as the hand tracking device, was placed between the user and the computer. The user could place his or her hands around 20 centimeters above the controller to interact with the system.
Figure 3.6: The experiment setup
Participants
Fifteen unpaid volunteers participated as test subjects in this experiment, including students from the University of Amsterdam and colleagues from Centrum Wiskunde & Informatica (CWI). All participants were right-handed and had no experience using the Leap Motion.
Procedure
At the beginning of each experiment, the instructor gave a verbal explanation of the tasks and interface and demonstrated the hand tracking with the Leap Motion. The experiment consisted of three subtasks for each type of manipulation, resulting in a total of nine docking tasks per subject. All subjects received exactly the same sequence of tasks.
Before each task started, an instruction was displayed at the top of the screen. The subjects were encouraged to ask questions if anything about the task or interface was unclear. However, once a task started, the subjects had to perform the operation unaccompanied.
The sequence of experiment tasks can be described as follows:

1. Translation Tasks
   a. Single-hand: Subjects were required to perform the translation task using one of their hands.
   b. Two-hand: Subjects were required to perform the translation task using both of their hands.
   c. Single-hand or Two-hand: Subjects could perform the translation task using either one or both of their hands, based on personal preference.
2. Rotation Tasks
   a. Easy: Subjects were required to perform a 30-degree rotation task.
   b. Normal: Subjects were required to perform a 90-degree rotation task.
   c. Hard: Subjects were required to perform a 180-degree rotation task.
3. Scaling Tasks
   a. Scale up: Subjects were required to perform a scale-up task.
   b. Scale down: Subjects were required to perform a scale-down task.
   c. Scale up: Same as task 3.a.
During each experiment, two types of data were logged: the time the subject spent on each task, and the raw data produced by the Leap Motion. The former gave a quick overview of the subject’s performance and was stored in a local database, while the latter was the primary source for analysis and was written to data files. These data files contained information such as the position, orientation and speed of the subjects’ palms and fingers at each camera frame.
Analysis
To gain insight into the robustness of the Leap Motion, the author measured the frequency with which the Leap Motion lost track of the user’s hand during the experiment, according to the formula:

Frequency of Losing Track = Amount of Losing Track ÷ Completion Time

The interaction area of the Leap Motion was an important notion used for defining losing track in this analysis. As shown in Figure 3.7, the red box above the Leap Motion controller is defined as the interaction box, which represents a rectilinear area within the Leap Motion field of view (FOV). As long as the user’s hand stays inside this box, it is guaranteed to remain within the range of detection. Therefore, two types of losing track were defined: outside the interaction box and inside the interaction box. In the formula above, only the instances of losing track that occurred inside the interaction box were counted.
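The analysis formula can be expressed as a small helper; this is an illustrative sketch (class and method names are the author's own, not from the experiment code):

```java
// Sketch of the losing-track frequency used in the analysis: only the
// losing-track events that occur inside the interaction box are counted.
public class LossAnalysis {
    // lossesInsideBox: number of losing-track events inside the interaction box
    // completionTimeSeconds: time the subject needed to complete the task
    public static double lossFrequency(int lossesInsideBox, double completionTimeSeconds) {
        return lossesInsideBox / completionTimeSeconds; // events per second
    }
}
```

For example, three losing-track events inside the box during a 25-second task give a frequency of 0.12 LT/s.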
Results
Translation Tasks
Table 3.1 shows an overview of the results of the translation tasks. All figures for the two-hand task were significantly higher than for the other two tasks, indicating that the two-hand task introduced more losing track and took more time to complete.
Task 1.c was the only free-choice task among all nine docking tasks. The subjects’ selections for this task are summarized in Table 3.2, which shows that 100% of the subjects chose the single-hand approach for performing this task.
1. Translation

                                  a             b           c
                                  Single-hand   Two-hand    Choice (single-hand)
Total amount of LT *              0.2           9           0.4
Amount of LT within FOV **        0.00          3.37        0.20
Completion time (s)               8.0           28.4        8.3
LT frequency within FOV (LT/s)    0.000         0.119       0.024

* LT = losing track, ** FOV = field of view.
Table 3.1: The results of translation tasks
Task 1.c              Single-hand   Two-hand
Number of subjects    15            0
Percentage (%)        100           0

Table 3.2: The selection of task 1.c
Rotation Tasks
Table 3.3 shows an overview of the results of the rotation tasks. The table indicates that subjects spent more time on this type of task on average, compared to the translation and scaling tasks. However, the frequency of losing track was relatively low.
2. Rotation

                                  a            b            c
                                  30-degree    90-degree    180-degree
Total amount of LT *              0.07         0.73         0.67
Amount of LT within FOV **        0.07         0.27         0.25
Completion time (s)               12.9         19.8         12.1
LT frequency within FOV (LT/s)    0.005        0.014        0.021

* LT = losing track, ** FOV = field of view.
Table 3.3: The results of rotation tasks
Scaling Tasks
Table 3.4 shows an overview of the results of the scaling tasks. The frequency for scaling up is significantly higher than for scaling down. Compared with the other two tables, the figures show that the scaling tasks had the highest frequency of losing track on average.
3. Scaling

                                  a & c        b
                                  Scale up     Scale down
Total amount of LT *              2.87         1.33
Amount of LT within FOV **        0.98         0.20
Completion time (s)               8.55         7.8
LT frequency within FOV (LT/s)    0.114        0.026

* LT = losing track, ** FOV = field of view.
Table 3.4: The results of scaling tasks
Discussion
Single-hand vs. Two-hand
During the experiment, the author observed that the tasks requiring users to use both hands seemed to lose track more often. To examine this observation, all tasks were categorized into two groups: a single-hand group and a two-hand group. The single-hand group consisted of tasks 1.a, 1.c, 2.a, 2.b, and 2.c; the two-hand group consisted of tasks 1.b, 3.a, 3.b, and 3.c. Based on the results from Tables 3.1, 3.3 and 3.4, the author calculated the average frequency of losing track for both groups, as shown in Table 3.5. The results showed that the frequency for two-hand tasks was more than seven times that of single-hand tasks; in other words, the detection of two-hand approaches was significantly less stable than that of single-hand approaches.
                                               Single-hand   Two-hand
Frequency of losing track within FOV (LT/s)    0.0128        0.0938

Table 3.5: Comparing single-hand and two-hand groups
Also, the results from Table 3.2 show that all subjects preferred the single-hand approach to the two-hand approach for the translation tasks.
Therefore, the author decided to use single-hand approaches to design the Triangle Model.
Inside vs. Outside Interaction Box
There are two types of losing track defined in the previous section: outside and inside the interaction box. Based on the amounts of losing track in Tables 3.1, 3.3, and 3.4, the author calculated the overall amount of losing track for each type, as shown in Table 3.6. The table indicates that the majority (65%) of the losing-track events occurred outside the Leap Motion field of view.
Therefore, the author found it necessary to provide real-time feedback of the user’s hand position on the screen, in order to reduce the amount of losing track occurring outside the interaction box and hence increase the stability of the detection.
                           Outside   Inside   Total
Amount of losing track     11.82     6.32     18.14
Percentage of total (%)    65        35       100

Table 3.6: Comparing two types of losing track
Triangle Size
The author analyzed the influence of triangle size on the losing track that occurred inside the interaction box. The triangle size is determined by the distance between the fingers. We calculated the minimum side length of the triangle at the moment of losing track, as shown in Table 3.7. The results indicate that the Leap Motion can easily lose track of the Triangle Model when the side length of the triangle is smaller than about 3 cm.
                                  Translation   Rotation   Scaling   Average
Minimum side length of triangle
when losing track (mm)            21.7          38.2       25.5      28.5

Table 3.7: Triangle size when losing track
Lessons Learned
In conclusion, the author used the concept of the Triangle Model to design and implement this experiment. The fact that all subjects successfully completed all tasks shows that the Triangle Model is a feasible approach. The research findings of this experiment include:
• Single-hand approach outperformed two-hand approach with respect to the robustness of detection.
• A large amount of losing track occurred outside the interaction area of the Leap Motion.
• The Leap Motion cannot detect the triangle robustly if the side length of the triangle is smaller than 3 cm.
Chapter 4
Triangle Model
The objective of this chapter is to answer Research Question 2. The manipulation techniques of the Triangle Model will be described in detail.
State Diagram
The operations of the Triangle Model can be described by the state diagram shown in Figure 4.1. As the diagram shows, all types of manipulation events can be handled by two states: the idle state and the manipulate state. A trigger (e.g. a foot pedal) can be used to change states. A manipulation cycle can be described as follows:

1. Start
   • The manipulation starts when the trigger is pressed down.
   • The state changes from the idle state to the manipulate state.
2. Manipulating
   • The manipulation continues as long as the trigger is held down.
   • The state remains the manipulate state.
3. End
   • The manipulation stops once the trigger is released.
   • The state changes from the manipulate state back to the idle state.
For each manipulation cycle, only one type of manipulation can be performed.
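The two-state cycle above can be sketched as a tiny state machine; the class and method names (triggerDown/triggerUp) are illustrative, not part of the thesis code:

```java
// Sketch of the Triangle Model's two-state manipulation cycle. The trigger
// (e.g. a foot pedal) is modeled as two events: pressing down and releasing.
public class ManipulationStateMachine {
    public enum State { IDLE, MANIPULATE }

    private State state = State.IDLE;

    // Trigger pressed: idle -> manipulate (ignored while already manipulating).
    public void triggerDown() {
        if (state == State.IDLE) state = State.MANIPULATE;
    }

    // Trigger released: manipulate -> idle (ignored while already idle).
    public void triggerUp() {
        if (state == State.MANIPULATE) state = State.IDLE;
    }

    public State getState() { return state; }
}
```

Only one type of manipulation would be performed per cycle, i.e. between one triggerDown and the matching triggerUp.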
Reference Frame
The reference frame of the Triangle Model is provided by the triangle formed by the user’s thumb, index, and middle fingers. It is the basis for calculating the manipulations of the Triangle Model. The coordinate system of this reference frame is specified as shown in Figure 4.2:
• The x-axis is defined by the vector IM from the index fingertip (I) to the middle fingertip (M).
• The y-axis is defined by a vector perpendicular to IM and lying in the triangle plane (△TIM).
• The z-axis is defined by a vector perpendicular to both the x- and y-axes.
Figure 4.2: Reference frame of Triangle Model
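The axis construction above can be sketched with cross products. This is an illustrative sketch (the vector representation and helper names are hypothetical, not the thesis code):

```java
// Sketch of deriving the triangle's reference frame from the thumb (T),
// index (I) and middle (M) fingertip positions. Vectors are double[3].
public class ReferenceFrame {
    static double[] sub(double[] a, double[] b) {
        return new double[]{a[0] - b[0], a[1] - b[1], a[2] - b[2]};
    }

    static double[] cross(double[] a, double[] b) {
        return new double[]{
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]
        };
    }

    // Returns the (unnormalized) {x, y, z} axes of the triangle's frame.
    public static double[][] axes(double[] t, double[] i, double[] m) {
        double[] x = sub(m, i);           // x-axis: index fingertip -> middle fingertip
        double[] z = cross(x, sub(t, i)); // z-axis: normal of the triangle plane
        double[] y = cross(z, x);         // y-axis: in-plane, perpendicular to x
        return new double[][]{x, y, z};
    }
}
```

In a real implementation the axes would also be normalized to unit length.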
Operations
In the Triangle Model, two types of 3D space are defined: the user space and the virtual space. The former is the space in which the user performs the operations; the latter is the space in which the virtual objects exist. The manipulation of the virtual objects is achieved by mapping the coordinate system of the user space to the coordinate system of the virtual space. Two types of mapping can be adopted: absolute mapping and relative mapping. Absolute mapping creates a one-to-one mapping from the user space to the virtual space. Relative mapping applies the difference retrieved from the user space to the objects in the virtual space.
Comparing the two types of mapping: absolute mapping requires the user to match all the attributes, like position and orientation, of the virtual object before starting the manipulation, while relative mapping allows the user to choose freely how to start the manipulation. With the purpose of designing a user-centered model, the author decided to use relative mapping. In the following sections, the author describes how this type of mapping is applied in each type of manipulation.
Translation
As shown in Figure 4.3, two reference frames are involved: one for the starting point of the translation, denoted R1, and one for the end point of the translation, denoted R2. The positions of the origins of both reference frames can be retrieved by the sensing technique; assume they are (x1, y1, z1) and (x2, y2, z2) respectively.

The translation of an object is performed by applying the difference between the positions of the two reference frames, (Δx, Δy, Δz), to every point (x, y, z) of the object, according to the following formulas:

Δx = x2 − x1
Δy = y2 − y1
Δz = z2 − z1
(x, y, z) → (x + Δx, y + Δy, z + Δz)
Figure 4.3: Translation of reference frame
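The relative translation can be sketched as follows (a hypothetical helper, not the repository code):

```java
// Sketch of relative translation: the displacement of the triangle's
// origin between the start frame (o1) and end frame (o2) is applied to
// each point p of the object. Points are double[3].
public class Translation {
    public static double[] translate(double[] p, double[] o1, double[] o2) {
        return new double[]{
            p[0] + (o2[0] - o1[0]), // x + dx
            p[1] + (o2[1] - o1[1]), // y + dy
            p[2] + (o2[2] - o1[2])  // z + dz
        };
    }
}
```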
Figure 4.4 demonstrates the translation of a cube using Triangle Model.
Before After
Figure 4.4: Demonstration of translation
Rotation
As shown in Figure 4.5, R1 represents the reference frame when starting the rotation. The angles between R1’s coordinate system and the world coordinate system around each axis are denoted x1, y1 and z1 respectively. R2 represents the reference frame when finishing the rotation; it likewise defines three angles, x2, y2 and z2.

The rotation of an object is performed by applying the difference in angle around each axis, (Δx, Δy, Δz), to the object's current rotation values Rx, Ry and Rz, according to the following formulas:

Δx = x2 − x1
Δy = y2 − y1
Δz = z2 − z1
Rx ← Rx + Δx
Ry ← Ry + Δy
Rz ← Rz + Δz
Figure 4.5: Rotation of reference frame
Figure 4.6 demonstrates the rotation of a cube using Triangle Model.
Before After
Figure 4.6: Demonstration of rotation
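The per-axis rotation update can be sketched as below. This is a hypothetical helper; the wrapping of angles into [0, 360) is an added assumption for illustration, not stated in the thesis:

```java
// Sketch of relative rotation: the per-axis angle differences between the
// start and end reference frames are added to the object's current rotation
// values (in degrees). Angle triples are double[3].
public class Rotation {
    // Wrap an angle into the range [0, 360).
    static double wrap(double deg) {
        double d = deg % 360.0;
        return d < 0 ? d + 360.0 : d;
    }

    public static double[] rotate(double[] current, double[] start, double[] end) {
        return new double[]{
            wrap(current[0] + (end[0] - start[0])), // Rx + dx
            wrap(current[1] + (end[1] - start[1])), // Ry + dy
            wrap(current[2] + (end[2] - start[2]))  // Rz + dz
        };
    }
}
```

Note that adding per-axis Euler angles like this matches the formulas above, but is only a simple approximation of general 3D rotation composition.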
Scaling
As shown in Figure 4.7, D1 denotes the distance between the thumb and index finger when starting the scaling, and D2 denotes that distance when finishing the scaling. The scale factor increases if D2 > D1, and decreases if D2 < D1.
Figure 4.7: Scaling
Figure 4.8 demonstrates the scaling up of a cube using Triangle Model.
Figure 4.8: Demonstration of scaling (before and after)
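The scaling rule can be sketched as follows. The thesis only fixes the direction of the change (grow when D2 > D1, shrink when D2 < D1); using the ratio D2/D1 as the factor is an assumption made for this example, and the Scaling class is illustrative.

```java
// Illustrative sketch of the scaling rule above. Only the direction of the
// change is specified in the text; taking the ratio D2/D1 as the factor is
// an assumption made for this example.
final class Scaling {
    // Returns the updated scale factor given the thumb-index distances
    // at the start (d1) and end (d2) of the gesture.
    static double update(double scale, double d1, double d2) {
        return scale * (d2 / d1);
    }
}
```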
Proof of Concept
The proof of concept was developed using LWJGL (Lightweight Java Game Library). The author chose LWJGL because it provides access to the native OpenGL API, and because, for 3D programming, more resources and references are available for LWJGL and OpenGL than for JavaFX.
The proof of concept has about 1500 lines of code and can be accessed via
https://github.com/NianLiu89/UserCentered3D. The source code of this proof of concept mainly consists of the following parts:
• DisplayManager
The DisplayManager Class is responsible for creating, updating and destroying the application window and the OpenGL context.
• Loader
The Loader Class is responsible for loading the vertex data, RGB data and texture data from the input file and constructing the model to be rendered in the scene.
• Model
The model is the object to be displayed in the window. A model can be constructed by the loader based on the input data file. The TexturedModel is the type of model used in this application. It consists of two separate models: one for the raw object, the other for the texture.
• Shader
There are two types of shader files used in this application: the Vertex Shader and the Fragment Shader. They are written in GLSL (OpenGL Shading Language). The ShaderProgram Class creates a shader program of OpenGL and loads the shaders from the shader files.
• Renderer
The Renderer Class is responsible for rendering the loaded model into the scene.
• Entity
An entity contains a loaded model and its corresponding matrices, such as the transformation, translation, rotation and scaling matrices. The TexturedEntity is the type of entity used in this application.
o Cube
For this proof of concept, a textured cube is used for manipulation. This Cube Class extends the TexturedEntity Class. All types of manipulation are implemented in this class.
As shown in Figure 4.9, the application can be divided into three stages: Initializing Stage, Looping Stage and Closing Stage.
Initializing Stage
The Initializing Stage is the first stage. It begins when the application starts running. This stage has the following responsibilities:
1. Creating the application window and OpenGL context.
2. Creating the shader and the renderer.
3. Creating the loader and loading the model.
4. Creating the camera and the object to be rendered.
Looping Stage
The application enters the Looping Stage when the Initializing Stage is completed. The code in this part is executed repeatedly until the application window is closed. In each loop, the status of the camera and the objects in the scene is updated and then rendered.
Closing Stage
The application enters the Closing Stage once the window is closed. In this stage, all the resources used by this application are released.
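The three stages can be sketched as a single control flow. The class and method structure below is illustrative; the names do not correspond to the actual LWJGL calls in the repository.

```java
// Illustrative sketch of the three-stage life cycle described above. The
// trace entries stand in for the real window, shader, loader and renderer
// calls; they are not taken from the actual source code.
final class AppLifecycle {
    static java.util.List<String> run(int frames) {
        java.util.List<String> trace = new java.util.ArrayList<>();
        trace.add("init");                 // Initializing Stage: window, shaders, loader, camera
        for (int i = 0; i < frames; i++) {
            trace.add("update+render");    // Looping Stage: repeated until the window closes
        }
        trace.add("close");                // Closing Stage: release all resources
        return trace;
    }
}
```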
Chapter 5
Conclusion and Future Work
In this project, we proposed a user-centered model for the 3D manipulation of objects in interactive scientific visualization environments, called the Triangle Model. The author conducted an initial study to prove the feasibility of this model and specified its manipulation techniques in detail.
For Research Question 1, the author conducted an experiment to study the robustness of the Triangle Model when triangles are sensed by Leap Motion. The results showed that the single-hand approach outperformed the two-hand approach, and that triangles could not be detected robustly when the length of a side was less than 3 cm.
For Research Question 2, the author specified three types of manipulation techniques: translation, rotation and scaling. The translation technique is based on the difference in position between the reference frames; the rotation technique on the difference in their angles; and the scaling technique on the difference in distance between the thumb and index finger.
Our Triangle Model introduces a general concept for manipulating 3D objects. This concept is not limited to the sensing technique (Leap Motion) chosen for this project; it is compatible with any other hand tracking technique, for instance glove-based or marker-based techniques, as long as the position and orientation of the user's fingers can be detected accurately and robustly.
For future work, we suggest improving the display technique by integrating the Oculus Rift with Leap Motion to provide an immersive 3D environment. For the sensing techniques, it is recommended to try a setup with multiple cameras at different positions to mitigate the occlusion problem. For triggering the manipulation events, pre-defined gestures could replace the foot pedal currently used in this project.