
3D face scanner application

3.1 Read binary file

The first step of the application is to read the binary file that contains the required information for the 3D reconstruction. The binary file is composed of two parts, the header and the actual data. The header contains metadata of the acquired frames, such as the number of frames and the resolution of each one. The second part contains the actual data of the captured frames. Figure 3.2 shows an example of such frame sequence, which from now on will be referred to as camera frames.
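As an illustration, the following sketch reads such a binary file with NumPy. The header layout used here (number of frames, height and width stored as 32-bit unsigned integers, followed by raw 8-bit pixel data) is an assumption made for the example; the actual layout is defined by the scanner software.

```python
import numpy as np

def read_scan_file(path):
    """Read a scan's binary file: a header with frame metadata followed by
    the raw frame data. The header layout below is assumed for illustration."""
    with open(path, "rb") as f:
        # Assumed header: frame count, height and width as uint32 values.
        num_frames, height, width = (int(v) for v in
                                     np.fromfile(f, dtype=np.uint32, count=3))
        # The remainder of the file holds the raw 8-bit camera frames.
        data = np.fromfile(f, dtype=np.uint8, count=num_frames * height * width)
    return data.reshape(num_frames, height, width)
```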

3.2 Preprocessing

The preprocessing stage comprises the four steps shown in figure 3.3. Each of these steps is described in the following subsections.

Figure 3.3: Flow diagram of the preprocessing stage (parse XML file, discard frames, crop frames, scale: convert to float, range from 0-1).

3.2.1 Parse XML file

In this stage, the application first reads an XML file that is included with every scan. This file contains relevant information for the structured light reconstruction: (i) the type of structured light patterns that were projected when acquiring the data; (ii) the number of frames captured while the structured light patterns were being projected; (iii) the image resolution of each frame to be considered; and (iv) the calibration data.
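A minimal parsing sketch using Python's standard xml.etree module is given below. The element names (PatternType, NumFrames, and so on) are placeholders, since the actual XML schema is not specified here.

```python
import xml.etree.ElementTree as ET

def parse_scan_xml(path):
    """Extract the reconstruction parameters from the per-scan XML file.
    The element names are hypothetical; the real schema is defined by
    the scanner software."""
    root = ET.parse(path).getroot()
    return {
        "pattern_type": root.findtext("PatternType"),
        "num_frames": int(root.findtext("NumFrames")),
        "width": int(root.findtext("Width")),
        "height": int(root.findtext("Height")),
        "calibration": root.findtext("Calibration"),
    }
```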

3.2.2 Discard frames

Based on the frame count read from the XML file, the application discards the extra frames that are provided as part of the input but do not contain information relevant to the structured light approach.

3.2.3 Crop frames

The original resolution of each camera frame (480 × 768) is modified in order to obtain a new, more suitable resolution for the subsequent algorithms of the program (480 × 754). This is accomplished by cropping the pixels that are close to the top border of the images. Note that this operation does not imply a loss of information in this application in particular. This is because pixels near the frame borders do not contain facial information, and therefore, can be safely removed.
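Assuming the frames are stored as (frame, row, column) arrays with row 0 at the top border, the crop amounts to dropping the 14 rows nearest the top; the storage order is an assumption of this sketch.

```python
def crop_frames(frames, target_rows=754):
    """Drop the rows nearest the top border so the 768-pixel dimension
    becomes 754 (these rows contain no facial information)."""
    # Assumes frames has shape (num_frames, 768, 480) with row 0 at the top.
    rows_to_drop = frames.shape[1] - target_rows   # 768 - 754 = 14
    return frames[:, rows_to_drop:, :]
```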

3.2.4 Scale

Each pixel of the camera frame sequence (as provided by the embedded camera) is represented by an 8-bit unsigned integer value that ranges from 0 to 255. In this stage, the data type is transformed from unsigned integer to floating point while dividing each pixel value by 255. The resulting values range between 0 and 1.
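In NumPy-like pseudocode, this step reduces to a type cast and a division:

```python
import numpy as np

def scale_frames(frames_u8):
    """Convert 8-bit pixel values to floating point in the range [0, 1]."""
    return frames_u8.astype(np.float32) / 255.0
```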

3.3 Normalization

Even though this section is entitled Normalization, a few more tasks are performed in this stage of the application, as shown by the blue rectangles in Figure 3.4. Here, wide arrows represent the flow of data, whereas dashed lines represent the order of execution. The numbers inside the small data arrows pointing towards the different tasks indicate the number of frames used as input by each task. The dashed rectangle that encloses the normalization and texture 2 tasks indicates that there is no clear sequential execution between the two; rather, they are executed in an alternating fashion.

Figure 3.4: Flow diagram of the normalization stage.

This type of diagram will prove particularly useful in Chapter 5 to explain the modifications that were made to the application to improve its performance. An example of the different frames produced in this stage is shown in Figure 3.5. A brief description of each of the tasks involved in this stage follows.

3.3.1 Normalization

The purpose of this stage is to extract the reflectivity component (texture information) from the camera frames while enhancing the deformed illumination patterns in the resulting frame sequence. Figure 3.5a illustrates the result of this process. The deformed patterns are essential for the 3D reconstruction process.

In order to understand how this process takes place, we need to look back at Figure 3.2. Here, it is possible to observe that the projected patterns in the top row frames are equal to their corresponding frame in the bottom row, with the only difference being that the values of the projected pattern are inverted. For each corresponding pair, a new image frame is generated according to the following equation:

Fnorm(x, y) = (Fcamera(x, y, a) − Fcamera(x, y, b)) / (Fcamera(x, y, a) + Fcamera(x, y, b))

where a and b correspond to aligned top and bottom frames in Figure 3.2, respectively.

An example of the resulting frame sequence is shown in Figure 3.5a.
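A minimal sketch of this computation is given below. The assumption that the first half of the camera frame sequence holds the patterns (a) and the second half their inverted counterparts (b) is made only for illustration; the actual frame ordering comes from the scanner.

```python
import numpy as np

def normalize_frames(camera_frames):
    """Normalized frame sequence from aligned pattern / inverted-pattern pairs.
    The pairing (first half vs. second half of the sequence) is assumed."""
    half = camera_frames.shape[0] // 2        # e.g. 16 camera frames -> 8 pairs
    fa, fb = camera_frames[:half], camera_frames[half:]
    # F_norm = (F_a - F_b) / (F_a + F_b), computed element-wise per pair.
    return (fa - fb) / (fa + fb)
```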

Figure 3.5: Example of the 18 frames produced in the normalization stage: (a) normalized frame sequence; (b) texture 2 frame sequence; (c) modulation frame; (d) texture 1 frame.

3.3.2 Texture 2

The calculation of the texture 2 frame sequence follows the same procedure as the one used to calculate the normalized frame sequence. In fact, the output of this process is an intermediate step in the calculation of the normalized frames, which is why the two processes are said to be performed in an alternating fashion. The equation that describes the calculation of the texture 2 frame sequence is:

Ftexture2(x, y) = Fcamera(x, y, a) + Fcamera(x, y, b)

The resulting frame sequence (Figure 3.5b) is used later in the global motion compensation stage.
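The sketch below makes this interleaving explicit, computing both sequences in one pass; the sum Fa + Fb serves as the texture 2 frame and as the denominator of the normalization. The same assumed frame pairing as before applies.

```python
import numpy as np

def normalize_and_texture2(camera_frames):
    """Compute the texture 2 and normalized frame sequences together.
    Frame pairing (first half vs. second half) is assumed for illustration."""
    half = camera_frames.shape[0] // 2
    fa, fb = camera_frames[:half], camera_frames[half:]
    texture2 = fa + fb                        # texture 2 frame sequence
    normalized = (fa - fb) / texture2         # reuses the intermediate sum
    return normalized, texture2
```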

3.3.3 Modulation

The purpose of this stage is to find the range of measured values for each (x, y) pixel of the camera frame sequence, along the time dimension. This is done in two steps. First, two frames are generated by finding the maximum and minimum values along the time (t) dimension (Figure 3.6) for every (x, y) value in a frame.

Figure 3.6: Camera frame sequence in a coordinate system (x, y, t).

Second, a modulation frame is produced by finding the difference between the previously generated frames, i.e.:

Fmod(x, y) = Fmax(x, y) − Fmin(x, y)

This modulation frame (Figure 3.5c) is required later, during the decoding stage.
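In code, both steps reduce to reductions along the time axis followed by a subtraction:

```python
def modulation_frame(camera_frames):
    """Range of measured values per pixel along the time dimension:
    F_mod(x, y) = F_max(x, y) - F_min(x, y)."""
    f_max = camera_frames.max(axis=0)   # maximum over t for every (x, y)
    f_min = camera_frames.min(axis=0)   # minimum over t for every (x, y)
    return f_max - f_min
```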

3.3.4 Texture 1

Finally, the last task in the Normalization stage corresponds to the generation of the texture image that will be mapped onto the final 3D model. In contrast to the previous three tasks, this subprocess does not take the complete set of 16 camera frames as input, but only the two with the finest projection patterns. Figure 3.7 shows the four processing steps that are applied to the input in order to generate a texture image such as the one presented in Figure 3.5d.

Figure 3.7: Flow diagram for the calculation of the texture 1 image (average frames, gamma correction, 5x5 mean filter, histogram stretch).
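The following sketch applies the four steps to the two finest-pattern frames. SciPy's uniform_filter is used here for the 5x5 mean filter; the gamma value and the [0, 1] stretch range are illustrative assumptions, as only the four steps themselves are given in the text.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def texture1(fine_a, fine_b, gamma=0.5):
    """Texture 1 image from the two finest-pattern camera frames.
    Gamma value and stretch range are assumptions for this sketch."""
    img = (fine_a + fine_b) / 2.0             # 1. average the two frames
    img = np.power(img, gamma)                # 2. gamma correction
    img = uniform_filter(img, size=5)         # 3. 5x5 mean filter
    lo, hi = img.min(), img.max()             # 4. histogram stretch to [0, 1]
    return (img - lo) / (hi - lo)
```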