Marks Fusion: Development Of Facial Marks Detection System And Fusion With Face Recognition System
Lucian Chirca
University of Twente P.O. Box 217, 7500AE Enschede
The Netherlands
l.chirca@student.utwente.nl
1. ABSTRACT
Facial marks such as freckles, moles, scars, and pockmarks have been used in the past to identify individuals. Systems integrating Facial Marks detection with Facial Recognition have been developed [17] [2], and showed improved performance over using Facial Recognition alone.
These systems used classic blob detection approaches such as LoG (Laplacian of Gaussian) or the Fast Radial Symmetry Transform for detecting facial marks, which produced many false positives, or relied on people manually annotating facial marks, which is too time-consuming. Although there have been significant improvements in detecting facial marks using a Convolutional Neural Network (CNN), a system integrating this new approach with facial recognition has not been implemented yet. This paper improves the state of the art in facial marks detection by using CNNs with deeper architectures, and shows that a system combining a state-of-the-art Facial Recognition algorithm with a Facial Marks System outperforms one that only uses Facial Recognition, especially in the case of monozygotic twins.
Keywords
Facial marks, Facial recognition, Convolutional Neural Net- works, Monozygotic Twins
2. INTRODUCTION
Facial marks (e.g., freckles, moles, scars, pockmarks) are soft biometric features that have been shown to decrease the error rates of facial recognition software [8].
Although they are not discriminative enough by themselves to identify an individual, they have been proven effective at narrowing the search for a person [11] and at helping to distinguish between people, especially in the case of monozygotic twins [14]. Where Facial Recognition systems can output high similarity scores, two people may still have radically different facial mark patterns that a Facial Recognition system does not take into account. This is especially true for identical twins, who are very difficult to differentiate by facial structure alone, since they look so much alike. This is where facial marks can offer enough information to indicate whether two pictures are of the same person or not. To be able to use facial marks effectively, their correct detection is crucial.

(Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. 34th Twente Student Conference on IT, Jan. 29th, 2021, Enschede, The Netherlands. Copyright 2021, University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science.)

Work has been done to show that a shallow Convolutional Neural Network (CNN) outperforms classic blob detector approaches such as the Laplacian of Gaussian (LoG) or the Fast Radial Symmetry Transform [15]. The topic of facial recognition has received much attention, with a lot of research being done into improving the performance of such systems [9]
[5] [12] [10]. The relevance of this topic is not surprising, since facial recognition has many important applications, such as identifying suspects in relation to a crime or verifying an individual's identity for security purposes. In this paper, we aim to improve the performance of a state-of-the-art face recognition algorithm, OpenFace [1], by combining it with our Facial Marks System (FMS), and to see whether it can help differentiate between images of people, especially in the case of monozygotic twins. To this end, a subset of the FRGCv2 dataset and images from the 2009 and 2010 Twins Days Festivals in Twinsburg, Ohio, will be used, and the following research questions will be addressed:
RQ1 To what extent does increasing the number of layers improve the performance of a facial marks detection CNN?
RQ2 To what extent is transfer learning better than creating CNNs from scratch?
RQ3 Can we, and to what extent, improve face recognition performance by fusing its results with those obtained by the FMS, particularly in the case of monozygotic twins?
Experiment 1 will address Research Question 1 by comparing the performance of 4 CNNs with a shallow architecture (up to 3 layers) with 4 CNNs with a deeper architecture (between 3 and 9 layers). Experiment 2 will address Research Question 2 by comparing the performance of the 8 CNNs trained in Experiment 1 with a pre-trained deep CNN used for image recognition that has had its last 4 layers replaced and retrained for detecting facial marks.
Lastly, Experiment 3 will use a CNN from the previous experiments to detect facial marks and fuse these scores with scores from a state-of-the-art facial recognition system, to try to obtain better results than using facial recognition alone, especially in the case of monozygotic twins.
3. RELATED WORK
Facial marks have been used as a means of identification for a long time, the Bertillonage system being the first modern system to use facial features for identifying suspects [3].

Figure 1. Example of facial mark patches

In Park and Jain [11], a facial marks detection system is implemented using classic blob detectors such as the Active Appearance Model (AAM) and the Laplacian of Gaussian (LoG). This type of approach introduced a lot of false positives that needed to be filtered. In [14], the problem of identifying monozygotic twins is explored using classic blob detector methods such as the Fast Radial Symmetry Transform. Since then, better facial marks detection systems have been built using shallow Convolutional Neural Networks [15]. A grid-based approach is presented in [16], showing improved performance over classical methods. This novel approach surpassed the performance of traditional approaches such as blob detection with heuristics and produced significantly fewer false positives. There are still ways to improve the performance of CNNs. As discussed in [13], creating deeper models is usually better than creating wider models: a model with more layers can achieve the same effective receptive field [13]. Work has been done to show that an FMS can improve the performance of state-of-the-art facial recognition systems [8] [2] [17]. Looking at the work done so far, there appears to be a gap in combining a CNN-based facial marks detection system with a state-of-the-art facial recognition system, such as [1], especially in the case of monozygotic twins. Therefore, this paper will focus on fusing these systems together and addressing the difficult problem of identifying identical twins.
4. METHODOLOGY
4.1 Dataset
In this paper, we will use the subset of the FRGCv2 dataset containing 12306 images of 568 subjects, in which the facial marks were manually annotated by [16]. The people in these images show a natural facial expression and were photographed under controlled conditions, providing our system with a relatively consistent dataset. We use this dataset because it is sufficiently large, the images were taken in a relatively consistent environment, and it has already been manually annotated; annotating it ourselves would take too much time. Additionally, we will use identical twin images acquired at the 2009 and 2010 Twins Days Festivals in Twinsburg, Ohio, to test our system on monozygotic twin identification. From this dataset, 100 pairs of identical twins will be selected, resulting in 200 people, and for each person 2 pictures will be extracted. This results in 400 images to be used in the experiments. These images will be processed the same way as the first dataset. The reason 2 images are extracted per person is so that we have enough images to check whether the system can recognize if two images are of the same person or of different people.
4.2 Image pre-processing
For the image pre-processing step, we apply geometric and photometric transformations to the images prior to the facial marks detection and facial recognition steps. We crop the images to 800 × 600 px and apply the affine transformation that maps the pupils to the coordinates (200, 250) and (400, 250), such that the inter-pupillary distance is the same for every image (200 px) [15]. This gives images a consistent coordinate system in which to store the locations of detected facial marks. It also provides consistency between images of a person taken in different environments.
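The pupil-based alignment described above can be sketched as a similarity transform (rotation, uniform scale, translation) computed from the two detected pupil positions. The function name and NumPy-based construction below are our own illustration, not the paper's implementation; the resulting 2 × 3 matrix could then be applied with, e.g., OpenCV's warpAffine:

```python
import numpy as np

def pupil_alignment_matrix(left_pupil, right_pupil,
                           target_left=(200.0, 250.0),
                           target_right=(400.0, 250.0)):
    """Build a 2x3 similarity-transform matrix that maps the detected
    pupil coordinates onto the fixed target positions, giving every
    image the same 200 px inter-pupillary distance."""
    pl, pr = np.asarray(left_pupil, float), np.asarray(right_pupil, float)
    tl, tr = np.asarray(target_left, float), np.asarray(target_right, float)
    src_vec, dst_vec = pr - pl, tr - tl
    # Uniform scale: ratio of target to source inter-pupillary distance.
    scale = np.linalg.norm(dst_vec) / np.linalg.norm(src_vec)
    # Rotation angle between the source and target eye axes.
    angle = np.arctan2(dst_vec[1], dst_vec[0]) - np.arctan2(src_vec[1], src_vec[0])
    c, s = scale * np.cos(angle), scale * np.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    # Translation chosen so the left pupil lands exactly on its target.
    t = tl - rot @ pl
    return np.hstack([rot, t[:, None]])  # shape (2, 3)
```

By construction the left pupil maps exactly to its target, and because the transform preserves the eye-axis length ratio and orientation, the right pupil does as well.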
For the photometric transformation, we apply a grayscale transformation to reduce the complexity of the problem.
After this, we normalize the image by subtracting a mean of 0.5 and dividing by a standard deviation of 0.5. This is done to reduce the time to converge. Figure 2 shows an example of this procedure.
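A minimal sketch of this photometric step, assuming 8-bit grayscale input scaled to [0, 1] before the mean/std normalization (the helper name is ours):

```python
import numpy as np

def preprocess_patch(gray_patch):
    """Scale an 8-bit grayscale patch to [0, 1], then normalize with
    mean 0.5 and std 0.5, yielding values in [-1, 1]."""
    x = np.asarray(gray_patch, dtype=np.float32) / 255.0
    return (x - 0.5) / 0.5
```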
Figure 2. Image 1 - Original, Image 2 - After transformation
4.3 Patch generation
After pre-processing, the dataset is split into training and evaluation sets. The images from the first 390 people will be used for training and the rest will be used for evalu- ation. For each of these sets, we will extract 10000 skin patches containing facial marks and 50000 skin patches not containing facial marks. This procedure will be repeated for 3 different skin patch sizes: 15 × 15 px, 19 × 19 px and 25 × 25 px. These collections will be used for training and evaluating models in Experiment 1 and Experiment 2.
Skin patches containing facial marks will be extracted according to the locations that have already been manually annotated. To generate skin patches not containing facial marks, first the subject's face will be detected, and within that bounding box skin patches will be randomly selected such that a) they do not overlap with facial marks and b) they do not overlap with already selected patches.
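The negative-patch generation can be sketched as rejection sampling inside the face bounding box. All names below are illustrative, and patches are represented as (x, y, size) squares:

```python
import random

def rects_overlap(a, b):
    """Axis-aligned overlap test for (x, y, size) square patches."""
    ax, ay, asz = a
    bx, by, bsz = b
    return ax < bx + bsz and bx < ax + asz and ay < by + bsz and by < ay + asz

def sample_negative_patches(face_box, mark_patches, n, size,
                            rng=random, max_tries=10000):
    """Randomly draw n non-mark skin patches inside the face bounding
    box (x, y, w, h), rejecting candidates that overlap a facial-mark
    patch or an already accepted patch."""
    fx, fy, fw, fh = face_box
    chosen, tries = [], 0
    while len(chosen) < n and tries < max_tries:
        tries += 1
        cand = (rng.randint(fx, fx + fw - size),
                rng.randint(fy, fy + fh - size),
                size)
        if any(rects_overlap(cand, m) for m in mark_patches + chosen):
            continue  # rejected: overlaps a mark or an accepted patch
        chosen.append(cand)
    return chosen
```

Rejection sampling is simple but can stall on densely marked faces, which is why a try budget (max_tries) is included in this sketch.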
4.4 Facial mark detection
To detect the facial mark pattern on a person's face, we use a grid-based approach that divides the face into a grid of equal-sized rectangles. For each rectangle, we use our CNN-based classifier to detect facial marks. Once a mark is detected, the result is added to the set containing the facial marks of the person's face, along with the mark's location [16]. The size of each rectangle will be decided based on experimental results, but we expect that it should be large enough to contain a relatively large facial mark and small enough to separate small facial marks that are close to each other.
4.5 Facial marks matching
Once facial marks have been detected, it is important to calculate how different two images are given the facial mark locations. Given an image, we split it into a grid of rows × columns cells, where the size of each rectangle is determined by the dimensions of the face bounding box divided by the number of rows and columns. The best grid configuration will be found during Experiment 3, based on empirical results, from 4 different categories: coarsest, coarse, finer and finest, defined by setting the number of rows and columns to the aspect ratio of the image (4 × 3) multiplied by 2, 5, 8 and 11, respectively. Then we run our CNN model in a "sliding window" fashion, whereby we classify skin patches taken at a distance (stride) d from adjacent patches, such that some visual data may overlap.
If the center of a skin patch classified as having a facial mark falls within a rectangle of the grid, that rectangle gets the score 1, indicating that the rectangle contains at least one facial mark; otherwise the score is 0. After this grid has been established for two images, we can compute the negative Hamming distance between them. This distance is then used to determine how similar the facial mark patterns of the two images are. An example of such a grid can be seen in Figure 3.

Figure 3. Example of a facial matching 3x4 grid
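The grid scoring and distance computation described above can be sketched as follows; the function names and the (x, y) mark-center representation are our own assumptions:

```python
def mark_grid(mark_centers, face_box, rows, cols):
    """Binary occupancy grid over the face bounding box (x, y, w, h):
    a cell is 1 if at least one detected facial-mark center falls
    inside it, else 0."""
    fx, fy, fw, fh = face_box
    grid = [[0] * cols for _ in range(rows)]
    for (x, y) in mark_centers:
        col = min(int((x - fx) * cols / fw), cols - 1)
        row = min(int((y - fy) * rows / fh), rows - 1)
        grid[row][col] = 1
    return grid

def negative_hamming(grid_a, grid_b):
    """Negative Hamming distance between two same-sized binary grids;
    higher (closer to 0) means more similar mark patterns."""
    return -sum(a != b
                for ra, rb in zip(grid_a, grid_b)
                for a, b in zip(ra, rb))
```

Collapsing each cell to a 0/1 score makes the comparison robust to small localization errors, at the cost of ignoring how many marks share a cell.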
4.6 Fusing algorithm
To combine the similarity scores of the Facial Recognition (FR) system and the Facial Marks System (FMS), it is important that we first normalize the outputs of each so that they are in the same range.
To normalize the Facial Marks System score, we will perform min-max normalization on the negative Hamming distance between the facial grids of two images. This results in a score from 0 to 1, where the higher the score, the more likely the two images have the same facial mark pattern.
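Min-max normalization maps the most dissimilar observed distance to 0 and the most similar to 1; a one-line sketch (with min/max assumed to be taken over the population of scores):

```python
def min_max_normalize(score, min_score, max_score):
    """Map a negative Hamming distance onto [0, 1] given the minimum
    and maximum scores observed in the data."""
    return (score - min_score) / (max_score - min_score)
```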
For the Facial Recognition system, we will obtain the feature vectors containing 2D and 3D facial features of two images, v1 and v2, respectively. We will perform the following operation to get the angle between the two vectors:

angle = arccos( (v1 · v2) / (|v1| |v2|) )
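The angle computation above is the arccosine of the cosine similarity of the two embeddings; a NumPy sketch (the function name is ours, and the clip guards against floating-point values slightly outside [-1, 1]):

```python
import numpy as np

def feature_angle(v1, v2):
    """Angle in radians between two face feature vectors:
    arccos of their cosine similarity."""
    v1, v2 = np.asarray(v1, float), np.asarray(v2, float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))
```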