Face identification in videos from mobile cameras


NCCV 2014 Submission #****. CONFIDENTIAL REVIEW COPY. DO NOT DISTRIBUTE.


Meiru Mu, Luuk Spreeuwers, Raymond Veldhuis

SCS group, University of Twente

Enschede, The Netherlands

{M.Mu,L.J.Spreeuwers,R.N.J.Veldhuis}@utwente.nl

Abstract

Recognizing faces reliably in videos from mobile cameras is still challenging, although mature automatic face recognition technology for still images has been available for quite some time. Suppose we want to be alerted when suspects appear in the recording of a police Body-Cam: even a good face matcher for still images would give many false alarms because of the uncontrolled conditions. This paper presents an approach to identifying faces in videos from mobile cameras. A commercial face matcher, FaceVACS, performs face recognition frame by frame. To suppress false alarms over a video of a certain length, we propose to count the recognized identities and to set thresholds on these counts as well as on the matching scores from still-image face recognition. In this way, the facial information of a single subject over time is exploited without implementing face tracking, which is complicated and even more difficult for low-quality unconstrained videos. For the experiments, videos are recorded by two types of mobile cameras, which provide different video qualities. The results demonstrate the effectiveness of the proposed approach.

1. Introduction

Face recognition in the context of visual surveillance applications has been a topic of growing interest in computer vision. The objective of our work is to develop a suitable approach to face recognition from mobile police cameras, which is expected to warn the holder of the camera when someone on a blacklist is spotted by the camera. Currently, the mobile cameras worn by policemen or mounted on police cars are primarily intended for recording events. The application we consider adding in this work can be described as recognizing targeted subjects on a list from a sequence of uncontrolled video frames by face recognition. Figure 1 illustrates the considered system framework. Before face identification, face extraction and feature extraction (or part of them) can be executed on the mobile camera, or the complete video recording needs to be forwarded to a central server for processing. This depends on the processing power of the camera and the available bandwidth of the network connection. An alert is expected to be transferred to the camera when a target subject is identified. The faces of one subject extracted from long-term video frames can be represented as a set of vectors. The faces of targeted subjects on the watch list, saved as mug-shots, can each be represented as one vector. Accordingly, we treat the video face recognition problem considered here as a set-to-one similarity measurement problem, or as a query-set-to-still-image matching problem, as in other papers [1]. The faces of the same subject from long-term observations vary in scale, pose, illumination, expression, etc. In addition, because the camera is moving, the frame images are of poor quality due to compression and heavy motion blur. In this case, we focus on identity similarity, i.e. determining whether the face set from the video and a face from the mug-shots are of the same subject, while ignoring image similarity.

[Figure 1 diagram labels: "Is there someone on my list nearby?", PC, Watch List, Image data, Alert]

Figure 1. The considered system framework. The proposal is to extend the functionality with automatic face recognition, so that the bearer of the camera can receive an alert when someone who is sought or has an area ban is nearby.

In the context of face recognition from a set of images obtained over time, how to integrate information from multiple observations has been well studied [2]. Possible schemes for integrating information include selecting the best observation from each set, simply averaging all observations prior to classification, and constructing statistical models from multiple observations to compare with the existing face models. However, it is challenging to first obtain the face observations of the same subject whose information is to be integrated, given that each video frame includes several faces of multiple subjects. Generally, face detection and tracking are needed for that [3]. But for common surveillance videos from mobile cameras, as considered in this paper, face tracking is extremely challenging due to the poor image quality and the unconstrained behavior of the subjects.

• Frame 1: A # # C #
• Frame 2: A D # # F
• Frame 3: # C Z # #
• Frame 4: # A # # P
• Frame 5: # # A B #

A:4 B:1 C:2 D:1 F:1 P:1 Z:1

Figure 2. Illustration of the proposed idea for assisting identity determination in video, assuming that the gallery set consists of 26 mugshots {A - Z}, that multiple faces are extracted from each frame, and that each face is rank-1 identified. By counting the resulting identities and ordering the counts, subject A is suspected to be in the video.

In this paper, we present an approach to face recognition in surveillance videos from mobile cameras, where each frame includes multiple subjects. We focus on how to measure the identity similarity between the faces extracted from a video sequence and the stored mug-shots. Face tracking, which is usually complicated in the considered scenario, is not involved. Face extraction and identification are carried out by a commercial face matcher, FaceVACS [4], since the face recognition scheme for still images is outside the scope of our discussion. Section 2 presents the proposed approach. Experimental results are given in Section 3. Section 4 gives conclusions and discussion.

2. Identity determination in video

Mature automated face recognition technologies are now available for still images. However, recognizing faces in online videos remains challenging. When identifying suspects to assist law enforcement, the unconstrained videos usually provide low-quality images. If we carry out single-image face matching frame by frame, the false match rate could be significantly high. Suppose we raise an alarm whenever a query face is matched with someone in a background set of mugshots; we will probably receive many false alarms. But in a video of a certain number of frames, a single face may appear in many frames. If we frequently receive alarms for the same subject, the chances are higher that this subject genuinely appears in the video. Figure 2 illustrates this idea, assuming that the gallery set consists of 26 mugshots {A - Z} and that five subjects appear simultaneously in each frame. Faces are first extracted frame by frame and then matched against the still images. By analyzing the rank-1 identification results, one can guess that subject A is the most likely to appear in the video.
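The counting idea of Figure 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the frame contents are copied from Figure 2, "#" is taken to mean that no identity was returned for a face, and the function name count_identities is hypothetical.

```python
from collections import Counter

# Rank-1 identification results per frame, copied from Figure 2.
# "#" marks a face for which no identity was returned (assumption).
frames = [
    ["A", "#", "#", "C", "#"],
    ["A", "D", "#", "#", "F"],
    ["#", "C", "Z", "#", "#"],
    ["#", "A", "#", "#", "P"],
    ["#", "#", "A", "B", "#"],
]


def count_identities(frames, no_result="#"):
    """Count how often each identity is returned over all frames."""
    counts = Counter()
    for frame in frames:
        counts.update(label for label in frame if label != no_result)
    return counts


counts = count_identities(frames)
# Order identities by decreasing count; the top entry is the main suspect.
ranking = counts.most_common()
print(ranking[0])  # ('A', 4)
```

Counting alone ignores how confident each match was, which is why the matching scores are brought in next.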

Intuitively, an alarm based only on counting the identification results of all faces extracted from a video is not trustworthy. The matching score, which indicates the similarity of two faces, should play an important role in identity determination, just as it does in the classical face recognition task on still images. Figure 3 illustrates how we make the final identity determination by setting thresholds on the identity counts and matching scores. The processing steps are as follows:

1. Given a video of a certain number of frames, faces are extracted frame by frame. Rank-1 face identification is subsequently carried out by a mature face matching technology, which gives each face a maximum matching score and the corresponding identity. Note that face extraction yields some non-faces because of unavoidable false face detections, and that some extracted faces are of very poor quality. The commercial face matcher FaceVACS, which we use in the experiments, filters these out before feature extraction by means of eye detection. As shown in Figure 3, the identification results of these faces are denoted by #.

2. For each single face, threshold T1 is applied to the matching score to reduce the false identification rate. Low-quality faces and non-faces, which do not get identification results from the face matcher, are removed from the list. Then the number of occurrences of each recognized identity is counted, and the corresponding matching scores are grouped into one score by taking the maximum.

3. Threshold T2 is applied to the grouped maximum score, and threshold T3 is applied to the percentage of identity counts. Note that thresholds T1 and T2 depend on the quality of the single images and, of course, on the face matching algorithm itself, while threshold T3 depends more on the video content. All three are adjustable as the input videos vary and the applied face recognition algorithm differs.
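The three steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that a higher score means a better match, that thresholds are applied with ">=", and that the count percentage of an identity is its count divided by the number of faces kept after T1 (the text does not specify the denominator). The function name determine_identities and the sample scores are hypothetical; the threshold values are the ones reported in Section 3.

```python
def determine_identities(results, t1, t2, t3):
    """Return the identities that pass all three thresholds.

    results: list of (identity, score) rank-1 results, one per extracted
             face; identity is None when the matcher returned no result.
    Returns a dict mapping identity -> (grouped max score, count share).
    """
    # Step 2: drop faces without a result and faces scoring below T1.
    kept = [(ident, s) for ident, s in results
            if ident is not None and s >= t1]
    if not kept:
        return {}

    # Step 2 (continued): count each identity and keep its maximum score.
    counts, max_score = {}, {}
    for ident, s in kept:
        counts[ident] = counts.get(ident, 0) + 1
        max_score[ident] = max(max_score.get(ident, s), s)

    # Step 3: threshold the grouped maximum score (T2) and the share
    # of counts (T3). The denominator is an assumption of this sketch.
    total = len(kept)
    alerts = {}
    for ident, n in counts.items():
        share = n / total
        if max_score[ident] >= t2 and share >= t3:
            alerts[ident] = (max_score[ident], share)
    return alerts


# Hypothetical rank-1 results for one short video.
results = (
    [("A", s) for s in (0.55, 0.35, 0.25, 0.50, 0.30, 0.45, 0.28, 0.60)]
    + [("B", 0.45)]                            # high score, seen only once
    + [("D", s) for s in (0.30, 0.25, 0.22)]   # seen often, low scores
    + [("C", 0.15), (None, 0.0), (None, 0.0)]  # removed in step 2
)
alerts = determine_identities(results, t1=0.2, t2=0.4, t3=0.1)
print(sorted(alerts))  # ['A']
```

In this made-up example, identity B passes T2 but is rejected by T3 because it accounts for only one of twelve kept faces, while D is frequent enough but never reaches T2.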

3. Experimental results

Video recording data and face matcher: The videos for our experiments are recorded by two types of Zepcam, namely ChestCam and ShoulderCam. The Zepcam, developed and manufactured in the Netherlands, consists of a recorder unit, a camera, and a wireless remote control, and has been used by many European police forces and security organizations [5]. We suppose that the policeman starts a recording when he finds someone suspicious in the camera's view (assuming 3-5 persons together as the case study in our experiments). The conditions are uncontrolled, so the captured faces of one subject can differ widely in scale, expression, pose, etc. There are additional problems due to the mobile camera, including distortion, blur, low resolution and compression. The recording is expected to be sent over a wireless network to a central PC, where face recognition is performed, and the final determined identity (possibly with the stored personal information as well) is sent back to the camera as an alert signal. For our experiments, the videos are around 10 seconds long and contain about 200 frames each. About 20 videos from the two types of Zepcam were collected. The commercial face matcher we use is FaceVACS-SDK 8.7.0.2 [4], which performs face extraction and matching with algorithm B7.

[Figure 3 diagram labels: "Is there someone on my list nearby?", PC, Watch List, Image data, Alert, Video recording, Face Extraction, Face Identification, Threshold T1, Counting, Threshold T2, Threshold T3, Alert making ("Probably person A is nearby us...")]

Figure 3. Illustration of our proposed approach. Input: video data. Output: the determined identity, supposing that subjects {A,B,C,D} are on the watch list. Face extraction and identification, which are carried out by the commercial face matcher FaceVACS, provide a list of identities and corresponding scores for all extracted faces. For the final identity determination, the list is processed by the thresholds.

Data on watchlist: For the experiments, we collected still frontal face images from volunteers in our research group. In addition, to obtain a larger gallery, we randomly selected frontal face images from the FRGC database. In total, there are 203 subjects on the watchlist. For feature template extraction, one image per subject is enrolled with FaceVACS-SDK 8.7.0.2, using the same algorithm as for the video recording data.

Thresholds: To find suitable thresholds {T1, T2, T3} as introduced in Section 2, we analyze the distributions of genuine and impostor scores obtained by grouping the list by maximum. Figure 4 plots these score distributions, taking a video from ChestCam as an example. As it shows, the genuine scores mainly lie in [0.2, 0.6] and the impostor scores mainly in [0.2, 0.4]. Accordingly, T1 = 0.2 and T2 = 0.4 are chosen. To determine T3, the percentages of recognized identity counts are calculated. The right axis in Figure 5 plots the percentages we obtain. Based on those plots of count frequencies, T3 = 0.1, i.e. T3 = 10%, is chosen. Note that more videos were collected in our experiments to investigate suitable thresholds. The values {T1 = 0.2, T2 = 0.4, T3 = 0.1} are feasible for moderate video conditions and can be adjusted as the video quality varies.
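One way to tabulate the quantities inspected here is sketched below. It assumes that ground truth is available for the video (which gallery identities really appear) and that the count percentage is taken over all listed results. The function name summarize_by_identity and the scores are hypothetical and are not taken from the experiments.

```python
def summarize_by_identity(results, genuine_ids):
    """Group rank-1 results per identity and split genuine from impostor.

    results: list of (identity, score) pairs for one video.
    genuine_ids: identities known from ground truth to appear in the video.
    Returns two dicts mapping identity -> (max score, count share).
    """
    total = len(results)
    grouped = {}
    for ident, score in results:
        best, n = grouped.get(ident, (score, 0))
        grouped[ident] = (max(best, score), n + 1)

    genuine, impostor = {}, {}
    for ident, (best, n) in grouped.items():
        target = genuine if ident in genuine_ids else impostor
        target[ident] = (best, n / total)
    return genuine, impostor


# Made-up scores: identity 200 is present, identity 61 is a false match.
results = [(200, 0.52), (200, 0.31), (200, 0.44), (61, 0.33), (200, 0.27)]
genuine, impostor = summarize_by_identity(results, genuine_ids={200})
print(genuine)   # {200: (0.52, 0.8)}
print(impostor)  # {61: (0.33, 0.2)}
```

Comparing the two summaries over several videos gives a rough picture of where the genuine and impostor values separate, which is the kind of inspection used to pick the thresholds by hand.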

Identity determination results: Based on the three thresholds, we can easily filter out most of the false recognitions resulting from still image-to-image matching. Figure 5 displays the identity determination results obtained with the thresholds, in which the red line shows the identities that trigger alerts. For example, we are alerted that there are three suspects in the video ChestCam_Group1 and one suspect in the video ShoulderCam_0276. Without the count percentage threshold T3, the videos ChestCam_Group1 and ShoulderCam_0276 would both raise four alarms, as shown by the blue markers above the blue dashed line.

4. Conclusion and discussion

In this paper, we present an approach to automatically determining the identity of subjects in a surveillance video from a mobile camera with the assistance of face recognition technology. By setting one threshold on the count frequency of recognized faces and two thresholds on the matching scores, false alarms are suppressed, and the alert for the final determined identity is more reliable. The results demonstrate the simplicity and reliability of the proposed approach. Moreover, since no face tracking is involved, the approach can be implemented efficiently in real-time applications. To further decrease the false alarms in the considered application scenario, a face matcher other than FaceVACS may turn out to be optimal. To enhance the reliability of the identity determined by our approach, some guidance for users on video recording would also help. These issues could be addressed in future work. A demonstration system that includes mobile camera recording, a wireless connection, and the face recognition scheme would help in the search for a better algorithmic solution for more realistic applications.

[Figure 4 panels: (a) Genuine_200 query faces; (b) Genuine_195 query faces; (c) Genuine_203 query faces; (d) Impostor_61 query faces. Vertical axis: MaxScore.]

Figure 4. Plots of genuine and impostor scores, taking a video from ChestCam as an example. Three subjects {195,200,203} in the gallery set appear in the video, and one mistakenly recognized identity, 61, is randomly chosen to display the distribution of impostor scores.

References

[1] J. Ross Beveridge, P. Jonathon Phillips, David S. Bolme, et al. The challenge of face recognition from digital point-and-shoot cameras. In Proc. 6th IEEE Int. Conf. Biometrics: Theory, Applications and Systems (BTAS), pages 1–8, Arlington, VA, USA, 2013.

[2] Gregory Shakhnarovich, John W. Fisher, Trevor Darrell. Face Recognition from Long-Term Observations. In Proc. 7th European Conference on Computer Vision-Part III (ECCV '02), pages 851–868, London, UK, 2002.

[3] S. Duffner, J. Odobez. Track Creation and Deletion Framework for Long-Term Online Multiface Tracking. IEEE Transactions on Image Processing, Vol. 22(1), pages 272–285, 2013.

[4] http://www.cognitec-systems.de/facevacs-sdk.html

[5] http://www.interconnective.co.uk/zepcam/

[Figure 5 panels: "Results from Video ChestCam_Group1" and "Results from Video ShoulderCam_0276". Horizontal axis: Persons in Gallery Set; left vertical axis: Scores; right vertical axis: Count percentage of recognized face.]

Figure 5. Results for two example videos, one from ChestCam and one from ShoulderCam.