
Design and synthesis of an image-based object detection system


Academic year: 2021


Design and synthesis of an image-based object detection system

N Vermaak

Orcid.org/0000-0002-0041-7311

Dissertation accepted in fulfilment of the requirements for the degree Master of Engineering in Computer & Electronic Engineering at the North-West University

Supervisor: Prof JEW Holm


Acknowledgements

Firstly, I would like to thank God, for the breath in my lungs, the opportunities that He has given me, and for strengthening me when challenges became real.

I would like to thank the following people:

• Prof. Holm, for all the guidance and support throughout the research;

• Pieter Jordaan, for his support, for proofreading the thesis, and providing valuable insights and corrections;

• Rossouw van der Merwe and the Jericho Systems team, for the support and guidance during the research, and for allowing me to work on the thesis during office hours;

• Liaan Moolman, for struggling alongside me to finish on time, and helping me bounce ideas whenever I got stuck;

• My family, for their support and patience throughout my life, and for always allowing me to be myself.

Finally, I would like to thank my wife, Leandri, for her love and support throughout the last two years of my thesis.

Thank you for all the times I could count on you to bring me a cup of coffee when I was working late. And thank you for helping me plot the graphs for the cost-performance model in Matlab.


Abstract

This thesis presents the design and synthesis of an image-based object detection system, as applied to a facial recognition (FR)-enabled surveillance system. By employing the Design Science Research (DSR) paradigm and following an Action Design Research (ADR) methodology, the real-world problem of synthesizing an FR-enabled surveillance system for the South African environment was investigated.

To address this problem, a literature review was done into facial recognition, open-source and commercial facial recognition offerings, systems engineering principles, and the unique constraints of facial recognition in South Africa.

An integrated system was designed and synthesized, and tested to ensure that functional requirements were met.

A comparative study of one open-source FR engine and one commercial FR engine was done, after which a cost-performance evaluation model was designed. This was done to compare the two engines, along with a hybrid FR engine that comprises both the open-source and commercial FR engines arranged in series. This model was used to compare the recognition rate to cost of ownership for each of the three FR engines for a range of real-world scenarios.

The results from these analyses indicate that there are regions of cost-performance in which each of the FR engines performs best. The most important consideration apart from accuracy is cost, but the emphasis on performance of the FR engine alone is not always warranted as performance is achieved at a cost. The study therefore provides insight into real-world scenarios encountered in South Africa and thus contributes to the knowledge base on facial recognition systems.

Keywords: facial recognition, system design, object detection, hybrid architecture, open-source technology, design science research, cost-performance


Opsomming

Hierdie tesis bied die ontwerp en sintese van ’n beeldgebaseerde objekopsporingstelsel aan, soos toegepas op ’n visuele sekuriteitstelsel met geïntegreerde gesigsherkenning (GH). Deur gebruik te maak van die "Design Science Research" (DSR)-paradigma en die volg van ’n "Action Design Research" (ADR)-metodologie, is die werklike probleem van die sintese van ’n GH-geïntegreerde visuele toesigstelsel vir die Suid-Afrikaanse omgewing ondersoek.

Om hierdie probleem aan te spreek, is ’n literatuuroorsig gedoen oor gesigsherkenning, oopbron en kommersiële aanbiedings vir gesigsherkenning, stelselingenieurswese-beginsels en die unieke beperkings van gesigsherkenning in Suid-Afrika.

’n Geïntegreerde stelsel is ontwerp, gesintetiseer en getoets om aan die funksionele vereistes te voldoen.

’n Vergelykende studie van een oopbron GH-enjin en een kommersiële GH-enjin is gedoen, waarna ’n koste-prestasie-evalueringsmodel ontwerp is. Dit is gedoen om die twee enjins te vergelyk, tesame met ’n hibriede GH-enjin wat bestaan uit sowel die oopbron as die kommersiële GH-enjins wat in serie gerangskik is. Hierdie model is gebruik om die herkenningskoers te vergelyk met die koste van eienaarskap vir elk van die drie GH-enjins vir ’n verskeidenheid werklike scenario’s.

Die resultate van die ontledings dui aan dat daar gebiede van koste-verrigting is waarin elk van die GH-enjins die beste presteer. Afgesien van akkuraatheid is koste die belangrikste oorweging, maar die klem op die werkverrigting van die GH-enjin alleen is nie altyd geregverdig nie, aangesien prestasie teen ’n koste behaal word. Die studie bied dus insig in lewenswerklike scenario’s wat in Suid-Afrika teëgekom word en dra dus by tot die kennisbasis van gesigsherkenningstelsels.

Sleutelwoorde: gesigsherkenning, stelselontwerp, objekopsporing, hibriede argitektuur, oopbronsagteware, ontwerpswetenskapsnavorsing, koste-prestasie


Contents

Acknowledgements
Abstract
Opsomming
List of Figures
List of Tables
List of Abbreviations

1 Introduction
  1.1 Introduction to problem domain
  1.2 Outline

2 Research Methodology
  2.1 Design Science Research
    2.1.1 DSR Environment
    2.1.2 Three-Cycle View
    2.1.3 DSR Guidelines
  2.2 Elaborated Action Design Research
    2.2.1 ADR
    2.2.2 eADR
    2.2.3 Research Maturity
    2.2.4 DSR Process
  2.3 Quality Research Management
  2.4 Summary

3 Problem Statement
  3.1 Introduction
    3.1.1 Real-World Systems
    3.1.2 Academic Literature
    3.1.3 Popular Publications
  3.2 Problem Statement
    3.2.1 Problem Analysis
  3.3 Research Challenges
    3.3.1 Use of open-source vs. proprietary technology
    3.3.2 Best solution unknown
    3.3.3 Optimal computational resource allocation unknown
    3.3.4 Constraints
  3.4 Summary

4 Literature Study
  4.1 Introduction
  4.2 Facial Recognition
    4.2.1 Open-Set Facial Recognition
    4.2.2 Facial Recognition Task
    4.2.3 Proprietary Engines
    4.2.4 Open Source Engines
    4.2.5 Selection of Engine
    4.2.6 Summary
  4.3 Constraints
    4.3.1 Bias & Demographics
    4.3.2 Cost Constraints
    4.3.3 Technical Infrastructure
    4.3.4 Data Quality
    4.3.5 Summary
  4.4 System Design Principles
    4.4.1 Distributed Computing
    4.4.2 Software as a Service
    4.4.3 Low-Cost Computing
    4.4.4 Systems Engineering
  4.5 Conclusion

5 Synthesis
  5.1 Design Requirements
    5.1.1 Functional Requirements
    5.1.2 Non-Functional Requirements
  5.2 Synthesis and Evaluation
    5.2.1 System Design
    5.2.3 Functional Unit 2: Processing Stage
    5.2.4 Functional Unit 3: Monitoring
  5.3 Functional Evaluation
    5.3.1 Req. 1 Processing Time
    5.3.2 Req. 2 Assisted Monitoring
    5.3.3 Req. 3 Remote Monitoring
  5.4 Conclusion

6 Validation & Experiments
  6.1 Functional Validation
    6.1.1 Functional Test 1
    6.1.2 Functional Test 2
    6.1.3 Functional Test 3
  6.2 Evaluation of FR Engines
    6.2.1 Experiment 1 - Accuracy
    6.2.2 Experiment 2 - Real-World Accuracy
    6.2.3 Experiment 3 - Cost Comparison
  6.3 Cost-Comparison Model
    6.3.1 Model
    6.3.2 Hybrid approach
    6.3.3 Scenario
    6.3.4 Results
    6.3.5 Analysis of results
  6.4 Summary

7 Conclusion
  7.1 Research Summary
  7.2 Discussion of results
    7.2.1 Systems engineering approach
    7.2.2 Open-Source vs Commercial FR engine
    7.2.3 Cost-Performance Model
  7.3 Future Work
    7.3.1 Data Quality Filter
    7.3.2 Suitable training Data
  7.4 Validation
  7.5 Contributions

Bibliography

A Cost-Performance Model Results
B IEEE Conference Proceeding


List of Figures

2.1 IDEF0 Representation of DSR environment [1]
2.2 Three Cycle View of DSR [2]
2.3 Stages of ADR approach [3]
2.4 ADR stages with eADR cycle shown [4]
2.5 Research maturity matrix
3.1 Real-world scenario on which cost-performance model is based
4.1 Basic Steps of Facial Recognition Task
4.2 Distributed Architecture for Facial Recognition System
5.1 Functional Diagram of Proposed System
5.2 Design of F/U 1: Data Acquisition
5.3 Functional diagram of F/U 2: Cloud Server
5.4 Flow Chart for the FR task, performed in F/U 2.4
5.5 Design of F/U 3: Monitoring
6.1 Experimental setup used in functional testing
6.2 Histogram of results for functional test 1
6.3 Table showing results for Functional Test 2 - Commercial FR Task
6.4 Table showing results for Functional Test 2 - Email Task
6.5 Table showing results for Functional Test 3 - Open-source FR Task
6.6 Price Comparison of Commercial and Open-Source FR Engines
6.7 Open-Set DIR curves for open-source and commercial FR engines
6.8 Functional diagram of hybrid architecture
6.9 Graph of low throughput scenario, AOI=1000
6.10 Graph of medium throughput scenario, AOI=5000


List of Tables

3.1 Problem validation as part of the Research Validation Matrix
4.1 Confusion matrix for 1:1
4.2 Validation matrix for literature study
5.1 Table showing the results of testing two face detection methods
5.2 Table showing where functional requirements were met by system design
6.1 Table showing results for Functional Test 2
6.2 Recognition Results
6.3 Recognition Results
6.4 Accuracy Results
6.5 Assumptions regarding the scenarios for the cost-performance model


List of Abbreviations

ADR Action Design Research
AI Artificial Intelligence
AOI Area of Interest
API Application Programming Interface
AWS Amazon Web Services
BaaS Biometrics as a Service
CD Conceptual Design
CSV Comma Separated Values
DIR Detection and Identification Rate
DSR Design Science Research
eADR Elaborated Action Design Research
EC2 Elastic Compute Cloud
EVM Extreme Value Machine
F/U Functional Unit
FN False Negative
FP False Positive
FR Facial Recognition
GPU Graphics Processing Unit
I/F Interface
IDEF0 ICAM Definition for Function Modeling
IP Internet Protocol
LAN Local Area Network
LFW Labeled Faces in the Wild
MMOD Max-margin Object Detection
MUCT Milborrow/University of Cape Town
PCA Principal Component Analysis
PD Problem Diagnosis
PF Problem Formulation
POI Person of Interest
PoPI Protection of Personal Information Act
RR Recognition Rate
SaaS Software as a Service
SBC Single Board Computer
SDK Software Development Kit
SES Simple Email Service
SOA State of the Art
TCO Total Cost of Ownership
TN True Negative
TP True Positive


Chapter 1

Introduction

The purpose of this research is the design and synthesis of an image-based object detection system. This research is focused on one specific type of image-based object detection system: a facial recognition (FR) enabled surveillance system. The principles used to synthesize the final system can also be used to synthesize other types of image-based object detection systems, and the design can function as a basis for similar systems.

1.1 Introduction to problem domain

Facial recognition technology is a growing field globally, finding its way into everything from smartphones and computers to high-security access control applications. While interest in the field of FR research has surged in recent years, automated, actionable facial recognition cannot yet be considered a mature technology. The current state of FR technology has many shortcomings, and its implementation faces challenges both technically and in terms of public perception and ethics. These shortcomings and challenges are exacerbated when applying the technology in countries like South Africa, whose circumstances differ vastly from those of the countries that devote the most resources to FR technology and typically adopt FR systems.

A specific area of interest for applying FR technology is surveillance, where film and television have long been exaggerating the capabilities of FR technology to assist in keeping people and institutions safe.

The purpose of this study is to design and synthesize an FR-enabled surveillance system, as a specific instantiation of a general image-based object detection system.

This will be achieved by conducting research using the design science research (DSR) paradigm in order to solve the real-world problem of synthesizing an FR-enabled surveillance system that functions within the South African environment and addresses its unique challenges.

Following the DSR paradigm, research challenges will be identified, a system will be designed and evaluated, and research contributions will be made in the form of a documented best-practice implementation of the proposed system, a comparative study of open-source and commercial FR "engines", a cost-performance model for evaluating the effect of different FR engines on a human-AI hybrid system, as well as other contributions to the knowledge base.

The eADR methodology is followed throughout the completion of this research.

1.2 Outline

The research document is divided into 7 chapters:

• Chapter 1: Introduction provides an introduction to the problem domain and research document;

• Chapter 2: Research Methodology introduces DSR as the main research paradigm in which this research is undertaken. DSR as a general paradigm is explained, as well as how this research fits into that paradigm. eADR as the methodology for performing the research is also introduced and discussed. Finally, Quality Research Management (QRM) is introduced as a method to manage research to meet objectives;

• Chapter 3: Problem Statement contains an in-depth analysis of the problem investigated. Initial research was done in order to identify research challenges, and derive the formal problem statement which will be investigated. The problem identified from this initial research is the need for an FR-enabled surveillance system designed for the South African environment. The validation matrix (from QRM) is also introduced as a visual tool to help validate the research as well as the way in which it was conducted;

• Chapter 4: Literature study contains an in-depth investigation into relevant literature topics in order to validate the identified research challenges, and propose solutions to these challenges. Facial recognition technology is investigated in order to get a basic understanding of what an FR-enabled system would entail, and identify best-practices regarding FR. Facial recognition engines are investigated in order to see the options available for performing FR, as well as the differences between open-source and commercial FR technology. Challenges of facial recognition are investigated in order to identify any potential shortcomings the proposed system would have, and to find methods to mitigate these shortcomings as much as possible. Systems engineering and system design principles are also investigated in a bid to find a more holistic approach to improving the performance of the proposed system in a way that is not well-documented in traditional FR literature;

• Chapter 5: Synthesis shows the design and integration of the final system. The system is synthesized using best-practices as identified in the literature study in order to mitigate the shortcomings inherent to FR;

• Chapter 6: Validation and Experiments contains the functional evaluation of the final system, a comparative study of the chosen open-source and commercial FR engines, and a cost-performance model for a real-world human-AI hybrid system. The comparative study showed that the commercial FR engine was generally more accurate than the open-source FR engine, while the open-source engine was generally cheaper. The cost-performance model allowed these general assumptions to be translated into a real-world scenario, in which the commercial and open-source engines were combined into a hybrid FR engine. The hybrid engine was found to be the best option in the general use case, while the individual open-source and commercial FR engines each have specific use cases where they outperform the other two engines;

• Chapter 7: Conclusion presents a summary of the work done in the document, along with the findings of this research. The final validation matrix is also shown and discussed.


Chapter 2

Research Methodology

This chapter introduces design science research (DSR) as the main research paradigm, and elaborated action design research (eADR) as a methodology of performing research within this paradigm. The application of the DSR paradigm to this specific research problem is also shown.

2.1 Design Science Research

DSR, or Design Science Research, is a problem-solving paradigm which aims to use the building of artifacts to increase knowledge of a problem domain [1]. This paradigm is a good fit for research which is based in real-world problems, as it results in additions to the knowledge base, as well as real-world artifacts which exist to address said problems. This has the advantage of delivering outputs which are beneficial to both researchers, who benefit from additions to the knowledge base, and practitioners, who benefit from the artifact produced by the DSR process.

DSR works by translating real-world problems into the research domain. Here, the problem is analysed and solved using research and techniques from the existing knowledge base. The techniques are then used in the real world to synthesise and validate a solution to the original problem, most often in the form of an artifact, although principles, theories, and techniques could also serve as potential solutions.

DSR was chosen as the main research paradigm, as it provides a good balance between focus on the theoretical knowledge base, which is important for any theoretical research, and the practical development of technological artifacts, which is important for this specific research problem. By using this approach, the general theoretical knowledge base can be expanded, while the specific real-world problem is also addressed.

eADR was further chosen as a methodology within this research paradigm, as it addresses shortcomings in the ADR approach, allowing the research to be approached in an effective manner. This is discussed further in section 2.2.2.

2.1.1 DSR Environment

The goal of DSR is to produce artifacts as research outputs. Resources are used to solve a problem given as input, while satisfying certain constraints. Figure 2.1 is an IDEF0 representation of the entire DSR environment.


2.1.1.1 Research Inputs

The research input of DSR in this case is the problem statement: "An FR-enabled surveillance system is needed that can operate in the unique conditions of the South African context". This will serve as the starting point of the DSR process.

2.1.1.2 Research Outputs

Effective application of DSR should result in artifacts which could be physical, in the form of constructs and instantiations, or academic, in the form of models or methods:

Constructs

Constructs refer to the symbols and vocabulary which represent solutions.

Instantiations

Instantiations represent physical implementations of constructs, either as prototypes, or fully implemented systems.

Models

Models are theoretical representations of instantiations. This is a solution in the academic domain, as it can be used by other researchers as a solution to their problem, or as a starting point from which to develop their own solution.

In this research, a cost-comparison model is synthesized for a theoretical surveillance system in order to assess the financial implications of using FR with an open-source or commercial FR engine.

A theoretical design of the integrated system will also be done before synthesis can begin.

Methods

Methods represent the techniques and practices used in the DSR process to reach a solution. The documented design and validation process, as set out in this document, serves as a method meta-artifact.


2.1.1.3 Research Resources

Research resources refer to the resources used in conducting an investigation. The resources for this study include a literature study, investigation of real-world systems, and experimentation.

Literature Study

A literature review was needed in order to decide on the best techniques to use when solving the problem. The following topics were focus areas that required thorough investigation:

• Facial recognition technology, challenges, and techniques;

• Comparative studies of facial recognition engines;

• Unique challenges for FR technology within South Africa;

• Systems engineering design principles.

Real-World Systems

Real-world systems were investigated in order to see the practical working of such systems and to identify shortcomings and challenges that were addressed by the final integrated system, or noted in the research document for future work.

Experimentation

Hands-on experimentation and prototyping played a major role in completing this project. After investigation into the best ways to complete the project, the various techniques and engines were investigated practically, by producing prototypes, and testing their capabilities in formal and informal testing. This led to an intuitive understanding of some of the strengths and shortcomings of the various techniques, and helped in eliminating some of the less suitable options.

2.1.1.4 Research Constraints

Within a research project, there are usually certain constraints which limit the potential solution in some ways. Constraints, such as cost, time, and technology, are discussed in this section.


Cost

As stated in the research inputs section of this chapter, the requirement of this project is to build a system suitable for the African environment. One of the biggest constraints when working in the African environment is cost, as African economies are generally less wealthy than some of the Western and Asian economies. This means that existing solutions are not necessarily suited for the African environment. Therefore, one of the aims is to make a cost-effective system that is comparable in accuracy and efficacy to the more expensive solutions that already exist in the problem space.

Time

From the research done into FR techniques, it was seen that one of the possible ways to improve the performance of the system involved collecting large amounts of training data in order to train more accurate neural networks. This collection of data and training of neural networks would take a massive time investment, which would be out of the scope of this project. Optimising the efficacy of current FR technology was therefore prioritised as the method of achieving the research goals.

Technology

Another constraint for the African environment is the availability of technologies in general. The African environment does not have the same type of access to newer technologies that Western and Asian countries have. This had to be taken into account when designing a solution, because it would not necessarily be practical to deploy the latest technology in all cases.

2.1.2 Three-Cycle View

Figure 2.2 shows the three-cycle view of DSR, as set out by Hevner [2]. This view serves as a framework for solving a project by dividing the project into three domains:

• Environment - The "real world" with different problems, constraints, resources and outcomes;

• Design Science - Abstract methodology for solving research problems;

• Knowledge Base - Abstract theories, techniques, information that

Figure 2.2: Three Cycle View of DSR [2]

Interaction between the domains occurs within three cycles, namely the relevance cycle, design cycle, and the rigor cycle.

2.1.2.1 Relevance Cycle

The relevance cycle is the interaction between the environment and design science domains. The application context shall be derived from the environment domain. For this, the relevance cycle serves the following purposes:

• The inputs to the research problem, as set out previously in this chapter, are identified. This includes the requirements, constraints, and opportunities that exist within the real-world environment;

• The evaluation criteria are identified. The solution shall be evaluated in the context of the environment, in order to ensure that the proposed solution is fitting to the problem which was identified in the environment. Only then can the solution be validated successfully.

Field testing occurs within the environment, and the results of this will decide whether further iterations of the relevance cycle are required.


2.1.2.2 Design Cycle

The main aim of the design cycle is the continuous process of designing solutions, evaluating them, and using feedback from the evaluation in order to improve them. This process runs in a loop until a satisfactory solution is found.

This cycle differs from the other two in that it is heavily dependent on them for receiving inputs and requirements for designing and evaluating solutions, but functions completely independently once the information is gleaned from these cycles.

This continues until the complete solution is found, after which contributions to the environment and knowledge base are produced, in the form of artifacts, methods, constructs, and instantiations.

In order to do this, the cycle consists of two distinct phases:

• The design/build phase, where the design and synthesis of the solution is done;

• The evaluation phase, where the solution is evaluated.

Design Phase

The design phase draws from the relevance cycle in order to define the inputs for the solution. From the rigor cycle, methods and techniques are drawn, so that a solution can actually be designed.

Evaluation Phase

The evaluation phase draws on information from the relevance cycle in order to evaluate how well the solution performs in solving the real-world problem. From the rigor cycle, the solution is evaluated in terms of the research contributions that it provides.

Both of these are important, and arguably equally so, so that the solution can be validated within both the environment and knowledge base domains. When both sets of requirements are met, the solution is suitable both for the real world and as research.

Evaluation will occur once an iteration of the design phase is completed. If the solution is not deemed acceptable yet, a new iteration of the design phase is done, with an updated set of inputs, so that the design can be improved upon.
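The iteration described above can be illustrated with a minimal Python sketch. It is not taken from this research: the names `design_cycle`, `Evaluation`, `toy_build` and `toy_evaluate`, the iteration budget, and the toy requirement strings are all hypothetical, chosen only to show how relevance-cycle and rigor-cycle criteria jointly decide whether another design iteration is needed.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Evaluation:
    meets_environment_needs: bool   # relevance-cycle criterion (real-world fit)
    contributes_to_knowledge: bool  # rigor-cycle criterion (research contribution)
    feedback: List[str]             # findings fed into the next design iteration


def design_cycle(
    requirements: List[str],
    methods: List[str],
    build: Callable[[List[str], List[str]], dict],
    evaluate: Callable[[dict], Evaluation],
    max_iterations: int = 10,       # assumed budget, not prescribed by DSR
) -> Tuple[dict, int]:
    """Repeat design/build and evaluation until both sets of criteria are met."""
    inputs = list(requirements)
    for iteration in range(1, max_iterations + 1):
        solution = build(inputs, methods)        # design/build phase
        result = evaluate(solution)              # evaluation phase
        if result.meets_environment_needs and result.contributes_to_knowledge:
            return solution, iteration
        inputs = inputs + result.feedback        # updated inputs for the next pass
    raise RuntimeError("no satisfactory design found within the iteration budget")


# Toy usage: the first design misses one (hypothetical) requirement.
def toy_build(inputs: List[str], methods: List[str]) -> dict:
    return {"features": set(inputs), "methods": tuple(methods)}


def toy_evaluate(solution: dict) -> Evaluation:
    missing = {"remote monitoring"} - solution["features"]
    return Evaluation(not missing, True, sorted(missing))


solution, iterations = design_cycle(
    ["face detection"], ["literature study"], toy_build, toy_evaluate
)
```

In this toy run the first iteration fails the environment criterion, the missing requirement is fed back as an input, and the second iteration is accepted.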


2.1.2.3 Rigor Cycle

The rigor cycle interfaces the design science and knowledge base domains. The knowledge base contains the theories and techniques which are specific to the problem domain, as well as:

• Experiences which make up the state of the art;

• Existing artifacts and processes found in the application domain.

The main purpose of the rigor cycle is validating the research contribution. By comparing what was produced within the design science domain, to all the theories, methods, artifacts etc. within the knowledge base, the solution can be critically evaluated in order to ensure that valid research contributions were made, instead of simple application of previous knowledge in the creation of a new artifact [2].

Another important purpose of the rigor cycle is grounding the solution within the knowledge base. This will require thorough research in order to find applicable methods within the knowledge base, as well as opportunities to improve existing methods, ensuring the solution is relevant.

2.1.3 DSR Guidelines

There are seven guidelines that are used to guide a DSR project [1]:

1. Design as an artifact: DSR must produce an artifact.

2. Problem Relevance: DSR must solve a relevant business problem via a technology-based solution.

3. Design Evaluation: The solution must be critically evaluated in order to gauge how successful it is.

4. Research Contributions: DSR must produce valid research contributions in the process of developing a solution.

5. Research Rigour: DSR involves the thorough application of known methods and techniques in construction and evaluation of the solution.

6. Design as a Search: DSR involves the iterative process of searching for the ideal solution, by researching well-known methods and techniques, and adapting or developing new ones.

7. Communication of Research: Successful application of the DSR methodology involves communicating the findings in a format suitable for technical, and non-technical personnel.

Through the application of these guidelines, the DSR process was effectively applied to this project.

2.2 Elaborated Action Design Research

Elaborated Action Design Research (eADR) is an elaborated version of the Action Design Research (ADR) methodology. Before eADR can be explained, ADR must first be introduced.

2.2.1 ADR

ADR was described by Sein et al. [3] as an attempt to create a methodology that combines the activities of DSR with a focus on continuous development and evaluation, inspired by the practical use and application of the solution.

ADR provides a set of guidelines and methods to effectively perform the DSR process.

The main goal of ADR is finding solution domains where:

• The problem domain is inspired by practice;

• Rigorous research provides a base from which to develop new artifacts, find new solution domains, and generate knowledge which is beneficial to both research and practice.

This is done by identifying four stages of a project, with principles guiding each stage.

Figure 2.3 shows the stages and principles of ADR.

Figure 2.3: Stages of ADR approach [3]

2.2.1.1 Stage 1 - Problem Formulation

The first stage is triggered by a problem, or knowledge-creation opportunity identified by practitioners in a certain field. This drives the problem formulation procedure. This stage is characterized by two principles, namely practice-inspired research, and theory-ingrained artifacts.

Principle 1 - Practice-Inspired Research

The purpose of this principle is to view technical problems as opportunities to create knowledge that can be applied to a class of problems. In this way, ADR differs from other methodologies, in that the focus is not on solving an individual problem per se, but on producing knowledge.

Principle 2 - Theory-Ingrained Artifact

This principle simply states that any artifacts produced in solving the problem must be informed by theory.

2.2.1.2 Stage 2 - Building, Intervention and Evaluation

This stage refers to:

• Building an artifact;

• Intervention into the organisation where the technical problem has been identified; and

• Evaluation of the artifact by all stakeholders.

This is an iterative process which is repeated until a satisfactory artifact has been produced. This stage is characterized by three principles: reciprocal shaping, mutually influential roles, and authentic and concurrent evaluation.

Principle 3 - Reciprocal Shaping

Reciprocal shaping refers to the way the two domains (problem domain and solution domain), the artifact, and the organisational context, shape each other in this stage.

Principle 4 - Mutually Influential Roles

This principle refers to how the different stakeholders influence each other, and the importance of mutual learning by all.

Principle 5 - Authentic and Concurrent Evaluation

Principle 5 refers to the way that artifacts are evaluated in the ADR methodology. ADR differs from many other stage-based methodologies, in that evaluation is something that happens continuously as the artifact is built, instead of in a separate stage. This includes technical evaluations of the artifact, as well as evaluating the influence on the organisation.

2.2.1.3 Stage 3 - Reflection and Learning

Stage 3 is where the knowledge gained from producing an artifact is generalised and applied to the broader class of problems. This stage occurs in parallel to the first two stages. Processes in the first two stages can be tweaked in order to maximise the potential addition to the knowledge base. This stage is characterized by the principle of guided emergence.

Principle 6 - Guided Emergence

Guided emergence is a term used to convey the idea that solutions in the ADR methodology will come from a combination of design, and a natural evolution, as the solution is shaped by continuous evaluation in the context of both technical criteria, and organisational intervention.

2.2.1.4 Stage 4 - Formalisation of Learning

Stage 4 is where all the additions to the knowledge base are formalised. This is the ultimate goal of ADR: to generalise the knowledge and theories developed in a certain solution, so that they are applicable to the general class of problems to which the specific problem belongs. This stage is characterized by the principle of generalised outcomes.

Principle 7 - Generalised Outcomes

The final principle refers to the practice of generalising all aspects of the ADR process for the specific situation, in order to make it applicable to a broader range of problems. In an ADR investigation, a very specific solution is found for a specific problem within a specific context. All three aspects can and should be generalised in order to produce as big a contribution to the knowledge base as possible.

2.2.2 eADR

With ADR explained, eADR as a methodology can be introduced.

eADR was developed by Mullarkey et al in 2015 [5] as a response to practical shortcomings that they found with the definition of ADR.

The Problem Formulation (PF) stage of ADR is divided into two distinct stages:

• Problem Diagnosis (PD), where the research problem is properly researched and defined; and

• Conceptual Design (CD), where a conceptual design is made and refined.

Both these steps are done with close collaboration between researchers and practitioners.

The eADR cycle, then, consists of the following steps:

1. Problem Formulation;

2. Artifact Creation;

3. Evaluation;

4. Reflection;

5. Learning.

These steps are cycled through for every ADR stage, as demonstrated in figure 2.4:

Figure 2.4: ADR stages with eADR cycle shown [4]

By applying the eADR methodology while following the DSR paradigm, this research can be completed and documented in a structured, concise manner.

2.2.3 Research Maturity

Research maturity refers to both the application domain maturity and solution maturity. Figure 2.5 shows the maturity of this research on the maturity matrix.

The research done in this thesis hopes to improve the efficacy of a face-recognition enabled surveillance system within constraints that are inherent to the South African context by focusing on design techniques and principles that are not usually emphasised in the literature. The research contribution can therefore be classified as an improvement, as shown in figure 2.5, as new solutions are developed for a problem which is well known.

Figure 2.5: Research maturity matrix

2.2.4 DSR Process

The DSR process consists of three cycles, the relevance cycle, rigor cycle, and design cycle, as explained in section 2.1.2 of this document. This section details how these cycles were applied to this study.

Relevance Cycle 1

1. Problem Formulation

• Initial investigation into problem area;

• Formal problem statement and scope definition.

2. Requirements Analysis

• Identify functional requirements;

• Define evaluation methods and criteria.

Rigor Cycle 1

3. Literature Review

• Research literature relevant to FR, the design of FR-enabled systems, and systems engineering principles.

Design Cycle 1

4. Conceptual Design

• Design of sub-systems as well as integrated system;

• Evaluate designs and design alternatives to find appropriate solution.

5. Sub-System Synthesis

• FR task synthesis and evaluation;

• Comparative study of FR engines;

• Surveillance task synthesis and evaluation.

6. Integration

• Integration of FR and surveillance sub-systems.

Relevance Cycle 2

7. Functional evaluation

Rigor Cycle 2

8. Contribute to knowledge base

• Show appropriateness of artifacts developed through this research;

• Document and communicate contributions;

• Document and communicate future work and opportunities that arise from this research.

2.3 Quality Research Management

This research used Quality Research Management (QRM) to ensure the research requirements are managed throughout the process [6]. QRM is a method to trace research requirements through to solutions in the design research process. Also, it provides visibility of the research requirements, tracks progress, and ensures validation and verification. A Research Validation Matrix is used to capture requirements and to allocate solutions to requirements, as presented in the chapters of this thesis. In this research, specific research challenges were derived from a real-world problem, a literature survey, and expert inputs from practitioners. Solutions are defined from literature as well as creative input, and experiments in Chapter 5 provide critical evaluation of research solutions that address research challenges. The cost-comparison models are used to evaluate different solutions economically, and form the main artifacts of this research.

2.4 Summary

DSR was introduced as the research paradigm used to perform this research. A summary of the most important aspects of DSR was given, as well as how it was applied to this research. eADR was also introduced as the methodology used in the DSR paradigm. QRM was introduced as a method to manage the research quality and to provide focused research solutions.

The research maturity was shown to be an improvement (implementing a new solution to a known problem).


Chapter 3

Problem Statement

In this chapter, the research problem will be analysed in terms of the research methodology introduced in Chapter 2. Initial research will be done on relevant topics, in order to identify research challenges, and to set the research scope.

3.1 Introduction

The following sections contain initial research into the research problem. Initial research is done into real-world FR-enabled systems, monolithic solutions, and FR engines in general.

3.1.1 Real-World Systems

A variety of real-world FR and FR-enabled surveillance systems exist, both open-source, and proprietary. They range from FR engines and services, where facial recognition can be done on an ad hoc basis via an SDK or API [7][8][9], to monolithic FR-enabled surveillance platforms which perform surveillance, face detection, and face collection on one central processing unit [10].


3.1.1.1 Monolithic Solutions

Most commercially available FR-enabled surveillance software platforms come in the form of monolithic solutions. These are usually software solutions, installed on a server and used with supported cameras.

These processing units typically meet very high performance requirements, and are specifically designed for the task at hand. This makes them very suitable for the task, but they cannot be developed and improved upon, except via updates made available by the companies selling the systems.

Due to the highly specific purpose and design of the processing units, they tend to be very expensive to acquire and difficult to maintain and repair in the case of a failure. The use of a single high-performance processing unit might not be the best design for such a system, and doubly so when considering South Africa’s unique challenges. These monolithic solutions may not always be suitable due to the financial climate of South Africa which does not allow high costs.

3.1.1.2 FR Engines

Various APIs and SDKs that perform facial recognition are available, both for commercial purchase and for use as open-source software. These FR engines do not have a surveillance aspect to them, but simply handle the FR aspect of the system.

FR engines have the advantage of not being limited to specific processing units, while leaving the management of the FR results to the developers. This can be advantageous, as it provides an opportunity to divide some of the processing for the FR and surveillance tasks in smart ways, so as to maximise the computational resources available, although it does come at the cost of devoting time and resources to the development of software to manage the FR results.

The availability of open-source solutions that report high accuracy provides an interesting avenue of investigation, as these provide the best possibility for customisation and further development, should issues arise from a certain implementation. This is not available with commercial FR engines, where end-users might become subject to vendor lock-in.

On the other hand, commercial FR engines, in general, deliver higher accuracy and provide a more fully-integrated, polished product, which could potentially be improved and maintained more regularly than open-source offerings. These early observations provide enough reason to investigate open-source and commercial FR engines in more detail. The real-world scenario to be investigated is shown in figure 3.1.

Figure 3.1: Real-world scenario on which cost-performance model is based.

In the diagram, the basis for a real-world model is provided. A population of users is accessing transaction points where facial images are captured and sent to a central location via the cloud. In the cloud, commercial or open-source facial recognition takes place in an automated manner. Manual recognition is not considered as a single solution in this research due to the extremely large volume of images, although a combination of automated and manual facial recognition may be an option.


3.1.1.3 Summary

By observing FR-enabled systems in the real world, clear research questions arise:

• The optimal way of using computational power to meet all the requirements of the system;

• The use of open-source versus commercial technology for the real-world application;

• The suitability of existing system designs for use in the South African environment; and

• The cost-performance comparison between open-source and commercial technology.

3.1.2 Academic Literature

In this section, the academic literature relevant to the research problem is discussed. Focus is given to publications supporting a human-AI hybrid, the presence of algorithmic bias in technology, and the lack of publications on real-world surveillance systems that include FR integration.

3.1.2.1 Human-AI Hybrid

Recent research (2018) [11] confirms that humans are excellent at facial recognition. However, the research also showed that a combination of human and machine provides the best facial recognition performance when compared to individual methods. That is, a hybrid approach to facial recognition is a viable and practical approach. The approach in [11] was to compare human-human combinations with human-machine combinations in terms of accuracy. It was clear that human-machine combinations outperformed other approaches, and that in general the combination of methods (called fusing) improves accuracy. From the research, it was found that performing mathematical averaging improves recognition accuracy.

Similarly, research in 2009 [12] shows that the performance of humans in facial recognition is extremely high, with an accuracy of almost 100% when given sufficient context (99.2%). Even without an actual face, humans were able to recognise individuals with an accuracy of around 94%. This shows the value of using humans in a forensic setting, which is similar to the setting of this research, but with our research having additional performance requirements of fairly high volume and limited time. However, humans do not have the parallel processing (or capacity) of distributed processing and are prone to err when fatigued after a long session of work. Therefore, it would make sense to augment humans with machines.

In our research, based on the evidence above, it was decided to utilise both humans and machines in a hybrid configuration. The value of humans in the recognition pipeline has been demonstrated [12] and the value of combining machines and humans is also recognized [11]. The approach will be to use the large number of face collection devices in a field application to pre-process data in a parallel (distributed) manner, followed by centralised facial recognition to reduce initial volume, to be finally processed by human analysts for higher accuracy. The large volume of input images may result in a relatively high number of false positives, but the reduction in total volume will be significant enough to lead to a highly reduced number of human analysts.

The challenge is thus to determine the cost-effectiveness of using a hybrid approach with different automated algorithms (machines). By using a hybrid approach, one may reduce the cost of high volume facial recognition by selecting a slightly less costly machine solution with reduced accuracy, and then increasing accuracy by using humans later in the pipeline. In addition to recognition functions, humans have the ability to take action and follow standard operating procedures after a positive identification has been done by interacting with other human role players in the larger, operational system.
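The volume reduction argued for above can be illustrated with a minimal sketch. The function name, the parameter names, and every number in the usage note are hypothetical and are not results of this research. For simplicity, the sketch also assumes that false matches arise only from persons who are not in the reference library.

```python
def images_for_review(n_images, known_fraction,
                      match_rate_known, false_match_rate_unknown):
    """Expected number of machine matches forwarded to human analysts.

    All inputs are illustrative assumptions:
    n_images                 -- images captured in the period considered
    known_fraction           -- share of images showing a person of interest
    match_rate_known         -- share of persons of interest the machine matches
    false_match_rate_unknown -- share of other persons the machine wrongly matches
    """
    n_known = n_images * known_fraction
    n_unknown = n_images - n_known
    true_alerts = n_known * match_rate_known
    false_alerts = n_unknown * false_match_rate_unknown
    # Analysts only see what the machine flags, not the full image volume.
    return true_alerts + false_alerts
```

Under these assumed values, `images_for_review(100_000, 0.001, 0.9, 0.01)` forwards roughly 1,089 of the 100,000 images to analysts, most of which are false matches that the human stage is expected to reject.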

In this research, humans will be used to compare faces from a reference library of persons of interest to images obtained from the recognition machine. Hence, there is no requirement to specifically remember faces, i.e. recalling from memory is not a requirement, but rather to match faces as presented by the system.

3.1.2.2 Lack of Publications on Hybridization/Realistic Scenarios

There is an abundance of literature focusing on improving the state of the art (SOA) for FR [13][14], and numerous publications on the implementation of FR engines in a wide array of applications [15][16], but literature about real-world surveillance systems, where human agents are integrated, is lacking. In initial research, very little literature could be found which looked at FR technology functioning as a part of a surveillance system in a holistic sense.


While research focused on improving the performance of FR technology is beneficial, it does leave challenges to researchers and practitioners in the present, as it is not viable to rely on technology to advance to an appropriate level of accuracy. This lack in literature means that an optimal solution utilising FR technology, at its current state, has not been found yet, or at least is not widely known or agreed upon.

Designing a solution that makes optimal use of the current state of FR technology, within a surveillance system, will be one of the main challenges of this research.

3.1.2.3 Algorithmic Bias

Algorithmic bias refers to an inherent bias within the underlying algorithms and data sets within technology. Researchers have unveiled disparities within FR performances across different demographics [17][18]. This research shows that accuracy results for facial recognition performed on female and dark-skinned subjects are much worse than on light-skinned and male subjects, for instance.

This is especially of concern in South Africa, as the country’s census data show that the majority of the population is not light-skinned, and highlights the need for a solution that does not rely solely on FR technology for making decisions. The need to identify methods to overcome the constraints of FR technology for the South African environment is also reiterated.

3.1.3 Popular Publications

Facial recognition technology has been a popular topic in the media for the past few years, with many highlighting the challenges the technology is facing, and the risks associated with adopting FR technology into widespread use [19][20][21][22].

The main concerns that popular media have with FR technology are:

• Privacy & Potential for Misuse [23];

• Algorithmic Bias & Potential for Discrimination [19][20];

• Mistaken Identity & Responsibility [22][24].


3.1.3.1 Privacy

As the number of advocates of FR technology increases, so too does the number of commentators who foresee an Orwellian future, where constant surveillance and privacy invasions are the norm. This is a valid concern, which can be addressed in the design of an FR-enabled system, but where legislation such as South Africa’s PoPI (Protection of Personal Information) Act plays the biggest role [23].

3.1.3.2 Algorithmic Bias

Historic demographic bias is unwittingly being transferred to algorithms of all sorts, and specifically FR algorithms, as reported by numerous publications [19][20]. This bias is especially visible in the disparate accuracy results that FR algorithms produce when used on people from different demographic groups. This disparity has been well publicised in academic literature [17], as well as in popular media [19][25][26].

This is a matter of grave concern, and increases the potential for misuse by parties with malicious intent. The fact that this bias has made its way into the public psyche highlights the severity of the issue, and also the need to ensure that these biases are mitigated as much as possible.

3.1.3.3 Mistaken Identity

Commercially available FR technology has been trialled by numerous police and security forces around the world [22][24]. Grave concerns have been raised about the effects of incorrect FR results if no clear policy or procedure has been implemented for acting on FR results. While some companies have guidelines for use in sensitive applications where mistaken identity can have serious consequences, there is no way of ensuring that these policies are implemented by end-users [27][24].

All of this highlights the need for an extra layer of accountability when acting on FR results, and validates the need to find a solution that works best with the limitations currently faced by FR technology.


3.2 Problem Statement

Considering the previous discussions and literature, the problem statement for this research is as follows:

An FR-enabled surveillance system is needed that can operate in the unique conditions of the South African context

3.2.1 Problem Analysis

In this section, the functional and non-functional requirements for this research are discussed.

3.2.1.1 Functional Requirements

The following functional requirements shall be met in order for the research to be considered a success.

Req. 1: Processing Time

The processing time for both the FR and the surveillance tasks shall fall within a time limit, still to be determined, so as to enable analysts to take action.

Req. 2: Assisted Monitoring

The final system shall assist analysts in taking action, either by reducing the amount of time taken to make a decision, or by allowing them to make more accurate decisions.

Req. 3: Remote Monitoring

Analysts shall be able to monitor the surveillance site from a remote location.

3.2.1.2 Non-Functional Requirements

The following non-functional requirements shall be met in order for the research to be considered a success.

Deployment Options

The system, by design, shall allow for deployment options with regard to performance and cost.

Scaleability

The design of the system shall be such that scaling of deployments, both vertically and horizontally is pre-emptable.

3.3 Research Challenges

The following research challenges have been identified:

3.3.1 Use of open-source vs. proprietary technology

There are many different proprietary and open-source options for FR engines, with no clear indication that any of them are better suited to a specific application than any other. Research should be done into which FR engines are the best representative of each category, and further investigation should be undertaken to determine the most suitable engine for a variety of applications.

3.3.2 Best solution unknown

Initial research showed that there are many different ways to undertake the FR task and the surveillance tasks with regard to firmware, software, hardware, and algorithmic methods. Research should be done to determine whether there is a "best", or ideal solution, and a solution must then be synthesised utilising these best practices.

3.3.3 Optimal computational resource allocation unknown

When looking at FR-enabled systems, specifically FR-enabled surveillance systems, there is a wide array of different computational architectures used in commercial and non-commercial systems, depending on the application. Specific research will focus on how to optimally allocate the computational resources in order to ensure scaleability and cost-effectiveness, while maintaining adequate performance.

3.3.4 Constraints

The use of facial recognition technology, in a surveillance system, in the South African environment, provides a unique set of constraints and challenges:

• Financial Constraints: The state of the South African economy means that certain solutions are not as viable as in stronger economies;

• Technical Infrastructure: The technical infrastructure and availability of certain technical resources is limited in South Africa, when compared to many developed countries;

• Data Quality: A surveillance system does not guarantee good quality face images for comparison with the target database. One shortcoming of current FR technology is the lack of robustness with regard to low-quality probe images;

• Demographics: Another well-documented shortcoming of current FR technology is the discrepancy in performance between different demographic groups, specifically dark-skinned persons. South Africa’s demographic distribution is different from many of the countries where FR technology is pioneered, and is therefore vulnerable to the bias inherent in the current generation of FR technology.

All of these constraints must be taken into consideration when designing the system, and research must be done in order to design a system which can perform optimally within this set of constraints.

3.4 Summary

In this chapter, the research problem was analysed. Different sources were used during the research in order to inform the problem statement:

An FR-enabled surveillance system is needed that can operate in the unique conditions of the South African context

This initial research also helped to identify specific research challenges, which will be investigated further in the subsequent chapters. Table 3.1 shows the relevant research challenges, and information sources which informed these challenges, in the form of a research validation matrix.

Table 3.1: Problem validation as part of the Research Validation Matrix

Throughout this document, a section of the Research Validation Matrix will ensure that adherence to the ADR methodology is maintained, and that the research process is carried out in a structured manner.

The Research Validation Matrix (from QRM) will also provide a layer of visibility into how the research process was applied to validate the research challenges identified in this chapter, as well as the solutions to each individual challenge, as described in the rest of the research.


Chapter 4

Literature Study

In this chapter, the topics introduced in Chapter 3 are researched and discussed in depth. This research will help validate the research challenges identified in the previous chapter, and inform and validate corresponding research solutions.

4.1 Introduction

The aim of the chapter is to gain a better understanding of the topics which are relevant to the problem domain. The topics that will be discussed are:

• Facial Recognition;

• Constraints;

• Allocation of Computational Resources;

• Best Practices.

Each of these topics plays an important role in designing the best possible solution, using a systems approach.

4.2 Facial Recognition

In this section, key concepts regarding facial recognition will be discussed. A selection of proprietary and open-source facial recognition engines will also be discussed, after which the best engine from each category will be selected. These two will then be used in the final system, and compared to each other to determine the performance of each.

4.2.1 Open-Set Facial Recognition

The accuracy of a deep learning algorithm can be measured in many different ways, for different tasks. A popular way of reporting on accuracy is by measuring performance of the 1:1 verification task, which is where two faces are compared to each other, and the algorithm must determine whether or not they are of the same person [28]. The de-facto standard database for measuring this accuracy is the Labeled Faces in the Wild (LFW) database, where algorithms have surpassed human performance and regularly produce accuracy results higher than 99% [29].

Impressive as these results are, they do not necessarily hold true for other tasks, such as open-set facial recognition. Open-set facial recognition is when an algorithm is tasked with identifying a person whom the algorithm may or may not have seen before. Compare this with closed-set recognition, where the target identity has definitely been seen before and enrolled into the set of known identities. In the case of open-set recognition, the question "Do we know this person?" must be answered before the question "Who is this person?" [30].
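The difference between the two tasks can be sketched as follows. This is a minimal illustration, assuming that a similarity score per enrolled identity has already been computed for the probe face; the function names, the score values and the thresholds are hypothetical.

```python
def identify_closed_set(scores):
    """Closed-set: the probe is assumed to be enrolled, so the best match is returned."""
    return max(scores, key=scores.get)


def identify_open_set(scores, threshold):
    """Open-set: decide first whether the probe is known at all, then who it is."""
    best = max(scores, key=scores.get)
    # "Do we know this person?" is answered by the threshold test;
    # "Who is this person?" is answered by the best-scoring identity.
    return best if scores[best] >= threshold else None


# Hypothetical similarity scores of one probe against two enrolled identities.
scores = {"person_a": 0.41, "person_b": 0.38}
closed_result = identify_closed_set(scores)      # always names someone
open_result = identify_open_set(scores, 0.60)    # may report "unknown" (None)
```

With these assumed scores, the closed-set function still names `person_a`, whereas the open-set function reports the probe as unknown because no score reaches the threshold.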

While much literature exists where the accuracy of FR algorithms in the verification and closed-set recognition tasks is improved [13][14][29], the literature on open-set facial recognition is sparse, even though this task is relevant for many applications where there is considerable interest in using FR technology, such as identification of criminals in surveillance footage, or face collections where users are enrolled automatically when seen for the first time, and recognised on subsequent sightings.

This scarcity of literature is troubling, because the accuracies touted by many papers do not hold true for all use cases, nor do they match the lay person's perception of what an accuracy result means. This helps to verify research challenge B, best solution unknown.


4.2.1.1 Evaluation of open-set facial recognition

In this section, various ways of evaluating FR performance are discussed from literature.

Confusion Matrix

The evaluation of the open-set identification task in facial recognition differs from the evaluation of the verification task, because the number of different possible results are intrinsically different [31]. For 1:1 verification, there are four possible results for each face pair, as shown in table 4.1:

Table 4.1: Confusion matrix for 1:1

            Expected 1    Expected 0
Result 1    TP            FP
Result 0    FN            TN

The explanation of the confusion matrix values follows [28]:

• TP - True positive: The face pair was expected to match, and produced a positive match;

• FP - False positive: The face pair was expected not to match, but incorrectly produced a positive match;

• FN - False negative: The face pair was expected to match, but was incorrectly classified as not matching;

• TN - True negative: The face pair was expected not to match, and was correctly classified as not matching.
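As a minimal sketch of how these four counts could be tallied for a set of face pairs, the following assumes that the expected and obtained results are available as parallel lists of 1 (match) and 0 (no match). The function name and the example data are hypothetical.

```python
from collections import Counter


def confusion_counts(expected, result):
    """Tally TP, FP, FN and TN for 1:1 verification over parallel 0/1 lists."""
    counts = Counter({"TP": 0, "FP": 0, "FN": 0, "TN": 0})
    for exp, res in zip(expected, result):
        if res and exp:
            counts["TP"] += 1      # expected match, reported match
        elif res and not exp:
            counts["FP"] += 1      # expected non-match, reported match
        elif not res and exp:
            counts["FN"] += 1      # expected match, reported non-match
        else:
            counts["TN"] += 1      # expected non-match, reported non-match
    return counts


# Hypothetical outcome of five face-pair comparisons.
example = confusion_counts(expected=[1, 1, 0, 0, 1], result=[1, 0, 1, 0, 1])
```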

For open-set identification, the possible values for the confusion matrix are as follows [28]:

• TP_k - Known true positive: A known person was correctly matched to the corresponding person in the target database;

• FP_k - Known false positive: A known person was incorrectly matched to the wrong person in the target database;

• FN_k - Known false negative: A known person was incorrectly classified as being unknown;

• TN_u - Unknown true negative: An unknown person was correctly classified as being unknown;

• FP_u - Unknown false positive: An unknown person was incorrectly matched to a person in the target database.

This difference in potential results means that evaluation methods used for 1:1 verification cannot be directly applied to the identification task, and the results from verification do not necessarily translate well to the identification task [32].
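The five open-set outcomes can be expressed as a small decision function. This is a sketch only: it assumes that an unknown person, and a "no match" result from the system, are both represented by `None`, and the function name and outcome labels are chosen here for illustration.

```python
def open_set_outcome(true_id, predicted_id):
    """Classify one open-set identification attempt.

    true_id      -- identity of the probe, or None if the person is not enrolled
    predicted_id -- identity returned by the system, or None if no match was reported
    """
    if true_id is None:
        # Unknown person: either correctly rejected or wrongly matched.
        return "TN_u" if predicted_id is None else "FP_u"
    if predicted_id is None:
        # Known person, but the system reported no match.
        return "FN_k"
    # Known person with a reported match: correct or wrong identity.
    return "TP_k" if predicted_id == true_id else "FP_k"
```

Unlike the 1:1 case, a known person can fail in two distinct ways here (`FP_k` and `FN_k`), which is why verification metrics do not carry over directly.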

Detection and Identification Rate Curve

The detection and identification rate (DIR) refers to the rate at which known persons are correctly matched. The DIR for the identification task is calculated by taking the number of TP_k results and dividing this by the total number of known persons in the test dataset.

For the remainder of this study, the term recognition rate (RR) will refer to the detection and identification rate of a system, and DIR will only be used to refer to the DIR curve.

Another important measure is the false acceptance rate (FAR). This is calculated by taking the sum of two measures:

• The number of FP_k results divided by the total number of known persons in the test dataset; and

• The number of FP_u results divided by the total number of unknown persons in the test dataset.

When these two measures are known, a DIR curve can be plotted, showing the RR values versus the FAR values for a given FR system, giving a good indication of the strictness with which the FR system makes a match. The DIR curve can be a misleading indicator of accuracy, as it is heavily affected by the proportion of known and unknown subjects in a test dataset, but for the purposes of creating a cost-performance model, the DIR curve will be a useful measure [28].
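The points of a DIR curve can be computed by sweeping the match threshold, following the RR and FAR definitions given above. The sketch below assumes that each trial is recorded as a tuple of the true identity (or `None` for an unknown person), the best-matching identity and its similarity score, and that the test dataset contains at least one known and one unknown person. The tuple layout and the function name are hypothetical.

```python
def dir_curve(trials, thresholds):
    """Return (threshold, RR, FAR) points for a list of open-set trials.

    trials -- iterable of (true_id_or_None, best_match_id, score)
    """
    trials = list(trials)
    n_known = sum(1 for true_id, _, _ in trials if true_id is not None)
    n_unknown = len(trials) - n_known
    points = []
    for threshold in thresholds:
        tp_k = fp_k = fp_u = 0
        for true_id, match_id, score in trials:
            if score < threshold:
                continue               # no match is reported at this threshold
            if true_id is None:
                fp_u += 1              # unknown person wrongly matched
            elif match_id == true_id:
                tp_k += 1              # known person correctly matched
            else:
                fp_k += 1              # known person matched to the wrong identity
        rr = tp_k / n_known
        far = fp_k / n_known + fp_u / n_unknown
        points.append((threshold, rr, far))
    return points
```

A stricter threshold lowers both RR and FAR, which is the trade-off the DIR curve makes visible.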

4.2.2 Facial Recognition Task

In [13], the authors present a deep neural network that creates vector-space embeddings of input faces. These embeddings can then be compared to each other in order to ascertain whether faces are of the same person, or, in the case of open-set recognition, whether the target person is present in the face collection. Figure 4.1 shows the basic steps of the facial recognition task, as identified from the sections above:

Figure 4.1: Basic Steps of Facial Recognition Task

In the following sections, the state-of-the-art methods for each of the steps in figure 4.1 are discussed.

4.2.2.1 Face Detection

As far back as the 1980s, methods for detecting faces in photos were proposed, mostly by detecting certain features of a face. The Viola-Jones face detector [33], proposed in 2001, was one of the most influential face detectors, shifting the field of face detection research into sliding-window feature extraction. The latest works focus on neural-network based detectors such as the work done in [13][14][34].

4.2.2.2 Face Embedding

The Eigenfaces method [35] uses Principal Component Analysis (PCA) to find certain eigenvectors which correspond to individual faces. By projecting images to the "face space", the authors produce face embeddings that are unique enough to identify individual persons by their face alone, albeit at low accuracy rates. The latest works on face embeddings use neural networks to create embeddings, and achieve SOA accuracies of 99%+ for the 1:1 verification task [13][14][34].


4.2.2.3 Comparison

Comparison, in the simplest case, can be done by calculating the cosine distance and using a simple threshold to decide whether two faces are of the same subject [13][36].
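A minimal sketch of this simplest case is given below. The embeddings are assumed to be plain lists of floats of equal length, and the default threshold of 0.4 is an arbitrary illustrative value, not one taken from the cited works.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two embeddings: 0 for identical direction, 1 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def same_subject(a, b, threshold=0.4):
    """Declare two faces to be of the same subject if their distance is within the threshold."""
    return cosine_distance(a, b) <= threshold
```

In practice the threshold would be tuned on a labelled dataset, since it directly sets the balance between false matches and missed matches.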

Classifiers are also used. The Joint-Bayes classifier has been used to good effect in numerous papers [36][37], and the Extreme Value Machine (EVM), a classifier designed with open-set problems in mind [38], has been proposed as a possible solution for the open-set recognition task.

4.2.3 Proprietary Engines

There are many proprietary facial recognition services available to prospective developers. These services offer "black-box" solutions, where all the necessary steps of the FR task are implemented and can be used in a single operation or, in some cases, through a few API calls [9][7]. Three of the most popular cloud-based computer vision APIs will be discussed in greater depth in the following sections. The main advantages of using a proprietary engine are as follows [39]:

• Ease of development: By using a proprietary API, focus can be shifted to other areas of development in the system.

• Stability and scalability: Significant development has gone into making each of the APIs stable and scalable; these are significant aspects of the system that would otherwise have to be solved by the developer.

• Continuous development: Companies are continuously improving the underlying technologies used in their APIs. In theory, this means that results should continuously improve while using proprietary technology, though in practice, these improvements are not always backwards-compatible.


The main disadvantages of using a proprietary API as an engine are [39]:

• Cost: Most APIs charge per API call, which means that costs scale linearly as demand grows.

• Flexibility: When using proprietary technology, one is at the mercy of the developers in terms of underlying technology and techniques. This is not ideal, especially when dealing with very specific constraints, as is the case in this thesis.

• Control: Somewhat related to flexibility, using proprietary technology means that one has no control over circumstances such as server downtime, data loss and data security. This becomes more important as the amount and sensitivity of the data increase.

4.2.3.1 Microsoft Azure Face

Microsoft Azure provides a web-based API for certain face detection and facial recognition tasks. The tasks relevant to this work are face detection and face identification. Face identification in this case refers to identifying a new face from a dataset or collection of faces, with the added ability to add faces to and remove faces from the collection.

Microsoft Azure Face has the concept of Persons built in, so that it can state definitively whether two images are of the same person, if that is what the user wants. A collection of persons can be created, with many faces linked to each person in the group. Querying such a collection with a new image will return the correct person, if that person is known. Optionally, faces can be matched without taking persons into account. This returns the faces closest to the test subject, even if the FR service does not consider the match close enough to be the same person. A match confidence is returned, allowing users to make their own inferences.
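The person-based identification described above can be illustrated with a local sketch. This is not the Azure Face API: the function `identify`, the data layout, the use of a dot product on unit-length embeddings as the "confidence", and the minimum confidence of 0.8 are all assumptions made purely to illustrate the concept of linking many faces to one person.

```python
def dot(a, b):
    """Dot product; equals cosine similarity for unit-length embeddings."""
    return sum(x * y for x, y in zip(a, b))


def identify(person_group, probe, min_confidence=0.8):
    """Return (person, confidence), or (None, confidence) if no person is close enough.

    person_group maps a person name to a list of enrolled face embeddings.
    """
    best_person, best_conf = None, 0.0
    for person, faces in person_group.items():
        # A person's score is the score of their best-matching enrolled face.
        conf = max(dot(face, probe) for face in faces)
        if conf > best_conf:
            best_person, best_conf = person, conf
    if best_conf < min_confidence:
        return None, best_conf  # open-set case: subject treated as unknown
    return best_person, best_conf


# Hypothetical unit-length embeddings; "alice" has two enrolled faces.
person_group = {
    "alice": [[1.0, 0.0], [0.8, 0.6]],
    "bob": [[0.0, 1.0]],
}

known_result = identify(person_group, [0.6, 0.8])
unknown_result = identify(person_group, [0.6, -0.8])
print(known_result, unknown_result)
```

Returning the confidence even when no person is accepted mirrors the idea that users can make their own inferences from the match confidence.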

Microsoft Azure Face returns facial landmarks and pose estimation information, both of which will be useful for potential future advancements.

The pricing of FR APIs is influenced by usage, but for simplicity, we will take the most expensive tier, which is from 0 transactions to 1 million transactions per month. This gives an effective cost of $0.001 per transaction [9].


4.2.3.2 Amazon Web Services Rekognition

Amazon Web Services (AWS) Rekognition is a web-based computer vision API. AWS Rekognition is very similar to Microsoft Azure Face in features, but does not have the concept of persons. A face collection can be created and managed by the user. A user can then search the collection for matches to a test subject. The FR service returns the specified number of matches, along with the confidence level of each match. The user can use this confidence to determine whether the match is strong enough to be a valid match. The API returns the same key properties as Azure, such as face landmarks and pose estimation, and also has the same effective pricing of $0.001 per transaction [40].
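The search behaviour described above, where the service returns a requested number of matches with confidences and the user decides which to accept, can be sketched locally as follows. This is not the AWS Rekognition API: the function `search_collection`, the face identifiers, and the use of cosine similarity scaled to a percentage as the "confidence" are assumptions for illustration, as is the acceptance level of 90.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search_collection(collection, probe, max_matches=2):
    """Return up to max_matches (face_id, confidence) pairs, best match first."""
    scored = [
        (face_id, 100.0 * max(0.0, cosine_similarity(embedding, probe)))
        for face_id, embedding in collection.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:max_matches]


# Hypothetical flat face collection (no grouping into persons).
collection = {
    "face-001": [1.0, 0.0],
    "face-002": [0.0, 1.0],
    "face-003": [1.0, 1.0],
}
probe = [1.0, 0.1]

matches = search_collection(collection, probe, max_matches=2)
# The caller, not the service, decides which confidence is high enough.
accepted = [(face_id, conf) for face_id, conf in matches if conf >= 90.0]
print(matches)
print(accepted)
```

Leaving the acceptance decision to the caller keeps the strictness of the match under the control of the system designer.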

4.2.3.3 Kairos

Kairos is another web-based facial recognition API. It offers a few features that the other FR providers do not:

• "Diversity" detection, which provides ethnicity identification of subjects;

• Liveness/anti-spoofing detection, which provides an extra layer of security by detecting whether a person is wearing a mask or holding up a photo.

Kairos has a different pricing model from the other FR services. It requires an upfront payment, depending on what tier you choose, after which there is an additional cost per transaction.

The lowest cost per transaction is $0.001, which applies after a $499 monthly payment [8]. The cheaper tiers limit the number of transactions per minute, which is a major concern. The effective cost per transaction is also higher.
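To make the difference between the pricing models concrete, the following sketch compares a pure pay-per-use model at $0.001 per transaction with a model that adds a $499 monthly payment to the same per-transaction price. The prices are taken from the text above; the monthly transaction volumes and the function names are hypothetical, and tier limits and volume discounts are ignored.

```python
def pay_per_use_cost(transactions, price_per_transaction=0.001):
    """Monthly cost when only a per-transaction price is charged."""
    return transactions * price_per_transaction


def upfront_plus_usage_cost(transactions, monthly_fee=499.0, price_per_transaction=0.001):
    """Monthly cost when a fixed monthly payment is added to the per-transaction price."""
    return monthly_fee + transactions * price_per_transaction


def effective_cost_per_transaction(total_cost, transactions):
    """Total monthly cost spread over the number of transactions."""
    return total_cost / transactions


# Hypothetical monthly transaction volumes.
for n in (100_000, 1_000_000):
    flat = pay_per_use_cost(n)
    with_fee = upfront_plus_usage_cost(n)
    print(
        f"{n:>9} transactions: pay-per-use ${flat:,.2f} "
        f"(${effective_cost_per_transaction(flat, n):.6f} each), "
        f"upfront model ${with_fee:,.2f} "
        f"(${effective_cost_per_transaction(with_fee, n):.6f} each)"
    )
```

Under these assumptions the fixed payment dominates at low volumes, so the effective cost per transaction of the upfront model only approaches $0.001 as the monthly volume grows.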


4.2.4 Open Source Engines

Due to the interest in this field of research, quite a few open-source projects exist that aim to implement the latest facial recognition techniques. Using open-source technology for the face-recognition engine has the following advantages:

• Access to source code: Open-source technology gives access to the source code of the software being used. This allows a developer to understand the underlying technology used, as well as modify and augment it to suit their needs.

• Price: Open-source technology is available for use free of charge, although limitations sometimes exist on commercial use of open-source software, as do licensing restrictions on derivations.

• Community support: Because the source code of these projects is freely available, anyone with an interest can offer improvements or modifications to the projects. This usually means that there are many variations on popular projects, making it easier to find one that suits the problem.

Open-source technology has a number of disadvantages. The following are some that are common to the open-source engines discussed:

• No dedicated support: While a popular open-source project ensures that there is a lot of community support, open-source software often lacks dedicated support. This can make development and troubleshooting more difficult than with proprietary software.

• Ease of use: Open-source software, generally speaking, is aimed at advanced users who have knowledge of the technologies being used. These projects can sometimes be difficult to implement and integrate.
