
Secret-key rates and privacy leakage in biometric systems

Citation for published version (APA):

Ignatenko, T. (2009). Secret-key rates and privacy leakage in biometric systems. Technische Universiteit Eindhoven. https://doi.org/10.6100/IR642839

DOI:

10.6100/IR642839

Document status and date: Published: 01/01/2009

Document Version:

Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)


Secret-Key Rates and Privacy Leakage in Biometric Systems

PROEFSCHRIFT

to obtain the degree of doctor at the Technische Universiteit Eindhoven, by authority of the rector magnificus, prof.dr.ir. C.J. van Duijn, to be defended in public before a committee appointed by the College voor Promoties, on Tuesday 2 June 2009 at 16.00

by

Tanya Ignatenko


Copromotor:

dr.ir. F.M.J. Willems

This research was kindly supported by SenterNovem, The Netherlands, as part of the IOP Generieke Communicatie Program.

© Copyright 2009 Tanya Ignatenko

Cover design by Inga Douhaya

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission from the copyright owner.

Ignatenko, Tanya

Secret-key rates and privacy leakage in biometric systems / by Tanya Ignatenko. - Eindhoven : Technische Universiteit Eindhoven, 2009.

A catalogue record is available from the Eindhoven University of Technology Library.

Doctoral committee:

prof.dr.ir. J.W.M. Bergmans, promotor (Technische Universiteit Eindhoven, The Netherlands)
dr.ir. F.M.J. Willems, copromotor (Technische Universiteit Eindhoven, The Netherlands)
prof.dr. P. Narayan, external member (University of Maryland, USA)
prof.dr. A.A.C.M. Kalker, external member (Harbin Institute of Technology, China)
prof.dr.ir. J.P.M.G. Linnartz, member (Technische Universiteit Eindhoven, The Netherlands)
prof.dr. M.C. Gastpar, external member (University of California at Berkeley, USA)
prof.dr.ir. C.H. Slump, external member (University of Twente, The Netherlands)
prof.dr.ir. A.C.P.M. Backx, chairman

Summary

Secret-Key Rates and Privacy Leakage in Biometric Systems

In this thesis both the generation of secret keys from biometric data and the binding of secret keys to biometric data are investigated. These secret keys can be used to regulate access to sensitive data, services, and environments. In a biometric secrecy system a secret key is generated or chosen during an enrollment procedure in which biometric data are observed for the first time. This key is to be reconstructed after these biometric data are observed for the second time when authentication is required. Since biometric measurements are typically noisy, reliable biometric secrecy systems also extract so-called helper data from the biometric observation at the time of enrollment. These helper data facilitate reliable reconstruction of the secret key in the authentication process. Since the helper data are assumed to be public, they should not contain information about the secret key. We say that the secrecy leakage should be negligible. Important parameters of biometric key-generation and key-binding systems include the size of the generated or chosen secret key and the information that the helper data contain (leak) about the biometric observation. This latter parameter is called privacy leakage. Ideally the privacy leakage should be small, to prevent the biometric data of an individual from being compromised. Moreover, the secret-key length (also characterized by the secret-key rate) should be large to minimize the probability that the secret key is guessed and unauthorized access is granted.

The first part of this thesis mainly focuses on the fundamental trade-off between the secret-key rate and the privacy-leakage rate in biometric secret-generation and secret-binding systems. This trade-off is studied from an information-theoretical perspective for four biometric settings. The first setting is the classical secret-generation setting as proposed by Maurer [1993] and Ahlswede and Csiszár [1993]. For this setting the achievable secret-key vs. privacy-leakage rate region is determined in this thesis. In the second setting the secret key is not generated by the terminals, but independently chosen during enrollment (key binding). Also for this setting the region of achievable secret-key vs. privacy-leakage rate pairs is determined. In settings three and four zero-leakage systems are considered. In these systems the public message should contain only a negligible amount of information about both the secret key and the biometric enrollment sequence. To achieve this, a private key is needed, which can be observed only by the two terminals. Again both the secret-generation setting and the chosen-secret setting are considered. For these two cases the regions of achievable secret-key vs. private-key rate pairs are determined. For all four settings two notions of leakage are considered. Depending on whether one looks at secrecy and privacy leakage separately or in combination, unconditional or conditional privacy leakage is considered. Here unconditional leakage corresponds to the mutual information between the helper data and the biometric enrollment sequence, while the conditional leakage relates to the conditional version of this mutual information, given the secret.

The second part of the thesis focuses on the privacy- and secrecy-leakage analysis of the fuzzy commitment scheme. Fuzzy commitment, proposed by Juels and Wattenberg [1999], is, in fact, a particular realization of a binary biometric secrecy system with a chosen secret key. In this scheme the helper data are constructed as a codeword from an error-correcting code, used to encode a chosen secret, masked with the biometric sequence that has been observed during enrollment. Since this scheme is not privacy preserving in the conditional privacy-leakage sense, the unconditional privacy-leakage case is investigated. Four cases of biometric sources are considered, i.e. memoryless and totally-symmetric biometric sources, memoryless and input-symmetric biometric sources, memoryless biometric sources, and stationary and ergodic biometric sources. For the first two cases the achievable rate-leakage regions are determined. In these cases the secrecy leakage rate need not be positive. For the other two cases only outer bounds on achievable rate-leakage regions are found. These bounds, moreover, are sharpened for fuzzy commitment based on systematic parity-check codes. Using the fundamental trade-offs found in the first part of this thesis, it is shown that fuzzy commitment is only optimal for memoryless totally-symmetric biometric sources and only at the maximum secret-key rate. Moreover, it is demonstrated that for memoryless and stationary ergodic biometric sources, which are not input-symmetric, the fuzzy commitment scheme leaks information on both the secret key and the biometric data.

Biometric sequences have an often unknown statistical structure (model) that can be quite complex. The last part of this dissertation addresses the problem of finding the maximum a posteriori (MAP) model for a pair of observed biometric sequences and the problem of estimating the maximum secret-key rate from these sequences. A universal source coding procedure called the Context-Tree Weighting (CTW) method [1995] can be used to find this MAP model. In this thesis a procedure that determines the MAP model, based on the so-called beta-implementation of the CTW method, is proposed. Moreover, CTW methods are used to compress the biometric sequences and sequence pairs in order to estimate the mutual information between the sequences. However, CTW methods were primarily developed for compressing one-dimensional sources, while biometric data are often modeled as two-dimensional processes. Therefore it is proved here that the entropy of a stationary two-dimensional source can be expressed as a limit of a series of conditional entropies. This result is also extended to the conditional entropy of one two-dimensional source given another one. As a consequence, entropy and mutual information estimates can be obtained from CTW methods using properly chosen templates. Using such techniques, estimates of the maximum secret-key rate for physical unclonable functions (PUFs) are determined from a data set of observed sequences. PUFs can be regarded as inanimate analogues of biometrics.
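As a rough illustration of this compression-based estimation idea, the sketch below uses a general-purpose compressor (zlib) as a stand-in for the CTW compressor employed in the thesis; compressed lengths serve only as crude upper-bound proxies for the entropies, so the resulting estimate is indicative rather than accurate.

    import random
    import zlib

    def compressed_bits(symbols):
        # Compressed length in bits: a crude upper-bound proxy for the
        # entropy of the sequence (one symbol stored per byte).
        return 8 * len(zlib.compress(bytes(symbols), 9))

    def mutual_information_rate(x, y):
        # I(X;Y) = H(X) + H(Y) - H(X,Y); the joint sequence is formed by
        # pairing the two aligned observations into one symbol stream.
        joint = [2 * a + b for a, b in zip(x, y)]
        hx, hy, hxy = (compressed_bits(s) for s in (x, y, joint))
        return max(hx + hy - hxy, 0) / len(x)  # bits per symbol

    x = [random.randint(0, 1) for _ in range(100000)]
    y = [b ^ (random.random() < 0.1) for b in x]  # noisy re-observation
    # An ideal estimator would approach 1 - h(0.1), roughly 0.53 bit here;
    # zlib is considerably cruder than CTW, so expect a lower value.
    print(mutual_information_rate(x, y))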

Samenvatting

In this thesis the generation of secret keys from biometric data and the binding of secret keys to biometric data are investigated. These secret keys can be used to regulate access to sensitive data, services, and environments. In a biometric secrecy system a secret key is generated or chosen (bound) during an enrollment procedure in which the biometric data are observed for the first time. This secret key must be reconstructible when the authentic biometric data are observed for a second time during the authentication procedure. Since biometric measurements are in general noisy, a biometric secrecy system also extracts so-called helper data from the biometric observation during the enrollment procedure. These helper data enable reliable reconstruction during the authentication procedure. Since the helper data are assumed to be public, they should contain no information about the secret key. We say that the secrecy leakage must be negligibly small. Important parameters of a biometric key-generation scheme and a key-binding scheme are the size of the secret key and the information that the helper data contain about the biometric data. This last parameter is called the privacy leakage. Ideally this privacy leakage is small, to prevent the biometric data of a person from being compromised. Moreover, the length of the secret key (the secret-key rate) must be large, to minimize the probability that it is guessed and unauthorized access is thereby obtained.

The first part of this thesis addresses the fundamental trade-off between the secret-key rate and the privacy leakage in key-generation and key-binding systems. This trade-off is studied from an information-theoretic perspective for four biometric settings. The first setting is the classical key-generation setting as proposed by Maurer [1993] and Ahlswede and Csiszár [1993]. For this setting the achievable secret-key versus privacy-leakage region is determined in this thesis. In the second setting the secret key is not generated during the enrollment procedure but chosen independently (key binding). Also for this setting the region of achievable secret-key versus privacy-leakage pairs is derived. In settings three and four zero-leakage systems are considered. In these systems the public helper data may contain only a negligible amount of information about both the secret key and the biometric enrollment data. To achieve this, a private key is needed that is available only to the two terminals (during enrollment and authentication). Here too both key generation and key binding are investigated. For these two cases the achievable regions of secret-key versus private-key rate pairs are derived. In all four settings two notions of privacy leakage are considered. Depending on whether secrecy leakage and privacy leakage are viewed separately or in combination, unconditional or conditional privacy leakage is considered. Here unconditional leakage corresponds to the mutual information between the helper data and the biometric data, while conditional leakage corresponds to this mutual information given the secret key.

The second part of the thesis focuses on the privacy-leakage versus secrecy-leakage analysis of fuzzy-commitment schemes. Fuzzy commitment, proposed by Juels and Wattenberg [1999], is a particular realization of a binary biometric system with a chosen secret key (key binding). In this scheme the helper data are formed by masking the codeword of an error-correcting code, obtained from the key, by adding the biometric enrollment data to it. Since this scheme offers no protection against privacy leakage in the conditional case, the unconditional privacy leakage is investigated here. Four classes of sources are considered: memoryless totally-symmetric sources, memoryless input-symmetric sources, memoryless sources, and stationary ergodic sources. For the first two classes the achievable secret-key rate versus privacy-leakage region is determined. It turns out that there the secrecy leakage need not be positive. For the other two classes only outer bounds on the achievable regions can be derived. These bounds can be sharpened when the fuzzy-commitment scheme uses systematic parity-check codes. Comparing the fundamental trade-off derived in the first part of this thesis with the trade-off for fuzzy commitment shows that fuzzy commitment can only be optimal for memoryless totally-symmetric sources when the secret-key rate is maximal. Moreover, for memoryless and stationary ergodic sources that are not input-symmetric, it is demonstrated that fuzzy commitment leaks information about both the biometric data and the secret key.

Biometric sequences have a statistical structure (model) that is often unknown and rather complex. The last part of this thesis deals with determining the maximum a posteriori (MAP) model that fits a pair of observed biometric sequences. A universal source-coding method called the Context-Tree Weighting (CTW) method [1995] can be used to find this MAP model. In this thesis we propose a procedure that determines the MAP model, based on the so-called beta-implementation of the CTW algorithm. In addition we use the CTW algorithm to compress biometric sequences and pairs of sequences in order to obtain an estimate of the mutual information between these sequences. Since CTW methods were primarily developed to compress one-dimensional data sequences while biometric data are often modeled as two-dimensional, we first prove that the entropy of a stationary two-dimensional process can be expressed as a limit of a series of conditional entropies. This result is then extended to the conditional entropy of a two-dimensional process given a second process. As a consequence, estimates of entropies and mutual informations can be obtained with the CTW algorithm if properly chosen context templates are used. Using these techniques, estimates of the maximum secret-key rate for physical unclonable functions (PUFs) are made, based on a data set containing observed pairs of data sequences. PUFs can be regarded as inanimate analogues of biometrics.

Contents

1 Introduction
1.1 Introduction
1.2 Biometrics and Physical Unclonable Functions
1.2.1 Traditional Biometric Systems
1.2.2 Physical Unclonable Functions
1.3 From Traditional Biometric Systems to Biometric Secrecy Systems
1.3.1 Types of Security
1.3.2 Biometric Secrecy Systems with Helper Data
1.4 Modeling Biometric Data
1.5 Related Work
1.6 Thesis Organization
1.6.1 Chapter 2: Secret Sharing and Biometric Systems
1.6.2 Chapter 3: Privacy Leakage in Biometric Secrecy Systems
1.6.3 Chapter 4: Leakage in Fuzzy Commitment Schemes
1.6.4 Chapter 5: Context Weighting And Maximizing Using Ratio Representation
1.6.5 Chapter 6: Secret-Key Rate Estimation Based on Context Weighting Methods
1.7 Publications by the Author
1.7.1 Book Chapters
1.7.2 Journals
1.7.3 Conference Proceedings
1.8 BASIS

2 Secret Sharing and Biometric Systems
2.1 Introduction
2.2 Biometric Secret Generation Model
2.2.1 Definitions
2.2.2 Statement of Result
2.3 Proof of Thm. 2.1
2.3.1 Jointly Typical Sequences
2.3.2 Achievability Proof
2.4 Privacy Leakage for Codes Achieving the Maximum Secret-Key Rate
2.5 Stationary Ergodic Case
2.6 FRR and FAR in Biometric Secret Generation Models
2.7 Conclusions

3 Privacy Leakage in Biometric Secrecy Systems
3.1 Introduction
3.1.1 Motivation
3.1.2 Eight Models
3.1.3 Chapter Outline
3.1.4 An Example
3.2 Eight Cases, Definitions
3.2.1 Basic Definitions
3.2.2 Biometric Secret Generation Model
3.2.3 Biometric Model with Chosen Keys
3.2.4 Biometric Secret Generation Model with Zero-Leakage
3.2.5 Biometric Model with Chosen Keys and Zero-Leakage
3.3 Statement of Results
3.4 Properties of the Regions
3.4.1 Secret-Key Rates in Regions R1, R2 and R3
3.4.2 Bound on the Cardinality of Auxiliary Random Variable U
3.4.3 Convexity
3.4.4 Example: Binary Symmetric Double Source
3.5 Proofs of the Results
3.5.1 Modified Typical Sets
3.5.2 Proof of Thm. 3.1
3.5.3 Proof of Thm. 3.2
3.5.4 Proof of Thm. 3.3
3.5.5 Proof of Thm. 3.4
3.5.6 Proof of Thm. 3.5
3.5.7 Proof of Thm. 3.6
3.5.8 Proof of Thm. 3.7
3.5.9 Proof of Thm. 3.8
3.6 Relations Between Regions
3.6.1 Overview
3.6.2 Comparison of R1 and R2
3.6.3 R3: Relation to R1
3.6.4 R4

4 Leakage in Fuzzy Commitment Schemes
4.1 Introduction
4.2 The Fuzzy Commitment Scheme
4.2.1 Description
4.2.2 Preliminary Analysis of Information Leakage
4.3 The Totally-Symmetric Memoryless Case
4.3.1 Statement of Results, Discussion
4.3.2 Proof of the Results
4.4 The Input-Symmetric Memoryless Case
4.4.1 Statement of Results, Discussion
4.5 The Memoryless Case
4.5.1 Statement of Results, Discussion
4.5.2 Proof of the Results
4.6 The Stationary Ergodic Case
4.6.1 Statement of Results, Discussion
4.6.2 Proof of the Results
4.7 Tighter Bounds with Systematic Parity-Check Codes
4.7.1 Tighter Bounds for the Stationary Ergodic Case
4.7.2 Tighter Bounds for the Memoryless Case
4.8 Conclusions

5 Context Weighting And Maximizing Using Ratio Representation
5.1 Introduction
5.2 Context-Tree Weighting Methods
5.2.1 Arithmetic Coding
5.2.2 The Krichevski-Trofimov Estimator
5.2.3 Tree Sources
5.2.4 Unknown Parameters, Known Model
5.2.5 Weighting
5.2.6 Unknown Model
5.2.7 Performance
5.3 Ratios of Probabilities
5.4 Context-Tree Maximizing
5.4.1 Two-Pass Methods
5.4.2 The Context-Tree Maximizing Algorithm
5.4.3 Performance
5.5 Context-Tree Maximizing Using Ratio Representation
5.5.1 Computing A Posteriori Model Probabilities
5.5.2 Finding the Maximum A Posteriori Model
5.6 Context Maximizing Using Ratio Representation: Class III
5.6.2 Computing A Posteriori Model Probabilities
5.6.3 Finding the Maximum A Posteriori Model
5.7 Conclusions

6 Secret-Key Rate Estimation Based on Context Weighting Methods
6.1 Introduction
6.2 On the Entropy of Two-Dimensional Stationary Processes
6.2.1 On the Entropy of a Two-Dimensional Stationary Process
6.2.2 On the Conditional Entropy of a Two-Dimensional Stationary Process Given a Second One
6.3 Mutual Information Estimation
6.3.1 Convergence
6.3.2 Using Context Weighting Methods
6.4 Biometric Secrecy Systems in the Stationary Ergodic Case
6.5 Experimental Results
6.5.1 Physical Unclonable Functions
6.5.2 Secret-Key Rate Estimation
6.6 Conclusions
6.7 Acknowledgements

7 Conclusions and Future Directions
7.1 Conclusions
7.2 Future Directions

Bibliography
Glossary
Acknowledgment
Curriculum Vitae


Chapter 1

Introduction

Big Brother is watching you (G. Orwell).

1.1 Introduction

Nowadays people live in the era of large-scale computer networks connecting huge numbers of electronic devices. These devices execute applications that use the networks for exchanging information. Sometimes the information that is transmitted within these networks and stored by the devices is sensitive to misuse. Moreover, the networks and devices cannot always be trusted. This can lead to intrusions into the privacy of users by e.g. hackers, commercial parties, or even by governmental institutions. Also illegal copying of copyrighted content, illegal use of e-payment systems, and identity theft can be foreseen. In order to prevent all such malicious actions the security of networks and devices should be adequate.

Traditional systems for access control, which are based on the possession of secret knowledge (passwords, secret keys, etc.) or on a physical token (ID card, smartcard, etc.), have the drawback that they cannot guarantee that it is the legitimate user who e.g. enters a password or presents a smartcard. Moreover, passwords can often be guessed, since people tend to use passwords which are easy to remember. Physical tokens in their turn can be lost, stolen, or copied.

Biometric systems offer a solution to most of the problems mentioned above. They could be either substituted for traditional systems or used to reinforce them. Biometric systems are based on physical or behavioral characteristics of human beings, like faces, fingerprints, voice, irises, gait, see Jain et al. [36]. The results of the measurement of these characteristics are called biometric data. Biometric data have the advantage that potentially they are unique identifiers of human beings, as was argued by Clarke [12]. They provide therefore a closer bond with the identity of their owner than a password or a token does. Moreover, biometric data cannot be stolen or lost. They potentially contain a large amount of information and therefore are hard to guess. All this makes biometrics a good candidate for substitution of traditional passwords and secret keys. A drawback of using biometrics is that the outcome of their measurements is, in general, noisy due to intrinsic variability, varying measurement conditions, or due to the use of different hardware. However, advanced signal-processing and error-correcting techniques can be applied to guarantee reliable overall behavior.

The attractive property of uniqueness, which holds for biometrics, also results in their major weakness. Unlike passwords and secret keys, biometric information, if compromised once, cannot be canceled and easily replaced by other biometric information, since people only have limited resources of biometric data. Theft of biometric data results in a partially stolen identity, and this is, in principle, irreversible. Therefore requirements for biometric systems should include secure storage and secure communication of biometric data in the applications where they are used.

Although biometric data may provide solutions to the problems discussed above, there are situations when they cannot be used. There is e.g. a small percentage of people whose fingerprints cannot be used due to intrinsic bad quality, see Dorizzi [24]. Also DNA recognition fails for identical twins. In such situations standard cryptographic tools are needed to provide additional security.

An artificial inanimate analog of biometrics is a Physical Unclonable Function (PUF). PUFs were introduced by Pappu [52] as objects having properties similar to standard biometric modalities. They cannot easily be copied or cloned and are unique, and just like human biometrics the data that result from their measurements are noisy. The most prominent advantage of PUFs over human biometrics is that they can easily be replaced when necessary. Privacy is not a point of concern in systems based on PUFs, but note also that there is no strong bond between a PUF and its owner.

In what follows we will first describe traditional biometric systems in more detail. In these systems biometric data are supposed to be stored in the clear although these data can provide access to data or to a service. After that, we will discuss techniques that make (complete) reconstruction of the biometric data from the stored information practically impossible. These techniques therefore prevent an attacker from getting access to data or to a service after breaking into the database or eavesdropping on the network.

1.2 Biometrics and Physical Unclonable Functions

1.2.1 Traditional Biometric Systems

The terms “Biometrics” and “Biometry” have been used since the first part of the 20th century to refer to the field of development of statistical and mathematical methods applicable to data analysis problems in biological sciences [1]. Relatively recently the term “Biometrics” has also been used to refer to the field of technology devoted to automatic identification of individuals using biological traits, such as those based on retinal or iris scanning, fingerprints, faces, signatures, etc. Such biological traits are unique for individuals, as noted in Jain et al. [36].

Traditionally, biometric recognition was used in forensic applications and performed by human experts. However, recent advances in automated recognition resulted in the spreading of biometric applications, now ranging from border control at airports to access control in Walt Disney amusement parks (see Wayman et al. [85]). A typical biometric system is essentially a pattern recognition system, which performs one or more identity checks based on specific physiological or behavioral characteristics possessed by individuals. There are two different ways to resolve an individual’s identity, i.e. authentication and identification. Authentication (Am I who I claim to be?) involves confirming or denying the individual’s claimed identity. In identification, one has to establish the individual’s identity (Who am I?). Each of these approaches has its own characteristics, and each can best be addressed by a dedicated biometric system.

All biometric technology systems have certain aspects in common. All are dependent upon accurate reference or enrollment data. If a biometric system is to identify or to authenticate an individual, it first must have these reference data positively linked to the subject. Modern biometric identification systems, based on digital technologies, analyze personal physical attributes at the time of enrollment and distill them into a series of numbers. Once this reference sample or template is in the system, future attempts to identify an individual rest on comparing “live” data to the reference data.

A perfect system would always recognize an individual, and always reject an impostor. However, biometric data are gathered from individuals under environmental conditions that cannot always be controlled, over equipment that may slowly be wearing out, and using technologies and methods that vary in their level of precision. Consequently, ideal behavior of biometric systems cannot be realized in practice. Traditionally, the probability that an authorized individual is rejected by a biometric system is called the False Rejection Rate (FRR), and the probability that an unauthorized individual is accepted by a biometric system is called the False Acceptance Rate (FAR). There are also other performance measures that characterize biometric systems. For an excellent overview of these and similar issues see Jain et al. [36], Maltoni et al. [21], or Wayman et al. [85].
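In terms of a generic score-and-threshold decision rule (an illustration of the definitions above, not a construction from this thesis), the two error rates are estimated from genuine and impostor comparison scores as follows; sweeping the threshold t traces the trade-off between them.

    def frr_far(genuine_scores, impostor_scores, t):
        # FRR: fraction of genuine (authorized) attempts rejected, i.e.
        # scoring below the acceptance threshold t.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # FAR: fraction of impostor attempts accepted, i.e. scoring at
        # or above the threshold t.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        return frr, far

    print(frr_far([0.9, 0.8, 0.4], [0.3, 0.6, 0.1], t=0.5))  # (1/3, 1/3)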

Although biometric technologies have their advantages when they are applied in access control systems, privacy aspects of biometric data should not be ignored. Identification and authentication require storage of biometric reference data in some way. However, people feel uncomfortable with supplying their biometric information to a huge number of seemingly secure databases for various reasons, such as

• practice shows that one cannot fully trust an implementation of secure algorithms by third parties. Even governmental organizations that are typically trusted by the majority of the population cannot always guarantee that important sensitive data are securely stored;

• databases might be attacked from inside, which allows an owner of a database to abuse biometric information, for example, by selling it to third parties;

• people have limited resources of biometric data that can be conveniently used for access control. Therefore an “identity theft” of biometric information has much more serious implications than a “simple” theft of a credit card. In the latter case, one can simply block and replace this credit card, while biometric information cannot be easily revoked and replaced by other biometric information.

It is often argued that privacy need not be a real issue in biometric systems, since biometric data are not secret and can easily be captured (faces, irises) or left in public (fingerprints), see Schneier [65]. However, this information, unlike the reference data, is typically of low quality and therefore cannot easily be used for impersonation. Even if it were of good quality, which might be the case with faces, connecting it to the corresponding database is not always an easy task.

Another important point is that obtaining the biometric data of a specific person, as well as any other secret information belonging to him, is always possible when sufficient effort is exerted. Compromising a database, in contrast, requires a comparable effort but then provides immediate access to the biometric data of a large number of individuals. Therefore it makes sense to concentrate on protecting the database. It would be ideal if, in case the database becomes public, the biometric reference data could not be recovered.

1.2.2 Physical Unclonable Functions

Physical Unclonable Functions (PUFs) were first introduced and studied in Pappu [52]. Pappu used the name “physical one-way functions” for PUFs. Later, the name was changed to “physical random functions” and then to “physical unclonable functions”, see Gassend et al. [30]. This was done to avoid confusion, since PUFs do not match the standard definition of one-way functions, see e.g. Schneier [64]. A PUF is defined as a function that maps challenges to responses and is embodied by a physical device. The properties of PUFs are:

• the response to a challenge is easy to obtain;

• they are hard to characterize, i.e. given physical measurements of a PUF, an attacker can only extract a negligible amount of information about the response to a randomly chosen challenge.

PUFs can be used to obtain unique, tamper-resistant and unforgeable identifiers from physical structures. This was observed by Pappu [52]. Uniqueness implies that the number of independent degrees of freedom in the output space should be large. Tamper resistance means that the output of the physical system is very sensitive to changes in the challenge or in the system itself. Finally, unforgeable stands for the property that the system is very difficult to clone in such a way that the cloned version produces an identical response to all challenges.

From the above we can conclude that PUFs can be regarded as a particular biometric modality that comes from inanimate objects. However, unlike standard biometric modalities, for PUF-based systems privacy is not a major point of concern. Unlike human biometrics, PUFs can easily be replaced. The main problem with using PUFs lies in their noisy nature, and can therefore be formulated as the extraction of secure keys from noisy data.

In the first part of this thesis we will focus on standard biometrics and on the corresponding privacy problems, while in the second part, we will investigate PUFs without considering privacy issues.

1.3 From Traditional Biometric Systems to Biometric Secrecy Systems

1.3.1 Types of Security

To assess cryptographic protocols, two notions of security are commonly used, i.e. information-theoretical security and computational security.

Computationally secure protocols rely on assumptions such as the hardness of mathematical problems, e.g. factoring and taking discrete logarithms, and assume that an adversary has bounded computing power. However, hardness of a problem is sometimes difficult to prove, and in practice certain problems are “assumed” to be hard.

Protocols whose security does not rely on computational assumptions, i.e. that are secure even when the adversary has unbounded computing power, are called unconditionally or information-theoretically secure. Information-theoretically secure protocols are more desirable, but not always achievable. Therefore, in practice, cryptographers mostly use computational security.

In the present thesis we will treat security from an information-theoretical point of view. The key mathematical concept on which information theory is built, and which is also relevant for considering information-theoretical security, is entropy. The notion of entropy comes from Shannon [68]. Entropy is a measure of the information contained in a random variable. Although there are a number of alternative entropy concepts, e.g. Rényi entropy and min-entropy (Rényi entropy of order 2) [59], and smooth Rényi entropy [58], we will only use the classical (Shannon) notion of entropy here. Another Shannon-type concept is that of mutual information. Mutual information measures by how much the entropy of the first random variable decreases if access to the second random variable is obtained, and this notion can be defined in terms of entropies. For the exact definitions and properties of entropy and mutual information, and their proofs, we refer to Shannon [68] or e.g. Cover and Thomas [13].

An interesting special case of information-theoretical security is perfect security. This concept was introduced by Shannon [69]. He defined a secrecy system to be perfect if the mutual information between plaintext M and ciphertext C satisfies

I(M;C) = 0, (1.1)

i.e. if a ciphertext C, which is a function of a plaintext M and a secret key K, provides no information about the plaintext M, in other words, if C and M are statistically independent. Shannon proved that perfect secrecy can only be achieved when the key-entropy and plaintext-entropy satisfy

H(K) ≥ H(M). (1.2)

An example of a perfectly secure system is the one-time pad system, also referred to as the Vernam cipher [82]. In the one-time pad, a binary plaintext is concealed by adding modulo-2 (XOR-ing) a random binary secret key.
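A minimal sketch of the one-time pad follows; the key must be uniformly random, as long as the message, and used only once, which is exactly what makes I(M;C) = 0 while satisfying H(K) ≥ H(M):

    import secrets

    def one_time_pad(message: bytes, key: bytes) -> bytes:
        # XOR with a uniform key; applying the same operation with the
        # same key inverts it, so this both encrypts and decrypts.
        assert len(key) == len(message)
        return bytes(m ^ k for m, k in zip(message, key))

    plaintext = b"attack at dawn"
    key = secrets.token_bytes(len(plaintext))    # fresh uniform key
    ciphertext = one_time_pad(plaintext, key)
    assert one_time_pad(ciphertext, key) == plaintext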

In practice it is quite possible and common for a secrecy system to leak some information. Although such a system is not perfectly secure, it can be information-theoretically secure up to a certain level.

1.3.2 Biometric Secrecy Systems with Helper Data

Biometric secrecy systems in which the stored reference data satisfy certain secrecy and privacy constraints can be realized using the notion of helper data. In the next subsections we will follow an intuitive discussion that will eventually introduce us to systems in which helper data are applied. After that we will discuss two applications in which helper data play a role.

Noisy Passwords and Helper Data

A perfect system for secure biometric access control has to satisfy three requirements. Biometric data have to be private, namely, the reference information stored in a database should not reveal the actual biometric data. Reference data that are communicated from a database to a point where access can be granted have to be resilient to eavesdropping. Reference data stored in a database have to be resilient to guessing, i.e. to brute-force attacks.


A simple naive approach to satisfy both the first and the second requirement would be to use the biometric data as a password in a UNIX-password authentication scheme. In such a scheme, a user possesses a password x that gives access to his account. There is a trusted server that stores some information y = f(x) about the password. The user gains access to the account only if he enters a password x′ such that f(x′) = y. The scheme has the requirement that nobody can figure out the password x from y in any way other than by guessing. To fulfill this requirement, a UNIX-password scheme relies on one-way functions. A one-way function f(·) is a function that is easy to compute but “hard to invert”, where “hard to invert” refers to the property that no probabilistic polynomial-time algorithm can compute a pre-image of f(x) with better than negligible probability when x is chosen at random.

Thus, if we would use the UNIX-password authentication scheme and apply a one-way function to the biometric data, the storage of biometric data in the clear would be circumvented. However, there are a number of problems that would arise if we use biometric data in the UNIX scheme. First, the security properties that are guaranteed by one-way functions rely on the assumption that x is truly uniform, while we know that biometric data are far from uniform, although they do contain randomness of course. Moreover, one-way functions, like all cryptographic primitives, require their entries to be exactly reproducible for positive authentication¹, while biometric data measurements are almost never identical. Therefore additional processing (e.g. error correction and compression) is needed to realize a biometric UNIX-like authentication scheme that can tolerate a reasonable amount of errors in biometric measurements and results in uniform entries to the one-way function. One way of operating would be to use a collection of error-correcting codes such that for each observed biometric enrollment template there is a code that contains this template as a codeword. The index of this code is then stored in the database as helper data. Upon observing the individual for a second time, the helper data can then be used to retrieve the enrollment template from the authentication template. The error-correcting code should be strong enough to correct the errors between the enrollment and authentication templates. From this we may conclude that error-correcting techniques and helper data can be applied to combat errors. Subsequently, compression methods can be used to achieve almost uniform entries.
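The simplest way to picture such a collection of codes is via the shifts of one base code C: the sets {c ⊕ m : c ∈ C} together cover all binary sequences, and the shift m that turns the enrollment template into a codeword is stored as helper data. The sketch below instantiates this with a repetition code (a toy choice for illustration only; the thesis does not prescribe this code):

    import random

    R = 7  # repetition factor: the base code C repeats each key bit R times

    def enroll(x):
        # Pick a random codeword c of C (a random key, repetition-encoded);
        # the helper data m = x XOR c identify the shifted code containing x.
        key = [random.randint(0, 1) for _ in range(len(x) // R)]
        c = [b for b in key for _ in range(R)]
        m = [xi ^ ci for xi, ci in zip(x, c)]
        return key, m

    def reconstruct(y, m):
        # Shift the noisy template back into the base code, then decode by
        # majority vote within each block of R repeated bits.
        s = [yi ^ mi for yi, mi in zip(y, m)]
        return [int(sum(s[i:i + R]) > R // 2) for i in range(0, len(s), R)]

    x = [random.randint(0, 1) for _ in range(20 * R)]   # enrollment template
    y = [b ^ (random.random() < 0.05) for b in x]       # noisy re-observation
    key, m = enroll(x)
    print(reconstruct(y, m) == key)  # True with high probability
    # Note that m does reveal information about x: this privacy leakage is
    # precisely the quantity studied in this thesis.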

Now that we have argued that helper data could be used to create a reliable system, the question arises what requirements ideal helper data should satisfy. Since helper data need to be stored (and communicated) for authentication, it would be advantageous if they could be made publicly available without compromising or leaking any information about the data that are used to get access to the system. We say that the secrecy leakage from the helper data has to be negligible. Note that these data could be obtained using a one-way function as in the UNIX scheme, but better procedures may exist as well. On the other hand, the helper data should leak as little information as possible about the observed biometric enrollment template. This would reduce privacy-related problems. Note that it might be impossible to make this leakage negligible, since helper data should contain some information about the biometric data in order to set up a reliable system. It will become clear later in this thesis that the notion of secret-key sharing originating from Information Theory (see Ahlswede and Csiszár [4]) is essential in designing and analyzing biometric systems in which public helper data are used. For these secret-key sharing systems, the problem of maximizing the size of the extracted secrets (the data needed to get access) was solved. This provides the solution for our third requirement, resilience to guessing.

¹ Positive authentication can also be the result of an entry that produces a collision. However, here we do not consider collisions, since this is a problem associated with the design of one-way functions and therefore beyond the scope of this thesis.

In what we have discussed up to now, we have always assumed that keys were obtained as a result of a one-way operation on a password or on a biometric template. A biometric system would however be more flexible if we could choose the keys ourselves. We will show that the helper-data construction will make this possible. In the rest of the thesis we will therefore distinguish between generated-key systems and chosen-key systems. Sometimes their performance will not differ that much, but in other situations the differences can be dramatic.

In the next two subsections, we will briefly discuss two applications of biometric access control with helper data. In the first application the secret key is stored in the database in an encrypted form, while in the second application the key is discarded.

Application A: Biometric Access

A general protocol for secure authentication can be represented schematically by the diagram in Fig. 1.1. A typical authentication procedure reads as follows.

During enrollment, the biometric data of a subject are captured and analyzed, and the template X^N is extracted. A secret K is chosen or generated from these data. Then the template X^N is linked to the key K via a helper message M. The key is encrypted using a one-way function and stored in a database as f(K), together with an ID of the subject and the helper message M.

During authentication, the subject claims his identity (ID). His biometric data are captured and preprocessed again, resulting in the template Y^N. The key K̂ is estimated based on Y^N and the helper message M that corresponds to the claimed ID. This estimated key is encrypted and then matched against the encrypted key f(K) corresponding to the claimed ID. Only if f(K̂) is the same as f(K) is the subject positively authenticated.

Figure 1.1: Secure authentication. The dotted arrow indicates the possibility that the secret key is chosen, not generated from X^N.
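A minimal end-to-end sketch of this protocol; SHA-256 stands in for the one-way function f, and the helper data come from enroll(), reconstruct() and R of the repetition-code sketch in Section 1.3.2 above (both are illustrative assumptions, not the schemes analyzed in this thesis):

    import hashlib
    import random

    def f(key_bits):
        # One-way function applied to the secret key before storage.
        return hashlib.sha256(bytes(key_bits)).hexdigest()

    database = {}

    def enrollment(user_id, x):
        key, m = enroll(x)               # link K to X^N via helper message M
        database[user_id] = (f(key), m)  # store f(K) and M; K and x are discarded

    def verify(user_id, y):
        fk, m = database[user_id]
        k_hat = reconstruct(y, m)        # estimate K̂ from Y^N and M
        return f(k_hat) == fk            # accept iff f(K̂) = f(K)

    x = [random.randint(0, 1) for _ in range(20 * R)]  # enrollment measurement
    y = [b ^ (random.random() < 0.05) for b in x]      # authentication measurement
    enrollment("alice", x)
    print(verify("alice", y))            # True with high probability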

Figure 1.2: Secure encryption. The dotted arrow indicates the possibility that the secret key is chosen, not extracted from X^N.

Application B: Biometric Encryption

Another system of interest is a system that uses biometric-based keys for encryption. A protocol for biometric-based encryption is depicted in Fig. 1.2.

The first step of biometric-based encryption is similar to the enrollment procedure in the authentication protocol, viz. biometric data of a subject are captured and analyzed, and a template X^N is derived. Then a secret key K is extracted/chosen and linked to the template via a helper message M. This secret is used to encrypt a plaintext m as Enc_K(m) (here encryption is assumed to be symmetric). The encrypted plaintext and the helper message are either stored or transmitted, while the key is discarded.

In the decryption phase, in order to decrypt the plaintext, the subject provides a measurement of his biometrics. This measurement is preprocessed, resulting in a noisy template Y^N. The template and the helper message M are now used to derive a key K̂. The key K̂ is used to decrypt the encrypted plaintext Enc_K(m). The decryption is successful, viz. Dec_K̂(Enc_K(m)) = m, only if K̂ = K, since symmetric encryption is used.
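The encryption application can be sketched in the same toy setting. A keystream expanded from the key bits with SHA-256 stands in for a proper symmetric cipher Enc (an illustrative assumption only, and not a secure construction):

    import hashlib

    def keystream(key_bits, n):
        # Expand the secret key into n pseudorandom bytes (toy construction).
        out, counter = b"", 0
        while len(out) < n:
            out += hashlib.sha256(bytes(key_bits) + counter.to_bytes(4, "big")).digest()
            counter += 1
        return out[:n]

    def encrypt(key_bits, plaintext: bytes) -> bytes:
        return bytes(p ^ s for p, s in zip(plaintext, keystream(key_bits, len(plaintext))))

    decrypt = encrypt  # XOR keystream: the same operation inverts itself

    key = [1, 0, 1, 1, 0, 1, 0, 0]          # chosen secret key K (toy value)
    c = encrypt(key, b"sensitive record")   # Enc_K(m), stored with M; K discarded
    # At decryption time K̂ is derived from Y^N and M (see the sketches above);
    # decryption succeeds exactly when K̂ = K:
    assert decrypt(key, c) == b"sensitive record"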

Two Generic Settings

Figure 1.3: Generic settings, generated and chosen keys.

From the discussions above we may conclude that in order to design a good biometric secrecy system, it is enough to focus on a number of generic structures, i.e. models that constitute the core of any biometric secrecy system. These generic secret-key sharing models can be subdivided into a class of models with generated keys, see Fig. 1.3(a), and a class of models with chosen keys, see Fig. 1.3(b). This subdivision also appears in the overview paper of Jain et al. [37]. In both models K is a randomly generated/chosen secret key, X^N and Y^N are biometric enrollment and authentication sequences having length N, M is a helper message, and K̂ is an estimated secret key. The channel between the encoder and the decoder is assumed to be public. We only assume that passive attacks are possible, namely, an attacker can see all public information but cannot change it. The information leakage is characterized in terms of mutual information, and the size of the secret keys in terms of entropy. The generic models must satisfy the following requirements:

Pr(K ≠ K̂) ≈ 0 (reliability), (1.3)
H(K)/N ≈ (log |K|)/N is as large as possible, where K denotes the key alphabet (secret-key rate), (1.4)
I(K; M)/N ≈ 0 (secrecy leakage), (1.5)
I(X^N; M)/N is as small as possible (privacy leakage). (1.6)

Remark: In this thesis log denotes logarithm to the base 2. Moreover, further in the thesis, we denote the generated and chosen keys by S and K, respectively.
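For a feel of these quantities, the sketch below evaluates them exactly for a hypothetical two-bit toy scheme (an illustration, not a scheme from this thesis): X^2 consists of two i.i.d. uniform bits, the generated secret is K = X_1, and the helper message is M = X_1 ⊕ X_2. The secrecy leakage I(K;M) is zero, while the privacy leakage I(X^2;M) is one bit.

    from collections import defaultdict
    from itertools import product
    from math import log2

    def mutual_information(joint):
        # joint: dict mapping (a, b) -> probability; returns I(A;B) in bits.
        pa, pb = defaultdict(float), defaultdict(float)
        for (a, b), p in joint.items():
            pa[a] += p
            pb[b] += p
        return sum(p * log2(p / (pa[a] * pb[b]))
                   for (a, b), p in joint.items() if p > 0)

    joint_km, joint_xm, pk = defaultdict(float), defaultdict(float), defaultdict(float)
    for x1, x2 in product([0, 1], repeat=2):   # X^2 uniform, N = 2
        p, k, m = 0.25, x1, x1 ^ x2            # generated key and helper data
        joint_km[(k, m)] += p
        joint_xm[((x1, x2), m)] += p
        pk[k] += p

    print(-sum(p * log2(p) for p in pk.values()) / 2)  # H(K)/N = 0.5 bit/symbol
    print(mutual_information(joint_km))                # I(K;M) = 0.0 (secrecy leakage)
    print(mutual_information(joint_xm))                # I(X^2;M) = 1.0 (privacy leakage)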

1.4 Modeling Biometric Data

Throughout this thesis we assume that our biometric sequences (feature vectors) are discrete, independent and identically distributed (i.i.d.). Fingerprints and irises are typical examples of such biometric sources. A discrete representation of other biometric modalities can be obtained using quantization. The independence of biometric features is not unreasonable to assume, since PCA, LDA and other transforms, which are applied to biometric measurements during feature extraction (see Wayman et al. [85]), result in more or less independent features. In general, different components of biometric sequences may have different ranges of correlation. However, for reasons of simplicity we will only discuss identically distributed biometrics here.

In Chapters 2, 4 and 6 we consider stationary ergodic biometric sources. Whether biometric data sources can be modeled as stationary and ergodic is still a research question; however, there are some indications that irises, fingerprints and DNA can be considered to be stationary ergodic, see [2]. PUFs, on the other hand, can be modeled as stationary ergodic processes. Indeed, from Feng et al. [26] we know that the two-point intensity correlations in a speckle pattern are translation invariant. Moreover, these processes are also ergodic due to the fact that the spatial distribution of intensities is the same as the PUF-ensemble distribution of intensities, see Goodman [31].

In the first part of the thesis, we assume that our biometric secrecy systems are based on a biometric source with distribution {Q(x, y), x ∈ X, y ∈ Y}. This source produces enrollment sequences x^N = (x_1, x_2, ..., x_N) of N symbols from the finite alphabet X and authentication sequences y^N = (y_1, y_2, ..., y_N) of N symbols from the finite alphabet Y. When a sequence pair (x^N, y^N) comes from the same person, it is characterized in terms of the joint probability distribution {Q(x, y), x ∈ X, y ∈ Y}. In that case the biometric sequences X^N and Y^N are in general not independent of each other. However, when the sequences in the pair (x^N, y^N) come from different persons, the pair is characterized in terms of {Q(x)Q(y), x ∈ X, y ∈ Y}, where Q(x) and Q(y) are the marginals of Q(x, y). Therefore biometric sequences X^N and Y^N that come from different persons are assumed to be independent. These assumptions can also be observed in Schmid and Nicolo [62], where biometric system capacity is studied under global PCA and ICA encoding.
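The binary symmetric double source used as an example in Section 3.4.4 is the simplest such joint distribution: per letter, Q(x, y) = (1 − q)/2 if x = y and q/2 otherwise. A sampling sketch (the crossover probability q = 0.1 is an arbitrary illustrative value):

    import random

    def binary_symmetric_double_source(n, q):
        # X^N is uniform i.i.d.; Y^N is X^N observed through a BSC(q),
        # so per letter Q(x, y) = 0.5 * (1 - q) if x == y else 0.5 * q.
        x = [random.randint(0, 1) for _ in range(n)]
        y = [xi ^ (random.random() < q) for xi in x]
        return x, y

    # Same person: a dependent pair drawn from Q(x, y).
    x, y = binary_symmetric_double_source(1000, 0.1)
    # Different persons: independent sequences, i.e. distributed as Q(x)Q(y).
    x_alice, _ = binary_symmetric_double_source(1000, 0.1)
    _, y_bob = binary_symmetric_double_source(1000, 0.1)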


1.5 Related Work

Privacy concerns related to the use of biometric data in various secrecy systems are not new. Schneier [65] pointed out that biometric data are not standard secret keys, as they cannot be easily canceled. Ratha et al. [56] investigated the vulnerability points of biometric secrecy systems. In Prabhakar et al. [54] security and privacy concerns were raised. Linnartz and Tuyls [45] looked at the problem of achieving biometric systems with no secrecy leakage. Finally, at the DSP forum [83] secrecy- and privacy-protecting technologies were discussed. The extent to which secrecy and privacy problems have been investigated in the literature also received attention there.

Considerable interest in the topic of biometric secrecy systems resulted in the proposal of various techniques over the past decade. Recent developments in the area of biometric secrecy systems led to methods grouped around two classes: cancelable biometrics and “fuzzy encryption”. Detailed summaries of these two approaches can be found in Uludag et al. [80] and in Jain et al. [37].

It is the objective of cancelable biometrics, introduced by Ratha et al. [56], [57], Ang et al. [7], and Maiorana et al. [46], to avoid storage of reference biometric data in the clear in biometric authentication systems. These methods are based on non-invertible transformations that preserve the statistical properties of biometric data and rely on the assumption that it is hard to exactly reconstruct biometric data from the transformed data and the applied transformation. However, hardness of a problem is difficult to prove, and, in practice, the properties of these schemes are assessed using brute-force attacks. Moreover, visual inspection shows that transformed data, e.g. the distorted faces in Ratha et al. [57], still contain a lot of biometric information.

The “fuzzy encryption” approach focuses on the generation and binding of secret keys from/to biometric data. Implementations of “fuzzy encryption” include methods based on various forms of Shamir’s secret sharing [67]. These methods are used to harden passwords with biometric data (Monrose et al. [49], [48]). Methods based on error-correcting codes, which bind uniformly distributed secret keys to biometric data and tolerate (biometric) errors in these secret keys, were formally defined by Juels and Wattenberg [41]. Less formal approaches can be found in Davida et al. [19], [18]. Error-correction based methods were extended to the set-difference metric developed by Juels and Sudan [40]. Some other approaches focus on continuous biometric data and provide solutions based on quantization of biometric data, as in Linnartz and Tuyls [45], Denteneer et al. [20] (with emphasis on reliable components), Teoh et al. [74], and Buhan et al. [10].

Finally, a formal approach for designing secure biometric systems for three metric distances (Hamming, edit and set), called fuzzy extractors, was introduced in Dodis et al. [22] and further elaborated in [23]. Dodis et al. [22], [23] were the first to address the problem of code construction for biometric secret-key generation in a systematic information-theoretical way. Although their works provide results on the maximum secret-key rates in biometric secrecy systems, they only give the corresponding results for the maximum privacy leakage. In the biometric setting, however, the goal is to minimize the privacy leakage. The need for quantifying the exact information leakage on biometric data was also stated as an open question in Sutcu et al. [73].

Another branch of work on “fuzzy encryption” focuses on the combination of biometric and cryptographic keys. Methods in this direction include attempts to harden the fuzzy vault scheme of Juels and Sudan [40] with passwords by Nandakumar et al. [50] and dithering techniques that were proposed by Buhan et al. [9].

1.6 Thesis Organization

In the current thesis we study a number of problems related to the design of biometric secrecy systems.

First of all we address the problems of what the fundamental trade-off between the secret-key rate and the privacy leakage is in biometric secrecy systems that extract or convey secret keys, and what the methods are to achieve optimal systems. Chapter 3 is devoted to these problems and is the main chapter of this thesis.

The results obtained in Chapter 3 are further used to assess the optimality of a popular existing technique for designing biometric secrecy systems, i.e. fuzzy commitment, which was proposed by Juels and Wattenberg [41]. We study the properties of fuzzy commitment in Chapter 4.

Then we focus on a problem that needs to be addressed before any practical biometric secrecy system is built, viz. how much secret information can be extracted or conveyed with a certain biometric modality. In Chapter 5 we describe the methods that we use to estimate this amount of secret-key information. Moreover, since we need to know the statistics of the biometric source in order to design codes that achieve nearly-optimal performance, we also study in this chapter the problem of how to find the model that best matches a pair of observed biometric sequences.

Then, in Chapter 6, we concentrate on the estimation of maximum secret-key rates for two-dimensional biometric sources. We use the techniques described in Chapter 5 and perform a series of experiments to estimate the secret-key rates for optical PUFs.

Next, we present in detail the content of the chapters that compose this thesis.

1.6.1 Chapter 2: Secret Sharing and Biometric Systems

Chapter 2 is mainly a review chapter that sets the theoretical grounds for our investigation of secret-key rates and privacy leakage in biometric secrecy systems. In this chapter we revisit the classical Ahlswede and Csiszár [3] and Maurer [47] problem of generating a secret from two dependent sequences, but in the biometric setting. Here the biometric source is assumed to be discrete memoryless. The maximum secret-key rate that is achievable for this model is equal to the mutual information between a biometric enrollment sequence X^N and a biometric authentication sequence Y^N, i.e. I(X;Y). Although this result was already proved using strong typicality by Ahlswede and Csiszár [3], we provide our own version of the proof, which is based on weak typicality. This proof will be the core part of the achievability proofs given in Chapter 3, where we deal with more general biometric settings.

Then, as a warm-up, we derive a characterization of the privacy leakage for biometric secret generation systems that achieve the maximum secret-key rate with the codes determined in our achievability proof.

Moreover, we discuss how typical biometric performance measures, i.e. the FRR and the FAR, relate to the results obtained for the biometric secret generation model. We show that these error probabilities can be made arbitrarily small for positive secret-key rates smaller than or equal to I(X;Y ). Furthermore, we argue that the FAR for the biometric secret generation model can be characterized in terms of the identification capacity of a typical biometric identification system with no security constraints.

Finally, we extend the i.i.d. results derived in this chapter to stationary ergodic sources.

1.6.2 Chapter 3: Privacy Leakage in Biometric Secrecy Systems

In Chapter 3 we continue to study secret-key rates and privacy leakage. There, however, we concentrate on a more general situation. One of the challenges in designing biometric secrecy systems is to minimize the privacy leakage for a given secret-key rate. Therefore, in Chapter 3, we focus on finding the fundamental trade-off between secret-key rates and privacy leakage. In this chapter we assume that our biometric source is discrete memoryless.

Since biometric secrecy systems can be designed such that secret keys are generated from biometric data, but also such that secret keys are bound to biometric data, see the overview paper of Jain et al. [37] and our discussion above, we investigate both types of systems, i.e. biometric secret generation systems and biometric systems with chosen (bound) secret keys.

We consider four biometric settings. The first one is again the standard Ahlswede-Csiszár secret-generation setting. Now, however, we have the requirements that the helper data should not only contain a negligible amount of information about the secret, but should also leak as little information as possible about the biometric data. In the second setting we consider a biometric model with chosen keys, where the secret key is not generated by the terminals but independently chosen at the encoder side and conveyed to the decoder. This model has the same requirements as the biometric secret generation model.

The other two biometric settings that we consider correspond to biometric secrecy systems with zero privacy leakage. Ideally, biometric secrecy systems should leak a negligible amount of information not only on the secret but also on the biometric data. However, in order to be able to generate or convey large secret keys reliably, we have to send some data (helper data) to the second terminal. Without any precautions, the helper data leak a certain amount of information on the biometric data. In this way biometrics alone may not always satisfy the security and privacy requirements of certain systems. The performance of biometric systems can, however, be enhanced using standard cryptographic keys. In our models we assume that only the two terminals have access to an extra independent private key, which is observed together with the correlated biometric sequences. The private key is used to achieve a negligible amount of privacy leakage (zero leakage). We investigate both the secret generation model with zero leakage and the chosen-key model with zero leakage.

All four settings that we have described are studied for two types of leakage, i.e. unconditional and conditional privacy leakage. Unconditional leakage corresponds to the mutual information between the helper data and the biometric enrollment sequence and describes the information that the helper data leak about the biometric data. The second type of leakage, the conditional one, relates to the mutual information between the helper data and the biometric enrollment sequence conditioned on the secret. This type of privacy leakage is motivated by the fact that the helper data may provide more information on the pair of secret key and biometric data than on each of these entities separately. Therefore we have to consider the mutual information between the pair of secret key and biometric enrollment sequence on the one hand and the helper data on the other; this mutual information has to be as small as possible. In Chapter 3 we show that this requirement on the leakage on the pair can be reformulated in terms of a conditional privacy leakage that has to be minimized and a secrecy leakage that has to be negligible.
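Written out, with S denoting the secret and W the helper data (our shorthand), this reformulation rests on the chain rule for mutual information,

    I(S, X^N; W) = I(S;W) + I(X^N;W|S),

so requiring the leakage on the pair to be small is equivalent to requiring a negligible secrecy leakage I(S;W) together with a small conditional privacy leakage I(X^N;W|S).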

The four described biometric settings combined with the two notions of privacy leakage result in eight biometric models. In Chapter 3 we determine the fundamental trade-offs between secret-key rates and privacy-leakage rates, and between secret-key rates and private-key rates, for all eight models. The result obtained for the first setting is similar to, and a special case of, the secret-key part of Thm. 2.4 in Csiszár and Narayan [16].

For systems without a private key the achievable regions that we find are all equal, except for the chosen-key model with conditional leakage, where the achievable region is in principle smaller. Similarly, for zero-leakage systems the achievable regions are all equal, except for the chosen-key model with conditional leakage. In the latter case, it is important to note that from the derived region we may conclude that the biometrics are actually useless. Generally, in zero-leakage systems the secret-key rate cannot be smaller than the private-key rate.

The achievability proofs that we provide in Chapter 3 suggest that optimal codes should incorporate both vector quantization methods and Slepian-Wolf techniques.

1.6.3 Chapter 4: Leakage in Fuzzy Commitment Schemes

Chapter 4 is devoted to the analysis of the properties of fuzzy commitment, introduced by Juels and Wattenberg [41]. Fuzzy commitment is a particular realization of a binary biometric secrecy system with chosen secret keys. It became a popular technique for designing biometric secrecy systems, since it is convenient and easy to implement using standard error-correcting codes. However, fuzzy commitment is primarily designed for binary uniform memoryless biometric sequences, and it is provably secure for this case.
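To make the mechanics of the scheme concrete, below is a minimal Python sketch of fuzzy commitment (our own illustration: a 3-fold repetition code stands in for the error-correcting codes analyzed in Chapter 4, and all names and parameters are merely illustrative).

    import secrets

    def encode(bits, r=3):
        # Repetition encoder: each secret bit is repeated r times.
        return [b for bit in bits for b in [bit] * r]

    def decode(bits, r=3):
        # Majority-vote decoder for the repetition code.
        return [int(sum(bits[i:i + r]) > r // 2) for i in range(0, len(bits), r)]

    def xor(a, b):
        return [u ^ v for u, v in zip(a, b)]

    def enroll(x, k=4, r=3):
        # Choose a k-bit secret key and bind it to the biometric x:
        # the helper data w is the codeword masked by the biometric.
        s = [secrets.randbelow(2) for _ in range(k)]
        w = xor(encode(s, r), x)
        return s, w

    def authenticate(y, w, r=3):
        # w XOR y = c XOR (x XOR y); decoding removes the measurement
        # noise as long as the number of bit errors stays within the
        # error-correcting capability of the code.
        return decode(xor(w, y), r)

    # Toy usage: a 12-bit enrollment measurement and a noisy
    # authentication measurement differing in one position.
    x = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
    s, w = enroll(x)
    y = x[:]
    y[5] ^= 1
    assert authenticate(y, w) == s

In the actual scheme of Juels and Wattenberg, a cryptographic hash of s is stored next to w so that the decoder can verify the recovered key; it is the helper data w whose secrecy and privacy leakage are analyzed in Chapter 4.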

In Chapter 4 we focus on the achievable regions for fuzzy commitment. We investigate fuzzy commitment when the biometric data statistics are memoryless and totally-symmetric, memoryless and input-symmetric, memoryless, and stationary ergodic. Unlike in Chapter 3, where we obtain regions of achievable secret-key vs. privacy-leakage rate pairs, the regions for fuzzy commitment are given in terms of triples of achievable secret-key, secrecy-leakage, and privacy-leakage rates.

We are able to determine the achievable regions when the data statistics are memoryless and totally-symmetric, and when they are memoryless and input-symmetric. For the general memoryless and the stationary ergodic cases we cannot determine the achievable rate-leakage regions, and we only provide outer bounds on the corresponding regions. These bounds, moreover, can be sharpened for systematic parity-check codes.

Given these achievable regions (bounds), the optimality of fuzzy commitment in terms of the secret-key vs. privacy-leakage balance is assessed by “projecting” the fuzzy-commitment regions onto the secret-key vs. privacy-leakage rate plane and comparing them with the fundamental trade-offs found in Chapter 3. It turns out that the fuzzy commitment scheme is only optimal for the totally-symmetric memoryless case, and only if the scheme operates at the maximum secret-key rate. Moreover, for the general memoryless case the scheme reveals more information than necessary on both the secret and the biometric data.

To assess the stationary ergodic case, we use the results obtained in Chapter 2. We compare the fuzzy commitment scheme to a two-layer scheme for stationary ergodic biometric sources; the latter scheme is based on a biometric secret generation model with a masking layer on top of it. It appears that the two-layer scheme enjoys better properties. Hence we may conclude that also in the stationary ergodic case fuzzy commitment reveals more information than necessary on both the secret and the biometric data.


1.6.4 Chapter 5: Context Weighting and Maximizing Using Ratio Representation

In order to design codes that achieve near-optimal performance according to the guidelines stated in Chapter 3, we need to know the statistics of the biometric source. These statistics are determined by the model (structure) of the biometric source and by the probabilities with which the source generates symbols. Given a model of the source, we can partition an observed biometric sequence into subsequences according to the model and then estimate the empirical probabilities within each subsequence as the fraction of times each symbol occurs there. The main problem therefore is to find the model of the biometric source.

In Chapter 5 we derive a procedure to find the model that best matches an observed biometric sequence pair. This procedure is based on a universal source coding method, viz. the Context-Tree Weighting (CTW) method of Willems, Shtarkov, and Tjalkens [88]. In order to obtain an efficient procedure, we focus on the implementation of the CTW method based on ratios of block probabilities, proposed in Willems and Tjalkens [91]; our procedure for finding the best model therefore uses these ratios. The best model for an observed biometric sequence turns out to be the maximum a posteriori (MAP) model. In Chapter 5 we describe a procedure for deriving the MAP-model for two classes of general finite context sources, i.e. for tree sources and for the so-called class III models. The general finite context sources were described in [90]. The class III weighting methods are based on a richer model class than tree sources, and therefore with this class we may obtain more reliable estimates of the model and, consequently, of the parameters of the source.
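To give an impression of the computation involved, here is a didactic Python sketch of the basic binary CTW recursion with the Krichevsky-Trofimov (KT) estimator for tree sources of bounded depth (our simplification: it works with full block probabilities, whereas Chapter 5 builds on the efficient ratio representation of [91]).

    from math import lgamma, exp

    def kt_prob(a, b):
        # KT block probability of a binary sequence containing
        # a zeros and b ones (via log-gamma for numerical stability).
        return exp(lgamma(a + 0.5) + lgamma(b + 0.5)
                   - 2 * lgamma(0.5) - lgamma(a + b + 1))

    def count_contexts(x, depth):
        # For every context (string of up to `depth` preceding symbols),
        # count the zeros and ones that follow it in the sequence x.
        counts = {}
        for i in range(depth, len(x)):
            for d in range(depth + 1):
                ctx = x[i - d:i]
                a, b = counts.get(ctx, (0, 0))
                counts[ctx] = (a + 1, b) if x[i] == '0' else (a, b + 1)
        return counts

    def weighted_prob(counts, ctx, depth):
        # CTW recursion: at a leaf P_w = P_e; at an internal node
        # P_w = 1/2 * P_e + 1/2 * P_w(child 0) * P_w(child 1).
        a, b = counts.get(ctx, (0, 0))
        pe = kt_prob(a, b)
        if len(ctx) == depth:
            return pe
        return 0.5 * pe + 0.5 * (weighted_prob(counts, '0' + ctx, depth)
                                 * weighted_prob(counts, '1' + ctx, depth))

    # Toy usage: weighted probability of a binary sequence, given that
    # its first `depth` symbols serve as the initial context.
    x = '0110100110010110'
    p = weighted_prob(count_contexts(x, 3), '', 3)

Roughly speaking, the MAP-model search of Chapter 5 replaces the sum in the last line of weighted_prob by a maximization and records, in every node, which of the two alternatives wins.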

1.6.5 Chapter 6: Secret-Key Rate Estimation Based on Context Weighting Methods

In Chapter 6 we study a problem that has to be addressed before any practical biometric secrecy system is built, viz. how much secret information can be extracted or conveyed with a certain biometric modality. In Chapters 2 and 4 it was argued that the maximum secret-key rate in biometric secret generation systems and biometric systems with chosen keys is equal to the mutual information between the biometric enrollment and authentication sequences. These results hold for both i.i.d. biometric sources and stationary ergodic sources. Thus we have to estimate the mutual information between the biometric enrollment and authentication sequences. In Chapter 6 we focus on stationary ergodic biometrics.

The CTW method that we discuss in Chapter 5 can be used to estimate the required mutual information. Since the CTW method finds a good coding distribution, the resulting codeword has a small redundancy, and the codeword length, divided by the length of the source sequence, gives a good estimate of the entropy. Thus, while estimating the mutual information, we can focus on estimating the corresponding entropies.
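Concretely, writing L(·) for the CTW codeword length in bits (our notation), the per-symbol entropy estimate is

    H(X) ≈ L(x^N) / N,

and the mutual information then follows from standard identities such as

    I(X;Y) = H(X) + H(Y) - H(X,Y)  or  I(X;Y) = H(Y) - H(Y|X),

obtained by compressing the individual sequences, the sequence pair, or one sequence given the other.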

Biometric data such as iris codes, fingerprint minutiae maps, face patterns, PUFs, etc. are often modeled as realizations of two-dimensional processes, see e.g. Jain et al. [36]. Therefore we concentrate on estimating the mutual information and, correspondingly, the entropy for two-dimensional sources.

In order to apply CTW methods, we first show that the entropy of a stationary two-dimensional source is the limit of a sequence of conditional entropies. A similar result was obtained by Anastassiou and Sakrison [6]. Then we extend this result to the conditional entropy of one two-dimensional source given another one. Finally, we demonstrate that the CTW method also approaches the source entropy in the two-dimensional stationary ergodic case. This result carries over to conditional and joint entropies in the two-dimensional stationary ergodic case.
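Schematically (again in our notation, not that of Chapter 6), the first of these results states that for a stationary two-dimensional source the entropy rate

    H = lim_{N→∞} H(X_{N×N}) / N²

coincides with the limit of conditional entropies of a single symbol given an ever-growing causal (e.g. raster-scan ordered) past; it is this second form that context weighting methods can approximate from observed data.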

Using the CTW methods and the results discussed in Chapter 6, we then estimate the maximum secret-key rate of speckle patterns from optical PUFs. We use the CTW method, referred to as class IV, and the class III context weighting method to obtain maximum secret-key rate estimates. We show that class III context weighting methods give more reliable and slightly higher estimates of the secret-key capacity than class IV methods. This result can be explained by noting that, on the one hand, class III context weighting methods are based on a richer model class than class IV methods, while, on the other hand, the size of the PUF-sequences is large enough to compensate for the model redundancy.

The estimates that we obtain in Chapter 6 can be used to evaluate not only the suitability of a biometric modality for a certain system but also the performance of existing methods for secret-key extraction and secret-key binding.

1.7 Publications by the Author

1.7.1 Book Chapters

[BC-1] T. Ignatenko, F. Willems, G.J. Schrijen, B. Škorić, and P. Tuyls, “Entropy Estimation for Optical PUFs Based on Context-Tree Weighting Methods,” Chapter 13 in Security with Noisy Data: Private Biometrics, Secure Key Storage and Anti-Counterfeiting, Springer, 2007, pp. 217-234.

See Chapter 6.

[BC-2] P. Tuyls, G.J. Schrijen, F. Willems, T. Ignatenko, and B. Škorić, “Secure Key Storage with PUFs,” Chapter 16 in Security with Noisy Data: Private Biometrics, Secure Key Storage and Anti-Counterfeiting, Springer, 2007, pp. 269-292.


1.7.2 Journals

[JP-1] T. Ignatenko and F. Willems, “Leakage in Fuzzy Commitment Schemes,” submitted to IEEE Transactions on Information Forensics and Security, 2009.
See Chapter 4.

[JP-2] T. Ignatenko and F. Willems, “Biometric Systems: Privacy and Secrecy Aspects,” submitted to IEEE Transactions on Information Forensics and Security, September 19, 2008.

See Chapter 3.

1.7.3 Conference Proceedings

[IC-1] T. Ignatenko and F. Willems, “Privacy Leakage in Biometric Secrecy Systems,” Proc. of 2008 Forty-Sixth Annual Allerton Conference on Communication, Control, and Computing, 23-26 September 2008, Monticello, IL, USA.
See Chapter 3.

[IC-2] T. Ignatenko and F. Willems, “On Privacy in Secure Biometric Authentication Systems,” Proc. of 2007 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2007), 15-20 April 2007, Honolulu, HI, USA, pp. 121-124 (finalist of the best student paper award).

See Chapters 2 and 4.

[IC-3] T. Ignatenko, G.J. Schrijen, B. Škorić, P. Tuyls, and F. Willems, “Estimating the Secrecy-Rate of Physical Unclonable Functions with the Context-Tree Weighting Method,” Proc. of 2006 IEEE International Symposium on Information Theory, 9-14 July 2006, Seattle, WA, USA, pp. 499-503.

See Chapter 6.

[IC-4] T. Ignatenko and F. Willems, “On the Security of XOR-Method in Biometric Authentication Systems,” Proc. of the Twenty-Seventh Symposium on Information Theory in the Benelux, 8-9 June 2006, Noordwijk, The Netherlands, pp. 197-204.

See Chapters 2 and 4.

[IC-5] F.M.J. Willems, T.J. Tjalkens, and T. Ignatenko, “Context-Tree Weighting and Maximizing: Processing Betas,” Proc. of Inaugural Workshop ITA (Information Theory and its Applications), 6-10 February 2006, UCSD Campus, La Jolla, CA, USA.

See Chapter 5.

[IC-6] T. Ignatenko, A.A.C.M. Kalker, M. van der Veen, A. Bazen, “Reference Point Detection for Improved Fingerprint Matching,” Proc. of SPIE: Security,
