Membership Inference Attacks on Differentially Private Machine Learning Models
MIA on Differentially Private Machine Learning Models
part of the Machine Learning Privacy Audit Framework

Layout: typeset by the author using LaTeX.

Aristides A. Stamatiou (11038292)

Bachelor thesis
Credits: 18 EC
Bachelor Kunstmatige Intelligentie

University of Amsterdam
Faculty of Science
Science Park 904
1098 XH Amsterdam

Supervisor: S. Amiri, M.Sc.
Institute for Logic, Language and Computation
Faculty of Science
University of Amsterdam
Science Park 907
1098 XG Amsterdam

Abstract

As Machine Learning (ML) models are increasing in popularity by the day, more and more fields are using these models for a variety of applications. With the rise of these models in medical applications, combined with the vast amounts of data needed to train them, recent developments in the field of membership inference attacks pose a severe privacy and security threat. Methods like Differential Privacy (DP) have been proposed to limit the information that ML models leak about their training data by adding noise during training. This thesis investigates the potential of DP to mitigate the vulnerability of an ML model to a membership inference attack. This is done by implementing the membership inference attack proposed by Shokri et al. [14] and testing its effectiveness on an ML model trained with increasing levels of DP.

Keywords. Membership Inference Attack, Machine Learning, Privacy Preserving Machine Learning, Differential Privacy

Contents

1 Introduction
   1.1 Research question
2 Previous Work
   2.1 Foundation for membership inference attacks
   2.2 Recent developments
3 Differential Privacy
4 Experiments
   4.1 General
   4.2 Performance Metrics
      4.2.1 Precision
      4.2.2 Recall
      4.2.3 F1-score
      4.2.4 Calculation
   4.3 Black-box access
   4.4 Dataset
      4.4.1 Target model dataset
      4.4.2 Attack model dataset
   4.5 Attack method
   4.6 Differential privacy
   4.7 Models
      4.7.1 Target model architecture
      4.7.2 Attack model
5 Results
   5.1 Performance of baseline target model without differential privacy
   5.2 Performance of the attack model on the baseline target model
   5.3 Performance of target and attack model on 40,000 training samples per epoch
   5.4 Resulting privacy budget for various Sigma values
   5.5 Performance of target models trained with varying privacy budgets
   5.6 Performance of the attack model on target models trained with increasing privacy
6 Conclusion, Discussion and Future Work
   6.1 Conclusion
   6.2 Discussion
   6.3 Future work
Appendices

Chapter 1

Introduction

Machine Learning (ML) has become increasingly popular as a data-crunching technique over the last decade. ML allows users to build complex mathematical functions that map input samples to certain outputs. These models can be used for applications ranging from image classification [11] and medical diagnosis [7] to speech recognition. As more data becomes available for digital processing, and therefore usable in these ML applications, their accuracy and utility are ever increasing. Due to this growing interest in and use of ML techniques, companies like Google and Amazon are offering easy-to-implement Machine Learning as a Service (MLaaS) products for many of these tasks. These services only require the user to upload data, and the model will do the rest. This allows users to train their models online and easily give others access to query their model. This access is usually granted as black-box access, in which a user is only able to query the model with a data record but is not given access to its inner workings. The model produces a prediction vector based on the provided sample, which is then returned to the user.

One downside of this trend of training ML models online (through MLaaS) is that models are trained in the cloud by users who do not necessarily have expert knowledge of the risks such models pose: they only need to upload their data, and the rest is done for them.

Whilst this is convenient, it also poses a threat, as information can be leaked through these models, especially when a user grants black-box access to outsiders. Information leakage can take different forms, such as model inversion [4], reconstruction attacks, or membership inference attacks [12]. This thesis focuses on the latter, the membership inference attack.

In a membership inference attack, the adversary (the attacker) wants to assess whether a data point was part of the training data of the victim ML model. An MIA pipeline is depicted in Figure 1.1, in which a COVID-19 classifier is attacked. The target model is queried with a record, in this case a chest X-ray, and produces a prediction vector containing posteriors for each class. This vector is then fed into the attack model, which classifies the sample as either 'inside' or 'outside' the training dataset of the COVID-19 classifier.

Being able to obtain this kind of information causes serious privacy concerns when the ML model in question is a classifier trained on, for instance, medical data. Being able to classify such records would allow the adversary to learn whether a person's data was used in the training of certain models, revealing personal information about that individual. It could also be used to test whether data has been used illegally in the training of an ML model. This is a great benefit for authorities like the European Data Protection Board, as it allows them to check ML models for instances of illegal training data, something which, due to the opaque nature of ML models, has been very difficult to do in the past.

The field of privacy-preserving machine learning is concerned with developing methods to mitigate these risks in ML applications. In general, there are two categories within this field: perturbation-based privacy and cryptography [16].

Cryptography relies on the encryption of the model, the training data, or the output of the model. While this protects against outside attacks, it still allows an inside adversary to gain access to sensitive data. Besides this, cryptography tends to incur high computational costs due to the mathematical nature of most encryption methods.

Perturbation-based methods use noise to modify random records within the dataset to allow for plausible deniability [3]. This means that if, for instance, a record from a sensitive database were leaked, the information in the record may have been perturbed, so there is a possibility that it is incorrect, and it can therefore be plausibly denied. This offers more peace of mind to the people who have contributed to the database, as any leaked information could very well be wrong or changed, and thus poses less of a privacy hazard.

A modern perturbation-based solution to mitigate this kind of information leakage, and thus the privacy risk, of ML models is the implementation of differential privacy in the ML model. Differential privacy provides a mathematically provable guarantee of privacy protection against the identifiability of the presence or absence of an individual record in a dataset. It does this by providing a mathematical definition of privacy in the context of machine learning analysis. It quantifies the degree of privacy through an ε-value, a metric of the privacy loss incurred by a differential change in the data, i.e. the addition or subtraction of one entry. In essence, it describes how much the output of the model changes when one entry is changed, e.g. when one sample is removed from the training set. A low ε means that, under the subtraction or addition of one record, the output of the model is virtually the same. A higher ε means that the output of the model differs more with the change of one record, and therefore that the model is less private.

This thesis focuses on attacking two target models: one trained without the use of differential privacy and one where differential privacy is used. The results are then analysed to determine the level of protection differential privacy offers against membership inference attacks. This thesis is part of the Machine Learning Privacy Audit Framework, which researches the effects of privacy-preserving ML models on attacks such as model inversion, model extraction and membership inference.

Figure 1.1: A membership inference attack on a COVID-19 classifier

1.1 Research question

Earlier work by Shokri et al. covered the general method of membership inference attacks; this thesis introduces differential privacy to the equation and aims to answer:

Is Differential Privacy a valid method of protecting against membership inference attacks?

This can be answered by addressing the following two sub-questions:

1. What is the effect of Differential Privacy on the vulnerability of a model to a membership inference attack?

2. What is the impact of adding Differential Privacy to a Convolutional Neural Network on its performance?


Chapter 2

Previous Work

2.1 Foundation for membership inference attacks

The paper ‘Membership Inference Attacks against Machine Learning Models’ by Shokri et al. [14] kicked off the field of membership inference attacks (MIA) against Machine Learning (ML) models. With over 600 citations, it is by far the most influential paper in this field of research.

The paper identified leakage of information through ML models by developing a method to assess whether a record was part of the training set of the target (ML) model. The method proposed required the adversary to have black-box access to the target model, access to a model of the same structure and access to data of the same general distribution as the training data of the target model.

The strategy proposed in the paper exploits the fact that ML models tend to overfit on their training data. In essence, this means that an ML model reacts differently to a data record it has seen before than to a completely new sample. It is able to classify a previously seen sample with a higher confidence value, as it has 'learned' during its training to classify that sample correctly. The resulting prediction vector, which is returned after querying the model with a data record, will thus have a different distribution of posteriors than prediction vectors from never-before-seen data.

The proposed method consists of three steps: first a dataset is constructed, then multiple shadow models are trained, one per class, and finally an attack model is trained on the outputs of these shadow models.

The shadow models are supervised ML models, with the same architecture as the target model, that are trained on a dataset with a distribution similar to that of the target model's training data. Multiple shadow models are trained with the aim of mimicking the output of the target model while allowing the adversary to know which samples were used in the training of each model. After the training of a shadow model, it is queried with its training data and the resulting output vectors are labeled as 'inside' training samples. The same is done with the test data of the shadow model, which is labeled as 'outside' samples. These sets are constructed for each shadow model and then combined to serve as training data for the attack model.

The attack model is a LightGBM model, which is a Gradient Boosting Decision Tree algorithm. As it only has to distinguish between two classes (namely 'in' or 'out'), it has a binary output. The attack model is fitted on the generated dataset to learn the connection between the prediction vectors it receives as input and the correct 'in' or 'out' label.

A few key assumptions are made in this paper: the adversary needs to train shadow models with the same architecture as the target model, the adversary needs to train multiple shadow models to mimic the target model's behaviour, and the adversary needs access to a dataset of the same distribution. These assumptions restrict the feasibility of the attack in many real-life scenarios. This paper was nevertheless selected to serve as the basis for the membership inference attack performed in this research due to its popularity and effectiveness. Recently, a paper was published that relaxes these assumptions and therefore increases the feasibility of membership inference attacks in many scenarios. This paper is reviewed below.

2.2 Recent developments

The paper by Salem et al. [12] builds upon the foundation laid by the paper of Shokri et al. [14]. It presents three different attack strategies that gradually relax the assumptions made by Shokri et al. [14] to perform the attack. Figure 2.1 depicts a pipeline of the first attack method presented in this paper.

Figure 2.1: A membership inference attack on a COVID-19 classifier

First, a shadow model is trained on data of the same distribution as the target model's training data. This attack therefore still assumes that the adversary has access to data of a distribution similar to that of the data used in the training of the target model.

The shadow training dataset is constructed by querying the target model with an image and then labeling that image with the corresponding prediction vector. The shadow model is then trained to mimic the target model by producing the same prediction vectors for the same images. The difference with the previous method is that only one shadow model is constructed, which handles all the classes present in the target model's dataset. Using only one shadow model greatly reduces the expense of training and achieves very similar performance [12].

The attack model is a binary supervised ML model that is trained on labeled prediction vectors. Prediction vectors are labeled as 'in' when the corresponding samples were used in the training of the shadow model and as 'out' if they were not.

As stated before, this attack still requires the adversary to have data of the same distribution as the target model's training data. This requirement makes the attack less feasible in situations where such data is unavailable, which can be the case in a black-box attack setting. This assumption can be circumvented by generating synthetic data: the target model is queried many times to gather prediction vectors, which are then used as input to train the shadow model. However, a MLaaS provider that detects a suspicious number of queries to one of its models, which would be required to gather the data for the shadow model, could block that user from querying the model, rendering the attack unsuccessful.

The second attack that is proposed relaxes one of the most limiting assumptions, namely that the adversary needs access to data of the same distribution as the original training data. Constructing such a dataset might be possible when the target model is trained on publicly available data but, especially for data that is not supposed to be available to the public at all, like medical data, it can be very difficult or even impossible.

Figure 2.2: A data transferring membership inference attack

Figure 2.2 depicts the pipeline of a data transferring attack [12]; this attack uses an attack model trained on data from a different distribution, which gives it its name. It builds upon the previous attack, which means it also requires the construction of only one shadow model. The researchers show, however, that data of a different distribution can be used for the construction of this model. So, for instance, to attack the COVID-19 classifier, a CIFAR-10 classifier could be used as the shadow model. The prediction vectors of this CIFAR-10 classifier would then be used to train the attack model. The attack model would still be able to classify prediction vectors from the COVID-19 classifier as either 'in' or 'out'. In the paper, a variety of combinations of (publicly available) datasets are used and, as a general example, a model trained on the CIFAR-100 dataset achieves above 80% precision for all the datasets except the Adult dataset (on which almost no dataset achieves high precision).

This relaxation greatly benefits the feasibility of the attack, as the adversary could train several models privately using publicly available data and attack any type of model with these privately trained models. The providers of MLaaS models would have no easy way to identify adversaries, since an adversary who has no access to data of the same distribution does not need to query the model suspiciously often to construct the shadow training data.

The third attack that is proposed does not require the construction of shadow models at all. It solely takes as input the prediction vector produced by the target model and looks at the height of the posteriors. When a certain threshold is met, the sample is classified as 'inside' the training data; otherwise it is classified as 'outside'. The authors propose to base the height of the threshold on the area-under-the-curve value. The adversary should also decide which aspect to focus on: if the focus is on inference precision, a relatively high threshold should be picked; if the focus is on recall, a lower threshold should be selected.

Although the attacks presented in the paper by Salem et al. [12] relax some key assumptions made in the paper by Shokri et al. and might be more feasible in real-world settings, the method by Shokri et al. was selected as the foundation for this thesis due to its high number of citations and its influence on the subsequent field of membership inference attacks.


Chapter 3

Differential Privacy

Differential privacy (DP) is a method that is used in ML models to minimize the chance of individual record identification. It provides a mathematical definition of how much an arbitrary substitution in a database affects the outcome of a query. In other words, if the effect of removing one single record from a database is small enough, the database as a whole cannot be queried in such a way that individual information is exposed. This mathematical definition allows us to bound how much can be revealed about someone's data being present in a certain database. These bounds are denoted by ε (epsilon) and δ (delta), which are used to define the level of privacy of the database.

1. Privacy budget / privacy loss (ε): the ε symbol is called the privacy budget. It provides insight into the loss of privacy of a differentially private algorithm. Because it describes the privacy loss, a low ε indicates a higher level of privacy. As can be seen in Figure 3.1, the tendency of a model to overfit as the number of epochs increases is reflected in the increase of the ε parameter with every new epoch. This means that with every new epoch, the model contains more information about the training data, and thus that there is a higher probability (illustrated by the increasing value of ε) that the resulting model would change if one record were removed from the training data.

2. Probability to fail / probability of error (δ): δ is a constant that specifies the amount of 'bad events' that can result in an unusually high privacy loss. δ defines the probability of the output revealing the identity of a particular individual, which can happen δ · n times, where n is the number of records. In other words, since δ is defined by the user, it specifies how many 'bad events' are allowed to happen. To minimize the risk of privacy loss, δ · n has to be kept at a low value. For example, the probability of a bad event is 1% when δ = 1/(100 · n). Throughout this thesis the delta parameter was set as 1.
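For reference, the standard (ε, δ)-differential privacy guarantee behind these two parameters (this is the usual formulation from the differential privacy literature, not an equation reproduced from this thesis) states that a randomized mechanism M is (ε, δ)-differentially private if, for all pairs of datasets D and D' differing in a single record and for every set of outputs S,

\[
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S] + \delta .
\]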


Figure 3.1: Epsilon and accuracy per epoch

In this research, the facebook-dp module [15] was used to add differential privacy to the target models. The module is designed as a bolt-on module for models constructed using Pytorch. This means that adding DP is achieved by attaching the DP module to the algorithm and setting its Sigma value, which determines the privacy budget. The module then adds Differentially Private Stochastic Gradient Descent (DP-SGD) [2] to the model, which means that noise is added to the clipped gradients during the step in which Stochastic Gradient Descent (SGD) is applied. In other words, during the optimizer phase of the training of the model, noise is added to the clipped gradients. All of the models used in this thesis have SGD as their optimizer, since this allows the facebook-dp module to be attached to these models during their training.
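As a sketch of what DP-SGD does for each batch (following the formulation in Abadi et al. [2]; the clipping norm C, noise multiplier σ, batch size B and learning rate η are the generic symbols from that paper, not values taken from this thesis): each per-example gradient is clipped, Gaussian noise is added to the sum, and the noisy average is used for the parameter update,

\[
\bar{g}_t(x_i) = \frac{g_t(x_i)}{\max\!\left(1, \lVert g_t(x_i) \rVert_2 / C\right)}, \qquad
\tilde{g}_t = \frac{1}{B}\left( \sum_i \bar{g}_t(x_i) + \mathcal{N}\!\left(0, \sigma^2 C^2 I\right) \right), \qquad
\theta_{t+1} = \theta_t - \eta\, \tilde{g}_t ,
\]

where g_t(x_i) = ∇_θ L(θ_t, x_i) is the gradient of the loss for training example x_i.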


Chapter 4

Experiments

4.1 General

All the attacks were trained using Google Colab notebooks, which are cloud-based notebooks using the Jupyter architecture. They allow the user to use CUDA by offering access to NVidia GPUs. CUDA is a parallel computing platform that allows the user to perform many calculations simultaneously, which achieves massive speedups in the training of neural networks [9], as these require many gradients to be calculated at the same time during training.

The Pytorch (https://pytorch.org/) and Tensorflow (https://www.tensorflow.org/) libraries are used for the construction of the models; both offer CUDA support. Pytorch was a necessity, as one of the initial requirements of this project was for the code to be easily implemented in the Pysyft (https://github.com/OpenMined/PySyft) framework. Pytorch also offered a bolt-on solution to add DP-SGD to models constructed using the library.

4.2 Performance Metrics

4.2.1 Precision

Precision is the percentage of the returned items, or classifications, that are relevant, i.e. the proportion of true positives in the returned results. It is calculated by dividing the number of true positives by the total number of returned results (true positives plus false positives). Suppose, for instance, that we want an attack model to identify 'inside' training records and give it a set of 200 records as input: 100 'inside' records and 100 'outside' records. If the model returns 120 records, of which 80 are 'inside' training records (true positives) and the other 40 are 'outside' training records (false positives), the precision of the model is

\[
\text{precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} = \frac{80}{80 + 40} = 66.7\%
\]

4.2.2 Recall

The recall of a model is the proportion of relevant items that are returned by the model, i.e. how well the model is able to find all the positive samples. It is calculated by dividing the number of true positives in the returned results by the total number of positive instances in the dataset. Taking the previous example, where there are 100 'inside' samples and our model classifies 80 of them correctly, the recall is

\[
\text{recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} = \frac{80}{80 + 20} = 80\%
\]

4.2.3 F1-score

The F1-score is the weighted harmonic mean of precision and recall: when both precision and recall are at their maximum the value is 1, and when both are at their minimum the value is 0. The F1-score is calculated as

\[
F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
\]

4.2.4 Calculation

All the performance scores were calculated using the sklearn.metrics module for Python. Default parameters were used in the calculations unless specified otherwise.
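As a minimal illustration of how these scores can be computed with the sklearn metrics module (the counts below are the toy numbers from the worked precision/recall example above, not results from this thesis):

    from sklearn.metrics import precision_score, recall_score, f1_score

    # Toy ground truth and predictions matching the worked example:
    # 100 'inside' (1) and 100 'outside' (0) records; the model flags 120 records
    # as 'inside', of which 80 are correct.
    y_true = [1] * 100 + [0] * 100
    y_pred = [1] * 80 + [0] * 20 + [1] * 40 + [0] * 60

    print(precision_score(y_true, y_pred))  # 80 / (80 + 40) = 0.667
    print(recall_score(y_true, y_pred))     # 80 / (80 + 20) = 0.800
    print(f1_score(y_true, y_pred))         # harmonic mean of the two, approx. 0.727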

4.3 Black-box access

Black-box access to a model implies that the only access the adversary has to the target model is through querying it with data. The adversary thus knows the shape of the input data, e.g. whether it is a picture or a list of values. By querying the model, the adversary obtains the prediction vector produced by the model; this is the only information the adversary has to use in the membership inference attack.

Black-box access is selected as it presents the most realistic scenario for an actual membership inference attack. Especially with the development of Machine Learning as a Service offerings from internet companies like Google and Amazon, models that only grant users black-box access through an API are more and more common.

All of the attacks are performed in a black-box setting. This means that the adversary can only query the model and obtain the resulting prediction vector, but has no other access to the underlying model architecture, the dataset used for training the model, or the weights. This setting is chosen over a white-box setting as it is the most realistic setting for real-world use cases of the methods described.
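In code, this black-box interaction boils down to obtaining a vector of class posteriors for a query record. A minimal sketch in Pytorch follows; the function name is an illustration, not an identifier from the thesis code:

    import torch
    import torch.nn.functional as F

    def query_black_box(target_model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
        """Query a model with one record and return its prediction (posterior) vector.

        Only the returned vector is visible to the adversary; the weights,
        architecture and training data stay hidden.
        """
        target_model.eval()
        with torch.no_grad():
            logits = target_model(x.unsqueeze(0))      # add a batch dimension
            return F.softmax(logits, dim=1).squeeze(0)  # one posterior per class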

4.4 Dataset

4.4.1 Target model dataset

The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class [10]. Below is an example of data points contained in the CIFAR-10 dataset.


Figure 4.1

This dataset was selected because membership inference attacks have been performed on target models trained on this dataset in previous research, which showed that good attack performance was achieved against these models [14], [12].

The data was normalized and converted into Pytorch tensors to be used in the training of the target model. All the images were used to construct the dataloaders for the training of the target model; the test images of the dataset were then used as the 'outside' dataset, from which 'outside' prediction vectors were produced for the training of the attack model.
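A minimal sketch of this preprocessing step using torchvision (the normalization constants and batch size are illustrative choices, not values confirmed by the thesis):

    import torch
    from torchvision import datasets, transforms

    # Convert images to tensors and normalize them, as described above.
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # illustrative values
    ])

    train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
    test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)

    # Training images feed the target model; test images later serve as 'outside' records.
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
    outside_loader = torch.utils.data.DataLoader(test_set, batch_size=64, shuffle=False)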

4.4.2 Attack model dataset

The attack dataset consists of the prediction vectors produced by querying the shadow models with images from the CIFAR-10 dataset.

For the construction of the 'inside' training dataset, the training images of the CIFAR-10 dataset were fed into the shadow models, resulting in prediction vectors containing one posterior per class for each image. This means that for each image a prediction was made of the likelihood that it belongs to each of the classes. The construction of the 'outside' training dataset follows the same procedure, but instead of the training data the test data is used, as the shadow model has not been trained on this data; it was only used to calculate the performance metrics after the initial training of the shadow model. The resulting prediction vectors were labeled as 'outside' the training data.
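A sketch of how such a labeled attack dataset could be assembled for one shadow model (query_black_box is the hypothetical helper sketched in Section 4.3; the loader names are assumptions, not identifiers from the thesis code):

    import numpy as np

    def build_attack_data(shadow_model, shadow_train_loader, shadow_test_loader):
        """Label prediction vectors as 'inside' (1) or 'outside' (0) the shadow training set."""
        vectors, labels = [], []
        for loader, member_label in [(shadow_train_loader, 1), (shadow_test_loader, 0)]:
            for images, _ in loader:
                for image in images:
                    vectors.append(query_black_box(shadow_model, image).numpy())
                    labels.append(member_label)
        return np.array(vectors), np.array(labels)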

4.5 Attack method

The attack method used is based on the method proposed by Shokri et al. [14]. Multiple target models are trained with varying degrees of privacy, which is achieved by training each model with a different value for Sigma. Sigma is the input variable that specifies the privacy budget of the target model in the Facebook-dp [15] module.

In the last epoch of the training phase of the target model, the training samples are labeled as 'inside' samples and the validation samples are labeled as 'outside' samples. Both sets are then saved to serve as test samples for the attack model later on.

After the training of the target model, multiple shadow models are trained on the CIFAR-10 dataset. The shadow models have the same architecture as the target model; this follows from the assumption that an adversary attacking a MLaaS system could use the same provider to train their shadow models, which would give them the same underlying architecture as the target model without explicitly knowing it, thus still satisfying the black-box access requirements. Since the target model structure was known here, the shadow models could be trained locally, saving the expense of using MLaaS providers. At the last epoch of the training of every shadow model, the training samples are labeled as 'inside' and the validation samples are labeled as 'outside'. Every set of inside and outside samples is then appended to the training set of the attack model.

After all the shadow models have been trained (keep in mind that none of them are trained with differential privacy), the training set is concatenated and an attack model is fitted to this data. Light gradient boosting [8] is used for the attack model.
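A minimal sketch of this final step, reusing the hypothetical build_attack_data helper from Section 4.4.2 and assuming shadow_models is a list of (model, train_loader, test_loader) triples; none of these names come from the thesis code:

    import numpy as np
    import lightgbm as lgb

    # Concatenate the labeled prediction vectors from every shadow model.
    all_vectors, all_labels = [], []
    for shadow_model, shadow_train_loader, shadow_test_loader in shadow_models:
        vectors, labels = build_attack_data(shadow_model, shadow_train_loader, shadow_test_loader)
        all_vectors.append(vectors)
        all_labels.append(labels)

    X_train = np.concatenate(all_vectors)
    y_train = np.concatenate(all_labels)

    # Binary 'in'/'out' classifier on the prediction vectors.
    attack_model = lgb.LGBMClassifier()
    attack_model.fit(X_train, y_train)

    # Membership predictions for the prediction vectors saved from the target
    # model's last epoch (assumed here to be available as X_target):
    # predictions = attack_model.predict(X_target)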

After the attack model is fitted, it is used to produce predictions for the test set that was created during the training of the target model. Afterwards, performance metrics such as precision, recall and F1-score are calculated on the resulting predictions to measure the effectiveness of the attack model.

An existing Pytorch implementation by Adrien Benamira [6] of the method proposed by Shokri et al. was used as a basis for the modified method used in this thesis. This implementation was modified to allow the use of the Facebook-dp module during the training of the target model.

4.6 Differential privacy

Differential privacy is added to the target model by attaching the PrivacyEngine module from the torchdp library (Facebook-dp) to the Stochastic Gradient Descent optimizer. To vary the resulting privacy budget, which is recalculated after every epoch, a Sigma parameter is passed that controls this budget. Keep in mind that with every new epoch the privacy decreases, resulting in epsilon going up.
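A hedged sketch of how this attachment might look with the torchdp PrivacyEngine (the argument names follow the pytorch-dp examples of that period and may differ between library versions; the data sizes and Sigma value are placeholders, not the exact thesis configuration):

    import torch
    from torchdp import PrivacyEngine  # the Facebook-dp / pytorch-dp module [15]

    model = CifarConvNet()  # the target CNN (a sketch of this class follows in Section 4.7.1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.5)

    sigma = 1.0  # placeholder noise multiplier; varied per experiment
    privacy_engine = PrivacyEngine(
        model,
        batch_size=64,
        sample_size=2500,                              # training samples used per epoch
        alphas=[1 + x / 10.0 for x in range(1, 100)],  # Renyi orders for privacy accounting
        noise_multiplier=sigma,                        # the Sigma value controlling the budget
        max_grad_norm=1.0,                             # gradients clipped at 1
    )
    privacy_engine.attach(optimizer)  # optimizer.step() now performs DP-SGD

    # After each epoch, the spent privacy budget can be queried for a chosen delta:
    epsilon, best_alpha = privacy_engine.get_privacy_spent(1e-5)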

4.7 Models

4.7.1 Target model architecture

The target model is a Convolutional Neural Network (CNN) dubbed 'CifarConvNet'. It is a relatively simple model with two convolutional layers with a pooling layer in between, followed by three linear layers with tanh activation functions between the layers.

The convolutional layers are used to extract features from the image. A convolutional layer achieves this by sliding over the image and generating features, like edges and lines. It takes the image as input and returns multiple feature maps, one containing all the edges, one containing all the lines, and so forth.

Pooling layers are used to reduce the dimensionality of the data by 'pooling', or gathering, the values in their kernel into one number. A 'max' pooling layer of size (2, 2), for instance, takes four values and returns the highest one. It then takes one step to the right, again collects four values from the original image, and returns the highest number.

A tanh, or hyperbolic tangent, activation function is a sigmoid-like activation function, as it is differentiable and monotonic whilst its derivative is not. It does however have two benefits over the 'regular' sigmoid activation function: it maps negative inputs to negative outputs instead of near-zero outputs, and it maps inputs near zero to outputs near zero instead of 0.5, as the sigmoid does. This makes the network less prone to getting stuck in training due to vanishing gradients [13]. A tanh function was selected over a Rectified Linear Unit (ReLU) because the original paper by Shokri et al. uses tanh instead of ReLU.
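A hedged sketch of what such an architecture could look like in Pytorch (the layer counts follow the description above, but the channel and layer sizes are illustrative guesses, not the exact dimensions used in the thesis):

    import torch
    import torch.nn as nn

    class CifarConvNet(nn.Module):
        """Two conv layers with a pooling layer in between, then three linear layers with tanh."""

        def __init__(self, num_classes: int = 10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=5),   # 3x32x32 -> 16x28x28
                nn.Tanh(),
                nn.MaxPool2d(2, 2),                # 16x28x28 -> 16x14x14
                nn.Conv2d(16, 32, kernel_size=5),  # 16x14x14 -> 32x10x10
                nn.Tanh(),
            )
            self.classifier = nn.Sequential(
                nn.Linear(32 * 10 * 10, 128),
                nn.Tanh(),
                nn.Linear(128, 64),
                nn.Tanh(),
                nn.Linear(64, num_classes),        # one logit per CIFAR-10 class
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            x = self.features(x)
            return self.classifier(torch.flatten(x, 1))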


Hyperparameters

As differential privacy is added to the target model through differentially private stochastic gradient descent (DP-SGD), an SGD optimizer was selected for all models.

The hyperparameters presented by Adrien Benamira were used in the setup of the attack, as the attack model fitted on data generated with these parameters had already been shown to work.

These hyperparameters include a learning rate of 0.001 with a learning rate decay of 1e-07 per epoch. The number of epochs for both the target and shadow models was set to 100. The training set size was set to 2,500 samples and the validation set to 1,000 samples, with a batch size of 64 and a momentum of 0.5. For each target model, 25 shadow models were trained to generate data for the attack model.

When adding the DP-SGD module to the models, a standard delta of 1e-5 was chosen and the gradients were clipped at 1. The Sigma value was varied to test the performance of the model under different privacy budgets.

4.7.2 Attack model

The attack model is a LightGBM model, which is a Gradient Boosting Decision Tree (GBDT) algorithm. It uses a combination of Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB) [8]. Traditionally, one of the downsides of GBDT algorithms is their efficiency and scalability. This is because, for each feature, the information gain of all possible split points has to be calculated, so all the data points have to be scanned for every feature, which is very time-consuming. By using GOSS, a significant proportion of insignificant data, i.e. data with very small gradients, can be excluded, resulting in a smaller dataset that needs to be scanned to calculate the information gain. EFB furthermore allows the bundling of mutually exclusive features to reduce the number of features, resulting in a decrease in the number of times the remaining data has to be scanned. Combining these two methods in LightGBM achieves the same accuracy whilst cutting the training time by a factor of 20 [8].


Chapter 5

Results

5.1 Performance of baseline target model without differential privacy

Below are the performance statistics of the target model trained without differential privacy.

Target model    Precision   Recall   F1-score
CifarConvNet    29.9        31.4     28.5

Table 5.1: A table containing the different performance scores for the baseline model

As seen in the table, the trained model is not able to classify most samples correctly, as it produces many false positives, which can be concluded from the precision of 29.9%. The recall is also relatively low at 31.4%, meaning that out of all the samples of a class it only correctly retrieves 31.4%. These two metrics combined result in an F1-score of 28.5%.


5.2 Performance of the attack model on the baseline target model

Below are the statistics of the performance of the attack model on the target model without differential privacy.

Target model    Precision   Recall   F1-score
CifarConvNet    70.8        64.1     67.3

Table 5.2: A table containing the different performance scores for the attack model on the baseline target model

As the table shows, the attack model has a relatively high precision, which means that most of the samples it classifies as 'inside' are classified correctly. The recall is somewhat lower at 64.1%, which means that the attack model retrieves about 64% of the correct samples out of all the correct samples in the dataset. These two combined result in an F1-score of 67.3%.


5.3 Performance of target and attack model on 40,000 training samples per epoch

Since the performance of the target model (see Section 5.1) was quite poor, a new model was trained with 40,000 training samples used in every epoch instead of the 2,500 used for the previous model. Table 5.3 depicts the performance of this model, and Table 5.4 the performance of the attack model against it.

Target model    Precision   Recall   F1-score
CifarConvNet    57.6        57.8     57.3

Table 5.3: A table containing the different performance scores for the target model trained on 40,000 samples per epoch

As seen in the table above, this model achieves better performance than the model trained on 2,500 target samples per epoch. The precision increases to 57.6%, meaning that more than half of the positive predictions made are in fact correct, and thus true positives. The model also has a higher recall at 57.8%, which means that out of all the examples of a certain class, more than half are correctly classified. These two metrics combined result in an F1-score of 57.3%.

Target model    Precision   Recall   F1-score
CifarConvNet    73.8        50.1     59.7

Table 5.4: A table containing the different performance scores for the attack model on the target model trained on 40,000 samples per epoch

Comparing the results in Table 5.4 with those of Table 5.2, the precision of the attack model has increased to 73.8%, meaning that of all the samples classified as 'inside', 73.8% were in fact correct. The recall, however, is down to 50.1%, so just over half of the available samples were correctly retrieved by the model. These two metrics combined result in an F1-score of 59.7%, which is less than the F1-score obtained for the model trained on 2,500 samples per epoch, which was 65.4%. So even though the target model trained on 40,000 samples instead of 2,500 performs better, the attack model actually performs worse. Due to this drop in attack performance, the baseline model, whose performance is described in Sections 5.1 and 5.2, was chosen for further research.


5.4 Resulting privacy budget for various Sigma values

In Figure 5.1, the epsilon values per epoch of the target model are plotted for different Sigma input values. Note that the scale of Figure 5.1a is logarithmic whilst the scale of Figure 5.1b is linear. The figures clearly show that an increase in Sigma results in a more private model, as the epsilon value goes down drastically with every increase of Sigma. Figure 5.1 also shows that the gain in privacy flattens out as Sigma increases, as the gain is clearly much larger in Figure 5.1a than in Figure 5.1b. A rule of thumb in the literature is that a privacy budget below 10 provides good privacy. Because of the noise added with every increase of Sigma, the lowest Sigma that results in a privacy budget below 10 would be the optimal value.

Figure 5.1: Epsilon per epoch for various Sigma values ((a) logarithmic scale, (b) linear scale)


5.5 Performance of target models trained with varying privacy budgets

Below is a graph containing the various performance scores for the target model trained with a varying value of Sigma.

Figure 5.2

As we can see in Figure 5.2, the performance of the model does not seem to change with an increase of Sigma, and thus with an increased level of privacy. This is not what is expected, as generally performance drops as more noise is added. A reason for this behaviour could be that the model was already performing sub-optimally, and therefore the noise did not impact its performance as much as it would impact a very high-performing model, as shown in the Appendix.


5.6 Performance of the attack model on target models trained with increasing privacy

Below is a graph containing the various performance scores for the attack model on target models with varying sigma.

Figure 5.3

As seen in the graph above, the performance of the attack model stays the same for the different Sigma values used in the training of the target model. This is in a way expected, as the performance immediately dropped to about 50% for both precision and recall. Comparing this to the results of the attack model before any differential privacy was added (Table 5.2), the effect of differential privacy is clear: the attack model now performs no better than a coin toss. The performance can therefore not drop any further as Sigma is increased.


Chapter 6

Conclusion, Discussion and Future Work

6.1 Conclusion

The results clearly show the difference differential privacy makes in the vulnerability of a target model to a membership inference attack. Comparing Table 5.2 with the results in Figure 5.3, the performance of the attack model is down significantly across the different metrics, as recall and precision drop to 50% for the attack on target models with any amount of differential privacy. The attack effectively becomes useless, as this is the same performance that could be achieved by tossing a coin; it therefore reveals little to no information about the training data of the target model in question. Sub-question one can thus be answered by stating that adding differential privacy to a model decreases its vulnerability to a membership inference attack significantly.

The answer to sub-question two is more complicated, as there is no difference in the utility of the target model for an increase of Sigma (see Figure 5.2), and thus for an increase in the privacy of the model. This is not what is expected, as classically the more noise you add, the less usable a model becomes. In this case, because the baseline model was already performing quite poorly, it could be that the noise was not such a big factor, as the performance of the model was already less than optimal. This is nevertheless unexpected, especially as CIFAR-10 consists of images with high variety (see Figure 4.1). To illustrate more expected behaviour, Figure 1 in the Appendix shows the performance of a more advanced ResNet18 model trained on the CIFAR-10 dataset without differential privacy, whilst Figure 2 shows the performance of the same model trained on the same dataset with differential privacy added. The performance drop is significant and makes the model unusable, which is expected for a dataset with high variety such as CIFAR-10. Since the data contained in the CIFAR-10 dataset has a high variety, and therefore individual records are more unique, masking the presence or absence of one record is harder: due to the unique nature of the images, each image contributes more to the resulting model, so more noise needs to be added to obtain the same privacy budget (epsilon), i.e. the same level of privacy.

Based on this reasoning, adding differential privacy to models trained on more homogeneous datasets like MNIST should be easier. Previous research [14], [12] has however shown that membership inference attacks on such datasets are not successful, since models trained on homogeneous datasets generalize very well. This results in the prediction vectors looking (almost) identical for samples that were used in the training of the model and samples that were not. The attack model therefore has little to learn, so to speak.

This means that differential privacy is a viable method to protect models against membership inference attacks. However, the models that are vulnerable in the first place, those trained on high-variety datasets, suffer greatly from the use of differential privacy, in some cases leading to such low utility that the model becomes unfeasible. For those models, differential privacy may therefore not be a viable protection against membership inference attacks, as it harms the utility of the model to such an extent that it can become unusable.


6.2 Discussion

An issue with the research in this thesis is that the code [6] used as its basis did not live up to the claims made by its author. The results reported on the author's GitHub page claimed higher performance scores for the target model than were actually obtained using the published hyperparameters. This resulted in a target model with sub-optimal performance, as seen in the results section, which was trained on very few instances of the actual dataset: every epoch, only 2,500 out of the 50,000 available training records were selected for training. Due to time constraints, these matters were not investigated further. The model trained on 40,000 samples per epoch did perform better than the baseline model, but the resulting attack model showed worse performance, as described in Section 5.3. Since membership inference attacks are founded on the recognition of overfitting, this result was also unexpected and should be investigated further. The code used to obtain the results published in this thesis is available for further research at https://github.com/Tidistamatiou/Differential_privacy_membership_inference_

6.3 Future work

As mentioned in the Discussion, the effect of the various hyperparameters, especially the training data size, on the membership inference attack should be further investigated. Furthermore, the effect of the recent developments discussed in Section 2.2 on the effectiveness of membership inference attacks against models trained with differential privacy is grounds for future research, as the methods proposed there depend less on the target model itself. Therefore, if a model trained with differential privacy is able to reach good performance, the data transferring attack [12] might still yield good results, as it can be trained independently of the target model.



Bibliography

[1] A ResNet18 classifier constructed using Pytorch by kuangliu. https://github.com/kuangliu/pytorch-cifar/blob/master/models/resnet.py.

[2] Martin Abadi et al. “Deep learning with differential privacy”. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. 2016, pp. 308–318.

[3] Vincent Bindschaedler, Reza Shokri, and Carl A Gunter. “Plausible deniability for privacy-preserving data synthesis”. In: arXiv preprint arXiv:1708.07975 (2017).

[4] Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. “Model inversion attacks that exploit confidence information and basic countermeasures”. In: (2015), pp. 1322–1333.

[5] Kaiming He et al. “Deep residual learning for image recognition”. In: (2016), pp. 770–778.

[6] Implementation of the paper: “Membership Inference Attacks Against Machine Learning Models”, Shokri et al. https://github.com/AdrienBenamira/membership_inference_attack.

[7] Fei Jiang et al. “Artificial intelligence in healthcare: past, present and future”. In: Stroke and vascular neurology 2.4 (2017), pp. 230–243.

[8] Guolin Ke et al. “Lightgbm: A highly efficient gradient boosting decision tree”. In: Advances in Neural Information Processing Systems. 2017, pp. 3146–3154.

[9] David Kirk et al. “NVIDIA CUDA software and GPU parallel computing architecture”. In: ISMM. Vol. 7. 2007, pp. 103–104.

[10] Alex Krizhevsky, Geoffrey Hinton, et al. “Learning multiple layers of features from tiny images”. In: (2009).

[11] Sandeep Kumar, Zeeshan Khan, and Anurag Jain. “A review of content based image classification using machine learning approach”. In: International Journal of Advanced Computer Research 2.3 (2012), p. 55.


[12] Ahmed Salem et al. “Ml-leaks: Model and data independent membership inference attacks and defenses on machine learning models”. In: arXiv preprint arXiv:1806.01246 (2018).

[13] Sagar Sharma. “Activation functions in neural networks”. In: Towards Data Science 6 (2017).

[14] Reza Shokri et al. “Membership inference attacks against machine learning models”. In: 2017 IEEE Symposium on Security and Privacy (SP). IEEE. 2017, pp. 3–18.

[15] The Github page of the Facebook-DP module for Pytorch. https://github.com/facebookresearch/pytorch-dp.

[16] Kaihe Xu et al. “Privacy-preserving machine learning algorithms for big data systems”. In: 2015 IEEE 35th International Conference on Distributed Computing Systems. IEEE. 2015, pp. 318–327.


Appendices

.1 Performance of a ResNet18 model with and without differential privacy

Below are two graphs showing the performance of a ResNet18 model [1], [5] trained on the CIFAR-10 dataset. Figure 1a depicts the accuracy and loss per epoch for the ResNet18 model trained without differential privacy. Figure 1b shows a heatmap that depicts how samples were classified, with the labels of the samples on the y-axis and the classifications on the x-axis.

Figure 2 shows the same plots as Figure 1, except that instead of the loss, the epsilon value, i.e. the privacy budget, is plotted on the left axis of Figure 2b.

Please note that for Figure 1a and Figure 2a the scales on the y-axis differ between the plots.

The models were run for different numbers of epochs because the addition of differential privacy increased the training time of the second model.

Figure 1: Loss, accuracy and heatmap of the attack model on the ResNet18 model with no differential privacy. (a) Accuracy and loss per epoch; (b) heatmap.

Figure 2: Loss, accuracy and heatmap of the attack model on the ResNet18 model with a sigma of 2.3. (a) ResNet18 accuracy scores; (b) heatmap.
