
What Deep Neural Networks see, but we don’t

Joost P.J. Hoppenbrouwer (10334564)
Bachelor thesis, Credits: 18 EC
Bachelor Opleiding Kunstmatige Intelligentie
University of Amsterdam, Faculty of Science
Science Park 904, 1098 XH Amsterdam

Supervisors:
dhr. dr. M.W. van Someren (M.W.vanSomeren@uva.nl)
Informatics Institute, Faculty of Science, University of Amsterdam
Science Park 904, 1098 XH Amsterdam

dhr. dr. H.E. Tasli (emrah.tasli@gmail.com)
VicarVision, Singel 160, 1015 AH Amsterdam

June 26th, 2015


ABSTRACT

Recent studies demonstrate how deep neural networks can be fooled on the task of visual classification, and since deep neural networks are widely used for this task, the importance of understanding this phenomenon grows every day. This thesis shows how a pattern arises when comparing the activation patterns of a so-called fooling image within a deep neural network to the activation patterns of correctly classified images gathered from the ImageNet data-set. Using several metrics to express the similarity between the activation patterns of the convolutional layers as well as the fully connected layers within the deep neural network, it is shown that the similarity between the activation patterns of the fooling image and its fooling class (the ImageNet synset that the fooling image belongs to) arises from a specific layer within the deep neural network. It is also demonstrated that this similarity increases throughout the last layers of the deep neural network.


ACKNOWLEDGEMENTS

First of all I would like to thank my supervisors for investing their time in this thesis: M.W. van Someren, who supervised the thesis from the viewpoint of the University of Amsterdam, and H.E. Tasli, who guided not only me but the whole project based on his expertise within the field of Computer Vision. I would also like to express my gratitude towards VicarVision¹ for granting H.E. Tasli the permission to assist me during this project by sharing his knowledge and devoting his time. VicarVision offered me a desk once a week, which made communication easier and enabled us to keep weekly track of the progress of the thesis.

1 http://www.vicarvision.nl/


CONTENTS

Part I: Introduction
1 Introduction
2 Theoretical Foundation
   2.1 Deep Neural Networks and Image Classification
   2.2 Fooling Images
   2.3 Visualization
Part II: Thesis
3 Method and Approach
   3.1 Evaluation
4 Resources
   4.1 Fooling Images
   4.2 Correctly Classified Images
   4.3 Deep Neural Network
5 Experiments
   5.1 Classification
       5.1.1 Fooling Images
   5.2 Visualization
       5.2.1 Fully Connected Layers
       5.2.2 Convolutional Layers
   5.3 Similarity Measures
       5.3.1 Fully Connected Layers
       5.3.2 Convolutional Layers
       5.3.3 Max Pooling and Normalization
6 Results
   6.1 Classification
       6.1.1 Fooling Images
   6.2 Visualization
       6.2.1 Fully Connected Layers
       6.2.2 Convolutional Layers
   6.3 Similarity Measures
       6.3.1 Fully Connected Layers
       6.3.2 Convolutional Layers
       6.3.3 Similarity Overview
7 Conclusion
   7.1 Discussion
   7.2 Future Work
Appendices
A Appendix A


Part I: INTRODUCTION


1 INTRODUCTION

Since deep neural networks (DNNs) have shown state-of-the-art performance and even near-human abilities on visual classification tasks, the application of visual classification based on deep neural networks is becoming broader every day. Examples of these applications include facial recognition, object recognition, image classification and even object localization.

Recently, Szegedy et al. (2013) have shown how DNNs can be fooled by so-called adversarial examples, which are artificially constructed images composed of correctly classified images and human-imperceptible noise. Subsequently, Nguyen et al. (2014) demonstrated three more ways to fool DNNs into incorrectly classifying images: directly encoded fooling images, which are unrecognizable to humans but classified with high confidence by DNNs; indirectly encoded fooling images, which are more recognizable to humans, but mostly only after the DNN has classified the image as belonging to a certain class; and fooling images generated via gradient descent.

Nguyen et al. (2014) describe multiple cases in which a system that makes use of a DNN for visual classification could be compromised.¹ The question arises whether these kinds of DNN-based visual classification systems should be used for security purposes. While visual classification is already becoming more widely used for security purposes, it is important to find ways to overcome the aforementioned fooling problem.

While DNNs show state-of-the-art performance and are already widely used for the task of visual classification, it is still not clear how they behave. In order to be able to solve the fooling problem it is necessary to have a full understanding of how DNNs behave on fooling images with respect to correctly classified images.

This thesis shows how DNNs behave on fooling images that were created by the MAP-Elites evolutionary algorithm described by Nguyen et al. (2014), with respect to correctly classified images. First, for every layer in the DNN the activation pattern is visualized, to see if there are visual similarities between the activation patterns of the fooling images and the activation patterns of the correctly classified images. The activation patterns will also be compared numerically to show their similarity.

The main hypothesis is that, since the input of the DNN is very different (the fooling images differ greatly from the correctly classified images) while the output of the DNN is the same (the classification), the activation patterns of the fooling image and the activation patterns of the correctly classified images will converge throughout the DNN. This means that the similarity between both activation patterns should increase for every consecutive layer. Another possible outcome would be that a specific layer is found in which the similarity between the activation patterns of the fooling image and the activation patterns of the correctly classified images arises.

1 https://www.youtube.com/watch?t=284&v=M2IebCN9Ht4


2 THEORETICAL FOUNDATION

2.1 DEEP NEURAL NETWORKS AND IMAGE CLASSIFICATION

Image classification is a pattern-recognition task within the field of Computer Vision. The main task of image classification is to recognize the object in an image as belonging to a certain class. Nowadays, the most used architectures for this task are deep neural networks. These deep neural networks take an image as input and their output consists of classifications accompanied by their confidence values.

Krizhevsky et al. (2012) describe a state-of-the-art deep convolutional network used for visual classification which was trained and tested on the ImageNet data-set (a database of over 14 million annotated images). The deep neural network designed by Krizhevsky et al. (2012) proved to be state-of-the-art in the field of Computer Vision by achieving low error percentages in several image classification competitions.

Another state-of-the-art network for classifying on several other benchmarks is presented by Ciresan et al. (2012). Ciresan et al. (2012) present the idea of multi-column deep neural networks (MCDNNs), in which multiple deep neural networks classify the same data. Afterwards, their outputs (confidence values for the classifications) are averaged. By applying this method on several well-known benchmarks for the image classification task it was shown that multi-column deep neural networks achieve state-of-the-art error percentages.

Setting up and training a deep neural network is a time consuming task. To overcome this impediment a significant number of pre-trained deep neural networks are available on the web. These pre-trained, open source deep neural networks can be freely used. A specific software package that offers these deep neural networks is called Caffe and it is presented by Jia et al. (2014). Caffe is a fully open source framework that provides clear access to deep learning architectures and pre-trained deep neural networks. Caffe is widely used in current research.

2.2 FOOLING IMAGES

Szegedy et al. (2013) were the first to show that deep neural networks can be fooled by so-called fooling images. Szegedy et al. (2013) describe a specific type of fooling image which they call adversarial examples. Adversarial examples are generated by finding a representation, in the form of an image that belongs to a certain class, which lies closest to the starting image that is of a different class. While these perturbations are imperceptible to the human eye, the adversarial examples are still considered to be of the class which the starting image belongs to. Deep neural networks, however, classify the adversarial examples as belonging to a different class, and this is why the deep neural network is considered to be fooled.

Nguyen et al. (2014) present three different types of fooling images. The first type of fooling image Nguyen et al. (2014) describe is generated by an evolutionary algorithm called MAP-Elites via direct encoding. This means that the gray-scale pixel values of some starting image (mostly noise) are mutated until the generated image is considered to be of some class with high confidence (over 99%). Fooling images generated via direct encoding are mostly unrecognizable to humans. The second type of fooling image mentioned by Nguyen et al. (2014) is generated by the same algorithm. The only difference with respect to the previous type is the encoding used to feed the algorithm; the encoding used for this type of fooling image is called indirect encoding, which is based on patterns and shapes. Fooling images generated via indirect encoding tend to be more recognizable to humans. The last type of fooling image introduced by Nguyen et al. (2014) consists of images that are generated via gradient descent. While they look rather different from the images generated by the MAP-Elites evolutionary algorithm, these fooling images are proven to fool the deep neural network with high confidence.

2.3 VISUALIZATION

So far, deep neural networks have proven to be state-of-the-art in visual classification. However, no clear understanding exists of how these deep neural networks behave and why they perform the way they do. Zeiler and Fergus (2013) present a way to visualize the convolutional filters that are used by the network to recognize features. Based on these features the implementation of Krizhevsky et al. (2012) was improved to achieve even lower error percentages on several image classification tasks.

The earlier mentioned software package Caffe also offers visualization, in the form of visualizing the activations within the deep neural network. When classifying an image, the output of every layer can be visualized, showing either the visualized output of the convolution kernels or the activation values of all neurons within the layer.


Part II: THESIS


3 METHOD AND APPROACH

This thesis will focus on the fooling images that are generated with the evolutionary algorithm as described by Nguyen et al. (2014). During this thesis multiple experiments will be executed, consisting of running classifications on several images and comparing the activation patterns of the different layers within the network. To be able to perform these experiments a deep neural network, correctly classified images and fooling images are necessary.

The deep neural network will be set up using the Caffe software package combined with the Model Zoo platform. This open source framework offers many different trained deep learning architectures which can be freely used. For this thesis, the most important property is that the model is already trained, because training a deep neural network would take too much time to fit within this thesis.

The correctly classified images will be gathered from ImageNet. ImageNet is a large data set containing labeled images. It is also the data set that is widely used in research papers and one of the leading benchmarks.

Last, the fooling images need to be gathered. The images used by Nguyen et al. (2014) might be available online. Otherwise, the fooling images need to be regenerated using the methods discussed by Nguyen et al. (2014).

Once the network is up and running and the necessary data is obtained, the first experiments can be performed. First, a tool has to be found to visualize the activation pattern of every layer during the classification of an image. Once visualization of the activation patterns is in place, all correctly classified images have to be classified and visualized. Second, the fooling images will be classified to see whether our network is actually fooled by these specific fooling images.

After these first series of experiments the similarity between the activations of fooling images and correctly classified images can be investigated by comparing the visualized activation patterns. From here, the next series of experiments can be set up.

Since the visual comparison of the activation patterns can only lead to an understanding of how the network performs and an intuition on whether the earlier mentioned hypothesis is true or false, it cannot prove the hypothesis. In order to prove the hypothesis it is necessary to compare the activation patterns of the different layers for fooling images and correctly classified images numerically.


3.1 EVALUATION

Since two types of results will be at our disposal, two separate evaluation measures will be utilized. The first was already briefly introduced in the previous section and concerns the visualized activation patterns. Since the visualized activation patterns for both the fooling images and the selected correctly classified images are generated it is possible to compare them one by one. We expect to see an increasing measure of similarity over the layers of the deep neural network. This means that the activation patterns of the lower layers will differ more (to the human eye) than the activation patterns of the higher layers.

The second evaluation measure is necessary to evaluate the numerical comparison of the activation patterns of the fooling images and the activation patterns of the correctly classified images. One or more similarity metrics have to be designed to measure the similarity between the activation patterns of the fooling images and the activation patterns of the correctly classified images. Since it is desired to compare the activation patterns of the fooling images not only to the activation patterns of individual correctly classified images but also to those of entire classes of correctly classified images, it is also necessary to either represent the activation patterns of a whole class in some way or find a way to combine the similarity measures of all the correctly classified images per class.

Once the similarity measures between the activation patterns of the fooling images and the activation patterns of the correctly classified images per class are collected, the course of the similarity throughout the deep neural network can be traced. This means that for every fooling image the similarity to all selected classes can be projected layer by layer. These mappings are the ones that will either prove our main hypothesis to be true or to be false.


4 RESOURCES

In order to carry out the desired experiments several resources are needed. First of all, since this research focuses on the fooling images as described and generated by the MAP-Elites evolutionary algorithm of Nguyen et al. (2014), fooling images of this type are necessary data. Second, correctly classified images are necessary to be able to compare the fooling images to correct data. This correct data will be obtained via ImageNet, which is a data-set of over 14 million annotated images. Last, a deep neural network has to be set up to perform the desired experiments with. Since current research widely uses the Caffe software package and Caffe offers state-of-the-art deep architectures, Caffe will be the framework used for setting up a deep neural network and for performing the desired experiments with it.

4.1 FOOLING IMAGES

As stated before, this research is based upon the fooling images as described and generated by the MAP-Elites evolutionary algorithm of Nguyen et al. (2014). Since Nguyen et al. (2014) made their code and data freely available on the web, it was possible to obtain the fooling images directly from their source. To keep this research feasible, only the sixteen examples given by Nguyen et al. (2014) were selected to be part of this research (see figure 1).

Figure 1.: Fooling images generated by Nguyen et al. (2014).


4.2 CORRECTLY CLASSIFIED IMAGES

Logically, in order to be able to compare the gathered fooling images to correctly classified images, correctly classified data is necessary. Since Nguyen et al. (2014) generated the fooling images based on the 1000 classes that are present in the ImageNet data-set, it was most logical to make use of the ImageNet data-set. Again, to keep this research feasible, only the sixteen synsets corresponding to the fooling images were selected to be part of this research. The sixteen chosen synsets are as follows:

Code        Label
n01530575   brambling, Fringilla montifringilla
n01558993   robin, American robin, Turdus migratorius
n01784675   centipede
n01806143   peacock
n01817953   African grey, African gray, Psittacus erithacus
n02028035   redshank, Tringa totanus
n02056570   king penguin, Aptenodytes patagonica
n02130308   cheetah, chetah, Acinonyx jubatus
n02317335   starfish, sea star
n02454379   armadillo
n02509815   lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens
n02799071   baseball
n03272010   electric guitar
n03393912   freight car
n04074963   remote control, remote
n07754684   jackfruit, jak, jack

Table 1.: Selected Synsets of ImageNet

4.3 DEEP NEURAL NETWORK

The Caffe software package¹ is a fully open source framework that provides clear access to deep learning architectures and pre-trained deep neural networks. Since the duration of this research did not allow for a full process of training a deep neural network, a pre-trained deep neural network had to be found for the desired experiments.

After installing the Caffe software package, a choice had to be made regarding which deep neural network to use. Caffe provides a platform called Model Zoo² where researchers can share their deep architectures. For this research the BVLC Reference CaffeNet pre-trained deep neural network model is used, which is based on, but slightly differs from, the deep neural network described by Krizhevsky et al. (2012). The BVLC Reference CaffeNet model is currently the most stable state-of-the-art model available through Caffe.

1 http://caffe.berkeleyvision.org/

2 http://caffe.berkeleyvision.org/model_zoo.html

In order to understand the remaining part of this thesis it is crucial to explain more about the chosen deep neural network. Since it is so closely related to the deep neural network designed by Krizhevsky et al. (2012), it is sufficient to describe their deep neural network and point out the main differences.

Figure 2.: State-of-the-art deep neural network as described by Krizhevsky et al. (2012). From left to right: input image, five convolutional layers, two fully connected layers, output.

The deep neural network as described by Krizhevsky et al. (2012) consists of seven layers (see figure 2). The first five of these layers are convolutional layers. Convolutional layers use convolution kernels (also called filters) to identify and localize certain features by performing convolution³ with these kernels on the input of the convolutional layer. The first layers will identify and localize only low-level features, while the higher layers find high-level features.

The last two layers of the deep neural network are fully connected layers. Fully connected layers are (as the name suggests) fully connected to their previous layer. This means that every neuron of the previous layer is connected to every neuron in the fully connected layer. The activation value of a neuron in a fully connected layer is calculated as the sum of all the activation values of the previous layer multiplied by their weights.

The deep neural network designed by Krizhevsky et al. (2012) makes use of Max Pooling after layers 1, 2 and 5. Max Pooling uses one kernel which is moved over the output of the layer and returns the maximum value found within the range of this kernel. This way, the deep neural network reduces the variance within the features. Layers 1 and 2 are not only followed by Max Pooling, but also by normalization. While Krizhevsky et al. (2012) apply normalization before Max Pooling, the BVLC Reference CaffeNet model chosen for this thesis applies normalization after Max Pooling. Though there are more differences between both deep neural networks, it is unnecessary for this thesis to elucidate them here.

3 https://en.wikipedia.org/wiki/Convolution
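
To make the pooling step concrete, the following minimal sketch applies max pooling to a single 2-D feature map. The kernel size of 3 and stride of 2 are assumptions matching the pooling layers of this architecture; the feature map sizes correspond to table 3.

```python
import numpy as np

def max_pool(feature_map, size=3, stride=2):
    """Max pooling over one 2-D feature map (kernel size and stride are
    assumptions matching the CaffeNet pooling layers)."""
    h = (feature_map.shape[0] - size) // stride + 1
    w = (feature_map.shape[1] - size) // stride + 1
    pooled = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            pooled[i, j] = window.max()  # keep only the maximum in the window
    return pooled

# A 27x27 map (the layer 2 output size, see table 3) is reduced to 13x13.
print(max_pool(np.random.rand(27, 27)).shape)  # (13, 13)
```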


5 EXPERIMENTS

As described in chapter 3, there are two main series of experiments to be executed during this thesis. The first series of experiments consists of classifying all images that were gathered for this thesis and visualizing the activation pattern of every layer of the network for these images. The second series of experiments consists of comparing the activation patterns of the fooling images to those of the correctly classified images per class. After measuring these similarities, the course of the similarity between the activation patterns of the fooling images and the activation patterns of the correctly classified images per class can be depicted.

5.1 CLASSIFICATION

Classification of an image is done by running the image through the deep neural network that was chosen for this thesis. The output of such a classification is the set of all classes known by the deep neural network and their accompanying confidence measures. The higher the confidence measure for a class, the more convinced the deep neural network is that the image belongs to that class. Classification is done for all images gathered for this thesis to measure the accuracy of the chosen deep neural network.
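
As an illustration of such a classification run, a minimal sketch using the Caffe Python interface is given below. The file names and the input image are assumptions; the preprocessing follows the usual Caffe examples, with mean subtraction omitted for brevity.

```python
import caffe

# Assumed file names for the BVLC Reference CaffeNet deployment files.
net = caffe.Net('deploy.prototxt', 'bvlc_reference_caffenet.caffemodel', caffe.TEST)

# Standard preprocessing from the Caffe examples: HxWxC -> CxHxW, RGB -> BGR,
# and scaling from [0, 1] to [0, 255]. Mean subtraction is omitted for brevity.
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))
transformer.set_channel_swap('data', (2, 1, 0))
transformer.set_raw_scale('data', 255.0)

image = caffe.io.load_image('fooling_image.png')  # hypothetical input image
net.blobs['data'].data[...] = transformer.preprocess('data', image)

output = net.forward()
confidences = output['prob'][0]          # one confidence value per class
top5 = confidences.argsort()[::-1][:5]   # indices of the five most confident classes
for idx in top5:
    print(idx, confidences[idx])
```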

5.1.1 Fooling Images

The classification of the fooling images is particularly important because they have to be fooling images to start with. By measuring the confidence scores for the fooling images it can be decided whether the selected fooling images are actually fooling our deep neural network. Based on the confidence scores the fooling images can be kept or dropped from this thesis.

5.2 VISUALIZATION

Like classification, visualization is performed while running an image through the deep neural network. For every layer the activation pattern is stored numerically and is also visualized and stored as an image.


5.2.1 Fully Connected Layers

The last layers (6 & 7) in the deep neural network are fully connected layers. This means that every neuron of the previous layer is connected to every neuron in the fully connected layer. The activation value of a single neuron in the fully connected layer is the sum of the activation values of the previous layer multiplied by their weights. The activation pattern of a fully connected layer consists of the activation values of every neuron within the layer, and this can be visualized as a figure in which the x axis represents the neurons and the y axis represents their activation values.
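
A minimal sketch of how such an activation pattern could be extracted and plotted after a classification run; the blob names 'fc6' and 'fc7' are the usual names in the BVLC Reference CaffeNet, but this is an assumption.

```python
import matplotlib.pyplot as plt

# Assumes `net` has already classified an image (see the sketch in section 5.1).
for name in ('fc6', 'fc7'):
    activations = net.blobs[name].data[0]   # 4096 activation values
    plt.figure()
    plt.plot(activations)
    plt.xlabel('neuron')
    plt.ylabel('activation value')
    plt.title('Activation pattern of layer %s' % name)
    plt.savefig('%s_activation_pattern.png' % name)
```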

5.2.2 Convolutional Layers

The first five layers of the deep neural network are convolutional layers. As stated before, these layers use convolution kernels to identify and localize certain features. The activation of every single one of these kernels can be visualized as a patch. The activation pattern of a convolutional layer consists of the activations of all the convolution kernels within the layer and it can be visualized as the projection of all visualized convolution kernel activations (patches).
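
A rough sketch of how the per-kernel activation maps can be tiled into one image, similar in spirit to the helper used in the Caffe filter visualization tutorial; the padding details and the blob name 'conv1' are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

def tile_activations(maps):
    """Tile per-kernel activation maps of shape (n_kernels, h, w) into one grid."""
    n = int(np.ceil(np.sqrt(maps.shape[0])))
    padding = ((0, n ** 2 - maps.shape[0]), (0, 1), (0, 1))
    maps = np.pad(maps, padding, mode='constant', constant_values=maps.min())
    maps = maps.reshape((n, n) + maps.shape[1:])  # (n, n, h, w)
    maps = maps.transpose(0, 2, 1, 3).reshape(n * maps.shape[2], n * maps.shape[3])
    return maps

# Assumes `net` has already classified an image (see the sketch in section 5.1).
grid = tile_activations(net.blobs['conv1'].data[0])
plt.imshow(grid, cmap='gray')
plt.axis('off')
plt.savefig('conv1_activation_pattern.png')
```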

5.2.2.1 Max Pooling and Normalization

Since Max Pooling only reduces the size of the patches and normalization only adjusts the activation values, the method of visualization is the same as the visualization method used for the convolutional layers.

5.3 SIMILARITY MEASURES

After the first series of experiments all data (fooling images and correctly classified images) has been classified and visualized. This provides all the necessary data for the second series of experiments. In order to perform the desired comparisons it is necessary to define the metrics that will be used to express the measure of similarity between the activation patterns of the fooling images and the activation patterns of a certain class.

5.3.1 Fully Connected Layers

For comparing the fully connected layers and expressing their similarity, three different metrics were designed. All three of these metrics make use of the same class representation. To compare the activation patterns of the fooling images to the activation patterns of the classes corresponding to the ImageNet synsets, each class is represented by the mean of the activation patterns of the images within the class. The fooling image activation patterns will be compared to these mean class activation patterns, and the similarity will be expressed by the following three metrics.
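
A minimal sketch of this class representation, assuming the fc-layer activation patterns of one class are stored as rows of a NumPy array:

```python
import numpy as np

def class_representation(patterns):
    """Mean and standard deviation of the activation patterns of one class.

    `patterns` is assumed to have shape (num_images, 4096): one fc-layer
    activation pattern per correctly classified image of the class.
    """
    patterns = np.asarray(patterns)
    return patterns.mean(axis=0), patterns.std(axis=0)
```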


5.3.1.1 Accuracy Metric

The first metric was designed to rapidly examine whether traces can be found that verify the intuition of the similarity being high in the higher layers of the deep neural network. This metric involves not only the mean activation patterns of the classes but also the standard deviation. Every activation value within the activation pattern of the fooling image is compared to the matching mean activation value of the class the fooling image is compared to. Whenever the activation value of the fooling image falls within the range from the mean class activation value minus the standard deviation to the mean class activation value plus the standard deviation, the value is considered a 'hit'. Outside of this range the value is considered a failure. After comparing all the activation values within the activation pattern of the fooling image, the metric expresses the similarity by returning the percentage of hits. The higher the percentage, the more similar the fooling image is to the class.
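
A sketch of this accuracy metric under the same assumptions about how the activation patterns are stored:

```python
import numpy as np

def accuracy_metric(fooling_pattern, class_mean, class_std):
    """Fraction of neurons whose activation for the fooling image lies within
    one standard deviation of the mean class activation (a 'hit')."""
    hits = np.abs(fooling_pattern - class_mean) <= class_std
    return hits.mean()  # percentage of hits, as a fraction in [0, 1]
```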

5.3.1.2 Euclidean Distance

To be able to see how close the mean class activation patterns and the activation patterns of the fooling images lie in space, the Euclidean distance is calculated between them. The lower the Euclidean distance, the higher the similarity.

5.3.1.3 Cosine Similarity

Last, the cosine similarity is used as a metric to express the angle between the mean class activation patterns and the activation patterns of the fooling images in space. The higher the cosine similarity, the higher the similarity between the activation pattern of the fooling image and the mean activation pattern of the class it is compared to.
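
Both of these metrics can be written in a few lines; a minimal sketch:

```python
import numpy as np

def euclidean_distance(fooling_pattern, class_mean):
    """Lower distance means higher similarity."""
    return np.linalg.norm(fooling_pattern - class_mean)

def cosine_similarity(fooling_pattern, class_mean):
    """Cosine of the angle between the two patterns; higher means more similar."""
    return np.dot(fooling_pattern, class_mean) / (
        np.linalg.norm(fooling_pattern) * np.linalg.norm(class_mean))
```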

5.3.2 Convolutional Layers

Because of the structure of the convolutional layers, the metrics described earlier are not applicable to these layers. Therefore, a new metric had to be devised to compare the activation patterns of the convolutional layers of the fooling images to the activation patterns of the convolutional layers of the classes. The metric used for comparing the activation patterns of convolutional layers is based on another convolution. Every patch of the convolutional layers of the fooling image is compared to the matching patch of every image in the class via convolution. From this convolution the maximum value is taken, which represents the similarity between the patch of the fooling image and the corresponding patch of the correctly classified image. When this is done for all patches and all images belonging to the class the fooling image is compared to, we get multiple similarity measures per patch. From these similarity measures the mean is taken per patch, which provides a single similarity measure per patch. These values are again reduced to their mean, which results in a single similarity measure per fooling image and class comparison.
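
A rough sketch of this convolution-based metric; the array layout (one array of patches per image) is an assumption about how the activations are stored.

```python
import numpy as np
from scipy.signal import convolve2d

def conv_layer_similarity(fooling_patches, class_patches):
    """Similarity between a fooling image and one class for one convolutional layer.

    fooling_patches: array (num_kernels, h, w) for the fooling image.
    class_patches:   array (num_images, num_kernels, h, w) for the class images.
    """
    num_images, num_kernels = class_patches.shape[:2]
    per_patch = np.zeros((num_images, num_kernels))
    for i in range(num_images):
        for k in range(num_kernels):
            # Maximum value of the 2-D convolution of the two matching patches.
            per_patch[i, k] = convolve2d(fooling_patches[k],
                                         class_patches[i, k]).max()
    # Mean over the images gives one value per patch; the mean over the patches
    # gives a single similarity measure for this fooling image and class.
    return per_patch.mean(axis=0).mean()
```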

5.3.3 Max Pooling and Normalization

The method for calculating the similarity measure for the Max Pooling and normalized layers is the same as that for the convolutional layers.


6 RESULTS

6.1 CLASSIFICATION

During the first series of experiments all the correctly classified images were classified. Such a classification is run by forwarding the correctly classified image through the deep neural network. The deep neural network then outputs the confidence value for every class it knows. In our case the network is trained on the ImageNet data-set, so it knows 1000 classes. The output of the deep neural network was plotted for every image and the top 5 classifications were stored (see figure 3).

(a) Confidence plot

(b) Top 5 classifications:
centipede 0.97800700
scorpion 0.00896657
isopod 0.00254093
crayfish 0.00222541
alligator lizard 0.00147829

Figure 3.: Classification of the ’Centipede’ fooling image (see figure 1). (a) The confidence plot shows the confidence of the deep neural network for the image belonging to a certain class, for all 1000 classes. (b) Top 5 classifications accompanied by their confidence values.


After all the correctly classified images were classified, the top 1 and top 5 error were calculated for every class. The top 1 error shows how many of the classifications are wrong, meaning that the class with the highest confidence value is not the class the image belongs to. The top 5 error shows how often the class which the image belongs to is not even in the top 5 classifications.

Code        Top 1 Error       Top 5 Error
n01530575   0.106182795699    0.0369623655914
n01558993   0.0925507900677   0.0278404815651
n01784675   0.207957957958    0.0848348348348
n01817953   0.0777777777778   0.0444444444444
n02028035   0.151129943503    0.034604519774
n02056570   0.0598417408506   0.0316518298714
n02130308   0.12543798178     0.0252277505256
n02317335   0.204871060172    0.103151862464
n02454379   0.195007800312    0.0967238689548
n02509815   0.0729537366548   0.0284697508897
n02799071   0.193018480493    0.0667351129363
n03272010   0.231848688225    0.0945698596705
n03393912   0.13882532418     0.0465293668955
n04074963   0.292772861357    0.140855457227
n07754684   0.126428571429    0.0621428571429

Table 2.: The error rates for the selected synsets of ImageNet (the peacock synset is missing because it was dropped from this thesis, see section 6.1.1).
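
A rough sketch of how these error rates could be computed from the stored confidence vectors; the array names are assumptions.

```python
import numpy as np

def top_k_error(confidences, true_class, k):
    """1 if the true class is not among the k most confident predictions, else 0."""
    top_k = np.argsort(confidences)[::-1][:k]
    return int(true_class not in top_k)

# Per-synset error rates are the mean of these indicators over the images of the
# synset, e.g. (confidence_vectors and labels are hypothetical arrays):
# top1 = np.mean([top_k_error(c, y, 1) for c, y in zip(confidence_vectors, labels)])
# top5 = np.mean([top_k_error(c, y, 5) for c, y in zip(confidence_vectors, labels)])
```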

6.1.1 Fooling Images

In order to tell whether the selected fooling images are actually fooling our deep neural network, the fooling images were classified just like the correctly classified images. An overview of the confidence values for the classes the fooling images were designed for can be seen in table 4. Based on these values the peacock fooling image and the peacock ImageNet synset were dropped from this thesis. This was done not only because of the low confidence value for the peacock class, but also because the peacock fooling image was the only one which was incorrectly classified.

6.2 VISUALIZATION

Next to classification, visualization was performed during the first series of experiments. Since classification runs the images through the deep neural network, it was possible to immediately store the activation values of the different layers within the network. These activation values were visualized by using code obtained from the Caffe filter visualization tutorial¹.

6.2.1 Fully Connected Layers

The fully connected layers within the deep neural network chosen for this thesis (6 & 7) both consist of 4096 neurons. When an image flows through the deep neural network, these neurons get an activation value which expresses the degree to which each neuron is activated. These values can be plotted, which results in the activation pattern for the fully connected layer.

Figure 4.: Example of the activation pattern for a fully connected layer. All data described in this thesis is available at https://github.com/JoostHoppenbrouwer/GraduationProject. Because sharing all data is not feasible due to storage restrictions, only the data for the fooling images can be found at the link above.

6.2.2 Convolutional Layers

Since convolutional layers use different convolution kernels to extract different features from the images, the visualization process of these layers is more complex. Using the methods provided by the earlier mentioned Caffe filter visualization tutorial, the activation pattern of every convolution kernel within each layer was visualized. To be able to do so, the number of convolution kernels within each layer and the size of the resulting patches were essential data. This data can be found in table 3. An example of the visualization of the activation patterns of the convolutional layers can be seen in figure 5.

6.3 SIMILARITY MEASURES

At this point all the necessary data to compare the activation patterns of the fooling images to the activation patterns of the correctly classified images has been obtained. Next, the predefined similarity metrics (see section 5.3) were used to express the similarity between the activation patterns of the fooling images with respect to the activation patterns of the correctly classified images per class.

1 http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/filter_visualization.ipynb


6.3.1 Fully Connected Layers

Instead of starting at the input layer, the comparison between the activation patterns of the fooling images and the activation patterns of the correctly classified images per class started at the higher layers of the deep neural network. The main reason for this was that if little or no similarity could be found at the higher layers, this would immediately have shown our hypothesis to be wrong.

6.3.1.1 Accuracy Metric

First, to check this quickly, the accuracy metric was used to see in a fast way whether some similarity arises. When analyzing these first results it became clear that not only the similarity for the fooling class (the class that the fooling image was designed for), but also the similarity for all other classes was over 80% for all fooling images. After looking into the activation patterns of layers 6 and 7 it became clear that most values lie around zero. The fact that only the peaks within the activation patterns differ accounts for the high accuracy measure. After applying several different thresholds the accuracy measure dropped significantly for all classes, including the fooling class. However, the fooling class is in all cases on top of the accuracy measure, which means that the activation pattern of the fooling image is most similar to the mean activation pattern of the fooling class. In figure 6 an example of the output of the accuracy measure can be seen.

6.3.1.2 Euclidean Distance

Now that the accuracy metric verified the idea of the fooling class being most similar to the fooling image, some more specific and more widely used metrics could be applied to the activation patterns of the fully connected layers, to see whether the accuracy metric just puts the fooling class on top by design or whether the similarity truly is the highest. The Euclidean distances between the activation patterns of the fooling images and the activation patterns of the classes were calculated and can be seen in figure 7.

Like the accuracy metric, the Euclidean distance points out that the fooling image is closest to the fooling class. It can also be seen that the Euclidean distance decreases from layer 6 to layer 7. This means that the fooling image actually becomes more similar to the fooling class, which again matches our expectations.

6.3.1.3 Cosine Similarity

The last metric that was used to express the similarity between the activation patterns of the fully connected layers of the fooling image and the activation patterns of the classes is the cosine similarity. The cosine similarity backs up the earlier intuitions by again showing not only that the fooling image is most similar to the fooling class, but also how the similarity increases from layer 6 to layer 7 (see figure 8).

6.3.2 Convolutional Layers

To compare the activation patterns of the convolutional layers, the predefined metric described in section 5.3.2 was used. The result of this comparison is a graph for each layer which shows the similarity measures per class. An example of this is attached in Appendix A in figure 9. While the convolution metric shows how the similarities between the activation patterns of the fooling images and the activation patterns of the different classes relate to one another in terms of their mutual distance, it does not show the course of the similarity throughout the deep neural network. This means that a similarity measure of 1.0 in a single layer does not mean the same as a similarity measure of 1.0 in another layer. The cause of the similarity measures not corresponding over different layers is the different numbers of kernels and their different sizes.

6.3.3 Similarity Overview

The last step towards answering our research question and proving our hypothesis to be true or false is creating a similarity overview throughout the deep neural network. The similarity measures of all layers were combined to try to find a pattern within the similarity measures for the different fooling images. Three of these similarity overviews can be found in figure 10. In order to compare the similarity measures of the different layers, the similarity values were normalized by dividing by the maximum value within a layer. This gives every layer a maximum value of 1.
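
A minimal sketch of this per-layer normalization, assuming the similarity values for one fooling image are stored as a (layers × classes) array:

```python
import numpy as np

def similarity_overview(similarities):
    """Normalize per-layer similarities so that the most similar class in every
    layer gets the value 1, making the layers comparable in one overview."""
    similarities = np.asarray(similarities, dtype=float)
    return similarities / similarities.max(axis=1, keepdims=True)
```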


7 CONCLUSION

To conclude whether our initial hypothesis is true or false, we compared the similarity overviews for the different fooling images. After analyzing them, it can clearly be seen that there is a pattern within these similarity overviews. Every single fooling image exhibits this pattern.

It can be seen that for the first few layers the similarity between the fooling image and the fooling class is randomly distributed among the other similarities. This means that at an early stage of classification it is not clear to the deep neural network which class the fooling image belongs to. In layers 3 and 4 the similarity suddenly collapses and the mutual distances between the similarity values become much smaller than before. After these layers the similarity value for the fooling class ends up on top of the charts, which means that from this point on the deep neural network shows the most similarity between the activation patterns of the fooling image and the activation patterns of the fooling class.

While this thesis has shown that a certain similarity overview pattern arises for all fooling images used for this thesis, this pattern only partially proves our main hypothesis to be true. Since the similarity overview pattern only shows the mutual distance within each layer of the deep neural network, we cannot state that the similarity is increasing throughout the deep neural network. We can prove that the similarity is increasing from layer 6 to layer 7, and this may give some indication of the main hypothesis being true, but it is not sufficient to prove it.

While our main hypothesis stated that we expected the similarity to increase throughout the deep neural network, we also stated that a possible outcome of this research could be that the similarity arises from a specific layer. This can indeed be proven by the similarity overview. In the similarity overview we can see that the similarity of the fooling image with respect to the fooling class dominates from layer 5 onwards.

7.1 DISCUSSION

Although this thesis shows that a similar pattern can be found for every fooling image within this thesis, it should still be tested on a much larger set of fooling images. By doing so, the generality of the pattern can be proven. Also, all different types of fooling images as described in section 2.2 should be tested, to see whether the presented pattern arises for all these types of fooling images.

The set of fooling images was not the only data set that was relatively small. Just like the set of fooling images, the data-set of correctly classified images could be expanded. Since this thesis only compares the fooling images to 15 classes, the comparison should be repeated for all 1000 ImageNet synsets. This way it can be proven that the fooling images are indeed most similar to their fooling class with respect to all other classes in the higher layers of the deep neural network.

Even though Szegedy et al. (2013) have proven the generality of fooling images across different deep neural networks, it has to be shown that multiple deep neural networks cause the similarity overview to be similar for the specific fooling images. The deep neural network used for this thesis was specifically trained for the ImageNet data-set. By performing the same experiments on several deep neural networks that are trained for different purposes, the generality of the similarity overview pattern can be demonstrated.

7.2 FUTURE WORK

After showing how the similarity between the activation patterns of the fooling images and the activation patterns of the different ImageNet synsets evolves throughout the network, the question of how and why this happens remains unanswered. This thesis has shown that the similarity arises from a certain point and that the similarity increases over some layers. Although this is sufficient to indicate that our main hypothesis might be true, it cannot explain these phenomena.

In order to learn why this arising similarity takes place, it should be investigated what is happening within layers 3, 4 and 5, because these are the layers which cause the similarity to emerge.

Another question that remains unanswered is what features are being shared by the fooling image and its fooling class. Since deep neural networks try to identify and localize features from low-level to high-level feature layers, these features and their activations are what cause the similarity to appear. By tracing back these activations through the deep neural network, the features with the highest similarity could be found and localized within both the fooling images and the images of the fooling class. This could help in understanding the behaviour of deep neural networks as well as understanding the structure of the fooling images.


BIBLIOGRAPHY

Ciresan, D., Meier, U., and Schmidhuber, J. (2012). Multi-column deep neural networks for image classification. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 3642–3649. IEEE.

Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R. B., Guadarrama, S., and Darrell, T. (2014). Caffe: Convolutional architecture for fast feature embedding. CoRR, abs/1408.5093.

Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Pereira, F., Burges, C., Bottou, L., and Weinberger, K., editors, Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc.

Nguyen, A., Yosinski, J., and Clune, J. (2014). Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. CoRR, abs/1412.1897.

Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I. J., and Fergus, R. (2013). Intriguing properties of neural networks. CoRR, abs/1312.6199.

Zeiler, M. D. and Fergus, R. (2013). Visualizing and understanding convolutional networks. CoRR, abs/1311.2901.


Appendices


A APPENDIX A

Layer                                # Kernels   Patch size
1                                    96          55×55
1 after Pooling                      96          27×27
1 after Pooling and Normalization    96          27×27
2                                    256         27×27
2 after Pooling                      256         13×13
2 after Pooling and Normalization    256         13×13
3                                    384         13×13
4                                    384         13×13
5                                    256         13×13
5 after Pooling                      256         6×6

Table 3.: BVLC Reference CaffeNet model convolution kernels.


Fooling Image Code   Confidence
n01530575            0.896975
n01558993            0.973630
n01784675            0.978007
n01806143            0.073413
n01817953            0.999870
n02028035            0.996564
n02056570            0.999997
n02130308            0.918362
n02317335            0.999964
n02454379            0.915391
n02509815            0.737818
n02799071            0.999352
n03272010            0.998492
n03393912            0.982718
n04074963            0.996587
n07754684            0.583008

Table 4.: Fooling Image Confidence (for labels see table 1)


Figure 5.: Example of the visualization of the activation patterns of the convolutional layers on a sea star image. The panels show (a) the input image of the deep neural network, (b) layer 1, (c) layer 1 after pooling, (d) layer 1 after pooling and normalization, (e) layer 2, (f) layer 2 after pooling, (g) layer 2 after pooling and normalization, (h) layer 3, (i) layer 4, (j) layer 5 and (k) layer 5 after pooling. Again, this data is available online for the fooling images at https://github.com/JoostHoppenbrouwer/GraduationProject.


(a) Accuracy measure for layer 6.

(b) Accuracy measure for layer 7.

Figure 6.: Accuracy measure comparing the ’king penguin’ fooling image to all ImageNet synsets in this thesis. The green dot represents the fooling class while the red crosses represent the other classes.


Figure 7.: Euclidean distance comparing the ’lesser panda’ fooling image to all ImageNet synsets in this thesis, for (a) layer 6 and (b) layer 7. The green dot represents the fooling class while the red crosses represent the other classes.

Figure 8.: Cosine similarity comparing the ’brambling’ fooling image to all ImageNet synsets in this thesis, for (a) layer 6 and (b) layer 7. The green dot represents the fooling class while the red crosses represent the other classes.


Figure 9.: Convolution metric comparing the ’cheetah’ fooling image to all ImageNet synsets in this thesis. The panels show (a) convolutional layer 1, (b) layer 1 after pooling, (c) layer 1 after pooling and normalization, (d) layer 2, (e) layer 2 after pooling, (f) layer 2 after pooling and normalization, (g) layer 3, (h) layer 4, (i) layer 5 and (j) layer 5 after pooling. The green dot represents the fooling class while the red crosses represent the other classes. All data is available at https://github.com/JoostHoppenbrouwer/GraduationProject.


Figure 10.: Similarity overviews throughout the layers of the deep neural network for three different fooling images: (a) the ’brambling’ fooling image, (b) the ’sea star’ fooling image and (c) the ’baseball’ fooling image.
