
The use of variational autoencoders on dermatoscopic images

Joris M.J. Bakker
11178310
Bachelor thesis
Credits: 18 EC

Bachelor Opleiding Kunstmatige Intelligentie
University of Amsterdam
Faculty of Science
Science Park 904
1098 XH Amsterdam

Supervisor
MSc Maximilian Ilse
AMLab
Faculty of Science
University of Amsterdam
Science Park 904
1098 XH Amsterdam

June 28th, 2018


Abstract

Early detection is important for a complete recovery from cancer. This thesis explores the use of variational autoencoders to assist in the early detection of skin cancer. Eight different models were considered, formed by combining two architectures (a fully convolutional network and a residual neural network using fully convolutional layers), two prior/posterior distributions (a Gaussian distribution and a Laplace distribution) and two loss functions (cross entropy and a mixture of discrete logistics). These models were trained for 100 epochs and evaluated on the reconstruction quality and on the possible appearance of meaningful clusters in the compressed representations of the model. The results show that none of the considered models produces detailed reconstructions or meaningful clusters. However, the mixture of discrete logistics loss function does result in more realistic reconstructions.


Contents

1 Introduction
2 Theoretical Framework
    2.1 Variational Autoencoder
        2.1.1 Loss Function
        2.1.2 Distributions for Grey Scale Data
        2.1.3 Cross Entropy
        2.1.4 Discretized Logistic Mixture Likelihood
    2.2 Reparametrization Trick
    2.3 Prior/Posterior
    2.4 Disentangled Variational Autoencoder
    2.5 Architectures
        2.5.1 Linear
        2.5.2 Fully Convolutional
        2.5.3 Residual Neural Network
        2.5.4 Combined Architectures
3 Method
    3.1 Dataset
    3.2 Image Preprocessing
    3.3 Implementation
    3.4 Evaluation
4 Results
5 Discussion
6 Conclusion
7 Future work


1 Introduction

Skin cancer is one of the most common forms of cancer. A prior study in the Netherlands found that from 1991 to 2010 there was a 96% increase in melanoma in men and a 75% increase in women [6]. There are several types of skin cancer, such as melanoma, dermatofibroma, actinic keratoses and many more. Amongst them, melanoma has the highest mortality rate, at 13% in western Europe [7, 14]. To lower the mortality rate of skin cancer it is important that it is detected at an early stage.

Amongst the possible approaches that could assist in the early detection of skin cancer, we will take a look at the use of variational autoencoders (VAE). This is an unsupervised model that creates a compressed representation from which it attempts to reconstruct the input image to the best of its ability. This compressed representation can then be used to cluster similar images, which could yield clusters of the different types of skin cancer. Furthermore, if shown to have potential, the approach could prove useful for other types of medical images, as there are often large amounts of unlabelled medical image data available: labelling all this data would require precious time from specialists.

In this thesis we compare several models and look at the effects on the reconstruction quality, cluster capability and log likelihood of the models. For these models the architecture, the loss function and the prior/posterior are varied. As architectures a fully convolutional neural network [9] and a residual neural network [4, 3] are considered. A mixture of discrete logistics [13] and cross entropy are used as loss functions. Finally, for the prior and posterior either two Gaussian or two Laplace distributions are used [2]. This makes eight different variations in total.

To start off we take a closer look at the different aspects of the VAE and establish its mathematical foundation. We also take a look at the different architectures and loss functions that can be used. After this we describe how the experiment is set up and evaluated. Finally the results are presented, followed by a conclusion.


2 Theoretical Framework

2.1 Variational Autoencoder

A Variational Autoencoder (VAE) is an extension of the Autoencoder (AE), which is an unsupervised learning model. An AE consists of an encoder, a bottleneck and a decoder, where the bottleneck is the smallest layer in the network. When given some input, the encoder reduces the size of the input to that of the bottleneck. The resulting vector is essentially a compressed representation of the given input. The decoder takes this vector and tries to reconstruct the given input. Hence, during training the model learns to encode and decode data in such a way that the least amount of information is lost. However, the compressed representation of an AE is just a vector of seemingly random numbers, as they have no meaning except telling the decoder how to reconstruct the input. This means that even if two vectors lie close to one another, nothing can be said about the similarity of what the two vectors represent.

This is where the VAE comes in, as it has some extensions that make such comparisons possible. What sets the VAE apart from the AE is that a structure is imposed on the data. This is done by letting the encoder generate two vectors of the same size in the bottleneck. These two vectors are then used as the parameters of a distribution, which is sampled to get a single vector for the decoder to reconstruct. By taking a sample from this distribution the imposed structure on the data becomes continuous (see Figure 1.b), which also causes the model to place similar inputs close to one another in this latent space. This allows the model to find clusters in unlabelled data, as can be seen in Figure 1.a.

(a) Latent vectors clusters shown by T-SNE (b) Latent vector space

Figure 1: To demonstrate what a VAE can do the MNIST dataset is used. (a) shows the clusters that are present in the latent vectors. (b) shows the latent space created by decoding a meshgrid.

The goal is to learn the true data generating distribution $p_\theta(x)$ shown below [5, 16]:

$$p_\theta\left(x^{(i)}\right) = \int p_\theta\left(x^{(i)} \mid z\right) p_\theta(z)\, dz \tag{1}$$

The optimal parameter $\theta^*$ is the one that maximises the probability of generating real data samples:

$$\theta^* = \arg\max_\theta \prod_{i=1}^{n} p_\theta\left(x^{(i)}\right) \tag{2}$$

With the use of log probabilities the product is written as a sum:

$$\theta^* = \arg\max_\theta \sum_{i=1}^{n} \log p_\theta\left(x^{(i)}\right) \tag{3}$$

However, it is not feasible to evaluate all possible values of $z$ and integrate over them to compute $p_\theta(x^{(i)})$. To make the search tractable a new approximation function $q_\phi(z \mid x)$ is introduced, which outputs values of $z$ that are likely given an input $x$. To place this in context: the conditional probability $p_\theta(x \mid z)$ defines a generative model, or probabilistic decoder, while $q_\phi(z \mid x)$, the approximation of $p_\theta(z \mid x)$, is the probabilistic encoder.

2.1.1 Loss Function

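For the model to work, the approximate posterior $q_\phi(z \mid x)$ should be very close to the real one, $p_\theta(z \mid x)$. To quantify the distance between these two distributions the Kullback–Leibler divergence (KLD) is used. The KLD is a measure of the information that is lost if one distribution is used to represent another. So to ensure that $q_\phi$ approximates $p_\theta$, the reverse KLD is minimised [5, 16]:

$$D_{KL}(q_\phi \,\|\, p_\theta) \tag{4}$$

$$D_{KL}(Q \,\|\, P) = \mathbb{E}_{z \sim Q(z)} \log \frac{Q(z)}{P(z)} \tag{5}$$

If expanded, the following derivation emerges:

\begin{align}
D_{KL}(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x))
&= \int q_\phi(z \mid x) \log \frac{q_\phi(z \mid x)}{p_\theta(z \mid x)}\, dz \tag{6} \\
&= \int q_\phi(z \mid x) \log \frac{q_\phi(z \mid x)\, p_\theta(x)}{p_\theta(z, x)}\, dz \tag{7} \\
&= \int q_\phi(z \mid x) \left( \log p_\theta(x) + \log \frac{q_\phi(z \mid x)}{p_\theta(z, x)} \right) dz \tag{8} \\
&= \log p_\theta(x) + \int q_\phi(z \mid x) \log \frac{q_\phi(z \mid x)}{p_\theta(z, x)}\, dz, \quad \text{note } \int q_\phi(z \mid x)\, dz = 1 \tag{9} \\
&= \log p_\theta(x) + \int q_\phi(z \mid x) \log \frac{q_\phi(z \mid x)}{p_\theta(x \mid z)\, p_\theta(z)}\, dz \tag{10} \\
&= \log p_\theta(x) + \mathbb{E}_{z \sim q_\phi(z \mid x)} \left[ \log \frac{q_\phi(z \mid x)}{p_\theta(z)} - \log p_\theta(x \mid z) \right] \tag{11} \\
&= \log p_\theta(x) + D_{KL}(q_\phi(z \mid x) \,\|\, p_\theta(z)) - \mathbb{E}_{z \sim q_\phi(z \mid x)} \log p_\theta(x \mid z) \tag{12}
\end{align}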


After some rearranging the result is:

$$\log p_\theta(x) - D_{KL}(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)) = \mathbb{E}_{z \sim q_\phi(z \mid x)} \log p_\theta(x \mid z) - D_{KL}(q_\phi(z \mid x) \,\|\, p_\theta(z)) \tag{13}$$

When learning the true distributions, the left-hand side of this equation is maximised: $\log p_\theta(x)$, the log likelihood of generating real data, should be maximised, and $D_{KL}(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x))$, the difference between the real and the approximated posterior distribution, should be minimised. The loss function is the negative of this quantity, so minimising the loss balances these two terms:

$$L_{VAE}(\theta, \phi) = -\log p_\theta(x) + D_{KL}(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)) \tag{14}$$

$$L_{VAE}(\theta, \phi) = -\mathbb{E}_{z \sim q_\phi(z \mid x)} \log p_\theta(x \mid z) + D_{KL}(q_\phi(z \mid x) \,\|\, p_\theta(z)) \tag{15}$$

In the end, both the definition of the loss function and the KLD between $q_\phi(z \mid x)$ and $p_\theta(z)$, which imposes a structure on the encoder, arise from the requirement that the approximated posterior $q_\phi(z \mid x)$ be very close to the real posterior $p_\theta(z \mid x)$.
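As a rough illustration of Equation (15), the sketch below evaluates the loss for a single input in plain Python. It assumes a diagonal Gaussian posterior and a standard normal prior, for which the KLD has a well-known closed form. The function names and the one-sample estimate of the reconstruction term are illustrative assumptions, not the implementation used in this thesis.

```python
import math


def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def negative_elbo(reconstruction_log_likelihood, mu, log_var):
    """Equation (15): -E[log p(x|z)] + KL(q(z|x) || p(z)).

    The expectation is replaced by the log likelihood of one sampled z
    (an assumption made here to keep the sketch short).
    """
    return -reconstruction_log_likelihood + gaussian_kl_to_standard_normal(mu, log_var)


# A posterior that equals the prior contributes no KL penalty,
# so the loss reduces to the reconstruction term.
example_loss = negative_elbo(-10.0, mu=[0.0, 0.0], log_var=[0.0, 0.0])
```

Moving the posterior mean away from zero, or its variance away from one, increases the KL term, which is how the prior imposes structure on the encoder.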

2.1.2 Distributions for Grey Scale Data

Simple candidates for the distribution in the reconstruction term $-\mathbb{E}_{z \sim q_\phi(z \mid x)} \log p_\theta(x \mid z)$ are the Bernoulli, Gaussian, Beta, Gamma and Laplace distributions. The performance of a distribution depends on the structure of the data and the characteristics of the distribution, as can be seen in Figure 2.

However, these distributions can only handle a single channel, which makes them unable to evaluate colour images, as these have 3 channels. This limits them to grey scale and binary images.


(a) Reconstruction after 1 epoch using a Gaussian distribution for the loss function
(b) Reconstruction after 1 epoch using a Laplace distribution for the loss function
(c) Reconstruction after 1 epoch using a Bernoulli distribution for the loss function

Figure 2: The MNIST dataset is used to demonstrate how the characteristics of different distributions affect the reconstruction. (a) tends to be fairly blurry, but the shapes match the original. (b) produces sharp reconstructions, but the shape is not always on point. (c) creates blurry reconstructions that have a very general shape.

2.1.3 Cross Entropy

The cross entropy loss function creates a categorical distribution for each colour channel of each pixel [12]. The colour values range from 0 to 255, so the total number of parameters for one pixel is 3 · 256 = 768. The loss value is then acquired by taking the negative log likelihood that x is a sample from these distributions. A downside of this loss function is that the different colour channels of a pixel, as well as the pixels themselves, have independent distributions. Furthermore, it has no notion of colour similarity: it treats the difference between blue and red the same as the difference between blue and dark blue. As a result it can produce reconstructions with scattered, brightly coloured pixels. This effect should lessen as the model learns and the loss value becomes lower.
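The lack of a notion of colour similarity can be made concrete with a small plain-Python sketch. It computes the cross entropy for one colour channel of one pixel from 256 logits; the logit values and helper names are made up for illustration.

```python
import math


def softmax(logits):
    """Turn 256 raw scores into a categorical distribution over intensities."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def channel_cross_entropy(logits, target):
    """Negative log probability assigned to the observed intensity (0..255)."""
    return -math.log(softmax(logits)[target])


# Hypothetical prediction: the model is fairly sure the intensity is 100.
logits = [0.0] * 256
logits[100] = 5.0

loss_exact = channel_cross_entropy(logits, 100)  # correct intensity
loss_near = channel_cross_entropy(logits, 101)   # off by one intensity level
loss_far = channel_cross_entropy(logits, 250)    # a completely different intensity
```

Here `loss_near` and `loss_far` are identical: being off by one intensity level is penalised exactly as much as being off by 150, because the categorical distribution treats the 256 values as unordered classes.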

2.1.4 Discretized Logistic Mixture Likelihood

Another loss function used for colour images is the discretized logistic mixture likelihood [12, 13]. This loss function also creates a distribution for each colour channel of a pixel, but does so in a more elaborate way: each distribution is a mixture of ten separate distributions and has built-in dependencies between the colours. To define these distributions 100 parameters are needed for each pixel.
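The total of 100 parameters can be accounted for as follows. The breakdown below follows the parameterisation described for PixelCNN++ [13] and is an assumption here, since the text above states only the total; the function name is hypothetical.

```python
def mixture_parameters_per_rgb_pixel(components=10):
    """Count the decoder outputs needed per RGB pixel (breakdown assumed from [13])."""
    mixture_weights = components      # one weight per mixture component
    means = 3 * components            # one mean per colour channel and component
    scales = 3 * components           # one scale per colour channel and component
    colour_coefficients = 3 * components  # alpha, beta and gamma per component
    return mixture_weights + means + scales + colour_coefficients


parameters = mixture_parameters_per_rgb_pixel(components=10)  # 10 + 30 + 30 + 30
```

Compared with the 768 parameters per pixel of the cross entropy loss, this is a considerably smaller decoder output.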


For a pixel $(r_{i,j}, g_{i,j}, b_{i,j})$ at location $(i, j)$ in the image, the distribution conditional on the context $C_{i,j}$ is obtained using the following formulas [13]:

\begin{align}
p(r_{i,j}, g_{i,j}, b_{i,j} \mid C_{i,j}) ={}& P(r_{i,j} \mid \mu_r(C_{i,j}), s_r(C_{i,j})) \times P(g_{i,j} \mid \mu_g(C_{i,j}, r_{i,j}), s_g(C_{i,j})) \nonumber \\
& \times P(b_{i,j} \mid \mu_b(C_{i,j}, r_{i,j}, g_{i,j}), s_b(C_{i,j})) \tag{16} \\
\mu_g(C_{i,j}, r_{i,j}) ={}& \mu_g(C_{i,j}) + \alpha(C_{i,j})\, r_{i,j} \tag{17} \\
\mu_b(C_{i,j}, r_{i,j}, g_{i,j}) ={}& \mu_b(C_{i,j}) + \beta(C_{i,j})\, r_{i,j} + \gamma(C_{i,j})\, g_{i,j} \tag{18}
\end{align}

with $\alpha$, $\beta$, $\gamma$ scalar coefficients depending on the mixture component and the previous pixels. As can be seen, the red channel is predicted first, which is then used to predict the green channel, and then both are used to predict the blue channel. Due to this built-in dependency the reconstructions are likely to have smooth colours, without sudden colourful spots.
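To make the factor $P(\cdot \mid \mu, s)$ in Equation (16) more tangible, the following plain-Python sketch evaluates a discretized logistic distribution for one colour channel. It assumes intensities are kept as integers 0..255 and that, as described in [13], the two edge bins absorb the tails of the logistic distribution. All function names are hypothetical.

```python
import math


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def discretized_logistic_pmf(value, mean, scale):
    """Probability of integer intensity `value` under a logistic(mean, scale).

    The mass of the bin [value - 0.5, value + 0.5] is a difference of two CDF
    values; the bins at 0 and 255 also receive the left and right tails.
    """
    upper = 1.0 if value == 255 else sigmoid((value + 0.5 - mean) / scale)
    lower = 0.0 if value == 0 else sigmoid((value - 0.5 - mean) / scale)
    return upper - lower


def mixture_pmf(value, weights, means, scales):
    """Weighted sum of discretized logistics; `weights` must sum to 1."""
    return sum(
        w * discretized_logistic_pmf(value, m, s)
        for w, m, s in zip(weights, means, scales)
    )


# Unlike the categorical case, nearby intensities get similar probabilities.
p_near = discretized_logistic_pmf(121, mean=120.0, scale=8.0)
p_far = discretized_logistic_pmf(250, mean=120.0, scale=8.0)
```

Because the probability mass decays smoothly around the mean, a prediction that is slightly off is penalised far less than one that is far off, which matches the smoother colours described above.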

2.2 Reparametrization Trick

The loss function requires samples to be generated from $z \sim q_\phi(z \mid x)$. This is a stochastic process, and therefore it is not possible to backpropagate the gradient through it. To work around this problem the reparametrization trick is introduced [8]. It is possible to express the random variable $z$ as a deterministic variable $z = T_\phi(x, \epsilon)$. Here $\epsilon$ is an independent random variable and $T_\phi$ converts $\epsilon$ to $z$.

$$z \sim q_\phi\left(z \mid x^{(i)}\right) = \mathcal{N}\left(z;\, \mu^{(i)}, \sigma^{2(i)} I\right) \tag{19}$$

$$z = \mu + \sigma \odot \epsilon, \quad \text{where } \epsilon \sim \mathcal{N}(0, I) \tag{20}$$

This trick works not only for the Gaussian distribution, but also for other distributions. Taking the Gaussian distribution as an example, the model becomes trainable by learning the mean and variance of the distribution, while the stochasticity remains in the random variable $\epsilon \sim \mathcal{N}(0, I)$.
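Equation (20) can be sketched without any deep learning library. The plain-Python example below draws reparametrized samples for a Gaussian and, as one possible counterpart, for a Laplace distribution via its inverse CDF. The Laplace transform shown here is a standard construction and an assumption about how such a posterior could be sampled; the thesis code may differ (for instance by using the implicit gradients of [2]).

```python
import math
import random


def sample_gaussian(mu, sigma, rng):
    """z = mu + sigma * eps with eps ~ N(0, 1), per latent dimension."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def sample_laplace(mu, scale, rng):
    """z = mu - b * sign(u) * ln(1 - 2|u|) with u ~ Uniform(-0.5, 0.5)."""
    sample = []
    for m, b in zip(mu, scale):
        u = rng.uniform(-0.5, 0.5)
        # Guard against log(0) in the (extremely unlikely) case |u| == 0.5.
        inner = max(1.0 - 2.0 * abs(u), 1e-12)
        sample.append(m - b * math.copysign(1.0, u) * math.log(inner))
    return sample


rng = random.Random(0)
z_gaussian = sample_gaussian([0.0, 1.0], [1.0, 0.5], rng)
z_laplace = sample_laplace([0.0, 1.0], [1.0, 0.5], rng)
```

In both functions the randomness enters only through `rng`, while the output is a deterministic, differentiable function of the distribution parameters. In an automatic differentiation framework this is what allows gradients to flow back to the encoder.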

2.3 Prior/Posterior

As a result of the reparametrization trick, a structure can be imposed on the data that depends on which distribution is used for $z \sim q_\phi(z \mid x)$, which is called the posterior distribution of the model. The prior distribution $p_\theta(z)$ of the model is predetermined. As a result of the term $D_{KL}(q_\phi(z \mid x) \,\|\, p_\theta(z))$ in the loss function, the distribution $q_\phi(z \mid x)$ is pushed towards $p_\theta(z)$, which allows control over what structure is imposed on the data and the model.
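For a Laplace posterior and prior the KLD term also has a closed form. The sketch below assumes a factorised Laplace posterior with location `mu` and scale `b` per latent dimension and a Laplace(0, 1) prior; the prior parameters are an assumption, as they are not stated in this section.

```python
import math


def laplace_kl_to_standard_laplace(mu, scale):
    """KL( Laplace(mu, b) || Laplace(0, 1) ), summed over latent dimensions."""
    return sum(
        abs(m) + b * math.exp(-abs(m) / b) - math.log(b) - 1.0
        for m, b in zip(mu, scale)
    )


kl_at_prior = laplace_kl_to_standard_laplace([0.0], [1.0])  # posterior equals prior
kl_shifted = laplace_kl_to_standard_laplace([1.0], [1.0])   # posterior moved away
```

The penalty is zero when the posterior matches the prior and grows roughly linearly in the distance of `mu` from zero, whereas the Gaussian case grows quadratically. This is one way in which the choice of prior/posterior changes the structure imposed on the latent space.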

2.4 Disentangled Variational Autoencoder

A representation is called disentangled if each variable in $z$ reacts to only one factor and is invariant to the other factors [5]. An example is an experiment in which a white dot is placed on a black background at a certain position, scale and rotation. Here the disentangled representation was able to learn to use one variable for the x position, one for the y position, one for scale and two for rotation, leaving the other variables untouched. An entangled representation, by contrast, uses all variables, as it spreads the factors over all of them [5].

The β-VAE model is a modification of the VAE which encourages the model to discover disentangled latent factors. It adjusts the formula of the loss function in the following way:

$$L_{VAE}(\theta, \phi) = -\mathbb{E}_{z \sim q_\phi(z \mid x)} \log p_\theta(x \mid z) + \beta\, D_{KL}(q_\phi(z \mid x) \,\|\, p_\theta(z)) \tag{21}$$

If $\beta = 1$ it is the same as a VAE. When $\beta > 1$ the priority of finding a disentangled latent representation increases, as $D_{KL}(q_\phi(z \mid x) \,\|\, p_\theta(z))$ becomes a larger part of the loss function. If $\beta < 1$, the term $-\mathbb{E}_{z \sim q_\phi(z \mid x)} \log p_\theta(x \mid z)$ becomes a larger part of the loss function as the other part becomes smaller. However, the effect of $\beta$ differs depending on the type of data. Hence $\beta$ may create a trade-off between the extent of disentanglement and the reconstruction quality.

2.5 Architectures

A VAE's architecture is essentially a deep neural network that is set up in a specific manner. It is shaped like an hourglass: as the input advances through the encoder, the layers get increasingly smaller. This continues until the bottleneck is reached, from where the representation is fed to the decoder, where the reverse happens. In general the decoder mirrors the encoder in every way; an exception is when a loss function has certain requirements for the dimensions of the output. As a result of such an architecture the model is forced to create a compressed representation of the input.

2.5.1 Linear

A network using only linear layers is the simplest type of network. These layers are fully connected and as a result all spatial information is lost. For simple images such as the MNIST images this is not an issue and the network is able to reconstruct the images. However, if the images get more complex and the spatial information becomes more important, linear architectures start to struggle to produce good reconstructions.

2.5.2 Fully Convolutional

A convolutional neural network (CNN) is an architecture that is commonly used for image processing. It retains spatial information and is able to extract features like edges, corners and textures. This enables the network to create complex intermediate representations of the input. For this reason the CNN has seen huge successes in image recognition competitions [9]. Unlike a regular CNN, a fully convolutional network (FCN) contains no fully connected layers and uses only the kernel size, stride and padding parameters to control the sizes of its layers. The FCN has shown great progress on semantic segmentation tasks, even exceeding the previous state-of-the-art techniques [11].
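How kernel size, stride and padding control the layer sizes can be sketched with the standard output-size formulas for a convolution and a transposed convolution. The kernel size of 4, stride of 2 and padding of 1 used below are hypothetical values chosen for illustration, not the settings of the models in this thesis.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def transposed_conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a transposed convolution (no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


# Encoder side: repeatedly halve a 64x64 input down to 1x1.
encoder_sizes = [64]
while encoder_sizes[-1] > 1:
    encoder_sizes.append(conv_output_size(encoder_sizes[-1], kernel=4, stride=2, padding=1))

# Decoder side: the mirrored transposed convolution doubles the size again.
restored = transposed_conv_output_size(32, kernel=4, stride=2, padding=1)
```

With these example settings the encoder passes through the sizes 64, 32, 16, 8, 4, 2 and 1, so a fully convolutional encoder can reach a bottleneck without any flattening or linear layer.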


2.5.3 Residual Neural Network

Another promising architecture is the residual neural network (ResNet). While it is fairly similar to a CNN, it also uses skip connections: connections that skip several layers before being added back to the main path. As a result, information that might have been lost through the convolutions can be reintroduced. This benefits training and is shown to be especially effective in very deep networks [3].
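The idea of a skip connection can be reduced to a few lines of plain Python. In this sketch `transform` stands in for the skipped layers and feature maps are simplified to flat lists of numbers; both are simplifications made for illustration.

```python
def residual_block(features, transform):
    """Return features + transform(features): the input skips past the layers."""
    transformed = transform(features)
    return [x + fx for x, fx in zip(features, transformed)]


def lossy_layers(features):
    """Hypothetical stack of layers that discards everything (outputs zeros)."""
    return [0.0 for _ in features]


# Even when the skipped layers lose all information, the input survives.
output = residual_block([0.2, -1.5, 3.0], lossy_layers)
```

Because the block only has to learn a correction on top of the identity, stacking many such blocks does not force every layer to preserve the full signal on its own.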

2.5.4 Combined Architectures

A CNN or ResNet can also be combined with linear layers. In such a setup the image first goes through several layers of the CNN or ResNet before being flattened for the linear layers. As a result the linear layers get complex intermediate representations as input instead of the raw image. However, the spatial information is lost due to the use of linear layers. Nonetheless, such a network handles images better than a purely linear network.

3 Method

3.1 Dataset

The images are acquired from the HAM10000 dataset [15]. Each image label was determined either histopathologically or diagnostically. The dataset contains a total of 10015 images; see Table 1 for the distribution.

Diagnostic category                                        Number of images
Actinic keratoses and intraepithelial carcinoma (AKIEC)    327
Basal cell carcinoma (BCC)                                 514
Benign keratosis-like lesions (BKL)                        1099
Dermatofibroma (DF)                                        115
Melanoma (MEL)                                             1113
Melanocytic nevi (NV)                                      6705
Vascular lesions (VASC)                                    142

Table 1: The number of images of each diagnostic category.

The data is split in a 9:1 ratio into a training and a test set. To reduce the imbalance in the training set, the number of training images per category is capped at the size of the BCC category, which is 463 images [10] (see Table 2 below).


Type of skin cancer    Training set    Test set
AKIEC                  294             33
BCC                    463             51
BKL                    463             110
DF                     103             12
MEL                    463             111
NV                     463             670
VASC                   128             14

Table 2: The number of images in the training and test set after the split and reducing the imbalance of the data.
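The counts in Table 2 can be reproduced from Table 1 with a few lines of Python. The rounding rule is an assumption: taking the test size as Python's `round(n / 10)`, which rounds halves to the nearest even number, happens to match every row, but the actual split procedure is not described in the text.

```python
CATEGORY_COUNTS = {
    "AKIEC": 327, "BCC": 514, "BKL": 1099, "DF": 115,
    "MEL": 1113, "NV": 6705, "VASC": 142,
}


def split_and_cap(counts, cap_category="BCC"):
    """9:1 split per category, then cap training sizes at the cap category's size."""
    test = {name: round(n / 10) for name, n in counts.items()}  # assumed rounding
    train_full = {name: n - test[name] for name, n in counts.items()}
    cap = train_full[cap_category]
    train = {name: min(n, cap) for name, n in train_full.items()}
    return train, test


train_counts, test_counts = split_and_cap(CATEGORY_COUNTS)
```

Note that only the training set is capped; the test set keeps the original imbalance, with 670 of its images belonging to the NV category.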

3.2 Image Preprocessing

The original images have a size of 450x600. They were cropped in the middle to a size of 300x300 and then resized to 64x64. This is done to reduce the training time required for each model. As data augmentation the images are flipped both horizontally and vertically, and small variations in saturation and brightness are applied, to account for different lighting conditions and to make every image slightly different, which encourages the model to generalise better.
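The centre crop can be expressed as a small coordinate calculation. The helper below is a hypothetical sketch that only computes the crop box; it assumes the sizes are given as height x width and leaves the actual image loading and resizing to an image library.

```python
def centre_crop_box(height, width, crop):
    """Return (top, left, bottom, right) of a centred crop x crop square."""
    top = (height - crop) // 2
    left = (width - crop) // 2
    return top, left, top + crop, left + crop


box = centre_crop_box(height=450, width=600, crop=300)
downscale_factor = 300 / 64  # each output pixel covers roughly 4.7 x 4.7 input pixels
```

For a 450x600 image this removes 75 pixels from the top and bottom and 150 pixels from the left and right before the image is downscaled.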

3.3 Implementation

For the experiment each of the models listed in Table 3 was trained for 100 epochs using the Adam optimiser with a learning rate of 0.001. During the first 33 epochs the β term was gradually increased from 0 to 1, allowing the model to first focus on the reconstruction while slowly introducing more structure [1]. Both architectures consist of 22 layers in total, 11 in the encoder and 11 in the decoder; the FCN ResNet additionally has 3 skip connections in both the encoder and the decoder. The bottleneck has a size of 64.

Architecture    Prior/Posterior    Loss function
FCN             Gaussian           Cross Entropy
FCN             Gaussian           Mixture
FCN             Laplace            Cross Entropy
FCN             Laplace            Mixture
FCN ResNet      Gaussian           Cross Entropy
FCN ResNet      Gaussian           Mixture
FCN ResNet      Laplace            Cross Entropy
FCN ResNet      Laplace            Mixture

Table 3: The eight different models that are used for the experiment.

The experiments were developed using PyTorch 1.0.1 for training the models and scikit-learn 0.20.3 for evaluating the clusters using t-SNE, both run on Python 3.7.3.
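The gradual increase of the β term during the first 33 epochs can be sketched as a simple schedule. A linear ramp is assumed here; the text only states that β was increased gradually from 0 to 1, so the exact shape of the schedule and the function names are assumptions.

```python
def beta_for_epoch(epoch, warmup_epochs=33):
    """KL weight: 0 at epoch 0, rising linearly to 1 at `warmup_epochs`."""
    return min(1.0, epoch / warmup_epochs)


def weighted_loss(reconstruction_loss, kld, epoch):
    """Loss of Equation (21) with the epoch-dependent beta."""
    return reconstruction_loss + beta_for_epoch(epoch) * kld


# With identical loss components, the total loss rises during the warm-up.
early = weighted_loss(reconstruction_loss=100.0, kld=20.0, epoch=0)
late = weighted_loss(reconstruction_loss=100.0, kld=20.0, epoch=33)
```

Such a schedule means that the reported training loss can go up during the warm-up even if neither the reconstruction term nor the KLD gets worse, simply because the KLD is weighted more heavily.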


3.4 Evaluation

The results were evaluated on the training and test losses, analysing both the reconstruction term of the loss function and the KLD values during the training process. Additionally, the possible emergence of meaningful clusters is checked. Lastly, the quality of the reconstructions is inspected.

4 Results

(a) Training losses (b) Test losses

(c) The KLD values (d) The discretized mixture values

Figure 3: The graphs above show the training loss (a), the test loss (b) and the two components of the loss function, namely the KLD (c) and the discretized mixture values (d), over the course of 100 epochs. All of the models shown use the same loss function, namely the mixture loss.


(a) Training losses (b) Test losses

(c) The KLD values (d) The cross entropy values

Figure 4: The graphs above show the training loss (a), the test loss (b) and the two components of the loss function, namely the KLD (c) and the cross entropy values (d), over the course of 100 epochs. All of the models shown use the same loss function, namely cross entropy.


(a) FCN Mixture Gaussian (b) FCN Mixture Laplace

(c) FCN Cross Entropy Gaussian (d) FCN Cross Entropy Laplace

Figure 5: The graphs above show the 2D visualisation of the compressed representations of the corresponding models. Here each colour represents a different diagnostic category.


(a) FCN ResNet Mixture Gaussian (b) FCN ResNet Mixture Laplace

(c) FCN ResNet Cross Entropy Gaussian (d) FCN ResNet Cross Entropy Laplace

Figure 6: The graphs above show the 2D visualisation of the compressed representations of the corresponding models. Here each colour represents a different diagnostic category.


(a) FCN Mixture Gaussian (b) FCN Mixture Laplace

(c) FCN ResNet Mixture Gaussian (d) FCN ResNet Mixture Laplace

(e) FCN Cross Entropy Gaussian (f) FCN Cross Entropy Laplace

(g) FCN ResNet Cross Entropy Gaussian (h) FCN ResNet Cross Entropy Laplace

Figure 7: The graphs above show the reconstructions made by each model.

5 Discussion

In Figure 3 it can be seen that all models perform roughly equally. Note that after epoch 20 the training and test losses flatten, which means that training slows down significantly. This can be seen in (c), where the KLD fluctuates a lot but barely decreases. This is likely because the model struggles to impose a structure on the data. In (d) it can be seen that the reconstruction loss also flattens out at around 30 epochs. This is probably due to a lack of structure in the data, as the compressed representation does not contain enough specific information to make a proper reconstruction.

Figure 4 shows a distinction between the Gaussian and the Laplace prior/posterior: (a) and (b) both show that the models with a Gaussian prior/posterior perform slightly better, regardless of the architecture. Another interesting aspect is the increase of the training loss. As the increase only seems to last until around epoch 33, it is likely due to the increase of the β term in the KLD loss: the reconstruction loss does not change much, so the training loss goes up before levelling out.

Figures 5 and 6 show the 2D visualisations of the compressed representations. As one can see, the data points are not clustered in any meaningful way; the points are scattered and the categories are mixed up with each other.

Figure 7 shows the reconstructions made by each model. Note the distinct difference between the mixture and the cross entropy loss. This is because the cross entropy loss has no dependency between the colour channels of a pixel, while the mixture loss does have a built-in dependency. While all models learn the general case, skin colour with a darker spot in the middle, none seem to be able to capture more detail than that. As can be clearly seen in (a), the models do match the skin colour and make small variations in the size and saturation of the dark spot in the middle. However, the size of the dark spot in the reconstruction does not follow the data well.

6 Conclusion

None of the models performs at a level that would be useful. This is due to a lack of detail in the reconstructions: if the reconstructions are this similar, the compressed representations are similar as well, which in turn makes it impossible to find meaningful clusters within these representations. Regarding the quality of the reconstructions, the mixture loss creates more realistic reconstructions. Further research is required to let the VAE better capture the details of the images.

7 Future work

It might be interesting to further explore a fully grey scale approach, as it seemed to have potential during the exploration and testing stage of this thesis. This might be because using grey scale enables the model to extract the details and the structure of the image more easily. An example of a reconstruction made using grey scale can be seen in the appendix.

References

[1] Christopher P. Burgess et al. “Understanding disentangling in β-VAE”. In: arXiv e-prints (Apr. 2018). arXiv: 1804.03599 [stat.ML].

[2] Michael Figurnov, Shakir Mohamed, and Andriy Mnih. “Implicit Reparameterization Gradients”. In: CoRR abs/1805.08498 (2018). arXiv: 1805.08498. url: http://arxiv.org/abs/1805.08498.

[3] Kaiming He et al. “Deep Residual Learning for Image Recognition”. In: CoRR abs/1512.03385 (2015). arXiv: 1512.03385. url: http://arxiv.org/abs/1512.03385.

[4] Kaiming He et al. “Identity Mappings in Deep Residual Networks”. In: CoRR abs/1603.05027 (2016). arXiv: 1603.05027. url: http://arxiv.org/abs/1603.05027.

[5] Irina Higgins et al. “Early Visual Concept Learning with Unsupervised Deep Learning”. In: arXiv e-prints (June 2016). arXiv: 1606.05579 [stat.ML].

[6] C. Holterhues et al. “Burden of disease due to cutaneous melanoma has increased in the Netherlands since 1991”. In: British Journal of Dermatology 169.2 (2013), pp. 389–397. doi: 10.1111/bjd.12346. eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1111/bjd.12346. url: https://onlinelibrary.wiley.com/doi/abs/10.1111/bjd.12346.

[7] C. Karimkhani et al. “The global burden of melanoma: results from the Global Burden of Disease Study 2015”. In: British Journal of Dermatology 177.1 (2017), pp. 134–140. doi: 10.1111/bjd.15510. eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1111/bjd.15510. url: https://onlinelibrary.wiley.com/doi/abs/10.1111/bjd.15510.

[8] Diederik P Kingma and Max Welling. “Auto-Encoding Variational Bayes”. In: arXiv e-prints (Dec. 2013). arXiv: 1312.6114 [stat.ML].

[9] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. “ImageNet Classification with Deep Convolutional Neural Networks”. In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1097–1105. url: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf.

[10] Yeong Chan Lee, Sang-Hyuk Jung, and Hong-Hee Won. “WonDerM: Skin Lesion Classification with Fine-tuned Neural Networks”. In: CoRR abs/1808.03426 (2018). arXiv: 1808.03426. url: http://arxiv.org/abs/1808.03426.

[11] Jonathan Long, Evan Shelhamer, and Trevor Darrell. “Fully Convolutional Networks for Semantic Segmentation”. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). June 2015.

[12] Aäron van den Oord et al. “Conditional Image Generation with PixelCNN Decoders”. In: CoRR abs/1606.05328 (2016). arXiv: 1606.05328. url: http://arxiv.org/abs/1606.05328.

[13] Tim Salimans et al. “PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications”. In: CoRR abs/1701.05517 (2017). arXiv: 1701.05517. url: http://arxiv.org/abs/1701.05517.

[14] Rebecca Siegel, Deepa Naishadham, and Ahmedin Jemal. “Cancer statistics, 2012”. In: CA: A Cancer Journal for Clinicians 62.1 (2012), pp. 10–29. doi: 10.3322/caac.20138. eprint: https://onlinelibrary.wiley.com/doi/pdf/10.3322/caac.20138. url: https://onlinelibrary.wiley.com/doi/abs/10.3322/caac.20138.

[15] Philipp Tschandl. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Version DRAFT VERSION. 2018. doi: 10.7910/DVN/DBW86T. url: https://doi.org/10.7910/DVN/DBW86T.

[16] Lilian Weng. From Autoencoder to Beta-VAE. Aug. 2018. url: https://lilianweng.github.io/lil-log/2018/08/12/from-autoencoder-to-beta-vae.html.


Appendix

Figure 8: An example of a reconstruction using grey scale after training for 19 epochs.

For everyone who is interested in unsupervised learning for medical images, the code is open source and available at: https://github.com/Ryomara/VAE_Dermatoscopic_Images
