Content-based image retrieval in dermatology

Sander Land

Master Thesis in Computing Science
June 30, 2009

Supervisor: Nicolai Petkov
Institute of Mathematics and Computing Science
P.O. Box 407, 9700 AK Groningen, The Netherlands

Summary

Ever since digital imaging became available, researchers have envisioned expert systems with many thousands of images that would help dermatologists make diagnoses, increasing their accuracy and improving patient care. Nowadays such databases are available, but most research so far has focused only on a very small subset of these images: close-ups of brown lesions which are possibly malignant melanomas. This thesis instead explores mostly other types of images from dermatology, using images provided by the University Medical Center Groningen. Using and extending techniques from image processing and machine learning, a content-based image retrieval system was created that can effectively retrieve lesions that are similar to those in a query image.

Features related to the color and boundary sharpness of a lesion were found to perform best, while features related to fractalness are ineffective. Using the best performing features, the system can on average retrieve three images with the correct diagnosis in the first ten results, and can be used to give a correct automated diagnosis for about half of these new queries. It is concluded that automated CBIR systems can help to make better use of large dermatological databases and effectively assist in making dermatological diagnoses.


Contents

1 Introduction
  1.1 Related work
  1.2 Dataset descriptions
    1.2.1 Dataset I: 205 images
    1.2.2 Dataset II: 584 images

2 Image segmentation
  2.1 User input
  2.2 Preprocessing
    2.2.1 Background correction
    2.2.2 Smoothing
  2.3 Segmentation
    2.3.1 Thresholding
    2.3.2 K-means clustering
  2.4 Postprocessing
    2.4.1 Identifying the lesion segment
    2.4.2 Smoothing the boundary
  2.5 Results

3 Feature extraction and classification
  3.1 LVQ classifiers
    3.1.1 LVQ
    3.1.2 GLVQ
    3.1.3 GMLVQ
    3.1.4 Learning schedules
  3.2 Color
    3.2.1 Feature extraction
    3.2.2 Classification and feature selection
    3.2.3 Cross-validation
    3.2.4 Conclusions
  3.3 Boundary sharpness
    3.3.1 Normalizing for color differences
    3.3.2 Gradient normal to the boundary
    3.3.3 Feature extraction
    3.3.4 Results
    3.3.5 Conclusions
  3.4 Boundary fractalness
    3.4.1 Snakes
    3.4.2 RATS Segmentation
    3.4.3 Feature extraction
    3.4.4 Results
    3.4.5 Conclusions

4 Building the CBIR system
  4.1 Graphic user interface
  4.2 Dataset II tests
    4.2.1 Results using previous training
    4.2.2 Retraining

5 Conclusions and future work

A CBIR results


Chapter 1

Introduction

Ever since digital imaging became available, researchers have envisioned expert systems with many thousands of images that would help dermatologists make diagnoses, increasing their accuracy and improving patient care [MSL+89]. Surveys show [SM92, SMSA03] that diagnostic sensitivity for the average unaided dermatologist lies anywhere between 66% and 81%, so there is certainly room for improvement here.

Nowadays such databases are available, but most research so far has focused only on a very small subset of these images: close-ups of brown lesions which are possibly malignant melanomas. This thesis, on the other hand, explores mostly other types of images from dermatology.

The department of dermatology at the University Medical Center Groningen (UMCG) has a rapidly growing database, currently consisting of nearly 50,000 images taken under controlled conditions, which will be used for this purpose. These images are labeled only with a diagnosis, and cover a wide range of possible skin conditions.

In this thesis the previous work done with this data set (using color features) is extended, features for boundary sharpness are introduced, and some other possibly important properties of the lesions are investigated, with the main goal of creating a content-based image retrieval system which can help dermatologists make diagnoses. Since the database is quite large, the main focus is on keeping the amount of user input required to a minimum, by automatically extracting features to determine important properties of the lesions.

The next section describes some previous work on image processing in dermatology and content-based image retrieval systems in general. In chapter 2, the amount of user interaction with the system is considered, and segmentation methods for the images are described. In chapter 3, the main work on feature extraction and classification is explained, including color features, boundary sharpness features and boundary fractalness features. Next, in chapter 4, the knowledge derived in the previous chapters is used to create and test a content-based image retrieval system. Finally, in chapter 5 some conclusions are drawn and possible directions for future work are discussed.


1.1 Related work

There is a rich history of using digital imaging for dermatological purposes, which started very soon after the technology became available. But, as mentioned in the introduction, most of the previous work has focused on detecting malignant melanoma. Since many of the techniques will no doubt be useful in our research, we will start by giving an overview of this kind of work.

This overview is not meant to be complete, but to give a general idea of what kind of work is typical, to explore the relevant techniques, and to consider to what extent previous work can help us. Further previous work can be found in literature overviews such as [SM92, Ken95, BZA08].

Some of the earliest work which moved beyond the basics of processing the images and towards detecting specific properties of lesions was done by Moss et al. [MSL+89] in the late eighties, who describe a system that detects some properties of a dark skin lesion. They focus on semitranslucency and tumor border, using features based on the Fourier spectrum of the image. Although these extracted features are not yet used for anything, they describe as their 'ultimate goal' an automated diagnostic assistant with a large database of images which is capable of suggesting possible diagnoses and showing similar images to help a physician make a diagnosis.

Claridge et al. [CHKA92] describe an automated method of extracting the irregularity in shape or outline of a mole. They distinguish regular and irregular shapes with regular or irregular borders. They use a feature equivalent to the well-known shape factor A/I^2 (where A and I are the area and perimeter of the shape, respectively) for the shape irregularity, and an estimate of the fractal dimension for the border irregularity (described in more detail in section 3.4). They also compare their results to actual dermatologists diagnosing the lesion as benign or malignant from looking only at the lesion silhouettes.

Their method has a specificity of 69% and a sensitivity of 91% compared to respectively 89% and 48% for the average dermatologist. As the sensitivity (also known as recall rate) describes the fraction of true positives detected out of all existing positives, and the specificity describes the fraction of true negatives out of all existing negatives, this shows that a dermatologist is much less likely to detect a malignant melanoma by its outline alone, but the computer program also detects many more benign lesions as being malignant.
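As an illustration, this shape factor is easy to compute from a binary lesion mask. The following is a minimal sketch using scikit-image, under the assumption of a boolean `mask` array; it is not the implementation of [CHKA92]:

    from skimage import measure

    def shape_factor(mask):
        # Shape factor A / I^2, with A the area and I the perimeter of the
        # largest region; maximal (1 / (4*pi), about 0.0796) for a circle,
        # lower for irregular shapes.
        region = measure.regionprops(measure.label(mask.astype(int)))[0]
        return region.area / region.perimeter ** 2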

Stoecker et al. [SM92] give an overview of the surge in digital imaging in dermatology that was happening around that time, and describe the potential benefits. Most of the activity seems to revolve around detecting malignant melanoma, and the images involved are almost exclusively of fairly small and clearly defined lesions taken under controlled circumstances (often even using microscopy or dermatoscopy). They conclude that in 1992, no one really knew whether or not diagnoses could be automatically acquired from images, and note that any approach that mimics dermatologists is likely to have severe limitations, as dermatologists are around 75% accurate in their diagnoses and often unable to explain how they come to a diagnosis.


By the mid-nineties, more digital imaging systems were starting to become available in dermatology.

Green et al. [GMP+94] describe development of a hand-held device for diagnosis of melanoma based on their earlier prototype [GMM+91]. They reach an accuracy of up to 89% using features like lesion size, color and gradient at the lesion boundary. However, they use only 164 out of their already limited data set of 204 images, because their segmentation and feature extraction algorithms are unable to process the remaining images. They also compare their method to diagnosis by clinical ratings alone. Even though their automated methods are quite limited, they still outperform the manual feature extraction (89% vs. 83%) in a binary “melanoma/not melanoma” classification, although this difference is purely due to the clinical assessment giving 10 more false positives (out of 146 non-melanoma) and the methods have the same rate of the more dangerous false negatives (two false negatives out of eighteen melanomas).

Kenet [Ken95] describes the rise of digital imaging in dermatology around this time, and the advantages this has brought. He also gives an overview of some of the many digital imaging systems that have been developed over the years. Most of these involve processing pigmented skin lesions and detecting which are melanomas, usually using the well-known 'ABCD' rule [NSM+94]. What the ABCD features include varies between different sources, but the most common meanings are:

A Asymmetry. A melanoma is more likely to have one half of the lesion unlike the other.

B Border. A melanoma is more likely to have a very abrupt and/or irregular border.

C Color. Melanomas often have variations in color. The number of distinct colors in a lesion (from light-brown, dark-brown, blue-gray, black, white and red) determines how likely it is that the lesion is a melanoma.

D Diameter. The diameter of a melanoma is usually at least 6 mm. Can also be used to mean "differential structure", which refers to the existence of globules, dots, streaks, etc.

(E) Sometimes included as well, to mean either elevation of the lesion, or the evolution of the lesion over time, both of which are important characteristics used by dermatologists. Both are rarely mentioned in the context of digital imaging systems, as both are difficult or even impossible to detect from images, depending on the system used.

Automatic identification of melanoma further improved over the following years, leading to several commercial systems. One of these is tested by Jamora et al. [JWMB03].

The system they tested uses image features inspired by the ABCD criteria, combined with a database with expert diagnoses. It only accepts dermatoscopic images and requires daily calibration for color and shading corrections. The computer system recommended 52 extra biopsies on lesions that were dismissed by a dermatologist, and nine of these were true positives. They conclude that computerized analysis can especially help to detect clinically unsuspicious but potentially dangerous lesions that might otherwise have been neglected.

Research also continues on improving techniques for determining the ABCD features, as well as on finding new relevant properties for diagnosing melanoma. Examples include Stanley et al. [SMSA03], who focus on determining lesion color, and Ng et al. [NFL05], who focus on lesion asymmetry instead, using automated segmentation and hair removal.

Maglogiannis et al. [MZK06] use a variety of features to determine several important properties of the lesions. They find that local thresholding on grey-scale images is most effective for segmentation, and also compare it to global thresholding and region growing. Using several color, border and texture features they cover the ABCD criteria multiple times over, and train support vector machines which are nearly 90% accurate in classifying melanomas.

Some research focuses on features other than the 'ABCD' ones. Following recent trends in dermatology, Serrano and Acha [SA09] look at global texture like globular and reticular patterns. Using a method based on Markov random fields, they are able to recognize these texture patterns with 86% accuracy, but do not extend their method to yield an automated diagnosis.

The focus on malignant melanoma also means the variation in the images is quite small in most of these papers. Many of them only include images which contain a single brown lesion in the center of the image surrounded by healthy skin. Some papers have slightly more variation, including for example some bluish lesions, but even in these cases the images are quite similar to each other.

The image database we are working with, on the other hand, has tremendous variation, which means that many of the features used in previous work cannot be determined for all images. The chapters on segmentation and feature extraction cover these problems in more detail.

As the goal is to create a content-based image retrieval system, it is also interesting to consider previous work in medical content-based image retrieval systems.

An overview of content-based image retrieval is given by Tagare et al. [TJD97], who give useful guidelines for designing medical content-based image retrieval systems. They start by stressing the difficulty of determining relevant image features, considering the fact that actual diagnoses are often "greater than, and not merely a result of, an assemblage of small decisions about the existence of particular elemental features".

Text-based approaches in which images are tagged with keywords are insufficient because there are often many possible synonyms in descriptions and much of the image content is ignored. Medical CBIR systems also require greater understanding of the image content, so as to achieve a higher retrieval rate, compared to what is needed for browsing stock photo databases. To achieve this goal, more user interaction in entering data and queries is often acceptable. Also, as more information becomes available as to what features in images are important and how experts make diagnoses, medical CBIR systems should evolve to take advantage of this new information, and should be designed to handle such changes.

An earlier CBIR system for dermatology was developed by Nadia Trojan [Tro04], who describes a system which can handle more lesions than just the small brown ones. However, she uses features that describe the global texture of the image (including background, nails, etc.), and thus with very low image understanding, and no measure of the performance of the system is given.

There has also been some previous work on the data set used in this thesis [BPB+09, BPJ09], in which the use of various color features was investigated. This work is described in more detail in the next section and in section 3.2.

1.2 Dataset descriptions

1.2.1 Dataset I: 205 images

The main dataset this thesis is concerned with consists of 205 photographs of lesions taken under controlled illumination conditions, labeled with diagnoses. These images are part of a rapidly growing database of nearly 50,000 images from the Department of Dermatology at the University Medical Center in Groningen.

This dataset originally consisted of 211 images, but six were removed over privacy concerns as they showed recognizable faces of patients. The images span a variety of diagnoses, with the number of images per distinct skin condition varying from only a single one to several dozen. The controlled illumination conditions involve a specialized room in which every image was taken. However, there was no control for zooming, and the illumination can still vary depending on how the patient is oriented with regard to the light source. Some images from this data set can be seen in figures 2.1, 2.3 and 2.4.

The earliest work on this dataset was done by Bosman et al. [BPJ09], who annotated the dataset for color (categorized as red, white, blue or brown), then manually selected regions for healthy and lesion skin. Using these regions they extracted mean color data of the lesion and healthy skin in various color spaces, and compared performance using a simple nearest neighbour retrieval. They concluded that the color difference between lesion and healthy skin is especially important.

Bunte et al. [BPB+09] extended this work. Taking the mean color data in various color spaces extracted in [BPJ09], they used the more advanced GMLVQ classifiers to find linear combinations of the features more effective than simply the difference between healthy and lesion skin. This resulted in significant improvements in classification performance.

Chapters 2 and 3 are mainly concerned with this dataset; the one described below is introduced later on.

1.2.2 Dataset II: 584 images

A new dataset was recently selected from the same database, in response to concerns about limited data and the varying number of images per diagnosis. This dataset contains 584 images from 11 different dermatological conditions, with varying numbers of images in each group. Only some of these diagnoses are also present in the other dataset.

Table 1.1 shows more details about the dataset. As can be seen from the descriptions as well as from the images, the diagnostic categories can be quite vague, and often include widely varying lesions. Section 4.2 describes tests done on this dataset, and some example images can be seen in figure 4.1.


Actinische keratose (Actinic keratosis), 51 images: Thick, scaly, or crusty bumps caused by abnormal skin cell development due to exposure to ultraviolet radiation.

Basaalcelcarcinoom (Basal cell carcinoma), 63 images: Most common form of skin cancer, and rarely lethal. Varies in form and color, but often appears as a red, scaly, localized patch.

Eczeem (Eczema), 103 images: Broad term for a range of persistent skin conditions, with different causes and symptoms. Often involves a rash, dryness or redness of the skin.

Lentigo (Lentigo), 34 images: A small brown or black pigmented spot on the skin with a clearly defined edge.

Melanoom (Melanoma), 58 images: Malignant tumor due to uncontrolled growth of pigment cells, which causes the majority of skin cancer related deaths.

Mycose (Tinea capitis) (Mycosis, Tinea capitis), 31 images: A mycosis (parasitic infection by a fungus) of the scalp. Can be caused by a variety of fungi, with different symptoms including bald patches, inflammation, scaling and itching.

Naevus naevocellularis (Nevocellular naevus), 52 images: A very common and benign type of naevus (also known as birthmark or mole).

Plaveiselcelcarcinoom (Squamous cell carcinoma), 58 images: Common type of malignant tumor of the epidermis, with highly variable appearance including scaly or crusted lumps.

Psoriasis pustulosa (Pustular psoriasis), 41 images: A rare form of psoriasis, which involves widespread pustules on red skin.

Psoriasis vulgaris (Plaque psoriasis), 52 images: A very common form of psoriasis, which appears as plaques (large flat areas of inflamed skin) covered with silvery scaly skin.

Verruca seborrhoica (Seborrheic verruca), 41 images: A harmless skin growth, oval or round in shape, and tan to brown in color.

Table 1.1: Description of the new dataset, listing each diagnosis in Dutch and English, the number of images with that diagnosis, and a description based on [New09].


Chapter 2

Image segmentation

As the goal is to extract information about the healthy skin, the lesion and lesion boundary, a way to separate the lesion from the healthy skin is needed. Furthermore, distracting elements like a background screen, clothes, skin moles, etc. are often present in the images, which also need to be removed as much as possible, further complicating the segmentation process. This chapter describes the segmentation process used.

2.1 User input

First, we need to consider what type of user input the method will be based on. This can range from completely manual segmentation [SMSA03], which is cumbersome and simply impractical for the nearly 50,000 images involved in this project, to fully automatic segmentation, which is very difficult considering the variety and quality of the images involved (see for example figure 2.1(a)). However, for some other datasets where all images involved are close-ups of single lesions (as in much of the work mentioned in the previous chapter), fully automatic segmentation is certainly possible.

There are also possibilities in between these two methods. For example, selecting a healthy and a lesion region, which was done in previous work on this dataset [BPB+09, BPJ09]. One drawback of this option is that this will not give (part of) the lesion boundary (see for example figure 2.1(b)), and the regions are sometimes chosen in a way that would make determining part of the boundary very difficult.

Another option is selecting a region of interest containing both healthy skin and (part of) a lesion, but without distracting elements like clothes, background, skin moles, etc.

In this thesis, this last option will be used, as it requires only two clicks to select such a rectangular region, while still getting rid of the more difficult issues in segmentation. Also, such a region can be used to extract information about the lesion boundary, any information about color that could previously be obtained from two selected regions, and more.


Figure 2.1: Examples of images from the database. (a) An image which would be difficult to segment automatically; by selecting the region indicated by a blue line it can be segmented to extract healthy lip color, lesion color, and boundary information. (b) An image where asking a user to select a healthy region (green) and a lesion region (red) can make it impossible to retrieve boundary information.

The rest of this chapter describes the methods used to segment the region of interest selected by the user.

2.2 Preprocessing

As the provided images are quite noisy and very large, substantial preprocessing is necessary. The original images have sizes ranging from 0.5 to 12 megapixels, and the fraction of the image cropped out as the region of interest also varies greatly. First, the cropped regions are resized to have a maximum dimension of 200 pixels using a simple nearest neighbour resize. This speeds up further operations, reduces effects of the original image size and camera zoom, and reduces small noisy features like pores and hairs.

Next, the image is corrected for lighting effects and is smoothed.

2.2.1 Background correction

Some of the cropped images have non-uniform lighting, and the segmentation algorithm used was not able to handle this in some cases. In HSV space the effects of lighting usually result in a relatively clear gradient in the saturation and value channels, as in figures 2.2(b) and 2.2(c). The result of segmentation is then often dominated by lighting effects; for example, the image shown in figure 2.2(a) would be divided into 'top half' and 'bottom half' segments, as in figure 2.3(c). To help proper segmentation, an algorithm was devised to remove such a gradient from images which are affected by non-uniform lighting, while leaving other images unaffected. This is known as "background correction". Some common methods [Rus07] of doing this involve:

• Correcting the image by subtracting some approximation of the background, for example by:

– Fitting a low-order polynomial to the image, or

– Manually selecting the background regions, and fitting the polynomial to this data, or

– Using a separate image of only the background, or

– Using a version of the image with heavy smoothing, opening, closing or top-hat filters applied to it to remove the foreground objects.

These methods are very effective when the images consist of several small objects on a uniform background, for example in microscope images of bacteria and such.

However, for our dataset, the region of interest in very large lesions is often selected such that, for example, the left half of the image is healthy skin and the right half consists of part of the lesion. Even under uniform lighting, the best polynomial fitted to such an image would be non-zero, which is undesirable. The other methods are likewise unusable, as they suffer from the same problem or are not fully automatic.

To prevent this, a variant of one of these background estimation algorithms was used which ignores edge pixels and only looks at the gradient in the more homogeneous areas of the image.

First, Gaussian smoothing (σ = 5) is applied to the image to remove noise. Then the squared gradient magnitude is determined in RGB space as simply the sum of the squared gradient magnitudes of all the color channels. RGB space was used here to avoid problems with the hue channel, in which the definition of a gradient is ambiguous. Next, a mask is constructed by taking all pixels with a squared gradient magnitude below the median. Gradients at pixel positions in this mask are assumed to be caused by lighting effects and not by real edges. An example of such a mask is shown in figure 2.2(d); it corresponds quite nicely with the intuitive notion of the "flat" areas of the image.

Using this mask, the mean gradient in each of the saturation and value channels is determined. The gradients detected by this process can be seen in figures 2.2(e) and 2.2(f). Finally, an approximation of the background of the image is constructed, by making an image which has exactly this mean gradient and a value of 0 in the middle of the image. This approximation is determined for both of these channels and then subtracted from these channels in the original unsmoothed image. These corrected saturation and value channels are then combined with the original hue channel to create the final corrected image. This is effectively an efficient approximation to fitting the best polynomial of order zero (i.e. a constant) to the gradients, integrating this polynomial and subtracting it from the image. The process can be easily extended to higher order polynomials.

Figure 2.2: Example result of background correction. Panels: (a) input image; (b) saturation and (c) value channels of the smoothed input image; (d) mask for non-edge pixels; (e, f) detected gradients in the saturation and value channels; (g, h) saturation and value channels of the output image; (i) output image.

Example output can be seen in figure 2.2(i), in which the light to dark gradient in the image was almost completely removed without any other effect on the image.

This procedure is quite effective, and very fast. A small drawback is that it cannot handle lighting differences caused by curved surfaces such as a leg, where the effect can be a dark-light-dark pattern. By fitting a higher-order polynomial to the gradients it would be possible to correct for this as well, but this extension was not implemented because there were very few images in which this was a problem.
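A minimal sketch of the procedure described above, assuming an RGB float image `img` with values in [0, 1]; the function name and details such as the exact gradient operator are my own and may differ from the thesis implementation:

    import numpy as np
    from scipy import ndimage
    from skimage import color

    def correct_background(img, sigma=5.0):
        # Smooth to suppress noise before estimating gradients.
        smoothed = ndimage.gaussian_filter(img, sigma=(sigma, sigma, 0))
        # Squared gradient magnitude, summed over the RGB channels.
        gy, gx, _ = np.gradient(smoothed)
        sq_grad = (gx ** 2 + gy ** 2).sum(axis=2)
        # Mask of "flat" pixels: squared gradient magnitude below the median.
        mask = sq_grad < np.median(sq_grad)
        hsv = color.rgb2hsv(img)
        hsv_smoothed = color.rgb2hsv(smoothed)
        h, w = mask.shape
        yy, xx = np.mgrid[0:h, 0:w]
        ramp_y, ramp_x = yy - h / 2.0, xx - w / 2.0  # zero in the image center
        for ch in (1, 2):  # saturation and value channels
            cy, cx = np.gradient(hsv_smoothed[..., ch])
            # Subtract a plane with exactly the masked mean gradient.
            hsv[..., ch] -= cy[mask].mean() * ramp_y + cx[mask].mean() * ramp_x
        hsv[..., 1:] = np.clip(hsv[..., 1:], 0.0, 1.0)
        return color.hsv2rgb(hsv)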


2.2.2 Smoothing

To more completely remove noisy features like pores, which is needed for proper segmentation, additional smoothing is necessary. Because it is important to keep the clear edges between the lesion and healthy skin in those cases where there actually are clear edges, only edge-preserving smoothers were considered. Both median filters and Kuwahara filters [KHEK76] of various sizes were tested on some images. Of these, a relatively strong Kuwahara filter (a 13×13 filter consisting of four 7×7 regions overlapping on a central pixel) performed best in combination with the segmentation algorithms.
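A minimal sketch of a Kuwahara filter on a single channel, assuming a 2D float array; this illustrates the filter's principle rather than the exact implementation used here:

    import numpy as np
    from scipy import ndimage

    def kuwahara(img, r=7):
        # Mean and variance over r x r windows centered on each pixel.
        mean = ndimage.uniform_filter(img, size=r)
        var = ndimage.uniform_filter(img ** 2, size=r) - mean ** 2
        # The four r x r subregions overlap on the central pixel; their centers
        # are offset diagonally by r // 2. (Border wrap-around is ignored here.)
        d = r // 2
        shifts = [(-d, -d), (-d, d), (d, -d), (d, d)]
        means = np.stack([np.roll(mean, s, axis=(0, 1)) for s in shifts])
        varis = np.stack([np.roll(var, s, axis=(0, 1)) for s in shifts])
        # Each pixel takes the mean of its least-variable subregion.
        best = varis.argmin(axis=0)
        return np.take_along_axis(means, best[None], axis=0)[0]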

2.3 Segmentation

The goal of image segmentation in general is to separate the image into several regions. In this case we want to separate the healthy skin from the lesion. This section describes various methods that were tried to find a general algorithm to segment an image with a lesion, regardless of the skin and lesion color.

2.3.1 Thresholding

For a grey-scale image, it is usually sufficient to consider some kind of thresholding method, but many of the methods devised for grey-scale images do not readily generalize to color images.

One color thresholding method [DCT04] that was tried involved applying Otsu’s thresholding method [GW07] to each color channel and then merging the eight regions according to some criterion. This method had decent results, although the boundaries in color space are very rigid because the segments are all necessarily a union of one or more cubes in the RGB space, and there were major problems forcing the method to result in exactly two regions.

2.3.2 K-means clustering

Another way to solve the problem is to apply a more general clustering method like K-means or neural gas to the pixel colors. This has the advantages of being able to detect clusters not aligned with the color space axes and of always giving a fixed number of clusters.

Both of these methods were tested, and for K = 2 they give comparable results.

Preliminary tests showed that the HSV color representation in combination with K-means works quite well (compare with RGB in figure 2.3(a)). However, problems can arise when the hue channel contains values close to the minimum/maximum value: as the hue channel represents an angle, these should be close together, which they are not when using the Euclidean distance.

The distance metric can easily be fixed, by simply defining the distance on the hue channel as:

d_h(h_a, h_b) = \min(|h_a - h_b|,\ 1 - |h_a - h_b|)

That is, the shortest distance on the circle, where the hue angle ranges from 0 to 1. Using this, the distance between two colors in HSV space can be defined as:

d_{hsv}((h_a, s_a, v_a)^T, (h_b, s_b, v_b)^T) = \| (c \cdot d_h(h_a, h_b),\ s_a - s_b,\ v_a - v_b) \|

where the factor c can be used to indicate the relevance of the hue channel.

The mean must also be adjusted: the componentwise mean of the hues 0.9 and 0.1 is 0.5, while 0.0 would be preferred here. The mean of several angles is ambiguous and cannot be properly defined; for example, the mean of {0 degrees, 120 degrees, 240 degrees} is undefined. However, the images in this data set are unlikely to contain such anomalous sets of hue angles, and each cluster will usually contain an intuitive 'center', which we want to define mathematically. A way to define a mean which is closer to this intuitive notion is to map each angle ω to the unit circle via ω → e^{iω}, take the mean of these points, and use the argument of this mean as the mean of the angles.
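A minimal sketch of both definitions, with hue values in [0, 1); the helper names are my own:

    import numpy as np

    def hue_distance(ha, hb):
        # Shortest distance around the hue circle.
        d = np.abs(ha - hb)
        return np.minimum(d, 1.0 - d)

    def hue_mean(hues):
        # Map each hue to the unit circle, average, and take the argument.
        angles = 2 * np.pi * np.asarray(hues)
        m = np.mean(np.exp(1j * angles))
        return (np.angle(m) / (2 * np.pi)) % 1.0

    print(hue_mean([0.9, 0.1]))  # ~0.0 rather than the naive mean 0.5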

Finally, we have to define the constant c of the distance metric, which gives the relevance of the hue channel. Testing the method with several values of this constant shows that a value of about 5 to 7 works very well, giving due relevance to strong color differences, but still being able to properly segment an image with very small hue differences dominated by noise, by looking at the clear differences in the saturation and value channels of the two segments. See figure 2.3 for examples of the effect of both too high and too low values. In the end, c = 2π was chosen for intuitive appeal: the distance on the hue channel is simply measured in radians.

The resulting algorithm is able to properly segment almost all images. However, the results around the lesion boundary are often a bit noisy, and could use some postprocessing to improve this.

2.4 Postprocessing

2.4.1 Identifying the lesion segment

The K-means process results in two segments, but does not yield any way to determine which of these segments is the lesion and which is the healthy skin. To make this decision, a simple heuristic was used: the segment with the smallest mean distance from the center of the image is labeled the lesion segment.
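A minimal sketch of this heuristic, assuming `labels` is the two-valued (0/1) K-means label image:

    import numpy as np

    def lesion_label(labels):
        # The segment whose pixels lie closest to the center, on average,
        # is taken to be the lesion.
        h, w = labels.shape
        yy, xx = np.mgrid[0:h, 0:w]
        dist = np.hypot(yy - h / 2.0, xx - w / 2.0)
        means = [dist[labels == k].mean() for k in (0, 1)]
        return int(np.argmin(means))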

Although this is a rather crude heuristic, it usually works very well. Using color would be far more difficult, as the images include examples such as a white lesion on healthy brown skin as well as brown lesions on healthy white skin. The method is still occasionally wrong, especially where the region of interest is selected as part of the boundary of a very large lesion. In this case the difference between the mean distances from the center is usually very small, and it may be necessary to reselect the region of interest before the correct segment is labeled as the lesion.

Figure 2.3: Example results of several variants of the K-means segmentation algorithm. (a) K-means segmentation of RGB data and HSV data for several values of c; segmentation is very poor for RGB data or for HSV data with low values of c. (b) The same, for a lesion without clear differences in hue; high values of c cause the noisy hue channel to dominate, resulting in poor segmentation. (c) Like (a) but without background correction; now only extremely high values of c result in correct segmentation.

Although problems with the few images which were labeled incorrectly could all be fixed by selecting a slightly different region of interest, a better solution to this problem may be to include a button to switch the segment labels in the final user interface for selecting the regions of interest. Another way would be to look at more fool-proof methods for distinguishing the two segments in case both of them include a significant part of the outer edge of the region of interest.

2.4.2 Smoothing the boundary

Although the K-means method results in decent segmentation, some noisy pixels may be misclassified because the method is not sensitive to the spatial arrangement of the pixels. Also, in a diffuse boundary, where the segments are very ambiguous anyway, the result may be noisy and contain a lot of single-pixel components. These problems are solved by applying a morphological opening with a 5 × 5 structuring element, and then a similar closing. This removes small connected components in both segments and smooths the boundary. A slight drawback is that it sometimes results in an overly smooth boundary in cases where the boundary is both clear and contains sharp corners, which is especially clear in figure 2.4(c). It may be necessary to use another method which takes into account spatial relations between pixels to further determine the exact boundary when this is necessary for feature extraction.
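A minimal sketch of this postprocessing step on the binary lesion mask:

    import numpy as np
    from scipy import ndimage

    def smooth_mask(mask):
        # A 5 x 5 opening removes small components; the closing then fills
        # small holes and smooths the boundary.
        selem = np.ones((5, 5), dtype=bool)
        opened = ndimage.binary_opening(mask, structure=selem)
        return ndimage.binary_closing(opened, structure=selem)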

2.5 Results

Looking at the results of this segmentation procedure shows that almost all images are successfully segmented into two regions, one of which represents the lesion. About three images have segmentation poor enough that the mean color data will probably be affected. Several others have minor problems, like a specular highlight in the lesion being identified as skin or vice versa.

In general, the results certainly seem acceptable for determining color data, and are probably useful for extracting some information about the lesion boundaries as well.

Some examples can be seen in figure 2.4. Most segmentation results are similar to the one shown in figure 2.4(d), where most of the lesion skin has been detected as well as much of the boundary, though the segmentation is by no means perfect. Better segmentation results are similar to figure 2.4(a), where the entire lesion was properly segmented and has a clear and useful boundary. The poorest results tend to have a fragmented segmentation result with many misclassified regions, as in figure 2.4(b).

Figure 2.4: Some examples of segmentation results, with region of interest images on the left side, and the area detected as healthy skin covered by black on the right side. (a) Properly segmented "Panniculitis" lesion. (b) Poorly segmented "Hyperpigmentation" lesion. (c) Overly smooth boundary in segmenting a "Mastocytose cutane" lesion. (d) Properly segmented "Psoriasis vulgaris" lesion.


Chapter 3

Feature extraction and classification

The eventual goal of this project is to build a content-based image retrieval system which can help dermatologists retrieve images which are similar to one taken from a new patient, in order to use previous diagnoses to make new ones.

There are many ways to build such a system, but, as was discussed in the literature review, retrieving images using properties with high image understanding is important. Therefore, we will use properties which can be understood by humans and are diagnostically relevant. Also, to make it easier to extend the system later on, it should be modular, for example by considering each property by itself first and then integrating it into the whole. Given these premises, a good set of steps to build such a system is:

• Determine important properties of the lesions, which have a limited number of possible values. For example: color as being red, brown, blue or white.

• For each property:

– Determine features (i.e. simple numerical values) that are relevant for determining such a property, and extract these features from the images. For example, the mean RGB values of the lesion.

– Manually assign one of the relevant possible values of the property to (some of) the images, to be able to train a classifier. For example, labelling the lesions as having a red, brown, blue or white color.

– Train a classifier to categorize the images, using the extracted features.

• Finally, combine the classifiers in an integrated expert system which no longer requires manually assigning labels.

The properties used here are color and boundary sharpness, which are clearly applicable, being the 'B' and 'C' of the commonly used ABCD criteria. The 'A' (asymmetry, or shape) and 'D' (diameter, or size) features are less applicable here, as these cannot be determined for lesions like those covering an entire leg or arm, or in images where the scale is not clear. As an alternative property we will consider the fractalness of the boundary, also used in previous work like [CHKA92].

The modular setup also makes it easy to extend the system to use more properties at any time in the future when research in this area advances. A possible disadvantage is that it is not immediately clear what weight to assign to each of the properties in combining the classifiers, which is discussed in more detail in chapter 4.

3.1 LVQ classifiers

In this thesis, LVQ classifiers will be used to classify the data. They fit nicely into the CBIR scheme, and previous work [BPB+09] shows that the GMLVQ variant is especially effective. One big advantage of this classifier is that it trains a distance metric, which allows for retrieval of ‘most similar’ images without involving the categories used to train them.

In this way much more information about the images can be used in the expert system because the actual feature values are used, as opposed to retrieving images using only the category labels. This makes it possible to distinguish (for example) different shades of red, while also retaining the clear difference between the categories it was trained on.

3.1.1 LVQ

LVQ, or learning vector quantization, first introduced by Kohonen [Koh95], is a classification method that is both simple and powerful. The method trains one or more 'prototypes' per class: a small set of examples which are typical of all the members of that class. These prototypes can then be used in a nearest prototype classifier, by simply retrieving the nearest prototype(s) and looking at which class they belong to. This kind of training makes the method very intuitive, as looking at the prototypes after training can show a lot about the data and the classifier, unlike looking at the result of training some other classification techniques, for example the weights of a neural network.

Given is a data set \{\xi_i \in \mathbb{R}^N \mid 1 \le i \le n\} with class labels K(\xi_i) = c_i \in \{1, \dots, M\}. The original LVQ algorithm used on this dataset works as follows:

First, initialize (randomly or near cluster centers) one or more prototypes w_j \in \mathbb{R}^N per class, with class label K(w_j). Then, for each epoch:

• Iterate through the data set in random order:

– For each vector \xi_i, determine the nearest prototype w_C.

– If K(w_C) = K(\xi_i), i.e. the nearest prototype is of the correct class, move the prototype towards the data point.

– Otherwise, move the prototype away from the data point.

There are many variants of this algorithm, including updating several prototypes in each step (like LVQ2.1), or only updating prototypes if they are close to the data point.

All of these algorithms are still heuristics, although they often work extremely well.
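A minimal sketch of one LVQ1 epoch as described above, with a fixed learning rate; the function signature is my own:

    import numpy as np

    def lvq1_epoch(X, y, protos, proto_labels, lr=0.01):
        # X: (n, N) data, y: (n,) labels; protos: (m, N) prototype vectors.
        for i in np.random.permutation(len(X)):
            d = np.linalg.norm(protos - X[i], axis=1)
            c = int(d.argmin())  # nearest prototype
            sign = 1.0 if proto_labels[c] == y[i] else -1.0
            protos[c] += sign * lr * (X[i] - protos[c])  # attract or repel
        return protos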

3.1.2 GLVQ

More recently, there has been a move away from heuristics and towards LVQ methods with a more solid mathematical foundation [SY95]. By minimizing an error function using stochastic gradient descent, it is clear what the classifier optimizes, and its convergence properties can be better investigated. Also, unlike the normal LVQ algorithms, which often depend strongly on the Euclidean distance measure, these new algorithms can use any metric.

One very flexible option for the cost function, used in [HSV05], is:

\sum_{\xi_i \in \text{data}} f\!\left( \frac{d_C - d_I}{d_C + d_I} \right)   (3.1)

where f is a monotonically increasing function, like the logistic function f(x) = 1/(1 + e^{-x}) or the identity function f(x) = x, d(x, y) is the distance metric, d_C = d(w_C, \xi_i) is the distance to the closest correct prototype w_C, and d_I = d(w_I, \xi_i) is the distance to the closest incorrect prototype w_I.

The resulting algorithm is known as generalized LVQ (GLVQ), and supports any distance metric, making it very flexible. Also, the algorithm is still relatively intuitive: it tries to move the closest wrong prototype as far away as possible relative to the closest correct prototype. The denominator, which scales the difference in distance to be in [−1, 1], helps to prevent the divergence known to occur with the standard LVQ algorithms, associated with behaviour like d_I → ∞.

The cost function is optimized by stochastic gradient descent, resulting in the following learning steps for each \xi_i examined:

w_{C,\text{next}} = w_C - \eta_w \, f'\!\left( \frac{d_C - d_I}{d_C + d_I} \right) \frac{2 d_I}{(d_C + d_I)^2} \, \nabla_{w_C} d_C

w_{I,\text{next}} = w_I + \eta_w \, f'\!\left( \frac{d_C - d_I}{d_C + d_I} \right) \frac{2 d_C}{(d_C + d_I)^2} \, \nabla_{w_I} d_I

where \eta_w is the learning rate and \nabla_{\vec{x}} is the gradient with respect to the variables \vec{x}.
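A minimal sketch of one stochastic GLVQ update, taking f as the identity and d as the squared Euclidean distance, so that \nabla_w d(\xi, w) = -2(\xi - w); names are my own:

    import numpy as np

    def glvq_step(xi, label, protos, proto_labels, lr=0.01):
        d = ((protos - xi) ** 2).sum(axis=1)
        correct = proto_labels == label
        C = np.where(correct)[0][d[correct].argmin()]    # closest correct prototype
        I = np.where(~correct)[0][d[~correct].argmin()]  # closest incorrect one
        dC, dI = d[C], d[I]
        # With f the identity, f' = 1; the scaling factors from the update rules:
        gC = 2 * dI / (dC + dI) ** 2
        gI = 2 * dC / (dC + dI) ** 2
        protos[C] += lr * gC * 2 * (xi - protos[C])  # move correct prototype closer
        protos[I] -= lr * gI * 2 * (xi - protos[I])  # push incorrect prototype away
        return protos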


3.1.3 GMLVQ

Generalized matrix LVQ [BHS06] uses the cost function of GLVQ and additionally trains the distance metric:

d(\xi_i, w_j) = \| \Omega (w_j - \xi_i) \|^2

where the projection matrix \Omega \in \mathbb{R}^{N \times N} can be limited to be symmetric, because any product A^T A can be written as the square B^2 of a symmetric matrix B. This matrix can then be trained by once again using stochastic gradient descent:

\Omega_{ij,\text{next}} = \Omega_{ij} - \eta \, f'\!\left( \frac{d_C - d_I}{d_C + d_I} \right) \left[ \frac{2 d_I}{(d_C + d_I)^2} \frac{\partial d_C}{\partial \Omega_{ij}} - \frac{2 d_C}{(d_C + d_I)^2} \frac{\partial d_I}{\partial \Omega_{ij}} \right]

\frac{\partial d(\xi, w)}{\partial \Omega_{ij}} = \begin{cases} 2 [\Omega(\xi - w)]_i (\xi_j - w_j) + 2 [\Omega(\xi - w)]_j (\xi_i - w_i) & i \neq j \\ 2 [\Omega(\xi - w)]_i (\xi_i - w_i) & i = j \end{cases}

where \eta is the matrix learning rate.

This algorithm finds the most effective linear combinations of the original features, transforming the feature space in such a way that a nearest neighbour classifier is most effective. It is also possible to train separate matrices for each prototype, which is known as localized GMLVQ. This technique was not used here, as the goal is to use the distance metric obtained in a global nearest neighbour retrieval.

An earlier version of this technique only trained a vector of relevances. This is equivalent to restricting the matrix Ω to have only diagonal entries and is known as generalized relevance LVQ, or GRLVQ [HV02].
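A minimal sketch of the GMLVQ distance and its gradient with respect to \Omega; the piecewise formula above is this gradient symmetrized for a symmetric \Omega:

    import numpy as np

    def gmlvq_distance(xi, w, omega):
        # d(xi, w) = || Omega (w - xi) ||^2
        diff = omega @ (w - xi)
        return diff @ diff

    def gmlvq_omega_gradient(xi, w, omega):
        # Unsymmetrized gradient: dd/dOmega = 2 * Omega (xi - w) (xi - w)^T
        diff = xi - w
        return 2.0 * np.outer(omega @ diff, diff)

    # The "relevance matrix" referred to later is Lambda = omega.T @ omega.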

3.1.4 Learning schedules

The learning rates can be held constant throughout the training process, but it is often better to start with a higher learning rate and then decrease it as the algorithm converges.

The change of learning rate over time, or "learning schedule", used in [BPB+09] is:

\eta(t) = \frac{\eta_{\text{start}}}{1 + \Delta\eta \, (t - t_{\text{start}})}

with \Delta\eta = 0.0001, over 500 epochs. However, this schedule has \eta(500) \approx 0.95\, \eta(1), as can be seen in figure 3.1 ("schedule 1"), which is hardly a useful decrease.

Using the same formula with the goal \eta_{\text{end}} = \eta_{\text{start}}/100 leads to a very sharp decrease of the learning rate, making it too low on average to properly converge within the 500 epochs ("schedule 2" in figure 3.1).

If instead the following schedule ("schedule 3" in figure 3.1) is used:

\eta(t) = \begin{cases} 0 & t < t_{\text{start}} \\ \eta_{\text{start}} & t_{\text{start}} \le t < t_{\text{mid}} \\ \eta_{\text{start}} \left( 1 - \frac{99}{100} \cdot \frac{t - t_{\text{mid}}}{t_{\text{end}} - t_{\text{mid}}} \right) & \text{otherwise} \end{cases}

Figure 3.1: Relative learning rate (i.e. η(t)/η(1)) for several learning schedules.

With this schedule the algorithm converges properly because there is no sharp decrease, while the learning rate in the last epochs is low enough for it to stabilize. This results in lower variation between several runs of the LVQ algorithm, and a slightly better classifier on average.

We use t_{\text{start}} = 1 for \eta_w and t_{\text{start}} = 50 for \eta, with t_{\text{mid}} = 100 and t_{\text{end}} = 500 for both.
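A minimal sketch of schedule 3 with the parameters given above:

    def schedule3(t, eta_start, t_start=1, t_mid=100, t_end=500):
        # Zero before t_start, constant until t_mid, then a linear decay
        # reaching eta_start / 100 at t_end.
        if t < t_start:
            return 0.0
        if t < t_mid:
            return eta_start
        return eta_start * (1.0 - 0.99 * (t - t_mid) / (t_end - t_mid))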

3.2 Color

Previous work on this dataset [BPB+09, BPJ09], as described in section 1.2.1, has focused purely on classifying the lesions by their color, using manually selected regions of healthy and lesion skin. This section will try to reproduce some of this work using just the new single region of interest. Also, the effect of using features other than the mean color of a segment will be investigated.

For these tests, only the commonly used RGB color space will be used, because it is easy to compute and interpret, does not have problems with defining the metric as HSV does, and previous work shows it to be nearly optimal for these purposes with only the Lab color space being very slightly better.

3.2.1 Feature extraction

For the feature extraction step, three different segments will be considered:

• The healthy skin (as determined by the segmentation)

• The lesion skin (as determined by the segmentation)


• The lesion skin, but with the mean healthy skin color subtracted from it, referred to as “normalized lesion color”.

For each of these pixel sets, seven statistics are extracted per color channel: the mean, standard deviation, and a five-number summary (2.5th, 25th, 50th, 75th and 97.5th percentiles), for a total of 63 features.
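A minimal sketch of this extraction for one segment, assuming `pixels` is an (n, 3) array of RGB values:

    import numpy as np

    def color_features(pixels):
        feats = []
        for ch in range(3):  # R, G, B
            v = pixels[:, ch]
            feats += [v.mean(), v.std()]
            feats += list(np.percentile(v, [2.5, 25, 50, 75, 97.5]))
        return np.array(feats)  # 21 values; 63 over the three segments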

3.2.2 Classification and feature selection

This particular set of features has a lot of redundancy, and in general it is preferable to work with the smallest number of features while keeping the highest accuracy. This reduces the time needed to train classifiers, and speeds up feature calculation and the eventual retrieval of images. Furthermore, it is possible that performance is lower when using too many features, as this can lead to the classifier overfitting to the training data.

These goals clearly conflict, although some features can be dropped without affecting the accuracy at all. For example, the mean of the normalized lesion color can be obtained from the means of the healthy and lesion skin in GMLVQ, and some of the standard deviation features are identical.

Using all 63 features results in a training accuracy (one minus the training error) of around 95%. Dropping the features for standard deviations, which were mainly intended for possible tests of homogeneity or border sharpness, does not reduce the accuracy at all, and they are unlikely to be useful, so all further tests will start with the remaining set of 54 features.

One possible way to further reduce the number of features used is to use vector relevances (as described at the end of section 3.1.3). This clearly gives a relevance measure for each feature, although it does not include how relevant some features may be when used in combination with others. Running LVQ with vector relevances consistently results in only five features being considered relevant, with all the other relevances converging to zero. Using these 5 features in GMLVQ gives suboptimal results, with a training accuracy of 80%, which is far below even stricter tests done previously using only the 6 means, but similar to results obtained earlier [BPJ09] by simple nearest neighbour retrieval.

Another way to deduce the relevances of features is to look at the projection matrix \Omega used in a GMLVQ run. The squared distance used is d(\vec{x}, \vec{y}) = \|\Omega(\vec{x} - \vec{y})\|^2, so the influence of feature x_k on the distances will be proportional to:

\frac{\partial}{\partial x_k} \|\Omega x\|^2 = 2 (\Omega x)^T \Omega_{*k}

where \Omega_{*k} denotes the k-th column of the matrix \Omega. Assuming \Omega x is randomly distributed, the influence of feature x_k can be taken to be proportionate to the entries in the k-th column, for example by taking the L1 norm rel_1(k) = \sum_i |\Omega_{ik}| or the L2 norm rel_2(k) = \sqrt{\sum_i \Omega_{ik}^2}. Also note that because \Omega is symmetric, the sum \sum_i \Omega_{ik}^2 is equal to \Lambda_{kk}, where \Lambda = \Omega^T \Omega is known as the "relevance matrix" in GMLVQ.


Figure 3.2: Training accuracy for methods 1 and 2 of eliminating features, compared to brute force elimination.


Since both of these measures correspond roughly to the influence of a feature on the distance measure, it is plausible that the feature with the lowest value here will be the least important in the overall classification, although there are undoubtedly counterexamples to this heuristic.

This method can be seen as an approximation of the feature selection method using the "objective function sensitivity" [GE03], which is sometimes used for feature selection in neural networks, although it has not yet been applied to LVQ algorithms. In this case the objective function sensitivity method would use \partial(\text{cost function})/\partial x_k, where "cost function" is formula 3.1. This expression contains the factor \partial(\text{distance metric})/\partial x_k used in our approximation, and as long as the features are all of roughly the same magnitude (for example by normalizing them to have standard deviation 1 and mean 0, as done in this chapter) the distance metric factor dominates. Analogously to "objective function sensitivity", our method will be referred to as feature selection using distance metric sensitivity.

Using this, the set of features can be reduced by eliminating them one by one, starting with the feature with the lowest relevance measure. Determining the accuracy after eliminating a feature will yield a new Ω each time which can be used to determine the next feature to be eliminated.

Applying this procedure with rel_1(k) and rel_2(k) will sometimes simply be referred to as "method 1" and "method 2" of eliminating features, respectively.
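A minimal sketch of the greedy elimination loop for method 1; `train_gmlvq` is a hypothetical routine returning a trained matrix \Omega and the training accuracy:

    import numpy as np

    def eliminate_features(X, y, train_gmlvq):
        # Greedily drop the feature whose Omega column has the smallest L1 norm.
        active = list(range(X.shape[1]))
        history = []
        while len(active) > 1:
            omega, acc = train_gmlvq(X[:, active], y)
            history.append((list(active), acc))
            rel1 = np.abs(omega).sum(axis=0)  # rel_1(k) per remaining feature
            del active[int(rel1.argmin())]    # retrain on the reduced set next
        return history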

Another possibility is to reduce a set of n features to n − 1 features by simply trying all possible subsets of n − 1 features and choosing the one with the best classification result. Repeating this process until only one feature is left is very costly (O(n²) LVQ runs), but still possible in some cases. This will be referred to as the "brute force approach", although it is still a greedy method in that it assumes that the best set of n + 1 features includes the best set of n features, which is clearly false in some cases. On the other hand, a true brute force approach to getting the optimal set of m features would involve trying all \binom{n}{m} subsets, which is clearly impractical.

The results of both of these methods can be seen in figure 3.2, which shows that the training accuracy does not drop when eliminating features until about eighteen are left, after which accuracy keeps decreasing. Using the L1 norm in the distance metric sensitivity outperforms the L2 norm, which seems to make some wrong choices earlier on, leading to large drops in accuracy. The first method also has almost the same results as the brute-force approach in eliminating the last 18 features. As this is a single test, and there is inherent randomness in the LVQ training process, this difference might be largely luck. In any case, the brute-force approach cannot be used for eliminating features up to the point where training error starts to decrease, and also seems unnecessarily inefficient afterwards, as the same performance can be obtained by eliminating the features according to the distance metric sensitivity. Also, trying to reduce the computational cost by removing multiple least relevant features at once gives suboptimal results, as this causes training accuracy to decrease well before the point where 18 features are left.


3.2.3 Cross-validation

The above tests have only used the training accuracy as a measure. Although low performance on these tests will certainly indicate low performance in a content-based image retrieval system, high training accuracy may be purely due to overfitting on training data in a way that does not generalize to other data. To avoid this, a cross-validation test is used, in which the performance is tested on data not included in the training process.

The same cross-validation procedure as used in [BPB+09] will be used here: the training data is split up into 10 random subsets D = \bigcup \{d_1, \dots, d_{10}\}. Next, each of the ten subsets d_i is left out in turn and a GMLVQ classifier is trained using the rest of the data. This results in a matrix \Omega_i = \text{GMLVQTrain}(D \setminus d_i), which determines a metric. This matrix \Omega_i is then used to determine the nearest neighbours of each element in d_i, as it would be when used in a CBIR system, and the average correct retrieval rate is determined as a function of the number of nearest neighbours retrieved. The training is repeated for ten random initializations of the GMLVQ method, to average out variations due to the random initialization of the prototypes.
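A minimal sketch of this procedure, reusing the hypothetical `train_gmlvq` routine; precision here is the fraction of the k retrieved images with the same (color) label:

    import numpy as np

    def cv_retrieval(X, y, train_gmlvq, n_folds=10, k=10):
        idx = np.random.permutation(len(X))
        folds = np.array_split(idx, n_folds)
        precisions = []
        for fold in folds:
            train = np.setdiff1d(idx, fold)
            omega, _ = train_gmlvq(X[train], y[train])
            # ||Omega(x - y)|| equals the Euclidean distance after projection.
            Z = X @ omega.T
            for i in fold:
                d = np.linalg.norm(Z[train] - Z[i], axis=1)
                nearest = train[np.argsort(d)[:k]]
                precisions.append(float((y[nearest] == y[i]).mean()))
        return float(np.mean(precisions))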

These cross-validation tests were done on:

• The entire set of features, with the very obviously redundant features removed. This results in 36 features: six statistics for the lesion and healthy skin segments in three color channels.

• Likewise, the set of first and third quartiles and means, for a total of eighteen features.

• Just the set of six means.

• Sets of 6, 14, 18 and 22 features left after eliminating features using the methods described in the previous section. This can be used to determine whether the point at which training accuracy starts to decrease is important, and if the resulting sets of six features are competitive with using the means.

The results can be seen in figure 3.3. Figures 3.3(a) and 3.3(b) show that using the set of 18 features performs best. Using a few more features is better than using a few less, although performance degrades when using many more, in much the same way as when using many fewer features. The optimal number of features seems to lie exactly at the point where training error starts to increase, i.e. where all redundant and useless features are presumably eliminated. The best results found by both methods are comparable, although the variation due to random initialization is a bit lower for method 1, which could indicate a more stable set of features. This method is also the only one that found a set of six features that performs just as well as using only the means; that the other method does not can also be seen from its much higher training error in figure 3.2 at the point where six features are used. Figure 3.3(c) shows more clearly how close the results are, and shows that a set of 18 manually selected features is not capable of matching these results, instead performing similarly to using only the means.


Figure 3.3: Results of cross-validation tests for the color features: (a) results for method 1, (b) results for method 2, (c) best results compared to alternatives. Shaded areas indicate variation due to random initialization. Performance is the mean precision (i.e. fraction of results of the same color class) for the first K retrievals.


Table 3.1: The 18 color features selected by the feature selection method. For each segment and color channel, the number of statistics selected (out of the mean, standard deviation, minimum, first quartile, median, third quartile and maximum) is: healthy skin: red 3, green 3, blue 1; lesion skin: red 3, green 2, blue 1; difference: red 3, green 1, blue 1.

3.2.4 Conclusions

Features automatically extracted by segmenting a single region of interest can match results obtained using two manually selected regions. A slightly better and more stable feature set (see table 3.1) can be extracted by starting from more features, reaching a mean accuracy of up to 89.0 ± 0.7% at two nearest neighbours retrieved, although it is easy to lose performance by overfitting on ineffective features if too many are used. This can be prevented by eliminating some of them according to their influence on the GMLVQ metric. Note that this accuracy measure only shows how many similarly colored images were retrieved; it does not necessarily say anything about how many images with the same diagnosis were retrieved.

3.3 Boundary sharpness

Another important characteristic of a skin lesion is the sharpness of the lesion boundary, which is part of the ‘B’ in the ABCD guideline in dermatology.

For the purposes of this test, images of the lesions were manually labeled as having either a diffuse or a sharp boundary. Some examples can be seen in figure 3.4.

Previous work on determining the boundary sharpness of lesions includes Green et al. [GMP+94], who use the mean and standard deviation of the gradient in all three color channels. More recently, Maglogiannis et al. [MZK06] used the minimum, maximum, average and variance of the gradient operator of the image converted to grey-scale.

This, and other similar previous work, only included images of brown lesions. For our database, it will probably be more difficult to determine useful features which effectively capture the sharpness of a border, especially since there can be lesions which have a very sharp edge but where the color of the lesion differs little from the skin color, as well as lesions which have a diffuse edge bridging a larger color difference.


Figure 3.4: Examples of sharp and diffuse boundaries in lesions: (a) white lesion with a sharp boundary; (b) blue lesion with a diffuse boundary; (c) white lesion with a diffuse boundary; (d) brown lesion with a sharp boundary.

This means that simply looking at the mean gradient magnitude along the boundary will probably not be sufficient.

One way to remedy this problem is to look at the gradient in an area around the boundary, instead of just at it. All the methods used in this section look at the gradient in a 5 × 5 area around each pixel on the boundary between the lesion and healthy skin segments, which also compensates for the slightly inaccurate boundary that the segmentation process sometimes gives. A diffuse lesion would be expected to have fairly constant low values here, while a sharp lesion would have both very low and very high values.

The next section describes some additional methods devised to compensate for these color differences.

3.3.1 Normalizing for color differences

Because the color difference between the healthy skin and the lesion can vary widely, and this has a large effect on gradient operators, it is probably useful to attempt to eliminate these variations.


The color classification tests showed that the segmentation gives a good approximation of the typical lesion and skin color in an image, so one way to compensate for these differences is to take some approximation $\vec{c}_h$ to the typical healthy skin color and some approximation $\vec{c}_l$ to the typical lesion color, and then transform each color in the image using the function:

$$T(\vec{c}) = \frac{(\vec{c} - \vec{c}_h) \cdot (\vec{c}_l - \vec{c}_h)}{(\vec{c}_l - \vec{c}_h) \cdot (\vec{c}_l - \vec{c}_h)} \qquad (3.2)$$

That is, map every color vector onto the line connecting $\vec{c}_l$ and $\vec{c}_h$, such that $\vec{c}_h$ is mapped to 0 and $\vec{c}_l$ to 1. This method eliminates the effect of color differences between the healthy skin and the lesion, because their difference in the resulting image is always 1.
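As a concrete illustration, equation 3.2 can be implemented in a few lines; this is a sketch with illustrative names, assuming an RGB image stored as an H × W × 3 array.

import numpy as np

def project_colors(image, c_healthy, c_lesion):
    # Map each RGB vector onto the line from c_healthy (mapped to 0) to
    # c_lesion (mapped to 1), as in equation 3.2.
    c_h = np.asarray(c_healthy, dtype=float)
    d = np.asarray(c_lesion, dtype=float) - c_h
    return ((image.astype(float) - c_h) @ d) / (d @ d)  # H x W grey-scale result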

Another method is to simply calculate the gradient in each color channel and then divide it by the color difference in that channel. The resulting values will be referred to as the “normalized image gradient”. One possible drawback of this method is that it might be somewhat unstable if the color difference in a channel is very small.
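A corresponding sketch of the per-channel normalized gradient; the guard against very small color differences is an added assumption, included only to handle the instability noted above.

import numpy as np

def normalized_gradient(channel, mean_healthy, mean_lesion, eps=1e-3):
    # Gradient of one color channel, divided by the skin/lesion color
    # difference in that channel.
    gy, gx = np.gradient(channel.astype(float))
    diff = float(mean_lesion) - float(mean_healthy)
    if abs(diff) < eps:               # guard against tiny color differences
        diff = eps if diff >= 0 else -eps
    return gx / diff, gy / diff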

Figure 3.5 shows the effect of using these normalization procedures on some images of ideal sharp and diffuse lesions (Gaussian smoothed circles with σ = 1 and σ = 5, respectively).

Whereas the usual gradient magnitude changes in sign and magnitude between color channels and between different lesions of the same class, the normalized data is perfectly identical for all images in the same class.
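Such ideal test images are easy to reproduce; a minimal sketch, assuming the σ values quoted above and an arbitrary image size and lesion radius:

import numpy as np
from scipy import ndimage

def ideal_lesion(size=64, radius=20, sigma=1.0):
    # A binary disc smoothed with a Gaussian: sigma = 1 gives a sharp
    # boundary, sigma = 5 a diffuse one.
    y, x = np.ogrid[:size, :size]
    disc = ((x - size / 2) ** 2 + (y - size / 2) ** 2 <= radius ** 2).astype(float)
    return ndimage.gaussian_filter(disc, sigma)

sharp, diffuse = ideal_lesion(sigma=1.0), ideal_lesion(sigma=5.0)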

3.3.2 Gradient normal to the boundary

In previous work, only the gradient magnitude was taken into account. However, the direction might also be important, as we are interested in how the color changes in the specific direction normal to the segment boundary, from the healthy skin to the lesion; color changes not in this direction probably just indicate noise.

This gradient normal to the mask boundary is calculated by the inner product

$$\begin{pmatrix} g_x \\ g_y \end{pmatrix} \cdot \begin{pmatrix} n_x \\ n_y \end{pmatrix},$$

where $(g_x, g_y)$ is the image gradient and $(n_x, n_y)$ is a normal of the closest part of the segment boundary.
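A sketch of this computation, approximating the boundary normals by the gradient of a smoothed version of the (boolean) lesion mask, and looking up the nearest boundary pixel with a distance transform; all names are illustrative, not from the thesis implementation.

import numpy as np
from scipy import ndimage

def gradient_normal_to_boundary(channel, lesion_mask):
    # Image gradient (g_x, g_y) per pixel.
    gy, gx = np.gradient(channel.astype(float))

    # Boundary normals: the gradient of a smoothed mask points across
    # the lesion boundary.
    smooth = ndimage.gaussian_filter(lesion_mask.astype(float), sigma=2.0)
    ny, nx = np.gradient(smooth)
    length = np.hypot(nx, ny)
    length[length == 0] = 1.0                 # avoid division by zero
    nx, ny = nx / length, ny / length

    # For each pixel, take the normal of the closest boundary pixel.
    boundary = lesion_mask ^ ndimage.binary_erosion(lesion_mask)
    _, (iy, ix) = ndimage.distance_transform_edt(~boundary, return_indices=True)

    # Inner product (g_x, g_y) . (n_x, n_y).
    return gx * nx[iy, ix] + gy * ny[iy, ix]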

3.3.3 Feature extraction

The following gradient information is extracted from the area around the lesion segment boundary:

• Non-normalized gradient, norm over all color channels.

• Non-normalized gradient, for each of the color channels.

• Normalized gradient, summed over all color channels.

• Normalized gradient, for each of the color channels.


Figure 3.5: Examples of the effect of the gradient magnitude normalization procedures described in section 3.3.1: (a) brown lesion; (b) white lesion.


• Gradient of the image transformed to grey-scale using equation 3.2 as described in section 3.3.1, with the mean color of the segments as $\vec{c}_l$ and $\vec{c}_h$.

• Gradient of the image transformed to grey-scale, using the mean color at the edge of the dilated/eroded lesion segment as $\vec{c}_l$ and $\vec{c}_h$ (using a 7 × 7 square structuring element for the dilation, effectively retrieving the color at both sides of the segment boundary).

• Gradient of the image transformed to grey-scale in the usual way, i.e. while retaining the color difference.

In all cases, a variant that takes the boundary normal into account and one that does not are included, and for each gradient, several features are extracted:

• The same statistics as with the color data (mean, standard deviation and the five-number summary: minimum, first and third quartile, median and maximum).

• A histogram, with nine bins chosen in such a way that the average histogram (over all images) is flat (see the sketch after this list).

This results in a total of sixteen features for each type of gradient information.
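One plausible reading of this binning, assumed here rather than taken from the thesis implementation, is to place the bin edges at equally spaced quantiles of the gradient values pooled over all training images, so that the pooled (and hence the average) histogram comes out flat.

import numpy as np

def flat_bin_edges(pooled_values, n_bins=9):
    # Edges at equally spaced quantiles of the pooled values.
    qs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    inner = np.quantile(pooled_values, qs)
    return np.concatenate(([-np.inf], inner, [np.inf]))

def histogram_feature(values, edges):
    # Nine-bin histogram for one image, normalized to sum to 1.
    counts, _ = np.histogram(values, bins=edges)
    return counts / max(counts.sum(), 1)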

3.3.4 Results

Extensive preliminary tests show that looking at the gradient in the direction normal to the lesion segment boundary is always at least as good as, and usually gives 5–10% better results than, looking just at the gradient magnitude. Because of the large number of results, only these better results are presented here.

Tests were done on all features for each type of gradient information, as well as on only the histogram, only the statistics, and just the mean and standard deviation. In the following discussion, only the results using all features are presented, together with the best subset in case it has better performance.

For the features extracted from the color images, both normalized and not, using just the mean and standard deviation outperforms all other feature subsets. When combining color channels by summing or taking the norm, using some other statistics as well gives slightly better results. Figure 3.6(a) shows the results: the solid lines indicate results using these best subsets, and the dashed lines indicate the results using all sixteen features per color channel.

As can be seen in these results, the best performance, around 75%, is obtained using the mean and standard deviation in the RGB color channels, without normalization.

Not surprisingly, combining the usual gradient from the color channels by taking the norm performs very poorly, because differing color differences and gradient directions in each channel cannot easily be combined, whereas the normalization ensures the values have the same sign and gives much better results here.
