(1)

Preoperative Prediction of Malignancy of Ovarian Tumors Using Least Squares Support Vector Machines

C. Lu (1), T. Van Gestel (1), J. A. K. Suykens (1), S. Van Huffel (1), D. Timmerman (2), I. Vergote (2)

(1) Department of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium
(2) Department of Obstetrics and Gynecology, University Hospitals Leuven, Leuven, Belgium

(2)

Overview

Introduction

Data Exploration

LS-SVM and Bayesian evidence framework

LS-SVM classifier

Bayesian evidence framework

Input Selection

Sparse Approximation

Model Building and Model Evaluation

Conclusions

(3)

Introduction

Problem

Ovarian masses: a common problem in gynecology (1 in 70 women).

Ovarian cancer: high mortality rate.

Early detection of ovarian cancer is difficult.

Treatment and management of the different types of ovarian tumors differ greatly.

Goal: develop a reliable diagnostic tool to preoperatively discriminate between benign and malignant tumors, and assist clinicians in choosing the appropriate treatment.

Techniques for preoperative evaluation

Serum tumor marker: CA 125 blood test

Transvaginal ultrasonography

Color Doppler imaging and blood flow indexing

(4)

Introduction

Attempts to automate the diagnosis

Risk of Malignancy Index (RMI) (Jacobs et al.):

$RMI = score_{morph} \times score_{meno} \times CA125$

Mathematical models:

Logistic regression

Artificial neural networks

Support Vector Machines

Bayesian belief network

Hybrid methods

Our approach: Least Squares SVM within the Bayesian framework.
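The RMI is simple enough to compute directly. A minimal sketch, assuming the common scoring convention of Jacobs et al. (1990): ultrasound morphology score U in {0, 1, 3}, menopausal score M in {1, 3}, CA 125 in U/ml, with a cutoff of 200 often used; these conventions are not stated on this slide and are an assumption here.

```python
# Sketch of the RMI computation, assuming the Jacobs et al. (1990)
# scoring convention: U = 0, 1 or 3 (for 0, 1 or >= 2 suspicious
# ultrasound features), M = 1 (premenopausal) or 3 (postmenopausal).
def rmi(u: int, m: int, ca125: float) -> float:
    """Risk of Malignancy Index: RMI = U * M * CA125 (U/ml)."""
    return u * m * ca125

print(rmi(u=3, m=3, ca125=120.0))  # 1080.0 -> above the common 200 cutoff
```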

(5)

Introduction

Data

Patient data collected at University Hospitals Leuven, Belgium, 1994-1999.

425 records, 25 features.

291 benign tumors, 134 (32%) malignant tumors

(6)

Introduction

Development Process

Exploratory Data Analysis

Data preprocessing,

univariate analysis,

PCA, factor analysis…

Input Selection

Model training

Model evaluation

Performance measures:

Receiver operating characteristic (ROC) analysis

Goal:

High sensitivity for malignancy with a low false positive rate.

Provide a probability of malignancy for each individual patient.

ROC curves

Constructed by plotting the sensitivity versus 1 - specificity (the false positive rate) for varying probability cutoff levels.

Visualize the relationship between the sensitivity and the specificity of a test.

Area under the ROC curve (AUC)

Measures the probability that the classifier ranks a randomly chosen event (malignant case) higher than a randomly chosen nonevent (benign case).
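As a concrete illustration of these performance measures, a minimal sketch using scikit-learn on synthetic stand-in labels and probabilities (the study's 425-patient data set is not reproduced here):

```python
# ROC curve and AUC for a probabilistic classifier, plus sensitivity
# and specificity at a fixed probability cutoff (e.g. 0.4).
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)                     # 1 = malignant
y_prob = np.clip(0.3 * y_true + rng.normal(0.35, 0.2, 200), 0, 1)

fpr, tpr, cutoffs = roc_curve(y_true, y_prob)             # 1-specificity, sensitivity
print("AUC:", roc_auc_score(y_true, y_prob))

pred = (y_prob >= 0.4).astype(int)                        # cutoff 0.4
sens = (pred[y_true == 1] == 1).mean()
spec = (pred[y_true == 0] == 0).mean()
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```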

(7)

Data exploration

Univariate analysis:

Preprocessing, e.g.:

CA_125 → log transform (l_ca125);

color_score {1,2,3,4} → 3 binary design variables {0,1}.

descriptive statistics, histograms…
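A small pandas sketch of these preprocessing steps; the column names ca_125 and color_score are assumed stand-ins for the study's variables:

```python
# Log transform of CA 125 and dummy coding of the 4-level color score.
import numpy as np
import pandas as pd

df = pd.DataFrame({"ca_125": [12.0, 540.0, 35.0],
                   "color_score": [1, 4, 2]})

df["l_ca125"] = np.log(df["ca_125"])                  # log transform

# color_score in {1,2,3,4} -> 3 binary design (dummy) variables,
# dropping the first level as the reference category.
dummies = pd.get_dummies(df["color_score"], prefix="colsc", drop_first=True)
df = pd.concat([df, dummies], axis=1)
print(df)
```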

Table. Demographic, serum marker, color Doppler imaging (CDI) and morphologic variables (mean ± SD or %):

Variable (symbol)                     Benign        Malignant
Demographic
  Age (age)                           45.6 ± 15.2   56.9 ± 14.6
  Postmenopausal (meno)               31.0 %        66.0 %
Serum marker
  CA 125 (log) (l_ca125)              3.0 ± 1.2     5.2 ± 1.5
CDI
  High blood flow (colsc3,4)          19.0 %        77.3 %
Morphologic
  Abdominal fluid (asc)               32.7 %        67.3 %
  Bilateral mass (bilat)              13.3 %        39.0 %
  Unilocular cyst (un)                45.8 %        5.0 %
  Multiloc/solid cyst (mulsol)        10.7 %        36.2 %
  Solid (sol)                         8.3 %         37.6 %
  Smooth wall (smooth)                56.8 %        5.7 %
  Irregular wall (irreg)              33.8 %        73.2 %
  Papillations (pap)                  12.5 %        53.2 %

(8)

Data exploration

Multivariate analysis:

factor analysis

biplots

Fig. Biplot of ovarian tumor data. The observations are plotted as points (o: benign, x: malignant); the variables are plotted as vectors from the origin.

Visualization of the correlation between the variables.

Visualization of the relations between the variables and the clusters.
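A hedged sketch of how such a biplot can be produced with PCA in scikit-learn and matplotlib; this stands in for, and does not reproduce, the authors' factor-analysis plot. X is assumed to hold the standardized variables and y the benign/malignant labels:

```python
# PCA biplot: observations as points, variables as vectors from the origin.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

def biplot(X: np.ndarray, y: np.ndarray, var_names: list[str]) -> None:
    pca = PCA(n_components=2).fit(X)
    Z = pca.transform(X)
    for cls, marker in [(0, "o"), (1, "x")]:          # o benign, x malignant
        plt.scatter(*Z[y == cls].T, marker=marker, label=f"class {cls}")
    for j, name in enumerate(var_names):              # variables as vectors
        vx, vy = pca.components_[:, j] * 3            # scaled for visibility
        plt.arrow(0, 0, vx, vy)
        plt.annotate(name, (vx, vy))
    plt.legend()
    plt.show()
```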

(9)

LS-SVM & Bayesian Framework

LS-SVM

Kernel-based method: maps the n-dimensional input vector into a higher dimensional feature space, where a linear algorithm can be applied.

The learning problem (feature space):

$f(x) = \sum_{i=1}^{N_F} w_i \varphi_i(x) + b$

Mercer's theorem: $K(x, z) = \langle \varphi(x), \varphi(z) \rangle$

Dual space:

$f(x) = \sum_{i=1}^{N} \alpha_i y_i K(x, x_i) + b$

Attractive features: good generalization performance, existence of a unique solution, foundations in statistical learning theory.

Positive definite kernel $K(\cdot,\cdot)$:

RBF kernel: $K(x, z) = \exp\{-\|x - z\|_2^2 / \sigma^2\}$

Linear kernel: $K(x, z) = x^T z$
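The two kernels written out in NumPy, following the formulas above (sigma denotes the RBF width):

```python
import numpy as np

def rbf_kernel(x: np.ndarray, z: np.ndarray, sigma: float) -> float:
    """K(x, z) = exp(-||x - z||^2 / sigma^2)."""
    d = x - z
    return float(np.exp(-(d @ d) / sigma**2))

def linear_kernel(x: np.ndarray, z: np.ndarray) -> float:
    """K(x, z) = x^T z."""
    return float(x @ z)
```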

(10)

LS-SVM

LS-SVM classifier (Suykens & Vandewalle, 1999)

Given $\{(x_i, y_i)\}_{i=1,\dots,N}$, with input data $x_i \in \mathbb{R}^p$ and corresponding output data $y_i \in \{-1, +1\}$, the following model is taken:

$f(x) = w^T \varphi(x) + b$,

where the input data $x \rightarrow \varphi(x)$ are projected to a higher dimensional feature space.

One considers the following optimization problem:

$\min_{w,b,e} J(w, e) = \frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{i=1}^{N} e_i^2$

subject to

$y_i [w^T \varphi(x_i) + b] = 1 - e_i, \quad i = 1, \dots, N$.

The Lagrangian is defined as

$L(w, b, e; \alpha) = J(w, e) - \sum_{i=1}^{N} \alpha_i \{ y_i [w^T \varphi(x_i) + b] - 1 + e_i \}$,

where the $\alpha_i$ are Lagrange multipliers.

(11)

LS-SVM

LS-SVM classifier (cont.)

Taking the Karush-Kuhn-Tucker conditions for optimality, which provide a set of linear equations, and eliminating w and e, the solution is obtained from the linear system

$\begin{bmatrix} 0 & Y^T \\ Y & \Omega + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ 1_v \end{bmatrix}$

with $Y = [y_1; \dots; y_N]$, $1_v = [1; \dots; 1]$, $\alpha = [\alpha_1; \dots; \alpha_N]$, and $\Omega_{ij} = y_i y_j \langle \varphi(x_i), \varphi(x_j) \rangle = y_i y_j K(x_i, x_j)$ for $i, j = 1, \dots, N$.

The resulting LS-SVM model for classification is

$f(x) = \mathrm{sign}\left[ \sum_{i=1}^{N} \alpha_i y_i K(x, x_i) + b \right]$

Some parameters need to be tuned:

Regularization parameter $\gamma$: determines the tradeoff between minimizing the training errors and minimizing the model complexity.

Kernel parameters, e.g. $\sigma$ for an RBF kernel.

Popular ways of choosing the hyperparameters: cross-validation, or utilizing an upper bound on the generalization error. Our approach: the Bayesian method.
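A minimal NumPy sketch of this training step: build the kernel matrix and Omega, solve the KKT linear system for (b, alpha), and classify with the sign of the dual model. This illustrates the equations above under an assumed RBF kernel; it is not the authors' implementation:

```python
import numpy as np

def lssvm_train(X, y, gamma=1.0, sigma=1.0):
    """Solve the LS-SVM linear system; returns (b, alpha)."""
    n = len(y)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise distances
    K = np.exp(-sq / sigma**2)                            # RBF kernel matrix
    Omega = np.outer(y, y) * K
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y                # [0      Y^T          ] [b    ]   [0  ]
    A[1:, 0] = y                # [Y  Omega + I/gamma  ] [alpha] = [1_v]
    A[1:, 1:] = Omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]

def lssvm_predict(X_train, y, b, alpha, x, sigma=1.0):
    """Classify a new point with the dual model sign[sum a_i y_i K + b]."""
    k = np.exp(-((X_train - x) ** 2).sum(-1) / sigma**2)
    return np.sign((alpha * y * k).sum() + b)
```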

(12)

Bayesian Evidence Framework

Bayesian Evidence Framework (MacKay 1993)

Probability theory and Occam’s razor

Bayesian probability theory provides a unifying framework for data modeling.

Occam’s razor is needed for model comparison.

Each model Hi is assumed to have:

a vector of parameters w;

a prior distribution P(w |Hi);

a set of probability distributions, one for each value of w, defining the predictions P(D | w, Hi) that the model makes about the data.

(13)

Bayesian Evidence Framework

Probability theory and Occam's razor

Models $H_i$ are ranked by evaluating the evidence:

(1) Model fitting: $P(w \mid D, H_i) = \dfrac{P(D \mid w, H_i)\, P(w \mid H_i)}{P(D \mid H_i)}$

(2) Model comparison: $P(H_i \mid D) \propto P(D \mid H_i)\, P(H_i)$

Assuming equal priors $P(H_i)$ for the alternative models, the models are ranked by the evidence $P(D \mid H_i)$.

Model fitting: evaluate the most probable values $w_{MP}$ and summarize the posterior distribution by $w_{MP}$ and error bars; evaluating the Hessian $A$ at $w_{MP}$, the posterior can be locally approximated as a Gaussian with covariance matrix $A^{-1}$.

Evaluating the evidence: if the posterior is well approximated by a Gaussian, then $P(D \mid H_i) \simeq P(D \mid w_{MP}, H_i)\, P(w_{MP} \mid H_i)\, (2\pi)^{k/2} \det(A)^{-1/2}$, the product of the best-fit likelihood and the Occam factor.

(14)

Bayesian Evidence Framework for LS-SVM

A Bayesian framework for LS-SVM classifiers (Van Gestel and Suykens, 2001)

Starting from the feature space formulation, analytic expressions are obtained in the dual space on the three levels of Bayesian inference.

Posterior class probabilities are obtained by marginalizing over the model parameters.

For a classification problem with binary targets $y_i = \pm 1$, the LS-SVM cost function can also be formulated as

$J(w, b) = \mu E_W + \zeta E_D$, with regularization term $E_W = \frac{1}{2} w^T w$ and sum-of-squares error $E_D = \frac{1}{2} \sum_{i=1}^{N} e_i^2$,

subject to $y_i [w^T \varphi(x_i) + b] = 1 - e_i$, while the amount of regularization is determined by $\gamma = \zeta / \mu$.

(15)

Bayesian Evidence Framework for LS-SVM

Probability interpretation of the LS-SVM classifier (Level 1)

Applying Bayes' rule, the first level of inference is obtained:

$p(w, b \mid D, \mu, \zeta, H) \propto p(D \mid w, b, \mu, \zeta, H)\, p(w, b \mid \mu, \zeta, H)$

Assume: the data points are independent and the targets have Gaussian noise $e_i$, with noise level $\sigma^2 = 1/\zeta$.

Assume: separate Gaussian priors for $w$ and $b$, with $\sigma_w^2 = 1/\mu$ and $\sigma_b \to \infty$ (uniform distribution).

$w_{MP}$ and $b_{MP}$ are obtained by solving a standard LS-SVM in the dual space.

The posterior of the model parameters $w$ and $b$ is then a Gaussian centered at $(w_{MP}, b_{MP})$.

(16)

Bayesian Evidence Framework for LS-SVM

Posterior class probability for the LS-SVM classifier (Level 1)

Marginalizing over $w$ yields a Gaussian distributed $e_{\pm}$ with mean $m_{\pm}$ and variance $\sigma_{\pm}^2$; these determine the conditional probabilities $p(x \mid y = \pm 1, D, H)$, calculated in the dual space.

The posterior class probability follows from Bayes' rule and can incorporate the prior class probabilities or misclassification costs.

In our experiments, the priors were P(y = +1) = 2/3 and P(y = -1) = 1/3.
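A small sketch of how the posterior class probability can be formed from the two Gaussian densities of e± and the class priors; m_plus, m_minus and the variances are assumed inputs computed in the dual space as described above:

```python
# Posterior P(y = +1 | x, D, H) from Gaussian class-conditional
# densities of e and unequal class priors (2/3 vs 1/3, as on the slide).
import numpy as np

def posterior_prob(e, m_plus, s2_plus, m_minus, s2_minus,
                   prior_plus=2/3, prior_minus=1/3):
    def gauss(x, m, s2):
        return np.exp(-(x - m) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
    num = prior_plus * gauss(e, m_plus, s2_plus)
    den = num + prior_minus * gauss(e, m_minus, s2_minus)
    return num / den
```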

(17)

Bayesian Evidence Framework for LS-SVM

Inference of Hyperparameters (Level 2)

Applying Bayes' rule, the second level of inference is obtained:

$p(\mu, \zeta \mid D, H) \propto p(D \mid \mu, \zeta, H)\, p(\mu, \zeta \mid H)$,

where the evidence $p(D \mid \mu, \zeta, H)$ is the normalizing constant of level 1.

Assume: a uniform distribution in $\log \mu$ and $\log \zeta$.

The evidence leads to an eigenvalue problem for the centered kernel matrix. A practical way to find $\mu_{MP}$ and $\zeta_{MP}$ is to first solve a scalar minimization problem in $\gamma = \zeta / \mu$.

The number of effective parameters is $\gamma_{eff} = 1 + \sum_i \frac{\zeta \lambda_i}{\mu + \zeta \lambda_i}$, with $\lambda_i$ the eigenvalues of the centered kernel matrix.
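A one-line sketch of the effective-parameter count, assuming (following Van Gestel et al., 2002) that it is computed from the eigenvalues lam of the centered kernel matrix and the level-2 hyperparameters mu and zeta:

```python
import numpy as np

def effective_parameters(lam: np.ndarray, mu: float, zeta: float) -> float:
    """gamma_eff = 1 + sum_i zeta*lam_i / (mu + zeta*lam_i)."""
    return 1.0 + float(np.sum(zeta * lam / (mu + zeta * lam)))
```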

(18)

Bayesian Evidence Framework for LS-SVM

Bayesian model comparison (Level 3)

Applying Bayes' rule, the third level of inference is obtained:

$P(H_i \mid D) \propto P(D \mid H_i)\, P(H_i)$

Assuming a uniform prior distribution over the models, the models are ranked by their evidence $P(D \mid H_i)$.

(19)

Bayesian Evidence Framework for LS-SVM - design

Preprocess the data

Normalize the training data to zero mean and unit variance.

The test set follows the same normalization as the training set.

Hyperparameter tuning

Select the model $H_i$ by choosing a kernel type $K_i$ and kernel parameter, e.g. $\sigma$ for RBF kernels. Then the optimal regularization parameter $\gamma$ for model $H_i$ is estimated on the second level of inference.

The corresponding $\mu_{MP}$, $\zeta_{MP}$ and the number of effective parameters $\gamma_{eff}$ can also be estimated. Compute the model evidence $P(D \mid H_i)$ at the third level of inference.

For a kernel $K_i$ with tuning parameters, refine the tuning parameters (e.g. $\sigma$) such that a higher model evidence $P(D \mid H_i)$ is obtained.
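A sketch of this design procedure: normalize with training-set statistics, then scan the RBF width and keep the value with the highest evidence. log_evidence is a hypothetical helper standing in for the level-2/level-3 evidence computation of the framework:

```python
import numpy as np

def normalize(X_train, X_test):
    """Zero mean, unit variance; test set uses the training statistics."""
    m, s = X_train.mean(0), X_train.std(0)
    return (X_train - m) / s, (X_test - m) / s

def select_sigma(X, y, sigmas, log_evidence):
    """Pick the RBF width with the highest model evidence."""
    return max(sigmas, key=lambda s: log_evidence(X, y, sigma=s))
```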

(20)

Bayesian Evidence Framework for LS-SVM - design

Input selection under the Bayesian evidence framework

Given a certain type of kernel, perform a forward selection (greedy search), as sketched below:

Starting from zero variables, the variable which gives the greatest increase in the current model evidence is chosen at each iteration step.

The selection is stopped when adding any remaining variable can no longer increase the model evidence.

10 variables were selected based on the training set (data from the first treated 265 patients), using an RBF kernel: l_ca125, pap, sol, colsc3, bilat, meno, asc, …
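A sketch of the greedy search; log_evidence is again a hypothetical helper that trains an LS-SVM on the candidate variable subset and returns its model evidence:

```python
# Forward selection: add the variable with the largest evidence gain,
# stop when no remaining variable improves the evidence.
def forward_select(X, y, log_evidence):
    selected, remaining = [], list(range(X.shape[1]))
    best = float("-inf")
    while remaining:
        gains = {j: log_evidence(X[:, selected + [j]], y) for j in remaining}
        j_best = max(gains, key=gains.get)
        if gains[j_best] <= best:          # no variable improves the evidence
            break
        best = gains[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected
```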

(21)

Bayesian Evidence Framework for LS-SVM - design

Sparse approximation

Due to the choice of 2-norm in cost function, LS-SVM lost the sparseness compared with standard SVMs.

Sparseness can be imposed to LS-SVM by a pruning procedure based upon the support values 

i

=e

i

.

We propose to prune the data points which have negative support values.

Intuitively, pruning of easy examples will focus the model on the harder cases which lie around the decision boundary.

Iteratively prune the data with negative i, the hyper parameters are retuned several times based on the reduced data set using the Bayesian evidence framework.

Stop when no more support values are negative.
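A sketch of this pruning loop, reusing the lssvm_train solver sketched earlier; retune is a hypothetical helper standing in for the Bayesian retuning of the hyperparameters:

```python
# Iteratively drop points with negative support values alpha_i and
# retrain until all alpha_i >= 0.
def sparse_approximate(X, y, gamma, sigma, retune):
    while True:
        b, alpha = lssvm_train(X, y, gamma=gamma, sigma=sigma)
        keep = alpha >= 0
        if keep.all():
            return X, y, b, alpha
        X, y = X[keep], y[keep]            # prune the easy points
        gamma, sigma = retune(X, y)        # Bayesian retuning (hypothetical)
```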

(22)

Model Evaluation - Temporal Validation

Training set: data from the first treated 265 patients.

Test set: data from the most recently treated 160 patients.

[Figures: ROC curves on the training set and on the test set for LS-SVM (RBF), LS-SVM (linear), LR and RMI.]

Performance on the test set (for each model, the first row uses probability cutoff 0.4, the second row cutoff 0.3):

MODEL TYPE     AUC      Accuracy  Sensitivity  Specificity
RMI            0.8733   78.13     74.07        80.19
                        76.88     81.48        74.53
LR1            0.9111   80.63     75.96        83.02
                        80.63     77.78        82.08
LS-SVM1 (LIN)  0.9141   81.25     77.78        83.02
                        81.88     83.33        81.13
LS-SVM1 (RBF)  0.9184   83.13     81.48        83.96
                        84.38     85.19        83.96

(23)

Model Evaluation - Randomized Cross-validation

Randomly separate the data into a training set (n = 265) and a test set (n = 160).

Stratified: #benign : #malignant ≈ 2 : 1 for each training and test set.

Repeat 30 times.

Averaged performance over 30 runs of validation (for each model, the first row uses probability cutoff 0.5, the second row cutoff 0.4; the AUC standard deviation is given in parentheses):

MODEL TYPE     AUC (SD)          Accuracy  Sensitivity  Specificity
RMI            0.8882 (0.0318)   82.6      81.73        83.06
                                 81.1      83.87        79.85
LR1            0.9397 (0.0238)   83.3      89.33        80.55
                                 81.9      91.6         77.55
LS-SVM1 (LIN)  0.9405 (0.0236)   84.3      87.4         82.91
                                 82.8      90.47        79.27
LS-SVM1 (RBF)  0.9424 (0.0232)   84.9      86.53        84.09
                                 83.5      90           80.58

[Figure: Expected ROC curves on validation.]

(24)

Conclusions

Summary

Exploratory data analysis helps in understanding the structure of the data set.

Under the Bayesian evidence framework, the choice of the regularization and kernel parameters for the LS-SVM classifier can be done in a unified way, without the need for a separate validation set.

A forward input selection procedure which tries to maximize the model evidence proved able to identify the subset of important variables for model building.

A sparse approximation can further improve the generalization performance of the LS-SVM classifiers.

LS-SVMs have the potential to give reliable preoperative predictions of malignancy of ovarian tumors.

Future work

A larger scale validation is still needed.

Hybrid methodologies, e.g. combining Bayesian networks with LS-SVM learning, might be more promising.
