Blackbox classifiers for preoperative discrimination between malignant and benign ovarian tumors


C. Lu¹, T. Van Gestel¹, J. A. K. Suykens¹, S. Van Huffel¹, I. Vergote², D. Timmerman²

¹ Department of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium
² Department of Obstetrics and Gynecology, University Hospitals Leuven, Leuven, Belgium

Email address: chuan.lu@esat.kuleuven.ac.be


Variable (symbol)                  Benign         Malignant
Demographic
  Age (age)                        45.6 ± 15.2    56.9 ± 14.6
  Postmenopausal (meno)            31.0 %         66.0 %
Serum marker
  CA 125 (log) (l_ca125)           3.0 ± 1.2      5.2 ± 1.5
CDI
  High color score (colsc3,4)      19.0 %         77.3 %
Morphologic
  Abdominal fluid (asc)            32.7 %         67.3 %
  Bilateral mass (bilat)           13.3 %         39.0 %
  Unilocular cyst (un)             45.8 %         5.0 %
  Multiloc/solid cyst (mulsol)     10.7 %         36.2 %
  Solid (sol)                      8.3 %          37.6 %
  Smooth wall (smooth)             56.8 %         5.7 %
  Irregular wall (irreg)           33.8 %         73.2 %
  Papillations (pap)               12.5 %         53.2 %

Demographic, serum marker, color Doppler imaging (CDI) and morphologic variables

Figure: Biplot of ovarian tumor data, visualizing the correlation between the variables and the relations between the variables and clusters.

1. Introduction

- Ovarian masses are a common problem in gynecology. A reliable test for preoperative discrimination between benign and malignant ovarian tumors is of considerable help to clinicians in choosing appropriate treatments for patients.

- In this study, we develop and evaluate several blackbox models, particularly multi-layer perceptrons (MLPs) and least squares support vector machines (LS-SVMs), both within the Bayesian evidence framework, to preoperatively predict malignancy of ovarian tumors. Model performance is assessed via Receiver Operating Characteristic (ROC) curve analysis.

2. Data

Biplot legend: o = benign case, x = malignant case


ROC curves

- are constructed by plotting the sensitivity (true positive rate) versus 1 - specificity (false positive rate) for varying probability cutoff levels;

- visualize the relationship between sensitivity and specificity of a test.

- The area under the ROC curve (AUC) measures the probability that the classifier assigns a higher score to a randomly chosen event (malignant case) than to a randomly chosen nonevent (benign case).
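As a minimal sketch of the ROC construction described above, the following Python code sweeps the cutoff over all predicted scores, collects the (false positive rate, true positive rate) points, and computes the AUC both by the trapezoid rule and by direct pair counting. The function names and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
import numpy as np

def roc_curve_points(y_true, scores):
    """Return (fpr, tpr) arrays obtained by sweeping the cutoff over all scores."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    # Cutoffs: +inf (nothing called positive), then each distinct score, highest first.
    cutoffs = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    n_pos = y_true.sum()
    n_neg = (~y_true).sum()
    tpr = np.array([(scores[y_true] >= c).sum() / n_pos for c in cutoffs])
    fpr = np.array([(scores[~y_true] >= c).sum() / n_neg for c in cutoffs])
    return fpr, tpr

def auc_trapezoid(fpr, tpr):
    """Area under the piecewise-linear ROC curve."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

def auc_rank(y_true, scores):
    """AUC as P(score of a positive > score of a negative), ties counted as 1/2."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true][:, None]
    neg = scores[~y_true][None, :]
    return float(((pos > neg) + 0.5 * (pos == neg)).mean())

# Toy example (invented values): 1 = malignant, 0 = benign.
y = [1, 1, 0, 1, 0, 0]
p = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
fpr, tpr = roc_curve_points(y, p)
print(auc_trapezoid(fpr, tpr), auc_rank(y, p))
```

Both estimates agree on this toy input, which illustrates that the area under the curve and the pairwise ranking probability are the same quantity.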


Patient data: University Hospitals Leuven, 1994~1999; 425 records, 25 features, 32% malignant.

Procedure of developing models to predict the malignancy of ovarian tumors:

- Data exploration
  - Univariate analysis: preprocessing, descriptive statistics
  - Multivariate analysis: PCA, factor analysis, stepwise logistic regression
- Input variable selection: Bayesian LS-SVM (RBF, linear), forward selection (max. evidence)
- Model development
  - Model building: Bayesian LS-SVM + sparse approximation, Bayesian MLP
  - Model evaluation: ROC analysis (AUC), cross validation (temporal, random)

Goal: find a model

- with high sensitivity for malignancy and a low false positive rate;
- providing the probability of malignancy for each individual patient.

3. Methods


4. Bayesian MLPs and Bayesian LS-SVMs for classification

LS-SVM Classifier

(Van Gestel, Suykens 2002)

The following model is taken:

$$\min_{w,b} J(w,b) = \frac{\mu}{2} w^T w + \frac{\zeta}{2} \sum_{i=1}^{N} e_i^2$$

$$\text{s.t. } y_i [w^T \varphi(x_i) + b] = 1 - e_i, \quad i = 1, \ldots, N,$$

with regularizer $\gamma = \zeta/\mu$ and model output $f(x) = w^T \varphi(x) + b$.

The problem is solved in dual space. With

$$Y = [y_1, \ldots, y_N]^T, \quad 1_v = [1, \ldots, 1]^T, \quad e = [e_1, \ldots, e_N]^T, \quad \alpha = [\alpha_1, \ldots, \alpha_N]^T,$$

$$\Omega_{ij} = y_i y_j \varphi(x_i)^T \varphi(x_j) = y_i y_j K(x_i, x_j),$$

e.g. RBF kernel: $K(x, z) = \exp\{-\|x - z\|_2^2 / \sigma^2\}$; linear kernel: $K(x, z) = z^T x$,

the resulting linear system is

$$\begin{bmatrix} 0 & Y^T \\ Y & \Omega + \gamma^{-1} I_N \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ 1_v \end{bmatrix}.$$

Resulting classifier:

$$y(x) = \mathrm{sign}\Big[\sum_{i=1}^{N} \alpha_i y_i K(x, x_i) + b\Big]$$

Computing posterior class probabilities

Introduce new error variables $e_\pm = w^T(\varphi(x) - \hat{m}_\pm)$, with $\hat{m}_\pm$ the center of class $\pm$ in feature space. Then

$$p(x \mid y = \pm 1, D, H) = \big(2\pi(\zeta_\pm^{-1} + \sigma_{e_\pm}^2)\big)^{-1/2} \exp\Big(-\frac{m_{e_\pm}^2}{2(\zeta_\pm^{-1} + \sigma_{e_\pm}^2)}\Big),$$

where $m_{e_\pm} = w_{MP}^T(\varphi(x) - \hat{m}_\pm)$, and $\zeta_\pm^{-1}$ and $\sigma_{e_\pm}^2$ are the variances of $e_\pm$ due to target noise and to uncertainty in $w$, respectively.

$$P(y \mid x, D, H) = \frac{P(y)\, p(x \mid y, D, H)}{\sum_{y' = \pm 1} P(y')\, p(x \mid y', D, H)},$$

with $P(y)$ the prior class probability.
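The LS-SVM classifier can be sketched in a few lines of NumPy, since training reduces to one linear solve in the dual variables. This is a minimal sketch assuming the standard LS-SVM classifier system of Suykens and Vandewalle, [[0, Y^T], [Y, Omega + I/gamma]] [b; alpha] = [0; 1_v] with Omega_ij = y_i y_j K(x_i, x_j), and an RBF kernel. The function names, toy data and hyperparameter values (gamma = 10, sigma = 2) are illustrative assumptions; the Bayesian inference of hyperparameters is not included.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """K(x, z) = exp(-||x - z||^2 / sigma^2) for all rows of A against all rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)

def lssvm_train(X, y, gamma, sigma):
    """Solve the dual linear system; labels y must be in {-1, +1}."""
    N = len(y)
    Omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = y                      # first row:    [0, Y^T]
    A[1:, 0] = y                      # first column: [0; Y]
    A[1:, 1:] = Omega + np.eye(N) / gamma
    rhs = np.concatenate(([0.0], np.ones(N)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]            # b, alpha

def lssvm_predict(X_train, y_train, alpha, b, sigma, X_new):
    """y(x) = sign[sum_i alpha_i y_i K(x, x_i) + b]."""
    K = rbf_kernel(X_new, X_train, sigma)
    return np.sign(K @ (alpha * y_train) + b)

# Toy data (invented): two well-separated clusters in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])
y = np.concatenate([-np.ones(20), np.ones(20)])
gamma, sigma = 10.0, 2.0              # assumed values, not tuned by evidence
b, alpha = lssvm_train(X, y, gamma, sigma)
pred = lssvm_predict(X, y, alpha, b, sigma, X)
print("training accuracy:", (pred == y).mean())
```

Because the equality constraints replace the inequality constraints of a standard SVM, every training point receives a nonzero alpha, which is why the poster mentions a sparse approximation step.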

 

Bayesian Evidence Framework

Inferences are divided into distinct levels. Here $H$ denotes the model: for an MLP, the network structure (e.g. the number of hidden neurons); for an LS-SVM, the kernel parameter (e.g. $\sigma$ for the RBF kernel). $\theta$ denotes the hyperparameter.

Level 1: infer $w, b$ for given $\theta, H$:

$$p(w, b \mid D, \theta, H) = \frac{p(D \mid w, b, \theta, H)\, p(w, b \mid \theta, H)}{p(D \mid \theta, H)} \propto \exp(-J(w, b))$$

=> the maximum a posteriori estimates of $w$ and $b$ are the solution of the basic MLP/LS-SVM classifier.

Level 2: infer the hyperparameter $\theta$:

$$p(\theta \mid D, H) = \frac{p(D \mid \theta, H)\, p(\theta \mid H)}{p(D \mid H)}$$

Level 3: compare models:

$$p(H_j \mid D) = \frac{p(D \mid H_j)\, p(H_j)}{p(D)} \propto p(D \mid H_j)\, p(H_j)$$

Choose the $H_j$ which maximizes the model evidence $p(D \mid H_j)$.

Consider a binary classification problem, given $D = \{(x_i, y_i)\}_{i=1,\ldots,N}$, where $x_i \in \mathbb{R}^p$, $y_i \in \{0, 1\}$ in case of MLP, $y_i \in \{-1, +1\}$ in case of LS-SVM.

MLP Classifiers

(MacKay 1992)

Consider the one hidden layer MLP:

$$f(x) = g(a(x, w)), \quad \text{where } a(x, w) = w^{(2)} g'(w^{(1)} x + b^{(1)}) + b^{(2)},$$

with activation function of

hidden layer: $g'(a) = \tanh(a) = \dfrac{\exp(a) - \exp(-a)}{\exp(a) + \exp(-a)}$,

output layer: logistic function $g(a) = \dfrac{1}{1 + \exp(-a)}$.

$$\min_{w,b} J(w, b) = \frac{\mu}{2} w^T w + G, \quad \text{with regularizer } \mu,$$

where the cross entropy error function is

$$G = -\sum_{i=1}^{N} \{ y_i \log f(x_i) + (1 - y_i) \log(1 - f(x_i)) \}.$$

The posterior class probability can be approximated:

$$P(y = 1 \mid x, D, H) \approx g\Big(\kappa(s)\, a_{MP}(x, w) + \log\frac{N_0}{N_1} + \log\frac{P(y = 1)}{P(y = 0)}\Big),$$

where $\kappa(s) = 1/\sqrt{1 + \pi s^2 / 8}$, $s^2$ is $\mathrm{var}(a \mid x)$, $N_1$ and $N_0$ are the numbers of training cases with $y = 1$ and $y = 0$, and $P(y = 1)$ is the prior class probability.
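The MLP classifier above can be illustrated with a small NumPy sketch: a one-hidden-layer network with tanh hidden units and a logistic output, the cross entropy error, and MacKay's moderated output g(kappa(s) a) with kappa(s) = 1/sqrt(1 + pi s^2 / 8). The 10-2-1 layout matches the network size reported in the results tables; the random weights, the input vector and the variance value are placeholders, and training, evidence maximization and the prior adjustment are deliberately left out.

```python
import numpy as np

def logistic(a):
    """Output-layer activation g(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def mlp_activation(x, W1, b1, w2, b2):
    """a(x, w) = w2 . tanh(W1 x + b1) + b2 for a single input vector x."""
    return float(w2 @ np.tanh(W1 @ x + b1) + b2)

def cross_entropy(y, f):
    """G = -sum_i { y_i log f_i + (1 - y_i) log(1 - f_i) } with y_i in {0, 1}."""
    y = np.asarray(y, dtype=float)
    f = np.asarray(f, dtype=float)
    return float(-np.sum(y * np.log(f) + (1.0 - y) * np.log(1.0 - f)))

def moderated_output(a_mp, s2):
    """Approximate P(y=1 | x, D) by g(kappa(s) * a_MP), kappa = (1 + pi s^2 / 8)^(-1/2)."""
    kappa = 1.0 / np.sqrt(1.0 + np.pi * s2 / 8.0)
    return float(logistic(kappa * a_mp))

# Placeholder 10-2-1 network with random (untrained) weights.
rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(2, 10)), rng.normal(size=2)
w2, b2 = rng.normal(size=2), 0.0
x = rng.normal(size=10)               # one invented input vector

a = mlp_activation(x, W1, b1, w2, b2)
print("plain output:    ", float(logistic(a)))
print("moderated output:", moderated_output(a, s2=4.0))  # s2 is an assumed variance
```

The moderated output is always pulled toward 0.5 relative to the plain output, and the more so the larger the variance of the activation, which is how uncertainty in the weights lowers the confidence of a prediction.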


Computing posterior class probabilities for minimum risk decision making

Incorporate the different misclassification costs into the class priors. For example, set the adjusted prior probabilities for the malignant and benign classes to 2/3 and 1/3.
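As a worked example of folding misclassification costs into the priors, the sketch below rescales each class prior by its misclassification cost and renormalizes, i.e. P'(y=+1) = P(y=+1) c+ / (P(y=+1) c+ + P(y=-1) c-). The function name is hypothetical, and the cost ratio of 4.25 is not stated on the poster: it is simply the value that maps the 32% malignancy rate of the data set to the adjusted prior of 2/3.

```python
def adjust_priors(p_pos, c_pos, c_neg):
    """Return cost-adjusted priors (P'(y=+1), P'(y=-1)).

    p_pos: prior probability of the positive (malignant) class
    c_pos: cost of misclassifying a positive case
    c_neg: cost of misclassifying a negative case
    """
    p_neg = 1.0 - p_pos
    z = p_pos * c_pos + p_neg * c_neg   # normalizing constant
    return p_pos * c_pos / z, p_neg * c_neg / z

# 32% malignant in the data; assumed cost ratio c+ : c- = 4.25 : 1.
adj_pos, adj_neg = adjust_priors(0.32, 4.25, 1.0)
print(adj_pos, adj_neg)
```

With equal costs the priors are returned unchanged, so the adjustment only matters when missing a malignant tumor is considered more costly than a false alarm.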

5. Experimental results

RMI: risk of malignancy index = score_morph × score_meno × CA125

- Training set: data from the first 265 treated patients

- Test set: data from the latest 160 treated patients

Performance from temporal validation

Figure: ROC curve on test set

Performance on test set

MODEL TYPE      AUC      Cutoff   Accuracy (%)   Sensitivity (%)   Specificity (%)
RMI             0.8733   0.4      78.13          74.07             80.19
                         0.3      76.88          81.48             74.53
MLP (10-2-1)    0.9174   0.4      83.13          81.48             83.96
                         0.3      81.87          83.33             81.13
LS-SVM (LIN)    0.9141   0.4      81.25          77.78             83.02
                         0.3      81.88          83.33             81.13
LS-SVM (RBF)    0.9184   0.4      83.13          81.48             83.96
                         0.3      84.38          85.19             83.96

Input variable selection

The forward selection procedure tries to maximize the model evidence of the LS-SVM given a certain type of kernel.

10 variables were selected using RBF kernels:

l_ca125, pap, sol, colsc3, bilat, meno, asc, shadows, colsc4, irreg
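The greedy structure of forward selection can be sketched independently of how the score is computed. In the sketch below, `score` is a hypothetical callable standing in for the LS-SVM model evidence, which is not implemented here; the variable names `v1`..`v3` and the toy gains are invented purely to make the example runnable.

```python
def forward_selection(candidates, score, max_vars=None):
    """Greedily add the variable that most increases score(subset).

    Stops when no remaining variable improves the score, or when
    max_vars variables have been selected.
    """
    selected = []
    best = float("-inf")
    remaining = list(candidates)
    while remaining and (max_vars is None or len(selected) < max_vars):
        # Score every one-variable extension of the current subset.
        trials = [(score(selected + [v]), v) for v in remaining]
        top_score, top_var = max(trials, key=lambda t: t[0])
        if top_score <= best:
            break                      # no extension improves the score
        selected.append(top_var)
        remaining.remove(top_var)
        best = top_score
    return selected, best

# Toy score (invented): summed gains minus a complexity penalty per variable.
toy_gain = {"v1": 3.0, "v2": 2.0, "v3": 0.2}

def toy_score(subset):
    return sum(toy_gain[v] for v in subset) - 0.5 * len(subset)

chosen, final_score = forward_selection(toy_gain, toy_score)
print(chosen, final_score)
```

In the toy run the third variable is rejected because its gain does not outweigh the penalty, which mirrors how model evidence trades data fit against model complexity.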

Cost-adjusted class priors:

$$P'(y = \pm 1) = \frac{P(y = \pm 1)\, c_\pm}{P(y = +1)\, c_+ + P(y = -1)\, c_-},$$

where $c_+$, $c_-$ denote the cost of misclassifying a case from class '+' and '-', respectively.

- The forward selection procedure, which tries to maximize the evidence of the LS-SVM model, is able to identify the important variables.

- The performance of LS-SVMs and MLPs is comparable.

- Both models have the potential to give reliable preoperative prediction of malignancy of ovarian tumors.

- A larger scale validation is needed.

References

1. C. Lu, T. Van Gestel, et al., Preoperative prediction of malignancy of ovarian tumors using least squares support vector machines (2002), submitted paper.

2. D. Timmerman, H. Verrelst, et al., Artificial neural network models for the preoperative discrimination between malignant and benign adnexal masses, Ultrasound Obstet Gynecol (1999).

3. J.A.K. Suykens, J. Vandewalle, Least squares support vector machine classifiers, Neural Processing Letters (1999), 9(3).

4. T. Van Gestel, J.A.K. Suykens, et al., Bayesian framework for least squares support vector machine classifiers, Gaussian processes and kernel Fisher discriminant analysis, Neural Computation (2002), 14(5).

5. D.J.C. MacKay, The evidence framework applied to classification networks, Neural Computation (1992), 4(5).

Performance from randomized cross-validation (30 runs)

- Randomly separate the data into a training set (n=265) and a test set (n=160).
- Stratified: #benign : #malignant ≈ 2:1 in each training and test set.
- Repeat 30 times.

Averaged performance on 30 runs of validation

MODEL TYPE      mAUC (SD)          Cutoff   Accuracy (%)   Sensitivity (%)   Specificity (%)
RMI             0.8882 (0.0318)    100      82.65          81.73             83.06
                                   80       81.10          83.87             79.85
MLP (10-2-1)    0.9409 (0.0198)    0.6      84.46          87.20             83.21
                                   0.5      82.17          90.80             78.24
LS-SVM (LIN)    0.9405 (0.0236)    0.5      84.31          87.40             82.91
                                   0.4      82.77          90.47             79.27
LS-SVM (RBF)    0.9424 (0.0232)    0.5      84.85          86.53             84.09
                                   0.4      83.52          90.00             80.58
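The randomized validation scheme (stratified 265/160 splits, repeated 30 times) can be sketched as follows. This is an illustrative sketch only: the label vector is simulated with 136 malignant cases out of 425 (an assumption derived from the stated 32%), the function name is hypothetical, and the model fitting inside the loop is left as a comment.

```python
import numpy as np

def stratified_split(y, n_train, rng):
    """Split indices into train/test while preserving class proportions.

    Per-class training counts are rounded, so in general the training set
    size may differ from n_train by one; it is exact for the sizes used here.
    """
    y = np.asarray(y)
    train_parts = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        rng.shuffle(idx)               # random order within the class
        k = int(round(n_train * len(idx) / len(y)))
        train_parts.append(idx[:k])
    train_idx = np.concatenate(train_parts)
    test_idx = np.setdiff1d(np.arange(len(y)), train_idx)
    return train_idx, test_idx

# Simulated labels: 1 = malignant (136 cases), 0 = benign (289 cases).
labels = np.concatenate([np.ones(136, dtype=int), np.zeros(289, dtype=int)])
rng = np.random.default_rng(0)

splits = []
for run in range(30):
    train_idx, test_idx = stratified_split(labels, n_train=265, rng=rng)
    # A model would be trained on train_idx and its AUC computed on test_idx here.
    splits.append((train_idx, test_idx))

print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Averaging the test AUC over such repeated splits gives the mean and standard deviation reported in the mAUC (SD) column.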
