Preoperative Prediction of Malignancy of Ovarian Tumors Using Least Squares Support Vector Machines

C. Lu (1), T. Van Gestel (1), J. A. K. Suykens (1), S. Van Huffel (1), D. Timmerman (2), I. Vergote (2)
(1) Department of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium
(2) Department of Obstetrics and Gynecology, University Hospitals Leuven, Leuven, Belgium
Overview
Introduction
Data Exploration
LS-SVM and Bayesian evidence framework
LS-SVM classifier
Bayesian evidence framework
Input Selection
Sparse Approximation
Model Building and Model Evaluation
Conclusions
Introduction
Problem
Ovarian masses: a common problem in gynecology (1 in 70 women).
Ovarian cancer: high mortality rate.
Early detection of ovarian cancer is difficult.
Treatment and management of the different types of ovarian tumors differ greatly.
Goal: develop a reliable diagnostic tool to preoperatively discriminate between benign and malignant tumors, and thereby assist clinicians in choosing the appropriate treatment.
Techniques for preoperative evaluation
Serum tumor marker: CA 125 blood test
Transvaginal ultrasonography
Color Doppler imaging and blood flow indexing
Logistic Regression
Artificial neural networks
Support Vector Machines
Introduction (cont.)
Attempts to automate the diagnosis
Risk of Malignancy Index (RMI) (Jacobs et al.): $\mathrm{RMI} = \mathrm{score}_{morph} \times \mathrm{score}_{meno} \times \mathrm{CA125}$ (see the sketch below)
Mathematical models:
Bayesian belief network
Hybrid methods
Least squares SVM (LS-SVM) with the Bayesian evidence framework (this work)
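As a point of reference, a minimal sketch (our own illustration) of the RMI computation; the concrete scoring conventions in the docstring are the commonly cited ones from Jacobs et al. and are an assumption, not taken from these slides:

```python
def rmi(score_morph: int, score_meno: int, ca125: float) -> float:
    """Risk of Malignancy Index: RMI = score_morph x score_meno x CA125.

    In the commonly cited scheme the morphology score is 0, 1 or 3
    (depending on the number of suspicious ultrasound features) and the
    menopausal score is 1 (premenopausal) or 3 (postmenopausal); CA 125
    is the serum level in U/ml. These conventions are assumptions here.
    """
    return score_morph * score_meno * ca125

# Example: postmenopausal patient, several ultrasound features, CA 125 = 120 U/ml
print(rmi(3, 3, 120.0))  # 1080.0
```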
Introduction (cont.)
Data
Patient data collected at University Hospitals Leuven, Belgium, 1994-1999.
425 records, 25 features.
291 (68%) benign tumors, 134 (32%) malignant tumors.
Introduction (cont.)
Development process
Exploratory data analysis:
data preprocessing,
univariate analysis,
PCA, factor analysis, ...
Input selection
Model training
Model evaluation
Performance measure:
receiver operating characteristic (ROC) analysis
Goals:
high sensitivity for malignancy at a low false positive rate;
provide a probability of malignancy for each individual patient.
ROC curves
Constructed by plotting the sensitivity versus 1 - specificity (the false positive rate) for varying probability cutoff levels.
Visualize the relationship between the sensitivity and the specificity of a test.
Area under the ROC curve (AUC)
Measures the probability that the classifier ranks a randomly chosen event (malignant case) above a randomly chosen nonevent (benign case).
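For concreteness, a minimal sketch (our own, using scikit-learn, not part of the original work) of how an ROC curve and its AUC are computed from predicted probabilities:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Toy data: y_true holds class labels (1 = malignant), y_score the
# predicted probabilities of malignancy.
y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.10, 0.35, 0.80, 0.65, 0.20, 0.90])

# fpr = 1 - specificity, tpr = sensitivity; one point per cutoff level.
fpr, tpr, cutoffs = roc_curve(y_true, y_score)
print("AUC =", roc_auc_score(y_true, y_score))
```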
Data Exploration
Univariate analysis
Preprocessing, e.g.:
CA_125 -> log transform (l_ca125),
color_score in {1, 2, 3, 4} -> 3 design (dummy) variables in {0, 1}.
Descriptive statistics, histograms, ...
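A minimal pandas sketch (our own) of these two preprocessing steps; the column names mirror the slides, the toy values are invented:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"CA_125": [12.0, 430.0, 88.0, 61.0],
                   "color_score": [1, 2, 3, 4]})

df["l_ca125"] = np.log(df["CA_125"])            # CA 125 -> log scale

# 4 categories -> 3 design variables (level 1 is the reference level).
dummies = pd.get_dummies(df["color_score"], prefix="colsc", drop_first=True)
df = pd.concat([df.drop(columns="color_score"), dummies], axis=1)
print(df.columns.tolist())  # ['CA_125', 'l_ca125', 'colsc_2', 'colsc_3', 'colsc_4']
```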
Variable (symbol)                 Benign         Malignant
Demographic
  Age (age)                       45.6 ± 15.2    56.9 ± 14.6
  Postmenopausal (meno)           31.0 %         66.0 %
Serum marker
  CA 125 (log) (l_ca125)          3.0 ± 1.2      5.2 ± 1.5
CDI
  High blood flow (colsc3,4)      19.0 %         77.3 %
Morphologic
  Abdominal fluid (asc)           32.7 %         67.3 %
  Bilateral mass (bilat)          13.3 %         39.0 %
  Unilocular cyst (un)            45.8 %          5.0 %
  Multiloc/solid cyst (mulsol)    10.7 %         36.2 %
  Solid (sol)                      8.3 %         37.6 %
  Smooth wall (smooth)            56.8 %          5.7 %
  Irregular wall (irreg)          33.8 %         73.2 %
  Papillations (pap)              12.5 %         53.2 %

Table: Demographic, serum marker, color Doppler imaging (CDI) and morphologic variables (mean ± SD or %).
Data Exploration (cont.)
Multivariate analysis:
factor analysis
biplots
Fig. Biplot of the ovarian tumor data. The observations are plotted as points (o = benign, x = malignant); the variables are plotted as vectors from the origin.
- Visualization of the correlations between the variables.
- Visualization of the relations between the variables and the clusters.
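A rough sketch (our own, via PCA with matplotlib; the arguments `X_std`, `benign`, and `names` are assumed to be the standardized data matrix, a boolean benign/malignant mask, and the variable labels) of how such a biplot can be produced:

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def biplot(X_std, benign, names):
    """PCA biplot: observations as points, variables as loading vectors."""
    pca = PCA(n_components=2)
    scores = pca.fit_transform(X_std)
    plt.scatter(scores[benign, 0], scores[benign, 1], marker="o", label="benign")
    plt.scatter(scores[~benign, 0], scores[~benign, 1], marker="x", label="malignant")
    for load, name in zip(pca.components_.T, names):   # loadings as vectors
        plt.arrow(0, 0, load[0], load[1], color="k")
        plt.annotate(name, (load[0], load[1]))
    plt.legend()
    plt.show()
```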
LS-SVM & Bayesian Framework
LS-SVM
Kernel-based method:
maps the $n$-dimensional input vector into a higher-dimensional feature space, where a linear algorithm can be applied.
The learning problem in the feature space:
$f(\mathbf{x}) = \sum_{i=1}^{N_F} w_i \varphi_i(\mathbf{x}) + b$
where the input data $\mathbf{x} \rightarrow \varphi(\mathbf{x})$ are projected to a higher-dimensional feature space.
Mercer's theorem, for a positive definite kernel $K(\cdot,\cdot)$:
$K(\mathbf{x}, \mathbf{z}) = \langle \varphi(\mathbf{x}), \varphi(\mathbf{z}) \rangle$
RBF kernel: $K(\mathbf{x}, \mathbf{z}) = \exp\{-\|\mathbf{x} - \mathbf{z}\|^2 / \sigma^2\}$
Linear kernel: $K(\mathbf{x}, \mathbf{z}) = \mathbf{x}^T \mathbf{z}$
In the dual space:
$f(\mathbf{x}) = \sum_{i=1}^{N} \alpha_i y_i K(\mathbf{x}, \mathbf{x}_i) + b$
Attractive features: good generalization performance, existence of a unique solution, grounding in statistical learning theory.
LS-SVM
LS-SVM classifier (Suykens & Vandewalle, 1999)
Given a training set $\{(\mathbf{x}_i, y_i)\}_{i=1,\ldots,N}$, with input data $\mathbf{x}_i \in \mathbb{R}^p$ and corresponding output data $y_i \in \{-1, 1\}$, the following model is taken:
$f(\mathbf{x}) = \mathbf{w}^T \varphi(\mathbf{x}) + b$
One considers the following optimization problem:
$\min_{\mathbf{w}, b, e} \; J(\mathbf{w}, e) = \frac{1}{2}\mathbf{w}^T\mathbf{w} + \frac{\gamma}{2} \sum_{i=1}^{N} e_i^2$
subject to
$y_i [\mathbf{w}^T \varphi(\mathbf{x}_i) + b] = 1 - e_i, \quad i = 1, \ldots, N.$
The Lagrangian is defined as
$L(\mathbf{w}, b, e; \boldsymbol{\alpha}) = J(\mathbf{w}, e) - \sum_{i=1}^{N} \alpha_i \{ y_i [\mathbf{w}^T \varphi(\mathbf{x}_i) + b] - 1 + e_i \}$
where the $\alpha_i$ are Lagrange multipliers.
Taking the Karush-Kuhn-Tucker conditions for optimality, which provide a set of linear equations, and eliminating $\mathbf{w}$ and $e$, the solution is obtained from
$\begin{bmatrix} 0 & Y^T \\ Y & \Omega + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \boldsymbol{\alpha} \end{bmatrix} = \begin{bmatrix} 0 \\ 1_v \end{bmatrix}$
with $Y = [y_1; \ldots; y_N]$, $1_v = [1; \ldots; 1]$, $\boldsymbol{\alpha} = [\alpha_1; \ldots; \alpha_N]$, and $\Omega_{ij} = y_i y_j \langle \varphi(\mathbf{x}_i), \varphi(\mathbf{x}_j) \rangle = y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$ for $i, j = 1, \ldots, N$. The resulting LS-SVM model for classification is
LS-SVM (cont.)
LS-SVM classifier (cont.)
$f(\mathbf{x}) = \mathrm{sign}\left[ \sum_{i=1}^{N} \alpha_i y_i K(\mathbf{x}, \mathbf{x}_i) + b \right]$
Some parameters need to be tuned:
the regularization parameter $\gamma$, which determines the tradeoff between minimizing the training errors and minimizing the model complexity;
the kernel parameters, e.g. $\sigma$ for an RBF kernel.
Popular ways of choosing the hyperparameters are cross-validation or utilizing an upper bound on the generalization error. Our approach: the Bayesian method (a numerical sketch of the dual system follows below).
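A minimal NumPy sketch (our own; the function names and toy hyperparameter defaults are not from the slides) of training and applying an LS-SVM classifier by solving the dual linear system above:

```python
import numpy as np

def rbf_kernel(X1, X2, sigma):
    """K(x, z) = exp(-||x - z||^2 / sigma^2)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / sigma ** 2)

def lssvm_train(X, y, gamma=1.0, sigma=1.0):
    """Solve  [0    y^T              ] [b    ]   [0  ]
              [y    Omega + I/gamma  ] [alpha] = [1_v]
    with Omega_ij = y_i y_j K(x_i, x_j)."""
    N = len(y)
    Omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(N) / gamma
    rhs = np.concatenate(([0.0], np.ones(N)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]          # alpha, b

def lssvm_predict(Xnew, X, y, alpha, b, sigma=1.0):
    """f(x) = sign(sum_i alpha_i y_i K(x, x_i) + b)."""
    return np.sign(rbf_kernel(Xnew, X, sigma) @ (alpha * y) + b)
```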
Bayesian Evidence Framework
Bayesian evidence framework (MacKay 1993)
Probability theory and Occam's razor:
Bayesian probability theory provides a unifying framework for data modeling.
Occam's razor is needed for model comparison.
Each model $H_i$ is assumed to have:
a vector of parameters $\mathbf{w}$;
a prior distribution $P(\mathbf{w} | H_i)$;
a set of probability distributions, one for each value of $\mathbf{w}$, defining the predictions $P(D | \mathbf{w}, H_i)$ that the model makes about the data.
Bayesian Evidence Framework (cont.)
Probability theory and Occam's razor
Models $H_i$ are ranked by evaluating the evidence:
(1) Model fitting: $P(\mathbf{w} | D, H_i) = \dfrac{P(D | \mathbf{w}, H_i) \, P(\mathbf{w} | H_i)}{P(D | H_i)}$
(2) Model comparison: $P(H_i | D) \propto P(D | H_i) \, P(H_i)$
Assuming equal priors $P(H_i)$ for the alternative models, the models are ranked by the evidence $P(D | H_i)$.
Evaluate the most probable parameter values $\mathbf{w}_{MP}$ and summarize the posterior distribution by $\mathbf{w}_{MP}$ and error bars: evaluating the Hessian $A$ at $\mathbf{w}_{MP}$, the posterior can be locally approximated as a Gaussian with covariance matrix $A^{-1}$.
Evaluating the evidence: if the posterior is well approximated by a Gaussian, then
$P(D | H_i) \simeq P(D | \mathbf{w}_{MP}, H_i) \, P(\mathbf{w}_{MP} | H_i) \, (2\pi)^{k/2} \det(A)^{-1/2}$
where $k$ is the number of parameters.
Bayesian Evidence Framework for LS-SVM
A Bayesian framework for LS-SVM classifiers (Van Gestel and Suykens, 2001)
Starting from the feature space formulation, analytic expressions are obtained in the dual space on the three levels of Bayesian inference.
Posterior class probabilities are obtained by marginalizing over the model parameters.
For a classification problem with binary targets $y_i = \pm 1$, the LS-SVM cost function can also be formulated as
$\min_{\mathbf{w}, b} \; J(\mathbf{w}, b) = \mu E_W + \zeta E_D$
subject to the constraints above, with regularization term $E_W = \frac{1}{2}\mathbf{w}^T\mathbf{w}$ and sum-of-squares error $E_D = \frac{1}{2}\sum_{i=1}^{N} e_i^2$, while the amount of regularization is determined by the ratio $\gamma = \zeta / \mu$.
Bayesian Evidence Framework for LS-SVM (cont.)
Probability interpretation of the LS-SVM classifier (Level 1)
Applying Bayes' rule, the first level of inference is obtained; the posterior probability of the model parameters $\mathbf{w}$ and $b$ is given by
$P(\mathbf{w}, b | D, \mu, \zeta, H) \propto P(D | \mathbf{w}, b, \mu, \zeta, H) \, P(\mathbf{w}, b | \mu, \zeta, H)$
Assume: the data points are independent, and the targets are corrupted by Gaussian noise $e_i$ with noise level $\sigma_e^2 = 1/\zeta$.
Assume: a separate Gaussian prior for $\mathbf{w}$ and $b$, with $\sigma_w^2 = 1/\mu$ and $\sigma_b \rightarrow \infty$ (uniform distribution).
$\mathbf{w}_{MP}$ and $b_{MP}$ are obtained by solving a standard LS-SVM in the dual space.
Bayesian Evidence Framework for LS-SVM (cont.)
Posterior class probability for the LS-SVM classifier (Level 1)
Marginalizing over $\mathbf{w}$ yields Gaussian distributed errors $e_{\pm}$ with mean $m_{e\pm}$ and variance $\sigma_{e\pm}^2$; the class-conditional probabilities $P(\mathbf{x} | y = \pm 1, D, H)$ are calculated in the dual space.
The class probability then follows from Bayes' rule:
$P(y | \mathbf{x}, D, H) \propto P(y) \, P(\mathbf{x} | y, D, H)$
which can incorporate the prior class probability or a misclassification cost.
In our experiments, the priors $P(y = +1) = 2/3$ and $P(y = -1) = 1/3$ were used.
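A minimal sketch (our own) of this last step, combining the class-conditional likelihoods with the prior class probabilities via Bayes' rule; the default prior 2/3 is the value quoted on the slide:

```python
def posterior_class_prob(p_x_given_pos, p_x_given_neg, prior_pos=2/3):
    """P(y=+1 | x) from class-conditional likelihoods and class priors.
    The likelihood values would come from the Gaussian marginals
    described above; here they are just function arguments."""
    num = prior_pos * p_x_given_pos
    return num / (num + (1 - prior_pos) * p_x_given_neg)
```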
Bayesian Evidence Framework for LS-SVM (cont.)
Inference of the hyperparameters (Level 2)
Applying Bayes' rule, the second level of inference is obtained:
$P(\mu, \zeta | D, H) \propto P(D | \mu, \zeta, H) \, P(\mu, \zeta | H)$
Assume: a uniform distribution in $\log \mu$ and $\log \zeta$; the evidence $P(D | \mu, \zeta, H)$ is obtained at Level 1.
An eigenvalue problem on the kernel matrix is solved; the number of effective parameters $\gamma_{\mathrm{eff}}$ follows from its eigenvalues.
A practical way to find $\mu_{MP}$ and $\zeta_{MP}$ is to first solve a scalar minimization problem in $\gamma = \zeta / \mu$.
Bayesian Evidence Framework for LS-SVM (cont.)
Bayesian model comparison (Level 3)
Applying Bayes' rule, the third level of inference is obtained:
$P(H | D) \propto P(D | H) \, P(H)$
Assume: a uniform distribution over the models.
The models are ranked by the evidence $P(D | H)$.
Bayesian Evidence Framework for LS-SVM - design
Preprocess the data
Normalize the training data to zero mean and unit variance.
The test set follows the same normalization as the training set (a sketch is given below).
Hyperparameter tuning
Select the model $H_i$ by choosing a kernel type $K_i$ and kernel parameter, e.g. $\sigma$ in RBF kernels. Then the optimal regularization parameter $\gamma$ for model $H_i$ is estimated on the second level of inference.
The corresponding $\mu_{MP}$, $\zeta_{MP}$ and the number of effective parameters $\gamma_{\mathrm{eff}}$ can also be estimated. Compute the model evidence $P(D | H_i)$ at the third level of inference.
For a kernel $K_i$ with tuning parameters, refine the tuning parameters (e.g. $\sigma$), such that a higher model evidence $P(D | H_i)$ is obtained.
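A minimal sketch of the normalization step (our own; the function name is hypothetical):

```python
import numpy as np

def standardize(X_train, X_test):
    """Zero mean, unit variance; the *training* statistics are
    applied unchanged to the test set."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0)
    sd[sd == 0] = 1.0                    # guard against constant columns
    return (X_train - mu) / sd, (X_test - mu) / sd
```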
Bayesian Evidence Framework for LS-SVM - design (cont.)
Input selection under the Bayesian evidence framework
Given a certain type of kernel, perform a forward selection (greedy search; see the sketch below):
starting from zero variables,
the variable which gives the greatest increase in the current model evidence is chosen at each iteration step;
the selection is stopped when adding any of the remaining variables no longer increases the model evidence.
10 variables were selected based on the training set (the first treated 265 patients), using an RBF kernel:
l_ca125, pap, sol, colsc3, bilat, meno, asc, ...
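A minimal sketch of the greedy search (our own; `evidence` is a placeholder assumed to refit the LS-SVM on a candidate variable subset and return its model evidence):

```python
def forward_select(variables, evidence):
    """Add, at each step, the variable that most increases the model
    evidence; stop when no remaining variable gives an increase."""
    selected, remaining = [], list(variables)
    best = float("-inf")
    while remaining:
        score, var = max((evidence(selected + [v]), v) for v in remaining)
        if score <= best:
            break                      # no variable increases the evidence
        best = score
        selected.append(var)
        remaining.remove(var)
    return selected
```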
Bayesian Evidence Framework for LS-SVM - design (cont.)
Sparse approximation
Due to the choice of the 2-norm in the cost function, LS-SVMs lose the sparseness of standard SVMs.
Sparseness can be imposed on the LS-SVM by a pruning procedure based on the support values $\alpha_i = \gamma e_i$.
We propose to prune the data points which have negative support values (see the sketch below).
Intuitively, pruning the easy examples focuses the model on the harder cases which lie around the decision boundary.
Iteratively prune the data points with negative $\alpha_i$; the hyperparameters are retuned several times on the reduced data set using the Bayesian evidence framework.
Stop when no more support values are negative.
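A minimal sketch of the pruning loop (our own), reusing `lssvm_train` from the earlier sketch; for brevity the hyperparameters are held fixed here, whereas the slides retune them with the evidence framework between pruning steps:

```python
import numpy as np

def prune_negative_alphas(X, y, gamma=1.0, sigma=1.0):
    """Iteratively retrain and drop points with alpha_i < 0,
    until all remaining support values are non-negative."""
    keep = np.arange(len(y))
    while True:
        alpha, b = lssvm_train(X[keep], y[keep], gamma, sigma)
        neg = alpha < 0
        if not neg.any():
            return keep, alpha, b      # indices of retained points
        keep = keep[~neg]              # prune the easy examples
```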
Model Evaluation - Temporal Validation
Training set: data from the first treated 265 patients.
Test set: data from the most recently treated 160 patients.
Fig. ROC curves on the training set and on the test set (LS-SVMrbf, LS-SVMlin, LR, RMI).
Model          AUC     Cutoff*  Accuracy  Sensitivity  Specificity
RMI            0.8733  0.4      78.13     74.07        80.19
                       0.3      76.88     81.48        74.53
LR             0.9111  0.4      80.63     75.96        83.02
                       0.3      80.63     77.78        82.08
LS-SVM (LIN)   0.9141  0.4      81.25     77.78        83.02
                       0.3      81.88     83.33        81.13
LS-SVM (RBF)   0.9184  0.4      83.13     81.48        83.96
                       0.3      84.38     85.19        83.96

Performance on the test set (accuracy, sensitivity and specificity in %).
* Probability cutoff values: 0.4 and 0.3.
Model Evaluation - Randomized Cross-validation
Randomly separate the data into a training set (n = 265) and a test set (n = 160).
Stratified: #benign : #malignant ~ 2:1 in both the training and the test set.
Repeat 30 times.
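A minimal sketch (our own, with scikit-learn) of this validation scheme; `X` and `y` are assumed to hold the 425 records and their class labels:

```python
from sklearn.model_selection import StratifiedShuffleSplit

# 30 random splits with a 160-patient test set; stratification keeps
# the benign:malignant proportions in both parts.
splitter = StratifiedShuffleSplit(n_splits=30, test_size=160, random_state=0)
for train_idx, test_idx in splitter.split(X, y):
    pass  # fit on X[train_idx], y[train_idx]; evaluate AUC on the test part
```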
Model          AUC (SD)          Cutoff*  Accuracy  Sensitivity  Specificity
RMI            0.8882 (0.0318)   0.5      82.6      81.73        83.06
                                 0.4      81.1      83.87        79.85
LR             0.9397 (0.0238)   0.5      83.3      89.33        80.55
                                 0.4      81.9      91.6         77.55
LS-SVM (LIN)   0.9405 (0.0236)   0.5      84.3      87.4         82.91
                                 0.4      82.8      90.47        79.27
LS-SVM (RBF)   0.9424 (0.0232)   0.5      84.9      86.53        84.09
                                 0.4      83.5      90           80.58

Averaged performance over the 30 validation runs (accuracy, sensitivity and specificity in %).
* Probability cutoff values: 0.5 and 0.4.

Fig. Expected ROC curves on validation.
Conclusions
Summary
Exploratory data analysis helps in understanding the data set.
Under the Bayesian evidence framework, the choice of the regularization and kernel parameters of the LS-SVM classifier can be done in a unified way, without the need for a separate validation set.
A forward input selection procedure that tries to maximize the model evidence proved able to identify the subset of important variables for model building.
Sparse approximation can further improve the generalization performance of the LS-SVM classifiers.
LS-SVMs have the potential to give reliable preoperative predictions of the malignancy of ovarian tumors.
Future work
A larger-scale validation is still needed.