Probabilistic Machine Learning
Approaches to Medical Classification Problems
Chuan LU
Jury:
Prof. L. Froyen, chairman
Prof. J. Vandewalle
Prof. S. Van Huffel, promotor
Prof. J. Beirlant
Prof. J.A.K. Suykens, promotor
Prof. P.J.G. Lisboa
Clinical decision support systems
Advances in technology facilitate data collection
Computer-based decision support systems
Human judgment: subjective, experience-dependent
Artificial intelligence (AI) in medicine
Expert system
Machine learning
Diagnostic modelling
Knowledge discovery
[Figure: example decision rule for coronary disease]
Medical classification problems
Essential for clinical decision making
Constrained diagnosis problem
e.g. benign -, malignant + (for tumors).
Classification
Find a rule to assign an observation to one of the existing classes
supervised learning, pattern recognition
Our applications:
Ovarian tumor classification with patient data
Brain tumor classification based on MRS spectra
Benchmarking cancer diagnosis based on microarray data
Machine learning
Apply learning algorithms for the autonomous acquisition and integration of knowledge
Approaches
Conventional statistical learning algorithms
Artificial neural networks, kernel-based models
Decision trees
Learning sets of rules
Bayesian networks
Good performance: artificial neural networks, kernel-based models
Building classifiers – a flowchart
Training: training patterns + class labels → learning algorithm → classifier
Test, prediction: new pattern → classifier → predicted class / probability of disease
Feature selection and model selection feed back into the learning algorithm
Central issue
Good generalization performance!
model fitness ⇔ complexity
Regularization, Bayesian learning
Probabilistic framework
Outline
Supervised learning
Bayesian frameworks for blackbox models
Preoperative classification of ovarian tumors
Bagging for variable selection and prediction in cancer diagnosis problems
Conclusions
Conventional linear classifiers
Linear discriminant analysis (LDA)
Discrimination using z = w^T x ∈ R
Maximizing the between-class variance while minimizing the within-class variance
Logistic regression (LR)
Logit: log(odds)
log [ p / (1 − p) ] = w^T x + b
Output: probability of malignancy
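As a minimal sketch of the logit link above (the weights below are illustrative, not a fitted model), the linear score w^T x + b and the probability of malignancy are related by the sigmoid and its inverse:

```python
import math

def logit(p):
    """Log-odds of a probability p."""
    return math.log(p / (1.0 - p))

def sigmoid(z):
    """Inverse of the logit: maps the linear score w'x + b to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Toy 2-feature observation with made-up weights
w, b = [0.8, -0.5], 0.1
x = [1.2, 0.4]
z = sum(wi * xi for wi, xi in zip(w, x)) + b   # linear score
p = sigmoid(z)                                 # modeled probability of malignancy
```

Applying `logit` to `p` recovers the linear score `z` exactly, which is the sense in which LR is a linear classifier on the log-odds scale.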
Feedforward neural networks
Inputs x_1, x_2, …, x_D; hidden layer; output; bias w_0
f(x, w) = Σ_{j=0}^{M} w_j φ_j(x), with φ_0(x) = 1 (bias)
Multilayer perceptrons (MLP): φ_j given by the activation function of a hidden unit
Radial basis function (RBF) neural networks: φ_j a basis function
Training (back-propagation, Levenberg-Marquardt, conjugate gradient, …), validation, test
Regularization, Bayesian methods
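The weighted sum of basis functions f(x, w) = Σ_j w_j φ_j(x) can be sketched as a one-hidden-layer MLP forward pass; the weights below are toy values and the tanh activation is one common choice, both assumed for illustration:

```python
import math

def mlp_forward(x, V, w):
    """One-hidden-layer MLP: f(x) = sum_j w_j * phi_j(x), with phi_0 = 1 (bias).

    V: hidden-unit weight vectors, each with a trailing bias term;
    w: output weights, w[0] multiplying the fixed bias basis phi_0 = 1.
    """
    phis = [1.0]  # phi_0: bias basis
    for v in V:
        a = sum(vi * xi for vi, xi in zip(v[:-1], x)) + v[-1]  # hidden activation
        phis.append(math.tanh(a))                              # tanh nonlinearity
    return sum(wj * pj for wj, pj in zip(w, phis))

# Toy network: 2 inputs, 2 hidden tanh units
V = [[0.5, -0.3, 0.1], [1.0, 0.7, -0.2]]
w = [0.2, 1.5, -0.8]
y = mlp_forward([0.4, 0.9], V, w)
```

Swapping `math.tanh` for a radial basis function of the distance to a centre would turn the same skeleton into an RBF network.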
Support vector machines (SVM)
For classification, functional form:
y(x) = sign[ Σ_{i=1}^{N} α_i y_i k(x, x_i) + b ],  k(·,·): kernel function
Statistical learning theory [Vapnik95]
Margin maximization
Hyperplane: w^T x + b = 0; class −1: w^T x + b < 0; class +1: w^T x + b > 0; margin = 2 / ||w||
Kernel trick: x ⇒ φ(x)
Feature space: f(x) = w^T φ(x) + b
Mercer’s theorem: k(x, z) = <φ(x), φ(z)>
Dual space: f(x) = Σ_{i=1}^{N} α_i y_i k(x, x_i) + b
Positive definite kernel k(·,·)
Linear kernel: k(x, z) = x^T z
RBF kernel: k(x, z) = exp{ −||x − z||² / r² }
Quadratic programming
Sparseness, unique solution
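A sketch of the dual-form decision function y(x) = sign[Σ_i α_i y_i k(x, x_i) + b] with the two kernels above; the support vectors, α values and b are made up for illustration, not the output of an actual QP solver:

```python
import math

def linear_kernel(x, z):
    return sum(xi * zi for xi, zi in zip(x, z))

def rbf_kernel(x, z, r=1.0):
    d2 = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-d2 / r ** 2)

def svm_decision(x, sv, sv_y, alpha, b, kernel):
    """Dual-form decision value f(x) = sum_i alpha_i y_i k(x, x_i) + b."""
    return sum(a * yi * kernel(x, xi) for a, yi, xi in zip(alpha, sv_y, sv)) + b

def svm_predict(x, *args):
    return 1 if svm_decision(x, *args) >= 0 else -1

# Hypothetical support vectors of a trained classifier
sv    = [[1.0, 1.0], [-1.0, -1.0]]
sv_y  = [+1, -1]
alpha = [0.5, 0.5]
b     = 0.0
label = svm_predict([0.9, 1.1], sv, sv_y, alpha, b, rbf_kernel)
```

Sparseness means only the support vectors (typically few of the N training points) carry nonzero α_i and enter this sum.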
Least squares SVMs
LS-SVM classifier [Suykens99]: SVM variant
Inequality constraints ⇒ equality constraints
Quadratic programming ⇒ solving a set of linear equations
Primal problem. The following model is taken:
min_{w,b,e} J(w, b) = (1/2) w^T w + (C/2) Σ_{i=1}^{N} e_i²
s.t. y_i [ w^T φ(x_i) + b ] = 1 − e_i,  i = 1, …, N
with regularization constant C and f(x) = w^T φ(x) + b
Dual problem, solved in dual space:
[ 0        y^T      ] [ b ]   [ 0   ]
[ y   Ω + C^{-1} I  ] [ α ] = [ 1_v ]
where y = [y_1, …, y_N]^T, 1_v = [1, …, 1]^T, e = [e_1, …, e_N]^T, α = [α_1, …, α_N]^T,
Ω_ij = y_i y_j φ(x_i)^T φ(x_j) = y_i y_j k(x_i, x_j)
Resulting classifier: y(x) = sign[ Σ_{i=1}^{N} α_i y_i k(x, x_i) + b ]
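Because LS-SVM training reduces to a linear system, it can be sketched with a generic linear solver. This is a minimal illustration of the dual system above under an RBF kernel and toy data, not the thesis' implementation:

```python
import numpy as np

def lssvm_train(X, y, C=10.0, r=1.0):
    """Solve the LS-SVM dual linear system (sketch, notation as in [Suykens99]):
    [[0, y^T], [y, Omega + I/C]] [b; alpha] = [0; 1_v],
    with Omega_ij = y_i y_j k(x_i, x_j) and an RBF kernel k."""
    N = len(y)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    K = np.exp(-d2 / r ** 2)                              # RBF kernel matrix
    Omega = np.outer(y, y) * K
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(N) / C
    rhs = np.concatenate(([0.0], np.ones(N)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                                # b, alpha

def lssvm_predict(x, X, y, b, alpha, r=1.0):
    k = np.exp(-((X - x) ** 2).sum(-1) / r ** 2)
    return int(np.sign(np.dot(alpha * y, k) + b))

# Toy separable data: two clusters
X = np.array([[0.0, 0.0], [0.2, 0.1], [2.0, 2.0], [2.1, 1.9]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
b, alpha = lssvm_train(X, y)
```

Note the trade-off visible here: one `np.linalg.solve` call replaces quadratic programming, but the equality constraints mean every training point gets a nonzero α, losing SVM's sparseness.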
Model evaluation
Performance measure
Accuracy: correct classification rate
Assumption: equal misclassification cost and constant class distribution in the target environment
Receiver operating characteristic (ROC) analysis
Confusion table:
                  True result +   True result −
Test result +         TP              FP
Test result −         FN              TN
sensitivity = TP / (TP + FN)
specificity = TN / (TN + FP)
ROC curve
Area under the ROC curve: AUC = P[ y(x_−) < y(x_+) ]
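The confusion-table measures and the pairwise interpretation AUC = P[y(x_−) < y(x_+)] can be computed directly; the counts and scores below are made-up examples:

```python
def sensitivity(tp, fn):
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (tn + fp)

def auc(scores_pos, scores_neg):
    """AUC as P[y(x-) < y(x+)], estimated over all pos/neg pairs (ties count 1/2)."""
    pairs = [(sn, sp) for sn in scores_neg for sp in scores_pos]
    wins = sum(1.0 if sn < sp else 0.5 if sn == sp else 0.0 for sn, sp in pairs)
    return wins / len(pairs)

# Hypothetical confusion counts and classifier outputs
tp, fp, fn, tn = 40, 10, 5, 100
se = sensitivity(tp, fn)                      # 40 / 45
sp = specificity(tn, fp)                      # 100 / 110
a = auc([0.9, 0.8, 0.4], [0.3, 0.2, 0.5])     # 8 of 9 pairs ranked correctly
```

Unlike accuracy, this pairwise AUC depends only on the ranking of outputs, so it is insensitive to the decision threshold and to the class distribution.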
Outline
Supervised learning
Bayesian frameworks for blackbox models
Preoperative classification of ovarian tumors
Bagging for variable selection and prediction in cancer diagnosis problems
Conclusions
Bayesian frameworks for blackbox models
Advantages
Automatic control of model complexity, without CV
Possibility to use prior info and hierarchical models for hyperparameters
Predictive distribution for output
Principle of Bayesian learning [MacKay95]
• Define the probability distribution over all quantities within the model
• Update the distribution given data using Bayes’ rule
Bayesian inference
Bayes’ rule:
Posterior = (Likelihood × Prior) / Evidence
[MacKay95, Suykens02, Tipping01]
Level 1: infer w, b for given θ, H
p(w, b | D, θ, H) = p(D | w, b, θ, H) p(w, b | θ, H) / p(D | θ, H)
Level 2: infer hyperparameter θ
p(θ | D, H) = p(D | θ, H) p(θ | H) / p(D | H) ∝ p(D | θ, H) p(θ | H)
Level 3: compare models H_j
p(H_j | D) = p(D | H_j) p(H_j) / p(D) ∝ p(D | H_j) p(H_j)
H: model; θ: kernel parameter, e.g. RBF kernel width
p(D | H_j): model evidence
Marginalization (Gaussian approximation)
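Level-2 inference can be illustrated numerically on a discrete hyperparameter grid; the evidence values below are invented for illustration, standing in for the level-1 marginalization:

```python
# Level-2 sketch: posterior over a hyperparameter grid,
# p(theta | D, H) ∝ p(D | theta, H) p(theta | H), normalized numerically.
thetas   = [0.1, 1.0, 10.0]      # e.g. candidate RBF kernel widths
evidence = [0.02, 0.30, 0.08]    # p(D | theta, H), hypothetical values
prior    = [1.0 / 3] * 3         # flat prior p(theta | H)

unnorm    = [e * p for e, p in zip(evidence, prior)]
posterior = [u / sum(unnorm) for u in unnorm]
best      = thetas[posterior.index(max(posterior))]  # MAP hyperparameter
```

The same normalize-the-product pattern repeats at level 3, with models H_j in place of hyperparameter values, which is why the evidence p(D | H_j) doubles as a model-comparison score.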
Sparse Bayesian learning (SBL)
Automatic relevance determination (ARD) applied to f(x) = w^T φ(x)
Prior for each w_m varies; hierarchical priors ⇒ sparseness
Basis function φ(x):
Original variable ⇒ linear SBL model ⇒ variable selection!
Kernel ⇒ relevance vector machines (RVM)
Relevance vectors: prototypical
Sequential SBL algorithm [Tipping03]
Sparse Bayesian LS-SVMs
Iterative pruning of easy cases (support value α < 0) [Lu02]
Mimicking margin maximization as in SVM
Support vectors close to the decision boundary
Variable (feature) selection
Importance in medical classification problems
Economics of data acquisition
Accuracy and complexity of the classifiers
Gain insights into the underlying medical problem
Filter, wrapper, embedded
We focus on model evidence based methods within the Bayesian framework [Lu02, Lu04]
Forward / stepwise selection
Bayesian LS-SVM
Sparse Bayesian learning models
Accounting for uncertainty in variable selection via sampling methods
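Forward selection driven by a model-quality score (in this work, the Bayesian model evidence) can be sketched generically; the score function and variable names below are hypothetical placeholders:

```python
def forward_select(variables, score, k):
    """Greedy forward selection: repeatedly add the variable that most
    improves a model-quality score (e.g. Bayesian model evidence)."""
    selected = []
    while len(selected) < k:
        candidates = [v for v in variables if v not in selected]
        best = max(candidates, key=lambda v: score(selected + [v]))
        selected.append(best)
    return selected

# Toy additive score: hypothetical per-variable gains
gains = {"l_ca125": 5.0, "colsc4": 3.0, "asc": 2.0, "age": 0.5}
score = lambda subset: sum(gains[v] for v in subset)
chosen = forward_select(list(gains), score, 2)
```

In practice `score` would retrain a Bayesian LS-SVM on each candidate subset and return its evidence, so interactions between variables, absent from this additive toy score, are taken into account.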
Outline
Supervised learning
Bayesian frameworks for blackbox models
Preoperative classification of ovarian tumors
Bagging for variable selection and prediction in cancer diagnosis problems
Conclusions
Ovarian cancer diagnosis
Problem
Ovarian masses
Ovarian cancer: high mortality rate, difficult early detection
Treatment differs for different types of ovarian tumors
Develop a reliable diagnostic tool to preoperatively discriminate between malignant and benign tumors
Assist clinicians in choosing the appropriate treatment
Medical techniques for preoperative evaluation
Serum tumor marker: CA125 blood test
Ultrasonography
Color Doppler imaging and blood flow indexing
Two-stage study
Preliminary investigation: KU Leuven pilot project, single-center
Extensive study: IOTA project, multi-center
Ovarian cancer diagnosis
Attempts to automate the diagnosis
Risk of malignancy index (RMI) [Jacobs90]
RMI = score_morph × score_meno × CA125
Mathematical models
Logistic regression
Multilayer perceptrons
Kernel-based models
Bayesian belief networks
Hybrid methods
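The RMI is a plain product of three factors; a minimal sketch, where the example scores are illustrative and the actual scoring conventions for the morphology and menopause terms follow [Jacobs90]:

```python
def rmi(morph_score, meno_score, ca125):
    """Risk of malignancy index: RMI = score_morph * score_meno * CA125.

    morph_score and meno_score are the ultrasound-morphology and menopausal
    scores of [Jacobs90]; the values used below are purely illustrative.
    """
    return morph_score * meno_score * ca125

risk = rmi(3, 1, 120.0)
```

A patient is then flagged as high risk when the RMI exceeds a chosen cut-off, which is what the later model comparisons use RMI for as a reference.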
Preliminary investigation – pilot project
Patient data collected at University Hospitals Leuven, Belgium, 1994~1999
425 records (data with missing values were excluded), 25 features
291 benign tumors, 134 (32%) malignant tumors
Preprocessing: e.g. CA_125 -> log transform; Color_score {1,2,3,4} -> 3 design variables {0,1}
Descriptive statistics: demographic, serum marker, color Doppler imaging (CDI) and morphologic variables

Variable (symbol)                    Benign        Malignant
Demographic
  Age (age)                          45.6 ± 15.2   56.9 ± 14.6
  Postmenopausal (meno)              31.0 %        66.0 %
Serum marker
  CA 125 (log) (l_ca125)             3.0 ± 1.2     5.2 ± 1.5
CDI
  High blood flow (colsc3,4)         19.0 %        77.3 %
Morphologic
  Abdominal fluid (asc)              32.7 %        67.3 %
  Bilateral mass (bilat)             13.3 %        39.0 %
  Unilocular cyst (un)               45.8 %        5.0 %
  Multiloc/solid cyst (mulsol)       10.7 %        36.2 %
  Solid (sol)                        8.3 %         37.6 %
  Smooth wall (smooth)               56.8 %        5.7 %
Experiment – pilot project
Desired properties for models:
Probability of malignancy
High sensitivity for malignancy ↔ low false-positive rate
Compared models
Bayesian LS-SVM classifiers
RVM classifiers
Bayesian MLPs
Logistic regression
RMI (reference)
‘Temporal’ cross-validation
Training set: 265 data (1994~1997)
Test set: 160 data (1997~1999)
Multiple runs of stratified randomized CV
Improved test performance
Conclusions for model comparison similar to temporal CV
Variable selection – pilot project
Forward variable selection based on Bayesian LS-SVM
Evolution of the model evidence
10 variables were selected based on the training set (first treated 265 patient data) using RBF kernels
Model evaluation – pilot project
Compare the predictive power of the models given the selected variables
ROC curves on the test set (data from the 160 newest treated patients)
Comparison of model performance on the test set with rejection of the most uncertain cases, i.e. those with small | P(y = +1 | x) − 0.5 |
The rejected patients need further examination by human experts
Posterior probability is essential for medical decision making
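The rejection rule can be sketched as follows; the threshold `delta` is a hypothetical choice, and cases whose posterior probability lies near 0.5 are deferred:

```python
def predict_with_rejection(p_malignant, delta=0.25):
    """Reject (defer to a human expert) when the posterior is too uncertain,
    i.e. when |P(y=+1|x) - 0.5| < delta; delta is an illustrative threshold."""
    if abs(p_malignant - 0.5) < delta:
        return "reject"
    return "+1" if p_malignant >= 0.5 else "-1"
```

Raising `delta` rejects more patients but improves the accuracy on the cases the model does decide, which is the trade-off the rejection curves quantify.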
Extensive study – IOTA project
International Ovarian Tumor Analysis
International Ovarian Tumor Analysis
Protocol for data collection
A multi-center study
9 centers
5 countries: Sweden, Belgium, Italy, France, UK
1066 data of the dominant tumors
800 (75%) benign
266 (25%) malignant
Data – IOTA project
[Figure: bar chart of tumor counts per center (MSW, LBE, RIT, MIT, BFR, MFR, KUK, OIT, NIT), split into benign, primary invasive, borderline and metastatic tumors; metastatic counts per center: 11, 17, 10, 1, 0, 0, 2, 1, 0]
Model development – IOTA project
Randomly divide data into
Training set: Ntrain = 754; test set: Ntest = 312
Stratified for tumor types and centers
Model building based on the training data
Variable selection: Bayesian LS-SVM, with / without CA125
Compared models:
LRs, Bayesian LS-SVMs, RVMs
Kernels: linear / RBF, additive RBF
Model evaluation
ROC analysis
Performance of all centers as a whole / of individual centers
Model interpretation?
Model evaluation – IOTA project
Comparison of model performance using different variable subsets:
MODELa (12 var), MODELb (12 var), MODELaa (18 var)
• Variable subset matters more than model type
• Linear models suffice