Introduction
Background
Medical decision support systems based on patient data and expert knowledge
A need to analyze the collected data in order to reach correct medical decisions
Intelligent machine learning methods such as artificial neural networks (ANNs) and kernel-based algorithms have been shown to be suitable approaches to such complex tasks.
Research topic
Statistical analysis of patient data
Development and clinical evaluation of predictive models which optimally extract information from data
Application to real-world clinical data, such as the ovarian tumor data set, as well as to other benchmark data sets in the biomedical field.
Methods
Explorative data analysis (EDA)
Variable (feature) selection
Important in medical diagnosis
economics of data acquisition
accuracy and complexity of the classifiers
gain insights into the underlying medical problem
Focus on evidence-based methods within the Bayesian framework
forward / stepwise selection Bayesian LS-SVM
sparse Bayesian learning
accounting for uncertainty in variable selection
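The forward selection listed above can be sketched as a greedy loop. This is a minimal illustration, not the procedure used in this work: the `score` callable stands in for whatever criterion is maximized (for instance a Bayesian model evidence), and `toy_score`, `_GAIN`, and the variable names `a` to `d` are hypothetical placeholders.

```python
from typing import Callable, List, Sequence, Tuple


def forward_select(
    candidates: Sequence[str],
    score: Callable[[List[str]], float],
    max_vars: int,
) -> Tuple[List[str], float]:
    """Greedy forward selection: add the variable that most improves `score`."""
    selected: List[str] = []
    best_score = float("-inf")
    while len(selected) < max_vars:
        remaining = [v for v in candidates if v not in selected]
        if not remaining:
            break
        # Score every one-variable extension of the current subset.
        trials = [(score(selected + [v]), v) for v in remaining]
        trial_score, trial_var = max(trials, key=lambda t: t[0])
        if trial_score <= best_score:
            break  # no candidate improves the criterion, so stop
        selected.append(trial_var)
        best_score = trial_score
    return selected, best_score


# Hypothetical toy criterion: rewards informative variables and penalizes
# subset size, loosely mimicking a fit versus complexity trade-off.
_GAIN = {"a": 3.0, "b": 2.0, "c": 0.1, "d": 0.0}


def toy_score(subset: List[str]) -> float:
    return sum(_GAIN[v] for v in subset) - 0.5 * len(subset)


chosen, value = forward_select(list(_GAIN), toy_score, max_vars=4)
print(chosen, value)
```

With the toy criterion, the loop stops once adding a further variable no longer raises the score, so weakly informative variables are left out.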
Probabilistic modeling techniques
Dealing with the uncertainty and different misclassification costs in medical decision support
Traditional linear discriminant analysis (LDA), logistic regression (LR)
Bayesian + multi-layer perceptrons (MLPs)
Bayesian + kernel-based modeling:
Bayesian least squares support vector machine (LS-SVM) classifiers (Suykens 1999, 2001, 2002)
Sparse Bayesian modeling and relevance vector machines (RVMs) (Tipping 2001, 2003)
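One way a probabilistic output can account for unequal misclassification costs is the standard minimum expected cost rule, sketched below. The cost values and the function names `min_risk_threshold` and `decide` are assumptions made for illustration; the poster does not state which costs or decision rule were used.

```python
def min_risk_threshold(cost_fp: float, cost_fn: float) -> float:
    """Posterior threshold above which predicting 'malignant' costs less."""
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def decide(p_malignant: float, cost_fp: float, cost_fn: float) -> str:
    """Return the class with the smaller expected misclassification cost."""
    # Calling the tumor benign costs cost_fn if it is in fact malignant.
    risk_benign = p_malignant * cost_fn
    # Calling the tumor malignant costs cost_fp if it is in fact benign.
    risk_malignant = (1.0 - p_malignant) * cost_fp
    return "malignant" if risk_malignant < risk_benign else "benign"


# Hypothetical costs: a missed malignancy counted as 4x worse than a false alarm.
threshold = min_risk_threshold(cost_fp=1.0, cost_fn=4.0)
print(threshold, decide(0.3, 1.0, 4.0), decide(0.1, 1.0, 4.0))
```

With equal costs the threshold falls back to 0.5; raising the cost of a false negative lowers it, so more cases are flagged as malignant.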
Conclusions
Intelligent machine learning methods, particularly Bayesian kernel-based modeling and the related variable selection methods, are shown to have great potential value in medical diagnosis problems.
Applications
Ovarian tumor classification
Ovarian cancer: difficult to detect early, with the highest mortality rate among gynecologic cancers
Develop a reliable diagnostic tool for preoperative distinction between benign and malignant tumors
Assist clinicians in choosing the appropriate treatment
Preoperative medical diagnostic methods:
serum tumor marker: CA125 blood test
ultrasonography
color Doppler imaging
Data
Results: performance of the models on the test set (160 newly collected records), given the 10 selected variables
Brain tumor classification based on MRS spectra data
4 types of brain tumors, 205 x 138 magnitude values
accuracy increases from 68.5% to 75.3% when only 27 variables are used for the linear LS-SVM classifier
Cancer diagnosis based on microarray data
Classification of leukemia cancer and colon cancer
Zero LOO error was achieved by using only 4 or 5 genes among the available thousands of genes.
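The leave-one-out (LOO) error quoted above is obtained by holding out each sample in turn, training on the rest, and counting the misclassified held-out samples. The sketch below shows only that bookkeeping. It uses a nearest-centroid classifier as a stand-in (the poster's classifiers are LS-SVMs), and the data are synthetic, not the leukemia or colon data.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class mean is closest to x (squared Euclidean)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [xi for xi, yi in zip(train_x, train_y) if yi == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_errors(xs, ys):
    """Count leave-one-out misclassifications."""
    errors = 0
    for i in range(len(xs)):
        # Train on every sample except the i-th, then test on the i-th.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) != ys[i]:
            errors += 1
    return errors


# Synthetic, well-separated toy data with two features per sample.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [2.0, 2.1], [2.2, 1.9], [1.9, 2.3]]
y = [0, 0, 0, 1, 1, 1]
n_errors = loo_errors(X, y)
print(n_errors)
```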
Intelligent medical diagnosis via machine learning
Chuan Lu
Dept. of Electrical Engineering
Acknowledgements
This research was funded by the projects IUAP IV-02 and IUAP V-22, KUL GOA-MEFISTO-666, IDO/99/03, FWO G.0407.02 and G.0269.02, and by a Research Council KUL doctoral fellowship.
Further information
Chuan Lu
K.U.Leuven – Dept. ESAT Division of SCD-SISTA
Kasteelpark Arenberg 10
3001 Leuven (Heverlee), Belgium
chuan.lu@esat.kuleuven.ac.be
Supervisors: Prof. Sabine Van Huffel, Prof. Johan Suykens
Tel.: +32 16 32 18 84
Fax: +32 16 32 19 70
www.esat.kuleuven.ac.be
[Figure: Biplot of ovarian tumor data, visualizing the correlations between the variables and the relations between the variables and the clusters]
[Diagram: patient data, statistical analysis, variable selection, model building, model evaluation]
Performance of Bayesian RBF LS-SVM with rejection based on posterior probability
Reject      AUC     Accuracy (%)  Sensitivity (%)  Specificity (%)
10% (16)    0.9420  88.97         83.72            91.4
5% (8)      0.9343  87.50         82.61            89.8
0% (0)      0.9184  84.38         77.78            87.74
[Figure: ROC curves]
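Rejection based on posterior probability can be sketched as follows. The poster does not spell out the exact rule, so this example assumes that the cases whose posterior is closest to 0.5 are the ones rejected and that the remaining cases are classified at a 0.5 threshold. The function name and the toy numbers are hypothetical, not the ovarian tumor results.

```python
from typing import Dict, Sequence


def evaluate_with_rejection(
    posteriors: Sequence[float],  # P(malignant | x) for each case
    labels: Sequence[int],        # 1 = malignant, 0 = benign
    reject_fraction: float,
) -> Dict[str, float]:
    """Reject the most uncertain cases, then score the rest at threshold 0.5."""
    n_reject = int(round(reject_fraction * len(posteriors)))
    # A posterior near 0.5 means the model is least certain about the case.
    order = sorted(range(len(posteriors)), key=lambda i: abs(posteriors[i] - 0.5))
    kept = order[n_reject:]
    tp = sum(1 for i in kept if posteriors[i] >= 0.5 and labels[i] == 1)
    fp = sum(1 for i in kept if posteriors[i] >= 0.5 and labels[i] == 0)
    tn = sum(1 for i in kept if posteriors[i] < 0.5 and labels[i] == 0)
    fn = sum(1 for i in kept if posteriors[i] < 0.5 and labels[i] == 1)

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "n_rejected": n_reject,
        "accuracy": ratio(tp + tn, len(kept)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical toy posteriors and labels.
toy_p = [0.95, 0.9, 0.55, 0.45, 0.2, 0.1, 0.6, 0.05]
toy_y = [1, 1, 0, 1, 0, 0, 1, 0]
base = evaluate_with_rejection(toy_p, toy_y, reject_fraction=0.0)
rej = evaluate_with_rejection(toy_p, toy_y, reject_fraction=0.25)
print(base, rej)
```

In the toy data the two misclassified cases are also the two least certain ones, so rejecting 25% of the cases removes both errors.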
Ovarian tumor data: collected at the University Hospitals Leuven (1994-1999), 425 records, 25 features, 32% malignant