Intelligent medical diagnosis via machine learning


Introduction

Background

Medical decision support systems based on patient data and expert knowledge

A need to analyze the collected data in order to reach a correct medical decision

Intelligent machine learning methods such as artificial neural networks (ANNs) and kernel-based algorithms have been shown to be suitable approaches to such complex tasks.

Research topic

Statistical analysis of patient data

Development and clinical evaluation of predictive models which optimally extract information from data

Application to real-world clinical data, such as the ovarian tumor data set, as well as to other benchmark data sets in the biomedical field.

Methods

Explorative data analysis (EDA)

Variable (feature) selection

Important in medical diagnosis

economics of data acquisition

accuracy and complexity of the classifiers

gain insights into the underlying medical problem

Focus on evidence-based methods within the Bayesian framework

forward / stepwise selection

Bayesian LS-SVM

sparse Bayesian learning

accounting for uncertainty in variable selection
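The forward selection idea listed above can be illustrated with a minimal sketch. Note the assumptions: the poster's selection criterion is the Bayesian evidence of an LS-SVM model, which is not reproduced here. As a stand-in, the sketch scores each candidate subset by the leave-one-out accuracy of a nearest-centroid classifier. The function names and the toy data are hypothetical.

```python
from statistics import mean


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on a feature subset."""
    correct = 0
    for i in range(len(X)):
        # Class centroids are computed without the held-out sample i.
        centroids = {}
        for label in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            if rows:
                centroids[label] = [mean(r[f] for r in rows) for f in features]
        point = [X[i][f] for f in features]
        predicted = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += predicted == y[i]
    return correct / len(X)


def forward_select(X, y, max_features):
    """Greedily add the feature that most improves the score; stop when none helps."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scored = [(loo_accuracy(X, y, selected + [f]), f) for f in remaining]
        # Highest score wins; ties go to the lowest feature index.
        score, feature = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best_score:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Toy data (made up): feature 1 separates the classes, features 0 and 2 are noise.
X = [
    [0.3, 1.0, 5.1], [0.9, 1.2, 4.8], [0.1, 0.8, 5.3], [0.7, 1.1, 4.9],
    [0.2, 3.0, 5.0], [0.8, 3.2, 5.2], [0.4, 2.9, 4.7], [0.6, 3.1, 5.0],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]
selected, score = forward_select(X, y, max_features=2)
```

Swapping `loo_accuracy` for a model-evidence score would bring the sketch closer to the evidence-based approach named above; the greedy loop itself would stay the same.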

Probabilistic modeling techniques

Dealing with uncertainty and different misclassification costs in medical decision support

Traditional: linear discriminant analysis (LDA), logistic regression (LR)

Bayesian + multi-layer perceptrons (MLPs)

Bayesian + kernel-based modeling:

Bayesian least squares support vector machine (LS-SVM) classifiers (Suykens 1999, 2001, 2002)

Sparse Bayesian modeling and relevance vector machines (RVMs) (Tipping 2001, 2003)
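One reason probabilistic outputs matter under unequal misclassification costs is that the decision threshold on the posterior probability can be shifted. The sketch below shows the standard minimum-expected-cost rule; it is not taken from the poster, and the function names and cost values are hypothetical.

```python
def malignancy_threshold(cost_false_negative, cost_false_positive):
    """Posterior above which predicting 'malignant' has the lower expected cost."""
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def decide(p_malignant, cost_false_negative=1.0, cost_false_positive=1.0):
    """Return the label with the smaller expected misclassification cost."""
    # Predicting benign is wrong with probability p_malignant (a missed malignancy).
    expected_cost_benign = p_malignant * cost_false_negative
    # Predicting malignant is wrong with probability 1 - p_malignant (a false alarm).
    expected_cost_malignant = (1.0 - p_malignant) * cost_false_positive
    return "malignant" if expected_cost_malignant < expected_cost_benign else "benign"


# Hypothetical costs: a missed malignancy is taken to be 4 times worse than a false alarm.
threshold = malignancy_threshold(cost_false_negative=4.0, cost_false_positive=1.0)
label_equal_costs = decide(0.3)
label_unequal_costs = decide(0.3, cost_false_negative=4.0, cost_false_positive=1.0)
```

With equal costs the threshold is 0.5; with the 4:1 costs assumed here it drops to 0.2, so a case with a posterior of 0.3 changes from benign to malignant.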

Conclusions

Intelligent machine learning methods, particularly Bayesian kernel-based modeling and the related variable selection methods, are shown to have great potential value in medical diagnosis problems.

Applications

Ovarian tumor classification

Ovarian cancer: difficult to detect early, with the highest mortality rate among gynecologic cancers

Develop a reliable diagnostic tool for preoperative distinction between benign and malignant tumors

Assist clinicians in choosing the appropriate treatment

Preoperative medical diagnostic methods:

serum tumor marker: CA125 blood test

ultrasonography

color Doppler imaging

Data

Results: performance of the models given the 10 selected variables on the test set (160 newly collected records)

Brain tumor classification based on MRS spectral data

4 types of brain tumors, 205 x 138 magnitude values

Accuracy increases from 68.5% to 75.3% when only 27 variables are used for the linear LS-SVM classifier

Cancer diagnosis based on microarray data

Classification of leukemia and colon cancer

Zero leave-one-out (LOO) error was achieved using only 4 or 5 of the thousands of available genes.


Chuan Lu

Dept. of Electrical Engineering

Acknowledgements

This research was funded by the projects IUAP IV-02 and IUAP V-22, KUL GOA-MEFISTO-666, IDO/99/03, FWO G.0407.02 and G.0269.02, and a Research Council KUL doctoral fellowship.

Further information

Chuan Lu

K.U.Leuven – Dept. ESAT Division of SCD-SISTA

Kasteelpark Arenberg 10

3001 Leuven (Heverlee), Belgium

chuan.lu@esat.kuleuven.ac.be

Supervisors: Prof. Sabine Van Huffel, Prof. Johan Suykens

Tel.: +32 16 32 18 84 Fax: +32 16 32 19 70

www.esat.kuleuven.ac.be

Figure: Biplot of ovarian tumor data, visualizing the correlation between the variables and the relations between the variables and clusters

Figure (workflow diagram): patient data, statistical analysis, variable selection, model building, model evaluation

Performance of Bayesian RBF LS-SVM with rejection based on posterior probability

Reject     AUC      Accuracy (%)  Sensitivity (%)  Specificity (%)
10% (16)   0.9420   88.97         83.72            91.4
5% (8)     0.9343   87.50         82.61            89.8
0% (0)     0.9184   84.38         77.78            87.74

Figure: ROC curves
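Rejection based on posterior probability means withholding a decision for the cases the model is least certain about and scoring only the rest. The sketch below is one plausible reading of that scheme, not the poster's implementation: it rejects the cases whose posterior lies closest to 0.5. The function name and the toy posteriors are made up for illustration.

```python
def evaluate_with_rejection(posteriors, labels, reject_fraction):
    """Reject the least certain cases, then score the rest at a 0.5 threshold.

    posteriors: predicted P(malignant) per case; labels: 1 = malignant, 0 = benign.
    Assumes the kept cases still contain both classes (otherwise a division fails).
    """
    n_reject = int(round(reject_fraction * len(posteriors)))
    # Least certain first: posterior closest to 0.5.
    order = sorted(range(len(posteriors)), key=lambda i: abs(posteriors[i] - 0.5))
    kept = order[n_reject:]
    tp = sum(1 for i in kept if posteriors[i] >= 0.5 and labels[i] == 1)
    tn = sum(1 for i in kept if posteriors[i] < 0.5 and labels[i] == 0)
    fp = sum(1 for i in kept if posteriors[i] >= 0.5 and labels[i] == 0)
    fn = sum(1 for i in kept if posteriors[i] < 0.5 and labels[i] == 1)
    return {
        "rejected": n_reject,
        "accuracy": (tp + tn) / len(kept),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy posteriors and labels (made up, not the ovarian tumor data).
posteriors = [0.95, 0.90, 0.55, 0.45, 0.40, 0.10, 0.05, 0.20, 0.80, 0.60]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 1, 1]
no_rejection = evaluate_with_rejection(posteriors, labels, reject_fraction=0.0)
with_rejection = evaluate_with_rejection(posteriors, labels, reject_fraction=0.2)
```

In this toy example the two misclassified cases are also the two least certain ones, so rejecting 20% of the cases removes both errors. Real data will rarely behave this cleanly, but the same trade-off (fewer decisions, higher accuracy on those made) is what the reported results show.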

Collected at the University Hospitals Leuven (1994-1999): 425 records, 25 features, 32% malignant