Academic year: 2021

Optimizing variable selection and cost using a genetic algorithm for modelling adnexal masses with Bayesian networks

Olivier Gevaert (1), Bart De Moor (1), Dirk Timmerman (2)

(1) Dept. Electrical Engineering, ESAT-SCD-BioI, Katholieke Universiteit Leuven, Belgium.

(2) Division of Gynaecological Oncology, Department of Obstetrics and Gynaecology, UZ Gasthuisberg, Katholieke Universiteit Leuven, Leuven, Belgium.

Statistical methods have already proven useful in the diagnosis and prognosis of complex diseases. When the number of variables becomes large or the relationships between the variables become complex, advanced methods are needed to build models that help clinicians make reliable predictions of the outcome. Possible methods include logistic regression (Timmerman et al., 2005), Support Vector Machines (Pochet and Suykens, 2006; De Smet et al., 2006) and Bayesian networks (Gevaert et al., 2006). These statistical or mathematical methods provide a less biased way of analysing clinical data and allow a large number of variables to be modelled.

We used Bayesian networks to model the malignancy of adnexal masses based on clinical data. Bayesian networks offer an alternative way of modelling clinical data because a Bayesian network can explain its reasoning. We combined Bayesian networks with a genetic algorithm that can add variables to or remove them from the model, thereby performing variable selection. Moreover, since each variable had an associated cost, we concurrently minimized the cost of the selected variables. The cost of each variable was specified by a gynaecological expert and reflected a combination of the subjectivity, the financial cost and the time needed to measure that variable.
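The idea of searching over variable subsets while penalising their cost can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the fitness function, the genetic operators, or how predictive performance and cost were traded off. The names `combined_cost`, `fitness`, `run_ga`, `cost_weight` and `toy_score` are hypothetical, and `toy_score` is only a stand-in for the performance of a Bayesian network trained on the selected variables.

```python
import random


def combined_cost(subjectivity, financial, time):
    """Hypothetical per-variable cost: equal-weight mean of three expert ratings."""
    return (subjectivity + financial + time) / 3.0


def fitness(mask, costs, score_fn, cost_weight):
    """Score of the selected subset minus a penalty proportional to its total cost."""
    selected = [i for i, bit in enumerate(mask) if bit]
    if not selected:
        return float("-inf")  # a model without variables is not allowed
    total_cost = sum(costs[i] for i in selected)
    return score_fn(selected) - cost_weight * total_cost


def run_ga(n_vars, costs, score_fn, cost_weight=0.01, pop_size=30,
           generations=40, mutation_rate=0.05, seed=0):
    """Simple generational GA over bit masks (1 = variable is in the model)."""
    rng = random.Random(seed)

    def fit(mask):
        return fitness(mask, costs, score_fn, cost_weight)

    population = [[rng.randint(0, 1) for _ in range(n_vars)]
                  for _ in range(pop_size)]
    best = max(population, key=fit)
    for _ in range(generations):
        next_pop = [best[:]]  # elitism: carry the best mask over unchanged
        while len(next_pop) < pop_size:
            # Tournament selection of two parents
            p1 = max(rng.sample(population, 3), key=fit)
            p2 = max(rng.sample(population, 3), key=fit)
            # One-point crossover
            cut = rng.randrange(1, n_vars)
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation adds or removes variables
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_pop.append(child)
        population = next_pop
        best = max(population, key=fit)
    return best, fit(best)


def toy_score(selected):
    """Placeholder for model performance: only variables 0, 1 and 2 are informative."""
    informative = {0, 1, 2}
    return len(informative.intersection(selected)) / len(informative)


# Made-up costs: three cheap variables followed by three expensive ones.
costs = [combined_cost(1, 1, 1)] * 3 + [combined_cost(5, 5, 5)] * 3
best_mask, best_fitness = run_ga(n_vars=6, costs=costs, score_fn=toy_score)
print(best_mask, best_fitness)
```

In a real setting `score_fn` would train and validate a Bayesian network on the chosen variables, which is far more expensive than the placeholder used here. Folding cost into a single weighted fitness is only one possible design; a multi-objective formulation would be another.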

This method was applied to data from the International Ovarian Tumour Analysis (IOTA) multicenter study on the pre-operative characterisation of ovarian tumours. The first phase of IOTA, which started in 1998 and finished in 2002, resulted in a data set of 1346 masses and 1152 patients with 68 variables. We used this data set to construct a Bayesian network that predicts the malignancy of ovarian masses while optimizing variable selection and cost. The results showed that performance was similar to that obtained with all variables, which means that a subset of less costly variables can be chosen without losing prediction accuracy. The developed models will be tested prospectively on new patients, who are being recruited in the next phase of IOTA.

Reference List

De Smet,F., De Brabanter,J., Van den Bosch,T., Pochet,N., Amant,F., Van Holsbeke,C., Moerman,P., De Moor,B., Vergote,I., and Timmerman,D. (2006). New models to predict depth of infiltration in endometrial carcinoma based on transvaginal sonography. Ultrasound Obstet Gynecol 27, 664-671.

Gevaert,O., De Smet,F., Kirk,E., Van Calster,B., Bourne,T., Van Huffel,S., Moreau,Y., Timmerman,D., De Moor,B., and Condous,G. (2006). Predicting the outcome of pregnancies of unknown location: Bayesian networks with expert prior information compared to logistic regression. Human Reproduction 21.

Pochet,N.L. and Suykens,J.A. (2006). Support vector machines versus logistic regression: improving prospective performance in clinical decision-making. Ultrasound Obstet Gynecol 27, 607-608.

Timmerman,D., Testa,A.C., Bourne,T., Ferrazzi,E., Ameye,L., Konstantinovic,M.L., Van Calster,B., Collins,W.P., Vergote,I., Van Huffel,S., and Valentin,L. (2005). Logistic regression model to distinguish between the benign and malignant adnexal mass before surgery: a multicenter study by the International Ovarian Tumor Analysis Group. J Clin Oncol 23, 8794-8801.
