
Data mining scenarios for the discovery of subtypes and the comparison of algorithms


Academic year: 2021







Colas, F.P.R.


Colas, F. P. R. (2009, March 4). Data mining scenarios for the discovery of subtypes and the comparison of algorithms. Retrieved from https://hdl.handle.net/1887/13575

Version: Corrected Publisher’s Version

License: Licence agreement concerning inclusion of doctoral thesis in the Institutional Repository of the University of Leiden

Downloaded from: https://hdl.handle.net/1887/13575

Note: To cite this publication please use the final published version (if applicable).


Propositions (Stellingen)

by Fabrice Pierre Robert Colas, author of Data Mining Scenarios for the Discovery of Subtypes and the Comparison of Algorithms

1. Discovering valid subtypes in data involves many more steps than just selecting and applying a clustering algorithm. [This thesis]

2. Identifying subtypes can be seen as a data mining scenario that applies across different application areas; the validation of the subtypes, however, remains specific to each application area. [This thesis]

3. In binary text classification problems, the performance of support vector machines deteriorates for particular combinations of the number of features and the number of documents. [This thesis]

4. In large bag-of-words feature spaces, tightly constrained support vector machines (small C) tend to perform well. In this setting, however, the support vector machine solution is equivalent to that of a nearest mean classifier. [This thesis]

5. Data mining focuses on generating hypotheses; statistics focuses on validating them. The two disciplines are complementary and answer different research questions.

6. The R platform for statistical computing is a convenient means to implement data mining scenarios, to extend them to parallel computing environments and, through software packages, to reproduce previous research.

7. In bioinformatics, considerable scientific interaction is necessary to develop a data mining scenario.

8. The most important phase in data analysis is the preparation of the data: known factors of variability must be identified and, where necessary, accounted for.

9. As language is an instrument of symbolic power, difficulty with it at work or in daily life leads to professional or social disadvantage. It is the key to social integration.

10. We may find it reassuring to try to name and know every single thing. However, the task is both boundless and useless. Focusing will only create greater tension, whereas greater attention could enable us to experience life more fully.
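Proposition 4 can be checked numerically under a simplifying assumption. Suppose the classes are balanced and C is small enough that every training point lies strictly inside the margin. Then every hinge term is active and linear, so the primal optimum is w = C Σ_i y_i x_i = (C n/2)(μ₊ − μ₋). This vector points along the difference of the two class means, which is the direction a nearest mean classifier uses. The following is a minimal sketch using only the Python standard library. The toy data, the function names and the choice of placing the bias at the midpoint between the class means are illustrative assumptions, not the thesis's code. The sketch checks the margin condition and compares the resulting predictions with those of a nearest mean classifier.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def small_c_linear_svm(X, y, C):
    """Candidate linear soft-margin SVM solution for very small C.

    Assumes balanced classes (sum(y) == 0). If every point lies strictly
    inside the margin, the primal objective
    1/2 ||w||^2 + C * sum_i (1 - y_i (w.x_i + b)) is smooth there; its
    w-gradient vanishes at w = C * sum_i y_i x_i and its b-gradient
    (-C * sum(y)) is zero, so any b keeping all margins below 1 is optimal.
    Here b is placed at the midpoint between the two class means.
    """
    w = [C * sum(yi * xi[k] for xi, yi in zip(X, y)) for k in range(len(X[0]))]
    mu_pos = mean([x for x, t in zip(X, y) if t == 1])
    mu_neg = mean([x for x, t in zip(X, y) if t == -1])
    midpoint = [(a + b) / 2 for a, b in zip(mu_pos, mu_neg)]
    b = -dot(w, midpoint)
    # The closed form is only the true optimum if this check holds
    inside_margin = all(t * (dot(w, x) + b) < 1 for x, t in zip(X, y))
    return w, b, inside_margin

def nearest_mean_label(x, mu_pos, mu_neg):
    d_pos = sum((a - m) ** 2 for a, m in zip(x, mu_pos))
    d_neg = sum((a - m) ** 2 for a, m in zip(x, mu_neg))
    return 1 if d_pos < d_neg else -1

# Hypothetical toy data: 20 documents per class, 50 features,
# the classes differ only in the first 5 features
rng = random.Random(0)
X, y = [], []
for label in (1, -1):
    for _ in range(20):
        X.append([rng.gauss(0.5 * label if k < 5 else 0.0, 1.0) for k in range(50)])
        y.append(label)

w, b, inside_margin = small_c_linear_svm(X, y, C=1e-4)
mu_pos, mu_neg = mean(X[:20]), mean(X[20:])
svm_labels = [1 if dot(w, x) + b > 0 else -1 for x in X]
nmc_labels = [nearest_mean_label(x, mu_pos, mu_neg) for x in X]
print(inside_margin, svm_labels == nmc_labels)
```

With unbalanced classes the constraint Σ_i α_i y_i = 0 in the dual no longer lets all multipliers sit at C. The simple closed form above then does not apply directly.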



This scenario involves techniques to prepare data, a computational approach that repeats the data modeling to select a number of clusters and a particular model, as well as other…
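The repeated modeling step can be sketched on one-dimensional data. The idea is to fit one model per candidate number of clusters and keep the one with the best score. This is a minimal sketch under stated assumptions, not the thesis's implementation. Each candidate is a Gaussian mixture whose k components share a single variance, fitted by expectation-maximization (EM). Candidates are scored by BIC using the sign convention of the R package mclust: BIC = 2 log L − p log n, and larger is better. The function names and the simulated data are hypothetical. A full scenario would also compare covariance structures on multivariate data, such as the VVI models mentioned in the thesis.

```python
import math
import random

def log_normal(x, mu, var):
    """Log-density of N(mu, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def fit_mixture_equal_var(xs, k, n_iter=50):
    """Fit a 1-D Gaussian mixture with k components sharing one variance by EM.

    Returns the log-likelihood of the fitted model.
    """
    n = len(xs)
    ordered = sorted(xs)
    mus = [ordered[int((j + 0.5) * n / k)] for j in range(k)]  # start at quantiles
    grand_mean = sum(xs) / n
    var = sum((x - grand_mean) ** 2 for x in xs) / n
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior membership probabilities, computed in log space
        resp = []
        for x in xs:
            logs = [math.log(w) + log_normal(x, m, var) for w, m in zip(weights, mus)]
            total = logsumexp(logs)
            resp.append([math.exp(lg - total) for lg in logs])
        # M-step: update mixing weights, means and the shared variance
        sizes = [sum(r[j] for r in resp) + 1e-12 for j in range(k)]
        weights = [s / n for s in sizes]
        mus = [sum(r[j] * x for r, x in zip(resp, xs)) / sizes[j] for j in range(k)]
        var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs) for j in range(k)) / n
        var = max(var, 1e-6)
    return sum(logsumexp([math.log(w) + log_normal(x, m, var)
                          for w, m in zip(weights, mus)]) for x in xs)

def bic(loglik, k, n):
    """BIC with the mclust sign convention (larger is better).

    Free parameters: k means, 1 shared variance and k - 1 mixing weights.
    """
    n_params = k + 1 + (k - 1)
    return 2 * loglik - n_params * math.log(n)

# Hypothetical data: two well-separated groups of 200 observations each
rng = random.Random(42)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(6.0, 1.0) for _ in range(200)]

scores = {k: bic(fit_mixture_equal_var(data, k), k, len(data)) for k in range(1, 5)}
best_k = max(scores, key=scores.get)
print(best_k, {k: round(s, 1) for k, s in scores.items()})
```

Starting the means at evenly spaced quantiles keeps the sketch deterministic. In practice, EM is usually restarted from several initializations because it only finds a local maximum of the likelihood.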

To prevent cluster analyses that model only the time dimension in the data, we presented a method that helps to select a type of time adjustment by assessing the cluster…
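Two common forms of time adjustment are offered here purely as hypothetical candidates: removing a linear time trend by least squares, and centering the measurements within each time point. The thesis's own adjustment types and its cluster-based assessment criterion are not shown in this excerpt. The sketch below applies both adjustments to one simulated variable. As a simple stand-in diagnostic, it uses the Pearson correlation with time. After either adjustment, the remaining linear association with time vanishes, so a subsequent clustering cannot be driven by that trend alone. The function names and data are illustrative assumptions.

```python
import random
from collections import defaultdict

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def detrend_linear(values, times):
    """Subtract the ordinary least-squares fit values ~ intercept + slope * time."""
    n = len(values)
    mt, mv = sum(times) / n, sum(values) / n
    slope = (sum((t - mt) * (v - mv) for t, v in zip(times, values))
             / sum((t - mt) ** 2 for t in times))
    intercept = mv - slope * mt
    return [v - (intercept + slope * t) for t, v in zip(times, values)]

def center_per_time_point(values, times):
    """Subtract from each value the mean of all values observed at the same time point."""
    groups = defaultdict(list)
    for t, v in zip(times, values):
        groups[t].append(v)
    means = {t: sum(vs) / len(vs) for t, vs in groups.items()}
    return [v - means[t] for t, v in zip(times, values)]

# Hypothetical data: 30 subjects measured at 5 visits, with a drift over time
rng = random.Random(1)
times = [visit for _ in range(30) for visit in range(5)]
values = [2.0 * t + rng.gauss(0, 1) for t in times]

r_raw = pearson(values, times)
r_trend = pearson(detrend_linear(values, times), times)
r_center = pearson(center_per_time_point(values, times), times)
print(round(r_raw, 3), r_trend, r_center)
```

The two adjustments differ in what they remove. Linear detrending removes only a straight-line drift, whereas per-time-point centering removes any shift shared by all subjects at a visit. Choosing between such options is the role of the selection method described in the excerpt.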

Furthermore, this number also illustrates that the models (VVI,6) tend, on average, to be more likely in terms of BIC scores than the other combinations of model type and number…

We start by presenting the design of the implementation: the data preparation methods, the dataset class, the cluster result class, and the methods to characterize, compare and…

Therefore, when running experiments on complex classification tasks involving more than two classes, we are actually comparing n SVM classifiers (for n classes) to single…
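Using n binary SVMs for n classes matches the one-versus-rest decomposition. Each SVM separates one class from all the others, and the class whose SVM gives the largest decision value wins. The sketch below illustrates that decomposition under several assumptions. The binary SVMs are linear and are trained with the Pegasos stochastic sub-gradient method, a swapped-in solver that is not necessarily the one used in the thesis. The bias is handled through an appended constant feature, and the data are toy points. All function names are hypothetical.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, lam=0.1, n_steps=20000, seed=0):
    """Binary linear SVM (hinge loss + L2 penalty) trained with Pegasos.

    A constant 1.0 feature is appended so the last weight acts as a
    (regularized) bias term.
    """
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    for t in range(1, n_steps + 1):
        i = rng.randrange(len(Xb))
        eta = 1.0 / (lam * t)
        margin_violated = y[i] * dot(w, Xb[i]) < 1
        w = [(1.0 - eta * lam) * wk for wk in w]  # step on the L2 penalty
        if margin_violated:
            w = [wk + eta * y[i] * xk for wk, xk in zip(w, Xb[i])]  # hinge sub-gradient step
    return w

def train_one_vs_rest(X, labels, **svm_args):
    """Train one binary SVM per class: that class (+1) against all others (-1)."""
    return {c: train_linear_svm(X, [1 if lab == c else -1 for lab in labels], **svm_args)
            for c in sorted(set(labels))}

def predict_one_vs_rest(models, x):
    """Return the class whose SVM gives the largest decision value."""
    xb = list(x) + [1.0]
    return max(models, key=lambda c: dot(models[c], xb))

# Hypothetical toy data: three well-separated classes in two dimensions
rng = random.Random(3)
centers = {"A": (5.0, 0.0), "B": (0.0, 5.0), "C": (-5.0, -5.0)}
X, labels = [], []
for c, (cx, cy) in centers.items():
    for _ in range(30):
        X.append((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)))
        labels.append(c)

models = train_one_vs_rest(X, labels)
accuracy = sum(predict_one_vs_rest(models, x) == c for x, c in zip(X, labels)) / len(X)
print(len(models), accuracy)
```

Each binary SVM is trained on a different relabelling of the same data, so their decision values are not calibrated against each other. Taking the maximum is the usual heuristic for combining them.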

In fact, on those tasks, SVM classifiers with small feature spaces would, first, exhibit performance comparable to the best shown by the 49 nearest neighbors classifier and…
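By contrast with the n binary SVMs, the 49 nearest neighbors classifier is a single multi-class model. It labels a document by a majority vote among the 49 training documents closest to it. The minimal sketch below assumes Euclidean distance and first-seen tie-breaking; the similarity measure used in the thesis may differ. The function name and toy data are hypothetical.

```python
import math
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k=49):
    """Label x by a majority vote among its k nearest training points (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    # most_common lists equal counts in first-seen order, so ties go to
    # the label that appears first among the nearest neighbours
    return votes.most_common(1)[0][0]

# Hypothetical toy data: 60 points per class around two distant centres
rng = random.Random(7)
train_X, train_y = [], []
for label, (cx, cy) in (("class_a", (0.0, 0.0)), ("class_b", (4.0, 4.0))):
    for _ in range(60):
        train_X.append((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)))
        train_y.append(label)

print(knn_predict(train_X, train_y, (0.2, -0.1)), knn_predict(train_X, train_y, (3.8, 4.1)))
```

Because every prediction sorts all training points, this naive version costs O(N log N) per query. That is acceptable for a sketch but slow for large document collections.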

Furthermore, in accordance with the several performance drops observed for small C values (illustrated in Figures 8.3 (a) and (c)), the tightly constrained SVMs can be less stable…

To conclude, our comparison-of-algorithms data mining scenario offers a new view on the problem of classifying text documents into categories. This focus enabled us to show that SVM…