
Statistical methods for microarray data Goeman, Jelle Jurjen


Academic year: 2021


Statistical methods for microarray data

Goeman, Jelle Jurjen

Citation: Goeman, J. J. (2006, March 8). Statistical methods for microarray data. Retrieved from https://hdl.handle.net/1887/4324

Version: Corrected Publisher’s Version

License: Licence agreement concerning inclusion of doctoral thesis in the Institutional Repository of the University of Leiden

Downloaded from: https://hdl.handle.net/1887/4324

Chapter 8. Conclusion

The existence of a curse of dimensionality is manifest in the analysis of microarray data. It shows itself when researchers are trying to find genes which are correlated with a certain phenotype: the sheer quantity of seemingly correlated genes makes it difficult to find the truly correlated ones. It appears even more strongly in prediction problems: the enormous variety of possible prediction rules completely obscures the underlying biology. In this confusing situation, biologists look to statisticians for guidance, while statisticians look to the biologists. In reality, both parties carry half of the solution, which lies in the incorporation of biological knowledge into the statistical methodology.

Statistical analysis of microarray data started out with explorative methods, which approach the data impartially and try to let the data ‘speak for themselves’. Most methods of microarray data analysis now in use are still highly exploratory in nature. This is most notable in unsupervised methods like cluster analysis, but also in prediction methods and methods for finding differentially expressed genes; only rarely do they make any use of biological knowledge. Methods for the analysis of microarray data are mainly directed at generating interesting new hypotheses, which are to be confirmed or disproved at a later stage. Only a few of the many hypotheses generated in this way turn out to be meaningful, however, and the task of sifting these out is left to the biologists.

Much can be gained, therefore, by switching to a more knowledgeable way of looking at the microarray data, incorporating biological knowledge into the analysis instead of reserving its use for the interpretation stage only. As learning about genes accumulates, blindly searching for new hypotheses without making use of the knowledge already gained will prove increasingly unsatisfactory. Furthermore, hypotheses about biological mechanisms that arise from exploratory data analysis have to be tested somehow. This requires non-explorative statistical methodology to be developed for microarray data analysis.

[...] researchers to test hypotheses about the involvement of biological processes in a certain phenotype. The same methodology can be used as a more informed type of exploratory data analysis, by incorporating the extensive knowledge about pathways into the data analysis. Similarly, in the factor analysis model for prediction in Chapter 6 it was shown how basic knowledge about the nature of microarray data can be used as guidance for the choice of a dimension reduction method.

The use of biological knowledge to improve statistical methods for analyzing microarray data is a promising new development, whose potential has not yet been exhausted. Intelligent use of this information can lead both to more powerful statistical methodology and to more interpretable results. Much work is still to be done. The pathway information which has been exploited for use in testing procedures in this thesis also has good potential for use in prediction methods. A similar challenge is to combine analysis of microarray data with information from linkage studies or proteomics data. It is obvious that close cooperation with biologists is essential for the success of this line of research.
