Statistical methods for microarray data

Goeman, Jelle Jurjen

Citation: Goeman, J. J. (2006, March 8). Statistical methods for microarray data. Retrieved from https://hdl.handle.net/1887/4324

Version: Corrected Publisher’s Version

License: Licence agreement concerning inclusion of doctoral thesis in the Institutional Repository of the University of Leiden

Downloaded from: https://hdl.handle.net/1887/4324

Chapter 8. Conclusion

The existence of a curse of dimensionality is manifest in the analysis of microarray data. It shows itself when researchers are trying to find genes which are correlated with a certain phenotype: the sheer quantity of seemingly correlated genes makes it difficult to find the truly correlated ones. It appears even more strongly in prediction problems: the enormous variety of possible prediction rules completely obscures the underlying biology. In this confusing situation, biologists look to statisticians for guidance, while statisticians look to the biologists. In reality, both parties carry half of the solution, which lies in the incorporation of biological knowledge into the statistical methodology.

Statistical analysis of microarray data started out with explorative methods, which approach the data impartially and try to let the data ‘speak for themselves’. Most methods of microarray data analysis now in use are still highly exploratory in nature. This is most notable in unsupervised methods like cluster analysis, but it also holds for prediction methods and methods for finding differentially expressed genes; only rarely do they make any use of biological knowledge. Methods for the analysis of microarray data are mainly directed at generating interesting new hypotheses, which are to be confirmed or disproved at a later stage. Only a few of the many hypotheses generated in this way turn out to be meaningful, however, and the task of sifting these out is left to the biologists.

Much can be gained, therefore, by switching to a more knowledgeable way of looking at the microarray data, incorporating biological knowledge into the analysis instead of reserving its use for the interpretation stage only. As knowledge about genes accumulates, blindly searching for new hypotheses without making use of the knowledge already gained will prove increasingly unsatisfactory. Furthermore, hypotheses about biological mechanisms that arise from exploratory data analysis have to be tested somehow. This requires non-explorative statistical methodology to be developed for microarray data analysis.

[...] researchers to test hypotheses about the involvement of biological processes in a certain phenotype. The same methodology can be used as a more informed type of exploratory data analysis, by incorporating the extensive knowledge about pathways into the data analysis. Similarly, in the factor analysis model for prediction in Chapter 6 it was shown how basic knowledge about the nature of microarray data can be used as guidance for the choice of a dimension reduction method.

The use of biological knowledge to improve statistical methods for analyzing microarray data is a promising new development, whose potential has not yet been exhausted. Intelligent use of this information can lead both to more powerful statistical methodology and to more interpretable results. Much work is still to be done. The pathway information which has been exploited for use in testing procedures in this thesis also has good potential for use in prediction methods. A similar challenge is to combine the analysis of microarray data with information from linkage studies or proteomics data. It is obvious that close cooperation with biologists is essential for the success of this line of research.