Shortcomings of artificial data analysis

Academic year: 2021

Although artificial data help to gain insight into the properties of algorithms, creating realistic scenarios and choosing good performance measures remain extremely tricky. As mentioned before, we decided to use artificial data from a recent comparative study (Prelic et al., 2006) in order to avoid any bias from creating our own data sets. However, one has to keep in mind that these data suffer from several artifacts:

• The modules are either noisy or overlapping. In reality, transcriptional modules are both noisy and overlapping.

• It is unclear whether the interaction in the overlap is best simulated by additive, averaging, multiplicative, or Boolean logical models.

• Whenever modules overlap, their intersections are valid modules too, and (if statistically significant) their unions could be considered coherent transcription units as well. Indeed, depending on the user parameter that specifies the resolution, our method finds the intersection, the ‘module’, or the union of two or more ‘modules’, which explains the somewhat lower bicluster relevance scores in those scenarios (see supplementary material). Query-driven biclustering thus provides a detailed, local, multi-resolution view rather than a global, single-resolution view. Whenever interesting changes in bicluster composition occur, critical resolution values can be identified.

• In reality, the number of genes in an interesting module is much smaller than the number of genes in the background. The artificial data we use reflect this poorly, with modules that contain 10% of the total gene content. However, for computational reasons it is often infeasible to perform extensive simulation studies on large data sets.

• The artificial data sets contain modules of equal size and (nearly) equal strength. In real data sets, we expect a few strong modules to dominate (for example, ribosome biogenesis in yeast expression arrays). In the latter situation, the advantage of query-based approaches will be more evident.
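
To make these points concrete, the sketch below implants two modules that are both noisy and overlapping, and combines their signals in the overlap under each of the four candidate models. It is a minimal illustration under stated assumptions, not the generator used by Prelic et al. (2006): the function names, the constant-signal modules, the Gaussian noise, the default sizes, and the reading of the Boolean model as an element-wise maximum are all hypothetical choices made here. The small scoring function is written in the spirit of a gene match score (average best Jaccard overlap between found and reference gene sets) and only serves to show why reporting an intersection or a union lowers the relevance score.

```python
import numpy as np

# Candidate rules for combining two module signals where they overlap.
# "boolean_or" is rendered as the maximum, which reduces to logical OR
# when signals are 0/1; this reading is an assumption of this sketch.
OVERLAP_MODELS = {
    "additive": lambda x, y: x + y,
    "averaging": lambda x, y: (x + y) / 2.0,
    "multiplicative": lambda x, y: x * y,
    "boolean_or": np.maximum,
}


def implant_two_modules(n_genes=100, n_conds=50, size=10, shared=5,
                        signal_a=2.0, signal_b=3.0, noise_sd=0.5,
                        model="additive", seed=0):
    """Return (data, genes_a, genes_b) for two overlapping, noisy modules.

    Each module is a size x size block; with the defaults a module holds
    10% of the genes. Module B is shifted so that it shares `shared`
    genes and `shared` conditions with module A.
    """
    rng = np.random.default_rng(seed)
    start_b = size - shared

    in_a = np.zeros((n_genes, n_conds), dtype=bool)
    in_b = np.zeros((n_genes, n_conds), dtype=bool)
    in_a[:size, :size] = True
    in_b[start_b:start_b + size, start_b:start_b + size] = True

    # Outside the overlap each cell carries at most one module signal.
    signal = np.where(in_a, signal_a, 0.0) + np.where(in_b, signal_b, 0.0)
    # Inside the overlap the chosen interaction model decides the value.
    signal[in_a & in_b] = OVERLAP_MODELS[model](signal_a, signal_b)

    data = signal + rng.normal(0.0, noise_sd, size=signal.shape)
    genes_a = set(range(size))
    genes_b = set(range(start_b, start_b + size))
    return data, genes_a, genes_b


def jaccard(g1, g2):
    """Jaccard index of two gene sets."""
    return len(g1 & g2) / len(g1 | g2)


def gene_match_score(found, reference):
    """Average, over found gene sets, of the best Jaccard match in reference."""
    return sum(max(jaccard(f, r) for r in reference) for f in found) / len(found)


if __name__ == "__main__":
    data, genes_a, genes_b = implant_two_modules(model="averaging")
    reference = [genes_a, genes_b]
    candidates = [
        ("module A", genes_a),
        ("intersection", genes_a & genes_b),
        ("union", genes_a | genes_b),
    ]
    for label, found in candidates:
        print(label, round(gene_match_score([found], reference), 3))
```

With the default sizes, recovering module A exactly scores 1.0, whereas reporting only the intersection scores 0.5 and reporting the union scores about 0.667, even though both could be biologically meaningful units at a different resolution.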

REFERENCES

Prelic,A. et al. (2006) A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics, 22, 1122-1129.
