Index of /SISTA/aerts

Academic year: 2021

(1)

GCC

(2)

Current situation GCC 1.0

Roche 454

Current cluster

[Diagram: on the UZ network, 3 nodes (8 cores, 16 GB RAM each), 2 TB, UZ NAS storage]

Per run:

~1 million reads

~2 GB raw data

(3)

New sequencer: 1000x increase

1.1 TB / run (200 Gbp)

~1,000 million reads

8-day run!

Basic analysis of 1 full run

Under 1 week on 3 nodes with 48 GB RAM and 8 CPU cores each (and needs 7 TB of disk space)

Sequencing at full capacity keeps all 24 CPU cores busy full time
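The link between sequencing capacity and compute capacity can be checked with a little arithmetic. The sketch below uses only the figures on this slide and treats "< 1 week" as a 7-day ceiling, which is an assumption. The function names are illustrative.

```python
# Figures from the slide; the 7-day analysis time is an upper bound ("< 1 week").
NODES = 3
CORES_PER_NODE = 8
ANALYSIS_DAYS_MAX = 7   # assumed ceiling for the basic analysis of one run
RUN_DAYS = 8            # duration of one sequencing run


def cpu_hours_per_run(nodes=NODES, cores=CORES_PER_NODE, days=ANALYSIS_DAYS_MAX):
    """Upper bound on CPU-hours used by the basic analysis of one run."""
    return nodes * cores * days * 24


def cluster_utilisation(analysis_days=ANALYSIS_DAYS_MAX, run_days=RUN_DAYS):
    """Fraction of time the nodes are busy when runs arrive back to back."""
    return analysis_days / run_days


if __name__ == "__main__":
    print("CPU-hours per run (upper bound):", cpu_hours_per_run())
    print("Utilisation with back-to-back runs:", cluster_utilisation())
```

Under these assumptions one run costs at most 4032 CPU-hours and the 24 cores are busy 7 days out of every 8, which leaves essentially no headroom for meta-analyses.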

(4)

Meta-analyses & post-analyses

Several fold higher needs than basic run analyses

Integrate multiple runs (e.g., patients versus controls, families, etc.)

Integrate with previous data

Integrate with publicly available data

RNA-Seq + gene expression data from GEO

Integrate with other data sources

DNA-Seq + RNA-Seq + Methyl-Seq

Integrate with genome browsers

Galaxy, UCSC, Ensembl

Make analysis pipelines available to users as a service

Custom analyses as a service or in collaboration

(5)

Ideal computing setup

High Performance Computing (HPC)

(6)

UZ-GBIOMED-VSC

[Diagram: computing setup spanning UZ, gbiomed and VSC. Labels recovered from the figure:]

- 3 nodes (8 cores, 16 GB RAM each), 2 TB, UZ NAS storage
- Additional RAM (32 GB or 48 GB per node)
- Additional storage? DAS or NAS? Dell, NetApp?
- Distributed computing: Open-MPI, SGE
- Distributed computing: Torque/PBS
- Flexible computing: servers, storage, switches
- ~100 CPUs, 6 GB RAM/core; NetApp + DDN storage
- Software: academic tools, CLCBio?
- Software: CASAVA (parallelized by user), academic tools (bowtie, bwa, …), CLCBio?
- UZ-patient data
- Software: CASAVA, CLCBio, Roche
- Costs: computing (0.5 EUR / cpu-hour), storage (750-1500 EUR / TB)

(7)

To be discussed

How can HiSEQ2000 choose between UZ and KULeuven network to send run data to storage?

1Gb

350 GB / run, compressed

Where to store data after secondary analysis?

Cheap storage

External HDD

Tape

Who does what?

Jeroen / Jan for UZ?

Stein / Gert / Raf for Biomed?

Can we already buy additional RAM for UZ cluster?

Can we connect gbiomed servers directly to UZ storage?

What are the requirements?

Estimate load over 3 levels

# users

# runs
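For the network question, a rough lower bound on the time needed to move one run to storage may help. The sketch below assumes the "1Gb" above refers to a 1 Gbit/s link and that the 350 figure is in gigabytes. The `efficiency` parameter is hypothetical and stands in for protocol overhead and competing traffic.

```python
def transfer_time_seconds(size_gb, link_gbit_per_s, efficiency=1.0):
    """Time to move size_gb gigabytes over a link of link_gbit_per_s Gbit/s.

    efficiency is a hypothetical factor in (0, 1] for the usable share of the link.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_gb * 8e9                      # 1 GB = 8e9 bits (decimal units)
    return bits / (link_gbit_per_s * 1e9 * efficiency)


if __name__ == "__main__":
    for link in (1, 10):
        minutes = transfer_time_seconds(350, link) / 60
        print(f"350 GB over {link} Gbit/s: about {minutes:.1f} minutes at line rate")
```

At line rate a compressed run takes roughly 47 minutes over 1 Gbit/s and under 5 minutes over 10 Gbit/s. Real transfers will be slower, so these are best-case figures.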

(8)

What’s next

Decide on gbiomed hardware

List of things needed at UZ

Start testing CASAVA on UZ system and on VSC

Test CLCBio on UZ system for Illumina data

(9)
(10)
(11)

Storage

How much do we need?

1.1 TB per run

7 TB space during analysis

BUT: keep only runs that are being analyzed

~ 3 at a time?

10-15 TB

After analysis:

Data delivered to client

Data compressed and moved to offline storage

Cheap HDD array?

Tape?
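The 10-15 TB estimate can be reproduced with a small calculation. This sketch assumes the 7 TB needed during analysis already includes the 1.1 TB of raw data, and that runs not yet under analysis occupy only their raw size. Neither assumption is stated on the slide.

```python
RAW_TB_PER_RUN = 1.1        # from the slide
ANALYSIS_TB_PER_RUN = 7.0   # from the slide; assumed to include the raw data


def working_storage_tb(runs_in_analysis, runs_waiting):
    """Online storage needed for runs under analysis plus runs still waiting."""
    return runs_in_analysis * ANALYSIS_TB_PER_RUN + runs_waiting * RAW_TB_PER_RUN


if __name__ == "__main__":
    # Three runs kept online at a time, split between analysis and waiting.
    for analysing in range(1, 4):
        waiting = 3 - analysing
        total = working_storage_tb(analysing, waiting)
        print(f"{analysing} in analysis, {waiting} waiting: {total:.1f} TB")
```

With three runs online, one in analysis gives about 9.2 TB and two in analysis about 15.1 TB, which brackets the 10-15 TB range. Analysing all three at once would need about 21 TB.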

(12)

Proposal for GCC2.0

(ideas under construction)

[Diagram: proposed GCC 2.0 architecture. Labels recovered from the figure:]

- Sequencers: Illumina HiSEQ2000, Roche 454
- UZ computing nodes (existing): 3 nodes (8 cores, 16 GB RAM each), 2 TB; UZ NetApp storage
- gbiomed computing nodes: 1 node with 32 cores and 256 GB RAM, 2 nodes with 8 cores and 48 GB RAM each
- Fast interconnect; high I/O bandwidth
- ICTS/VSC NetApp + DDN storage
- VSC (existing), pay per cpu-hour
- Data classes: patient-related data; non-patient-related data (e.g., model organisms, cell lines, …)

(13)

GCC2.0 features

Divide and conquer: solution at 3 levels

UZ: for UZ-patient-related data (protected)

Gbiomed: ad hoc, flexible computing for research (non-UZ-patient-related data)

VSC: high-performance computing (non-UZ-patient-related data)

Storage (too expensive to duplicate)

VSC storage with Gbiomed access (create 10Gb fast interconnect from ICTS to gbiomed)

UZ storage with Gbiomed access (create ‘open-access’ policy for non-patient-related data)

Gbiomed ad hoc storage (HDDs in the local servers)

Computing

VSC for HPC

Servers in UZ (patient-related data)

Servers in gbiomed (for research-related ad hoc analyses, web services, development, software testing, …)

(14)

GCC2.0 Cost, Timing & Effort estimates

Budget from Stichting tegen Kanker

200-250 K left for computing

• Solution for the first 3 years should be possible (excluding bioinformatics manpower)

• Budget spread between VSC-gbiomed-UZ: to be decided internally in genomics core

• VSC x%

– Storage (86,400 EUR for 32 TB; ~80 TB is needed for 25 runs per year)

– Computing time (29,594 EUR for 55,000 cpu-hours)

• Gbiomed local servers and local storage y%

UZ additional storage z%

Software licenses (CLCBio) (price quote requested)

More investments needed over time (e.g., new hardware is only for 3 years)
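The VSC quotes above can be reduced to unit prices to make them easier to compare. The sketch below uses only the quoted totals. Applying the storage unit price to the full ~80 TB is an extrapolation, since the slide does not say whether the price scales linearly or what period it covers.

```python
STORAGE_QUOTE_EUR = 86_400      # quoted for 32 TB
STORAGE_QUOTE_TB = 32
COMPUTE_QUOTE_EUR = 29_594      # quoted for 55,000 cpu-hours
COMPUTE_QUOTE_CPU_HOURS = 55_000
STORAGE_NEEDED_TB = 80          # "~80 TB is needed for 25 runs per year"


def unit_price(total_eur, quantity):
    """Price per unit implied by a quote."""
    return total_eur / quantity


if __name__ == "__main__":
    eur_per_tb = unit_price(STORAGE_QUOTE_EUR, STORAGE_QUOTE_TB)
    eur_per_cpu_hour = unit_price(COMPUTE_QUOTE_EUR, COMPUTE_QUOTE_CPU_HOURS)
    print(f"Storage: {eur_per_tb:.0f} EUR/TB")
    print(f"Computing: {eur_per_cpu_hour:.3f} EUR/cpu-hour")
    # Linear extrapolation (an assumption) to the stated yearly need.
    print(f"{STORAGE_NEEDED_TB} TB at that rate: {eur_per_tb * STORAGE_NEEDED_TB:.0f} EUR")
```

The quotes work out to 2700 EUR per TB and roughly 0.54 EUR per cpu-hour. A linear extrapolation to 80 TB gives 216,000 EUR.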

Timing: 31 August 2010?

Estimated effort (to be discussed)

VSC:

• Create 10Gb ethernet link to gbiomed (cost?)

• … mandays for startup and testing (network connections, storage, software)

• Maintenance included in price

Genomics Core bioinformaticians (VRC, CME)

• … mandays for startup and testing

Gbiomed IT:

• … mandays for setting up local servers & integration with ICTS storage

• … FTE for maintenance of local servers

(15)

Hurdles to overcome

1) 10Gb ethernet link between VSC and gbiomed

For non-UZ-patient-related data

To transfer Illumina data to VSC

To run ad hoc analyses on local gbiomed servers, connected to the VSC storage, without the need to duplicate the storage solution and the data (too costly)

An absolute requirement

Currently not available

A necessary investment for future VSC-BMW interactions

2) UZ-patient-related data cannot be transferred to VSC storage, nor computed at VSC

Can VSC provide a secure transfer, storage and computing environment for UZ-data? If not, data analysis and storage for UZ-data remains in UZ.

3) Link between UZ storage and gbiomed for non-patient related data

A Gbiomed-UZ 10Gb link is possible in principle. Perhaps as a stopgap during the transition period (while waiting for the VSC-gbiomed 10Gb link)?

(16)

Alternatives

All-in-one solution

PSSCLabs

(17)

Bioinformatics analyses

Estimated effort from a Genomics Core bioinformatician for basic analysis of 1 run: ~2-3 mandays

Included in service fee?

This analysis will not be satisfactory for most projects

Fee-based bioinformatics and data analysis service for more advanced analyses?

Many users have a bioinformatician in the group or already collaborate with bioinformaticians

Contribution in the service fee for GCC hardware & maintenance cost, and software licenses

Estimated effort:

Either only basic analysis services are offered: ½ FTE bioinformatics postdoc

Or basic plus advanced bioinformatics services are offered: 1 FTE
