SymBioSys
K.U.Leuven Center for Systems Biology
Topics to be addressed
International trend
Project concept
Project structure
3 problems and 3 cases
Computational methodology leads to user-friendly tools and real biological impact
Strategic importance internationally
Strategic importance K.U.Leuven
Coherence of the consortium
Systems biology
Biostatistics
Genetics
Sequence analysis
Expression analysis
Personalized medicine
Nutraceuticals
Post-genomic drug development (new targets, toxicogenomics)
GMOs
Systems biology
Biological question & model
High-throughput technology
Computers & databases
Mathematical models
The Human Genome Project has catalyzed striking paradigm changes in biology - biology is an information science. [...] Systems biology will play a central role in the 21st century; there is a need for global (high throughput) tools of genomics,
proteomics, and cell biology to decipher biological information; and computer science and applied math will play a commanding role in converting biological information into knowledge.
Leroy Hood, Institute for Systems Biology, Seattle, WA, 2002
Center of Excellence
Become a world-leading bioinformatics center for systems biology
Bioinformatics & microarrays
Three topics of excellence
Gene prioritization by integrative genomics
Graphical models of regulatory motifs and modules
Inference of regulatory networks
We will achieve this goal through
Further build-up of existing expertise
Symbiosis between computational and biological partners
Concrete cases for real biological relevance
Diverse cases for generic applicability in biology
Systems biology
Genes, Modules, Networks
Probabilistic models
Integrative genomics
Regulatory modules
Cellular networks
Case
Case
Project concept
Case
Case
Research concept & consortium
Probabilistic models
Integrative genomics
Regulatory modules
Cellular networks
Genetical genomics
Endocrinology
Salmonella genomics
Research cycle, built up step by step: Biological problem -> Experiment design -> Biological data -> Data analysis -> Biological validation -> Improved method -> New biology
Integrative genomics
Regulatory modules
Cellular networks
Genetical genomics
Endocrinology
Salmonella genomics
DME-VIB
Prometa
KUL & DME-VIB
World
Probabilistic models
Peripheral groups & visibility
Yeast (CMPG & Bio)
Project structure
WP1. Candidate genes
WP2. Regulatory modules
WP3. Cellular networks
Human genetics
Glucose regulation
VitD modes of action
Salmonella systems biology
Candidate genes
Regulatory modules
Cellular networks
Data analysis: Primary analysis, Gene prioritization, Motif analysis, Network inference
Data generation: cDNA/Affy, CGH, ChIP-chip, Proteomics, Metabolomics
Project structure (SysBio -> 3 partners)
Genetical genomics
Endocrinology
Salmonella genomics
WP1. Candidate gene prioritization
High-throughput genomics
Statistics & data mining
Candidate genes?
Human genetics identifies key genes in monogenic and multifactorial diseases
Algorithms: Module analysis, Statistical analysis, Gene prioritization
Technologies: CGH, cDNA/Affy
WP2. Module discovery
ACT MYLA C
MYL1 MYOG
MYF6 CHRM2
MEF2
MYOD SRF
Algorithms: Bayesian networks, Motif analysis, Statistical analysis, Gene prioritization
Technologies: CGH, cDNA/Affy, ChIP, Proteomics, Metabolomics
Cells/tissues treated with 1,25-(OH)2D3
Identification of signalling cascades and transcription factors important for the effects of 1,25-(OH)2D3
Validation of transcription factor (TF) binding to detected motifs
VitD affects bone and calcium homeostasis and
has potent anti-proliferative effects
mRNA expression analysis in pancreatic
beta cells: finding mechanisms of diabetes
Algorithms: Motif analysis, Statistical analysis, Gene prioritization
Technologies: Generation of antibodies, Functional analysis of beta cells, Affymetrix Gene System
Discovery of new modules for post-transcriptional gene regulation
[Figure: signal log ratio (from <-2.5 to >2.5) of mRNA in beta cells versus other tissues (non-beta cells, brain, pituitary, lung, kidney, fat, liver, muscle)]
mRNA expression profiles of normal & diabetic beta cells
Mouse models for a common human disease
Microarray-data
ChIP-chip-data
Library of strains, each with a tagged regulator (regulator + tag)
Chromatin IP to enrich promoters bound by regulator in vivo
Microarray to identify promoters bound by regulator in vivo
Sequence data
Network inference REMODISCOVERY
Columns: regulators (R), motifs (M), functional class: p-value, seed profile
Module 1
  R: Mbp1, Swi6, Swi4, Stb1
  M: M_18 (Mbp1), M_12 (Mbp1), M_11 (Swi4), M_67 (Swi4)
  10 CELL CYCLE AND DNA PROCESSING: 0
  10.03 cell cycle: 2.7e-5
  10.01 DNA processing: 1.3e-4
  42.04 cytoskeleton: 4.2e-3
Module 2
  R: Swi4, Mbp1, Swi6, FKH2
  M: M_18 (Mbp1), M_12 (Mbp1), M_11 (Swi4), M_8 (Mcm)
  40 CELL FATE: 5.2e-4
  40.01 cell growth / morphogenesis: 2.6e-3
  43 CELL TYPE DIFFERENTIATION: 5.2e-3
  43.01 fungal/microorganismic cell type differentiation: 5.2e-3
  34.11 cellular sensing and response: 5.3e-3
  01.05.01 C-compound and carbohydrate utilization: 6.8e-3
  10.03.04.03 chromosome condensation: 9.4e-3
Module 3
  R: NDD1, FKH2, Mcm1
  M: M_8 (Mcm), M_30 (Mcm)
  43 CELL TYPE DIFFERENTIATION: 3.6e-3
  43.01 fungal/microorganismic cell type differentiation: 3.6e-3
  10.03.03 cytokinesis (cell division) / septum formation: 4.8e-3
Module 4
  R: Swi5 (Ace2)
  M: M_8 (Mcm)
  32.01 stress response: 3.2e-3
  10.03 cell cycle: 8.7e-3
Combinatorial algorithm
WP3. Network inference
Salmonella is a powerful model for systems biology (illustration size)
Algorithms: Network inference, Module analysis, Statistical analysis, Gene prioritization
Technologies: CGH, cDNA/Affy, ChIP, Proteomics, Metabolomics
Library of strains, each with a tagged regulator (regulator + tag)
Chromatin IP to enrich promoters bound by regulator in vivo
Microarray to identify promoters bound by regulator in vivo
[Figure: three data matrices with one row per gene (Gene 1 … Gene n): a binary matrix over regulators TF1 … TFm, a binary matrix over motifs M1 … Mp, and a matrix over experiments E1 … Ex]
Preprocessing
Heterogeneous data
Motif compendium
Inferred network
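As a rough illustration of how the heterogeneous data on this slide could be combined, the sketch below selects genes that are bound by a given regulator, carry a given motif, and are coexpressed with a seed gene. It is a simplified stand-in, not the combinatorial algorithm used in the project; the function names, the dictionary layout, the 0.8 correlation threshold and all data values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long, non-constant profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def candidate_module(regulator, motif, binding, motifs, expression, min_corr=0.8):
    """Genes bound by `regulator`, carrying `motif`, and coexpressed with the first such gene."""
    genes = [g for g in expression
             if binding[g].get(regulator) and motifs[g].get(motif)]
    if not genes:
        return []
    seed = genes[0]  # assumption: the first matching gene serves as the seed profile
    return [g for g in genes
            if pearson(expression[seed], expression[g]) >= min_corr]

# Hypothetical toy data: binary binding and motif matrices plus expression profiles.
binding = {"gene1": {"TF1": 1}, "gene2": {"TF1": 1}, "gene3": {"TF1": 1}, "gene4": {"TF1": 0}}
motifs = {"gene1": {"M1": 1}, "gene2": {"M1": 1}, "gene3": {"M1": 1}, "gene4": {"M1": 1}}
expression = {
    "gene1": [1.0, 2.0, 3.0, 4.0],
    "gene2": [2.0, 4.0, 6.0, 8.5],
    "gene3": [4.0, 3.0, 2.0, 1.0],
    "gene4": [1.0, 2.0, 3.0, 4.0],
}
module = candidate_module("TF1", "M1", binding, motifs, expression)
print(module)
```

In this toy input, gene3 is bound and carries the motif but is anti-correlated with the seed, and gene4 is coexpressed but not bound, so neither ends up in the candidate module.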
Toucan 2
CGHGate
Endeavour
Real biological impact
Screenshots of titles of papers demonstrating a real
biological impact of bioinformatics methods?
Bioi@SCD growth
Turnover since 1998
[Chart: turnover per funding channel (IWT, FWO, EU, DWTC, BOF), 1998-2009; y-axis from 0 to 1,400,000]
CMPG
• J. Vanderleyden
• J. Michiels
• B. Cammue
Dept. of Mol. Microbiology
• J. Thevelein
CME-MG
• B. Hassan
• P. Marynen
• B. De Strooper
• W. Van de Ven
Lab of Clin. & Evolut.
Virology
• A. Vandamme
Dept. of Transgene Tech. &
Gene Therapy
• P. Carmeliet
CME-UZ
• JJ. Cassiman (CME-KUL)
• J. Vermeesch
Intensive Care
• G. Van Den Berghe
Obstetrics & Gynaecology
• I. Vergote
• T. D'Hooghe
• D. Timmerman
Paper
Lab of Functional Biology
• J. Winderickx
LEGENDO
• C. Mathieu
CMPG
• J. Vanderleyden
• J. Michiels
• B. Cammue
Lab of Clin. & Evolut.
Virology
• A. Vandamme
QuantPsy
• I. Van Mechelen
Lab of Functional Biology
• J. Winderickx
LEGENDO
• C. Mathieu
Mol.Cell Biology BioChemistry
• F. Schuit
BioStat
• G. Verbeke
Dept. of Mol. Microbiology
• J. Thevelein
Dept. of Transgene Tech. &
Gene Therapy
• P. Carmeliet
CME-MG
• B. Hassan
• P. Marynen
• B. De Strooper
• W. Van de Ven
CME-UZ
• JJ. Cassiman
• J. Vermeesch
Intensive Care
• G. Van Den Berghe
Obstetrics & Gynaecology
• I. Vergote
• T. D'Hooghe
• D. Timmerman
CoE
European bioinformatics landscape
Integration bioinformatics & stats
Algorithmic methodologies
Three topics of excellence
Bioinformatics & microarrays
1. Gene prioritization by integrative genomics
2. Graphical models of regulatory motifs and modules
3. Bayesian networks for prokaryotic systems biology
(1) Genomic data fusion
After an experiment, many sources of information are available to select the best candidates for modeling and validation
Probabilistic methods can optimize the prioritization
Known genes related to a disease or pathway
Candidate genes
Locus
Screening
Multiple data sources
Sequence
Expression
Function
Endeavour [Methodological impact]
http://www.esat.kuleuven.ac.be/endeavour
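To make the idea of prioritizing candidates across multiple data sources concrete, here is a minimal sketch that ranks candidate genes per source and averages the rank ratios. This simple rank averaging is an assumption chosen for brevity and is not presented as the method implemented in Endeavour; the function names, gene names and similarity scores are all hypothetical.

```python
from statistics import mean

def rank_genes(scores):
    """Map each gene to its rank ratio in (0, 1]; the highest score gets the smallest ratio."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    n = len(ordered)
    return {gene: (i + 1) / n for i, gene in enumerate(ordered)}

def prioritize(sources):
    """Fuse per-source rankings by averaging rank ratios over the sources that scored a gene."""
    per_source = [rank_genes(scores) for scores in sources.values()]
    genes = set().union(*(ranks.keys() for ranks in per_source))
    fused = {g: mean(r[g] for r in per_source if g in r) for g in genes}
    # Best candidates first; ties are broken alphabetically for reproducibility.
    return sorted(fused, key=lambda g: (fused[g], g))

# Hypothetical similarity of each candidate to the known disease or pathway genes.
sources = {
    "sequence": {"geneA": 0.9, "geneB": 0.2, "geneC": 0.5},
    "expression": {"geneA": 0.7, "geneB": 0.1, "geneC": 0.8},
    "function": {"geneA": 0.6, "geneC": 0.3},  # geneB has no annotation in this source
}
ranking = prioritize(sources)
print(ranking)  # ['geneA', 'geneC', 'geneB']
```

Averaging only over the sources that actually scored a gene keeps poorly annotated candidates in the ranking instead of dropping them.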
(2) Regulatory modules [what is a module? What is transcriptional regulation?]
© Davidson EH et al. Science. 2002 Mar 1;295(5560):1669-78.
Gibbs motif finding
Initialization
Sequences
Random motif matrix
Iteration
Sequence scoring
Alignment update
Motif instances
Motif matrix
Termination
Convergence of the alignment
and of the motif matrix
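The initialization, iteration and termination steps listed above can be sketched as a small Gibbs sampler. This is a teaching sketch under simplifying assumptions (exactly one motif instance per sequence, a uniform background, and a fixed number of sweeps in place of a convergence test); it is not the MotifSampler implementation, and the function and parameter names are illustrative.

```python
import random

ALPHABET = "ACGT"

def motif_matrix(seqs, positions, width, skip=None, pseudo=1.0):
    """Column-wise base frequencies of the current motif instances (with pseudocounts)."""
    counts = [{base: pseudo for base in ALPHABET} for _ in range(width)]
    for i, (seq, pos) in enumerate(zip(seqs, positions)):
        if i == skip:
            continue  # leave out the sequence that is being resampled
        for j in range(width):
            counts[j][seq[pos + j]] += 1
    return [{b: col[b] / sum(col.values()) for b in ALPHABET} for col in counts]

def gibbs_motif_search(seqs, width, sweeps=200, seed=0):
    rng = random.Random(seed)
    # Initialization: one random motif instance per sequence.
    positions = [rng.randrange(len(s) - width + 1) for s in seqs]
    # Iteration: a fixed number of sweeps stands in for a convergence test.
    for _ in range(sweeps):
        for i, seq in enumerate(seqs):
            matrix = motif_matrix(seqs, positions, width, skip=i)
            # Sequence scoring: likelihood ratio of every window against a uniform background.
            weights = []
            for start in range(len(seq) - width + 1):
                score = 1.0
                for j in range(width):
                    score *= matrix[j][seq[start + j]] / 0.25
                weights.append(score)
            # Alignment update: sample the new instance in proportion to its score.
            positions[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions, motif_matrix(seqs, positions, width)

# Hypothetical toy sequences that share the planted site TGACGTCA.
seqs = ["TTGACGTCAGG", "CCTGACGTCAT", "GATGACGTCAA"]
positions, matrix = gibbs_motif_search(seqs, width=7)
print(positions)
```

Because the update is sampled rather than maximized, different seeds can return different alignments; in practice several runs are usually compared.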
MotifSampler & TOUCAN
(3) Network inference
Reconstruction of the regulatory network underlying the phenotypic behavior
High throughput data
Benchmarking network inference methods
Network simulation: realistic network structures, realistic network dynamics, simulated networks
Network inference: graphical models, system identification, inferred networks
[Equation: expression in A, K, v and v_max]
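One simple way to benchmark an inference method against a simulated network is to compare the inferred edges with the known edges of the simulation. The sketch below computes precision and recall over directed regulator-to-gene edges; the choice of metric, the function name and the example edges are assumptions for illustration only.

```python
def edge_scores(true_edges, inferred_edges):
    """Precision and recall of inferred edges against the edges of the simulated network."""
    true_edges, inferred_edges = set(true_edges), set(inferred_edges)
    hits = len(true_edges & inferred_edges)
    precision = hits / len(inferred_edges) if inferred_edges else 0.0
    recall = hits / len(true_edges) if true_edges else 0.0
    return precision, recall

# Hypothetical simulated network and the output of some inference method.
simulated = {("TF1", "gene1"), ("TF1", "gene2"), ("TF2", "gene3"), ("TF3", "gene1")}
inferred = {("TF1", "gene1"), ("TF2", "gene3"), ("TF2", "gene4")}

precision, recall = edge_scores(simulated, inferred)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.67 recall=0.50
```

Because the true structure is known by construction, the same comparison can be repeated across network sizes, noise levels and inference methods.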
Workpackages
WP1: Candidate genes
Preliminary data analysis
Microarrays (xM1.1)
Generic
CGH microarrays (gWP1)
Genetical genomics
Dealing with noise (xM2.1)
Knowledge mining (gWP2)
& Combined modeling of different data sets (xM2.3)
Genetical genomics
Generic -> WP3: Salmonella
Software & databases (xM1.4)
Workpackages
WP2: Regulatory modules
Motif and module discovery (xM1.2)
Expression profiling in vitD and analogs pathways (xM3.1, xM3.2)
Beta cell regulation
Transcriptional regulation
Post-transcriptional regulation
Genetic modules
Multiple genome scans and gene modifiers?
Software & databases (xM1.4)
WP3: Cellular networks
Network inference (xM1.3)
Salmonella high-throughput technologies (xM4.1)
Salmonella high-throughput data and analysis (xM4.2)
VitD pathway modeling? Glucose sensing?
Detection of dependence relations (xM2.2)
Software & databases (xM1.4)
Bioi@SCD growth
Personnel since 1998
[Chart: personnel evolution 1998-2005 by category (PhD, Postdoc, ZAP), quarterly from Jul-98 to Jan-05; y-axis from 0 to 25]
Bioi@SCD growth
Publications since 1998
[Chart: number of publications 1999-2005 by type (Books, Conference, Journal); y-axis from 0 to 20]
Bioi@SCD growth
5 successful PhDs
Gert Thijs (June 2003): Probabilistic methods to search for regulatory elements in sets of coregulated genes
Frank De Smet (May 2004): Microarrays: algorithms for knowledge discovery in oncology and molecular biology
Stein Aerts (May 2004): Computational discovery of cis-regulatory modules in animal genomes
Geert Fannes (June 2004): Bayesian learning with expert knowledge: transforming informative priors between Bayesian networks and multilayer perceptrons
Patrick Glenisson (June 2004): Integrating scientific literature with large scale gene expression analysis
Bioi@SCD growth
Software portal
http://www.esat.kuleuven.ac.be/~dna/Bioi/
Number of users per month
[Chart: monthly users, Nov-00 to Feb-04; y-axis from 0 to 1400]
Toucan 2
Endeavour
CMPG
• J. Vanderleyden
• J. Michiels
• B. Cammue
Dept. of Mol. Microbiology
• J. Thevelein
CME-MG
• B. Hassan
• P. Marynen
• B. De Strooper
• W. Van de Ven
Intensive Care
• G. Van Den Berghe
Obstetrics & Gynaecology
• I. Vergote
• T. D'Hooghe
• D. Timmerman
IDO, BOF PostDoc GBOU, PhD
Project, PhD, PostDoc
CAGE
Bruges
Kortrijk
Ghent
Antwerp
Brussels
Leuven
Turnhout
2005
Geel
Hasselt Mechelen
Bruges: Genencor International
Ghent: Ablynx, AlgoNomics, Applied Maths, Bayer BioScience, Bioin4matrix, BioMARIC, CropDesign, deVGen, Innogenetics, Maize Technologies Int’l, Methexis Genomics, Xcellentis, Yakult, Peakadilly
Antwerp: DCI-labs, Flen Pharma, Histogenex, Memo Bead Technologies
Turnhout: DiaMed EuroGen, Janssen Pharmaceutica
Geel: Barrier Therapeutics, Genzyme Flanders, Maia Scientific
Mechelen: Bio-Art, CryoSave, Galapagos Genomics, Tibotec, Virco
Brussels: Beta-cell, Dentech, EggCentris, R.E.D. Laboratories
Leuven: 4AZA Bioscience, Diatos, Neurogenetics, PharmaDM, reMynd, RNA-TEC, Thromb-X, Tigenix, Vivactis
Flemish biotech companies
Algorithmic research: Bayesian networks, Motif analysis, Statistical analysis, Gene prioritization
Data generation: cDNA/Affy, CGH, ChIP-chip, Proteomics, Metabolomics
Candidate genes PI:
Regulatory modules PI:
Cellular networks PI:
Project structure – budget (750 KEuro?)
Genetical genomics, Endocrinology, Salmonella genomics
Postdoc 1, Postdoc 2, Postdoc 3
PhD 1, PhD 2, PhD 3, PhD 4
Techn 1, Techn 2, Techn 3
Miscellaneous
First citations with “bioinformatics”
Trends Biotechnol 1993
Ann N Y Acad Sci 1993
Network reconstruction based on heterogeneous data
Microarray-data
ChIP-chip-data
Library of strains, each with a tagged regulator (regulator + tag)
Chromatin IP to enrich promoters bound by regulator in vivo
Microarray to identify promoters bound by regulator in vivo
Sequence data
Preprocessing Network inference
[Equation: expression in A, K, v and v_max]
Network structures based on real biological networks
Realistic network dynamics Simulated networks
Benchmarking network inference methodologies
[Equation: expression in A, K, v and v_max]
Realistic network structures
Realistic network dynamics
Simulated networks
Benchmarking network inference methodologies
Inferred networks
Graphical models
System identification
Now: the molecular pipeline
Powerful high-throughput technologies enable genomewide screening
Sequencing, microarrays, etc.
Some genes selected (arbitrarily) for validation
After a long validation, the best-known genes are integrated into a biological model (building predictive models on a limited set of genes is not the subject of this project)
Screen
Validate
Model
Future: the systems genomics pipeline
Validate Select
By integrating computation tightly with biological experiments, promising genes are selected and integrated into computational models, so that only the best candidates are retained for validation