• No results found

University of Groningen Understanding the gut ecosystem: bugs, drugs & diseases Vich Vila, Arnau

N/A
N/A
Protected

Academic year: 2021

Share "University of Groningen Understanding the gut ecosystem: bugs, drugs & diseases Vich Vila, Arnau"

Copied!
35
0
0

Bezig met laden.... (Bekijk nu de volledige tekst)

Hele tekst

(1)

University of Groningen

Understanding the gut ecosystem: bugs, drugs & diseases

Vich Vila, Arnau

DOI:

10.33612/diss.102587978

IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below.

Document Version

Publisher's PDF, also known as Version of record

Publication date: 2019

Link to publication in University of Groningen/UMCG research database

Citation for published version (APA):

Vich Vila, A. (2019). Understanding the gut ecosystem: bugs, drugs & diseases. University of Groningen. https://doi.org/10.33612/diss.102587978

Copyright

Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons).

Take-down policy

If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal. For technical reasons the number of authors shown on this cover page is limited to 10 maximum.

(2)
(3)

Whole exome sequencing analyses reveal

gene-microbiota interactions in the context of

inflammatory bowel disease

Shixian Hu*, Arnau Vich Vila *, Ranko Gacesa*, Valerie Collij *, Christine Stevens, Manuel A Rivas., Floris Imhann, Hendrik van Dullemen, Gerard Dijkstra, Marijn C. Visschedijk, Eleonora A. Festen,

Ramnik Xavier, Jingyuan Fu, Mark J.Daly, Cisca Wijmenga, Alexandra Zhernakova #, Alexander Kurilshikov #, Rinse K. Weersma #

*Co-first authors / # Co-last authors

(4)

Abstract

Chapter 4

Both the gut microbiome and host genetics are known to play significant roles in the pathogenesis of inflammatory bowel disease (IBD). However, the interaction between these two factors and its implications in the etiology of the diseases remains underexplored. Here we report on the influence of host genetics on the gut microbiome in IBD.

We performed whole exome sequencing of the host genome and whole genome shotgun sequencing of 1464 fecal samples (525 patients with IBD and 939 population-based controls). A four-step analysis was then used to assess the effect of genetics on the gut microbiome: 1) exome-wide microbial quantitative trait loci (mbQTL) analyses, 2) a targeted approach focused on IBD-associated genomic regions and protein truncating variants (PTVs, MAF >5%), 3) a gene-based burden test on PTVs with MAF <5% and 4) joint analysis of both cohorts to identify the interactions between disease and genetics.

We identified 12 mbQTLs, including variants in the IBD-associated genes IL17REL, MYRF, SEC16A and WDR78. For example, the acetyl-CoA biosynthesis pathway, which is involved in short chain fatty acids production, was associated with MYRF. Changes in microbial metabolic potential were observed in participants carrying rare PTVs in CYP2D6 and GPR151 genes. Moreover, Interaction analyses confirmed previously known IBD disease-specific mbQTLs in TNFSF15.

This study highlights that both common and rare genetic variants affecting the immune system are key factors in shaping the gut microbiota in the context of IBD and pinpoints potential mechanisms for disease treatment.

(5)

Inflammatory bowel disease (IBD), comprised of Crohn’s disease and ulcerative colitis, is a chronic inflammatory condition of the gut with an increasing incidence in westernized countries1. Large-scale genome-wide association studies (GWAS) have identified more than 200 genetic loci associated with IBD, including genes implicated in the immune pathways involved in responses to gut microbes2.

Extensive changes in the composition of the gut microbiota have been reported in pa-tients with IBD. This disease signature is mainly characterized by a loss of microbial rich-ness, a decrease in species that produce short-chain fatty acids (SCFAs) and an expansion of potentially pathogenic bacteria3,4. As neither genetics nor microbiome studies have re-vealed the triggering factors for IBD, there is an increasing need to study host-microbial interactions in order to understand the etiology and progression of the disease5,6.

To date, both mouse models and human studies have shown that IBD-associated genes interact with the intestinal microbiome via regulation of the mucosal physical barrier as well as immune responses. For example, the NOD-like receptor 2 (NOD2) is involved in the bacterial peptidoglycan recognition7. It has been shown that NOD2 knock-out mice show ineffective recognition and clearance of bacterial pathogens. As a consequence, these mice present increased abundances of pathogenic bacteria from the Bacteroides and

Escher-ichia genera8,9. Another host–microbiome interaction involves ATG16L1, a gene impli-cated in autophagy. In patients with Crohn’s disease, ATG16L1-T300A mutation carriers have more pathosymbionts in their gut mucosa11. Recently, genome-wide host-microbi-ota association analyses have reported correlations between variants in immune-related genes and microbial features. For example, IL10 has been associated with the abundance of Enterobacteriaceae12 and IL1R2 associated with the overall community composition (beta-diversity)13.

To date, studies looking into the interaction between host genetics and gut microbiota have either relied on GWAS in general populations12,13 or on targeted analyses of a few IBD-associated genes in IBD patient cohorts14,15. The discovery of host–microbiota inter-actions, moreover, has been hampered by the large influence of intrinsic and environmental factors on the gut microbiome and relatively low microbial heritability16.

The aim of this study was to expand our current knowledge of host–gut microbiota interactions. We combined whole exome sequencing (WES) of the host genome with

(6)

metagenomics sequencing of fecal samples in a population cohort and in an IBD cohort. In addition to whole-exome-wide analyses, we investigated disease-specific interactions and the influence of rare variants on the gut microbiota in order to identify mechanisms involved in gut homeostasis and disease development.

This study included two independent Dutch cohorts: a population-representative cohort (LifeLines-DEEP) from the northern part of the Netherlands and an IBD cohort made up of patients diagnosed in the specialized IBD clinic of the University Medical Center Groningen (Groningen, the Netherlands). Our LifeLines-DEEP cohort comprises 939 individuals (59.74% female, mean age 45.24±13.46). The IBD cohort comprises 525 IBD patients (61.33% female, mean age 43.18±14.46). We excluded self-reported IBD patients from LifeLines-DEEP cohort, while the presence of an ileoanal pouch or a stoma was an exclusion criterion in the IBD cohort (n=66) (Supplementary table 1).

WES was performed on blood samples. Library preparation and sequencing was done at the Broad Institute of MIT and Harvard. On average, 86.06 million high quality reads were generated per sample and 98.85% of reads were aligned to a human reference ge-nome (hg19). Moreover, 81% of the exonic regions were covered with a read depth >30x. Next, the Genome Analysis Toolkit17 of the Broad Institute was used for variant calling. After quality control (Supplementary methods), 73,164 common variants (minor allele frequency, MAF >5%) and 98,878 rare variants (MAF <5%) were considered for further analyses.

Metagenomic sequencing was performed for fecal samples. For each cohort, taxa present in fewer than 10% of total samples and pathways present in fewer than 25% of samples were excluded from the analyses (Supplementary methods, Supplementary Table 2). We then normalized the relative abundances of 242 microbial taxa and 301 pathways present in both cohorts through inverse rank transformation. For all analyses, linear regression was used to adjust for the effect of the confounders: age, sex, read depth, BMI, smok-ing, use of proton pump inhibitors (PPIs) and use of laxatives and antibiotics within 3 months of sample collection. In addition, the effect of disease location was considered in the analysis of the IBD cohort (Supplementary Methods).

Methods

Study cohorts

WES and data processing

(7)

Microbial features were considered as quantitative traits. The Spearman correlation meth-od was applied to determine the relationship between non-zero microbial data and host genotype in a four-step approach (Figure 1).

73,164 high quality common variants (MAF >5%) were correlated with the relative abun-dances of microbial taxa and metabolic pathways. First, we tested associations in the Life-Lines-DEEP cohort (discovery stage) and selected signals with P < 5 × 10-5. Second, we replicated these in the IBD cohort and only kept associations with the same allelic direc-tion that passed a replicadirec-tion threshold P < 0.05 (replicadirec-tion stage). Third, we performed meta-analyses on these datasets using a weighted-Z-score approach. The criteria of signif-icance was P-values that met an whole exome-wide threshold of 6.83 × 10-7 (Bonferroni method, n=73,164 variants). We then repeated this analysis switching the discovery and replication cohorts: using the IBD cohort as discovery and LifeLines-DEEP as replication.

We selected two sets of variants for targeted analysis: protein truncating variants (PTVs)18 and variants located in known IBD-associated genes. We predicted 316 stop-gain, splice-disrupting and frameshift variants with MAF >5% in our analyses. We selected all genetic variants with a MAF >5% present in genomic loci that have been associated to IBD2 (n=3,010). Associations between these variants and microbiome traits were per-formed following the same procedure described above in Step 1. The significance threshold was adjusted according to the number of genetic variants tested: P < 0.001 in the discovery cohort, P < 0.05 in the replication cohort and a final meta P meeting 1.5 × 10-5 (Bonferroni method, n=3,309 variants).

To identify the effect of rare mutations, we performed gene-based burden tests by using the variant’s score instead of individual genotype in correlation analyses, keeping only PTVs with MAF <5% and calculating per-gene scores19. The number of genes implicated in this analysis was 980, so the final meta P was 5 × 10-5, with a discovery P of 0.005 and a repli-cation P of 0.05.

Next, we investigated the mbQTLs that were only significant in one of the cohorts in steps 1 and 2. To identify whether the presence and absence of IBD could have an effect mbQTL analyses

Step 1. Whole-exome-wide association meta-analyses

Step 2. Meta-analyses of selected variants

Step 3. Gene-based burden test meta-analyses

(8)

on the observed mbQTLs, we performed association analyses combining both cohorts and adding diseases and the interaction between genotype and diseases as covariates (Supple-mentary Methods)20. Significance thresholds at whole-exome-wide level were P < 6.83 × 10-7(Bonferroni method, n=73,164 variants) for the discovery cohort, P > 0.05 for the replication cohort and significant interaction P (IBD × genotype) < 0.0013 (Bonferroni meth-od, n=38 variants, including 17 IBD-specific and 21 LifeLines-DEEP-specific observed mbQTLs) (Supplementary Table 3). The criteria for significance in the targeted-level anal-yses were discovery cohort P < 1.5 × 10-5, replication cohort P > 0.05, significant interac-tion P (IBD × genotype) < 0.0014 (Bonferroni method, n=36, including 12 IBD-specific and 24 LifeLines-DEEP-specific mbQTLs) (Supplementary Table 4). To validate these analyses, we randomly permutated the disease status across all samples 999 times (Supplementary Methods). Finally, we analyzed the identified mbQTLs from step 1-3 in patients with UC and CD only to assess whether the identified mbQTLs where from a specific IBD subtype.

To further explore the function of the observed mbQTLs, we examined tissue-specific gene expression (eQTLs) in the GTEx Consortium database21 and used the Enrichr22 and FUMAGWAS23 databases to annotate the biological function and immunologic signa-tures of the genes with a mbQTL-effect in the whole-exome-wide analyses.

Annotation of Genetic Variants

After correcting for the number of variants tested (n=73,164), the exome-wide mbQTL analysis (Step 1) identified associations between 10 genetic variants and 11 microbial fea-tures. Four variants were associated to bacterial metabolic pathways involved in degrada-tion of glucarate, the tricarboxylic acid cycle (TCA) cycle, Coenzyme A (CoA) biosyn-thesis and glycogen biosynbiosyn-thesis, while the other six variants were associated with relative abundance of bacteria (Figure 2, Supplementary table 5). The most significant associations were found between the minor allele of an intronic SNP (rs2238001, c.46+4245T>C) in the MYRF gene, which is located in an IBD-associated loci, and decreased abundance of two microbial pathways involved in carbohydrate metabolism: Acetyl-CoA biosynthesis (PWY-5173, meta P = 7.50 × 10-8) and glyoxylate bypass (TCA-GLYOX-BYPASS, meta P = 6.16 ×10-7) (Figure 3A, B; Supplementary figure 1). In the step 2 analysis, the same SNP was also observed to be associated with another metabolic pathway (GLYCOLY-SIS-TCA-GLYOX-BYPASS, meta P = 2.73 × 10-6). These pathways are mainly predicted from Escherichia coli. Concordantly, E. coli shows the strongest association among all 242 microbial taxa to MYRF (meta P = 6.00 × 10-3), although it does not meet the statistically

Results

(9)

Schematic overview of the study.

A) we performed whole exome sequencing of the host genome and whole genome shotgun sequencing of fecal samples of 525 individuals (IBD) and 939 controls (LifeLines-DEEP). Nine covariates (age, sex, BMI, smok-ing status, medication use (antibiotics, PPIs or laxatives), disease location (in the IBD cohort) and sequencsmok-ing read depth) were used to correct for relative abundances of 242 taxa and 301 pathways; B) a four-steps anal-ysis was performed: Step 1) a meta-analanal-ysis in which 73,164 exome-wide common variants with MAF >5% were used for association analyses for microbial traits. Step 2) a meta-analysis using a targeted approach that only tested for 3,010 variants located in IBD-associated genes known from IBD GWAS and protein truncating variants (PTVs) with MAF >5%. Step 3) a meta-analysis using a gene-based burden test for only the 980 genes with rare PTVs (MAF <5%). Step 4) joint analysis combining the two cohorts for disease and genetics inter-action analyses. Step 4 focused only on single-cohort-signifi cant mbQTLs from Steps 1 and 2 while adding a disease and a genetic interaction term into the model. All analyses were confi ned to non-zero values of taxa and pathways. All signifi cance thresholds were set up by Bonferroni correction taking all variants/genes used into account.

BMI, body mass index. PPI, proton-pump inhibitor. IBD, IBD cohort. LLD, LifeLines-DEEP cohort.

Fig. 1

(10)

Whole exome-wide meta-analysis results from LifeLines-DEEP and 1000IBD cohorts

73,164 common variants (MAF > 5%), 317 taxa and 324 pathways (corrected for all covariates) were used in the association analyses. The discovery signifi cance threshold was P < 5 × 10-5 and the replication sig-nifi cance threshold was P < 0.05. Manhattan plot displays -log10 P values for all association tests. Green

(11)

and blue represent taxonomies and pathways, respectively. Red line indicates the whole-exome-wide association signifi cance threshold: meta P < 6.83 × 10-7 (n = 73,164 variants, Bonferroni correction).

(12)

mbQTL and eQTL analyses of MYRF

A) Spearman correlation between genotype (TT, TC, CC) of rs2238001 in MYRF and the relative abun-dance of Acetyl-CoA biosynthesis (IBD cohort, P = 1.43 × 10-3, r = -0.19; LifeLines-DEEP cohort, P = 1.47 × 10-5, r = -0.20; meta P = 7.50 × 10-8), the glyoxylate bypass and TCA MetaCyc pathways (IBD cohort, P = 0.0149, r = -0.16; LifeLines-DEEP cohort, P = 1.04 × 10-5, r = -0.22; meta P = 6.07 × 10-7). B) The rs2238001 locus zoomed in on the IBD-associated region, including the IBD-associated genes MYRF, FADS2 and

(13)

FADS3. P values are derived from meta-analyses between variants and the relative abundance of Acetyl-CoA biosynthesis. C) eQTL analysis between rs2238001 and MYRF gene expression in colon tissue from the GTEx database (n = 246 tissues, P = 2.50 × 10-7).

(14)

significant threshold. Examination of the GTEx database revealed that the rs2238001 has an eQTL effect specific to colon tissue that results in increased expression of MYRF (P = 2.50 × 10-7) (Figure 3C).

The minor allele of a synonymous variant in the immune-related gene CABIN1 (rs17854875, c.5745C>T, p.Ala1915Ala) was associated with an increase of D-glucarate degradation (GLUCARDEG-PWY, meta P = 4.15 × 10-7). Another SNP located near the gene IL17REL (rs5845912, AC>A) was correlated with a lower abundance of the species Alistipes indistinctus (meta P = 4.36 × 10-7). Variants in this gene have been reported to be associated with ulcerative colitis. IL17REL encodes interleukin 17 receptor E-like, a homolog of IL-17 receptor E that is considered to be a part of the IL-17 pathway that initiates a T helper 2–mediated immune response24. When analyzing this SNP in patients with UC and CD separately, we did not identify this mbQTL to be statistically significant (P-values of 0.811 and 0.873, respectively, Supplementary table 11).

Gene function enrichment analysis of all 10 mbQTLs (Table 1) identified enrichment in gene functions related to mature B cell differentiation (GO:0002313, FDR=0.038) and CD4 and CD8 T-cell differentiation pathways (GSE31082, FDR = 0.0103) (Supplemen-tary table 6).

Two additional IBD-associated genes with mbQTLs were identified in our targeted ap-proach (Step 2) (Table 2; Supplementary table 7). The top significant variant, rs10781497 (c.834G>A, p.Asp278Asp) located in the SEC16A gene, was associated with lower levels of bacterial biosynthesis of thiamin phosphate (THISYN-PWY) and thiazole (PWY-6892) (Supplementary figure 2A), and a SNP in WDR78 (rs74609208, c.2497-18C>A) was associated with higher level of biosynthesis of rhamnose (DTDPRHAMSYN-PWY) (Supplementary figure 2B).

To study the effect of rare variants with predicted protein changing properties, we per-formed a gene-based burden test (Step 3). Here we identified eight associations between four genes and eight microbial pathways (Table 3). Two transcriptional stop-gain mu-tations in the GPR151 gene were significantly associated with lower levels of bacterial carbohydrate metabolism pathways (ANAEROFRUCAT-PWY with meta P = 4.78 × 10-6, GLYCOLYSIS with meta P = 5.45 × 10-6, PWY-5484 with meta P = 4.63 × 10-6 and PWY-6901 with meta P = 3.05 × 10-5) (Figure 4; Supplementary figure 3). In addition, two frameshift variants in the IBD-associated gene CYP2D6 were associated with a decreased level of bacterial biosynthesis of vitamin K (PWY-5838 with meta P = 1.45 × 10-5). Targeted analysis identifies mbQTLs in IBD-associated genes

(15)

Since both the gut microbiota and host genetics are different in patients with IBD com-pared to the general population, we reanalyzed our dataset including an interaction factor between disease and genetics. This analysis identified IBD-specific interactions comprising 18 genetic variants and 19 microbiome features (10 pathways and 9 taxa) (Supplementa-ry table 8). For example, a missense variant (rs2076523, c.586T>C, p.Lys196Glu) in the IBD-associated gene BTNL2, which is involved in regulation of T cell proliferation25, was associated with an increase in Bacteriodes cellulosilyticus in patients with IBD (interaction P = 1.31 × 10-5). We also replicated three previously identified mbQTLs. The well-known as-sociation between the LCT gene and Bifidobacterium abundance 12, 26, 27 was confirmed in our population-based cohort (recessive model, P = 1.70 × 10-4, Supplementary figure 4, Supple-mentary table 9), while previously reported genetic variants with mbQTL-effect located in the IBD-associated genes TNFSF15 (rs4246905, c.302-63T>C) and HLA-B (rs2074496, c.900C>T, p.Pro300Pro) 15 were associated with a glycogen degradation microbial pathway (GLYCOCAT-PWY, interaction P = 7.98 × 10-5) and a strain of Ruminococcaceae

bacteri-um (interaction P = 3.32 × 10-5), respectively.

To study the interaction between host genomics and gut microbial features in the context of IBD, we performed a large mbQTL analysis using high resolution host genomic and gut microbiome data. This identified putative associations between common genomic var-iants located in IBD (MYRF, IL17REL, SEC16A and WDR78) or immune-related genes (CABIN1) to the abundance of specific microbial taxa and gut microbiome metabolic pathways. The use of WES data also allowed us to identify rare and deleterious mutations in four genes (GPR151, CYP2D6, TPTE2 and LEKR1) that could potentially be involved in the regulation of the gut microbiota. Finally, genetics–disease interaction models re-vealed disease-specific mbQTL signals.

In our whole-exome-wide level analysis, we found that decreased levels of the microbial acetyl-CoA and glyoxylate metabolic pathways correlated with the minor allele (C) of a variant located in the gene MYRF. Acetyl-coA is a precursor in the synthesis of SCFAs, including butyrate and acetate28, that are important in maintaining gut health29. Interest-ingly, the MYRF gene is located in a genomic region that has previously been associated with IBD and other immune-mediated diseases30, 31. This genomic region also contains the

FADS1 and FADS2 genes that are involved in the metabolism of polyunsaturated fatty

acids32, and the n-3 polyunsaturated fatty acid has been suggested to have protective effects on IBD33. Therefore, our analyses suggest a potential link between inflammation and mi-crobial pathway dysregulation through host genomic variation. Another mbQTL we iden-tified is located in the immune-related gene CABIN1. This gene is involved in negatively regulating T-cell receptor signalling34 and was associated to an increase of D-glucarate Interaction analysis identifies IBD-specific mbQTLs

(16)

Associations between gene GPR151 and microbial pathways.

A) Meta P values based on burden test between 30 genes with rare PTVs on chromosome 5 and relative abundance of MetaCyc pathway homolactic fermentation (top). Blue dot represents meta P value of gene GPR151. Lower panel shows the variants found along with the coding region in GPR151. Different colors indicate different variant categories. Red indicates two rare stop-gain mutations, rs114285050 and

(17)

rs140458264. B) Box-plots for associations between the relative abundance of the homolactic fermen-tation (meta P = 4.78 × 10-6), glucose xylose degradation (meta P = 3.05 × 10-5) microbial pathways and GPR151, respectively.

(18)

degradation pathway. Interestingly, enterobacteria such as E. coli, a potentially pathogenic bacteria known to be enriched in dysbiotic conditions, can use this sugar as a carbon source for growth35. This implies a potential role between host genetics and a beneficial environ-ment for E. coli to grow. We also found an association between IL17REL, which likely oligomerizes and binds a specific IL17 cytokine, and the bacterium Alistipes. When analys-ing this mbQTL in patients with UC and CD separately, we did not identify this specific mbQTL to be statistically significant. A potential explanation could be that we did not have enough power to detect this effect in the IBD subtypes. Changes in the abundance of Alistipes have been reported in several conditions, including pediatric Crohn’s disease36, colorectal cancer37 and obesity38. Previous studies have reported a negative correlation be-tween the abundance of Alistipes and the LPS-induced anti-TN α response39. Therefore, mbQTLs identified at whole-exome-level suggest a potential complex interaction between host genetics, microbial composition and the immune system.

Next, we focused on a subset of selected variants located in genes within IBD-suscepti-bility regions and predicted protein-disrupting variants that could potentially lead to dis-ease or abnormal phenotype by altering the gut microbiome. Here we found two mbQTLs located in the IBD-associated genes SEC16A and WDR78. SEC16A is involved in the transitional endoplasmic reticulum and is located within a haplotype block that contains the INPP5E and CARD9 genes40.The SEC16A-affected pathway biosynthesis of thiamin (Vitamin B1, an essential vitamin) is necessary for the proper functioning of the immune system and thiamin is supplied to the host through diet and the gut microbiota41,42. WDR78 was associated with L-rhamnose biosynthesis, and L-rhamnose is a precursor of a common enterobacterial antigen. In addition, WDR78, together with genes GPR65 and TNFAIP3, is reported to cooperate in regulation of the macrophage component43. Therefore, our study reveals a potential link that suggests WDR78 may potentially regulate microbial function through antigen recognition by immune cells.

In contrast to the regular genotyping arrays used in GWAS, WES enables the detection of low frequency or rare variants with mbQTLs effects. We identified independent rare variants with predicted functional consequences within the G-protein coupled receptor 151, GPR151, that are associated with multiple functional microbial pathways (homolactic fermentation, glucose and xylose degradation). GPR151 is a critical element of antigen recognition and activation of the immune response44,45, and PTVs in GPR151 have been reported to have a protective effect against obesity and type 2 diabetes in the UK Bio-bank46. In addition, lower levels of bacterial carbohydrate degradation lead to lower carbo-hydrate absorption in the gut by the host, which pinpoints potential mechanisms by which

GPR151 variants can protect against metabolic diseases.

Finally, we joined the two cohorts to perform genetics–disease interaction analysis, rath-er than comparing single-cohort-significant mbQTLs separately, to identify disease-spe-cific mbQTLs and to achieve more power. This approach was able to show that genetics potentially exerts a different influence on the microbiome in IBD compared to a healthy situation. The known association between the LCT gene and Bifidobacterium abundance was only present in the population cohort. This could potentially be explained by the fact that Bifidobacterium abundance is decreased in the gut microbiota of Crohn’s disease3 and therefore this mbQTL was not present in the IBD cohort. Furthermore, we observed

(19)

mbQTLs associated with microbial taxonomies and pathways identifi ed using a targeted approach.

The targeted approach identifi ed six signifi cant associations between variants located in IBD-related genes and microbial taxa and pathways. Spearman correlation was performed in the association test in each cohort, followed by a Z-score-based meta-analysis. The discovery signifi cance threshold was 0.001, the replication signifi cance threshold was 0.05 and the fi nal meta threshold was 1.5 × 10-5. (a) Meta P value threshold decided by the number of total variants (n = 3,309, Bonferroni correction). (b) P values from association analyses in each cohort. (c) Correlation coeffi cients from association analyses in each cohort.

(20)

Rare mbQTLs identifi ed by gene-based burden meta-analyses

Eight associations were identifi ed by the gene-based burden test. We collapsed exome-wide protein truncating variants (PTVs) (excluding singletons) with MAF < 5% into 980 genes. Genetic scores were used instead of single variant dosage in the association analyses in each cohort. Meta-analyses were performed for those associations with discovery P < 0.005 and replication P < 0.05. (a) Rare PTVs

(21)

cated within genes used in the burden test. (b) Meta P value cut-off determined based on the total number of genes (n = 980, Bonferroni correction) (c) P values from association analyses in each cohort. (d) Correlation coeffi cients from association analyses in each cohort.

(22)

mbQTL effects in known IBD genes15 such as TNFSF15, only in the IBD cohort. Heritability studies have shown that part of the microbiome development and composi-tion is under genomic control26. However, many other intrinsic and environmental factors, as well as diseases, have larger effect sizes47. Studies looking into genome–microbiome interaction have been performed using GWAS technologies, mostly in healthy or popula-tion-based cohorts, and by using 16s sequencing12,13,48. In the current study we performed a large-scale mbQTL analysis of gut microbiome composition and function that combined two high-resolution techniques, WES and shotgun metagenomics, while controlling for major confounders known to influence the gut microbiome. While we are only beginning to dissect the genomic architecture that drives microbiome evolution and composition in health and disease, this study adds considerable insights and provides leads for further functional analyses or targets for therapies in the context of IBD.

This research highlights that both common and rare host genetic variants affecting the immune system are key factors in shaping the gut microbiota taxonomy and function, knowledge which further enhances our understanding of the intricate host-microbiome interaction involved in IBD pathogenesis.

Whole exome sequencing (WES) was performed on blood samples taken from 939 LifeLines-DEEP participants and 525 IBD patients. DNA isolation was performed using the AutoPure LS procedure from Qiagen. Library preparation and sequencing was done at the Broad Institute of MIT and Harvard. On average, 86.06 million high quality reads were obtained for each sample with 98.85% of reads aligned (human ref-erence genome hg19), resulting in a coverage of 81% of the target region with a read depth of >30X. Next, the Genome Analysis Toolkit of the Broad Institute was used for calling single-nucleotide polymorphism and insertions/deletions (https://software. broadinstitute.org/gatk/).

The following variant/sample filtering parameters were applied to WES data using PLINK tool (v.1.9): 1) variants with a call rate <0.99 were removed. Variants with a minor allele frequency (MAF) >5% were used for microbial quantitative trait loci (mbQTLs) analyses of common variants, and variants with a MAF <5% were used for low-frequency and rare mbQTL mapping. 2) In the Hardy-Weinberg equilibrium test, we used a P value <0.0001 as a significance cutoff in the LifeLines-DEEP cohort and discarded those variants in the both cohorts. 3) To identify ancestry-based genetic outliers in our dataset, we merged the WES data with genomes of Europeans from publically available 1000 Genome Project (Phase 3) data ( http://www.internation-algenome.org/), and performed principal component analysis (PCA) analysis based on SNPs shared between datasets. Outliers were defined as samples which fall outside of a

Supplementary methods

(23)

Participants were asked to freeze a stool sample at home within 15 min of production. A research nurse visited each participant at home shortly after production to collect the sam-ple on dry ice for transport to the laboratory at -80°C. Microbial DNA was isolated using the Qiagen AllPrep DNA/RNA Mini Kit (Qiagen; cat. #80204). Metagenomic shotgun sequencing was performed using the Illumina MiSeq platform. An average of 3.0 Gb of data (around 32.3 million reads) was obtained per sample. Reads belonging to the hu-man genome were removed by mapping the data to the huhu-man reference genome (version NCBI37) with kneaddata (v0.5.1, http://huttenhower.sph.harvard.edu/kneaddata). Profiling of microbiome taxonomic and functional composition was done using MetaPh-lan (v2.6.0)50 (http://huttenhower.sph.harvard.edu/metaphlan) and HUMAnN2 (v0.6.1)51 (http://huttenhower.sph.harvard.edu/humann2). Within the IBD cohort, we found 483 microbial metabolic pathways and 1455 taxa, including 13 phyla, 23 classes, 32 orders, 70 families, 178 genera, 578 species and 561 strains. Within LifeLines-DEEP, we found 468 pathways and 1375 taxa, including 15 phyla, 24 classes, 33 orders, 74 families, 176 genera, 573 species and 480 strains.

For each cohort, taxa present in fewer than 10% of samples and pathways present in fewer than 25% of samples were excluded from the analyses. We removed the redundant taxa by keeping the lowest taxonomic level that shared identical abundance (for example, species Odoribacter splanchnicus and its strain GCF_000190535 had the same detected rel-ative abundance in all samples, so we only kept the lowest level taxon, GCF_000190535, in this case). Unclassified and unintegrated metabolic pathways were also excluded (Sup-plementary Table 2) as they have little informative biological meaning.

Previous research has firmly established that many factors affect the composition and metabolic pathways of the gut microbiota52,53, and are therefore potential confounders in mbQTL studies. These factors include host age, sex, BMI, ethnicity and intake of medications as well as technical parameters of the microbiome analyses including sam-ple collection, storage, DNA isolation, and sequencing. To correct for these potential mean ± 3 SD interval in both of the first two PCs, and these samples were removed. 4) determining sex (based on heterozygosity rates) and identifying mismatched samples were based on the inbreeding coefficient (lower than 0.4 for females and higher than 0.7 for males). Finally, we also excluded participants who had their colon removed due to the large effect this procedure has on the gut microbiome49. These filtering steps led to exclusion of 19 samples from LifeLines-DEEP and 90 samples from IBD, and we retained 920 LifeLines-DEEP individuals, 435 IBD individuals, 73,164 common variants and 98,878 rare variants for downstream analysis.

Metagenomic sequencing of gut microbiota and data processing

(24)

confounders, we compared the results of the mbQTL analysis model that adjusted for host age, gender, and sequencing depth (Supplementary Table 10) with a more com-plex model that also included use of medications, host smoking status, BMI and disease location for patients with IBD (Supplementary Table 5). Because the study population is highly homogenous (all participants are from the north of the Netherlands and the genetic outliers were also excluded during QC), we did not include host ethnicity in the model. Medication categories included antibiotics, proton-pump inhibitors and laxatives were based on previously reported microbiome-drug associations. The use of mesala-zines, thiopurines and anti-TNFα were not considered because of their relatively small impact on the gut microbial composition and the overrepresentation in the cohort of IBD patients. Due to the impact of disease location on the gut microbiota (ref 54) this variable was also considered in the analyses of the IBD cohort.

We used a well-established pipeline for our mbQTL analysis12,55 (https://github.com/

alexa-kur/miQTL_cookbook). For association tests between microbial features and com-mon individual variants (MAF > 5%), we normalized the relative abundances of mi-crobial taxa and metabolic pathways data through inverse rank transformation. Mul-tivariate linear regression was used to adjust for the effect of confounders using the following model:

feature = (intercept)+ age+gender+BMI+smoking status+medication usage+sequence depth+ disease location (only for IBD)

The corrected microbiome features (residuals from linear model) were considered as quantitative traits. The rank-based Spearman correlation method in R was applied to determine the relationship between non-zero microbiome features and each host genetic variant (where variants were encoded as 0 for homozygote of major allele, 1 for heterozygotes and 2 for homozygote of minor allele).

Restricted by the relatively small sample size in our non-common variants associa-tion analysis, we used the gene-based burden test by adding the variant’s score instead of individual genotype into the correlation test, keeping only PTVs with MAF <5% and, calculating per-gene scores using the MetaSKAT packages56 in R.

For meta-analyses, the metap package (https://rdrr.io/cran/metap/) in R was used to perform a weighted-Z-score approach, considering sample size and separate P values. All significance thresholds were calculated by Bonferroni method with respect to the number of tested variants.

For interaction analyses, we added a disease and genotype interaction term to the linear model:

corrected feature ~ (intercept)+ genotype+disease+genotype × disease

(25)

To ensure the interaction results are not biased by potentially inflated statistics, we reas-signed the disease status across all samples randomly 999 times and retested the interac-tions. Significance thresholds were set based on the Bonferroni method corrected by the number of interaction tests. At whole-exome-wide level, we observed 44 randomly signifi-cant interaction signals out of 999 permutation tests (around 0.04 random signals for each permutation on average), while we observed 14 significant signals from our non-permu-tated test, suggesting that the number of observed interactions is enriched and unlikely to have occurred by chance. The similar tests were also performed for targeted analysis. In ad-dition, the distributions of interaction t-statistics and empirical P values were also assessed. The recessive association of a SNP (rs4988235, G allele) near the gene LCT with abun-dance of lactose-metabolizing Bifidobacteria is well established in previous research12,26,27. To evaluate this in our data, we used a recessive model (labeling the homozygote of minor allele as 1, and the other two genotypes as 0) to investigate the correlation between LCT variants (five protein coding variants linked to rs4988235, including rs748841, rs12988076, rs6719488, rs2236783 and rs309180, linkage disequilibrium R2 > 0.7) and all taxonomic abundances using Spearman correlation in R.

(26)

Q-Q plot of exome-wide association analyses P values in IBD cohort and Lifelines-DEEP cohort respec-tively. X-axis indicates the expected -log10 P values and Y-axis indicates the observed -log10 P values. A) Whole exome-wide approach of 73,164 common variants with microbial superpathway of glyox-ylate bypass and TCA (TCA-GLYOX-BYPASS), B) Whole exome-wide approach of 73,164 common variants with microbial superpathway of acetyl-CoA biosynthesis (PWY-5173).

(27)

Box-plot of associations between SEC16A (rs10781497) and WDR78 (rs74609208) and microbial path-ways in IBD cohort and Lifelines-DEEP cohort respectively. X-axis indicates the genotype of variants and Y-axis indicates the relative abundance of microbial pathways. A) the associations between SEC16A (rs10781497) and Thiamin diphosphate biosynthesis I (THISYN-PWY), Thiazole biosynthesis I (E. coli) (PWY-6892), B) the associations between WDR78 (rs74609208) and dTDP-L-rhamnose biosynthesis I (DTDPRHAMSYN-PWY).

P, P values of associations; r, correlation coeffi cient; IBD, IBD cohort; LLD, LifeLines-DEEP cohort

(28)

Q-Q plot of burden test P values in IBD cohort and Lifelines-DEEP cohort respectively. X-axis indicates the expected -log10 P values and Y-axis indicates the observed -log10 P values. A) Burden tests of 980 genes with microbial pathway homolactic fermentation (ANAEROFRUCAT-PWY), B) Burden tests of 980 genes with microbial pathway glucose and xylose degradation (PWY-6901).

(29)

Box-plot association P values between LCT (rs748841) and Bifi dobacterium adolescentis in IBD cohort and Lifelines-DEEP cohort respectively. X-axis indicates the genotype of rs748841 and Y-axis indicates the relative abundance of Bifi dobacterium adolescentis.

P, P values of associations; IBD, IBD cohort; LLD, LifeLines-DEEP cohort

(30)

1. Ng SC, Shi HY, Hamidi N, et al. Worldwide incidence and prevalence of inflammatory bowel disease in the 21st century: a systematic review of population-based studies. Lancet 2018;390: 2769-2778

2. De Lange KM, Moutsianas L, Lee JC, et al. Genome-wide association study implicates immune activation of multiple integrin genes in inflammatory bowel disease. Nat genet 2017;49:256-261

3. Vich Vila A., Imhann F, Collij V, et al. Gut microbiota composition and functional changes in inflammatory bowel disease and irritable bowel syndrome. Sci Transl Med 2018;10:eaap8914

4. Franzosa EA, Sirota-Madi A, Avila-Pacheco J, et al. Gut microbiome structure and metabolic activity in inflammatory bowel disease. Nat

Microbiol 2019;4:293-305

5. Hall AB, Tolonen AC, Xavier RJ. Human genetic variation and the gut microbiome in disease. Nat Rev Genet 2017;18:690-699 6. Cohen LJ, Cho JH, Gevers D, et al. Genetic Factors and the Intestinal Microbiome Guide Development of Microbe-Based Therapies for Inflammatory Bowel Diseases. Gastroenterology 2019;156:2174-2189

7. Kobayashi KS, Chamaillard M, Ogura Y, et al. Nod2-dependent regulation of innate and adaptive immunity in the intestinal tract. Science

2005;307:731-734

8. Mondot S, Barreau F, Al Nabhani Z, et al. Altered gut microbiota composition in immune-impaired Nod2−/− mice. Gut 2012;61:634-635 9. Rehman A, Sina C, Gavrilova O, et al. Nod2 is essential for temporal development of intestinal microbial communities. Gut 2011;60:1354-62 10. Butera A, Di Paola M, Pavarini L, et al. Nod2 Deficiency in mice is Associated with Microbiota Variation Favouring the Expansion of mucosal CD4+ LAP+ Regulatory Cells. Sci Rep 2018;8:14241

11. Sadabad MS, Regeling A, de Goffau MC, et al. The ATG16L1–T300A allele impairs clearance of pathosymbionts in the inflamed ileal mucosa of Crohn’s disease patients. Gut 2015;64:1546-1552

12. Bonder MJ, Kurilshikov A, Tigchelaar EF, et al. The effect of host genetics on the gut microbiome. Nat Genet 2016;48:1407-1412 13. Wang J, Thingholm L B, Skiecevičienė J, et al. Genome-wide association analysis identifies variation in vitamin D receptor and other host factors influencing the gut microbiota. Nat Genet 2016;48:1396-1406

14. Aschard H, Laville V, Tchetgen ET, et al. Genetic effects on the commensal microbiota in inflammatory bowel disease patients. PLoS Genet

2019;15:e1008018

15. Knights D, Silverberg M S, Weersma R K, et al. Complex host genetics influence the microbiome in inflammatory bowel disease. Genome Med

2014;6:107

16. Kurilshikov A, Wijmenga C, Fu J, et al. Host genetics and gut microbiome: challenges and perspectives. Trends Immunol 2017;38:633-647. 17. McKenna A, Hanna M, Banks E, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 2010;20:1297-1303.

(31)

18. Rivas MA, Pirinen M, Conrad DF, et al. Effect of predicted protein-truncating genetic variants on the human transcriptome. Science

2015;348:666-669

19. Purcell S M, Moran J L, Fromer M, et al. A polygenic burden of rare disruptive mutations in schizophrenia. Nature 2014;506:185-190 20. Peters JE, Lyons PA, Lee JC, et al. Insight into genotype-phenotype associations through eQTL mapping in multiple cell types in health and immune-mediated disease. PLoS Genet 2016;12:e1005908

21. GTEx Consortium. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 2015;348:648-660 22. Kuleshov MV, Jones MR, Rouillard AD, et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res

2016;44:W90-W97

23. Watanabe K, Taskesen E, Van Bochoven A, et al. Functional mapping and annotation of genetic associations with FUMA. Nat Commun

2017;8:1826.

24. Franke A, Balschun T, Sina C, et al. Genome-wide association study for ulcerative colitis identifies risk loci at 7q22 and 22q13 (IL17REL).

Nat Genet 2010;42:292-294

25. Prescott NJ, Lehne B, Stone K, et al. Pooled sequencing of 531 genes in inflammatory bowel disease identifies an associated rare variant in BTNL2 and implicates other immune related genes. PLoS Genet 2015;11:e1004955

26. Goodrich JK, Davenport ER, Beaumont M, et al. Genetic determinants of the gut microbiome in UK twins. Cell Host Microbe 2016;19:731-743

27. Kolde R, Franzosa EA, Rahnavard G, et al. Host genetic variation and its microbiome interactions within the Human Microbiome Project.

Genome Med 2018;10:6

28. Koh A, De Vadder F, Kovatcheva-Datchary P, et al. From dietary fiber to host physiology: short-chain fatty acids as key bacterial metabolites.

Cell 2016;165:1332-1345

29. Ríos-Covián D, Ruas-Madiedo P, Margolles A, et al. Intestinal short chain fatty acids and their link with diet and human health. Front

Microbiol 2016;7:185

30. Carethers M, Jung BH. Genetics and genetic biomarkers in sporadic colorectal cancer. Gastroenterology 2015;149:1177-1190 31. Costea I, Mack DR, Lemaitre RN, et al. Interactions between the dietary polyunsaturated fatty acid ratio and genetic factors determine susceptibility to pediatric Crohn’s disease. Gastroenterology 2014;146:929-931

32. Illig T, Gieger C, Zhai G, et al. A genome-wide perspective of genetic variation in human metabolism. Nat Genet. 2010;42:137-141. 33. Marion-Letellier R, Savoye G, Beck PL, et al. Polyunsaturated fatty acids in inflammatory bowel diseases: a reappraisal of effects and therapeutic approaches. Inflamm Bowel Dis 2013;19:650-661.

(32)

34. Sun L, Youn HD, Loh C, et al. Cabin 1, a negative regulator for calcineurin signaling in T lymphocytes. Immunity 1998;8:703-711 35. Gulick AM, Hubbard BK, Gerlt JA, et al. Evolution of enzymatic activities in the enolase superfamily: identification of the general acid catalyst in the active site of D-glucarate dehydratase from Escherichia coli. Biochemistry 2001;40:10054-10062.

36. Lewis JD, Chen EZ, Baldassano RN, et al. Inflammation, antibiotics, and diet as environmental stressors of the gut microbiome in pediatric Crohn’s disease. Cell Host Microbe 2017;22:247

37. Wang T, Cai G, Qiu Y, et al. Structural segregation of gut microbiota between colorectal cancer patients and healthy volunteers. ISME J.

2012;6:320-329

38. Ridaura VK, Faith JJ, Rey FE, et al. Gut microbiota from twins discordant for obesity modulate metabolism in mice. Science 2013;341:1241214 39. Schirmer M, Smeekens SP, Vlamakis H, et al. Linking the human gut microbiome to inflammatory cytokine production capacity. Cell

2016;167:1897

40. Zhernakova A, Festen EM, Franke L, et al. Genetic analysis of innate immunity in Crohn’s disease and ulcerative colitis identifies two susceptibility loci harboring CARD9 and IL18RAP. Am J Hum Genet 2008;82:1202-1210

41. Halfvarson J, Brislawn C J, Lamendella R, et al. Dynamics of the human gut microbiome in inflammatory bowel disease. Nat Microbiol.

2017;2:17004

42. Parashar A, Udayabanu M. Gut microbiota: implications in Parkinson’s disease. Parkinsonism Relat Disord 2017;38:1-7

43. Peters LA, Perrigoue J, Mortha A, et al. A functional genomics predictive network model identifies regulators of inflammatory bowel disease.

Nat Genet 2017;49:1437-1449

44. Cho H, Kehrl JH. Regulation of Immune Function by G Protein‐Coupled Receptors, Trimeric G Proteins, and RGS Proteins. Prog Mol Biol

Transl Sci 2009;86:249-298

45. Gräler MH, Goetzl EJ. Lysophospholipids and their G protein-coupled receptors in inflammation and immunity. Biochim Biophys Acta. 2002;1582:168-174.

46. Emdin CA, Khera AV, Chaffin M, et al. Analysis of predicted loss-of-function variants in UK Biobank identifies variants protective for disease.

Nat Commun 2018;9:1613

47. Rothschild D, Weissbrod O, Barkan E, et al. Environment dominates over host genetics in shaping human gut microbiota. Nature

2018;555:210-215

48. Turpin W, Espin-Garcia O, Xu W, et al. Association of host genome with intestinal microbial composition in a large healthy cohort. Nat Genet

(33)

48. Tyler A D, Knox N, Kabakchiev B, et al. Characterization of the gut-associated microbiome in inflammatory pouch complications following ileal pouch-anal anastomosis. PloS one, 2013, 8(9): e66934.

49. Truong D T, Franzosa E A, Tickle T L, et al. MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nature methods, 2015, 12(10): 902. 50. Abubucker S, Segata N, Goll J, et al. Metabolic reconstruction for metagenomic data and its application to the human microbiome. PLoS

computational biology, 2012, 8(6): e1002358.

51. Zhernakova A, Kurilshikov A, Bonder M J, et al. Population-based metagenomics analysis reveals markers for gut microbiome composition and diversity. Science, 2016, 352(6285): 565-569.

52. Panek M, Paljetak H Č, Barešić A, et al. Methodology challenges in studying human gut microbiota–effects of collection, storage, DNA extraction and next generation sequencing technologies. Scientific reports, 2018, 8(1): 5143.

53. Imhann F, Vich Vila Arnau, Bonder M J, et al. Interplay of host genetics and gut microbiota underlying the onset and clinical presentation of inflammatory bowel disease. Gut, 2018, 67(1): 108-119.

54. Wang J, Kurilshikov A, Radjabzadeh D, et al. Meta-analysis of human genome-microbiome association studies: the MiBioGen consortium initiative. 2018.

55. Lee S, Teslovich T M, Boehnke M, et al. General framework for meta-analysis of rare variants in sequencing association studies. The American

Journal of Human Genetics, 2013, 93(1): 42-53.

(34)

We thank the LifeLines-DEEP and IBD cohort participants. We thank Kate Mc Intyre for substantive English editing and B.H.Jansen for technical support.

Acknowledgments

Funding

Data availability

M.A.R. is supported by a National Institute of Health center for Multi- and Trans-Ethnic Mapping of Mende-lian and Complex Diseases grant (5U01 HG009080) and by the National Human Genome Research Institute (NHGRI) of the National Institutes of Health (NIH) under awards R01HG010140. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. C.W. is supported by a European Research Council (ERC) Advanced grant [FP/2007-2013/ERC grant 2012-322698], a Netherlands Organization for Scientific Research (NWO) Spinoza prize grant [NWO SPI 92-266] and the Gravitation Netherlands Organ-on-Chip Initiative (024.003.001). J.F. is supported by grants from NWO (NWO-VIDI 864.13.013) and CardioVasculair Onderzoek Nederland (CVON 2018-27). A.Z. is supported by an NWO Vidi grant (NWO-VIDI 016.178.056), an ERC Starting Grant (715772), CVON 2018-27 and a Rosalind Franklin Fellowship from the University of Groningen.

Code for analysis:

https://github.com/WeersmaLabIBD/Microbiome/tree/master/Exome_microbiome_eqtl Supplementary figures, tables and materials can be found:

(35)

Referenties

GERELATEERDE DOCUMENTEN

Here, we aimed to replicate the Li et al’s association between the SLC39A8 [Thr]391 risk allele and the gut microbiota using 16S rRNA gut microbiota data from stool samples and

De in een exon gelegen variant [Thr]391 in het SLC39A8 gen is geassocieerd met de ziekte van Crohn, maar de eerder beschreven associatie van deze variant met het microbioom kan

Gut microbiota composition and functional changes in inflammatory bowel disease and irritable bowel syndrome.

As last, I describe my personal point of view on the future directions in the study of the gut microbiota and its future implications in the diagnosis and treatment of inflammatory

Chi-square tests, Fishers exact tests, Spearman correlations and Wilcoxon-Mann-Whitney tests (WMW tests) were used to determine differences in the clinical characteristics of

“FISH” refers to the average number of cell counts (based on epifluorescence assay with bacterial oligonucleotide probe) per gram of stool sample. In panel e) the x-axis depicts

In addition to information about &gt;2,000 exogenous factors (morpholog- ical, physiological, clinical), LifeLines-DEEP participants have been deeply profi led for multiple “omics”

Meta-analysis of three independent cohorts comprising 1815 fecal samples, showing a cladogram (cir- cular hierarchical tree) of 92 signifi cantly increased or decreased bacterial