

Development of bioinformatic tools and application of novel statistical methods in genome-wide analysis

van der Most, Peter Johannes

IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below.

Document Version

Publisher's PDF, also known as Version of record

Publication date: 2017

Link to publication in University of Groningen/UMCG research database

Citation for published version (APA):

van der Most, P. J. (2017). Development of bioinformatic tools and application of novel statistical methods in genome-wide analysis. University of Groningen.

Copyright

Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons).

Take-down policy

If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal. For technical reasons the number of authors shown on this cover page is limited to 10 maximum.


Chapter 8

Genome-wide survival meta-analysis of age at first cannabis use

Camelia C. Minică*, Karin J.H. Verweij*, Peter J. van der Most*, Hamdi Mbarek, Manon Bernard, Kristel R. van Eijk, Penelope A. Lind, MengZhen Liu, Dominique F. Maciejewski, Teemu Palviainen, Cristina Sánchez-Mora, Richard Sherva, Michelle Taylor, Raymond K. Walters, Abdel Abdellaoui, Timothy B. Bigdeli, Susan J.T. Branje, Sandra A. Brown, Miguel Casas, Robin P. Corley, George Davey Smith, Gareth E. Davies, Erik A. Ehli, Lindsay Farrer, Iryna O. Fedko, Iris Garcia-Martínez, Scott D. Gordon, Catharina A. Hartman, Andrew C. Heath, Ian B. Hickie, Matthew Hickman, Christian J. Hopfer, Jouke Jan Hottenga, René S. Kahn, Jaakko Kaprio, Tellervo Korhonen, Henry R. Kranzler, Ken Krauter, Pol A.C. van Lier, Pamela A.F. Madden, Sarah E. Medland, Michael C. Neale, Wim H.J. Meeus, Grant W. Montgomery, Ilja M. Nolte, Albertine J. Oldehinkel, Zdenka Pausova, Josep A. Ramos-Quiroga, Vanesa Richarte, Richard J. Rose, Jean Shin, Michael C. Stallings, Tamara L. Wall, Jennifer J. Ware, Margaret J. Wright, Hongyu Zhao, Hans M. Koot, Tomas Paus, John K. Hewitt, Marta Ribasés, Anu Loukola, Marco P. Boks, Harold Snieder, Marcus R. Munafò, Joel Gelernter, Dorret I. Boomsma, Nicholas G. Martin, Nathan A. Gillespie, Jacqueline M. Vink, Eske M. Derks

* Authors contributed equally


Abstract

Cannabis is one of the most commonly used substances among adolescents and young adults. Research shows that the age at first cannabis use is decreasing. Earlier age at cannabis initiation is linked to adverse life outcomes including multi-substance use and dependence. Here we aim to estimate the heritability of, and to identify genetic variants associated with, age at first cannabis use. We performed the largest molecular genetic study to date of age at first cannabis use, comprising a discovery sample of 24,953 individuals from nine cohorts and a replication sample of 4,478 individuals from three cohorts from North America, Europe, and Australia. Five SNPs on chromosome 16 within the calcium-transporting ATPase gene (ATP2C2) passed the genome-wide significance threshold. All five SNPs are in high LD (r² > 0.8) and likely represent a single independent signal. The strongest association was with the intronic variant rs1574587 (P = 4.07×10⁻⁹). Gene-based tests also identified the ATP2C2 gene on 16q24.1 (P = 1.54×10⁻⁶) along with two additional genes: ECT2L on 6q24.1 (P = 6.59×10⁻⁸) and RAD51B on 14q24.1 (P = 5.22×10⁻⁶), although neither the SNP nor gene-based test results were replicated. However, ATP2C2 has been previously associated with cocaine dependence, while the ATP2B2 gene, a member of the same calcium signalling pathway, has been associated with opioid dependence. Our findings lend support to the hypothesized link between calcium signalling genes and substance use disorders, consistent with the reported associations between early onset of cannabis use and multi-substance use, as well as with subsequent dependence.


Introduction

Cannabis is one of the most commonly used substances among adolescents and young adults [1, 2]. Annually, approximately 147 million people, or 2.5% of the world's population, consume cannabis. In the last decade cannabis use disorders have grown more rapidly than either cocaine or opiate use disorders, with the most rapid growth seen in developed countries in North America, Western Europe, and Australia [3]. Accompanying these changes, there has also been a global trend towards decreasing age at first cannabis use [4, 5].

Globally, younger cohorts are more likely to use all types of drugs including cannabis. In the United States, the mean age at first cannabis use is 18 years, while the mean age at first cannabis use among individuals who initiate prior to age 21 is 16 years [2]. European data suggest that age at first cannabis use is lower in countries with higher prevalences of cannabis use [6]. In addition, the male-female gap, which is commonly observed in older cohorts, has been found to be closing in more recent cohorts [7, 8]. Overall, these trends are probably due to lower risk perception, especially among young people [9], and increased availability caused by medicalisation and decriminalisation. Despite these trends, very little is known about the genetic aetiology of age at first cannabis use.

Early cannabis initiation has been associated with a number of maladaptive outcomes. These include educational under-achievement [10-12], possible cognitive decline [13, 14], negative life events [15], differences in brain maturation in at-risk adolescents [16], and psychosis and other psychopathology [17-20]. Early age at onset of use is also linked to more frequent progression to problematic cannabis use and multi-substance use, and increased likelihood of substance use disorders [6, 21-28].

Despite its widespread use and its association with adverse life outcomes, there remain significant gaps in our understanding of the genetic epidemiology of age at first cannabis use. A meta-analysis of twin studies [29] reported a heritability (h²) of ~45% for lifetime cannabis use (ever versus never). However, only a very limited number of biometric genetic studies have explored the heritability of age at first cannabis use. Most recently, Richmond-Rakerd et al. [30] estimated the heritability of age at first cannabis use in a large population-based sample of lifetime users to be 19%, which was non-significant. In contrast, Lynskey et al. [28] reported much larger genetic influences (h² = 80%) on early onset use (≤16 years), whereas Sartor et al. [31] reported a heritability of 52% when age at first cannabis use was categorized as 'never', 'late' (≥17 years), or 'early' (≤16 years). These discrepant findings might be due to differences in the biometrical genetic methods employed, as well as to the inclusion versus exclusion of never users. To address these limitations, we estimated the heritability of age at first cannabis use in twins from three different cohorts. We applied three models to determine whether cannabis initiation and age at initiation fall along the same continuum (single liability), represent two independent liabilities (independent model), or represent two distinct but related liabilities (combined model) [32]; heritability estimation is based on the best fitting model. Subsequently, we applied molecular genetic analyses to identify genetic loci associated with the age at first cannabis use.

With regard to molecular genetics, we are aware of only one genome-wide association (GWA) study that has investigated the role of genetic variants associated with age at first cannabis use. Minica et al. [33] performed a genome-wide survival analysis in a sample comprising 5,148 participants from 2,992 Dutch families (including 4,296 individuals who reported never having used cannabis and 852 individuals who had initiated cannabis use). No SNPs (single nucleotide polymorphisms) or genes were significantly associated with age at first cannabis use, which may be due to a lack of statistical power [33]. It is likely that this phenotype is highly polygenic, comprising many genetic variants with small effects. Consequently, identifying genetic variants will require much larger samples than previously employed. Survival-based methods suggested by Minica et al. are suitable for the analysis of age-at-onset traits in genome-wide analyses because they are expected to yield greater statistical power than GWA analyses limited to cannabis users only, or analyses relying on logistic regression based on samples of users and non-users [34-36]. We therefore applied a survival-based approach here with eight additional cohorts to detect genetic variants associated with age at first cannabis use.

The International Cannabis Consortium (ICC) was established to identify genetic variants underlying individual differences in cannabis use phenotypes by combining data from numerous cohorts and studies. The ICC has previously identified four genes significantly associated with lifetime cannabis use: NCAM1, CADM2, SCOC, and KCNT2 [37]. Interestingly, both NCAM1 and KCNT2 have been previously linked to other substance use phenotypes [37]. Of note is also our novel finding at CADM2, which was recently associated with alcohol consumption [38], personality [39], and behavioural reproductive outcomes and risk-taking behaviour [40].

We aimed to identify genetic variants associated with age at first cannabis use. We analysed results from a discovery sample comprising N=24,953 individuals and we sought replication of the top signals in a sample of N=4,478 individuals. Prior to the meta-analysis, each participating cohort (Table 1) modelled age at first cannabis use as a function of SNPs using a genome-wide survival analysis [33]. The SNP-based genome-wide meta-analytic results were then used in gene-based tests of association. With genes as the unit of analysis, the statistical power relative to single-SNP analyses is increased by jointly interrogating all SNPs within a gene while reducing the multiple testing burden [41, 42]. We tested the top SNP and gene-based results in an independent meta-analytic sample. In addition, we estimated the proportion of variance in age at first cannabis use that can be explained by common SNPs. Finally, we determined whether or not a polygenic risk score based on this meta-analysis predicts age at first cannabis use in one of the replication samples.

Materials and methods

Heritability study

The heritability of age at first cannabis use was estimated using data from three samples: NTR [43], which includes 2017 monozygotic and 1771 dizygotic twin pairs; QIMR [44, 45], which includes 1282 monozygotic and 1969 dizygotic twin pairs; and BLTS [46], which contains 429 monozygotic and 577 dizygotic twin pairs. Three genetic models were fitted to the data: a single liability model, an independent liability model and a combined model (see [32]). For the model that best fit the data, the twin correlations in liability were expressed as a function of genetic and environmental parameters based on the classical twin design [47, 48]. Sources of variation that were considered were additive genetic variation (A), shared environmental variation (C), and unshared environmental variation (E). See Supplementary Information 1 for more detail on this analysis.


Study samples

The current discovery meta-analysis was based on genome-wide summary statistics from 9 European, North American, and Australian cohorts comprising N=24,953 individuals. The mean age ranged from 17.3 to 46.9 years (Table 1). Females represented 53.2% of the sample, and 44.4% of the observations were uncensored, i.e., individuals who acknowledged having initiated cannabis use (see Supplementary Table S1 for more details).

The replication sample comprised N=4,478 individuals with a mean age ranging from 19.5 to 49.4 years (Table 1). Females represented 49.9% of the sample, and 52.3% of the observations were uncensored (see Supplementary Table S1 for more details on the samples).

TABLE 1 | Descriptive information on the participating discovery and replication cohorts.

| Cohort | N | % Females | % Uncensored observations | Mean age (SD) | Mean age at first use (SD) (in users) | Number of SNPs |
|---|---|---|---|---|---|---|
| Discovery | | | | | | |
| ALSPAC | 6147 | 51.9 | 38.4 | 17.3 (1.7) | 14.8 (1.6) | 6,284,747 |
| BLTS | 721 | 57.1 | 59.5 | 26.2 (3.3) | 18.8 (2.8) | 4,093,835 |
| FinnTwin | 1029 | 51.7 | 27.5 | 22.8 (1.3) | 18.0 (2.5) | 4,362,100 |
| HUVH | 581 | 31.3 | 30.3 | 28.7 (12.5) | 16.0 (3.0) | 4,318,727 |
| NTR | 5148 | 62.3 | 16.6 | 46.9 (17.5) | 18.9 (5.1) | 4,773,834 |
| QIMR | 6758 | 53.8 | 51.3 | 45.2 (10.9) | 19.9 (5.8) | 5,953,917 |
| TRAILS | 1249 | 53.8 | 61.7 | 20.0 (1.6) | 16.3 (2.0) | 4,109,101 |
| Utrecht | 958 | 51.3 | 59 | 17.4 (3.2) | 15.5 (2.1) | 4,260,457 |
| Yale-Penn | 2362 | 41.0 | 92.6 | 38.0 (10.5) | 17.0 (9.4) | 5,732,659 |
| Replication | | | | | | |
| CADD | 1801 | 35.5 | 81.9 | 23.65 (4.3) | 14.25 (3.16) | 8* |
| NTR2** | 1740 | 63.7 | 22.2 | 35.0 (14.6) | 18.0 (4.0) | 8* |
| RADAR** | 342 | 44.7 | 57.0 | 19.5 (0.8) | 15.9 (1.7) | 8* |
| SYS | 595 | 56.1 | 48.2 | 49.4 (5.1) | 33.8 (17.19) | 8* |

N = sample size (or range if sample size varied across SNPs). % Uncensored observations = percentage of individuals who have initiated cannabis use. Mean age = age when completing the survey or interview. Mean age at first use = mean age at first cannabis use.
* In the replication samples only the top 8 independent SNPs were tested.
** The NTR2 and the RADAR samples were combined and the analysis was performed in this combined sample.

Phenotyping in the GWAS discovery and replication samples

Age at first cannabis use was assessed from questionnaires or clinical interviews (see Supplementary Information 1 for information on the exact phrasing of the question). For individuals who had not initiated cannabis use at the time of the assessment, their age at the last survey or interview served as the phenotype. Depending on the cannabis initiation status, individuals were coded as uncensored (initiated), or censored (did not initiate at the time of the last measurement). In order to maximize sample size and given that all participating cohorts were on average relatively young, we included all available data, i.e., censored and uncensored observations, without imposing any age restriction.
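The censoring scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `SurvivalRecord` and `code_phenotype` are hypothetical names that do not come from the study protocol, and a missing age at first use is taken to mean that the participant had not initiated cannabis use at the last assessment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SurvivalRecord:
    """Hypothetical container for one participant's survival phenotype."""
    time: float  # age at first use, or age at last assessment if censored
    event: int   # 1 = uncensored (initiated), 0 = censored (not initiated)


def code_phenotype(age_first_use: Optional[float],
                   age_last_assessment: float) -> SurvivalRecord:
    """Code one participant as an uncensored or censored observation."""
    if age_first_use is not None:
        # Initiated: the event time is the reported age at first use.
        return SurvivalRecord(time=age_first_use, event=1)
    # Not initiated: follow-up ends at the last survey or interview.
    return SurvivalRecord(time=age_last_assessment, event=0)


# Illustrative values only (not study data).
records = [code_phenotype(16.0, 25.0), code_phenotype(None, 31.0)]
```

The resulting (time, event) pairs are the usual input format for a survival model such as the Cox regression used in the cohort-level analyses.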


Genotyping

Genotyping followed by extensive quality control (QC) was performed by each participating cohort (see Supplementary Table S2 for details). Generally, QC included the removal of single nucleotide polymorphisms (SNPs) with minor allele frequency (MAF) below 1%, call rates lower than 90%, and Hardy-Weinberg equilibrium (HWE) p-values below 1×10⁻⁴ (but see Supplementary Table S2 for exact QC thresholds used by each cohort). SNPs known to have evidence of poor clustering on visual inspection of intensity plots were also discarded. At the subject level, additional QC involved the removal of individuals with low overall call rates (see Supplementary Table S2 for details), conflicting sex designation, or excess autosomal heterozygosity (indicative of genotyping errors). Duplicate samples and unintended 1st or 2nd degree relatives (in samples of unrelated individuals) were also removed.

Imputation

The analysis protocol required all participating cohorts to perform genotype imputation using the 1000 Genomes Phase 1 March 2012 release as a reference sample [49]. We refer to Supplementary Table S2 for details on imputation performed by each cohort. Post-imputation QC included filtering out SNPs with poor imputation quality (< 0.4). We used best-guess genotypes (given requirements of the software used for the genome-wide analysis) with analysis restricted to autosomal SNPs.

Quality checks prior to meta-analysis

Prior to the meta-analysis, each input file underwent additional QC pertaining to imputation quality, minor allele frequency and HWE. First, as we used best-guess genotypes, we selected for meta-analysis SNPs with high imputation quality (> 0.8). Based on this threshold, the average imputation quality across the SNPs ranged from 0.95 to 0.99 across the 9 discovery cohorts. Second, we retained only those SNPs with MAF greater than √(5/N), where N is the sample size. This ensured that at least 5 individuals were expected in the least frequent genotype group (N × MAF² > 5 under Hardy-Weinberg proportions). Third, genotyped SNPs were retained if HWE was not violated (p-value > 1×10⁻⁴). Lastly, SNPs were retained if their allele frequencies matched those reported in the 1000 Genomes data (i.e., the absolute allele frequency difference did not exceed 0.2). The discovery meta-analysis included 6,158,982 unique bi-allelic SNPs which passed our QC criteria in at least two cohorts (see Table 1 for the number of SNPs in each input file meeting our quality control criteria).
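As a rough illustration of these four filters, the sketch below checks a single SNP against the thresholds quoted in the text. The function names and argument layout are assumptions made for this example and are not taken from the analysis protocol; the HWE check is applied only when a p-value is supplied, which stands in for "genotyped SNPs".

```python
import math
from typing import Optional


def min_maf(n: int) -> float:
    """MAF threshold sqrt(5/N): the expected number of minor-allele
    homozygotes, N * MAF**2, is then at least 5."""
    return math.sqrt(5.0 / n)


def passes_qc(info: float, maf: float, n: int, freq: float,
              ref_freq: float, hwe_p: Optional[float] = None) -> bool:
    """Apply the pre-meta-analysis filters to one SNP.

    hwe_p is given only for genotyped SNPs; pass None for imputed SNPs.
    """
    if info <= 0.8:                        # imputation quality must exceed 0.8
        return False
    if maf <= min_maf(n):                  # MAF must exceed sqrt(5/N)
        return False
    if hwe_p is not None and hwe_p <= 1e-4:  # HWE p-value must exceed 1e-4
        return False
    if abs(freq - ref_freq) > 0.2:         # must match reference frequency
        return False
    return True


# N of the NTR discovery cohort in Table 1; gives a threshold of about 0.031.
threshold = min_maf(5148)
```

For a cohort of 5,148 individuals the MAF threshold is therefore roughly 3%, and it rises for smaller cohorts.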

Statistical analysis of individual samples

Cohort-specific analyses were performed using a standardized analysis protocol (see Supplementary Information 2 for details). Each study site performed a Cox proportional hazards regression analysis in which age at first cannabis use (or age at the last survey for censored observations) was regressed on the SNP (coded additively co-dominant as 0, 1, 2) and the following covariates: sex, birth-cohort (to correct for generation effects), the first four principal components (to correct for possible population stratification), and study-specific covariates (to correct for chip and/or batch effects; see Supplementary Table S2 for details). To account for relatedness in family-based cohorts we used the 'cluster' option implemented in the survival package in R [50]. This option ensured that the standard errors were robust to possible misspecification of the familial covariance matrix [51]. The survival package was accessed either directly in R, or called from PLINK [52] via the Rserve package [53].

Meta-analysis

The discovery meta-analysis was performed in METAL [54], using a fixed-effects model and the 'SCHEME STDERR' option, which weights the beta coefficients by the inverse of their squared standard errors (inverse-variance weighting). To ensure that the bulk of the test statistic distribution follows the expectation under a theoretical null model, we applied genomic control to each cohort's input file prior to meta-analysis. This ensured that none of the input cohorts contributed disproportionately to the meta-analysis results [55]. Similar to the method applied by Furberg et al. [56] and Allen et al. [57], we computed the corrected standard error (and the corresponding p-value) from the variance of the beta multiplied by the genomic control lambda estimate for each sample (see Supplementary Table S2). As proposed by Pe'er et al. [58], an alpha of 5×10⁻⁸ was used as the genome-wide significance threshold. The top 8 independent signals in the discovery meta-analysis (present in at least one of the replication samples) were taken forward for replication. In addition, these top SNPs were analyzed in a combined sample (including the discovery and the replication samples). We implemented the same procedures and options in the replication phase. Statistical analyses were performed on the Lisa Genetic Cluster Computer (http://www.geneticcluster.org).
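The two computational steps in this section, the genomic control correction and the fixed-effects pooling, can be sketched as follows. This is a minimal stand-alone illustration and not METAL itself: the function names are hypothetical, the pooling weights each cohort's beta by the inverse of its squared standard error (the usual definition of inverse-variance weighting), and the p-value uses a normal approximation.

```python
import math
from typing import List, Tuple


def gc_corrected_se(se: float, lam: float) -> float:
    """Multiply the variance of beta by lambda and return the new SE."""
    return math.sqrt(se ** 2 * lam)


def fixed_effect_meta(betas: List[float],
                      ses: List[float]) -> Tuple[float, float, float]:
    """Inverse-variance weighted fixed-effects estimate.

    Returns the pooled beta, its standard error and a two-sided
    p-value from the normal approximation.
    """
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    # erfc(|z| / sqrt(2)) equals the two-sided tail area of a standard normal.
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, se, p


# Toy input for two cohorts (made-up numbers, not study results).
betas = [0.10, 0.08]
ses = [gc_corrected_se(0.02, 1.0), gc_corrected_se(0.04, 1.0)]
pooled_beta, pooled_se, pooled_p = fixed_effect_meta(betas, ses)
```

In this toy case the more precise cohort (standard error 0.02) receives four times the weight of the other, so the pooled estimate lies close to 0.10.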

Gene-based tests of association

Results from the GWA meta-analysis were used to test gene-based association. For these tests we employed the Gene-based Association Test using the Extended Simes procedure (GATES) implemented in the Knowledge-based mining system for Genome-wide Genetic studies (KGG) software Version 3.5 [41, 42]. GATES combines the p-values of the SNPs within a gene by taking into account the linkage disequilibrium (LD) among the SNPs. The SNPs were mapped onto (or within 5 kb of) 24,404 genes based on NCBI gene coordinates. LD structure was inferred based on the 1000 Genomes haplotypes (version March, 2012). For this analysis, a False Discovery Rate (FDR) of 0.05 was used as the genome-wide significance threshold [59]. All genes reaching significance were tested in the replication sample.
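The text does not state which FDR procedure was used. Assuming it is the Benjamini-Hochberg step-up procedure, the correction can be sketched as below; `benjamini_hochberg` is a hypothetical helper written for this example, not part of KGG.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # stay monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Small made-up example.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])

# Rank scaling p * m / rank for the three smallest gene-based p-values
# reported in this chapter, with m = 24,404 genes (assumes no gene with a
# higher rank yields a smaller scaled value).
top_genes = [6.59e-8, 1.54e-6, 5.22e-6]
rank_scaled = [p * 24404 / rank for rank, p in enumerate(top_genes, start=1)]
```

Under that assumption the rank-scaled values come out at approximately 1.6×10⁻³, 1.9×10⁻² and 4.2×10⁻², which is close to the FDR-corrected p-values reported for the three top genes.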

SNP-based heritability analysis

The proportion of phenotypic variance that could be explained by the retained SNPs was estimated using two different methods. The density estimation (DE) method, developed by So et al. [60], estimates the genome-wide distribution of effect sizes based on the difference between the observed distribution of test statistics in the meta-analysis and the corresponding null distribution (for a detailed overview of the DE method, see [61]). For this method, SNPs present in at least 25% of the meta-analysis samples were selected and then pruned for LD. We used the r² = 0.15 pruning level as the primary result for consistency with other applications of this method. LD Score Regression [62] was used as an alternative method to estimate the SNP-based heritability. For this method, the SNP-based heritability estimate was based only on SNPs present in all cohorts (N=2,557,198 SNPs) to avoid artefacts resulting from differing Ns per SNP.


Polygenic risk score (PRS) analysis

Polygenic risk score (PRS) analyses were carried out to determine whether age at first cannabis use could be predicted in one of the replication samples. Results from the GWAS discovery meta-analysis were used to create PRS in an independent sample from the Netherlands (the combined sample of NTR2-RADAR; information about the samples can be found in Table 1, Supplementary Tables S1 and S2, and Supplementary Information 1) using LDpred [63], a recently developed method that takes into account LD among the SNPs in creating the PRS. With LDpred, PRS are generated by calculating the mean causal effect size of each marker using the SNP effect sizes from the GWAS discovery meta-analysis and the LD structure from the European populations in the 1000 Genomes Phase I reference set.

PRS were created using genotyped (i.e. non-imputed) SNPs that were present in at least 7 of the discovery samples. The final number of SNPs used for the PRS was 366,351. To optimize prediction accuracy, PRS were calculated for several expected fractions of causal genetic markers (0.1%, 1%, 10%, and 100%). The computed PRS were transformed into z-values before analysis.

We then tested whether the obtained PRS predicted age at first cannabis use in the independent target cohort by performing a Cox proportional hazards regression in R (as applied in our main analyses). Age at first cannabis use (or age at the last survey for censored observations) was regressed on the PRS. Sex, birth cohort, and three ancestry-informative genetic PCs were included as covariates in the model. To account for family relatedness we used the ‘cluster’ option implemented in the survival R-package.
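To make the scoring and standardisation steps concrete, the sketch below computes a score as a weighted sum of allele counts and converts a set of scores to z-values. It does not reproduce LDpred: the per-SNP weights are assumed to be given (for example, LD-adjusted effect sizes), and the function names and numbers are illustrative only.

```python
import statistics
from typing import List


def polygenic_score(allele_counts: List[int], weights: List[float]) -> float:
    """Weighted sum of effect-allele counts (0, 1 or 2) over SNPs."""
    return sum(g * w for g, w in zip(allele_counts, weights))


def to_z(scores: List[float]) -> List[float]:
    """Standardise scores to mean 0 and (sample) standard deviation 1."""
    mu = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(s - mu) / sd for s in scores]


# Made-up weights for three SNPs and genotypes for three individuals.
weights = [0.10, -0.20, 0.05]
genotypes = [[0, 1, 2], [2, 0, 1], [1, 1, 1]]
raw_scores = [polygenic_score(g, weights) for g in genotypes]
z_scores = to_z(raw_scores)
```

The z-values can then enter a regression model as a single predictor, so that the estimated effect is expressed per standard deviation of the score.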

Results

Heritability

The combined model with separate but correlated liabilities provided the best fit to the data (for more details on model fitting and twin correlations see Supplementary Information 4). According to the estimates in this model, the heritability (A) of age at first cannabis use was 38% (95% CI 19-60%). Shared (C) environmental and unique (E) environmental factors explained 39% (95%CI 20-56%) and 22% (95%CI 16-29%) of the total variance respectively. A, C, and E explained 48% (95%CI 30-65%), 37% (95%CI 21-52%) and 15% (95% CI 11-20%), respectively, of the variance in risk of cannabis initiation.

GWAS meta-analysis

The quantile-quantile plot for the fixed effects genome-wide discovery meta-analysis is shown in Supplementary Figure S1, where it can be seen that the bulk of the test statistic distribution follows the expectation under a null hypothesis of no association (λGC = 1). The test statistic behaved similarly when genomic control was not applied. Taken together, these results indicate that the meta-analysis results are robust to the slight deviations of the test statistic distribution from the theoretical null model observed in some of the participating cohorts. The Supplementary Figures S2a-i and S3a-i include the cohort-specific lambda-corrected Manhattan and quantile-quantile plots.

The Manhattan plot in Figure 1a displays the genome-wide association results, with a region on chromosome 16 passing the significance threshold of P < 5×10⁻⁸, and other suggestive signals on chromosomes 6, 10 and 14. Table 2 includes the association results and details on the top 8 independent SNPs. The top 100 SNPs in the discovery samples are shown in Supplementary Table S3. Regional association plots and forest plots of the top SNPs are shown in Supplementary Figures S4a-l, Figure 1b, and Supplementary Figures S5a-k.

FIGURE 1 | Results of the meta-analysis for the discovery sample.

FIGURE 1A | The Manhattan plot of the meta-analysis results for the discovery sample. In the Manhattan plot, the y-axis shows the strength of association (-log10(P)) and the x-axis indicates the chromosomal position. The blue line indicates the suggestive significance level (P < 1×10⁻⁵) while the red line indicates the genome-wide significance level (P < 5×10⁻⁸).

FIGURE 1B | Forest plot of the top SNP (rs1574587) on Chromosome 16 in eight discovery cohorts. Note that rs1574587 did not meet quality control criteria in the BLTS sample.


TABLE 2 | Top 8 independent SNPs in the meta-analysis of the discovery samples (present in at least one replication sample), and results of the meta-analysis of combined discovery and replication samples. SNPs are displayed when not in linkage disequilibrium (R² < 0.1; for SNPs with R² ≥ 0.1 only the most significant SNP is shown in the top 8).

| SNP | Chr | BP (hg19) | A1 | A2 | Freq A1 | Discovery beta (s.e.) | Discovery P | Discovery Direction* | Replication beta (s.e.) | Replication P | Replication Direction* | Combined$ beta (s.e.) | Combined$ P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs1574587 | 16 | 84453056 | T | C | 0.1415 | 0.09 (0.016) | 4.0×10⁻⁹ | +?+++++-+ | -0.09 (0.135) | 0.50 | ??- | 0.09 (0.016) | 8.7×10⁻⁹ |
| rs4935127 | 10 | 56654986 | C | G | 0.7741 | -0.06 (0.013) | 4.6×10⁻⁷ | ---+---+- | 0.07 (0.04) | 0.07 | ?++ | -0.05 (0.012) | 2.9×10⁻⁵ |
| rs2249437 | 6 | 1595216 | T | C | 0.4595 | 0.07 (0.014) | 5.1×10⁻⁷ | ++++?+?++ | -0.14 (0.093) | 0.13 | ??- | 0.06 (0.013) | 2.1×10⁻⁶ |
| rs9266245 | 6 | 31325702 | A | G | 0.2655 | -0.07 (0.015) | 1.6×10⁻⁶ | ----?--?- | -0.01 (0.042) | 0.85 | ?-+ | -0.06 (0.014) | 4.7×10⁻⁶ |
| rs28622199 | 8 | 5392103 | T | C | 0.8012 | 0.07 (0.015) | 2.7×10⁻⁶ | +++-+++++ | -0.04 (0.039) | 0.36 | +-- | 0.05 (0.014) | 5.4×10⁻⁵ |
| rs215069 | 16 | 16091237 | T | C | 0.0685 | -0.11 (0.025) | 3.8×10⁻⁶ | -?-?--??- | 0.02 (0.069) | 0.80 | +-? | -0.10 (0.024) | 2.3×10⁻⁵ |
| rs4924506 | 15 | 41129467 | A | C | 0.7318 | 0.06 (0.013) | 5.5×10⁻⁶ | ++++++--+ | -0.02 (0.039) | 0.55 | +-+ | 0.05 (0.012) | 4.2×10⁻⁵ |
| rs7773177 | 6 | 139143088 | A | G | 0.7383 | -0.06 (0.013) | 8.5×10⁻⁶ | ---+-0.03 (---+-0.038) | | 0.48 | +++ | -0.05 (0.013) | 7.7×10⁻⁵ |

* Direction per sample: allele A1 increases (+) or decreases (-) liability for cannabis use, or the sample did not contribute to this SNP because it did not pass the post-imputation quality control (?). Only SNPs present in at least 2 samples were included in the meta-analysis. Order of samples in the discovery: ALSPAC, BLTS, FinnTwin, HUVH, NTR, QIMR, TRAILS, Utrecht, Yale Penn EA. Order of samples in the replication: CADD, NTR2/RADAR, SYS. Sample information can be found in Table 1.
Chr = chromosome; BP (hg19) = location in base pairs in human genome version 19; A1 = allele 1; A2 = allele 2; Freq A1 = frequency of allele 1; s.e. = standard error; P = p-value.
$ The combined sample contains the discovery samples and the CADD, NTR2/RADAR and SYS replication samples.


On chromosome 16, the genome-wide significant signals come from a set of five highly correlated SNPs (r² > 0.8) located within the calcium-transporting ATPase (ATP2C2) gene. The strongest predictor of age at onset of cannabis use is rs1574587 (yielding the lowest p-value, P = 4.07×10⁻⁹). This SNP has a MAF ranging from 0.105 to 0.185 across the discovery samples (in line with MAFs reported for European ancestry populations according to Ensembl), and an imputation quality ≥ 0.89 (see Supplementary Table S4 for more details on this SNP).

The I² statistic for the top SNP was 32.6% (χ²(7) = 10.39, P = 0.17), indicating that there is no evidence of between-cohort heterogeneity in the observed effect. Given the size of the input samples, the I² statistic is sufficiently powerful to detect heterogeneity due to systematic differences among the studies. Furthermore, the top SNP showed the same direction of the effect in all but one of the discovery cohorts. The 95% confidence intervals for the effect allele include the meta-analytic estimate (and exclude zero in four samples), as illustrated in Figure 1b. In the independent replication samples, none of the 8 tested SNPs replicated (see Table 2 for details). Note, however, that the top SNP remained significant in the combined discovery and replication meta-analysis (P = 8.7×10⁻⁹).
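The reported I² follows directly from the heterogeneity statistic and its degrees of freedom. The short sketch below uses the standard definition I² = max(0, (Q - df) / Q); the function name is a hypothetical helper for this example.

```python
def i_squared(q: float, df: int) -> float:
    """Percentage of variability attributable to between-study
    heterogeneity, given Cochran's Q and its degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Values reported for the top SNP: chi-square of 10.39 on 7 degrees of
# freedom (eight contributing discovery cohorts).
i2_top_snp = i_squared(10.39, 7)
```

With these inputs the function returns roughly 32.6, in agreement with the value given in the text.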

Gene-based tests of association

Figure 2 provides an overview of the gene-based results. The quantile-quantile plot (Supplementary Figure S6) shows that the bulk of the test statistic distribution follows the expectation under the null hypothesis, and that several genomic regions are enriched for small p-values. Note that coding genic regions, rather than noncoding regions, were enriched for SNPs that yielded strong association signals in the single variant analysis (see Supplementary Figure S6).

As indicated by the Manhattan plot (Figure 2a), three genes reached the FDR threshold of 0.05 in the gene-based tests of association: the epithelial cell-transforming sequence 2 oncogene-like (ECT2L) on chromosome 6; the calcium-transporting ATPase (ATP2C2) gene on chromosome 16; and the DNA repair protein RAD51 homolog 2 (RAD51B) gene on chromosome 14 (see Table 3). We refer to Supplementary Table S5 for the top 100 genes identified in the discovery meta-analysis and to Figures 2b-d for zoom plots of the three significant genes.


FIGURE 2 | Results of the gene-based tests.

FIGURE 2A | Manhattan plot for the gene-based tests.


FIGURE 2C | Regional plots around ATP2C2


TABLE 3 | Top genes associated with age at first cannabis use in the discovery meta-analysis, and their strength of association in the replication meta-analysis. Reported are the nominal p-values and the False Discovery Rate (FDR) corrected p-values.

| Gene symbol (name) | Discovery nominal P | Discovery corrected P | Replication nominal P | CHR | Start position | Group | Gene function according to GO annotations |
|---|---|---|---|---|---|---|---|
| ECT2L (Epithelial cell-transforming sequence 2 oncogene-like) | 6.59×10⁻⁸ | 1.61×10⁻³ | 7.97×10⁻¹ | 6 | 139117247 | protein-coding gene | positive regulation of GTPase activity; regulation of Rho protein signal transduction; Rho guanyl-nucleotide exchange factor activity |
| ATP2C2 (Calcium-transporting ATPase) | 1.54×10⁻⁶ | 1.88×10⁻² | 1.58×10⁻¹ | 16 | 84440193 | protein-coding gene | calcium-transporting ATPase activity; metabolic process; calcium ion transmembrane transport; integral component of membrane; ATP binding; metal ion binding |
| RAD51B (DNA repair protein RAD51 homolog 2 or RAD51-like 1) | 5.22×10⁻⁶ | 4.24×10⁻² | 8.15×10⁻¹ | 14 | 68286495 | protein-coding gene | DNA-dependent ATPase activity |


The strongest association was with ECT2L (P=6.59*10-8) on chromosome 6. ECT2L is a protein coding gene

located on chromosome 6q24.1, flanked by FLJ4696 and ABRACL (Figure 2b). The top associated SNP harboured by the ECT2L gene is rs7773177 (P=8.492*10-6).

The second gene-based association was ATP2C2 (P = 1.54 × 10^-6) on chromosome 16. Located at 16q24.1 (Figure 2c), ATP2C2 is in the vicinity of KCNG4 and COTL1. Note that ATP2C2 was also identified in the SNP-based analysis; the top SNP rs1574587 is located within this gene.

The third significant gene-based association was RAD51B (P = 5.22 × 10^-6) on chromosome 14. RAD51B is a protein-coding gene on 14q24.1. As displayed in Figure 2d, RAD51B is a large gene (910,440 bases) that harbours several SNPs in low LD (r^2 < 0.2). The top SNP within the gene is rs17193049 (P = 3.97 × 10^-5). Table 3 includes descriptive information on the top associated genes along with their functions according to the Gene Ontology (GO) annotations [64, 65].

The top three genes associated with age at first cannabis use in the discovery sample did not reach significance in the meta-analysis of the three replication samples (see Table 3): ECT2L (nominal P = 7.97 × 10^-1, FDR P = 1.00), ATP2C2 (nominal P = 1.58 × 10^-1, FDR P = 1.00) and RAD51B (nominal P = 8.15 × 10^-1, FDR P = 1.00).

SNP-based heritability analyses

The selected SNPs did not significantly contribute to the variance in age at first cannabis use according to the density estimation method (h2 = 0.03; p = 0.069) or the LD score regression analysis (h2 = 0.02; p = 0.21).

Polygenic risk score (PRS) analysis

The combined effects of 366,351 SNPs did not explain a significant amount of variance in age at first use in an independent sample (N = 2082). All PRS based on different selection thresholds yielded p-values > 0.09.
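A polygenic score of this kind is conventionally a weighted sum of allele counts, with discovery effect sizes as weights and a p-value threshold deciding which SNPs enter the sum. The sketch below shows that basic construction only; the function name `polygenic_score`, its arguments and the toy numbers are hypothetical, and the scoring method actually used in the study may differ (for example, by modelling linkage disequilibrium).

```python
def polygenic_score(dosages, weights, pvalues, threshold):
    """Compute one polygenic score per individual.

    dosages   -- list of per-individual lists of effect-allele counts (0..2)
    weights   -- discovery effect sizes, one per SNP
    pvalues   -- discovery p-values, one per SNP
    threshold -- SNPs with p <= threshold are included in the score
    """
    selected = [k for k, p in enumerate(pvalues) if p <= threshold]
    return [sum(weights[k] * person[k] for k in selected) for person in dosages]


# Toy data: two individuals, three SNPs (hypothetical values, not study data).
toy_dosages = [[2, 1, 0], [0, 2, 1]]
toy_weights = [0.5, -0.2, 0.1]
toy_pvalues = [0.001, 0.2, 0.04]

strict_scores = polygenic_score(toy_dosages, toy_weights, toy_pvalues, 0.05)
all_snp_scores = polygenic_score(toy_dosages, toy_weights, toy_pvalues, 1.0)
```

Repeating the calculation over several thresholds, and regressing the phenotype on each resulting score in the target sample, gives one association p-value per threshold.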

Discussion

To our knowledge, this is the largest biometrical and molecular genetic study to date investigating the impact of genetic factors on age at first cannabis use. The biometric twin analysis of 8,264 twin pairs showed that genetic factors account for about 38% of the variance in age at first cannabis use (95% CI 19-60%). The discovery genome-wide meta-analysis identified significant associations with five highly correlated SNPs within the calcium-transporting ATPase gene (ATP2C2) on chromosome 16. The strongest association was observed for the intronic variant rs1574587. The gene-based tests provided further evidence linking ATP2C2 to age at first cannabis use, along with two additional genes: the epithelial cell-transforming sequence 2 oncogene-like (ECT2L) and the DNA repair protein RAD51 homolog 2 (RAD51B), located on chromosomes 6 and 14, respectively. The smaller independent replication sample, however, did not replicate the discovery findings, likely due to insufficient power. Independent replication of these associations in larger samples is therefore required.


Our top association, the ATP2C2 gene, is expressed in the brain [66] and is involved in calcium homeostasis [67], which in turn regulates synaptic plasticity, memory and learning [68]. Recently, ATP2C2 has also been linked to cocaine dependence. Gelernter et al. [69] found that the highest ranked gene networks significantly associated with cocaine dependence include ATP2C2 along with the plasma membrane Ca2+-transporting ATPase gene (ATP2B2). Calcium signalling pathways have also been implicated in opioid dependence [70]. These findings are consistent with the observed associations between early onset of cannabis use and experimentation with other drugs [27], and progression to escalated use/dependence [28]. In other words, it is plausible that some of the same genetic factors increase both the probability of early initiation of substance use and progression to substance use disorders, as discussed in the literature (see [71-73]). Taken together, these findings suggest that the effects of ATP2C2 are likely to be general rather than substance specific.

The ECT2L gene (6q24.1) yielded the strongest association signal in the gene-based analysis. This gene is involved in positive regulation of GTPase activity, i.e., the activity of heterotrimeric guanine nucleotide binding proteins (G proteins) that are crucial to signal transduction across the cell membrane. Rat and in vitro addiction models have hinted at the role that disruptions in G-protein signalling play in the aetiology of cocaine, alcohol and heroin dependence [74, 75]. Our results provide support for this hypothesis (assuming that the same genetic factors influence both the probability of experimenting with cannabis and the probability of developing dependence on, or abuse of, other drugs).

The gene-based tests also identified the RAD51B gene on 14q24.1. RAD51B (also known as RAD51L1) belongs to the RAD51 paralogue family involved in double-strand break repair via homologous recombination, DNA and ATP binding. This gene has been proposed as a candidate for nicotine dependence [76]. The association between early initiation of cannabis use and the risk for nicotine dependence is well documented in the literature, and is likely attributable to shared genetic factors (rather than being causal in nature, see [22]). Our meta-analysis pinpoints a plausible candidate gene that may aid the identification of genetic factors underlying the observed association.

Our SNP-based heritability analysis indicated that about 3% of the variance in age at first cannabis use is attributable to the SNPs currently measured in the GWAS array (p>0.069). In addition, the polygenic risk score based on a small selection of genotyped SNPs present in at least 7 cohorts provided only weak evidence of association with age at first use of cannabis in one replication sample (p>0.09). These null findings suggest that common SNPs explain only a relatively small proportion of total heritability in age at first cannabis use. The difference between the biometric ‘family-based’ and the molecular ‘SNP-based’ heritability estimates suggests that a large proportion of genetic variation in age at first use of cannabis cannot be captured by current GWAS arrays at current sample sizes (e.g., rare genetic variants with MAF<0.05). Additional sources of discrepancy may be attributable to interactions between genetic loci and environmental factors [77]. Detecting such interaction effects requires larger sample sizes and measures of


Strengths and limitations

Strengths. We amassed a meta-analytic sample exceeding 24,000 individuals. To our knowledge, this is the largest genome-wide study of age at first cannabis use to date. This meta-analytic sample located several loci that significantly predict age at first cannabis use. In particular, the association at ATP2C2 lends further support to the hypothesized link between calcium signalling genes and substance use, and is consistent with the reported associations between early onset of cannabis use and multi-substance use as well as subsequent dependence. Investigating these signals in larger replication samples is warranted and may yield valuable insights into the molecular bases of substance use initiation. The success of both the SNP-based and the gene-based discovery analyses is also attributable, in part, to our survival-based method. We are unaware of any other similarly sized meta-analysis that has used a survival-based method to identify genetic loci associated with addiction phenotypes. This approach allowed us to exploit all available information in the participating samples, and to correctly take account of the censored nature of the observations.
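The point about censoring can be made concrete with a small example. Participants who had not used cannabis by the time of assessment contribute a right-censored observation: their age at assessment is a lower bound on any future age at first use. The sketch below uses the Kaplan-Meier product-limit estimator, a simpler technique than the regression-based survival model used in the genome-wide analyses, swapped in here purely for illustration; the function name `kaplan_meier` and the toy ages are hypothetical.

```python
def kaplan_meier(ages, initiated):
    """Kaplan-Meier estimate of the probability of not yet having initiated.

    ages      -- age at first use, or age at assessment for never-users
    initiated -- True if first use was observed, False if right-censored
    Returns a list of (age, survival probability) at each age with an event.
    """
    data = sorted(zip(ages, initiated))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        age = data[i][0]
        events = 0
        leaving = 0
        # Group everyone sharing this age, counting events separately
        # from the total number leaving the risk set.
        while i < len(data) and data[i][0] == age:
            events += int(data[i][1])
            leaving += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((age, survival))
        at_risk -= leaving
    return curve


# Toy data (not study data): three users and two censored never-users.
toy_ages = [14, 16, 16, 18, 20]
toy_initiated = [True, True, False, True, False]
toy_curve = kaplan_meier(toy_ages, toy_initiated)
```

Censored individuals never trigger a drop in the curve, but they remain in the risk set until their age at assessment, so their information is used rather than discarded.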

Limitations. These results should be interpreted in the context of three potentially important methodological limitations. First, the replication sample was much smaller than the discovery sample (e.g., the top genome-wide significant SNP met our quality control criteria in only one sample, comprising 551 individuals). We conjecture that the lack of replication is due to lack of statistical power. Second, we imposed stringent selection criteria on the SNPs making up the polygenic score by selecting only variants present in at least 7 discovery samples that were genotyped in the NTR2/RADAR replication sample (i.e., we removed imputed SNPs). Although this strategy maximized the prediction accuracy of the PRS, the weak evidence of association with age at first use of cannabis in the target sample (p>0.09) might be attributable to the small selection of SNPs (possibly in imperfect linkage disequilibrium with the causal variants). Third, the mean age at initiation and the degree of censoring varied across the participating cohorts, likely due to differences in sample characteristics, assessment strategies, or drug policy. Yet, amid these differences, the top SNPs generally had an effect in the same direction across the participating samples, as shown in the forest plots of the top independent signals (see Figure 1b and Supplementary Figures S5). Furthermore, the forest plots indicate that the 95% confidence intervals around the effect for each participating cohort generally overlap and contain the meta-analytic effect. One might therefore reasonably assume that the participating samples are representative of the same population of users. Note also that there was no evidence of significant between-cohort heterogeneity in the estimated effects (see Supplementary Table S3 for the I2 heterogeneity statistic for the top 100 SNPs).
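The quantities mentioned here (a pooled effect, cohort-level 95% confidence intervals and the I2 statistic) can be sketched as follows. The example assumes an inverse-variance fixed-effect weighting scheme, which may differ from the scheme actually used in the meta-analysis; the function names `fixed_effect_meta` and `confidence_interval` and the toy estimates are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect meta-analysis with Cochran's Q and I2.

    betas -- per-cohort effect estimates (e.g., log hazard ratios)
    ses   -- per-cohort standard errors
    Returns (pooled beta, pooled SE, Q, I2 as a percentage).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I2 is the share of variability beyond what chance alone would produce.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, pooled_se, q, i2


def confidence_interval(beta, se, z=1.96):
    """Normal-approximation 95% confidence interval for one estimate."""
    return beta - z * se, beta + z * se


# Toy data (not study data): two cohorts with equal precision.
toy_pooled, toy_se, toy_q, toy_i2 = fixed_effect_meta([0.0, 0.2], [0.1, 0.1])
```

An I2 near zero, as reported for the top SNPs, indicates that the spread of cohort-level estimates is no larger than expected from sampling error.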

Conclusion

Based on a sample of more than 24,000 individuals, this study is, to date, the largest meta-analysis of age at first cannabis use. Our SNP-based findings support the involvement of the ATP2C2 gene. The gene-based tests also identified the ATP2C2, ECT2L and RAD51B genes as significant predictors of age at onset. Nonetheless, independent replication of these associations in larger samples is warranted. Our findings lend support to the hypothesized link between calcium signalling genes and substance use disorders.


Acknowledgements

JMV, CCM and HM are supported by the European Research Council [Beyond the Genetics of Addiction ERC-284167, PI JM Vink]. EMD is supported by the Foundation Volksbond Rotterdam. KJHV is supported by a 2014 NARSAD Young Investigator Grant from the Brain & Behavior Research Foundation. NAG is supported by US National Institutes of Health, National Institute on Drug Abuse R00DA023549. CCM and MCN are supported by NIDA grant DA-018673. RW is supported by NIH U01 MH094432 and NSF BCS-1229450. Statistical analyses were carried out on the Genetic Cluster Computer (http://www.geneticcluster.org) hosted by SURFsara and financially supported by the Netherlands Organization for Scientific Research (NWO 480-05-003 PI: Posthuma) along with a supplement from the Dutch Brain Foundation and the VU University Amsterdam.

Study site acknowledgements

ALSPAC We are extremely grateful to all the families who took part in this study, the midwives for their help in recruiting them, and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists and nurses. The UK Medical Research Council and the Wellcome Trust (Grant ref: 102215/2/13/2) and the University of Bristol provide core support for ALSPAC. GWAS data was generated by Sample Logistics and Genotyping Facilities at the Wellcome Trust Sanger Institute and LabCorp (Laboratory Corporation of America) using support from 23andMe.

JJW is supported by a Postdoctoral Research Fellowship from the Oak Foundation. JJW and MRM are members of the MRC Integrative Epidemiology Unit at the University of Bristol, funded by the UK Medical Research Council (MC_UU_12013/6) and the University of Bristol. MH is a member of NIHR School of Public Health Research and NIHR Health Protection Research Unit in Evaluation. JJW and MRM are members of UK Centre for Tobacco and Alcohol Studies, and MH is a member of DeCIPHER (Development and Evaluation of Complex Interventions for Public Health Improvement) – which are both UKCRC Public Health Research: Centres of Excellence. Funding from British Heart Foundation, Cancer Research UK, Economic and Social Research Council, Medical Research Council, and the National Institute for Health Research, under the auspices of the UK Clinical Research Collaboration, is gratefully acknowledged.

BLTS The BLTS was supported by grants from the United States National Institute on Drug Abuse (R00DA023549) awarded to Nathan Gillespie, by the Australian Research Council to Margie Wright (Nos. DP0343921, DP0664638, and DP1093900), and by Australian National Health and Medical Research Council Australia Fellowships awarded to Ian Hickie (No. 464914) and Grant Montgomery (No. 619667). We acknowledge and thank the following project staff: Anjali Henders, Leanne Wallace and Lisa Bowdler for the laboratory processing, genotyping, and QC; Soad Hancock as Project Coordinator; Lenore Sullivan as Research Editor; our research interviewers Pieta-Marie Shertock and Jill Wood; and David Smyth for IT. We also thank the twins and their siblings for their willing cooperation.


CADD The Center on Antisocial Drug Dependence (CADD) data reported here were funded by grants from the National Institute on Drug Abuse (P60 DA011015, R01 DA012845, R01 DA021913, R01 DA021905, R01 DA035804).

FinnTwin We warmly thank the participating twin pairs and their family members for their contribution. We would like to express our appreciation to the skilled study interviewers A-M Iivonen, K Karhu, H-M Kuha, U Kulmala-Gråhn, M Mantere, K Saanakorpi, M Saarinen, R Sipilä, L Viljanen and E Voipio. Anja Häppölä and Kauko Heikkilä are acknowledged for their valuable contribution in recruitment, data collection, and data management.

Phenotyping and genotyping of the Finnish twin cohorts has been supported by the Academy of Finland Center of Excellence in Complex Disease Genetics (grants 213506, 129680), the Academy of Finland (grants 100499, 205585, 118555, 141054, 265240, 263278 and 264146 to J. Kaprio), National Institute of Alcohol Abuse and Alcoholism (grants AA-12502, AA-00145, and AA-09203 to R. J. Rose and AA15416 and K02AA018755 to D. M. Dick), Sigrid Juselius Foundation (to J. Kaprio), and the Wellcome Trust Sanger Institute, UK. Antti-Pekka Sarin and Samuli Ripatti are acknowledged for genotype data quality controls and imputation. GWAS analyses were run at the ELIXIR Finland node hosted at CSC – IT Center for Science for ICT resources.

HUVH We are grateful to patients and controls who kindly participated in this research. Financial support was received from the “Instituto de Salud Carlos III-FIS” (PI12/01139, PI14/01700, PI15/01789, PI16/01505), and cofinanced by the European Regional Development Fund (ERDF), Agència de Gestió d’Ajuts Universitaris i de Recerca-AGAUR, Generalitat de Catalunya (2014SGR1357), Departament de Salut, Government of Catalonia, Spain, the European College of Neuropsychopharmacology (ECNP network: 'ADHD across the lifespan'), and a NARSAD Young Investigator Grant from the Brain & Behavior Research Foundation. This project also received funding from the European Community’s Seventh Framework Program (under grant agreement number 602805, Aggressotype) and from the European Community’s H2020 Program (under grant agreement number 667302, CoCA).

Marta Ribasés is a recipient of a Miguel de Servet contract from the Instituto de Salud Carlos III, Ministerio de Economía, Industria y Competitividad, Spain (CP09/00119 and CPII15/00023). Iris Garcia-Martínez is a recipient of a contract from the 7th Framework Programme for Research, Technological Development and Demonstration, European Commission (AGGRESSOTYPE_FP7HEALTH2013/602805). Cristina Sánchez-Mora is a recipient of a Sara Borrell contract from the Spanish Ministerio de Economía y Competitividad (CD15/00199) and a mobility grant from the Spanish Ministerio de Economía y Competitividad, Instituto de Salud Carlos III (MV16/00039).

NTR & NTR2 We thank the Netherlands Twin Register participants whose data we analyzed in this study. This work was supported by grants from the Netherlands Organization for Scientific Research [ZonMW Addiction 31160008; ZonMW 940-37-024; NWO/SPI 56-464-14192; NWO-400-05-717; NWO-MW 904-61-19; NWO-MagW 480-04-004; NWO-Veni 016-115-035], the European Research Council [Beyond the Genetics of Addiction ERC-284167; Genetics of Mental Illness: ERC-230374], the Centre for Medical Systems Biology (NWO Genomics), Netherlands Bioinformatics Center/BioAssist/RK/2008.024. We acknowledge the EMGO+ Institute for Health and Care Research, the Neuroscience Campus Amsterdam, BBMRI – NL (184.021.007: Biobanking and Biomolecular Resources Research Infrastructure), the Avera Institute, Sioux Falls, South Dakota (USA) for support. Genotyping was funded in part by grants from the National Institutes of Health (4R37DA018673-06, RC2 MH089951), Rutgers University Cell and DNA Repository cooperative agreement [National Institute of Mental Health U24 MH068457-06], and the National Institutes of Health (NIH R01 HD042157-01A1, MH081802, Grand Opportunity grants 1RC2 MH089951 and 1RC2 MH089995) and the Genetic Association Information Network (GAIN) of the Foundation for the National Institutes of Health. The statistical analyses were carried out on the Genetic Cluster Computer (http://www.geneticcluster.org) which is supported by the Netherlands Scientific Organization (NWO 480-05-003), the Dutch Brain Foundation and the Department of Psychology and Education of the VU University Amsterdam.

QIMR Supported by National Institutes of Health Grants AA07535, AA07580, AA07728, AA10249, AA13320, AA13321, AA14041, AA11998, AA17688, DA012854, DA018267, DA018660, DA23668 and DA019951; by Grants from the Australian National Health and Medical Research Council (241944, 339462, 389927, 389875, 389891, 389892, 389938, 442915, 442981, 496739, 552485, 552498, and 628911); by Grants from the Australian Research Council (A7960034, A79906588, A79801419, DP0770096, DP0212016, and DP0343921); and by the 5th Framework Programme (FP-5) GenomEUtwin Project (QLG2-CT-2002-01254). This research was further supported by the Centre for Research Excellence on Suicide Prevention (CRESP - Australia).

We thank Anjali Henders, Richard Parker, Soad Hancock, Judith Moir, Sally Rodda, Pieta-Maree Shertock, Heather Park, Jill Wood, Pam Barton, Fran Husband, Adele Somerville, Ann Eldridge, Marlene Grace, Kerrie McAloney, Lisa Bowdler, Alexandre Todorov, Steven Crooks, David Smyth, Harry Beeby, and Daniel Park. Last, we thank the twins and their families for their participation.

Radar We thank all adolescents and their families and friends for their participation. Moreover, we want to thank the various assistants that helped in recruiting participants as well as collecting and cleaning the data. The research was funded partly by the Netherlands Organisation for Scientific Research (Brain & Cognition, 056-21-010). RADAR has been financially supported by main grants from the Netherlands Organisation for Scientific Research (GB-MAGW 480-03-005), and Stichting Achmea Slachtoffer en Samenleving (SASS), a grant from the Netherlands Organisation for Scientific Research to the Consortium Individual Development (CID; 024.001.003), and various other grants from the Netherlands Organisation for Scientific Research, the VU University Amsterdam, and Utrecht University. AJH is supported by the Netherlands Organization for Health Research and Development, ZonMW 31160212.

Saguenay Youth Study The Canadian Institutes of Health Research and the Heart and Stroke Foundation of Canada fund the SYS (TP, ZP). TP is the Tanenbaum Chair in Population Neuroscience (University of Toronto) and the Dr. John and Consuela Phelan Scholar (Child Mind Institute).

TRAILS TRAILS (TRacking Adolescents’ Individual Lives Survey) is a collaborative project involving various


Radboud Medical Center Nijmegen, and the Parnassia Bavo group, all in the Netherlands. TRAILS has been financially supported by grants from the Netherlands Organization for Scientific Research NWO (Medical Research Council program grant GB-MW 940-38-011; ZonMW Brainpower grant 100-001-004; ZonMw Risk Behavior and Dependence grant 60-60600-97-118; ZonMw Culture and Health grant 261-98-710; Social Sciences Council medium-sized investment grants GB-MaGW 480-01-006 and GB-MaGW 480-07-001; Social Sciences Council project grants GB-MaGW 452-04-314 and GB-MaGW 452-06-004; NWO large-sized investment grant 175.010.2003.005; NWO Longitudinal Survey and Panel Funding 481-08-013 and 481-11-001); the Dutch Ministry of Justice (WODC), the European Science Foundation (EuroSTRESS project FP-006), Biobanking and Biomolecular Resources Research Infrastructure BBMRI-NL (CP 32), the participating universities, and Accare Center for Child and Adolescent Psychiatry. We are grateful to all adolescents, their parents and teachers who participated in this research and to everyone who worked on this project and made it possible. Statistical analyses were carried out on the Genetic Cluster Computer (http://www.geneticcluster.org), which is financially supported by the Netherlands Scientific Organization (NWO 480-05-003) along with a supplement from the Dutch Brain Foundation.

Utrecht We are grateful to Chris Schubart and Willemijn van Gastel and numerous students for their work in the study. Foremost, we would like to thank our study participants. This study was financially supported by a grant of the NWO (Netherlands Organization for Scientific Research), grant no. 91207039. The study was performed at the University Medical Centre Utrecht, The Netherlands.

Yale Penn Genotyping services for a part of our GWAS study were provided by the Center for Inherited Disease Research (CIDR) and Yale University (Center for Genome Analysis). CIDR is fully funded through a federal contract from the National Institutes of Health to The Johns Hopkins University (contract number N01-HG-65403). This study was supported by National Institutes of Health grants RC2 DA028909, R01 DA12690, R01 DA12849, R01 DA18432, R01 AA11330, R01 AA017535, and the VA Connecticut and Philadelphia VA MIRECCs.

Financial Disclosure

HRK has been a consultant, CME speaker or Advisory Board Member for Lundbeck and Indivior and is a member of the American Society of Clinical Psychopharmacology’s Alcohol Clinical Trials Initiative, which was supported in the last three years by AbbVie, Alkermes, Ethypharm, Indivior, Lilly, Lundbeck, Otsuka, Pfizer, and XenoPort. The other co-authors do not have a conflict of interest.


References

1. National Drug Strategy Household Survey Report. Canberra, Australian Institute of Health and Welfare, 2013.

2. Results from the 2013 National Survey on Drug use and Health: Summary of National Findings. Rockville, MD, Substance Abuse and Mental Health Services Administration, 2014.

3. Facts & Figures. World Health Organization, 2016. http://www.who.int/substance_abuse/facts/cannabis/en/

4. Degenhardt L, Lynskey M, Hall W: Cohort trends in the age of initiation of drug use in Australia. Aust N Z J Public Health 2000; 24: 421-426.

5. Monshouwer K, Smit F, de Graaf R, van Os J, Vollebergh W: First cannabis use: does onset shift to younger ages? Findings from 1988 to 2003 from the Dutch National School Survey on Substance Use. Addiction 2005; 100: 963-970.

6. Kokkevi A, Gabhainn SN, Spyropoulou M, Risk Behav Focus Grp HBSC: Early initiation of cannabis use: A cross-national European perspective. Journal of Adolescent Health 2006; 39: 712-719.

7. Degenhardt L, Chiu W, Sampson N et al: Toward a global view of alcohol, tobacco, cannabis, and cocaine use: Findings from the WHO World Mental Health Surveys. Plos Medicine 2008; 5: 1053-1067.

8. Butterworth P, Slade T, Degenhardt L: Factors associated with the timing and onset of cannabis use and cannabis use disorder: Results from the 2007 Australian National Survey of Mental Health and Well-Being. Drug Alcohol Rev 2014; 33: 555-564.

9. UNODC: World Drug Report 2010. United Nations Publication, 2010.

10. Grant JD, Scherrer JF, Lynskey MT et al: Associations of Alcohol, Nicotine, Cannabis, and Drug Use/Dependence with Educational Attainment: Evidence from Cotwin-Control Analyses. Alcoholism-Clinical and Experimental Research 2012; 36: 1412-1420.

11. Verweij KJH, Huizink AC, Agrawal A, Martin NG, Lynskey MT: Is the relationship between early-onset cannabis use and educational attainment causal or due to common liability? Drug Alcohol Depend 2013; 133: 580-586.

12. Stiby AI, Hickman M, Munafo MR, Heron J, Yip VL, Macleod J: Adolescent cannabis and tobacco use and educational outcomes at age 16: birth cohort study. Addiction 2015; 110: 658-668.

13. Tamm L, Epstein JN, Lisdahl KM et al: Impact of ADHD and cannabis use on executive functioning in young adults. Drug Alcohol Depend 2013; 133: 607-614.

14. Hall W, Degenhardt L: Adverse health effects of non-medical cannabis use. Lancet 2009; 374: 1383-1391.

15. van der Pol P, Liebregts N, de Graaf R, Korf DJ, van den Brink W, van Laar M: Predicting the transition from frequent cannabis use to cannabis dependence: A three-year prospective study. Drug Alcohol Depend 2013; 133: 352-359.

16. French L, Gray C, Leonard G et al: Early Cannabis Use, Polygenic Risk Score for Schizophrenia, and Brain Maturation in Adolescence. Jama Psychiatry 2015; 72: 1002-1011.

17. Arseneault L, Cannon M, Poulton R, Murray R, Caspi A, Moffitt T: Cannabis use in adolescence and risk for adult psychosis: longitudinal prospective study. Br Med J 2002; 325: 1212-1213.

18. Fergusson D, Lynskey M, Horwood L: The short-term consequences of early onset cannabis use. J Abnorm Child Psychol 1996; 24: 499-512.

19. Gage SH, Hickman M, Zammit S: Association Between Cannabis and Psychosis: Epidemiologic Evidence. Biol Psychiatry 2016; 79: 549-556.

20. Schubart CD, van Gastel WA, Breetvelt EJ et al: Cannabis use at a young age is associated with psychotic experiences. Psychol Med 2011; 41: 1301-1310.

21. Agrawal A, Grant JD, Waldron M et al: Risk for initiation of substance use as a function of age of onset of cigarette, alcohol and cannabis use: Findings in a Midwestern female twin cohort. Prev Med 2006; 43: 125-128.

22. Agrawal A, Lynskey MT, Pergadia ML et al: Early cannabis use and DSM-IV nicotine dependence: a twin study. Addiction 2008; 103: 1896-1904.

23. Bracken BK, Rodolico J, Hill KP: Sex, Age, and Progression of Drug Use in Adolescents Admitted for Substance Use Disorder Treatment in the Northeastern United States: Comparison With a National Survey. Substance Abuse 2013; 34: 263-272.

24. Chen C, Storr CL, Anthony JC: Early-onset drug use and risk for drug dependence problems. Addict Behav 2009; 34: 319-322.

25. Grant JD, Lynskey MT, Scherrer JF, Agrawal A, Heath AC, Bucholz KK: A cotwin-control analysis of drug use and abuse/dependence risk associated with early-onset cannabis use. Addict Behav 2010; 35: 35-41.

26. King KM, Chassin L: A prospective study of the effects of age of initiation of alcohol and drug use on young adult substance dependence. Journal of Studies on Alcohol and Drugs 2007; 68: 256-265.


27. Lynskey M, Vink J, Boomsma D: Early onset cannabis use and progression to other drug use in a sample of Dutch twins. Behav Genet 2006; 36: 195-200.

28. Lynskey MT, Agrawal A, Henders A, Nelson EC, Madden PAF, Martin NG: An Australian Twin Study of Cannabis and Other Illicit Drug Use and Misuse, and Other Psychopathology. Twin Research and Human Genetics 2012; 15: 631-641.

29. Verweij KJH, Zietsch BP, Lynskey MT et al: Genetic and environmental influences on cannabis use initiation and problematic use: a meta-analysis of twin studies. Addiction 2010; 105: 417-430.

30. Richmond-Rakerd LS, Slutske WS, Lynskey MT et al: Age at First Use and Later Substance Use Disorder: Shared Genetic and Environmental Pathways for Nicotine, Alcohol, and Cannabis. J Abnorm Psychol 2016; 125: 946-959.

31. Sartor CE, Agrawal A, Lynskey MT, Buchoz KK, Madden PAF, Heath AC: Common genetic influences on the timing of first use for alcohol, cigarettes, and cannabis in young African-American women. Drug Alcohol Depend 2009; 102: 49-55.

32. Vink J, Willemsen G, Boomsma D: Heritability of smoking initiation and nicotine dependence. Behav Genet 2005; 35: 397-406.

33. Minica CC, Dolan CV, Hottenga J et al: Heritability, SNP- and Gene-Based Analyses of Cannabis Use Initiation and Age at Onset. Behav Genet 2015; 45: 503-513.

34. Kiefer AK, Tung JY, Do CB et al: Genome-Wide Analysis Points to Roles for Extracellular Matrix Remodeling, the Visual Cycle, and Neuronal Development in Myopia. Plos Genetics 2013; 9: e1003299.

35. van der Net JB, Janssens ACJW, Eijkemans MJC, Kastelein JJP, Sijbrands EJG, Steyerberg EW: Cox proportional hazards models have more statistical power than logistic regression models in cross-sectional genetic association studies. European Journal of Human Genetics 2008; 16: 1111-1116.

36. Stringer S, Denys D, Kahn RS, Derks EM: What Cure Models Can Teach us About Genome-Wide Survival Analysis. Behav Genet 2016; 46: 269-280.

37. Stringer S, Minica CC, Verweij KJH et al: Genome-wide association study of lifetime cannabis use based on a large meta-analytic sample of 32330 subjects from the International Cannabis Consortium. Translational Psychiatry 2016; 6: e769.

38. Clarke TK, Adams MJ, Davies G et al: Genome-wide association study of alcohol consumption and genetic overlap with other health-related traits in UK Biobank (N=112,117). bioRxiv 2017.

39. Boutwell B, Hinds D, Tielbeek J, Ong KK, Day FR, Perry JRB: Replication and characterization of CADM2 and MSRA genes on human behavior. Heliyon 2017; 3: e00349.

40. Day FR, Helgason H, Chasman DI et al: Physical and neurobehavioral determinants of reproductive onset and success. Nat Genet 2016; 48: 617-623.

41. Li M, Gui H, Kwan JSH, Sham PC: GATES: A Rapid and Powerful Gene-Based Association Test Using Extended Simes Procedure. Am J Hum Genet 2011; 88: 283-293.

42. Li M, Kwan JSH, Sham PC: HYST: A Hybrid Set-Based Test for Genome-wide Association Studies, with Application to Protein-Protein Interaction-Based Association Analysis. Am J Hum Genet 2012; 91: 478-488.

43. Willemsen G, Vink JM, Abdellaoui A et al: The Adult Netherlands Twin Register: Twenty-Five Years of Survey and Biological Data Collection. Twin Research and Human Genetics 2013; 16: 271-281.

44. Heath AC, Whitfield JB, Martin NG et al: A Quantitative-Trait Genome-Wide Association Study of Alcoholism Risk in the Community: Findings and Implications. Biol Psychiatry 2011; 70: 513-518.

45. Knopik V, Heath A, Madden P et al: Genetic effects on alcohol dependence risk: re-evaluating the importance of psychiatric and other heritable risk factors. Psychol Med 2004; 34: 1519-1530.

46. Gillespie NA, Henders AK, Davenport TA et al: The Brisbane Longitudinal Twin Study: Pathways to Cannabis Use, Abuse, and Dependence Project-Current Status, Preliminary Results, and Future Directions. Twin Research and Human Genetics 2013; 16: 21-33.

47. Neale M, Cardon L: Methodology for Genetic Studies of Twins and Families. Dordrecht, Kluwer, 1992.

48. Posthuma D, Beem A, de Geus E et al: Theory and practice in quantitative genetics. Twin Research 2003; 6: 361-376.

49. Altshuler DM, Durbin RM, Abecasis GR et al: An integrated map of genetic variation from 1,092 human genomes. Nature 2012; 491: 56-65.

50. Therneau T: A Package for Survival Analysis in S (version 2.37-7). 2015.

51. Minica CC, Dolan CV, Kampert MMD, Boomsma DI, Vink JM: Sandwich corrected standard errors in family-based genome-wide association studies. European Journal of Human Genetics 2015; 23: 388-394.

52. Purcell S, Neale B, Todd-Brown K et al: PLINK: A tool set for whole-genome association and population-based linkage analyses.


53. Urbanek S: Rserve: Binary R server (version 1.7-3). 2013.

54. Willer CJ, Li Y, Abecasis GR: METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 2010; 26: 2190-2191.

55. de Bakker PIW, Neale BM, Daly MJ: Meta-Analysis of Genome-Wide Association Studies. Cold Spring Harb Protoc 2010; 2010: pdb.top81.

56. Furberg H, Kim Y, Dackor J et al: Genome-wide meta-analyses identify multiple loci associated with smoking behavior. Nat Genet 2010; 42: 441-447.

57. Allen HL, Estrada K, Lettre G et al: Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 2010; 467: 832-838.

58. Pe'er I, Yelensky R, Altshuler D, Daly MJ: Estimation of the multiple testing burden for genomewide association studies of nearly all common variants. Genet Epidemiol 2008; 32: 381-385.

59. Benjamini Y, Hochberg Y: Controlling the False Discovery Rate - a Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B-Methodological 1995; 57: 289-300.

60. So H, Li M, Sham PC: Uncovering the Total Heritability Explained by All True Susceptibility Variants in a Genome-Wide Association Study. Genet Epidemiol 2011; 35: 447-456.

61. van Beek JHDA, de Moor MHM, Geels LM, Willemsen G, Boomsma DI: Explaining Individual Differences in Alcohol Intake in Adults: Evidence for Genetic and Cultural Transmission? Journal of Studies on Alcohol and Drugs 2014; 75: 201-210.

62. Bulik-Sullivan BK, Loh P, Finucane HK et al: LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 2015; 47: 291-295.

63. Vilhjalmsson BJ, Yang J, Finucane HK et al: Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores. Am J Hum Genet 2015; 97: 576-592.

64. Ashburner M, Ball C, Blake J et al: Gene Ontology: tool for the unification of biology. Nat Genet 2000; 25: 25-29.

65. Blake JA, Christie KR, Dolan ME et al: Gene Ontology Consortium: going forward. Nucleic Acids Res 2015; 43: D1049-D1056.

66. Xiang M, Mohamalawari D, Rao R: A novel isoform of the secretory pathway Ca2+, Mn2+-ATPase, hSPCA2, has unusual properties and is expressed in the brain. J Biol Chem 2005; 280: 11608-11614.

67. Newbury DF, Winchester L, Addis L et al: CMIP and ATP2C2 Modulate Phonological Short-Term Memory in Language Impairment. Am J Hum Genet 2009; 85: 264-272.

68. Zheng JQ, Poo M: Calcium signaling in neuronal motility. Annu Rev Cell Dev Biol 2007; 23: 375-404.

69. Gelernter J, Sherva R, Koesterer R et al: Genome-wide association study of cocaine dependence and related traits: FAM53B identified as a risk gene. Mol Psychiatry 2014; 19: 717-723.

70. Gelernter J, Kranzler HR, Sherva R et al: Genome-Wide Association Study of Opioid Dependence: Multiple Associations Mapped to Calcium and Potassium Pathways. Biol Psychiatry 2014; 76: 66-74.

71. Gillespie NA, Neale MC, Kendler KS: Pathways to cannabis abuse: a multi-stage model from cannabis availability, cannabis initiation and progression to abuse. Addiction 2009; 104: 430-438.

72. Neale M, Harvey E, Maes H, Sullivan P, Kendler K: Extensions to the modeling of initiation and progression: Applications to substance use and abuse. Behav Genet 2006; 36: 507-524.

73. Agrawal A, Neale M, Jacobson K, Prescott C, Kendler K: Illicit drug use and abuse/dependence: modeling of two-stage variables using the CCC approach. Addict Behav 2005; 30: 1043-1048.

74. Cami J, Farre M: Drug addiction. N Engl J Med 2003; 349: 975-986.

75. Bowers MS: Activators of G-protein signaling 3: a drug addiction molecular gateway. Behav Pharmacol 2010; 21: 500-513.

76. Drgon T, Montoya I, Johnson C et al: Genome-Wide Association for Nicotine Dependence and Smoking Cessation Success in NIH Research Volunteers. Molecular Medicine 2009; 15: 21-27.

77. Uher R: Gene-environment interactions in common mental disorders: an update and strategy for a genome-wide search. Soc
