Genome-wide association study results for educational attainment aid in identifying genetic heterogeneity of schizophrenia

(1)

Genome-wide association study results for

educational attainment aid in identifying genetic

heterogeneity of schizophrenia

V. Bansal

1,2,3

, M. Mitjans

1,4

, C. A. P. Burik

5,6,7

, R.K. Linnér

5,6,7

, A. Okbay

5,7

, C.A. Rietveld

6,8

,

M. Begemann

1,9

, S. Bonn

2,3

, S. Ripke

10,11,12

, R. de Vlaming

5,7

, M. G. Nivard

13 ,

H. Ehrenreich

1,4

& P. D. Koellinger

5,6,7

Higher educational attainment (EA) is negatively associated with schizophrenia (SZ).

How-ever, recent studies found a positive genetic correlation between EA and SZ. We investigate

possible causes of this counterintuitive

ﬁnding using genome-wide association study results

for EA and SZ (N = 443,581) and a replication cohort (1169 controls; 1067 cases) with deeply

phenotyped SZ patients. We

ﬁnd strong genetic dependence between EA and SZ that cannot

be explained by chance, linkage disequilibrium, or assortative mating. Instead, several genes

seem to have pleiotropic effects on EA and SZ, but without a clear pattern of sign

con-cordance. Using EA as a proxy phenotype, we isolate FOXO6 and SLITRK1 as novel candidate

genes for SZ. Our results reveal that current SZ diagnoses aggregate over at least two disease

subtypes: one part resembles high intelligence and bipolar disorder (BIP), while the other part

is a cognitive disorder that is independent of BIP.

DOI: 10.1038/s41467-018-05510-z

OPEN

1_{Clinical Neuroscience, Max Planck Institute of Experimental Medicine, Hermann-Rein-Straße 3, 37075 Göttingen, Germany.}2_{Research Group for}

Computational Systems Biology, German Center for Neurodegenerative Diseases (DZNE), Von-Siebold-Straße 3A, 37075 Göttingen, Germany.3Institute of Medical Systems Biology, Center for Molecular Neurobiology, University Clinic Hamburg-Eppendorf, Falkenried 94, 20251 Hamburg, Germany.4DFG Research Center for Nanoscale Microscopy and Molecular Physiology of the Brain (CNMPB), Humboldtallee 23, 30703 Göttingen, Germany.5Complex Trait Genetics, Vrije Universiteit Amsterdam, De Boelelaan 1085 B-631, 1081 HV Amsterdam, Netherlands.6Institute for Behavior and Biology, Erasmus University Rotterdam, P.O. Box 1738, 3000 DR Rotterdam, Netherlands.7School of Business and Economics, Department of Economics, De Boelelaan 1105, 1081 HV Amsterdam, Netherlands.8Erasmus School of Economics, Erasmus University Rotterdam, P.O. Box 1738, 3000 DR Rotterdam, Netherlands.9Department of Psychiatry & Psychotherapy, University of Göttingen, Von-Siebold-Straße 5, 37075 Göttingen, Germany.10Analytic and Translational Genetics Unit, Massachusetts General Hospital, 02114 MA Boston, USA.11Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, 02142 MA Cambridge, USA.12_{Department of Psychiatry and Psychotherapy, Charité-Universitätsmedizin Berlin, Campus Mitte, Berlin 10117, Germany.}13_{Department of}

Biological Psychology, Vrije Universiteit Amsterdam, van der Boechorststraat 1, 1081 BT Amsterdam, Netherlands. These authors contributed equally: V. Bansal, M. Mitjans. These authors jointly supervised this work: M.G. Nivard, H. Ehrenreich, P.D. Koellinger. Correspondence and requests for materials should be addressed to P.D.K. (email:p.d.koellinger@vu.nl)

123456789

(2)

S

chizophrenia (SZ) is the collective term used for a severe,

highly heterogeneous and costly psychiatric disorder that is

caused by environmental and genetic factors

1–4

_{. A}

genome-wide association study (GWAS) by the Psychiatric Genomics

Consortium (PGC) identiﬁed 108 genomic loci that are associated

with SZ

5

_{. These 108 loci jointly account for}

_{≈3.4% of the variation}

on the liability scale for SZ

5

, while all single-nucleotide

poly-morphisms (SNPs) that are currently measured by SNP arrays

capture

≈64% (s.e. = 8%) of the variation in liability for the

dis-ease

6

_{. This implies that many genetic variants with small effect}

sizes contribute to the heritability of SZ, but most of them are

unidentiﬁed as of yet. A polygenic score (PGS) based on all SNPs

currently accounts for 4–15% of the variation on the liability scale

for SZ

5

.

Yet, this PGS does not predict any differences in symptoms or

severity of the disease among SZ patients

4

_{. Partly, this could be}

because the clinical disease classiﬁcation of SZ spans several

different behavioural and cognitive traits that may not have

identical genetic architectures. Therefore, identifying additional

genetic variants and understanding through which pathways they

are linked with the clinical diagnosis of SZ is an important step in

understanding the aetiologies of the

‘schizophrenias’

7

_{. However,}

GWAS analyses of speciﬁc SZ symptoms would require very large

sample sizes to be statistically well-powered, and the currently

available data sets on deeply phenotyped SZ patients are not yet

large enough for this purpose.

Here, we use an alternative approach to make progress with

data that is readily available—by combining GWAS for SZ and

educational attainment (EA). Previous studies suggest a complex

relationship between EA and SZ

8

that may be used to gain

additional insights into the genetic architecture of SZ and its

symptoms. In particular, phenotypic data seem to suggest a

negative correlation between EA and SZ

9

. For example, SZ

patients with lower EA typically show an earlier age of disease

onset, higher levels of psychotic symptomatology and worsened

global cognitive function

9

. In fact, EA has been suggested to be a

measure of premorbid function and a predictor of outcomes in

SZ. Moreover, it has been forcefully argued that retarded

intel-lectual development, global cognitive impairment during

child-hood and bad school performance should be seen as core features

of SZ that precede the development of psychotic symptoms and

differentiate SZ from bipolar disorder (BIP)

10–14

_{. Furthermore,}

credible genetic links between SZ and impaired cognitive

per-formance have been found

15

.

In contrast to these

ﬁndings, recent studies using large-scale

GWAS results identiﬁed a small, but positive genetic correlation

between EA and SZ (ρ

EA,SZ

= 0.08)

8

, and higher PGS values for

SZ have been reported to be associated with creativity and

greater EA

16

. Other statistically well-powered studies found

that a high intelligence quotient (IQ) has protective effects

against SZ

17

_{and reported a negative genetic correlation}

between IQ and SZ (ρ

IQ,SZ

= –0.2)

18

, suggesting the possibility

that genetic effects that contribute to EA but not via IQ

are responsible for the observed positive genetic correlation

between SZ and EA.

Indeed, previous research by the Social Science Genetic

Asso-ciation Consortium (SSGAC)

8

_{already demonstrated that the}

effect of the EA-PGS on years of schooling is mediated by several

individual characteristics that have imperfect or no genetic

cor-relation with each other, including higher IQ, higher openness

and higher conscientiousness. These different factors that

con-tribute to EA seem to be related to SZ and its symptoms in

complex ways

19–21

. For example, differences in openness have

been reported to differentiate between patients diagnosed with SZ

spectrum personality disorders (higher openness) from patients

diagnosed with SZ (lower openness), while conscientiousness

tends to be reduced among patients of both disorders compared

to healthy controls

19

.

The contributing factors to EA that have previously been

iden-tiﬁed by the SSGAC (i.e. IQ, openness and conscientiousness)

8

_are

phenotypically and genetically related, but by no means

identical

22,23

_{. Speciﬁcally, the Cognitive Genomics Consortium}

(COGENT) reported a moderate genetic correlation between IQ

and openness (r

g

= 0.48, P = 3.25 × 10

–4

), but only a small genetic

correlation of IQ and conscientiousness of 0.10 that was

indis-tinguishable from zero (r

g

= 0.10, P = 0.46)

24

. Therefore, it is

appropriate to think of EA as a genetically heterogeneous trait

that can be decomposed into subphenotypes that have imperfect

genetic correlations with each other. If the various symptoms of

SZ also have non-identical genetic architectures, this could result

in a pattern where both EA and SZ share many genetic loci, but

without a clear pattern of sign concordance and with seemingly

contradictory phenotypic and genetic correlation results.

To explore this hypothesis and to discern it from alternative

explanations, we perform a series of statistical genetic analyses

using large-scale GWAS results for SZ and EA from

non-overlapping samples. We start by characterizing the genetic

relationship between both traits by using EA as a

‘proxy

pheno-type’

25

_{for SZ. We annotate possible biological pathways, tissues}

and cell types implied by genetic variants that are associated with

both traits and explore to what extent these variants are also

enriched for association with other traits. We test if the genetic

relationship between EA and SZ can be explained by chance,

linkage disequilibrium (LD) or assortative mating. Furthermore,

we investigate the hypothesis that the part of SZ that is different

from BIP is a neurodevelopmental disorder, whereas the part of

SZ that overlaps with BIP is not. Finally, we develop a formal

statistical test for genetic heterogeneity of SZ using a polygenic

prediction framework that leverages both the SZ and the EA

GWAS results. Together, our analysis suggest that current SZ

diagnoses aggregate over at least two disease subtypes: one part

resembles BIP and high IQ, while the other part is a cognitive

disorder that is independent of BIP.

Results

Genetic dependence and genetic correlation. As a formal

pre-lude to our study, it is conceptually important to differentiate

between genetic dependence and genetic correlation. In our

analyses, genetic dependence means that the genetic variants

associated with EA are more likely to also be associated with SZ

than expected by chance. In contrast, genetic correlation is

deﬁned by the correlation of the (true) effect sizes of genetic

variants on the two traits. Thus, genetic correlation implies a

linear genetic relationship between two traits whereas genetic

dependence does not. Thus, two traits can be genetically

depen-dent even if they are not genetically correlated and vice versa. One

possible cause of a non-linear genetic dependence is that at least

one of the traits is genetically heterogeneous in the sense that it

aggregates across subphenotypes (or symptoms) with

non-identical genetic architectures. Supplementary Note

1 presents a

formal discussion and simulations that illustrate the data patterns

that can emerge.

Proxy-phenotype analyses. We used the proxy-phenotype

method (PPM)

25

to illustrate the genetic dependence between

EA and SZ. PPM is a two-stage approach. In the

ﬁrst stage, a

GWAS on the proxy-phenotype (EA) is conducted. The most

strongly associated loci are then advanced to the second stage,

which tests the association of these loci with the phenotype of

interest (SZ) in an independent sample. If the two traits are

genetically dependent, this two-stage approach can increase the

(3)

statistical power for detecting associations for the target trait

because it limits the multiple testing burden for the phenotype of

interest compared to a GWAS

8,25,26

_.

Our PPM analyses followed a preregistered analysis plan

(

https://osf.io/dnhfk/

) using GWAS results on EA (n

= 363,502)

8

and SZ (34,409 cases and 45,670 controls)

5

_{that were obtained}

from non-overlapping samples of Europeans. For replication and

follow-up analyses, we used the Göttingen Research Association

for Schizophrenia (GRAS) data collection

27

_{, which has a uniquely}

rich and accurate set of SZ measures. The GRAS sample was not

part of either GWAS.

Analyses were performed using 8,240,280 autosomal SNPs that

passed quality controls in both GWAS and additional

ﬁlters. We

selected approximately independent lead SNPs from the EA

GWAS that passed the predeﬁned signiﬁcance threshold of P

EA

<

10

–5

and looked up their SZ results. To test if EA-associated SNPs

are more strongly associated with SZ than expected by chance

(referred to as

‘raw enrichment’ below), we conducted a

Mann–Whitney test that compares the P

SZ

values of the

EA-associated lead SNPs with the P

SZ

values of a set of randomly

drawn, approximately LD-independent SNPs with similar minor

allele frequencies (MAFs). Fig.

1 presents an overview of the

proxy-phenotype analyses.

The

ﬁrst-stage GWAS on EA identiﬁed 506 loci that passed our

predeﬁned threshold of P

EA

< 10

–5

(Supplementary Note

2 ); 108

of them were signiﬁcant at the genome-wide level (P

EA

< 5 × 10

–8

,

see Supplementary Data

2 ). Of the 506 EA lead-SNPs, 132 are

associated with SZ at nominal signiﬁcance (P

SZ

< 0.05), and 21 of

these survive Bonferroni correction (P

SZ

<

0:05₅₀₆

= 9.88 × 10

−5

)

(Table

1 ). LD score regression results suggest that the vast

majority of the association signal in both the EA

8

and the SZ

5

GWAS are truly genetic signals, rather than spurious signals

originating from uncontrolled population stratiﬁcation. Fig.

2 a

shows a Manhattan plot for the GWAS on EA highlighting SNPs

that were also signiﬁcantly associated with SZ (black crosses for

P

SZ

< 0.05, magenta crosses for P

SZ

= 9.88 × 10

–5

).

A Q–Q plot of the 506 EA lead SNPs for SZ is shown in Fig.

2 b.

Although the observed sign concordance of 52% is not

signiﬁcantly different from a random pattern (P = 0.40), we ﬁnd

3.23 times more SNPs in this set of 506 SNPs that are nominally

signiﬁcant for SZ than expected given the distribution of the

P values in the SZ GWAS results (raw enrichment P

= 6.87 ×

10

−10

). The observed enrichment of the 21 EA lead SNPs that

pass Bonferroni correction for SZ (P

SZ

<

0:05₅₀₆

= 9.88 × 10

−5

) is

even more pronounced (27 times stronger, P

= 5.44 × 10

−14

).

The effect sizes of these 21 SNPs on SZ are small, ranging from

odds ratio (OR)

= 1.02 (rs4500960) to OR = 1.11 (rs4378243)

after correction for the statistical winner’s curse

25

_(Table

₁

_{). We}

calculated the probability that these 21 SNPs are truly associated

with SZ using a heuristic Bayesian method that takes the winner’s

curse corrected effect sizes, statistical power and prior beliefs into

account

25

_{. Applying a reasonable prior belief of 5%, we}

_{ﬁnd that}

all 21 SNPs are likely or almost certain to be true positives.

Novel SZ loci. Of the 21 variants we identiﬁed, 12 are in LD with

loci previously reported by the PGC

5

_{and two are in the major}

histocompatibility complex region on chromosome (chr) 6 and

were therefore not separately reported in that study. Three of the

variants we isolated (rs7610856, rs143283559 and rs28360516)

were independently found in a meta-analysis of the PGC results

5

with another large-scale sample which identiﬁed 50 novel SZ

SNPs

28

. Two of the 21 variants (rs756912, rs7593947) are in LD

with loci recently reported in a study that also compared GWAS

ﬁndings from EA and SZ using smaller samples and a less

con-servative statistical approach

29

. The remaining two SNPs we

identiﬁed here (rs7336518 on chr13 and rs7522116 on chr1) add

to the list of empirically plausible candidate loci for SZ.

Using a proportions test, we compared the ratio of novel SNPs

from ref.

28

included in our list of 132 loci that are jointly

associated with EA and SZ (P

EA

< 10

−5

and P

SZ

< 0.05, yielding 6

loci) with the ratio observed in all remaining approximately

independent loci with P

SZ

< 0.05 in our SZ GWAS results. We

found that the proportion of novel SZ SNPs is higher among the

132 loci that are informed by the EA GWAS results (Fisher’s

exact test P

= 2.4 × 10

−9

, two-sided). Thus, using EA as a

proxy-phenotype for SZ helped to predict the novel genome-wide

signiﬁcant ﬁndings reported in ref.

28

_{, illustrating the power of the}

proxy-phenotype approach.

Detection of shared causal loci. The next step in our study was a

series of analyses that aimed to identify reasons for the observed

genetic dependence between EA and SZ and to put the

ﬁndings of

the PPM analysis into context. First, we probed if there is

evi-dence that the loci identiﬁed by the PPM may tag shared causal

loci for both EA and SZ (i.e. pleiotropy), rather than being in LD

with different causal loci for both traits.

For each of the 21 SNPs isolated by our PPM analysis, we

looked at their neighbouring SNPs within a ±500 kb window and

estimated their posterior probability of being causal for EA or SZ

using PAINTOR

30

_{. We then selected two sets of SNPs, each of}

which contains the smallest number of SNPs that yields a

cumulative posterior probability of 90% or 50% of containing

the causal locus for EA and SZ. We refer to these as broad sets

(90%) and narrow sets (50%), respectively. Supplementary

Note

3 and Supplementary Data

4 also contain results for the

80 and 65% credibility sets. For each of these sets, we calculated

the posterior probability that it contains the causal locus for the

other trait.

For the broad credibility set analyses (90%), we found 11 loci

with a medium or high credibility to have direct causal effects on

both EA and SZ (including one of the novel SNPs, rs7336518). Six

of these loci have concordant effects on the two traits (i.e.

++ or

−−) while ﬁve have discordant effects (i.e. +− or −+, Table

1 ).

The analyses of the 50% credible sets are based on a smaller number

of SNPs. This also results in lower probabilities that the SZ set

contains the causal SNP for EA and vice versa. Nevertheless, our

analysis with 50% credible sets show that four speciﬁc loci

(rs7610856, rs320700, rs79210963 and rs7336518) had credibility

of more than 15% for the other trait, providing support for the high

(rs320700 and rs79210963) and medium (rs7610856 and

rs7336518) credibility judgments based on the 90% sets. One of

these has a discordant effect (rs7610856) while the others have a

concordant effect on SZ and EA.

Overall, our analyses suggest that some of the 21 SNPs that we

identiﬁed by using EA as a proxy-phenotype for SZ are likely to

have direct pleiotropic effects on both traits. Of the most likely

candidates for direct pleiotropic effects, three SNPs have

concordant signs (rs79210963, rs7336518 and rs320700) and

one has discordant signs (rs7610856).

Biological annotations. Biological annotation of the 132 SNPs

that are jointly associated with EA (P

EA

< 10

−5

) and SZ (P

SZ

<

0.05) using DEPICT (Supplementary Note

4 and Supplementary

Data

5 –

7 ) points to genes that are known to be involved in

neurogenesis and synapse formation (Supplementary Data

8 ).

Some of the indicated genes, including SEMA6D and CSPG5,

have been suggested to play a potential role in SZ

31,32

.

For the two novel candidate SNPs reported in this study

(rs7522116 and rs7336518), DEPICT points to the FOXO6

(Forkhead Box O6) and the SLITRK1 (SLIT and NTRK Like

(4)

Family Member 1) genes, respectively. FOXO6 is predominantly

expressed in the hippocampus and has been suggested to be

involved in memory consolidation, emotion and synaptic

function

33,34

. Similarly, SLITRK1 is also highly expressed in the

brain

35

_{, is particularly localized to excitatory synapses and}

promotes their development

36

_{, and it has previously been}

suggested to be a candidate gene for neuropsychiatric disorders

37

.

LD-aware enrichment across different traits. The raw

enrich-ment P value reported in Fig.

2 b could in principle be due to the

LD structure of the EA lead SNPs that we tested. Speciﬁcally, if

these EA lead SNPs have stronger LD with other SNPs in the

human genome than expected by chance, this could cause the

observed enrichment of this set of SNPs on SZ and other traits

because higher LD increases the chance these SNPs would

‘tag’

causal SNPs that they are correlated with

38,39

.

To assess the null hypothesis that the observed genetic

dependence between EA and SZ can be entirely explained by

LD patterns in the human genome, we developed an association

enrichment test that corrects for the LD score of each SNP. We

applied this test to the 132 SNPs that are jointly associated with

EA (P

EA

< 10

−5

) and SZ (P

SZ

< 0.05), i.e. the loci that were

identiﬁed by using EA as a proxy-phenotype for SZ. LD scores

were obtained from the HapMap 3 European reference panel

40

(Supplementary Data

9 ). We found signiﬁcant joint LD-aware

enrichment for SZ (P

= 9.57 × 10

−66

), demonstrating that the

genetic dependence between EA and SZ cannot be entirely

explained by LD.

Furthermore, we used this test to explore if these SNPs are

generally enriched for association with all (brain-related)

pheno-types, or whether they exhibit some degree of outcome speciﬁcity.

For this purpose, we extended the LD-aware enrichment test to

21 additional traits for which GWAS results were available in the

1) EA GWAS (Okbay et al. 2016)

N = 363,502 individuals

12,299,530 SNPs

2) SZ GWAS (Ripke et al. 2014)

(PGC - GRAS excluded) N = 34,409 cases N = 45,670 controls 17,221,718 SNPs 8,240,280 SNPs Overlap EA_all EA_all pEA < 10 –5 506 SNPs 132 SNPs 21 SNPs Bonferroni p_SZ= 0.05/506 2 SNPs novel for SZ SZ_all EA_132 SZ_132 8,240,280 SNPs 3) Proxy-phenotype

4) Schizophrenia diagnosis prediction in the GRAS sample

N = 1054 cases /N = 1169 controls

5) Schizophrenia symptom prediction in the GRAS sample

N = 1054 cases P olygenic r isk scores P olygenic r isk scores P olygenic r isk score P olygenic r isk scores pSZ < 0.05 Concordant SNP effect EA SZ 8,240,280 SNPs Discordant + + – – + – + –

a

b

c

Fig. 1 Workﬂow of the proxy-phenotype analyses. Notes: Educational attainment (EA) and schizophrenia (SZ) GWAS results are based on the analyses reported in refs.5,8_{. All cohorts that were part of the SZ GWAS were excluded from the meta-analysis on EA. The GRAS data collection was not included in} either the SZ or the EA meta-analysis. Proxy-phenotype analyses were conducted using 8,240,280 autosomal SNPs that passed quality control. Genetic outliers of non-European descent (N = 13 cases) were excluded from the analysis in the GRAS data collection

(5)

public domain (Supplementary Fig.

5 and Supplementary

Data

10 ). Some of the traits were chosen because they are

phenotypically related to SZ (e.g. neuroticism, depressive

symptoms, major depressive disorder, autism and childhood

IQ), while others were less obviously related to SZ (e.g. age at

menarche, intracranial volume and cigarettes per day) or served

as negative controls (height, birth weight, birth length and fasting

(pro)insulin). The power of the LD-aware enrichment test

primarily depends on the GWAS sample size of the target trait

and results of our test would be expected to change as GWAS

sample sizes keep growing. We found LD-aware enrichment of

these SNPs for BIP, neuroticism, childhood IQ and age at

menarche. However, we found no LD-aware enrichment for other

brain-traits that are phenotypically related to SZ, such as

depressive symptoms, subjective well-being, autism and attention

deﬁcit hyperactivity disorder. We also did not ﬁnd LD-aware

enrichment for most traits that are less obviously related to the

brain and our negative controls. Furthermore, one of the novel

SNPs we isolated shows signiﬁcant LD-aware enrichment both

for SZ and for BIP (rs7522116). The results suggest that the loci

identiﬁed by the PPM are not simply related to all (brain) traits.

Instead, they show some degree of phenotype speciﬁcity.

Replication in the GRAS sample. Following our preregistered

analysis plan (

https://osf.io/dnhfk/

), we replicated the PPM

ana-lysis results in the GRAS sample (Supplementary Note

5 and

Supplementary Data

11 ) using polygenic prediction

(Supple-mentary Note

6 , Supplementary Data

12 and Supplementary

Fig.

6 ). The PGS (SZ_132) that is based on the 132 independent

EA lead SNPs that are also nominally associated with SZ (P

EA

<

10

−5

and

P

SZ

< 0.05)

adds

ΔR

2

= 7.54 − 7.01% = 0.53%

predictive accuracy for SZ case–control status to a PGS (SZ_all)

derived from the GWAS on SZ alone (P

= 1.7 × 10

−4

,

Supple-mentary Data

13 , Model 3).

Prediction of SZ measures among patients. To explore the

genetic architecture of speciﬁc SZ measures, we again used our

replication sample (GRAS), which contains exceptionally detailed

measures of SZ symptoms, severity and disease history

4,7,27

. We

focused on years of education, age at prodrome, age at disease

onset, premorbid IQ (approximated by a multiple-choice

voca-bulary test), global assessment of functioning (GAF), the clinical

global impression of severity (CGI-S) as well as positive and

negative symptoms (PANSS positive and negative, respectively)

among SZ patients (N ranges from 903 to 1039, see

Supple-mentary Note

5 ). Consistent with the idea that EA is a predictor

of SZ measures, our phenotypic correlations show that higher

education is associated with later age at prodrome, later onset of

disease and less severe disease symptoms among SZ patients

(Supplementary Note

7 , Supplementary Data

15 and

Supple-mentary Fig.

7 ).

Our most direct test for genetic heterogeneity of SZ is based on

PGS analyses that we performed using the detailed SZ measures

among GRAS patients. If SZ is genetically heterogeneous, there is

potentially relevant information in the sign concordance of

individual SNPs with EA traits that may improve the prediction

of symptoms (Supplementary Note

1 ). We use a simple method

to do this here:

ﬁrst, we construct a PGS for SZ that contains one

SNP per LD-block that is most strongly associated with SZ.

Overall, this score (SZ_all) contains 349,357 approximately

LD-independent SNPs. Next, we split SZ_all into two scores, based on

sign-concordance of the SNPs with SZ and EA. More speciﬁcally,

Table 1 SNPs signi

ﬁcantly associated with schizophrenia after Bonferroni correction

SNP-ID EA beta Signs

concordant SZ adj.R2 (%) SZ OR (Adj.) EAF Powerα = 0.05/506 (%) Chance of direct pleiotropic effect on EA and SZ

Posterior probability of true association with SZ prior belief (π) (%) 90% sets 50% sets 0.1% 1.0% 5.0% 10.0% rs79210963 −0.016 Yes 0.021 0.931 0.89 22.9 H M 75.0 96.8 99.3 99.7 rs7610856 0.013 No 0.022 0.955 0.41 22.8 M M 74.9 96.8 99.3 99.7 rs10896636 0.012 No 0.020 0.956 0.67 17.8 H L 68.7 95.6 99.1 99.5 rs756912 −0.015 Yes 0.022 0.956 0.51 22.7 L L 74.8 96.7 99.3 99.7 rs6449503 0.018 No 0.020 0.961 0.51 12.9 L L 60.0 93.7 98.7 99.3 rs7336518 −0.016 Yes 0.014 0.964 0.13 1.5 M M 13.4 60.6 88.5 93.9 rs143283559 0.014 No 0.017 0.965 0.72 4.6 M L 32.8 83.0 96.1 98.0 rs11210935 0.015 No 0.014 0.973 0.77 1.2 L L 10.9 55.1 86.0 92.5 rs77000541 −0.014 Yes 0.018 0.974 0.33 1.6 L L 14.1 62.2 89.2 94.3 rs2819344 0.014 No 0.017 0.983 0.62 0.3 H L 3.0 23.3 60.4 75.3 rs4500960 −0.013 No 0.017 1.017 0.47 0.3 L L 3.0 23.3 60.4 75.3 rs28360516 _−0.012 No 0.013 1.027 0.70 1.4 M L 12.6 59.0 87.8 93.5 rs7522116 0.011 Yes 0.015 1.029 0.56 3.0 M L 23.8 75.8 94.0 96.9 rs7593947 0.014 Yes 0.018 1.040 0.51 12.5 M L 59.1 93.5 98.6 99.3 rs11694989 0.011 Yes 0.021 1.044 0.43 17.9 L L 68.8 95.7 99.1 99.5 rs320700 0.013 Yes 0.024 1.054 0.65 36.4 H M 85.3 98.3 99.7 99.8 rs3957165 0.015 Yes 0.020 1.056 0.83 14.7 L L 63.6 94.6 98.9 99.4 rs10791106 0.011 Yes 0.026 1.056 0.54 46.9 L L 89.9 98.9 99.8 99.9 rs2992632 0.016 Yes 0.025 1.060 0.74 36.8 M L 85.5 98.3 99.7 99.8 rs10773002 0.022 Yes 0.043 1.087 0.28 91.0 L L 99.0 99.9 100.0 100.0 rs4378243 0.019 Yes 0.044 1.112 0.85 91.5 L L 99.1 99.9 100.0 100.0

Notes: The SNPs in the table are ordered by their odds ratio (OR) on schizophrenia (SZ). Effect sizes for SZ (in R2_{and OR) are downward adjusted for the winner}_{’s curse}25_{. EA (beta) is the standardized}

beta of a SNP for educational attainment (EA) GWAS. R2_{was approximated from the winner’s curse adjusted OR ratios, using the formulas described in Methods section. The winner’s curse adjustment}

took into account that only SNPs with P = 0.05/506 were selected. SNPs with concordant effects on both SZ and EA are marked as ‘yes’ in the sign concordance column. EAF is the effect allele frequency in the SZ GWAS data. Power calculations assumed that the available GWAS sample size for SZ for each SNP consisted of 34,409 cases and 45,670 controls. The chance that a SNP has direct pleiotropic

effects on EA and SZ has been evaluated with PAINTOR using sets of SNPs that have a cumulative probability of 90% or 50% to include the causal variant (see Methods and Supplementary Note3). The

posterior probability that these SNPs are truly associated with SZ was calculated using the Bayesian procedure developed by Rietveld et al.25_{. SNPs highlighted in bold are associations for SZ that have} not been emphasized in the previous literature. H high, M medium, L low.

(6)

one score contains all estimated SZ effects of SNPs that have

concordant signs for both traits (174,734 SNPs with

++ or −−

on both traits, Concordant) while the other contains the estimated

SZ effects of the remaining SNPs with discordant effects (174,623

SNPs with

+− or −+, Discordant). Note that splitting the SZ_all

score this way is not expected to improve the prediction of

symptoms if they share the same genetic architecture (i.e. if SZ

was a genetically homogenous trait). We test this null hypothesis

with an F test that compares the predictive performance of

models that include (i) the SZ_all and the EA score (EA_all) and

(ii) the Concordant, Discordant, and EA_all scores

(Supplemen-tary Note

1 ). We also compare the performance of both of these

models to a baseline that only includes the SZ_all score as a

relevant predictor.

We found that the EA_all PGS is associated with years of

education (P

= 1.0 × 10

−6

) and premorbid IQ (P

= 2.7 × 10

−4

)

among SZ patients (Table

2 ). Consistent with earlier results

4

_{, we}

also found that none of the SZ measures can be predicted by the

PGS for SZ (SZ_all, Table

2 ). However, splitting the PGS for SZ

based on the sign-concordance of SNPs with EA (Concordant and

Discordant) increased predictive accuracy signiﬁcantly for

severity of disease (GAF (p

F

= 0.023)) and symptoms (PANSS

negative (p

F

= 0.007)) (Table

2 ). This increase in predictive

accuracy is evidence for genetic heterogeneity of SZ

(Supplemen-tary Note

1 ). Speciﬁcally, our results indicate that SZ patients with

a high genetic propensity for EA have better GAFs and less severe

negative symptoms (PANSS negative). However, if the high

genetic predisposition for EA is primarily due to loci that also

increase the risk for SZ (i.e. high values on the Concordant score),

this protective effect is attenuated. We repeated these analyses

excluding patients who were diagnosed with schizoaffective

disorder (SD, N

= 198) and found similar results, implying that

our

ﬁndings are not only due to the presence of patients with SD

(Supplementary Note

8 , Supplementary Data

18 ).

We note that this implementation of our test for heterogeneity

of SZ (Supplementary Note

1 ) is based on a conservative pruning

Significance of association (–log

10 ( P value)) 40 30 20 10 0 1 2 3 4 5 6 7 8 Chromosome Schizophrenia Observed –log 10 ( P value) 9 10 11 12 13 14 15 17 19 21 22 20 18 16 Lead SNPs (PSZ < 0.05) EA Lead SNPs (PSZ < 9.85×10 –5 ) P value = 1×10–5 8 6 4 2 Enrichment P value: 6.87×10–10 Sign concordance: 52%

Expected –log10 (P value)

2 0 4 6 8 0 rs4378243 (–log10 (P value) = 12.3) rs10773002 (–log10 (P value) = 11.5) rs10791106 rs2992632 rs756912 rs3957165 rs10896636 rs6449503 rs4500960 rs143283559 rs7522116 rs11210935 rs28360516 rs77000541 rs2819344rs7336518 rs7593947 rs11694989 rs79210963 rs7610856 rs320700

a

b

Fig. 2 Results of the proxy-phenotype analyses. Notes: a Manhattan plot for educational attainment (EA) associations (n = 363,502). The x axis is the chromosomal position, and the y axis is the significance on a −log10scale (two-sided). The black dashed line shows the suggestive significance level of 10−5that we specified in our preregistered analysis plan. Black and magenta crosses identify EA-associated lead-SNPs that are also associated with SZ at nominal or Bonferroni-adjusted significance levels, respectively. b Q–Q plot of the 506 EA-associated SNPs for schizophrenia (SZ) (n = 34,409 cases and n = 45,670 controls). SNPs with concordant effects on both phenotypes are pink, and SNPs with discordant effects are blue. SNPs outside the grey area (21 SNPs) pass the Bonferroni-corrected signi_{ficance threshold that corrects for the total number of SNPs we tested (P < 0.05/506 = 9.88 × 10}−5) and are labelled with their rs numbers. Observed and expected P values are on a −log10scale. For the sign concordance test: P = 0.40, two-sided

(7)

algorithm that controls for LD both within and across the

Concordant and Discordant scores. This limits the number of

genetic markers in both of these scores, their expected predictive

accuracy and the power of the test. As an alternative, we also used

a less conservative approach that only prunes for LD within

scores, yielding 260,441 concordant and 261,062 discordant

SNPs. Split scores based on this extended set of SNPs have higher

predictive accuracy for all the SZ measures that we analysed

(Supplementary Data

22 ), reaching

ΔR

2

_{= 1.12% (p}

F

= 0.0004)

for PANSS negative.

Finally, we show that randomly splitting the SZ_all score does

not yield any gains in predictive accuracy (Supplementary

Data

19 ).

Genetic differences between SZ and bipolar. The ongoing

debate about what constitutes the difference between SZ and

BIP

10–14

suggests an additional possibility to test for genetic

heterogeneity among SZ cases. While SZ and BIP share

psycho-tic symptoms such as hallucinations and delusions, scholars have

argued that SZ should be perceived as a neurodevelopmental

disorder in which cognitive deﬁcits precede the development of

psychotic symptoms, while this is not the case for BIP

10–14

_.

However, cognitive deﬁcits during adolescence are currently not a

diagnostic criterion that formally differentiates SZ from BIP. As a

result, many patients who are formally diagnosed with SZ did not

suffer from cognitive impairments in their adolescent years, but

their disease aetiology may be different from those who do. These

differences in disease aetiology may be visible in how the

non-shared part of the genetic architecture of SZ and BIP is related to

measures of cognition, such as EA and childhood IQ.

We tested this by using genome-wide inferred statistics

(GWIS)

41

_{to obtain GWAS regression coefﬁcients and standard}

errors for SZ that are

‘purged’ of their genetic correlation with

BIP and vice versa (yielding

‘unique’ SZ

(min BIP)

and

‘unique’

BIP

(min SZ)

results, respectively). We repeated the look-up of the

EA-associated lead SNPs in those summary statistics and

ﬁnd that

the enrichment is weaker than in the SZ GWAS results that did

not control for genetic overlap between SZ and BP

(Supplemen-tary Note

9 ).

We then computed genetic correlations of these GWIS results

with EA, childhood IQ and (as a non-cognitive control trait)

neuroticism using bivariate LD score regression

42

_{, and compared}

the results to those obtained using ordinary SZ and BIP GWAS

results (Supplementary Note

10 ).

In line with earlier

ﬁndings

8,42

, we see a positive genetic

correlation of ordinary SZ and BIP with EA. However, the genetic

correlations between

‘unique’ SZ

(min BIP)

with EA and childhood

IQ are negative and signiﬁcant (r

g

= −0.16, P = 3.88 × 10

−4

and

r

g

= −0.31, P = 6.00 × 10

−3

, respectively), while the genetic

correlations of

‘unique’ BIP

(min SZ)

with EA and IQ remain

positive (r

g

≈ 0.3) (Fig.

3 , Supplementary Data

24 ). Thus, the

slightly positive genetic correlation between SZ and EA

8,42

_{can be}

entirely attributed to the genetic overlap between SZ and BIP

41

_{, a}

result recently replicated using genomic structural equation

modelling

43

_{. Overall, these results add to the impression that}

current clinical diagnoses of SZ aggregate over various

non-identical disease aetiologies.

Simulating assortative mating. Finally, simulations show that

assortative mating is unlikely to be a major cause of the observed

level of genetic dependence between EA and SZ (Supplementary

Note

11 , Supplementary Fig.

9 ).

Discussion

We explored the genetic relationship between EA and SZ using

large, non-overlapping GWAS samples. Our results show that

EA-associated SNPs are much more likely to be associated with

SZ than expected by chance, i.e. both traits are genetically

dependent. Overall, we isolated 21 genetic loci that are credibly

associated with SZ by using EA as a proxy-phenotype, including

two novel candidate genes, FOXO6 and SLITRK1. Furthermore,

we showed that EA GWAS results help to predict future GWAS

ﬁndings for SZ in even larger samples.

Table 2 Polygenic prediction of schizophrenia measures in the GRAS patient sample

Years of educationa Age at prodrome Age at disease onset Premorbid IQa

GAFb CGI-Sb PANSS positiveb PANSS negativeb Baseline model SZ_all Stand. beta 0.001 −0.041 −0.056 −0.063 −0.024 0.041 0.033 0.043 P value 0.976 0.297 0.129 0.090 0.510 0.249 0.364 0.253 EA_all Stand. beta 0.182** 0.005 _−0.002 0.149** 0.068* _−0.057 0.001 _−0.051 P value 4.4 × 10−09 0.884 0.961 7.2 × 10−6 0.029 0.065 0.981 0.107 Adj. R² 0.0612 0.0023 0.0047 0.0417 0.0655 0.0816 0.0711 0.0243 ΔAdj. R²c _0.0312 _−0.0010 _−0.0009 _0.0209 _0.0035 _0.0023 _−0.0010 _0.0015 Split model Concordant Stand. beta −0.013 −0.019 −0.031 −0.043 −0.096 * _0.050 _0.079 _0.125** P value 0.751 0.665 0.456 0.326 0.022 0.232 0.059 0.0036 Discordant Stand. beta 0.014 _−0.030 _−0.035 _−0.034 0.066 <0.001 _−0.039 _−0.072 P value 0.730 0.515 0.409 0.437 0.112 0.996 0.351 0.090 EA_all Stand. beta 0.191** _0.002 _−0.002 _0.153** _0.122** _−0.074 _−0.039 _−0.118** P value 1.0 × 10−06 0.965 0.953 2.7×10−4 0.002 0.058 0.319 0.003 Adj. R² 0.0604 0.0012 0.0037 0.0406 0.0694 0.0811 0.0728 0.0306 ΔAdj. R²c _0.0304 _−0.0021 _−0.0019 _0.0198 _0.0074 _0.0018 _0.0007 _0.0078 n 1039 915 1043 903 1010 1014 1009 1002 ΔR² (Split−baseline model) −0.0008 −0.0011 −0.0010 −0.0011 0.0039 _{−0.0005 0.0017} 0.0063 P value from F testd _0.698 _0.907 _0.968 _0.891 _0.023* _0.479 _0.098 _0.007** Notes: Linear regression using the ﬁrst ten genetic principal components as control variablesa_{Age of onset was included as covariate}b_{Medication was included as covariate}c

Change in Adj. R2_{of the}

models compared to a model that only contains the SZ_all score and the control variablesd_{P value from F test refers to improvement in split model compared to baseline model *Signiﬁcance at P o 0.05}

(8)

Biological annotation of a broader set of SNPs that are jointly

associated with EA (P

EA

< 10

−5

) and SZ (P

SZ

< 0.05) points to

neurogenesis and synapse formation as potentially important

pathways that may inﬂuence both traits.

However, the genetic loci that are associated with both traits do

not follow a systematic sign pattern that would correspond to a

strong positive or negative genetic correlation. Our follow-up

analyses demonstrated that this pattern of strong genetic

dependence but weak genetic correlation between EA and SZ

cannot be fully explained by LD or assortative mating.

Instead, our results are most consistent with the idea that EA

and SZ are both genetically heterogeneous traits that aggregate

over various subphenotypes or symptoms with non-identical

genetic architectures. Speciﬁcally, our results suggest that current

SZ diagnoses aggregate over at least two disease subtypes: one

part resembles BIP and high IQ (possibly associated with

Con-cordant SNPs), where better cognition may also be genetically

linked to other BIP features such as higher energy and drive,

while the other part is a cognitive disorder that is independent of

BIP (possibly inﬂuenced by Discordant SNPs). This latter subtype

bears similarity with Kraepelin’s description of dementia

prae-cox

11

_{. Overall, our pattern of results resonates with the idea that}

cognitive deﬁcits in early life may be an important differentiating

factor between patients with BIP versus SZ psychosis.

Moreover, splitting the PGS for SZ into two scores based on the

sign concordance of SNPs with EA enables the prediction of

disease symptoms and severity from genetic data for the

ﬁrst time

to some extent. We showed that this result is not driven by

patients with SD and it cannot be repeated by randomly splitting

the SZ score. Obviously, further replication of our results in other

samples with high-quality SZ measures would be highly desirable.

The many sign-concordant loci that increase the risk for SZ but

also improve the chance for higher education point to possible

side-effects of pharmacological interventions that may aim to

target biological pathways that are implicated by pleiotropic loci.

Indeed, exploring pleiotropic patterns of disease-associated genes

across a broad range of phenotypes (including social-scientiﬁc

ones such as EA or subjective well-being

26

_{) may be a viable}

strategy to identify possible side-effects of new pharmacological

products at early stages of drug development in the future.

Although the complexity of SZ remains astonishing, our study

contributes to unravelling this complexity by starting at a genetic

level of analysis using well-powered GWAS results. Our results

provide some hope that a psychiatric nosology that is based on

biological causes rather than pure phenotypical classiﬁcations

may be feasible in the future. Studies that combine well-powered

GWASs of several diseases and from phenotypes that represent

variation in the normal range such as EA are likely to play an

important part in this development. However, deep phenotyping

of large patient samples will be necessary to link GWAS results

from complex outcomes such as EA and SZ to speciﬁc biological

disease subgroups.

Methods

GWAS. The principal investigators of all cohorts obtained informed consent from all study participants and approval from Institutional Review Boards (IRB) at their respective institution. We obtained GWAS summary statistics on EA from the SSGAC. The results are based on Okbay et al.8_{, including the UK Biobank. The} PGC shared GWAS summary statistics on SZ with us that were reported in Ripke et al.5_{, but excluded data from our replication sample (GRAS), yielding a total} sample size of n= 34,409 cases and n = 45,670 controls.

All cohorts that were part of both studies5,8_{were excluded from the} meta-analysis on EA, yielding non-overlapping GWAS samples and nEA= 363,502. The original EA resultsﬁle contained 12,299,530 genetic markers, compared to 17,221,718 in the SZ resultsﬁle.

We applied the following additional quality control steps:

1. To maximize statistical power, we excluded SNPs that were missing in large parts of the two samples. Speciﬁcally, we continued with SNPs that were available in at least 19 out of 50 cohorts in the SZ results5_{(the actual N per} SNP was not provided in the SZ GWAS summary statistics) and in N > 200,000 in the EA meta-analysis8_{. This step excluded 3,778,914 and 6,369,138} genetic markers for EA and SZ, respectively.

2. We dropped SNPs that were not available in both GWAS resultsﬁles. This step restricted our analyses to the set of available genetic markers that passed the quality-controlﬁlters in both the EA and the SZ GWAS results, leaving us with 8,403,560 autosomal SNPs.

3. We dropped six SNPs with non-standard alleles (i.e. not A, C, T or G) and two SNPs with mismatched effective alleles. Furthermore, we dropped 163,272 SNPs in thefirst and the 99th percentile of the distribution of differences in MAF in the two resultsfiles. This final step eliminated SNPs that were likely to be affected by coding errors, strandflips or substantial differences in MAF in the EA and SZ samples.

The remaining 8,240,280 autosomal SNPs were used in the proxy-phenotype and prediction analyses.

Proxy-phenotype method. Look-up: We conducted our proxy-phenotype analyses following a pre-registered analysis plan (https://osf.io/dnhfk/), using the 8,240,280 autosomal SNPs that passed quality control. We selected 10−5as the default P value threshold to identify EA-associated SNPs prior to carrying out the proxy-phenotype analyses (Supplementary Note2).

To select approximately independent SNPs from the EA GWAS results, we applied the clumping procedure in PLINK version 1.944,45_{using r}2_{> 0.1 and}

1,000,000 kb as the clumping parameters and the 1000 Genomes phase 1 version 3 European reference panel46_{to estimate LD among SNPs. This algorithm assigns} the SNP with the smallest P value as the lead SNP in its‘clump’. All SNPs in the vicinity of 1,000,000 kb around the lead SNP that are correlated with it at r2_{> 0.1}

are assigned to this clump. The next clump is formed around the SNP with the next smallest P value, consisting of SNPs that have not been already assigned to theﬁrst clump. This process is iterated until no SNPs remain with P < 10−5, leading to 506 approximately independent EA-associated lead SNPs. 108 of the 506 EA-associated lead SNPs are genome-wide signiﬁcant (P < 5 × 10−8_).

We looked up the SZ GWAS results for these 506 EA-associated lead SNPs. Results for all 506 SNPs are reported in Supplementary Data2and Fig.2.

In order to investigate the novelty of theﬁndings, we extracted all the SNPs in LD with these 21 SNPs at r2_{≥ 0.1 with a maximum distance of 1000 kb using the}

1000 Genomes phase 1 European reference panel.

Bayesian credibility of results: We probed the credibility of our proxy-phenotype association results using a heuristic Bayesian calculation following Rietveld et al. (Supplementary Information pp.13–15)25_{. We focus on the 21} EA-associated lead SNPs that are also EA-associated with SZ after Bonferroni correction.

Educational attainment Schizophrenia (SZ) 1.0

*

rg 0.8 0.6 0.4 0.2 0.0 –0.2 –0.4 –0.6 –0.8 –1.0

GWIS schizophrenia_{(min BIP)} Bipolar disorder (BIP) GWIS bipolar disorder(min SZ)

Childhood IQ Neuroticism Educational attainment Schizophrenia (SZ) GWIS schizophrenia (min BIP)

Bipolar disorder (BIP) GWIS bipolar disorder

(min SZ)

Childhood IQNeuroticism

Fig. 3 Genetic correlations of GWAS and GWIS results. Notes: The heatmap displays the genetic correlations across seven sets of GWAS or GWIS summary statistics. Genetic correlations were estimated with LD score regression42._{The colour scale represents the genetic correlations}

ranging from–1 (red) to 1 (blue). Asterisks denote signiﬁcant genetic correlations at P value < 0.01

(9)

Bayes’ rule implies that the probability that an association is true given that we observe signiﬁcance is given by

P H 1jt>tα=2¼ P t>tð α=2jH1ÞP Hð Þ1

P t>tð α=2jH1ÞP Hð ÞþP t>t1 ð α=2jH0ÞP Hð Þ0

¼ðpowerðpowerÞðπÞÞ πð ÞþðαÞð1πÞ

‘Power’, as well as the signiﬁcance test, are two-sided, π is the prior belief that the SNP is truly associated andα is the signiﬁcance threshold used for testing (in our case,α =0:05₅₀₆= 9.88 × 10−5).

To calculate power for each SNP, we computed the winner’s curse corrected OR using the procedure described in Rietveld et al. (Supplementary Information pp.7–

13)47_{for the}_{α threshold of 9.88 × 10}−5_{. Because the actual sample size per SNP is}

not reported in the SZ GWAS summary statistics, we furthermore assumed that each SNP was available in the entire sample of 34,409 cases and 45,670 controls (i.e. the PGC results from Ripke et al.5_{excluding the GRAS data collection).}

An important question is which prior beliefs are reasonable starting points for these Bayesian calculations. For an arbitrarily chosen SNP, the most conservative reasonable prior would assume that each truly associated SNP has the same effect size as the strongest effect size that was actually observed in the data. If one divides the SNP-based heritability of the trait by that effect size in R2_{units, one obtains a}

lower bound for the number of SNPs that can be assumed to be truly associated. To aid this line of thinking, we converted the winner’s curse corrected OR of our 21 SNPs into R2_using

R2_¼ _{ffiffiffiffiffiffiffiffiffiffiffiffiffi}d d2_{þ a} p

2

where d is Cohen’s d, which is calculated as d¼ lnðOddsÞ

ffiffiffi 3 p

π

and a is a correction factor that adjusts for the MAF of the SNP. This correction factor is calculated as

a¼ðn1þ n2Þ 2 n1n2 where n1= N × MAF and n2= N × (1 − MAF)48_.

The largest effect size in R2_{that we observe in our results is rs4378243 with}

0.044%. The SNP-based heritability of SZ is≈21%49_{. Thus, if all causal SZ SNPs} would have an effect of R2_{= 0.044%, we would expect that ≈500 truly causal loci}

exist. The chance ofﬁnding any one of them by chance from a set of ≈500,000 independent loci in the human genome is≈0.1%. (Our pruning algorithm of SNPs that passed QC leads to only 223,065 independent loci. Thus, assuming 500,000 independent loci in these calculations is conservative.) However, in reality most truly associated loci for SZ will surely have smaller effects than that. Thus, a prior belief of≈0.1% is certainly too conservative.

Furthermore, the SNPs we investigate are not arbitrary but selected based on their association with another, genetically related cognitive trait (EA) in a very large, independent sample. Thus, a prior belief of 1 or 5% that these SNPs are also associated with SZ is probably more reasonable. As an upper bound, we assume that 10% of all loci are causal. Thus, the chance to pick any one of them by chance would be 10%.

Table1displays the winner’s curse corrected effect size of the 21 EA-associated lead SNPs that are also associated with SZ after Bonferroni correction. It also shows the posterior probability that these SNPs are truly associated with SZ given our results for prior beliefs ranging from 0.1, 1, 5 to 10%. Thirteen of these SNPs have posterior probabilities of being true positives of >50% for even the most conservative prior. For a more realistic prior belief of 5%, all 21 SNPs are likely or almost certain to be true positives.

Sign concordance: We compared the signs of the beta coefﬁcients of the 506 EA lead SNPs (PEA< 10−5) with the beta coefﬁcients for SZ. If the signs were aligned, we assigned a‘1’ to the SNP and ‘0’ otherwise. By chance, sign concordance is expected to be 50%. We tested if the observed sign concordance is different from 50% using the binomial probability test50_{. 263 of the 506 SNPs have the same sign} (52%, P= 0.40, two-sided).

Sign concordance is 58% (P= 0.10, two-sided) in the set of 132 EA lead SNPs that are also nominally signiﬁcant for SZ (PEA< 1 × 10−5and PSZ< 0.05).

Finally, for the 21 SNPs that passed Bonferroni correction for SZ (PEA< 1 × 10−5and PSZ< 9.88 × 10−5), sign concordance is 62% (P= 0.38, twosided).

Raw enrichment factor (not corrected for LD score of SNPs): Because EA and SZ are highly polygenic, we tested for enrichment by taking the actual distribution of P values in the GWAS resultﬁles into account.

Due to the polygenic architecture of both traits, it is expected toﬁnd some EA-associated SNPs that are also EA-associated with SZ just by chance even if both traits are genetically independent. Under this null hypothesis, the expected number of

EA-associated lead SNPs that are also signiﬁcantly associated with SZ is

E_H₀hN_S;EA!SZi¼ NT;EA´ τPEA´ τPSZ

where NT,EAis the total number of independent lead SNPs in the EA GWAS results, andτPEAandτPSZare the shares of SNPs in NT,EAthat have P values for EA and SZ below a certain threshold, respectively.

We deﬁne the raw enrichment factor as

NS;EA!SZ=E NS;EA!SZh i

where NS,EA→SZis the observed independent number of SNPs that pass both the P value thresholds PEAand PSZ.

We obtained NT,EAby applying the clumping procedure described above (PPM) without a P value threshold for EA, leading to 222,289 independent EA lead SNPs in our merged GWAS resultsﬁle. For PEA< 10−5, we found 506 SNP (τP_EA= 506

222;289 = 0.2276%).

The Bonferroni threshold for testing 506 independent hypothesis is PSZ<0:05₅₀₆= 9.88 × 10−5. There are 341 independent SNPs in the SZ results that pass this threshold, thusτPSZ=

341

222;289= 0.1534%. Therefore, we expect [NS,EA→SZ]=

222,289 × 0.2276% × 0.1534%= 0.776 (i.e. less than one) SNP to be jointly associated with both traits under the hull hypothesis of no genetic overlap. At these P value thresholds, we actually observe NS,EA→SZ= 21 SNPs, implying a raw enrichment factor of 21

0:776= 27.

For PSZ< 0.05, we found 17,935 SNP (τP_SZ=_222;28917;935= 8.068%). Thus, [NS,EA→SZ] = 222,289 × 0.2276% × 8.068% = 41. At this more liberal P value threshold, we actually observe NS,EA→SZ= 132 SNPs, implying a raw enrichment factor of132

41=

3.23.

Raw enrichment P value (not corrected for LD score of SNPs): Following Okbay et al.26_{, we performed a non-parametric test of joint enrichment that probes} whether the EA lead SNPs are more strongly associated with SZ than randomly chosen sets of SNPs with MAF within one percentage point of the lead SNP. To perform our test, we randomly drew ten matched SNPs for each of the 506 EA lead SNPs with PEA< 10−5.

We then ranked the 506 × 10 randomly matched SNPs and the original 506 lead EA SNPs by P value and conducted a Mann–Whitney test51_{of the null hypothesis} that the P value distribution of the 506 EA lead SNPs are drawn from the same distribution as the 506 × 10 randomly matched SNPs. We reject the null hypothesis with P= 6.872 × 10−10(Z= 6.169, two-sided). As a negative control test, we also calculated the raw enrichment P value of theﬁrst randomly drawn, MAF-matched set of SNPs against the remaining nine sets, yielding P= 0.17.

Repeating this raw enrichment test for the subset of 21 EA-associated SNPs that remained signiﬁcantly associated with SZ after Bonferroni correction (threshold PSZ<0:05₅₀₆= 9.88 × 10−5_{) yields P}_{= 5.44 × 10}−14_(Z_{= 7.521, two-sided). The}

negative control test based on the raw enrichment P value of theﬁrst randomly drawn, MAF-matched set of SNPs against the remaining nine sets yields P= 0.34. GWAS catalogue look-up. In order to investigate the novelty of the 21 SNP associations that were found signiﬁcant for SZ after Bonferroni correction, reported in Table1, we performed a look-up in the GWAS catalogue52_{(revision 2016-08-25,} downloaded on 2016-08-29,https://www.ebi.ac.uk/gwas/api/search/downloads/ full) with the SNPs and all their‘LD partners’ (i.e. all SNPs with an r2_{> 0.5 within a}

250 kb window). The LD partners were extracted with PLINK44_{using a version of} the 1000G reference panel speciﬁcally harmonized to combine 1000G phase 1 and phase 3 imputed data53_{, and the reference panel has been described previously}26_. The result of the GWAS catalogue look-up is reported in Supplementary Data3. Prediction of future GWAS loci for SZ. To identify LD partners and to clump our GWAS results, we used a threshold of r2_{> 0.1 and a 1,000,000 kb window in the}

1000 Genomes phase 1 version 3 European reference panel. Our SZ summary statistics contained 51,721 approximately independent SNPs with PSZ< 0.05. We identified 21,430 SNPs in LD with the 50 novel SNPs reported in ref.28_{and 54,425} SNPs in LD with the 128 genome-wide significant loci that were previously reported5_{. We removed SNPs in LD with the previously GWAS hits from our} analyses because those SNPs could (by definition) not be identified as novel. The remaining set of 51,528 approximately independent SNPs with PSZ< 0.05 in our SZ GWAS results contained one proxy for each of the 50 novel SNPs in ref.28_{. After} removing SNPs in LD with previous GWAS hits, 110 SNPs with PSZ< 0.05 also exhibited PEA< 10−5in the independent EA GWAS sample. Of those 110 SNPs, six were identified as novel SZ loci in the most recent GWAS dataset expansion28_. Using Fisher’s exact test, we rejected the null hypothesis that the proportion of novel SNPs (6/110 vs 50/51528) is equal in the two sets (P= 2.4 × 10−9_{, two-sided).}

Furthermore, as a robustness check, we performed the analysis again by excluding the SNPs with MAF≤ 0.1 and found similar results (P = 1.2 × 10−6). Thus, we conclude that conditioning GWAS results on SZ with independent GWAS evidence on EA signiﬁcantly outperforms pure chance in predicting GWAS results on SZ from even larger samples.