Genome-wide association study results for
educational attainment aid in identifying genetic
heterogeneity of schizophrenia
V. Bansal
1,2,3
, M. Mitjans
1,4
, C. A. P. Burik
5,6,7
, R.K. Linnér
5,6,7
, A. Okbay
5,7
, C.A. Rietveld
6,8
,
M. Begemann
1,9
, S. Bonn
2,3
, S. Ripke
10,11,12
, R. de Vlaming
5,7
, M. G. Nivard
13
,
H. Ehrenreich
1,4
& P. D. Koellinger
5,6,7
Higher educational attainment (EA) is negatively associated with schizophrenia (SZ).
How-ever, recent studies found a positive genetic correlation between EA and SZ. We investigate
possible causes of this counterintuitive
finding using genome-wide association study results
for EA and SZ (N = 443,581) and a replication cohort (1169 controls; 1067 cases) with deeply
phenotyped SZ patients. We
find strong genetic dependence between EA and SZ that cannot
be explained by chance, linkage disequilibrium, or assortative mating. Instead, several genes
seem to have pleiotropic effects on EA and SZ, but without a clear pattern of sign
con-cordance. Using EA as a proxy phenotype, we isolate FOXO6 and SLITRK1 as novel candidate
genes for SZ. Our results reveal that current SZ diagnoses aggregate over at least two disease
subtypes: one part resembles high intelligence and bipolar disorder (BIP), while the other part
is a cognitive disorder that is independent of BIP.
DOI: 10.1038/s41467-018-05510-z
OPEN
1Clinical Neuroscience, Max Planck Institute of Experimental Medicine, Hermann-Rein-Straße 3, 37075 Göttingen, Germany.2Research Group for
Computational Systems Biology, German Center for Neurodegenerative Diseases (DZNE), Von-Siebold-Straße 3A, 37075 Göttingen, Germany.3Institute of Medical Systems Biology, Center for Molecular Neurobiology, University Clinic Hamburg-Eppendorf, Falkenried 94, 20251 Hamburg, Germany.4DFG Research Center for Nanoscale Microscopy and Molecular Physiology of the Brain (CNMPB), Humboldtallee 23, 30703 Göttingen, Germany.5Complex Trait Genetics, Vrije Universiteit Amsterdam, De Boelelaan 1085 B-631, 1081 HV Amsterdam, Netherlands.6Institute for Behavior and Biology, Erasmus University Rotterdam, P.O. Box 1738, 3000 DR Rotterdam, Netherlands.7School of Business and Economics, Department of Economics, De Boelelaan 1105, 1081 HV Amsterdam, Netherlands.8Erasmus School of Economics, Erasmus University Rotterdam, P.O. Box 1738, 3000 DR Rotterdam, Netherlands.9Department of Psychiatry & Psychotherapy, University of Göttingen, Von-Siebold-Straße 5, 37075 Göttingen, Germany.10Analytic and Translational Genetics Unit, Massachusetts General Hospital, 02114 MA Boston, USA.11Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, 02142 MA Cambridge, USA.12Department of Psychiatry and Psychotherapy, Charité-Universitätsmedizin Berlin, Campus Mitte, Berlin 10117, Germany.13Department of
Biological Psychology, Vrije Universiteit Amsterdam, van der Boechorststraat 1, 1081 BT Amsterdam, Netherlands. These authors contributed equally: V. Bansal, M. Mitjans. These authors jointly supervised this work: M.G. Nivard, H. Ehrenreich, P.D. Koellinger. Correspondence and requests for materials should be addressed to P.D.K. (email:p.d.koellinger@vu.nl)
123456789
S
chizophrenia (SZ) is the collective term used for a severe,
highly heterogeneous and costly psychiatric disorder that is
caused by environmental and genetic factors
1–4. A
genome-wide association study (GWAS) by the Psychiatric Genomics
Consortium (PGC) identified 108 genomic loci that are associated
with SZ
5. These 108 loci jointly account for
≈3.4% of the variation
on the liability scale for SZ
5, while all single-nucleotide
poly-morphisms (SNPs) that are currently measured by SNP arrays
capture
≈64% (s.e. = 8%) of the variation in liability for the
dis-ease
6. This implies that many genetic variants with small effect
sizes contribute to the heritability of SZ, but most of them are
unidentified as of yet. A polygenic score (PGS) based on all SNPs
currently accounts for 4–15% of the variation on the liability scale
for SZ
5.
Yet, this PGS does not predict any differences in symptoms or
severity of the disease among SZ patients
4. Partly, this could be
because the clinical disease classification of SZ spans several
different behavioural and cognitive traits that may not have
identical genetic architectures. Therefore, identifying additional
genetic variants and understanding through which pathways they
are linked with the clinical diagnosis of SZ is an important step in
understanding the aetiologies of the
‘schizophrenias’
7. However,
GWAS analyses of specific SZ symptoms would require very large
sample sizes to be statistically well-powered, and the currently
available data sets on deeply phenotyped SZ patients are not yet
large enough for this purpose.
Here, we use an alternative approach to make progress with
data that is readily available—by combining GWAS for SZ and
educational attainment (EA). Previous studies suggest a complex
relationship between EA and SZ
8that may be used to gain
additional insights into the genetic architecture of SZ and its
symptoms. In particular, phenotypic data seem to suggest a
negative correlation between EA and SZ
9. For example, SZ
patients with lower EA typically show an earlier age of disease
onset, higher levels of psychotic symptomatology and worsened
global cognitive function
9. In fact, EA has been suggested to be a
measure of premorbid function and a predictor of outcomes in
SZ. Moreover, it has been forcefully argued that retarded
intel-lectual development, global cognitive impairment during
child-hood and bad school performance should be seen as core features
of SZ that precede the development of psychotic symptoms and
differentiate SZ from bipolar disorder (BIP)
10–14. Furthermore,
credible genetic links between SZ and impaired cognitive
per-formance have been found
15.
In contrast to these
findings, recent studies using large-scale
GWAS results identified a small, but positive genetic correlation
between EA and SZ (ρ
EA,SZ= 0.08)
8, and higher PGS values for
SZ have been reported to be associated with creativity and
greater EA
16. Other statistically well-powered studies found
that a high intelligence quotient (IQ) has protective effects
against SZ
17and reported a negative genetic correlation
between IQ and SZ (ρ
IQ,SZ= –0.2)
18, suggesting the possibility
that genetic effects that contribute to EA but not via IQ
are responsible for the observed positive genetic correlation
between SZ and EA.
Indeed, previous research by the Social Science Genetic
Asso-ciation Consortium (SSGAC)
8already demonstrated that the
effect of the EA-PGS on years of schooling is mediated by several
individual characteristics that have imperfect or no genetic
cor-relation with each other, including higher IQ, higher openness
and higher conscientiousness. These different factors that
con-tribute to EA seem to be related to SZ and its symptoms in
complex ways
19–21. For example, differences in openness have
been reported to differentiate between patients diagnosed with SZ
spectrum personality disorders (higher openness) from patients
diagnosed with SZ (lower openness), while conscientiousness
tends to be reduced among patients of both disorders compared
to healthy controls
19.
The contributing factors to EA that have previously been
iden-tified by the SSGAC (i.e. IQ, openness and conscientiousness)
8are
phenotypically and genetically related, but by no means
identical
22,23. Specifically, the Cognitive Genomics Consortium
(COGENT) reported a moderate genetic correlation between IQ
and openness (r
g= 0.48, P = 3.25 × 10
–4), but only a small genetic
correlation of IQ and conscientiousness of 0.10 that was
indis-tinguishable from zero (r
g= 0.10, P = 0.46)
24. Therefore, it is
appropriate to think of EA as a genetically heterogeneous trait
that can be decomposed into subphenotypes that have imperfect
genetic correlations with each other. If the various symptoms of
SZ also have non-identical genetic architectures, this could result
in a pattern where both EA and SZ share many genetic loci, but
without a clear pattern of sign concordance and with seemingly
contradictory phenotypic and genetic correlation results.
To explore this hypothesis and to discern it from alternative
explanations, we perform a series of statistical genetic analyses
using large-scale GWAS results for SZ and EA from
non-overlapping samples. We start by characterizing the genetic
relationship between both traits by using EA as a
‘proxy
pheno-type’
25for SZ. We annotate possible biological pathways, tissues
and cell types implied by genetic variants that are associated with
both traits and explore to what extent these variants are also
enriched for association with other traits. We test if the genetic
relationship between EA and SZ can be explained by chance,
linkage disequilibrium (LD) or assortative mating. Furthermore,
we investigate the hypothesis that the part of SZ that is different
from BIP is a neurodevelopmental disorder, whereas the part of
SZ that overlaps with BIP is not. Finally, we develop a formal
statistical test for genetic heterogeneity of SZ using a polygenic
prediction framework that leverages both the SZ and the EA
GWAS results. Together, our analysis suggest that current SZ
diagnoses aggregate over at least two disease subtypes: one part
resembles BIP and high IQ, while the other part is a cognitive
disorder that is independent of BIP.
Results
Genetic dependence and genetic correlation. As a formal
pre-lude to our study, it is conceptually important to differentiate
between genetic dependence and genetic correlation. In our
analyses, genetic dependence means that the genetic variants
associated with EA are more likely to also be associated with SZ
than expected by chance. In contrast, genetic correlation is
defined by the correlation of the (true) effect sizes of genetic
variants on the two traits. Thus, genetic correlation implies a
linear genetic relationship between two traits whereas genetic
dependence does not. Thus, two traits can be genetically
depen-dent even if they are not genetically correlated and vice versa. One
possible cause of a non-linear genetic dependence is that at least
one of the traits is genetically heterogeneous in the sense that it
aggregates across subphenotypes (or symptoms) with
non-identical genetic architectures. Supplementary Note
1
presents a
formal discussion and simulations that illustrate the data patterns
that can emerge.
Proxy-phenotype analyses. We used the proxy-phenotype
method (PPM)
25to illustrate the genetic dependence between
EA and SZ. PPM is a two-stage approach. In the
first stage, a
GWAS on the proxy-phenotype (EA) is conducted. The most
strongly associated loci are then advanced to the second stage,
which tests the association of these loci with the phenotype of
interest (SZ) in an independent sample. If the two traits are
genetically dependent, this two-stage approach can increase the
statistical power for detecting associations for the target trait
because it limits the multiple testing burden for the phenotype of
interest compared to a GWAS
8,25,26.
Our PPM analyses followed a preregistered analysis plan
(
https://osf.io/dnhfk/
) using GWAS results on EA (n
= 363,502)
8and SZ (34,409 cases and 45,670 controls)
5that were obtained
from non-overlapping samples of Europeans. For replication and
follow-up analyses, we used the Göttingen Research Association
for Schizophrenia (GRAS) data collection
27, which has a uniquely
rich and accurate set of SZ measures. The GRAS sample was not
part of either GWAS.
Analyses were performed using 8,240,280 autosomal SNPs that
passed quality controls in both GWAS and additional
filters. We
selected approximately independent lead SNPs from the EA
GWAS that passed the predefined significance threshold of P
EA<
10
–5and looked up their SZ results. To test if EA-associated SNPs
are more strongly associated with SZ than expected by chance
(referred to as
‘raw enrichment’ below), we conducted a
Mann–Whitney test that compares the P
SZvalues of the
EA-associated lead SNPs with the P
SZvalues of a set of randomly
drawn, approximately LD-independent SNPs with similar minor
allele frequencies (MAFs). Fig.
1
presents an overview of the
proxy-phenotype analyses.
The
first-stage GWAS on EA identified 506 loci that passed our
predefined threshold of P
EA< 10
–5(Supplementary Note
2
); 108
of them were significant at the genome-wide level (P
EA< 5 × 10
–8,
see Supplementary Data
2
). Of the 506 EA lead-SNPs, 132 are
associated with SZ at nominal significance (P
SZ< 0.05), and 21 of
these survive Bonferroni correction (P
SZ<
0:05506= 9.88 × 10
−5)
(Table
1
). LD score regression results suggest that the vast
majority of the association signal in both the EA
8and the SZ
5GWAS are truly genetic signals, rather than spurious signals
originating from uncontrolled population stratification. Fig.
2
a
shows a Manhattan plot for the GWAS on EA highlighting SNPs
that were also significantly associated with SZ (black crosses for
P
SZ< 0.05, magenta crosses for P
SZ= 9.88 × 10
–5).
A Q–Q plot of the 506 EA lead SNPs for SZ is shown in Fig.
2
b.
Although the observed sign concordance of 52% is not
significantly different from a random pattern (P = 0.40), we find
3.23 times more SNPs in this set of 506 SNPs that are nominally
significant for SZ than expected given the distribution of the
P values in the SZ GWAS results (raw enrichment P
= 6.87 ×
10
−10). The observed enrichment of the 21 EA lead SNPs that
pass Bonferroni correction for SZ (P
SZ<
0:05506= 9.88 × 10
−5) is
even more pronounced (27 times stronger, P
= 5.44 × 10
−14).
The effect sizes of these 21 SNPs on SZ are small, ranging from
odds ratio (OR)
= 1.02 (rs4500960) to OR = 1.11 (rs4378243)
after correction for the statistical winner’s curse
25(Table
1
). We
calculated the probability that these 21 SNPs are truly associated
with SZ using a heuristic Bayesian method that takes the winner’s
curse corrected effect sizes, statistical power and prior beliefs into
account
25. Applying a reasonable prior belief of 5%, we
find that
all 21 SNPs are likely or almost certain to be true positives.
Novel SZ loci. Of the 21 variants we identified, 12 are in LD with
loci previously reported by the PGC
5and two are in the major
histocompatibility complex region on chromosome (chr) 6 and
were therefore not separately reported in that study. Three of the
variants we isolated (rs7610856, rs143283559 and rs28360516)
were independently found in a meta-analysis of the PGC results
5with another large-scale sample which identified 50 novel SZ
SNPs
28. Two of the 21 variants (rs756912, rs7593947) are in LD
with loci recently reported in a study that also compared GWAS
findings from EA and SZ using smaller samples and a less
con-servative statistical approach
29. The remaining two SNPs we
identified here (rs7336518 on chr13 and rs7522116 on chr1) add
to the list of empirically plausible candidate loci for SZ.
Using a proportions test, we compared the ratio of novel SNPs
from ref.
28included in our list of 132 loci that are jointly
associated with EA and SZ (P
EA< 10
−5and P
SZ< 0.05, yielding 6
loci) with the ratio observed in all remaining approximately
independent loci with P
SZ< 0.05 in our SZ GWAS results. We
found that the proportion of novel SZ SNPs is higher among the
132 loci that are informed by the EA GWAS results (Fisher’s
exact test P
= 2.4 × 10
−9, two-sided). Thus, using EA as a
proxy-phenotype for SZ helped to predict the novel genome-wide
significant findings reported in ref.
28, illustrating the power of the
proxy-phenotype approach.
Detection of shared causal loci. The next step in our study was a
series of analyses that aimed to identify reasons for the observed
genetic dependence between EA and SZ and to put the
findings of
the PPM analysis into context. First, we probed if there is
evi-dence that the loci identified by the PPM may tag shared causal
loci for both EA and SZ (i.e. pleiotropy), rather than being in LD
with different causal loci for both traits.
For each of the 21 SNPs isolated by our PPM analysis, we
looked at their neighbouring SNPs within a ±500 kb window and
estimated their posterior probability of being causal for EA or SZ
using PAINTOR
30. We then selected two sets of SNPs, each of
which contains the smallest number of SNPs that yields a
cumulative posterior probability of 90% or 50% of containing
the causal locus for EA and SZ. We refer to these as broad sets
(90%) and narrow sets (50%), respectively. Supplementary
Note
3
and Supplementary Data
4
also contain results for the
80 and 65% credibility sets. For each of these sets, we calculated
the posterior probability that it contains the causal locus for the
other trait.
For the broad credibility set analyses (90%), we found 11 loci
with a medium or high credibility to have direct causal effects on
both EA and SZ (including one of the novel SNPs, rs7336518). Six
of these loci have concordant effects on the two traits (i.e.
++ or
−−) while five have discordant effects (i.e. +− or −+, Table
1
).
The analyses of the 50% credible sets are based on a smaller number
of SNPs. This also results in lower probabilities that the SZ set
contains the causal SNP for EA and vice versa. Nevertheless, our
analysis with 50% credible sets show that four specific loci
(rs7610856, rs320700, rs79210963 and rs7336518) had credibility
of more than 15% for the other trait, providing support for the high
(rs320700 and rs79210963) and medium (rs7610856 and
rs7336518) credibility judgments based on the 90% sets. One of
these has a discordant effect (rs7610856) while the others have a
concordant effect on SZ and EA.
Overall, our analyses suggest that some of the 21 SNPs that we
identified by using EA as a proxy-phenotype for SZ are likely to
have direct pleiotropic effects on both traits. Of the most likely
candidates for direct pleiotropic effects, three SNPs have
concordant signs (rs79210963, rs7336518 and rs320700) and
one has discordant signs (rs7610856).
Biological annotations. Biological annotation of the 132 SNPs
that are jointly associated with EA (P
EA< 10
−5) and SZ (P
SZ<
0.05) using DEPICT (Supplementary Note
4
and Supplementary
Data
5
–
7
) points to genes that are known to be involved in
neurogenesis and synapse formation (Supplementary Data
8
).
Some of the indicated genes, including SEMA6D and CSPG5,
have been suggested to play a potential role in SZ
31,32.
For the two novel candidate SNPs reported in this study
(rs7522116 and rs7336518), DEPICT points to the FOXO6
(Forkhead Box O6) and the SLITRK1 (SLIT and NTRK Like
Family Member 1) genes, respectively. FOXO6 is predominantly
expressed in the hippocampus and has been suggested to be
involved in memory consolidation, emotion and synaptic
function
33,34. Similarly, SLITRK1 is also highly expressed in the
brain
35, is particularly localized to excitatory synapses and
promotes their development
36, and it has previously been
suggested to be a candidate gene for neuropsychiatric disorders
37.
LD-aware enrichment across different traits. The raw
enrich-ment P value reported in Fig.
2
b could in principle be due to the
LD structure of the EA lead SNPs that we tested. Specifically, if
these EA lead SNPs have stronger LD with other SNPs in the
human genome than expected by chance, this could cause the
observed enrichment of this set of SNPs on SZ and other traits
because higher LD increases the chance these SNPs would
‘tag’
causal SNPs that they are correlated with
38,39.
To assess the null hypothesis that the observed genetic
dependence between EA and SZ can be entirely explained by
LD patterns in the human genome, we developed an association
enrichment test that corrects for the LD score of each SNP. We
applied this test to the 132 SNPs that are jointly associated with
EA (P
EA< 10
−5) and SZ (P
SZ< 0.05), i.e. the loci that were
identified by using EA as a proxy-phenotype for SZ. LD scores
were obtained from the HapMap 3 European reference panel
40(Supplementary Data
9
). We found significant joint LD-aware
enrichment for SZ (P
= 9.57 × 10
−66), demonstrating that the
genetic dependence between EA and SZ cannot be entirely
explained by LD.
Furthermore, we used this test to explore if these SNPs are
generally enriched for association with all (brain-related)
pheno-types, or whether they exhibit some degree of outcome specificity.
For this purpose, we extended the LD-aware enrichment test to
21 additional traits for which GWAS results were available in the
1) EA GWAS (Okbay et al. 2016)
N = 363,502 individuals
12,299,530 SNPs
2) SZ GWAS (Ripke et al. 2014)
(PGC - GRAS excluded) N = 34,409 cases N = 45,670 controls 17,221,718 SNPs 8,240,280 SNPs Overlap EA_all EA_all pEA < 10 –5 506 SNPs 132 SNPs 21 SNPs Bonferroni pSZ = 0.05/506 2 SNPs novel for SZ SZ_all EA_132 SZ_132 8,240,280 SNPs 3) Proxy-phenotype
4) Schizophrenia diagnosis prediction in the GRAS sample
N = 1054 cases /N = 1169 controls
5) Schizophrenia symptom prediction in the GRAS sample
N = 1054 cases P olygenic r isk scores P olygenic r isk scores P olygenic r isk score P olygenic r isk scores pSZ < 0.05 Concordant SNP effect EA SZ 8,240,280 SNPs Discordant + + – – + – + –
a
b
c
Fig. 1 Workflow of the proxy-phenotype analyses. Notes: Educational attainment (EA) and schizophrenia (SZ) GWAS results are based on the analyses reported in refs.5,8. All cohorts that were part of the SZ GWAS were excluded from the meta-analysis on EA. The GRAS data collection was not included in either the SZ or the EA meta-analysis. Proxy-phenotype analyses were conducted using 8,240,280 autosomal SNPs that passed quality control. Genetic outliers of non-European descent (N = 13 cases) were excluded from the analysis in the GRAS data collection
public domain (Supplementary Fig.
5
and Supplementary
Data
10
). Some of the traits were chosen because they are
phenotypically related to SZ (e.g. neuroticism, depressive
symptoms, major depressive disorder, autism and childhood
IQ), while others were less obviously related to SZ (e.g. age at
menarche, intracranial volume and cigarettes per day) or served
as negative controls (height, birth weight, birth length and fasting
(pro)insulin). The power of the LD-aware enrichment test
primarily depends on the GWAS sample size of the target trait
and results of our test would be expected to change as GWAS
sample sizes keep growing. We found LD-aware enrichment of
these SNPs for BIP, neuroticism, childhood IQ and age at
menarche. However, we found no LD-aware enrichment for other
brain-traits that are phenotypically related to SZ, such as
depressive symptoms, subjective well-being, autism and attention
deficit hyperactivity disorder. We also did not find LD-aware
enrichment for most traits that are less obviously related to the
brain and our negative controls. Furthermore, one of the novel
SNPs we isolated shows significant LD-aware enrichment both
for SZ and for BIP (rs7522116). The results suggest that the loci
identified by the PPM are not simply related to all (brain) traits.
Instead, they show some degree of phenotype specificity.
Replication in the GRAS sample. Following our preregistered
analysis plan (
https://osf.io/dnhfk/
), we replicated the PPM
ana-lysis results in the GRAS sample (Supplementary Note
5
and
Supplementary Data
11
) using polygenic prediction
(Supple-mentary Note
6
, Supplementary Data
12
and Supplementary
Fig.
6
). The PGS (SZ_132) that is based on the 132 independent
EA lead SNPs that are also nominally associated with SZ (P
EA<
10
−5and
P
SZ< 0.05)
adds
ΔR
2= 7.54 − 7.01% = 0.53%
predictive accuracy for SZ case–control status to a PGS (SZ_all)
derived from the GWAS on SZ alone (P
= 1.7 × 10
−4,
Supple-mentary Data
13
, Model 3).
Prediction of SZ measures among patients. To explore the
genetic architecture of specific SZ measures, we again used our
replication sample (GRAS), which contains exceptionally detailed
measures of SZ symptoms, severity and disease history
4,7,27. We
focused on years of education, age at prodrome, age at disease
onset, premorbid IQ (approximated by a multiple-choice
voca-bulary test), global assessment of functioning (GAF), the clinical
global impression of severity (CGI-S) as well as positive and
negative symptoms (PANSS positive and negative, respectively)
among SZ patients (N ranges from 903 to 1039, see
Supple-mentary Note
5
). Consistent with the idea that EA is a predictor
of SZ measures, our phenotypic correlations show that higher
education is associated with later age at prodrome, later onset of
disease and less severe disease symptoms among SZ patients
(Supplementary Note
7
, Supplementary Data
15
and
Supple-mentary Fig.
7
).
Our most direct test for genetic heterogeneity of SZ is based on
PGS analyses that we performed using the detailed SZ measures
among GRAS patients. If SZ is genetically heterogeneous, there is
potentially relevant information in the sign concordance of
individual SNPs with EA traits that may improve the prediction
of symptoms (Supplementary Note
1
). We use a simple method
to do this here:
first, we construct a PGS for SZ that contains one
SNP per LD-block that is most strongly associated with SZ.
Overall, this score (SZ_all) contains 349,357 approximately
LD-independent SNPs. Next, we split SZ_all into two scores, based on
sign-concordance of the SNPs with SZ and EA. More specifically,
Table 1 SNPs signi
ficantly associated with schizophrenia after Bonferroni correction
SNP-ID EA beta Signsconcordant SZ adj.R2 (%) SZ OR (Adj.) EAF Powerα = 0.05/506 (%) Chance of direct pleiotropic effect on EA and SZ
Posterior probability of true association with SZ prior belief (π) (%) 90% sets 50% sets 0.1% 1.0% 5.0% 10.0% rs79210963 −0.016 Yes 0.021 0.931 0.89 22.9 H M 75.0 96.8 99.3 99.7 rs7610856 0.013 No 0.022 0.955 0.41 22.8 M M 74.9 96.8 99.3 99.7 rs10896636 0.012 No 0.020 0.956 0.67 17.8 H L 68.7 95.6 99.1 99.5 rs756912 −0.015 Yes 0.022 0.956 0.51 22.7 L L 74.8 96.7 99.3 99.7 rs6449503 0.018 No 0.020 0.961 0.51 12.9 L L 60.0 93.7 98.7 99.3 rs7336518 −0.016 Yes 0.014 0.964 0.13 1.5 M M 13.4 60.6 88.5 93.9 rs143283559 0.014 No 0.017 0.965 0.72 4.6 M L 32.8 83.0 96.1 98.0 rs11210935 0.015 No 0.014 0.973 0.77 1.2 L L 10.9 55.1 86.0 92.5 rs77000541 −0.014 Yes 0.018 0.974 0.33 1.6 L L 14.1 62.2 89.2 94.3 rs2819344 0.014 No 0.017 0.983 0.62 0.3 H L 3.0 23.3 60.4 75.3 rs4500960 −0.013 No 0.017 1.017 0.47 0.3 L L 3.0 23.3 60.4 75.3 rs28360516 −0.012 No 0.013 1.027 0.70 1.4 M L 12.6 59.0 87.8 93.5 rs7522116 0.011 Yes 0.015 1.029 0.56 3.0 M L 23.8 75.8 94.0 96.9 rs7593947 0.014 Yes 0.018 1.040 0.51 12.5 M L 59.1 93.5 98.6 99.3 rs11694989 0.011 Yes 0.021 1.044 0.43 17.9 L L 68.8 95.7 99.1 99.5 rs320700 0.013 Yes 0.024 1.054 0.65 36.4 H M 85.3 98.3 99.7 99.8 rs3957165 0.015 Yes 0.020 1.056 0.83 14.7 L L 63.6 94.6 98.9 99.4 rs10791106 0.011 Yes 0.026 1.056 0.54 46.9 L L 89.9 98.9 99.8 99.9 rs2992632 0.016 Yes 0.025 1.060 0.74 36.8 M L 85.5 98.3 99.7 99.8 rs10773002 0.022 Yes 0.043 1.087 0.28 91.0 L L 99.0 99.9 100.0 100.0 rs4378243 0.019 Yes 0.044 1.112 0.85 91.5 L L 99.1 99.9 100.0 100.0
Notes: The SNPs in the table are ordered by their odds ratio (OR) on schizophrenia (SZ). Effect sizes for SZ (in R2and OR) are downward adjusted for the winner’s curse25. EA (beta) is the standardized
beta of a SNP for educational attainment (EA) GWAS. R2was approximated from the winner’s curse adjusted OR ratios, using the formulas described in Methods section. The winner’s curse adjustment
took into account that only SNPs with P = 0.05/506 were selected. SNPs with concordant effects on both SZ and EA are marked as ‘yes’ in the sign concordance column. EAF is the effect allele frequency in the SZ GWAS data. Power calculations assumed that the available GWAS sample size for SZ for each SNP consisted of 34,409 cases and 45,670 controls. The chance that a SNP has direct pleiotropic
effects on EA and SZ has been evaluated with PAINTOR using sets of SNPs that have a cumulative probability of 90% or 50% to include the causal variant (see Methods and Supplementary Note3). The
posterior probability that these SNPs are truly associated with SZ was calculated using the Bayesian procedure developed by Rietveld et al.25. SNPs highlighted in bold are associations for SZ that have not been emphasized in the previous literature. H high, M medium, L low.
one score contains all estimated SZ effects of SNPs that have
concordant signs for both traits (174,734 SNPs with
++ or −−
on both traits, Concordant) while the other contains the estimated
SZ effects of the remaining SNPs with discordant effects (174,623
SNPs with
+− or −+, Discordant). Note that splitting the SZ_all
score this way is not expected to improve the prediction of
symptoms if they share the same genetic architecture (i.e. if SZ
was a genetically homogenous trait). We test this null hypothesis
with an F test that compares the predictive performance of
models that include (i) the SZ_all and the EA score (EA_all) and
(ii) the Concordant, Discordant, and EA_all scores
(Supplemen-tary Note
1
). We also compare the performance of both of these
models to a baseline that only includes the SZ_all score as a
relevant predictor.
We found that the EA_all PGS is associated with years of
education (P
= 1.0 × 10
−6) and premorbid IQ (P
= 2.7 × 10
−4)
among SZ patients (Table
2
). Consistent with earlier results
4, we
also found that none of the SZ measures can be predicted by the
PGS for SZ (SZ_all, Table
2
). However, splitting the PGS for SZ
based on the sign-concordance of SNPs with EA (Concordant and
Discordant) increased predictive accuracy significantly for
severity of disease (GAF (p
F= 0.023)) and symptoms (PANSS
negative (p
F= 0.007)) (Table
2
). This increase in predictive
accuracy is evidence for genetic heterogeneity of SZ
(Supplemen-tary Note
1
). Specifically, our results indicate that SZ patients with
a high genetic propensity for EA have better GAFs and less severe
negative symptoms (PANSS negative). However, if the high
genetic predisposition for EA is primarily due to loci that also
increase the risk for SZ (i.e. high values on the Concordant score),
this protective effect is attenuated. We repeated these analyses
excluding patients who were diagnosed with schizoaffective
disorder (SD, N
= 198) and found similar results, implying that
our
findings are not only due to the presence of patients with SD
(Supplementary Note
8
, Supplementary Data
18
).
We note that this implementation of our test for heterogeneity
of SZ (Supplementary Note
1
) is based on a conservative pruning
Significance of association (–log
10 ( P value)) 40 30 20 10 0 1 2 3 4 5 6 7 8 Chromosome Schizophrenia Observed –log 10 ( P value) 9 10 11 12 13 14 15 17 19 21 22 20 18 16 Lead SNPs (PSZ < 0.05) EA Lead SNPs (PSZ < 9.85×10 –5 ) P value = 1×10–5 8 6 4 2 Enrichment P value: 6.87×10–10 Sign concordance: 52%
Expected –log10 (P value)
2 0 4 6 8 0 rs4378243 (–log10 (P value) = 12.3) rs10773002 (–log10 (P value) = 11.5) rs10791106 rs2992632 rs756912 rs3957165 rs10896636 rs6449503 rs4500960 rs143283559 rs7522116 rs11210935 rs28360516 rs77000541 rs2819344rs7336518 rs7593947 rs11694989 rs79210963 rs7610856 rs320700
a
b
Fig. 2 Results of the proxy-phenotype analyses. Notes: a Manhattan plot for educational attainment (EA) associations (n = 363,502). The x axis is the chromosomal position, and the y axis is the significance on a −log10scale (two-sided). The black dashed line shows the suggestive significance level of 10−5that we specified in our preregistered analysis plan. Black and magenta crosses identify EA-associated lead-SNPs that are also associated with SZ at nominal or Bonferroni-adjusted significance levels, respectively. b Q–Q plot of the 506 EA-associated SNPs for schizophrenia (SZ) (n = 34,409 cases and n = 45,670 controls). SNPs with concordant effects on both phenotypes are pink, and SNPs with discordant effects are blue. SNPs outside the grey area (21 SNPs) pass the Bonferroni-corrected significance threshold that corrects for the total number of SNPs we tested (P < 0.05/506 = 9.88 × 10−5) and are labelled with their rs numbers. Observed and expected P values are on a −log10scale. For the sign concordance test: P = 0.40, two-sided
algorithm that controls for LD both within and across the
Concordant and Discordant scores. This limits the number of
genetic markers in both of these scores, their expected predictive
accuracy and the power of the test. As an alternative, we also used
a less conservative approach that only prunes for LD within
scores, yielding 260,441 concordant and 261,062 discordant
SNPs. Split scores based on this extended set of SNPs have higher
predictive accuracy for all the SZ measures that we analysed
(Supplementary Data
22
), reaching
ΔR
2= 1.12% (p
F
= 0.0004)
for PANSS negative.
Finally, we show that randomly splitting the SZ_all score does
not yield any gains in predictive accuracy (Supplementary
Data
19
).
Genetic differences between SZ and bipolar. The ongoing
debate about what constitutes the difference between SZ and
BIP
10–14suggests an additional possibility to test for genetic
heterogeneity among SZ cases. While SZ and BIP share
psycho-tic symptoms such as hallucinations and delusions, scholars have
argued that SZ should be perceived as a neurodevelopmental
disorder in which cognitive deficits precede the development of
psychotic symptoms, while this is not the case for BIP
10–14.
However, cognitive deficits during adolescence are currently not a
diagnostic criterion that formally differentiates SZ from BIP. As a
result, many patients who are formally diagnosed with SZ did not
suffer from cognitive impairments in their adolescent years, but
their disease aetiology may be different from those who do. These
differences in disease aetiology may be visible in how the
non-shared part of the genetic architecture of SZ and BIP is related to
measures of cognition, such as EA and childhood IQ.
We tested this by using genome-wide inferred statistics
(GWIS)
41to obtain GWAS regression coefficients and standard
errors for SZ that are
‘purged’ of their genetic correlation with
BIP and vice versa (yielding
‘unique’ SZ
(min BIP)and
‘unique’
BIP
(min SZ)results, respectively). We repeated the look-up of the
EA-associated lead SNPs in those summary statistics and
find that
the enrichment is weaker than in the SZ GWAS results that did
not control for genetic overlap between SZ and BP
(Supplemen-tary Note
9
).
We then computed genetic correlations of these GWIS results
with EA, childhood IQ and (as a non-cognitive control trait)
neuroticism using bivariate LD score regression
42, and compared
the results to those obtained using ordinary SZ and BIP GWAS
results (Supplementary Note
10
).
In line with earlier
findings
8,42, we see a positive genetic
correlation of ordinary SZ and BIP with EA. However, the genetic
correlations between
‘unique’ SZ
(min BIP)with EA and childhood
IQ are negative and significant (r
g= −0.16, P = 3.88 × 10
−4and
r
g= −0.31, P = 6.00 × 10
−3, respectively), while the genetic
correlations of
‘unique’ BIP
(min SZ)with EA and IQ remain
positive (r
g≈ 0.3) (Fig.
3
, Supplementary Data
24
). Thus, the
slightly positive genetic correlation between SZ and EA
8,42can be
entirely attributed to the genetic overlap between SZ and BIP
41, a
result recently replicated using genomic structural equation
modelling
43. Overall, these results add to the impression that
current clinical diagnoses of SZ aggregate over various
non-identical disease aetiologies.
Simulating assortative mating. Finally, simulations show that
assortative mating is unlikely to be a major cause of the observed
level of genetic dependence between EA and SZ (Supplementary
Note
11
, Supplementary Fig.
9
).
Discussion
We explored the genetic relationship between EA and SZ using
large, non-overlapping GWAS samples. Our results show that
EA-associated SNPs are much more likely to be associated with
SZ than expected by chance, i.e. both traits are genetically
dependent. Overall, we isolated 21 genetic loci that are credibly
associated with SZ by using EA as a proxy-phenotype, including
two novel candidate genes, FOXO6 and SLITRK1. Furthermore,
we showed that EA GWAS results help to predict future GWAS
findings for SZ in even larger samples.
Table 2 Polygenic prediction of schizophrenia measures in the GRAS patient sample
Years of educationa Age at prodrome Age at disease onset Premorbid IQa
GAFb CGI-Sb PANSS positiveb PANSS negativeb Baseline model SZ_all Stand. beta 0.001 −0.041 −0.056 −0.063 −0.024 0.041 0.033 0.043 P value 0.976 0.297 0.129 0.090 0.510 0.249 0.364 0.253 EA_all Stand. beta 0.182** 0.005 −0.002 0.149** 0.068* −0.057 0.001 −0.051 P value 4.4 × 10−09 0.884 0.961 7.2 × 10−6 0.029 0.065 0.981 0.107 Adj. R² 0.0612 0.0023 0.0047 0.0417 0.0655 0.0816 0.0711 0.0243 ΔAdj. R²c 0.0312 −0.0010 −0.0009 0.0209 0.0035 0.0023 −0.0010 0.0015 Split model Concordant Stand. beta −0.013 −0.019 −0.031 −0.043 −0.096 * 0.050 0.079 0.125** P value 0.751 0.665 0.456 0.326 0.022 0.232 0.059 0.0036 Discordant Stand. beta 0.014 −0.030 −0.035 −0.034 0.066 <0.001 −0.039 −0.072 P value 0.730 0.515 0.409 0.437 0.112 0.996 0.351 0.090 EA_all Stand. beta 0.191** 0.002 −0.002 0.153** 0.122** −0.074 −0.039 −0.118** P value 1.0 × 10−06 0.965 0.953 2.7×10−4 0.002 0.058 0.319 0.003 Adj. R² 0.0604 0.0012 0.0037 0.0406 0.0694 0.0811 0.0728 0.0306 ΔAdj. R²c 0.0304 −0.0021 −0.0019 0.0198 0.0074 0.0018 0.0007 0.0078 n 1039 915 1043 903 1010 1014 1009 1002 ΔR² (Split−baseline model) −0.0008 −0.0011 −0.0010 −0.0011 0.0039 −0.0005 0.0017 0.0063 P value from F testd 0.698 0.907 0.968 0.891 0.023* 0.479 0.098 0.007** Notes: Linear regression using the first ten genetic principal components as control variablesaAge of onset was included as covariatebMedication was included as covariatec
Change in Adj. R2of the
models compared to a model that only contains the SZ_all score and the control variablesdP value from F test refers to improvement in split model compared to baseline model *Significance at P o 0.05
Biological annotation of a broader set of SNPs that are jointly
associated with EA (P
EA< 10
−5) and SZ (P
SZ< 0.05) points to
neurogenesis and synapse formation as potentially important
pathways that may influence both traits.
However, the genetic loci that are associated with both traits do
not follow a systematic sign pattern that would correspond to a
strong positive or negative genetic correlation. Our follow-up
analyses demonstrated that this pattern of strong genetic
dependence but weak genetic correlation between EA and SZ
cannot be fully explained by LD or assortative mating.
Instead, our results are most consistent with the idea that EA
and SZ are both genetically heterogeneous traits that aggregate
over various subphenotypes or symptoms with non-identical
genetic architectures. Specifically, our results suggest that current
SZ diagnoses aggregate over at least two disease subtypes: one
part resembles BIP and high IQ (possibly associated with
Con-cordant SNPs), where better cognition may also be genetically
linked to other BIP features such as higher energy and drive,
while the other part is a cognitive disorder that is independent of
BIP (possibly influenced by Discordant SNPs). This latter subtype
bears similarity with Kraepelin’s description of dementia
prae-cox
11. Overall, our pattern of results resonates with the idea that
cognitive deficits in early life may be an important differentiating
factor between patients with BIP versus SZ psychosis.
Moreover, splitting the PGS for SZ into two scores based on the
sign concordance of SNPs with EA enables the prediction of
disease symptoms and severity from genetic data for the
first time
to some extent. We showed that this result is not driven by
patients with SD and it cannot be repeated by randomly splitting
the SZ score. Obviously, further replication of our results in other
samples with high-quality SZ measures would be highly desirable.
The many sign-concordant loci that increase the risk for SZ but
also improve the chance for higher education point to possible
side-effects of pharmacological interventions that may aim to
target biological pathways that are implicated by pleiotropic loci.
Indeed, exploring pleiotropic patterns of disease-associated genes
across a broad range of phenotypes (including social-scientific
ones such as EA or subjective well-being
26) may be a viable
strategy to identify possible side-effects of new pharmacological
products at early stages of drug development in the future.
Although the complexity of SZ remains astonishing, our study
contributes to unravelling this complexity by starting at a genetic
level of analysis using well-powered GWAS results. Our results
provide some hope that a psychiatric nosology that is based on
biological causes rather than pure phenotypical classifications
may be feasible in the future. Studies that combine well-powered
GWASs of several diseases and from phenotypes that represent
variation in the normal range such as EA are likely to play an
important part in this development. However, deep phenotyping
of large patient samples will be necessary to link GWAS results
from complex outcomes such as EA and SZ to specific biological
disease subgroups.
Methods
GWAS. The principal investigators of all cohorts obtained informed consent from all study participants and approval from Institutional Review Boards (IRB) at their respective institution. We obtained GWAS summary statistics on EA from the SSGAC. The results are based on Okbay et al.8, including the UK Biobank. The PGC shared GWAS summary statistics on SZ with us that were reported in Ripke et al.5, but excluded data from our replication sample (GRAS), yielding a total sample size of n= 34,409 cases and n = 45,670 controls.
All cohorts that were part of both studies5,8were excluded from the meta-analysis on EA, yielding non-overlapping GWAS samples and nEA= 363,502. The original EA resultsfile contained 12,299,530 genetic markers, compared to 17,221,718 in the SZ resultsfile.
We applied the following additional quality control steps:
1. To maximize statistical power, we excluded SNPs that were missing in large parts of the two samples. Specifically, we continued with SNPs that were available in at least 19 out of 50 cohorts in the SZ results5(the actual N per SNP was not provided in the SZ GWAS summary statistics) and in N > 200,000 in the EA meta-analysis8. This step excluded 3,778,914 and 6,369,138 genetic markers for EA and SZ, respectively.
2. We dropped SNPs that were not available in both GWAS resultsfiles. This step restricted our analyses to the set of available genetic markers that passed the quality-controlfilters in both the EA and the SZ GWAS results, leaving us with 8,403,560 autosomal SNPs.
3. We dropped six SNPs with non-standard alleles (i.e. not A, C, T or G) and two SNPs with mismatched effective alleles. Furthermore, we dropped 163,272 SNPs in thefirst and the 99th percentile of the distribution of differences in MAF in the two resultsfiles. This final step eliminated SNPs that were likely to be affected by coding errors, strandflips or substantial differences in MAF in the EA and SZ samples.
The remaining 8,240,280 autosomal SNPs were used in the proxy-phenotype and prediction analyses.
Proxy-phenotype method. Look-up: We conducted our proxy-phenotype analyses following a pre-registered analysis plan (https://osf.io/dnhfk/), using the 8,240,280 autosomal SNPs that passed quality control. We selected 10−5as the default P value threshold to identify EA-associated SNPs prior to carrying out the proxy-phenotype analyses (Supplementary Note2).
To select approximately independent SNPs from the EA GWAS results, we applied the clumping procedure in PLINK version 1.944,45using r2> 0.1 and
1,000,000 kb as the clumping parameters and the 1000 Genomes phase 1 version 3 European reference panel46to estimate LD among SNPs. This algorithm assigns the SNP with the smallest P value as the lead SNP in its‘clump’. All SNPs in the vicinity of 1,000,000 kb around the lead SNP that are correlated with it at r2> 0.1
are assigned to this clump. The next clump is formed around the SNP with the next smallest P value, consisting of SNPs that have not been already assigned to thefirst clump. This process is iterated until no SNPs remain with P < 10−5, leading to 506 approximately independent EA-associated lead SNPs. 108 of the 506 EA-associated lead SNPs are genome-wide significant (P < 5 × 10−8).
We looked up the SZ GWAS results for these 506 EA-associated lead SNPs. Results for all 506 SNPs are reported in Supplementary Data2and Fig.2.
In order to investigate the novelty of thefindings, we extracted all the SNPs in LD with these 21 SNPs at r2≥ 0.1 with a maximum distance of 1000 kb using the
1000 Genomes phase 1 European reference panel.
Bayesian credibility of results: We probed the credibility of our proxy-phenotype association results using a heuristic Bayesian calculation following Rietveld et al. (Supplementary Information pp.13–15)25. We focus on the 21 EA-associated lead SNPs that are also EA-associated with SZ after Bonferroni correction.
Educational attainment Schizophrenia (SZ) 1.0
*
*
*
*
*
*
*
*
*
*
*
*
rg 0.8 0.6 0.4 0.2 0.0 –0.2 –0.4 –0.6 –0.8 –1.0GWIS schizophrenia(min BIP) Bipolar disorder (BIP) GWIS bipolar disorder(min SZ)
Childhood IQ Neuroticism Educational attainment Schizophrenia (SZ) GWIS schizophrenia (min BIP)
Bipolar disorder (BIP) GWIS bipolar disorder
(min SZ)
Childhood IQNeuroticism
Fig. 3 Genetic correlations of GWAS and GWIS results. Notes: The heatmap displays the genetic correlations across seven sets of GWAS or GWIS summary statistics. Genetic correlations were estimated with LD score regression42.The colour scale represents the genetic correlations
ranging from–1 (red) to 1 (blue). Asterisks denote significant genetic correlations at P value < 0.01
Bayes’ rule implies that the probability that an association is true given that we observe significance is given by
P H 1jt>tα=2¼ P t>tð α=2jH1ÞP Hð Þ1
P t>tð α=2jH1ÞP Hð ÞþP t>t1 ð α=2jH0ÞP Hð Þ0
¼ðpowerðpowerÞðπÞÞ πð ÞþðαÞð1πÞ
‘Power’, as well as the significance test, are two-sided, π is the prior belief that the SNP is truly associated andα is the significance threshold used for testing (in our case,α =0:05506= 9.88 × 10−5).
To calculate power for each SNP, we computed the winner’s curse corrected OR using the procedure described in Rietveld et al. (Supplementary Information pp.7–
13)47for theα threshold of 9.88 × 10−5. Because the actual sample size per SNP is
not reported in the SZ GWAS summary statistics, we furthermore assumed that each SNP was available in the entire sample of 34,409 cases and 45,670 controls (i.e. the PGC results from Ripke et al.5excluding the GRAS data collection).
An important question is which prior beliefs are reasonable starting points for these Bayesian calculations. For an arbitrarily chosen SNP, the most conservative reasonable prior would assume that each truly associated SNP has the same effect size as the strongest effect size that was actually observed in the data. If one divides the SNP-based heritability of the trait by that effect size in R2units, one obtains a
lower bound for the number of SNPs that can be assumed to be truly associated. To aid this line of thinking, we converted the winner’s curse corrected OR of our 21 SNPs into R2using
R2¼ ffiffiffiffiffiffiffiffiffiffiffiffiffid d2þ a p
2
where d is Cohen’s d, which is calculated as d¼ lnðOddsÞ
ffiffiffi 3 p
π
and a is a correction factor that adjusts for the MAF of the SNP. This correction factor is calculated as
a¼ðn1þ n2Þ 2 n1n2 where n1= N × MAF and n2= N × (1 − MAF)48.
The largest effect size in R2that we observe in our results is rs4378243 with
0.044%. The SNP-based heritability of SZ is≈21%49. Thus, if all causal SZ SNPs would have an effect of R2= 0.044%, we would expect that ≈500 truly causal loci
exist. The chance offinding any one of them by chance from a set of ≈500,000 independent loci in the human genome is≈0.1%. (Our pruning algorithm of SNPs that passed QC leads to only 223,065 independent loci. Thus, assuming 500,000 independent loci in these calculations is conservative.) However, in reality most truly associated loci for SZ will surely have smaller effects than that. Thus, a prior belief of≈0.1% is certainly too conservative.
Furthermore, the SNPs we investigate are not arbitrary but selected based on their association with another, genetically related cognitive trait (EA) in a very large, independent sample. Thus, a prior belief of 1 or 5% that these SNPs are also associated with SZ is probably more reasonable. As an upper bound, we assume that 10% of all loci are causal. Thus, the chance to pick any one of them by chance would be 10%.
Table1displays the winner’s curse corrected effect size of the 21 EA-associated lead SNPs that are also associated with SZ after Bonferroni correction. It also shows the posterior probability that these SNPs are truly associated with SZ given our results for prior beliefs ranging from 0.1, 1, 5 to 10%. Thirteen of these SNPs have posterior probabilities of being true positives of >50% for even the most conservative prior. For a more realistic prior belief of 5%, all 21 SNPs are likely or almost certain to be true positives.
Sign concordance: We compared the signs of the beta coefficients of the 506 EA lead SNPs (PEA< 10−5) with the beta coefficients for SZ. If the signs were aligned, we assigned a‘1’ to the SNP and ‘0’ otherwise. By chance, sign concordance is expected to be 50%. We tested if the observed sign concordance is different from 50% using the binomial probability test50. 263 of the 506 SNPs have the same sign (52%, P= 0.40, two-sided).
Sign concordance is 58% (P= 0.10, two-sided) in the set of 132 EA lead SNPs that are also nominally significant for SZ (PEA< 1 × 10−5and PSZ< 0.05).
Finally, for the 21 SNPs that passed Bonferroni correction for SZ (PEA< 1 × 10−5and PSZ< 9.88 × 10−5), sign concordance is 62% (P= 0.38, twosided).
Raw enrichment factor (not corrected for LD score of SNPs): Because EA and SZ are highly polygenic, we tested for enrichment by taking the actual distribution of P values in the GWAS resultfiles into account.
Due to the polygenic architecture of both traits, it is expected tofind some EA-associated SNPs that are also EA-associated with SZ just by chance even if both traits are genetically independent. Under this null hypothesis, the expected number of
EA-associated lead SNPs that are also significantly associated with SZ is
EH0hNS;EA!SZi¼ NT;EA´ τPEA´ τPSZ
where NT,EAis the total number of independent lead SNPs in the EA GWAS results, andτPEAandτPSZare the shares of SNPs in NT,EAthat have P values for EA and SZ below a certain threshold, respectively.
We define the raw enrichment factor as
NS;EA!SZ=E NS;EA!SZh i
where NS,EA→SZis the observed independent number of SNPs that pass both the P value thresholds PEAand PSZ.
We obtained NT,EAby applying the clumping procedure described above (PPM) without a P value threshold for EA, leading to 222,289 independent EA lead SNPs in our merged GWAS resultsfile. For PEA< 10−5, we found 506 SNP (τPEA= 506
222;289 = 0.2276%).
The Bonferroni threshold for testing 506 independent hypothesis is PSZ<0:05506= 9.88 × 10−5. There are 341 independent SNPs in the SZ results that pass this threshold, thusτPSZ=
341
222;289= 0.1534%. Therefore, we expect [NS,EA→SZ]=
222,289 × 0.2276% × 0.1534%= 0.776 (i.e. less than one) SNP to be jointly associated with both traits under the hull hypothesis of no genetic overlap. At these P value thresholds, we actually observe NS,EA→SZ= 21 SNPs, implying a raw enrichment factor of 21
0:776= 27.
For PSZ< 0.05, we found 17,935 SNP (τPSZ=222;28917;935= 8.068%). Thus, [NS,EA→SZ] = 222,289 × 0.2276% × 8.068% = 41. At this more liberal P value threshold, we actually observe NS,EA→SZ= 132 SNPs, implying a raw enrichment factor of132
41=
3.23.
Raw enrichment P value (not corrected for LD score of SNPs): Following Okbay et al.26, we performed a non-parametric test of joint enrichment that probes whether the EA lead SNPs are more strongly associated with SZ than randomly chosen sets of SNPs with MAF within one percentage point of the lead SNP. To perform our test, we randomly drew ten matched SNPs for each of the 506 EA lead SNPs with PEA< 10−5.
We then ranked the 506 × 10 randomly matched SNPs and the original 506 lead EA SNPs by P value and conducted a Mann–Whitney test51of the null hypothesis that the P value distribution of the 506 EA lead SNPs are drawn from the same distribution as the 506 × 10 randomly matched SNPs. We reject the null hypothesis with P= 6.872 × 10−10(Z= 6.169, two-sided). As a negative control test, we also calculated the raw enrichment P value of thefirst randomly drawn, MAF-matched set of SNPs against the remaining nine sets, yielding P= 0.17.
Repeating this raw enrichment test for the subset of 21 EA-associated SNPs that remained significantly associated with SZ after Bonferroni correction (threshold PSZ<0:05506= 9.88 × 10−5) yields P= 5.44 × 10−14(Z= 7.521, two-sided). The
negative control test based on the raw enrichment P value of thefirst randomly drawn, MAF-matched set of SNPs against the remaining nine sets yields P= 0.34. GWAS catalogue look-up. In order to investigate the novelty of the 21 SNP associations that were found significant for SZ after Bonferroni correction, reported in Table1, we performed a look-up in the GWAS catalogue52(revision 2016-08-25, downloaded on 2016-08-29,https://www.ebi.ac.uk/gwas/api/search/downloads/ full) with the SNPs and all their‘LD partners’ (i.e. all SNPs with an r2> 0.5 within a
250 kb window). The LD partners were extracted with PLINK44using a version of the 1000G reference panel specifically harmonized to combine 1000G phase 1 and phase 3 imputed data53, and the reference panel has been described previously26. The result of the GWAS catalogue look-up is reported in Supplementary Data3. Prediction of future GWAS loci for SZ. To identify LD partners and to clump our GWAS results, we used a threshold of r2> 0.1 and a 1,000,000 kb window in the
1000 Genomes phase 1 version 3 European reference panel. Our SZ summary statistics contained 51,721 approximately independent SNPs with PSZ< 0.05. We identified 21,430 SNPs in LD with the 50 novel SNPs reported in ref.28and 54,425 SNPs in LD with the 128 genome-wide significant loci that were previously reported5. We removed SNPs in LD with the previously GWAS hits from our analyses because those SNPs could (by definition) not be identified as novel. The remaining set of 51,528 approximately independent SNPs with PSZ< 0.05 in our SZ GWAS results contained one proxy for each of the 50 novel SNPs in ref.28. After removing SNPs in LD with previous GWAS hits, 110 SNPs with PSZ< 0.05 also exhibited PEA< 10−5in the independent EA GWAS sample. Of those 110 SNPs, six were identified as novel SZ loci in the most recent GWAS dataset expansion28. Using Fisher’s exact test, we rejected the null hypothesis that the proportion of novel SNPs (6/110 vs 50/51528) is equal in the two sets (P= 2.4 × 10−9, two-sided).
Furthermore, as a robustness check, we performed the analysis again by excluding the SNPs with MAF≤ 0.1 and found similar results (P = 1.2 × 10−6). Thus, we conclude that conditioning GWAS results on SZ with independent GWAS evidence on EA significantly outperforms pure chance in predicting GWAS results on SZ from even larger samples.