
Gene expression in chromosomal Ridge domains : influence on transcription, mRNA stability, codon usage, and evolution - Thesis


UvA-DARE is a service provided by the library of the University of Amsterdam (https://dare.uva.nl)

Gene expression in chromosomal Ridge domains: influence on transcription, mRNA stability, codon usage, and evolution

Gierman, H.J.

Publication date

2010

Document Version

Final published version


Citation for published version (APA):

Gierman, H. J. (2010). Gene expression in chromosomal Ridge domains: influence on transcription, mRNA stability, codon usage, and evolution.

General rights

It is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), other than for strictly personal, individual use, unless the work is under an open content license (like Creative Commons).

Disclaimer/Complaints regulations

If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website. Please Ask the Library: https://uba.uva.nl/en/contact, or a letter to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. You will be contacted as soon as possible.


Influence on transcription, mRNA stability, codon usage, and evolution


© 2010 HJ Gierman, Amsterdam, the Netherlands.

Gene expression in chromosomal Ridge domains. Influence on transcription, mRNA stability, codon usage, and evolution.

The research presented in this thesis was performed at the Department of Human Genetics, Academic Medical Center Amsterdam, University of Amsterdam. Publication of this thesis was financially supported by the Academic Medical Center Amsterdam and the University of Amsterdam.

All rights reserved. No part of this thesis may be reproduced, stored in a retrieval system of any nature, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without permission of the author.

Layout: Eelco Roos, Department of Human Genetics, Academic Medical Center Amsterdam.

Cover design: Silvina B. Kahil, www.silvinakahil.com.

Printed by: Ipskamp Drukkers, Enschede.

ISBN/EAN: 978-90-9025305-3


ACADEMIC DISSERTATION

to obtain the degree of Doctor at the Universiteit van Amsterdam, on the authority of the Rector Magnificus, prof.dr. D.C. van den Boom, to be defended in public before a committee appointed by the Doctorate Board, in the Agnietenkapel on Tuesday 27 April 2010, at 14:00

by Hendrikus Jasper Gierman, born in Enschede

Doctoral committee:

Promotor: Prof.dr. R. Versteeg

Other members: Prof.dr. H.N. Caron, Prof.dr. R. van Driel, Dr. P.F. Fransz, Prof.dr. L.D. Hurst, Prof.dr. M.M.S. van Lohuizen, Dr. B. van Steensel


1. Introduction: Gene regulation by chromosomal domains
2. Domain-wide regulation of gene expression in the human genome
3. Genes in chromosomal Ridge domains have increased mRNA folding stability and half-life, further contributing to their high expression
4. A model to explain natural selection for extreme levels of protein expression in the human genome
5. EZH2 overexpression associated with gain of chromosome arm 7q is essential for neuroblastoma cell cycle progression and a marker of poor prognosis
6. Discussion: Mechanism of Ridges and implications for evolution
7. Summary
8. Dutch summary (Nederlandse samenvatting)
9. Acknowledgements (Dankwoord)
10. Curriculum vitae
11. List of publications


1. Introduction: Gene Regulation by Chromosomal Domains


Hinco J. Gierman and Rogier Versteeg

Department of Human Genetics, Academic Medical Center, University of Amsterdam, P.O. Box 22700, 1100 DE Amsterdam, the Netherlands.

Published in part as: Clustering of highly expressed genes in the human genome. Encyclopedia of Life Sciences. 2008 Apr;30:a0005931 John Wiley & Sons, Ltd: Chichester.

1.1 Introduction

Gene expression is the most fundamental of all biological processes and can be viewed as the sum of the mechanisms that transcribe DNA into RNA and translate RNA into protein. As important as the function of a protein are the place, the time and the quantity of its expression. The cellular mechanisms that underlie these three determinants make up ‘the regulation of gene expression’. Together, they control in which cells (the place), at what point during development (the time) and how many molecules (the quantity) of any protein are produced. This control ensures the correct expression of all genes during development. When the regulation of these genes is disturbed, e.g. by mutations in the DNA, diseases like cancer can arise.

Identifying and understanding the mechanisms involved in gene regulation is essential for understanding how cancer arises. Many cellular mechanisms are known that regulate the expression of individual genes. In this thesis we asked whether, in addition to these well-known mechanisms, genes are also regulated at the level of chromosomal domains called ‘Ridges’ (abbreviated from ‘Regions of IncreaseD Gene Expression’). We show that Ridges increase transcription by a domain-wide mechanism, that Ridge genes have an increased messenger RNA (mRNA) stability and, finally, that Ridge mRNAs have codons that facilitate highly efficient translation. We propose that this system provides a highway enabling an expansion of the protein expression range in the genome and we discuss


1.2 The Human Transcriptome Map

The expression of genes is regulated in the first instance by transcription factor complexes. These complexes bind to regulatory sequences, usually in the promoter region of a gene. The concentration and composition of these complexes determine the amount of mRNA that is produced. This system of individual gene regulation in principle allows genes to be randomly positioned throughout a genome, and this was long assumed to be the case for most genes. However, if clustering of genes with similar activity or related function occurs, this predicts that chromosomal regions would either show differences in average activity, or co-expression under specific conditions or in certain tissues.

With the emergence of high-throughput screening of mRNA levels (within this context commonly referred to as expression profiling), it became possible to analyze the expression levels of thousands of different genes at once. One of the techniques used to this end was Serial Analysis of Gene Expression (SAGE) (Velculescu 1995). In short, concatemers of 3’ fragments of mRNA molecules are cloned into bacterial plasmids. Sequencing 50,000 to 100,000 of these 3’ tags yields a quantitative expression profile of a cell or tissue. In the same year, the first complete genomic sequence of a free-living organism was published: Haemophilus influenzae Rd. (Fleischmann 1995). The convergence of these two techniques, expression profiling and whole genome sequencing, made it possible to map the expression of every gene onto its chromosomal position. These so-called ‘transcriptome maps’ made it possible to test whether genes of similar activity or function cluster. This was first done for the budding yeast Saccharomyces cerevisiae (Velculescu 1997). The study showed some clustering of co-expressed genes, but found no clusters of high or low expression on any chromosome. A second study looked deeper into the clustering of these co-expressed genes in yeast and concluded that yeast possesses small chromosomal domains of gene expression (Cohen 2000). It found that clusters of 2–3 genes, adjacent and non-adjacent, showed co-expression.

The sequencing of the human genome had been underway for a decade by then and was nearing completion. Our lab used an early radiation hybrid map of the human genome (Deloukas 1998) to map expression data from SAGE libraries (Caron 2001). The resulting Human Transcriptome Map (HTM) revealed an unexpected organization in the human genome: highly expressed genes were found to cluster in so-called Regions of IncreaseD Gene Expression (Ridges). A more detailed mapping using the first draft human genome sequence (Lander 2001) revealed that poorly expressed genes also cluster in separate regions, termed anti-Ridges (Versteeg 2003). Figure 1 shows a transcriptome map of the q-arm of chromosome 1. Each black vertical bar represents a gene. The height of each bar indicates the activity of the domain surrounding that gene. To measure the activity of a domain, the median expression over a window of genes is calculated. A window encompasses the gene itself and an equal number of adjacent genes on both sides. The domain activity was calculated for all genes on each chromosome by sliding the window one gene at a time. In Figure 1, the typical window size of 49 genes was used. Comparable results are obtained for window sizes ranging from 19 to 59 genes. Figure 1 shows that a number of domains have a high expression: these are Ridges (shaded red). Likewise, the anti-Ridges (shaded blue) clearly have a lower overall expression.
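The moving median described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the Human Transcriptome Map: the function name `domain_activity` is hypothetical, and the handling of chromosome ends (the window is simply truncated) is an assumption, since the text does not specify it.

```python
from statistics import median


def domain_activity(expression, window=49):
    """Moving median of gene expression along one chromosome.

    `expression` lists one expression value per gene, ordered by
    chromosomal position. The window is centred on each gene and holds
    the gene itself plus an equal number of neighbours on both sides,
    so the window size must be odd.
    Assumption: near the chromosome ends the window is truncated.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    n = len(expression)
    activity = []
    for i in range(n):
        lo = max(0, i - half)          # first gene in the window
        hi = min(n, i + half + 1)      # one past the last gene
        activity.append(median(expression[lo:hi]))
    return activity
```

Calling `domain_activity(values, window=49)` on the ordered expression values of a chromosome gives one domain activity per gene; rerunning it with window sizes between 19 and 59 is the kind of robustness check mentioned in the text.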

On average, Ridges and anti-Ridges consist of 80–90 genes, but these domains can range from 30 to 500 genes in size. There are about 30 Ridges and 30 anti-Ridges in the human genome. Although the exact number depends on the window size and statistical threshold used, almost every chromosome has at least one Ridge or anti-Ridge. Roughly 20–25% of all human genes reside within a Ridge and 10–15% are in an anti-Ridge. The bulk of the human genome, however, is made up of domains of intermediate gene expression harboring the remaining 60–70% of genes. Chromosomes are thus an assemblage of different expression domains that form a higher-order organization of the human genome.

Many other studies have investigated gene clustering. For example, mice have domains with dense or sparse transcription (Carninci 2005). However, most studies have focused on chromosomal clustering of co-expressed genes. This has been found to occur in various organisms, such as S. cerevisiae (Velculescu 1997; Cohen 2000; Burhans 2006), Drosophila (Spellman 2002), C. elegans (Roy 2002), Arabidopsis (Williams 2004; Ren 2005), mice (Mijalski 2005) and humans (Bortoluzzi 1998; Vogel 2005) (see also Lee 2003; Hurst 2004). These clusters are conserved during evolution, indicating the importance of the chromosomal organization of these genes (Singer 2005; Sémon 2006).

1.3 Gene Expression in Ridges: Housekeeping and Tissue-specific Genes

Figure 1. Physically mapped transcriptome profile of the q-arm of human chromosome 1. Giemsa banding is illustrated below the transcriptome map (centromere/heterochromatic region is green and marked ‘cen’). Ridges are shaded red and marked with an ‘R’, anti-Ridges are shaded blue and marked ‘AR’. Black vertical bars represent genes and their height indicates domain activity for a moving median window of 49 genes (MM49) in 133 pooled SAGE libraries from different tissues. Below is the chromosomal position in megabases (UCSC Genome Build HG18). Illustration adapted from Gierman et al. (Figure 1; Gierman 2007).

Ridges are enriched for highly expressed genes, but medium and poorly expressed genes populate Ridges as well. Also, many highly expressed genes are found outside Ridges. Genes can be categorized into tissue-specific and ubiquitously expressed genes. In general, Ridge genes are broadly expressed over different tissue types (Lercher 2002). Lercher et al. proposed that Ridges are formed by clustering of housekeeping genes (Lercher 2002). For the calculation of Ridges, the average expression of each gene in a collection of SAGE libraries of different tissue types is used (Versteeg 2003). This means that genes that are ubiquitously highly expressed will have the highest average expression. Conversely, genes that are highly expressed in just one or a few tissues will have a high tissue-specific expression but a low average expression. This raises the question whether the only difference between Ridge genes and other genes is the broad expression, or whether Ridge genes are also more highly expressed. Figure 2A shows that the maximal tissue-specific expression of genes also follows the pattern of Ridges and anti-Ridges (Versteeg 2003). Ridge genes are thus both more highly and more broadly expressed. This probably reflects that many housekeeping genes are both broadly and highly expressed. Nevertheless, genes in Ridges are subject to tissue-specific regulation. Figure 2B shows the variation in individual gene expression of genes on chromosome 9 over 62 different SAGE libraries from different tissues. Ridge domains thus appear to be favorable for genes with a high and/or ubiquitous expression, but equally allow for tissue-specific regulation of genes.
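The difference between average and maximal tissue-specific expression can be made concrete with a small sketch. The function `summarize_expression` and the tag counts are hypothetical and chosen for illustration only; they are not data from the SAGE libraries.

```python
from statistics import mean


def summarize_expression(tags_per_library):
    """Summarize one gene's expression over several tissue libraries.

    Returns (average, maximum, breadth), where breadth is the number
    of libraries in which the gene is detected at all.
    """
    average = mean(tags_per_library)
    maximum = max(tags_per_library)
    breadth = sum(1 for tags in tags_per_library if tags > 0)
    return average, maximum, breadth


# Hypothetical tag counts per 100,000 tags in four tissue libraries.
housekeeping = [40, 50, 45, 55]      # broadly expressed gene
tissue_specific = [120, 0, 0, 0]     # expressed in a single tissue

print(summarize_expression(housekeeping))
print(summarize_expression(tissue_specific))
```

In this made-up example the tissue-specific gene has the higher maximum but the lower average, which is why the text examines both measures before concluding that Ridge genes are more highly as well as more broadly expressed.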

1.4 Ridges and anti-Ridges Differ in Organization, GC Content and Chromatin

Detailed analysis of the Human Transcriptome Map showed that many physical parameters of the genome correlate with Ridges (Versteeg 2003). Many of these correlations confirmed earlier observations (Bernardi 1985a). The clearest correlation is with gene density and can be observed in Figure 1: as each vertical black bar marks the position of a single gene, the density of bars directly indicates the gene density. Figure 3 shows this more clearly with a direct comparison of gene density and gene expression (panels E and F). Ridges also have shorter genes and shorter introns (panel D), and most repeats (e.g. LINEs) are less frequent in Ridges, with the exception of SINEs, which are more abundant (panels A and B). An important genomic feature is the genomic GC content (i.e. the fraction of bases that are G or C rather than A or T), which is also higher in Ridges (panel C).
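As a minimal sketch of how GC content is computed for a sequence, the hypothetical helper `gc_content` below returns the fraction of G and C bases. Skipping ambiguous symbols such as N is an assumption made here for simplicity.

```python
def gc_content(sequence):
    """Fraction of G and C among the unambiguous bases of a sequence.

    Accepts DNA or RNA in upper or lower case. Symbols other than
    A, C, G, T and U (for example N) are ignored (an assumption).
    """
    bases = [b for b in sequence.upper() if b in "ACGTU"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "GC" for b in bases) / len(bases)
```

Multiplying the result by 100 gives the percentages used in the text.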

The genomes of warm-blooded vertebrates (i.e. birds and mammals) display a strong variation in the GC content of large chromosomal regions, also known as isochores (Bernardi 1985a; Costantini 2006). These regions can be hundreds of kilobases long and in humans their GC content varies from 30% to 60%. Isochores are predominantly the result of the accumulation of changes caused by a mutation bias (Duret 2009). The mutation bias most likely arose to compensate for the hypermutability of methylated cytosines, which spontaneously deaminate to thymines. C/G pairs thus frequently mutate into T/G mispairs, and the base excision repair system has become strongly biased towards repairing G/T mispairs in favor of the guanine to compensate for this (Brown 1987; Brown 1988; Brown 1989). This repair system thus increases the GC content of loci where T/G or A/G mispairs originate from A/C pairs. Conversely, despite the bias in repair, cytosines that are methylated will disappear over time. This is because, as the cytosine is continuously mutated, eventually the T/G mispair will be repaired in favor of the thymine, creating a T/A pair instead of the original G/C. This repair occurs during meiotic recombination, producing the so-called ‘biased gene conversion’ that has shaped isochores (Filipski 1987; Sueoka 1988; Wolfe 1989; Press 2006; Duret 2008 and reviewed by Duret 2009). This mutational bias affects the GC content of all sequences in isochores, including the coding sequences of genes (Bernardi 1985b; Cruveiller 2004). Recombination is thus thought to drive the formation of isochores, and the non-uniform distribution of GC content is likely formed to some extent by the different rates of recombination throughout the genome (Fullerton 2001; Kong 2002; Montoya-Burgos 2003; Meunier 2004). Analysis of the mouse Fxy gene has shown that GC content can increase rapidly. For the third codon position (i.e. the wobble base for which a base pair change often encodes the […] 1999). This is an evolutionarily short period of time: humans and their closest living relative, the chimpanzee, diverged 5–7 million years ago (Patterson 2006).

Figure 2. Transcriptome maps of chromosome 9. (A) Moving median of the height of average expression (blue) and tissue-specific expression (red) per 100,000 tags. Expression values were determined in a collection of 57 SAGE libraries of 50,000 tags or more. Blue and red bars indicate anti-Ridge and Ridge. Genes are sequentially ordered according to chromosomal position, but not physically spaced (window size 49 genes). (B) Individual gene expression over 62 SAGE libraries of 50,000 or more tags (horizontal lines). Each vertical line is a gene. The levels of expression are given by a color code, ranging from zero (blue) to 25 (purple) or more tags/100,000 transcript tags in a library. Illustrations adapted from Figures 5 and 1G of Versteeg et al. (Versteeg 2003).

There is a straightforward linear correlation between the GC content and the gene expression of, for example, a window of 49 genes (R² = 0.51, P < 10⁻⁹⁹; Versteeg 2003, data not shown). The difference in GC content between Ridges and anti-Ridges applies to all of the genomic sequence, including the coding sequences of genes. The GC content of most anti-Ridge mRNAs lies between 40% and 50%, whilst Ridge mRNAs typically have a GC content of 50% to 65% (see also chapters 3 and 4). But not only the composition of DNA is different in Ridges. DNA is packaged into chromatin, in which it is wrapped around histone proteins. These histones can be modified on a multitude of residues, mostly by phosphorylation, methylation and acetylation. These modifications influence transcription in two ways: directly, by binding transcriptional complexes, and indirectly, by changing the chromatin structure. Recent studies using chromatin immunoprecipitation (ChIP) of histone modifications show that Ridges are enriched for histone marks associated with active transcription (Bernstein 2005; Roh 2005; Barski 2007). Importantly, Ridges were also found to have an open chromatin structure throughout their entire domain, even where genes in Ridges are not expressed (Gilbert 2004; Goetze 2007). Open chromatin facilitates gene expression and could be a consequence of the increased transcription in Ridges. However, the broad open chromatin structure of Ridges might also contribute to the expression of genes in Ridges (Sproul 2005).
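The coefficient of determination quoted at the start of this paragraph is the square of the Pearson correlation between two per-window profiles. The sketch below shows that calculation under the assumption of a simple linear fit; the function name `r_squared` is hypothetical and the routine is not the statistical procedure of Versteeg 2003, which the text does not describe.

```python
from math import fsum


def r_squared(xs, ys):
    """Squared Pearson correlation between two equally long profiles,
    e.g. moving-median GC content and moving-median expression."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two profiles of equal length (>= 2)")
    n = len(xs)
    mean_x = fsum(xs) / n
    mean_y = fsum(ys) / n
    # Sums of squares and cross-products around the means.
    sxy = fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = fsum((x - mean_x) ** 2 for x in xs)
    syy = fsum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return (sxy * sxy) / (sxx * syy)
```

A value of 0.51 means that about half of the variance in domain expression is accounted for by a linear dependence on GC content.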

1.5 Nuclear Organization and Ridges

Just as genes are not randomly distributed over the genome, the chromatin fiber is not randomly packaged into the nucleus. Many studies have shown that active genes usually reside more towards the nuclear center than inactive genes (reviewed by Cremer 2001; Lanctôt 2007). It has been suggested that the nuclear localization of chromosomal domains contributes to the regulation of their expression. For example, it has been shown for Drosophila that hundreds of inactive genes cluster and interact with the lamina at the nuclear periphery (Pickersgill 2006). These genes are characterized by inactive histone marks, which could be caused by the histone deacetylase activity present at the nuclear lamina (Somech 2005). However, induction of gene expression disrupts the interaction with the lamina, suggesting that localization at the nuclear periphery is a consequence of low gene expression rather than a cause (Pickersgill 2006). Similarly, induction of gene expression in the human major histocompatibility complex (Volpi 2000), CFTR locus (Zink 2004), or Hox cluster (Morey 2009), was also found to drive nuclear position. Transcription itself might not be directly responsible for the looping and repositioning of chromosomal regions. Rather, the increase in histone acetylation that occurs upon induction of transcription, might contribute to the behavior of the chromatin fiber (Tumbar 1999; Belmont 1999).

In humans, Lamina Associated Domains (LADs) with low overall gene expression were also discovered and reported to cover 40% of the genome (Guelen 2008; Wen 2009). LADs coincide with gene poor regions (Guelen 2008), and there is a good correspondence between LADs and anti-Ridges (data not shown). LADs are also enriched for histone marks associated with heterochromatin (Guelen 2008). Domains of heterochromatin have been proposed to act as organizing centers that might help position active euchromatic domains within the nuclear center (van Driel 2004).

Goetze et al. showed for six different cell lines that a specific Ridge on chromosome 11 was always more in the nuclear interior than an anti-Ridge on the same chromosome. […] Ridge, the overall expression level of the Ridge was similar in all six cell lines. This was also the case for the anti-Ridge. These results are in agreement with the idea that the overall activity of a chromosomal domain drives nuclear organization. This might explain the apparent paradox of why an inactive gene (residing in a Ridge) can be located in the nuclear interior.

Figure 3. Profiles showing gene expression and physical parameters for chromosome 9: (A) inverse LINE density, (B) SINE density, (C) GC content, (D) inverse intron length, (E) gene density, (F) average gene expression. All profiles are moving medians over the parameter values per gene for a window size of 49. Bars indicate anti-Ridge (AR) and Ridge (R). Genes are sequentially ordered according to chromosomal position, but not physically spaced (window size 49 genes). Illustration adapted from Figure 1 of Versteeg et al. (Versteeg 2003).

1.6 Domain-wide Regulation of Chromosomal Domains in Cancer

Ridges and anti-Ridges in general have a consistent activity throughout different tissue types. There are, however, a number of smaller specialized clusters of related genes in the human genome, such as the Hox, globin and histone gene clusters. These groups of genes are known to be regulated together in a coordinated fashion. Recently, a number of studies have shown that small clusters of unrelated genes can also show co-expression throughout different tissues. Most notably, a study on bladder carcinomas showed that clusters of up to 12 unrelated genes have a correlated expression pattern in a subset of bladder carcinomas (Stransky 2006). The authors demonstrated for one of these clusters that the genes are silenced by a domain-wide increase in histone methylation. The spreading of histone marks is a well-known mechanism, but until now it has only been implicated in particular processes, such as heterochromatin formation and the inactivation of the X chromosome. Although these clusters are smaller than Ridges, these findings show that epigenetic regulation of gene clusters might play a more important role in the genome than previously thought.

1.7 Specific Aims of This Thesis

The existence of Ridges raises the question what causes the high expression of Ridge genes: individual regulation of genes by strong promoters, or an additional domain-wide effect that up-regulates transcription? In Chapter 2 we address this question by creating a collection of 90 clones of a human embryonal cell line with a single randomly integrated fluorescent lentiviral reporter construct. We determined the chromosomal integration site and fluorescence of each clone. Thus, we compared the transcriptional activity of clones with a Ridge-integrated reporter construct versus clones with their reporter situated in an anti-Ridge. This showed that Ridges up-regulate expression 4- to 8-fold compared to anti-Ridges.

The correspondence between Ridges and the isochore structure of the human genome suggests that transcription of Ridge genes produces mRNAs with a distinct nucleotide composition (i.e. higher GC content). Chapter 3 investigates the effect of GC content on the stability of Ridge mRNAs. We find that due to their high GC content, mRNAs from Ridges have higher folding stabilities as predicted by their minimal free energy. Microarray analysis on human cells treated with two transcriptional inhibitors shows that Ridge mRNAs have 1.5–2 hour longer half-lives than anti-Ridge mRNAs.
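To make the notion of an mRNA half-life concrete: once transcription is blocked, the level of a transcript can be followed over time, and under the simplifying assumption of first-order (exponential) decay a half-life follows from two measurements. The sketch below illustrates that relation only; it is not the analysis used in Chapter 3, and the function name `half_life` is hypothetical.

```python
import math


def half_life(initial_level, later_level, elapsed_hours):
    """Half-life in hours, assuming first-order decay.

    With N(t) = N0 * 2 ** (-t / t_half), solving for t_half gives
    t_half = t * ln(2) / ln(N0 / N(t)).
    """
    if initial_level <= 0 or later_level <= 0 or elapsed_hours <= 0:
        raise ValueError("levels and elapsed time must be positive")
    if later_level >= initial_level:
        raise ValueError("no decay observed between the two time points")
    return elapsed_hours * math.log(2) / math.log(initial_level / later_level)
```

For example, a transcript that falls to a quarter of its starting level in four hours has a half-life of two hours under this model.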

Chapter 4 looks into the effect of GC content on the codon usage and translation of Ridge genes. We find that the high GC content in Ridge mRNAs causes an increase in preferred codons and optimal translation initiation sites. We propose an evolutionary model that explains how genes can acquire extreme levels of protein expression by translocating to Ridges. Chapters 2, 3 and 4 describe how Ridges increase the transcription of their embedded genes, while mRNAs of Ridge genes are in addition more stable and also have codons that facilitate highly efficient translation. This suggests that Ridges and their physical properties enable a ‘highway’ for gene expression in the genome. Since the range of expression levels of cellular proteins is quite extreme, Ridges might contribute to very high protein expression levels by superimposing the three mechanisms proposed in chapters 2–4 to achieve an exponential system of gene expression.
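One common way to see how regional GC content reaches into codon usage is to measure the GC content at third codon positions (often called GC3), since changes at this wobble position frequently leave the encoded amino acid unchanged. The helper `gc3` below is a hypothetical sketch and is not taken from the analysis in Chapter 4; it assumes an in-frame coding sequence without ambiguous symbols.

```python
def gc3(coding_sequence):
    """Fraction of codons whose third base is G or C.

    Assumes an in-frame coding sequence whose length is a multiple
    of three and that contains only A, C, G, T or U.
    """
    seq = coding_sequence.upper()
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("length must be a positive multiple of three")
    third_bases = seq[2::3]            # every third base, starting at codon position 3
    return sum(b in "GC" for b in third_bases) / len(third_bases)
```

For the toy sequence `ATGGCCAAA` the third bases are G, C and A, giving a GC3 of two thirds.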

Chapter 5 describes the role of the histone methyltransferase enhancer of zeste homolog 2 (EZH2) in neuroblastoma. In cancer, chromosomal domains were shown to be deregulated by chromatin modifying enzymes. This prompted us to investigate the role of EZH2 in the pediatric cancer neuroblastoma, where it is highly expressed. EZH2 belongs to the Polycomb group proteins and has been implicated in cancer as an oncogene. Here we show that EZH2 is required for cell cycle progression in neuroblastoma and is associated with a poor prognosis.

In Chapter 6 we discuss the likelihood of several well-known mechanisms as mediators of domain-wide up-regulation of transcription by Ridges. We propose a mechanism to explain how Ridges function and the potential impact they have on evolution.


References

Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, Wang Z, Wei G, Chepelev I, Zhao K. High-resolution profiling of histone methylations in the human genome. Cell. 2007 May 18;129(4):823-37.

Belmont AS, Dietzel S, Nye AC, Strukov YG, Tumbar T. Large-scale chromatin structure and function. Curr Opin Cell Biol. 1999 Jun;11(3):307-11.

Bernardi G, Bernardi G. Codon usage and genome composition. J Mol Evol. 1985b;22(4):363-5.

Bernardi G, Olofsson B, Filipski J, Zerial M, Salinas J, Cuny G, Meunier-Rotival M, Rodier F. The mosaic genome of warm-blooded vertebrates. Science. 1985a May 24;228(4702):953-8.

Bernstein BE, Kamal M, Lindblad-Toh K, Bekiranov S, Bailey DK, Huebert DJ, McMahon S, Karlsson EK, Kulbokas EJ, Gingeras TR, et al. Genomic maps and comparative analysis of histone modifications in human and mouse. Cell. 2005 Jan 28;120(2):169-81.

Bortoluzzi S, Rampoldi L, Simionati B, Zimbello R, Barbon A, d’Alessi F, Tiso N, Pallavicini A, Toppo S, Cannata N, et al. A comprehensive, high-resolution genomic transcript map of human skeletal muscle. Genome Res. 1998 Aug;8(8):817-25.

Brown TC, Jiricny J. A specific mismatch repair event protects mammalian cells from loss of 5-methylcytosine. Cell. 1987 Sep 11;50(6):945-50.

Brown TC, Jiricny J. Different base/base mispairs are corrected with different efficiencies and specificities in monkey kidney cells. Cell. 1988 Aug 26;54(5):705-11.

Brown TC, Jiricny J. Repair of base-base mismatches in simian and human cells. Genome. 1989;31(2):578-83.

Burhans DT, Ramachandran L, Wang J, Liang P, Patterton HG, Breitenbach M, Burhans WC. Non-random clustering of stress-related genes during evolution of the S. cerevisiae genome. BMC Evol Biol. 2006 Jul 21;6:58.

Carninci P, Kasukawa T, Katayama S, Gough J, Frith MC, Maeda N, Oyama R, Ravasi T, Lenhard B, Wells C, et al., FANTOM Consortium; RIKEN Genome Exploration Research Group and Genome Science Group (Genome Network Project Core Group). The transcriptional landscape of the mammalian genome. Science. 2005 Sep 2;309(5740):1559-63.

Caron H, van Schaik B, van der Mee M, Baas F, Riggins G, van Sluis P, Hermus MC, van Asperen R, Boon K, Voute PA, et al. The human transcriptome map: clustering of highly expressed genes in chromosomal domains. Science. 2001 Feb 16;291(5507):1289-92.

Cohen BA, Mitra RD, Hughes JD, Church GM. A computational analysis of whole-genome expression data reveals chromosomal domains of gene expression. Nat Genet. 2000 Oct;26(2):183-6.

Costantini M, Clay O, Auletta F, Bernardi G. An isochore map of human chromosomes. Genome Res. 2006 Apr;16(4):536-41.

Cremer T, Cremer C. Chromosome territories, nuclear architecture and gene regulation in mammalian cells. Nat Rev Genet. 2001 Apr;2(4):292-301.

Cruveiller S, Jabbari K, Clay O, Bernardi G. Compositional gene landscapes in vertebrates. Genome Res. 2004 May;14(5):886-92.

Deloukas P, Schuler GD, Gyapay G, Beasley EM, Soderlund C, Rodriguez-Tomé P, Hui L, Matise TC, McKusick KB, Beckmann JS, et al. A physical map of 30,000 human genes. Science. 1998 Oct 23;282(5389):744-6.

Duret L, Arndt PF. The impact of recombination on nucleotide substitutions in the human genome. PLoS Genet. 2008 May 9;4(5):e1000071.

Duret L, Galtier N. Biased gene conversion and the evolution of mammalian genomic landscapes. Annu Rev Genomics Hum Genet. 2009;10:285-311.

Filipski J. Correlation between molecular clock ticking, codon usage fidelity of DNA repair, chromosome banding and chromatin compactness in germline cells. FEBS Lett. 1987 Jun 15;217(2):184-6.

Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb JF, Dougherty BA, Merrick JM, et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science. 1995 Jul 28;269(5223):496-512.

Fullerton SM, Bernardo Carvalho A, Clark AG. Local rates of recombination are positively correlated with GC content in the human genome. Mol Biol Evol. 2001 Jun;18(6):1139-42.

Gierman HJ, Indemans MH, Koster J, Goetze S, Seppen J, Geerts D, van Driel R, Versteeg R. Domain-wide regulation of gene expression in the human genome. Genome Res. 2007 Sep;17(9):1286-95.


Gilbert N, Boyle S, Fiegler H, Woodfine K, Carter NP, Bickmore WA. Chromatin architecture of the human genome: Gene-rich domains are enriched in open chromatin fibers. Cell. 2004 Sep 3;118(5):555-66.

Goetze S, Mateos-Langerak J, Gierman HJ, de Leeuw W, Giromus O, Indemans MH, Koster J, Ondrej V, Versteeg R, van Driel R. The three-dimensional structure of human interphase chromosomes is related to the transcriptome map. Mol Cell Biol. 2007 Jun;27(12):4475-87.

Guelen L, Pagie L, Brasset E, Meuleman W, Faza MB, Talhout W, Eussen BH, de Klein A, Wessels L, de Laat W, et al. Domain organization of human chromosomes revealed by mapping of nuclear lamina interactions. Nature. 2008 Jun 12;453(7197):948-51.

Hurst LD, Pal C, Lercher MJ. The evolutionary dynamics of eukaryotic gene order. Nat Rev Genet. 2004 Apr;5(4):299-310.

Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir S, Barnard J, Hallbeck B, Masson G, et al. A high-resolution recombination map of the human genome. Nat Genet. 2002 Jul;31(3):241-7.

Lanctôt C, Cheutin T, Cremer M, Cavalli G, Cremer T. Dynamic genome architecture in the nuclear space: regulation of gene expression in three dimensions. Nat Rev Genet. 2007 Feb;8(2):104-15.

Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, et al.; International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature. 2001 Feb 15;409(6822):860-921.

Lee JM, Sonnhammer EL. Genomic gene clustering analysis of pathways in eukaryotes. Genome Res. 2003 May;13(5):875-82.

Lercher MJ, Urrutia AO, Hurst LD. Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nat Genet. 2002 Jun;31(2):180-3.

Meunier J, Duret L. Recombination drives the evolution of GC-content in the human genome. Mol Biol Evol. 2004 Jun;21(6):984-90.

Mijalski T, Harder A, Halder T, Kersten M, Horsch M, Strom TM, Liebscher HV, Lottspeich F, de Angelis MH, Beckers J. Identification of coexpressed gene clusters in a comparative analysis of transcriptome and proteome in mouse tissues. Proc Natl Acad Sci U S A. 2005 Jun 14;102(24):8621-6.

Montoya-Burgos JI, Boursot P, Galtier N. Recombination explains isochores in mammalian genomes. Trends Genet. 2003 Mar;19(3):128-30.

Morey C, Kress C, Bickmore WA. Lack of bystander activation shows that localization exterior to chromosome territories is not sufficient to up-regulate gene expression. Genome Res. 2009 Jul;19(7):1184-94.

Patterson N, Richter DJ, Gnerre S, Lander ES, Reich D. Genetic evidence for complex speciation of humans and chimpanzees. Nature. 2006 Jun 29;441(7097):1103-8.

Perry J, Ashworth A. Evolutionary rate of a gene affected by chromosomal position. Curr Biol. 1999 Sep 9;9(17):987-9.

Pickersgill H, Kalverda B, de Wit E, Talhout W, Fornerod M, van Steensel B. Characterization of the Drosophila melanogaster genome at the nuclear lamina. Nat Genet. 2006 Sep;38(9):1005-14.

Press WH, Robins H. Isochores exhibit evidence of genes interacting with the large-scale genomic environment. Genetics. 2006 Oct;174(2):1029-40.

Ren XY, Fiers MW, Stiekema WJ, Nap JP. Local coexpression domains of two to four genes in the genome of Arabidopsis. Plant Physiol. 2005 Jun;138(2):923-34.

Roh TY, Cuddapah S, Zhao K. Active chromatin domains are defined by acetylation islands revealed by genome-wide mapping. Genes Dev. 2005 Mar 1;19(5):542-52.

Roy PJ, Stuart JM, Lund J, Kim SK. Chromosomal clustering of muscle-expressed genes in Caenorhabditis elegans. Nature. 2002 Aug 29;418(6901):975-9.

Sémon M, Duret L. Evolutionary origin and maintenance of coexpressed gene clusters in mammals. Mol Biol Evol. 2006 Sep;23(9):1715-23.

Singer GA, Lloyd AT, Huminiecki LB, Wolfe KH. Clusters of co-expressed genes in mammalian genomes are conserved by natural selection. Mol Biol Evol. 2005 Mar;22(3):767-75.

Somech R, Shaklai S, Geller O, Amariglio N, Simon AJ, Rechavi G, Gal-Yam EN. The nuclear-envelope protein and transcriptional repressor LAP2beta interacts with HDAC3 at the nuclear periphery, and induces histone H4 deacetylation. J Cell Sci. 2005 Sep 1;118(Pt 17):4017-25.


Sproul D, Gilbert N, Bickmore WA. The role of chromatin structure in regulating the expression of clustered genes. Nat Rev Genet. 2005 Oct;6(10):775-81.

Stransky N, Vallot C, Reyal F, Bernard-Pierrot I, de Medina SG, Segraves R, de Rycke Y, Elvin P, Cassidy A, Spraggon C, et al. Regional copy number-independent deregulation of transcription in cancer. Nat Genet. 2006 Dec;38(12):1386-96.

Sueoka N. Directional mutation pressure and neutral molecular evolution. Proc Natl Acad Sci U S A. 1988 Apr;85(8):2653-7.

Tumbar T, Sudlow G, Belmont AS. Large-scale chromatin unfolding and remodeling induced by VP16 acidic activation domain. J Cell Biol. 1999 Jun 28;145(7):1341-54.

van Driel R, Fransz P. Nuclear architecture and genome functioning in plants and animals: what can we learn from both? Exp Cell Res. 2004 May 15;296(1):86-90.

Velculescu VE, Zhang L, Vogelstein B, Kinzler KW. Serial analysis of gene expression. Science. 1995 Oct 20;270(5235):484-7.

Velculescu VE, Zhang L, Zhou W, Vogelstein J, Basrai MA, Bassett DE Jr, Hieter P, Vogelstein B, Kinzler KW. Characterization of the yeast transcriptome. Cell. 1997 Jan 24;88(2):243-51.

Versteeg R, van Schaik BD, van Batenburg MF, Roos M, Monajemi R, Caron H, Bussemaker HJ, van Kampen AH. The human transcriptome map reveals extremes in gene density, intron length, GC content, and repeat pattern for domains of highly and weakly expressed genes. Genome Res. 2003 Sep;13(9):1998-2004.

Vogel JH, von Heydebreck A, Purmann A, Sperling S. Chromosomal clustering of a human transcriptome reveals regulatory background. BMC Bioinformatics. 2005 Sep 19;6:230.

Volpi EV, Chevret E, Jones T, Vatcheva R, Williamson J, Beck S, Campbell RD, Goldsworthy M, Powis SH, Ragoussis J, et al. Large-scale chromatin organization of the major histocompatibility complex and other regions of human chromosome 6 and its response to interferon in interphase nuclei. J Cell Sci. 2000 May;113(Pt 9):1565-76.

Wen B, Wu H, Shinkai Y, Irizarry RA, Feinberg AP. Large histone H3 lysine 9 dimethylated chromatin blocks distinguish differentiated from embryonic stem cells. Nat Genet. 2009 Feb;41(2):246-50.

Williams EJ, Bowles DJ. Coexpression of neighboring genes in the genome of Arabidopsis thaliana. Genome Res. 2004 Jun;14(6):1060-7.

Wolfe KH, Sharp PM, Li WH. Mutation rates differ among regions of the mammalian genome. Nature. 1989 Jan 19;337(6204):283-5.

Zink D, Amaral MD, Englmann A, Lang S, Clarke LA, Rudolph C, Alt F, Luther K, Braz C, Sadoni N, Rosenecker J, Schindelhauer D. Transcription-dependent spatial arrangements of CFTR and adjacent genes in human cell nuclei. J Cell Biol. 2004 Sep 13;166(6):815-25.


2

Domain-Wide Regulation of Gene Expression in the Human Genome


Hinco J. Gierman1, Mireille H.G. Indemans1*, Jan Koster1*, Sandra Goetze2, Jurgen Seppen3, Dirk Geerts1, Roel van Driel2 and Rogier Versteeg1,5

1Department of Human Genetics, Academic Medical Center, University of Amsterdam, P.O. Box 22700, 1100 DE Amsterdam, the Netherlands. 2Swammerdam Institute for Life Sciences, University of Amsterdam, the Netherlands. 3AMC Liver Center, Amsterdam, the Netherlands.

* These authors contributed equally to this work. Published in: Genome Res. 2007 Sep;17(9):1286-95.

ABSTRACT

Transcription factor complexes bind to regulatory sequences of genes, providing a system of individual expression regulation. Targets of distinct transcription factors usually map throughout the genome, without clustering. Nevertheless, highly and weakly expressed genes do cluster in separate chromosomal domains with an average size of 80 to 90 genes. We therefore asked whether, besides transcription factors, an additional level of gene expression regulation exists that acts on chromosomal domains. Here we show that identical green fluorescent protein (GFP) reporter constructs integrated at 90 different chromosomal positions obtain expression levels that correspond to the activity of the domains of integration. These domains are up to 80 genes long and can exert an 8-fold effect on the expression levels of integrated genes. 3D-FISH shows that active domains of integration have a more open chromatin structure than integration domains with weak activity. These results reveal a novel domain-wide regulatory mechanism that, together with transcription factors, exerts a dual control over gene transcription.


INTRODUCTION

A few groups of adjacent genes in mammalian genomes have been found to exhibit co-regulated expression, and examples of such domain-wide control include the Hox clusters (Gould 1997), X chromosome inactivation (Plath 2002) and position effect variegation (PEV) exerted by heterochromatin on adjacent regions (Weiler 1995). However, it is assumed that the vast majority of human genes are individually regulated by transcription factor complexes.

The Human Transcriptome Map integrated high-throughput expression data measured by SAGE (serial analysis of gene expression) with the human genome sequence, which revealed that the genome consists of many domains of highly and weakly expressed genes (Caron 2001). The highly expressed domains (called Ridges) are gene dense, GC-rich, SINE repeat rich, and the genes have short introns, whereas the weakly expressed domains (called anti-Ridges) show the opposite characteristics (Versteeg 2003; Lercher 2003). Ridges were also described in the mouse genome (Mijalski 2005), and were found to be relatively conserved compared to the human genome (Singer 2005).

The highly expressed genes in Ridges are generally broadly expressed throughout different tissue types (Lercher 2002). However, not all genes in Ridges are highly expressed in each tissue type and tissue-specific regulation of gene expression also occurs in Ridges (Versteeg 2003). Adjacent genes in Ridges can therefore have very different expression levels. This is in line with genome-wide analyses where expression of individual genes was not found to correlate over distances of more than two genes (Semon 2006).

Ridges were recently found to be enriched for open chromatin fibers (Gilbert 2004) and active promoters (Kim 2005). Nonetheless, it is not known whether the differential expression in Ridges and anti-Ridges is due to individual gene regulation, or if domain-wide mechanisms exert an additional effect (reviewed by Hurst 2004; Sproul 2005). Here we present data that for the first time show that active domains in the genome contribute substantially to the expression of their embedded genes.

RESULTS

Construction and Sequencing of Clone Collection

To ascertain whether chromosomal domains can influence the activity of embedded genes, we studied the expression level of the same reporter gene integrated at many different positions in Ridges, anti-Ridges, and domains displaying intermediate gene expression. We infected human embryonic kidney cells (HEK293) with a lentiviral construct harboring the GFP gene driven by the ubiquitously expressed human phosphoglycerate kinase (PGK) promoter (Dull 1998; Zufferey 1998). Cells were transduced at low multiplicity of infection (MOI = 0.03) to favor single integrations. Individual GFP-positive cells were isolated by fluorescence-activated cell sorting (FACS), and equal numbers of clones with low, medium, and high GFP expression were selected for further expansion. Over 100 clones were cultured and analyzed by Southern blotting to select for single integrations (>90% of clones). The integration sites of the viral constructs were PCR amplified, sequenced, and mapped onto the genome (see Methods and Supplemental Protocol S1). Insertion sites were unequivocally determined in 90 clones that had unique integrations in 21 chromosomes (Figure 1 and Supplemental Table S1).

FACS analysis of all clones showed a broad range of GFP expression levels. Expression of GFP mRNA of 10 representative clones was analyzed by Northern blotting. The mRNA levels were quantified by phosphorimaging and normalized to GAPDH levels, which revealed a linear correlation with the levels of GFP fluorescence (Pearson R² = 0.98, P = 10⁻⁸, Supplemental Figure S1) and thereby validated the use of fluorescence as a measure of transcriptional activity of the GFP gene. Analysis of seven representative clones showed that levels of GFP fluorescence were constant over an extended culturing period (Supplemental Figure S2).

GFP Expression in Ridges is Higher Than in anti-Ridges

The set of clones enabled us to analyze whether the domain of integration influenced the GFP expression levels. Figure 1 shows the position and GFP expression of all 90 clones in the expression profiles of the Human Transcriptome Map. We analyzed whether integration in Ridges confers a higher expression level to integrated constructs than integration in anti-Ridges. We identified 22 clones with integrations in Ridges and 14 clones with integrations in anti-Ridges. Most clones with Ridge integrations displayed high fluorescence, whereas most anti-Ridge clones had a low to intermediate fluorescence (Figure 1 and Figure 2A). The average GFP expression of the Ridge clones was 4.0-fold higher than that of the anti-Ridge clones, indicating a strong effect of Ridges and anti-Ridges on GFP transcription (Figure 2A, P = 7.6 × 10⁻³, unpaired t-test; for this and all other analyses, GFP fluorescence and moving median values were log2-transformed to obtain a normal distribution). The average expression of endogenous genes in Ridges and anti-Ridges differs by a factor of 3.9 (Figure 2B). The observation that an identical gene integrated in Ridges or anti-Ridges acquires the relative expression level of the domain of integration suggests a strong regulatory effect of the domain of integration. As the ratio of GFP expression in Ridges and anti-Ridges (4.0) is comparable to the ratio of endogenous gene expression in Ridges and anti-Ridges (3.9), a similar domain effect seems to act on the endogenous genes in Ridges and anti-Ridges.
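The comparison described above (log2 transformation, fold difference between two groups, unpaired t-test) can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code of the study: the function names and the fluorescence values are hypothetical, the pooled-variance (Student) form of the t-test is an assumption, and the fold difference is taken as the ratio of the means on the log2 scale.

```python
import math
import statistics

def log2_fold_difference(group_a, group_b):
    """Fold difference between two groups, computed on log2-transformed values."""
    mean_a = statistics.mean(math.log2(v) for v in group_a)
    mean_b = statistics.mean(math.log2(v) for v in group_b)
    return 2 ** (mean_a - mean_b)

def unpaired_t_statistic(group_a, group_b):
    """Student's t statistic (pooled variance, an assumption) on log2 values.

    Returns the t statistic and the degrees of freedom.
    """
    a = [math.log2(v) for v in group_a]
    b = [math.log2(v) for v in group_b]
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2

# Made-up fluorescence values for illustration only (not data from the study).
ridge = [400.0, 800.0, 1600.0, 800.0]
anti_ridge = [100.0, 200.0, 400.0, 200.0]

fold = log2_fold_difference(ridge, anti_ridge)
t_stat, dof = unpaired_t_statistic(ridge, anti_ridge)
print(f"fold difference = {fold:.1f}, t = {t_stat:.2f}, df = {dof}")
```

The P-value follows from the t distribution with the returned degrees of freedom; it is left out here to keep the sketch within the standard library.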

The different Ridges in the genome vary with regard to their median expression level, and hence they may also differ with respect to the effect they have on integrated GFP constructs. To estimate the maximal impact of domains on embedded genes, we compared clones with GFP constructs integrated in the most active and most inactive domains. Domain activity was defined as previously described (Versteeg 2003). In short, the median expression level of the 49 genes surrounding an integration site is determined for each clone (see Methods). The ten clones with integrations in the


Figure 1. Physically mapped transcriptome profiles of all chromosomes, showing the integration sites and expression levels of all GFP constructs. Giemsa banding is illustrated below each transcriptome map (centromere, yellow; heterochromatic region, green) and Ridges (red) and anti-Ridges (blue) are indicated by bars below the Giemsa banding. Black vertical bars represent genes (n = 20,382) and their height indicates domain activity for a window of 49 genes (median expression of the surrounding 49 genes in 133 pooled SAGE libraries). Green lollipops indicate integration sites of all GFP constructs (n = 90). The height of each lollipop corresponds to the expression level of the integrated GFP construct. The numbered clones on chromosome 1 were used for 3D FISH analysis (see Fig. 6).


ten clones with integrations in the least active domains (Figure 2C; P = 6.5 × 10⁻⁵, unpaired t-test). These findings show that identical transgenes integrated in different chromosomal regions acquire expression levels that strongly correlate with the expression levels of the domains of integration.
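The domain activity measure used here, the median expression of the 49 genes surrounding an integration site, can be illustrated with a short Python sketch. The function name is hypothetical, the integration site is represented by the index of the nearest gene, and truncating the window at chromosome ends is an assumption, since the text does not say how ends are handled.

```python
import statistics

def domain_activity(expression, center, window=49):
    """Median expression of `window` genes centred on gene index `center`.

    `expression` lists per-gene expression values in chromosomal order.
    Near the ends of the list the window is truncated (an assumption).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it can be centred")
    half = window // 2
    lo = max(0, center - half)
    hi = min(len(expression), center + half + 1)
    return statistics.median(expression[lo:hi])

# Toy expression values (not data from the study).
expression = [1, 2, 50, 60, 55, 70, 3, 2, 1]
print(domain_activity(expression, center=4, window=5))  # median of [50, 60, 55, 70, 3]
```

Using the median rather than the mean keeps a single very highly expressed gene from dominating the activity of a window.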

GFP Expression Correlates With Activity of Domains up to 80 Genes Long

The effect of the chromosomal domain on the expression level of the integrated GFP constructs could either result from local effects of genes adjacent to the integration site, or from a mechanism that acts on the domain as a whole. A local effect exerted by nearby active promoters and enhancers, would predict a high correlation between expression of GFP and neighboring genes, while a domain-wide effect predicts a high correlation of GFP with the expression of the domain as a whole rather than with the neighboring genes. To analyze both possibilities, we calculated the median gene expression level for window sizes from 1 to 201 genes around each of the integration sites. Gene expression data were obtained by combining 133 different SAGE libraries of various tissue types (see Methods). This median expression level showed a strong correlation with GFP expression, being the highest for window sizes of roughly 19 to 79 genes around the integration sites (average R = 0.50, P < 10⁻⁶, Figure 3). The correlation between GFP and the immediate neighboring genes

Figure 2. Expression of GFP constructs and genes in different chromosomal domains. (A) Average GFP expression of all clones with integrations in Ridges (n = 22), intermediate domains (i.e., neither Ridge nor anti-Ridge, n = 54), or anti-Ridges (n = 14). (B) Average expression (based on 133 pooled SAGE libraries) of all human genes embedded in Ridges (n = 4,250), intermediate domains (n = 13,226) or anti-Ridges (n = 2,906). (C) Average GFP expression of the clones harboring integration sites with the highest (n = 10) and lowest (n = 10) domain activities (defined as the median expression of the surrounding 49 genes in 133 pooled SAGE libraries). Error bars represent standard error of the mean.


observation might be limited due to the fewer expression values included in these smaller window sizes, which could result in a higher variance and a lower correlation (see below). To further test the significance of the observed positive correlations, we performed a Monte Carlo simulation in which GFP expression values were randomly distributed among the clones. The correlation between domain activity and GFP value was calculated for one million permutations per window size (see also Supplemental Figure S3). Values of R > 0.45 were observed with a frequency of less than 2 × 10⁻⁵ for any individual window size. Therefore, the observation of correlations of R > 0.45 for all window sizes of 19 to 79 within the actual data, is highly significant and suggests that the inserted GFP gene acquires an expression level related to domains of up to 80 genes surrounding the integration site. This length agrees well with the average size of Ridges and anti-Ridges of 94 and 81 genes respectively. As Ridges are more gene-dense than anti-Ridges, these values correspond to an average length of 6.2 Mb for Ridges and 17.1 Mb for anti-Ridges. The various Ridges and anti-Ridges in the human genome are however highly variable in length and these numbers represent average values only.
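The window-size scan and the Monte Carlo permutation test described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code used in the study: all function names are hypothetical, the data are synthetic, integration sites are represented by gene indices, and the number of permutations is reduced from one million to 2,000 to keep the example fast.

```python
import math
import random
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def window_median(expression, center, window):
    """Median expression of `window` genes centred on gene index `center`."""
    half = window // 2
    lo, hi = max(0, center - half), min(len(expression), center + half + 1)
    return statistics.median(expression[lo:hi])

def correlation_by_window(expression, sites, gfp, windows):
    """Pearson R between GFP levels and domain activity, per window size."""
    return {w: pearson_r(gfp, [window_median(expression, s, w) for s in sites])
            for w in windows}

def permutation_frequency(gfp, activity, threshold, n_perm=2000, seed=0):
    """Fraction of random reassignments of GFP values giving R > threshold."""
    rng = random.Random(seed)
    shuffled = list(gfp)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson_r(shuffled, activity) > threshold:
            hits += 1
    return hits / n_perm

# Synthetic data (not from the study): alternating blocks of 50 genes with
# high and low expression, and GFP levels that follow the 49-gene activity.
rng = random.Random(1)
expression = [rng.uniform(0, 5) + (5 if (i // 50) % 2 == 0 else 0) for i in range(300)]
sites = list(range(10, 300, 15))
gfp = [window_median(expression, s, 49) + rng.gauss(0, 1) for s in sites]

r_by_window = correlation_by_window(expression, sites, gfp, [1, 19, 49, 79])
activity_49 = [window_median(expression, s, 49) for s in sites]
frequency = permutation_frequency(gfp, activity_49, threshold=0.45)
print(r_by_window, frequency)
```

Shuffling the GFP values among the clones keeps both marginal distributions intact and breaks only the pairing, so the returned frequency estimates how often a correlation above the threshold arises by chance.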

Domain Effect on GFP Expression is Stronger Than Effect of Neighbor Genes.

The transcriptome profiles based on SAGE libraries from different tissues are not necessarily representative for HEK293 cells, thus we compared our GFP data with transcriptome profiles specific for HEK293. Even a large SAGE library is not powerful enough to generate a reliable transcriptome profile of an individual cell line, thus we used Affymetrix U133 Plus 2.0 microarrays to generate a HEK293 expression profile. The expression values were related to the same transcriptional units (TUs) on the genome as used for SAGE (see Methods). However, the SAGE data were obtained for 20,382 TUs, whereas the Affymetrix data covered only 16,841 (83%) of the TUs and were hence not as powerful in this respect. Nevertheless, we obtained transcriptome maps for all chromosomes of HEK293 with profiles comparable to the SAGE-based all-tissue map (data not shown). These maps enabled us to determine whether GFP expression in the clones was correlated with HEK293-specific expression data. Figure 3 demonstrates that the all-tissue activity and the HEK293-specific activity showed very similar patterns of correlation between GFP levels and domain-wide expression, with the HEK293 data again giving maximum correlations for the same domain sizes of roughly 19 to 79 genes and an average correlation of R = 0.36 (P < 4.9 × 10⁻³). Of note, also the HEK293-specific analysis showed a much weaker correlation between expression of GFP and the closest neighboring gene (R = 0.10, P = 0.39). This was also true for neighboring genes located either parallel or anti-parallel to the GFP insert (data not shown). We performed Monte Carlo simulations as described above, using the HEK293 expression data. The confidence intervals are very similar to those calculated on the SAGE data and confirm the significance of the observed correlations (values of R > 0.35 were observed with a frequency of less than 1 × 10⁻³ for any individual window size).

Figure 3. Correlation between GFP expression and domain activity for window sizes of 1 to 201 genes. Pearson correlation coefficient (R, y-axis) was calculated for GFP expression and domain activities of the integration sites of all 90 clones for window sizes increasing from 1 to 201 genes (x-axis). This was done using ‘all-tissue’ domain activity data represented by 133 SAGE libraries from different human tissues (black line) and cell-line-specific HEK293 activity measured by Affymetrix microarrays (gray line).

The breakdown of the correlation between GFP expression and domain activity at lower window sizes could suggest that the effect of the domain at large is much stronger than the effect of immediate neighboring genes. However, also here, it should be considered that the smaller window sizes include less expression data, which might result in a higher variance and consequently a drop in correlation. We therefore specifically analyzed whether neighboring genes affect GFP expression in HEK293. We first calculated the correlation between GFP levels and the average expression of the two immediate neighboring genes, then of GFP expression and the next two neighbors, etc. (Supplemental Figure S4). This analysis is unbiased, as equal amounts of expression data are used for each calculation. The analysis showed that up to a distance of roughly 20 genes, there is a low positive correlation (average R = 0.1). Although this correlation is not significant for most individual points, it suggests a weak effect of neighboring genes on GFP expression levels. To investigate the relative contributions to GFP expression by neighbor genes and by the domains at large, we made use of the earlier observation that not all highly expressed genes of the genome cluster in Ridges. In fact, two-thirds of the highly expressed genes are found outside Ridges. Moreover, not all genes in Ridges are highly expressed. Ridges also include weakly and non-expressed genes. This enabled us to independently assess the contribution of neighboring genes and of expression domains on GFP expression. We first split the set of GFP clones in two equally large groups, according to the average expression level of the two neighboring genes.


Figure 4. The effect of neighbor genes on GFP expression. (A) Average GFP expression of all clones divided into two equally sized groups with either high (Hi-N) or low (Lo-N) neighbor gene activity. The 1.1-fold difference in GFP expression is not significant (P = 0.71). (B) Average GFP expression for both groups of Hi-N and Lo-N clones, each divided into two equally sized groups with either high (gray bars) or low (black bars) domain activity (window size 49). GFP expression differs 2.1-fold between both domain types for the Hi-N group (P = 0.02) and 2.3-fold between both domain types for the Lo-N group (P = 0.008). There is no significant difference in GFP expression between the Hi-N and Lo-N groups of the same domain type (P > 0.71). (C, F) Average domain activity per group of clones. Domain activity does not differ significantly between the Hi-N and Lo-N clones in A, or between the Hi-N and Lo-N clones from the different domain types in F. (D, E) Average neighbor gene activity per group of clones. The difference in neighbor gene activity is 9.9-fold between the Hi-N and Lo-N clones in A, but not significant between the different domain types of the Hi-N and Lo-N clones in F (P > 0.21). Neighbor gene activity was calculated as the average expression of the two immediate neighboring genes as measured by Affymetrix arrays. Domain activity was defined as the median expression of the surrounding 49 genes in 133 pooled SAGE libraries. Out of 90 clones, 6 have a pair of neighboring genes without a probeset and were therefore excluded from the analysis. Clone numbers in each group are thus: Hi-N (n = 42) and Lo-N (n = 42) for the analysis in A, C and D and n = 21 for all 4 groups in B, E and F. All P-values were calculated with an unpaired t-test. Error bars represent standard error of the mean.


low neighbor gene expression (Lo-N) showed only a slightly different average GFP expression (Figure 4A). We subsequently analyzed the relation to domain activity in each of the two groups. The Hi-N group was split in two equal groups, according to the median activity of the 49 surrounding genes (window size 49). Now we observed a strong relation to the average GFP expression level: domains with high activity had a 2.1-fold higher GFP expression than domains of low activity (Figure 4B).

We also split the group of Lo-N in two halves, according to domain activity. Also here, a 2.3-fold higher GFP expression was found in the highly active domains, compared to the weakly-active domains. This analysis shows that the domain of integration has a strong influence on the GFP expression level, while the effect of neighboring genes is limited. Controls for the distribution of domain activity and neighbor gene activity over the analyzed groups validate this conclusion (see legends to Figure 4C-F). We have repeated this analysis using only the expression of the closest neighbor gene as well as separating the Hi-N and Lo-N groups according to integration in Ridge, intermediate and anti-Ridge domains. Each time we observed very similar results leading to the same conclusion (data not shown).
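The two-level grouping used in this analysis (first a split on neighbor gene activity, then a split of each half on domain activity) can be sketched as below. The function names, the dictionary keys and the eight clones are hypothetical and chosen only to make the logic visible; fold differences are computed from log2 GFP values, as an assumption consistent with the log2 transformation mentioned earlier.

```python
import statistics

def median_split(clones, key):
    """Return (low_half, high_half) after ranking clones by `key`."""
    ranked = sorted(clones, key=key)
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]

def fold_difference(high_group, low_group):
    """Fold difference in GFP between two groups, from log2 values."""
    mean_high = statistics.mean(c["log2_gfp"] for c in high_group)
    mean_low = statistics.mean(c["log2_gfp"] for c in low_group)
    return 2 ** (mean_high - mean_low)

# Hypothetical clones: neighbor activity, domain activity, log2 GFP level.
clones = [
    {"neighbor": 1, "domain": 10, "log2_gfp": 8.0},
    {"neighbor": 2, "domain": 20, "log2_gfp": 8.0},
    {"neighbor": 3, "domain": 1, "log2_gfp": 7.0},
    {"neighbor": 4, "domain": 2, "log2_gfp": 7.0},
    {"neighbor": 5, "domain": 1, "log2_gfp": 7.0},
    {"neighbor": 6, "domain": 2, "log2_gfp": 7.0},
    {"neighbor": 7, "domain": 10, "log2_gfp": 8.0},
    {"neighbor": 8, "domain": 20, "log2_gfp": 8.0},
]

# First level: low versus high neighbor gene activity (Lo-N, Hi-N).
lo_n, hi_n = median_split(clones, key=lambda c: c["neighbor"])
neighbor_fold = fold_difference(hi_n, lo_n)

# Second level: within each neighbor group, low versus high domain activity.
domain_folds = {}
for name, group in (("Hi-N", hi_n), ("Lo-N", lo_n)):
    low_domain, high_domain = median_split(group, key=lambda c: c["domain"])
    domain_folds[name] = fold_difference(high_domain, low_domain)

print(neighbor_fold, domain_folds)
```

In this toy data set the neighbor split shows no difference in GFP, whereas the domain split shows a twofold difference within both neighbor groups, which is the pattern the analysis is designed to detect.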

The conclusion that the expression of genes in Ridges and anti-Ridges is strongly influenced by an effect of the domain at large would make two predictions: Firstly, GFP expression should correlate with domain activity throughout the entire domain, including parts of the domain distant from the GFP integration site. Secondly, the correlation should break down at the border of each domain. To test these predictions, we aligned all clones with a GFP construct in Ridges or anti-Ridges (n = 36) on their GFP integration site. As the GFP insertions divide each domain in two unequal parts, we oriented the domains such that the larger fragments were on the same side. We calculated the correlation between GFP expression of each clone and domain activity (using a window size of 21 genes) at various positions within and outside the domain. Figure 5 shows a plot of the correlation between GFP expression and domain activity at distances of 0, 25%, 50%, 75% and 100% from the integration site to the domain ends. Outside the domains, we chose fixed positions of 11, 31 and 51 genes from the domain border. The correlation remains high throughout the domain, but completely breaks down at the domain boundaries. Taken together, these results show that Ridges and anti-Ridges exert a domain-wide effect on GFP expression and form functional domains within the human genome.
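The relative positions used in this analysis can be illustrated with a small Python sketch that places sampling points at fixed fractions of the distance between an integration site and a domain border, and then looks up the nearest gene. The function names, the base-pair coordinates and the use of gene start positions are assumptions made for illustration; the text specifies only the fractions and the 21-gene window.

```python
from bisect import bisect_left

def positions_within_domain(site_bp, border_bp, fractions=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Coordinates at the given fractions of the distance from site to border."""
    return [site_bp + f * (border_bp - site_bp) for f in fractions]

def nearest_gene_index(gene_starts, position_bp):
    """Index of the gene whose start lies closest to `position_bp`.

    `gene_starts` must be sorted in ascending order.
    """
    i = bisect_left(gene_starts, position_bp)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(gene_starts)]
    return min(candidates, key=lambda j: abs(gene_starts[j] - position_bp))

# Hypothetical coordinates in base pairs (not from the study).
gene_starts = [0, 500_000, 1_200_000, 2_100_000, 2_900_000]
toward_right_border = positions_within_domain(1_000_000, 3_000_000)
toward_left_border = positions_within_domain(1_000_000, 200_000)

print(toward_right_border)
print(toward_left_border)
print(nearest_gene_index(gene_starts, toward_right_border[1]))
```

The gene index found at each position would then serve as the centre of the 21-gene window for which domain activity is computed.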

Chromosome Structure Corresponds to GFP Expression

Several recent studies have examined differences in chromatin condensation and nuclear position of chromosomal domains. Amongst others, the group of Cremer has shown that gene dense domains are positioned towards the nuclear interior (Bolzer 2005). Bickmore and co-workers found that gene-dense domains throughout the genome possess open chromatin fibers (Gilbert 2004), and they postulated that this domain-wide feature facilitates transcription (Sproul 2005).


and located more interior in the nucleus compared to anti-Ridges, independent of cell type (Goetze 2007). To consolidate these structural studies with our functional analysis of chromosomal domains, we examined the three-dimensional structure of the domains of integration. This was done using 3D-FISH in five clones with GFP insertions in chromosome 1 (marked 1-5 in Figure 1). Two of the clones had integrations in anti-Ridges in chromosomal bands 1p34 and 1q43 and showed relatively low GFP expression. In the other three clones, constructs integrated in Ridges at 1q21 (2 clones) and 1q42 exhibited high GFP expression (Figure 6A). Nuclei of each clone were fixed to preserve the three-dimensional structure and hybridized with fluorescently labeled BACs. Each clone was hybridized with 11 pooled BACs covering a domain of 2.2 Mb surrounding the specific integration site. Three-dimensional images of the integration domains were obtained by confocal laser microscopy. The three integration domains of the clones with high GFP expression had significantly larger (P < 9 × 10⁻⁴, unpaired t-test) diameters than the integration domains of the two clones with low GFP expression, which suggests a more open chromatin structure. The domains with high GFP expression also had a more interior nuclear position (Figure 6B-E, P < 5 × 10⁻¹⁰, unpaired t-test), compared to the domains with low GFP expression (Supplemental Tables S2, S3). These findings show that the previously observed three-dimensional characteristics of Ridges and anti-Ridges correspond to the functional activity of these domains that is described here.

Figure 5. Correlation between GFP expression and domain activity for various relative positions within and outside the domain. All clones with a GFP construct in a Ridge or anti-Ridge (n = 36) were aligned on the position of their GFP construct. For all clones the border with the largest physical distance to the GFP integration site was determined (far border) and used to orient them alike. Using a window of 21 genes, the domain activity was determined at various relative positions within the domain at 25%, 50%, 75% and 100% (i.e. the border) of the physical distance between the GFP construct and each border. The domain is represented by a gray box in between both borders. Using the same window size, correlation was calculated outside the domain at a distance of 11, 31 and 51 genes. Correlation is significant for all positions within the domain (P < 0.028), but not for any position on the border or outside the domain.

DISCUSSION

Our results show that identical transgenes integrated in different chromosomal regions, acquire expression levels that strongly correlate with the expression levels of the domains of integration. These chromosomal domains can exert a general activating or attenuating influence on embedded genes. Immediate neighboring genes also influence GFP expression, but this effect is more limited. The effect of the domains on the level of expression of inserted genes is considerable, and it is plausible that the endogenous genes in these domains are influenced in a similar manner. We have previously reported that expression of genes in Ridges is not uniformly high (Versteeg 2003). In that study, we observed that some Ridge genes displayed a tissue-specific expression pattern, because they could be silent in one tissue and highly expressed in other tissues. Ridges are defined not by high expression of all embedded genes, but by a high median expression of the domain as a whole. We discerned this high median expression in all studied tissues, although different genes made contributions in different tissues. The dynamic regulation of the expression of individual genes in Ridges, together with our finding of domain-wide control of expression, suggests the existence of a dual mechanism of gene regulation: Transcription factors determine whether a gene will be expressed and also establish a basic level of transcription. In addition, there is a substantial effect of the domain in which genes are positioned, which potentiates the ultimate expression levels. Transcription factors controlling the PGK promoter probably determine a basal level of GFP expression, which can be modified up to 8-fold by properties of the whole domain of integration. 
Such a dual mechanism would considerably augment the dynamic range of transcription factors: the same transcription factor could induce substantial expression of a target gene located in a Ridge and low expression of a target gene situated in an anti-Ridge. Clearly, that type of mechanism would bear on the evolutionary dynamics of gene repositioning in genomes. A comparison of the conservation of gene position in expression clusters in mouse and man is in line with this idea (Singer 2005).

Figure 6. Three-dimensional FISH analysis of domains in 5 clones harboring GFP integrations on chromosome 1. (A) Relative levels of GFP expression (black bars) and domain activity of the integration sites (shaded bars) in five clones. Highest values are set at 100% and domain activity is defined as the median expression of the surrounding 49 genes (133 pooled SAGE libraries). (B, C) FISH analysis of 2.2 Mb regions surrounding the integrated GFP construct of five clones (rows) illustrates the 3D structure of each domain. Representative 3D FISH images of each domain are shown as projection (B) and after volume rendering (C). For each clone, 30-60 nuclei were analyzed. The transgene integration domain is shown in green; nuclei were counterstained with DAPI (blue). Scale bars indicate 1 μm. Red lines in (B) represent contours of the hybridized areas. (D) Histograms showing the distribution of integration domains (count per signal, y-axis) of each of the clones (Ridge clones red, anti-Ridge clones blue) with respect to their squared nuclear position. The x-axis runs from the nuclear center to the nuclear periphery. (E) As in (D), for diameter (largest diameter in 3D of a domain). The histograms and images show that the Ridge domains (clones 1-3) localize more to the interior and are less condensed than the anti-Ridge domains (clones 4 and 5).
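The domain activity used in Figure 6A (median expression of the surrounding 49 genes, with the highest value set at 100%) can be illustrated with a minimal Python sketch. The function names, the choice of a window centered on the integration site, and the truncation of the window at chromosome ends are assumptions made for illustration; the text does not specify how the window was positioned or how chromosome ends were handled.

```python
from statistics import median


def domain_activity(expression, index, window=49):
    """Median expression of up to `window` genes centered on `index`.

    `expression` is a list of per-gene expression values in chromosomal
    order. Assumption: the window is centered on the integration site
    and is simply truncated at the ends of the chromosome.
    """
    half = window // 2
    start = max(0, index - half)
    end = min(len(expression), index + half + 1)
    return median(expression[start:end])


def as_percent_of_max(values):
    """Rescale values so that the highest value equals 100%."""
    top = max(values)
    return [100.0 * v / top for v in values]


if __name__ == "__main__":
    # Hypothetical expression values, not data from the study.
    expression = [float(i % 7) for i in range(200)]
    sites = [30, 90, 150]
    activities = [domain_activity(expression, s) for s in sites]
    print(as_percent_of_max(activities))
```

Using a median rather than a mean keeps a single highly expressed gene from dominating the domain value, which matches the definition of Ridges by median expression of the domain as a whole.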

The domain-wide regulation could be based on an activating or a suppressive mechanism, or a combination of both. Activation as well as silencing of genes is often accompanied by changes in the histone code and/or DNA methylation, which can also trigger alterations in chromatin condensation. It is tempting to speculate that histone codes also play a role in the domain-wide regulation of gene expression that we describe here. Histone modifications can spread over considerable genomic distances and have been associated with both silencing and activating mechanisms. In PEV and X-chromosome inactivation, the long-range silencing of genes is accompanied by the spreading of trimethylated lysine 9 on histone H3 (H3K9me3) (Heard 2001). This silencing mechanism is mediated by the Swi6/HP1 proteins and Clr4/Su(var)3-9 histone methyltransferases (Lachner 2001; Bannister 2001; Nakayama 2001; Noma 2001). Two recent publications have identified large regions of downregulated genes in colon cancer (Frigola 2006) and bladder tumors (Stransky 2006). In both cases, silencing was accompanied by trimethylated H3K9.

Active marks can also spread and influence expression in large chromosomal domains. In Drosophila, the dosage compensation complex mediates the spreading of an active histone mark along the X chromosome by acetylation of H4K16 (Kelley 1999). Genome-wide analyses also detected increased H3K9 and H3K14 acetylation (Roh 2005) and H3K4 methylation (Bernstein 2005) in transcriptionally active regions of the human genome, but these marks were mainly restricted to promoters and regulatory elements of genes and were thus concluded not to represent domain-wide modifications. Interestingly, Finnegan observed that the expression of a transgene in Arabidopsis increased upon activation of the insertion domain, suggesting that spreading of an activating effect can occur (Finnegan 2003). The ability to perform genome-wide analyses of a multitude of histone modifications will enable a further search for domain-wide marks (Barski 2007). Our present results demonstrate that domain-wide regulation of gene expression is a general principle of the human genome, rather than a phenomenon restricted to a few specific loci.

METHODS

Constructs

We obtained the lentiviral construct pRRL-PGK-GFPsin-18 and the packaging plasmids pMDLg/pRRE, pMD.G(VSV-G) and RSV-REV, as a kind gift of D. Trono and R. Zufferey (University of Geneva, Geneva, Switzerland) (Dull 1998; Zufferey 1998). The 3’ long terminal repeat (LTR) has largely been deleted, abrogating the enhancer activity of the virus LTR (Zufferey 1998). From this plasmid, we constructed pRRL-FLL by cloning two LoxP sites (flanking the PGK-GFP cassette) and a Flp-In Recombination Target (FRT) site (directly upstream of the first LoxP site) into

for construction of pRRL-FLL). Using a three-point ligation, two double-stranded (annealed) oligos containing an FRT site (FRTfw and FRTrev) and a LoxP site (LOX1fw with LOX2fw) were cloned into pRRL-PGK-GFPsin-18 after digestion with XhoI and ClaI (Roche). Subsequently, one double-stranded (annealed) oligo (Lox2fw and Lox2rev) containing a second LoxP site was cloned into the SalI (Roche) digested vector. The ligation mix was re-digested with SalI to select for successfully ligated plasmids (the insert disrupts the SalI site).

Cell culture, lentiviral transduction and FACS analysis

HEK293 cells and 293T cells were cultured in DMEM (Invitrogen) containing 10% fetal calf serum. For production of lentivirus, 293T cells were calcium-transfected with the lentiviral construct pRRL-FLL and packaging plasmids. Titer was determined by FACS analysis, using a FACSCalibur (BD Biosciences), of counted HEK293 cells transduced with different dilutions of virus. One week after transduction, clones were single cell-sorted on a FACSVantage SE or FACSAria (BD Biosciences). Cells displaying fluorescence (approximately 3%) were gated for low, medium and high fluorescence. From each gate, equal numbers of 96-well plates were seeded with single cells. Clones were passaged to 6-well plates, trypsinized, kept on ice and analyzed for fluorescence (FITC channel) on an LSRII FACS (BD Biosciences), which was calibrated with EGFP Calibration Beads (BD Biosciences).
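As an illustration of how a titer can be derived from FACS counts of transduced cells, the sketch below uses the commonly applied relation: transducing units per ml = (number of cells at transduction × fraction of GFP-positive cells × dilution factor) / volume of virus added. The text does not state the formula that was used, so the relation, the function name and the example numbers are assumptions. The calculation also assumes one integration per positive cell, which is only reasonable when the positive fraction is low.

```python
def titer_tu_per_ml(cells_at_transduction, fraction_gfp_positive,
                    virus_volume_ml, dilution_factor=1.0):
    """Estimate transducing units (TU) per ml from FACS counts.

    Assumes each GFP-positive cell carries a single integration,
    which only holds approximately at low positive fractions.
    """
    if not 0.0 <= fraction_gfp_positive <= 1.0:
        raise ValueError("fraction_gfp_positive must be between 0 and 1")
    if virus_volume_ml <= 0:
        raise ValueError("virus_volume_ml must be positive")
    transduced_cells = cells_at_transduction * fraction_gfp_positive
    return transduced_cells * dilution_factor / virus_volume_ml


if __name__ == "__main__":
    # Hypothetical numbers: 100,000 cells, 3% GFP-positive,
    # 1 microliter of a 10-fold diluted virus stock.
    print(titer_tu_per_ml(100_000, 0.03, 0.001, dilution_factor=10))
```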

RNA isolation and Northern blot analysis

Total RNA was isolated using TRIzol (Invitrogen) and purified with RNeasy (Qiagen). For Northern blot analysis, samples were separated on a 1% agarose gel containing 6.7% formaldehyde and transferred to Hybond-N membranes (Amersham Biosciences), which were hybridized with radioactively labeled probes. For the GAPDH probe, cDNA was made from HEK293 total RNA with a Superscript II RT-PCR Kit (Invitrogen). PCR on the HEK293 cDNA was performed with GAPDHfw (GGGCTGCTTTTAACTCTG) and GAPDHrev (AGGCTGTTGTCATACTTCTC) primers. The PCR product was checked by sequencing. The GFP probe was made by PCR on the pRRL-FLL construct with the GFPfw and GFPrev primers. A STORM 860 Phosphorimager (Amersham Biosciences) was used to quantify signal intensities.

DNA isolation and Southern blot analysis

Genomic DNA was isolated from HEK293 cells with the Wizard SV Genomic DNA Purification System (Promega) and digested overnight with PstI or BamHI (Roche). Digests were separated on 0.8% agarose gels, transferred to Hybond-N+ membranes (Amersham Biosciences) and hybridized with radioactively labeled probes for detection of the DNA fragment containing the lentiviral construct. Probes were generated by PCR on pRRL-FLL. For detection of the 3’ fragment, a GFP probe was generated with GFPfw (GACGTAAACGGCCACAAGTT) and GFPrev (GAACTCCAGCAGGACCATGT) primers. For detection of the 5’ fragment, a probe against the lentiviral backbone was made with HIVfw (GAGAGAGATGGGTGCGAGAG) and HIVrev (GATGCCCCAGACTGTGAGTT) primers.
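The primer sequences listed in the Methods can be checked with a short Python sketch that reports length and GC fraction and locates a primer pair on a template to estimate the probe length. The primer sequences are taken from the text; the helper names are hypothetical, and the template in the example is a made-up placeholder because the pRRL-FLL sequence is not given here.

```python
PRIMERS = {
    "GAPDHfw": "GGGCTGCTTTTAACTCTG",
    "GAPDHrev": "AGGCTGTTGTCATACTTCTC",
    "GFPfw": "GACGTAAACGGCCACAAGTT",
    "GFPrev": "GAACTCCAGCAGGACCATGT",
    "HIVfw": "GAGAGAGATGGGTGCGAGAG",
    "HIVrev": "GATGCCCCAGACTGTGAGTT",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def amplicon_length(template, forward, reverse):
    """Length of the product spanned by a primer pair, or None.

    Looks for the forward primer and, downstream of it, the reverse
    complement of the reverse primer on the given template strand.
    Only exact matches on this one strand are considered.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    rev_site = reverse_complement(reverse)
    end = template.find(rev_site, start + len(forward))
    if end == -1:
        return None
    return end + len(rev_site) - start


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), round(gc_fraction(seq), 2))
    # Placeholder template, not the real plasmid sequence.
    template = ("AAA" + PRIMERS["GFPfw"] + "CCCC"
                + reverse_complement(PRIMERS["GFPrev"]) + "TT")
    print(amplicon_length(template, PRIMERS["GFPfw"], PRIMERS["GFPrev"]))
```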
